Data structures in assembly language, as in high-level languages, are nested structures composed of references, structs, and arrays. These structures are often dynamically allocated, which lets programs adjust their memory usage to their immediate needs.
A data item that is used in several places in a program can exist as multiple copies, or a single copy can be referenced from different places. References are useful for data that will be updated frequently, because only one copy needs to be updated.
References are widely used in natural languages and are supported in many ways in high-level programming languages. In assembly languages, the basic reference mechanism is the memory address, which is the mechanism behind pointers in high-level languages.
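As a minimal sketch of a reference as a memory address, the C fragment below keeps one copy of a value and reaches it from two places through pointers. The names shared_count, ref_a, ref_b, and update_and_read are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical example: a single copy of the data item. */
static int32_t shared_count = 0;

/* Two "places" in the program; each holds only the address of the copy. */
static int32_t *ref_a = &shared_count;
static int32_t *ref_b = &shared_count;

/* Update the single copy through one reference, then read it through
   the other. Both references see the same value. */
int32_t update_and_read(int32_t new_value)
{
    *ref_a = new_value;   /* store through the address held in ref_a */
    return *ref_b;        /* load through the address held in ref_b  */
}
```

Calling update_and_read(7) stores 7 once, and the read through the second reference also yields 7, because no second copy exists.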
A computer can enhance this mechanism by defining addressing modes. For many RISC processors, the only memory addressing modes are base-displacement and direct addressing. The first specifies an address by adding a displacement that is coded into the instruction to the contents of a base register. Direct addressing is a simpler form in which the displacement is 0.
In high-level languages, structs, also known as records and objects, have the following characteristics.
Individual items are called members or sometimes fields.
The members may have different types and sizes.
The members are accessed with a member name, as in S.m, where S is the name of a struct and m is the name of one of its members.
In assembly language, structs are handled as follows.
The members are allocated contiguously in memory. You may need to reorder them to meet alignment restrictions. You should record the displacement for each member in a comment.
The lowest address of any of the members in a struct is called the base address of the struct.
The address of S.m is (the base address of S) + (the displacement for m).
Before accessing a member S.m, you need an instruction that places the base address into a register. For statically declared structs, this instruction is typically called "la", an abbreviation for "load address". Its first operand specifies a destination register for the base address. Its second operand is usually the label for the struct.
Then use a load or store instruction for the access.
On many machines, the operand is the displacement followed by the parenthesized name of the register that contains the base address. For example, in MAL, if the address register is $t0 and the displacement is 20, then the operand is 20($t0). This is called base-displacement addressing.
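The struct rules above can be sketched in C by doing the address arithmetic by hand. The record layout and the helper names load_word and store_word are assumptions made for this illustration; the displacements in the comments assume that int32_t is 4 bytes with 4-byte alignment, which holds on common platforms but is not guaranteed everywhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record. Displacements are recorded in comments, as the
   notes recommend for assembly code. */
struct record {
    int32_t id;        /* displacement 0  */
    char    name[16];  /* displacement 4  */
    int32_t balance;   /* displacement 20 */
};

/* Load a 32-bit member from (base address) + (displacement).
   Comparable to "lw $t1, 20($t0)" when $t0 holds the base address. */
int32_t load_word(const void *base, size_t displacement)
{
    int32_t value;
    memcpy(&value, (const unsigned char *)base + displacement, sizeof value);
    return value;
}

/* Store a 32-bit member to (base address) + (displacement).
   Comparable to "sw $t1, 20($t0)". */
void store_word(void *base, size_t displacement, int32_t value)
{
    memcpy((unsigned char *)base + displacement, &value, sizeof value);
}
```

For a variable s of type struct record, load_word(&s, offsetof(struct record, balance)) reads the same member as s.balance. Taking &s plays the role of the "la" instruction, and the displacement plays the role of the constant coded into the load or store.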
In high-level languages, arrays have the following characteristics.
Individual items are called entries.
The entries are all of the same type, and thus have the same size.
The entries are accessed by an integer index, as in A[i], where A is the name of an array and i is an integer or an expression whose value is an integer.
In assembly language, arrays are handled as follows.
The entries of an array are allocated contiguously in memory, sequentially by index.
The lowest address of any of the entries in an array is called the base address of the array.
The address of A[i] is (the base address of A) + i*(the entry size).
Before a random access to an entry, you need instructions that compute the entry address into an address register using the above formula. For statically declared arrays, the base address can be obtained with a "load address" instruction, which is often abbreviated as "la". Its first operand specifies a destination register for the base address. Its second operand is usually the label for the array.
Then use a load or store instruction for the access.
On many machines, the operand is the parenthesized name of the register that contains the address. For example, in MAL, if the address register is $t0, then the operand is ($t0). This is called direct addressing. On many processors, this is just base-displacement addressing with a zero displacement.
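The random-access formula for arrays can be sketched in C as follows. The helper names entry_address and load_entry are hypothetical, and the entry type is assumed to be a 32-bit integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of A[i] = (base address of A) + i * (entry size). */
const void *entry_address(const void *base, size_t index, size_t entry_size)
{
    return (const unsigned char *)base + index * entry_size;
}

/* Random access to a 32-bit entry. After the address is computed, the
   load uses a zero displacement, comparable to "lw $t1, ($t0)" when
   $t0 holds the computed entry address. */
int32_t load_entry(const int32_t *base, size_t index)
{
    const int32_t *address = entry_address(base, index, sizeof(int32_t));
    return *address;
}
```

For an array a of int32_t, load_entry(a, 3) reads the same entry as a[3]. The multiplication by the entry size is the step that a high-level language performs implicitly and that assembly code must perform explicitly.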
For sequential access, the base address can be placed in an address register prior to the sequential processing loop. The register is incremented by the entry size at the end of each iteration.
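A minimal sketch of sequential access, again in C with a hypothetical function name sum_entries. The variable address stands in for the address register: it is set to the base address before the loop and advanced by the entry size at the end of each iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum `count` 32-bit entries starting at `base`, using a moving address
   instead of recomputing base + i * size on every iteration. */
int32_t sum_entries(const int32_t *base, size_t count)
{
    const unsigned char *address = (const unsigned char *)base;  /* address register */
    const unsigned char *end = address + count * sizeof(int32_t);
    int32_t sum = 0;

    while (address != end) {
        sum += *(const int32_t *)address;  /* load with zero displacement */
        address += sizeof(int32_t);        /* increment by the entry size */
    }
    return sum;
}
```

Compared with random access, the loop body needs no multiplication, only one addition to the address per entry.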
Data structures can be nested in two ways: directly and indirectly. With direct nesting, the entire inner structure is contained in the outer structure. With indirect nesting, the outer structure only contains a reference to the inner structure.
int A[][3] = { { 0, 1, 2 }, { 3, 4, 5 } };              /* C: direct nesting */
int* A[] = { (int[]){ 0, 1, 2 }, (int[]){ 3, 4, 5 } };  /* C: indirect nesting */
int[][] A = { new int[] { 0, 1, 2 }, new int[] { 3, 4, 5 } };  // Java: indirect nesting
In assembly language, data structures can be nested either directly or indirectly. For direct nesting, the entire inner structure is included in the outer structure. Direct nesting is usually done only for multidimensional arrays. For indirect nesting, the address of the inner structure is included in the outer structure.
With indirect nesting, data access in the inner structure is just like data access in a statically declared structure, except that a load instruction is used instead of a load address instruction to get the base address into a register.
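The two kinds of nesting can be compared with a small C sketch. The names direct, indirect, row0, row1, get_direct, and get_indirect are hypothetical. The direct version computes one address inside a single contiguous block; the indirect version first loads the inner base address from the outer array and then indexes from it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Direct nesting: both rows live inside the outer array, as six
   contiguous 32-bit entries. */
static int32_t direct[2][3] = { { 0, 1, 2 }, { 3, 4, 5 } };

/* Indirect nesting: the outer array holds only the addresses of the rows. */
static int32_t row0[3] = { 0, 1, 2 };
static int32_t row1[3] = { 3, 4, 5 };
static int32_t *indirect[2] = { row0, row1 };

/* Direct: a single address computation,
   (base address) + (i * 3 + j) * (entry size). */
int32_t get_direct(size_t i, size_t j)
{
    const unsigned char *base = (const unsigned char *)direct;
    int32_t value;
    memcpy(&value, base + (i * 3 + j) * sizeof(int32_t), sizeof value);
    return value;
}

/* Indirect: the inner base address is obtained with a load from the
   outer array (not a load address), then used like any other base. */
int32_t get_indirect(size_t i, size_t j)
{
    const int32_t *row_base = indirect[i];
    return row_base[j];
}
```

Both get_direct(1, 2) and get_indirect(1, 2) return 5, but the indirect version makes one extra memory access to fetch the row's address.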
Allocating a struct or array dynamically (at run time) requires an operating system call. The system call code depends on the operating system.
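For comparison, a C program normally does not issue the system call itself. It calls the standard library routine malloc, which in turn requests memory from the operating system as needed. The sketch below assumes a hypothetical function make_sequence that allocates an array at run time and fills it. The returned base address is known only at run time, so it must be kept in a variable (in assembly, a register or memory word) rather than referred to by a label.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate an array of `count` 32-bit entries at run time and fill it
   with 0, 1, 2, ...  Returns NULL if the allocation fails. The caller
   is responsible for releasing the array with free(). */
int32_t *make_sequence(size_t count)
{
    int32_t *base = malloc(count * sizeof *base);  /* base address from the allocator */
    if (base == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        base[i] = (int32_t)i;
    }
    return base;
}
```

After p = make_sequence(4) succeeds, p[3] holds 3, and free(p) returns the memory when it is no longer needed.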