Data structure alignment
Encyclopedia
Data structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding. When a modern computer reads from or writes to a memory address, it will do this in word sized chunks (e.g. 4 byte chunks on a 32-bit system). Data alignment means putting the data at a memory offset equal to some multiple of the word size, which increases the system's performance due to the way the CPU handles memory. To align the data, it may be necessary to insert some meaningless bytes between the end of the last data structure and the start of the next, which is data structure padding.

For example, when the computer's word size is 4 bytes (a byte meaning 8 bits), the data to be read should be at a memory offset which is some multiple of 4. When this is not the case, e.g. the data starts at the 14th byte instead of the 16th byte, then the computer has to read two 4-byte chunks and do some calculation before the requested data has been read, or it may generate an alignment fault. Even though the previous data structure ends at the 14th byte, the next data structure should start at the 16th byte. Two padding bytes are inserted between the two data structures to align the next data structure to the 16th byte.

Although data structure alignment is a fundamental issue for all modern computers, many computer languages and computer language implementations handle data alignment automatically. Ada
Ada (programming language)
Ada is a structured, statically typed, imperative, wide-spectrum, and object-oriented high-level computer programming language, extended from Pascal and other languages...

, certain C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

 and C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

 implementations, and assembly language
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...

 allow at least partial control of data structure padding, which may be useful in certain special circumstances.

Definitions

A memory
Computer memory
In computing, memory refers to the physical devices used to store programs or data on a temporary or permanent basis for use in a computer or other digital electronic device. The term primary memory is used for the information in physical systems which are fast In computing, memory refers to the...

 address a, is said to be n-byte aligned when n is a power of two and a is a multiple of n byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

s. In this context a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An n-byte aligned address would have log2(n) least-significant zeros when expressed in binary
Binary numeral system
The binary numeral system, or base-2 number system, represents numeric values using two symbols, 0 and 1. More specifically, the usual base-2 system is a positional notation with a radix of 2...

.

The alternate wording b-bit aligned designates a b/8 byte aligned address (ex. 64-bit aligned is 8 bytes aligned).

A memory access is said to be aligned when the datum being accessed is n bytes long and the datum address is n-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition byte memory accesses are always aligned.

A memory pointer that refers to primitive data that is n bytes long is said to be aligned if it is only allowed to contain addresses that are n-byte aligned, otherwise it is said to be unaligned. A memory pointer that refers to a data aggregate (a data structure or array) is aligned if (and only if) each primitive datum in the aggregate is aligned.

Note that the definitions above assume that each primitive datum is a power of two bytes long. When this is not the case (as with 80-bit floating-point on x86) the context influences the conditions where the datum is considered aligned or not.

Problems

A computer accesses memory by a single memory word at a time. As long as the memory word size is at least as large as the largest primitive data type supported by the computer, aligned accesses will always access a single memory word. This may not be true for misaligned data accesses.

If the highest and lowest bytes in a datum are not within the same memory word the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory pages the processor must either verify that both pages are present before executing the instruction or be able to handle a TLB
Translation Lookaside Buffer
A translation lookaside buffer is a CPU cache that memory management hardware uses to improve virtual address translation speed. All current desktop and server processors use a TLB to map virtual and physical address spaces, and it is ubiquitous in any hardware which utilizes virtual memory.The...

 miss or a page fault
Page fault
A page fault is a trap to the software raised by the hardware when a program accesses a page that is mapped in the virtual address space, but not loaded in physical memory. In the typical case the operating system tries to handle the page fault by making the required page accessible at a location...

 on any memory access during the instruction execution.

When a single memory word is accessed the operation is atomic, i.e. the whole memory word is read or written at once and other devices must wait until the read or write operation completes before they can access it. This may not be true for unaligned accesses to multiple memory words, e.g. the first word might be read by one device, both words written by another device and then the second word read by the first device so that the value read is neither the original value nor the updated value. Although such failures are rare, they can be very difficult to identify.

RISC

Most RISC processors will generate an alignment fault when a load or store instruction accesses a misaligned address. This allows the operating system to emulate the misaligned access using other instructions. For example, the alignment fault handler might use byte loads or stores (which are always aligned) to emulate a larger load or store instruction.

Some architectures like MIPS
MIPS architecture
MIPS is a reduced instruction set computer instruction set architecture developed by MIPS Technologies . The early MIPS architectures were 32-bit, and later versions were 64-bit...

 have special unaligned load and store instructions. One unaligned load instruction gets the bytes from the memory word with the lowest byte address and another gets the bytes from the memory word with the highest byte address. Similarly, store-high and store-low instructions store the appropriate bytes in the higher and lower memory words respectively.

The Alpha
DEC Alpha
Alpha, originally known as Alpha AXP, is a 64-bit reduced instruction set computer instruction set architecture developed by Digital Equipment Corporation , designed to replace the 32-bit VAX complex instruction set computer ISA and its implementations. Alpha was implemented in microprocessors...

 architecture has a two-step approach to unaligned loads and stores. The first step is to load the upper and lower memory words into separate registers. The second step is to extract or modify the memory words using special low/high instructions similar to the MIPS instructions. An unaligned store is completed by storing the modified memory words back to memory. The reason for this complexity is that the original Alpha architecture could only read or write 32-bit or 64-bit values. This proved to be a severe limitation that often led to code bloat and poor performance. To address this limitation, an extension called the Byte Word Extensions (BWX) was added to the original architecture. It consisted of instructions for byte and word loads and stores.

Because these instructions are larger and slower than the normal memory load and store instructions they should only be used when necessary. Most C and C++ compilers have an “unaligned” attribute that can be applied to pointers that need the unaligned instructions.

x86 and x86-64

While the x86 architecture originally did not require aligned memory access and still works without it, SSE2
SSE2
SSE2, Streaming SIMD Extensions 2, is one of the Intel SIMD processor supplementary instruction sets first introduced by Intel with the initial version of the Pentium 4 in 2001. It extends the earlier SSE instruction set, and is intended to fully supplant MMX. Intel extended SSE2 to create SSE3...

 instructions on x86 CPUs do require the data to be 128-bit (16-byte) aligned and there can be substantial performance advantages from using aligned data on these architectures. However, there are also instructions for unaligned access such as MOVDQU.

Compatibility

The advantage to supporting unaligned access is that it is easier to write compilers that do not need to align memory, at the expense of the cost of slower access. One way to increase performance in RISC processors which are designed to maximize raw performance is to require data to be loaded or
stored on a word boundary. So though memory is commonly addressed by 8 bit bytes, loading a 32 bit integer or 64 bit floating point number would be required to start at every 64 bits on a 64 bit machine. The processor could flag a fault if it were asked to load a number which was not on such a boundary, but this would result in a slower call to a routine which would need to figure out which word or words contained the data and extract the equivalent value.

Data structure padding

Although the compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

 (or interpreter
Interpreter (computing)
In computer science, an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language...

) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned.

Padding is only inserted when a structure
Structure
Structure is a fundamental, tangible or intangible notion referring to the recognition, observation, nature, and permanence of patterns and relationships of entities. This notion may itself be an object, such as a built structure, or an attribute, such as the structure of society...

 member is followed by a member with a larger alignment requirement or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment. For example, if members are sorted by ascending or descending alignment requirements a minimal amount of padding is required. The minimal amount of padding required is always less than the largest alignment in the structure. Computing the maximum amount of padding required is more complicated, but is always less than the sum of the alignment requirements for all members minus twice the sum of the alignment requirements for the least aligned half of the structure members.

Although C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

 and C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

 do not allow the compiler to reorder structure members to save space, other languages might. It is also possible to tell most C and C++ compilers to "pack" the members of a structure to a certain level of alignment, e.g. "pack(2)" means align data members larger than a byte to a two-byte boundary so that any padding members are at most one byte long.

One use for such "packed" structures is to conserve memory. For example, a structure containing a single byte and a four-byte integer would require three additional bytes of padding. A large array of such structures would use 37.5% less memory if they are packed, although accessing each structure might take longer. This compromise may be considered a form of space-time tradeoff
Space-time tradeoff
In computer science, a space–time or time–memory tradeoff is a situation where the memory use can be reduced at the cost of slower program execution...

.

Although use of "packed" structures is most frequently used to conserve memory space, it may also be used to format a data structure for transmission using a standard protocol. However in this usage, care must also be taken to ensure that the values of the struct members are stored with the endianness
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

 required by the protocol (often network byte order), which may be different from the endianness used natively by the host machine.

Computing padding

The following formulas provide the number of padding bytes required to align the start of a data structure (where mod is the modulo
Modulo operation
In computing, the modulo operation finds the remainder of division of one number by another.Given two positive numbers, and , a modulo n can be thought of as the remainder, on division of a by n...

 operator):
# pseudo-code, see actual code below
padding = align - (offset mod align)
new offset = offset + padding = offset + align - (offset mod align)
For example, the padding to add to offset 0x59d for a structure aligned to every 4 bytes is 3. The structure will then start at 0x5a0, which is a multiple of 4. Note that when offset already is a multiple of align, taking the modulo of align - (offset mod align) is required to get a padding of 0.

If the alignment is a power of two, the modulo operation can be reduced to a bitwise boolean AND operation. The following formulas provide the new offset (where & is a bitwise AND and ~ a bitwise NOT):
padding = align - (offset & (align - 1)) = (-offset) & (align - 1)
new offset = (offset + align - 1) & ~(align - 1)

Typical alignment of C structs on x86

Data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

 members are stored sequentially in a memory so that in the structure below the member Data1 will always precede Data2 and Data2 will always precede Data3:


struct MyData
{
short Data1;
short Data2;
short Data3;
};


If the type "short" is stored in two bytes of memory then each member of the data structure depicted above would be 2-byte aligned. Data1 would be at offset 0, Data2 at offset 2 and Data3 at offset 4. The size of this structure would be 6 bytes.

The type of each member of the structure usually has a default alignment, meaning that it will, unless otherwise requested by the programmer, be aligned on a pre-determined boundary. The following typical alignments are valid for compilers from Microsoft
Microsoft
Microsoft Corporation is an American public multinational corporation headquartered in Redmond, Washington, USA that develops, manufactures, licenses, and supports a wide range of products and services predominantly related to computing through its various product divisions...

 (Visual C++
Visual C++
Microsoft Visual C++ is a commercial , integrated development environment product from Microsoft for the C, C++, and C++/CLI programming languages...

), Borland
Borland
Borland Software Corporation is a software company first headquartered in Scotts Valley, California, Cupertino, California and finally Austin, Texas. It is now a Micro Focus subsidiary. It was founded in 1983 by Niels Jensen, Ole Henriksen, Mogens Glad and Philippe Kahn.-The 1980s:...

/CodeGear
CodeGear
CodeGear is a wholly owned division of Embarcadero Technologies. CodeGear develops software development tools such as the Delphi IDE, the programming language Delphi, and the database server InterBase. Originally a division of Borland Software Corporation, it was launched on 14 November 2006...

 (C++Builder), Digital Mars
Digital Mars
Digital Mars is a small American software company owned by Walter Bright that makes C and C++ compilers for Windows and DOS. They also distribute the compilers for free on their web site....

 (DMC) and GNU
GNU
GNU is a Unix-like computer operating system developed by the GNU project, ultimately aiming to be a "complete Unix-compatible software system"...

 (GCC
GNU Compiler Collection
The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain...

) when compiling for 32-bit x86:
  • A char (one byte) will be 1-byte aligned.
  • A short (two bytes) will be 2-byte aligned.
  • An int (four bytes) will be 4-byte aligned.
  • A long (four bytes) will be 4-byte aligned.
  • A float (four bytes) will be 4-byte aligned.
  • A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double compile time option).
  • A long double (ten bytes with C++Builder and DMC, eight bytes with Visual C++, twelve bytes with GCC) will be 8-byte aligned with C++Builder, 2-byte aligned with DMC, 8-byte aligned with Visual C++ and 4-byte aligned with GCC.
  • Any pointer (four bytes) will be 4-byte aligned. (e.g.: char*, int*)


The only notable difference in alignment for a 64-bit system when compared to a 32-bit system is:
  • A long (eight bytes) will be 8-byte aligned.
  • A double (eight bytes) will be 8-byte aligned.
  • A long double (eight bytes with Visual C++, sixteen bytes with GCC) will be 8-byte aligned with Visual C++ and 16-byte aligned with GCC.
  • Any pointer (eight bytes) will be 8-byte aligned.


Here is a structure with members of various types, totaling 8 bytes before compilation:


struct MixedData
{
char Data1;
short Data2;
int Data3;
char Data4;
};


After compilation the data structure will be supplemented with padding bytes to ensure a proper alignment for each of its members:


struct MixedData /* After compilation in 32-bit x86 machine */
{
char Data1; /* 1 byte */
char Padding1[1]; /* 1 byte for the following 'short' to be aligned on a 2 byte boundary
assuming that the address where structure begins is an even number */
short Data2; /* 2 bytes */
int Data3; /* 4 bytes - largest structure member */
char Data4; /* 1 byte */
char Padding2[3]; /* 3 bytes to make total size of the structure 12 bytes */
};


The compiled size of the structure is now 12 bytes. It is important to note that the last member is padded with the number of bytes required so that the total size of the structure should be a multiple of the largest alignment of any structure member (alignment(int) in this case, which = 4 on linux-32bit/gcc). In this case 3 bytes are added to the last member to pad the structure to the size of a 12 bytes (alignment(int) × 3).


struct FinalPad {
double x;
char n[1];
};


In this example the total size of the structure sizeof(FinalPad) = 12 not 9 (so that the size is a multiple of 4 (alignment(double) = 4 on linux-32bit/gcc)).


struct FinalPadShort {
short s;
char n[3];
};


In this example the total size of the structure sizeof(FinalPadShort) = 6 not 5 (not 8 either) (so that the size is a multiple of 2 (alignment(short) = 2 on linux-32bit/gcc)).

It is possible to change the alignment of structures to reduce the memory they require (or to conform to an existing format) by reordering structure members or changing the compiler’s alignment (or “packing”) of structure members.


struct MixedData /* after reordering */
{
char Data1;
char Data4; /* reordered */
short Data2;
int Data3;
};


The compiled size of the structure now matches the pre-compiled size of 8 bytes. Note that Padding1[1] has been replaced (and thus eliminated) by Data4 and Padding2[3] is no longer necessary as the structure is already aligned to the size of a long word.

The alternative method of enforcing the MixedData structure to be aligned to a one byte boundary will cause the pre-processor to discard the pre-determined alignment of the structure members and thus no padding bytes would be inserted.

While there is no standard way of defining the alignment of structure members, some compilers use #pragma directives to specify packing inside source files. Here is an example:

  1. pragma pack(push) /* push current alignment to stack */
  2. pragma pack(1) /* set alignment to 1 byte boundary */


struct MyPackedData
{
char Data1;
long Data2;
char Data3;
};
  1. pragma pack(pop) /* restore original alignment from stack */



This structure would have a compiled size of 6 bytes on a 32-bit system. The above directives are available in compilers from Microsoft
Microsoft
Microsoft Corporation is an American public multinational corporation headquartered in Redmond, Washington, USA that develops, manufactures, licenses, and supports a wide range of products and services predominantly related to computing through its various product divisions...

http://msdn.microsoft.com/en-us/library/2e70t5y1(VS.80).aspx, Borland
Borland
Borland Software Corporation is a software company first headquartered in Scotts Valley, California, Cupertino, California and finally Austin, Texas. It is now a Micro Focus subsidiary. It was founded in 1983 by Niels Jensen, Ole Henriksen, Mogens Glad and Philippe Kahn.-The 1980s:...

, GNU
GNU
GNU is a Unix-like computer operating system developed by the GNU project, ultimately aiming to be a "complete Unix-compatible software system"...

http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html and many others.

Default packing and #pragma pack

On some Microsoft compilers, particularly for the RISC processor, there is a relationship between project default packing (the /Zp directive) and the #pragma pack directive which is unexpected for most people.

The #pragma pack directive can only be used to reduce the packing size of a structure from the project default packing. This leads to interoperability problems with library headers which use for example #pragma pack(8) if you set a project packing to smaller than this. The MSDN documentation states that if the #pragma pack packing is larger than or equal to the project packing, it will be ignored.

For this reason, one should never set a project packing to any value other than the default of 8 bytes, as it would break the #pragma pack directives used in library headers and result in binary incompatibilities between structures.

In particular, setting /Zp1 breaks all #pragma pack directives other than #pragma pack(1).

However, this limitation is not present when compiling for desktop processors, such as the x86 architecture .

Allocating memory aligned to cache lines

It would be beneficial to allocate memory aligned to cache lines. If an array is partitioned for more than one thread to operate on, having the sub-array boundaries unaligned to cache lines could lead to performance degradation. Here is an example to allocate memory (double array of size 10) aligned to cache of 64 bytes.
  1. include

double *foo(void) {
double *var;//create array of size 10
int ok;

ok = posix_memalign((void**)&var, 64, 10*sizeof(double));

if(ok

0)
return NULL;

return var;
}

Hardware significance of alignment requirements
Alignment concerns can affect areas much larger than a C structure when the purpose is the efficient mapping of that area through a hardware address translation mechanism (PCI remapping, operation of a MMU).

For instance, on a 32bit operating system a 4KB page is not just an arbitrary 4KB chunk of data. Instead, it is usually a region of memory that's aligned on a 4KB boundary. This is because aligning a page on a page-sized boundary lets the hardware map a virtual address to a physical address by substituting the higher bits in the address, rather than doing complex arithmetic.

Example: Assume that we have a TLB mapping of virtual address 0x2cfc7000 to physical address 0x12345000. (Note that both these addresses are aligned at 4KB boundaries.) Accessing data located at virtual address va=0x2cfc7abc causes a TLB resolution of 0x2cfc7 to 0x12345 to issue a physical access to pa=0x12345abc. Here, the 20/12bit split luckily match the hexadecimal representation split at 5/3 digits. The hardware can implement this translation by simply combining the first 20 bits of the physical address (0x12345) and the last 12 bits of the virtual address (0xabc). This is also referred to as virtually indexed(abc) physically tagged(12345).

A block of data of size 2^(n+1)-1 always has one sub-block of size 2^n aligned on 2^n bytes.

This is how a dynamic allocator that has no knowledge of alignment, can be used to provide aligned buffers, at the price of a factor two in data loss.


Example: get a 12bit aligned 4KBytes buffer with malloc

// unaligned pointer to large area
void *up=malloc((1<<13)-1);
// well aligned pointer to 4KBytes
void *ap=aligntonext(up,12);

where aligntonext is meant as:
move p to the right until next well aligned address if
not correct already. A possible implementation is

// PSEUDOCODE assumes uint32_t p,bits; for readability
// --- not typesafe, not side-effect safe
  1. define alignto(p,bits) (p>>bits<
  2. define aligntonext(p,bits) alignto((p+(1<


External links
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK