Memory ordering
Memory ordering is a group of properties of modern microprocessors that characterise their ability to reorder memory operations. It is a type of out-of-order execution. Memory reordering can be used to make full use of the different caches and memory banks.
On most modern uniprocessors, memory operations are not executed in the order specified by the program code. From the programmer's point of view, however, all operations appear to have been executed in the order specified, because the hardware hides all inconsistencies.
In SMP microprocessor systems
There are several memory-consistency models for SMP systems:
- sequential consistency (all reads and all writes are in order)
- relaxed consistency (some types of reordering are allowed)
  - Loads can be reordered after Loads (for more efficient cache coherency and better scaling)
  - Loads can be reordered after Stores
  - Stores can be reordered after Stores
  - Stores can be reordered after Loads
- weak consistency (reads and writes are arbitrarily reordered, limited only by explicit memory barriers)
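The difference between sequential and relaxed consistency can be seen in the classic store-buffering litmus test. The sketch below is a minimal illustration, assuming a C11 compiler and POSIX threads; `struct sb_thread`, `sb_body` and `store_buffering` are hypothetical names. Each thread stores 1 to its own variable and then loads the other one. Under sequential consistency at least one thread must observe a 1, so the outcome where both loads return 0 is forbidden; with relaxed accesses that outcome is allowed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* One side of the store-buffering litmus test (hypothetical helper type). */
struct sb_thread {
    atomic_int *mine;   /* location this thread stores 1 to */
    atomic_int *other;  /* location this thread then loads */
    int seq_cst;        /* nonzero: seq_cst accesses; zero: relaxed accesses */
    int result;         /* value loaded from *other */
};

static void *sb_body(void *arg) {
    struct sb_thread *t = arg;
    if (t->seq_cst) {
        /* Sequential consistency: all four accesses of both threads fall into
           one total order that respects each thread's program order. */
        atomic_store(t->mine, 1);
        t->result = atomic_load(t->other);
    } else {
        /* Relaxed: both loads may return 0; on hardware this happens when each
           load is satisfied while the thread's own store is still buffered. */
        atomic_store_explicit(t->mine, 1, memory_order_relaxed);
        t->result = atomic_load_explicit(t->other, memory_order_relaxed);
    }
    return NULL;
}

/* Runs `trials` rounds and returns how many ended with both threads
   loading 0, or -1 if a thread could not be created. */
long store_buffering(int seq_cst, int trials) {
    static atomic_int x, y;
    long both_zero = 0;
    for (int i = 0; i < trials; i++) {
        struct sb_thread a = {&x, &y, seq_cst, -1};
        struct sb_thread b = {&y, &x, seq_cst, -1};
        pthread_t ta, tb;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&ta, NULL, sb_body, &a) != 0)
            return -1;
        if (pthread_create(&tb, NULL, sb_body, &b) != 0) {
            pthread_join(ta, NULL);
            return -1;
        }
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        if (a.result == 0 && b.result == 0)
            both_zero++;
    }
    return both_zero;
}
```

`store_buffering(1, n)` should always return 0. The count for `store_buffering(0, n)` is timing-dependent: because the sketch starts fresh threads for every trial they seldom overlap, so a zero count with relaxed accesses proves nothing.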
On some CPUs atomic operations can be reordered with Loads and Stores.
Some CPUs also exhibit:
- Dependent loads reordered: this is unique to Alpha. The processor can behave as if it fetched data before fetching the pointer to that data. This makes the cache hardware simpler and faster, but means that both readers and writers need memory barriers.
- An incoherent instruction cache pipeline, which prevents self-modifying code from being executed without special instruction-cache flush/reload instructions.
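Dependent-load reordering is why publishing a pointer to freshly initialised data needs ordering on the reader side as well. The sketch below is a minimal illustration, assuming a C11 compiler and POSIX threads; `struct config`, `writer`, `reader` and `pointer_publication_demo` are hypothetical names, and a static object stands in for a new allocation. The writer publishes with a release store; the reader uses an acquire load, because `memory_order_consume`, intended for exactly this dependent-load case, is treated as acquire by current compilers.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload reached through a published pointer. */
struct config {
    int value;
};

static struct config storage;          /* stands in for a fresh allocation */
static struct config *_Atomic current; /* NULL until published */

static void *writer(void *arg) {
    (void)arg;
    storage.value = 42; /* initialise the object first */
    /* Release: the initialisation is visible no later than the pointer. */
    atomic_store_explicit(&current, &storage, memory_order_release);
    return NULL;
}

static void *reader(void *arg) {
    int *out = arg;
    struct config *c;
    /* c->value depends on the pointer just loaded. Most CPUs never return
       data older than the pointer here; Alpha can, so the pointer load
       itself must order the later load. */
    while ((c = atomic_load_explicit(&current, memory_order_acquire)) == NULL)
        ; /* spin until the writer publishes */
    *out = c->value;
    return NULL;
}

/* Returns the value read through the published pointer, or -1 if a thread
   could not be created. */
int pointer_publication_demo(void) {
    pthread_t w, r;
    int seen = 0;
    storage.value = 0;
    atomic_store(&current, NULL);
    if (pthread_create(&w, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader, &seen) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```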
Type | Alpha | ARMv7 | PA-RISC | POWER | SPARC RMO | SPARC PSO | SPARC TSO | x86 | x86 oostore | AMD64 | IA64 | zSeries |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Loads reordered after Loads | Y | Y | Y | Y | Y | | | | Y | | Y | |
Loads reordered after Stores | Y | Y | Y | Y | Y | | | | Y | | Y | |
Stores reordered after Stores | Y | Y | Y | Y | Y | Y | | | Y | | Y | |
Stores reordered after Loads | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
Atomic reordered with Loads | Y | Y | | Y | Y | | | | | | Y | |
Atomic reordered with Stores | Y | Y | | Y | Y | Y | | | | | Y | |
Dependent Loads reordered | Y | | | | | | | | | | | |
Incoherent Instruction cache pipeline | Y | Y | | Y | Y | Y | Y | Y | Y | | Y | Y |
Some older x86 and AMD systems have weaker memory ordering.
SPARC memory ordering modes:
- SPARC TSO = total-store order (default)
- SPARC RMO = relaxed-memory order (not supported on recent CPUs)
- SPARC PSO = partial store order (not supported on recent CPUs)
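The "Stores reordered after Stores" and "Loads reordered after Loads" rows explain why a flag-based handoff that happens to work on x86 can fail on ARMv7 or POWER. The sketch below is a minimal illustration, assuming a C11 compiler and POSIX threads; `producer`, `consumer` and `message_passing_demo` are hypothetical names. Writing the handoff with release/acquire lets the compiler emit whatever each target needs: on x86 ordinary moves suffice, while ARMv7 compilers typically emit `dmb`.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int data;        /* ordinary, non-atomic payload */
static atomic_int flag; /* set to 1 once data is ready */

static void *producer(void *arg) {
    (void)arg;
    data = 42;
    /* Release: the data store cannot be reordered after the flag store
       (the "Stores reordered after Stores" row). */
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    /* Acquire: the data load cannot be performed before the flag load
       (the "Loads reordered after Loads" row). */
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ; /* spin until the producer publishes */
    *out = data;
    return NULL;
}

/* Runs one handoff and returns the value the consumer read, or -1 if a
   thread could not be created. */
int message_passing_demo(void) {
    pthread_t p, c;
    int seen = 0;
    data = 0;
    atomic_store(&flag, 0);
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&c, NULL, consumer, &seen) != 0) {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```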
Compiler memory barrier
These barriers prevent a compiler from reordering instructions; they do not prevent reordering by the CPU.
- The GNU inline assembler statement
asm volatile("" ::: "memory");
or, with the alternate keywords that also work in strict ISO modes such as -std=c99,
__asm__ __volatile__ ("" ::: "memory");
forbids the GCC compiler from reordering memory reads and writes around it.
- The Intel ECC compiler uses the "full compiler fence"
__memory_barrier
intrinsic.
- Microsoft Visual C++ Compiler:
_ReadWriteBarrier
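A compiler barrier matters even on a single CPU, for example when code hands data to a signal handler running on the same thread. The sketch below is a minimal illustration, assuming GCC or Clang (for the inline asm) and C11 atomics; `COMPILER_BARRIER`, `on_signal` and `signal_handoff_demo` are hypothetical names. The standard C11 spelling of the same idea is `atomic_signal_fence(memory_order_seq_cst)`, which compilers typically implement as a compiler-only barrier as well.

```c
#include <signal.h>
#include <stdatomic.h>

/* GCC/Clang-only compiler barrier from the text: emits no CPU instruction,
   but the "memory" clobber stops the compiler moving memory accesses across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* A signal handler may only touch lock-free atomics or volatile sig_atomic_t. */
_Static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic_int must be lock-free");

static atomic_int payload;         /* data prepared by the interrupted code */
static atomic_int ready;           /* set once payload is valid */
static volatile sig_atomic_t seen; /* value observed by the handler */

static void on_signal(int sig) {
    (void)sig;
    if (atomic_load_explicit(&ready, memory_order_relaxed)) {
        COMPILER_BARRIER(); /* keep the payload load after the flag load */
        seen = (sig_atomic_t)atomic_load_explicit(&payload, memory_order_relaxed);
    }
}

/* Returns the payload value the handler saw, or -1 on setup failure. */
int signal_handoff_demo(void) {
    seen = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    atomic_store_explicit(&payload, 42, memory_order_relaxed);
    COMPILER_BARRIER(); /* keep the payload store before the flag store */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    if (raise(SIGINT) != 0) /* the handler runs on this same thread */
        return -1;
    signal(SIGINT, SIG_DFL); /* restore the default disposition */
    return (int)seen;
}
```

Here `raise` delivers the signal synchronously, so the demo always sees 42; the barriers matter when a signal arrives asynchronously between the two stores. No hardware fence is needed because the handler runs on the same CPU as the interrupted code.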
Hardware memory barrier
Many architectures with SMP support have special hardware instructions for flushing reads and writes.
- x86, x86-64
lfence (asm), void _mm_lfence(void)
sfence (asm), void _mm_sfence(void)
mfence (asm), void _mm_mfence(void)
- PowerPC
sync (asm)
- POWER
dcs (asm)
- ARMv7
dmb (asm)
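Because ordinary x86 stores are not reordered with each other, `sfence` mainly matters for weakly ordered non-temporal (streaming) stores. The sketch below is a minimal illustration, assuming an x86 target with SSE2; `fill_then_publish` is a hypothetical name, and `_mm_stream_si32` is an SSE2 intrinsic not mentioned in the text. On targets without SSE2 the sketch falls back to ordinary stores, an assumption made only so it compiles everywhere.

```c
#include <stdatomic.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32, _mm_sfence */
#endif

/* Fills buf[0..n) with value, then sets *ready to announce the data. */
void fill_then_publish(int *buf, size_t n, int value, atomic_int *ready) {
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&buf[i], value); /* non-temporal, weakly ordered store */
    _mm_sfence(); /* order all streaming stores before any later store */
#else
    for (size_t i = 0; i < n; i++)
        buf[i] = value; /* portable fallback: ordinary stores */
#endif
    /* Release store of the flag; on x86 this is a plain mov. */
    atomic_store_explicit(ready, 1, memory_order_release);
}
```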
GCC since version 4.1.0 and the Intel C++ Compiler have a special builtin that issues a full hardware memory barrier:
__sync_synchronize
In GCC, this builtin also acts as the compiler memory barrier described above under "Compiler memory barrier".
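A full barrier is what a Dekker-style handshake needs: each thread raises its own flag and then checks the other's, which is the "Stores reordered after Loads" case that even x86 allows. The sketch below is a minimal illustration, assuming POSIX threads and GCC or Clang for `__sync_synchronize` (other compilers fall back to the C11 `atomic_thread_fence`); `FULL_BARRIER`, `struct dekker_side`, `try_enter` and `dekker_conflicts` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__GNUC__)
#define FULL_BARRIER() __sync_synchronize() /* builtin from the text */
#else
#define FULL_BARRIER() atomic_thread_fence(memory_order_seq_cst)
#endif

struct dekker_side {
    atomic_int *mine;  /* this thread's "I want to enter" flag */
    atomic_int *other; /* the other thread's flag */
    int entered;       /* 1 if this thread saw the other flag clear */
};

static void *try_enter(void *arg) {
    struct dekker_side *s = arg;
    atomic_store_explicit(s->mine, 1, memory_order_relaxed);
    /* Full barrier: the flag store is ordered before the load below, so the
       two threads cannot both read 0. */
    FULL_BARRIER();
    s->entered = atomic_load_explicit(s->other, memory_order_relaxed) == 0;
    return NULL;
}

/* Runs `trials` rounds; returns how often BOTH threads entered (expected 0),
   or -1 if a thread could not be created. */
long dekker_conflicts(int trials) {
    static atomic_int a_flag, b_flag;
    long both = 0;
    for (int i = 0; i < trials; i++) {
        struct dekker_side a = {&a_flag, &b_flag, 0};
        struct dekker_side b = {&b_flag, &a_flag, 0};
        pthread_t ta, tb;
        atomic_store(&a_flag, 0);
        atomic_store(&b_flag, 0);
        if (pthread_create(&ta, NULL, try_enter, &a) != 0)
            return -1;
        if (pthread_create(&tb, NULL, try_enter, &b) != 0) {
            pthread_join(ta, NULL);
            return -1;
        }
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        if (a.entered && b.entered)
            both++;
    }
    return both;
}
```

Removing `FULL_BARRIER()` makes both threads entering a permitted outcome, which is why x86 needs `mfence` here despite its otherwise strong ordering.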