Memory ordering
Memory ordering is a group of properties of modern microprocessors that characterise how they may reorder memory operations. It is a type of out-of-order execution. Memory reordering can be used to fully utilize different cache and memory banks.

On most modern uniprocessors, memory operations are not executed in the order specified by the program code. From the programmer's point of view, however, all operations appear to have been executed in the order specified, with all inconsistencies hidden by the hardware.

In SMP microprocessor systems

There are several memory-consistency models for SMP systems:
  • sequential consistency (All reads and all writes are in-order)
  • relaxed consistency (Some types of reordering are allowed)
    • Loads can be reordered after Loads (for better working of cache coherency, better scaling)
    • Loads can be reordered after Stores
    • Stores can be reordered after Stores
    • Stores can be reordered after Loads
  • weak consistency (Reads and Writes are arbitrarily reordered, limited only by explicit memory barriers)
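
The difference between these models can be illustrated with the classic "store buffering" test. The following is a minimal sketch, assuming a C11 compiler with POSIX threads; the names x, y, thread_a, thread_b and count_both_zero are hypothetical and chosen for illustration only. Each thread stores to one variable and then loads the other. With sequentially consistent operations, as used here, the outcome in which both loads return 0 cannot occur; if both orderings were relaxed, hardware that allows Stores to be reordered after Loads could produce it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared variables for the store-buffering test (hypothetical names). */
static atomic_int x, y;
static int r1, r2;   /* results, read only after both threads are joined */

static void *thread_a(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);   /* store x ... */
    r1 = atomic_load_explicit(&y, memory_order_seq_cst);  /* ... then load y */
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);   /* store y ... */
    r2 = atomic_load_explicit(&x, memory_order_seq_cst);  /* ... then load x */
    return NULL;
}

/* Runs the test `rounds` times and counts the rounds in which both loads
   returned 0. Returns -1 if a thread could not be created. */
int count_both_zero(int rounds) {
    int hits = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&a, NULL, thread_a, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            hits++;
    }
    return hits;
}
```

Replacing memory_order_seq_cst with memory_order_relaxed in both threads turns this into a probe for store-load reordering; whether a nonzero count is then actually observed depends on the CPU and on timing.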


On some CPUs atomic operations can be reordered with Loads and Stores.

Also, there can be
  • Dependent Loads Reordered, which is unique to Alpha. This processor can fetch data before it fetches the pointer to this data. It makes the cache hardware simpler and faster, but leads to the requirement of memory barriers for readers and writers.
  • Incoherent Instruction cache pipeline (which prevents self-modifying code from being executed without special ICache flush/reload instructions)
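
The dependent-load case can be sketched as a pointer publication. The example below is an illustration only, assuming C11 atomics; struct message, published, publish_message and try_read_message are hypothetical names. The writer fills in the data and then stores the pointer with release ordering; the reader loads the pointer with acquire ordering, which is stronger than strictly needed on most CPUs but also covers processors such as Alpha that may otherwise read the pointed-to data before the pointer.

```c
#include <stdatomic.h>
#include <stddef.h>

struct message { int payload; };

static struct message slot;
/* Static storage, so the pointer starts out as NULL (nothing published). */
static _Atomic(struct message *) published;

/* Writer: initialise the data first, then publish the pointer.
   The release store keeps the payload write ahead of the pointer write. */
void publish_message(int value) {
    slot.payload = value;
    atomic_store_explicit(&published, &slot, memory_order_release);
}

/* Reader: returns 1 and copies the payload to *out if a message has been
   published, otherwise returns 0. The acquire load orders the dependent
   read of p->payload after the read of the pointer itself. */
int try_read_message(int *out) {
    struct message *p =
        atomic_load_explicit(&published, memory_order_acquire);
    if (p == NULL)
        return 0;
    *out = p->payload;
    return 1;
}
```

The two functions are meant to be called from different threads; calling them in sequence from one thread exercises the same code path.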

Memory ordering in some architectures
Type                                   Alpha  ARMv7  PA-RISC  POWER  SPARC RMO  SPARC PSO  SPARC TSO  x86  x86 oostore  AMD64  IA64  zSeries
Loads reordered after Loads            Y      Y      Y        Y      Y          -          -          -    Y            -      Y     -
Loads reordered after Stores           Y      Y      Y        Y      Y          -          -          -    Y            -      Y     -
Stores reordered after Stores          Y      Y      Y        Y      Y          Y          -          -    Y            -      Y     -
Stores reordered after Loads           Y      Y      Y        Y      Y          Y          Y          Y    Y            Y      Y     Y
Atomic reordered with Loads            Y      Y      -        Y      Y          -          -          -    -            -      Y     -
Atomic reordered with Stores           Y      Y      -        Y      Y          Y          -          -    -            -      Y     -
Dependent Loads reordered              Y      -      -        -      -          -          -          -    -            -      -     -
Incoherent Instruction cache pipeline  Y      Y      -        Y      Y          Y          Y          Y    Y            -      Y     Y


Some older x86 and AMD systems have weaker memory ordering.

SPARC memory ordering modes:
  • SPARC TSO = total-store order (default)
  • SPARC RMO = relaxed-memory order (not supported on recent CPUs)
  • SPARC PSO = partial store order (not supported on recent CPUs)

Compiler memory barrier

These barriers prevent a compiler from reordering instructions; they do not prevent reordering by the CPU.
  • The GNU inline assembler statement

asm volatile("" ::: "memory");
or even
__asm__ __volatile__ ("" ::: "memory");
forbids the GCC compiler from reordering read and write commands around it.
  • Intel ECC compiler uses the "full compiler fence" intrinsic

__memory_barrier
  • Microsoft Visual C++ Compiler:

_ReadWriteBarrier
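
As an illustration of the GNU form, the following sketch wraps the statement in a macro and places it between two stores. It assumes a GCC-compatible compiler; compiler_barrier, set_data_then_ready and read_if_ready are hypothetical names. The barrier emits no machine instruction, so it only constrains the compiler and the CPU may still reorder the accesses.

```c
/* Compiler-only barrier for GCC-compatible compilers (hypothetical macro
   name). It emits no instruction; it only stops the compiler from moving
   memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int data;
static int ready;

/* The compiler must emit the store to `data` before the store to `ready`. */
void set_data_then_ready(int value) {
    data = value;
    compiler_barrier();
    ready = 1;
}

/* Returns the data if `ready` is set, otherwise -1. The barrier keeps the
   compiler from hoisting the read of `data` above the check of `ready`. */
int read_if_ready(void) {
    if (!ready)
        return -1;
    compiler_barrier();
    return data;
}
```

On a multiprocessor with relaxed ordering this is not sufficient on its own for communication between threads; a hardware barrier is needed as well.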

Hardware memory barrier

Many architectures with SMP support have special hardware instructions for flushing reads and writes.
  • x86, x86-64

lfence (asm), void _mm_lfence(void)
sfence (asm), void _mm_sfence(void)
mfence (asm), void _mm_mfence(void)
  • PowerPC

sync (asm)
  • POWER

dcs (asm)
  • ARMv7

dmb (asm)

GCC since version 4.1.0 and the Intel C++ Compiler have a special builtin for issuing a full hardware memory barrier:
__sync_synchronize.

In GCC, this builtin also issues the asm memory barrier (see above, "Compiler memory barrier").
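
A typical use of the builtin is passing data from one thread to another through a flag. The sketch below is illustrative only and assumes GCC or a compatible compiler; payload, flag, producer_send and consumer_receive are hypothetical names, and the pairing with a relaxed C11 atomic flag is a choice made for this example rather than something the builtin requires.

```c
#include <stdatomic.h>

static int payload;        /* plain data handed from producer to consumer */
static atomic_int flag;    /* 0 = nothing sent yet, 1 = payload is valid */

/* Producer: write the data, issue a full barrier, then raise the flag. */
void producer_send(int value) {
    payload = value;
    __sync_synchronize();  /* payload store is ordered before the flag store */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Consumer: wait for the flag, issue a full barrier, then read the data. */
int consumer_receive(void) {
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;                  /* spin until the producer raises the flag */
    __sync_synchronize();  /* flag load is ordered before the payload load */
    return payload;
}
```

Without the two barriers, a CPU that reorders Stores after Stores or Loads after Loads could let the consumer see the flag set and still read a stale payload.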

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.