FMA instruction set
Encyclopedia
The FMA instruction set is the name of a future extension to the 128-bit SIMD
Streaming SIMD Extensions
In computing, Streaming SIMD Extensions is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series processors as a reply to AMD's 3DNow! . SSE contains 70 new instructions, most of which work on single precision floating point...

 instructions in the X86 microprocessor
Microprocessor
A microprocessor incorporates the functions of a computer's central processing unit on a single integrated circuit, or at most a few integrated circuits. It is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and...

 instruction set
Instruction set
An instruction set, or instruction set architecture , is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O...

 to perform fused multiply–add (FMA) operations. Two different variants of FMA instruction sets will be used:
  • FMA3 will be supported by Intel
    Intel Corporation
    Intel Corporation is an American multinational semiconductor chip maker corporation headquartered in Santa Clara, California, United States and the world's largest semiconductor chip maker, based on revenue. It is the inventor of the x86 series of microprocessors, the processors found in most...

     in their Haswell processors in 2013 & Broadwell processors in 2014
  • FMA4 will be supported in AMD
    Advanced Micro Devices
    Advanced Micro Devices, Inc. or AMD is an American multinational semiconductor company based in Sunnyvale, California, that develops computer processors and related technologies for commercial and consumer markets...

     processors from 2011.

New instructions

The FMA3 and FMA4 instruction sets have almost identical functionality but are not mutually compatible. Both contain fused multiply–add (FMA) instructions for floating point
Floating point
In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent. The base for the scaling is normally 2, 10 or 16...

 scalar and SIMD
SIMD
Single instruction, multiple data , is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously...

 operations. It will take some time for compilers to find mechanisms to cope with the differences and optimize code accordingly.

Compatibility issue

The difference between FMA3 and FMA4 concerns the issue of whether the instruction can have three or four different operands. The FMA operation has the form:



The 4-operand form (FMA4) allows a, b, c and d to be four different registers, while the 3-operand form (FMA3) requires that d is the same register as either a, b or c. The 3-operand form makes the code shorter and the hardware implementation slightly simpler while the 4-operand form provides more programming flexibility.

See XOP instruction set for more discussion of compatibility issues between Intel and AMD.

CPUs with FMA3

  • Intel
    • Intel will introduce hardware FMA in processors based on Haswell (microarchitecture) during 2013.
  • AMD
    • AMD will support FMA3 in the future for compatibility reasons if Intel sticks to FMA3 only.. It is rumoured that the second version of Bulldozer
      Bulldozer (processor)
      Bulldozer is the codename Advanced Micro Devices has given to one of the next-generation CPU cores after the K10 microarchitecture for the company's M-SPACE design methodology, with the core specifically aimed at 10-watt to 125-watt TDP computing products. Bulldozer is a completely new design...

       processor core, codenamed Piledriver, which will arrive in 2012, might support FMA3.

Excerpt from FMA3

Mnemonic (AT&T) Operands Operation
VFMADD132PDy ymm, ymm, ymm/m256 $0 = $0×$2 + $1
VFMADD132PSy
VFMADD132PDx xmm, xmm, xmm/m128
VFMADD132PSx
VFMADD132SD xmm, xmm, xmm/m64
VFMADD132SS xmm, xmm, xmm/m32
VFMADD213PDy ymm, ymm, ymm/m256 $0 = $1×$0 + $2
VFMADD213PSy
VFMADD213PDx xmm, xmm, xmm/m128
VFMADD213PSx
VFMADD213SD xmm, xmm, xmm/m64
VFMADD213SS xmm, xmm, xmm/m32
VFMADD231PDy ymm, ymm, ymm/m256 $0 = $1×$2 + $0
VFMADD231PSy
VFMADD231PDx xmm, xmm, xmm/m128
VFMADD231PSx
VFMADD231SD xmm, xmm, xmm/m64
VFMADD231SS xmm, xmm, xmm/m32

CPUs with FMA4

  • AMD
    • Bulldozer
      Bulldozer (processor)
      Bulldozer is the codename Advanced Micro Devices has given to one of the next-generation CPU cores after the K10 microarchitecture for the company's M-SPACE design methodology, with the core specifically aimed at 10-watt to 125-watt TDP computing products. Bulldozer is a completely new design...

       processor core, due to begin production in 2011.
  • Intel
    • It is uncertain whether future Intel processors will support FMA4, due to Intel's announced change to FMA3.

Excerpt from FMA4

Mnemonic (AT&T) Operands Operation
VFMADDPDx xmm, xmm, xmm/m128, xmm/m128 $0 = $1×$2 + $3
VFMADDPDy ymm, ymm, ymm/m256, ymm/m256
VFMADDPSx xmm, xmm, xmm/m128, xmm/m128
VFMADDPSy ymm, ymm, ymm/m256, ymm/m256
VFMADDSD xmm, xmm, xmm/m64, xmm/m64
VFMADDSS xmm, xmm, xmm/m32, xmm/m32

History

The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 almost at the same time. The history can be summarized as follows:
  • August 2007: AMD announces the SSE5
    SSE5
    The SSE5 was an instruction set extension proposed by AMD on August 30, 2007 as a supplement to the 128-bit SSE core instructions in the AMD64 architecture....

     instruction set, which includes 3-operand FMA instructions. A new coding scheme (DREX) is introduced for allowing instructions to have three operands.
  • April 2008: Intel announces their AVX
    Advanced Vector Extensions
    Advanced Vector Extensions is an extension to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Westmere processor shipping in Q1 2011 and now by AMD with the Bulldozer processor shipping in Q3 2011.AVX...

     and FMA instruction sets, including 4-operand FMA instructions. The coding of these instructions uses the new VEX
    VEX prefix
    The VEX prefix and VEX coding scheme is a proposed future extension to the x86 instruction set architecture for microprocessors from Intel, AMD and others.-Features:...

    coding scheme which is more flexible than AMD's DREX scheme.
  • December 2008: Intel changes the specification for their FMA instructions from 4-operand to 3-operand instructions. The VEX coding scheme is still used.
  • May 2009: AMD changes the specification of their FMA instructions from the 3-operand DREX form to the 4-operand VEX form, compatible with the April 2008 Intel specification rather than the December 2008 Intel specification.


It is currently uncertain whether the 3-operand VEX coded form (here called FMA3) or the 4-operand form (FMA4) will be the dominating standard in the future. It is also possible that future processors will support both forms.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK