SSE5
Encyclopedia
The SSE5 was an instruction set extension proposed by AMD
Advanced Micro Devices
Advanced Micro Devices, Inc. or AMD is an American multinational semiconductor company based in Sunnyvale, California, that develops computer processors and related technologies for commercial and consumer markets...

 on August 30, 2007 as a supplement to the 128-bit SSE
Streaming SIMD Extensions
In computing, Streaming SIMD Extensions is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series processors as a reply to AMD's 3DNow! . SSE contains 70 new instructions, most of which work on single precision floating point...

 core instructions in the AMD64 architecture.

AMD chose not to implement SSE5 as originally proposed. In May 2009, AMD replaced SSE5 with three smaller instruction set extensions named as XOP
XOP instruction set
The XOP instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12th, 2011....

, FMA4
FMA instruction set
The FMA instruction set is the name of a future extension to the 128-bit SIMD instructions in the X86 microprocessor instruction set to perform fused multiply–add operations...

, and CVT16
CVT16 instruction set
The CVT16 instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set.CVT16 is a revision of part of the SSE5 instruction set proposal announced on August 30, 2007...

, which retain the proposed functionality of SSE5, but encode the instructions differently for better compatibility with Intel's proposed AVX
Advanced Vector Extensions
Advanced Vector Extensions is an extension to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Westmere processor shipping in Q1 2011 and now by AMD with the Bulldozer processor shipping in Q3 2011.AVX...

 instruction set.

The three SSE5-derived instruction sets were introduced in the Bulldozer
Bulldozer (processor)
Bulldozer is the codename Advanced Micro Devices has given to one of the next-generation CPU cores after the K10 microarchitecture for the company's M-SPACE design methodology, with the core specifically aimed at 10-watt to 125-watt TDP computing products. Bulldozer is a completely new design...

 processor core, released in October 2011 on a 32nm
32 nanometer
The 32 nm process is the step following the 45 nanometer process in CMOS semiconductor device fabrication. 32 nanometer refers to the average half-pitch of a memory cell at this technology level...

 process.

Compatibility

AMD's SSE5 extension bundle does not include the full set of Intel's SSE4
SSE4
SSE4 is a CPU instruction set used in the Intel Core microarchitecture and AMD K10 . It was announced on 27 September 2006 at the Fall 2006 Intel Developer Forum, with vague details in a white paper; more precise details of 47 instructions became available at the Spring 2007 Intel Developer Forum...

 instructions, making it a competitor to SSE4 rather than a successor.

This complicates software development. It is recommended practice for a program to test for the presence of instruction set extensions by means of the CPUID instruction before entering a code path which depends upon those instructions to function correctly. For maximum portability, an optimized application will require three code paths: a base code path for compatibility with older processors (from either vendor), a separately optimized Intel code path exploiting SSE4 or AVX, and a separately optimized AMD code path exploiting SSE5.

Due to this proliferation, benchmarks between Intel and AMD processors increasingly reflect the cleverness or implementation quality of the divergent code paths rather than the strength of the underlying platform.

SSE5 enhancements

The proposed SSE5 instruction set consisted of 170 instructions (including 46 base instructions), many of which are designed to improve single-threaded performance. Some SSE5 instructions are 3-operand instructions, the use of which will increase the average number of instructions per cycle
Instructions Per Cycle
In computer architecture, instructions per clock is a term used to describe one aspect of a processor's performance: the average number of instructions executed for each clock cycle...

 achievable by x86 code. Selected new instructions include:
  • Fused multiply–accumulate (FMACxx) instructions
  • Integer multiply–accumulate (IMAC, IMADC) instructions
  • Permutation (PPERM, PERMPx) and conditional move (PCMOV) instructions
  • Precision control, rounding, and conversion instructions


AMD claims SSE5 will provide dramatic performance improvements, particularly in high-performance computing
High-performance computing
High-performance computing uses supercomputers and computer clusters to solve advanced computation problems. Today, computer systems approaching the teraflops-region are counted as HPC-computers.-Overview:...

 (HPC), multimedia
Multimedia
Multimedia is media and content that uses a combination of different content forms. The term can be used as a noun or as an adjective describing a medium as having multiple content forms. The term is used in contrast to media which use only rudimentary computer display such as text-only, or...

, and computer security
Computer security
Computer security is a branch of computer technology known as information security as applied to computers and networks. The objective of computer security includes protection of information and property from theft, corruption, or natural disaster, while allowing the information and property to...

 applications, including a 5x performance gain for Advanced Encryption Standard
Advanced Encryption Standard
Advanced Encryption Standard is a specification for the encryption of electronic data. It has been adopted by the U.S. government and is now used worldwide. It supersedes DES...

 (AES) encryption and a 30% performance gain for discrete cosine transform
Discrete cosine transform
A discrete cosine transform expresses a sequence of finitely many data points in terms of a sum of cosine functions oscillating at different frequencies. DCTs are important to numerous applications in science and engineering, from lossy compression of audio and images A discrete cosine transform...

 (DCT) used to process video streams.

For more detailed information, consult the instruction sets as subsequently divided.
  • XOP
    XOP instruction set
    The XOP instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12th, 2011....

    : A revision of most of the SSE5 instruction set
  • FMA4
    FMA instruction set
    The FMA instruction set is the name of a future extension to the 128-bit SIMD instructions in the X86 microprocessor instruction set to perform fused multiply–add operations...

    : Floating-point vector multiply–accumulate.
  • CVT16
    CVT16 instruction set
    The CVT16 instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set.CVT16 is a revision of part of the SSE5 instruction set proposal announced on August 30, 2007...

    : Half-precision floating-point conversion.

2009 revision

The SSE5 specification included a proposed extension to the general coding scheme of X86 instructions in order to allow instructions to have more than two operands. In 2008, Intel announced their planned AVX
Advanced Vector Extensions
Advanced Vector Extensions is an extension to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Westmere processor shipping in Q1 2011 and now by AMD with the Bulldozer processor shipping in Q3 2011.AVX...

 instruction set which proposed a different way of coding instructions with more than two operands. The two proposed coding schemes, SSE5 and AVX, are mutually incompatible, although the AVX scheme has certain advantages over the SSE5 scheme: most importantly, AVX has plenty of space for future extensions, including larger vector sizes.

In May 2009, AMD published a revised specification for the planned future instructions. This revision changes the coding scheme to make it compatible with the AVX scheme, but with a differing prefix byte in order to avoid overlap between instructions introduced by AMD and instructions introduced by Intel.

The revised instruction set no longer carries the name SSE5, which has been criticized for being misleading, but most of the instructions in the new revision are functionally identical to the original SSE5 specification - only the way the instructions are coded differs.
The planned additions to the AMD instruction set consists of three subsets:
  1. XOP
    XOP instruction set
    The XOP instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12th, 2011....

    : Integer vector multiply–accumulate instructions, integer vector horizontal addition, integer vector compare, shift and rotate instructions, byte permutation and conditional move instructions, floating point fraction extraction.
  2. FMA4
    FMA instruction set
    The FMA instruction set is the name of a future extension to the 128-bit SIMD instructions in the X86 microprocessor instruction set to perform fused multiply–add operations...

    : Floating-point vector multiply–accumulate.
  3. CVT16
    CVT16 instruction set
    The CVT16 instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set.CVT16 is a revision of part of the SSE5 instruction set proposal announced on August 30, 2007...

    : Half-precision floating-point conversion.


These new instruction sets include support for future extensions for the vector size from 128 bits to 256 bits. It is unclear from these preliminary specifications whether the Bulldozer
Bulldozer (processor)
Bulldozer is the codename Advanced Micro Devices has given to one of the next-generation CPU cores after the K10 microarchitecture for the company's M-SPACE design methodology, with the core specifically aimed at 10-watt to 125-watt TDP computing products. Bulldozer is a completely new design...

 processor will support 256-bit vector registers (YMM registers)
.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK