SSE3
SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of their Pentium 4 CPU. In April 2005, AMD introduced a subset of SSE3 in revision E (Venice and San Diego) of their Athlon 64 CPUs. The earlier SIMD instruction sets on the x86 platform, from oldest to newest, are MMX, 3DNow! (developed by AMD), SSE and SSE2.

SSE3 contains 13 new instructions over SSE2.

Changes

The most notable change is the capability to work horizontally in a register, as opposed to the more or less strictly vertical operation of all previous SSE instructions. More specifically, instructions to add and subtract the multiple values stored within a single register have been added. These instructions simplify the implementation of a number of DSP and 3D operations. There is also a new x87 instruction (FISTTP) to convert floating point values to integers without having to change the global rounding mode, thus avoiding costly pipeline stalls. Finally, the extension adds LDDQU, an alternative misaligned integer vector load that has better performance on NetBurst-based platforms for loads that cross cache-line boundaries.
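The benefit of a conversion that always truncates can be illustrated with a small scalar model. The sketch below is plain Python, not SSE3 or x87 code; the function names are hypothetical. It assumes the x87 default rounding mode (round to nearest, ties to even), which is what a conversion that obeys the control register would use unless the program changes it first.

```python
import math


def convert_round_nearest_even(x: float) -> int:
    """Model of a conversion that obeys the default x87 rounding mode.

    Python's round() also rounds ties to even, so it stands in for
    round-to-nearest-even here.
    """
    return round(x)


def convert_truncate(x: float) -> int:
    """Model of a conversion that always truncates toward zero,
    regardless of the rounding mode (the behaviour C requires)."""
    return math.trunc(x)


if __name__ == "__main__":
    for value in (2.5, 3.5, -2.7):
        print(value, convert_round_nearest_even(value), convert_truncate(value))
```

The two models disagree on inputs such as 3.5 and -2.7, which is why a compiler targeting older x87 code has to switch the rounding mode around each C-style conversion.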

CPUs with SSE3

  • AMD:
    • Athlon 64 (since Venice Stepping E3 and San Diego Stepping E4)
    • Athlon 64 X2
    • Athlon 64 FX (since San Diego Stepping E4)
    • Opteron (since Stepping E4)
    • Sempron (since Palermo Stepping E3)
    • Phenom
    • Phenom II
    • Athlon II
    • Turion 64
    • Turion 64 X2
  • Intel:
    • Celeron D
    • Celeron (starting with Core microarchitecture)
    • Pentium 4 (since Prescott)
    • Pentium D
    • Pentium Extreme Edition (but NOT Pentium 4 Extreme Edition)
    • Pentium Dual-Core
    • Pentium (starting with Core microarchitecture)
    • Core
    • Xeon (since Nocona)
    • Atom
  • VIA/Centaur:
    • C7
    • Nano
  • Transmeta:
    • Efficeon TM88xx (NOT Model Numbers TM86xx)
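Rather than relying on a model list, software usually checks for SSE3 at run time. As a minimal sketch, assuming a Linux system on x86, the kernel lists SSE3 in /proc/cpuinfo under the flag name "pni" (after the Prescott New Instructions code name). The helper names below are hypothetical, and the approach does not apply to other operating systems.

```python
def has_sse3(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'pni'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the flag as a whole token so that 'ssse3' is not mistaken for it.
        if sep and key.strip() == "flags" and "pni" in value.split():
            return True
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the cpuinfo file, returning an empty string if it is unavailable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print("SSE3 reported:", has_sse3(read_cpuinfo()))
```

Native code would typically query the CPUID instruction directly instead; this text-based check is only a convenient approximation for scripts.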

Common instructions

Arithmetic
  • ADDSUBPD — (Add-Subtract-Packed-Double)
    • Input: { A0, A1 }, { B0, B1 }
    • Output: { A0 − B0, A1 + B1 }
  • ADDSUBPS — (Add-Subtract-Packed-Single)
    • Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
    • Output: { A0 − B0, A1 + B1, A2 − B2, A3 + B3 }
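The alternating subtract/add pattern matches the real and imaginary parts of a complex product. The following is a scalar reference model in Python, not actual SSE3 code: each function mimics the lane behaviour of one instruction on a four-element list, and `swap_pairs` and `mul` are hypothetical stand-ins for an ordinary shuffle and a packed multiply. It sketches one common way to multiply two pairs of complex numbers stored as { re0, im0, re1, im1 }.

```python
def movsldup(v):
    """Duplicate the even-indexed elements: {v0, v0, v2, v2}."""
    return [v[0], v[0], v[2], v[2]]


def movshdup(v):
    """Duplicate the odd-indexed elements: {v1, v1, v3, v3}."""
    return [v[1], v[1], v[3], v[3]]


def swap_pairs(v):
    """Swap re and im within each pair: {v1, v0, v3, v2}."""
    return [v[1], v[0], v[3], v[2]]


def mul(a, b):
    """Element-wise (vertical) multiply."""
    return [x * y for x, y in zip(a, b)]


def addsubps(a, b):
    """Subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]


def complex_mul_packed(a, b):
    """Multiply two pairs of complex numbers laid out as {re0, im0, re1, im1}."""
    real_parts = mul(movsldup(a), b)             # {ar*br, ar*bi, ...}
    imag_parts = mul(movshdup(a), swap_pairs(b)) # {ai*bi, ai*br, ...}
    return addsubps(real_parts, imag_parts)      # {ar*br - ai*bi, ar*bi + ai*br, ...}


if __name__ == "__main__":
    # (1+2j)*(5+6j) and (3+4j)*(7+8j)
    print(complex_mul_packed([1, 2, 3, 4], [5, 6, 7, 8]))
```

The result can be checked against Python's built-in complex type, for example `(1+2j)*(5+6j) == -7+16j`.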

AOS ( Array Of Structures )
  • HADDPD — (Horizontal-Add-Packed-Double)
    • Input: { A0, A1 }, { B0, B1 }
    • Output: { A0 + A1, B0 + B1 }
  • HADDPS (Horizontal-Add-Packed-Single)
    • Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
    • Output: { A0 + A1, A2 + A3, B0 + B1, B2 + B3 }
  • HSUBPD — (Horizontal-Subtract-Packed-Double)
    • Input: { A0, A1 }, { B0, B1 }
    • Output: { A0 − A1, B0 − B1 }
  • HSUBPS — (Horizontal-Subtract-Packed-Single)
    • Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
    • Output: { A0 − A1, A2 − A3, B0 − B1, B2 − B3 }
  • LDDQU - As stated above, this is an alternative misaligned integer vector load. It can be helpful for video compression tasks.
  • MOVDDUP, MOVSHDUP, MOVSLDUP - These duplicate selected elements within a register: MOVDDUP copies the low double-precision value into both halves, MOVSHDUP duplicates the odd-indexed single-precision values, and MOVSLDUP duplicates the even-indexed ones. They are also used for complex numbers, and can be helpful for waveform calculations such as audio processing.
  • FISTTP - Like the older x87 FISTP instruction, but ignores the rounding mode set in the floating point control register and uses the "chop" (truncate) mode instead. This removes the need for the expensive loading and re-loading of the control register in languages such as C, whose standard requires float-to-int conversion to truncate.
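The horizontal-add layout listed above can be used to reduce a whole register to a single sum. The sketch below is a scalar reference model in Python rather than real SSE3 code; `haddps` follows the input/output pattern given for HADDPS, while `horizontal_sum` and `dot4` are hypothetical helper names. It assumes four-element lists stand in for 128-bit registers of single-precision values.

```python
def haddps(a, b):
    """Model of HADDPS: {A0 + A1, A2 + A3, B0 + B1, B2 + B3}."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def horizontal_sum(v):
    """Sum all four lanes using two horizontal adds."""
    step1 = haddps(v, v)          # {v0+v1, v2+v3, v0+v1, v2+v3}
    step2 = haddps(step1, step1)  # every lane now holds v0+v1+v2+v3
    return step2[0]


def dot4(a, b):
    """Four-element dot product: vertical multiply, then horizontal sum."""
    products = [x * y for x, y in zip(a, b)]
    return horizontal_sum(products)


if __name__ == "__main__":
    print(horizontal_sum([1, 2, 3, 4]))
    print(dot4([1, 2, 3, 4], [5, 6, 7, 8]))
```

Whether two horizontal adds are the fastest reduction depends on the processor; shuffle-and-add sequences are a common alternative, so this should be read as an illustration of the semantics rather than a tuning recommendation.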

Intel instructions

  • MONITOR, MWAIT - MONITOR sets up an address range to be watched, and MWAIT puts the logical processor into an optimized wait state until that range is written. An idle thread can therefore wait without competing for execution resources, which gives multi-threaded applications better performance on processors with Hyper-Threading.

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 