X87
x87 is a floating point-related subset of the x86 architecture instruction set. It originated as an extension of the 8086 instruction set in the form of optional floating point coprocessors that worked in tandem with corresponding x86 CPUs. These microchips had names ending in "87". Like other extensions to the basic instruction set, x87 instructions are not strictly needed to construct working programs, but provide hardware and microcode implementations of common numerical tasks, allowing these tasks to be performed much faster than corresponding machine code routines can. The x87 instruction set includes instructions for basic floating point operations such as addition, subtraction and comparison, but also for more complex numerical operations, such as the computation of the tangent function and its inverse.

Most x86 processors since the Intel 80486 have had these x87 instructions implemented in the main CPU, but the term is sometimes still used to refer to that part of the instruction set. Before x87 instructions were standard in PCs, compilers or programmers had to use rather slow library calls to perform floating-point operations, a method that is still common in (low-cost) embedded systems.

Description

The x87 registers form an 8-level deep non-strict stack structure ranging from ST(0) to ST(7), with registers that can be directly accessed by either operand, using an offset relative to the top, as well as pushed and popped. (This scheme may be compared to how a stack frame may be pushed, popped and indexed.)

There are instructions to push, calculate, and pop values on top of this stack; monadic operations (FSQRT, FPTAN etc.) then implicitly address the topmost ST(0) while dyadic operations (FADD, FMUL, FCOM, etc.) implicitly address ST(0) and ST(1). The non-strict stack-model also allows dyadic operations to use ST(0) together with a direct memory operand or with an explicitly specified stack-register, ST(x), in a role similar to a traditional accumulator (a combined destination and left operand).

This can also be reversed on an instruction-by-instruction basis with ST(0) as the unmodified operand and ST(x) as the destination. Furthermore, the contents in ST(0) can be exchanged with another stack register using an instruction called FXCH ST(x).
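The addressing scheme described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class name X87Stack and its method names are hypothetical, values are plain Python floats rather than 80-bit registers, and error conditions are reported as Python exceptions instead of the FPU's own exception mechanism.

```python
class X87Stack:
    """Toy model of the x87 register stack: 8 slots plus a top-of-stack index."""

    def __init__(self):
        self.regs = [None] * 8  # None marks an empty slot
        self.top = 0            # physical index of ST(0)

    def _slot(self, i):
        # ST(i) is addressed relative to the current top of the stack.
        return (self.top + i) % 8

    def st(self, i=0):
        value = self.regs[self._slot(i)]
        if value is None:
            raise IndexError(f"ST({i}) is empty")
        return value

    def fld(self, value):
        """Push a value; the old ST(0) becomes ST(1)."""
        new_top = (self.top - 1) % 8
        if self.regs[new_top] is not None:
            raise OverflowError("register stack is full")
        self.top = new_top
        self.regs[new_top] = float(value)

    def fstp(self):
        """Pop and return ST(0); the old ST(1) becomes ST(0)."""
        value = self.st(0)
        self.regs[self.top] = None
        self.top = (self.top + 1) % 8
        return value

    def fadd(self, i=1, reverse=False):
        """ST(0) += ST(i), or ST(i) += ST(0) when reverse is True."""
        total = self.st(0) + self.st(i)
        self.regs[self._slot(i if reverse else 0)] = total

    def fxch(self, i=1):
        """Exchange ST(0) with ST(i)."""
        a, b = self.st(0), self.st(i)
        self.regs[self._slot(0)], self.regs[self._slot(i)] = b, a


s = X87Stack()
for v in (1.5, 2.0, 4.0):
    s.fld(v)                # ST(0)=4.0, ST(1)=2.0, ST(2)=1.5
s.fadd(2)                   # ST(0) acts as accumulator: 4.0 + 1.5
s.fxch(1)                   # ST(0)=2.0, ST(1)=5.5
s.fadd(1, reverse=True)     # ST(1) is the destination: 5.5 + 2.0
```

In this model ST(0) behaves like an accumulator by default, while the reverse flag and fxch show how any other register can take over that role.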

These properties make the x87 stack usable as seven freely addressable registers plus a dedicated accumulator (or as seven independent accumulators). This is especially applicable on superscalar x86 processors (such as the Pentium of 1993 and later) where these exchange instructions (codes D9C8..D9CFh) are optimized down to a zero clock penalty by using one of the integer paths for FXCH ST(x) in parallel with the FPU instruction. Despite being natural and convenient for human assembly language programmers, some compiler writers have found it complicated to construct automatic code generators that schedule x87 code effectively.

The x87 provides single precision, double precision and 80-bit double-extended precision binary floating-point arithmetic as per the IEEE 754-1985 standard. By default, the x87 processors all use 80-bit double-extended precision internally (to allow for sustained precision over many calculations). A given sequence of arithmetic operations may thus behave slightly differently compared to a strict single-precision or double-precision IEEE 754 FPU. This may sometimes be problematic for some semi-numerical calculations relying on knowledge of exact FPU precision for correct operation. To avoid such problems, the x87 can be configured via a special configuration/status register to automatically round to single or double precision after each operation. Since the introduction of SSE2, the x87 instructions are not as essential as they once were, except for high-precision calculations demanding the 64-bit mantissa precision available in the 80-bit format.
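The effect of the internal precision setting can be sketched with exact rational arithmetic. The following is an illustration, not a model of any particular chip. It assumes the precision-control field encoding documented by Intel for the x87 control word (bits 8 and 9: 00 for a 24-bit, 10 for a 53-bit and 11 for a 64-bit significand), which is not stated in this article. The function names are hypothetical, and the exponent range is ignored.

```python
from fractions import Fraction

# Significand widths selected by the precision-control field (bits 8-9).
# The encoding 0b01 is reserved and deliberately left out.
PRECISION_BITS = {0b00: 24, 0b10: 53, 0b11: 64}


def precision_bits(control_word):
    """Return the significand width selected by an x87 control word."""
    return PRECISION_BITS[(control_word >> 8) & 0b11]


def round_to_significand(value, bits):
    """Round an exact Fraction to `bits` significant bits (nearest, ties to even)."""
    if value == 0:
        return Fraction(0)
    sign = -1 if value < 0 else 1
    mag = abs(Fraction(value))
    # First guess for the exponent, then adjust until the scaled value
    # lies in [2**(bits-1), 2**bits).
    e = mag.numerator.bit_length() - mag.denominator.bit_length() - bits
    scaled = mag / Fraction(2) ** e
    while scaled >= 2 ** bits:
        e += 1
        scaled /= 2
    while scaled < 2 ** (bits - 1):
        e -= 1
        scaled *= 2
    n = round(scaled)  # round() on a Fraction rounds half to even
    return sign * n * Fraction(2) ** e


x = 1 + Fraction(1, 2 ** 60)
extended = round_to_significand(x, precision_bits(0x037F))  # 64-bit significand
double = round_to_significand(x, precision_bits(0x027F))    # 53-bit significand
print(extended - 1, double - 1)
```

With a 64-bit significand the small term 2^-60 survives, while with a 53-bit significand the same value rounds to exactly 1. This is the kind of difference that can make a sequence of operations behave differently from a strict double-precision FPU.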

Performance

Clock cycle counts for examples of typical x87 FPU instructions (only register-register versions shown here).

The A~B notation (minimum to maximum) covers timing variations dependent on transient pipeline status as well as the arithmetic precision chosen (32, 64 or 80 bits); it also includes variations due to numerical cases (such as the number of set bits, zero, etc.). The L→H notation depicts values corresponding to the lowest (L) and the highest (H) maximum clock frequencies that were available.
x87 implementation | FADD | FMUL | FDIV | FXCH | FCOM | FSQRT | FPTAN | FPATAN | Max clock | Peak FMUL/sec | FMUL speed relative to 5 MHz 8087§
8087 | 70~100 | 90~145 | 193~203 | 10~15 | 40~50 | 180~186 | 30~540 | 250~800 | 5→10 MHz | 34~55K → 100~111K | 1.0 → 2.0 times as fast
80287 (original) | 70~100 | 90~145 | 193~203 | 10~15 | 40~50 | 180~186 | 30~540 | 250~800 | 6→12 MHz | 41~66K → 83~133K | 1.2 → 2.4 times as fast
80387 (and later 287 models) | 23~34 | 29~57 | 88~91 | 18 | 24 | 122~129 | 191~497 | 314~487 | 16→33 MHz | 280~552K → 579~1100K | approx 10 → 20 × as fast
80486 (or 80487) | 8~20 | 16 | 73 | 4 | 4 | 83~87 | 200~273 | 218~303 | 16→50 MHz | 1.0M → 3.1M | approx 18 → 56 × as fast
Cyrix 6x86, Cyrix MII | 4~7 | 4~6 | 24~34 | 2 | 4 | 59~60 | 117~129 | 97~161 | 66→300 MHz | 11~16M → 50~75M | approx 320 → 1400 ×
AMD K6 (including K6 II/III) | 2 | 2 | n/a | 2 | n/a | n/a | n/a | n/a | 166→550 MHz | 83M → 275M | approx 1500 → 5000 ×
Pentium / Pentium MMX | 1~3 | 1~3 | 39 | 1 (0*) | 1~4 | 70 | 17~173 | 19~134 | 60→300 MHz | 20~60M → 100~300M | approx 1100 → 5400 ×
Pentium Pro | 1~3 | 2~5 | 16~56 | 1 (0*) | 1 | 28~68 | n/a | n/a | 150→200 MHz | 30~75M → 40~100M | approx 1400 → 1800 ×
Pentium II / III | 1~3 | 2~5 | 17~38 | 1 (0*) | 1 | 27~50 | n/a | n/a | 233→1400 MHz | 47~116M → 280~700M | approx 2100 → 13000 ×
Athlon (K7) | 1~4 | 1~4 | 13~24 | 1 (0*) | 1~2 | 16~35 | n/a | n/a | 500→2330 MHz | 125~500M → 0.580~2.33G | approx 9000 → 42000 ×
Pentium 4 | 1~5 | 2~7 | 20~43 | 1 (0*) | n/a | 20~43 | n/a | n/a | 1.3→3.8 GHz | 186~650M → 0.543~1.90G | approx 11000 → 34000 ×
Athlon 64 (K8) | 1~4 | 1~4 | 13~24 | 1 (0*) | 1~2 | 16~35 | n/a | n/a | 1.0→3.2 GHz | 250~1000M → 0.800~3.2G | approx 18000 → 58000 ×

* An effective zero clock delay is often possible, via superscalar execution.

§ The 5 MHz 8087 was the original x87 processor. Compared to typical software-implemented floating point routines on an 8086 (without an 8087), the factors would be even larger, perhaps by another factor of 10 (i.e., a correct floating point addition in assembly language may well consume over 1000 cycles).
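The peak FMUL/sec figures appear to follow from dividing the clock frequency by the FMUL cycle count. The sketch below works under that assumption; the function name peak_rate is hypothetical, and real throughput also depends on pipelining and memory access, which this ignores.

```python
def peak_rate(clock_hz, min_cycles, max_cycles):
    """Return (lowest, highest) operations per second for an instruction
    that takes between min_cycles and max_cycles clock cycles."""
    return clock_hz / max_cycles, clock_hz / min_cycles


# 8087 at 5 MHz, FMUL taking 90~145 cycles
low, high = peak_rate(5e6, 90, 145)
print(f"{low / 1e3:.1f}K to {high / 1e3:.1f}K FMUL/sec")

# 80486 at 16 MHz and at 50 MHz, FMUL taking 16 cycles
for clock in (16e6, 50e6):
    rate, _ = peak_rate(clock, 16, 16)
    print(f"{rate / 1e6:.3f}M FMUL/sec at {clock / 1e6:.0f} MHz")
```

For the 5 MHz 8087 this gives roughly 34.5K to 55.6K multiplications per second, in line with the 34~55K entry, and for the 80486 it gives 1.0M and about 3.1M.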

Manufacturers

Companies that have designed and/or manufactured floating point units compatible with the Intel 8087 or later models include AMD (287, 387, 486DX, 5x86, K5, K6, K7, K8), Chips and Technologies (the Super MATH coprocessors), Cyrix (the FasMath, Cx87SLC, Cx87DLC, etc., 6x86, Cyrix MII), Fujitsu (early Pentium Mobile etc.), Harris Semiconductor (manufactured 80387 and 486DX processors), IBM (various 387 and 486 designs), IDT (the WinChip, C3, C7, Nano, etc.), IIT (the 2C87, 3C87, etc.), LC Technology (the Green MATH coprocessors), National Semiconductor (the Geode GX1, Geode GXm, etc.), NexGen (the Nx587), Rise Technology (the mP6), ST Microelectronics (manufactured 486DX, 5x86, etc.), Texas Instruments (manufactured 486DX processors etc.), Transmeta (the TM5600 and TM5800), ULSI (the Math·Co coprocessors), VIA (the C3, C7, and Nano, etc.), and Xtend (the 83S87SX-25 and other coprocessors).

8087

The 8087 was the first math coprocessor for 16-bit processors designed by Intel (the I8231 was older but designed for the 8-bit Intel 8080); it was built to be paired with the Intel 8088 or 8086 microprocessors.

80287

The 80287 (i287) was the math coprocessor for the Intel 80286 series of microprocessors. Intel (and its competitors) later introduced an 80287XL, which was actually an 80387SX with a 287 pinout. The 80287XL contained an internal 3/2 multiplier so that motherboards which ran the coprocessor at 2/3 CPU speed could instead run the FPU at the same speed as the CPU. Other 287 models with 387-like performance were the Intel 80C287, built using CHMOS III, and the AMD 80EC287, manufactured in AMD's CMOS process using only fully static gates.

The 80287 and 80287XL also worked with the 80386 microprocessor, and were initially the only coprocessors available for the 80386 until the introduction of the 80387 in 1987. Finally, they were also able to work with the Cyrix Cx486SLC. However, for both of these chips the 80387 was strongly preferred for its higher performance and the greater capability of its instruction set.

Intel's models included i80287 variants with specified upper frequency limits ranging from 6 up to 12 MHz. Later followed the i80287XL with 387 microarchitecture and the i80287XLT, a special version intended for laptops, as well as other variants.

80387

The 80387 (387 or i387) was the first Intel coprocessor to be fully compliant with the IEEE 754 standard. Released in 1987, a full two years after the 386 chip, the i387 was much faster than Intel's previous 8087/80287 coprocessors and improved the characteristics of its trigonometric functions. (The 80287 limited the argument range to plus or minus 45 degrees.)
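A restricted argument range of this kind means that software has to reduce an arbitrary angle before it can use the hardware primitive. The sketch below shows one conventional way to do so using standard trigonometric identities. It is an illustration only, not code from any actual runtime library: tan_limited is a hypothetical stand-in for a primitive that accepts only plus or minus 45 degrees, and the simple reduction loses accuracy for very large arguments.

```python
import math

QUARTER_PI = math.pi / 4  # 45 degrees


def tan_limited(x):
    """Stand-in for a tangent primitive restricted to |x| <= 45 degrees."""
    if abs(x) > QUARTER_PI:
        raise ValueError("argument outside +/- 45 degrees")
    return math.tan(x)


def tan_full(x):
    """Tangent for general arguments, built on the restricted primitive."""
    # tan has period pi, so fold x into roughly [-pi/2, pi/2].
    r = x - round(x / math.pi) * math.pi
    if abs(r) <= QUARTER_PI:
        return tan_limited(r)
    # For the remaining range use tan(r) = 1 / tan(pi/2 - r),
    # which brings the argument back inside the 45 degree limit.
    # At an exact pole the division below raises ZeroDivisionError.
    co = tan_limited(math.pi / 2 - abs(r))
    sign = 1.0 if r > 0 else -1.0
    return sign / co


for angle in (0.1, 1.0, 2.0, -2.5, 10.0):
    print(angle, tan_full(angle), math.tan(angle))
```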

Without a coprocessor, the 386 normally performed floating-point arithmetic through (slow) software routines, implemented at runtime through a software exception-handler. When a math coprocessor is paired with the 386, the coprocessor performs the floating point arithmetic in hardware, returning results much faster than an (emulating) software library call.
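To give a sense of the work a software routine has to do for a single addition, the following sketch adds two single-precision values using integer operations only. It is a simplified illustration, not the code of any real emulator: it assumes positive, normal inputs, and the function names are hypothetical. Zero, subnormals, infinities, NaNs, negative values and overflow are all left out.

```python
import struct


def float_to_bits(x):
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]


def soft_add_positive(a_bits, b_bits):
    """Add two positive normal single-precision numbers given as bit patterns."""
    # Unpack exponent and 24-bit significand (restoring the implicit 1).
    ea, ma = (a_bits >> 23) & 0xFF, (a_bits & 0x7FFFFF) | 0x800000
    eb, mb = (b_bits >> 23) & 0xFF, (b_bits & 0x7FFFFF) | 0x800000
    if ea < eb:
        ea, ma, eb, mb = eb, mb, ea, ma
    # Align: scale the larger operand up so both share the exponent eb.
    # Python integers are unbounded, so no bits are lost here.
    total = (ma << (ea - eb)) + mb
    # Normalize back to 24 bits, rounding to nearest, ties to even.
    extra = total.bit_length() - 24
    if extra > 0:
        half = 1 << (extra - 1)
        rem = total & ((1 << extra) - 1)
        total >>= extra
        if rem > half or (rem == half and total & 1):
            total += 1
        if total == 1 << 24:  # rounding carried into a 25th bit
            total >>= 1
            extra += 1
    return ((eb + extra) << 23) | (total & 0x7FFFFF)


result = soft_add_positive(float_to_bits(1.5), float_to_bits(2.25))
print(bits_to_float(result))
```

Even this reduced case needs unpacking, alignment, normalization and rounding as separate steps, each of which costs several instructions on a CPU without floating point hardware.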

The i387 was compatible only with the standard i386 chip, which had a 32-bit processor bus. The later cost-reduced i386SX, which had a narrower 16-bit data bus, could not interface with the i387's 32-bit bus. The i386SX required its own coprocessor, the Intel 80387SX, which was compatible with the SX's narrower 16-bit data bus.

80187

The 80187 (80C187) was the math coprocessor for Intel 80186 and 80188 CPUs. The 80187 did not appear at the same time as the 80186 and 80188, but was in fact launched after the 80287 and the 80387. Although the interface to the main processor was the same as the 8087, its core was that of the 80387, and it was thus fully IEEE 754 compliant as well as capable of executing all the 80387's extra instructions. Although the 8087 was perfectly capable of operating with an 80186 or 80188, the 80187 does not work particularly well with the 8086 or 8088. There are sufficient differences that code has to be specially written to allow an 80187 and 8086/8 combination to work flawlessly.

80487

The i487SX was marketed as a floating point unit coprocessor for Intel i486SX machines. It actually contained a full-blown i486DX implementation. When installed into an i486SX system, the i487 disabled the main CPU and took over all CPU operations. The i487 took measures to detect the presence of an i486SX and would not function without the original CPU in place.

80587

The Nx587 was the last FPU for x86 to be manufactured separately from the CPU, in this case NexGen's Nx586.

See also

  • MMX
  • SSE, SSE2, SSE3, SSSE3, SSE4, SSE5
  • 3DNow!
  • SIMD


The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 