Pentium Pro
The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995 (http://www.nytimes.com/1995/11/02/business/intel-offers-its-pentium-pro-for-work-station-market.html). It introduced the P6 microarchitecture (sometimes referred to as i686) and was originally intended to replace the original Pentium in a full range of applications. While the Pentium and Pentium MMX had 3.1 and 4.5 million transistors, respectively, the Pentium Pro contained 5.5 million transistors. Later, it was reduced to a narrower role as a server and high-end desktop processor and was used in supercomputers such as ASCI Red, the first computer to reach the TeraFlop performance mark. The Pentium Pro was capable of both dual- and quad-processor configurations. It came in only one form factor, the relatively large rectangular Socket 8. The Pentium Pro was succeeded by the Pentium II Xeon in 1998.

Microarchitecture

Summary

Belying its name, the Pentium Pro had a completely new microarchitecture, a departure from the Pentium rather than an extension of it. It has a decoupled, 12-stage superpipelined architecture which uses an instruction pool. The Pentium Pro (P6) featured many advanced concepts not found in the Pentium, although it was not the first or only x86 processor to implement them (see NexGen Nx586 or Cyrix 6x86). The Pentium Pro pipeline had extra decode stages to dynamically translate IA-32 instructions into buffered micro-operation sequences, which could then be analysed, reordered, and renamed in order to detect parallelizable operations that may be issued to more than one execution unit at once. The Pentium Pro thus featured out-of-order execution, including speculative execution via register renaming. It also had a wider 36-bit address bus (usable by PAE, the Physical Address Extension).
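
As a rough illustration of what the wider address bus means, the following sketch computes the number of byte addresses reachable with 32 and 36 address bits. The function name is made up for this example, and the calculation assumes byte-addressable memory.

```python
# Addressable physical memory for a given address-bus width.
# Assumes byte-addressable memory; the helper name is illustrative only.
GIB = 2 ** 30


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses on a bus of the given width."""
    return 2 ** address_bits


for bits in (32, 36):
    print(f"{bits}-bit address bus: {addressable_bytes(bits) // GIB} GiB")
```

With 36 bits the physical address space is 16 times larger than with 32 bits, which is the extra range PAE makes usable.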

The Pentium Pro has an 8 KiB instruction cache, from which up to 16 bytes are fetched on each cycle and sent to the instruction decoders. There are three instruction decoders. The decoders are not equal in capability: only one can decode any x86 instruction, while the other two can decode only simple x86 instructions. This restricts the Pentium Pro's ability to decode multiple instructions simultaneously, limiting superscalar execution. x86 instructions are decoded into 118-bit micro-operations (micro-ops). The micro-ops are RISC-like; that is, they encode an operation, two sources, and a destination. The general decoder can generate up to four micro-ops per cycle, whereas the simple decoders can generate one micro-op each per cycle. Thus, x86 instructions that operate on memory (e.g., add this register to this location in memory) can be processed only by the general decoder, as this operation requires a minimum of three micro-ops. Likewise, the simple decoders are limited to instructions that can be translated into one micro-op. Instructions that require more than four micro-ops are translated with the assistance of a sequencer, which generates the required micro-ops over multiple clock cycles.
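
The decoder restrictions described above can be sketched as a toy model. This is a simplification under stated assumptions, not a description of the real hardware: it assumes strictly in-order decoding, ignores the 16-byte fetch limit, and assumes the sequencer emits four micro-ops per cycle (a made-up rate). The function name and the input format, a list of micro-op counts per instruction, are also invented for the example.

```python
def decode_cycles(uop_counts):
    """Estimate decode cycles for an in-order stream of x86 instructions.

    uop_counts holds the number of micro-ops each instruction expands to.
    Toy model (assumptions): one general decoder handles 1-4 micro-ops,
    two simple decoders handle 1 micro-op each, and an instruction with
    more than 4 micro-ops occupies the sequencer alone at 4 per cycle.
    """
    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        uops = uop_counts[i]
        if uops > 4:
            # Sequencer path: ceil(uops / 4) cycles, nothing else decodes.
            cycles += -(-uops // 4)
            i += 1
            continue
        # The general decoder takes the next instruction.
        cycles += 1
        i += 1
        # Each simple decoder takes one following single-micro-op instruction.
        taken = 0
        while taken < 2 and i < n and uop_counts[i] == 1:
            i += 1
            taken += 1
    return cycles


# Six simple instructions decode three per cycle.
print(decode_cycles([1, 1, 1, 1, 1, 1]))
# Three read-modify-write style instructions all need the general decoder.
print(decode_cycles([3, 3, 3]))
```

In this model, instruction order matters: a multi-micro-op instruction followed by two simple ones fills all three decoders in one cycle, while the same instructions in a different order can take two.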

Micro-ops exit the reorder buffer (ROB) and enter a reservation station, where they await dispatch to the execution units. In each clock cycle, up to five micro-ops can be dispatched to five execution units. The Pentium Pro has two integer units and one floating-point unit (FPU). One of the integer units shares the same ports as the FPU, and therefore the Pentium Pro can dispatch either two integer micro-ops, or one integer and one floating-point micro-op, per cycle. Of the two integer units, only one has the full complement of functions such as a barrel shifter, multiplier and divider. The second integer unit, which shares paths with the FPU, does not have these facilities and is limited to simple operations such as add, subtract, and the calculation of branch target addresses.

The FPU executes floating-point operations. Addition and multiplication are pipelined and have latencies of three and five cycles, respectively. Division and square root are not pipelined and are executed in separate units that share the FPU's ports. Division and square root have latencies of 18 to 36 and 29 to 69 cycles, respectively. The lower figure in each range is for single-precision (32-bit) floating-point numbers and the higher for extended-precision (80-bit) numbers. Division and square root can run concurrently with adds and multiplies, blocking them only when a result has to be stored in the ROB.

After the microprocessor was released, a bug was discovered in the floating-point unit, commonly called the "Pentium Pro and Pentium II FPU bug" and referred to by Intel as the "flag erratum". The bug occurs under some circumstances during floating-point-to-integer conversion when the floating-point number does not fit into the smaller integer format, causing the FPU to deviate from its documented behaviour. The bug is considered minor and occurs under such special circumstances that very few, if any, software programs are affected.

The Pentium Pro's P6 microarchitecture was used in one form or another by Intel for more than a decade. The pipeline would scale from its initial 150 MHz all the way up to 1.4 GHz with the "Tualatin" Pentium III. The design's various traits would continue after that in the derivative core called "Banias" in the Pentium M and in Intel Core (Yonah), which itself would evolve into the Core microarchitecture (Core 2 processor) in 2006 and onward.

Performance

Performance with 32-bit code was excellent and well ahead of the older Pentiums at the time, usually by 25-35%. However, the Pentium Pro's 16-bit performance was the same as that of the original Pentium. It was this, along with the Pentium Pro's high price, that caused the rather lackluster reception among PC enthusiasts, given the dominance at the time of the 16-bit MS-DOS, 16/32-bit Windows 3.1x, and 32/16-bit Windows 95 (parts of Windows 95, such as USER.exe, were still mostly 16-bit). To gain the full advantages of the Pentium Pro's P6 microarchitecture, one needed to run a fully 32-bit OS such as Windows NT 3.51, Unix, or OS/2.

When introduced, the Pentium Pro slightly outperformed the fastest RISC microprocessors on integer performance when running the SPECint95 benchmark. Floating-point performance was significantly lower, half that of some RISC microprocessors. The Pentium Pro's integer performance lead disappeared rapidly, first overtaken by the MIPS Technologies R10000 in January 1996, and then by Digital Equipment Corporation's EV56 variant of the Alpha 21164.

An innovation in cache

Probably the Pentium Pro's most noticeable addition was its on-package L2 cache, which ranged from 256 KiB at introduction to 1 MiB in 1997. At the time, manufacturing technology did not feasibly allow a large L2 cache to be integrated into the processor core. Intel instead placed the L2 die(s) separately in the package, which still allowed the cache to run at the same clock speed as the CPU core. Additionally, unlike most motherboard-based cache schemes that shared the main system bus with the CPU, the Pentium Pro's cache had its own back-side bus (called dual independent bus by Intel). Because of this, the CPU could read main memory and cache concurrently, greatly reducing a traditional bottleneck. The cache was also "non-blocking", meaning that the processor could issue more than one cache request at a time (up to 4), reducing cache-miss penalties. (This is an example of MLP, memory-level parallelism.) These properties combined to produce an L2 cache that was far faster than the motherboard-based caches of older processors. This cache alone gave the CPU an advantage in input/output performance over older x86 CPUs. In multiprocessor configurations, the Pentium Pro's integrated cache greatly improved performance in comparison to architectures in which the CPUs shared a central cache.
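
The benefit of a non-blocking cache can be illustrated with a back-of-the-envelope model. The sketch below assumes that all misses are independent and can be overlapped in batches, and it uses a made-up latency of 50 cycles per request; neither the latency nor the function name comes from Intel documentation.

```python
import math


def miss_stall_cycles(misses: int, latency: int, max_outstanding: int) -> int:
    """Cycles spent waiting when independent misses overlap in batches.

    Simplified model: up to max_outstanding requests are in flight at once,
    and each batch costs one full latency.
    """
    return math.ceil(misses / max_outstanding) * latency


LATENCY = 50  # hypothetical cycles per request, for illustration only

blocking = miss_stall_cycles(8, LATENCY, max_outstanding=1)
non_blocking = miss_stall_cycles(8, LATENCY, max_outstanding=4)
print(f"blocking cache:     {blocking} cycles")
print(f"non-blocking cache: {non_blocking} cycles")
```

Under these assumptions, allowing four requests in flight cuts the waiting time for eight independent misses to a quarter. Real code rarely has misses that are fully independent, so the actual gain is smaller.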

However, this far faster L2 cache did come with some complications. The Pentium Pro's "on-package cache" arrangement was unique. The processor and the cache were on separate dies in the same package and connected closely by a full-speed bus. The two or three dies had to be bonded together early in the production process, before testing was possible. This meant that a single, tiny flaw in any die made it necessary to discard the entire assembly, which was one of the reasons for the Pentium Pro's relatively low production yield and high cost. All versions of the chip were expensive, those with 1024 KiB particularly so, since they required two 512 KiB cache dies as well as the processor die.
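
The effect of bonding untested dies into one package can be shown with a small calculation. The sketch assumes that defects in the dies are independent, so the package yield is the product of the individual die yields. The yield figures of 80% and 90% are invented for illustration and are not Intel's numbers.

```python
import math


def package_yield(die_yields):
    """Fraction of packages in which every die is good.

    Assumes defects are independent between dies and that the dies
    cannot be tested before they are bonded into the package.
    """
    return math.prod(die_yields)


CPU_YIELD = 0.80    # hypothetical
CACHE_YIELD = 0.90  # hypothetical

two_die = package_yield([CPU_YIELD, CACHE_YIELD])
three_die = package_yield([CPU_YIELD, CACHE_YIELD, CACHE_YIELD])
print(f"CPU + one cache die:   {two_die:.1%}")
print(f"CPU + two cache dies:  {three_die:.1%}")
```

Every additional die multiplies in another factor below one, which is consistent with the three-die 1024 KiB parts being the most expensive.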

Available models

Pentium Pro clock speeds were 150, 166, 180 or 200 MHz with a 60 or 66 MHz external bus clock. Some users chose to overclock their Pentium Pro chips, with the 200 MHz version often being run at 233 MHz, and the 150 MHz version often being run at 166 MHz. The chip was popular in symmetric multiprocessing (SMP) configurations, with dual and quad SMP server and workstation setups being commonplace.
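
The listed core clocks follow from the bus clock multiplied by a fixed ratio. The sketch below reproduces them using multipliers of 2.5 and 3; these multipliers are not quoted in the text above but are the ratios implied by the listed core and bus speeds, and the nominal "66 MHz" bus is taken to be 66 2/3 MHz. Exact fractions are used to avoid rounding surprises.

```python
from fractions import Fraction

# Exact bus clocks in MHz; the nominal "66 MHz" bus is assumed to be 66 2/3 MHz.
BUS_MHZ = {60: Fraction(60), 66: Fraction(200, 3)}


def core_mhz(bus: int, multiplier: Fraction) -> Fraction:
    """Core clock as bus clock times multiplier."""
    return BUS_MHZ[bus] * multiplier


# (bus, multiplier) pairs inferred from the listed speeds, not quoted from a datasheet.
STOCK = [
    (60, Fraction(5, 2)),
    (66, Fraction(5, 2)),
    (60, Fraction(3)),
    (66, Fraction(3)),
]

for bus, mult in STOCK:
    print(f"{bus} MHz bus x {float(mult)} = {int(core_mhz(bus, mult))} MHz core")

# One plausible route to the 233 MHz overclock: 66 MHz bus with a 3.5x ratio.
print(int(core_mhz(66, Fraction(7, 2))))
```

The same arithmetic suggests how the 150 MHz part could reach 166 MHz: keeping the 2.5 ratio and raising the bus from 60 to 66 MHz.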

In Intel's "Family/Model/Stepping" scheme, the Pentium Pro is family 6, model 1, and its Intel Product code is 80521.
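
As an illustration of the Family/Model/Stepping scheme, the following sketch splits a CPUID signature value into its three base fields, which occupy bits 11:8, 7:4 and 3:0. The sample value 0x0619 is only an example with an arbitrary stepping digit, and the extended family and model fields used by much later processors are ignored here.

```python
def decode_signature(eax: int):
    """Split a CPUID signature into (family, model, stepping).

    Only the base 4-bit fields are decoded; the extended family/model
    fields introduced for later processors are deliberately ignored.
    """
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    return family, model, stepping


# Example signature value chosen for illustration (family 6, model 1).
family, model, stepping = decode_signature(0x0619)
print(f"family {family}, model {model}, stepping {stepping}")
```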

Evolution in fabrication

As time progressed, the process used to fabricate the Pentium Pro processor die and its separate cache memory die changed, leading to a combination of processes used in the same package:
  • The 133 MHz Pentium Pro prototype processor die was fabricated in a 0.6 µm BiCMOS process.
  • The 150 MHz Pentium Pro processor die was fabricated in a 0.50 µm BiCMOS process.
  • The 166, 180, and 200 MHz Pentium Pro processor dies were fabricated in a 0.35 µm BiCMOS process.
  • The 256 KiB L2 cache die was fabricated in a 0.50 µm BiCMOS process.
  • The L2 cache dies used in the 512 and 1024 KiB versions were fabricated in a 0.35 µm BiCMOS process.

Packaging

The Pentium Pro is packaged in a ceramic multi-chip module (MCM). The MCM contains two underside cavities in which the microprocessor die and its companion cache die reside. The dies are bonded to a heat slug, whose exposed top helps transfer heat from the dies more directly to cooling apparatus such as a heat sink. The dies are connected to the package using conventional wire bonding. The cavities are capped with a ceramic plate. The Pentium Pro with 1 MiB of cache uses a plastic MCM. Instead of two cavities, there is only one, in which the three dies reside, bonded to the package instead of a heat slug. The cavity is filled in with epoxy.

The MCM has 387 pins, of which approximately half are arranged in a pin grid array (PGA) and half in an interstitial pin grid array (IPGA). The packaging was designed for Socket 8.

Upgrade paths

In 1998, the 300/333 MHz Pentium II Overdrive processor for Socket 8 was released. Featuring 512 KiB of full-speed cache, it was produced by Intel as a drop-in upgrade option for owners of Pentium Pro systems. However, it supported only two-way glueless multiprocessing, not four-way or higher, which made it unusable as an upgrade for quad-processor systems. These specially packaged Pentium II Xeon processors were used to upgrade ASCI Red, which became the first computer to reach the TeraFlop performance mark with the Pentium Pro processor and then the first to exceed 2 TeraFlops after the upgrade to Pentium II Xeon processors.

As Slot 1 motherboards became prevalent, several manufacturers released slocket adapters, such as the Tyan M2020, Asus C-P6S1, Tekram P6SL1, and the Abit KP6. The slockets allowed Pentium Pro processors to be used with Slot 1 motherboards. The Intel 440FX chipset explicitly supported both Pentium Pro and Pentium II processors, but the Intel 440BX and later Slot 1 chipsets did not explicitly support the Pentium Pro, so the Socket 8 slockets did not see wide use. Slockets, in the form of Socket 370 to Slot 1 adapters, saw renewed popularity when Intel introduced Socket 370 Celeron and Pentium III processors.

Pentium Pro

  • L1 cache: 8, 8 KiB (data, instructions)
  • L2 cache: 256, 512 KiB (one die) or 1024 KiB (two 512 KiB dies) in a multi-chip module clocked at CPU-speed
  • Socket: Socket 8
  • Front side bus: 60 and 66 MHz
  • VCore: 3.1–3.3 V
  • Fabrication: 0.50 µm or 0.35 µm BiCMOS
  • Clockrate: 150, 166, 180, 200 MHz
  • First release: November 1995

Pentium II Overdrive

  • L1 cache: 16, 16 KiB (data + instructions)
  • L2 cache: 512 KiB external chip on CPU module clocked at CPU-speed
  • Socket: Socket 8
  • Multiplier: Locked at 5×
  • Front side bus: 60 and 66 MHz
  • VCore: 3.1–3.3 V (has on-board voltage regulator)
  • Fabrication: 0.25 µm
  • Based on the Deschutes-generation Pentium II
  • First release: 1998
  • Supports MMX technology

Pentium Pro/6th generation competitors

  • AMD K5 and K6
  • Cyrix 6x86 and MII
  • IDT WinChip
  • Intel P5 Pentium (co-existed with Pentium Pro for several years)

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 