SPARC64 V
SPARC64 V refers to two unique microprocessors, the SPARC64 V "Zeus" developed by Fujitsu, and an earlier design developed by HAL Computer Systems that never made it into production. The HAL design was canceled in mid-2001 when HAL, a subsidiary of Fujitsu, was closed. The SPARC64 V developed by Fujitsu is a replacement for the HAL design.
History
The first SPARC64 V microprocessors were fabricated in December 2001. They operated at 1.1 to 1.35 GHz. Fujitsu's 2003 SPARC64 roadmap showed that the company planned a 1.62 GHz version for release in late 2003 or early 2004, but it was canceled in favor of the SPARC64 V+. The SPARC64 V was used by Fujitsu in their PRIMEPOWER servers.
The SPARC64 V was presented at Microprocessor Forum 2002 by Aiichiro Inoue, the director of the Processor Development Division of the Development Department at Fujitsu. At introduction, it had the highest clock frequency of any SPARC implementation and of any 64-bit server microprocessor in production, and the highest SPEC rating of any SPARC implementation.
Description
The SPARC64 V is a four-issue superscalar microprocessor with out-of-order execution. It was based on the Fujitsu GS8900 mainframe microprocessor.
Pipeline
The SPARC64 V fetches up to eight instructions from the instruction cache during the first stage and places them into a 48-entry instruction buffer. In the next stage, four instructions are taken from this buffer, decoded and issued to the appropriate reserve stations. The SPARC64 V has six reserve stations: two that serve the integer units, one for the address generators, two for the floating-point units, and one for branch instructions. Each integer, address generator and floating-point unit has an eight-entry reserve station. Each reserve station can dispatch an instruction to its execution unit. The choice of which instruction to dispatch depends first on operand availability and then on age, with older instructions given higher priority than newer ones. The reserve stations can also dispatch instructions speculatively (speculative dispatch). That is, instructions can be dispatched to the execution units even when their operands are not yet available, provided the operands will be available by the time execution begins. During stage six, up to six instructions are dispatched.
Register read
The register files are read during stage seven. The SPARC architecture has separate register files for integer and floating-point instructions. The integer register file has eight register windows. The joint work register (JWR) contains 64 entries and has eight read ports and two write ports. The JWR holds a subset of the eight register windows: the previous, current and next register windows. Its purpose is to reduce the size of the register file so that the microprocessor can operate at higher clock frequencies. The floating-point register file contains 64 entries and has six read ports and two write ports.
Execution
Execution begins during stage nine. There are six execution units, two for integer, two for loads and stores, and two for floating-point. The two integer execution units are designated EXA and EXB. Both have an arithmetic logic unit (ALU) and a shift unit, but only EXA has multiply and divide units. Loads and stores are executed by two address generators (AGs) designated AGA and AGB. These are simple ALUs used to calculate virtual addresses.
The two floating-point units (FPUs) are designated FLA and FLB. Each FPU contains an adder and a multiplier, but only FLA has a graphics unit attached. They execute add, subtract, multiply, divide, square root and multiply–add instructions. Unlike its successor SPARC64 VI, the SPARC64 V performs the multiply–add with separate multiplication and addition operations, thus with up to two rounding errors. The graphics unit executes Visual Instruction Set (VIS) instructions, a set of single instruction, multiple data (SIMD) instructions. All instructions are pipelined except for divide and square root, which are executed using iterative algorithms. The FMA instruction is implemented by reading three operands from the operand register, multiplying two of the operands, forwarding the result and the third operand to the adder, and adding them to produce the final result.
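The numerical effect of performing the multiply–add as two separately rounded operations can be illustrated in software. The following is a minimal sketch, not a model of the FPU hardware: it assumes Python floats are IEEE 754 double-precision values, and the function names `unfused_multiply_add` and `fused_multiply_add` are hypothetical names chosen for this illustration. The single-rounding reference is obtained with exact rational arithmetic.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply, round, then add, round again (two roundings)."""
    product = a * b        # first rounding happens here
    return product + c     # second rounding happens here


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Reference: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)    # the only rounding step


# Illustrative operands (an assumption of this sketch): a*b is exactly
# 1 - 2**-60, which is not representable as a double and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(unfused_multiply_add(a, b, c))
print(fused_multiply_add(a, b, c))
```

With these operands the separately rounded version loses the low-order bits of the product before the addition, while the single-rounding reference retains them.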
Results from the execution units and loads are not written directly to the register file. To maintain program order, they are written to update buffers, where they reside until committed. The SPARC64 V has separate update buffers for the integer and floating-point units, each with 32 entries. The integer update buffer has eight read ports and four write ports. Half of the write ports are used for results from the integer execution units and the other half for data returned by loads. The floating-point update buffer has six read ports and four write ports.
Commit takes place during stage ten at the earliest. The SPARC64 V can commit up to four instructions per cycle. During stage eleven, results are written to the register file, where they become visible to software.
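The in-order commit behavior described above can be sketched as a small simulation. This is an illustrative model under stated assumptions, not a description of the actual circuit: the `Entry` class and `commit_cycle` function are hypothetical, and only the commit width of four per cycle is taken from the text.

```python
from collections import deque
from dataclasses import dataclass

COMMIT_WIDTH = 4  # from the text: up to four instructions commit per cycle


@dataclass
class Entry:
    seq: int            # position in program order
    done: bool = False  # True once the result sits in the update buffer


def commit_cycle(buffer: deque) -> list:
    """Retire completed entries from the head of the buffer, oldest first.

    Stops at the first entry that is not done, so a younger instruction
    never commits before an older one.
    """
    committed = []
    while buffer and buffer[0].done and len(committed) < COMMIT_WIDTH:
        committed.append(buffer.popleft().seq)
    return committed


# Six instructions in program order; 1, 2 and 3 finish before 0.
buffer = deque(Entry(seq) for seq in range(6))
for seq in (1, 2, 3):
    buffer[seq].done = True

print(commit_cycle(buffer))  # nothing commits: instruction 0 is still pending
buffer[0].done = True
buffer[4].done = True
print(commit_cycle(buffer))  # the four oldest commit together
print(commit_cycle(buffer))  # instruction 4 commits; 5 is still pending
```

Results produced out of order wait in the buffer until every older instruction has completed, which is what keeps the architectural register state consistent with program order.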
Cache
The SPARC64 V has a two-level cache hierarchy. The first level consists of two caches, an instruction cache and a data cache. The second level consists of an on-die unified cache.
The level 1 (L1) caches each have a capacity of 128 KB. They are both two-way set associative and have a 64-byte line size. They are virtually indexed and physically tagged. The instruction cache is accessed via a 256-bit bus. The data cache is accessed with two 128-bit buses. The data cache consists of eight banks separated by 32-bit boundaries. It uses a write-back policy. The data cache writes to the L2 cache with its own 128-bit unidirectional bus.
The second level cache has a capacity of 1 or 2 MB and the set associativity depends on the capacity.
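The L1 parameters given above (128 KB, two-way set associative, 64-byte lines) determine the number of sets and how an address divides into tag, index and offset fields. The following is a minimal sketch of that arithmetic, assuming 1 KB = 1024 bytes and power-of-two sizes; the function names are hypothetical. In a virtually indexed, physically tagged cache, the index bits are taken from the virtual address and the tag is compared against the physical address, which this sketch does not model.

```python
def cache_geometry(capacity_bytes: int, ways: int, line_bytes: int):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = capacity_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the number of sets
    return sets, offset_bits, index_bits


def split_address(addr: int, offset_bits: int, index_bits: int):
    """Split an address into (tag, index, offset) bit fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# L1 parameters from the text: 128 KB, two-way, 64-byte lines.
sets, offset_bits, index_bits = cache_geometry(128 * 1024, 2, 64)
print(sets, offset_bits, index_bits)

# An arbitrary example address (an assumption of this sketch).
print(split_address(0x12345, offset_bits, index_bits))
```

Under these assumptions each L1 cache has 1024 sets, so 6 offset bits and 10 index bits select a byte within a set.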
System bus
The microprocessor has a 128-bit system bus that operates at 260 MHz. The bus can operate in two modes, single data rate (SDR) or double data rate (DDR), yielding a peak bandwidth of 4.16 or 8.32 GB/s, respectively.
Physical
The SPARC64 V consisted of 191 million transistors, of which 19 million are contained in logic circuits. It was fabricated by an unnamed foundry in a 0.13 µm, eight-layer copper metallization, complementary metal–oxide–semiconductor (CMOS) silicon on insulator (SOI) process. The die measured 18.14 mm by 15.99 mm for a die area of 290 mm².
Electrical
At 1.3 GHz, the SPARC64 V has a power dissipation of 34.7 W. The Fujitsu PrimePower servers that use the SPARC64 V supply a slightly higher voltage to the microprocessor to enable it to operate at 1.35 GHz. The increased power supply voltage and operating frequency raise the power dissipation to about 45 W.
SPARC64 V+
The SPARC64 V+, code-named "Olympus-B", is a further development of the SPARC64 V. Improvements over the SPARC64 V included higher clock frequencies of 1.82 to 2.16 GHz and a larger secondary cache with a capacity of 3 or 4 MB.
The first SPARC64 V+, a 1.89 GHz version, was shipped in September 2004 for the Fujitsu PrimePower 650 and 850. In December 2004, a 1.82 GHz version was shipped in the PrimePower 2500. In February 2006, four versions were introduced: 1.65 and 1.98 GHz versions with 3 MB of L2 cache shipped in the PrimePower 250 and 450; and 2.08 and 2.16 GHz versions with 4 MB of L2 cache shipped in mid-range and high-end models.
It contained approximately 400 million transistors on a die with dimensions of 18.46 mm by 15.94 mm for a die area of 294.25 mm². It was fabricated in a 90 nm CMOS process with ten levels of copper interconnect.