POWER7
POWER7 is a Power Architecture microprocessor released in 2010 that succeeded the POWER6. POWER7 was developed by IBM at several sites including IBM's Rochester, MN; Austin, TX; Essex Junction, Vermont; T. J. Watson Research Center, NY; Bromont, QC and Böblingen, Germany laboratories. IBM announced servers based on POWER7 on 8 February 2010.
History
IBM won a $244 million DARPA contract in November 2006 to develop a petascale supercomputer architecture before the end of 2010 in the High Productivity Computing Systems (HPCS) project. The contract also states that the architecture shall be available commercially. IBM's proposal, PERCS (Productive, Easy-to-use, Reliable Computer System), which won them the contract, is based on the POWER7 processor, the AIX operating system and the General Parallel File System.
One feature that IBM and DARPA collaborated on is modifying the addressing and page table hardware to support a global shared memory space for POWER7 clusters. This enables research scientists to program a cluster as if it were a single system, without using message passing. From a productivity standpoint, this is essential since some scientists are not conversant with MPI (Message Passing Interface) or other parallel programming techniques used in clusters.
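The difference between the two programming models can be illustrated with a small sketch. This is only an analogy on a single machine using Python threads: it does not model the POWER7 addressing hardware or the MPI API, and the names `shared_memory_sum`, `message_passing_sum`, `DATA` and `WORKERS` are hypothetical.

```python
import queue
import threading

DATA = list(range(1, 101))   # numbers 1..100, to be summed in parallel
WORKERS = 4                  # illustrative worker count


def chunk(rank):
    """Slice of DATA handled by worker `rank` (round-robin split)."""
    return DATA[rank::WORKERS]


def run_workers(work):
    """Start one thread per worker and wait for all of them to finish."""
    threads = [threading.Thread(target=work, args=(r,)) for r in range(WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_memory_sum():
    # Shared-memory style: every worker writes directly into one array
    # that all workers (and the caller) can address.
    partial = [0] * WORKERS

    def work(rank):
        partial[rank] = sum(chunk(rank))

    run_workers(work)
    return sum(partial)


def message_passing_sum():
    # Message-passing style: workers share nothing and must explicitly
    # send their results; the caller must explicitly receive them.
    inbox = queue.Queue()

    def work(rank):
        inbox.put((rank, sum(chunk(rank))))   # explicit "send"

    run_workers(work)
    return sum(inbox.get()[1] for _ in range(WORKERS))   # explicit "receive"


print(shared_memory_sum(), message_passing_sum())
```

Both functions compute the same total; the shared-memory version simply reads and writes ordinary variables, which is the style of programming that a global shared memory space is meant to allow across a cluster.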
Design
The POWER7 processor microarchitecture was a substantial evolution from the POWER6 design, focusing more on power efficiency through multiple cores and simultaneous multithreading (SMT). The POWER6 microarchitecture was built from the ground up for high clock frequencies, at the cost of power efficiency, and reached 5 GHz. IBM claimed at ISCA 29 that peak performance was achieved by high-frequency designs with 10-20 FO4 (fan-out-of-4, a process-independent delay metric) delays per pipeline stage, at the cost of power efficiency. For comparison, the POWER6 binary floating-point unit achieves a “6-cycle, 13-FO4 pipeline”.
Therefore the pipeline for the POWER7 CPU has been changed again, just as it was for the POWER5 and POWER6 designs. In some respects, this rework is similar to Intel's turn in 2005 away from the Pentium 4 (P4), its seventh-generation x86 microarchitecture. The P4's NetBurst microarchitecture relied on a very deep instruction pipeline to achieve very high clock speeds. Each core is a full processor with a single execution pipeline, that is, a sequence of processing steps through which an instruction or set of instructions is executed. In contrast to the classic RISC pipeline, a hyperthreaded core duplicates parts of the pipeline, such as control registers or general-purpose registers. As a result, the processor appears as two processors to the operating system, which can then schedule multiple threads or processes simultaneously. This also keeps the demand for die area low, economizing transistor count and power demand. While the POWER6 is a dual-core processor with each core capable of two-way simultaneous multithreading (SMT), the POWER7 processor has eight cores and four threads per core, for a total theoretical capacity of 32 simultaneous threads.
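The thread counts quoted above follow from multiplying cores by SMT ways. A minimal sketch of that arithmetic, where the function name `hardware_threads` is hypothetical and the core and thread counts are the ones given in the text:

```python
def hardware_threads(cores: int, smt_ways: int) -> int:
    """Number of hardware threads a chip exposes: cores x SMT threads per core."""
    if cores < 1 or smt_ways < 1:
        raise ValueError("cores and smt_ways must be positive")
    return cores * smt_ways


# Figures from the text: POWER6 is dual-core with two-way SMT,
# POWER7 has up to eight cores with four-way SMT.
power6_threads = hardware_threads(cores=2, smt_ways=2)
power7_threads = hardware_threads(cores=8, smt_ways=4)

print(f"POWER6: {power6_threads} threads, POWER7: {power7_threads} threads")
```

Each hardware thread is what the operating system schedules onto, so an eight-core POWER7 chip can appear to it as 32 logical processors.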
Specifications
The POWER7 is a multi-core processor, available with 4, 6, or 8 cores. There is also a special TurboCore mode that turns off half of the cores of an eight-core processor; the remaining 4 cores run at increased clock speeds and have access to all the memory controllers and L3 cache. This raises per-core performance, which is important for workloads that require the fastest cores possible. TurboCore mode can reduce "software costs in half for those applications that are licensed per core, while increasing per core performance from that software." The IBM Power 780 scalable, high-end servers feature the TurboCore workload-optimizing mode and deliver up to double the per-core performance of POWER6-based systems.

Each core is capable of four-way simultaneous multithreading (SMT). The POWER7 has approximately 1.2 billion transistors and a die area of 567 mm², fabricated on a 45 nm process. A notable difference from POWER6 is that the POWER7 executes instructions out-of-order instead of in-order. Despite the decrease in maximum frequency compared to POWER6 (4.25 GHz vs 5.0 GHz), each core has higher performance than a POWER6 core, and the chip has up to 4 times the number of cores.
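The TurboCore trade-off can be made concrete with a rough sketch. It uses the 32 MB maximum L3 cache from the specification list and the Power 780 clock speeds from the product table (3.92 GHz in MaxCore mode, 4.14 GHz in TurboCore mode). The `ChipMode` class is hypothetical, and two simplifications are assumed: the L3 is treated as evenly divided among the active cores, and a per-core license is assumed to count active cores only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipMode:
    """One operating mode of an eight-core POWER7 chip (illustrative model)."""
    name: str
    active_cores: int
    clock_ghz: float
    l3_total_mb: int = 32  # whole-chip L3 stays available in both modes

    @property
    def l3_per_core_mb(self) -> float:
        # Assumption: the L3 is shared evenly among the active cores.
        return self.l3_total_mb / self.active_cores


max_core = ChipMode("MaxCore", active_cores=8, clock_ghz=3.92)
turbo_core = ChipMode("TurboCore", active_cores=4, clock_ghz=4.14)

# Assumption: per-core licensing counts active cores only.
license_ratio = turbo_core.active_cores / max_core.active_cores

for mode in (max_core, turbo_core):
    print(f"{mode.name}: {mode.active_cores} cores at {mode.clock_ghz} GHz, "
          f"{mode.l3_per_core_mb:.0f} MB L3 per active core")
print(f"Licensed cores in TurboCore relative to MaxCore: {license_ratio:.0%}")
```

Under these assumptions, TurboCore halves the licensed core count while doubling the L3 cache available to each active core.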
POWER7 has these specifications:
- 45 nm SOI (silicon on insulator) process, 567 mm²
- 1.2 billion transistors
- 3.0 – 4.25 GHz clock speed
- max 4 chips per quad-chip module
- 4, 6 or 8 cores per chip
- 4 SMT threads per core (available in AIX 6.1 TL05 (released in April 2010) and above)
- 12 execution units per core:
  - 2 fixed-point units
  - 2 load/store units
  - 4 double-precision floating-point units
  - 1 vector unit supporting VSX
  - 1 decimal floating-point unit
  - 1 branch unit
  - 1 condition register unit
- 32+32 kB L1 instruction and data cache (per core)
- 256 kB L2 cache (per core)
- 4 MB L3 cache per core, with a maximum of 32 MB supported. The cache is implemented in eDRAM, which does not require as many transistors per cell as a standard SRAM, so it allows for a larger cache while using the same area as SRAM.
"Each POWER7 processor core implements aggressive out-of-order (OoO) instruction
execution to drive high efficiency in the use of available execution paths. The POWER7
processor has an Instruction Sequence Unit that is capable of dispatching up to six
instructions per cycle to a set of queues. Up to eight instructions per cycle can be issued to
the Instruction Execution units. The POWER7 processor has a set of twelve execution units
as [described above]"
This gives the following theoretical performance figures (based on a 4.14 GHz 8 core implementation):
- max 33.12 GFLOPS per core
- max 264.96 GFLOPS per chip
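These figures can be reproduced with a short calculation. The sketch below assumes that each of the four double-precision floating-point units completes one fused multiply-add (counted as 2 floating-point operations) per cycle; that assumption is not stated in the text above, but it is consistent with the quoted numbers. The constant and function names are hypothetical.

```python
CLOCK_GHZ = 4.14              # clock of the implementation quoted above
FP_UNITS_PER_CORE = 4         # double-precision floating-point units per core
FLOPS_PER_UNIT_PER_CYCLE = 2  # assumption: one fused multiply-add per cycle
CORES_PER_CHIP = 8


def peak_gflops_per_core(clock_ghz: float) -> float:
    """Theoretical peak: clock (GHz) x FP units x flops per unit per cycle."""
    return clock_ghz * FP_UNITS_PER_CORE * FLOPS_PER_UNIT_PER_CYCLE


def peak_gflops_per_chip(clock_ghz: float, cores: int = CORES_PER_CHIP) -> float:
    """Theoretical peak for the whole chip: per-core peak x number of cores."""
    return peak_gflops_per_core(clock_ghz) * cores


per_core = peak_gflops_per_core(CLOCK_GHZ)
per_chip = peak_gflops_per_chip(CLOCK_GHZ)
print(f"{per_core:.2f} GFLOPS per core, {per_chip:.2f} GFLOPS per chip")
```

These are theoretical peaks; sustained performance on real workloads depends on memory bandwidth, instruction mix and how well the execution units are kept busy.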
Products
As of October 2011, the range of POWER7 systems includes "Express" models (710, 720, 730, 740 and 750), Enterprise models (770, 780 and 795) and High Performance Computing models (755 and 775). Enterprise models differ in having Capacity on Demand capabilities. Maximum specifications are shown in the table below.

Name | Number of chips | Number of cores | CPU clock frequency |
---|---|---|---|
710 Express | 1 | 6 | 3.7 GHz |
710 Express | 1 | 8 | 3.55 GHz |
720 Express | 1 | 8 | 3.0 GHz |
730 Express | 2 | 12 | 3.7 GHz |
730 Express | 2 | 16 | 3.55 GHz |
740 Express | 2 | 12 | 3.7 GHz |
740 Express | 2 | 16 | 3.55 GHz |
750 Express | 4 | 24 | 3.72 GHz |
750 Express | 4 | 32 | 3.22 GHz or 3.61 GHz |
755 | 4 | 32 | 3.61 GHz |
770 | 8 | 48 | 3.7 GHz |
770 | 8 | 64 | 3.3 GHz |
775 (Per Node) | 32 | 256 | 3.83 GHz |
780 (MaxCore mode) | 8 | 64 | 3.92 GHz |
780 (TurboCore mode) | 8 | 32 | 4.14 GHz |
780 (4 Socket Node) | 16 | 96 | 3.44 GHz |
795 | 32 | 192 | 3.72 GHz |
795 (MaxCore mode) | 32 | 256 | 4.0 GHz |
795 (TurboCore mode) | 32 | 128 | 4.25 GHz |
IBM also offers five POWER7-based BladeCenter blades (http://www-03.ibm.com/systems/power/hardware/blades/index.html). Specifications are shown in the table below.
Name | Number of cores | CPU clock frequency | Blade slots required |
---|---|---|---|
BladeCenter PS700 | 4 | 3.0 GHz | 1 |
BladeCenter PS701 | 8 | 3.0 GHz | 1 |
BladeCenter PS702 | 16 | 3.0 GHz | 2 |
BladeCenter PS703 | 16 | 2.4 GHz | 1 |
BladeCenter PS704 | 32 | 2.4 GHz | 2 |
The following are supercomputer projects that use the POWER7 processor:
- PERCS
- Blue Waters
- Watson (artificial intelligence software)