Emotion Engine
The Emotion Engine is a CPU developed and manufactured by Sony Computer Entertainment and Toshiba for use in the Sony PlayStation 2 video game console, as well as early PlayStation 3 models sold in Japan and North America (model numbers CECHAxx and CECHBxx). Mass production of the Emotion Engine began in 1999.
Description
The Emotion Engine consists of eight separate "units", each performing a specific task, integrated onto the same die. These units are: a CPU core, two Vector Processing Units (VPU), a graphics interface (GIF), a 10-channel DMA unit, a memory controller, an Image Processing Unit (IPU) and an input output interface.
The CPU core is tightly coupled to the first VPU, VPU0. Together, they are responsible for executing game code and high-level modeling computations. The second VPU, VPU1, is dedicated to geometry transformations and lighting and operates independently, in parallel with the CPU core, controlled by microcode. VPU0, when not otherwise utilized, can also be used for geometry transformations. The display lists generated are sent to the GIF, which prioritizes them before dispatching them to the Graphics Synthesizer for rendering.
CPU core
The CPU core is a two-way superscalar in-order RISC processor. It implements the MIPS-III instruction set architecture (ISA) and much of MIPS-IV, in addition to a custom 128-bit single instruction, multiple data (SIMD) instruction set developed by Sony. The custom instruction set consists of 107 instructions for operating on four 32-bit, eight 16-bit or sixteen 8-bit integers simultaneously. Instructions defined include add, subtract, multiply, divide, min/max, shift, logical, leading-zero count, 128-bit load/store and 256-bit to 128-bit funnel shift, in addition to some not described by Sony for competitive reasons.
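The lane partitioning described above can be illustrated with a minimal Python sketch. It models a 128-bit register as a plain integer and performs a lane-wise add at a chosen lane width. The function names (split_lanes, join_lanes, packed_add) are hypothetical, and wraparound on overflow is an assumption made for illustration; the sketch is not a description of any specific Emotion Engine instruction.

```python
REGISTER_BITS = 128  # width of one integer register, as stated in the text


def split_lanes(value, lane_bits):
    """Split a 128-bit integer into lanes of lane_bits each, lowest lane first."""
    lane_mask = (1 << lane_bits) - 1
    return [(value >> shift) & lane_mask
            for shift in range(0, REGISTER_BITS, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack a list of lanes (lowest first) back into one 128-bit integer."""
    lane_mask = (1 << lane_bits) - 1
    value = 0
    for index, lane in enumerate(lanes):
        value |= (lane & lane_mask) << (index * lane_bits)
    return value


def packed_add(a, b, lane_bits):
    """Lane-wise add; carries never cross a lane boundary.

    Overflow wraps within each lane (an assumption for this sketch).
    """
    sums = [x + y for x, y in zip(split_lanes(a, lane_bits),
                                  split_lanes(b, lane_bits))]
    return join_lanes(sums, lane_bits)


# Four 32-bit lanes: the lowest lane overflows and wraps to zero
# without disturbing its neighbour.
a = join_lanes([0xFFFFFFFF, 1, 2, 3], 32)
b = join_lanes([1, 1, 1, 1], 32)
result_lanes = split_lanes(packed_add(a, b, 32), 32)
```

The same 128-bit value splits into four, eight or sixteen lanes depending on whether lane_bits is 32, 16 or 8, which mirrors the three operand widths named in the text.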
The MIPS-based core consists of two arithmetic logic units (ALUs) and a floating-point unit (FPU). The integer units are 64-bit, but the FPU is single-precision, or 32-bit. The custom instruction set is implemented by grouping the two 64-bit integer units. The integer and floating-point pipelines are both six stages long. To support the custom instruction set, the integer registers are 128 bits wide.
To feed the execution units with instructions and data, there is a 16 KB two-way set-associative instruction cache, an 8 KB two-way set-associative non-blocking data cache and a 16 KB scratchpad RAM. Both the instruction and data caches are virtually indexed and physically tagged, while the scratchpad RAM exists in a separate memory space. A combined 48 double-entry instruction and data translation lookaside buffer is provided for translating virtual addresses. Branch prediction is achieved by a 64-entry branch target address cache and a branch history table that is integrated into the instruction cache. The branch mispredict penalty is three cycles due to the short six-stage pipeline.
Vector processing units
The majority of the Emotion Engine's floating-point performance is provided by two vector processing units (VPUs), designated VPU0 and VPU1. Each VPU features 32 128-bit registers, 16 16-bit fixed-point registers, four FMAC (floating-point multiply-accumulate) units, an FDIV (floating-point divide) unit and a local data memory. The data memory for VPU0 is 4 KB in size, while VPU1 features a 16 KB data memory.
To achieve high bandwidth, the VPU's data memory is connected directly to the GIF, and both of the data memories can be read directly by the DMA unit. A single vector instruction consists of four 32-bit IEEE-compliant single-precision floating-point values, which are distributed to the four single-precision (32-bit) FMAC units for processing. Contrary to popular belief, the Emotion Engine is not a 128-bit processor, as it does not process a single 128-bit value but a group of four 32-bit values that are stored in one 128-bit register. This scheme is similar to the SSEx extensions by Intel.
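The point that a 128-bit register holds four independent 32-bit values can be sketched in Python as follows. The helper names (pack_vec4, unpack_vec4, vec4_madd) are hypothetical, the little-endian layout is an arbitrary choice, and rounding the product and the sum separately is an assumption of this sketch rather than a statement about how the FMAC units round.

```python
import struct


def pack_vec4(x, y, z, w):
    """Store four floats as single-precision values in one 16-byte (128-bit) image."""
    return struct.pack("<4f", x, y, z, w)


def unpack_vec4(register):
    """Read the four single-precision lanes back out of a 16-byte image."""
    return struct.unpack("<4f", register)


def to_f32(value):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def vec4_madd(acc, a, b):
    """Per-lane acc + a * b: each lane stands in for the work of one FMAC unit."""
    lanes = [to_f32(c + to_f32(x * y))
             for c, x, y in zip(unpack_vec4(acc), unpack_vec4(a), unpack_vec4(b))]
    return pack_vec4(*lanes)


acc = pack_vec4(1.0, 1.0, 1.0, 1.0)
a = pack_vec4(1.0, 2.0, 3.0, 4.0)
b = pack_vec4(2.0, 2.0, 2.0, 2.0)
result = unpack_vec4(vec4_madd(acc, a, b))
```

No operation in the sketch ever treats the 16 bytes as one 128-bit number; the register is only a container for four lanes, which is the distinction the paragraph draws.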
The FMAC units take four cycles to execute one instruction, but as the units have a six-stage pipeline, they have a throughput of one instruction per cycle. The FDIV unit has a nine-stage pipeline and can execute one instruction every seven cycles.
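The difference between latency and throughput stated here can be made concrete with a small Python sketch. The function pipelined_cycles is hypothetical and assumes a stream of independent operations with no stalls or data hazards. The FMAC figures (four-cycle latency, one issue per cycle) and the FDIV issue interval of seven cycles come from the text; the FDIV latency used below is an assumption chosen only to complete the example.

```python
def pipelined_cycles(n_ops, latency, issue_interval):
    """Cycles until the last of n_ops independent operations completes.

    The first result appears after `latency` cycles; each further
    operation can start `issue_interval` cycles after the previous one.
    """
    if n_ops <= 0:
        return 0
    return latency + (n_ops - 1) * issue_interval


# FMAC: four-cycle latency, one new instruction accepted every cycle.
fmac_single = pipelined_cycles(1, latency=4, issue_interval=1)
fmac_hundred = pipelined_cycles(100, latency=4, issue_interval=1)

# FDIV: one instruction every seven cycles; the latency of 7 is assumed.
fdiv_ten = pipelined_cycles(10, latency=7, issue_interval=7)
```

Under these assumptions a hundred independent FMAC operations finish in 103 cycles rather than 400, which is why the four-cycle latency does not prevent a throughput of one instruction per cycle.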
Internal data bus
Communication between the MIPS core, the two VPUs, the GIF, the memory controller and the other units is handled by a 128-bit wide internal data bus running at half the clock frequency of the Emotion Engine. With the Emotion Engine clocked at a nominal 300 MHz, the internal data bus runs at 150 MHz and provides a maximum theoretical bandwidth of 2.4 GB/s. DMA transfers over this bus occur in packets of eight 128-bit words, achieving a peak usable bandwidth of 2 GB/s.
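As a worked check of these figures, the peak bandwidth follows from the bus width and the bus clock, taking the bus clock as 150 MHz (half of a nominal 300 MHz core clock). The symbol \eta is introduced here only for illustration: it is the ratio of the two bandwidth figures quoted in the text, not a documented overhead specification.

```latex
\[
B_{\text{peak}}
  = \frac{128\ \text{bit}}{8\ \text{bit/byte}} \times 150 \times 10^{6}\ \text{s}^{-1}
  = 2.4 \times 10^{9}\ \text{byte/s}
  = 2.4\ \text{GB/s}
\]
\[
\eta = \frac{B_{\text{usable}}}{B_{\text{peak}}}
     = \frac{2.0\ \text{GB/s}}{2.4\ \text{GB/s}}
     \approx 0.83
\]
```

In other words, roughly one sixth of the theoretical bandwidth is unavailable to packetised DMA transfers according to the quoted numbers.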
External interface
Communication between the Emotion Engine and RAM occurs through two channels of DRDRAM (Direct Rambus Dynamic Random Access Memory) and the memory controller, which interfaces to the internal data bus. Each channel is 16 bits wide and operates at 400 MHz. Combined, the two channels of DRDRAM have a maximum theoretical bandwidth of 3.2 GB/s, about 33% more bandwidth than the internal data bus. Because of this, the memory controller buffers data sent from the DRDRAM channels so the extra bandwidth can be utilised by the CPU.
The Emotion Engine interfaces directly to the Graphics Synthesizer via the GIF with a dedicated 64-bit, 150 MHz bus that has a maximum theoretical bandwidth of 1.2 GB/s.
To provide communication between the Emotion Engine and the Input Output Processor (IOP), the input output interface connects a 32-bit wide, 37.5 MHz input output bus, with a maximum theoretical bandwidth of 150 MB/s, to the internal data bus. This interface provides vastly more bandwidth than is required by the PlayStation 2's input output devices.
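The bandwidth figures in this section can be reproduced with a short Python sketch. The function peak_bandwidth and its parameter names are hypothetical. Two inputs are assumptions rather than statements from the text: the DRDRAM channels are taken to transfer data on both clock edges (transfers_per_clock=2), which is what reconciles 16 bits at 400 MHz over two channels with 3.2 GB/s, and the internal bus is taken to run at 150 MHz.

```python
def peak_bandwidth(width_bits, clock_hz, transfers_per_clock=1, channels=1):
    """Peak theoretical bandwidth in bytes per second."""
    return width_bits // 8 * clock_hz * transfers_per_clock * channels


# Internal data bus: 128 bits wide, assumed 150 MHz.
INTERNAL_BUS = peak_bandwidth(128, 150_000_000)

# DRDRAM: two 16-bit channels at 400 MHz, assumed two transfers per clock.
DRDRAM = peak_bandwidth(16, 400_000_000, transfers_per_clock=2, channels=2)

# GIF to Graphics Synthesizer: 64 bits wide at 150 MHz.
GIF_BUS = peak_bandwidth(64, 150_000_000)

# Input output bus to the IOP: 32 bits wide at 37.5 MHz.
IOP_BUS = peak_bandwidth(32, 37_500_000)

# Ratio of DRDRAM bandwidth to internal bus bandwidth.
DRDRAM_HEADROOM = DRDRAM / INTERNAL_BUS
```

With these inputs the DRDRAM channels come out at about 1.33 times the internal bus bandwidth, matching the "about 33% more" figure in the text.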
Fabrication
The Emotion Engine contained 10.5 million transistors on a die measuring 240 mm². It was fabricated by Sony and Toshiba in a 0.25 µm (0.18 µm effective LG) complementary metal–oxide–semiconductor (CMOS) process with four levels of interconnect.
Packaging
The Emotion Engine was packaged in a 540-contact plastic ball grid array (PBGA).
Uses
The primary use of the Emotion Engine was to serve as the PlayStation 2's CPU. The first SKUs of the PlayStation 3 also featured an Emotion Engine on the motherboard to achieve backwards compatibility with PlayStation 2 games. However, the second revision of the PlayStation 3 lacked a physical Emotion Engine in order to lower costs, performing all of its functions through software emulation on the Cell Broadband Engine, coupled with a hardware Graphics Synthesizer that was still present to achieve PlayStation 2 backwards compatibility. In all subsequent revisions, the Graphics Synthesizer was removed, with no software emulation to replace it.
Specifications
- Clock frequency: 294 MHz, 299 MHz (later versions)
- Instruction set: MIPS III, MIPS IV subset, 107 vector instructions
- MIPS-based core: 2-issue, 2 64-bit fixed-point units, 1 floating-point unit, 6-stage pipeline
- Instruction cache: 16 KB, 2-way set associative
- Data cache: 8 KB, 2-way set associative
- Scratchpad RAM: 16 KB
- Translation lookaside buffer: 48-entry combined instruction/data
- Vector processing unit: 4 FMAC units, 1 FDIV unit
- Vector processing unit registers: 128-bit wide, 32 entries
- Image processing unit: MPEG2 macroblock layer decoder
- Direct memory access: 10 channels
- VDD voltage: 1.8 V
- Power consumption: 15 W at 1.8 V
Theoretical performance
- Floating point: 6.2 billion single-precision (32-bit) floating-point operations per second
- Perspective transformation: 66 million polygons per second
- With lighting and fog: 36 million polygons per second
- Bézier surface patches: 16 million polygons per second
- Image decompression: 150 million pixels per second
See also
- PlayStation 2
- PlayStation 3
- Graphics card
- Graphics processing unit
- Computer graphics
- List of computer graphics and descriptive geometry topics
- Cell microprocessor