XDR DRAM
XDR DRAM (extreme data rate dynamic random-access memory) is a high-performance RAM interface and the successor to the Rambus RDRAM on which it is based. It competes with the rival DDR2 SDRAM and GDDR4 technologies. XDR was designed to be effective in small, high-bandwidth consumer systems, high-performance memory applications, and high-end GPUs. It eliminates the unusually high latency problems that plagued early forms of RDRAM. XDR DRAM also places heavy emphasis on per-pin bandwidth, which can help control PCB production costs because fewer lanes are needed for the same amount of bandwidth. Rambus owns the rights to the technology. XDR is used by Sony in the PlayStation 3 console.
Performance
- Initial clock rate at 400 MHz.
- Octal Data Rate (ODR): Eight bits per clock cycle per lane.
- Each chip provides 8, 16, or 32 programmable lanes, providing up to 230.4 Gbit/s (28.8 GB/s) at 900 MHz (7.2 GHz effective).
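The peak figure in the list above follows from multiplying the lane count, the eight bits transferred per clock cycle per lane, and the clock rate. A minimal sketch of that arithmetic (the function name `peak_bandwidth_gbit_s` is illustrative, not part of any Rambus tool):

```python
def peak_bandwidth_gbit_s(lanes: int, clock_mhz: float, bits_per_clock: int = 8) -> float:
    """Peak bandwidth in Gbit/s: lanes x bits per clock per lane x clock rate."""
    return lanes * bits_per_clock * clock_mhz / 1000.0


gbit = peak_bandwidth_gbit_s(lanes=32, clock_mhz=900)
print(f"{gbit:.1f} Gbit/s = {gbit / 8:.1f} GB/s")
```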
Features
- Bi-directional differential Rambus Signalling Levels (DRSL)
  - This uses a differential open-collector driver with a voltage swing of 0.2 V. It is not the same as LVDS. http://www.rambus.com/products/xdr/innovations/drsl.aspx
- Programmable on-chip termination
- Adaptive impedance matching
- Eight bank memory architecture
- Up to four bank-interleaved transactions at full bandwidth
- Point-to-point data interconnect
- Chip scale packaging
- Dynamic request scheduling
- Early-read-after-write support for maximum efficiency
- Zero overhead refresh
Power requirements
- 1.8 V Vdd
- Programmable ultra-low-voltage DRSL 200 mV swing
- Low-power PLL/DLL design
- Power-down self-refresh support
- Dynamic data width support with dynamic clock gating
- Per-pin I/O power-down
- Sub-page activation support
Ease of system design
- Per-bit FlexPhase circuits compensate to a 2.5 ps resolution
- XDR Interconnect uses minimum pin count
Protocol
An XDR RAM chip's high-speed signals are a differential clock input (clock from master, CFM/CFMN), a 12-bit single-ended request/command bus (RQ11..0), and a bidirectional differential data bus up to 16 bits wide (DQ15..0/DQN15..0). The request bus may be connected to several memory chips in parallel, but the data bus is point to point; only one RAM chip may be connected to it. To support different amounts of memory with a fixed-width memory controller, the chips have a programmable interface width. A 32-bit-wide DRAM controller may support 2 16-bit chips, or be connected to 4 memory chips each of which supplies 8 bits of data, or up to 16 chips configured with 2-bit interfaces.
In addition, each chip has a low-speed serial bus used to determine its capabilities and configure its interface. This consists of three shared inputs, a reset line (RST), a serial command input (CMD), and a serial clock (SCK), together with serial data in/out lines (SDI and SDO) that are daisy-chained together and eventually connect to a single pin on the memory controller.
All single-ended lines are active-low; an asserted signal or logical 1 is represented by a low voltage.
The request bus operates at double data rate relative to the clock input. Two consecutive 12-bit transfers (beginning with the falling edge of CFM) make a 24-bit command packet.
The data bus operates at 8x the speed of the clock; a 400 MHz clock generates 3200 MT/s. All data reads and writes operate in 16-transfer bursts lasting 2 clock cycles.
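The figures in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the constants are taken from the text (eight transfers per clock, 16-transfer bursts lasting 2 clock cycles).

```python
TRANSFERS_PER_CLOCK = 8   # data bus runs at 8x the clock
BURST_TRANSFERS = 16      # transfers per read/write burst
BURST_CLOCKS = 2          # clock cycles per burst


def transfer_rate_mt_s(clock_mhz: float) -> float:
    """Data-bus transfer rate in MT/s for a given clock in MHz."""
    return clock_mhz * TRANSFERS_PER_CLOCK


def burst_time_ns(clock_mhz: float) -> float:
    """Duration of one burst in nanoseconds."""
    return BURST_CLOCKS * 1000.0 / clock_mhz


def chips_for_controller(controller_bits: int, chip_bits: int) -> int:
    """Number of chips needed to fill a controller's point-to-point data bus."""
    if chip_bits <= 0 or controller_bits % chip_bits != 0:
        raise ValueError("chip width must evenly divide the controller width")
    return controller_bits // chip_bits


def burst_bytes(width_bits: int) -> int:
    """Bytes moved by one burst over a data bus of the given width."""
    return width_bits * BURST_TRANSFERS // 8


print(transfer_rate_mt_s(400), "MT/s,", burst_time_ns(400), "ns per burst")
for chip_bits in (16, 8, 2):
    print(f"x{chip_bits} chips on a 32-bit controller:", chips_for_controller(32, chip_bits))
print("bytes per burst on a x16 device:", burst_bytes(16))
```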
Request packet formats are as follows:
Clock edge | Bit | NOP | Column read/write | Calibrate/power-down | Precharge/refresh | Row Activate | Masked write
---|---|---|---|---|---|---|---
↓ | RQ11 | 0 | 0 (COL opcode) | 0 (COLX opcode) | 0 (ROWP opcode) | 0 (ROWA opcode) | 1 (COLM opcode)
↓ | RQ10 | 0 | 0 | 0 | 0 | 1 | M3 (Write mask low bits)
↓ | RQ9 | 0 | 0 | 1 | 1 | R9 (Row address high bits) | M2
↓ | RQ8 | 0 | 1 | 0 | 1 | R10 | M1
↓ | RQ7 | x | WRX (Write/Read bit) | x (reserved) | POP1 (Precharge delay, 0..3) | R11 | M0
↓ | RQ6 | x | C8 (Column address high bits) | x | POP0 | R12 (reserved) | C8 (Column address high bits)
↓ | RQ5 | x | C9 | x | x (reserved) | R13 | C9
↓ | RQ4 | x | C10 (reserved) | x | x | R14 | C10 (reserved)
↓ | RQ3 | x | C11 | XOP3 (Subopcode) | x | R15 | C11
↓ | RQ2 | x | BC2 (Bank address) | XOP2 | BP2 (Precharge bank) | BA2 (Bank address) | BC2 (Bank address)
↓ | RQ1 | x | BC1 | XOP1 | BP1 | BA1 | BC1
↓ | RQ0 | x | BC0 | XOP0 | BP0 | BA0 | BC0
↑ | RQ11 | x | DELC (Command delay, 0..1) | x (reserved) | POP2 (Precharge enable) | DELA (Command delay, 0..1) | M7 (Write mask high bits)
↑ | RQ10 | x | x (reserved) | x | ROP2 (Refresh command) | R8 (Row address low bits) | M6
↑ | RQ9 | x | x | x | ROP1 | R7 | M5
↑ | RQ8 | x | x | x | ROP0 | R6 | M4
↑ | RQ7 | x | C7 (Column address low bits) | x | DELR1 (Refresh delay, 0..3) | R5 | C7 (Column address low bits)
↑ | RQ6 | x | C6 | x | DELR0 | R4 | C6
↑ | RQ5 | x | C5 | x | x (reserved) | R3 | C5
↑ | RQ4 | x | C4 | x | x | R2 | C4
↑ | RQ3 | x | SC3 (Sub-column address) | x | x | R1 | SC3 (Sub-column address)
↑ | RQ2 | x | SC2 | x | BR2 (Refresh bank) | R0 | SC2
↑ | RQ1 | x | SC1 | x | BR1 | SR1 (Sub-row address) | SC1
↑ | RQ0 | x | SC0 | x | BR0 | SR0 | SC0

A description in parentheses applies to that bit and to the bits of the same field in the rows below it.
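As an illustration of how the table above is read, the following sketch packs a row activate (ROWA) request into its two 12-bit transfers. The function name `encode_rowa` and its parameters are hypothetical. Values are logical levels (the request bus itself is active-low), and sending the reserved bits R12..R15 as 0 is an assumption.

```python
from typing import Tuple


def encode_rowa(row: int, bank: int, sub_row: int = 0, delay: int = 0) -> Tuple[int, int]:
    """Return (falling_edge_word, rising_edge_word) for a ROWA packet.

    Bit n of each word is the logical value placed on RQn.
    """
    if not 0 <= row < (1 << 12):
        raise ValueError("row must fit in R11..R0")
    if not 0 <= bank < 8:
        raise ValueError("bank must fit in BA2..BA0")
    if not 0 <= sub_row < 4:
        raise ValueError("sub_row must fit in SR1..SR0")
    if delay not in (0, 1):
        raise ValueError("DELA is a single bit")

    def r(n: int) -> int:
        return (row >> n) & 1

    # Falling edge: RQ11..10 = 01 (ROWA opcode), RQ9 = R9, RQ8 = R10, RQ7 = R11,
    # RQ6..3 = R12..R15 (reserved, sent as 0 here), RQ2..0 = BA2..0.
    falling = (1 << 10) | (r(9) << 9) | (r(10) << 8) | (r(11) << 7) | bank

    # Rising edge: RQ11 = DELA, RQ10..2 = R8..R0, RQ1..0 = SR1..SR0.
    rising = (delay << 11) | ((row & 0x1FF) << 2) | sub_row
    return falling, rising


falling, rising = encode_rowa(row=0x200, bank=5)
print(f"falling={falling:012b} rising={rising:012b}")
```

Note that the three high row bits appear on the bus in the opposite order from the low nine, which is why they are placed individually rather than with a single shift.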
There are a large number of timing constraints giving minimum times that must elapse between various commands (see Dynamic random access memory: Memory timing); the DRAM controller sending them must ensure they are all met.
Some commands contain delay fields. These delay the effect of the command by the given number of clock cycles. This permits multiple commands (to different banks) to take effect on the same clock cycle.
Row activate command
This operates equivalently to standard SDRAM's activate command, specifying a row address to be loaded into the bank's sense amplifier array. To save power, a chip may be configured to only activate a portion of the sense amplifier array. In this case, the SR1..0 bits specify the half or quarter of the row to activate, and following read/write commands' column addresses are required to be limited to that portion. (Refresh operations always use the full row.)
Read/write commands
These operate analogously to a standard SDRAM's read or write commands, specifying a column address. Data is provided to the chip a few cycles after a write command (typically 3), and is output by the chip several cycles after a read command (typically 6). Just as with other forms of SDRAM, the DRAM controller is responsible for ensuring that the data bus is not scheduled for use in both directions at the same time. Data is always transferred in 16-transfer bursts, lasting 2 clock cycles. Thus, for a ×16 device, 256 bits (32 bytes) are transferred per burst.
If the chip is using a data bus less than 16 bits wide, one or more of the sub-column address bits are used to select the portion of the column to be presented on the data bus. If the data bus is 8 bits wide, SC3 is used to identify which half of the read data to access; if the data bus is 4 bits wide, SC3 and SC2 are used, etc.
Unlike conventional SDRAM, there is no provision for choosing the order in which the data is supplied within a burst. Thus, it is not possible to perform critical-word-first reads.
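The sub-column selection described above for narrower data buses can be sketched as follows. The helper name `sub_column_bits` is hypothetical, and the 2-bit case is an extrapolation of the pattern the text gives for 8-bit and 4-bit buses.

```python
from typing import List


def sub_column_bits(width_bits: int) -> List[str]:
    """Sub-column address bits used to pick a portion of the column.

    Each halving of the data bus below 16 bits consumes one more bit,
    starting from SC3.
    """
    if width_bits not in (16, 8, 4, 2):
        raise ValueError("widths covered by this sketch: 16, 8, 4, 2")
    halvings = (16 // width_bits).bit_length() - 1  # log2(16 / width)
    return [f"SC{3 - i}" for i in range(halvings)]


for width in (16, 8, 4, 2):
    print(f"x{width}: {sub_column_bits(width) or 'none'}")
```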
Masked write command
The masked write command is similar to a normal write, but no command delay is permitted and a mask byte is supplied, which permits controlling which 8-bit fields are written. This is not a bitmap indicating which bytes are to be written; it would not be large enough for the 32 bytes in a write burst. Rather, it is a bit pattern which the DRAM controller fills unwritten bytes with. The DRAM controller is responsible for finding a pattern which does not appear in the other bytes that are to be written. Because there are 256 possible patterns and only 32 bytes in the burst, it is straightforward to find one. Even when multiple devices are connected in parallel, a mask byte can always be found when the bus is at most 128 bits wide. (This would produce 256 bytes per burst, but a masked write command is only used if at least one of them is not to be written.)
Each byte is the 8 consecutive bits transferred across one data line during a particular clock cycle. M0 is matched to the first data bit transferred during a clock cycle, and M7 is matched to the last bit.
This convention also interferes with performing critical-word-first reads; any word must include bits from at least the first 8 bits transferred.
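The controller's search for a mask byte can be sketched as below. This is an illustration of the idea rather than any controller's actual logic: the function names are hypothetical, and the burst is modelled as a flat sequence of the 8-bit fields described above.

```python
from typing import Sequence, Tuple


def find_mask_byte(data: bytes, write_enable: Sequence[bool]) -> int:
    """Return a byte value that does not occur among the bytes to be written."""
    if len(data) != len(write_enable):
        raise ValueError("one enable flag is needed per byte")
    used = {b for b, enabled in zip(data, write_enable) if enabled}
    for candidate in range(256):
        if candidate not in used:
            return candidate
    raise ValueError("every 8-bit pattern occurs in the written bytes")


def masked_write_payload(data: bytes, write_enable: Sequence[bool]) -> Tuple[int, bytes]:
    """Return (mask, payload) with unwritten bytes replaced by the mask pattern."""
    mask = find_mask_byte(data, write_enable)
    payload = bytes(b if enabled else mask for b, enabled in zip(data, write_enable))
    return mask, payload


mask, payload = masked_write_payload(bytes([0, 1, 0, 2]), [True, True, False, True])
print(mask, list(payload))
```

With at most 255 written bytes there are at most 255 distinct values in use, so the loop always finds a free pattern; this is the counting argument behind the 128-bit limit mentioned above.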
Precharge/refresh command
This command is similar to a combination of a conventional SDRAM's precharge and refresh commands. The POPx and BPx bits specify a precharge operation, while the ROPx, DELRx, and BRx bits specify a refresh operation. Each may be separately enabled. If enabled, each may have a different command delay and must be addressed to a different bank.
Precharge commands may only be sent to one bank at a time; unlike a conventional SDRAM, there is no "precharge all banks" command.
Refresh commands are also different from a conventional SDRAM. There is no "refresh all banks" command, and the refresh operation is divided into separate activate and precharge operations so the timing is determined by the memory controller. The refresh counter is also programmable by the controller. Operations are:
- 000: NOPR Perform no refresh operation
- 001: REFP Refresh precharge; end the refresh operation on the selected bank.
- 010: REFA Refresh activate; activate the row selected by the REFH/M/L register and selected bank for refresh.
- 011: REFI Refresh & increment; as for REFA, but also increment the REFH/M/L register.
- 100: LRR0 Load refresh register low; copy RQ7–0 to the low 8 bits of the refresh counter REFL. No command delay.
- 101: LRR1 Load refresh register middle; copy RQ7–0 to the middle 8 bits of the refresh counter REFM. No command delay.
- 110: LRR2 Load refresh register high; copy RQ7–0 to the high 8 bits of the refresh counter REFH (if implemented). No command delay.
- 111: reserved
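The refresh operation codes listed above map naturally onto an enumeration. The sketch below also shows how a controller might split a refresh counter value into the three register loads; the class and function names are hypothetical, and a 24-bit counter is assumed (the text notes that REFH may not be implemented).

```python
from enum import IntEnum
from typing import List, Tuple


class RefreshOp(IntEnum):
    NOPR = 0b000  # no refresh operation
    REFP = 0b001  # refresh precharge
    REFA = 0b010  # refresh activate
    REFI = 0b011  # refresh activate and increment the counter
    LRR0 = 0b100  # load low 8 bits (REFL)
    LRR1 = 0b101  # load middle 8 bits (REFM)
    LRR2 = 0b110  # load high 8 bits (REFH, if implemented)


def load_refresh_counter(value: int) -> List[Tuple[RefreshOp, int]]:
    """Split a counter value into (operation, RQ7..0 payload) pairs."""
    if not 0 <= value < (1 << 24):
        raise ValueError("counter value must fit in 24 bits")
    return [
        (RefreshOp.LRR0, value & 0xFF),
        (RefreshOp.LRR1, (value >> 8) & 0xFF),
        (RefreshOp.LRR2, (value >> 16) & 0xFF),
    ]


for op, payload in load_refresh_counter(0x012345):
    print(f"{op.name}: {int(op):03b} payload=0x{payload:02X}")
```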
Calibrate/powerdown command
This command performs a number of miscellaneous functions, as determined by the XOPx field. Although there are 16 possibilities, only 4 are actually used. Three subcommands start and stop output driver calibration (which must be performed periodically, every 100 ms).
The fourth subcommand places the chip in power-down mode. In this mode, it performs internal refresh and ignores the high-speed data lines. It must be woken up using the low-speed serial bus.
Low-speed serial bus
XDR DRAMs are probed and configured using a low-speed serial bus. The RST, SCK, and CMD signals are driven by the controller to every chip in parallel. The SDI and SDO lines are daisy-chained together, with the last SDO output connected to the controller, and the first SDI input tied high (logic 0).
On reset, each chip drives its SDO pin low (1). When reset is released, a series of SCK pulses are sent to the chips. Each chip drives its SDO output high (0) one cycle after seeing its SDI input high (0). Further, it counts the number of cycles that elapse between releasing reset and seeing its SDI input high, and copies that count to an internal chip ID register. Commands sent by the controller over the CMD line include an address which must match the chip ID field.
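The ID assignment described above can be simulated in a few lines. This is a simplified model, not a cycle-accurate one: it assumes the first chip sees its SDI input high on cycle 0, so that IDs start at 0, and the function name `enumerate_chip_ids` is hypothetical.

```python
from typing import List, Optional


def enumerate_chip_ids(num_chips: int) -> List[int]:
    """Simulate the SDI/SDO daisy chain and return the ID each chip latches."""
    if not 0 <= num_chips <= 64:
        raise ValueError("a 6-bit device ID allows at most 64 chips")
    sdo_high = [False] * num_chips            # on reset, every chip drives SDO low
    ids: List[Optional[int]] = [None] * num_chips
    cycle = 0
    while None in ids:
        # The first SDI is tied high; every other SDI follows the previous chip's SDO.
        sdi_high = [True] + sdo_high[:-1]
        for i, high in enumerate(sdi_high):
            if high and ids[i] is None:
                ids[i] = cycle                # cycles elapsed since reset was released
        # A chip drives SDO high one cycle after seeing SDI high.
        sdo_high = sdi_high
        cycle += 1
    return [chip_id for chip_id in ids if chip_id is not None]


print(enumerate_chip_ids(4))
```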
General structure of commands
Each command either reads or writes a single 8-bit register, using an 8-bit address. This allows up to 256 registers, but only the range 1–31 is currently assigned.
Normally, the CMD line is left high (logic 0) and SCK pulses have no effect. To send a command, a sequence of 32 bits is clocked out over the CMD line:
- 4 bits of 1100, a command start signal.
- A read/write bit. If 0, this is a read, if 1 this is a write.
- A single/broadcast bit. If 0, only the device with the matching ID is selected. If 1, all devices execute the command.
- 6 bits of serial device ID. Device IDs are automatically assigned, starting with 0, on device reset.
- 8 bits of register address
- A single bit of "0". This provides time to process read requests, and to enable the SDO output in case of a read.
- 8 bits of data. If this is a read command, the bits provided must be 0, and the register's value is produced on the SDO pin of the selected chip. All non-selected chips connect their SDI inputs to their SDO outputs, so the controller will see the value.
- A single bit of "0". This ends the command and provides time to disable the SDO output.
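The field list above can be turned into a bit string as follows. This is a sketch under stated assumptions: the function name is hypothetical, each multi-bit field is sent most significant bit first (the text does not specify the order), and the values are logical levels on the active-low CMD line. The fields as listed add up to 30 bits, so the sketch emits only those and does not guess at the remaining two of the 32 bits mentioned.

```python
def serial_command_bits(write: bool, broadcast: bool, device_id: int,
                        register: int, data: int = 0) -> str:
    """Return the listed command fields as a string of logical '0'/'1' bits."""
    if not 0 <= device_id < 64:
        raise ValueError("device ID is 6 bits")
    if not 0 <= register < 256:
        raise ValueError("register address is 8 bits")
    if not 0 <= data < 256:
        raise ValueError("data is 8 bits")
    if not write and data != 0:
        raise ValueError("the data bits of a read command must be 0")
    return (
        "1100"                         # command start signal
        + ("1" if write else "0")      # read/write bit
        + ("1" if broadcast else "0")  # single/broadcast bit
        + format(device_id, "06b")     # serial device ID
        + format(register, "08b")      # register address
        + "0"                          # time to process a read and enable SDO
        + format(data, "08b")          # data (all zeros for a read)
        + "0"                          # ends the command
    )


bits = serial_command_bits(write=True, broadcast=False, device_id=3,
                           register=0x10, data=0xA5)
print(bits, len(bits))
```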