Runahead
Runahead is a technique that allows a microprocessor to pre-process instructions during cache miss cycles instead of stalling. The pre-processed instructions are used to generate instruction and data stream prefetches: cache misses are detected before they would otherwise occur by using idle execution resources to calculate instruction and data fetch addresses from available information that is independent of the cache miss.

The principal hardware cost is a means of checkpointing the register file state and preventing pre-processed stores from modifying memory. This checkpointing can be accomplished using very little hardware since all results computed during runahead are discarded after the cache miss has been serviced, at which time normal execution resumes using the checkpointed register file state.

Branch outcomes computed during runahead mode can be saved into a shift register, which can be used as a highly accurate branch predictor when normal operation resumes.
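
The shift register of branch outcomes can be pictured with a small sketch. The class name `BranchOutcomeQueue`, the `depth` parameter, and the choice to stop recording once the register is full are all assumptions made for illustration; the text does not specify them.

```python
from collections import deque

class BranchOutcomeQueue:
    """Hypothetical model of a shift register holding runahead branch outcomes."""

    def __init__(self, depth=16):
        self.depth = depth
        self.bits = deque()

    def record(self, taken):
        # Runahead mode: save a resolved outcome in program order.
        # Assumption: once full, later outcomes are dropped so the stored
        # bits stay aligned with the first branches seen after resuming.
        if len(self.bits) < self.depth:
            self.bits.append(bool(taken))

    def predict(self, fallback):
        # Normal mode: consume the oldest saved outcome, or fall back to
        # the regular predictor's guess when nothing is left.
        return self.bits.popleft() if self.bits else fallback
```

After runahead ends, each branch re-executed in normal mode would call `predict` once, so the saved outcomes are consumed in the same order in which they were recorded.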

Runahead was initially investigated in the context of an in-order microprocessor; however, the technique has since been extended for use with out-of-order microprocessors.

Entering runahead

When a runahead processor detects a level-one instruction or data cache miss, it records the instruction address of the faulting access and enters runahead mode. A demand fetch for the missing instruction or data cache line is generated if necessary. The processor checkpoints the register file by one of several mechanisms discussed later. The state of the memory hierarchy is checkpointed by disabling stores. Store instructions are allowed to compute addresses and check for a hit, but they are not allowed to write to memory.
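
As a rough illustration of how a store might be treated in runahead mode, the sketch below computes the address, checks for a hit, and queues a prefetch on a miss, without ever writing data. The function name `runahead_store`, the set-of-line-numbers cache model, and the 64-byte line size are hypothetical.

```python
def runahead_store(cached_lines, prefetch_queue, base, offset, line_size=64):
    """Process a store in runahead mode (illustrative model only).

    cached_lines   -- set of cache line numbers currently present (assumed model)
    prefetch_queue -- list collecting line numbers to prefetch
    """
    addr = base + offset          # address generation still happens
    line = addr // line_size
    if line not in cached_lines:  # hit check still happens
        prefetch_queue.append(line)
    # No data is written: memory state stays exactly as it was.
    return addr
```

Note that the function takes no store data argument at all, which mirrors the idea that disabling stores is what checkpoints the memory hierarchy.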

Because the value returned from a cache miss cannot be known ahead of time, pre-processed instructions may depend on invalid data. Such dependencies are tracked by adding an "invalid" (INV) bit to every register in the register file. If runahead was initiated by a load instruction, the load's destination register is marked INV.
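
The entry sequence can be summarized in a minimal sketch. The `Core` class, its field names, and the register count are hypothetical, and the checkpoint here is a plain copy; the actual mechanism is one of the options described under "Register file checkpoint options".

```python
from dataclasses import dataclass, field

NUM_REGS = 8  # hypothetical register count

@dataclass
class Core:
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)
    inv: list = field(default_factory=lambda: [False] * NUM_REGS)
    checkpoint: list = None
    resume_pc: int = None
    runahead: bool = False

def enter_runahead(core, miss_pc, load_dest=None):
    """Enter runahead mode after a level-one cache miss at miss_pc."""
    core.resume_pc = miss_pc            # address of the faulting access
    core.checkpoint = list(core.regs)   # simplest possible checkpoint: a copy
    core.inv = [False] * NUM_REGS       # start with every register trusted
    if load_dest is not None:
        core.inv[load_dest] = True      # the missing load's result is unknown
    core.runahead = True
```

For an instruction cache miss there is no load destination, so `load_dest` is left as `None` and no register starts out INV.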

Pre-processing instructions

The processor then continues to execute instructions after the miss; however, all results are strictly temporary and are used only to generate additional load, store, and instruction cache misses, which are turned into prefetches. The designer can opt to allow runahead to skip over instructions that are not present in the instruction cache, with the understanding that the quality of any prefetches generated will be reduced since the effect of the missing instructions is unknown.

A register that is the target of an instruction with one or more source registers marked INV is itself marked INV. This allows the processor to know which register values can be (reasonably) trusted during runahead mode. Branch instructions that cannot be resolved due to INV sources are simply assumed to have had their direction predicted correctly. Branch outcomes are saved in a shift register for later use as highly accurate predictions during normal operation.
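
The INV propagation rule and the treatment of unresolvable branches can be sketched as follows. The function names and the branch-if-nonzero instruction are hypothetical, and clearing the INV bit when a register is overwritten with a valid result is an assumption that the text does not state explicitly.

```python
import operator

def runahead_alu(regs, inv, dest, srcs, op):
    """Execute one register-to-register instruction in runahead mode."""
    if any(inv[s] for s in srcs):
        inv[dest] = True      # any INV source makes the target INV
        return
    regs[dest] = op(*(regs[s] for s in srcs))
    inv[dest] = False         # assumption: a valid result clears INV

def runahead_branch(regs, inv, src, predicted_taken):
    """Direction to follow for a hypothetical branch-if-nonzero on src."""
    if inv[src]:
        return predicted_taken  # cannot resolve: assume the prediction is right
    return regs[src] != 0
```

For example, with `regs = [1, 2, 0, 0]` and register 1 marked INV, `runahead_alu(regs, inv, 2, (0, 1), operator.add)` leaves register 2 unchanged and marks it INV.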

INV register values cannot always be tracked perfectly during runahead mode. This is acceptable because runahead is only used to optimize performance and all results computed during runahead mode are discarded. Perfect tracking is impossible in any of the following cases: runahead was initiated by an instruction cache miss; an instruction cache miss occurred during runahead; a load depends on a store with an INV address (assuming hardware is present to allow store-to-load forwarding during runahead); or a branch outcome during runahead depends on an INV register.

Leaving runahead

When the fetch that initiated runahead mode has been serviced, the register file state is restored from the checkpoint and the processor is redirected to the original faulting fetch address.
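
A minimal sketch of the exit sequence, using a plain dictionary as a stand-in for processor state. The key names (`regs`, `checkpoint`, `resume_pc`, and so on) are hypothetical.

```python
def exit_runahead(state):
    """Leave runahead mode once the original miss has been serviced."""
    state["regs"] = list(state["checkpoint"])    # discard all runahead results
    state["inv"] = [False] * len(state["regs"])  # restored values are all valid
    state["pc"] = state["resume_pc"]             # refetch from the faulting address
    state["runahead"] = False
```

Everything computed during runahead is thrown away here; only its side effects on the caches (the prefetches) and any saved branch outcomes survive.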

Register file checkpoint options

The most obvious method of checkpointing the register file (RF) is to perform a flash copy to a shadow register file, or backup register file (BRF), when the processor enters runahead mode, and then perform a flash copy from the BRF back to the RF when normal operation resumes. Simpler options are available.

One way to eliminate the flash copy operations is to write to both the BRF and RF during normal operation, read from only the RF during normal operation, and read/write only the BRF during runahead mode.
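
The dual-write scheme can be modeled roughly as below. The class and method names are hypothetical. The text does not say how the BRF is brought back in sync with the RF after runahead, so the `dirty` set and the copy-back in `exit_runahead` are assumptions added only to keep the model consistent across repeated runahead episodes.

```python
class DualWriteRegisters:
    """Illustrative model: RF is the checkpoint, BRF is the runahead scratch copy."""

    def __init__(self, n=8):
        self.rf = [0] * n
        self.brf = [0] * n
        self.runahead = False
        self.dirty = set()  # assumption: BRF entries overwritten during runahead

    def write(self, reg, value):
        if self.runahead:
            self.brf[reg] = value   # runahead writes touch only the BRF
            self.dirty.add(reg)
        else:
            self.rf[reg] = value    # normal writes go to both files
            self.brf[reg] = value

    def read(self, reg):
        return self.brf[reg] if self.runahead else self.rf[reg]

    def enter_runahead(self):
        self.runahead = True        # no copy needed: BRF already mirrors RF

    def exit_runahead(self):
        for reg in self.dirty:      # assumed resync, not described in the text
            self.brf[reg] = self.rf[reg]
        self.dirty.clear()
        self.runahead = False       # RF was never modified, so nothing to restore
```

The point of the design is visible in `enter_runahead` and `exit_runahead`: neither needs a full flash copy, because the RF is simply left untouched while runahead is active.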

An even more aggressive approach is to eliminate the BRF and rely upon the forwarding paths to provide modified values during runahead mode. Checkpointing is accomplished by disabling register file writes. Modified values during runahead mode can only be provided by the forwarding paths.
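
A rough model of the forwarding-only approach follows. The class name, the bounded buffer standing in for the forwarding paths, and the `depth` of three results are hypothetical; real forwarding networks are organized by pipeline stage rather than as a queue.

```python
from collections import deque

class ForwardingOnlyRunahead:
    """Illustrative model: RF writes are disabled, results live only in a bypass buffer."""

    def __init__(self, rf, depth=3):
        self.rf = rf                       # acts as the checkpoint; never written here
        self.bypass = deque(maxlen=depth)  # (reg, value) pairs, newest last

    def write(self, reg, value):
        # The register file write is suppressed; the result is only forwarded.
        self.bypass.append((reg, value))

    def read(self, reg):
        # Prefer the most recent forwarded value, if it is still in flight.
        for r, v in reversed(self.bypass):
            if r == reg:
                return v
        # Otherwise only the stale, checkpointed value is available.
        return self.rf[reg]
```

Once a result ages out of the buffer, later readers see the checkpointed value instead. That loss of accuracy is tolerable under the same reasoning as imperfect INV tracking: runahead results are discarded anyway.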

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 