Tomasulo algorithm
Encyclopedia
The Tomasulo algorithm is a hardware algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

 developed in 1967 by Robert Tomasulo
Robert Tomasulo
Robert Marco Tomasulo was a computer scientist, and the inventor of the Tomasulo algorithm. Tomasulo was the recipient of the 1997 Eckert–Mauchly Award "[f]or the ingenious Tomasulo's algorithm, which enabled out-of-order execution processors to be implemented."On January 30, 2008, Tomasulo spoke...

 from IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...

. It allows sequential instructions that would normally be stalled due to certain dependencies to execute non-sequentially (out-of-order execution
Out-of-order execution
In computer engineering, out-of-order execution is a paradigm used in most high-performance microprocessors to make use of instruction cycles that would otherwise be wasted by a certain type of costly delay...

). It was first implemented for the IBM System/360 Model 91’s floating point unit.

This algorithm differs from scoreboarding
Scoreboarding
Scoreboarding is a centralized method, used in the CDC 6600 computer, for dynamically scheduling a pipeline so that the instructions can execute out of order when there are no conflicts and the hardware is available. In a scoreboard, the data dependencies of every instruction are logged...

 in that it utilizes register renaming
Register renaming
In computer architecture, register renaming refers to a technique used to avoid unnecessary serialization of program operations imposed by the reuse of registers by those operations.-Problem definition:...

. Where scoreboarding resolves Write-after-Write (WAW) and Write-after-Read (WAR) hazards
Hazard (computer architecture)
Hazards are problems with the instruction pipeline in central processing unit microarchitectures that potentially result in incorrect computation...

 by stalling, register renaming allows the continual issuing of instructions. The Tomasulo algorithm also uses a common data bus (CDB) on which computed values are broadcast to all the reservation stations
Reservation stations
Reservation Stations are decentralized features of the microarchitecture of a CPU that allow for register renaming, and are used by the Tomasulo algorithm for dynamic instruction scheduling....

 that may need it. This allows for improved parallel execution of instructions which may otherwise stall under the use of scoreboarding.

Robert Tomasulo received the Eckert-Mauchly Award
Eckert-Mauchly Award
The Eckert–Mauchly Award recognizes contributions to digital systems and computer architecture. First awarded in 1979, it was named for John Presper Eckert and John William Mauchly, who between 1943 and 1946 collaborated on the design and construction of the first large scale electronic computing...

 in 1997 for this algorithm.

Implementation concepts

The following are the concepts necessary to the implementation of Tomasulo's Algorithm.
  • Instructions are issued sequentially so that the effects of a sequence of instructions such as exceptions raised by these instructions occur in the same order as they would in a non-pipelined processor, regardless of the fact that they are being executed non-sequentially.

  • All general-purpose and reservation station
    Reservation stations
    Reservation Stations are decentralized features of the microarchitecture of a CPU that allow for register renaming, and are used by the Tomasulo algorithm for dynamic instruction scheduling....

     registers hold either real or virtual values. If a real value is unavailable to a destination register during the issue stage, a virtual value is initially used. The functional unit that is computing the real value is assigned as the virtual value. The virtual register values are converted to real values as soon as the designated functional unit completes its computation.

  • Functional units use reservation stations
    Reservation stations
    Reservation Stations are decentralized features of the microarchitecture of a CPU that allow for register renaming, and are used by the Tomasulo algorithm for dynamic instruction scheduling....

     with multiple slots. Each slot holds information needed to execute a single instruction, including the operation and the operands. The functional unit begins processing when it is free and when all source operands needed for an instruction are real.

Instruction lifecycle

The three stages listed below are the stages through which each instruction passes from the time it is issued to the time its execution is complete.

Stage 1: issue

In the issue stage, instructions are issued for execution if all operands and reservation stations are ready or else they are stalled. Registers are renamed in this step, eliminating WAR and WAW hazards.
  • Retrieve the next instruction from the head of the instruction queue. If the instruction operands are currently in the registers, then
    • If a matching functional unit is available, issue the instruction.
    • Else, as there is no available functional unit, stall the instruction until a station or buffer is free.
  • Otherwise, we can assume the operands are not in the registers, so use virtual values; the functional unit must calculate the real value, in order to keep track of the functional units that will produce the operand.

Stage 2: execute

In the execute stage, the instruction operations are carried out. Instructions are delayed in this step until all of their operands are available, eliminating RAW hazards. Program correctness is maintained through effective address calculation to prevent hazards through memory.
  • If one or more of the operands is not yet available then: wait for operand to become available on the CDB.
  • When all operands are available, then: if the instruction is a load or store
    • Compute the effective address when the base register is available, and place it in the load/store buffer
      • If the instruction is a load then: execute as soon as the memory unit is available
      • Else, if the instruction is a store then: wait for the value to be stored before sending it to the memory unit
  • Else, the instruction is an ALU operation then: execute the instruction at the corresponding functional unit

Stage 3: write result

In the write Result stage, ALU operations results are written back to registers and store operations are written back to memory.
  • If the instruction was an ALU operation
    • If the result is available, then: write it on the CDB and from there into the registers and any reservation stations waiting for this result
  • Else, if the instruction was a store then: write the data to memory during this step

See also

  • Re-order buffer
    Re-order buffer
    A re-order buffer is used in a Tomasulo algorithm for out-of-order instruction execution. It allows instructions to be committed in-order....

  • Instruction level parallelism
    Instruction level parallelism
    Instruction-level parallelism is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program: 1. e = a + b 2. f = c + d 3. g = e * f...

  • Out-of-order execution
    Out-of-order execution
    In computer engineering, out-of-order execution is a paradigm used in most high-performance microprocessors to make use of instruction cycles that would otherwise be wasted by a certain type of costly delay...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK