Threaded code
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, the term threaded code refers to a compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

 implementation technique where the generated code has a form that essentially consists entirely of calls to subroutine
Subroutine
In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....

s. The code may be processed by an interpreter
Interpreter (computing)
In computer science, an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language...

, or may simply be a sequence of machine code
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...

 call
Command (computing)
In computing, a command is a directive to a computer program acting as an interpreter of some kind, in order to perform a specific task. Most commonly a command is a directive to some kind of command line interface, such as a shell....

 instructions.

Threaded code has better code density than code generated by alternative code generation techniques and alternative calling convention
Calling convention
In computer science, a calling convention is a scheme for how subroutines receive parameters from their caller and how they return a result; calling conventions can differ in:...

s, at the expense of slightly slower execution speed (usually only one machine instruction). However, a program small enough to fit fully in a computer processor's cache
Cache
In computer engineering, a cache is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere...

 may run faster than a larger program that suffers many cache misses.

Threaded code is best known as the implementation technique commonly used in some programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

s, such as Forth, many implementations of BASIC
BASIC
BASIC is a family of general-purpose, high-level programming languages whose design philosophy emphasizes ease of use - the name is an acronym from Beginner's All-purpose Symbolic Instruction Code....

, some implementations of COBOL
COBOL
COBOL is one of the oldest programming languages. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments....

, early versions of B, and other languages for small minicomputer
Minicomputer
A minicomputer is a class of multi-user computers that lies in the middle range of the computing spectrum, in between the largest multi-user systems and the smallest single-user systems...

s and satellites
AMSAT
AMSAT is a name for amateur radio satellite organizations worldwide, but in particular the Radio Amateur Satellite Corporation with headquarters at Silver Spring, Maryland, near Washington DC. AMSAT organizations design, build, arrange launches for, and then operate satellites carrying amateur...

.

Preparatory history

The common way to make computer programs is to 'translate' a computer program written in some symbolic language to machine code
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...

 using a compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

. The code is typically fast but nonportable since the binary
Executable
In computing, an executable file causes a computer "to perform indicated tasks according to encoded instructions," as opposed to a data file that must be parsed by a program to be meaningful. These instructions are traditionally machine code instructions for a physical CPU...

 code is designed for a specific hardware
Hardware
Hardware is a general term for equipment such as keys, locks, hinges, latches, handles, wire, chains, plumbing supplies, tools, utensils, cutlery and machine parts. Household hardware is typically sold in hardware stores....

 platform.
A different approach uses a virtual machine
Virtual machine
A virtual machine is a "completely isolated guest operating system installation within a normal host operating system". Modern virtual machines are implemented with either software emulation or hardware virtualization or both together.-VM Definitions:A virtual machine is a software...

 instruction set
Instruction set
An instruction set, or instruction set architecture , is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O...

 - that has no particular target hardware. An interpreter executes it on each new target hardware.

Early computers had relatively little memory. For example, most Data General Nova
Data General Nova
The Data General Nova was a popular 16-bit minicomputer built by the American company Data General starting in 1969. The Nova was packaged into a single rack mount case and had enough power to do most simple computing tasks. The Nova became popular in science laboratories around the world, and...

, IBM 1130
IBM 1130
The IBM 1130 Computing System was introduced in 1965. It was IBM's least-expensive computer to date, and was aimed at price-sensitive, computing-intensive technical markets like education and engineering. It succeeded the IBM 1620 in that market segment. The IBM 1800 was a process control variant...

, and many Apple II
Apple II
The Apple II is an 8-bit home computer, one of the first highly successful mass-produced microcomputer products, designed primarily by Steve Wozniak, manufactured by Apple Computer and introduced in 1977...

 computers had only 4 K words of RAM installed. Consequently a lot of time was spent trying to find ways to reduce the size of programs so they would fit in the memory available. At the same time, computers were relatively slow, so simple interpretation was very noticeably slower than executing machine code.

Instead of writing out every step of an operation in every part of the program where it was needed, programmers saved memory by writing each step of such operations once (see "Don't repeat yourself
Don't repeat yourself
In software engineering, Don't Repeat Yourself is a principle of software development aimed at reducing repetition of information of all kinds, especially useful in multi-tier architectures...

") and placing it in a subroutine.

This process — code refactoring — is used today, although for different reasons. The top-level application in these programs may consist of nothing but subroutine calls. Many of these subroutines, in turn, also consist of nothing but lower level subroutine calls.

Mainframes and some early microprocessors such as the RCA 1802
RCA 1802
The RCA CDP1802, also known as the COSMAC , is an 8-bit CMOS microprocessor introduced by RCA in early 1976. It is being by Intersil Corporation as a high-reliability microprocessor...

 required several instructions to call a subroutine. In the top-level application and in many subroutines, that sequence is constantly repeated, only the subroutine address changing from one call to the next. Using memory to store the same instructions repeatedly is wasteful.

To save space, programmers squeezed that series of subroutine calls into a list containing only contiguous addresses of the sub-routines, and used a tiny "interpreter" to call each subroutine in turn.

This is identical to the way other programmers squeezed a series of jumps in a branch table
Branch table
In computer programming, a branch table is a term used to describe an efficient method of transferring program control to another part of a program using a table of branch instructions. It is a form of multiway branch...

, dispatch table
Dispatch table
In computer science, a dispatch table is a table of pointers to functions or methods. Use of such a table is a common technique when implementing late binding in object-oriented programming.-Perl implementation:...

, or virtual method table
Virtual method table
A virtual method table, virtual function table, dispatch table, or vtable, is a mechanism used in a programming language to support dynamic dispatch ....

 into a list containing only the destination addresses, and used a small selector to branch to the selected destination.
In threaded code and these other techniques, the program becomes a list of entry points to the actual code to be executed.

Over the years, programmers have created many variations on that "interpreter" or "small selector".
The particular address in the list of addresses may be extracted using an index, general purpose register or pointer. The addresses may be direct or indirect, contiguous or non-contiguous (linked by pointers), relative or absolute, resolved at compile time or dynamically built.
No one variation is "best".

Development

To save space, programmers squeezed the lists of subroutine calls into simple lists of subroutine addresses, and used a small loop to call each subroutine in turn. For example:


start:
ip = &thread
top:
jump *ip++
thread:
&pushA
&pushB
&add
...
pushA:
*sp++ = A
jump top
pushB:
*sp++ = B
jump top
add:
*sp++ = *--sp + *--sp
jump top


In this case, decoding the bytecodes is performed once, during program compilation or program load, so it is not repeated each time an instruction is executed. This can save much time and space when decode and dispatch overhead is large compared to the execution cost.

Note, however, addresses in thread for &pushA, &pushB, etc., are two or more bytes, compared to one byte, typically, for the decode and dispatch interpreter described above. In general, instructions for a decode and dispatch interpreter may be any size. For example, a decode and dispatch interpreter to simulate an Intel Pentium decodes instructions that range from 1 to 16 bytes. However, bytecoded systems typically choose 1-byte codes for the most-common operations. Thus, the thread often has a higher space cost than bytecodes. In most uses, the reduction in decode cost outweighs the increase in space cost.

Note also that while bytecodes are nominally machine-independent, the format and value of the pointers in threads generally depend on the target machine which is executing the interpreter. Thus, an interpreter might load a portable bytecode program, decode the bytecodes to generate platform-dependent threaded code, then execute threaded code without further reference to the bytecodes.

The loop is simple, so is duplicated in each handler, removing jump top from the list of machine instructions needed to execute each interpreter instruction. For example:


start:
ip = &thread
jump *ip++
thread:
&pushA
&pushB
&add
...
pushA:
*sp++ = A
jump *ip++
pushB:
*sp++ = B
jump *ip++
add:
*sp++ = *--sp + *--sp
jump *ip++


This is called direct threaded code (DTC). Although the technique is older, the first widely circulated use of the term "threaded code" is probably Bell's article "Threaded Code" from 1973.

Charles H. Moore
Charles H. Moore
Charles H. Moore is the inventor of the Forth programming language.- Biography :In 1968, while employed at the United States National Radio Astronomy Observatory , Moore invented the initial version of the Forth language to help control radio telescopes...

 invented a more compact notation in 1970 for his Forth virtual machine: indirect threaded code (ITC). Originally, Moore invented this because it was easy and fast on Nova
Data General Nova
The Data General Nova was a popular 16-bit minicomputer built by the American company Data General starting in 1969. The Nova was packaged into a single rack mount case and had enough power to do most simple computing tasks. The Nova became popular in science laboratories around the world, and...

 minicomputers, which have an indirection bit in every address. He said (in published remarks, Byte Magazine's Forth Issue) that he found it so convenient that he propagated it into all later Forth designs.

Some Forth compilers compile Forth programs into direct-threaded code, while others make indirect-threaded code. The programs act the same either way.

Threading models

Practically all executable threaded code uses one or another of these methods for invoking subroutines (each method is called a "threading model").

Direct threading

Addresses in the thread are the addresses of machine language. This form is simple, but may have overheads because the thread consists only of machine addresses, so all further parameters must be loaded indirectly from memory. Some Forth systems produce direct-threaded code. On many machines direct-threading is faster than subroutine threading (see reference below).

As example, a stack machine might execute the sequence "push A, push B, add". That might be translated to the following thread and routines, where ip is initialized to the address &thread.


thread: pushA: *sp++ = A pushB: *sp++ = B add: *sp++ = *--sp + *--sp
&pushA jump *ip++ jump *ip++ jump *ip++
&pushB
&add
...


Alternatively, operands may be included in the thread. This can remove some indirection needed above, but makes the thread larger:


thread: push: *sp++ = *ip++ add: *sp++ = *--sp + *--sp
&push jump *ip++ jump *ip++
&A
&push
&B
&add

Indirect threading

Indirect threading uses pointers to locations that in turn point to machine code. The indirect pointer may be followed by operands which are stored in the indirect "block" rather than storing them repeatedly in the thread. Thus, indirect code is often more compact than direct-threaded code, but the indirection also typically makes it slower, though still usually faster than bytecode interpreters. Where the handler operands include both values and types, the space savings over direct-threaded code may be significant. Older FORTH systems typically produce indirect-threaded code.

As example, if the goal is to execute "push A, push B, add", the following might be used. Here, ip is initialized to address &thread, each code fragment (push, add) is found by double-indirecting through ip; and operands to each code fragment are found in the first-level indirection following the address of the fragment.


thread: i_pushA: push: add:
&i_pushA &push *sp++ = *(*ip + 1) *sp++ = *--sp + *--sp
&i_pushB &A jump *(*ip++) jump *(*ip++)
&i_add i_pushB:
&push
&B
i_add:
&add

Subroutine threading

So-called "subroutine-threaded code" (also "call-threaded code") consists of a series of machine-language "call" instructions (or addresses of functions to "call", as opposed to direct threading's use of "jump"). Early compilers for ALGOL
ALGOL
ALGOL is a family of imperative computer programming languages originally developed in the mid 1950s which greatly influenced many other languages and became the de facto way algorithms were described in textbooks and academic works for almost the next 30 years...

, Fortran, Cobol and some Forth systems often produced subroutine-threaded code. The code in many of these systems operated on a last-in-first-out (LIFO) stack of operands, which had well-developed compiler theory. Most modern processors have special hardware support for subroutine "call" and "return" instructions, so the overhead of one extra machine instruction per dispatch is somewhat diminished. Anton Ertl has stated "that, in contrast to popular myths, subroutine threading is usually slower than direct threading." However, Ertl's most recent tests show that subroutine threading is faster than direct threading in 15 out of 25 test cases. Ertl's most recent tests show that direct threading is the fastest threading model on Xeon, Opteron, and Athlon processors; indirect threading is the fastest threading model on Pentium M processors; and subroutine threading is the fastest threading model on Pentium 4, Pentium III, and PPC processors.

As an example of call threading "push A, push B, add":


thread: pushA: pushB: add:
call pushA *sp++ = A *sp++ = B *sp++ = *--sp + *--sp
call pushB ret ret ret
call add

Token threading

Token threaded code uses lists of 8 or 12-bit indexes to a table of pointers. Token threaded code is notably compact, without much special effort by a programmer. It is usually half to three-fourths the size of other threaded-codes, which are themselves a quarter to an eighth the size of compiled code. The table's pointers can either be indirect or direct. Some Forth compilers produce token threaded code. Some programmers consider the "p-code
P-Code machine
In computer programming, a p-code machine, or portable code machine is a virtual machine designed to execute p-code...

" generated by some Pascal
Pascal (programming language)
Pascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring.A derivative known as Object Pascal...

 compilers, as well as the bytecode
Bytecode
Bytecode, also known as p-code , is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software interpreter as well as being suitable for further compilation into machine code...

s used by .NET
.NET Framework
The .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...

, Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

, BASIC and some C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

 compilers to be token-threading.

A common approach historically is bytecode
Bytecode
Bytecode, also known as p-code , is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software interpreter as well as being suitable for further compilation into machine code...

, which uses 8-bit opcodes and, often, a stack-based virtual machine. A typical interpreter is known as a "decode and dispatch interpreter", and follows the form


bytecode: top: pushA: pushB: add:
0 /*pushA*/ i = decode(vpc++) *sp++ = A *sp++ = B *sp++ = *--sp + *--sp
1 /*pushB*/ addr = table[i] jump top jump top jump top
2 /*add*/ jump *addr


If the virtual machine uses only byte-size instructions, decode is simply a fetch from bytecode, but often there are commonly used 1-byte instructions plus some less-common multibyte instructions
Complex instruction set computer
A complex instruction set computer , is a computer where single instructions can execute several low-level operations and/or are capable of multi-step operations or addressing modes within single instructions...

, in which case decode is more complex. The decoding of single byte opcodes can be very simply and efficiently handled by a branch table
Branch table
In computer programming, a branch table is a term used to describe an efficient method of transferring program control to another part of a program using a table of branch instructions. It is a form of multiway branch...

 using the opcode directly as an index.

For instructions where the individual operations are simple, such as "push" and "add", the overhead
Overhead
Overhead may be:* Overhead , the ongoing operating costs of running a business* Engineering overhead, ancillary design features required by a component of a device...

 involved in deciding what to execute is larger than the cost of actually executing it, so such interpreters are often much slower than machine code. However for more complex ("compound") instructions, the overhead percentage is proportionally less significant.

Huffman threading

Huffman threaded code consists of lists of Huffman codes. A Huffman code is a variable length bit string used to identify a unique item. A Huffman-threaded interpreter locates subroutines using an index table or tree of pointers that can be navigated by the Huffman code. Huffman threaded code is one of the most compact representations known for a computer program. Basically the index and codes are organized by measuring the frequency that each subroutine occurs in the code. Frequent calls are given the shortest codes. Operations with approximately equal frequencies are given codes with nearly equal bit-lengths. Most Huffman-threaded systems have been implemented as direct-threaded Forth systems, and used to pack large amounts of slow-running code into small, cheap microcontroller
Microcontroller
A microcontroller is a small computer on a single integrated circuit containing a processor core, memory, and programmable input/output peripherals. Program memory in the form of NOR flash or OTP ROM is also often included on chip, as well as a typically small amount of RAM...

s. Most published uses have been in toys, calculators, and watches.

Lesser used threading

  • String threading, where operations are identified by strings, usually looked-up by a hash table. This was used in Charles H. Moore's earliest Forth implementations and in the University of Illinois
    University of Illinois at Urbana-Champaign
    The University of Illinois at Urbana–Champaign is a large public research-intensive university in the state of Illinois, United States. It is the flagship campus of the University of Illinois system...

    's experimental hardware-interpreted computer language. It is also used in Bashforth
    Bashforth
    Bashforth is a free Forth interpreter, written entirely in the Bash scripting language. It requires GNU Bash v2.04 or higher. Its virtual machine makes use of string threading....

    .

Branches

Examples above show no branches. For all interpreters, a branch changes the thread pointer (ip above). As example, a conditional branch when the top-of-stack value is zero might be encoded as follows. Note that &thread[123] is the location to jump to, not the address of a handler, and so must be skipped (ip++) whether or not the branch is taken.


thread: brz:
... tmp = *ip++
&brz if (*sp++

0)
&thread[123] ip = tmp
... jump *ip++


Common amenities
Separating the data and return stacks in a machine eliminates a great deal of stack management code, substantially reducing the size of the threaded code. The dual-stack principle was originated three times independently: for Burroughs large systems, Forth and PostScript
PostScript
PostScript is a dynamically typed concatenative programming language created by John Warnock and Charles Geschke in 1982. It is best known for its use as a page description language in the electronic and desktop publishing areas. Adobe PostScript 3 is also the worldwide printing and imaging...

, and is used in some Java virtual machine
Java Virtual Machine
A Java virtual machine is a virtual machine capable of executing Java bytecode. It is the code execution component of the Java software platform. Sun Microsystems stated that there are over 4.5 billion JVM-enabled devices.-Overview:...

s.

Three registers
Processor register
In computer architecture, a processor register is a small amount of storage available as part of a CPU or other digital processor. Such registers are addressed by mechanisms other than main memory and can be accessed more quickly...

 are often present in a threaded virtual machine. Another one exists for passing data between subroutine
Subroutine
In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....

s ('words'). These are:
  • ip or i (instruction pointer) of the virtual machine (not to be confused with the program counter
    Program counter
    The program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence...

     of the underlying hardware implementing the VM)
  • w (work pointer)
  • rp or r (return stack
    Stack (data structure)
    In computer science, a stack is a last in, first out abstract data type and linear data structure. A stack can have any abstract data type as an element, but is characterized by only three fundamental operations: push, pop and stack top. The push operation adds a new item to the top of the stack,...

     pointer)
  • sp or s (parameter
    Parameter
    Parameter from Ancient Greek παρά also “para” meaning “beside, subsidiary” and μέτρον also “metron” meaning “measure”, can be interpreted in mathematics, logic, linguistics, environmental science and other disciplines....

     stack pointer for passing parameters between words)


Often, threaded virtual machine
Virtual machine
A virtual machine is a "completely isolated guest operating system installation within a normal host operating system". Modern virtual machines are implemented with either software emulation or hardware virtualization or both together.-VM Definitions:A virtual machine is a software...

s such as implementations of Forth have a simple virtual machine at heart, consisting of three primitives. Those are:
  • nest, also called docol
  • unnest, or semi_s
  • next


In an indirect-threaded virtual machine, the one given here, the operations are:
next: (ip)+ -> w ; jmp (w)+
nest: ip -> -(rp) ; w -> ip ; next
unnest: (rp)+ -> ip ; next

This is perhaps the simplest and fastest interpreter or virtual machine.
See also

  • Continuation-passing style
    Continuation-passing style
    In functional programming, continuation-passing style is a style of programming in which control is passed explicitly in the form of a continuation. Gerald Jay Sussman and Guy L. Steele, Jr...

    , which replaces the global variable ip with a function parameter
  • Tail recursion
  • Just-in-time compilation
    Just-in-time compilation
    In computing, just-in-time compilation , also known as dynamic translation, is a method to improve the runtime performance of computer programs. Historically, computer programs had two modes of runtime operation, either interpreted or static compilation...

  • Branch table
    Branch table
    In computer programming, a branch table is a term used to describe an efficient method of transferring program control to another part of a program using a table of branch instructions. It is a form of multiway branch...


Further reading
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK