Branch table
Encyclopedia
In computer programming
, a branch table (sometimes known as a jump table) is a term used to describe an efficient method of transferring program control (branching
) to another part of a program (or a different program that may have been dynamically loaded) using a table of branch instructions. It is a form of multiway branch
. The branch table construction is commonly used when programming in assembly language
but may also be generated by a compiler
, especially when implementing an optimized switch statement
(where known, small ranges are involved with few gaps).
a sequential index by the instruction length (the number of bytes in memory occupied by each branch instruction). It relies on the fact that machine code
instructions for branching have a fixed length and can be executed extremely efficiently by most hardware
, and is most useful when dealing with raw data
values that may be easily converted to sequential index values. Given such data, a branch table can be extremely efficient. It usually consists of the following 3 steps:
The following pseudocode illustrates the concept
... validate x /* transform x to 0 (invalid) or 1,2,3, according to value..) */
y = x*4; /* multiply by branch instruction length (e.g. 4 ) */
goto next(y); /* branch into 'table' of branch instructions */
/* start of branch table */
next: goto codebad; /* x= 0 (invalid) */
goto codeone; /* x= 1 */
goto codetwo; /* x= 2 */
... rest of branch table
codebad: /* deal with invalid input */
" or "virtual method table
" but essentially performing exactly the same purpose. This pointer function method can result in saving one machine instruction, and avoids the indirect jump (to one of the branch instructions).
The resulting list of pointers to functions is almost identical to direct threaded code
, and is conceptually similar to a control table
.
The actual method used to implement a branch table is usually based on:
encoding was common in the early days of computing when memory was expensive, CPUs were slower and compact data representation and efficient choice of alternatives were important. Nowadays, they are commonly still used in:
For library
functions, where they may be referenced by an integer
:
In addition, calling functions by number (the index into the table), can sometimes be useful in some cases in normal application programming
assembly language is:
Note: this code will work only if PCL < (table + index_last). To ensure this condition we may use "org" directive.
And if GOTO (PIC18F for example) is 2 bytes, this limits table entries to less than 128.
However, a little 'common sense' can transform this particular case, and many other similar cases, to a simple two-step process with very large potential savings – while still eventually leaving the ultimate choice to the compiler – but 'assisting its decision' considerably:
Variations along similar lines can be used in cases where there are two sets of short ranges with a large gap between ranges.
A hash table
may be required to form the index in some cases. However, for single byte input values such as A-Z (or the first byte of a longer key), the contents of the byte itself (raw data
) can be used in a two-step, "trivial hash function", process to obtain a final index for a branch table with zero gaps.
The array would be no larger than (256 x 2) bytes - to hold all possible 16-bit unsigned (short) integers. If no validation is required, and only upper case is used, the size of the array may be as small as (26 x 2) = 52 bytes.
Other uses of technique
Although the technique of branching using a branch table is most frequently utilized solely for the purpose of altering program flow - to jump to a program label that is an unconditional branch - the same technique can be used for other purposes. For example, it can be used to select a starting point in a sequence of repeated instructions where drop through is the norm and intentional. This can be used for example by optimizing compilers or JIT
compilers in loop unrolling.
See also
External links
Computer programming
Computer programming is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a program that performs specific operations or exhibits a...
, a branch table (sometimes known as a jump table) is a term used to describe an efficient method of transferring program control (branching
Branch (computer science)
A branch is sequence of code in a computer program which is conditionally executed depending on whether the flow of control is altered or not . The term can be used when referring to programs in high level languages as well as program written in machine code or assembly language...
) to another part of a program (or a different program that may have been dynamically loaded) using a table of branch instructions. It is a form of multiway branch
Multiway branch
Multiway branch is a computer science term used to describe the change to a program's control flow based upon a value matching a selected criteria. It is a form of conditional statement...
. The branch table construction is commonly used when programming in assembly language
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...
but may also be generated by a compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...
, especially when implementing an optimized switch statement
Switch statement
In computer programming, a switch, case, select or inspect statement is a type of selection control mechanism that exists in most imperative programming languages such as Pascal, Ada, C/C++, C#, Java, and so on. It is also included in several other types of languages...
(where known, small ranges are involved with few gaps).
Typical implementation
A branch table consists of a serial list of unconditional branch instructions that is branched into using an offset created by multiplyingMultiple
The word multiple can refer to:*Multiple , multiples of numbers*List of multiple discoveries, instances of scientists, working independently of each other, reaching similar findings...
a sequential index by the instruction length (the number of bytes in memory occupied by each branch instruction). It relies on the fact that machine code
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...
instructions for branching have a fixed length and can be executed extremely efficiently by most hardware
Hardware
Hardware is a general term for equipment such as keys, locks, hinges, latches, handles, wire, chains, plumbing supplies, tools, utensils, cutlery and machine parts. Household hardware is typically sold in hardware stores....
, and is most useful when dealing with raw data
Raw data
'\putang inaIn computing, it may have the following attributes: possibly containing errors, not validated; in sfferent formats; uncoded or unformatted; and suspect, requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January...
values that may be easily converted to sequential index values. Given such data, a branch table can be extremely efficient. It usually consists of the following 3 steps:
- optionally validatingData validationIn computer science, data validation is the process of ensuring that a program operates on clean, correct and useful data. It uses routines, often called "validation rules" or "check routines", that check for correctness, meaningfulness, and security of data that are input to the system...
the input data to ensure it is acceptable (this may occur without cost as part of the next step, if the input is a single byte and a 256 byte translate table is used to directly obtain the offset below). Also, if there is no doubt about the values of the input, this step can be omitted. - transform the data into an offset into the branch table. This usually involves multiplying or shifting it to take into account the instruction length. If a static translate table is used, this multiplying can be performed manually or by the compiler, without any run time cost.
- branching to an address made up of the base address of the branch table plus the just generated offset. This sometimes involves an additionAdditionAddition is a mathematical operation that represents combining collections of objects together into a larger collection. It is signified by the plus sign . For example, in the picture on the right, there are 3 + 2 apples—meaning three apples and two other apples—which is the same as five apples....
of the offset onto the program counterProgram counterThe program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence...
registerProcessor registerIn computer architecture, a processor register is a small amount of storage available as part of a CPU or other digital processor. Such registers are addressed by mechanisms other than main memory and can be accessed more quickly...
(unless, in some instruction setInstruction setAn instruction set, or instruction set architecture , is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O...
s, the branch instruction allows an extra index register). This final address usually points to one of a sequence of unconditional branch instructions, or the instruction immediately beyond them (saving one entry in the table).
The following pseudocode illustrates the concept
... validate x /* transform x to 0 (invalid) or 1,2,3, according to value..) */
y = x*4; /* multiply by branch instruction length (e.g. 4 ) */
goto next(y); /* branch into 'table' of branch instructions */
/* start of branch table */
next: goto codebad; /* x= 0 (invalid) */
goto codeone; /* x= 1 */
goto codetwo; /* x= 2 */
... rest of branch table
codebad: /* deal with invalid input */
Alternative implementation using addresses
Another method of implementing a branch table is with an array of pointers from which the required function's address is retrieved. This method is also more recently known under such different names as "dispatch tableDispatch table
In computer science, a dispatch table is a table of pointers to functions or methods. Use of such a table is a common technique when implementing late binding in object-oriented programming.-Perl implementation:...
" or "virtual method table
Virtual method table
A virtual method table, virtual function table, dispatch table, or vtable, is a mechanism used in a programming language to support dynamic dispatch ....
" but essentially performing exactly the same purpose. This pointer function method can result in saving one machine instruction, and avoids the indirect jump (to one of the branch instructions).
The resulting list of pointers to functions is almost identical to direct threaded code
Threaded code
In computer science, the term threaded code refers to a compiler implementation technique where the generated code has a form that essentially consists entirely of calls to subroutines...
, and is conceptually similar to a control table
Control table
Control tables are tables that control the program flow or play a major part in program control. There are no rigid rules concerning the structure or content of a control table - its only qualifying attribute is its ability to direct program flow in some way through its 'execution' by an associated...
.
The actual method used to implement a branch table is usually based on:
- the architecture of the processor on which the code is to be executed,
- whether it is a compiled or interpreted language and
- whether late bindingLate bindingLate binding is a computer programming mechanism in which the method being called upon an object is looked up by name at runtime. This is informally known as duck typing or name binding....
is involved or not.
History
Use of branch tables and other raw dataRaw data
'\putang inaIn computing, it may have the following attributes: possibly containing errors, not validated; in sfferent formats; uncoded or unformatted; and suspect, requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January...
encoding was common in the early days of computing when memory was expensive, CPUs were slower and compact data representation and efficient choice of alternatives were important. Nowadays, they are commonly still used in:
- embeddedEmbedded systemAn embedded system is a computer system designed for specific control functions within a larger system. often with real-time computing constraints. It is embedded as part of a complete device often including hardware and mechanical parts. By contrast, a general-purpose computer, such as a personal...
programming - operating systemOperating systemAn operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...
development. In many operating systems, both system callSystem callIn computing, a system call is how a program requests a service from an operating system's kernel. This may include hardware related services , creating and executing new processes, and communicating with integral kernel services...
s and libraryLibrary (computer science)In computer science, a library is a collection of resources used to develop software. These may include pre-written code and subroutines, classes, values or type specifications....
functions may be referenced by an integerIntegerThe integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...
index into a branch table. - some computer architectureComputer architectureIn computer science and engineering, computer architecture is the practical art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals and the formal modelling of those systems....
s such as IBM/360, use branch tables for dispatching interruptInterruptIn computing, an interrupt is an asynchronous signal indicating the need for attention or a synchronous event in software indicating the need for a change in execution....
s
Advantages
Advantages of branch tables include:- compact code structure (despite repeated branch opcodes)
- reduced source statements (versus repetitive
If
statements) - reduced requirement to test return codes individually (if used at call siteCall siteIn programming, a call site of a function is a line in the code which calls a function. A call site passes zero or more arguments to the function, and receives zero or more return values.-Example:...
to determine subsequent program flow) - AlgorithmicAlgorithmic efficiencyIn computer science, efficiency is used to describe properties of an algorithm relating to how much of various types of resources it consumes. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process, where the goal is to reduce...
and code efficiency (data need only be encodeEncodeEncode may refer to:* Can be related to "Code"* Encode ApS, a Danish software company* Encode SA, a Greek information security company* ENCODE, the ENCyclopedia Of DNA Elements...
d once and branch table code is usually compact), and the potential to attain high data compressionData compressionIn computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....
ratios. For example, when compressing country names to country codes, a string such as "Central African Republic" can be compressed to a single index, resulting in large savings - particularly when the string appears many times. In addition, this same index can be used to access related data in separate tables, reducing storage requirements further.
For library
Library (computer science)
In computer science, a library is a collection of resources used to develop software. These may include pre-written code and subroutines, classes, values or type specifications....
functions, where they may be referenced by an integer
Integer
The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...
:
- improve compatibility with subsequent software versions. If the code of a function and the address of its entry point is changed, only the branch instruction in the branch table needs to be adjusted, application software compiled against the library, or for the operating system, does not need modification.
In addition, calling functions by number (the index into the table), can sometimes be useful in some cases in normal application programming
Disadvantages
- Extra level of indirectionIndirectionIn computer programming, indirection is the ability to reference something using a name, reference, or container instead of the value itself. The most common form of indirection is the act of manipulating a value through its memory address. For example, accessing a variable through the use of a...
- Restrictions in some programming languages (although there are usually alternative ways of implementing the basic concept of multiway branching).
Example
A simple example of branch table use in the 8-bit Microchip PICPIC microcontroller
PIC is a family of Harvard architecture microcontrollers made by Microchip Technology, derived from the PIC1650 originally developed by General Instrument's Microelectronics Division...
assembly language is:
Note: this code will work only if PCL < (table + index_last). To ensure this condition we may use "org" directive.
And if GOTO (PIC18F for example) is 2 bytes, this limits table entries to less than 128.
Jump table example in C
Another simple example, this time demonstrating a jump table rather than a mere branch table. This allows program blocks outside of the currently active procedure/function to be called:Compiler generated branch tables
Programmers frequently leave the decision of whether or not to create a branch table to the compiler, believing that it is perfectly capable of making the correct choice from the known search keys. This may be true for optimizing compilers for relatively simple cases where the range of search keys is limited. However, compilers are not as intelligent as humans and cannot have a deep knowledge of 'context', believing that a range of possible search key integer values such as 1,2,4 ,6,7,20,23,40,42,50 & 1000 would generate a branch table with an excessively large number of empty entries (900+) for very little advantage. (A good optimizing compiler may then presort the values and generate code for a binary chop search, as a 'second best' option). In fact, the application may be highly "time critical" and memory requirement may not really be an issue at all http://www.netrino.com/node/137.However, a little 'common sense' can transform this particular case, and many other similar cases, to a simple two-step process with very large potential savings – while still eventually leaving the ultimate choice to the compiler – but 'assisting its decision' considerably:
- First, test for search key=1000 and perform appropriate branch.
- Allow the compiler to 'choose' to generate a branch table on the remaining search keys (1-50).
Variations along similar lines can be used in cases where there are two sets of short ranges with a large gap between ranges.
Computed GoTo
While the technique is now known as 'branch tables', early compiler users called the implementation 'computed GoTo', referring to the instruction found in the Fortran series of compilers. The instruction was eventually deprecated in Fortran 90 (in favour of SELECT & CASE statements at the source level) .Creating the index for the branch table
Where there is no obvious integer value available for a Branch table it can nevertheless be created from a search key (or part of a search key) by some form of arithmetic transformation or could simply be the row number of a database or the entry number in an array containing the search key found during earlier validation of the key.A hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
may be required to form the index in some cases. However, for single byte input values such as A-Z (or the first byte of a longer key), the contents of the byte itself (raw data
Raw data
'\putang inaIn computing, it may have the following attributes: possibly containing errors, not validated; in sfferent formats; uncoded or unformatted; and suspect, requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January...
) can be used in a two-step, "trivial hash function", process to obtain a final index for a branch table with zero gaps.
- Convert the raw dataRaw data'\putang inaIn computing, it may have the following attributes: possibly containing errors, not validated; in sfferent formats; uncoded or unformatted; and suspect, requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January...
character to its numeric equivalent (example ASCIIASCIIThe American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
'A' > 65 decimal, 0x41 hexadecimal) - Use the numeric integer value as index into a 256 byte array, to obtain a second index (invalid entries 0; representing gaps , otherwise 1, 2, 3 etc.)
The array would be no larger than (256 x 2) bytes - to hold all possible 16-bit unsigned (short) integers. If no validation is required, and only upper case is used, the size of the array may be as small as (26 x 2) = 52 bytes.
Other uses of technique
Although the technique of branching using a branch table is most frequently utilized solely for the purpose of altering program flow - to jump to a program label that is an unconditional branch - the same technique can be used for other purposes. For example, it can be used to select a starting point in a sequence of repeated instructions where drop through is the norm and intentional. This can be used for example by optimizing compilers or JIT
Just-in-time compilation
In computing, just-in-time compilation , also known as dynamic translation, is a method to improve the runtime performance of computer programs. Historically, computer programs had two modes of runtime operation, either interpreted or static compilation...
compilers in loop unrolling.
See also
- Dispatch tableDispatch tableIn computer science, a dispatch table is a table of pointers to functions or methods. Use of such a table is a common technique when implementing late binding in object-oriented programming.-Perl implementation:...
a branch table by another name used for late bindingLate bindingLate binding is a computer programming mechanism in which the method being called upon an object is looked up by name at runtime. This is informally known as duck typing or name binding.... - Function pointerFunction pointerA function pointer is a type of pointer in C, C++, D, and other C-like programming languages, and Fortran 2003. When dereferenced, a function pointer can be used to invoke a function and pass it arguments just like a normal function...
a term used to describe arrays of addresses to functions as used in branch tables - Lookup tableLookup tableIn computer science, a lookup table is a data structure, usually an array or associative array, often used to replace a runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than...
an array of items to be matched, sometimes holding pre-calculated results - Switch statementSwitch statementIn computer programming, a switch, case, select or inspect statement is a type of selection control mechanism that exists in most imperative programming languages such as Pascal, Ada, C/C++, C#, Java, and so on. It is also included in several other types of languages...
a high level language conditional statement that may generate a branch table - Virtual method tableVirtual method tableA virtual method table, virtual function table, dispatch table, or vtable, is a mechanism used in a programming language to support dynamic dispatch ....
a branch table by another name with dynamically assigned pointers for dispatching (see Dispatch table)
External links
- http://en.wikibooks.org/wiki/360_Assembly/Branch_Instructions Example of branch table in WikibooksWikibooksWikibooks is a Wiki hosted by the Wikimedia Foundation for the creation of free content textbooks and annotated texts that anyone can edit....
for IBM S/360 - http://www.netrino.com/node/137 Examples of, and arguments for, Jump Tables via Function Pointer Arrays in CC (programming language)C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
/C++C++C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell... - http://www.eventhelix.com/realtimemantra/Basics/CToAssemblyTranslation3.htm Example code generated by 'Switch/Case' branch table in C, versus IF/ELSE.
- http://www.eventhelix.com/realtimemantra/Basics/CToAssemblyTranslation2.htm Example code generated for array indexing if structure size is divisible by powers of 2 or otherwise.
- http://www.rmbconsulting.us/Publications/PointerToFunction.pdf "Arrays of Pointers to Functions" by Nigel Jones