Off-by-one error
Encyclopedia
An off-by-one error is a logical error involving the discrete equivalent of a boundary condition. It often occurs in computer programming
when an iterative loop iterates one time too many or too few. Usually this problem arises when a programmer fails to take into account that a sequence starts at zero rather than one (as with array indices in many languages), or makes mistakes such as using "is less than or equal to" where "is less than" should have been used in a comparison. This can also occur in a mathematical
context.
of items, and items m through n (inclusive) are to be processed. How many items are there? An intuitive answer may be n−m, but that is off by one, exhibiting a fencepost error; the correct answer is n−m+1.
For this reason, ranges in computing are often represented by half-open intervals; the range from m to n (inclusive) is represented by the range from m (inclusive) to n+1 (exclusive) to avoid fencepost errors. For example, a loop that iterates five times can be written as a half-open interval from 0 to 5:
The loop body is executed first of all with i equal to 0; i then becomes 1, 2, 3, and finally 4 on successive iterations. At that point, i becomes 5, so i < 5 is false and the loop ends. However, if the comparison used were <= (less than or equal to), the loop would be carried out six times: i takes the values 0, 1, 2, 3, 4, and 5. Likewise, if i were initialized to 1 rather than 0, there would only be four iterations: i takes the values 1, 2, 3, and 4. Both of these alternatives can cause off-by-one errors.
Another such error can occur if a do-while loop is used in place of a while loop
(or vice versa.) A do-while loop is guaranteed to run at least once.
Array-related confusion may also result from differences in programming languages. Numbering from 0 is most common, but some languages start array numbering with 1. Pascal
has arrays with user-defined indices. This makes it possible to model the array indices after the problem domain.
The intuitive answer 10 is wrong. The fence has 10 sections, but 11 posts.
The reverse error occurs when the number of posts is known and the number of sections is assumed to be the same. The actual number of sections is one fewer than the number of posts. A similar problem involves a shoe with ten pairs of eyelets. When calculating the total length of shoelace required, the length needed to go from one pair of eyelets to the next higher pair of eyelets should be counted only nine times and not ten.
More generally, the problem can be stated as follows:
The correct answer may be n - 1, n or n + 1, depending on the conditions. The precise problem definition must be carefully considered, as the setup for one scenario may give the wrong answer for other scenarios.
Fencepost errors can also occur in units other than length. For example, the Time Pyramid
, consisting of 120 blocks placed at 10 year intervals between blocks, is scheduled to take 1190 (not 1200) years to build, from the installation of the first block to the last block.
"Fencepost error" can, in rare occasions, refer to an error induced by unexpected regularities in input values, which can (for instance) completely thwart a theoretically efficient binary tree
or hash function
implementation. This error involves the difference between expected and worst case behaviours of an algorithm
.
Off-by-one errors are common in using the C library because it is not consistent with respect to whether one needs to subtract 1 byte -- functions like
On some systems (little endian
architectures in particular) this can result in the overwriting of the least significant byte of the frame pointer. This can cause an exploitable condition where an attacker can hijack the local variables for the calling routine.
One approach that often helps avoid such problems is to use variants of these functions that calculate how much to write based on the total length of the buffer, rather than the maximum number of characters to write. Such functions include
Computer programming
Computer programming is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a program that performs specific operations or exhibits a...
when an iterative loop iterates one time too many or too few. Usually this problem arises when a programmer fails to take into account that a sequence starts at zero rather than one (as with array indices in many languages), or makes mistakes such as using "is less than or equal to" where "is less than" should have been used in a comparison. This can also occur in a mathematical
Mathematics
Mathematics is the study of quantity, space, structure, and change. Mathematicians seek out patterns and formulate new conjectures. Mathematicians resolve the truth or falsity of conjectures by mathematical proofs, which are arguments sufficient to convince other mathematicians of their validity...
context.
Looping over arrays
Consider an arrayArray data type
In computer science, an array type is a data type that is meant to describe a collection of elements , each selected by one or more indices that can be computed at run time by the program. Such a collection is usually called an array variable, array value, or simply array...
of items, and items m through n (inclusive) are to be processed. How many items are there? An intuitive answer may be n−m, but that is off by one, exhibiting a fencepost error; the correct answer is n−m+1.
For this reason, ranges in computing are often represented by half-open intervals; the range from m to n (inclusive) is represented by the range from m (inclusive) to n+1 (exclusive) to avoid fencepost errors. For example, a loop that iterates five times can be written as a half-open interval from 0 to 5:
The loop body is executed first of all with i equal to 0; i then becomes 1, 2, 3, and finally 4 on successive iterations. At that point, i becomes 5, so i < 5 is false and the loop ends. However, if the comparison used were <= (less than or equal to), the loop would be carried out six times: i takes the values 0, 1, 2, 3, 4, and 5. Likewise, if i were initialized to 1 rather than 0, there would only be four iterations: i takes the values 1, 2, 3, and 4. Both of these alternatives can cause off-by-one errors.
Another such error can occur if a do-while loop is used in place of a while loop
While loop
In most computer programming languages, a while loop is a control flow statement that allows code to be executed repeatedly based on a given boolean condition. The while loop can be thought of as a repeating if statement....
(or vice versa.) A do-while loop is guaranteed to run at least once.
Array-related confusion may also result from differences in programming languages. Numbering from 0 is most common, but some languages start array numbering with 1. Pascal
Pascal (programming language)
Pascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring.A derivative known as Object Pascal...
has arrays with user-defined indices. This makes it possible to model the array indices after the problem domain.
Fencepost error
A fencepost error (occasionally called a "telegraph pole" or "lamp-post" error) is a specific type of off-by-one error. The following problem illustrates the error:The intuitive answer 10 is wrong. The fence has 10 sections, but 11 posts.
The reverse error occurs when the number of posts is known and the number of sections is assumed to be the same. The actual number of sections is one fewer than the number of posts. A similar problem involves a shoe with ten pairs of eyelets. When calculating the total length of shoelace required, the length needed to go from one pair of eyelets to the next higher pair of eyelets should be counted only nine times and not ten.
More generally, the problem can be stated as follows:
The correct answer may be n - 1, n or n + 1, depending on the conditions. The precise problem definition must be carefully considered, as the setup for one scenario may give the wrong answer for other scenarios.
Fencepost errors can also occur in units other than length. For example, the Time Pyramid
Time pyramid
The Time pyramid is a work of public art under construction in Wemding, Germany. The pyramid, begun in 1993, at the 1200-year anniversary of Wemding, will take almost another 1200 years to complete and is scheduled to be finished in the year 3183.-History:...
, consisting of 120 blocks placed at 10 year intervals between blocks, is scheduled to take 1190 (not 1200) years to build, from the installation of the first block to the last block.
"Fencepost error" can, in rare occasions, refer to an error induced by unexpected regularities in input values, which can (for instance) completely thwart a theoretically efficient binary tree
Binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to...
or hash function
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...
implementation. This error involves the difference between expected and worst case behaviours of an algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
.
Security implications
A common off-by-one error which results in a security related bug is caused by misuse of the libcstrncat
routine. A common misconception with strncat
is that the guaranteed null termination will not write beyond the maximum length. In reality it will write a terminating null character one byte beyond the maximum length specified. The following code contains such a bug:Off-by-one errors are common in using the C library because it is not consistent with respect to whether one needs to subtract 1 byte -- functions like
fgets
and strncpy
will never write past the length given them (fgets
subtracts 1 itself, and only retrieves (length - 1) bytes), whereas others, like strncat
will write past the length given them. So the programmer has to remember for which functions he or she needs to subtract 1.On some systems (little endian
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...
architectures in particular) this can result in the overwriting of the least significant byte of the frame pointer. This can cause an exploitable condition where an attacker can hijack the local variables for the calling routine.
One approach that often helps avoid such problems is to use variants of these functions that calculate how much to write based on the total length of the buffer, rather than the maximum number of characters to write. Such functions include
strlcat
and strlcpy
, and are often considered "safer" because they make it easier to avoid accidentally writing past the end of a buffer. (In the code example above, calling strlcat(buf, s, sizeof(buf))
instead would remove the bug.)