Linear search
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, linear search or sequential search is a method for finding a particular value in a list, that consists of checking every one of its elements, one at a time and in sequence, until the desired one is found.

Linear search is the simplest search algorithm
Search algorithm
In computer science, a search algorithm is an algorithm for finding an item with specified properties among a collection of items. The items may be stored individually as records in a database; or may be elements of a search space defined by a mathematical formula or procedure, such as the roots...

; it is a special case of brute-force search
Brute-force search
In computer science, brute-force search or exhaustive search, also known as generate and test, is a trivial but very general problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem's...

. Its worst case cost is proportional to the number of elements in the list; and so is its expected cost
Average-case complexity
Average-case complexity is a subfield of computational complexity theory that studies the complexity of algorithms on random inputs.The study of average-case complexity has applications in the theory of cryptography....

, if all list elements are equally likely to be searched for. Therefore, if the list has more than a few elements, other methods (such as binary search or hashing
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

) will be faster, but they also impose additional requirements.

Analysis

For a list with n items, the best case is when the value is equal to the first element of the list, in which case only one comparison is needed. The worst case is when the value is not in the list (or occurs only once at the end of the list), in which case n comparisons are needed.

If the value being sought occurs k times in the list, and all orderings of the list are equally likely, the expected number of comparisons is


For example, if the value being sought occurs once in the list, and all orderings of the list are equally likely, the expected number of comparisons is . However, if it is known that it occurs once, then at most n - 1 comparisons are needed, and the expected number of comparisons is

(for example, for n = 2 this is 1, corresponding to a single if-then-else construct).

Either way, asymptotically the worst-case cost and the expected cost of linear search are both O
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

(n).

Non-uniform probabilities

The performance of linear search improves if the desired value is more likely to be near the beginning of the list than to its end. Therefore, if some values are much more likely to be searched than others, it is desirable to place them at the beginning of the list.

In particular, when the list items are arranged in order of decreasing probability, and these probabilities are geometrically distributed, the cost of linear search is only O(1). If the table size n is large enough, linear search will be faster than binary search, whose cost is O(log n).

Application

Linear search is usually very simple to implement, and is practical when the list has only a few elements, or when performing a single search in an unordered list.

When many values have to be searched in the same list, it often pays to pre-process the latter in order to use a faster method. For example, one may sort the list and use binary search, or build any efficient search data structure from it. Should the content of the list change frequently, repeated re-organisation may be more trouble than it is worth.

Forward iteration

The following pseudocode
Pseudocode
In computer science and numerical computation, pseudocode is a compact and informal high-level description of the operating principle of a computer program or other algorithm. It uses the structural conventions of a programming language, but is intended for human reading rather than machine reading...

 describes a typical variant of linear search, where the result of the search is supposed to be either the location of the list item where the desired value was found; or an invalid location Λ, to indicate that the desired element does not occur in the list.

For each item in the list:
if that item has the desired value,
stop the search and return the item's location.
Return Λ.

In this pseudocode, the last line is executed only after all list items have been examined with none matching.

If the list is stored as an array data structure, the location may be the index of the item found (usually between 1 and n, or 0 and n−1). In that case the invalid location Λ can be any index before the first element (such as 0 or −1, respectively) or after the last one (n+1 or n, respectively).

If the list is a simply linked list
Linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

, then the item's location is its reference
Reference (computer science)
In computer science, a reference is a value that enables a program to indirectly access a particular data item, such as a variable or a record, in the computer's memory or in some other storage device. The reference is said to refer to the data item, and accessing those data is called...

, and Λ is usually the null pointer.

Recursive version

Linear search can also be described as a recursive algorithm
Recursion
Recursion is the process of repeating items in a self-similar way. For instance, when the surfaces of two mirrors are exactly parallel with each other the nested images that occur are a form of infinite recursion. The term has a variety of meanings specific to a variety of disciplines ranging from...

:

LinearSearch(value, list)
if the list is empty, return Λ;
else
if the first item of the list has the desired value, return its location;
else return LinearSearch(value, remainder of the list)

Searching in reverse order

Linear search in an array is usually programmed by stepping up an index variable until it reaches the last index. This normally requires two comparison instructions for each list item: one to check whether the index has reached the end of the array, and another one to check whether the item has the desired value. In many computers, one can reduce the work of the first comparison by scanning the items in reverse order.

Suppose an array A with elements indexed 1 to n is to be searched for a value x. The following
pseudocode performs a forward search, returning n + 1 if the value is not found:

Set i to 1.
Repeat this loop:
If i > n, then exit the loop.
If A[i] = x, then exit the loop.
Set i to i + 1.
Return i.

The following pseudocode searches the array in the reverse order, returning 0 when the element is not found:

Set i to n.
Repeat this loop:
If i ≤ 0, then exit the loop.
If A[i] = x, then exit the loop.
Set i to i − 1.
Return i.

Most computers have a conditional branch instruction that tests the sign of a value in a register, or the sign of the result of the most recent arithmetic operation. One can use that instruction, which is usually faster than a comparison against some arbitrary value (requiring a subtraction), to implement the command "If i ≤ 0, then exit the loop".

This optimization is easily implemented when programming in machine or assembly language
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...

. That branch instruction is not directly accessible in typical high-level programming languages, although many compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

s will be able to perform that optimization on their own.

Using a sentinel

Another way to reduce the overhead is to eliminate all checking of the loop index. This can be done by inserting the desired item itself as a sentinel value
Sentinel value
In computer programming, a sentinel value is a special value whose presence guarantees termination of a loop that processes structured data...

at the far end of the list, as in this pseudocode:

Set A[n + 1] to x.
Set i to 1.
Repeat this loop:
If A[i] = x, then exit the loop.
Set i to i + 1.
Return i.

With this stratagem, it is not necessary to check the value of i against the list length n: even if x was not in A to begin with, the loop will terminate when i = n + 1. However this method is possible only if the array slot A[n + 1] exists but is not being otherwise used. Similar arrangements could be made if the array were to be searched in reverse order, and element A(0) were available.

Although the effort avoided by these ploys is tiny, it is still a significant component of the overhead of performing each step of the search, which is small. Only if many elements are likely to be compared will it be worthwhile considering methods that make fewer comparisons but impose other requirements.

Linear search on an ordered list

For ordered lists that must be accessed sequentially, such as linked lists or files with variable-length records lacking an index, the average performance can be improved by giving up at the first element which is greater than the unmatched target value, rather than examining the entire list.

If the list is stored as an ordered array, then binary search is almost always more efficient than linear search as with n > 8, say, unless there is some reason to suppose that most searches will be for the small elements near the start of the sorted list.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK