Substring index
Encyclopedia
A substring index is a data structure
which gives substring
search in a text or text collection in sublinear time. If you have a document of length , or a set of documents of total length , you can locate all occurrences of a pattern in time. ( means less than . See Big O notation
.)
The phrase full-text index is also often used for an index of all substrings of a text. But is ambiguous, as it is also used for regular word indexes such as inverted files and signature files. See full text search
.
Substring indexes include:
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
which gives substring
Substring
A subsequence, substring, prefix or suffix of a string is a subset of the symbols in a string, where the order of the elements is preserved...
search in a text or text collection in sublinear time. If you have a document of length , or a set of documents of total length , you can locate all occurrences of a pattern in time. ( means less than . See Big O notation
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...
.)
The phrase full-text index is also often used for an index of all substrings of a text. But is ambiguous, as it is also used for regular word indexes such as inverted files and signature files. See full text search
Full text search
In text retrieval, full text search refers to techniques for searching a single computer-stored document or a collection in a full text database...
.
Substring indexes include:
- Suffix treeSuffix treeIn computer science, a suffix tree is a data structure that presents the suffixes of a given string in a way that allows for a particularly fast implementation of many important string operations.The suffix tree for a string S is a tree whose edges are labeled with strings, such that each suffix...
- Suffix arraySuffix arrayIn computer science, a suffix array is an array of integers giving the starting positions of suffixes of a string in lexicographical order.-Details:Consider the string...
- N-gram index, an inverted file for all N-gramN-gramIn the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items in question can be phonemes, syllables, letters, words or base pairs according to the application...
s of the text - Compressed suffix arrayCompressed suffix arrayThe Compressed Suffix Array is a compressed data structure for pattern matching. Given a text T of n characters from an alphabet Σ, the compressed suffix array support searching for arbitrary patterns in T...
- FM-indexFM-indexThe FM-index is type of Substring index based on the Burrows-Wheeler transform,with some similarities to the Suffix Array.It is named after the creators of the algorithm, Paolo Ferragina and Giovanni Manzini, who describe it as an opportunistic data structure as it allows compression of the input...
- LZ-index