Substring index
Encyclopedia
A substring index is a data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

 which gives substring
Substring
A subsequence, substring, prefix or suffix of a string is a subset of the symbols in a string, where the order of the elements is preserved...

 search in a text or text collection in sublinear time. If you have a document of length , or a set of documents of total length , you can locate all occurrences of a pattern in time. ( means less than . See Big O notation
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

.)

The phrase full-text index is also often used for an index of all substrings of a text. But is ambiguous, as it is also used for regular word indexes such as inverted files and signature files. See full text search
Full text search
In text retrieval, full text search refers to techniques for searching a single computer-stored document or a collection in a full text database...

.

Substring indexes include:
  • Suffix tree
    Suffix tree
    In computer science, a suffix tree is a data structure that presents the suffixes of a given string in a way that allows for a particularly fast implementation of many important string operations.The suffix tree for a string S is a tree whose edges are labeled with strings, such that each suffix...

  • Suffix array
    Suffix array
    In computer science, a suffix array is an array of integers giving the starting positions of suffixes of a string in lexicographical order.-Details:Consider the string...

  • N-gram index, an inverted file for all N-gram
    N-gram
    In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items in question can be phonemes, syllables, letters, words or base pairs according to the application...

    s of the text
  • Compressed suffix array
    Compressed suffix array
    The Compressed Suffix Array is a compressed data structure for pattern matching. Given a text T of n characters from an alphabet Σ, the compressed suffix array support searching for arbitrary patterns in T...

  • FM-index
    FM-index
    The FM-index is type of Substring index based on the Burrows-Wheeler transform,with some similarities to the Suffix Array.It is named after the creators of the algorithm, Paolo Ferragina and Giovanni Manzini, who describe it as an opportunistic data structure as it allows compression of the input...

  • LZ-index
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK