Generalised suffix tree
Encyclopedia
A generalised suffix tree is a suffix tree
Suffix tree
In computer science, a suffix tree is a data structure that presents the suffixes of a given string in a way that allows for a particularly fast implementation of many important string operations.The suffix tree for a string S is a tree whose edges are labeled with strings, such that each suffix...

 for a set of strings
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

. Given the set of strings of total length , it is a Patricia tree containing all suffixes of the strings. It is mostly used in bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...

.

Functionality

It can be built in time and space, and can be used to find all occurrences of a string of length in time, which is asymptotically optimal
Asymptotically optimal
In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor worse than the best possible algorithm...

 (assuming the size of the alphabet is constant, see page 119).

When constructing such a tree, each string should be padded with a unique out-of-alphabet marker symbol (or string) to ensure no suffix is a substring of another, guaranteeing each suffix is represented by a unique leaf node.

Algorithms for constructing a GST include Ukkonen's algorithm
Ukkonen's algorithm
In 1995, Esko Ukkonen proposed a linear-time, online algorithm for constructing suffix trees that has come to be known as Ukkonen's algorithm.The algorithm begins with an implicit suffix tree containing the first character of the string. Then it steps through the string adding successive characters...

 and McCreight's algorithm.

Example

A suffix tree for the strings ABAB and BABA is shown in a figure above. They are padded with the unique terminator strings $0 and $1. The numbers in the leaf nodes are string number and starting position. Notice how a left to right traversal of the leaf nodes corresponds to the sorted order of the suffixes. The terminators might be strings or unique single symbols. Edges on $ from the root are left out in this example.

Alternatives

An alternative to building a generalised suffix tree is to concatenate the strings, and build a regular suffix tree or suffix array
Suffix array
In computer science, a suffix array is an array of integers giving the starting positions of suffixes of a string in lexicographical order.-Details:Consider the string...

for the resulting string. When hits are evaluated after a search, global positions are mapped into documents and local positions with some algorithm and/or data structure, such as a binary search in the starting/ending positions of the documents.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK