data:image/s3,"s3://crabby-images/ca364/ca36463f9b82c90b6f3e27c6326cdcdc31617e4c" alt=""
Succinct data structure
Encyclopedia
In computer science
, a succinct data structure is data structure
which uses an amount of space that is "close" to the information-theoretic lower bound, but (unlike other compressed representations) still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees
, and planar graph
s. Unlike general lossless data compression
algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure
, in which the size of the data structure depends upon the particular data being represented.
Suppose that
is the information-theoretical optimal number of bits needed to store some data. A representation of this data is called
Implicit structures are thus usually reduced to storing information using some permutation of the input data; the most well-known example of this is the heap.
-ary trees and multisets, as well as suffix tree
s and arrays
. The basic problem is to store a subset
of a universe
, usually represented as a bit array
where
iff
. An indexable dictionary supports the usual methods on dictionaries (queries, and insertions/deletions in the dynamic case) as well as the following operations:
for
.
There is a simple representation which uses
bits of storage space (the original bit array and an
auxiliary structure) and supports rank and select in constant time. It uses an idea similar to that for range-minimum queries
; there are a constant number of recursions before stopping at a subproblem of a limited size. The bit array
is partitioned into large blocks of size
bits and small blocks of size
bits. For each large block, the rank of its first bit is stored in a separate table
; each such entry takes
bits for a total of
bits of storage. Within a large block, another directory
stores the rank of each of the
small blocks it contains. The difference here is that it only needs
bits for each entry, since only only the differences from the rank of the first bit in the containing large block need to be stored. Thus, this table takes a total of
bits. A lookup table
can then be used that stores the answer to every possible rank query on a bit string of length
for
; this requires
bits of storage space. Thus, since each of these auxiliary tables take
space, this data structure supports rank queries in
time and
bits of space.
To answer a query for
in constant time, a constant time algorithm computes
data:image/s3,"s3://crabby-images/6749c/6749ca5267624f610ffdff963882f9b4eed483aa" alt=""
In practice, the lookup table
can be replaced by bitwise operations and smaller tables to perform find the number of bits set in the small blocks. This is often beneficial, since succinct data structures find their uses in large data sets, in which case cache misses become much more frequent and the chances of the lookup table being evicted from closer CPU caches becomes higher. Select queries can be easily supported by doing a binary search on the same auxiliary structure used for rank; however, this takes
time in the worst case. A more complicated structure using
bits of additional storage can be used to support select in constant time. In practice, many of these solutions have hidden constants in the
notation which dominate before any asymptotic advantage becomes apparent; implementations using broadword operations and word-aligned blocks often perform better in practice.
space approach can be improved by noting that there are
distinct
-subsets of
(or binary strings of length
with exactly
1’s), and thus
is an information theoretic lower bound on the number of bits needed to store
. There is a succinct (static) dictionary which attains this bound, namely using
space. This structure can be extended to support rank and select queries and takes
space. This bound can be reduced to a space/time tradeoff by reducing the storage space of the dictionary to
with queries taking
time.
function can quickly determine where each item begins, given its index.
Another example is the representation of a binary tree
: an arbitrary binary tree on n nodes can be represented in
bits while supporting a variety of operations on any node, which includes finding its parent, its left and right child, and returning the size of its subtree, each in constant time. The number of different binary trees on
nodes is data:image/s3,"s3://crabby-images/10300/10300f627c9eb11ffdd1841e6435d002919705dd" alt=""
. For large
, this is about
; thus we need at least about
bits to encode it. A succinct binary tree therefore would occupy only
bits per node.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a succinct data structure is data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
which uses an amount of space that is "close" to the information-theoretic lower bound, but (unlike other compressed representations) still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...
, and planar graph
Planar graph
In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints...
s. Unlike general lossless data compression
Lossless data compression
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange...
algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure
Compressed data structure
The term compressed data structure arises in the computer science subfields of algorithms, data structures, and theoretical computer science. It refers to a data structure whose operations are roughly as fast as those of a conventional data structure for the problem, but whose size can be...
, in which the size of the data structure depends upon the particular data being represented.
Suppose that
data:image/s3,"s3://crabby-images/d25f2/d25f2a9486297c75b46d038b7a773a45fc394c75" alt=""
- implicitImplicit data structureIn computer science, an implicit data structure is a data structure that uses very little memory besides the actual data elements i.e. very little information other than main data is stored in these structures. These are storage schemes which retain no pointers and represent the file of n k-key...
if it takesbits of space,
- succinct if it takes
bits of space, and
- compact if it takes
bits of space.
Implicit structures are thus usually reduced to storing information using some permutation of the input data; the most well-known example of this is the heap.
Succinct dictionaries
Succinct indexable dictionaries, also called rank/select dictionaries, form the basis of a number of succinct representation techniques, including binary trees,data:image/s3,"s3://crabby-images/ce1fa/ce1fa5599a63f861c5a194bfcd81ee8b88fa0ab9" alt=""
Suffix tree
In computer science, a suffix tree is a data structure that presents the suffixes of a given string in a way that allows for a particularly fast implementation of many important string operations.The suffix tree for a string S is a tree whose edges are labeled with strings, such that each suffix...
s and arrays
Suffix array
In computer science, a suffix array is an array of integers giving the starting positions of suffixes of a string in lexicographical order.-Details:Consider the string...
. The basic problem is to store a subset
data:image/s3,"s3://crabby-images/e9f1c/e9f1c2179bb7715e69176f7fa7af59272081b954" alt=""
data:image/s3,"s3://crabby-images/a8acb/a8acbf86751373f8418525165b92ada0a7c73394" alt=""
data:image/s3,"s3://crabby-images/be7cf/be7cf1945b053bac68854493e2a45ab2d630057d" alt=""
data:image/s3,"s3://crabby-images/389b7/389b7ea73e8e25ed28aa79f85ae07232aa526aa6" alt=""
data:image/s3,"s3://crabby-images/a3e98/a3e98b7d80aced80979ed0cae76946994cc13c8f" alt=""
for
data:image/s3,"s3://crabby-images/e3a0d/e3a0df43377310bbc85a5c0d9ae207b7bd52eb4f" alt=""
There is a simple representation which uses
data:image/s3,"s3://crabby-images/319d6/319d6c47112c896bf98bcc2ac2b0bb626f30c74a" alt=""
data:image/s3,"s3://crabby-images/c0d10/c0d10b387dedcfa93fec5cd9a6dd69c56c688298" alt=""
Range Minimum Query
Given an array A[1,n] of n ordered objects , a Range Minimum Query from i to j asks for the position of a minimum element in the sub-array A[i,j]....
; there are a constant number of recursions before stopping at a subproblem of a limited size. The bit array
data:image/s3,"s3://crabby-images/aa93c/aa93c8a2430e5a71b6dd818c71f37cb51ab15674" alt=""
data:image/s3,"s3://crabby-images/ba7bf/ba7bf070dc8036ade18eb0a27223ffb6e245569c" alt=""
data:image/s3,"s3://crabby-images/63594/635944775f0e5dff1043603953095e6733ebfadc" alt=""
data:image/s3,"s3://crabby-images/e4a17/e4a173cadba462af7eb043aa79b7b2ea1995cdd0" alt=""
data:image/s3,"s3://crabby-images/1df0e/1df0ee4a5178f4428ce58667f9776dc62f5528e3" alt=""
data:image/s3,"s3://crabby-images/de252/de25255c6978b2a5ff21cb32edf72667d27a738f" alt=""
data:image/s3,"s3://crabby-images/0ccc9/0ccc97759c7dec073b783206cb2ddd59fd6b154d" alt=""
data:image/s3,"s3://crabby-images/4bb26/4bb261b2fcaa281c9299ea070248093944001311" alt=""
data:image/s3,"s3://crabby-images/13088/1308806eee70bc8eff501d073e55e9fd7d0ace4a" alt=""
data:image/s3,"s3://crabby-images/80e17/80e1709f4ed983ee5c52f7c4092598338353f724" alt=""
data:image/s3,"s3://crabby-images/7107b/7107be33cef9a44f24c9f8a48c810bf8ce591cf0" alt=""
data:image/s3,"s3://crabby-images/85374/8537454a6bb8514b7ecb44992e6b59dfde9df10a" alt=""
data:image/s3,"s3://crabby-images/9664a/9664a7e96804eb0854df69ebb321b0112ddfc109" alt=""
data:image/s3,"s3://crabby-images/2008f/2008fd7cd9b918f0d25876beb7be27b2c876023e" alt=""
data:image/s3,"s3://crabby-images/a1b60/a1b600d4795c2d75a778e0aae3322e41b7997136" alt=""
data:image/s3,"s3://crabby-images/7d9ad/7d9ad2ea3bf554ed2140122d417e9fcaf3f47203" alt=""
data:image/s3,"s3://crabby-images/ef110/ef11077e21f9a54953fd43d95ea33af3f1add609" alt=""
To answer a query for
data:image/s3,"s3://crabby-images/4df55/4df55de3333cb40349f9f007366c8a016db1d55e" alt=""
data:image/s3,"s3://crabby-images/6749c/6749ca5267624f610ffdff963882f9b4eed483aa" alt=""
In practice, the lookup table
data:image/s3,"s3://crabby-images/3e3e0/3e3e0b8f12fd9eced64f0850c582ab9ed344722e" alt=""
data:image/s3,"s3://crabby-images/bdffe/bdffec5b01353e1ebafc3a0b3c701abed9a22d12" alt=""
data:image/s3,"s3://crabby-images/a05fc/a05fc2375a6bfa13ca57378f6bf78fc0eb6d67a7" alt=""
data:image/s3,"s3://crabby-images/10791/1079142acffeb184af8f20f8ce96661654aa62a2" alt=""
Entropy-compressed dictionaries
Thedata:image/s3,"s3://crabby-images/1f53f/1f53f8034b417a19cb224f5f555b8764ca8e8e5c" alt=""
data:image/s3,"s3://crabby-images/5f57b/5f57b565356ede482514d25ef5354e41584b2116" alt=""
data:image/s3,"s3://crabby-images/7a189/7a1898e824ed0f10a5eb52ed343957e4f4a02c66" alt=""
data:image/s3,"s3://crabby-images/ccb5f/ccb5fbf926cbea513f0985dfe9daa612e648a3d6" alt=""
data:image/s3,"s3://crabby-images/484f0/484f0107bcf9cf205af77b4cc2ce9f31a53470e0" alt=""
data:image/s3,"s3://crabby-images/dd51b/dd51b4769be222c78e41a90b2a887cc6010b073e" alt=""
data:image/s3,"s3://crabby-images/ae660/ae660108884cbc943abba25b23b82673b92cb0d2" alt=""
data:image/s3,"s3://crabby-images/8b50f/8b50f039a7491aa3b3ad7e5b91f9416421d6f067" alt=""
data:image/s3,"s3://crabby-images/23150/23150310f21330092a2250c2dbb33f4145a58926" alt=""
data:image/s3,"s3://crabby-images/8cccc/8cccc545e03864c22f676681244ab6f145c5b87b" alt=""
data:image/s3,"s3://crabby-images/6ab2a/6ab2a60b682d2eafb993a5f492cda4b624257c7b" alt=""
data:image/s3,"s3://crabby-images/ff0a6/ff0a63eb4e41b682b79a34633de604f2b3d267a4" alt=""
Examples
When a sequence of variable-length items needs to be encoded, the items can simply be placed one after another, with no delimiters. A separate binary string consisting of 1s in the positions where an item begins, and 0s every where else is encoded along with it. Given this string, thedata:image/s3,"s3://crabby-images/cb39e/cb39e88b43d16a9a53419f1620020ee62bd440cf" alt=""
Another example is the representation of a binary tree
Binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to...
: an arbitrary binary tree on n nodes can be represented in
data:image/s3,"s3://crabby-images/c912f/c912fa7d731ce1c68611f8f329e5428000576b9e" alt=""
data:image/s3,"s3://crabby-images/30401/30401ef2e84d6a21918d14b03ca1b940c2b27f17" alt=""
data:image/s3,"s3://crabby-images/10300/10300f627c9eb11ffdd1841e6435d002919705dd" alt=""
data:image/s3,"s3://crabby-images/70438/70438ddd6b91e9958c5c41fa4b4fa390532da7e2" alt=""
data:image/s3,"s3://crabby-images/20516/20516a08340f0b2ec661f96ce65e64266c901019" alt=""
data:image/s3,"s3://crabby-images/44cc8/44cc8b59aebf9e88ed84e8b16e5e2993a7daea78" alt=""
data:image/s3,"s3://crabby-images/2db3c/2db3c48d55a896f7facdb738607b258da1cdcc00" alt=""
data:image/s3,"s3://crabby-images/92575/925753a03198ea9ac16bb5f1a1b1fc07b92ba622" alt=""