Judy array
In computer science and software engineering, a Judy array is a data structure that implements an associative array with high performance and low memory usage. Unlike normal arrays, Judy arrays may be sparse; that is, they may have large ranges of unassigned indices. They can be used for storing and looking up values using integer or string keys. The key benefits of Judy arrays are scalability, high performance, memory efficiency and ease of use.
Judy arrays are both speed- and memory-efficient and require no tuning or configuration, so they can often replace common data structures (skip lists, linked lists, binary and ternary trees, B-trees, hash tables), and they tend to work better with very large data sets.
Roughly speaking, a Judy array is similar to a highly optimised 256-ary trie. To keep memory consumption small, Judy arrays use over 20 different compression techniques to compress trie nodes.
The Judy array was invented by Douglas Baskins and named after his sister.
Terminology
The terms expanse, population and density are commonly used when discussing Judy arrays. As they are not commonly used in tree search literature, it is important to define them:
- Expanse is a range of possible keys, e.g. 200, 300, etc.
- Population is the count of keys contained in an expanse, e.g. the keys 200, 360, 400, 512, 720 give a population of 5.
- Density describes the sparseness of an expanse of keys: Density = Population / Expanse.
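The three definitions above can be made concrete with a short calculation. The following is only an illustrative sketch: the function name `density` and the choice to measure an expanse as the number of possible keys in an inclusive range `[low, high]` are assumptions made for this example, not part of any Judy API.

```python
def density(keys, low, high):
    """Return population / expanse for the inclusive key range [low, high].

    Assumption: the expanse is measured as the number of possible
    integer keys in the range, i.e. high - low + 1.
    """
    if high < low:
        raise ValueError("expanse must contain at least one possible key")
    expanse = high - low + 1
    # Population: distinct stored keys that fall inside the expanse.
    population = sum(1 for k in set(keys) if low <= k <= high)
    return population / expanse


keys = [200, 360, 400, 512, 720]
print(density(keys, 0, 999))  # 5 keys in an expanse of 1000 possible keys
```

A low density such as this one indicates a sparse expanse, which is the case a plain pre-allocated array handles poorly.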
Memory allocation
Judy arrays are designed to be unbounded arrays, so their sizes are not pre-allocated. They grow and shrink the memory they use dynamically according to the population of the array, and they can scale to a large number of elements. Because memory is allocated dynamically as the array grows, a Judy array is bounded only by machine memory. The memory used by a Judy array is nearly proportional to the number of elements (population) it contains.
Speed
Judy arrays are designed to keep the number of processor cache-line fills as low as possible, and the algorithm is internally complex in an attempt to satisfy this goal as often as possible. Due to these cache optimizations, Judy arrays are fast, sometimes even faster than a hash table, especially for very big datasets. Despite being a type of trie, Judy arrays often consume less memory than hash tables. Also, because a Judy array is a trie, its keys can be traversed sequentially in sorted order, which hash tables do not support directly.
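To illustrate why a trie yields its keys in sorted order, here is a minimal sketch of a plain, uncompressed 256-ary trie. It is not the Judy implementation: it uses none of Judy's node compression or cache-line optimizations, and the class name `ByteTrie`, the constant `KEY_BYTES` and the fixed 32-bit key width are assumptions chosen for the example.

```python
KEY_BYTES = 4  # assumption: fixed-width 32-bit unsigned integer keys


class ByteTrie:
    """Uncompressed 256-ary trie; each key byte selects one of 256 branches."""

    def __init__(self):
        self.root = [None] * 256

    def insert(self, key, value):
        digits = key.to_bytes(KEY_BYTES, "big")  # most significant byte first
        node = self.root
        for digit in digits[:-1]:
            if node[digit] is None:
                node[digit] = [None] * 256  # allocate a branch only when needed
            node = node[digit]
        node[digits[-1]] = (value,)  # wrapped so a stored None stays distinguishable

    def get(self, key, default=None):
        digits = key.to_bytes(KEY_BYTES, "big")
        node = self.root
        for digit in digits[:-1]:
            node = node[digit]
            if node is None:
                return default
        slot = node[digits[-1]]
        return default if slot is None else slot[0]

    def items(self):
        """Yield (key, value) pairs in ascending key order."""
        def walk(node, prefix, depth):
            for digit in range(256):  # visiting branches in byte order sorts the keys
                child = node[digit]
                if child is None:
                    continue
                key = (prefix << 8) | digit
                if depth == KEY_BYTES - 1:
                    yield key, child[0]
                else:
                    yield from walk(child, key, depth + 1)
        return walk(self.root, 0, 0)


trie = ByteTrie()
for k in (720, 200, 70000, 512, 360):
    trie.insert(k, str(k))
print([k for k, _ in trie.items()])  # keys come back in ascending order
```

Because the most significant byte is consumed first and branches are visited from 0 to 255, the traversal order matches numeric key order regardless of insertion order. A hash table would need a separate sort step to produce the same sequence.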