Voldemort (distributed data store)
Voldemort is a distributed data store that is designed as a key-value store used by LinkedIn for high-scalability storage. It is named after the fictional Harry Potter villain Lord Voldemort.
Voldemort is still under development. It is neither an object database nor a relational database. It does not try to satisfy arbitrary relations or the ACID properties. Instead, it is a big, distributed, fault-tolerant, persistent hash table.
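To a client, a "persistent hash table" reduces to a small set of operations on keys. The sketch below shows that shape, assuming a get/put/delete interface over byte strings. The class and method names are hypothetical and are not Voldemort's actual client API. Callers depend only on the interface, so a throw-away dict-backed implementation can stand in for a real cluster.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical interface: the operations a hash-table-style store exposes."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]:
        """Return the value stored under key, or None if the key is absent."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None:
        """Store value under key, replacing any previous value."""

    @abstractmethod
    def delete(self, key: bytes) -> bool:
        """Remove key and return True if it was present."""


class InMemoryStore(KeyValueStore):
    """Illustrative dict-backed implementation (no persistence, no distribution)."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> bool:
        if key in self._data:
            del self._data[key]
            return True
        return False
```

For example, `store = InMemoryStore()` followed by `store.put(b"member:42", b"Ada")` makes `store.get(b"member:42")` return `b"Ada"`. Application code that accepts any `KeyValueStore` never needs to know which implementation it received.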
Advantages
Voldemort offers a number of advantages over other databases:
- It combines in-memory caching with the storage system, so a separate caching tier is not required (instead, the storage system itself is fast)
- The storage layer can be emulated because it is completely mockable. This makes development and unit testing easy, since both can be done against a throw-away in-memory storage system without a real cluster or real storage system
- Reads and writes scale horizontally
- Simple API: The API decides data replication and placement and accommodates a wide range of application-specific strategies
- Transparent data partitioning: This allows the cluster to be expanded without rebalancing all of the data
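Expanding a cluster without rebalancing all data is commonly achieved with consistent hashing, the technique used in Amazon's Dynamo design, on which Voldemort draws. The sketch below is generic consistent hashing, not Voldemort's own partitioning code, and every name in it is hypothetical. Each node is hashed to several points on a ring, and a key belongs to the first node point clockwise from the key's hash. When a node joins, only keys that fall just before its new points change owner.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # Map a string to a position on a 2**32 ring; MD5 is stable across runs, unlike hash().
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Hypothetical consistent-hash ring with several virtual points per node."""

    def __init__(self, nodes: List[str], points_per_node: int = 64) -> None:
        self._points_per_node = points_per_node
        self._ring: List[int] = []        # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self._points_per_node):
            pos = _hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # skip a (rare) collision instead of stealing another node's point
            self._owner[pos] = node
            bisect.insort(self._ring, pos)

    def node_for(self, key: str) -> str:
        # The first ring position clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

In a quick experiment, place 10,000 keys on a three-node ring and then add a fourth node. Only the keys taken over by the new node move, roughly a quarter of them. Naive `hash(key) % node_count` placement would instead reassign most keys.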
Properties
The Voldemort distributed data store has the following properties:
- Data placement: Pluggable data placement strategies are supported, for example to distribute data across data centers that are far apart.
- Data replication: The data is automatically replicated over multiple servers.
- Data partitioning: The data is automatically partitioned so that each server contains only a subset of the total data
- Good single node performance: A single node can handle 10-20k operations per second, depending on the machines, the network, the disk system, and the data replication factor
- Node independence: Each node is independent of the other nodes, with no central point of failure or coordination
- Pluggable serialization: This allows rich keys and values, including lists and tuples with named fields, as well as integration with common serialization frameworks such as Avro, Java Serialization, Protocol Buffers, and Thrift
- Transparent failures: Server failures are handled transparently so that the user doesn't see such problems
- Versioning: The data items are versioned to maximize data integrity in case of failure without compromising availability of the system
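Replication and placement can be illustrated with a "preference list": the ordered set of distinct servers that should hold copies of a given key. The sketch below assumes a simple ring with one position per node and a replication factor `replicas`. Starting at the key's position, it walks clockwise and takes the next `replicas` nodes. The function names are hypothetical, and the code is not Voldemort's routing implementation.

```python
import bisect
import hashlib
from typing import List


def ring_position(value: str) -> int:
    # Stable 32-bit position on the hash ring.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def preference_list(key: str, nodes: List[str], replicas: int) -> List[str]:
    """Return `replicas` distinct nodes that should store copies of `key`.

    Assumes node names are unique, so each node occupies exactly one ring position.
    """
    if replicas > len(nodes):
        raise ValueError("replication factor exceeds cluster size")
    ring = sorted((ring_position(n), n) for n in nodes)
    positions = [pos for pos, _ in ring]
    # Index of the first node clockwise from the key's position (wrapping around).
    start = bisect.bisect_right(positions, ring_position(key))
    return [ring[(start + i) % len(ring)][1] for i in range(replicas)]
```

Because the list depends only on the key and the cluster membership, any client can compute it independently. This is consistent with having no central coordinator. With `replicas=3`, losing any single server still leaves two copies reachable.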
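Dynamo-style stores such as Voldemort version data items with vector clocks, which keep one counter per node that has written the item. The sketch below assumes that scheme and uses hypothetical names. Comparing two clocks tells whether one version supersedes the other or whether they were written concurrently. The store can then keep both concurrent versions for later reconciliation rather than rejecting a write, preserving integrity without sacrificing availability.

```python
from typing import Dict

# A vector clock maps a node name to the number of writes that node has coordinated.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` recording one more write coordinated by `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if version `a` has seen every write recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two versions as 'equal', 'after', 'before' or 'concurrent'."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"
    if b_covers_a:
        return "before"
    return "concurrent"  # neither version supersedes the other: a conflict to resolve
```

Suppose `v1 = increment({}, "A")` and node A writes again, giving `v2 = increment(v1, "A")`. Meanwhile node B updates the original, giving `v3 = increment(v1, "B")`. Then `compare(v2, v1)` is `"after"`, but `compare(v2, v3)` is `"concurrent"`, so both versions must be retained.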