Distributed data store
A distributed data store is a loosely defined concept. It means either a distributed database, in which users store their information on a number of nodes, or a network in which a user stores their information on a number of peer network nodes.

Distributed databases

Distributed data stores are non-relational databases that make quick access to data over a large number of nodes possible. Examples of this kind of data store are Google's BigTable, which is much more than a distributed file system or a peer-to-peer network, Amazon's Dynamo, and Windows Azure Storage.
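As a rough illustration of how data can be spread over a number of nodes, the sketch below assigns each key to a node by hashing the key. It is a minimal example under stated assumptions, not the scheme used by any of the systems named above: the node names, the keys, and the function `node_for_key` are all hypothetical.

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick the node responsible for `key` by hashing the key.

    The same key always maps to the same node, so any client can
    locate the data without asking a central directory.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap
    # it onto the list of nodes.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical node names and keys, for illustration only.
nodes = ["node-a", "node-b", "node-c"]
placement = {key: node_for_key(key, nodes) for key in ["user:1", "user:2", "user:3"]}
```

A drawback of this simple modulo scheme is that adding or removing a node changes the placement of most keys, which is one reason real systems tend to use more elaborate partitioning.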

Because the ability to run arbitrary queries is not as important as availability, designers of distributed data stores have increased availability at the expense of consistency. High-speed read/write access results in reduced consistency, because it is not possible to guarantee consistency, availability, and partition tolerance of the network all at once, as proven by the CAP theorem.
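The trade-off between availability and consistency can be sketched with a toy model. In this sketch a write is accepted at once by a single replica and passed on to the others later, so a read from another replica may return stale data in the meantime. The classes `Replica` and `EventuallyConsistentStore` and all their methods are hypothetical and do not model any particular product.

```python
class Replica:
    """One copy of the data, held by a single node."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store that favours availability over consistency."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        # Updates accepted by one replica but not yet applied to the others.
        self.pending = []

    def write(self, key, value, via=0):
        # The contacted replica answers immediately (high availability)...
        self.replicas[via].data[key] = value
        # ...and the update is only queued for the remaining replicas.
        for index in range(len(self.replicas)):
            if index != via:
                self.pending.append((index, key, value))

    def read(self, key, via=0):
        # A read is served from whichever replica is contacted.
        return self.replicas[via].data.get(key)

    def sync(self):
        # Deliver all queued updates, bringing the replicas back in line.
        for index, key, value in self.pending:
            self.replicas[index].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["r0", "r1"])
store.write("cart", "book", via=0)
stale = store.read("cart", via=1)  # read before the update has propagated
store.sync()
fresh = store.read("cart", via=1)  # read after the update has propagated
```

Here `stale` is `None` and `fresh` is `"book"`: both replicas stayed able to answer at every step, but for a while they disagreed.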

Peer network node data stores

In peer network data stores, the user can usually reciprocate and allow other users to use their computer as a storage node as well. Information may or may not be accessible to other users depending on the design of the network.

Most peer-to-peer networks do not have distributed data stores, in that a user's data is only available when their node is on the network. However, this distinction is somewhat blurred in a system such as BitTorrent, where the originating node can go offline while the content continues to be served. Still, this is only the case for individual files requested by the redistributors, in contrast with a network such as Freenet, where all computers are made available to serve all files.

Distributed data stores typically use an error detection and correction technique. Some distributed data stores (such as Parchive over NNTP) use forward error correction techniques to recover the original file when parts of that file are damaged or unavailable. Others try again to download that file from a different mirror.
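The idea behind forward error correction can be shown with a single XOR parity block, which is enough to rebuild any one missing block. This is a deliberately simplified stand-in: Parchive itself uses Reed-Solomon codes, which can repair several missing blocks. The function names and the sample data below are assumptions made for this sketch.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute one parity block to store alongside the data blocks."""
    return xor_blocks(data_blocks)


def recover_missing(blocks, parity):
    """Rebuild the single block marked as None, using the parity block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    present = [block for block in blocks if block is not None]
    repaired = list(blocks)
    # XOR of all surviving blocks and the parity yields the lost block.
    repaired[missing[0]] = xor_blocks(present + [parity])
    return repaired


# Hypothetical file split into three equal-length blocks.
data = [b"dist", b"ribu", b"ted!"]
parity = make_parity(data)

# Simulate the second block being damaged or unavailable.
damaged = [data[0], None, data[2]]
restored = recover_missing(damaged, parity)
```

The receiver never has to ask for the lost block again, which is what distinguishes this approach from retrying the download from another mirror.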

Distributed non-relational databases

  • Apache Cassandra, the data store of Facebook
  • BigTable, the data store of Google
  • Dynamo of Amazon
  • HBase
  • Riak
  • Voldemort

Peer network node data stores

  • BitTorrent
  • Chord project
  • GNUnet
  • Freenet
  • NNTP (the distributed data storage protocol used for Usenet news)
  • Mnet
  • Storage@home

See also

  • Data store
  • Distributed file system
  • Keyspace, the DDS schema
  • Peer-to-peer
  • Distributed hash table

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 