Terrastore
Encyclopedia
Terrastore is a distributed, scalable and consistent document store
supporting single-cluster and multi-cluster deployments.
It provides advanced scalability support and elasticity feature without loosening the consistency at data level. Terrastore
provides ubiquity by using universally supported HTTP protocol
Data is partitioned
and distributed
among the nodes in the cluster(s) with automatic and transparent re-balancing when nodes join and leave. Moreover, it distributes the computational load for operations like queries and updates to the nodes that
actually hold the data. In this way Terrastore facilitates with scalability at both data and computational layers.
Terrastore employs Terracotta clustering software . Terracotta is used as a distributed lock manager for locking single document access during write operations, as an intra-cluster group membership service, and for durable document storage (and replication).
which is stored in documents and buckets which are analogous to table row and table correspondingly
in relational DBs. Data (documents and buckets) is partitioned according to the consistent hashing schema
and is distributed on different Terrastore servers.
Terrastore servers.
Master is responsible for managing the cluster membership: hence it notifies when the servers join/leave, changing the group
view. In addition to this membership management, Master is also responsible to durably store the whole documents. It is also
responsible for replicating the data to server nodes but it does not partition the data itself and partitioning strategy is
decided by the server nodes which is either the default consistent hashing or a user defined one. Replication is a pull strategy
performed by server nodes from the master node. Hence each server requests its own partition from the master. All the writes go
through the master but only the first read request goes through the master and later requests will be read from the server memory.
Each server owns a partition to which a number of documents are mapped. Each document is only own by one server node. If a request
is sent to server that does not own the document, then the request is routed to the corresponding server. All the write requests go
to both the server that owns the document and the master node.
The role of ensemble is to join multiple clusters and make them work together. It provides better scalability by providing multiple
active masters. It also facilitates the whole system partition-tolerance behavior. Thus in the case of partitioning the data will
be available locally but it can not be seen by other clusters except the cluster owns the data.
Document-oriented database
A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented, or semi structured data, information...
supporting single-cluster and multi-cluster deployments.
It provides advanced scalability support and elasticity feature without loosening the consistency at data level. Terrastore
provides ubiquity by using universally supported HTTP protocol
Data is partitioned
Partition (database)
A partition is a division of a logical database or its constituting elements into distinct independent parts. Database partitioning is normally done for manageability, performance or availability reasons....
and distributed
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...
among the nodes in the cluster(s) with automatic and transparent re-balancing when nodes join and leave. Moreover, it distributes the computational load for operations like queries and updates to the nodes that
actually hold the data. In this way Terrastore facilitates with scalability at both data and computational layers.
Terrastore employs Terracotta clustering software . Terracotta is used as a distributed lock manager for locking single document access during write operations, as an intra-cluster group membership service, and for durable document storage (and replication).
Data Model
Data model is pure JSONJSON
JSON , or JavaScript Object Notation, is a lightweight text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects...
which is stored in documents and buckets which are analogous to table row and table correspondingly
in relational DBs. Data (documents and buckets) is partitioned according to the consistent hashing schema
and is distributed on different Terrastore servers.
Building Blocks and Architecture
Terrastore system consists of an ensemble of clusters that in each cluster can exist one Terrastore master and severalTerrastore servers.
Master is responsible for managing the cluster membership: hence it notifies when the servers join/leave, changing the group
view. In addition to this membership management, Master is also responsible to durably store the whole documents. It is also
responsible for replicating the data to server nodes but it does not partition the data itself and partitioning strategy is
decided by the server nodes which is either the default consistent hashing or a user defined one. Replication is a pull strategy
performed by server nodes from the master node. Hence each server requests its own partition from the master. All the writes go
through the master but only the first read request goes through the master and later requests will be read from the server memory.
Each server owns a partition to which a number of documents are mapped. Each document is only own by one server node. If a request
is sent to server that does not own the document, then the request is routed to the corresponding server. All the write requests go
to both the server that owns the document and the master node.
The role of ensemble is to join multiple clusters and make them work together. It provides better scalability by providing multiple
active masters. It also facilitates the whole system partition-tolerance behavior. Thus in the case of partitioning the data will
be available locally but it can not be seen by other clusters except the cluster owns the data.