Amazon SimpleDB
Amazon SimpleDB is a distributed database written in Erlang by Amazon.com. It is used as a web service in concert with Amazon Elastic Compute Cloud (EC2) and Amazon S3 and is part of Amazon Web Services. It was announced on December 13, 2007.

As with EC2 and S3, Amazon charges fees for SimpleDB storage, transfer, and throughput over the Internet. On December 1, 2008, Amazon introduced new pricing with a free tier covering 1 GB of data and 25 machine hours. Transfer to other Amazon Web Services is free of charge.

Limitations

SimpleDB provides eventual consistency, which is a weaker form of consistency than that offered by many other database management systems. This is often considered a limitation because it is harder to reason about, which makes it harder to write correct programs that use SimpleDB. The limitation is the result of a fundamental design trade-off. By forgoing stronger consistency, the system is able to achieve two other highly desirable properties:
  1. availability - Components of the system may fail, but the service will continue to operate correctly.
  2. partition tolerance - Components in the system are connected to one another by a computer network. If components are not able to contact one another using the network (a condition known as a network partition), operation of the system will continue.


Component failures are assumed to be inevitable; thus, both of these properties were deemed necessary in order to provide a reliable web service. The CAP theorem states that it is not possible for a system to exhibit these properties along with consistency; thus, the designers needed to settle for a weaker form of consistency.

Published limitations:

Store limitations

Attribute               Maximum
domains                 250 active domains per account (more can be requested by filling out a form)
size of each domain     10 GB
attributes per domain   1,000,000,000
attributes per item     256 attributes
size per attribute      1024 bytes
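
As a rough illustration of how a client might check an item against the per-item limits before sending it, here is a minimal sketch. The function name `check_item` and the choice to measure the UTF-8 encoded value are assumptions made for this example; they are not part of SimpleDB.

```python
# Limits copied from the store limitations table.
MAX_ATTRIBUTES_PER_ITEM = 256
MAX_ATTRIBUTE_BYTES = 1024


def check_item(attributes):
    """Return a list of limit violations for one item.

    `attributes` is a list of (name, value) string pairs. Applying the
    1024-byte limit to the UTF-8 encoded value is an assumption.
    """
    problems = []
    if len(attributes) > MAX_ATTRIBUTES_PER_ITEM:
        problems.append(
            f"{len(attributes)} attributes exceeds {MAX_ATTRIBUTES_PER_ITEM}"
        )
    for name, value in attributes:
        size = len(value.encode("utf-8"))
        if size > MAX_ATTRIBUTE_BYTES:
            problems.append(
                f"attribute {name!r} is {size} bytes, exceeds {MAX_ATTRIBUTE_BYTES}"
            )
    return problems


print(check_item([("title", "hello")]))  # []
```

An empty list means the item fits within both limits; the domain-level limits would need to be tracked separately.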

Query limitations

Attribute                             Maximum
items returned in a query response    2500 items
seconds a query may run               5 seconds
attribute names per query predicate   1 attribute name
comparisons per predicate             20 operators
predicates per query expression       20 predicates

Conditional Put and Delete

Conditional put and conditional delete are operations that were added in February 2010. They address a problem that arises when SimpleDB is accessed concurrently. Consider a simple program that uses SimpleDB to store a counter, i.e. a number that can be incremented. The program must do three things:
  1. Retrieve the current value of the counter from SimpleDB.
  2. Add one to the value.
  3. Store the new value in the same place as the old value in SimpleDB.


If this program runs while no other programs access SimpleDB, it will work correctly; however, it is often desirable for software applications (particularly web applications) to access the same data concurrently. When the same data is accessed concurrently, a race condition arises, which can result in undetectable data loss.

Continuing the previous example, consider two processes, A and B, running the same program. Suppose SimpleDB services requests for data, as described in step 1, from both A and B. A and B see the same value. Let's say that the current value of the counter is 0. Because of steps 2 and 3, A will try to store 1. B will try to do the same; thus, the final counter value will be 1, even though the expected final counter value is 2, because the system attempted two increment operations, one by A, and another by B.
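
The interleaving described above can be replayed deterministically. This sketch uses a plain Python dictionary as a stand-in for a single SimpleDB attribute (an assumption made for illustration; no real SimpleDB call is involved) and shows how one of the two increments is lost.

```python
store = {"counter": 0}  # stand-in for one SimpleDB attribute

# Step 1: both processes read before either one writes.
seen_by_a = store["counter"]
seen_by_b = store["counter"]

# Steps 2 and 3: each adds one to what it saw and stores it unconditionally.
store["counter"] = seen_by_a + 1
store["counter"] = seen_by_b + 1

final_value = store["counter"]
print(final_value)  # 1, although two increments were attempted
```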

This problem can be solved by the use of conditional put. Suppose we change step 3 as follows: instead of unconditionally storing the new value, the program asks SimpleDB to store the new value only if the value that it currently holds is the same as the value that was retrieved in step 1. Then, we can be sure that the counter's value actually increases. This introduces some additional complexity; if SimpleDB was not able to store the new value because the current value was not as expected, the program must repeat steps 1-3 until the conditional put operation actually changes the stored value.
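
The retry loop can be sketched as follows. `InMemoryStore`, `conditional_put`, and the `interfere` hook are hypothetical names invented for this example; they model the behavior described above and are not the SimpleDB API. In the real service the compare-and-store step is performed atomically by SimpleDB, whereas this single-threaded model only imitates the outcome.

```python
class InMemoryStore:
    """Hypothetical stand-in for SimpleDB; not the real API."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, 0)

    def conditional_put(self, key, new_value, expected):
        """Store new_value only if the current value equals expected."""
        if self._data.get(key, 0) != expected:
            return False
        self._data[key] = new_value
        return True


def increment(store, key, interfere=None):
    """Repeat steps 1-3 until the conditional put succeeds.

    Returns the number of attempts. `interfere` is an optional callback
    that runs once between the read and the write, to simulate another
    process updating the counter at the worst possible moment.
    """
    attempts = 0
    while True:
        attempts += 1
        current = store.get(key)  # step 1
        if interfere is not None:
            interfere()
            interfere = None
        # Steps 2 and 3: store current + 1 only if nobody changed the value.
        if store.conditional_put(key, current + 1, expected=current):
            return attempts


store = InMemoryStore()
# Process B increments between A's read and A's write.
attempts = increment(store, "counter", interfere=lambda: increment(store, "counter"))
print(attempts, store.get("counter"))  # 2 2
```

Process A's first conditional put is rejected because B changed the value in between, so A retries and both increments are preserved.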

Consistent Read

Consistent read is a feature that was released at the same time as conditional put and conditional delete. As the name suggests, consistent read addresses problems that arise due to SimpleDB's eventual consistency model (see the Limitations section). Consider the following sequence of operations:
  1. Program A stores some data in SimpleDB.
  2. Immediately after, A requests the data it just stored.


SimpleDB's eventual consistency guarantee does not allow us to say that the data retrieved in step 2 reflects the updates that were made in step 1. Eventual consistency only guarantees that step 2 reflects either the complete set of updates in step 1 or none of those updates. Consistent read can be used to ensure that the data retrieved in step 2 reflects the changes made in step 1.

The reason that inconsistent results can arise when the consistent read operation is not used is that SimpleDB stores data in multiple locations (for availability), and the new data in step 1 might not be written at all locations when SimpleDB receives the data request in step 2. In that case, it is possible that the data request in step 2 is serviced at one of the locations where the new data has not been written.
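
A toy model may make this concrete. The class below is purely hypothetical: it keeps one up-to-date copy and one lagging copy, and for the sake of a deterministic example it always serves regular reads from the lagging copy. Real SimpleDB internals are not described in this article, and the `consistent_read` argument is an illustrative name.

```python
class ReplicatedStore:
    """Hypothetical model: one up-to-date copy plus one lagging replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        # The write lands at one location; the replica is not updated yet.
        self.primary[key] = value

    def replicate(self):
        # Replication catches up: the replica now matches the primary.
        self.replica = dict(self.primary)

    def get(self, key, consistent_read=False):
        source = self.primary if consistent_read else self.replica
        return source.get(key)


store = ReplicatedStore()
store.put("color", "blue")                         # step 1
stale = store.get("color")                         # step 2, regular read
fresh = store.get("color", consistent_read=True)   # step 2, consistent read
store.replicate()
later = store.get("color")                         # regular read after replication
print(stale, fresh, later)  # None blue blue
```

The regular read issued immediately after the write misses the new data, while the consistent read sees it; once replication completes, regular reads see it as well.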

Amazon discourages the use of consistent read unless it is required for correctness. The reason for this recommendation is that consistent reads are serviced at a lower rate than regular reads.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.