Citrusleaf database
The Citrusleaf database is an ACID-compliant, post-relational NoSQL database produced and marketed by Citrusleaf, Inc. It was originally developed to manage mission-critical data for applications on the real-time web. These applications must store 5 to 10 kilobytes of information about each of hundreds of millions of web users and compare it against candidate ads to display, with sub-millisecond response times. Citrusleaf takes advantage of the properties of solid-state drives (SSDs) to accomplish this. As of 2010, Citrusleaf was deployed in production.

History

While at Yahoo! and Aggregate Knowledge, the founders of Citrusleaf Corporation encountered a problem: the volume and performance demands of real-time web applications caused traditional SQL databases to fail. There were several reasons for this. The first was the sheer volume of data. Keeping track of 5 to 10 kilobytes of information for each of hundreds of millions of people produced a database with billions of objects. The second was latency: retrieving and processing this information with sub-millisecond response times was impossible with traditional database approaches, which were designed with rotational disk storage in mind. The average seek time of a rotating disk is roughly ten milliseconds, so a single random disk read already takes about ten times longer than the entire sub-millisecond response budget.
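The scale and latency argument above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch only: the figure of 300 million users is an assumed value chosen from within "hundreds of millions", and 10,000 bytes is taken as the upper end of the 5 to 10 kilobyte range.

```python
# Back-of-envelope arithmetic; USERS is an assumed value within "hundreds of millions"
USERS = 300_000_000         # hypothetical number of tracked users
BYTES_PER_USER = 10_000     # upper end of the 5 to 10 kilobyte range (decimal kB)
SEEK_TIME_MS = 10.0         # average rotating-disk seek time cited in the text
BUDGET_MS = 1.0             # sub-millisecond response target

total_bytes = USERS * BYTES_PER_USER
print(f"total data: {total_bytes / 1e12:.1f} TB")                        # total data: 3.0 TB
print(f"one seek = {SEEK_TIME_MS / BUDGET_MS:.0f}x the latency budget")  # one seek = 10x the latency budget
```

Even before any query processing, a single disk seek consumes the latency budget ten times over, which is why the design turned to storage without mechanical seeks.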

Design Drivers

The answer lay in using solid-state drives (SSDs). In addition to performance, fault tolerance was an issue: the applications were mission-critical, so the solution also had to remain available without interruption. In 2008 Brian Bulkowski created a key-value data store, and in 2009 he was joined by Srini Srinivasan. Together they created the Citrusleaf database platform, an ACID-compliant, high-performance, scalable, fault-tolerant database engine. The system is capable of 100,000 transactions per second per node, with a response time of under one millisecond. To sustain these transaction loads without interruption as nodes join and leave the cluster, the authors developed software techniques in distributed systems, real-time prioritization, and storage management across different kinds of storage.
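The two performance figures above can be related with Little's law, which states that the mean number of requests in a system equals the arrival rate multiplied by the mean time each request spends in it. The sketch below is an illustrative calculation, not a description of Citrusleaf internals: at 100,000 transactions per second and at most 1 ms per transaction, a node has on average no more than about 100 transactions in flight at once.

```python
def mean_in_flight(throughput_per_s: float, latency_s: float) -> float:
    """Little's law: mean requests in the system = arrival rate * mean time in system."""
    return throughput_per_s * latency_s

# 100,000 transactions per second, each completing in at most 1 ms
print(mean_in_flight(100_000, 0.001))  # 100.0
```

Because the stated latency is an upper bound, the result is an upper bound on average concurrency per node.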

Data model

Citrusleaf organizes all data into namespaces. A namespace is similar to a database instance in an RDBMS and controls policies such as replication count and storage location. Within a namespace, each data object is identified by a table and a primary key, which can be a string, an integer, or binary data. A key is a unique reference to a piece of data; common keys include usernames and session identifiers.
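The hierarchy described above (namespace, then table plus primary key, then a record of named bins) can be modeled in a few lines. This is a hypothetical in-memory sketch for illustration only; the class, method, and policy field names are assumptions and are not Citrusleaf's client API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple, Union

Key = Union[str, int, bytes]  # primary keys may be strings, integers, or binary data

@dataclass
class Namespace:
    """Hypothetical in-memory model of a namespace; not Citrusleaf's client API."""
    name: str
    replication_factor: int = 2   # illustrative namespace-wide policy
    storage: str = "ssd"          # illustrative storage-location policy
    records: Dict[Tuple[str, Key], Dict[str, Any]] = field(default_factory=dict)

    def put(self, table: str, key: Key, bins: Dict[str, Any]) -> None:
        """Create or update the record identified by (table, key)."""
        self.records.setdefault((table, key), {}).update(bins)

    def get(self, table: str, key: Key) -> Dict[str, Any]:
        """Return the bins of one record; raises KeyError if it does not exist."""
        return self.records[(table, key)]

users = Namespace("users", replication_factor=3)
users.put("profiles", "alice", {"age": 34, "city": "Paris"})
users.put("profiles", 1001, {"segment": "sports"})    # different bins, same table
users.put("sessions", b"\x9f\x01", {"user": "alice"})  # binary primary key
print(users.get("profiles", "alice"))  # {'age': 34, 'city': 'Paris'}
```

Using the (table, key) pair as the lookup key reflects the statement that a key uniquely references one piece of data within its table.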

Each data object is a collection of 'bins', in Citrusleaf's parlance, which are similar to columns in SQL. The system is schema-less in that different bins can be used in different data objects of the same table. Each bin's value is typed. The supported types are strings, integers, blobs, and "reflection blobs", which are binary data produced by the serializer of a particular language (such as a Java object serialized by Java's built-in serializer). Typed values let clients in different languages interoperate simply: a string set from Java will appear correctly through the Python client, even though the two languages use different underlying character representations (such as Unicode strings versus UTF-8-encoded bytes).
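One way typed values enable interoperability is to tag every stored value with its type and to store strings in a single agreed encoding, so each client converts to and from its own native representation. The sketch below assumes hypothetical type tags and a UTF-8 string encoding; Citrusleaf's actual wire format is not described in this article.

```python
import struct
from typing import Union

# Hypothetical type tags; Citrusleaf's real wire format is not described here
TAG_INT, TAG_STR, TAG_BLOB = 1, 3, 4

def encode_bin(value: Union[int, str, bytes]) -> bytes:
    """Encode one bin value as a one-byte type tag followed by its payload."""
    if isinstance(value, bool):
        raise TypeError("booleans are not a supported bin type in this sketch")
    if isinstance(value, int):
        return bytes([TAG_INT]) + struct.pack(">q", value)  # 64-bit signed, big-endian
    if isinstance(value, str):
        return bytes([TAG_STR]) + value.encode("utf-8")     # one agreed text encoding
    if isinstance(value, bytes):
        return bytes([TAG_BLOB]) + value                    # opaque binary data
    raise TypeError(f"unsupported bin type: {type(value).__name__}")

def decode_bin(data: bytes) -> Union[int, str, bytes]:
    """Decode a tagged value back into the reading client's native type."""
    tag, payload = data[0], data[1:]
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TAG_STR:
        return payload.decode("utf-8")
    if tag == TAG_BLOB:
        return payload
    raise ValueError(f"unknown type tag {tag}")

print(decode_bin(encode_bin("Zürich")))  # Zürich
```

A Java client following the same convention would decode the identical UTF-8 bytes into its own UTF-16 `String`, so the same text appears in both languages.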

Some high-level operations (such as atomically adding to integers) are supported, in the style of Redis, but the instruction set is not very rich.
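An atomic add matters because a client-side "read, add, write back" sequence can lose updates when two clients interleave. The sketch below shows the server-side alternative under simple assumptions: a hypothetical `Record` class whose per-record lock serializes each read-modify-write. It illustrates the semantics only and is not Citrusleaf's implementation.

```python
import threading
from typing import Dict

class Record:
    """Hypothetical single record offering an atomic integer-add operation."""

    def __init__(self) -> None:
        self.bins: Dict[str, int] = {}
        self._lock = threading.Lock()  # serializes read-modify-write on this record

    def add(self, bin_name: str, delta: int) -> int:
        """Atomically add delta to an integer bin (a missing bin counts as 0)."""
        with self._lock:
            new_value = self.bins.get(bin_name, 0) + delta
            self.bins[bin_name] = new_value
            return new_value

record = Record()

def click_many(n: int) -> None:
    for _ in range(n):
        record.add("clicks", 1)

threads = [threading.Thread(target=click_many, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(record.bins["clicks"])  # 8000
```

Eight concurrent writers each add 1,000, and no increment is lost because every addition happens as one indivisible step.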

Citrusleaf's data model allows it to be considered a document store, although it more closely resembles a schema-less version of the row-based model typically used in relational systems.

Replication and Failover

  • Automatic failure detection and in-flight transaction rerouting for nonstop operation in the face of failure.
  • Automatic client failover: Clients track cluster membership for automatic load balancing and transaction retry.
  • Flexible replication policy: Set replication factors for individual data items.
  • Randomized object replication allows smooth load balancing during failure recovery.
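Client failover, as listed above, means a client that knows where each item's replicas live can route around a failed node without operator intervention. The sketch below is a minimal illustration under stated assumptions: the replica map, the `NodeDown` exception, and the simulated `send` function are all hypothetical stand-ins for real cluster membership tracking and network calls.

```python
from typing import Callable, Dict, List, Set

class NodeDown(Exception):
    """Raised when a (simulated) node cannot be reached."""

def read_with_failover(key: str,
                       replica_map: Dict[str, List[str]],
                       send: Callable[[str, str], str]) -> str:
    """Send the read to each replica of `key` in order until one answers."""
    last_error = None
    for node in replica_map[key]:
        try:
            return send(node, key)
        except NodeDown as err:
            last_error = err  # node unreachable: fall through to the next replica
    raise RuntimeError(f"no replica of {key!r} is reachable") from last_error

# Simulated cluster in which node-A has just failed
down: Set[str] = {"node-A"}
store = {"node-A": {"user:42": "v1"}, "node-B": {"user:42": "v1"}}

def send(node: str, key: str) -> str:
    if node in down:
        raise NodeDown(node)
    return f"{store[node][key]} from {node}"

print(read_with_failover("user:42", {"user:42": ["node-A", "node-B"]}, send))  # v1 from node-B
```

The caller sees a successful read; only when every replica is unreachable does the failure surface as an error.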

Scalability and Performance

  • Distributed object store: Easily store and retrieve large volumes of data through Citrusleaf clients for C, C#, PHP, Java, Python, and Ruby.
  • Automatic cluster resizing and rebalancing: A Citrusleaf cluster automatically grows or shrinks using zero-configuration (zeroconf) networking.
  • High sustained throughput of over 100,000 transactions per second per commodity node.
  • Real-time performance: Low, predictable sub-millisecond latency from memory or flash storage.

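Automatic rebalancing works best when adding a node moves only the data that the new node should own. This article does not describe Citrusleaf's actual partitioning scheme, so the sketch below swaps in rendezvous (highest-random-weight) hashing as one well-known technique with that property. The partition count, node names, and replication factor are hypothetical.

```python
import hashlib
from typing import Dict, List

N_PARTITIONS = 4096  # hypothetical partition count for this sketch

def score(node: str, partition: int) -> int:
    """Deterministic pseudo-random weight for a (node, partition) pair."""
    digest = hashlib.sha256(f"{node}:{partition}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def replicas(partition: int, nodes: List[str], rf: int) -> List[str]:
    """Rendezvous hashing: the rf highest-scoring nodes hold the partition."""
    ranked = sorted(nodes, key=lambda n: score(n, partition), reverse=True)
    return ranked[:rf]

def partition_map(nodes: List[str], rf: int) -> Dict[int, List[str]]:
    return {p: replicas(p, nodes, rf) for p in range(N_PARTITIONS)}

before = partition_map(["node-A", "node-B", "node-C"], rf=2)
after = partition_map(["node-A", "node-B", "node-C", "node-D"], rf=2)

moved = [p for p in range(N_PARTITIONS) if set(before[p]) != set(after[p])]
print(f"{len(moved) / N_PARTITIONS:.0%} of partitions changed replica sets")
# Only the new node ever gains data; existing nodes never swap partitions among themselves
assert all(set(after[p]) - set(before[p]) <= {"node-D"} for p in moved)
```

With four nodes and two replicas, the new node lands in roughly half of the replica sets, and each affected partition copies exactly one replica to it, so about a quarter of the stored copies move.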

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.