Document-oriented database
A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented (or semi-structured) information. Document-oriented databases are one of the main categories of so-called NoSQL databases, and the popularity of the term "document-oriented database" (or "document store") has grown with the use of the term NoSQL itself.
Documents
The central concept of a document-oriented database is the notion of a document. While each document-oriented database implementation differs on the details of this definition, in general they all assume documents encapsulate and encode data (or information) in some standard format(s) (or encoding(s)). Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).
Documents inside a document-oriented database are similar in some ways to records or rows in relational databases, but they are less rigid. They are not required to adhere to a standard schema, nor must they all have the same sections, slots, parts, or keys. For example, here is one document:
- FirstName="Bob", Address="5 Oak St.", Hobby="sailing".
Another document could be:
- FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=[{Name:"Michael", Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}].
Both documents share some information and differ in the rest. Unlike a relational database, where each record would have the same set of fields and unused fields might be kept empty, neither document (record) here has any empty fields. This approach allows new information to be added to a document without requiring an explicit statement that other pieces of information are absent.
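As a rough illustration, the two example documents can be modeled as plain Python dictionaries. This is only a sketch of the idea, not the storage format of any particular database; the `Email` field and its value are hypothetical and added purely to show that one document can gain a field without affecting the other.

```python
# The two example documents from the text, modeled as plain dictionaries.
bob = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}
jonathan = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}

# Keys present in both documents, and keys unique to each one.
shared_keys = set(bob) & set(jonathan)
only_bob = set(bob) - set(jonathan)
only_jonathan = set(jonathan) - set(bob)

# Adding new information touches one document only (hypothetical field).
bob["Email"] = "bob@example.com"
```

Neither dictionary carries a placeholder for the fields it lacks: `jonathan` has no empty `Hobby` entry, and `bob` has no empty `Children` entry.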
Keys
Documents are addressed in the database via a unique key that represents that document. Often, this key is a simple string. In some cases, this string is a URI or path. Regardless, you can use this key to retrieve the document from the database. Typically, the database retains an index on the key so that document retrieval is fast.
Retrieval
One of the other defining characteristics of a document-oriented database is that, beyond the simple key-document (or key-value) lookup that you can use to retrieve a document, the database will offer an API or query language that allows you to retrieve documents based on their contents. For example, you may want a query that gets you all the documents with a certain field set to a certain value. The set of query APIs or query language features available, as well as the expected performance of the queries, varies significantly from one implementation to the next.
Organization
Implementations offer a variety of ways of organizing documents, including notions of:
- Collections
- Tags
- Non-visible Metadata
- Directory hierarchies
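The ideas of key lookup, content-based retrieval, and grouping into collections can be sketched together with a toy in-memory store. Everything here is hypothetical: `ToyDocumentStore`, its method names, the path-style keys, and the "Ann" document are invented for illustration and do not correspond to the API of any real product.

```python
from typing import Any, Dict, List


class ToyDocumentStore:
    """Hypothetical in-memory store; not the API of any real database."""

    def __init__(self) -> None:
        # collection name -> {key -> document}; the inner dict plays the
        # role of the index on the key, so lookups by key are direct.
        self._collections: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def put(self, collection: str, key: str, document: Dict[str, Any]) -> None:
        """Store a document under a unique key inside a collection."""
        self._collections.setdefault(collection, {})[key] = document

    def get(self, collection: str, key: str) -> Dict[str, Any]:
        """Retrieve a document by its key (simple key-document lookup)."""
        return self._collections[collection][key]

    def find(self, collection: str, field: str, value: Any) -> List[str]:
        """Return the sorted keys of documents whose field equals value."""
        documents = self._collections.get(collection, {})
        return sorted(
            key
            for key, doc in documents.items()
            if field in doc and doc[field] == value
        )


store = ToyDocumentStore()
store.put("people", "/people/bob", {"FirstName": "Bob", "Hobby": "sailing"})
store.put("people", "/people/jonathan", {"FirstName": "Jonathan"})
store.put("people", "/people/ann", {"FirstName": "Ann", "Hobby": "sailing"})

bob = store.get("people", "/people/bob")            # lookup by key
sailors = store.find("people", "Hobby", "sailing")  # lookup by content
```

In this sketch `find` scans every document in the collection, which is the simplest possible approach. How content queries are executed, and how fast they are, is exactly the part that differs between real implementations.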
Implementations
Name | Publisher | License | Language | Notes | RESTful API |
---|---|---|---|---|---|
Lotus Notes | IBM | Proprietary | (unknown) | | |
askSam | askSam Systems | Proprietary | (unknown) | | |
Apstrata | Apstrata | Proprietary | (unknown) | | |
Datawasp | Significant Data Systems | Proprietary | (unknown) | | |
Clusterpoint | Clusterpoint Ltd. | Free community license / Commercial | C++ | Scalable, high-performance, schema-free, document-oriented database management system platform with server-based data storage, fast full text search engine functionality, information ranking for search relevance and clustering. | Yes |
CRX | Day Software | Proprietary | (unknown) | | |
MUMPS Database | | Proprietary and GNU Affero GPL | MUMPS | Commonly used in health applications. | (unknown) |
UniVerse | Rocket Software | Proprietary | | | Yes (Beta) |
UniData | Rocket Software | Proprietary | | | Yes (Beta) |
Jackrabbit | Apache Software Foundation | Apache License | Java | (unknown) | |
CouchDB | Couchbase, Apache Software Foundation | Apache License | Erlang | JSON over REST/HTTP with Multi-Version Concurrency Control and ACID properties. Uses map and reduce for views and queries. | Yes (there is only RESTful API) |
FleetDB | FleetDB | MIT License | Clojure | A JSON-based schema-free database optimized for agile development. | (unknown) |
MongoDB | 10gen, Inc | GNU AGPL v3.0 | C, C++, Erlang, Haskell, Java, JavaScript, .NET (C#, F#, PowerShell, etc.), Perl, PHP, Python, Ruby, Scala | Fast, document-oriented database optimized for highly transient data. | Optional using external tools |
GemFire Enterprise http://www.vmware.com/products/vfabric-gemfire | VMware | Commercial | Java, .NET | Memory-oriented, fast, key-value database with indexing and querying support. | Yes |
OrientDB | Orient Technologies | Apache License | Java | JSON over HTTP | Yes |
RavenDB | RavenDB | Commercial or GNU AGPL v3.0 | .NET | A .NET LINQ-enabled document database, focused on providing a high-performance, transactional, schema-less, flexible and scalable NoSQL data store for the .NET and Windows platforms. | Yes |
Redis | | BSD License | ANSI C | Key-value store supporting lists and sets with fast, simple and binary-safe protocol. | (unknown) |
StrokeDB | http://strokedb.com/ | MIT License | | Alpha software. | (unknown) |
Terrastore | | Apache License | Java | JSON/HTTP | (unknown) |
ThruDB | | BSD License | C++, Java | Built on top of the Apache Thrift framework; provides indexing and document storage services for building and scaling websites. Alternate implementation is being developed in Java. Alpha software. | (unknown) |
Persevere | Persevere | BSD License | | A JSON database and JavaScript application server. Provides a RESTful JSON interface for create, read, update, and delete access to data. Also supports JSONQuery/JSONPath querying. | Yes |
DBSlayer | DBSlayer | Apache License | C | Database abstraction layer (over MySQL) used by the New York Times. JSON over HTTP. | (unknown) |
Eloquera DB | Eloquera | Proprietary | .NET | High performance. Based on dynamic objects. Supports LINQ, SQL queries. | (unknown) |
XML database implementations
All XML databases are document-oriented databases.
See also
- Internet Message Access Protocol (IMAP)
- Database theory
- In-memory database
- NoSQL
- Object database
- Online database
- Real time database
- Relational database
- Data hierarchy
Further reading
- Assaf Arkin. (2007, September 20). Read Consistency: Dumb Databases, Smart Services. Labnotes:Don’t let the bubble go to your head!
External links
- http://solprovider.com/articles/20020612&cat=Lotus/IBM