MongoDB
MongoDB is an open source, high-performance, schema-free, document-oriented database written in the C++ programming language. It manages collections of BSON documents that can be nested in complex hierarchies and still be easy to query and index, which allows many applications to store data in a natural way that matches their native data types and structures.

10gen began developing MongoDB in October 2007. The first public release was in February 2009.

Features

Among the features are:
  • Consistent UTF-8 encoding. Non-UTF-8 data can be saved, queried, and retrieved with a special binary data type.
  • Cross-platform support: binaries are available for Windows, Linux, OS X, and Solaris. MongoDB can be compiled on almost any little-endian system.
  • Type-rich: supports dates, regular expressions, code, binary data, and more (all BSON types)
  • Cursors for query results


More features:

Ad hoc queries

In MongoDB, any field can be queried at any time. MongoDB supports range queries, regular expression searches, and other special types of queries in addition to exactly matching fields. Queries can also include user-defined JavaScript functions (if the function returns true, the document matches).

Queries can return specific fields of documents (instead of the entire document), and their results can be sorted, skipped, and limited.
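
As a rough illustration of these query features, the plain-JavaScript sketch below filters an in-memory array with a range condition and a regular expression, then sorts, skips, limits, and projects the results. It does not talk to a MongoDB server: the sample documents and the helper name `runQuery` are hypothetical. In the mongo shell a comparable query might look like `db.people.find({age: {$gte: 21, $lt: 65}, name: /^a/i}, {name: 1}).sort({age: 1}).skip(1).limit(2)`, where `people` is an assumed collection name.

```javascript
// Hypothetical sample documents; not taken from any real collection.
const people = [
  { name: "Alice", age: 34 },
  { name: "Adam", age: 17 },
  { name: "Bob", age: 45 },
  { name: "Anna", age: 29 },
  { name: "Arthur", age: 61 },
  { name: "Amy", age: 70 },
];

// Filter by an age range and a name pattern, then sort, skip, limit,
// and keep only the requested fields of each matching document.
function runQuery(docs, { minAge, maxAge, pattern, skip, limit, fields }) {
  return docs
    .filter((d) => d.age >= minAge && d.age < maxAge) // range condition
    .filter((d) => pattern.test(d.name))              // regular expression
    .sort((a, b) => a.age - b.age)                    // ascending by age
    .slice(skip, skip + limit)                        // skip, then limit
    .map((d) => Object.fromEntries(fields.map((f) => [f, d[f]])));
}

const result = runQuery(people, {
  minAge: 21, maxAge: 65, pattern: /^a/i, skip: 1, limit: 2, fields: ["name"],
});
// Alice and Arthur remain: Anna (the youngest match) is skipped,
// and Bob does not match the pattern.
console.log(result);
```

The sketch applies the steps in the order filter, sort, skip, limit, project, so skipping and limiting operate on the sorted matches.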

Querying nested fields

Queries can "reach into" embedded objects and arrays: a field inside an embedded object of a document in the users collection, for example, can be matched directly by giving its dotted path.
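
A minimal sketch of the idea in plain JavaScript, without a server, is given below. The documents, their field names, and the helpers `getPath` and `findByPath` are all hypothetical and only illustrate how a dotted path can be resolved against nested objects. In the mongo shell the corresponding query for these assumed field names would be `db.users.find({"address.state": "NY"})`.

```javascript
// Hypothetical documents, standing in for a users collection.
const users = [
  { username: "bob", address: { city: "Springfield", state: "NY" } },
  { username: "carol", address: { city: "Portland", state: "OR" } },
  { username: "dave", tags: ["admin"] },
];

// Follow a dotted path such as "address.state" into nested objects.
// Returns undefined when any step of the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Keep the documents whose value at every dotted path equals the wanted value.
function findByPath(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([path, wanted]) => getPath(doc, path) === wanted)
  );
}

const inNewYork = findByPath(users, { "address.state": "NY" });
console.log(inNewYork.map((u) => u.username)); // only "bob" matches
```

This sketch handles embedded objects only; it does not reproduce the way MongoDB also matches against the elements of embedded arrays.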

Indexing

The software supports secondary indexes, including single-key, compound, unique, non-unique, and geospatial indexes. Nested fields (as described above in the ad hoc query section) can also be indexed, and indexing an array field indexes each element of the array.
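
The array behaviour can be pictured with the following plain-JavaScript sketch, in which a `Map` stands in for the server's real index structure. The documents and the `buildIndex` helper are hypothetical; the point is only that an array value contributes one index entry per element.

```javascript
// Hypothetical documents; the tags field holds an array in two of them.
const posts = [
  { _id: 1, tags: ["mongodb", "database"] },
  { _id: 2, tags: ["database", "sql"] },
  { _id: 3, tags: "mongodb" },
];

// Build a secondary index on one field: a Map from each value to the
// _ids of the documents containing it. An array value contributes one
// entry per element, so each element can be looked up on its own.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // document lacks the field
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildIndex(posts, "tags");
console.log(tagIndex.get("mongodb"));  // documents 1 and 3
console.log(tagIndex.get("database")); // documents 1 and 2
```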

MongoDB's query optimizer will try a number of different query plans when a query is run and select the fastest, periodically resampling. Developers can see the index being used with the `explain` function and choose a different index with the `hint` function.

Indexes can be created or removed at any time.

Aggregation

In addition to ad hoc queries, the database supports a couple of tools for aggregation, including MapReduce and a group function similar to SQL's GROUP BY.
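
The MapReduce pattern can be sketched in a few lines of plain JavaScript. The version below runs in memory and is not MongoDB's own implementation: the `mapReduce` helper, its callback signatures, and the order documents are assumptions made for illustration. It computes a per-customer sum, which is also what a GROUP BY with SUM would produce.

```javascript
// Hypothetical order documents.
const orders = [
  { customer: "ann", total: 20 },
  { customer: "bob", total: 15 },
  { customer: "ann", total: 5 },
  { customer: "bob", total: 10 },
  { customer: "cy", total: 7 },
];

// A tiny in-memory MapReduce: map emits (key, value) pairs for each
// document, values are grouped by key, and reduce folds each group
// into a single value.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);

  const output = {};
  for (const [key, values] of groups) output[key] = reduce(key, values);
  return output;
}

const totals = mapReduce(
  orders,
  (doc, emit) => emit(doc.customer, doc.total),          // map step
  (key, values) => values.reduce((sum, v) => sum + v, 0) // reduce step
);
console.log(totals); // ann: 25, bob: 25, cy: 7
```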

File storage

The software implements a protocol called GridFS that is used to store and retrieve files from the database. This file storage mechanism has been used in plugins for NGINX and lighttpd.

Server-side JavaScript execution

JavaScript is the lingua franca of MongoDB: it can be used in queries and aggregation functions (such as MapReduce), and it can be sent directly to the database to be executed.

Example of JavaScript in a query:

> db.foo.find({$where : function() { return this.x == this.y; }})


Example of code sent to the database to be executed:

> db.eval(function(name) { return "Hello, "+name; }, ["Joe"])


This returns "Hello, Joe".

JavaScript variables can also be stored in the database and used by any other JavaScript as a global variable. Any legal JavaScript type, including functions and objects, can be stored in MongoDB so that JavaScript can be used to write "stored procedures."

Capped collections

MongoDB supports fixed-size collections called capped collections. A capped collection is created with a set size and, optionally, number of elements. Capped collections are the only type of collection that maintains insertion order: once the specified size has been reached, a capped collection behaves like a circular queue.

A special type of cursor, called a tailable cursor, can be used with capped collections. This cursor was named after the `tail -f` command, and does not close when it finishes returning results but continues to wait for more to be returned, returning new results as they are inserted into the capped collection.
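
The circular-queue behaviour can be sketched as a small ring buffer in plain JavaScript. This is an illustration under simplifying assumptions, not MongoDB's implementation: the `CappedCollection` class is hypothetical, and it limits the number of documents rather than the storage size.

```javascript
// A fixed-size collection that overwrites its oldest entries once full.
// This sketch caps the number of documents, not the size in bytes.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.slots = new Array(maxDocs);
    this.next = 0;  // slot that the next insert will write to
    this.count = 0; // number of slots currently holding a document
  }

  insert(doc) {
    this.slots[this.next] = doc;                // overwrites the oldest slot when full
    this.next = (this.next + 1) % this.maxDocs; // wrap around
    this.count = Math.min(this.count + 1, this.maxDocs);
  }

  // Return the surviving documents in insertion order, oldest first.
  find() {
    const start = (this.next - this.count + this.maxDocs) % this.maxDocs;
    const result = [];
    for (let i = 0; i < this.count; i++) {
      result.push(this.slots[(start + i) % this.maxDocs]);
    }
    return result;
  }
}

const log = new CappedCollection(3);
for (const n of [1, 2, 3, 4, 5]) log.insert({ n });
console.log(log.find().map((d) => d.n)); // 3, 4, 5: the two oldest were overwritten
```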

Deployment

MongoDB can be built and installed from source, but it is more commonly installed from a binary package. Many Linux distributions now include a MongoDB package in their package repositories, including CentOS and Fedora, Debian and Ubuntu, and Gentoo and Arch Linux. It can also be acquired through the official website.

MongoDB uses memory-mapped files, limiting data size to 2 GB on 32-bit machines (64-bit systems have a much larger limit). The MongoDB server can only be used on little-endian systems, although most of the drivers work on both little-endian and big-endian systems.

Language support

MongoDB has official drivers for:
  • C
  • C++
  • C#
  • Erlang
  • Haskell
  • Java
  • JavaScript
  • Lisp
  • Perl
  • PHP
  • Python
  • Ruby
  • Scala


There are also a large number of unofficial drivers, for C# and .NET, ColdFusion, Delphi, Erlang, Factor, Fantom, Go, JVM languages (Clojure, Groovy, Scala, etc.), Lua, node.js, HTTP REST, Ruby, Racket, and Smalltalk.

Replication

MongoDB supports master-slave replication. A master can perform reads and writes. A slave copies data from the master and can only be used for reads or backup (not writes).

MongoDB allows developers to guarantee that an operation has been replicated to at least N servers on a per-operation basis.

Master-slave

As operations are performed on the master, the slave will replicate any changes to the data.

Example: starting a master/slave pair locally:


$ mkdir -p ~/dbs/master ~/dbs/slave
$ ./mongod --master --port 10000 --dbpath ~/dbs/master
$ ./mongod --slave --port 10001 --dbpath ~/dbs/slave --source localhost:10000
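
The relationship between the two servers can be pictured with a short plain-JavaScript sketch. The text above does not say how changes are transferred, so the sketch assumes an ordered operation log that the slave replays, which is one common way to build this kind of replication. The `Master` and `Slave` classes are hypothetical.

```javascript
// A master accepts writes and records each one in an operation log;
// a slave copies the log entries it has not yet seen and replays them.
class Master {
  constructor() {
    this.data = new Map();
    this.log = []; // ordered list of operations applied on the master
  }
  set(key, value) {
    this.data.set(key, value);
    this.log.push({ op: "set", key, value });
  }
  remove(key) {
    this.data.delete(key);
    this.log.push({ op: "remove", key });
  }
}

class Slave {
  constructor(master) {
    this.master = master;
    this.data = new Map();
    this.applied = 0; // number of log entries already replayed
  }
  // Pull and replay any operations performed since the last sync.
  sync() {
    for (const entry of this.master.log.slice(this.applied)) {
      if (entry.op === "set") this.data.set(entry.key, entry.value);
      else this.data.delete(entry.key);
      this.applied++;
    }
  }
}

const master = new Master();
const slave = new Slave(master);
master.set("x", 1);
master.set("y", 2);
master.remove("x");
slave.sync();
console.log([...slave.data]); // only the pair ("y", 2) remains
```

The slave in this sketch exposes no write methods of its own, mirroring the rule that a slave serves reads and backups only.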

Replica sets

Replica sets are similar to master-slave, but they incorporate the ability for the slaves to elect a new master if the current one goes down.

Sharding

MongoDB scales horizontally using a system called sharding, which is very similar to the BigTable and PNUTS scaling models. The developer chooses a shard key, which determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards. (A shard is a master with one or more slaves.)

The developer's application must know that it is talking to a sharded cluster when performing some operations. For example, a "findAndModify" query must contain the shard key if the queried collection is sharded. The application talks to a special routing process called `mongos` that looks identical to a single MongoDB server. This `mongos` process knows what data is on each shard and routes the client's requests appropriately. All requests flow through this process: it not only forwards requests and responses but also performs any necessary final data merges or sorts. Any number of `mongos` processes can be run; one per application server is usually recommended.
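
Range-based routing of this kind can be sketched in plain JavaScript. The example below is a simplified model, not the behaviour of the real `mongos` process: the shard names, the range boundaries, and the functions `route`, `insert`, and `findAllSorted` are all invented for illustration, and `username` is an assumed shard key.

```javascript
// Hypothetical shard layout: each shard owns a half-open range
// [min, max) of shard key values. Names and boundaries are invented.
const chunks = [
  { shard: "shardA", min: "a", max: "h" },
  { shard: "shardB", min: "h", max: "p" },
  { shard: "shardC", min: "p", max: "{" }, // "{" is the character after "z"
];

// Per-shard storage, standing in for separate servers.
const shards = { shardA: [], shardB: [], shardC: [] };

// Find the shard whose range contains the given shard key value.
function route(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no shard owns key ${key}`);
  return chunk.shard;
}

// Insert: the router forwards the document to exactly one shard.
function insert(doc) {
  shards[route(doc.username)].push(doc);
}

// Query across all shards: gather each shard's results, then merge and sort.
function findAllSorted() {
  return Object.values(shards)
    .flat()
    .sort((a, b) => (a.username < b.username ? -1 : a.username > b.username ? 1 : 0));
}

for (const username of ["zoe", "bob", "mia", "hal", "amy"]) insert({ username });
console.log(shards.shardB.map((d) => d.username));   // mia, hal
console.log(findAllSorted().map((d) => d.username)); // amy, bob, hal, mia, zoe
```

A query that includes the shard key can be sent to a single shard via `route`, while one that does not must be sent to every shard and merged, as `findAllSorted` does.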

Official tools

The most powerful and useful management tool is the database shell, mongo. The shell lets developers view, insert, remove, and update data in their databases, as well as get replication information, set up sharding, shut down servers, execute JavaScript, and more. mongo is built on SpiderMonkey, so it is a full JavaScript shell as well as a client that can connect to MongoDB servers.

Administrative information can also be accessed through the admin interface: a simple HTML web page that serves information about the current server status. By default, this interface listens on a port 1000 above the database port (http://localhost:28017), and it can be turned off with the --norest option.

mongostat is a command-line tool that displays a simple list of stats about the last second: how many inserts, updates, removes, queries, and commands were performed, as well as what percentage of the time the database was locked and how much memory it is using.

mongosniff sniffs network traffic going to and from MongoDB.

Monitoring

There are monitoring plugins available for MongoDB:
  • Munin
  • Ganglia
  • Scout
  • Cacti

GUIs

Several GUIs have been created by MongoDB's developer community to help users visualize their data. Some popular ones are:
  • Fang of Mongo – a web-based UI built with Django and jQuery.
  • Futon4Mongo – a clone of the CouchDB Futon web interface for MongoDB.
  • JMongoBrowser – a desktop application for all platforms.
  • Mongo3 – a Ruby-based interface.
  • MongoHub – a native OS X application for managing MongoDB.
  • Opricot – a browser-based MongoDB shell written in PHP.
  • Database Master – a Windows-based MongoDB management studio that also supports relational databases (RDBMS).

Licensing and support


MongoDB is available for free under the GNU Affero General Public License. The language drivers are available under an Apache License.

Prominent users
  • MTV Networks
  • craigslist
  • Disney Interactive Media Group
  • Wordnik
  • diaspora
  • Shutterfly
  • foursquare
  • bit.ly
  • The New York Times
  • SourceForge
  • Business Insider
  • Etsy
  • CERN LHC
  • Thumbtack
  • AppScale
  • Uber

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 