Memcached
In computing, memcached is a general-purpose distributed memory caching system that was originally developed by Danga Interactive for LiveJournal, but is now used by many other sites. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. Memcached runs on Unix, Linux, Windows and Mac OS X and is distributed under a permissive free software license.
Memcached's APIs provide a very large hash table distributed across multiple machines. When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order. Applications using Memcached typically layer requests and additions into RAM before falling back on a slower backing store, such as a database.
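The LRU purging described above can be illustrated with a small in-process sketch in Python. This is not Memcached's implementation: the `LRUCache` class and its `capacity` parameter are hypothetical, and capacity is counted in entries here for brevity, whereas a real server is limited by memory.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of entries (assumption)
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None               # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # purge the least recently used


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" becomes more recently used than "b"
cache.set("c", 3)  # the table is full, so "b" is purged
```

After these calls, a lookup of "b" misses while "a" and "c" are still present.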
The system is used by sites including YouTube, Reddit, Zynga, Facebook, Orange, and Twitter. Heroku (now part of Salesforce) offers a Couchbase-managed memcached add-on service as part of their platform as a service. Google App Engine, AppScale and Amazon Web Services also offer a memcached service through an API. Memcached is also supported by some popular CMSs such as Drupal, Joomla, and WordPress.
History
Memcached was first developed by Brad Fitzpatrick for his website LiveJournal, on May 22, 2003.
Architecture
The system uses a client–server architecture. The servers maintain a key–value associative array; the clients populate this array and query it. Keys are up to 250 bytes long and values can be at most 1 megabyte in size.
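A client might check these limits before sending an item. The following is a minimal Python sketch; the `check_item` helper and the constant names are hypothetical, and it assumes that 1 megabyte means 1024 × 1024 bytes and that keys are encoded as UTF-8.

```python
MAX_KEY_BYTES = 250
MAX_VALUE_BYTES = 1024 * 1024  # assumption: 1 megabyte = 2**20 bytes


def check_item(key: str, value: bytes) -> None:
    """Raise ValueError if the key or value exceeds the limits quoted above."""
    key_bytes = key.encode("utf-8")  # limits apply to bytes, not characters
    if len(key_bytes) > MAX_KEY_BYTES:
        raise ValueError(f"key is {len(key_bytes)} bytes; limit is {MAX_KEY_BYTES}")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError(f"value is {len(value)} bytes; limit is {MAX_VALUE_BYTES}")


check_item("userrow:42", b"some serialized row")  # within both limits
```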
Clients use client-side libraries to contact the servers, which by default expose their service at port 11211. Each client knows all servers; the servers do not communicate with each other. If a client wishes to set or read the value corresponding to a certain key, the client's library first computes a hash of the key to determine the server that will be used. Then it contacts that server. The server will compute a second hash of the key to determine where to store or read the corresponding value.
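The first, client-side hash can be sketched in Python as follows. This is one simple possibility (CRC32 modulo the number of servers), not the scheme of any particular client library; the server addresses and the `pick_server` helper are hypothetical.

```python
import zlib

# Hypothetical server list; every client is configured with the same one.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def pick_server(key: str, servers=SERVERS) -> str:
    """Map a key to one server by hashing the key on the client side."""
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]


server = pick_server("userrow:42")  # the same key always maps to the same server
```

With plain modulo hashing, adding or removing a server remaps most keys, which is why some client libraries use consistent hashing instead.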
The servers keep the values in RAM; if a server runs out of RAM, it discards the oldest values. Therefore, clients must treat Memcached as a transitory cache; they cannot assume that data stored in Memcached is still there when they need it. A Memcached-protocol compatible product known as MemcacheDB provides persistent storage.
If all client libraries use the same hashing algorithm to determine servers, then clients can read each other's cached data, which is generally desirable.
A typical deployment will have several servers and many clients. However, it is possible to use Memcached on a single computer, acting simultaneously as client and server.
Security
Most deployments of Memcached exist within trusted networks where clients may freely connect to any server. There are cases, however, where Memcached is deployed in untrusted networks or where administrators would like to exercise control over the clients that are connecting. For this purpose Memcached can be compiled with optional SASL authentication support. The SASL support requires the binary protocol.
A presentation at BlackHat USA 2010 revealed that a number of large public websites had left memcached open to inspection, analysis, retrieval, and modification of data.
Example code
Note that all functions described on this page are pseudocode only. Memcached calls and programming languages may vary based on the API used.
Converting database or object creation queries to use Memcached is simple. Typically, when using straight database queries, example code would be as follows:
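A minimal Python sketch of such a direct query, assuming an in-memory SQLite database with a hypothetical `users` table and a hypothetical `get_foo` function:

```python
import sqlite3

# Hypothetical in-memory database, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (42, 'alice')")


def get_foo(userid):
    """Fetch the user row straight from the database on every call."""
    cur = db.execute(
        "SELECT userid, name FROM users WHERE userid = ?", (userid,)
    )
    return cur.fetchone()  # None if no such user exists
```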
After conversion to Memcached, the same call might look like the following:
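A minimal Python sketch of the cached lookup, under stated assumptions: a plain dict stands in for a Memcached client so the example runs without a server, `setdefault` mimics the add operation (store only if the key is absent), and the `users` table and `get_foo` name are hypothetical.

```python
import sqlite3

# Hypothetical in-memory database, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (42, 'alice')")

# Assumption: a dict stands in for a Memcached client.
cache = {}


def get_foo(userid):
    key = f"userrow:{userid}"
    row = cache.get(key)  # first try the cache
    if row is None:
        # Cache miss: fall back to the database as usual.
        row = db.execute(
            "SELECT userid, name FROM users WHERE userid = ?", (userid,)
        ).fetchone()
        if row is not None:
            # Mimic "add": store only if the key is not already present.
            cache.setdefault(key, row)
    return row
```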
The server would first check whether a Memcached value with the unique key "userrow:userid" exists, where userid is some number. If the result does not exist, it would select from the database as usual, and set the unique key using the Memcached API add function call.
However, if only this API call were modified, the server would end up fetching incorrect data following any database update actions: the Memcached "view" of the data would become out of date. Therefore, in addition to creating an "add" call, an update call would also be needed, using the Memcached set function.
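A minimal Python sketch of such an update path, again with a dict standing in for a Memcached client (item assignment mimics set, which overwrites unconditionally). The `update_foo` name and the `users` table are hypothetical.

```python
import sqlite3

# Hypothetical in-memory database, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (42, 'alice')")

# Assumption: a dict stands in for a Memcached client; the row is already cached.
cache = {"userrow:42": (42, "alice")}


def update_foo(userid, name):
    """Update the database row, then refresh the cached copy."""
    cur = db.execute(
        "UPDATE users SET name = ? WHERE userid = ?", (name, userid)
    )
    db.commit()
    if cur.rowcount != 1:
        return False  # nothing was updated, so leave the cache alone
    # Mimic "set": overwrite whatever is cached under this key.
    cache[f"userrow:{userid}"] = (userid, name)
    return True
```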
This call would update the currently cached data to match the new data in the database, assuming the database query succeeds. An alternative approach would be to invalidate the cache with the Memcached delete function, so that subsequent fetches result in a cache miss. Similar action would need to be taken when database records were deleted, to maintain either a correct or incomplete cache.
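The invalidation alternative can be sketched in the same style. Here `dict.pop` mimics the delete operation on a dict that stands in for a Memcached client; the `update_foo` and `delete_foo` names and the `users` table are hypothetical.

```python
import sqlite3

# Hypothetical in-memory database, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (42, 'alice')")

# Assumption: a dict stands in for a Memcached client; the row is already cached.
cache = {"userrow:42": (42, "alice")}


def update_foo(userid, name):
    """Update the row and invalidate the cached copy instead of rewriting it."""
    db.execute("UPDATE users SET name = ? WHERE userid = ?", (name, userid))
    db.commit()
    cache.pop(f"userrow:{userid}", None)  # mimic "delete"; next read is a miss


def delete_foo(userid):
    """Delete the row and drop any cached copy so it cannot be served again."""
    db.execute("DELETE FROM users WHERE userid = ?", (userid,))
    db.commit()
    cache.pop(f"userrow:{userid}", None)
```

Invalidation leaves the cache incomplete rather than incorrect: the next read misses and reloads the row from the database.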
External links
- Official memcached site
- memcached wiki and faq
- PHP Memcached Manager with Tag Support
- membase
- Memcached and Ruby
- QuickCached - memcached server implementation in Java
Commercially Supported Distributions
- Couchbase Server (formerly Membase) offers a memcached "bucket type" (free for use, subscription support available)
- GigaSpaces Java based Memcached (free community edition, fault tolerance)
- Hazelcast Memcached clustered, elastic, fault-tolerant, Java based memcached (free for use, subscription support available)