Content addressable network
Encyclopedia
The Content Addressable Network (CAN) is a distributed, decentralized P2P
Peer-to-peer
Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads among peers. Peers are equally privileged, equipotent participants in the application...

 infrastructure that provides hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

 functionality on an Internet
Internet
The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite to serve billions of users worldwide...

-like scale. CAN was one of the original four distributed hash table
Distributed hash table
A distributed hash table is a class of a decentralized distributed system that provides a lookup service similar to a hash table; pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key...

 proposals, introduced concurrently with Chord
Chord project
In computing, Chord is a protocol and algorithm for a peer-to-peer distributed hash table. A distributed hash table stores key-value pairs by assigning keys to different computers ; a node will store the values for all the keys for which it is responsible...

, Pastry
Pastry (DHT)
Pastry is an overlay and routing network for the implementation of a distributed hash table similar to Chord. The key-value pairs are stored in a redundant peer-to-peer network of connected Internet hosts. The protocol is bootstrapped by supplying it with the IP address of a peer already in the...

, and Tapestry
Tapestry (DHT)
Tapestry is a distributed hash table which provides a decentralized object location, routing, and multicasting infrastructure for distributed applications...

.

Overview

Like other distributed hash tables, CAN is designed to be scalable, fault tolerant, and self-organizing. The architectural design is a virtual multi-dimensional Cartesian
Cartesian coordinate system
A Cartesian coordinate system specifies each point uniquely in a plane by a pair of numerical coordinates, which are the signed distances from the point to two fixed perpendicular directed lines, measured in the same unit of length...

 coordinate space
Coordinate space
In mathematics, specifically in linear algebra, the coordinate space, Fn, is the prototypical example of an n-dimensional vector space over a field F. It can be defined as the product space of F over a finite index set.-Definition:...

, a type of overlay network
Overlay network
An overlay network is a computer network which is built on the top of another network. Nodes in the overlay can be thought of as being connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network...

, on a multi-torus
Torus
In geometry, a torus is a surface of revolution generated by revolving a circle in three dimensional space about an axis coplanar with the circle...

. This d-dimensional coordinate space is a virtual
Virtual
The term virtual is a concept applied in many fields with somewhat differing connotations, and also, differing denotations.The term has been defined in philosophy as "that which is not real" but may display the salient qualities of the real....

 logical address
Logical address
In computing, a logical address is the address at which an item appears to reside from the perspective of an executing application program....

, completely independent of the physical location and physical connectivity of the nodes. Point
Point
-Business and finance:* Basis point, 1/100 of one percent, denoted bp, bps, and ‱* Pivot point, a price level of significance in analysis of a financial market that is used as a predictive indicator of market movement...

s within the space are identified with coordinates. The entire coordinate space is dynamically partitioned among all the nodes in the system such that every node possesses at least one distinct zone within the overall space.

Routing

A CAN node maintains a routing table
Routing table
In computer networking a routing table, or Routing Information Base , is a data table stored in a router or a networked computer that lists the routes to particular network destinations, and in some cases, metrics associated with those routes. The routing table contains information about the...

 that holds the IP address
IP address
An Internet Protocol address is a numerical label assigned to each device participating in a computer network that uses the Internet Protocol for communication. An IP address serves two principal functions: host or network interface identification and location addressing...

 and virtual coordinate zone of each of its neighbors. A node routes a message towards a destination point in the coordinate space. The node first determines which neighboring zone is closest to the destination point, and then looks up that zone's node's IP address via the routing table.

Node joining

To join a CAN, a joining node must:
  1. Find a node already in the overlay network.
  2. Identify a zone that can be split
  3. Update the routing tables of nodes neighboring the newly split zone.


To find a node already in the overlay network, bootstrapping node
Bootstrapping node
A bootstrapping node, also known as a rendezvous host, is a node in an overlay network that provides initial configuration information to newly joining nodes so that they may successfully join the overlay network...

s may be used to inform the joining node of IP addresses of nodes currently in the overlay network.

After the joining node receives an IP address of a node already in the CAN, it can attempt to identify a zone for itself. The joining node randomly picks a point in the coordinate space and sends a join request, directed to the random point, to one of the received IP addresses. The nodes already in the overlay network route the join request to the correct device via their zone-to-IP routing tables. Once the node managing the destination point's zone receives the join request, it may honor the join request by splitting its zone in half, allocating itself the first half, and allocating the joining node the second half. If it does not honor the join request, the joining node keeps picking random points in the coordinate space and sending join requests directed to these random points until it successfully joins the network.

After the zone split and allocation is complete, the neighboring nodes are updated with the coordinates of the two new zones and the corresponding IP addresses. Routing tables are updated and updates are propagated across the network.

Node departing

To handle a node departing, the CAN must i) identify a node is departing, ii) have the departing node's zone merged or taken-over by a neighboring node, and iii) update the routing tables across the network.

Detecting a node's departure can be done, for instance, via heartbeat messages that periodically broadcast routing table information between neighbors. After a predetermined period of silence from a neighbor, that neighboring node is determined as failed and is considered a departing node. Alternatively, a node that is willingly departing may broadcast such a notice to its neighbors.

After a departing node is identified, its zone must be either merged or taken-over. First the departed node's zone is analyzed to determine whether a neighboring node's zone can merge with the departed node's zone to form a valid zone. For example, a zone in a 2d coordinate space must be either a square or rectangle and cannot be L-shaped. The validation test may cycle through all neighboring zones to determine if a successful merge can occur. If one of the potential merges is deemed a valid merge, the zones are then merged. If none of the potential merges are deemed valid, then the neighboring node with the smallest zone takes over control of the departing node's zone. After a take-over, the take-over node may periodically attempt to merge its additionally controlled zones with respective neighboring zones.

If the merge is successful, routing tables of neighboring zones' nodes are updated to reflect the merge. The network will see the subsection of the overlay network as one, single zone after a merge and treat all routing processing with this mindset. To effectuate a take-over, the take-over node updates neighboring zones' nodes' routing tables, so that requests to either zone resolve to the take-over node. And, as such, the network still sees the subsection of the overlay network as two separate zones and treats all routing processing with this mindset.

Developers

Sylvia Ratnasamy, Paul Francis, Mark Handley
Mark Handley (computer scientist)
Mark Handley is Professor of Networked Systems in the Department of Computer Science of University College London since 2003, where he leads the Networks Research Group...

, Richard Karp
Richard Karp
Richard Manning Karp is a computer scientist and computational theorist at the University of California, Berkeley, notable for research in the theory of algorithms, for which he received a Turing Award in 1985, The Benjamin Franklin Medal in Computer and Cognitive Science in 2004, and the Kyoto...

, Scott Shenker
Scott Shenker
Scott Shenker is a Professor of Computer Science at UC Berkeley. He is also the head of the Networking Group and the Vice President of the International Computer Science Institute in Berkeley, California. He received his Sc.B. in Physics from Brown University in 1978, and his PhD in Physics from...

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK