Shared nothing architecture
Encyclopedia
A shared nothing architecture (SN) is a distributed computing
architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage.
People typically contrast SN with systems that keep a large amount of centrally-stored state
information, whether in a database
, an application server
, or any other similar single point of contention. While SN is best known in the context of web
development, the concept predates the web: Michael Stonebraker
at the University of California, Berkeley
used the term in a 1986 database paper. In it he mentions existing commercial implementations of the architecture (although none are named explicitly). Teradata
, which delivered its first system in 1983, was probably one of those commercial implementations.
Shared nothing is popular for web development because of its scalability
. As Google
has demonstrated, a pure SN system can scale almost infinitely simply by adding nodes in the form of inexpensive computers, since there is no single bottleneck to slow the system down. Google calls this sharding. An SN system typically partitions its data among many nodes on different databases (assigning different computers to deal with different users or queries), or may require every node to maintain its own copy of the application's data, using some kind of coordination protocol. This is often referred to as database sharding.
There is some doubt about whether a web application with many independent web nodes but a single, shared database (clustered or otherwise) should be counted as SN. One of the approaches to achieve SN architecture for stateful applications (which typically maintain state in a centralized database
) is the use of a data grid, also known as distributed caching. This still leaves the centralized database as a single point of failure.
Shared nothing architectures have become prevalent in the data warehousing space. There is much debate as to whether the shared nothing approach is superior to shared Disk with sound arguments presented by both camps. Shared nothing architectures certainly take longer to respond to queries that involve joins over large data sets from different partitions (machines). However the potential for scaling is huge.
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...
architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage.
People typically contrast SN with systems that keep a large amount of centrally-stored state
State (computer science)
In computer science and automata theory, a state is a unique configuration of information in a program or machine. It is a concept that occasionally extends into some forms of systems programming such as lexers and parsers....
information, whether in a database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
, an application server
Application server
An application server is a software framework that provides an environment in which applications can run, no matter what the applications are or what they do...
, or any other similar single point of contention. While SN is best known in the context of web
World Wide Web
The World Wide Web is a system of interlinked hypertext documents accessed via the Internet...
development, the concept predates the web: Michael Stonebraker
Michael Stonebraker
Michael Ralph Stonebraker is a computer scientist specializing in database research.Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational database systems on the market today...
at the University of California, Berkeley
University of California, Berkeley
The University of California, Berkeley , is a teaching and research university established in 1868 and located in Berkeley, California, USA...
used the term in a 1986 database paper. In it he mentions existing commercial implementations of the architecture (although none are named explicitly). Teradata
Teradata
Teradata Corporation is a vendor specializing in data warehousing and analytic applications. Its products are commonly used by companies to manage data warehouses for analytics and business intelligence purposes. Teradata was formerly a division of NCR Corporation, with the spinoff from NCR on...
, which delivered its first system in 1983, was probably one of those commercial implementations.
Shared nothing is popular for web development because of its scalability
Scalability
In electronics scalability is the ability of a system, network, or process, to handle growing amount of work in a graceful manner or its ability to be enlarged to accommodate that growth...
. As Google
Google
Google Inc. is an American multinational public corporation invested in Internet search, cloud computing, and advertising technologies. Google hosts and develops a number of Internet-based services and products, and generates profit primarily from advertising through its AdWords program...
has demonstrated, a pure SN system can scale almost infinitely simply by adding nodes in the form of inexpensive computers, since there is no single bottleneck to slow the system down. Google calls this sharding. An SN system typically partitions its data among many nodes on different databases (assigning different computers to deal with different users or queries), or may require every node to maintain its own copy of the application's data, using some kind of coordination protocol. This is often referred to as database sharding.
There is some doubt about whether a web application with many independent web nodes but a single, shared database (clustered or otherwise) should be counted as SN. One of the approaches to achieve SN architecture for stateful applications (which typically maintain state in a centralized database
Centralized database
A Centralized database is a database located and maintained in one location, unlike a distributed database. One main advantage is that all data is located in one place. The disadvantage is that bottlenecks may occur....
) is the use of a data grid, also known as distributed caching. This still leaves the centralized database as a single point of failure.
Shared nothing architectures have become prevalent in the data warehousing space. There is much debate as to whether the shared nothing approach is superior to shared Disk with sound arguments presented by both camps. Shared nothing architectures certainly take longer to respond to queries that involve joins over large data sets from different partitions (machines). However the potential for scaling is huge.
What Is Shared
While there is no single point of contention within the software/hardware components of SN systems, it should be noted that information from disparate nodes still needs to be reintegrated at some point. Such points occur wherever an information system that is outside the SN architecture queries information from disparate nodes within the SN architecture for a single purpose. Examples of such external nodes might be:- persons (minds) who look at two SN nodes and decide that they hold or process data about the same thing (simply recognising that two nodes belong to the same SN system would be sufficient)
- any software/hardware system that is written to query different nodes within the SN architecture
See also
- Oracle RACOracle RACIn database computing, Oracle Real Application Clusters — an option for the Oracle Database software produced by Oracle Corporation and introduced in 2001 with Oracle9i — provides software for clustering and high availability in Oracle database environments...
(Shared Everything) - Ad hoc networking
- Ambient networkAmbient networkAmbient Networks is a network integration design that seeks to solve problems relating to switching between networks to maintain contact with the outside world...
- Byzantine fault toleranceByzantine fault toleranceByzantine fault tolerance is a sub-field of fault tolerance research inspired by the Byzantine Generals' Problem, which is a generalized version of the Two Generals' Problem....
- Client–server model
- Comparison of P2P applications
- Computer cluster
- Decentralized computingDecentralized computingDecentralized computing is the allocation of resources, both hardware and software, to each individual workstation, or office location. In contrast, centralized computing exists when the majority of functions are carried out, or obtained from a remote centralized location. Decentralized computing...
- Distributed hash tableDistributed hash tableA distributed hash table is a class of a decentralized distributed system that provides a lookup service similar to a hash table; pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key...
(DHT) - File sharingFile sharingFile sharing is the practice of distributing or providing access to digitally stored information, such as computer programs, multimedia , documents, or electronic books. It may be implemented through a variety of ways...
- Friend-to-friendFriend-to-friendA friend-to-friend computer network is a type of peer-to-peer network in which users only make direct connections with people they know. Passwords or digital signatures can be used for authentication....
(F2F) - Friend-to-friend with third party storage
- GreenplumGreenplumGreenplum is a database software company in San Mateo, California, specializing in enterprise data cloud solutions for large-scale data warehousing and analytics...
- Grid computingGrid computingGrid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files...
- IBM DB2IBM DB2The IBM DB2 Enterprise Server Edition is a relational model database server developed by IBM. It primarily runs on Unix , Linux, IBM i , z/OS and Windows servers. DB2 also powers the different IBM InfoSphere Warehouse editions...
(EEE and DPF) - MySQL ClusterMySQL ClusterMySQL Cluster is a technology which provides shared-nothing clustering capabilities for the MySQL database management system. It was first included in the production release of MySQL 4.1 in November 2004. It is designed to provide high availability and high performance, while allowing for nearly...
- Overlay networkOverlay networkAn overlay network is a computer network which is built on the top of another network. Nodes in the overlay can be thought of as being connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network...
- Private peer-to-peer
- ServentServentIn general a servent is a peer-to-peer network node, which has the functionalities of both a server and a client. This is a portmanteau derived from the terms server and client; it is a play on the word "servant"....
- Swarm intelligenceSwarm intelligenceSwarm intelligence is the collective behaviour of decentralized, self-organized systems, natural or artificial. The concept is employed in work on artificial intelligence...
- TREXTREX search engineTREX is a search engine in the SAP NetWeaver integrated technology platform produced by SAP AG. The TREX engine is a standalone component that can be used in a range of system environments but is used primarily as an integral part of such SAP products as Enterprise Portal, Knowledge Warehouse, and...
(search engine in the SAP NetWeaverNetWeaverSAP NetWeaver is SAP's integrated technology computing platform and is the technical foundation for many SAP applications since the SAP Business Suite. SAP NetWeaver is marketed as a service-oriented application and integration platform...
integrated technology platform produced by SAP AGSAP AGSAP AG is a German software corporation that makes enterprise software to manage business operations and customer relations. Headquartered in Walldorf, Baden-Württemberg, with regional offices around the world, SAP is the market leader in enterprise application software...
)