Standard column family
Encyclopedia
The standard column family is a NoSQL object that contains column
Column (database)
In the context of a relational database table, a column is a set of data values of a particular simple type, one for each row of the table. The columns provide the structure according to which the rows are composed....

s of related data. It is a tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

 (pair) that consists of a key-value pair, where the key is mapped to a value that is a set of columns. In analogy with relational databases, a standard column family is as a "table", each key-value pair being a "row". Each column is a tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

 (triplet
Triplet
-Science:* A series of three nucleotide bases that form Genetic code* J-coupling as part of NMR spectroscopy* Opal in preparation to be a gemstone* Spin triplet in quantum mechanics — as in triplet oxygen, or simply triplet state in general....

) consisting of a column name, a value, and a timestamp
Timestamp
A timestamp is a sequence of characters, denoting the date or time at which a certain event occurred. A timestamp is the time at which an event is recorded by a computer, not the time of the event itself...

. In a relational
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

 database table, this data would be grouped together within a table with other non-related data.

Standard column families are column containers sorted by their names can be referenced and sorted by their row key.

Benefits

Accessing the data in a distributed
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...

 data store
Data store
A data store is a data repository of a set of integrated objects. These objects are modeled using classes defined in database schemas. Data store includes not only data repositories like databases, it is a more general concept that includes also flat files that can store data.Some data stores do...

 would be expensive (time-consuming), if it would be saved in form of a table. It would also be inefficient to read all column families that would make up a row in a relational table and put it together to form a row, as the data for it is distributed on a large number of nodes
Node (networking)
In communication networks, a node is a connection point, either a redistribution point or a communication endpoint . The definition of a node depends on the network and protocol layer referred to...

. Therefore, the user accesses only the related information required.

As an example, a relational table could consist of the columns UID, first name, surname, birthdate, gender, etc. In a distributed data store, the same table would be implemented by creating columns families for "UID, first name, surname", "birthdate, gender", etc. If one needs only the males that were born between 1950 and 1960, for a query in the relational database, all the table has to be read. In a distributed data store, it suffices to access only the second standard column family, as the rest of information is irrelevant.

Sorting and querying

There is no way to sort columns, nor to query
Query language
Query languages are computer languages used to make queries into databases and information systems.Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages...

 an arbitrary query in distributed data store
Distributed data store
A distributed data store is a blurred concept and means either a distributed database where users store their information on a number of nodes, or a network in which a user stores their information on a number of peer network nodes ....

s. Columns are sorted when they are added to the column family. The way of sorting is defined by an attribute. For instance, this is done by the CompareWith attribute in Apache Cassandra that can have the following values:
  • AsciiType
  • BytesType
  • LexicalUUIDType
  • LongType
  • TimeUUIDType
  • UTF8Type


It is also possible to add some user-defined sorting attributes. Using this way of sorting makes the process extremely quick.

Standard column families vs. rows

Standard column families have a schemeless nature so that each of their "row"s can contain a different number of columns, and even different column names could be in each row. So, they are a very different concept than the rows in relational database management system (RDBMS)
Relational database management system
A relational database management system is a database management system that is based on the relational model as introduced by E. F. Codd. Most popular databases currently in use are based on the relational database model....

s. This is one of the reasons why the concept is not trivial for an experienced RDBMS expert.

Examples

In JSON-like
JSON
JSON , or JavaScript Object Notation, is a lightweight text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects...

notation, a column family definition would look as follows:


UserProfile = {
Cassandra = { emailAddress:”casandra@apache.org” , age:”20”}
TerryCho = { emailAddress:”terry.cho@apache.org” , gender:”male”}
Cath = { emailAddress:”cath@apache.org” , age:”20”,gender:”female”,address:”Seoul”}
}

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK