Candidate key
Encyclopedia
In the relational model
of database
s, a candidate key of a relation
is a minimal superkey
for that relation; that is, a set of attributes such that
The constituent attributes are called prime attributes. Conversely, an attribute that does not occur in ANY candidate key is called a non-prime attribute.
Since a relation contains no duplicate tuples, the set of all its attributes is a superkey if NULL values are not used. It follows that every relation will have at least one candidate key.
The candidate keys of a relation tell us all the possible ways we can identify its tuples. As such they are an important concept for the design database schema
.
For practical reasons RDBMSs usually require that for each relation one of its candidate keys is declared as the primary key, which means that it is considered as the preferred way to identify individual tuples. Foreign key
s, for example, are usually required to reference such a primary key and not any of the other candidate keys.
Here r2 differs from r1 only in the A and D values of the last tuple.
For r1 the following sets have the uniqueness property, i.e., there are no two distinct tuples in the instance with the same values for the attributes in the set:
For r2 the uniqueness property holds for the following sets;
Since superkeys of a relvar are those sets of attributes that have the uniqueness property for all legal values of that relvar and because we assume that r1 and r2 are all the legal values that R can take, we can determine the set of superkeys of R by taking the intersection of the two lists:
Finally we need to select those sets for which there is no proper subset in the list, which are in this case:
These are indeed the candidate keys of relvar R.
We have to consider all the relations that might be assigned to a relvar to determine whether a certain set of attributes is a candidate key. For example, if we had considered only r1 then we would have concluded that {A,B} is a candidate key, which is incorrect. However, we might be able to conclude from such a relation that a certain set is not a candidate key, because that set does not have the uniqueness property (example {A,D} for r1). Note that the existence of a proper subset of a set that has the uniqueness property cannot in general be used as evidence that the superset is not a candidate key. In particular, note that in the case of an empty relation, every subset of the heading has the uniqueness property, including the empty set.
. We can derive more superkeys by applying the following rule:
Consider for example the relation
for which it will trivially hold that
is a superkey. If we assume that a certain person can only marry once on a given date then this implies the functional dependencies:
In this case, applying the above rule leads to the derivation of the superkeys {Husband, Date} and {Wife, Date} respectively.
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...
of database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
s, a candidate key of a relation
Relvar
In relational databases, a relvar is a term coined by C. J. Date as an abbreviation for the concept of relation variable, which is the actual term used by the inventor of the relational model, E. F. Codd, regarding the same concept...
is a minimal superkey
Superkey
A superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...
for that relation; that is, a set of attributes such that
- the relation does not have two distinct tupleTupleIn mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...
s (i.e. rows or records in common database language) with the same values for these attributes (which means that the set of attributes is a superkey) - there is no proper subset of these attributes for which (1) holds (which means that the set is minimal).
The constituent attributes are called prime attributes. Conversely, an attribute that does not occur in ANY candidate key is called a non-prime attribute.
Since a relation contains no duplicate tuples, the set of all its attributes is a superkey if NULL values are not used. It follows that every relation will have at least one candidate key.
The candidate keys of a relation tell us all the possible ways we can identify its tuples. As such they are an important concept for the design database schema
Logical schema
A Logical Schema is a data model of a specific problem domain expressed in terms of a particular data management technology. Without being specific to a particular database management product, it is in terms of either relational tables and columns, object-oriented classes, or XML tags...
.
For practical reasons RDBMSs usually require that for each relation one of its candidate keys is declared as the primary key, which means that it is considered as the preferred way to identify individual tuples. Foreign key
Foreign key
In the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...
s, for example, are usually required to reference such a primary key and not any of the other candidate keys.
Example
The definition of candidate keys can be illustrated with the following (abstract) example. Consider a relation variable (relvar) R with attributes (A, B, C, D) that has only the following two legal values r1 and r2:A | B | C | D |
---|---|---|---|
a1 | b1 | c1 | d1 |
a1 | b2 | c2 | d1 |
a2 | b1 | c2 | d1 |
A | B | C | D |
---|---|---|---|
a1 | b1 | c1 | d1 |
a1 | b2 | c2 | d1 |
a1 | b1 | c2 | d2 |
Here r2 differs from r1 only in the A and D values of the last tuple.
For r1 the following sets have the uniqueness property, i.e., there are no two distinct tuples in the instance with the same values for the attributes in the set:
- {A,B}, {A,C}, {B,C}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
For r2 the uniqueness property holds for the following sets;
- {B,C}, {B,D}, {C,D}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
Since superkeys of a relvar are those sets of attributes that have the uniqueness property for all legal values of that relvar and because we assume that r1 and r2 are all the legal values that R can take, we can determine the set of superkeys of R by taking the intersection of the two lists:
- {B,C}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
Finally we need to select those sets for which there is no proper subset in the list, which are in this case:
- {B,C}, {A,B,D}, {A,C,D}
These are indeed the candidate keys of relvar R.
We have to consider all the relations that might be assigned to a relvar to determine whether a certain set of attributes is a candidate key. For example, if we had considered only r1 then we would have concluded that {A,B} is a candidate key, which is incorrect. However, we might be able to conclude from such a relation that a certain set is not a candidate key, because that set does not have the uniqueness property (example {A,D} for r1). Note that the existence of a proper subset of a set that has the uniqueness property cannot in general be used as evidence that the superset is not a candidate key. In particular, note that in the case of an empty relation, every subset of the heading has the uniqueness property, including the empty set.
Determining candidate keys
The previous example only illustrates the definition of a candidate key and not how these are determined in practice. It is important to determine all superkeys, which is especially difficult if the relation represents a set of relationships rather than a set of entities. Therefore it is often useful to attempt to find any "forgotten" superkeys by also determining the functional dependenciesFunctional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...
. We can derive more superkeys by applying the following rule:
- if S is a superkey and X→Y a functional dependency
- then (S ∖ {Y}) ∪ {X} is also a superkey, where '\' is the set differenceComplement (set theory)In set theory, a complement of a set A refers to things not in , A. The relative complement of A with respect to a set B, is the set of elements in B but not in A...
.
Consider for example the relation
- Marriage (Husband, Wife, Date)
for which it will trivially hold that
- {Husband, Wife, Date}
is a superkey. If we assume that a certain person can only marry once on a given date then this implies the functional dependencies:
- {Husband, Date} → Wife
- {Wife, Date} → Husband
In this case, applying the above rule leads to the derivation of the superkeys {Husband, Date} and {Wife, Date} respectively.
See also
- Alternate key
- Compound keyCompound keyIn database design, a compound key is a key that consists of 2 or more attributes that uniquely identify an entity occurrence. Each attribute that makes up the compound key is a simple key in its own right....
- Database normalizationDatabase normalizationIn the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations...
- Primary key
- Relational databaseRelational databaseA relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
- SuperkeySuperkeyA superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...
External links
- Relation Database terms of reference, Keys: An overview of the different types of keys in an RDBMS