Null (SQL)
Encyclopedia
Null is a special marker used in Structured Query Language (SQL)
SQL
SQL is a programming language designed for managing data in relational database management systems ....

 to indicate that a data value does not exist in the database. Introduced by the creator of the relational
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...

 database model, E. F. Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

, SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....

 Null serves to fulfill the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek
Greek alphabet
The Greek alphabet is the script that has been used to write the Greek language since at least 730 BC . The alphabet in its classical and modern form consists of 24 letters ordered in sequence from alpha to omega...

 omega
Omega
Omega is the 24th and last letter of the Greek alphabet. In the Greek numeric system, it has a value of 800. The word literally means "great O" , as opposed to omicron, which means "little O"...

 (ω) symbol to represent Null in database theory
Database theory
Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems....

. NULL is also an SQL reserved keyword
Reserved word
Reserved words are one type of grammatical construct in programming languages. These words have special meaning within the language and are predefined in the language’s formal specifications...

 used to identify the Null special marker.

Null has been the focus of controversy and a source of debate because of its associated three-valued logic
Ternary logic
In logic, a three-valued logic is any of several many-valued logic systems in which there are three truth values indicating true, false and some indeterminate third value...

 (3VL), special requirements for its use in SQL joins
Join (SQL)
An SQL join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each. ANSI standard SQL specifies four types of JOINs: INNER, OUTER, LEFT, and RIGHT...

, and the special handling required by aggregate functions and SQL grouping operators. Although special functions and predicates are provided to properly handle Nulls, opponents feel that resolving these issues introduces unnecessary complexity and inconsistency into the relational model
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...

 of databases.

History

Null was introduced by E. F. Codd as a method of representing missing data in the relational model
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...

. Codd later reinforced his requirement that all RDBMS support Null to indicate missing data in a two-part series published in ComputerWorld magazine. Codd also introduced a ternary (three-valued)
Ternary logic
In logic, a three-valued logic is any of several many-valued logic systems in which there are three truth values indicating true, false and some indeterminate third value...

 logic, consisting of the truth values True, False, and Unknown, which is closely tied to the concept of Null. The Unknown truth value is generated whenever Null is compared with any data value, or with another Null.

Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate the reason why data is missing. These two Null-type markers are commonly referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively. Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Null-type values has not gained widespread acceptance.

Three-valued logic (3VL)

Since Null is not a member of any data domain
Data domain
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values....

, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value
Undefined value
In computing , undefined value is a condition where an expression has not a correct value, although it is syntactically correct. Undefined value may not be confused with empty string, boolean "false" or other "empty" values...

. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown. The logical result of the expression below, which compares the value 10 to Null, is Unknown:


SELECT 10 = NULL -- Results in Unknown


However, certain operations on Null can return values if the value of Null is not relevant to the outcome of the operation. Consider the following example in which the OR statement is evaluated in short-circuited form:


SELECT TRUE OR NULL -- Results in True


In this case, the fact that the value on the right of OR is unknowable is irrelevant, because the outcome of the OR operation would be True regardless of the value on the right.

SQL implements three logical results, so SQL implementations must provide for a specialized three-valued logic (3VL)
Ternary logic
In logic, a three-valued logic is any of several many-valued logic systems in which there are three truth values indicating true, false and some indeterminate third value...

. The rules governing SQL three-valued logic are shown in the tables below (p and q represent logical states)"
Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard provides for two special Null-specific comparison predicates. The IS NULL and IS NOT NULL predicates test whether data is, or is not, Null.

Data typing

Null is untyped in SQL, meaning that it is not designated as an integer, character, or any other specific data type
Data type
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...

. Because of this, it is sometimes mandatory (or desirable) to explicitly convert Nulls to a specific data type. For example, if overloaded functions are supported by the RDBMS, SQL might not be able to automatically resolve to the correct function without knowing the data types of all parameters, including those for which Null is passed.

Data Manipulation Language

SQL three-valued logic is encountered in Data Manipulation Language (DML)
Data Manipulation Language
A data manipulation language is a family of syntax elements similar to a computer programming language used for inserting, deleting and updating data in a database...

 in comparison predicates of DML statements and queries. The WHERE clause causes the DML statement to act on only those rows for which the predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by INSERT
Insert (SQL)
An SQL INSERT statement adds one or more records to any single table in a relational database.-Basic form:Insert statements have the following form:* INSERT INTO table VALUES...

, UPDATE
Update (SQL)
An SQL UPDATE statement changes the data of one or more records in a table. Either all the rows can be updated, or a subset may be chosen using a condition.The UPDATE statement has the following form:...

, or DELETE
Delete (SQL)
In the database structured query language , the DELETE statement removes one or more records from a table. A subset may be defined for deletion using a condition, otherwise all records are removed.-Usage:The DELETE statement follows the syntax:...

DML statements, and are discarded by SELECT
Select (SQL)
The SQL SELECT statement returns a result set of records from one or more tables.A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used Data Manipulation Language command...

queries. Interpreting Unknown and False as the same logical result is a common error encountered while dealing with Nulls. The following simple example demonstrates this fallacy:


SELECT *
FROM t
WHERE i = NULL;


The example query above logically always returns zero rows because the comparison of the i column with Null always returns Unknown, even for those rows where i is Null. The Unknown result causes the SELECT statement to summarily discard each and every row. (However, in practice, some SQL tools will retrieve rows using a comparison with Null.)

CASE expressions

SQL CASE expressions operate under the same rules as the DML WHERE clause rules for Null. Because it can be evaluated as a series of equality comparison conditions, a simple CASE expression cannot check for the existence of Null directly. A check for Null in a simple CASE expression always results in Unknown, as in the following:


SELECT CASE i WHEN NULL THEN 'Is Null' -- This will never be returned
WHEN 0 THEN 'Is Zero' -- This will be returned when i = 0
WHEN 1 THEN 'Is One' -- This will be returned when i = 1
END
FROM t;


Because the expression i = NULL evaluates to Unknown no matter what value column i contains (even if it contains Null), the string 'Is Null' will never be returned.

A searched CASE expression also returns the first value for which the result of the comparison predicate evaluates to True, including comparisons using the IS NULL and IS NOT NULL comparison predicates. The following example shows how to use a searched CASE expression to properly check for Null:


SELECT CASE WHEN i IS NULL THEN 'Null Result' -- This will be returned when i is NULL
WHEN i = 0 THEN 'Zero' -- This will be returned when i = 0
WHEN i = 1 THEN 'One' -- This will be returned when i = 1
END
FROM t;


In the searched CASE expression, the string 'Null Result' is returned for all rows in which i is Null.

Check constraints

The primary place in which SQL three-valued logic intersects with SQL Data Definition Language (DDL)
Data Definition Language
A data definition language or data description language is a syntax similar to a computer programming language for defining data structures, especially database schemas.-History:...

 is in the form of check constraint
Check Constraint
A check constraint is a condition that defines valid data when adding or updating an entry in a table of a relational database. A check constraint is applied to each row in the table. The constraint must be a predicate. It can refer to a single or multiple columns of the table...

s. A check constraint placed on a column operates under a slightly different set of rules than those for the DML WHERE clause. While a DML WHERE clause must evaluate to True for a row, a check constraint must not evaluate to False. This means that a check constraint will succeed if the result of the check is either True or Unknown. The following example table with a check constraint will prohibit any integer values from being inserted into column i, but will allow Null to be inserted since the result of the check will always evaluate to Unknown for Nulls.


CREATE TABLE t (
i INTEGER,
CONSTRAINT ck_i CHECK ( i < 0 AND i = 0 AND i > 0 ) );


In order to constrain a column to reject Nulls, the NOT NULL constraint can be applied, as shown in the example below. The NOT NULL constraint is semantically equivalent to a check constraint
Check Constraint
A check constraint is a condition that defines valid data when adding or updating an entry in a table of a relational database. A check constraint is applied to each row in the table. The constraint must be a predicate. It can refer to a single or multiple columns of the table...

 with an IS NOT NULL predicate.


CREATE TABLE t ( i INTEGER NOT NULL );

Procedural extensions

SQL/PSM (SQL Persistent Stored Modules) defines procedural
Procedural programming
Procedural programming can sometimes be used as a synonym for imperative programming , but can also refer to a programming paradigm, derived from structured programming, based upon the concept of the procedure call...

 extensions for SQL, such as the IF statement. However, the major SQL vendors have historically included their own proprietary procedural extensions. Procedural extensions for looping and comparisons operate under Null comparison rules similar to those for DML statements and queries. The following code fragment, in ISO SQL standard format, demonstrates the use of Null 3VL in an IF statement.


IF i = NULL THEN
SELECT 'Result is True'
ELSEIF NOT(i = NULL) THEN
SELECT 'Result is False'
ELSE
SELECT 'Result is Unknown';


The IF statement performs actions only for those comparisons that evaluate to True. For statements that evaluate to False or Unknown, the IF statement passes control to the ELSEIF clause, and finally to the ELSE clause. The result of the code above will always be the message 'Result is Unknown' since the comparisons with Null always evaluate to Unknown.

Joins

SQL outer joins
Join (SQL)
An SQL join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each. ANSI standard SQL specifies four types of JOINs: INNER, OUTER, LEFT, and RIGHT...

, including left outer joins, right outer joins, and full outer joins, automatically produce Nulls as placeholders for missing values in related tables. For left outer joins, for instance, Nulls are produced in place of rows missing from the table appearing on the right-hand side of the LEFT OUTER JOIN operator. The following simple example uses two tables to demonstrate Null placeholder production in a left outer join.

The first table (Employee) contains employee ID numbers and names, while the second table (PhoneNumber) contains related employee ID numbers and phone numbers
Telephone number
A telephone number or phone number is a sequence of digits used to call from one telephone line to another in a public switched telephone network. When telephone numbers were invented, they were short — as few as one, two or three digits — and were given orally to a switchboard operator...

, as shown below.
Employee
ID LastName FirstName
1 Johnson Joe
2 Lewis Larry
3 Thompson Thomas
4 Patterson Patricia
PhoneNumber
ID Number
1 555-2323
3 555-9876


The following sample SQL query performs a left outer join on these two tables.


SELECT e.ID, e.LastName, e.FirstName, pn.Number
FROM Employee e
LEFT OUTER JOIN PhoneNumber pn
ON e.ID = pn.ID;


The result set generated by this query demonstrates how SQL uses Null as a placeholder for values missing from the right-hand (PhoneNumber) table, as shown below.
Query result
ID LastName FirstName Number
1 Johnson Joe 555-2323
2 Lewis Larry NULL
3 Thompson Thomas 555-9876
4 Patterson Patricia NULL


Inner joins and cross joins, also available in standard SQL, do not generate Null placeholders for missing values in related tables.

Care must be taken when using nullable columns in SQL join criteria. Because a Null is not equal to any other Null, Nulls in a column of one table will not join to Nulls in the related column of another table using the standard equality comparison operators. The SQL COALESCE function or CASE expressions can be used to "simulate" Null equality in join criteria, and the IS NULL and IS NOT NULL predicates can be used in the join criteria as well.

The following predicate tests for equality of the values A and B and treats Nulls as being equal. The IFNULL operator is required since A = B returns a Null value if at least one of A or B is Null and NULL OR FALSE is Null itself.


IFNULL( A = B, FALSE ) OR ( A IS NULL AND B IS NULL )

Mathematical and string concatenation

Because Null is not a data value, but a marker for an unknown value, using mathematical operators on Null results in an unknown value, which is represented by Null. In the following example, multiplying 10 by Null results in Null:


10 * NULL -- Result is NULL


This can lead to unanticipated results. For instance, when an attempt is made to divide Null by zero, platforms may return Null instead of throwing an expected "data exception - division by zero". Though this behavior is not defined by the ISO SQL standard many DBMS vendors treat this operation similarly. For instance, the Oracle, PostgreSQL, MySQL Server, and Microsoft SQL Server platforms all return a Null result for the following:


NULL / 0


String concatenation
Concatenation
In computer programming, string concatenation is the operation of joining two character strings end-to-end. For example, the strings "snow" and "ball" may be concatenated to give "snowball"...

 operations, which are common in SQL, also result in Null when one of the operands is Null. The following example demonstrates the Null result returned by using Null with the SQL || string concatenation operator.


'Fish ' || NULL || 'Chips' -- Result is NULL

This is not true for all database implementations. In an Oracle RDBMS for example NULL and the empty string are considered the same thing and therefore 'Fish ' || NULL || 'Chips' results in 'Fish Chips'.

Aggregate functions

SQL defines aggregate function
Aggregate function
In computer science, an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list....

s to simplify server-side aggregate calculations on data. Almost all aggregate functions perform a Null-elimination step, so that Null values are not included in the final result of the calculation. This implicit Null elimination, however, can have an impact on aggregate function results.

The following example table results in different results being returned for each column when the SQL AVG (average) aggregate function is applied:
Table
i j
150 150
200 200
350 350
NULL 0


The SQL AVG aggregate function returns 233 when applied to column i, but returns 175 when applied to column j. The aggregate function's Null-elimination step accounts for the difference in these results. The only aggregate function that does not implicitly eliminate Null is the COUNT(*) function.

Grouping and sorting

Because SQL:2003
SQL:2003
SQL:2003 is the fifth revision of the SQL database query language. The latest revision of the standard is SQL:2008.-Summary:The SQL:2003 standard makes minor modifications to all parts of SQL:1999 , and officially introduces a few new features such as:* XML-related features * Window functions* the...

 defines all Null markers as being unequal to one another, a special definition was required in order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one another, or any two Nulls", as "not distinct". This definition of not distinct allows SQL to group and sort Nulls when the GROUP BY clause (and other keywords that perform grouping) are used.

Other SQL operations, clauses, and keywords use "not distinct" in their treatment of Nulls. These include the following:
  • PARTITION BY clause of ranking and windowing functions like ROW_NUMBER
  • UNION, INTERSECT, and EXCEPT operator, which treat NULLs as the same for row comparison/elimination purposes
  • DISTINCT keyword used in SELECT queries


The SQL standard does not explicitly define a default sort order for Nulls. Instead, on conforming systems, Nulls can be sorted before or after all data values by using the NULLS FIRST or NULLS LAST clauses of the ORDER BY list, respectively. Not all DBMS vendors implement this functionality, however. Vendors who do not implement this functionality may specify different treatments for Null sorting in the DBMS.

Effect on index operation

Some SQL products do not index keys containing NULL values. For instance, PostgreSQL
PostgreSQL
PostgreSQL, often simply Postgres, is an object-relational database management system available for many platforms including Linux, FreeBSD, Solaris, MS Windows and Mac OS X. It is released under the PostgreSQL License, which is an MIT-style license, and is thus free and open source software...

  versions prior to 8.3 did not, with the documentation for a B-Tree
B-tree
In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children...

 index stating that
In cases where the index enforces uniqueness, NULL values are excluded from the index and uniqueness is not enforced between NULL values. Again, quoting from the PostgreSQL
PostgreSQL
PostgreSQL, often simply Postgres, is an object-relational database management system available for many platforms including Linux, FreeBSD, Solaris, MS Windows and Mac OS X. It is released under the PostgreSQL License, which is an MIT-style license, and is thus free and open source software...

 documentation:
This is consistent with the SQL:2003-defined behavior of scalar Null comparisons.

Another method of indexing Nulls involves handling them as not distinct in accordance with the SQL:2003-defined behavior. For example, Microsoft SQL Server
Microsoft SQL Server
Microsoft SQL Server is a relational database server, developed by Microsoft: It is a software product whose primary function is to store and retrieve data as requested by other software applications, be it those on the same computer or those running on another computer across a network...

 documentation states the following:
Both of these indexing strategies are consistent with the SQL:2003-defined behavior of Nulls. Because indexing methodologies are not explicitly defined by the SQL:2003 standard, indexing strategies for Nulls are left entirely to the vendors to design and implement.

Null-handling functions

SQL defines two functions to explicitly handle Nulls: NULLIF and COALESCE. Both functions are abbreviations for searched CASE expressions.

NULLIF

The NULLIF function accepts two parameters. If the first parameter is equal to the second parameter, NULLIF returns Null. Otherwise, the value of the first parameter is returned.


NULLIF(value1, value2)


Thus, NULLIF is an abbreviation for the following CASE expression:


CASE WHEN value1 = value2 THEN NULL ELSE value1 END

COALESCE

The COALESCE function accepts a list of parameters, returning the first non-Null value from the list:


COALESCE(value1, value2, value3, ...)


COALESCE is defined as shorthand for the following SQL CASE expression:


CASE WHEN value1 IS NOT NULL THEN value1
WHEN value2 IS NOT NULL THEN value2
WHEN value3 IS NOT NULL THEN value3
...
END


Some SQL DBMSs implement vendor-specific functions similar to COALESCE. Some systems (e.g. Transact-SQL
Transact-SQL
Transact-SQL is Microsoft's and Sybase's proprietary extension to SQL. SQL, often expanded to Structured Query Language, is a standardized computer language that was originally developed by IBM for querying, altering and defining relational databases, using declarative statements...

) implement an ISNULL function, or other similar functions that are functionally similar to COALESCE. (See Is functions
Is functions
The Is functions are a set of functions in Microsoft's Visual Basic 6, Visual Basic for Applications, VBScript, and Visual Basic .NET...

 for more on the IS functions in Transact-SQL.)

NVL

The NVL function accepts two parameters. It returns the first non-NULL parameter or NULL if all parameters are NULL.

A COALESCE expression can be converted into an equivalent NVL expression thus:

COALESCE ( val1, ... , val{n} )


turns into:


NVL( val1 , NVL( val2 , NVL( val3 , … , NVL ( val{n-1} , val{n} ) … )))


There is, however, one notable exception. In most implementations, COALESCE evaluates its parameters until it reaches the first non-NULL one, while NVL evaluates all of its parameters. This is important for several reasons. A parameter after the first non-NULL parameter could be a function, which could either be computationally expensive, invalid, or could create unexpected side effects.

Common mistakes

Misunderstanding of how Null works is the cause of a great number of errors in SQL code, both in ISO standard SQL statements and in the specific SQL dialects supported by real-world database management systems. These mistakes are usually the result of confusion between Null and either 0 (zero) or an empty string (a string value with a length of zero, represented in SQL as ). Null is defined by the ISO SQL standard as different from both an empty string and the numerical value 0, however. While Null indicates the absence of any value, the empty string and numerical zero both represent actual values.

For example, a WHERE clause or conditional statement might compare a column's value with a constant. It is often incorrectly assumed that a missing value would be "less than" or "not equal to" a constant if that field contains Null, but, in fact, such expressions return Unknown. An example is below:


SELECT *
FROM sometable
WHERE num <> 1; -- Rows where num is NULL will not be returned,
-- contrary to many users' expectations.


Similarly, Null values are often confused with empty strings. Consider the LENGTH function, which returns the number of characters in a string. When a Null is passed into this function, the function returns Null. This can lead to unexpected results, if users are not well versed in 3-value logic. An example is below:


SELECT *
FROM sometable
WHERE LENGTH(string) < 20; -- Rows where string is NULL will not be returned.


This is complicated by the fact that in some database interface programs, NULL is reported as an empty string, and empty strings may be incorrectly stored as NULL.

Criticisms

The ISO SQL implementation of Null is the subject of criticism, debate and calls for change. In The Relational Model for Database Management: Version 2, Codd suggested that the SQL implementation of Null was flawed and should be replaced by two distinct Null-type markers. The markers he proposed were to stand for "Missing but Applicable" and "Missing but Inapplicable", known as A-values and I-values, respectively. Codd's recommendation, if accepted, would have required the implementation of a four-valued logic in SQL. Others have suggested adding additional Null-type markers to Codd's recommendation to indicate even more reasons that a data value might be "Missing", increasing the complexity of SQL's logic system. At various times, proposals have also been put forth to implement multiple user-defined Null markers in SQL. Because of the complexity of the Null-handling and logic systems required to support multiple Null markers, none of these proposals have gained widespread acceptance.

Chris Date and Hugh Darwen
Hugh Darwen
Hugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967 to 2004, and has been involved in the history of the relational model.- Work :...

, authors of The Third Manifesto
The Third Manifesto
The Third Manifesto is Christopher J. Date's and Hugh Darwen's proposal for future database management systems, a response to two earlier Manifestos with the same purpose. The theme of the manifestos is how to avoid the 'object-relational impedance mismatch' between object-oriented programming...

, have suggested that the SQL Null implementation is inherently flawed and should be eliminated altogether, pointing to inconsistencies and flaws in the implementation of SQL Null-handling (particularly in aggregate functions) as proof that the entire concept of Null is flawed and should be removed from the relational model. Others, like author Fabian Pascal
Fabian Pascal
Fabian Pascal is a consultant to large software vendors such as IBM, Oracle Corporation, and Borland, but is better known as an author and seminar speaker. Born in Romania, Pascal lives in the San Francisco, CA area of the US, and works in association with Christopher J...

, have stated a belief that "how the function calculation should treat missing values is not governed by the relational model."

Closed world assumption

Another point of conflict concerning Nulls is that they violate the closed world assumption
Closed world assumption
The closed world assumption is the presumption that what is not currently known to be true, is false. The same name also refers to a logical formalization of this assumption by Raymond Reiter. The opposite of the closed world assumption is the open world assumption , stating that lack of knowledge...

 model of relational databases by introducing an open world assumption
Open World Assumption
In formal logic, the open world assumption is the assumption that the truth-value of a statement is independent of whether or not it is known by any single observer or agent to be true. It is the opposite of the closed world assumption, which holds that any statement that is not known to be true is...

 into it. The closed world assumption, as it pertains to databases, states that "Everything stated by the database, either explicitly or implicitly, is true; everything else is false." This view assumes that the knowledge of the world stored within a database is complete. Nulls, however, operate under the open world assumption, in which some items stored in the database are considered unknown, making the database's stored knowledge of the world incomplete.

Law of the excluded middle

SQL allows three logical choices, true, false, and unknown, which means that SQL necessarily ignores the law of the excluded middle
Law of excluded middle
In logic, the law of excluded middle is the third of the so-called three classic laws of thought. It states that for any proposition, either that proposition is true, or its negation is....

. Put simply the Law of the Excluded Middle essentially states that when given any Boolean result, the opposite of the result can be obtained by applying the logical "not" operator. This does not apply to SQL nulls, however. Under the precepts of the law of the excluded middle, a Boolean expression like the following can be simplified:


SELECT * FROM stuff WHERE ( x = 10 ) OR NOT ( x = 10 );


The law of the excluded middle allows for simplification of the WHERE clause predicate, which would result in a statement like the following:


SELECT * FROM stuff;


This will not work in SQL, since the x column could contain nulls which would result in some new rows being returned.

Actually:


SELECT * FROM stuff;
-- is (because of 3VL) equivalent to:
SELECT * FROM stuff WHERE ( x = 10 ) OR NOT ( x = 10 ) OR x IS NULL;


Thus, to correctly simplify the first statement in SQL requires that we return all rows in which x is not null.


SELECT * FROM stuff WHERE x IS NOT NULL;


While ignoring the law of the excluded middle does introduce additional complexity to SQL logic, attempts to apply this rule to SQL's 3VL results in a false dichotomy
False dilemma
A false dilemma is a type of logical fallacy that involves a situation in which only two alternatives are considered, when in fact there are additional options...

.

Boolean datatype inconsistency

The ISO SQL:1999 standard introduced the Boolean datatype
Boolean datatype
In computer science, the Boolean or logical data type is a data type, having two values , intended to represent the truth values of logic and Boolean algebra...

 to SQL. The Boolean datatype, as defined by the standard, can hold the truth values TRUE, FALSE, and UNKNOWN. Null is defined in this one instance as equivalent to the truth value UNKNOWN.

This "null equals UNKNOWN truth value" proposition introduces an inconsistency into SQL 3VL. One major problem is that it contradicts a basic property of nulls, the property of propagation. Nulls, by definition, propagate through all SQL expressions. The Boolean truth values do not have this property. Consider the following scenarios in SQL:1999, in which two Boolean truth values are combined into a compound predicate. According to the rules of SQL 3VL, and as shown in the 3VL truth table shown earlier in this article, the following statements hold:
  • ( TRUE OR UNKNOWN ) → TRUE
  • ( FALSE AND UNKNOWN ) → FALSE


However, because nulls propagate, treating null as UNKNOWN results in the following logical inconsistencies in SQL 3VL:
  • ( TRUE OR NULL ) → NULL ( = UNKNOWN )
  • ( FALSE AND NULL ) → NULL ( = UNKNOWN )


The SQL:1999 standard does not define how to deal with this inconsistency, and results could vary between implementations. Because of these inconsistencies and lack of support from vendors the SQL Boolean datatype did not gain widespread acceptance. Most SQL DBMS platforms now offer their own platform-specific recommendations for storing Boolean-type data.

Note that in the PostgreSQL
PostgreSQL
PostgreSQL, often simply Postgres, is an object-relational database management system available for many platforms including Linux, FreeBSD, Solaris, MS Windows and Mac OS X. It is released under the PostgreSQL License, which is an MIT-style license, and is thus free and open source software...

 implementation of SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....

, the null value is used to represent all UNKNOWN results and the following evaluations occur:
  • ( TRUE OR NULL ) → TRUE
  • ( FALSE AND NULL ) → FALSE
  • ( FALSE OR NULL ) IS NULL → TRUE
  • ( TRUE AND NULL ) IS NULL → TRUE


MySQL
MySQL
MySQL officially, but also commonly "My Sequel") is a relational database management system that runs as a server providing multi-user access to a number of databases. It is named after developer Michael Widenius' daughter, My...

 behaves similarly to PostgreSQL
PostgreSQL
PostgreSQL, often simply Postgres, is an object-relational database management system available for many platforms including Linux, FreeBSD, Solaris, MS Windows and Mac OS X. It is released under the PostgreSQL License, which is an MIT-style license, and is thus free and open source software...

 in this regard (with the minor exception that MySQL
MySQL
MySQL officially, but also commonly "My Sequel") is a relational database management system that runs as a server providing multi-user access to a number of databases. It is named after developer Michael Widenius' daughter, My...

 regards TRUE and FALSE as no different from the ordinary integers 1 and 0).

See also

  • SQL
    SQL
    SQL is a programming language designed for managing data in relational database management systems ....

  • SQL:2008
  • Tutorial D
  • Ternary logic
    Ternary logic
    In logic, a three-valued logic is any of several many-valued logic systems in which there are three truth values indicating true, false and some indeterminate third value...

  • Data Manipulation Language
    Data Manipulation Language
    A data manipulation language is a family of syntax elements similar to a computer programming language used for inserting, deleting and updating data in a database...

  • Codd's 12 rules
    Codd's 12 rules
    Codd's twelve rules are a set of thirteen rules proposed by Edgar F. Codd, a pioneer of the relational model for databases, designed to define what is required from a database management system in order for it to be considered relational, i.e., a relational database management system...

  • Check Constraint
    Check Constraint
    A check constraint is a condition that defines valid data when adding or updating an entry in a table of a relational database. A check constraint is applied to each row in the table. The constraint must be a predicate. It can refer to a single or multiple columns of the table...

  • Relational Database Management System
  • Join (SQL)
    Join (SQL)
    An SQL join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each. ANSI standard SQL specifies four types of JOINs: INNER, OUTER, LEFT, and RIGHT...

  • The Third Manifesto
    The Third Manifesto
    The Third Manifesto is Christopher J. Date's and Hugh Darwen's proposal for future database management systems, a response to two earlier Manifestos with the same purpose. The theme of the manifestos is how to avoid the 'object-relational impedance mismatch' between object-oriented programming...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK