Select (SQL)
The SQL SELECT statement returns a result set of records from one or more tables.

A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used Data Manipulation Language (DML) command. As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to calculate it. The database translates the query into a "query plan", which may vary between executions, database versions and database software. The component that does this is called the "query optimizer", as it is responsible for finding the best possible execution plan for the query, within applicable constraints.

The SELECT statement has many optional clauses:
  • WHERE specifies which rows to retrieve.
  • GROUP BY groups rows sharing a property so that an aggregate function can be applied to each group.
  • HAVING selects among the groups defined by the GROUP BY clause.
  • ORDER BY specifies an order in which to return the rows.
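
As a minimal sketch of how these four clauses combine, the following runs one query against an in-memory SQLite database through Python's sqlite3 module. The table orders, its columns and its data are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the clauses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('alice', 10), ('alice', 30),
        ('bob', 5), ('bob', 2),
        ('carol', 50), ('carol', -1);
""")

rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 0           -- keep only rows with a positive amount
    GROUP BY customer          -- one group per customer
    HAVING SUM(amount) >= 10   -- keep only groups whose total is at least 10
    ORDER BY total DESC        -- largest total first
""").fetchall()

print(rows)  # [('carol', 50), ('alice', 40)]
```

Here WHERE discards the row with amount -1 before grouping, and HAVING then discards the group for bob, whose total is 7.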

Examples

Table "T"    Query                                Result

C1 C2        SELECT * FROM T;                     C1 C2
1  a                                              1  a
2  b                                              2  b

C1 C2        SELECT C1 FROM T;                    C1
1  a                                              1
2  b                                              2

C1 C2        SELECT * FROM T WHERE C1 = 1;        C1 C2
1  a                                              1  a
2  b

C1 C2        SELECT * FROM T ORDER BY C1 DESC;    C1 C2
1  a                                              2  b
2  b                                              1  a

Given a table T, the query SELECT * FROM T will result in all the elements of all the rows of the table being shown.

With the same table, the query SELECT C1 FROM T will result in the elements from the column C1 of all the rows of the table being shown. This is similar to a projection in relational algebra, except that in the general case, the result may contain duplicate rows. This is also known as a vertical partition in some database terms, restricting query output to view only specified fields or columns.

With the same table, the query SELECT * FROM T WHERE C1 = 1 will result in all the elements of all the rows where the value of column C1 is '1' being shown. In relational algebra terms, a selection will be performed, because of the WHERE clause. This is also known as a horizontal partition, restricting rows output by a query according to specified conditions.
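
The difference between an SQL projection and a relational-algebra projection can be seen in a small sketch, again using an in-memory SQLite database via Python's sqlite3 module. It assumes a variant of table T with a third row (1, 'c') added so that column C1 contains a duplicate; that extra row is not part of the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (C1 INTEGER, C2 TEXT);
    -- The row (1, 'c') is an assumption added to create a duplicate in C1.
    INSERT INTO T VALUES (1, 'a'), (2, 'b'), (1, 'c');
""")

# Projection on C1: SQL keeps duplicate rows.
projected = conn.execute("SELECT C1 FROM T ORDER BY C1").fetchall()

# DISTINCT removes them, matching the set semantics of relational algebra.
distinct = conn.execute("SELECT DISTINCT C1 FROM T ORDER BY C1").fetchall()

# Selection: only rows satisfying the WHERE predicate are returned.
selected = conn.execute("SELECT * FROM T WHERE C1 = 1 ORDER BY C2").fetchall()

print(projected)  # [(1,), (1,), (2,)]
print(distinct)   # [(1,), (2,)]
print(selected)   # [(1, 'a'), (1, 'c')]
```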

Limiting result rows

Often it is convenient to indicate a maximum number of rows that are returned. This can be used for testing or to prevent consuming excessive resources if the query returns more information than expected. The approach to do this often varies per vendor.

In ISO SQL:2003, result sets may be limited by using
  • cursors, or
  • SQL window functions in the SELECT statement.


ISO SQL:2008 introduced the FETCH FIRST clause.

ROW_NUMBER window function

ROW_NUMBER() OVER may be used to number the returned rows and then filter on that number, e.g. to return no more than ten rows:


-- "columns" and "tablename" are placeholders for the real column list and table.
SELECT * FROM
( SELECT
    ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_number,
    columns
  FROM tablename
) foo
WHERE row_number <= 10



ROW_NUMBER can be non-deterministic: if sort_key is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where sort_key is the same. When sort_key is unique, each row will always get a unique row number.
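
One common way to make the numbering repeatable is to append a unique column to the window's ORDER BY as a tie-breaker. The sketch below assumes a hypothetical table items with a unique id column, and an SQLite build with window-function support (version 3.25 or later), accessed through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: two rows share sort_key = 10.
    CREATE TABLE items (id INTEGER PRIMARY KEY, sort_key INTEGER);
    INSERT INTO items VALUES (1, 10), (2, 10), (3, 20);
""")

rows = conn.execute("""
    SELECT id, sort_key,
           -- id breaks ties between equal sort_key values,
           -- so the numbering is the same on every run
           ROW_NUMBER() OVER (ORDER BY sort_key ASC, id ASC) AS rn
    FROM items
    ORDER BY rn
""").fetchall()

print(rows)  # [(1, 10, 1), (2, 10, 2), (3, 20, 3)]
```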

RANK window function

The RANK() OVER window function acts like ROW_NUMBER, but may return more than n rows in case of tie conditions, e.g. to return the top-10 youngest persons:


SELECT * FROM (
  SELECT
    RANK() OVER (ORDER BY age ASC) AS ranking,
    person_id,
    person_name,
    age
  FROM person
) AS foo
WHERE ranking <= 10


The above code could return more than ten rows, e.g. if two people share the age that ranks tenth, both receive rank 10 and eleven rows are returned.
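
The tie behaviour can be reproduced on a smaller scale. The following sketch asks for the two youngest persons from a hypothetical four-row person table and gets three rows back, because two people tie for second place. It assumes an SQLite build with window-function support (version 3.25 or later), accessed through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: Bob and Cy have the same age.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, age INTEGER);
    INSERT INTO person VALUES (1, 'Ann', 20), (2, 'Bob', 25), (3, 'Cy', 25), (4, 'Dee', 30);
""")

rows = conn.execute("""
    SELECT person_name, age, ranking FROM (
        SELECT person_name, age,
               RANK() OVER (ORDER BY age ASC) AS ranking
        FROM person
    ) AS foo
    WHERE ranking <= 2
    ORDER BY ranking, person_name
""").fetchall()

print(rows)  # [('Ann', 20, 1), ('Bob', 25, 2), ('Cy', 25, 2)]
```

Dee receives rank 4 rather than 3, since RANK leaves a gap after a tie.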

FETCH FIRST clause

Since ISO SQL:2008, result limits can be specified using the FETCH FIRST clause, as in the following example.

SELECT * FROM T FETCH FIRST 10 ROWS ONLY

This clause currently is supported by IBM DB2, Sybase SQL Anywhere, PostgreSQL, EffiProz and HSQLDB version 2.0.

Result limits

Not all DBMSes support the mentioned window functions, and non-standard syntax has to be used. Below, variants of the simple limit query for different DBMSes are listed:

Syntax: SELECT * FROM T LIMIT 10 OFFSET 20
DBMS:   Netezza, MySQL, PostgreSQL (also supports the standard, since version 8.4), SQLite, HSQLDB, H2, Vertica

Syntax: SELECT * FROM T WHERE ROWNUM <= 10
DBMS:   Oracle (also supports the standard, since Oracle8i)

Syntax: SELECT FIRST 10 * FROM T
DBMS:   Ingres

Syntax: SELECT FIRST 10 * FROM T ORDER BY a
DBMS:   Informix

Syntax: SELECT SKIP 20 FIRST 10 * FROM T ORDER BY c, d
DBMS:   Informix (row numbers are filtered after ORDER BY is evaluated; the SKIP clause was introduced in a v10.00.xC4 fixpack)

Syntax: SELECT TOP 10 * FROM T
DBMS:   MS SQL Server, Sybase ASE, MS Access

Syntax: SELECT TOP 10 START AT 20 * FROM T
DBMS:   Sybase SQL Anywhere (also supports the standard, since version 9.0.1)

Syntax: SELECT FIRST 10 SKIP 20 * FROM T
DBMS:   Interbase, Firebird

Syntax: SELECT * FROM T ROWS 20 TO 30
DBMS:   Firebird (since version 2.1)

Syntax: SELECT * FROM T WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY
DBMS:   DB2

Syntax: SELECT * FROM T WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY
DBMS:   DB2 (new rows are filtered after comparing with key column of table T)
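
As an illustration of the LIMIT ... OFFSET variant, the sketch below runs it on SQLite (one of the systems listed for that syntax) through Python's sqlite3 module. The table contents, a single id column holding 1 to 50, are an assumption made for this example. An ORDER BY is included because without it the rows skipped and returned are not guaranteed to be the same from run to run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER PRIMARY KEY)")
# Hypothetical data: ids 1 through 50.
conn.executemany("INSERT INTO T VALUES (?)", [(i,) for i in range(1, 51)])

# Skip the first 20 rows in id order, then return the next 10.
page = [row[0] for row in conn.execute(
    "SELECT id FROM T ORDER BY id LIMIT 10 OFFSET 20"
)]

print(page)  # [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
```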

Hierarchical query

Some databases provide specialised syntax for hierarchical data.

Window function

A window function in SQL:2003 is an aggregate function applied to a partition of the result set.

For example,

sum(population) OVER( PARTITION BY city )

calculates the sum of the populations of all rows having the same city value as the current row.

Partitions are specified using the OVER clause which modifies the aggregate. Syntax:

<OVER_CLAUSE> ::=
    OVER ( [ PARTITION BY <expression>, ... ]
           [ ORDER BY <expression> ] )

The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number.
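
A minimal sketch of the sum(population) example follows, using a hypothetical districts table in an in-memory SQLite database (version 3.25 or later is assumed for window-function support) through Python's sqlite3 module. Unlike GROUP BY, the window function keeps every input row and attaches the per-city total to each of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and figures, for illustration only.
    CREATE TABLE districts (city TEXT, district TEXT, population INTEGER);
    INSERT INTO districts VALUES
        ('Oslo', 'North', 100),
        ('Oslo', 'South', 150),
        ('Bergen', 'Centre', 80);
""")

rows = conn.execute("""
    SELECT city, district, population,
           -- total over all rows sharing the current row's city
           SUM(population) OVER (PARTITION BY city) AS city_population
    FROM districts
    ORDER BY city, district
""").fetchall()

for row in rows:
    print(row)
# ('Bergen', 'Centre', 80, 80)
# ('Oslo', 'North', 100, 250)
# ('Oslo', 'South', 150, 250)
```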

Query evaluation ANSI

The processing of a SELECT statement according to ANSI SQL would be the following:



    select g.*
    from users u
    inner join groups g on g.Userid = u.Userid
    where u.LastName = 'Smith'
    and u.FirstName = 'John'


  1. The FROM clause is evaluated: a cross join or Cartesian product is produced for the first two tables in the FROM clause, resulting in a virtual table, Vtable1.

  2. The ON clause is evaluated for Vtable1; only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2.

  3. If an outer join is specified, records which were dropped from Vtable2 are added into Vtable3. For instance, if the above query were:

    select u.*
    from users u
    left join groups g on g.Userid = u.Userid
    where u.LastName = 'Smith'
    and u.FirstName = 'John'

    all users who did not belong to any groups would be added back into Vtable3.

  4. The WHERE clause is evaluated; in this case only group information for user John Smith would be added to Vtable4.

  5. The GROUP BY clause is evaluated. If the above query were:

    select g.GroupName, count(*) as NumberOfMembers
    from users u
    inner join groups g on g.Userid = u.Userid
    group by g.GroupName

    Vtable5 would consist of the members returned from Vtable4 arranged by the grouping, in this case the GroupName.

  6. The HAVING clause is evaluated; groups for which the HAVING condition is true are inserted into Vtable6. For example:

    select g.GroupName, count(*) as NumberOfMembers
    from users u
    inner join groups g on g.Userid = u.Userid
    group by g.GroupName
    having count(*) > 5

  7. The SELECT list is evaluated and returned as Vtable7.

  8. The DISTINCT clause is evaluated; duplicate rows are removed and the result is returned as Vtable8.

  9. The ORDER BY clause is evaluated, ordering the rows and returning VCursor9. This is a cursor and not a table because ANSI defines a cursor as an ordered set of rows (not relational).
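
The effect of the outer-join step and the later WHERE step can be observed in a small sketch using an in-memory SQLite database through Python's sqlite3 module. The data is hypothetical, and the groups table is named user_groups here, an assumption made to avoid clashing with the GROUPS keyword in some systems. John Smith belongs to no group, so the inner join returns nothing for him, while the left join adds his row back with a NULL group before WHERE is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: John Smith has no group, Jane Doe has one.
    CREATE TABLE users (Userid INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE user_groups (GroupName TEXT, Userid INTEGER);
    INSERT INTO users VALUES (1, 'John', 'Smith'), (2, 'Jane', 'Doe');
    INSERT INTO user_groups VALUES ('admins', 2);
""")

query = """
    SELECT u.FirstName, g.GroupName
    FROM users u
    {join} JOIN user_groups g ON g.Userid = u.Userid
    WHERE u.LastName = 'Smith' AND u.FirstName = 'John'
"""

# Inner join: John's row fails the ON condition and is never restored.
inner_rows = conn.execute(query.format(join="INNER")).fetchall()

# Left join: John's row is added back with NULL group columns,
# and then survives the WHERE filter.
left_rows = conn.execute(query.format(join="LEFT")).fetchall()

print(inner_rows)  # []
print(left_rows)   # [('John', None)]
```

This describes the logical order only; as noted for query plans, a database is free to execute the steps differently as long as the result is the same.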


The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 