Query string
In the World Wide Web, a query string is the part of a Uniform Resource Locator (URL) that contains data to be passed to web applications such as CGI programs.

When a web page is requested via the Hypertext Transfer Protocol (HTTP), the server locates a file in its file system based on the requested URL. This file may be a regular file or a program. In the second case, the server may (depending on its configuration) run the program, sending its output as the required page. The query string is a part of the URL which is passed to the program. Its use permits data to be passed from the HTTP client (often a web browser) to the program which generates the web page.

Structure

A typical URL containing a query string is as follows:
http://server/path/program?query_string

When a server receives a request for such a page, it runs a program (if configured to do so), passing the query_string unchanged to the program. The question mark is used as a separator and is not part of the query string.
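
As a rough illustration of this split, the sketch below uses Python's standard urllib.parse module to separate the query string from the rest of a URL. The URL itself is hypothetical and only follows the pattern shown above.

```python
from urllib.parse import urlsplit

# Hypothetical URL following the pattern above; the field names are made up.
url = "http://server/path/program?field1=value1&field2=value2"

parts = urlsplit(url)
print(parts.path)   # the part the server uses to locate the program
print(parts.query)  # the query string, without the "?" separator
```

Note that the question mark does not appear in the extracted query, consistent with its role as a separator.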

A link in a web page may have a URL that contains a query string. However, the main use of query strings is to contain the content of an HTML form, also known as a web form. In particular, when a form containing the fields field1, field2, field3 is submitted, the content of the fields is encoded as a query string as follows:
field1=value1&field2=value2&field3=value3...
  • The query string is composed of a series of field-value pairs.
  • Within each pair, the field name and value are separated by an equals sign ('='). The equals sign may be omitted if the value is an empty string.
  • The pairs in the series are separated by the ampersand ('&') or the semicolon (';').


Multiple values can also be associated with a single field:
field1=value1&field1=value2&field1=value3...
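
The field-value structure described above can be sketched with Python's standard urllib.parse module. The query string below is a made-up example; only ampersand separators are used, since support for other separators varies between parsers.

```python
from urllib.parse import parse_qs, parse_qsl

# Hypothetical query string with a repeated field and an empty value.
query = "field1=value1&field1=value2&field2=value3&empty="

# parse_qsl keeps the pairs in order, one tuple per field=value pair.
pairs = parse_qsl(query, keep_blank_values=True)

# parse_qs groups the values of a repeated field name into one list.
fields = parse_qs(query, keep_blank_values=True)

print(pairs)
print(fields["field1"])
```

Passing keep_blank_values=True is a choice made for this sketch so that fields with empty values are not silently dropped.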


For each field of the form, the query string contains a pair field=value. Web forms may include fields that are not visible to the user; these fields are included in the query string when the form is submitted.

This convention is a W3C recommendation. W3C recommends that all web servers support semicolon separators in the place of ampersand separators.

Technically, the form content is only encoded as a query string when the form submission method is GET. The same encoding is used by default when the submission method is POST, but the result is not sent as a query string, that is, is not added to the action URL of the form. Rather, the string is sent as the body of the HTTP request.
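
The difference between the two submission methods can be sketched as follows. This is only an illustration of where the encoded string ends up, not of how any particular browser is implemented; the action URL and field names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical form: the action URL and field names are illustrative only.
action = "http://server/path/program"
form_fields = [("field1", "value1"), ("field2", "value 2")]

# The same encoding is applied for both submission methods.
encoded = urlencode(form_fields)

# GET: the encoded string is appended to the action URL after "?".
get_url = action + "?" + encoded

# POST: the URL is left unchanged and the encoded string becomes the body.
post_url = action
post_body = encoded.encode("ascii")

print(get_url)
print(post_url, post_body)
```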

URL encoding

Some characters cannot be part of a URL (for example, the space), and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document, and the character = is used to separate a name from a value. A query string may need to be converted to satisfy these constraints. This can be done using a scheme known as URL encoding.

In particular, encoding the query string uses the following rules:
  • Letters (A-Z and a-z), digits (0-9) and the characters '.', '-', '~' and '_' are left as-is
  • SPACE is encoded as '+'
  • All other characters are encoded as %HH, where HH is the hexadecimal representation of the byte value; any non-ASCII characters are first encoded as UTF-8 (or another specified encoding)


The octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.

The encoding of SPACE as '+' and the selection of "as-is" characters distinguish this encoding from RFC 1738.
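
These rules can be tried out with the quote_plus function from Python's standard urllib.parse module, which follows the same conventions in recent Python 3 versions. The input strings are illustrative.

```python
from urllib.parse import quote_plus, unquote_plus

# Spaces become "+", reserved punctuation becomes %HH.
encoded = quote_plus("was it clear (already)?")
print(encoded)

# Letters, digits and . - _ ~ are left as-is.
print(quote_plus("a.b-c_d~e"))

# Non-ASCII characters are encoded as UTF-8 first, then each byte as %HH.
print(quote_plus("café"))

# Decoding reverses the transformation.
print(unquote_plus(encoded))
```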

RFC

As defined in RFC 1738, a URL of scheme http can contain a searchpart following the rest of the URL and separated from it by a ? character. RFC 3986 specifies that the query component of a URI is the part between the ? and the end of the URI or the character #. The term query string is commonly used to refer to this part in the case of HTTP URLs.
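
The delimiting rule can be written out by hand as a minimal sketch. The function name query_component is hypothetical, and the sketch deliberately ignores everything else a full URI parser would validate.

```python
from typing import Optional

def query_component(uri: str) -> Optional[str]:
    """Return the text between the first "?" and the "#" or end of the URI."""
    # The fragment starts at the first "#"; nothing after it belongs to the query.
    before_fragment, _, _ = uri.partition("#")
    # The query starts after the first "?" in what remains.
    _, separator, query = before_fragment.partition("?")
    # No "?" means the URI has no query component at all.
    return query if separator else None

print(query_component("http://server/path/program?a=1&b=2#section"))
print(query_component("http://server/path/file.html#top"))
```

A URI without a question mark yields None here, which is distinct from a URI ending in "?" and therefore carrying an empty query.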

Example

If a form is embedded in an HTML page as follows:
<form action="test.cgi" method="get">
  <input type="text" name="first">
  <input type="text" name="second">
  <input type="submit">
</form>








and the user enters the strings “this is a field” and “was it clear (already)?” in the two text fields and presses the submit button, the program test.cgi will receive the following query string:
first=this+is+a+field&second=was+it+clear+%28already%29%3F

If the form is processed on the server by a CGI script, the script may typically receive the query string as an environment variable named QUERY_STRING.
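
A minimal sketch of how such a script might read and decode the variable is shown below. The helper name read_form_fields is hypothetical, and the environment is simulated with a plain dictionary so that the example runs outside a web server.

```python
import os
from urllib.parse import parse_qs

def read_form_fields(environ=os.environ):
    """Decode the QUERY_STRING variable into a dict of value lists."""
    query_string = environ.get("QUERY_STRING", "")
    return parse_qs(query_string, keep_blank_values=True)

# Simulate what the server would set for the example submission above.
example_env = {
    "QUERY_STRING": "first=this+is+a+field&second=was+it+clear+%28already%29%3F"
}
fields = read_form_fields(example_env)
print(fields["first"][0])
print(fields["second"][0])
```

In a real CGI script the function would be called without arguments, so that it reads the process environment set up by the server.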

Tracking

A program receiving a query string can ignore part or all of it. If the requested URL corresponds to a file and not to a program, the whole query string is ignored. However, regardless of whether the query string is used or not, the whole URL including it is stored in the server log files.

These facts allow query strings to be used to track users in a manner similar to that provided by HTTP cookies. For this to work, every time the user downloads a page, a unique identifier must be chosen and added as a query string to the URLs of all links the page contains. As soon as the user follows one of these links, the corresponding URL is requested from the server. This way, the download of this page is linked with the previous one.

For example, when a web page containing the following is requested:

<a href="frank.html">see my page!</a>
<a href="…">mine is better</a>


a unique string, such as e0a72cb2a2c7 is chosen, and the page is modified as follows:

<a href="frank.html?e0a72cb2a2c7">see my page!</a>
<a href="…?e0a72cb2a2c7">mine is better</a>


The addition of the query string does not change the way the page is shown to the user. When the user follows, for example, the first link, the browser requests the page frank.html?e0a72cb2a2c7 from the server, which ignores what follows ? and sends the page frank.html as expected, adding the query string to its links as well.

This way, any subsequent page request from this user will carry the same query string e0a72cb2a2c7, making it possible to establish that all these pages have been viewed by the same user. Query strings are often used in association with web beacons.
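
The link rewriting step can be sketched as below. The function name add_tracking is hypothetical; it only shows one way of appending an identifier to a link target while leaving any existing query string and fragment in place.

```python
from urllib.parse import urlsplit, urlunsplit

def add_tracking(url: str, token: str) -> str:
    """Append a tracking token to the query string of a link target."""
    parts = urlsplit(url)
    # Keep an existing query string and add the token after it.
    query = parts.query + "&" + token if parts.query else token
    return urlunsplit(parts._replace(query=query))

token = "e0a72cb2a2c7"
print(add_tracking("frank.html", token))
print(add_tracking("frank.html#top", token))
```

A server using this approach would apply the function to every link in a page before sending it, reusing the token found in the incoming request when there is one.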

The main differences between query strings used for tracking and HTTP cookies are that:
  1. Query strings form part of the URL, and are therefore included if the user saves or sends the URL to another user; cookies can be maintained across browsing sessions, but are not saved or sent with the URL.
  2. If the user arrives at the same web server by two (or more) independent paths, the user will be assigned two different query strings, while the stored cookies are the same.
  3. The user can disable cookies, in which case using cookies for tracking does not work. However, using query strings for tracking should work in all situations.

Flexibility vs. security

A URL query string allows for flexibility in retrieving data from a web server and possibly from the database used to populate pages for that web server. A read only data store, such as a weather mapping service, is one example where URL query strings can be used with great flexibility.

In some circumstances, a URL query string may expose security issues because it can be edited by a user to retrieve data that they do not have access to. In particular, a URL query string containing a username and password could be used with a dictionary attack to guess at valid login credentials to a particular web site. This concern is not specific to query strings: form data submitted via POST can also be retrieved and edited by the user, with the appropriate browser extensions. To detect such editing, web servers can validate the strings they receive with a hash check; MD5 has commonly been used for this, although it is now considered cryptographically weak and stronger methods are preferred.
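
One way to validate a query string against editing is sketched below. It uses an HMAC with SHA-256 from Python's standard library rather than a bare MD5 hash, which is a substitution made for this example. The secret key, the sig field name and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it must never appear in a URL.
SECRET_KEY = b"server-side secret"

def sign(query: str) -> str:
    """Compute a keyed hash of the query string."""
    return hmac.new(SECRET_KEY, query.encode("utf-8"), hashlib.sha256).hexdigest()

def signed_query(query: str) -> str:
    """Append the signature as a final sig field."""
    return query + "&sig=" + sign(query)

def verify(signed: str) -> bool:
    """Check that the query string still matches its signature."""
    query, separator, signature = signed.rpartition("&sig=")
    return bool(separator) and hmac.compare_digest(sign(query), signature)

original = signed_query("user=alice&report=42")
tampered = original.replace("alice", "bob")
print(verify(original), verify(tampered))
```

A check of this kind only detects modification; it does not hide the values, so credentials still should not be placed in a query string.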

Compatibility issues

According to the HTTP specification:
Servers should be cautious about depending on URI (which includes URLs) lengths above 255 bytes, because some older client or proxy implementations may not properly support these lengths.


The HTML 3 specification declares that any attribute value (e.g. url in <a href="url">) cannot have more than 1024 characters. However, the HTML 4 specification omits this restriction.
The specification does not dictate a minimum or maximum URL length, but implementation varies by browser and version. For example, Internet Explorer does not support URLs that have more than 2083 characters. There is no limit on the number of parameters in a URL; only the raw (as opposed to URL encoded) character length of the URL matters. Web servers may also impose limits on the length of the query string, depending on how the URL and query string are stored. If the URL is too long, the web server fails with the 414 Request-URI Too Long HTTP status code.

The common workaround for these problems is to use POST instead of GET and store the parameters in the request body. The length limits on request bodies are typically much higher than those on URL length. For example, the limit on POST size, by default, is 2 MB on IIS 4.0 and 128 KB on IIS 5.0. The limit is changeable on Apache2 using the LimitRequestBody directive, which specifies the number of bytes, from 0 (meaning unlimited) to 2147483647 (2 GB), that are allowed in a request body.
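
The workaround can be sketched as a small helper that falls back to POST when the URL would be too long. The function name build_request is hypothetical, and the default threshold simply reuses the Internet Explorer figure quoted above; real clients and servers have differing limits.

```python
from urllib.parse import urlencode

# The Internet Explorer limit cited above; other clients and servers differ.
MAX_URL_LENGTH = 2083

def build_request(action, fields, max_url_length=MAX_URL_LENGTH):
    """Return (method, url, body), preferring GET when the URL is short enough."""
    encoded = urlencode(fields)
    url = action + "?" + encoded
    if len(url) <= max_url_length:
        return "GET", url, None
    # Too long for a URL: send the same encoded string as the request body.
    return "POST", action, encoded.encode("ascii")

short = build_request("http://server/path/program", {"q": "weather"})
long_ = build_request("http://server/path/program", {"q": "x" * 3000})
print(short[0], long_[0])
```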

See also

  • Clean URLs
  • Common Gateway Interface
  • HTTP cookie
  • HyperText Transfer Protocol
  • URI scheme
  • Web beacon

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.