Common Gateway Interface
The Common Gateway Interface (CGI) is a standard method (see RFC 3875: CGI Version 1.1) for web server software to delegate the generation of web pages to executable files. Such files are known as CGI scripts; they are programs, often stand-alone applications, usually written in a scripting language.

More details

A web server that supports CGI can be configured to interpret a URL that it serves as a reference to a CGI script. A common convention is to have a cgi-bin/ directory at the base of the directory tree and treat all executable files within it as CGI scripts. Another popular convention is to use filename extensions; for instance, if CGI scripts are consistently given the extension .cgi, the web server can be configured to interpret all such files as CGI scripts.
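
The two conventions can be sketched as a small predicate. This is an illustration only, not how any particular server implements its configuration: the function name `is_cgi_request` and its default arguments are hypothetical, and the sketch does not check whether the file is executable or handle extra path segments after a script named by extension.

```python
import posixpath


def is_cgi_request(url_path, cgi_dir="/cgi-bin/", cgi_exts=(".cgi",)):
    """Return True if url_path would be treated as a CGI script.

    Two conventions are checked: the path lies under cgi_dir, or the
    path ends with one of cgi_exts. Both defaults are assumptions made
    for this sketch.
    """
    # Collapse "." and ".." segments so "/cgi-bin/../x" is not misclassified.
    path = posixpath.normpath(url_path)
    if path.startswith(cgi_dir):
        return True
    return path.endswith(tuple(cgi_exts))


if __name__ == "__main__":
    for candidate in ("/cgi-bin/printenv.pl", "/tools/report.cgi", "/index.html"):
        print(candidate, is_cgi_request(candidate))
```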

For HTTP PUT and POST requests, the user-submitted data is provided to the program on standard input. In every case, according to the CGI standard, request metadata is passed to the program in a specific set of environment variables. This contrasts with typical program execution, where input arrives as command-line arguments and the inherited environment varies from one invocation to the next and cannot be trusted. The web server passes on only a small subset of its own environment variables and adds details pertinent to the execution of the program.
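
A minimal sketch of how a program might collect this input, assuming the server sets REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH as described in RFC 3875. The function name `read_request` is hypothetical, and the demonstration at the bottom simulates a POST request so the sketch runs outside a web server.

```python
import io


def read_request(environ, stdin):
    """Collect request metadata and body the way a CGI program sees them.

    environ is a mapping such as os.environ; stdin is a binary stream
    such as sys.stdin.buffer.
    """
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    # CONTENT_LENGTH may be absent or empty when there is no request body.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Read at most CONTENT_LENGTH octets: the server is not obliged to
    # signal end-of-file after the body.
    body = stdin.read(length) if length > 0 else b""
    return method, query, body


if __name__ == "__main__":
    # Simulated POST; under a real server, pass os.environ and
    # sys.stdin.buffer instead.
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "8"}
    print(read_request(fake_env, io.BytesIO(b"name=Ada")))
```

Reading a fixed number of octets rather than reading to end-of-file is a deliberate choice here, since the stream may stay open or carry extra data after the body.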

Simple Example

The following CGI program shows all the environment variables passed by the web server:


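#!/usr/local/bin/perl
##
## printenv -- demo CGI program which just prints its environment
##
use strict;
use warnings;

# The header block ends with a blank line; the body follows.
print "Content-type: text/plain\n\n";
foreach my $var (sort keys %ENV) {
    my $val = $ENV{$var};
    $val =~ s|\n|\\n|g;    # show embedded newlines as \n
    $val =~ s|"|\\"|g;     # escape double quotes
    print "${var}=\"${val}\"\n";
}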

  • If a web browser issues a request for the environment variables at http://example.com/cgi-bin/printenv.pl/foo/bar?var1=value1&var2=with%20percent%20encoding, a 64-bit Microsoft Windows web server running Cygwin returns the following information:

COMSPEC="C:\Windows\system32\cmd.exe"
DOCUMENT_ROOT="C:/Program Files (x86)/Apache Software Foundation/Apache2.2/htdocs"
GATEWAY_INTERFACE="CGI/1.1"
HOME="/home/SYSTEM"
HTTP_ACCEPT="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
HTTP_ACCEPT_CHARSET="ISO-8859-1,utf-8;q=0.7,*;q=0.7"
HTTP_ACCEPT_ENCODING="gzip, deflate"
HTTP_ACCEPT_LANGUAGE="en-us,en;q=0.5"
HTTP_CONNECTION="keep-alive"
HTTP_HOST="example.com"
HTTP_USER_AGENT="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
PATH="/home/SYSTEM/bin:/bin:/cygdrive/c/progra~2/php:/cygdrive/c/windows/system32:..."
PATHEXT=".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC"
PATH_INFO="/foo/bar"
PATH_TRANSLATED="C:\Program Files (x86)\Apache Software Foundation\Apache2.2\htdocs\foo\bar"
QUERY_STRING="var1=value1&var2=with%20percent%20encoding"
REMOTE_ADDR="127.0.0.1"
REMOTE_PORT="63555"
REQUEST_METHOD="GET"
REQUEST_URI="/cgi-bin/printenv.pl/foo/bar?var1=value1&var2=with%20percent%20encoding"
SCRIPT_FILENAME="C:/Program Files (x86)/Apache Software Foundation/Apache2.2/cgi-bin/printenv.pl"
SCRIPT_NAME="/cgi-bin/printenv.pl"
SERVER_ADDR="127.0.0.1"
SERVER_ADMIN="(server admin's email address)"
SERVER_NAME="127.0.0.1"
SERVER_PORT="80"
SERVER_PROTOCOL="HTTP/1.1"
SERVER_SIGNATURE=""
SERVER_SOFTWARE="Apache/2.2.19 (Win32) PHP/5.2.17"
SYSTEMROOT="C:\Windows"
TERM="cygwin"
WINDIR="C:\Windows"


From the environment, we see that the web browser is Firefox running on a Windows 7 PC, the web server is Apache running on a system which emulates Unix, and the CGI script is named cgi-bin/printenv.pl.

The program could then generate any content, write that to its standard output, and the web server will transmit it to the browser.
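
The relationship between the requested URL and the SCRIPT_NAME, PATH_INFO and QUERY_STRING values in the listing can be sketched as follows. This is a simplified illustration, not a server's actual algorithm: the function name `split_cgi_url` is hypothetical, the script name is assumed to be known already, and percent-escapes in the path are not decoded.

```python
from urllib.parse import urlsplit


def split_cgi_url(request_uri, script_name):
    """Split a request URI into the three URL-derived CGI variables."""
    parts = urlsplit(request_uri)
    rest = parts.path[len(script_name):]
    # The path must address the script exactly, or continue with "/...".
    if not parts.path.startswith(script_name) or (rest and not rest.startswith("/")):
        raise ValueError("request does not address this script")
    return {
        "SCRIPT_NAME": script_name,
        "PATH_INFO": rest,            # path suffix after the script name
        "QUERY_STRING": parts.query,  # everything after "?", still encoded
    }


if __name__ == "__main__":
    uri = "/cgi-bin/printenv.pl/foo/bar?var1=value1&var2=with%20percent%20encoding"
    for name, value in split_cgi_url(uri, "/cgi-bin/printenv.pl").items():
        print(f'{name}="{value}"')
```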

Environment variables passed to a CGI program

  • Server specific variables:
    • SERVER_SOFTWARE — name/version of HTTP server.
    • SERVER_NAME — host name of the server, may be dot-decimal IP address.
    • GATEWAY_INTERFACE — CGI/version.
  • Request specific variables:
    • SERVER_PROTOCOL — HTTP/version.
    • SERVER_PORT — TCP port (decimal).
    • REQUEST_METHOD — name of HTTP method (see above).
    • PATH_INFO — path suffix, if appended to URL after program name and a slash.
    • PATH_TRANSLATED — corresponding full path as supposed by server, if PATH_INFO is present.
    • SCRIPT_NAME — relative path to the program, like /cgi-bin/script.cgi.
    • QUERY_STRING — the part of URL after ? character. May be composed of name=value pairs separated with ampersands (such as var1=val1&var2=val2…) when used to submit form data transferred via GET method as defined by HTML application/x-www-form-urlencoded.
    • REMOTE_HOST — host name of the client, unset if server did not perform such lookup.
    • REMOTE_ADDR — IP address of the client (dot-decimal).
    • AUTH_TYPE — identification type, if applicable.
    • REMOTE_USER — used for certain AUTH_TYPEs.
    • REMOTE_IDENT — see ident, only if server performed such lookup.
    • CONTENT_TYPE — MIME type of input data if the PUT or POST method is used, as provided via HTTP header.
    • CONTENT_LENGTH — similarly, size of input data (decimal, in octets) if provided via HTTP header.
    • Variables passed by user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain values of corresponding HTTP headers and therefore have the same sense.
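
As an illustration of the QUERY_STRING entry above, form data sent with the GET method can be decoded with Python's standard `urllib.parse.parse_qs`. The wrapper name `form_fields` is hypothetical; the sample value is the query string from the printenv example.

```python
from urllib.parse import parse_qs


def form_fields(environ):
    """Decode QUERY_STRING as application/x-www-form-urlencoded data.

    Returns a dict mapping each field name to a list of values, since a
    name may appear more than once.
    """
    # keep_blank_values retains fields submitted with an empty value.
    return parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)


if __name__ == "__main__":
    env = {"QUERY_STRING": "var1=value1&var2=with%20percent%20encoding"}
    print(form_fields(env))
```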

Output format

The program returns the result to the web server in the form of standard output, beginning with a header and a blank line.

The header is encoded in the same way as an HTTP header and must include the MIME type of the document returned. The headers, supplemented by the web server, are generally forwarded with the response back to the user.
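
A minimal sketch of this output format, assuming a text response. The helper name `cgi_response` and its parameters are hypothetical; `Status` is the CGI header field a script may use to set the HTTP status code.

```python
def cgi_response(body, content_type="text/html; charset=utf-8", extra_headers=()):
    """Build the text a CGI program writes to standard output."""
    lines = [f"Content-Type: {content_type}"]
    lines += [f"{name}: {value}" for name, value in extra_headers]
    # An empty line separates the header block from the document body.
    return "\n".join(lines) + "\n\n" + body


if __name__ == "__main__":
    print(cgi_response("<p>Hello</p>"), end="")
```

Keeping header assembly in one function makes it harder to forget the blank line, which is a common cause of server errors with hand-written CGI scripts.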

Example

An example of a CGI program is one implementing a wiki. The user agent requests the name of an entry; the server retrieves the source of that entry's page (if one exists), transforms it into HTML, and sends the result.
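
The flow just described might look like the following toy sketch. Everything in it is an assumption made for illustration: the in-memory `PAGES` store, the trivial markup rule in `render`, and the use of PATH_INFO to carry the entry name.

```python
import html

# Hypothetical page store; a real wiki would read files or a database.
PAGES = {"HomePage": "Welcome to the wiki.\n\nSee OtherPage."}


def render(source):
    """Toy markup: blank-line-separated blocks become escaped <p> elements."""
    blocks = [block.strip() for block in source.split("\n\n") if block.strip()]
    return "\n".join(f"<p>{html.escape(block)}</p>" for block in blocks)


def handle(environ):
    """Return the complete CGI response for the entry named in PATH_INFO."""
    name = environ.get("PATH_INFO", "").lstrip("/") or "HomePage"
    source = PAGES.get(name)
    if source is None:
        return "Status: 404 Not Found\nContent-Type: text/plain\n\nNo such entry.\n"
    return "Content-Type: text/html\n\n" + render(source) + "\n"


if __name__ == "__main__":
    print(handle({"PATH_INFO": "/HomePage"}), end="")
```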

History

In 1993, the World Wide Web (WWW) was small but booming. WWW software developers and web site developers kept in touch on the www-talk mailing list, so it was there that a standard for calling command line executables was agreed upon. Specifically mentioned in RFC 3875 are the following contributors:
  • Rob McCool (author of the NCSA httpd web server)
  • John Franks (author of the GN web server)
  • Ari Luotonen (the developer of the CERN httpd web server)
  • Tony Sanders (author of the Plexus web server)
  • George Phillips (web server maintainer at the University of British Columbia)

The NCSA team wrote the specification; however, NCSA no longer hosts it. (A possible mirror of the original documentation is available.) The other web server developers adopted it, and it has been a standard for web servers ever since. After its initial adoption, an effort was mounted to get it published more formally, which resulted in RFC 3875.

Drawbacks

Calling a command generally means the invocation of a newly created process on the server. Starting the process can consume much more time and memory than the actual work of generating the output, especially when the program still needs to be interpreted or compiled. If the command is called often, the resulting workload can quickly overwhelm the web server.
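
The cost of process creation can be observed with a rough measurement such as the one below. It is only a sketch: the helper names are hypothetical, launching an idle Python interpreter stands in for starting an interpreted CGI script, and absolute timings depend entirely on the machine.

```python
import subprocess
import sys
import time


def average_seconds(action, runs=5):
    """Average wall-clock time of calling action() runs times."""
    start = time.perf_counter()
    for _ in range(runs):
        action()
    return (time.perf_counter() - start) / runs


def spawn_interpreter():
    # Start a fresh interpreter that exits immediately, as a stand-in for
    # launching an interpreted CGI script with no real work to do.
    subprocess.run([sys.executable, "-c", "pass"], check=True)


def in_process():
    # The same (empty) work done inside an already-running process.
    pass


if __name__ == "__main__":
    print(f"new process per request: {average_seconds(spawn_interpreter):.6f} s")
    print(f"in-process call:         {average_seconds(in_process):.6f} s")
```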

The overhead involved in interpretation may be reduced by using compiled CGI programs, such as those in C/C++, rather than using Perl or other scripting languages. The overhead involved in process creation can be reduced by solutions such as FastCGI, or by running the application code entirely within the web server using extension modules such as mod_php.

Alternatives

Several approaches can be adopted to reduce this overhead:
  • The popular web servers developed their own extension mechanisms that allow third-party software to run inside the web server itself, e.g. Apache modules, Netscape NSAPI plug-ins, IIS ISAPI plug-ins.
  • Simple Common Gateway Interface or SCGI
  • FastCGI allows a single, long-running process to handle more than one user request while keeping close to the CGI programming model, retaining the simplicity while eliminating the overhead of creating a new process for each request. Unlike converting an application to a web server plug-in, FastCGI applications remain independent of the web server.
  • The architecture for dynamic websites can also be replaced altogether. This is the approach taken by solutions including Java Platform, Enterprise Edition (a.k.a. Java EE), which runs Java code in a Java servlet container in order to serve dynamic content and optionally static content. This approach replaces the overhead of generating and destroying processes with the much lower overhead of generating and destroying threads, and also gives the programmer access to the library of the Java Platform, Standard Edition version on which the Java EE version in use is based.


The optimal configuration for any web application depends on application-specific details, amount of traffic, and complexity of the transaction; these tradeoffs need to be analyzed to determine the best implementation for a given task and time budget.

See also

  • FastCGI
  • SCGI
  • Web Server Gateway Interface
  • PSGI
  • http://www.boutell.com/cgic/

External links

  • Cgicc, FSF C++ library for CGI request parsing and HTML response generation
  • CGI, a standard Perl module for CGI request parsing and HTML response generation
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.