HTTP compression
HTTP compression is a capability that can be built into web servers and web clients to make better use of available bandwidth and provide faster transmission speeds between them. HTTP data is compressed before it is sent from the server: compliant browsers announce to the server which methods they support before downloading the correct format, and browsers that do not support a compliant compression method download uncompressed data. The most common compression schemes are gzip and deflate; a full list of available schemes is maintained by IANA. Additionally, third parties develop new methods and include them in their products (e.g. the Google SDCH scheme implemented in the Google Chrome browser and used on certain Google servers).

A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted daily due to increased page load times when users do not receive compressed content. This occurs when anti-virus software interferes with connections to force them to be uncompressed, when proxies are used (with overcautious web browsers), when servers are misconfigured, and when browser bugs prevent compression from being used. Internet Explorer 6, which drops to HTTP 1.0 (without features like compression or pipelining) when behind a proxy (a common configuration in corporate environments), was the mainstream browser most prone to falling back to uncompressed HTTP.

Client/Server compression scheme negotiation

In most cases, excluding SDCH, the negotiation is done in two steps, described in RFC 2616:

1. The web client includes an Accept-Encoding field in the HTTP request, listing the names of the compression schemes it supports (called content-coding tokens), separated by commas.

GET /encrypted-area HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip, deflate

2. If the server supports one or more of these compression schemes, the outgoing data may be compressed by one or more methods supported by both parties. If this is the case, the server adds a Content-Encoding field to the HTTP response with the schemes used, separated by commas.

HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
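
The two steps above can be sketched on the server side in Python. This is a minimal illustration under stated assumptions, not how any particular server implements negotiation: the function names `parse_accept_encoding` and `build_response` are hypothetical, only gzip is offered, and the handling of optional `;q=` quality values from RFC 2616 is simplified (no wildcard support).

```python
import gzip


def parse_accept_encoding(header_value):
    """Return content-coding tokens from an Accept-Encoding value, best first.

    Hypothetical helper: tokens with q=0 are dropped, and ties keep the
    order in which the client listed them.
    """
    codings = []
    for position, item in enumerate(header_value.split(",")):
        item = item.strip()
        if not item:
            continue
        token, _, params = item.partition(";")
        quality = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as not acceptable
        if quality > 0:
            codings.append((-quality, position, token.strip().lower()))
    return [token for _, _, token in sorted(codings)]


def build_response(body, accept_encoding):
    """Compress body with gzip when the client accepts it.

    Returns (headers, payload). An uncompressed response is the fallback.
    """
    headers = {"Content-Type": "text/html; charset=UTF-8"}
    payload = body
    if "gzip" in parse_accept_encoding(accept_encoding):
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    # Content-Length describes the bytes actually sent on the wire.
    headers["Content-Length"] = str(len(payload))
    return headers, payload
```

Calling `build_response(page_bytes, "gzip, deflate")` would yield a response carrying `Content-Encoding: gzip`, while a request that lists only unsupported codings receives the original bytes with no Content-Encoding field.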

The web server is by no means obligated to use any compression method; this depends on the internal settings of the web server and may also depend on the internal architecture of the website in question.
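
Because the server may ignore the Accept-Encoding field, a client has to inspect Content-Encoding before decoding a response body. The following Python sketch illustrates that check; `decode_body` is a hypothetical helper, and it deliberately handles only gzip and identity.

```python
import gzip


def decode_body(headers, payload):
    """Undo the content codings named in Content-Encoding, if any.

    Hypothetical client-side helper; only gzip and identity are handled.
    """
    value = headers.get("Content-Encoding", "identity")
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    # Codings are listed in the order they were applied, so undo them in reverse.
    for coding in reversed(codings):
        if coding == "gzip":
            payload = gzip.decompress(payload)
        elif coding != "identity":
            raise ValueError("unsupported content coding: " + coding)
    return payload
```

A response without a Content-Encoding field passes through unchanged, which is the behaviour a client needs when the server chose not to compress.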

In the case of SDCH, a dictionary negotiation is also required, which may involve additional steps, such as downloading a proper dictionary from an external server.

Content-coding tokens

  • compress - UNIX "compress" program method
  • deflate - despite its name, the zlib compression (RFC 1950) should be used, in combination with the deflate compression (RFC 1951), as described in RFC 2616. Real-world implementations, however, seem to vary between zlib compression and (raw) deflate compression. Due to this confusion, gzip has positioned itself as the more reliable default method (March 2011).
  • exi - W3C Efficient XML Interchange
  • gzip - GNU zip format (described in RFC 1952). This method is the most broadly supported as of March 2011.
  • identity - No transformation is used. This is the default value for content coding.
  • pack200-gzip - Network Transfer Format for Java Archives
  • sdch - Google Shared Dictionary Compression for HTTP
  • bzip2 - free and open source lossless data compression algorithm
  • peerdist - Microsoft Peer Content Caching and Retrieval (described in MS-PCCRPT)
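
The ambiguity around the deflate token can be made concrete with Python's standard zlib module, whose `wbits` parameter selects the container written around the DEFLATE stream. This is an illustrative sketch: the sample data and the helper names `compress_with` and `decode_deflate` are made up for the example, and the "try zlib, then raw" fallback is one possible tolerant strategy, not something RFC 2616 prescribes.

```python
import zlib

data = b"HTTP compression example " * 40


def compress_with(wbits):
    # wbits selects the container that zlib writes around the DEFLATE stream.
    compressor = zlib.compressobj(level=9, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


zlib_wrapped = compress_with(15)       # RFC 1950: zlib header + Adler-32 trailer
raw_deflate = compress_with(-15)       # RFC 1951: bare DEFLATE stream, no header
gzip_wrapped = compress_with(16 + 15)  # RFC 1952: gzip header + CRC-32 trailer


def decode_deflate(payload):
    """Decode a "deflate" body, trying the zlib format first, then raw DEFLATE."""
    try:
        return zlib.decompress(payload)  # default wbits expects the zlib format
    except zlib.error:
        return zlib.decompress(payload, -15)
```

All three outputs carry the same DEFLATE-compressed data; only the framing differs, which is why a receiver expecting one form can fail on another.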

Servers that support HTTP compression

  • Microsoft IIS: built-in or using a 3rd-party module
  • Apache HTTP Server, via mod_deflate (despite its name, currently only supporting gzip) or mod_gzip
  • Sun Java System Web Server
  • Zeus Web Server
  • Lighttpd, via mod_compress and the newer mod_deflate (1.5.x)
  • Nginx - built-in
  • GeoServer



Compression in HTTP can also be achieved by using the functionality of server-side scripting languages, like PHP or Java.
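
As a rough illustration of application-level compression, the sketch below uses Python and its WSGI calling convention as a stand-in for the PHP or Java case mentioned above. The page content is invented for the example, and the Accept-Encoding check is simplified (quality values are ignored).

```python
import gzip


def application(environ, start_response):
    """Minimal WSGI app that gzips its own output when the client allows it."""
    body = b"<html><body>" + b"Hello, compression! " * 50 + b"</body></html>"
    headers = [("Content-Type", "text/html; charset=UTF-8")]
    # Simplified check: a fuller implementation would also honour q-values.
    accepted = environ.get("HTTP_ACCEPT_ENCODING", "")
    tokens = [t.split(";")[0].strip().lower() for t in accepted.split(",")]
    if "gzip" in tokens:
        body = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
    headers.append(("Content-Length", str(len(body))))
    start_response("200 OK", headers)
    return [body]
```

Here the script itself, rather than a server module, decides whether to compress, so the web server can be left with compression switched off.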

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 