Chunked transfer encoding
Chunked transfer encoding is a data transfer mechanism in version 1.1 of the Hypertext Transfer Protocol (HTTP) in which a web server serves content in a series of chunks. It uses the Transfer-Encoding HTTP response header in place of the Content-Length header, which the protocol would otherwise require. Because the Content-Length header is not used, the server does not need to know the length of the content before it starts transmitting a response to the client (usually a web browser). Web servers can begin transmitting responses with dynamically generated content before knowing the total size of that content.
The size of each chunk is sent right before the chunk itself so that a client can tell when it has finished receiving data for that chunk. The data transfer is terminated by a final chunk of length zero.
Rationale
The introduction of chunked encoding in HTTP 1.1 provided a number of benefits:
- Chunked transfer encoding allows a server to maintain an HTTP persistent connection for dynamically generated content. Normally, persistent connections require the server to send a Content-Length field in the header before starting to send the entity body, but for dynamically generated content the length is usually not known before the content is created.
- Chunked encoding allows the sender to send additional header fields after the message body. This is important in cases where the value of a field cannot be known until the content has been produced, such as when the content of the message must be digitally signed. Without chunked encoding, the sender would have to buffer the content until it was complete in order to calculate the field value and send it before the content.
- HTTP servers sometimes use compression (gzip or deflate) to optimize transmission. Chunked transfer encoding can be used to delimit parts of the compressed object. In this case the chunks are not individually compressed. Instead, the complete payload is compressed and the output of the compression process is chunk encoded. This has the benefit that compression can be performed on the fly while the data is delivered, as opposed to completing the compression process beforehand to determine the final size.
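The on-the-fly compression described in the last point can be sketched as follows. This is a minimal illustration rather than how any particular server does it: the function name gzip_then_chunk is hypothetical, and it assumes the payload arrives as an iterable of byte strings. The whole payload passes through one gzip stream, and whatever compressed output is available at each step is framed as a chunk.

```python
import zlib
from typing import Iterable, Iterator


def gzip_then_chunk(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Compress the whole payload as one gzip stream and chunk-encode the output.

    Hypothetical helper for illustration only.
    """
    # wbits=31 asks zlib to write a gzip header and trailer.
    compressor = zlib.compressobj(wbits=31)
    for data in parts:
        out = compressor.compress(data)
        # The compressor may buffer input and return nothing yet.
        # An empty chunk would signal the end of the body, so skip it.
        if out:
            yield b"%X\r\n" % len(out) + out + b"\r\n"
    out = compressor.flush()
    if out:
        yield b"%X\r\n" % len(out) + out + b"\r\n"
    # Zero-length last chunk, no trailer, final CRLF.
    yield b"0\r\n\r\n"
```

Note that the chunk boundaries follow the compressor's output, not the boundaries of the input parts, and the total compressed size is never needed up front.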
Applicability
In HTTP 1.1, the chunked transfer mechanism is considered to be always acceptable, even if it is not listed in the TE request header field. When used with other transfer mechanisms, it should always be applied last to the transferred data and never more than once. This transfer coding method also allows additional entity header fields to be sent after the last chunk if the client specified the "trailers" parameter as an argument of the TE field. The origin server of the response can also decide to send additional entity trailers even if the client did not specify the "trailers" option in the TE request field, but only if the metadata is optional (i.e. the client can use the received entity without them). Whenever trailers are used, the server should list their names in the Trailer header field. Three header field types are specifically prohibited from appearing as a trailer field: Transfer-Encoding, Content-Length and Trailer.
Format
If a Transfer-Encoding field with a value of chunked is specified in an HTTP message (either a request sent by a client or the response from the server), the body of the message consists of an unspecified number of chunks, a terminating last-chunk, an optional trailer of entity header fields, and a final CRLF sequence.
Each chunk starts with the number of octets of the data it embeds, expressed in hexadecimal, followed by optional parameters (chunk extensions) and a terminating CRLF (carriage return and line feed) sequence, followed by the chunk data. The chunk is terminated by CRLF. If chunk extensions are provided, the chunk size is terminated by a semicolon followed by the extension name and an optional equal sign and value.
The last chunk is a zero-length chunk, with the chunk size coded as 0, but without any chunk data section.
The final chunk may be followed by an optional trailer of additional entity header fields that are normally delivered in the HTTP header to allow the delivery of data that can only be computed after all chunk data has been generated. The sender may indicate in a Trailer header field which additional fields it will send in the trailer after the chunks.
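The framing rules above can be sketched as a small encoder. This is an illustrative sketch, not a reference implementation: the name encode_chunked and its parameters are hypothetical, chunk extensions are omitted, and trailer values are assumed to be plain Latin-1 text.

```python
from typing import Iterable, Iterator, Mapping, Optional


def encode_chunked(
    parts: Iterable[bytes],
    trailers: Optional[Mapping[str, str]] = None,
) -> Iterator[bytes]:
    """Yield the pieces of a chunk-encoded message body (hypothetical helper)."""
    for data in parts:
        if not data:
            # A zero-length chunk would end the body early, so skip empty parts.
            continue
        # Chunk size in hexadecimal, CRLF, chunk data, CRLF.
        yield b"%X\r\n" % len(data) + data + b"\r\n"
    # Last chunk: size 0 and no chunk data section.
    yield b"0\r\n"
    # Optional trailer fields, one per line.
    for name, value in (trailers or {}).items():
        yield f"{name}: {value}\r\n".encode("latin-1")
    # Final CRLF ends the message body.
    yield b"\r\n"
```

For example, b"".join(encode_chunked([b"con", b"sequence"])) yields the bytes 3, CRLF, con, CRLF, 8, CRLF, sequence, CRLF, 0, CRLF, CRLF.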
Encoded response
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1C
and this is the second one

3
con
8
sequence
0

Note: The chunk size counts only the octets of the chunk data. It does not include the CRLF ("\r\n") that terminates the chunk, nor any trailer.
Anatomy of encoded response
The first two chunks in the above sample contain explicit \r\n characters in the chunk data.
"This is the data in the first chunk\r\n" (37 chars => hex: 0x25)
"and this is the second one\r\n" (28 chars => hex: 0x1C)
"con" (3 chars => hex: 0x03)
"sequence" (8 chars => hex: 0x08)
The response ends with a zero-length last chunk: "0\r\n" and the final "\r\n".
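The sizes listed above can be checked with a few lines of Python. The variable names are illustrative only; the chunk contents are those of the sample, with the \r\n that belongs to the data of the first two chunks written out explicitly.

```python
# Chunk data of the sample response, including the explicit \r\n
# that is part of the data of the first two chunks.
chunks = [
    b"This is the data in the first chunk\r\n",
    b"and this is the second one\r\n",
    b"con",
    b"sequence",
]

# Each chunk size is the octet count of the data, written in hexadecimal.
sizes = [format(len(chunk), "X") for chunk in chunks]
print(sizes)  # ['25', '1C', '3', '8']
```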
Decoded data
This is the data in the first chunk
and this is the second one
consequence
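The decoding step that turns the encoded body into the data above can be sketched as follows. This is a minimal illustration under simplifying assumptions: the function name decode_chunked is hypothetical, the whole body is assumed to be in memory already, chunk extensions are ignored, and trailer fields are returned as a plain dictionary.

```python
from typing import Dict, Tuple


def decode_chunked(body: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Decode a complete chunk-encoded body (hypothetical helper).

    Returns the decoded data and any trailer fields.
    """
    decoded = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # Anything after the first semicolon is a chunk extension; ignore it.
        size_field = body[pos:eol].split(b";", 1)[0]
        size = int(size_field.strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk: no data section follows
        decoded += body[pos:pos + size]
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        pos += size + 2
    # Optional trailer fields, ended by an empty line (the final CRLF).
    trailers: Dict[str, str] = {}
    while True:
        eol = body.index(b"\r\n", pos)
        line = body[pos:eol]
        pos = eol + 2
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.decode("latin-1").strip()] = value.decode("latin-1").strip()
    return bytes(decoded), trailers
```

Because the size is read first and exactly that many octets are copied, a \r\n inside the chunk data (as in the first two chunks of the sample) is preserved rather than mistaken for a delimiter.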