Quoted-printable
Quoted-printable, or QP encoding, is an encoding using printable ASCII characters (i.e. alphanumeric characters and the equals sign "=") to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean. It is defined as a MIME content transfer encoding for use in e-mail.
Introduction
MIME defines mechanisms for sending other kinds of information in e-mail, including text in languages other than English, using character encodings other than ASCII. However, these encodings often use byte values outside the ASCII range, so they need to be encoded further before they are suitable for use in a non-8-bit-clean environment. Quoted-printable encoding is one method used for mapping arbitrary bytes into sequences of ASCII characters. Quoted-printable is therefore not a character encoding scheme itself, but a data coding layer to be used under some byte-oriented character encoding. QP encoding is reversible, meaning the original bytes, and hence the non-ASCII characters they represent, can be recovered exactly.
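The layering described above can be illustrated with a short sketch. It assumes Python and its standard-library quopri module; the helper names to_qp and from_qp are hypothetical and chosen only for this illustration. The same text yields different quoted-printable output depending on the character encoding applied first, and decoding recovers the original bytes.

```python
import quopri


def to_qp(text: str, charset: str) -> bytes:
    """Apply a character encoding first, then quoted-printable on the bytes."""
    raw = text.encode(charset)          # character encoding layer
    return quopri.encodestring(raw)     # transfer encoding layer


def from_qp(encoded: bytes, charset: str) -> str:
    """Undo quoted-printable, then interpret the bytes with the charset."""
    raw = quopri.decodestring(encoded)
    return raw.decode(charset)


if __name__ == "__main__":
    for charset in ("utf-8", "latin-1"):
        encoded = to_qp("café", charset)
        print(charset, encoded, from_qp(encoded, charset))
```

The receiver needs to know both layers (in MIME, the charset parameter and the Content-Transfer-Encoding header) to recover the text.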
Quoted-printable and Base64 are the two basic MIME content transfer encodings, if a trivial "8bit" encoding is not counted. If the text to be encoded does not contain many non-ASCII characters, then quoted-printable results in a fairly readable and compact encoded result. On the other hand, if the input is not mostly ASCII, quoted-printable becomes both unreadable and inefficient, since every escaped byte takes three characters. Base64 is not human-readable but has a uniform overhead for all data (four output characters for every three input bytes) and is the more sensible choice for binary formats or text in non-Latin-based languages.
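The trade-off can be checked with a rough size comparison. This is a minimal sketch assuming Python's standard-library quopri and base64 modules; the function name encoded_sizes and the sample strings are made up for illustration.

```python
import base64
import quopri


def encoded_sizes(raw: bytes) -> tuple[int, int]:
    """Return (quoted-printable length, Base64 length) for the same input."""
    qp = quopri.encodestring(raw)    # escapes only the bytes that need it
    b64 = base64.b64encode(raw)      # always 4 characters per 3 input bytes
    return len(qp), len(b64)


if __name__ == "__main__":
    samples = {
        "mostly ASCII": "Hello, world".encode("utf-8"),
        "non-Latin text": "Καλημέρα κόσμε".encode("utf-8"),
    }
    for label, raw in samples.items():
        qp_len, b64_len = encoded_sizes(raw)
        print(f"{label}: {len(raw)} bytes, QP {qp_len}, Base64 {b64_len}")
```

For the mostly-ASCII sample quoted-printable is the shorter of the two, while for the Greek sample, where almost every byte must be escaped, Base64 is clearly shorter.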
Quoted-printable encoding
Any 8-bit byte value may be encoded with 3 characters: an "=" followed by two hexadecimal digits (0–9 or A–F) representing the byte's numeric value. For example, an ASCII form feed character (decimal value 12) can be represented by "=0C", and an ASCII equal sign (decimal value 61) must be represented by "=3D". All characters except printable ASCII characters or end-of-line characters must be encoded in this fashion.
All printable ASCII characters (decimal values between 33 and 126) may be represented by themselves, except "=" (decimal 61).
ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves, except if these characters appear at the end of a line. If one of these characters appears at the end of a line it must be encoded as "=09" (tab) or "=20" (space).
If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values, whether written directly or escaped with "=" signs. Conversely, if byte values 13 and 10 have meanings other than end of line, they must be encoded as "=0D" and "=0A" respectively.
Lines of quoted-printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not appear as a line break in the decoded text. These soft line breaks also allow encoding text without line breaks (or containing very long lines) for an environment where line size is limited, such as the "1000 characters per line" limit of some SMTP software, as allowed by RFC 2821.
A slightly modified version of quoted-printable is used in message headers; see the MIME encoded-word syntax.
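The body-encoding rules in this section can be sketched as a small encoder. This is an illustrative Python sketch, not a reference implementation: the function name qp_encode is hypothetical, the input is assumed to use a single LF byte for every meaningful line break, and soft line breaks are inserted conservatively so that no output line exceeds 76 characters.

```python
def qp_encode(data: bytes, max_len: int = 76) -> bytes:
    """Encode bytes as quoted-printable (sketch; assumes LF line breaks)."""
    out_lines = []
    for line in data.split(b"\n"):
        tokens = []
        for i, byte in enumerate(line):
            at_line_end = i == len(line) - 1
            if byte in (9, 32):
                # Tab and space stay literal unless they end the line.
                tokens.append(f"={byte:02X}" if at_line_end else chr(byte))
            elif 33 <= byte <= 126 and byte != 61:
                # Printable ASCII other than "=" represents itself.
                tokens.append(chr(byte))
            else:
                # Everything else becomes "=" plus two uppercase hex digits.
                tokens.append(f"={byte:02X}")
        current = ""
        for tok in tokens:
            # Keep one column free for the "=" that marks a soft line break.
            if len(current) + len(tok) > max_len - 1:
                out_lines.append(current + "=")
                current = ""
            current += tok
        out_lines.append(current)
    # Meaningful line breaks are written as CR LF.
    return "\r\n".join(out_lines).encode("ascii")


if __name__ == "__main__":
    print(qp_encode(b"truth=beauty"))
    print(qp_encode(b"trailing space \nnext line"))
```

Because each token is kept whole, an "=XX" escape is never split across a soft line break. A production encoder would also need to decide how to treat CR LF or bare CR in its input, which this sketch leaves out.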
Example
If you believe that truth=3Dbeauty, then surely=20=
mathematics is the most beautiful branch of philosophy.
This encodes the string:
If you believe that truth=beauty, then surely mathematics is the most beautiful branch of philosophy.
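The decoding of this example can be reproduced with a minimal sketch. It assumes Python; the function name qp_decode is hypothetical, and the sketch handles only soft line breaks and "=XX" escapes, without the error handling a real decoder would need.

```python
import re


def qp_decode(encoded: bytes) -> bytes:
    """Decode quoted-printable data (sketch: soft breaks and =XX escapes only)."""
    # A soft line break is "=" immediately followed by the line ending; drop both.
    joined = re.sub(rb"=\r?\n", b"", encoded)
    # Replace each "=XX" escape with the single byte it names.
    return re.sub(
        rb"=([0-9A-Fa-f]{2})",
        lambda m: bytes.fromhex(m.group(1).decode("ascii")),
        joined,
    )


if __name__ == "__main__":
    encoded = (
        b"If you believe that truth=3Dbeauty, then surely=20=\n"
        b"mathematics is the most beautiful branch of philosophy."
    )
    print(qp_decode(encoded).decode("ascii"))
```

Soft line breaks are removed first so that the "=" ending a line is not mistaken for the start of an escape.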
Similar encoding schemes
- Percent-encoding (data encoding in URL, most used for text)
- Numeric character reference (text encoding in SGML, HTML, XML)
- Rich Text Format character encoding (a component of text encoding)
External links
- RFC 1521 (obsolete)
- RFC 2045 (MIME)