Escape character
In computing and telecommunication, an escape character is a character which invokes an alternative interpretation on subsequent characters in a character sequence. An escape character is a particular case of a metacharacter. Whether a given character is an escape character generally depends on context.
Definition
Escape characters are part of the syntax for many programming languages, data formats and communication protocols. For a given alphabet, an escape character's purpose is to start character sequences (so-called escape sequences) which have to be interpreted differently from the same characters occurring alone. An escape character may not have its own meaning, so all escape sequences consist of two or more characters.

There are usually two functions of escape sequences. The first is to encode a syntactic entity, such as device commands or special data which cannot be directly represented by the alphabet. The second use, referred to as character quoting, is to represent characters which cannot be typed in the current context, or which would have an undesired interpretation. In the latter case an escape sequence is a digraph consisting of the escape character itself and a "quoted" character.
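The two functions described above can be sketched with a toy decoder. This is only an illustration in Python: the function name unescape, the table SPECIAL and the choice of backslash as the escape character are assumptions made for the example, not part of any particular language or format.

```python
# Toy escape-sequence decoder (illustrative sketch only).
# SPECIAL holds the "syntactic entity" escapes; any other escaped
# character is simply quoted, i.e. taken literally.
SPECIAL = {"n": "\n", "t": "\t"}


def unescape(text, esc="\\"):
    """Decode two-character escape sequences that start with esc."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != esc:
            out.append(ch)  # ordinary character, copied unchanged
            continue
        quoted = next(chars, None)  # the character after the escape
        if quoted is None:
            raise ValueError("escape character at end of input")
        # Known letters encode special data; everything else is quoted.
        out.append(SPECIAL.get(quoted, quoted))
    return "".join(out)


if __name__ == "__main__":
    print(unescape(r'She said \"hi\"'))
    print(unescape(r"col1\tcol2"))
```

Because the escape character has no meaning of its own here, a literal backslash has to be written as an escape sequence too (two backslashes).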
Escape character vs control character
Generally, an escape character is not a particular case of a (device) control character, nor vice versa. If we define control characters as non-graphic, or as having a special meaning for an output device (e.g. a printer or text terminal), then any escape character for this device is a control one. But escape characters used in programming (see below) are graphic, hence are not control characters. Conversely, most (but not all) of the ASCII "control characters" have some control function in isolation, and therefore are not escape characters.
ASCII escape character
The ASCII "escape" character (octal: \033, or ^[, or, in decimal, 27) is used in many output devices to start a series of characters called a control sequence or escape sequence. Typically, the escape character was sent first in such a sequence to alert the device that the following characters were to be interpreted as a control sequence rather than as plain characters, then one or more characters would follow to specify some detailed action, after which the device would go back to interpreting characters normally. For example, the sequence of ^[, followed by the printable characters [2;10H, would cause a DEC VT102 terminal to move its cursor to the 10th cell of the 2nd line of the screen. This was later developed into the ANSI escape codes covered by the ANSI X3.64 standard. The escape character also starts each command sequence in the Hewlett Packard Printer Command Language.
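As a minimal sketch, the cursor-positioning sequence from the example above can be built in Python as follows. The helper name cursor_position is hypothetical, and whether a given terminal honours the sequence depends on the terminal.

```python
# Build the "move cursor" control sequence described in the text:
# the escape character (decimal 27), then "[", row, ";", column, "H".
ESC = "\x1b"  # same character as octal \033 or decimal 27


def cursor_position(row: int, col: int) -> str:
    """Return the control sequence that moves the cursor to row, col."""
    return f"{ESC}[{row};{col}H"


if __name__ == "__main__":
    seq = cursor_position(2, 10)
    print(repr(seq))  # show the raw characters rather than sending them
```

Writing the returned string to a compatible terminal, instead of printing its repr, would move the cursor to the 10th cell of the 2nd line.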
An early reference to the term "escape character" is found in Bob Bemer's IBM technical publications. He apparently invented this mechanism during his work on the ASCII character set.
The Escape key is usually found on standard PC keyboards. However, it is commonly absent from keyboards for PDAs and other devices not designed primarily for ASCII communications, and is not generally used as part of the common user interface for applications on the Windows operating system. Linux systems, or applications such as Firefox, often use the key as the functional equivalent of clicking a Cancel button with a mouse. The DEC VT220 series was one of the few popular keyboards that did not have a dedicated Esc key, instead using one of the keys above the main keypad. In user interfaces of the 1970s–1980s it was not uncommon to use this key as an escape character, but on modern desktop computers this use has been dropped. Sometimes the key was identified with AltMode (for alternative mode). Even with no dedicated key, the escape character code could be generated by typing '[' while simultaneously holding down the Control key ('Ctrl').
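The Ctrl+[ combination follows from the traditional ASCII convention that the Control key keeps only the low five bits of the key's code. The sketch below assumes that convention; the helper name control_code is hypothetical.

```python
# Traditional ASCII terminals derive a control code by keeping only
# the low five bits of the typed character's code (assumed convention).
def control_code(key: str) -> int:
    """Return the code produced by holding Control and typing key."""
    return ord(key) & 0x1F


if __name__ == "__main__":
    # '[' is 0x5B; masking with 0x1F leaves 0x1B, the escape character.
    print(control_code("["))
```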
Programming and data formats
Many modern programming languages specify the doublequote character (") as a delimiter for a string literal. The backslash (\) escape character provides two ways to include doublequotes inside a string literal, either by modifying the meaning of the doublequote character embedded in the string (\" becomes "), or by modifying the meaning of the three characters x22, which give the hexadecimal value of a doublequote character (\x22 becomes ").
In Perl (or Python 2, where print is a statement):
print "Nancy said "Hello World!" to the crowd.";
produces a syntax error, whereas:
print "Nancy said \"Hello World!\" to the crowd."; ### example of \"
produces the intended output.
Another alternative:
print "Nancy said \x22Hello World!\x22 to them."; ### example of \x22
uses the numeric escape sequence \x22, the hexadecimal code of the quotation mark in ASCII. This would not produce the required text if run on a non-ASCII machine.
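The same comparison can be sketched in Python 3, where print is a function. The variable names are illustrative. The last lines use Python's cp500 codec as a stand-in for a non-ASCII (EBCDIC) character set, which is an assumption made purely for the demonstration.

```python
# Three spellings of the same string in Python 3.
quoted = "Nancy said \"Hello World!\" to the crowd."    # \" escape
hexed = "Nancy said \x22Hello World!\x22 to the crowd."  # \x22 escape
single = 'Nancy said "Hello World!" to the crowd.'       # other delimiter

# In an EBCDIC code page the quotation mark is not stored as byte 0x22,
# which is why a hard-coded numeric escape is not portable.
ebcdic_quote = '"'.encode("cp500")

if __name__ == "__main__":
    print(quoted)
    print(quoted == hexed == single)
    print(ebcdic_quote != b"\x22")
```

Using a different delimiter, as in the third spelling, avoids the escape character altogether.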
C and C++ allow exactly the same two backslash escape styles. Java accepts \" but has no \x escape; it offers octal escapes such as \42 instead. The PostScript language and Microsoft Rich Text Format also use backslash escapes. The quoted-printable encoding uses the equals sign (=) as an escape character.
URLs and URIs use %-escapes (percent-encoding) to quote characters with a special meaning, as well as non-ASCII characters. The ampersand (&) character may be considered an escape character in SGML and derived formats such as HTML and XML.
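The three escape characters just mentioned (=, % and &) can be seen at work with Python's standard library. This is a small sketch; the sample strings are arbitrary.

```python
import html
import quopri
import urllib.parse

# Quoted-printable: "=" introduces a two-digit hexadecimal byte value,
# so a literal "=" must itself be escaped.
qp = quopri.encodestring(b"x=y")

# Percent-encoding: "%" introduces a two-digit hexadecimal byte value.
url_part = urllib.parse.quote("a b&c")

# HTML/XML: "&" starts a character reference such as &lt; or &amp;.
markup = html.escape("a < b & c")

if __name__ == "__main__":
    print(qp)
    print(url_part)
    print(markup)
```

Each encoding has a matching decoder (quopri.decodestring, urllib.parse.unquote and html.unescape) that restores the original text.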
Another similar (and partially overlapping) syntactic trick is stropping.
Some programming languages also provide other ways to represent special characters in literals, without requiring an escape character (see e.g. delimiter collision).
Communication protocols
The Point-to-Point Protocol uses the 0x7D octet (\175, or ASCII: } ) as an escape character. The octet immediately following should be XORed with 0x20 before being passed to a higher-level protocol. This is applied to both 0x7D itself and the control character 0x7E (which is used in PPP to mark the beginning and end of a frame) when those octets need to be transmitted by a higher-level protocol encapsulated by PPP, as well as to other octets negotiated when the link is established. That is, when a higher-level protocol wishes to transmit 0x7D, it is transmitted as the sequence 0x7D 0x5D, and 0x7E is transmitted as 0x7D 0x5E.
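The escaping rule described above can be sketched in a few lines of Python. The function names ppp_escape and ppp_unescape and the extra parameter (standing in for the additional octets negotiated at link setup) are hypothetical. Framing, checksums and the negotiation itself are left out.

```python
FLAG = 0x7E      # marks the beginning and end of a frame
ESC = 0x7D       # escape octet
XOR_MASK = 0x20  # applied to the octet that follows ESC


def ppp_escape(payload: bytes, extra=frozenset()) -> bytes:
    """Escape FLAG, ESC and any negotiated extra octets in payload."""
    out = bytearray()
    for octet in payload:
        if octet in (FLAG, ESC) or octet in extra:
            out.append(ESC)
            out.append(octet ^ XOR_MASK)
        else:
            out.append(octet)
    return bytes(out)


def ppp_unescape(data: bytes) -> bytes:
    """Reverse ppp_escape: XOR the octet after each ESC with 0x20."""
    out = bytearray()
    octets = iter(data)
    for octet in octets:
        if octet == ESC:
            following = next(octets, None)
            if following is None:
                raise ValueError("escape octet at end of data")
            out.append(following ^ XOR_MASK)
        else:
            out.append(octet)
    return bytes(out)


if __name__ == "__main__":
    print(ppp_escape(bytes([0x01, 0x7D, 0x7E])).hex(" "))
```

Since the XOR with 0x20 is its own inverse, the receiver needs no table of escaped octets; it only has to recognise 0x7D.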
Bourne shell
In Bourne shell (sh), the asterisk (*) and question mark (?) characters are wildcard characters expanded via globbing. Without a preceding escape character, an * will expand to the names of all files in the working directory that don't start with a period if and only if there are such files; otherwise * remains unexpanded. So to refer to a file literally called "*", the shell must be told not to interpret it in this way, by preceding it with a backslash (\). This modifies the interpretation of the asterisk (*).
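The difference between an unescaped and an escaped asterisk can be observed by running a shell from Python. This is a sketch that assumes a POSIX sh is available on the PATH; the helper names sh_echo and demo and the file names are made up for the example.

```python
import os
import subprocess
import tempfile


def sh_echo(argument: str, cwd: str) -> str:
    """Run `echo <argument>` through sh in cwd and return its output."""
    result = subprocess.run(
        ["sh", "-c", "echo " + argument],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def demo():
    """Return the output of echo for an unescaped and an escaped *."""
    with tempfile.TemporaryDirectory() as directory:
        # Two visible files and one whose name starts with a period.
        for name in ("a.txt", "b.txt", ".hidden"):
            open(os.path.join(directory, name), "w").close()
        expanded = sh_echo("*", directory)   # glob is expanded
        literal = sh_echo("\\*", directory)  # backslash quotes the *
        return expanded, literal


if __name__ == "__main__":
    print(demo())
```

The unescaped form lists the visible files, while the escaped form passes a single asterisk to echo unchanged.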
Windows Command Prompt
The Windows command-line interpreter uses a caret character (^) to escape reserved characters that have special meanings (in particular: & | < > ^). The DOS command-line interpreter, though it supports similar syntax, does not support this.
For example, on the Windows Command Prompt, this will result in a syntax error:
echo <wiki>
whereas this will output the string <wiki>:
echo ^<wiki^>