String literal
A string literal is the representation of a string value within the source code of a computer program. There are numerous alternate notations for specifying string literals, and the exact notation depends on the individual programming language in question. Nevertheless, there are some
general guidelines that most modern programming languages follow.

Specifically, most string literals can be specified using:
  • declarative notation;
  • whitespace delimiters (indentation);
  • bracketed delimiters (quoting);
  • escape characters; or
  • a combination of some or all of the above

Declarative notation

In the original FORTRAN programming language (for example), string literals were written in so-called Hollerith notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string:

35HAn example Hollerith string literal

This declarative notation style is contrasted with bracketed delimiter quoting, because it does
not require the use of balanced "bracketed" characters on either side of the string.
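
As a minimal sketch of why a length-prefixed literal needs no search for a closing delimiter, the following Python functions read and write Hollerith-style literals. The names read_hollerith and write_hollerith are illustrative only; the code is not taken from any FORTRAN compiler.

```python
DIGITS = "0123456789"


def read_hollerith(source, start=0):
    """Read one literal such as '5HHELLO' beginning at index `start`.

    Returns the string value and the index just past the literal.
    """
    pos = start
    while pos < len(source) and source[pos] in DIGITS:
        pos += 1
    if pos == start or pos >= len(source) or source[pos] != "H":
        raise ValueError("expected <count>H at position %d" % start)
    count = int(source[start:pos])
    begin = pos + 1
    end = begin + count
    if end > len(source):
        raise ValueError("literal is shorter than its declared count")
    # The next `count` characters are taken verbatim, whatever they are.
    return source[begin:end], end


def write_hollerith(text):
    """Produce the count, the letter H, then the characters themselves."""
    return "%dH%s" % (len(text), text)


value, end = read_hollerith("35HAn example Hollerith string literal")
```

Because the reader jumps straight to the end of the literal, quotation marks or other special characters inside the text need no special treatment.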

Advantages:
  • eliminates text searching (for the delimiter character) and therefore requires significantly less overhead
  • avoids the problem of delimiter collision
  • enables the inclusion of metacharacters that might otherwise be mistaken as commands
  • can be used for quite effective data compression of plain text strings


Drawbacks:
  • this type of notation is error-prone when entered manually by programmers

This is, however, not a drawback when the prefix is generated by an algorithm, as is most likely the case.

Whitespace delimiters

In YAML, string literals may be specified by the relative positioning of whitespace and
indentation.

- title: An example multi-line string in YAML
  body : |
    This is a multi-line string.
    "special" metacharacters may
    appear here. The extent of this string is
    indicated by indentation.
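
The idea of letting indentation mark the extent of a string can be sketched in a few lines of Python. This is a simplified illustration, not a YAML parser: the function name read_indented_block is hypothetical, only spaces are treated as indentation, and YAML's rules for trailing newlines are ignored.

```python
def read_indented_block(lines, header_index):
    """Collect the lines indented more deeply than the header line."""
    header = lines[header_index]
    header_indent = len(header) - len(header.lstrip(" "))
    block = []
    for line in lines[header_index + 1:]:
        indent = len(line) - len(line.lstrip(" "))
        if line.strip() and indent <= header_indent:
            break  # a line at or left of the header ends the literal
        block.append(line)
    while block and not block[-1].strip():
        block.pop()  # drop trailing blank lines
    if not block:
        return ""
    # Remove the common indentation shared by the non-blank lines.
    width = min(len(l) - len(l.lstrip(" ")) for l in block if l.strip())
    return "\n".join(l[width:] for l in block)


document = [
    "body : |",
    "  This is a multi-line string.",
    '  "special" metacharacters may',
    "  appear here.",
    "next: 1",
]
body = read_indented_block(document, 0)
```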

Bracketed delimiters

Most modern programming languages use bracket delimiters (also balanced delimiters, or quoting)
to specify string literals. Double quotations are the most common quoting delimiters used:

"Hi There!"

Some languages also allow the use of single quotations as an alternative to double quotations (though the string must begin and end with the same kind of quotation mark):

'Hi There!'

Note that these quotation marks are unpaired (the same character is used as an opener and a closer), which is a hangover from the typewriter technology that was the precursor of the earliest computer input and output devices. The Unicode character set includes paired (separate opening and closing) versions of both single and double quotations, used in text, mostly in languages other than English:

“Hi There!”
‘Hi There!’
„Hi There!“
« Hi There! »

The paired double quotations can be used in Visual Basic .NET, but many other programming languages will not accept them. Unpaired marks are preferred for compatibility: many web browsers, text editors, and other tools will not correctly display Unicode paired quotes, and so even in languages where they are permitted, many projects forbid their use in source code.

The PostScript programming language uses parentheses, with embedded newlines allowed,
and also embedded unescaped parentheses provided they are properly paired:

(The quick
(brown
fox))

Similarly, the Tcl programming language uses braces (embedded newlines allowed, embedded unescaped braces allowed provided they are properly paired):

{The quick
{brown
fox}}

On one hand, this practice is derived from the single quotations in Unix shells (these are raw strings) and, on the other, from the use of braces in C for compound statements, since in Tcl blocks of code are syntactically the same thing as string literals. That the delimiters are paired is essential for making this feasible.
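
Paired delimiters can nest because a reader only has to keep a depth counter. The following Python sketch shows the idea; read_balanced is a hypothetical name, and the code deliberately ignores the backslash escapes that real PostScript and Tcl readers also handle.

```python
def read_balanced(source, start=0, opener="(", closer=")"):
    """Read a literal delimited by paired characters, allowing nesting.

    Returns the text between the outermost delimiters and the index
    just past the closing delimiter.
    """
    if source[start:start + 1] != opener:
        raise ValueError("literal must begin with %r" % opener)
    depth = 0
    for pos in range(start, len(source)):
        ch = source[pos]
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth == 0:
                return source[start + 1:pos], pos + 1
    raise ValueError("unbalanced delimiters")


ps_value, ps_end = read_balanced("(The quick\n(brown\nfox)) show")
tcl_value, tcl_end = read_balanced("{a {b} c}", opener="{", closer="}")
```

With an unpaired delimiter such as " the same approach would not work, since an inner quotation mark cannot be told apart from the closing one.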

Delimiter collision

Delimiter collision is a common problem for string literal notations that use
balanced delimiters and quoting. The problem occurs when a programmer attempts to use a quoting character as part of the string literal itself. Because this is a very common problem, a number of methods for avoiding delimiter collision have been invented.

Dual quoting style

Some languages (e.g., Modula-2, JavaScript, and Python) attempt to avoid the delimiter collision problem by allowing a dual quoting
style. Typically, this consists of allowing the programmer to use either single quotations
or double quotations interchangeably.

"This is John's apple."
'I said, "Can you hear me?"'

One problem with dual quoting is that it doesn't allow for the inclusion of both styles
of quotations at once within the same literal (unless escaped, see below).
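
For illustration, the two literals above behave as follows in Python, one of the languages named in this section. The variable names are arbitrary.

```python
# Either delimiter may be used; the other kind can then appear unescaped.
apple = "This is John's apple."
question = 'I said, "Can you hear me?"'

# The choice of delimiter does not affect the resulting value.
same_value = ('plain' == "plain")

# A literal containing both kinds of mark still needs an escape.
both = 'She said, "It\'s John\'s."'
```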

Some programming languages allow subtle variations on dual quoting, treating single quotations
and double quotations slightly differently (e.g. sh, Perl).

Escape character

One method for avoiding delimiter collision is to use escape characters:

"I said, \"Can you hear me?\""

The most commonly used escape character for this purpose is the backslash "\",
the tradition for which originated on Unix. From a language design standpoint, this
approach is adequate, but there are drawbacks:
  • text can be rendered unreadable when littered with numerous escape characters
  • escape characters are required to be escaped, when not intended as escape characters
  • although easy to type, they can be cryptic to someone unfamiliar with the language
  • when entering a string literal intended to be evaluated by another program (such as an sh command inside a Perl script), characters may need to be escaped twice or three times


"I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""

The confusing presence of too many escape and slash characters in a string is commonly disparaged as leaning toothpick syndrome.
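
The growth in backslashes at each level of quoting can be made concrete with a small Python sketch. The helper names escape and quote_literal are hypothetical, and the escaping rule shown (backslash before backslashes and quotation marks) is a generic one rather than that of any particular language.

```python
def escape(text, delimiter='"'):
    """Escape backslashes first, then the delimiter itself."""
    return text.replace("\\", "\\\\").replace(delimiter, "\\" + delimiter)


def quote_literal(text):
    """Wrap text in double quotations, escaping what must be escaped."""
    return '"' + escape(text) + '"'


path = "C:\\Foo"             # the six characters C:\Foo
once = quote_literal(path)   # one level of quoting
twice = quote_literal(once)  # a literal that itself contains a literal

backslashes = [s.count("\\") for s in (path, once, twice)]
```

Here one backslash in the original text becomes two after the first level of quoting and six after the second, counting those added for the embedded quotation marks.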

Escape sequence

An extended concept of the escape character, an escape sequence is also a means of avoiding
delimiter collision. An escape sequence consists of two or more consecutive characters that can have
special meaning when used in the context of a string literal. For example, programming languages such as Perl, Python and Ruby support the following:

"I said, \x22Can you hear me?\x22"

Escape sequences can also be used for purposes other than avoiding delimiter collision, and
can also include metacharacters (see Metacharacters below).

Double-up escape sequence

Some languages (such as Pascal, BASIC, DCL, Smalltalk, and SQL) avoid delimiter collision
by doubling up on the quotation marks that are intended to be part of the string literal
itself:


'This Pascal string''contains two apostrophes'''
"I said, ""Can you hear me?"""

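The doubling convention is simple to implement, as the following Python sketch suggests. The function names are illustrative; unquote_doubled assumes its input is a well-formed literal and does not check for stray single delimiters inside it.

```python
def quote_doubled(text, delimiter="'"):
    """Write text as a literal, doubling each embedded delimiter."""
    return delimiter + text.replace(delimiter, delimiter * 2) + delimiter


def unquote_doubled(literal, delimiter="'"):
    """Recover the text from a literal written with doubled delimiters."""
    if len(literal) < 2 or literal[0] != delimiter or literal[-1] != delimiter:
        raise ValueError("not a delimited literal")
    return literal[1:-1].replace(delimiter * 2, delimiter)


pascal_style = quote_doubled("John's apple")
basic_style = unquote_doubled('"I said, ""Can you hear me?"""', '"')
```
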
Extended quoting styles

Some languages extend the previously-mentioned quoting conventions even further. These extended approaches provide an even more flexible style of notation for avoiding delimiter collision.

Triple quoting:
One such extension, the use of triple quoting, is used in Python:

'''This is John's apple.'''

"""John is Nancy's so-called "boyfriend"."""

Triple quoted string literals may be delimited by """ or '''. Triple quoting in Python also has the added benefit of allowing string literals to span more than one physical line of source code.
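
A short Python illustration of both properties follows; the variable names are arbitrary.

```python
# Single and double quotations may both appear unescaped inside.
single = '''This is John's apple.'''
double = """John is Nancy's so-called "boyfriend"."""

# A triple quoted literal may span several physical lines; each line
# break in the source becomes a newline character in the string.
two_lines = """first line
second line"""
```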

Multiple quoting:
Another such extension is the use of multiple quoting, which allows the author to choose which characters should specify the bounds of a string literal.

For example, in Perl:

qq^I said, "Can you hear me?"^

qq@I said, "Can you hear me?"@

qq§I said, "Can you hear me?"§

all produce the desired result.
Although this notation is more flexible, few languages support it. Perl and Ruby are two that do.

Here documents

A here document is an alternative quoting notation that allows the programmer
to specify an arbitrary unique identifier as a content boundary for a string literal.
This avoids delimiter collision, and also preserves newlines in the source code
as newlines in the string literal itself.

Concatenation

In some languages (e.g., BASIC) there is no provision for escape sequences or any of the workarounds discussed above. To place a string delimiter character in a string, it is necessary to use string concatenation. The following example shows how this might be done in BASIC on a system using ASCII:

"I said, " + CHR$(34) + "Can you hear me?" + CHR$(34)

Here, the CHR$ function returns the character corresponding to its argument; in ASCII the quotation mark has the value 34.
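
The same technique can be written in Python, where the built-in chr function plays the role of CHR$. Python does not need this workaround, so the snippet is only an illustration of the idea.

```python
# Build the quotation mark from its character code instead of typing it.
QUOTE = chr(34)

sentence = "I said, " + QUOTE + "Can you hear me?" + QUOTE
```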

Metacharacters

Many languages support the use of metacharacters inside string literals. Metacharacters
have varying interpretations depending on the context and language, but are generally a kind
of 'processing command' for representing printing or nonprinting characters.

For instance, in a C string literal, if the backslash is followed
by a letter such as "b", "n" or "t", then this represents a nonprinting backspace, newline
or tab character respectively. If the backslash is followed by one to three octal digits,
then this sequence is interpreted as representing the arbitrary character with the specified
ASCII code. This was later extended to allow more modern hexadecimal character code notation:

"I said,\t\t\x22Can you hear me?\x22\n"

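Python string literals accept the same backslash notations, so the example can be examined directly. The snippet below is a sketch in Python rather than C; the variable names are arbitrary.

```python
text = "I said,\t\t\x22Can you hear me?\x22\n"

# The letter forms stand for single nonprinting characters.
codes = [ord("\b"), ord("\t"), ord("\n")]

# Octal 042 and hexadecimal 22 both denote character code 34.
octal_quote = "\042"
hex_quote = "\x22"
```
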
Raw strings

A few languages provide a method of specifying that a literal is to be processed without any language-specific interpretation.

For example, in Python 'raw strings' are preceded by an r. In such strings backslashes are not interpreted as escape sequences, making it simpler to write DOS/Windows paths and regular expressions:

r"The Windows path is C:\Foo\Bar\Baz\ "
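
The difference between an ordinary and a raw Python literal can be checked directly. The snippet uses only the standard re module; the path and pattern are made up for illustration.

```python
import re

cooked = "C:\\Foo\\Bar"   # ordinary literal: each backslash is doubled
raw = r"C:\Foo\Bar"       # raw literal: backslashes are taken as written

# In a raw literal, \n is two characters, not a newline.
raw_length = len(r"\n")

# Regular expressions benefit because the pattern language also uses
# backslashes; this pattern matches a backslash followed by a name.
components = re.findall(r"\\(\w+)", raw)
```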

C#'s notation is called @-quoting:

@"C:\Foo\Bar\Baz\"

This notation also allows double-up quotations:

@"I said, ""Hello there."""


In XML documents, CDATA sections allow the use of characters such as & and < without an XML parser attempting to interpret them as part of the structure of the document itself. This can be useful when including literal text and scripting code, to keep the document well formed.



Variable interpolation

Languages differ on whether and how to interpret string literals as either
'raw' or 'variable interpolated'. Variable interpolation is the process
of evaluating an expression containing one or more variables, and returning
output where the variables are replaced with their corresponding values in
memory.
In sh-compatible Unix shells, quotation-delimited (") strings are interpolated, while apostrophe-delimited (') strings are not.

For example, the following Perl code:


$name = "Nancy";
$greeting = "Hello World";
print "$name said $greeting to the crowd of people.";


produces the output:

Nancy said Hello World to the crowd of people.

The sigil character ($) is interpreted to indicate variable
interpolation.

Similarly, the printf function produces the same output
using notation such as:

printf "%s said %s to the crowd of people.", $name, $greeting;

The metacharacters (%s) indicate variable interpolation.

This is contrasted with "raw" strings:

print '$name said $greeting to the crowd of people.';

which produce output like:

$name said $greeting to the crowd of people.

Here the $ characters are not sigils, and are not interpreted to have any meaning other than plain text.
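
For comparison, Python's standard string.Template class uses the same $ notation, but substitution happens only when it is requested explicitly. The sketch below mirrors the Perl example; the dictionary and variable names are illustrative.

```python
from string import Template

values = {"name": "Nancy", "greeting": "Hello World"}

# As an ordinary string, the text keeps its $ characters untouched.
template_text = "$name said $greeting to the crowd of people."

# Substitution replaces each $identifier with the matching value.
interpolated = Template(template_text).substitute(values)

# printf-style formatting gives the same result with %s placeholders.
formatted = "%s said %s to the crowd of people." % (
    values["name"], values["greeting"])
```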

Binary and hexadecimal strings

REXX uses suffix characters to specify characters or strings using their hexadecimal or binary code. E.g.,


'20'x
"0010 0000"b
"00100000"b

all yield the space character, avoiding the function call X2C(20).
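
Python has no such suffix notation, but the same three conversions can be sketched with standard functions, which shows what the REXX literals denote.

```python
# Hexadecimal 20 is character code 32, the space character.
from_hex = bytes.fromhex("20").decode("ascii")

# Binary digits, with or without a separating blank.
from_binary_spaced = chr(int("0010 0000".replace(" ", ""), 2))
from_binary = chr(int("00100000", 2))
```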

Embedding source code in string literals

Languages that lack flexibility in specifying string literals make
it particularly cumbersome to write programming code that generates
other programming code. This is particularly true when the generation
language is the same or similar to the output language.

For example:
  • writing code to produce quines;
  • generating an output language from within a web template;
  • using XSLT to generate XSLT, or SQL to generate more SQL;
  • generating a PostScript representation of a document for printing purposes, from within a document-processing application written in C or some other language.


Nevertheless, some languages are particularly well-adapted to produce
this sort of self-similar output, especially those that support multiple options
for avoiding delimiter collision.

Using string literals as code that generates
other code may have adverse security implications, especially if the output is based at least partially on untrusted
user input. This is particularly acute in the case of Web-based applications, where malicious users can take advantage of such weaknesses to subvert the operation of the application, for example by mounting an SQL injection attack.
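
The risk can be demonstrated with Python's standard sqlite3 module and an in-memory database. The table, its contents and the hostile input are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2")])

# Untrusted input containing the string delimiter used by SQL.
user_input = "nobody' OR '1'='1"

# Unsafe: the input is pasted into a literal, so its apostrophes end
# the literal early and the rest is parsed as SQL.
unsafe_sql = "SELECT secret FROM users WHERE name = '" + user_input + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safer: a placeholder passes the value separately from the SQL text.
safe = conn.execute("SELECT secret FROM users WHERE name = ?",
                    (user_input,)).fetchall()
```

The concatenated query returns every row, while the parameterized query treats the whole input as a single name and matches nothing.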
A string literal is the representation of a string
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

 value within the source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

 of a computer program
Computer program
A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

. There are numerous alternate notations for specifying string literals, and the exact notation depends on the individual programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

 in question. Nevertheless, there are some
general guidelines that most modern programming languages follow.

Specifically, most string literals can be specified using:
  • declarative notation;
  • whitespace delimiters (indentation);
  • bracketed delimiters (quoting);
  • escape characters; or
  • a combination of some or all of the above

Declarative notation

In the original FORTRAN
Fortran
Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...

 programming language (for example), string literals were written in so-called Hollerith notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string:

35HAn example Hollerith string literal

This declarative notation style is contrasted with bracketed delimiter
Delimiter
A delimiter is a sequence of one or more characters used to specify the boundary between separate, independent regions in plain text or other data streams. An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values.Delimiters represent...

 quoting, because it does
not require the use of balanced "bracketed" characters on either side of the string.

Advantages:
  • eliminates text searching (for the delimiter character) and therefore requires significantly less overhead
    Computational overhead
    In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

  • avoids the problem of delimiter collision
  • enables the inclusion of metacharacter
    Metacharacter
    A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression engine.-Examples:...

    s that might otherwise be mistaken as commands
  • can be used for quite effective data compression of plain text strings


Drawbacks:
  • this type of notation is error-prone if used as manual entry by programmer
    Programmer
    A programmer, computer programmer or coder is someone who writes computer software. The term computer programmer can refer to a specialist in one area of computer programming or to a generalist who writes code for many kinds of software. One who practices or professes a formal approach to...

    s

This is however not a drawback when the prefix is generated by an algorithm as most likely the case.

Whitespace delimiters

In YAML
YAML
YAML is a human-readable data serialization format that takes concepts from programming languages such as C, Perl, and Python, and ideas from XML and the data format of electronic mail . YAML was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki...

, string literals may be specified by the relative positioning of whitespace and
indentation.

- title: An example multi-line string in YAML
body : |
This is a multi-line string.
"special" metacharacters may
appear here. The extent of this string is
indicated by indentation.

Bracketed delimiters

Most modern programming languages use bracket delimiters (also balanced delimiters, or quoting)
to specify string literals. Double quotations are the most common quoting delimiters used:

"Hi There!"

Some languages also allow the use of single quotations as an alternative to double quotations (though the string must begin and end with the same kind of quotation mark):

'Hi There!'

Note that these quotation marks are unpaired (the same character is used as an opener and a closer), which is a holdover from typewriter technology, the precursor of the earliest computer input and output devices. The Unicode character set includes paired (separate opening and closing) versions of both single and double quotations, used in text, mostly in languages other than English:

“Hi There!”
‘Hi There!’
„Hi There!“
« Hi There! »

The paired double quotations can be used in Visual Basic .NET, but many other programming languages will not accept them. Unpaired marks are preferred for compatibility: many web browsers, text editors, and other tools will not correctly display Unicode paired quotes, and so even in languages where they are permitted, many projects forbid their use in source code.

The PostScript programming language uses parentheses, with embedded newlines allowed,
and also embedded unescaped parentheses provided they are properly paired:

(The quick
(brown
fox))

Similarly, the Tcl programming language uses braces (embedded newlines allowed, embedded unescaped braces allowed provided properly paired):

{The quick
{brown
fox}}

On one hand, this practice is derived from the single quotations in Unix shells (these are raw strings) and, on the other, from the use of braces in C for compound statements, since in Tcl blocks of code are syntactically the same thing as string literals. That the delimiters are paired is essential for making this feasible.
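
Why pairing matters can be illustrated with a small Python sketch that finds the end of a brace-delimited literal by tracking nesting depth. The function name read_braced is hypothetical, and the sketch deliberately ignores backslash escapes, which a real Tcl parser also takes into account.

```python
def read_braced(source: str, start: int = 0) -> tuple:
    """Read a brace-delimited literal beginning at source[start].

    Returns (contents, index just past the closing brace).
    """
    if start >= len(source) or source[start] != "{":
        raise ValueError("expected an opening brace")
    depth = 0
    for i in range(start, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                # Only the outermost pair is stripped; inner pairs
                # remain part of the string's contents.
                return source[start + 1:i], i + 1
    raise ValueError("unbalanced braces")


text, end = read_braced("{The quick\n{brown\nfox}} jumps")
print(repr(text))   # prints 'The quick\n{brown\nfox}'
print(end)          # prints 23
```

With an unpaired delimiter such as the double quotation mark, the scanner could not tell an inner opener from the closer, which is why nesting of this kind requires distinct opening and closing characters.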

Delimiter collision

Delimiter collision is a common problem for string literal notations that use
balanced delimiters and quoting. The problem occurs when a programmer attempts to use a quoting character as part of the string literal itself. Because this is a very common problem, a number of methods for avoiding delimiter collision have been invented.

Dual quoting style

Some languages (e.g., Modula-2, JavaScript, and Python) attempt to avoid the delimiter collision problem by allowing a dual quoting
style. Typically, this consists of allowing the programmer to use either single quotations
or double quotations interchangeably.

"This is John's apple."
'I said, "Can you hear me?"'

One problem with dual quoting is that it doesn't allow for the inclusion of both styles
of quotations at once within the same literal (unless escaped, see below).

Some programming languages allow subtle variations on dual quoting, treating single quotations
and double quotations slightly differently (e.g. sh, Perl).

Escape character

One method for avoiding delimiter collision is to use escape characters:

"I said, \"Can you hear me?\""

The most commonly used escape character for this purpose is the backslash "\",
the tradition for which originated on Unix. From a language design standpoint, this
approach is adequate, but there are drawbacks:
  • text can be rendered unreadable when littered with numerous escape characters
  • escape characters must themselves be escaped when they are meant literally
  • although easy to type, they can be cryptic to someone unfamiliar with the language
  • when entering a string literal intended to be evaluated by another program (such as a sh command inside a Perl script), characters may need to be escaped twice or three times


"I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""

The confusing presence of too many escape and slash characters in a string is commonly disparaged as leaning toothpick syndrome.
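
The double-escaping problem can be seen in a short Python sketch, where a regular expression (a second language with its own backslash rules) is written inside an ordinary string literal. The path used is the illustrative one from the example above, shortened.

```python
import re

# The text we want to match: C:\Foo\Bar (two real backslashes).
path = "C:\\Foo\\Bar"

# A regular expression needs \\ to match one literal backslash, and the
# Python string literal needs each of those backslashes escaped again,
# giving four backslashes in the source for every one in the path.
pattern = "C:\\\\Foo\\\\Bar"

print(path)                                      # prints C:\Foo\Bar
print(pattern)                                   # prints C:\\Foo\\Bar
print(re.fullmatch(pattern, path) is not None)   # prints True
```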

Escape sequence

An extended concept of the escape character, an escape sequence is also a means of avoiding
delimiter collision. An escape sequence consists of two or more consecutive characters that can have
special meaning when used in the context of a string literal. For example, programming languages such as Perl, Python and Ruby support the following:

"I said, \x22Can you hear me?\x22"

Escape sequences can also be used for purposes other than avoiding delimiter collision, and
can also include metacharacters (see Metacharacters below).

Double-up escape sequence

Some languages (such as Pascal, BASIC, DCL, Smalltalk, and SQL) avoid delimiter collision
by doubling up on the quotation marks that are intended to be part of the string literal
itself:


'This Pascal string''contains two apostrophes'''
"I said, ""Can you hear me?"""

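The doubling convention is mechanical enough to sketch in a few lines of Python. The helper names quote_doubled and unquote_doubled are hypothetical; the unquoting helper assumes well-formed input and does not check for stray single delimiters.

```python
def quote_doubled(text: str, quote: str = "'") -> str:
    """Wrap text in quote characters, doubling any quote inside it."""
    return quote + text.replace(quote, quote * 2) + quote


def unquote_doubled(literal: str, quote: str = "'") -> str:
    """Reverse quote_doubled for a well-formed literal."""
    if len(literal) < 2 or literal[0] != quote or literal[-1] != quote:
        raise ValueError("not a quoted literal")
    # Each doubled quote inside the body stands for one real quote.
    return literal[1:-1].replace(quote * 2, quote)


print(quote_doubled("John's apple"))                      # prints 'John''s apple'
print(quote_doubled('I said, "Can you hear me?"', '"'))   # prints "I said, ""Can you hear me?"""
print(unquote_doubled("'John''s apple'"))                 # prints John's apple
```
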
Extended quoting styles

Some languages extend the previously-mentioned quoting conventions even further. These extended approaches provide an even more flexible style of notation for avoiding delimiter collision.

Triple quoting:
One such extension, the use of triple quoting, is used in Python:

'''This is John's apple.'''

"""John is Nancy's so-called "boyfriend"."""

Triple quoted string literals may be delimited by """ or '''. Triple quoting in Python also has the added benefit of allowing string literals to span more than one physical line of source code.
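
A brief Python sketch shows both delimiters and the multi-line behaviour; the variable names are arbitrary.

```python
# Either triple delimiter may enclose both kinds of quotation mark.
single = '''This is John's apple.'''
double = """John is Nancy's so-called "boyfriend"."""

# Line breaks in the source become newline characters in the string.
multi = """first line
second line"""

print(single)               # prints This is John's apple.
print(double)               # prints John is Nancy's so-called "boyfriend".
print(multi.splitlines())   # prints ['first line', 'second line']
```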

Multiple quoting:
Another such extension is the use of multiple quoting, which allows the author to choose which characters should specify the bounds of a string literal.

For example, in Perl:

qq^I said, "Can you hear me?"^

qq@I said, "Can you hear me?"@

qq§I said, "Can you hear me?"§

all produce the desired result.
Although this notation is more flexible, few languages support it. Perl and Ruby are two that do.

Here documents

A Here document is an alternate quoting notation that allows the programmer
to specify an arbitrary unique identifier as a content boundary for a string literal.
This avoids delimiter collision, and also preserves newlines in the source code
as newlines in the string literal itself.
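
How a here document is delimited can be sketched in Python by reading lines until one consists solely of the chosen identifier. The function name read_here_document and the terminator END are assumptions made for this illustration; real shells and languages add further rules (such as interpolation and indentation stripping) that are omitted here.

```python
def read_here_document(lines, terminator):
    """Collect lines up to, but not including, the terminator line."""
    body = []
    for line in lines:
        if line.rstrip("\n") == terminator:
            # The terminator itself is not part of the string.
            return "".join(body)
        body.append(line)
    raise ValueError(f"terminator {terminator!r} not found")


source = [
    'He said "hi" and she said \'bye\'.\n',
    "Backslashes \\ need no special care here.\n",
    "END\n",
    "this line is outside the literal\n",
]
text = read_here_document(source, "END")
print(text, end="")
```

Quotation marks inside the body never end the literal, since only a line matching the identifier does.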

Concatenation

In some languages (e.g., BASIC) there is no provision for escape sequences or any of the workarounds discussed above. To place a string delimiter character in a string, it is necessary to use string concatenation. The following example shows how this might be done in BASIC on a system using ASCII:

"I said, " + CHR$(34) + "Can you hear me?" + CHR$(34)

Here, the CHR$ function returns the character corresponding to its argument; in ASCII the quotation mark has the value 34.

Metacharacters

Many languages support the use of metacharacters inside string literals. Metacharacters
have varying interpretations depending on the context and language, but are generally a kind
of 'processing command' for representing printing or nonprinting characters.

For instance, in a C string literal, if the backslash is followed
by a letter such as "b", "n" or "t", then this represents a nonprinting backspace, newline
or tab character respectively. Or if the backslash is followed by 1-3 octal digits,
then this sequence is interpreted as representing the arbitrary character with the specified
ASCII code. This was later extended to allow more modern hexadecimal character code notation:

"I said,\t\t\x22Can you hear me?\x22\n"

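Python string literals accept the same backslash-letter, octal, and hexadecimal forms, so the example above can be checked with a short sketch:

```python
text = "I said,\t\t\x22Can you hear me?\x22\n"

# \t is a tab, \x22 is the character with hexadecimal code 22
# (the double quotation mark), and \n is a newline.
print(text.count("\t"))           # prints 2
print(text.count('"'))            # prints 2
print(text.endswith("\n"))        # prints True

# Octal 042 and hexadecimal 22 both denote code 34.
print("\042" == "\x22" == '"')    # prints True
print(ord("\x22"))                # prints 34
```
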
Raw strings

A few languages provide a method of specifying that a literal is to be processed without any language-specific interpretation.

For example, in Python 'raw strings' are preceded by an r. In such strings backslashes are not interpreted as escape sequences, making it simpler to write DOS/Windows paths and regular expressions:

r"The Windows path is C:\Foo\Bar\Baz\ "
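
The difference between the raw and ordinary forms can be confirmed with a short Python sketch. One caveat: a Python raw string cannot end in an odd number of backslashes, which is why the example above places a space before the closing quotation mark.

```python
raw = r"C:\Foo\Bar\Baz"
cooked = "C:\\Foo\\Bar\\Baz"

# Both literals denote the same ten-plus characters; only the source
# notation differs.
print(raw == cooked)    # prints True

# In a raw string, backslash-n stays as two characters.
print(len(r"\n"))       # prints 2
print(len("\n"))        # prints 1
```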

C#'s notation is called @-quoting:

@"C:\Foo\Bar\Baz\"

This notation also allows double-up quotations:

@"I said, ""Hello there."""


In XML documents, CDATA sections allow use of characters such as & and < without an XML parser attempting to interpret them as part of the structure of the document itself. This can be useful when including literal text and scripting code, to keep the document well formed.

Variable interpolation

Languages differ on whether and how to interpret string literals as either
'raw' or 'variable interpolated'. Variable interpolation is the process
of evaluating an expression containing one or more variables, and returning
output where the variables are replaced with their corresponding values in
memory.

In sh-compatible Unix shells, quotation-delimited (") strings are interpolated, while apostrophe-delimited (') strings are not.

For example, the following Perl code:


$name = "Nancy";
$greeting = "Hello World";
print "$name said $greeting to the crowd of people.";


produces the output:

Nancy said Hello World to the crowd of people.

The sigil character ($) is interpreted to indicate variable
interpolation.

Similarly, the printf function produces the same output
using notation such as:

printf "%s said %s to the crowd of people.", $name, $greeting;

The metacharacters (%s) indicate variable interpolation.

This is contrasted with "raw" strings:

print '$name said $greeting to the crowd of people.';

which produce output like:

$name said $greeting to the crowd of people.

Here the $ characters are not sigils, and are not interpreted to have any meaning other than plain text.
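
For comparison, the same three cases can be sketched in Python, which marks interpolated literals with an f prefix instead of relying on the choice of quotation mark. This is an analogy to the Perl example, not a translation of Perl's semantics.

```python
name = "Nancy"
greeting = "Hello World"

# f-string: expressions in braces are evaluated and substituted.
interpolated = f"{name} said {greeting} to the crowd of people."

# printf-style formatting: %s placeholders are filled from the tuple.
formatted = "%s said %s to the crowd of people." % (name, greeting)

# Ordinary literal: the braces are plain text and nothing is substituted.
plain = "{name} said {greeting} to the crowd of people."

print(interpolated)   # prints Nancy said Hello World to the crowd of people.
print(formatted)      # prints Nancy said Hello World to the crowd of people.
print(plain)          # prints {name} said {greeting} to the crowd of people.
```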

Binary and hexadecimal strings

REXX uses suffix characters to specify characters or strings using their hexadecimal or binary code. E.g.,


'20'x
"0010 0000"b
"00100000"b

all yield the space character, avoiding the function call X2C(20).
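
Python has no such suffix notation, but the equivalence of the three forms can be checked with a short sketch that converts the same hexadecimal and binary digits explicitly:

```python
# Hexadecimal 20 decoded as a single ASCII character.
from_hex = bytes.fromhex("20").decode("ascii")

# Binary digits, with or without grouping spaces, parsed as base 2.
from_binary_grouped = chr(int("0010 0000".replace(" ", ""), 2))
from_binary = chr(int("00100000", 2))

print(from_hex == from_binary_grouped == from_binary == " ")   # prints True
print(ord(from_hex))                                           # prints 32
```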

Embedding source code in string literals

Languages that lack flexibility in specifying string literals make
it particularly cumbersome to write programming code that generates
other programming code. This is particularly true when the generation
language is the same or similar to the output language.

For example:
  • writing code to produce quines;
  • generating an output language from within a web template;
  • using XSLT to generate XSLT, or SQL to generate more SQL;
  • generating a PostScript representation of a document for printing purposes, from within a document-processing application written in C or some other language.


Nevertheless, some languages are particularly well-adapted to produce
this sort of self-similar output, especially those that support multiple options
for avoiding delimiter collision.

Using string literals as code that generates
other code may have adverse security implications, especially if the output is based at least partially on untrusted
user input. This is particularly acute in the case of Web-based applications, where malicious users can take advantage of such weaknesses to subvert the operation of the application, for example by mounting an SQL injection attack.
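
The risk can be illustrated with a minimal Python sketch using the standard sqlite3 module and an in-memory database. The table, its contents, and the hostile input are all invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

# Hypothetical hostile input: the apostrophe closes the SQL string
# literal early, and the rest is parsed as SQL.
user_input = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the text of the statement.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is passed as a parameter and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows))   # prints 2 (every row matched)
print(safe_rows)          # prints []
```

The injection works because of delimiter collision: the apostrophe in the input is indistinguishable from the one that ends the literal. Passing values as parameters sidesteps the problem without relying on escaping.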
A string literal is the representation of a string
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

 value within the source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

 of a computer program
Computer program
A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

. There are numerous alternate notations for specifying string literals, and the exact notation depends on the individual programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

 in question. Nevertheless, there are some
general guidelines that most modern programming languages follow.

Specifically, most string literals can be specified using:
  • declarative notation;
  • whitespace delimiters (indentation);
  • bracketed delimiters (quoting);
  • escape characters; or
  • a combination of some or all of the above

Declarative notation

In the original FORTRAN
Fortran
Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...

 programming language (for example), string literals were written in so-called Hollerith notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string:

35HAn example Hollerith string literal

This declarative notation style is contrasted with bracketed delimiter
Delimiter
A delimiter is a sequence of one or more characters used to specify the boundary between separate, independent regions in plain text or other data streams. An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values.Delimiters represent...

 quoting, because it does
not require the use of balanced "bracketed" characters on either side of the string.

Advantages:
  • eliminates text searching (for the delimiter character) and therefore requires significantly less overhead
    Computational overhead
    In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

  • avoids the problem of delimiter collision
  • enables the inclusion of metacharacter
    Metacharacter
    A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression engine.-Examples:...

    s that might otherwise be mistaken as commands
  • can be used for quite effective data compression of plain text strings


Drawbacks:
  • this type of notation is error-prone if used as manual entry by programmer
    Programmer
    A programmer, computer programmer or coder is someone who writes computer software. The term computer programmer can refer to a specialist in one area of computer programming or to a generalist who writes code for many kinds of software. One who practices or professes a formal approach to...

    s

This is however not a drawback when the prefix is generated by an algorithm as most likely the case.

Whitespace delimiters

In YAML
YAML
YAML is a human-readable data serialization format that takes concepts from programming languages such as C, Perl, and Python, and ideas from XML and the data format of electronic mail . YAML was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki...

, string literals may be specified by the relative positioning of whitespace and
indentation.

- title: An example multi-line string in YAML
body : |
This is a multi-line string.
"special" metacharacters may
appear here. The extent of this string is
indicated by indentation.

Bracketed delimiters

Most modern programming languages use bracket delimiters (also balanced delimiters, or quoting)
to specify string literals. Double quotations
Quotation mark
Quotation marks or inverted commas are punctuation marks at the beginning and end of a quotation, direct speech, literal title or name. Quotation marks can also be used to indicate a different meaning of a word or phrase than the one typically associated with it and are often used to express irony...

 are the most common quoting delimiters used:

"Hi There!"

Some languages also allow the use of single quotations as an alternative to double quotations (though the string must begin and end with the same kind of quotation mark):

'Hi There!'

Note that these quotation marks are unpaired (the same character is used as an opener and a closer), which is a hangover from the typewriter
Typewriter
A typewriter is a mechanical or electromechanical device with keys that, when pressed, cause characters to be printed on a medium, usually paper. Typically one character is printed per keypress, and the machine prints the characters by making ink impressions of type elements similar to the pieces...

 technology which was the precursor of the earliest computer input and output devices. The Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 character set includes paired (separate opening and closing) versions of both single and double quotations, used in text, mostly in other languages than English:

“Hi There!”
‘Hi There!’
„Hi There!“
« Hi There! »

The paired double quotations can be used in Visual Basic .NET
Visual Basic .NET
Visual Basic .NET , is an object-oriented computer programming language that can be viewed as an evolution of the classic Visual Basic , which is implemented on the .NET Framework...

, but many other programming languages will not accept them. Unpaired marks are preferred for compatibility - many web browsers, text editors, and other tools will not correctly display unicode paired quotes, and so even in languages where they are permitted, many projects forbid their use for source code.

The PostScript
PostScript
PostScript is a dynamically typed concatenative programming language created by John Warnock and Charles Geschke in 1982. It is best known for its use as a page description language in the electronic and desktop publishing areas. Adobe PostScript 3 is also the worldwide printing and imaging...

 programming language uses parentheses, with embedded newlines allowed,
and also embedded unescaped parentheses provided they are properly paired:

(The quick
(brown
fox))

Similarly, the Tcl
Tcl
Tcl is a scripting language created by John Ousterhout. Originally "born out of frustration", according to the author, with programmers devising their own languages intended to be embedded into applications, Tcl gained acceptance on its own...

 programming language uses braces (embedded newlines allowed, embedded unescaped braces allowed provided properly paired):

{The quick
{brown
fox}}

On one hand, this practice is derived from the single quotations in Unix shells (these are raw strings) and, on the other, from the use of braces in C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

 for compound statements, since blocks of code is in Tcl syntactically the same thing as string literals. That the delimiters are paired is essential for making this feasible.

Delimiter collision

Delimiter collision is a common problem for string literal notations that use
balanced delimiters and quoting. The problem occurs when a programmer attempts to use a quoting character as part of the string literal itself. Because this is a very common problem, a number of methods for avoiding delimiter collision have been invented.

Dual quoting style

Some languages (e.g., Modula-2, JavaScript, and Python) attempt to avoid the delimiter collision problem by allowing a dual quoting style. Typically, this consists of allowing the programmer to use either single quotations or double quotations interchangeably.

"This is John's apple."
'I said, "Can you hear me?"'

One problem with dual quoting is that it doesn't allow for the inclusion of both styles
of quotations at once within the same literal (unless escaped, see below).
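
As a small illustration in Python (one of the languages named above), the two examples can be written as follows. The variable names are chosen here for illustration only.

```python
# Either quote character may delimit a Python string literal;
# the other one can then appear inside without escaping.
a = "This is John's apple."
b = 'I said, "Can you hear me?"'

# Using both styles at once still needs an escape (or another notation).
c = 'John\'s friend said, "Hello."'

print(a)
print(b)
print(c)
```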

Some programming languages allow subtle variations on dual quoting, treating single quotations and double quotations slightly differently (e.g. sh, Perl).

Escape character

One method for avoiding delimiter collision is to use escape characters:

"I said, \"Can you hear me?\""

The most commonly used escape character for this purpose is the backslash "\",
the tradition for which originated on Unix. From a language design standpoint, this
approach is adequate, but there are drawbacks:
  • text can be rendered unreadable when littered with numerous escape characters
  • escape characters are required to be escaped, when not intended as escape characters
  • although easy to type, they can be cryptic to someone unfamiliar with the language
  • when entering a string literal intended to be evaluated by another program (such as a sh command inside a Perl script), characters may need to be escaped twice or three times


"I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""

The confusing presence of too many escape and slash characters in a string is commonly disparaged as leaning toothpick syndrome.
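
A minimal Python sketch of how the backslashes accumulate: the literal from the example is written once with escapes, and then a second layer of quoting is simulated with `repr`, which produces a Python literal for the same text. Using `repr` as the "second program" is an assumption made here purely for illustration.

```python
# The literal from the example, written with backslash escapes.
s = "I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""
print(s)

# A second layer of quoting: repr() yields a Python literal for s,
# so every backslash in the text has to be escaped once more.
nested = repr(s)
print(nested)

print(s.count("\\"), nested.count("\\"))
```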

Escape sequence

An extended concept of the escape character, an escape sequence is also a means of avoiding
delimiter collision. An escape sequence consists of two or more consecutive characters that can have
special meaning when used in the context of a string literal. For example, programming languages such as Perl, Python and Ruby support the following:

"I said, \x22Can you hear me?\x22"

Escape sequences can also be used for purposes other than avoiding delimiter collision, and
can also include metacharacters (see Metacharacters below).
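
As a hedged illustration in Python, the same quotation mark can be spelled with several escape sequences. The variable names are illustrative only.

```python
# Four spellings of the same text; only the first escapes the delimiter itself.
escaped = "I said, \"Can you hear me?\""
hex_form = "I said, \x22Can you hear me?\x22"
octal_form = "I said, \042Can you hear me?\042"
unicode_form = "I said, \u0022Can you hear me?\u0022"

print(escaped == hex_form == octal_form == unicode_form)
```

Python reads exactly two hexadecimal digits after `\x`, so the `C` of "Can" is not absorbed into the escape sequence; other languages may apply different rules.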

Double-up escape sequence

Some languages (such as Pascal, BASIC, DCL, Smalltalk, and SQL) avoid delimiter collision by doubling up on the quotation marks that are intended to be part of the string literal itself:


'This Pascal string''contains two apostrophes'''
"I said, ""Can you hear me?"""

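The doubling rule is simple enough to sketch in a few lines of Python. The helpers `quote_doubled` and `unquote_doubled` are hypothetical names introduced here, not functions from any of the languages above, and the unquoting step assumes its input is a well-formed literal.

```python
def quote_doubled(text: str, quote: str = "'") -> str:
    """Wrap text in `quote`, doubling any embedded quote characters."""
    return quote + text.replace(quote, quote * 2) + quote


def unquote_doubled(literal: str) -> str:
    """Reverse quote_doubled; the delimiter is taken from the first character."""
    if len(literal) < 2 or literal[0] != literal[-1]:
        raise ValueError("literal is not properly delimited")
    quote = literal[0]
    # Assumes every embedded quote was doubled; no further validation is done.
    return literal[1:-1].replace(quote * 2, quote)


print(quote_doubled('I said, "Can you hear me?"', '"'))
print(unquote_doubled("'It''s'"))
```
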
Extended quoting styles

Some languages extend the previously-mentioned quoting conventions even further. These extended approaches provide an even more flexible style of notation for avoiding delimiter collision.

Triple quoting:
One such extension, the use of triple quoting, is used in Python:

'''This is John's apple.'''

"""John is Nancy's so-called "boyfriend"."""

Triple quoted string literals may be delimited by """ or '''. Triple quoting in Python also has the added benefit of allowing string literals to span more than one physical line of source code.
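
A short runnable sketch of both properties, with variable names chosen here for illustration:

```python
# Both quote characters can appear unescaped inside a triple-quoted literal.
a = '''This is John's apple.'''
b = """John is Nancy's so-called "boyfriend"."""

# A triple-quoted literal may span several physical lines;
# the line break becomes a newline character in the string.
c = """Line one
line two"""

print(a)
print(b)
print(c.split("\n"))
```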

Multiple quoting:
Another such extension is the use of multiple quoting, which allows the author to choose which characters should specify the bounds of a string literal.

For example, in Perl:

qq^I said, "Can you hear me?"^

qq@I said, "Can you hear me?"@

qq§I said, "Can you hear me?"§

all produce the desired result.
Although this notation is more flexible, few languages support it. Perl and Ruby are two that do.
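
Python has no such notation, but the idea can be sketched as a tiny reader that takes the character following `qq` as the delimiter. The function `parse_qq` is a hypothetical helper written for this illustration; it deliberately ignores Perl's paired-bracket delimiters, escapes, and interpolation.

```python
def parse_qq(source: str) -> str:
    """Return the body of a literal of the form qq<d>...<d> (simplified sketch)."""
    if not source.startswith("qq") or len(source) < 4:
        raise ValueError("expected qq followed by a delimiter")
    delimiter = source[2]
    end = source.find(delimiter, 3)
    if end == -1:
        raise ValueError("unterminated literal")
    return source[3:end]


for literal in ('qq^I said, "Can you hear me?"^',
                'qq@I said, "Can you hear me?"@',
                'qq§I said, "Can you hear me?"§'):
    print(parse_qq(literal))
```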

Here documents

A here document is an alternate quoting notation that allows the programmer
to specify an arbitrary unique identifier as a content boundary for a string literal.
This avoids delimiter collision, and also preserves newlines in the source code
as newlines in the string literal itself.
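
The mechanism can be sketched in Python as a reader that collects lines until the chosen identifier appears alone on a line. The `<<TAG` opening syntax is an assumption loosely modelled on Unix shells, and `read_here_document` is a hypothetical helper, not part of any real interpreter.

```python
def read_here_document(lines, start):
    """Parse a `<<TAG` marker at lines[start].

    Returns (body, index of the line after the terminator).
    """
    header = lines[start]
    if "<<" not in header:
        raise ValueError("no here-document marker on this line")
    tag = header.split("<<", 1)[1].strip()
    body = []
    for i in range(start + 1, len(lines)):
        if lines[i] == tag:
            # Newlines in the source are kept as newlines in the string.
            return "".join(line + "\n" for line in body), i + 1
        body.append(lines[i])
    raise ValueError(f"missing terminator {tag!r}")


script = [
    "message = <<END_OF_TEXT",
    'I said, "Can you hear me?"',
    "It's John's apple.",
    "END_OF_TEXT",
]
body, next_index = read_here_document(script, 0)
print(body, end="")
```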

Concatenation

In some languages (e.g., BASIC) there is no provision for escape sequences or any of the workarounds discussed above. To place a string delimiter character in a string, it is necessary to use string concatenation. The following example shows how this might be done in BASIC on a system using ASCII:

"I said, " + CHR$(34) + "Can you hear me?" + CHR$(34)

Here, the CHR$ function returns the character corresponding to its argument; in ASCII the quotation mark has the value 34.
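
For comparison, the same construction can be written in Python, whose built-in `chr` plays the role of CHR$ here. This is only an illustration; Python itself does not need this workaround.

```python
# Build the text by concatenation, never writing a quotation mark inside a literal.
s = "I said, " + chr(34) + "Can you hear me?" + chr(34)

print(s)
print(ord('"'))
```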

Metacharacters

Many languages support the use of metacharacters inside string literals. Metacharacters have varying interpretations depending on the context and language, but are generally a kind of 'processing command' for representing printing or nonprinting characters.

For instance, in a C string literal, if the backslash is followed by a letter such as "b", "n" or "t", then this represents a nonprinting backspace, newline or tab character respectively. Or if the backslash is followed by 1-3 octal digits, then this sequence is interpreted as representing the arbitrary character with the specified ASCII code. This was later extended to allow more modern hexadecimal character code notation:

"I said,\t\t\x22Can you hear me?\x22\n"

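Python accepts the same escapes as the C literal shown, which makes it convenient for a quick check of what each sequence stands for. This is a sketch in Python rather than C, so details such as how many hexadecimal digits are consumed may differ between the two languages.

```python
s = "I said,\t\t\x22Can you hear me?\x22\n"

# Each escape sequence collapses to a single character in the resulting string.
print(len(s))
print(repr(s))

# Character codes of the backspace, newline and tab metacharacters.
for name, ch in (("backspace", "\b"), ("newline", "\n"), ("tab", "\t")):
    print(name, ord(ch))
```
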
Raw strings

A few languages provide a method of specifying that a literal is to be processed without any language-specific interpretation.

For example, in Python 'raw strings' are preceded by an r. In such strings backslashes are not interpreted as escape sequences, making it simpler to write DOS/Windows paths and regular expressions:

r"The Windows path is C:\Foo\Bar\Baz\ "
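
A brief runnable comparison of raw and ordinary literals, with illustrative variable names and sample text. Note that a Python raw string cannot end in a single backslash, which is why the example above carries a trailing space.

```python
import re

raw = r"C:\Foo\Bar\Baz"
cooked = "C:\\Foo\\Bar\\Baz"

# Both literals denote the same text; the raw form needs no doubled backslashes.
print(raw == cooked)

# Regular expressions are a common use: \d and \. reach the regex engine untouched.
pattern = re.compile(r"\d+\.\d+")
print(pattern.findall("pi is 3.14, e is 2.718"))
```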

C#'s notation is called @-quoting:

@"C:\Foo\Bar\Baz\"

This notation also allows double-up quotations:

@"I said, ""Hello there."""


In XML documents, CDATA sections allow the use of characters such as & and < without an XML parser attempting to interpret them as part of the structure of the document itself. This can be useful when including literal text and scripting code, to keep the document well formed.






Variable interpolation

Languages differ on whether and how to interpret string literals as either
'raw' or 'variable interpolated'. Variable interpolation is the process
of evaluating an expression containing one or more variables, and returning
output where the variables are replaced with their corresponding values in
memory.
In sh-compatible Unix shells, quotation-delimited (") strings are interpolated, while apostrophe-delimited (') strings are not.

For example, the following Perl code:


$name = "Nancy";
$greeting = "Hello World";
print "$name said $greeting to the crowd of people.";


produces the output:

Nancy said Hello World to the crowd of people.

The sigil character ($) is interpreted to indicate variable interpolation.

Similarly, the printf function produces the same output using notation such as:

printf "%s said %s to the crowd of people.", $name, $greeting;

The metacharacters (%s) indicate variable interpolation.

This is contrasted with "raw" strings:

print '$name said $greeting to the crowd of people.';

which produce output like:

$name said $greeting to the crowd of people.

Here the $ characters are not sigils, and are not interpreted to have any meaning other than plain text.
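
The three behaviours can be reproduced in Python as a rough analogue of the Perl code. Here `string.Template` from the standard library stands in for `$`-style interpolation and the `%` operator for printf; this is an illustration of the concept, not a description of how Perl implements it.

```python
from string import Template

name = "Nancy"
greeting = "Hello World"
text = "$name said $greeting to the crowd of people."

# An ordinary Python literal is not interpolated: the $ signs are plain text.
print(text)

# Template.substitute replaces $-prefixed names with the supplied values.
interpolated = Template(text).substitute(name=name, greeting=greeting)
print(interpolated)

# printf-style formatting fills each %s placeholder in order.
formatted = "%s said %s to the crowd of people." % (name, greeting)
print(formatted)
```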

Binary and hexadecimal strings

REXX uses suffix characters to specify characters or strings using their hexadecimal or binary code. E.g.,


'20'x
"0010 0000"b
"00100000"b

all yield the space character, avoiding the function call X2C(20).
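
A minimal Python sketch of the decoding such literals imply. The helpers `from_hex` and `from_binary` are hypothetical names, and the sketch assumes the digits describe whole 8-bit characters, which may be stricter than what REXX itself accepts.

```python
def from_hex(digits: str) -> str:
    """Decode pairs of hexadecimal digits into characters."""
    return bytes.fromhex(digits).decode("latin-1")


def from_binary(digits: str) -> str:
    """Decode groups of eight binary digits into characters; spaces are ignored."""
    bits = digits.replace(" ", "")
    if len(bits) % 8 != 0:
        raise ValueError("this sketch expects whole bytes")
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


print(repr(from_hex("20")))
print(repr(from_binary("0010 0000")))
print(repr(from_binary("00100000")))
```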

Embedding source code in string literals

Languages that lack flexibility in specifying string literals make
it particularly cumbersome to write programming code that generates
other programming code. This is especially true when the generation
language is the same as or similar to the output language.

For example:
  • writing code to produce quines;
  • generating an output language from within a web template;
  • using XSLT to generate XSLT, or SQL to generate more SQL;
  • generating a PostScript representation of a document for printing purposes, from within a document-processing application written in C or some other language.


Nevertheless, some languages are particularly well-adapted to produce
this sort of self-similar output, especially those that support multiple options
for avoiding delimiter collision.

Using string literals as code that generates other code may have adverse security implications, especially if the output is based at least partially on untrusted user input. This is particularly acute in the case of Web-based applications, where malicious users can take advantage of such weaknesses to subvert the operation of the application, for example by mounting an SQL injection attack.
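
The risk can be demonstrated with Python's standard `sqlite3` module and an in-memory database. The table, its contents, and the input string are invented for this sketch; the point is only that pasting input into an SQL string literal lets the input's own quote characters end that literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Hypothetical untrusted input containing the SQL string delimiter.
user_input = "nobody' OR '1'='1"

# Unsafe: the input is concatenated into the SQL text, so its apostrophes
# terminate the literal and the remainder is parsed as SQL.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: a placeholder passes the input as data, never as SQL text.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
conn.close()
```

With the concatenated query every row matches, while the parameterized query matches none, since no user has that literal name.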
