C trigraph
Encyclopedia
In computer programming
Computer programming
Computer programming is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a program that performs specific operations or exhibits a...

, digraphs and trigraphs are sequences of two and three character
Character (computing)
In computer and machine-based telecommunications terminology, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language....

s respectively, appearing in source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

, which a programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

 specification requires an implementation of that language to treat as if they were one other character.

Various reasons exist for using digraphs and trigraphs: keyboards may not have keys to cover the entire character set of the language, input of special characters may be difficult, text editor
Text editor
A text editor is a type of program used for editing plain text files.Text editors are often provided with operating systems or software development packages, and can be used to change configuration files and programming language source code....

s may reserve some characters for special use and so on. Trigraphs might also be used for some EBCDIC
EBCDIC
Extended Binary Coded Decimal Interchange Code is an 8-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems....

 code pages that lack characters such as { and }.

History

The basic character set of the C programming language is a superset of the ASCII character set that includes nine characters which lie outside the ISO 646 invariant character set. This can pose a problem for writing source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

 when the keyboard
Computer keyboard
In computing, a keyboard is a typewriter-style keyboard, which uses an arrangement of buttons or keys, to act as mechanical levers or electronic switches...

 being used does not support any of these nine characters. The ANSI C
ANSI C
ANSI C refers to the family of successive standards published by the American National Standards Institute for the C programming language. Software developers writing in C are encouraged to conform to the standards, as doing so aids portability between compilers.-History and outlook:The first...

 committee invented trigraphs as a way of entering source code using keyboards that support any version of the ISO 646 character set.

Implementations

Trigraphs are not commonly encountered outside compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

 test suite
Test suite
In software development, a test suite, less commonly known as a validation suite, is a collection of test cases that are intended to be used to test a software program to show that it has some specified set of behaviours. A test suite often contains detailed instructions or goals for each...

s. Some compilers support an option to turn recognition of trigraphs off, or disable trigraphs by default and require an option to turn them on. Some can issue warnings when they encounter trigraphs in source files. Borland
Borland
Borland Software Corporation is a software company first headquartered in Scotts Valley, California, Cupertino, California and finally Austin, Texas. It is now a Micro Focus subsidiary. It was founded in 1983 by Niels Jensen, Ole Henriksen, Mogens Glad and Philippe Kahn.-The 1980s:...

 supplied a separate program, the trigraph preprocessor, to be used only when trigraph processing is desired (the rationale was to maximise speed of compilation).

Pascal

Pascal programming language
Pascal (programming language)
Pascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring.A derivative known as Object Pascal...

 supports digraphs (., .), (* and *) for [, ], { and } respectively. Unlike all other cases mentioned here, (* and *) were in wide use.

Vim

Vim
Vim (text editor)
Vim is a text editor written by Bram Moolenaar and first released publicly in 1991. Based on the vi editor common to Unix-like systems, Vim is designed for use both from a command line interface and as a standalone application in a graphical user interface...

 text editor supports digraphs
for actual entry of text characters, following RFC 1345.

J

The J programming language uses dot and colon characters to extend the meaning of the basic characters available. These however are not digraphs, because the resulting sequences do not have a single-character equivalent.

C

C programming language
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

 supports digraphs in ISO C 94 mode of compiling.

The C preprocessor
C preprocessor
The C preprocessor is the preprocessor for the C and C++ computer programming languages. The preprocessor handles directives for source file inclusion , macro definitions , and conditional inclusion ....

 replaces all occurrences of the following nine trigraph sequences by their single-character equivalents before any other processing.

A programmer may want to place two question marks together yet not have the compiler treat them as introducing a trigraph. The C grammar does not permit two consecutive ? tokens, so the only places in a C file where two question marks in a row may be used are in multi-character constants, string literals, and comments. To safely place two consecutive question marks within a string literal, the programmer can use string concatenation "...?""?..." or an escape sequence "...?\?...".
Trigraph Equivalent
??= #
??/ \
??' ^
??( [
??) ]
??! >
??< {
??> }
??- ~


??? is not itself a trigraph sequence.

The ??/ trigraph can be used to introduce an escaped newline for line splicing; this must be taken into account for correct and efficient handling of trigraphs within the preprocessor. It can also cause surprises, particularly within comments. For example:


// Will the next line be executed????????????????/
a++;


which is a single logical comment line (used in C++ and C99), and


/??/
* A comment *??/
/


which is a correctly formed block comment.

In 1994 a normative amendment to the C standard, included in C99
C99
C99 is a modern dialect of the C programming language. It extends the previous version with new linguistic and library features, and helps implementations make better use of available computer hardware and compiler technology.-History:...

, supplied digraphs as more readable alternatives to five of the trigraphs. They are:
Digraph Equivalent
<: [
:> ]
<% {
%> }
%: #


Unlike trigraphs, digraphs are handled during tokenization
Tokenization
Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining...

, and any digraph must always represent a full token by itself, or compose the token %:%: replacing the preprocessor concatenation token ##. If a digraph sequence occurs inside another token, for example a quoted string, or a character constant, it will not be replaced.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK