C syntax
The syntax of the C programming language is a set of rules that specifies whether the sequence of characters in a file is conforming C source code. The rules specify how the character sequences are to be chunked into tokens (the lexical grammar), the permissible sequences of these tokens and some of the meaning to be attributed to these permissible token sequences (additional meaning is assigned by the semantics of the language).

C syntax makes use of the maximal munch principle.
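As a small illustration of maximal munch, the sketch below shows how a run of plus signs is split into tokens. The function name munch_demo is made up for this example.

```c
/* Maximal munch: the lexer always takes the longest possible token. */
int munch_demo(void)
{
    int a = 1, b = 2;

    /* "a+++b" is tokenized as a ++ + b, that is (a++) + b,
       and never as a + (++b). */
    int sum = a+++b;    /* sum is 1 + 2 = 3; a becomes 2; b stays 2 */

    /* Pack the three results into one number: 3, 2, 2 -> 322. */
    return sum * 100 + a * 10 + b;
}
```

The same rule explains why a sequence such as a+++++b fails to compile: it is tokenized as a ++ ++ + b, which is not a valid expression.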

Primitive data types

The C language represents numbers in three forms: integral, real and complex. This distinction reflects similar distinctions in the instruction set architecture of most central processing units. Integral data types store numbers in the set of integers, while real and complex numbers represent numbers (or pairs of numbers) in the set of real numbers in floating point form.

All C integer types have signed and unsigned variants. If signed or unsigned is not specified explicitly, in most circumstances signed is assumed. However, for historic reasons plain char is a type distinct from both signed char and unsigned char. It may be a signed type or an unsigned type, depending on the compiler and the character set (C guarantees that members of the C basic character set have non-negative values). Also, bit field types specified as plain int may be signed or unsigned, depending on the compiler.

Integer types

C's integer types come in different fixed sizes, capable of representing various ranges of numbers. The type char occupies exactly one byte (the smallest addressable storage unit), which is typically 8 bits wide. (Although char can represent any of C's "basic" characters, a wider type may be required for international character sets.) Most integer types have both signed and unsigned varieties, designated by the signed and unsigned keywords. Signed integer types may use a two's complement, ones' complement, or sign-and-magnitude representation. In many cases, there are multiple equivalent ways to designate the type; for example, signed short int and short are synonymous.

The representation of some types may include unused "padding" bits, which occupy storage but are not included in the width. The following table provides a complete list of the standard integer types and their minimum allowed widths (including any sign bit).
Specifications for standard integer types
Shortest form of specifier Minimum width (bits)
_Bool 1
char 8
signed char 8
unsigned char 8
short 16
unsigned short 16
int 16
unsigned 16
long 32
unsigned long 32
long long 64
unsigned long long 64


The char type is distinct from both signed char and unsigned char, but is guaranteed to have the same representation as one of them. The _Bool and long long types are standardized since 1999, and may not be supported by older C compilers. Type _Bool is usually accessed via the typedef name bool defined by the standard header stdbool.h.

In general, the widths and representation scheme implemented for any given platform are chosen based on the machine architecture, with some consideration given to the ease of importing source code developed for other platforms. The width of the int type varies especially widely among C implementations; it often corresponds to the most "natural" word size for the specific platform. The standard header limits.h defines macros for the minimum and maximum representable values of the standard integer types as implemented on any specific platform.

In addition to the standard integer types, there may be other "extended" integer types, which can be used for typedefs in standard headers. For more precise specification of width, programmers can and should use typedefs from the standard header stdint.h.
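The following sketch shows one way limits.h and stdint.h might be used together. The function names are illustrative only, and int32_t is an optional type that is assumed to exist on the target platform.

```c
#include <limits.h>
#include <stdint.h>

/* Number of bits in the storage occupied by an int on this platform
   (CHAR_BIT is the number of bits in a byte). */
int int_storage_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* int32_t, where provided, is exactly 32 bits wide with no padding bits,
   so its largest value is the same on every platform that has it. */
int32_t largest_int32(void)
{
    return INT32_MAX;
}
```

On a typical desktop platform int_storage_bits returns 32, but the standard only guarantees a width of at least 16 bits.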

Integer constants may be specified in source code in several ways. Numeric values can be specified as decimal (example: 1022), octal with zero (0) as a prefix (01776), or hexadecimal with 0x (zero x) as a prefix (0x3FE). A character in single quotes (example: 'R'), called a "character constant," represents the value of that character in the execution character set, with type int. Except for character constants, the type of an integer constant is determined by the width required to represent the specified value, but is always at least as wide as int. This can be overridden by appending an explicit length and/or signedness modifier; for example, 12lu has type unsigned long. There are no negative integer constants, but the same effect can often be obtained by using a unary negation operator "-".
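The rules for integer constants can be checked with a short sketch. The function names below are hypothetical; sizeof is used only as a convenient way to observe the type the compiler chose.

```c
/* The same value, 1022, written in decimal, octal and hexadecimal. */
int same_value_three_ways(void)
{
    return 1022 == 01776 && 1022 == 0x3FE;   /* 1 if all three agree */
}

/* sizeof reveals the type chosen for each constant. */
int constant_sizes_ok(void)
{
    return sizeof 'R' == sizeof(int)               /* character constant: int */
        && sizeof 12lu == sizeof(unsigned long)    /* suffix lu: unsigned long */
        && sizeof 1 == sizeof(int);                /* small decimal value: int */
}
```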

Enumerated type

The enumerated type in C, specified with the enum keyword, and often just called an "enum" (usually pronounced ee'-num /ˌi.nʌm/ or ee'-noom /ˌi.nuːm/), is a type designed to represent values across a series of named constants. Each of the enumerated constants has type int. Each enum type itself is compatible with char or a signed or unsigned integer type, but each implementation defines its own rules for choosing a type.

Some compilers warn if an object with enumerated type is assigned a value that is not one of its constants. However, such an object can be assigned any value in the range of its compatible type, and enum constants can be used anywhere an integer is expected. For this reason, enum values are often used in place of preprocessor #define directives to create named constants. Such constants are generally safer to use than macros, since they reside within a specific identifier namespace.

An enumerated type is declared with the enum specifier and an optional name (or tag) for the enum, followed by a list of one or more constants contained within curly braces and separated by commas, and an optional list of variable names. Subsequent references to a specific enumerated type use the enum keyword and the name of the enum. By default, the first constant in an enumeration is assigned the value zero, and each subsequent value is incremented by one over the previous constant. Specific values may also be assigned to constants in the declaration, and any subsequent constants without specific values will be given incremented values from that point onward.
For example, consider the following declaration:

enum colors { RED, GREEN, BLUE = 5, YELLOW } paint_color;

This declares the enum colors type; the int constants RED (whose value is 0), GREEN (whose value is one greater than RED, 1), BLUE (whose value is the given value, 5), and YELLOW (whose value is one greater than BLUE, 6); and the enum colors variable paint_color. The constants may be used outside of the context of the enum (where any integer value is allowed), and values other than the constants may be assigned to paint_color, or any other variable of type enum colors.
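A minimal sketch of these rules, reusing the enum colors declaration from the text, might look as follows. The two function names are made up for the example.

```c
enum colors { RED, GREEN, BLUE = 5, YELLOW };

/* Enumeration constants are plain int values and can be used in arithmetic. */
int colors_sum(void)
{
    return RED + GREEN + BLUE + YELLOW;   /* 0 + 1 + 5 + 6 */
}

/* A variable of enum type accepts values that are not named constants. */
int assign_unnamed_value(void)
{
    enum colors paint_color = YELLOW;
    paint_color = 3;            /* legal C, though some compilers warn */
    return (int)paint_color;
}
```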

Floating point types

The floating-point form is used to represent numbers with a fractional component. Floating-point values do not, however, represent most rational numbers exactly; they are instead close approximations. There are three types of real values, denoted by their specifiers: single precision (float), double precision (double), and double extended precision (long double). Each of these may represent values in a different form, often one of the IEEE floating point formats.
Floating-point types
Type specifier   Precision (decimal digits)    Exponent range
                 Minimum   IEEE 754            Minimum   IEEE 754
float            6         7.2 (24 bits)       ±37       ±38 (8 bits)
double           10        15.9 (53 bits)      ±37       ±307 (11 bits)
long double      10        34.0 (113 bits)     ±37       ±4931 (15 bits)


Floating-point constants may be written in decimal notation, e.g. 1.23. Scientific notation may be used by adding e or E followed by a decimal exponent, e.g. 1.23e2 (which has the value 123.0). Either a decimal point or an exponent is required (otherwise, the number is parsed as an integer constant). Hexadecimal floating-point constants follow similar rules, except that they must be prefixed by 0x and use p or P to specify a binary exponent, e.g. 0xAp-2 (which has the value 2.5, since 10 × 2^−2 = 10 ÷ 4). Both decimal and hexadecimal floating-point constants may be suffixed by f or F to indicate a constant of type float, by l (letter l) or L to indicate type long double, or left unsuffixed for a double constant.

The standard header file float.h defines the minimum and maximum values of the implementation's floating-point types float, double, and long double. It also defines other limits that are relevant to the processing of floating-point numbers.
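The constant forms and the float.h limits described above can be exercised with a short sketch. The function names are illustrative, and hexadecimal floating-point constants assume a C99 or later compiler.

```c
#include <float.h>

/* 0xAp-2 is hexadecimal A (ten) scaled by 2 to the power -2. */
int hex_float_is_two_and_a_half(void)
{
    return 0xAp-2 == 2.5;
}

/* The suffix selects the type of the constant. */
int suffix_sizes_ok(void)
{
    return sizeof 1.5f == sizeof(float)          /* f suffix: float */
        && sizeof 1.5  == sizeof(double)         /* no suffix: double */
        && sizeof 1.5L == sizeof(long double);   /* L suffix: long double */
}

/* float.h reports the decimal precision of each type on this platform;
   these are the minimums from the table above. */
int float_limits_ok(void)
{
    return FLT_DIG >= 6 && DBL_DIG >= 10 && LDBL_DIG >= 10;
}
```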

Storage duration specifiers

Every object has a storage class, which may be automatic, static, or allocated.
Storage classes
Specifier    Lifetime                        Scope                       Default initializer
auto         Block (stack)                   Block                       Uninitialized
register     Block (stack or CPU register)   Block                       Uninitialized
static       Program                         Block or compilation unit   Zero
extern       Program                         Block or compilation unit   Zero
(none) [1]   Dynamic (heap)                                              Uninitialized

[1] Allocated and deallocated using the malloc and free library functions.


Variables declared within a block by default have automatic storage, as do those explicitly declared with the auto or register storage class specifiers. The auto and register specifiers may only be used within functions and function argument declarations; as such, the auto specifier is always redundant. Objects declared outside of all blocks and those explicitly declared with the static storage class specifier have static storage duration. Static variables are initialized to zero by default by the compiler.

Objects with automatic storage are local to the block in which they were declared and are discarded when the block is exited. Additionally, objects declared with the register storage class may be given higher priority by the compiler for access to registers; although they may not actually be stored in registers, objects with this storage class may not be used with the address-of (&) unary operator. Objects with static storage persist for the program's entire duration. In this way, the same object can be accessed by a function across multiple calls. Objects with allocated storage duration are created and destroyed explicitly with malloc, free, and related functions.
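The difference between static and automatic storage can be seen in a short sketch. The names next_id and fresh_each_time are hypothetical.

```c
/* count has static storage duration: it is implicitly initialized to zero
   and keeps its value between calls. */
int next_id(void)
{
    static int count;
    count++;
    return count;
}

/* total has automatic storage: a new object is created on every call
   and discarded when the function returns. */
int fresh_each_time(void)
{
    int total = 0;
    total++;
    return total;
}
```

Successive calls to next_id return 1, 2, 3 and so on, whereas fresh_each_time always returns 1.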

The extern storage class specifier indicates that the storage for an object has been defined elsewhere. When used inside a block, it indicates that the storage has been defined by a declaration outside of that block. When used outside of all blocks, it indicates that the storage has been defined outside of the compilation unit. The extern storage class specifier is redundant when used on a function declaration. It indicates that the declared function has been defined outside of the compilation unit.

Note that storage specifiers apply only to functions and objects; other things such as type and enum declarations are private to the compilation unit in which they appear. Types, on the other hand, have qualifiers (see below).

Type qualifiers

Objects can be qualified to indicate special properties of the data they contain. The const type qualifier indicates that the value of an object should not change once it has been initialized. Attempting to modify an object qualified with const yields undefined behavior, so some C implementations store them in read-only segments of memory. The volatile type qualifier indicates that the value of an object may be changed externally without any action by the program (see volatile variable); it may be completely ignored by the compiler.

Incomplete types

An incomplete type is a structure or union type whose members have not yet been specified, an array type whose dimension has not yet been specified, or the void type (the void type cannot be completed). Such a type may not be instantiated (its size is not known), nor may its members be accessed (they, too, are unknown); however, the derived pointer type may be used (but not dereferenced).

They are often used with pointers, either as forward or external declarations. For instance, code could declare an incomplete type like this:

struct thing *pt;

This declares pt as a pointer to struct thing and the incomplete type struct thing. All pointers to structure types have the same representation and alignment requirements regardless of which structure they point to, so this statement is valid by itself (as long as pt is not dereferenced). The incomplete type can be completed later in the same scope by redeclaring it:

struct thing
{
    int num;
}; /* thing struct type is now completed */


Incomplete types are used to implement recursive structures; the body of the type declaration may be deferred to later in the translation unit:

typedef struct Bert Bert;
typedef struct Wilma Wilma;

struct Bert
{
    Wilma *wilma;
};

struct Wilma
{
    Bert *bert;
};


Incomplete types are also used for data hiding; the incomplete type is defined in a header file, and the body only within the relevant source file.
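A minimal sketch of this data-hiding pattern is shown below, collapsed into one file for brevity. The struct counter type, its functions, and the file names counter.h and counter.c mentioned in the comments are all hypothetical.

```c
#include <stdlib.h>

/* --- what would go in a header (hypothetical counter.h) --- */
struct counter;                               /* incomplete type */
struct counter *counter_new(void);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_free(struct counter *c);

/* --- what would go in the source file (hypothetical counter.c) --- */
struct counter { int n; };                    /* type completed here */

struct counter *counter_new(void)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->n = 0;
    return c;                                 /* NULL if allocation failed */
}

void counter_increment(struct counter *c) { c->n++; }
int counter_value(const struct counter *c) { return c->n; }
void counter_free(struct counter *c) { free(c); }
```

Code that sees only the header part can hold and pass around a struct counter pointer, but cannot read or write the member n directly.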

Pointers

In declarations the asterisk modifier (*) specifies a pointer type. For example, where the specifier int would refer to the integer type, the specifier int* refers to the type "pointer to integer". Pointer values associate two pieces of information: a memory address and a data type. The following line of code declares a pointer-to-integer variable called ptr:

int *ptr;

Referencing

When a non-static pointer is declared, it has an unspecified value associated with it. The address associated with such a pointer must be changed by assignment prior to using it. In the following example, ptr is set so that it points to the data associated with the variable a:


int *ptr;
int a;

ptr = &a;


In order to accomplish this, the "address-of" operator (unary &) is used. It produces the memory location of the data object that follows.

Dereferencing

The pointed-to data can be accessed through a pointer value. In the following example, the integer variable b is set to the value of integer variable a, which is 10:


int *p;
int a, b;

a = 10;
p = &a;
b = *p;


In order to accomplish that task, the unary dereference operator, denoted by an asterisk (*), is used. It returns the data to which its operand (which must be of pointer type) points. Thus, the expression *p denotes the same value as a. Dereferencing a null pointer results in undefined behavior.
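Dereferencing also works on the left-hand side of an assignment, which the sketch below illustrates. The function names are made up for this example.

```c
#include <stddef.h>

/* Writes through a pointer: the caller's variable is modified. */
void set_to_ten(int *p)
{
    if (p != NULL)      /* never dereference a null pointer */
        *p = 10;
}

int deref_demo(void)
{
    int a = 0, b;
    int *p = &a;

    set_to_ten(p);      /* a is now 10 */
    b = *p;             /* reads a through p */
    *p = b + 1;         /* *p can also be assigned to: a becomes 11 */
    return a;
}
```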

Array definition

Arrays are used in C to represent structures of consecutive elements of the same type. The definition of a (fixed-size) array has the following syntax:

int array[100];

which defines an array named array to hold 100 values of the primitive type int. If declared within a function, the array dimension may also be a non-constant expression, in which case memory for the specified number of elements will be allocated. In most contexts in later use, a mention of the variable array is converted to a pointer to the first item in the array. The sizeof operator is an exception: sizeof array yields the size of the entire array (that is, 100 times the size of an int). Another exception is the & (address-of) operator, which yields a pointer to the entire array, for example

int (*ptr_to_array)[100] = &array;
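The two exceptions can be observed in a short sketch. The function names are hypothetical, and the comparison through char pointers is just one way to measure how far a pointer moves.

```c
#include <stddef.h>

/* Once an array has been converted to a pointer, sizeof sees the pointer. */
size_t size_seen_by_callee(int *param)
{
    return sizeof param;            /* size of a pointer, not of the array */
}

/* Element count, valid only where the array type is still visible. */
size_t element_count(void)
{
    int array[100];
    return sizeof array / sizeof array[0];
}

int address_forms_agree(void)
{
    int array[100];
    int (*ptr_to_array)[100] = &array;   /* pointer to the whole array */
    int *first = array;                  /* array converted to &array[0] */

    /* Same address, different types: stepping ptr_to_array by one
       skips all 100 ints. */
    return (void *)ptr_to_array == (void *)first
        && (char *)(ptr_to_array + 1) - (char *)ptr_to_array
               == (ptrdiff_t)sizeof array;
}
```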

Accessing elements

The primary facility for accessing the values of the elements of an array is the array subscript operator. To access the i-indexed element of array, the syntax would be array[i], which refers to the value stored in that array element.

Array subscript numbering begins at 0 (see Zero-based indexing). The largest allowed array subscript is therefore equal to the number of elements in the array minus 1. To illustrate this, consider an array a declared as having 10 elements; the first element would be a[0] and the last element would be a[9].

C provides no facility for automatic bounds checking for array usage. Though logically the last subscript in an array of 10 elements would be 9, subscripts 10, 11, and so forth could accidentally be specified, with undefined results.

Because an array expression is converted to a pointer to its first element in most contexts, the addresses of each of the array elements can be expressed in equivalent pointer arithmetic. The following table illustrates both methods for the existing array:
Array subscripts vs. pointer arithmetic
Element                First      Second         Third          nth
Array subscript        array[0]   array[1]       array[2]       array[n - 1]
Dereferenced pointer   *array     *(array + 1)   *(array + 2)   *(array + n - 1)


Since the expression a[i] is semantically equivalent to *(a+i), which in turn is equivalent to *(i+a), the expression can also be written as i[a], although this form is rarely used.
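A short sketch confirms that the three spellings designate the same element. The function name is illustrative, and the loop variable declaration assumes C99 or later.

```c
int subscript_forms_agree(void)
{
    int a[10];

    for (int i = 0; i < 10; i++)
        a[i] = i * i;

    /* All three expressions designate the same element, whose value is 9. */
    return a[3] == 9 && *(a + 3) == 9 && 3[a] == 9;
}
```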

Dynamic arrays

A constant value is required for the dimension in a declaration of a static array. A desired feature is the ability to set the length of an array dynamically at run-time instead:


int n = ...;
int a[n];
a[3] = 10;


This behavior can be simulated with the help of the C standard library. The malloc function provides a simple method for allocating memory. It takes one parameter: the amount of memory to allocate in bytes. Upon successful allocation, malloc returns a generic (void) pointer value, pointing to the beginning of the allocated space. The pointer value returned is converted to an appropriate type implicitly by assignment. If the allocation could not be completed, malloc returns a null pointer. The following segment is therefore similar in function to the above desired declaration:

#include <stdlib.h> /* declares malloc */

...
int *a;
a = malloc(n * sizeof(int));
a[3] = 10;


The result is a "pointer to int" variable (a) that points to the first of n contiguous int objects; due to array–pointer equivalence this can be used in place of an actual array name, as shown in the last line. The advantage in using this dynamic allocation is that the amount of memory that is allocated to it can be limited to what is actually needed at run time, and this can be changed as needed (using the standard library function realloc).

When the dynamically-allocated memory is no longer needed, it should be released back to the run-time system. This is done with a call to the free function. It takes a single parameter: a pointer to previously allocated memory. This is the value that was returned by a previous call to malloc. It is considered good practice to then set the pointer variable to NULL so that further attempts to access the memory to which it points will fail. If this is not done, the variable becomes a dangling pointer, and such errors in the code (or manipulations by an attacker) might be very hard to detect and lead to obscure and potentially dangerous malfunction caused by memory corruption.


free(a);
a = NULL;
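Putting malloc, realloc and free together, a complete allocation life cycle might look like the sketch below. The function squares_demo is hypothetical, and for brevity it assumes n is small enough that the size computations do not overflow.

```c
#include <stdlib.h>

/* Builds an array of n squares, then grows it to 2n elements.
   Returns the last element, or -1 if n is zero or allocation fails. */
long squares_demo(size_t n)
{
    if (n == 0)
        return -1;

    long *a = malloc(n * sizeof *a);
    if (a == NULL)                      /* always check the result */
        return -1;
    for (size_t i = 0; i < n; i++)
        a[i] = (long)(i * i);

    /* realloc may move the block; keep the old pointer until it succeeds. */
    long *bigger = realloc(a, 2 * n * sizeof *a);
    if (bigger == NULL) {
        free(a);                        /* the original block is still valid */
        return -1;
    }
    a = bigger;
    for (size_t i = n; i < 2 * n; i++)
        a[i] = (long)(i * i);

    long last = a[2 * n - 1];
    free(a);
    a = NULL;                           /* avoid leaving a dangling pointer */
    return last;
}
```

Assigning the result of realloc to a separate variable first avoids losing the only pointer to the original block when the call fails.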


The C99 standard also supports variable-length arrays (VLAs) within block scope. Such array variables are allocated based on an integer value computed at run time upon entry to a block, and are deallocated at the end of the block.


float read_and_process(int sz)
{
    float vals[sz]; // VLA, size determined at runtime

    for (int i = 0; i < sz; i++)
        vals[i] = read_value();
    return process(vals, sz);
}

Multidimensional arrays

In addition, C supports arrays of multiple dimensions, which are stored in row-major order. Technically, C multidimensional arrays are just one-dimensional arrays whose elements are arrays. The syntax for declaring multidimensional arrays is as follows:
int array2d[ROWS][COLUMNS];
where ROWS and COLUMNS are constants. This defines a two-dimensional array. Reading the subscripts from left to right, array2d is an array of length ROWS, each element of which is an array of COLUMNS integers.

To access an integer element in this multidimensional array, one would use
array2d[4][3]
Again, reading from left to right, this accesses the 5th row, and the 4th element in that row. The expression array2d[4] is an array, which we are then subscripting with [3] to access the fourth integer.
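Array subscripts vs. pointer arithmetic
Element                First                   Second row, second column   ith row, jth column
Array subscript        array2d[0][0]           array2d[1][1]               array2d[i - 1][j - 1]
Dereferenced pointer   *(*(array2d + 0) + 0)   *(*(array2d + 1) + 1)       *(*(array2d + i - 1) + j - 1)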


Higher-dimensional arrays can be declared in a similar manner.

A multidimensional array should not be confused with an array of references to arrays (also known as Iliffe vectors or sometimes an array of arrays). The former is always rectangular (all subarrays must be the same size), and occupies a contiguous region of memory. The latter is a one-dimensional array of pointers, each of which may point to the first element of a subarray in a different place in memory, and the sub-arrays do not have to be the same size. The latter can be created by multiple uses of malloc.
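The contrast between the two layouts can be sketched as follows. The function names and the sizes ROWS = 3 and COLUMNS = 4 are chosen only for illustration.

```c
#include <stdlib.h>

enum { ROWS = 3, COLUMNS = 4 };

/* True two-dimensional array: one contiguous block in row-major order. */
int rows_are_contiguous(void)
{
    int array2d[ROWS][COLUMNS];

    /* The position just past the end of row 0 is the start of row 1. */
    return &array2d[0][0] + COLUMNS == &array2d[1][0];
}

/* Array of pointers: each row is a separate allocation of its own length.
   Returns the total number of ints allocated, or 0 on failure. */
size_t jagged_total(void)
{
    int *rows[ROWS] = { 0 };                /* all pointers start out null */
    size_t total = 0;

    for (size_t r = 0; r < ROWS; r++) {
        size_t len = r + 1;                 /* row r holds r + 1 elements */
        rows[r] = malloc(len * sizeof *rows[r]);
        if (rows[r] == NULL) {
            total = 0;
            break;
        }
        rows[r][len - 1] = (int)r;          /* same subscript syntax as array2d */
        total += len;
    }

    for (size_t r = 0; r < ROWS; r++)
        free(rows[r]);                      /* free(NULL) does nothing */
    return total;
}
```

Both forms are indexed with two subscripts, but only the first guarantees that all elements sit next to each other in memory.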

Strings

In C, string constants (literals) are surrounded by double quotes ("), e.g. "Hello world!" and are compiled to an array of the specified char values with an additional null terminating character (0-valued) code to mark the end of the string.

String literals may not contain embedded newlines; this proscription somewhat simplifies parsing of the language. To include a newline in a string, the backslash escape \n may be used, as below.

There are several standard library functions for operating with string data (not necessarily constant) organized as array of char using this null-terminated format; see below.

C's string-literal syntax has been very influential, and has made its way into many other languages, such as C++, Perl, Python, PHP, Java, JavaScript, C#, and Ruby. Nowadays, almost all new languages adopt or build upon C-style string syntax. Languages that lack this syntax tend to precede C.

Backslash escapes

If you wish to include a double quote inside the string, that can be done by escaping it with a backslash (\), for example, "This string contains \"double quotes\".". To insert a literal backslash, one must double it, e.g. "A backslash looks like this: \\".

Backslashes may be used to enter control characters, etc., into a string:
Escape Meaning
\\ Literal backslash
\" Double quote
\' Single quote
\n Newline (line feed)
\r Carriage return
\b Backspace
\t Horizontal tab
\f Form feed
\a Alert (bell)
\v Vertical tab
\? Question mark (used to escape trigraphs)
\nnn Character with octal value nnn
\xhh Character with hexadecimal value hh


The use of other backslash escapes is not defined by the C standard, although compiler vendors often provide additional escape codes as language extensions.

String literal concatenation

Adjacent string literals are concatenated at compile time; this allows long strings to be split over multiple lines, and also allows string literals resulting from C preprocessor defines and macros to be appended to strings at compile time:

printf(__FILE__ ": %d: Hello "
"world\n", __LINE__);

will expand to

printf("helloworld.c" ": %d: Hello "
"world\n", 10);

which is syntactically equivalent to

printf("helloworld.c: %d: Hello world\n", 10);

Character constants

Individual character constants are single-quoted, e.g. 'A', and have type int (in C++, char). The difference is that "A" represents a null-terminated array of two characters, 'A' and '\0', whereas 'A' directly represents the character value (65 if ASCII is used). The same backslash-escapes are supported as for strings, except that (of course) " can validly be used as a character without being escaped, whereas ' must now be escaped.

A character constant cannot be empty (i.e. is invalid syntax), although a string may be (it still has the null terminating character). Multi-character constants (e.g. 'xy') are valid, although rarely useful — they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into an int is not specified, portable use of multi-character constants is difficult.
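The difference between character constants and string literals can be checked with sizeof, as in the sketch below. The function name is made up for this example.

```c
int char_vs_string_ok(void)
{
    return sizeof 'A' == sizeof(int)     /* character constant has type int */
        && sizeof "A" == 2               /* 'A' plus the terminating '\0' */
        && sizeof "" == 1                /* empty string still holds '\0' */
        && "A"[0] == 'A'
        && "A"[1] == '\0'
        && '"' == '\"';                  /* escaping " is optional in '...' */
}
```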

Wide character strings

Since type char is usually 1 byte wide, a single char value typically can represent at most 256 distinct character codes, not nearly enough for all the characters in use worldwide. To provide better support for international characters, the first C standard (C89) introduced wide characters (encoded in type wchar_t) and wide character strings, which are written as L"Hello world!".

Wide characters are most commonly either 2 bytes (using a 2-byte encoding such as UTF-16) or 4 bytes (usually UTF-32), but Standard C does not specify the width for wchar_t, leaving the choice to the implementor. Microsoft Windows generally uses UTF-16, thus the above string would be 26 bytes long for a Microsoft compiler; the Unix world prefers UTF-32, thus compilers such as GCC would generate a 52-byte string. A 2-byte wide wchar_t suffers the same limitation as char, in that certain characters (those outside the BMP) cannot be represented in a single wchar_t and must instead be represented using surrogate pairs.

The original C standard specified only minimal functions for operating with wide character strings; in 1995 the standard was modified to include much more extensive support, comparable to that for char strings. The relevant functions are mostly named after their char equivalents, with the addition of a "w" or the replacement of "str" with "wcs"; they are specified in <wchar.h>, with <wctype.h> containing wide-character classification and mapping functions.

The now generally recommended method of supporting international characters is through UTF-8, which is stored in char arrays, and can be written directly in the source code if using a UTF-8 editor, because UTF-8 is a direct ASCII extension.

Variable width strings

A common alternative to wchar_t is to use a variable-width encoding, whereby a logical character may extend over multiple positions of the string. Variable-width strings may be encoded into literals verbatim, at the risk of confusing the compiler, or using numerical backslash escapes (e.g. "\xc3\xa9" for "é" in UTF-8). The UTF-8 encoding was specifically designed (under Plan 9) for compatibility with the standard library string functions; supporting features of the encoding include a lack of embedded nulls, no valid interpretations for subsequences, and trivial resynchronisation. Encodings lacking these features are likely to prove incompatible with the standard library functions; encoding-aware string functions are often used in such cases.

Library functions

Strings, both constant and variable, may be manipulated without using the standard library. However, the library contains many useful functions for working with null-terminated strings. It is the programmer's responsibility to ensure that enough storage has been allocated to hold the resulting strings.

The most commonly used string functions are:
  • strcat(dest, source) - appends the string source to the end of string dest
  • strchr(s, c) - finds the first instance of character c in string s and returns a pointer to it or a null pointer if c is not found
  • strcmp(a, b) - compares strings a and b (lexicographical ordering); returns negative if a is less than b, 0 if equal, positive if greater
  • strcpy(dest, source) - copies the string source onto the string dest
  • strlen(st) - returns the length of string st, not counting the null terminator
  • strncat(dest, source, n) - appends a maximum of n characters from the string source to the end of string dest and always null terminates the result, so dest needs room for up to n+1 additional characters
  • strncmp(a, b, n) - compares a maximum of n characters from strings a and b (lexical ordering); returns negative if a is less than b, 0 if equal, positive if greater
  • strrchr(s, c) - finds the last instance of character c in string s and returns a pointer to it or a null pointer if c is not found


Other standard string functions include:
  • strcoll(s1, s2) - compare two strings according to a locale-specific collating sequence
  • strcspn(s1, s2) - returns the index of the first character in s1 that matches any character in s2
  • strerror(errno) - returns a string with an error message corresponding to the code in errno
  • strncpy(dest, source, n) - copies n characters from the string source onto the string dest, substituting null bytes once past the end of source; does not null terminate if max length is reached
  • strpbrk(s1, s2) - returns a pointer to the first character in s1 that matches any character in s2 or a null pointer if not found
  • strspn(s1, s2) - returns the index of the first character in s1 that matches no character in s2
  • strstr(st, subst) - returns a pointer to the first occurrence of the string subst in st or a null pointer if no such substring exists
  • strtok(s1, s2) - returns a pointer to a token within s1 delimited by the characters in s2
  • strxfrm(s1, s2, n) - transforms s2 onto s1, such that s1 used with strcmp gives the same results as s2 used with strcoll


There is a similar set of functions for handling wide character strings.

Structures

Structures in C are defined as data containers consisting of a sequence of named members of various types. They are similar to records in other programming languages. The members of a structure are stored in consecutive locations in memory, although the compiler is allowed to insert padding between or after members (but not before the first member) for efficiency. The size of a structure is equal to the sum of the sizes of its members, plus the size of the padding.

Unions

Unions in C are related to structures and are defined as objects that may hold (at different times) objects of different types and sizes. They are analogous to variant records in other programming languages. Unlike structures, the components of a union all refer to the same location in memory. In this way, a union can be used at various times to hold different types of objects, without the need to create a separate object for each new type. The size of a union is at least the size of its largest component type; the compiler may add trailing padding for alignment.

Declaration

Structures are declared with the struct keyword and unions are declared with the union keyword. The specifier keyword is followed by an optional identifier name, which is used to identify the form of the structure or union. The identifier is followed by the declaration of the structure or union's body: a list of member declarations, contained within curly braces, with each declaration terminated by a semicolon. Finally, the declaration concludes with an optional list of identifier names, which are declared as instances of the structure or union.

For example, the following statement declares a structure named s that contains three members; it will also declare an instance of the structure known as tee:


struct s
{
    int x;
    float y;
    char *z;
} tee;


And the following statement will declare a similar union named u and an instance of it named n:


union u
{
    int x;
    float y;
    char *z;
} n;


Members of structures and unions cannot have an incomplete or function type. Thus members cannot be an instance of the structure or union being declared (because it is incomplete at that point) but can be pointers to the type being declared.

Once a structure or union body has been declared and given a name, it can be considered a new data type using the specifier struct or union, as appropriate, and the name. For example, the following statement, given the above structure declaration, declares a new instance of the structure s named r:

struct s r;

It is also common to use the typedef specifier to eliminate the need for the struct or union keyword in later references to the structure. The first identifier after the body of the structure is taken as the new name for the structure type (structure instances may not be declared in this context). For example, the following statement will declare a new type known as s_type that will contain some structure:

typedef struct {…} s_type;

Future statements can then use the specifier s_type (instead of the expanded struct … specifier) to refer to the structure.

Accessing members

Members are accessed using the name of the instance of a structure or union, a period (.), and the name of the member. For example, given the declaration of tee from above, the member known as y (of type float) can be accessed using the following syntax:

tee.y

Structures are commonly accessed through pointers. Consider the following example that defines a pointer to tee, known as ptr_to_tee:

struct s *ptr_to_tee = &tee;

Member y of tee can then be accessed by dereferencing ptr_to_tee and using the result as the left operand:

(*ptr_to_tee).y

This is identical to the simpler tee.y above as long as ptr_to_tee points to tee. Because this operation is common, C provides an abbreviated syntax for accessing a member directly from a pointer. With this syntax, the name of the instance is replaced with the name of the pointer and the period is replaced with the character sequence ->. Thus, the following method of accessing y is identical to the previous two:

ptr_to_tee->y

Members of unions are accessed in the same way.

Assignment

Assigning values to individual members of structures and unions is syntactically identical to assigning values to any other object. The only difference is that the lvalue of the assignment is the name of the member, as accessed by the syntax mentioned above.

A structure can also be assigned as a unit to another structure of the same type. Structures (and pointers to structures) may also be used as function parameter and return types.

For example, the following statement assigns the value of 74 (the ASCII code point for the letter 'J') to the member named x in the structure tee, from above:

tee.x = 74;

And the same assignment, using ptr_to_tee in place of tee, would look like:

ptr_to_tee->x = 74;

Assignment with members of unions is identical.

Other operations

According to the C standard, the only legal operations that can be performed on a structure are copying it, assigning to it as a unit (or initializing it), taking its address with the address-of (&) unary operator, and accessing its members. Unions have the same restrictions. One of the operations implicitly forbidden is comparison: structures and unions cannot be compared using C's standard comparison facilities (==, >, <, etc.).

Bit fields

C also provides a special type of structure member known as a bit field, which is an integer with an explicitly specified number of bits. A bit field is declared as a structure member of type int, signed int, unsigned int, or _Bool, following the member name by a colon (:) and the number of bits it should occupy. The total number of bits in a single bit field must not exceed the total number of bits in its declared type.

As a special exception to the usual C syntax rules, it is implementation-defined whether a bit field declared as type int, without specifying signed or unsigned, is signed or unsigned. Thus, it is recommended to explicitly specify signed or unsigned on all bit field members for portability.

Unnamed entries consisting of a type followed by just a colon and a number of bits are also allowed; these indicate padding.

The members of bit fields do not have addresses, and as such cannot be used with the address-of (&) unary operator. The sizeof operator may not be applied to bit fields.

The following declaration declares a new structure type known as f and an instance of it known as g. Comments provide a description of each of the members:


struct f
{
    unsigned int flag : 1; /* a bit flag: can either be on (1) or off (0) */
    signed int num : 4; /* a signed 4-bit field; range -7...7 or -8...7 */
    unsigned int : 3; /* 3 bits of unnamed padding to round out to 8 bits */
} g;

Initialization

Default initialization depends on the storage duration specifier, described above.

Because of the language's grammar, a scalar initializer may be enclosed in any number of curly brace pairs. Most compilers issue a warning if there is more than one such pair, though.
int x = 12;
int y = { 23 }; //Legal, no warning
int z = { { 34 } }; //Legal, expect a warning

Structures, unions and arrays can be initialized in their declarations using an initializer list. Unless designators are used, the components of an initializer correspond with the elements in the order they are defined and stored, thus all preceding values must be provided before any particular element’s value. Any unspecified elements are set to zero (except for unions). Mentioning too many initialization values yields an error.

The following statement will initialize a new instance of the structure s known as pi:
struct s
{
    int x;
    float y;
    char *z;
};

struct s pi = { 3, 3.1415, "Pi" };

Designated initializers allow members to be initialized by name, in any order, and without explicitly providing the preceding values. The following initialization is equivalent to the previous one:
struct s pi = { .z = "Pi", .x = 3, .y = 3.1415 };

Using a designator in an initializer moves the initialization "cursor". In the example below, if MAX is greater than 10, there will be some zero-valued elements in the middle of a; if it is less than 10, some of the values provided by the first five initializers will be overridden by the second five (if MAX is less than 5, there will be a compilation error):
int a[MAX] = { 1, 3, 5, 7, 9, [MAX-5] = 8, 6, 4, 2, 0 };

In C89, a union was initialized with a single value applied to its first member. That is, the union u defined above could only have its int x member initialized:
union u value = { 3 };

Using a designated initializer, the member to be initialized does not have to be the first member:
union u value = { .y = 3.1415 };

If an array has unknown size (i.e. the array was an incomplete type), the number of initializers determines the size of the array and its type becomes complete:

int x[] = { 0, 1, 2 } ;

Compound designators can be used to provide explicit initialization when unadorned initializer lists might be misunderstood. In the example below, w is declared as an array of structures, each structure consisting of a member a (an array of 3 int) and a member b (an int). The initializer sets the size of w to 2 and sets the values of the first element of each a:
struct { int a[3], b; } w[] = { [0].a = {1}, [1].a[0] = 2 };

This is equivalent to:
struct { int a[3], b; } w[] =
{
    { { 1, 0, 0 }, 0 },
    { { 2, 0, 0 }, 0 }
};

There is no way to specify repetition of an initializer in standard C.

Compound literals

Since C99, the initializer-list syntax can also be used outside declarations by writing a compound literal, which creates an unnamed array or structure that can be assigned or passed like any other value (array must be a pointer such as int * here, because array objects themselves cannot be assigned), as shown:

array = (int[]){ 10, 20, 30, 40 };
pi = (struct s){ 3, 3.1415, "Pi" };

Designated initializers also work in compound literals:

pi = (struct s){ .z = "Pi", .x = 3, .y = 3.1415 };

Control structures

C is a free-form language.

Bracing style varies from programmer to programmer and can be the subject of debate. See Indent style for more details.

Compound statements

In the items in this section, any <statement> can be replaced with a compound statement. Compound statements have the form:

{
    <optional-declaration-list>
    <optional-statement-list>
}

and are used as the body of a function or anywhere that a single statement is expected. The declaration-list declares variables to be used in that scope, and the statement-list are the actions to be performed. Braces define their own scope, and variables defined inside those braces will be automatically deallocated at the closing brace. Since C99, declarations and statements can be freely intermixed within a compound statement (as in C++).

Selection statements

C has two types of selection statements: the if statement and the switch statement.

The if statement is in the form:

if (<expression>)
    <statement1>
else
    <statement2>

In the if statement, if the <expression> in parentheses is nonzero (true), control passes to <statement1>. If the else clause is present and the <expression> is zero (false), control will pass to <statement2>. The else <statement2> part is optional and, if absent, a false <expression> will simply result in skipping over the <statement1>. An else always matches the nearest previous unmatched if; braces may be used to override this when necessary, or for clarity.

The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integral type. The substatement controlled by a switch is typically compound. Any statement within the substatement may be labeled with one or more case labels, which consist of the keyword case followed by a constant expression and then a colon (:). The syntax is as follows:

switch (<expression>)
{
    case <label1> :
        <statements 1>
    case <label2> :
        <statements 2>
        break;
    default :
        <statements 3>
}

No two of the case constants associated with the same switch may have the same value. There may be at most one default label associated with a switch. If none of the case labels are equal to the expression in the parentheses following switch, control passes to the default label or, if there is no default label, execution resumes just beyond the entire construct.

Switches may be nested; a case or default label is associated with the innermost switch that contains it. Switch statements can "fall through", that is, when one case section has completed its execution, statements will continue to be executed downward until a break; statement is encountered. Fall-through is useful in some circumstances, but is usually not desired. In the preceding example, if <label2> is reached, the statements <statements 2> are executed and nothing more inside the braces. However, if <label1> is reached, both <statements 1> and <statements 2> are executed since there is no break to separate the two case statements.

It is possible, although unusual, to insert the switch labels into the sub-blocks of other control structures. Examples of this include Duff's device and Simon Tatham's implementation of coroutines in PuTTY.

Iteration statements

C has three forms of iteration statement:

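do
    <statement>
while ( <expression> ) ;

while ( <expression> )
    <statement>

for ( <expression> ; <expression> ; <expression> )
    <statement>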

In the while and do statements, the substatement is executed repeatedly so long as the value of the expression remains nonzero (true). With while, the test, including all side effects from the expression, occurs before each execution of the statement; with do, the test follows each iteration. Thus, a do statement always executes its substatement at least once, whereas while may not execute the substatement at all.

If all three expressions are present in a for, the statement:

for (e1; e2; e3)
    s;

is equivalent to:

e1;
while (e2)
{
    s;
    e3;
}

except for the behavior of a continue; statement (which in the for loop jumps to e3 instead of e2).

Any of the three expressions in the for loop may be omitted. A missing second expression makes the while test always nonzero, creating a potentially infinite loop.

Since C99, the first expression may take the form of a declaration, typically including an initializer, such as:

for (int i = 0; i < limit; i++) {
    ...
}


The declaration's scope is limited to the extent of the for loop.

Jump statements

Jump statements transfer control unconditionally. There are four types of jump statements in C: goto, continue, break, and return.

The goto statement looks like this:

goto <identifier> ;

The identifier must be a label (followed by a colon) located in the current function. Control transfers to the labeled statement.

A continue statement may appear only within an iteration statement and causes control to pass to the loop-continuation portion of the innermost enclosing iteration statement. That is, within each of the statements


while (expression)
{
    /* ... */
    cont: ;
}

do
{
    /* ... */
    cont: ;
} while (expression);

for (expr1; expr2; expr3) {
    /* ... */
    cont: ;
}


a continue not contained within a nested iteration statement is the same as goto cont.

The break statement is used to end a for loop, while loop, do loop, or switch statement. Control passes to the statement following the terminated statement.

A function returns to its caller by the return statement. When return is followed by an expression, the value is returned to the caller as the value of the function. Encountering the end of the function is equivalent to a return with no expression. In that case, if the function is declared as returning a value and the caller tries to use the returned value, the result is undefined.

Storing the address of a label

GCC extends the C language with a unary && operator that returns the address of a label. This address can be stored in a variable of type void * and may be used later in a goto instruction. For example, the following prints "hi " in an infinite loop:


void *ptr = &&J1;

J1: printf("hi ");
goto *ptr;


This feature can be used to implement a jump table.

Syntax

A C function definition consists of a return type (void if no value is returned), a unique name, a list of parameters in parentheses, and various statements:

<return-type> functionName( <parameter-list> )
{
    <statements>
    return <expression of type return-type>;
}

A function with non-void return type should include at least one return statement. The parameters are given by the <parameter-list>, a comma-separated list of parameter declarations, each item in the list being a data type followed by an identifier: <data-type> <variable-identifier>, <data-type> <variable-identifier>, ....

If there are no parameters, the <parameter-list> may be left empty or optionally be specified with the single word void.

It is possible to define a function as taking a variable number of parameters by providing an ellipsis (...) as the last parameter instead of a data type and variable identifier. A commonly used function that does this is the standard library function printf, which has the declaration:

int printf (const char*, ...);


Manipulation of these parameters can be done by using the routines in the standard library header <stdarg.h>.

Function Pointers

A pointer to a function can be declared as follows:

<return-type> (*<function-name>)(<parameter-list>);


The following program shows use of a function pointer for selecting between addition and subtraction:

#include <stdio.h>

int (*operation)(int x, int y);

int add(int x, int y)
{
    return x + y;
}

int subtract(int x, int y)
{
    return x - y;
}

int main(int argc, char* args[])
{
    int foo = 1, bar = 1;

    operation = add;
    printf("%d + %d = %d\n", foo, bar, operation(foo, bar));
    operation = subtract;
    printf("%d - %d = %d\n", foo, bar, operation(foo, bar));
    return 0;
}

Global structure

After preprocessing, at the highest level a C program consists of a sequence of declarations at file scope. These may be partitioned into several separate source files, which may be compiled separately; the resulting object modules are then linked along with implementation-provided run-time support modules to produce an executable image.

The declarations introduce functions, variables and types. C functions are akin to the subroutines of Fortran or the procedures of Pascal.

A definition is a special type of declaration. A variable definition sets aside storage and possibly initializes it; a function definition provides the function's body.

An implementation of C providing all of the standard library functions is called a hosted implementation. Programs written for hosted implementations are required to define a special function called main, which is the first function called when execution of the program begins.

Hosted implementations start program execution by invoking the main function, which must be defined following one of these prototypes:

int main() {...}
int main(void) {...}
int main(int argc, char *argv[]) {...}
int main(int argc, char **argv) {...}


The first two definitions are equivalent (and both are compatible with C++). Which one is used is largely a matter of individual preference (the current C standard contains two examples of main() and two of main(void), but the draft C++ standard uses main()). The return value of main (which should be int) serves as the termination status returned to the host environment.

The C standard defines return values 0 and EXIT_SUCCESS as indicating success and EXIT_FAILURE as indicating failure. (EXIT_SUCCESS and EXIT_FAILURE are defined in <stdlib.h>.) Other return values have implementation-defined meanings; for example, under Linux a program killed by a signal yields a return code of the numerical value of the signal plus 128.

A minimal C program would consist only of an empty main routine:

int main() {}


The main function will usually call other functions to help it perform its job.

Some implementations are not hosted, usually because they are not intended to be used with an operating system. Such implementations are called free-standing in the C standard. A free-standing implementation is free to specify how it handles program startup; in particular it need not require a program to define a main function.

Functions may be written by the programmer or provided by existing libraries. Interfaces for the latter are usually declared by including header files (with the #include preprocessing directive), and the library objects are linked into the final executable image. Certain library functions, such as printf, are defined by the C standard; these are referred to as the standard library functions.

A function may return a value to its caller (usually another C function, or the hosting environment for the function main). The printf function mentioned above returns how many characters were printed, but this value is often ignored.

Argument passing

In C, arguments are passed to functions by value while other languages may pass variables by reference.
This means that the receiving function gets copies of the values and has no direct way of altering the original variables.
For a function to alter a variable passed from another function, the caller must pass its address (a pointer to it), which can then be dereferenced in the receiving function. See Pointers for more information.


void incInt(int *y)
{
    (*y)++; // Increase the value of 'x', in 'main' below, by one
}

int main(void)
{
    int x = 0;
    incInt(&x); // pass the address of 'x'
    return 0;
}


The function scanf works the same way:

int x;
scanf("%d", &x);


In order to pass an editable pointer to a function you have to pass a pointer to that pointer: its address.

#include <stdio.h>
#include <stdlib.h>

void setInt(int **pp, int n)
{
    *pp = malloc(sizeof(int)); // allocate a memory area, saving the pointer in the
                               // location pointed to by the parameter "pp"
    if (*pp == NULL)
    {
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    // dereference the given pointer that has been assigned an address
    // of dynamically allocated memory and set the int to the value of n (42)
    **pp = n;
}

int main(void)
{
    int *p; // create a pointer to an integer
    setInt(&p, 42); // pass the address of 'p'
    free(p);
    return 0;
}


The parameter int **pp is a pointer to a pointer to an int, which is the address of the pointer p defined in the main function in this case.

Array parameters

Function parameters of array type may at first glance appear to be an exception to C's pass-by-value rule. The following program will print 2, not 1:
#include <stdio.h>

void setArray(int array[], int index, int value)
{
    array[index] = value;
}

int main(void)
{
    int a[1] = {1};
    setArray(a, 0, 2);
    printf("a[0]=%d\n", a[0]);
    return 0;
}


However, there is a different reason for this behavior. In fact, a function parameter declared with an array type is treated like one declared to be a pointer. That is, the preceding declaration of setArray is equivalent to the following:

void setArray(int *array, int index, int value)


At the same time, C rules for the use of arrays in expressions cause the value of a in the call to setArray to be converted to a pointer to the first element of array a. Thus, in fact this is still an example of pass-by-value, with the caveat that it is the address of the first element of the array being passed by value, not the contents of the array.

Reserved keywords

The following words are reserved, and may not be used as identifiers:
auto
_Bool
break
case
char
_Complex
const
continue
default
do

double
else
enum
extern
float
for
goto
if
_Imaginary
inline

int
long
register
restrict
return
short
signed
sizeof
static
struct

switch
typedef
union
unsigned
void
volatile
while

Implementations may reserve other keywords, such as asm, although implementations typically provide non-standard keywords that begin with one or two underscores.

Case sensitivity

C identifiers are case sensitive (e.g., foo, FOO, and Foo are the names of different objects). Some linkers may map external identifiers to a single case, although this is uncommon in most modern linkers.

Comments

Text starting with /* is treated as a comment and ignored. The comment ends at the next */; it can occur within expressions, and can span multiple lines. Accidental omission of the comment terminator is problematic in that the next comment's properly constructed comment terminator will be used to terminate the initial comment, and all code in between the comments will be considered as a comment. C-style comments do not nest; that is, accidentally placing a comment within a comment has unintended results:

/*
This line will be ignored.
/*
These lines will also be ignored. Note that the comment opening token above did
not start a new comment, and that the comment closing token below will close
the comment begun on line 1.
*/
This line and the line below it will not be ignored. Both may produce compile errors.
*/


C++ style line comments start with // and extend to the end of the line. This style of comment originated in BCPL and became valid C syntax in C99; it is not available in the original K&R C nor in ANSI C:

// this line will be ignored by the compiler

/* these lines
will be ignored
by the compiler */

x = *p/*q; /* note: this comment starts after the 'p' */

Command-line arguments

The parameters given on a command line are passed to a C program with two predefined variables: the count of the command-line arguments in argc and the individual arguments as character strings in the pointer array argv. So the command:


myFilt p1 p2 p3


results in something like:
argv[0]   m y F i l t \0
argv[1]   p 1 \0
argv[2]   p 2 \0
argv[3]   p 3 \0


While individual strings are arrays of contiguous characters, there is no guarantee that the strings are stored as a contiguous group.

The name of the program, argv[0], may be useful when printing diagnostic messages or for making one binary serve multiple purposes. The individual values of the parameters may be accessed with argv[1], argv[2], and argv[3], as shown in the following program:

#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;

    printf("argc\t= %d\n", argc);
    for (i = 0; i < argc; i++)
        printf("argv[%i]\t= %s\n", i, argv[i]);
    return 0;
}

Evaluation order

In any reasonably complex expression, there arises a choice as to the order in which to evaluate the parts of the expression: (1+1)+(3+3) may be evaluated in the order (1+1)+(3+3), (2)+(3+3), (2)+(6), (8), or in the order (1+1)+(3+3), (1+1)+(6), (2)+(6), (8). Formally, a conforming C compiler may evaluate expressions in any order between sequence points (this allows the compiler to do some optimization). Sequence points are defined by:
  • Statement ends at semicolons.
  • The sequencing operator: a comma. However, commas that delimit function arguments are not sequence points.
  • The short-circuit operators: logical and (&&, which can be read and then) and logical or (||, which can be read or else).
  • The ternary operator (?:): This operator evaluates its first sub-expression first, and then its second or third (never both of them) based on the value of the first.
  • Entry to and exit from a function call (but not between evaluations of the arguments).


Expressions before a sequence point are always evaluated before those after a sequence point. In the case of short-circuit evaluation, the second expression may not be evaluated depending on the result of the first expression. For example, in the expression (a || b), if the first argument evaluates to nonzero (true), the result of the entire expression cannot be anything else than true, so b is not evaluated. Similarly, in the expression (a && b), if the first argument evaluates to zero (false), the result of the entire expression cannot be anything else than false, so b is not evaluated.

The arguments to a function call may be evaluated in any order, as long as they are all evaluated by the time the function is entered. The following expression, for example, has undefined behavior:

printf("%s %s\n", argv[i = 0], argv[++i]);

Undefined behavior

An aspect of the C standard (not unique to C) is that the behavior of certain code is said to be "undefined". In practice, this means that the program produced from this code can do anything, from working as the programmer intended, to crashing every time it is run.

For example, the following code produces undefined behavior, because the variable b is modified more than once with no intervening sequence point:

#include <stdio.h>

int main(void)
{
    int a, b = 1;

    a = b++ + b++;
    printf("%d\n", a);
    return 0;
}


Because there is no sequence point between the modifications of b in "b++ + b++", it is possible to perform the evaluation steps in more than one order, resulting in an ambiguous statement. This can be fixed by rewriting the code to insert a sequence point in order to enforce an unambiguous behavior, for example:

a = b++;
a += b++;

See also

  • Blocks (C language extension)
  • C programming language
  • C variable types and declarations
  • Operators in C and C++

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 