Scanf
Scanf format string refers to a control parameter used by a class of functions typically associated with some types of programming languages. The format string specifies a method for reading a string into an arbitrary number of parameters of varied data types. The input string is by default read from the standard input, but variants exist that read the input from other sources.
Usage
The scanf function is found in C, in which it reads input for numbers and other datatypes from standard input (often a command line interface or similar kind of text user interface).
The following shows code in C that reads a variable number of unformatted decimal integers from the console and prints out each of them on a separate line:
After being processed by the program above, a messy list of integers such as
456 123 789 456 12
456 1
2378
will appear neatly as:
456
123
789
456
12
456
1
2378
To print out a word:
No matter what the datatype the programmer wants the program to read, the arguments (such as &n above) must be pointers pointing to memory. Otherwise, the function will not perform correctly because it will be attempting to overwrite the wrong sections of memory, rather than pointing to the memory location of the variable you are attempting to get input for.
As scanf is designated to read only from standard input, many programming languages with interfaces, such as PHP, have derivatives such as sscanf and fscanf but not scanf itself.
Format string specifications
The formatting placeholders in scanf are more or less the same as those in printf, its reverse function.
There are rarely constants (i.e. characters that are not formatting placeholders) in a format string, mainly because a program is usually not designed to read known data. The exception is whitespace: one or more whitespace characters in the format string discard any run of whitespace characters at that point in the input.
Some of the most commonly used placeholders follow:
- %d: Scan an integer as a signed decimal number.
- %i: Scan an integer as a signed number. Similar to %d, but interprets the number as hexadecimal when preceded by 0x and octal when preceded by 0. For example, the string 031 would be read as 31 using %d, and 25 using %i. The flag h in %hi indicates conversion to a short and hh conversion to a char.
- %u: Scan for decimal unsigned int. (Note that in the C99 standard a leading minus sign is permitted in the input, so if a negative number is read, no errors will arise and the result will be the two's complement. See strtoul.) Correspondingly, %hu scans for an unsigned short and %hhu for an unsigned char.
- %f: Scan a floating-point number into a float. On input, both normal (fixed-point) and exponential notation are accepted.
- %g, %G: Scan a floating-point number in either normal or exponential notation. Unlike in printf, the case of the letter makes no difference when scanning; both behave like %f.
- %x, %X: Scan an integer as an unsigned hexadecimal number.
- %o: Scan an integer as an octal number.
- %s: Scan a character string. The scan terminates at whitespace. A null character is stored at the end of the string, which means that the buffer supplied must be at least one character longer than the specified input length.
- %c: Scan a character (char). No null character is added.
- (space): Space scans for whitespace characters.
- %lf: Scan as a double floating-point number.
- %Lf: Scan as a long double floating-point number.
These placeholders can be combined with numeric modifiers and the l, L modifiers which stand for "long", placed between the percent symbol and the letter. There can also be a numeric value between the percent symbol and the letter, preceding the long modifiers if any, that specifies the maximum number of characters to be scanned. An optional asterisk (*) right after the percent symbol denotes that the datum read by this format specifier is not to be stored in a variable. No argument behind the format string should be included for this dropped variable.
The ff modifier in printf is not present in scanf, causing differences between modes of input and output. The ll and hh modifiers are not present in the C90 standard, but are present in the C99 standard.
An example of a format string is
"%7d%s %c%lf"
The above format string scans up to seven characters as a decimal integer, then reads the remaining characters as a string until a space, new line or tab is found, then scans the first non-whitespace character following and a double-precision floating-point number afterwards.
Error handling
scanf is usually used in situations when the program cannot guarantee that the input is in the expected format. Therefore a robust program must check whether the scanf call succeeded and take appropriate action. If the input was not in the correct format, the erroneous data will still be on the input stream and must be read and discarded before new input can be read. An alternative method of reading input, which avoids this, is to use fgets and then examine the string read in. The last step can be done by sscanf, for example.
Security
Like printf, scanf is vulnerable to format string attacks. Great care should be taken to ensure that the formatting string includes limitations for string and array sizes. In most cases the input string size from a user is arbitrary; it cannot be determined before the scanf function is executed. This means that uses of %s placeholders without length specifiers are inherently insecure and exploitable for buffer overflows.
Another potential problem is to allow dynamic formatting strings, for example formatting strings stored in configuration files or other user-controlled files. In this case the allowed input length of string sizes cannot be specified unless the formatting string is checked beforehand and limitations are enforced. Related to this are additional or mismatched formatting placeholders which do not match the actual vararg list. The arguments for such placeholders might be taken from the stack and may contain undesirable or even insecure pointers, depending on the particular implementation of varargs.
In standard C, literal text in a format string is not printed as a prompt. A call such as scanf("Please enter a value %d",&n) instead requires the input itself to begin with that text before the integer is read.
See also
- Printf format string
- C programming language
- C++
- PHP
- Format string attack