Concolic testing
Encyclopedia
Concolic testing is a hybrid software verification
technique that interleaves concrete execution (testing
on particular inputs) with symbolic execution
, a classical technique that treats program variables as symbolic variables. Symbolic execution is used in conjunction with an automated theorem prover or constraint solver based on constraint logic programming to generate new concrete inputs (test cases) with the aim of maximizing code coverage
. Its main focus is finding bugs in real-world software, rather than demonstrating program correctness.
The first tool of this type was PathCrawler, publicly demonstrated in September 2004. with a more detailed description published in April 2005. Another tool, called EXE (later improved and renamed to KLEE), based on similar ideas was independently developed by Professor Dawson Engler's group at Stanford University in 2005, and published in 2005 and 2006 Another description and discussion of the concept of concolic testing was published in DART, as implemented in the DART tool developed at Bell Labs (summer 2004) and the term concolic first occurred in "CUTE: a concolic unit testing engine for C". These tools (PathCrawler, EXE, DART and CUTE) applied concolic testing to unit testing of C programs and concolic testing was originally conceived as a white box
improvement upon established random testing methodologies. The technique was later generalized to testing multithreaded Java programs with jCUTE, and unit testing programs from their executable codes (tool OSMOSE). It was also combined with fuzz testing
and extended to detect exploitable security issues in large-scale x86 binaries by Microsoft Research
's SAGE.
Simple random testing, trying random values of x and y, would require an impractically large number of tests to reproduce the failure.
We begin with an arbitrary choice for x and y, for example x = y = 1. In the concrete execution, line 2 sets z to 2, and the test in line 3 fails since 1 ≠ 100000. Concurrently, the symbolic execution follows the same path but treats x and y as symbolic variables. It sets z to the expression 2y and notes that, because the test in line 3 failed, x ≠ 100000. This inequality is called a path condition and must be true for all executions following the same execution path as the current one.
Since we'd like the program to follow a different execution path on the next run, we take the last path condition encountered, x ≠ 100000, and negate it, giving x = 100000. An automated theorem prover is then invoked to find values for the input variables x and y given the complete set of symbolic variable values and path conditions constructed during symbolic execution. In this case, a valid response from the theorem prover might be x = 100000, y = 0.
Running the program on this input allows it to reach the inner branch on line 4, which is not taken since 100000 is not less than 2. The path conditions are x = 100000 and x ≥ z. The latter is negated, giving x < z. The theorem prover then looks for x, y satisfying x = 100000, x < z, and z = 2y; for example, x = 100000, y = 50001. This input reaches the error.
Algorithm
Essentially, a concolic testing algorithm operates as follows:
There are a few complications to the above procedure:
Limitations
Concolic testing has a number of limitations:
Many tools, notably DART and SAGE, have not been made available to the public at large. Note however that for instance SAGE is "used daily" for internal security testing at Microsoft.
Software verification
Software verification is a broader and more complex discipline of software engineering whose goal is to assure that software fully satisfies all the expected requirements.There are two fundamental approaches to verification:...
technique that interleaves concrete execution (testing
Software testing
Software testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software...
on particular inputs) with symbolic execution
Symbolic execution
In computer science, symbolic execution refers to the analysis of programs by tracking symbolic rather than actual values, a case of abstract interpretation. The field of symbolic simulation applies the same concept to hardware...
, a classical technique that treats program variables as symbolic variables. Symbolic execution is used in conjunction with an automated theorem prover or constraint solver based on constraint logic programming to generate new concrete inputs (test cases) with the aim of maximizing code coverage
Code coverage
Code coverage is a measure used in software testing. It describes the degree to which the source code of a program has been tested. It is a form of testing that inspects the code directly and is therefore a form of white box testing....
. Its main focus is finding bugs in real-world software, rather than demonstrating program correctness.
The first tool of this type was PathCrawler, publicly demonstrated in September 2004. with a more detailed description published in April 2005. Another tool, called EXE (later improved and renamed to KLEE), based on similar ideas was independently developed by Professor Dawson Engler's group at Stanford University in 2005, and published in 2005 and 2006 Another description and discussion of the concept of concolic testing was published in DART, as implemented in the DART tool developed at Bell Labs (summer 2004) and the term concolic first occurred in "CUTE: a concolic unit testing engine for C". These tools (PathCrawler, EXE, DART and CUTE) applied concolic testing to unit testing of C programs and concolic testing was originally conceived as a white box
White box (software engineering)
In software engineering white box, in contrast to a black box, is a subsystem whose internals can be viewed, but usually cannot be altered. This is useful during white box testing, where a system is examined to make sure that it fulfills its requirements....
improvement upon established random testing methodologies. The technique was later generalized to testing multithreaded Java programs with jCUTE, and unit testing programs from their executable codes (tool OSMOSE). It was also combined with fuzz testing
Fuzz testing
Fuzz testing or fuzzing is a software testing technique, often automated or semi-automated, that involves providing invalid, unexpected, or random data to the inputs of a computer program. The program is then monitored for exceptions such as crashes or failing built-in code assertions...
and extended to detect exploitable security issues in large-scale x86 binaries by Microsoft Research
Microsoft Research
Microsoft Research is the research division of Microsoft created in 1991 for developing various computer science ideas and integrating them into Microsoft products. It currently employs Turing Award winners C.A.R. Hoare, Butler Lampson, and Charles P...
's SAGE.
Example
Consider the following simple example, written in C:Simple random testing, trying random values of x and y, would require an impractically large number of tests to reproduce the failure.
We begin with an arbitrary choice for x and y, for example x = y = 1. In the concrete execution, line 2 sets z to 2, and the test in line 3 fails since 1 ≠ 100000. Concurrently, the symbolic execution follows the same path but treats x and y as symbolic variables. It sets z to the expression 2y and notes that, because the test in line 3 failed, x ≠ 100000. This inequality is called a path condition and must be true for all executions following the same execution path as the current one.
Since we'd like the program to follow a different execution path on the next run, we take the last path condition encountered, x ≠ 100000, and negate it, giving x = 100000. An automated theorem prover is then invoked to find values for the input variables x and y given the complete set of symbolic variable values and path conditions constructed during symbolic execution. In this case, a valid response from the theorem prover might be x = 100000, y = 0.
Running the program on this input allows it to reach the inner branch on line 4, which is not taken since 100000 is not less than 2. The path conditions are x = 100000 and x ≥ z. The latter is negated, giving x < z. The theorem prover then looks for x, y satisfying x = 100000, x < z, and z = 2y; for example, x = 100000, y = 50001. This input reaches the error.
Algorithm
Essentially, a concolic testing algorithm operates as follows:
- Classify a particular set of variables as input variables. These variables will be treated as symbolic variables during symbolic execution. All other variables will be treated as concrete values.
- Instrument the program so that each operation which may affect a symbolic variable value or a path condition is logged to a trace file, as well as any error that occurs.
- Choose an arbitrary input to begin with.
- Execute the program.
- Symbolically re-execute the program on the trace, generating a set of symbolic constraints (including path conditions).
- Negate the last path condition not already negated in order to visit a new execution path. If there is no such path condition, the algorithm terminates.
- Invoke an automated theorem prover to generate a new input. If there is no input satisfying the constraints, return to step 6 to try the next execution path.
- Return to step 4.
There are a few complications to the above procedure:
- The algorithm performs a depth-first searchDepth-first searchDepth-first search is an algorithm for traversing or searching a tree, tree structure, or graph. One starts at the root and explores as far as possible along each branch before backtracking....
over an implicit tree of possible execution paths. In practice programs may have very large or infinite path trees — a common example is testing data structures that have an unbounded size or length. To prevent spending too much time on one small area of the program, the search may be depth-limited (bounded). - Symbolic execution and automated theorem provers have limitations on the classes of constraints they can represent and solve. For example, a theorem prover based on linear arithmetic will be unable to cope with the nonlinear path condition xy = 6. Any time that such constraints arise, the symbolic execution may substitute the current concrete value of one of the variables to simplify the problem. An important part of the design of a concolic testing system is selecting a symbolic representation precise enough to represent the constraints of interest.
Limitations
Concolic testing has a number of limitations:
- If the program exhibits nondeterministic behavior, it may follow a different path than the intended one. This can lead to nontermination of the search and poor coverage.
- Even in a deterministic program, a number of factors may lead to poor coverage, including imprecise symbolic representations, incomplete theorem proving, and failure to search the most fruitful portion of a large or infinite path tree.
- Programs which thoroughly mix the state of their variables, such as crytographic primitives, generate very large symbolic representations that cannot be solved in practice. For example, the condition "if (md5_hash(input) 0xdeadbeef)" requires the theorem prover to invert MD5MD5The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...
, which is an open problem.
Tools
- pathcrawler-online.com is a restricted version of the current PathCrawler tool which is publicly available as an online test-case server for evaluation and education purposes.
- CUTE and jCUTE are available as binaries under a research-use only license by Urbana-Champaign for CC (programming language)C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
and for JavaJava (programming language)Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
. - CREST is an open-source solution for CC (programming language)C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
comparable to CUTE (modified BSD license). - Microsoft Pex, developed at Microsoft Rise, is publicly available as a Microsoft Visual Studio 2010 Power Tool for the NET Framework.
Many tools, notably DART and SAGE, have not been made available to the public at large. Note however that for instance SAGE is "used daily" for internal security testing at Microsoft.