Funarg problem
Encyclopedia
In computer science
, the funarg problem refers to the difficulty in implementing first-class function
s (functions as first-class object
s) in stack-based programming language
implementations.
The difficulty only arises if the body of a nested function
refers directly (i.e., not via argument passing) to identifiers defined in the environment in which the function is defined, but not in the environment of the function call. To summarize the discussion below, two standard resolutions are to either forbid such references or to create closure
s.
There are two subtly different versions of the funarg problem. The upwards funarg problem arises from returning (or otherwise transmitting "upwards") a function from a function call. The downwards funarg problem arises from passing a function as a parameter to another function call.
s and local variable
s) must be preserved in order for execution to proceed after the callee returns. In most compiled programs, this local state is stored on the call stack
in a data structure called an activation record or stack frame. This stack frame is pushed, or allocated, when a function is called, and popped, or deallocated, when the function returns. The upwards funarg problem arises when the calling function refers to the called/exited function's state after that function has returned. Therefore, the activation record containing the called function's state variables must not be deallocated when the function returns, violating the stack-based function call paradigm
.
One solution to the upwards funarg problem is to simply allocate all activation records from the heap instead of the stack, and rely on some form of garbage collection
or reference counting
to deallocate the activation records when they are no longer needed. Managing activation records on the heap is much less efficient than on the stack, so this strategy may significantly degrade performance. Moreover, because most functions in typical programs do not create upwards funargs, much of this overhead is unnecessary.
Some efficiency-minded compilers employ a hybrid approach in which the activation records for a function are allocated from the stack if the compiler is able to deduce, through static program analysis, that the function creates no upwards funargs. Otherwise, the activation records are allocated from the heap.
Another solution is to simply copy the value of the variables into the closure at the time the closure is created. This will cause a different behavior in the case of mutable variables, because the state will no longer be shared between closures. But if it is known that the variables are constant, then this approach will be equivalent. The ML languages take this approach since variables in those languages are bound to values -- i.e. variables cannot be changed. Java
also takes this approach with respect to anonymous classes, in that it only allows one to refer to variables in the enclosing scope that are
Some languages allow you to explicitly choose between the two behaviors. PHP
5.3's anonymous functions allow you to specify variables to include in the closure using the
anonymous functions, the upwards funarg problem is solved by requiring one to use
-inspired pseudocode
defines function composition
:
compose f g = λx → f (g x)
The problem in this case exists if the compose function allocates the parameter variables
and activation records that can complicate human and machine reasoning about the program state.
The downwards funarg problem complicates the efficient compilation of tail recursion and code written in continuation-passing style
. In these special cases, the intent of the programmer is (usually) that the function run in limited stack space, so the "faster" behavior may actually be undesirable.
historically avoids the main difficulty of the funarg problem by not allowing function definitions to be nested; because the environment of every function is the same, containing just the statically-allocated global variables and functions, a pointer to a function's code describes the function completely. Apple has proposed and implemented a closure syntax for C that solves the upwards funarg problem by dynamically moving closures from the stack to the heap as necessary. The Java programming language deals with it by requiring that context used by nested functions in anonymous inner classes be declared final
.
In functional languages, functions are first-class values and can be passed anywhere. Thus, implementations of Scheme or SML must address both the upwards and downwards funarg problems. This is usually accomplished by representing function values as heap-allocated closures, as previously described. The Objective Caml compiler employs a hybrid technique (based on program analysis
) to maximize efficiency.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, the funarg problem refers to the difficulty in implementing first-class function
First-class function
In computer science, a programming language is said to have first-class functions if it treats functions as first-class objects. Specifically, this means that the language supports passing functions as arguments to other functions, returning them as the values from other functions, and assigning...
s (functions as first-class object
First-class object
In programming language design, a first-class citizen , in the context of a particular programming language, is an entity that can be constructed at run-time, passed as a parameter, returned from a subroutine, or assigned into a variable...
s) in stack-based programming language
Stack-oriented programming language
A stack-oriented programming language is one that relies on a stack machine model for passing parameters. Several programming languages fit this description, notably Forth, RPL, PostScript, and also many Assembly languages ....
implementations.
The difficulty only arises if the body of a nested function
Nested function
In computer programming, a nested function is a function which is lexically encapsulated within another function. It can only be called by the enclosing function or by functions directly or indirectly nested within the same enclosing function. In other words, the scope of the nested function is...
refers directly (i.e., not via argument passing) to identifiers defined in the environment in which the function is defined, but not in the environment of the function call. To summarize the discussion below, two standard resolutions are to either forbid such references or to create closure
Closure (computer science)
In computer science, a closure is a function together with a referencing environment for the non-local variables of that function. A closure allows a function to access variables outside its typical scope. Such a function is said to be "closed over" its free variables...
s.
There are two subtly different versions of the funarg problem. The upwards funarg problem arises from returning (or otherwise transmitting "upwards") a function from a function call. The downwards funarg problem arises from passing a function as a parameter to another function call.
Upwards funarg problem
When one function calls another during a typical program's execution, the local state of the caller (including parameterParameter (computer science)
In computer programming, a parameter is a special kind of variable, used in a subroutine to refer to one of the pieces of data provided as input to the subroutine. These pieces of data are called arguments...
s and local variable
Local variable
In computer science, a local variable is a variable that is given local scope. Such a variable is accessible only from the function or block in which it is declared. In programming languages with only two levels of visibility, local variables are contrasted with global variables...
s) must be preserved in order for execution to proceed after the callee returns. In most compiled programs, this local state is stored on the call stack
Call stack
In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program. This kind of stack is also known as an execution stack, control stack, run-time stack, or machine stack, and is often shortened to just "the stack"...
in a data structure called an activation record or stack frame. This stack frame is pushed, or allocated, when a function is called, and popped, or deallocated, when the function returns. The upwards funarg problem arises when the calling function refers to the called/exited function's state after that function has returned. Therefore, the activation record containing the called function's state variables must not be deallocated when the function returns, violating the stack-based function call paradigm
Stack-based memory allocation
Stacks in computing architectures are regions of memory where data is added or removed in a last-in-first-out manner.In most modern computer systems, each thread has a reserved region of memory referred to as its stack. When a function executes, it may add some of its state data to the top of the...
.
One solution to the upwards funarg problem is to simply allocate all activation records from the heap instead of the stack, and rely on some form of garbage collection
Garbage collection (computer science)
In computer science, garbage collection is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program...
or reference counting
Reference counting
In computer science, reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object, block of memory, disk space or other resource...
to deallocate the activation records when they are no longer needed. Managing activation records on the heap is much less efficient than on the stack, so this strategy may significantly degrade performance. Moreover, because most functions in typical programs do not create upwards funargs, much of this overhead is unnecessary.
Some efficiency-minded compilers employ a hybrid approach in which the activation records for a function are allocated from the stack if the compiler is able to deduce, through static program analysis, that the function creates no upwards funargs. Otherwise, the activation records are allocated from the heap.
Another solution is to simply copy the value of the variables into the closure at the time the closure is created. This will cause a different behavior in the case of mutable variables, because the state will no longer be shared between closures. But if it is known that the variables are constant, then this approach will be equivalent. The ML languages take this approach since variables in those languages are bound to values -- i.e. variables cannot be changed. Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
also takes this approach with respect to anonymous classes, in that it only allows one to refer to variables in the enclosing scope that are
final
(i.e. constant).Some languages allow you to explicitly choose between the two behaviors. PHP
PHP
PHP is a general-purpose server-side scripting language originally designed for web development to produce dynamic web pages. For this purpose, PHP code is embedded into the HTML source document and interpreted by a web server with a PHP processor module, which generates the web page document...
5.3's anonymous functions allow you to specify variables to include in the closure using the
use
clause; if the variable is listed by reference, it includes a reference to the original variable; otherwise, it makes a copy. In Apple's BlocksBlocks (C language extension)
Blocks are a nonstandard extension added by Apple Inc. to the C, C++, and Objective-C programming languages that uses a lambda expression-like syntax to create closures within these languages...
anonymous functions, the upwards funarg problem is solved by requiring one to use
Block_copy
when returning the block to a higher context, and Block_copy
copies normal variables from the stack into a separate copy in the closure; if one wants to share the state between closures, the variable must be declared with the __block
modifier, in which case that variable is allocated on the heap.Example
The following HaskellHaskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...
-inspired pseudocode
Pseudocode
In computer science and numerical computation, pseudocode is a compact and informal high-level description of the operating principle of a computer program or other algorithm. It uses the structural conventions of a programming language, but is intended for human reading rather than machine reading...
defines function composition
Function composition
In mathematics, function composition is the application of one function to the results of another. For instance, the functions and can be composed by computing the output of g when it has an argument of f instead of x...
:
compose f g = λx → f (g x)
λLambda calculusIn mathematical logic and computer science, lambda calculus, also written as λ-calculus, is a formal system for function definition, function application and recursion. The portion of lambda calculus relevant to computation is now called the untyped lambda calculus...
is the operator for constructing a new function, which -- in this case -- has one argument, x
, and returns the result of first applying g
to x
then applying f
to that. This λ function carries the functions f
and g
(or pointers to them) as internal state.The problem in this case exists if the compose function allocates the parameter variables
f
and g
on the stack. When compose
returns, the stack frame containing f
and g
is discarded. When the internal function λx
attempts to access g
, it will access a discarded memory area.Downwards funarg problem
A downwards funarg may also refer to a function's state when that function is not actually executing. However, because, by definition, the existence of a downwards funarg is contained in the execution of the function that creates it, the activation record for the function can usually still be stored on the stack. Nonetheless, the existence of downwards funargs implies a tree structure of closuresClosure (computer science)
In computer science, a closure is a function together with a referencing environment for the non-local variables of that function. A closure allows a function to access variables outside its typical scope. Such a function is said to be "closed over" its free variables...
and activation records that can complicate human and machine reasoning about the program state.
The downwards funarg problem complicates the efficient compilation of tail recursion and code written in continuation-passing style
Continuation-passing style
In functional programming, continuation-passing style is a style of programming in which control is passed explicitly in the form of a continuation. Gerald Jay Sussman and Guy L. Steele, Jr...
. In these special cases, the intent of the programmer is (usually) that the function run in limited stack space, so the "faster" behavior may actually be undesirable.
Practical implications
Historically, the upwards funarg problem has proven to be the more difficult. For example, the Pascal programming language allows functions to be passed as arguments but not returned as results; thus implementations of Pascal are required to address the downwards funarg problem but not the upwards one. The Oberon programming language (a descendant of Pascal) allows functions both as parameters and return values but the assigned function may not be a nested function. The C programming languageC (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
historically avoids the main difficulty of the funarg problem by not allowing function definitions to be nested; because the environment of every function is the same, containing just the statically-allocated global variables and functions, a pointer to a function's code describes the function completely. Apple has proposed and implemented a closure syntax for C that solves the upwards funarg problem by dynamically moving closures from the stack to the heap as necessary. The Java programming language deals with it by requiring that context used by nested functions in anonymous inner classes be declared final
Final (Java)
In the Java programming language, the final keyword is used in several different contexts to define an entity which cannot later be changed.- Final classes :...
.
In functional languages, functions are first-class values and can be passed anywhere. Thus, implementations of Scheme or SML must address both the upwards and downwards funarg problems. This is usually accomplished by representing function values as heap-allocated closures, as previously described. The Objective Caml compiler employs a hybrid technique (based on program analysis
Program analysis (computer science)
In computer science, program analysis is the process of automatically analysing the behavior of computer programs. Two main approaches in program analysis are static program analysis and dynamic program analysis...
) to maximize efficiency.
See also
- Activation record
- Closure (computer science)Closure (computer science)In computer science, a closure is a function together with a referencing environment for the non-local variables of that function. A closure allows a function to access variables outside its typical scope. Such a function is said to be "closed over" its free variables...
- Functional programmingFunctional programmingIn computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast to the imperative programming style, which emphasizes changes in state...
- Lambda calculusLambda calculusIn mathematical logic and computer science, lambda calculus, also written as λ-calculus, is a formal system for function definition, function application and recursion. The portion of lambda calculus relevant to computation is now called the untyped lambda calculus...
- Name bindingName bindingIn programming languages, name binding is the association of objects with identifiers. An identifier bound to an object is said to reference that object. Machine languages have no built-in notion of identifiers, but name-object bindings as a service and notation for the programmer is implemented...
- Referential transparency
- Scope (programming)Scope (programming)In computer programming, scope is an enclosing context where values and expressions are associated. Various programming languages have various types of scopes. The type of scope determines what kind of entities it can contain and how it affects them—or semantics...
- Spaghetti stackSpaghetti stackA spaghetti stack in computer science is an N-ary tree data structure in which child nodes have pointers to the parent nodes . When a list of nodes is traversed from a leaf node to the root node by chasing these parent pointers, the structure looks like a linked list stack...
External links
- Bindings, Procedures, Functions, Functional Programming, and the Lambda Calculus
- Joel MosesJoel MosesJoel Moses is an Israeli-American computer scientist and Institute Professor at the Massachusetts Institute of Technology.Joel Moses was born in Palestine in 1941 and emigrated to the U.S. in 1954. He attended Midwood High School in Brooklyn, New York...
. "The Function of FUNCTION in LISP, or Why the FUNARG Problem Should be Called the Environment Problem". MIT AI Memo 199, 1970. - Andrew W. AppelAndrew AppelAndrew Wilson Appel is the Eugene Higgins Professor of computer science at Princeton University, New Jersey. He is especially well-known because of his compiler books, the Modern Compiler Implementation in ML series, as well as Compiling With Continuations...
, Zhong Shao. An Empirical and Analytic Study of Stack vs. Heap Cost for Languages with Closures. [ftp://ftp.cs.princeton.edu/techreports/1994/450.ps.gz Princeton CS Tech Report TR-450-94], 1994.