Memoization


In computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs. Memoization has also been used in other contexts (and for purposes other than speed gains), such as in simple mutually recursive descent parsing in a general top-down parsing algorithm that accommodates ambiguity and left recursion in polynomial time and space. Although related to caching, memoization refers to a specific case of this optimization, distinguishing it from other forms of caching such as buffering or page replacement. In the context of some logic programming languages, memoization is also known as tabling; see also lookup table.

Etymology

The term memoization was coined by Donald Michie in 1968 and is derived from the Latin word memorandum (to be remembered), and thus carries the meaning of turning [the results of] a function into something to be remembered. While memoization might be confused with memorization (because of the shared cognate), memoization has a specialized meaning in computing.

Overview

A memoized function "remembers" the results corresponding to some set of specific inputs. Subsequent calls with remembered inputs return the remembered result rather than recalculating it, thus eliminating the primary cost of a call with given parameters from all but the first call made to the function with those parameters. The set of remembered associations may be a fixed-size set controlled by a replacement algorithm or a fixed set, depending on the nature of the function and its use. A function can only be memoized if it is referentially transparent; that is, only if calling the function has exactly the same effect as replacing that function call with its return value. (Special case exceptions to this restriction exist, however.) While memoization is related to lookup tables, since it often uses such tables in its implementation, it populates its cache of results transparently on the fly, as needed, rather than in advance.
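
For example (a minimal illustration in Python, not drawn from the article itself), the first function below is referentially transparent and therefore safe to memoize, while the second depends on external state and is not:

import time

def square(n):
    return n * n  # same input always yields the same result: safe to memoize

def seconds_since_midnight():
    t = time.localtime()  # result depends on the clock, an external state;
    return t.tm_hour * 3600 + t.tm_min * 60 + t.tm_sec  # memoizing this would return stale values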

Memoization is a means of lowering a function's time cost in exchange for space cost; that is, memoized functions become optimized for speed in exchange for a higher use of computer memory space. The time/space "cost" of algorithms has a specific name in computing: computational complexity. All functions have a computational complexity in time (i.e. they take time to execute) and in space.

Although a trade-off occurs (i.e., space used is speed gained), memoization differs from some other optimizations that involve time-space trade-offs, such as strength reduction, in that it is a run-time rather than compile-time optimization. Moreover, strength reduction potentially replaces a costly operation such as multiplication with a less costly operation such as addition, and the resulting savings can be highly machine-dependent (non-portable across machines), whereas memoization is a more machine-independent, cross-platform strategy.

Consider the following pseudocode function to calculate the factorial of n:

function factorial (n is a non-negative integer)
    if n is 0 then
        return 1 [by the convention that 0! = 1]
    else
        return factorial(n – 1) times n [recursively invoke factorial with the parameter 1 less than n]
    end if
end function

For every integer n such that n ≥ 0, the final result of the function factorial is invariant; if invoked as x = factorial(3), the result is such that x will always be assigned the value 6. A non-memoized version of the above, given the nature of the recursive algorithm involved, would require n + 1 invocations of factorial to arrive at a result, and each of these invocations, in turn, has an associated cost in the time it takes the function to return the value computed. Depending on the machine, this cost might be the sum of:
  1. The cost to set up the functional call stack frame.
  2. The cost to compare n to 0.
  3. The cost to subtract 1 from n.
  4. The cost to set up the recursive call stack frame. (As above.)
  5. The cost to multiply the result of the recursive call to factorial by n.
  6. The cost to store the return result so that it may be used by the calling context.


In a non-memoized implementation, every top-level call to factorial includes the cumulative cost of steps 2 through 6 proportional to the initial value of n.

A memoized version of the factorial function follows:

function factorial (n is a non-negative integer)
    if n is in lookup-table then
        return lookup-table-value-for-n
    else if n is 0 then
        return 1 [by the convention that 0! = 1]
    else
        let x = factorial(n – 1) times n [recursively invoke factorial with the parameter 1 less than n]
        store x in lookup-table in the nth slot [remember the result of n! for later]
        return x
    end if
end function

In this particular example, if factorial is first invoked with 5, and then invoked later with any value less than or equal to five, those return values will also have been memoized, since factorial will have been called recursively with the values 5, 4, 3, 2, 1, and 0, and the return value for each of those will have been stored. If it is then called with a number greater than 5, such as 7, only the calls for 7 and 6 require new computation, since the value for 5! was stored by the previous call. In this way, memoization allows a function to become more time-efficient the more often it is called, thus resulting in an eventual overall speed-up.
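
The memoized factorial above can also be written concretely in Python; this is a minimal sketch (the names lookup_table and factorial are just illustrative), using a module-level dictionary as the lookup table:

lookup_table = {}  # maps n to n!, populated on the fly

def factorial(n):
    if n in lookup_table:
        return lookup_table[n]  # result already remembered
    if n == 0:
        result = 1  # by the convention that 0! = 1
    else:
        result = factorial(n - 1) * n  # recursive call; smaller values are memoized as a side effect
    lookup_table[n] = result  # remember the result of n! for later
    return result

Calling factorial(5) fills the table for 0 through 5, so a later call such as factorial(7) computes only the entries for 6 and 7.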

Automatic memoization

While memoization may be added to functions internally and explicitly by a computer programmer in much the same way the above memoized version of factorial is implemented, referentially transparent functions may also be automatically memoized externally. The techniques employed by Peter Norvig have application not only in Common Lisp (the language in which his paper demonstrated automatic memoization), but also in various other programming languages. Applications of automatic memoization have also been formally explored in the study of term rewriting and artificial intelligence.

In programming languages where functions are first- or second-class objects (such as Lua, with its first-class functions, or Perl, http://perl.plover.com/MiniMemoize/memoize.html), automatic memoization can be implemented by replacing (at run time) a function with its calculated value once a value has been calculated for a given set of parameters. The function that does this value-for-function-object replacement can generically wrap any referentially transparent function. Consider the following pseudocode (where it is assumed that functions are first-class values):

function memoized-call (F is a function object parameter)
    if F has no attached array values then
        allocate an associative array called values;
        attach values to F;
    end if;

    if F.values[arguments] is empty then
        F.values[arguments] = F(arguments);
    end if;

    return F.values[arguments];
end function

In order to call an automatically memoized version of factorial using the above strategy, rather than calling factorial directly, code invokes memoized-call(factorial, n), passing along both the function object and its arguments. Each such call first checks whether a holder array has been allocated to store results, and if not, attaches that array. If no entry exists at the position values[arguments] (where the arguments are used as the key of the associative array), a real call is made to factorial with the supplied arguments. Finally, the entry in the array at the key position is returned to the caller.
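
A rough Python sketch of this wrapper strategy (the names memoized_call and values are illustrative, not taken from any particular library) attaches the cache directly to the function object being wrapped:

def memoized_call(f, *args):
    if not hasattr(f, "values"):
        f.values = {}  # attach an associative array to the function object on first use
    if args not in f.values:
        f.values[args] = f(*args)  # real call only when these arguments are new
    return f.values[args]

With this wrapper, code calls memoized_call(factorial, 5) instead of factorial(5); each call site must go through the wrapper explicitly.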

The above strategy requires explicit wrapping at each call to a function that is to be memoized. In languages that allow closures, memoization can be effected implicitly via a functor factory that returns a wrapped, memoized function object. In pseudocode, this can be expressed as follows:

function construct-memoized-functor (F is a function object parameter)
    allocate a function object called memoized-version;

    let memoized-version(arguments) be
        if self has no attached array values then [self is a reference to this object]
            allocate an associative array called values;
            attach values to self;
        end if;

        if self.values[arguments] is empty then
            self.values[arguments] = F(arguments);
        end if;

        return self.values[arguments];
    end let;

    return memoized-version;
end function

Rather than call factorial, a new function object memfact is created as follows:

memfact = construct-memoized-functor(factorial)

The above example assumes that the function factorial has already been defined before the call to construct-memoized-functor is made. From this point forward, memfact(n) is called whenever the factorial of n is desired. In languages such as Lua, more sophisticated techniques exist which allow a function to be replaced by a new function with the same name, which would permit:

factorial = construct-memoized-functor(factorial)

Essentially, such techniques involve attaching the original function object to the created functor and forwarding calls to the original function being memoized via an alias when a call to the actual function is required (to avoid endless recursion), as illustrated below:

function construct-memoized-functor (F is a function object parameter)
    allocate a function object called memoized-version;

    let memoized-version(arguments) be
        if self has no attached array values then [self is a reference to this object]
            allocate an associative array called values;
            attach values to self;
            allocate a new function object called alias;
            attach alias to self; [for later ability to invoke F indirectly]
            self.alias = F;
        end if;

        if self.values[arguments] is empty then
            self.values[arguments] = self.alias(arguments); [not a direct call to F]
        end if;

        return self.values[arguments];
    end let;

    return memoized-version;
end function

(Note: Some of the steps shown above may be implicitly managed by the implementation language and are provided for illustration.)
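
In Python (a minimal sketch; the name construct_memoized_functor is illustrative and not part of any standard library), the closure-based factory can be written so that the captured parameter plays the role of the alias above, and rebinding the original name routes recursive calls through the cache:

def construct_memoized_functor(f):
    values = {}  # associative array held in the closure

    def memoized_version(*args):
        if args not in values:
            values[args] = f(*args)  # f is the captured original, acting as the alias
        return values[args]

    return memoized_version

factorial = construct_memoized_functor(factorial)

Because the original function was captured before the name factorial was rebound, the wrapper calls it through the closure variable rather than through the now-replaced name, avoiding endless recursion, while recursive calls made by name inside the original body now pass through the memoized version.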

Parsers

When a top-down parser tries to parse an ambiguous input with respect to an ambiguous context-free grammar (CFG), it may need an exponential number of steps (with respect to the length of the input) to try all alternatives of the CFG in order to produce all possible parse trees, which would eventually require exponential memory space. Memoization was explored as a parsing strategy in 1991 by Norvig, who demonstrated that an algorithm similar to the use of dynamic programming and state-sets in Earley's algorithm (1970), and tables in the CYK algorithm of Cocke, Younger and Kasami, could be generated by introducing automatic memoization to a simple backtracking recursive descent parser to solve the problem of exponential time complexity. The basic idea in Norvig's approach is that when a parser is applied to the input, the result is stored in a memotable for subsequent reuse if the same parser is ever reapplied to the same input. Richard Frost also used memoization to reduce the exponential time complexity of parser combinators, which can be viewed as a "purely functional top-down backtracking" parsing technique. He showed that basic memoized parser combinators can be used as building blocks to construct complex parsers as executable specifications of CFGs. Memoization was again explored in the context of parsing in 1995 by Johnson and Dörre. In 2002, it was examined in considerable depth by Ford in the form called packrat parsing.

In 2007, Frost, Hafiz and Callaghan described a top-down parsing algorithm that uses memoization to avoid redundant computations and to accommodate any form of ambiguous CFG in polynomial time (Θ(n⁴) for left-recursive grammars and Θ(n³) for non-left-recursive grammars). Their top-down parsing algorithm also requires polynomial space for the potentially exponential number of ambiguous parse trees, through 'compact representation' and 'local ambiguities grouping'. Their compact representation is comparable with Tomita's compact representation of bottom-up parsing. Their use of memoization is not limited to retrieving previously computed results when a parser is applied to the same input position repeatedly (which is essential for the polynomial time requirement); it is also specialized to perform the following additional tasks:
  • The memoization process (which could be viewed as a ‘wrapper’ around any parser execution) accommodates an ever-growing direct left-recursive parse by imposing depth restrictions with respect to input length and current input position.
  • The algorithm's memo-table 'lookup' procedure also determines the reusability of a saved result by comparing the saved result's computational context with the parser's current context. This contextual comparison is the key to accommodating indirect (or hidden) left-recursion.
  • When performing a successful lookup in a memotable, instead of returning the complete result set, the process returns only references to the actual result, which speeds up the overall computation.
  • When updating the memotable, the memoization process groups the (potentially exponential) ambiguous results and ensures the polynomial space requirement.


Frost, Hafiz and Callaghan also described the implementation of the algorithm in PADL'08 as a set of higher-order functions (called parser combinators) in Haskell, which enables the construction of directly executable specifications of CFGs as language processors. The power of their polynomial algorithm to accommodate 'any form of ambiguous CFG' with top-down parsing is vital with respect to syntax and semantics analysis during natural language processing. The X-SAIGA site has more about the algorithm and implementation details.

While Norvig increased the power of the parser through memoization, the augmented parser was still as time-complex as Earley's algorithm, which demonstrates a case of the use of memoization for something other than speed optimization. Johnson and Dörre demonstrate another such non-speed-related application of memoization: the use of memoization to delay linguistic constraint resolution to a point in a parse where sufficient information has been accumulated to resolve those constraints. By contrast, in the speed-optimization application of memoization, Ford demonstrated that memoization could guarantee that parsing expression grammars could parse in linear time even those languages that resulted in worst-case backtracking behavior.

Consider the following grammar:

S → (A c) | (B d)
A → X (a|b)
B → X b
X → x [X]

(Notation note: In the above example, the production S → (A c) | (B d) reads: "An S is either an A followed by a c or a B followed by a d." The production X → x [X] reads "An X is an x followed by an optional X.")

This grammar generates one of the following three variations of string: xac, xbc, or xbd (where x here is understood to mean one or more x's). Next, consider how this grammar, used as a parse specification, might effect a top-down, left-right parse of the string xxxxxbd:
The rule A will recognize xxxxxb (by first descending into X to recognize one x, and again descending into X until all the x's are consumed, and then recognizing the b), and then return to S, and fail to recognize a c. The next clause of S will then descend into B, which in turn again descends into X and recognizes the x's by means of many recursive calls to X, and then a b, and returns to S and finally recognizes a d.


The key concept here is inherent in the phrase again descends into X. The process of looking forward, failing, backing up, and then retrying the next alternative is known in parsing as backtracking, and it is primarily backtracking that presents opportunities for memoization in parsing. Consider a function RuleAcceptsSomeInput(Rule, Position, Input), where the parameters are as follows:
  • Rule is the name of the rule under consideration.
  • Position is the offset currently under consideration in the input.
  • Input is the input under consideration.


Let the return value of the function RuleAcceptsSomeInput be the length of the input accepted by Rule, or 0 if that rule does not accept any input at that offset in the string. In a backtracking scenario with such memoization, the parsing process is as follows:
When the rule A descends into X at offset 0, it memoizes the length 5 against that position and the rule X. After S fails to match a c (the next input symbol is d), B, rather than descending again into X, queries position 0 against rule X in the memoization engine, is returned a length of 5, and thus avoids having to actually descend again into X; it carries on as if it had descended into X as many times as before.


In the above example, one or many descents into X may occur, allowing for strings such as xxxxxxxxxxxxxxxxbd. In fact, there may be any number of x's before the b. While the call to S must recursively descend into X as many times as there are x's, B will never have to descend into X at all, since the return value of RuleAcceptsSomeInput(X, 0, xxxxxxxxxxxxxxxxbd) will be 16 (in this particular case).
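
As an illustrative sketch (in Python, not taken from any particular parser library), the grammar above can be encoded as a small backtracking recognizer in which rule_accepts_some_input plays the role of RuleAcceptsSomeInput and memoizes the accepted length per (rule, position) pair:

memo = {}  # maps (rule name, position) to the accepted length; valid for a single input string

def rule_accepts_some_input(rule, pos, text):
    # reuse a previously computed result for this rule at this offset, if any
    key = (rule, pos)
    if key not in memo:
        memo[key] = RULES[rule](pos, text)
    return memo[key]

def parse_X(pos, text):
    # X -> x [X] : one or more x's
    if pos < len(text) and text[pos] == 'x':
        return 1 + rule_accepts_some_input('X', pos + 1, text)
    return 0

def parse_A(pos, text):
    # A -> X (a|b)
    n = rule_accepts_some_input('X', pos, text)
    if n and pos + n < len(text) and text[pos + n] in 'ab':
        return n + 1
    return 0

def parse_B(pos, text):
    # B -> X b : reuses the memoized result for X rather than descending again
    n = rule_accepts_some_input('X', pos, text)
    if n and pos + n < len(text) and text[pos + n] == 'b':
        return n + 1
    return 0

def parse_S(pos, text):
    # S -> A c | B d : trying the second alternative is the backtracking step
    n = rule_accepts_some_input('A', pos, text)
    if n and pos + n < len(text) and text[pos + n] == 'c':
        return n + 1
    n = rule_accepts_some_input('B', pos, text)
    if n and pos + n < len(text) and text[pos + n] == 'd':
        return n + 1
    return 0

RULES = {'X': parse_X, 'A': parse_A, 'B': parse_B, 'S': parse_S}

print(rule_accepts_some_input('S', 0, 'xxxxxbd'))  # prints 7: the whole string is accepted

Here rule B's query for ('X', 0) is answered from the memo table without re-descending into X, mirroring the walkthrough above.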

Parsers that make use of syntactic predicates are also able to memoize the results of predicate parses, thereby reducing such constructions as:

S → (A)? A
A → /* some rule */

to one descent into A.

If a parser builds a parse tree during a parse, it must memoize not only the length of the input that matches at some offset against a given rule, but also the sub-tree that is generated by that rule at that offset in the input, since subsequent calls to the rule by the parser will not actually descend and rebuild that tree. For the same reason, memoized parser algorithms that generate calls to external code (sometimes called a semantic action routine) when a rule matches must use some scheme to ensure that such rules are invoked in a predictable order.

Since not every grammar will need backtracking or predicate checks, even for a parser capable of them, the overhead of storing each rule's parse results against every offset in the input (and storing the parse tree if the parsing process does that implicitly) may actually slow down a parser. This effect can be mitigated by explicitly selecting which rules the parser will memoize.

See also

  • Computational complexity theory – more information on algorithm complexity
  • Strength reduction – a compiler optimization that replaces a costly operation with an equivalent, less costly one
  • Partial evaluation – a related technique for automatic program optimization
  • Lazy evaluation – shares some concepts with memoization
  • Lookup table – a key data structure used in memoization
  • Flyweight pattern – an object-oriented design pattern that also uses a kind of memoization
  • Director string
  • Dynamic programming – some applications of memoizing techniques
  • Hashlife – a memoizing technique to speed up the computation of cellular automata
  • Higher-Order Perl – a free book by Mark Jason Dominus containing an entire chapter on implementing memoization, along with some background

External links

Examples of memoization in various programming languages
  • Memoize – a small library, written by Tim Bradshaw, for performing memoization in Common Lisp.
  • IncPy – a custom Python interpreter that performs automatic memoization (with no required user annotations).
  • Dave Herman's Macros for defining memoized procedures in Racket.
  • Memoize.pm – a Perl module that implements memoized functions.
  • Java memoization – an example in Java using dynamic proxy classes to create a generic memoization pattern.
  • Memoization in C++ – although C++ doesn't support first-class functions, here is a toolkit that supports automated memoization (via pre-processing).
  • Tek271 Memoizer – open-source Java memoizer using annotations and pluggable cache implementations.
  • memoize – a Ruby module that implements memoized methods.
  • Python memoization – a Python example of memoization.
  • OCaml memoization – implemented as a Camlp4 syntax extension.
  • Memoization in Lua – two example implementations of a general memoize function in Lua.
  • Memoization in Mathematica – memoization and limited memoization in Mathematica.
  • Memoization in JavaScript – extending the Function prototype in JavaScript.
  • Memoization in JavaScript – examples of memoization in JavaScript using its own caching mechanism and using the YUI library.
  • X-SAIGA – eXecutable SpecificAtIons of GrAmmars; contains publications related to a top-down parsing algorithm that supports left-recursion and ambiguity in polynomial time and space.
  • Memoization in Scheme – a Scheme example of memoization on a class webpage.
  • Memoization in Combinatory Logic – a web service that reduces Combinatory Logic while memoizing every step in a database.