Statistical parsing
Encyclopedia
Statistical parsing is a group of parsing
methods within natural language processing
. The methods have in common that they associate grammar
rules with a probability. Grammar rules are traditionally viewed in computational linguistics
as defining the valid sentences in a language. Within this mindset, the idea of associating each rule with a probability then provides the relative frequency of any given grammar rule and, by deduction, the probability of a complete parse for a sentence. (The probability associated with a grammar rule may be induced, but the application of that grammar rule within a parse tree and the computation of the probability of the parse tree based on its component rules is a form of deduction.) Using this concept, statistical parsers make use of a procedure to search over a space of all candidate parses, and the computation of each candidate's probability, to derive the most probable parse of a sentence. The expectation maximization algorithm is one popular method of searching for the most probable parse.
"Search" in this context is an application of the very useful search algorithm
in artificial intelligence
.
By way of example, think about the sentence "The can can hold water". A reader would instantly see that there is an object called "the can" and that this object is performing the action 'can' (i.e. is able to); and the thing the object is able to do is "hold"; and the thing the object is able to hold is "water". Using more linguistic terminology, "The can" is a noun phrase composed of a determiner followed by a noun, and "can hold water" is a verb phrase which is itself composed of a verb followed by a verb phrase. But is this the only interpretation of the sentence? Certainly "The can can" is a perfectly valid noun-phrase referring to a type of dance, and "hold water" is also a valid verb-phrase, although the coerced meaning of the combined sentence is non-obvious. This lack of meaning is not seen as a problem by most linguists (for a discussion on this point, see Colorless green ideas sleep furiously
) but from a pragmatic point of view it is desirable to obtain the first interpretation rather than the second and statistical parsers achieve this by ranking the interpretations based on their probability.
(In this example various assumptions about the grammar
have been made, such as a simple left-to-right derivation rather than head-driven, its use of noun-phrases rather than the currently fashionable determiner-phrases, and no type-check preventing a concrete noun being combined with an abstract verb phrase. None of these assumptions affect the thesis of the argument and a comparable argument can be made using any other grammatical formalism.)
There are a number of methods that statistical parsing algorithms frequently use. While few algorithms will use all of these they give a good overview of the general field. Most statistical parsing algorithms are based on a modified form of chart parsing. The modifications are necessary to support an extremely large number of grammatical rules and therefore search space, and essentially involve applying classical artificial intelligence
algorithms to the traditionally exhaustive search. Some examples of the optimisations are only searching a likely subset of the search space (stack search
), for optimising the search probability (Baum-Welch algorithm
) and for discarding parses that are too similar to be treated separately (Viterbi algorithm
).
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...
methods within natural language processing
Natural language processing
Natural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages; it began as a branch of artificial intelligence....
. The methods have in common that they associate grammar
Grammar
In linguistics, grammar is the set of structural rules that govern the composition of clauses, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology, syntax, and phonology, often complemented by phonetics, semantics,...
rules with a probability. Grammar rules are traditionally viewed in computational linguistics
Computational linguistics
Computational linguistics is an interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective....
as defining the valid sentences in a language. Within this mindset, the idea of associating each rule with a probability then provides the relative frequency of any given grammar rule and, by deduction, the probability of a complete parse for a sentence. (The probability associated with a grammar rule may be induced, but the application of that grammar rule within a parse tree and the computation of the probability of the parse tree based on its component rules is a form of deduction.) Using this concept, statistical parsers make use of a procedure to search over a space of all candidate parses, and the computation of each candidate's probability, to derive the most probable parse of a sentence. The expectation maximization algorithm is one popular method of searching for the most probable parse.
"Search" in this context is an application of the very useful search algorithm
Search algorithm
In computer science, a search algorithm is an algorithm for finding an item with specified properties among a collection of items. The items may be stored individually as records in a database; or may be elements of a search space defined by a mathematical formula or procedure, such as the roots...
in artificial intelligence
Artificial intelligence
Artificial intelligence is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents" where an intelligent agent is a system that perceives its environment and takes actions that maximize its...
.
By way of example, think about the sentence "The can can hold water". A reader would instantly see that there is an object called "the can" and that this object is performing the action 'can' (i.e. is able to); and the thing the object is able to do is "hold"; and the thing the object is able to hold is "water". Using more linguistic terminology, "The can" is a noun phrase composed of a determiner followed by a noun, and "can hold water" is a verb phrase which is itself composed of a verb followed by a verb phrase. But is this the only interpretation of the sentence? Certainly "The can can" is a perfectly valid noun-phrase referring to a type of dance, and "hold water" is also a valid verb-phrase, although the coerced meaning of the combined sentence is non-obvious. This lack of meaning is not seen as a problem by most linguists (for a discussion on this point, see Colorless green ideas sleep furiously
Colorless green ideas sleep furiously
"Colorless green ideas sleep furiously" is a sentence composed by Noam Chomsky in his 1957 Syntactic Structures as an example of a sentence that is grammatically correct but semantically nonsensical. The term was originally used in his 1955 thesis "Logical Structures of Linguistic Theory"...
) but from a pragmatic point of view it is desirable to obtain the first interpretation rather than the second and statistical parsers achieve this by ranking the interpretations based on their probability.
(In this example various assumptions about the grammar
Grammar
In linguistics, grammar is the set of structural rules that govern the composition of clauses, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology, syntax, and phonology, often complemented by phonetics, semantics,...
have been made, such as a simple left-to-right derivation rather than head-driven, its use of noun-phrases rather than the currently fashionable determiner-phrases, and no type-check preventing a concrete noun being combined with an abstract verb phrase. None of these assumptions affect the thesis of the argument and a comparable argument can be made using any other grammatical formalism.)
There are a number of methods that statistical parsing algorithms frequently use. While few algorithms will use all of these they give a good overview of the general field. Most statistical parsing algorithms are based on a modified form of chart parsing. The modifications are necessary to support an extremely large number of grammatical rules and therefore search space, and essentially involve applying classical artificial intelligence
Artificial intelligence
Artificial intelligence is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents" where an intelligent agent is a system that perceives its environment and takes actions that maximize its...
algorithms to the traditionally exhaustive search. Some examples of the optimisations are only searching a likely subset of the search space (stack search
Stack search
Stack search is a search algorithm similar to beam search. It can be used to explore tree-structured search spaces and is often employed in Natural language processing applications, such as parsing of natural languages, or for decoding of error correcting codes where the technique goes under the...
), for optimising the search probability (Baum-Welch algorithm
Baum-Welch algorithm
In electrical engineering, computer science, statistical computing and bioinformatics, the Baum–Welch algorithm is used to find the unknown parameters of a hidden Markov model . It makes use of the forward-backward algorithm and is named for Leonard E. Baum and Lloyd R...
) and for discarding parses that are too similar to be treated separately (Viterbi algorithm
Viterbi algorithm
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states – called the Viterbi path – that results in a sequence of observed events, especially in the context of Markov information sources, and more generally, hidden Markov models...
).
Notable people in statistical parsing
- Eugene CharniakEugene CharniakEugene Charniak is a Computer Science and Cognitive Science professor at Brown University. He has an A.B. in Physics from The University of Chicago and a Ph.D. from M.I.T. in Computer Science. His research has always been in the area of language understanding or technologies which relate to it,...
Author of Statistical techniques for natural language parsing amongst many other contributions - Fred Jelinek Applied and developed numerous techniques from Information Theory to build the field
- David Magerman Major contributor to turning the field from theoretical to practical by managing data
- James Curran Applying the MaxEnt algorithm, word representation, and other contributions
- Michael Collins (computational linguist)Michael Collins (computational linguist)Michael J. Collins is a researcher in the field of computational linguistics.His research interests are in natural language processing as well as machine learning and he has made important contributions in statistical parsing and in statistical machine learning...
First very high performance statistical parser - Joshua Goodman Hypergraphs, and other generalizations between different methods