Tree-adjoining grammar - AbsoluteAstronomy.com

Tree-adjoining grammar is a grammar formalism defined by Aravind Joshi

Aravind Joshi

Aravind Krishna Joshi is the Henry Salvatori Professor of Computer and Cognitive Science in the computer science department of the University of Pennsylvania...

. Tree-adjoining grammars are somewhat similar to context-free grammar

Context-free grammar

In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

s, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (see tree (graph theory)

Tree (graph theory)

In mathematics, more specifically graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one simple path. In other words, any connected graph without cycles is a tree...

and tree (data structure)

Tree (data structure)

In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...

History

TAG originated in investigations by Joshi and his students into the family of adjunction grammars (AG),
the "string grammar" of Zellig Harris

Zellig Harris

Zellig Sabbettai Harris was a renowned American linguist, mathematical syntactician, and methodologist of science. Originally a Semiticist, he is best known for his work in structural linguistics and discourse analysis and for the discovery of transformational structure in language...

. AGs handle endocentric

Endocentric

In linguistics, an endocentric construction is a grammatical construction that fulfills the same linguistic function as one of its parts. An endocentric construction is not an exocentric construction, and an exocentric construction is not an endocentric construction...

properties of language in a natural and effective way, but do not have a good characterization of exocentric constructions; the converse is true of rewrite grammars

Rewrite rule

In linguistics, a rewrite rule for natural language in generative grammar is a rule of the form A → X where A is a syntactic category label, such as noun phrase or sentence, and X is a sequence of such labels and/or morphemes, expressing the fact that A can be replaced by X in generating the...

, or phrase-structure grammar (PSG). In 1969, Joshi introduced a family of grammars that exploits this complementarity by mixing the two types of rules. A few very simple rewrite rules suffice to generate the vocabulary of strings for adjunction rules. This family is distinct from the Chomsky hierarchy

Chomsky hierarchy

Within the field of computer science, specifically in the area of formal languages, the Chomsky hierarchy is a containment hierarchy of classes of formal grammars....

but intersects it in interesting and linguistically relevant ways.

Description

The rules in a TAG are trees with a special leaf node known as the foot node, which is anchored to a word.
There are two types of basic trees in TAG: initial trees (often represented as '

') and auxiliary trees ('

'). Initial trees represent basic valency relations, while auxiliary trees allow for recursion.
Auxiliary trees have the root (top) node and foot node labeled with the same symbol.
A derivation starts with an initial tree, combining via either substitution or adjunction. Substitution replaces a frontier node with another tree whose top node has the same label. Adjunction inserts an auxiliary tree into the center of another tree.
The root/foot label of the auxiliary tree must match the label of the node at which it adjoins.

Other variants of TAG allow multi-component trees, trees with multiple foot nodes, and other extensions.

Complexity and application

Tree-adjoining grammars are often described as mildly context-sensitive, meaning that they possess certain properties that make them more powerful (in terms of weak generative capacity

Weak generative capacity

Weak generative capacity refers to the set of strings that can be generated by a grammar. All grammars of a given formal complexity can generate the same strings....

) than context-free grammars, but less powerful than indexed

Indexed grammar

An indexed grammar is a formal grammar which describes indexed languages. They have three disjoint sets of symbols: the usual terminals and nonterminals, as well as index symbols, which appear only in intermediate derivation steps on a stack associated with the non-terminals of that step.The...

or context-sensitive grammar

Context-sensitive grammar

A context-sensitive grammar is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols...

s. Mildly context-sensitive grammars are conjectured to be powerful enough to model natural language

Natural language

In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...

s while remaining efficiently parseable in the general case.

A TAG can describe the language of squares (in which some arbitrary string is repeated), and the language

. This type of processing can be represented by an embedded pushdown automaton

Embedded pushdown automaton

An embedded pushdown automaton or EPDA is a computational model for parsing languages generated by tree-adjoining grammars . It is similar to the context-free grammar-parsing pushdown automaton, except that instead of using a plain stack to store symbols, it has a stack of iterated stacks that...

.

Languages with cubes (i.e. triplicated strings) or with more than four distinct character strings of equal length cannot be generated by tree-adjoining grammars.

For these reasons, languages generated by tree-adjoining grammars are referred to as mildly context-sensitive language
Mildly context-sensitive language
In formal grammar theory, mildly context-sensitive languages are a class of formal languages which can be efficiently parsed, but still possess enough context sensitivity to allow the parsing of natural languages. The concept was first introduced by Aravind Joshi in 1985.-Definition:Mild...

s.

Equivalencies

Vijay-Shanker and Weir (1994) demonstrates that Linear Indexed Grammars, Combinatory Categorial Grammars

Combinatory categorial grammar

Combinatory categorial grammar is an efficiently parseable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate-argument structure, quantification and information structure.CCG relies on...

, Tree-adjoining Grammars, and Head Grammars

Head grammar

Head Grammar is a grammar formalism introduced in Carl Pollard as an extension of the Context-free grammar class of grammars. The class of head grammars is a subset of the linear context-free rewriting systems....

are weakly equivalent

Weak equivalence

-Mathematics:In mathematics, a weak equivalence is a notion from homotopy theory which in some sense identifies objects that have the same basic "shape"...

formalisms, in that they all define the same string languages.

External links

The XTAG project, which uses a TAG for natural language processing.
A tutorial on TAG
Another tutorial with focus on comparison with Lexical Functional Grammar
Lexical functional grammar
Lexical functional grammar is a grammar framework in theoretical linguistics, a variety of generative grammar. It is a type of phrase structure grammar, as opposed to a dependency grammar. The development of the theory was initiated by Joan Bresnan and Ronald Kaplan in the 1970s, in reaction to...

and grammars extraction from Treebank
Treebank
A treebank or parsed corpus is a text corpus in which each sentence has been parsed, i.e. annotated with syntactic structure. Syntactic structure is commonly represented as a tree structure, hence the name Treebank...
SemConst Documentation A quick survey on Syntax and Semantic Interface problematic within the TAG framework.
The TuLiPa project The Tübingen Linguistic Parsing Architecture (TuLiPA) is a multi-formalism syntactic (and semantic) parsing environment, designed mainly for Multi-Component Tree Adjoining Grammars with Tree Tuples
The Metagrammar Toolkit which provides several tools to edit and compile MetaGrammars into TAGs. It also include a wide coverage French Metagrammars.
LLP2 A Lexicalized Tree Adjoining Grammar parser which provides an easy to use graphical environment (page in French)

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.