
Natural language generation
    
Natural language generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations.
In a sense, an NLG system is like a translator that converts a computer-based representation into a natural language representation. However, the methods used to produce the final language are very different from those of a compiler, because of the inherent expressivity of natural languages.
NLG may be viewed as the opposite of natural language understanding. The difference can be put this way: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words.
The simplest (and perhaps trivial) examples are systems that generate form letters. Such systems do not typically involve grammar rules, but may generate a letter to a consumer, e.g. stating that a credit card spending limit is about to be reached. More complex NLG systems dynamically create texts to meet a communicative goal.
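A form-letter generator of this kind can be sketched in a few lines. The following is a minimal illustration only, assuming a hypothetical letter wording and hypothetical field names (`name`, `spent`, `limit`); it fills slots in canned text and involves no grammar rules.

```python
from string import Template

# Hypothetical canned letter; the wording and field names are illustrative only.
LETTER = Template(
    "Dear $name,\n"
    "You have used $spent of your $limit credit card spending limit.\n"
    "Please contact us if you would like to discuss your limit."
)


def form_letter(name: str, spent: float, limit: float) -> str:
    """Fill the fixed template; no grammar rules or planning are involved."""
    return LETTER.substitute(
        name=name,
        spent=f"{spent:,.0f}",
        limit=f"{limit:,.0f}",
    )


if __name__ == "__main__":
    print(form_letter("A. Customer", 4800, 5000))
```

Because the text is fixed apart from the slots, such a system cannot vary its wording or decide what is worth saying, which is what separates it from the more complex systems described next.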
As in other areas of natural language processing, this can be done using either explicit models of language (e.g., grammars) and the domain, or using statistical models derived by analysing human-written texts.
NLG is a fast-evolving field. The best single source for up-to-date research in the area is the SIGGEN portion of the ACL Anthology. Perhaps the closest the field comes to a specialist textbook is Reiter and Dale (2000), but this book does not describe developments in the field since 2000.
Example
The Forecast for Scotland demo shows a simple NLG system in action. This system takes as input six numbers, which give predicted pollen levels in different parts of Scotland. From these numbers, the system generates a short textual summary of pollen levels as its output.
For example, using the historical data for 1-July-2005, the software produces:
Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country. However, in Northern areas, pollen levels will be moderate with values of 4.
In contrast, the actual forecast (written by a human meteorologist) from this data was:
Pollen counts are expected to remain high at level 6 over most of Scotland, and even level 7 in the south east. The only relief is in the Northern Isles and far northeast of mainland Scotland with medium levels of pollen count.
Comparing these two illustrates some of the choices that NLG systems must make; these are further discussed below.
Stages
The process to generate text can be as simple as keeping a list of canned text that is copied and pasted, possibly linked with some glue text. The results may be satisfactory in simple domains such as horoscope machines or generators of personalised business letters. However, a sophisticated NLG system needs to include stages of planning and merging of information to enable the generation of text that looks natural and does not become repetitive. Typical stages are:
Content determination: Deciding what information to mention in the text. For instance, in the pollen example above, deciding whether to explicitly mention that the pollen level is 7 in the south east.
Document structuring: Overall organisation of the information to convey. For example, deciding to describe the areas with high pollen levels first, instead of the areas with low pollen levels.
Aggregation: Merging of similar sentences to improve readability and naturalness. For instance, merging the two sentences "Grass pollen levels for Friday have increased from the moderate to high levels of yesterday" and "Grass pollen levels will be around 6 to 7 across most parts of the country" into the single sentence "Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country."
Lexical choice: Putting words to the concepts. For example, deciding whether "medium" or "moderate" should be used when describing a pollen level of 4.
Referring expression generation: Creating referring expressions that identify objects and regions. For example, deciding to use "in the Northern Isles and far northeast of mainland Scotland" to refer to a certain region in Scotland. This task also includes making decisions about pronouns and other types of anaphora.
Realisation: Creating the actual text, which should be correct according to the rules of syntax, morphology, and orthography. For example, using "will be" for the future tense of "to be".
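The stages above can be sketched as a toy pipeline over pollen data. This is an illustration under stated assumptions, not the design of the Forecast for Scotland system: the region names, the level thresholds, and every function name are hypothetical. Content determination is reduced to reporting one value range per group of regions rather than each region's value.

```python
from collections import defaultdict


def lexical_choice(level: int) -> str:
    """Lexical choice: map a numeric level to a word (assumed thresholds)."""
    if level <= 3:
        return "low"
    if level <= 5:
        return "moderate"
    if level <= 7:
        return "high"
    return "very high"


def aggregate(levels: dict) -> dict:
    """Aggregation: regions sharing a descriptor become one message."""
    groups = defaultdict(list)
    for region, level in levels.items():
        groups[lexical_choice(level)].append(region)
    return dict(groups)


def structure(groups: dict) -> list:
    """Document structuring: describe the largest group of regions first."""
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


def refer(regions: list, total: int) -> str:
    """Referring expression generation: name a group of regions."""
    if len(regions) > total / 2:
        return "most parts of the country"
    if len(regions) == 1:
        return f"the {regions[0]}"
    return "the " + ", ".join(regions[:-1]) + " and " + regions[-1]


def realise(word: str, ref: str, values: list, first: bool) -> str:
    """Realisation: produce the final sentence, including the future tense."""
    lo, hi = min(values), max(values)
    value_text = f"values of {lo}" if lo == hi else f"values of around {lo} to {hi}"
    opener = "Pollen levels will be" if first else "However, pollen levels will be"
    return f"{opener} {word} in {ref}, with {value_text}."


def generate(levels: dict) -> str:
    """Run all stages over a mapping of region name to pollen level."""
    sentences = []
    for i, (word, regions) in enumerate(structure(aggregate(levels))):
        values = [levels[r] for r in regions]
        ref = refer(regions, len(levels))
        sentences.append(realise(word, ref, values, first=(i == 0)))
    return " ".join(sentences)


if __name__ == "__main__":
    # Six hypothetical regional values, chosen to resemble the example above.
    print(generate({"North": 4, "North West": 6, "North East": 6,
                    "Central": 6, "South West": 6, "South East": 7}))
```

Each function corresponds to one stage, so a different wording policy (for example, "medium" instead of "moderate") can be tried by changing a single function without touching the others.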
Applications
The popular media has been especially interested in NLG systems which generate jokes (see computational humor). But from a commercial perspective, the most successful NLG applications have been data-to-text systems which generate textual summaries of databases and data sets; these systems usually perform data analysis as well as text generation. In particular, several systems have been built that produce textual weather forecasts from weather data. The earliest such system to be deployed was FoG, which was used by Environment Canada to generate weather forecasts in French and English in the early 1990s. The success of FoG triggered other work, both research and commercial. Recent research in this area includes an experiment which showed that users sometimes preferred computer-generated weather forecasts to human-written ones, in part because the computer forecasts used more consistent terminology, and a demonstration that statistical techniques could be used to generate high-quality weather forecasts. Recent applications include the ARNS system used to summarise conditions in US ports.
In the 1990s there was considerable interest in using NLG to summarise financial and business data. For example, the SPOTLIGHT system developed at A.C. Nielsen automatically generated readable English text based on the analysis of large amounts of retail sales data.
More recently there has been growing interest in using NLG to summarise electronic medical records. Commercial applications in this area are starting to appear, and researchers have shown that NLG summaries of medical data can be effective decision-support aids for medical professionals. There is also growing interest in using NLG to enhance accessibility, for example by describing graphs and data sets to blind people.
An example of a highly interactive use of NLG is the WYSIWYM framework. The name stands for "What you see is what you meant". The framework allows users to see and manipulate the continuously rendered view (NLG output) of an underlying formal language document (NLG input), thereby editing the formal language without having to learn it.
Evaluation
As in other scientific fields, NLG researchers need to be able to test how well their systems, modules, and algorithms work. This is called evaluation. There are three basic techniques for evaluating NLG systems:
- task-based (extrinsic) evaluation: give the generated text to a person, and assess how well it helps him or her perform a task (or otherwise achieves its communicative goal). For example, a system which generates summaries of medical data can be evaluated by giving these summaries to doctors, and assessing whether the summaries help doctors make better decisions.
- human ratings: give the generated text to a person, and ask him or her to rate the quality and usefulness of the text.
- metrics: compare generated texts to texts written by people from the same input data, using an automatic metric such as BLEU.
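The metric-based technique can be illustrated with a much-simplified overlap score. The sketch below computes clipped n-gram precision, which is one ingredient of BLEU; it is not BLEU itself (there is no brevity penalty and no combination of several n-gram orders), tokenisation is a plain whitespace split, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> list:
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as often as it appears in
    the reference, so repeating a matching word does not inflate the score.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


if __name__ == "__main__":
    generated = ("Grass pollen levels for Friday have increased from the "
                 "moderate to high levels of yesterday with values of around "
                 "6 to 7 across most parts of the country.")
    human = ("Pollen counts are expected to remain high at level 6 over most "
             "of Scotland, and even level 7 in the south east.")
    print(clipped_precision(generated, human, n=1))
```

A score like this rewards word overlap with the human text only, so two forecasts that convey the same information in different words can score poorly against each other.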
Generally speaking, what we ultimately want to know is how useful NLG systems are at helping people, which is what the first of the above techniques (task-based evaluation) measures. However, task-based evaluations are time-consuming and expensive, and can be difficult to carry out (especially if they require subjects with specialised expertise, such as doctors). Hence (as in other areas of NLP) task-based evaluations are the exception, not the norm.
In recent years researchers have started trying to assess how well human ratings and metrics correlate with (predict) task-based evaluations. Much of this work is being conducted in the context of Generation Challenges shared-task events. Initial results suggest that human ratings are much better than metrics in this regard. In other words, human ratings usually do predict task-effectiveness at least to some degree (although there are exceptions), while ratings produced by metrics often do not predict task-effectiveness well. These results are very preliminary, and better data will hopefully be available soon. In any case, human ratings are currently the most popular evaluation technique in NLG; this is in contrast to machine translation, where metrics are very widely used.
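One common way to quantify how well one kind of score predicts another is a correlation coefficient computed over a set of systems. The sketch below implements Pearson's correlation from its definition; the function name is hypothetical, and the text above does not say which statistic the studies in question used, so this is only one plausible choice.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

In use, `xs` would hold one human rating (or metric score) per system and `ys` the task-based score of the same systems in the same order; a value near 1 would indicate that the cheaper measure ranks systems much as the task-based evaluation does.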
External links
- ACL Special Interest Group on Generation (SIGGEN)
- SIGGEN part of ACL Anthology (contains NLG research papers)
- ACL NLG Portal (contains list of NLG resources)
- Bateman and Zock's list of NLG systems
- Introduction: an open-ended review of the state of the art, including many references (last update: September 2002)
- KPML: general-purpose natural language generation system
- Yseop: business-oriented natural language generation system
- SimpleNLG: open source Java library to assist in NLG (currently English only)
- SimpleNLG-EnFr: open source Java library, an adaptation of SimpleNLG which adds French support


