NIST (metric)
NIST is a method for evaluating the quality of text which has been translated using machine translation. Its name comes from the US National Institute of Standards and Technology.
It is based on the BLEU metric, but with some alterations. Where BLEU simply calculates n-gram precision, giving equal weight to each n-gram, NIST also calculates how informative a particular n-gram is. That is, when a correct n-gram is found, the rarer that n-gram is, the more weight it is given.
For example, if the bigram "on the" is correctly matched, it receives a lower weight than a correct match of the bigram "interesting calculations", because the latter is less likely to occur.
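The weighting can be illustrated with a minimal Python sketch. It assumes the commonly cited definition of the information weight, which is not stated in this article: the base-2 logarithm of the number of occurrences of the n-gram's first n-1 words divided by the number of occurrences of the full n-gram, both counted in the reference translations (for single words, the numerator is the total word count). The function names and the toy reference sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def info_weights(references, max_n=2):
    """Information weight of every n-gram (n <= max_n) in the references.

    Assumed formula: log2(count(first n-1 words) / count(full n-gram)).
    For unigrams the numerator is the total number of reference words.
    """
    counts = Counter()
    total_words = 0
    for ref in references:
        total_words += len(ref)
        for n in range(1, max_n + 1):
            counts.update(ngrams(ref, n))

    weights = {}
    for gram, count in counts.items():
        prefix_count = counts[gram[:-1]] if len(gram) > 1 else total_words
        weights[gram] = math.log2(prefix_count / count)
    return weights


# Toy reference set (made up for illustration only).
references = [
    "the cat sat on the mat".split(),
    "the book is on the table".split(),
    "interesting results depend on the data".split(),
    "interesting calculations are interesting work".split(),
]

weights = info_weights(references)
print(weights[("on", "the")])
print(weights[("interesting", "calculations")])
```

In this toy data "on" is always followed by "the", so the bigram carries no extra information and its weight is zero, while "interesting" is followed by "calculations" only one time in three, so that bigram receives a higher weight.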
NIST also differs from BLEU in its calculation of the brevity penalty, insofar as small variations in translation length do not impact the overall score as much.
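The difference can be sketched numerically. The following Python snippet assumes the usual published forms of the two penalties, neither of which is given in this article: for BLEU, exp(1 - r/c) when the candidate length c is shorter than the reference length r; for NIST, exp(beta * ln(min(c/r, 1))^2), with beta chosen so that the penalty is 0.5 when the candidate is two thirds of the reference length. The function names are hypothetical, and lengths are assumed to be positive.

```python
import math


def bleu_brevity_penalty(sys_len, ref_len):
    """BLEU-style penalty (assumed form): exp(1 - r/c) for short output."""
    if sys_len >= ref_len:
        return 1.0
    return math.exp(1 - ref_len / sys_len)


def nist_brevity_penalty(sys_len, ref_len):
    """NIST-style penalty (assumed form): exp(beta * ln(min(c/r, 1))^2).

    beta is set so that the penalty equals 0.5 at c/r = 2/3.
    """
    beta = math.log(0.5) / math.log(1.5) ** 2
    ratio = min(sys_len / ref_len, 1.0)
    return math.exp(beta * math.log(ratio) ** 2)


# Compare both penalties for outputs slightly and clearly shorter
# than a 100-word reference.
for sys_len in (100, 95, 90, 67):
    print(sys_len,
          round(bleu_brevity_penalty(sys_len, 100), 4),
          round(nist_brevity_penalty(sys_len, 100), 4))
```

Under these assumptions, a translation that is 5% shorter than the reference is penalised less by the NIST form than by the BLEU form, which matches the statement that small length variations matter less.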
See also
- BLEU
- F-Measure
- METEOR
- ROUGE (metric)
- Word Error Rate (WER)
- Noun-Phrase Chunking