Example-based machine translation
The example-based machine translation (EBMT) approach to machine translation is often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach to machine learning.

At the foundation of example-based machine translation is the idea of translation by analogy. Applied to human translation, this idea rejects the view that people translate sentences by performing deep linguistic analysis. Instead, it holds that people translate by first decomposing a sentence into phrases, then translating these phrases, and finally composing the translated fragments into one sentence. The phrases themselves are translated by analogy to previous translations. Example-based machine translation encodes this principle in the example translations that are used to train such a system.
Example of a bilingual corpus

English                          Japanese
How much is that red umbrella?   Ano akai kasa wa ikura desu ka.
How much is that small camera?   Ano chiisai kamera wa ikura desu ka.


Example-based machine translation systems are trained from bilingual parallel corpora, which contain sentence pairs like those shown in the table. Each sentence pair consists of a sentence in one language and its translation into another. The pairs in the table form a minimal pair, meaning that the sentences differ by just one element. Such sentences make it simple to learn translations of subsentential units. For example, an example-based machine translation system would learn three units of translation:
  1. How much is that X ? corresponds to Ano X wa ikura desu ka.
  2. red umbrella corresponds to akai kasa
  3. small camera corresponds to chiisai kamera
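The extraction of these three units from the minimal pair can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the method of any particular EBMT system: the function names (`tokenize`, `split_on_difference`, `learn_units`) are hypothetical, and the sketch assumes the two sentences differ in exactly one contiguous span on each side, which is found by stripping the shared prefix and suffix.

```python
import re

def tokenize(sentence):
    # Split into words and standalone punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

def split_on_difference(a, b):
    """Return (prefix, middle_a, middle_b, suffix) for two token lists
    that share a common prefix and suffix."""
    limit = min(len(a), len(b))
    p = 0
    while p < limit and a[p] == b[p]:
        p += 1
    s = 0
    while s < limit - p and a[-1 - s] == b[-1 - s]:
        s += 1
    return a[:p], a[p:len(a) - s], b[p:len(b) - s], a[len(a) - s:]

def learn_units(pair1, pair2):
    """Derive one template and two phrase pairs from a minimal pair.
    Assumes a single differing span per language (hypothetical helper)."""
    (src1, tgt1), (src2, tgt2) = pair1, pair2
    sp, sm1, sm2, ss = split_on_difference(tokenize(src1), tokenize(src2))
    tp, tm1, tm2, ts = split_on_difference(tokenize(tgt1), tokenize(tgt2))
    template = (" ".join(sp + ["X"] + ss), " ".join(tp + ["X"] + ts))
    unit1 = (" ".join(sm1), " ".join(tm1))
    unit2 = (" ".join(sm2), " ".join(tm2))
    return template, unit1, unit2

corpus = [
    ("How much is that red umbrella?", "Ano akai kasa wa ikura desu ka."),
    ("How much is that small camera?", "Ano chiisai kamera wa ikura desu ka."),
]
template, unit1, unit2 = learn_units(corpus[0], corpus[1])
print(template)
print(unit1)
print(unit2)
```

Real corpora rarely contain such clean minimal pairs, so practical systems rely on more general alignment techniques; the prefix/suffix comparison here is only meant to make the idea concrete.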


These units can be composed to produce novel translations. For example, a system trained on text containing the sentences "President Kennedy was shot dead during the parade." and "The convict escaped on July 15th." could translate the sentence "The convict was shot dead during the parade." by substituting the appropriate parts of the stored sentences.
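The substitution step can be illustrated with the umbrella and camera units from the table, since the article gives no translations for the Kennedy sentences. The sketch below is a hypothetical minimal recombiner, not a description of an actual system: `TEMPLATES`, `PHRASES`, and `translate` are illustrative names, and each template is assumed to contain exactly one slot, marked `X`.

```python
import re

# Units of translation taken from the example table.
TEMPLATES = [("How much is that X ?", "Ano X wa ikura desu ka .")]
PHRASES = {"red umbrella": "akai kasa", "small camera": "chiisai kamera"}

def tokenize(sentence):
    # Split into words and standalone punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

def translate(sentence, templates=TEMPLATES, phrases=PHRASES):
    """Match the input against each source template, translate the slot
    filler with the phrase table, and fill the target template.
    Returns None when no stored example covers the input."""
    tokens = tokenize(sentence)
    for src, tgt in templates:
        src_tokens = src.split()
        slot = src_tokens.index("X")
        prefix, suffix = src_tokens[:slot], src_tokens[slot + 1:]
        if len(tokens) <= len(prefix) + len(suffix):
            continue  # no room for a filler
        if tokens[:len(prefix)] != prefix:
            continue
        if suffix and tokens[-len(suffix):] != suffix:
            continue
        filler = " ".join(tokens[len(prefix):len(tokens) - len(suffix)])
        if filler in phrases:
            return " ".join(
                phrases[filler] if t == "X" else t for t in tgt.split()
            )
    return None

print(translate("How much is that small camera?"))
print(translate("Where is the station?"))
```

A phrase pair learned from any other example could fill the same slot, which is how a system of this kind would produce sentences it has never seen as a whole.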

Other approaches to machine translation, including statistical machine translation, also use bilingual corpora to learn the process of translation.

Example-based machine translation was first suggested by Makoto Nagao in 1984. It soon attracted the attention of scientists in the field of natural language processing.

EBMT is best suited for sub-language phenomena like phrasal verbs.

Phrasal verbs have highly context-dependent meanings. They are a common feature of English and consist of a verb followed by an adverb, a preposition, or both. The adverb or preposition is called the particle of the verb. Phrasal verbs carry specialized, context-specific meanings that often cannot be derived from the meanings of their constituents, so word-for-word translation from the source to the target language is almost always ambiguous.

As an example, consider the phrasal verb "put on" and its Hindi translations. It may be used in either of the following ways:
Ram put on the lights. (switched on; Hindi: jalana)
Ram put on a cap. (wore; Hindi: pahenna)

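By matching an input sentence against stored examples such as these, an EBMT system can determine the context of the sentence and choose the appropriate translation.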
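One way to picture this is a nearest-example lookup. The sketch below is an assumption made for illustration only: it scores similarity by simple word overlap (Jaccard similarity), which is a stand-in for whatever similarity measure a real system would use, and the names `EXAMPLES`, `words`, `jaccard`, and `sense_of_put_on` are hypothetical.

```python
import re

# The two stored examples from the text, paired with the Hindi verb.
EXAMPLES = [
    ("Ram put on the lights.", "jalana"),
    ("Ram put on a cap.", "pahenna"),
]

def words(sentence):
    # Lower-cased set of word tokens, ignoring punctuation.
    return set(re.findall(r"\w+", sentence.lower()))

def jaccard(a, b):
    # Size of the intersection divided by size of the union.
    return len(a & b) / len(a | b)

def sense_of_put_on(sentence, examples=EXAMPLES):
    """Return the Hindi verb attached to the most similar stored example."""
    query = words(sentence)
    best = max(examples, key=lambda ex: jaccard(query, words(ex[0])))
    return best[1]

print(sense_of_put_on("Sita put on a red cap."))
print(sense_of_put_on("Please put on the lights."))
```

Word overlap is a crude signal: it would not tell "put on a hat" apart from "put on the lamp" without a stored example or a resource that relates "hat" to "cap", so this should be read as a toy model of matching by analogy.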

See also

  • Machine translation
  • Machine learning
  • Statistical machine translation
  • Translation memory

External links

  • Cunei - an open source platform for data-driven machine translation that grew out of research in EBMT, but also includes recent advances from the SMT field
  • Marclator – a free/open-source marker-driven example-based machine translation system based on the Marker Hypothesis
  • Yeminli Sözlük - An English - Turkish corpus/dictionary of millions of pre-translated sentences
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 