Parallel text
A parallel text is a text placed alongside its translation or translations. Parallel text alignment is the identification of the corresponding sentences in both halves of the parallel text. The Loeb Classical Library and the Clay Sanskrit Library are two examples of dual-language series of texts. Reference Bibles may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; Origen's Hexapla (Greek for "sixfold") placed six versions of the Old Testament side by side. The most famous example of a parallel text is the Rosetta Stone.
Large collections of parallel texts are called parallel corpora (see text corpus). Alignments of parallel corpora at sentence level are a prerequisite for many areas of linguistic research.
During translation, sentences can be split, merged, deleted, inserted or reordered by the translator. This makes alignment a non-trivial task.
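To make the difficulty concrete, the following is a minimal sketch of a sentence aligner that uses dynamic programming over sentence lengths. It is an illustration under stated assumptions, not the method of any particular alignment tool: the cost function (relative difference in character length) and the fixed penalties in `PENALTY` are hypothetical choices made for this example. It handles splits, merges, deletions and insertions, but assumes the translator kept the overall sentence order, so reordering is not covered.

```python
from typing import List, Tuple

# Allowed groupings as (source sentences consumed, target sentences consumed):
# 1-1 is a plain match, 1-0 and 0-1 are a deletion and an insertion,
# 2-1 and 1-2 are a merge and a split.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

# Hypothetical fixed penalties (an assumption of this sketch);
# a real aligner would estimate such values from data.
PENALTY = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def bead_cost(src: List[str], tgt: List[str], bead: Tuple[int, int]) -> float:
    """Relative character-length mismatch of the group, plus the bead penalty."""
    ls = sum(len(s) for s in src)
    lt = sum(len(t) for t in tgt)
    return abs(ls - lt) / max(ls, lt, 1) + PENALTY[bead]


def align(source: List[str], target: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return the cheapest monotonic alignment as a list of (source, target) groups."""
    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward pass: extend every reachable prefix pair by each allowed bead.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(source[i:ni], target[j:nj], (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)

    # Backward pass: follow the stored beads from the end to the start.
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        groups.append((source[i - di:i], target[j - dj:j]))
        i, j = i - di, j - dj
    groups.reverse()
    return groups


if __name__ == "__main__":
    english = ["The cat sat on the mat.", "It was happy.", "Then it slept."]
    french = ["Le chat était assis sur le tapis.", "Il était content, puis il a dormi."]
    for src_group, tgt_group in align(english, french):
        print(src_group, "<->", tgt_group)
```

Length alone is a weak signal, so a practical aligner would usually combine it with lexical cues such as cognates or a bilingual dictionary.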
Bitext
In the field of translation studies, a bitext is a merged document composed of both source- and target-language versions of a given text.
Bitexts are generated by a piece of software called an alignment tool, or a bitext tool, which automatically aligns the original and translated versions of the same text. The tool generally matches these two texts sentence by sentence. A collection of bitexts is called a bitext database or a bilingual corpus, and can be consulted with a search tool.
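As a rough illustration of what consulting a bitext with a search tool might look like, the sketch below models a bitext as an ordered list of aligned segments and returns each hit together with its neighbouring segments. The `Segment` class, the `search` function and the `context` parameter are hypothetical names introduced for this example; they do not describe the interface of any existing bitext tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One aligned pair: a source sentence and its translation."""
    source: str
    target: str


def search(bitext: List[Segment], query: str, context: int = 1) -> List[List[Segment]]:
    """Find segments whose source text contains `query` (case-insensitive).

    Each hit is returned with up to `context` neighbouring segments on either
    side, which is possible because the bitext keeps the original order.
    """
    needle = query.lower()
    hits = []
    for i, segment in enumerate(bitext):
        if needle in segment.source.lower():
            start = max(0, i - context)
            hits.append(bitext[start:i + context + 1])
    return hits


if __name__ == "__main__":
    bitext = [
        Segment("The cat sat on the mat.", "Le chat était assis sur le tapis."),
        Segment("It was happy.", "Il était content."),
        Segment("Then it slept.", "Puis il a dormi."),
    ]
    for hit in search(bitext, "happy"):
        for segment in hit:
            print(segment.source, "|", segment.target)
```

Showing the surrounding segments lets a translator judge whether an earlier rendering fits the passage at hand.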
Bitexts and translation memories
The concept of the bitext shows certain similarities with that of the translation memory. Generally, the most salient difference between a bitext and a translation memory is that a translation memory is a database in which the segments (matched sentences) are stored in a way that is totally unrelated to their original context; the original sentence order is lost. A bitext retains the original sentence order. However, some implementations of translation memory, such as Translation Memory eXchange (TMX), a standard XML format for exchanging translation memories between computer-assisted translation (CAT) programs, allow the original order of sentences to be preserved.
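The contrast can be sketched with two toy data structures. This is a deliberately simplified model, assumed for illustration only: the bitext is an ordered list of sentence pairs, and the translation memory is a lookup table keyed by source segment. The function name `to_translation_memory` is hypothetical, and real translation memory systems store more than this.

```python
from typing import Dict, List, Tuple

# A bitext: aligned pairs kept in document order.
BITEXT: List[Tuple[str, str]] = [
    ("Are you coming?", "Tu viens ?"),
    ("Yes.", "Oui."),
    ("Aren't you coming?", "Tu ne viens pas ?"),
    ("Yes.", "Si."),
]


def to_translation_memory(bitext: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Index the pairs by source segment, discarding their position in the text."""
    memory: Dict[str, List[str]] = {}
    for source, target in bitext:
        translations = memory.setdefault(source, [])
        if target not in translations:
            translations.append(target)
    return memory


if __name__ == "__main__":
    memory = to_translation_memory(BITEXT)
    # The memory offers both renderings of "Yes." but no longer records
    # which question each one answered.
    print(memory["Yes."])
    # The bitext still does: the preceding pair is one index away.
    print(BITEXT[2], BITEXT[3])
```

In the lookup table the two renderings of "Yes." sit side by side with nothing to tell them apart, whereas the ordered list keeps each one next to the question it answers.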
Bitexts are designed to be consulted by a human translator, not by a machine. As such, small alignment errors or minor discrepancies that would cause a translation memory to fail are of no importance.
In his original 1988 article on the concept, Brian Harris also posited that bitext represents how translators hold their source and target texts together in their mental working memories as they progress. However, this hypothesis has not been followed up.
See also
- Computer-assisted reviewing
- Machine translation
- Natural language processing
- Polyglot (book)
- Ruby character
Parallel corpora
- The JRC-Acquis Multilingual Parallel Corpus of the total body of European Union (EU) law: Acquis Communautaire with 231 language pairs
- European Parliament Proceedings Parallel Corpus 1996-2006
- The Opus project aims at collecting freely available parallel corpora
- COMPARA - Portuguese/English parallel corpora
- TERMSEARCH - English/Russian/French parallel corpora (major international treaties, conventions, agreements, etc.)
- Nunavut Hansard - English/Inuktitut parallel corpus
- ParaSol - A parallel corpus of Slavic and other languages
- Glosbe: Multilanguage parallel corpora with online search interface
Documentation
- Parallel text processing bibliography by J. Veronis and M.-D. Mahimon
- Proceedings of the 2003 Workshop on Building and Using Parallel Texts
- Proceedings of the 2005 Workshop on Building and Using Parallel Texts