Caitra
Caitra is a translation
tool developed by the University of Edinburgh. This computer-assisted translation (CAT) tool is provided through an online platform, accessible at http://tool.statmt.org/ or http://www.caitra.org. It is based on Ajax-style Web 2.0 technologies and the Moses decoder. The tool is implemented with Ruby on Rails, an open-source web framework, and C++, a statically typed, multi-paradigm programming language.
Caitra helps human translators by offering suggestions and alternative translations, which makes the translation process simpler and faster.
Introduction
Machine translation (MT) systems are usually used by readers who do not need a high-quality translation; they want fast access to the content of a foreign-language text. Professional translators, on the other hand, need more advanced machine translation tools that make their work easier and help them produce high-quality translations for their clients.
In recent years MT has developed considerably, but it is not always suitable for professional translators: raw MT output does not necessarily help them and can even create extra work. Tools with post-editing facilities have therefore been developed as a middle ground between conventional MT and human translation, integrating the two to achieve better results.
The Trans-Type project (Langlais et al., 2000) pioneered the use of MT as an aid to human translators. The tool suggests different translations for a segment. The translator may accept them or overwrite them with their own translation, which prompts the tool to propose new translations.
The School of Informatics and the Machine Translation Group of the University of Edinburgh created a research program, CAITRA. Its aims were to analyze the benefits of different types of MT, to explore the interaction between machine and user, and so to develop new computer-assisted translation (CAT) tools.
Properties
Caitra is programmed with the open-source web framework Ruby on Rails (Thomas and Hansson, 2008). The online platform uses Ajax-style Web 2.0 technologies (Raymond, 2007) connected to a MySQL database back-end. The machine translation back-end is powered by Moses (Koehn et al., 2007), a statistical phrase-based MT system. Code written in the C++ programming language speeds up the generation of translation suggestions.
The tool is provided online so that this type of machine translation can be researched widely and users' interaction with the tool can be studied in depth. Being online also gives the translation community access to the tool and lets the developers gather its feedback. Caitra is available at http://www.caitra.org/.
A simple text box is the link between the user and the tool. When the user clicks the “Upload” icon, Caitra processes the text typed into the box. Processing may take a few minutes. During this time Caitra finds several candidate translations and selects one of them by default.
Once processing is finished, the interface offers translators several kinds of assistance. The unit of translation is the sentence, so Caitra works on one sentence at a time.
Interactive Machine translation
The Trans-Type project (Langlais et al., 2000) investigated interactive machine translation in depth. In this approach, each sentence segment is translated with the help of a CAT tool that suggests several alternative translations. Human translators may choose one of them, or type their own translation if they do not like the suggestions. The process is similar to the auto-completion found in many office programs.
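The auto-completion behaviour can be illustrated with a minimal Python sketch. It assumes that the system already holds a list of candidate translations of the current sentence, each with a probability. The `candidates` list, the `autocomplete` function and its `max_words` parameter are hypothetical and are not taken from Caitra. Only candidates that begin exactly with the typed text are considered. The suggestion finishes the word being typed and then adds a few more words from the most probable candidate.

```python
import re
from typing import List, Optional, Tuple


def autocomplete(
    candidates: List[Tuple[str, float]], typed: str, max_words: int = 2
) -> Optional[str]:
    """Suggest text to append to `typed`, taken from the most probable candidate.

    Only candidates that start exactly with `typed` are considered; the
    suggestion finishes the word being typed and adds up to `max_words` words.
    """
    matching = [(text, prob) for text, prob in candidates if text.startswith(typed)]
    if not matching:
        return None  # no candidate agrees with what the user has typed
    best_text, _ = max(matching, key=lambda item: item[1])
    rest = best_text[len(typed):]
    # \S* completes the current (possibly partial) word; each \s+\S+ adds one more word.
    pattern = r"\S*(?:\s+\S+){0," + str(max_words) + "}"
    return re.match(pattern, rest).group(0)


candidates = [
    ("the house is small", 0.42),
    ("the building is small", 0.28),
    ("the house is little", 0.18),
]
print(autocomplete(candidates, "the ho"))               # use is small
print(autocomplete(candidates, "the b", max_words=1))   # uilding is
```

Exact prefix matching returns `None` as soon as the user's text departs from every candidate. Tolerating such differences requires a fuzzier match, such as an edit distance.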
Predictions are generated by the statistical machine translation system. They are offered as short phrases, in line with the statistical phrase-based translation model. Showing only a few words at a time also keeps the user from being visually overloaded. The University of Edinburgh is still investigating the best length for these suggestions. For now, short phrases are used because they help users without distracting them. The suggestions and the user actions are stored in a large database. During the user interaction, Caitra quickly matches the user input against the search graph of candidate translations using a string edit distance measure. The prediction is the optimal completion path that matches the user input with (a) minimal string edit distance and (b) highest sentence translation probability. This computation takes place on the server and is implemented in C++, as Philipp Koehn explains.[1]
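The matching step can be sketched in Python under several simplifying assumptions:

- The search graph is a tiny hypothetical directed acyclic graph whose edges carry a target phrase and a log-probability.
- Every path is enumerated. This is feasible only for toy graphs; a real decoder graph would need dynamic programming.
- Edit distance is computed over words rather than characters.

The names `all_paths`, `edit_distance` and `predict` are illustrative and do not reproduce Caitra's C++ code. For each path and each prefix of that path, the sketch compares the prefix with the user input. It ranks the matches first by (a) edit distance and then by (b) path probability. The remainder of the winning path is the suggested completion.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy search graph: node -> [(next node, target phrase, log-probability)].
Graph = Dict[int, List[Tuple[int, str, float]]]


def all_paths(graph: Graph, start: int, end: int) -> List[Tuple[List[str], float]]:
    """Enumerate every path from start to end as (target words, summed log-probability)."""
    if start == end:
        return [([], 0.0)]
    paths = []
    for nxt, phrase, logp in graph.get(start, []):
        for words, score in all_paths(graph, nxt, end):
            paths.append((phrase.split() + words, logp + score))
    return paths


def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def predict(graph: Graph, start: int, end: int, user_input: str) -> Tuple[str, int, float]:
    """Return (completion, edit distance, path log-probability) for the best-matching path."""
    typed = user_input.split()
    best: Optional[Tuple[Tuple[int, float], str]] = None
    for words, logp in all_paths(graph, start, end):
        # Try every prefix of the path as the part the user has already typed.
        for k in range(len(words) + 1):
            key = (edit_distance(typed, words[:k]), -logp)  # (a) distance, then (b) probability
            if best is None or key < best[0]:
                best = (key, " ".join(words[k:]))
    if best is None:
        raise ValueError("the graph has no path from start to end")
    (distance, neg_logp), completion = best
    return completion, distance, -neg_logp


graph: Graph = {
    0: [(1, "the house", math.log(0.6)), (1, "the building", math.log(0.4))],
    1: [(2, "is small", math.log(0.7)), (2, "is little", math.log(0.3))],
}
print(predict(graph, 0, 2, "the building")[:2])  # ('is small', 0)
print(predict(graph, 0, 2, "the hous is")[:2])   # ('small', 1)
```

In the second call no path matches the typo "hous" exactly. Several paths tie at distance 1, so the most probable one ("the house is small") supplies the completion.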
Once the user accepts a suggestion or types a new segment, a new suggestion is displayed. This update is very fast, taking less than a second.
The acceptance rate of suggestions depends on the language pair and the difficulty of the text. Preliminary studies of CAITRA suggest that users usually accept 50-80% of the predictions generated by the system.
Translation process
Once the text has been uploaded and after a wait of a few minutes, users can view the result of the machine translation and edit the text on the basis of the predictions. The prediction table is displayed by clicking the edit icon.
The text is divided into sentences, which are in turn divided into smaller units. Predictions for these units appear in a box, and the most likely suggestion is shown in a different colour at the top of the table. Predictions are accepted by clicking on them, and the system updates the selection according to the user input.
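The data behind the prediction table can be sketched in Python under stated assumptions:

- Sentences are found with a naive punctuation-based regular expression.
- Units are fixed runs of at most three words. The article does not say how Caitra chooses units and notes that the right length is still being investigated.
- Each unit's box holds hypothetical candidate translations with probabilities.

The names `split_sentences`, `split_units` and `PredictionBox` are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_units(sentence: str, max_words: int = 3) -> List[str]:
    """Cut a sentence into consecutive units of at most `max_words` words."""
    words = sentence.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


@dataclass
class PredictionBox:
    """Candidate translations for one unit, with the current selection."""
    options: Dict[str, float]  # translation -> probability
    selected: str = ""

    def __post_init__(self) -> None:
        # The most likely option is selected by default.
        if not self.selected and self.options:
            self.selected = self.ranked()[0]

    def ranked(self) -> List[str]:
        # Highest probability first: the entry shown at the top of the table.
        return sorted(self.options, key=lambda t: self.options[t], reverse=True)

    def accept(self, choice: str) -> None:
        # Clicking a prediction makes it the current selection.
        if choice not in self.options:
            raise ValueError(f"unknown option: {choice!r}")
        self.selected = choice


sentences = split_sentences("Das Haus ist klein. Es ist sehr alt!")
print(sentences)                  # ['Das Haus ist klein.', 'Es ist sehr alt!']
print(split_units(sentences[1]))  # ['Es ist sehr', 'alt!']
box = PredictionBox({"the house": 0.6, "the home": 0.3, "the building": 0.1})
box.accept("the home")
print(box.selected)               # the home
```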
The database contains large numbers of pairs of source texts and their translations. The most likely prediction results from previous matches in the database. Users' choices are stored in the database for use in future translations.
These predictions help not only professional translators but also novice translators who do not know the vocabulary, and even people with no knowledge of the foreign language.
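The idea of ranking predictions by earlier choices can be sketched with Python's built-in `sqlite3` module. Here an in-memory database stands in for Caitra's MySQL back-end. The table `choices` and the functions `open_store`, `record_choice` and `ranked_targets` are hypothetical. The sketch counts how often each translation was chosen for a source unit and lists the most frequent first. This is one plausible reading of the description above, not a documented Caitra schema.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one table of user choices."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE choices (source TEXT NOT NULL, target TEXT NOT NULL)")
    return conn


def record_choice(conn: sqlite3.Connection, source: str, target: str) -> None:
    """Store the translation a user accepted or typed for a source unit."""
    conn.execute("INSERT INTO choices (source, target) VALUES (?, ?)", (source, target))
    conn.commit()


def ranked_targets(conn: sqlite3.Connection, source: str) -> List[Tuple[str, int]]:
    """Return (translation, times chosen) for a source unit, most frequent first."""
    rows = conn.execute(
        "SELECT target, COUNT(*) AS n FROM choices WHERE source = ? "
        "GROUP BY target ORDER BY n DESC, target ASC",
        (source,),
    )
    return rows.fetchall()


conn = open_store()
for target in ["the house", "the home", "the house"]:
    record_choice(conn, "das Haus", target)
print(ranked_targets(conn, "das Haus"))  # [('the house', 2), ('the home', 1)]
```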
Post-editing Machine Translation process
Users can review their translation and make any changes needed to correct possible mistakes. The changes appear in the output display.
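Computer-assisted reviewing tools show what changed between the machine output and the user's corrected version. The following Python sketch does this kind of comparison with the standard library's `difflib.SequenceMatcher`, applied to lists of words. The function name `word_changes` is hypothetical, and the article does not document how Caitra's output display computes changes.

```python
import difflib
from typing import List, Tuple


def word_changes(mt_output: str, post_edited: str) -> List[Tuple[str, str, str]]:
    """List word-level edits as (operation, MT words, post-edited words)."""
    before, after = mt_output.split(), post_edited.split()
    matcher = difflib.SequenceMatcher(a=before, b=after)
    return [
        (tag, " ".join(before[i1:i2]), " ".join(after[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replacements, insertions and deletions
    ]


print(word_changes("he go to school", "he goes to the school"))
# [('replace', 'go', 'goes'), ('insert', '', 'the')]
```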
User's activity
Caitra records in the database how long users take to accept a prediction or to write their own translation.
Each action carries a different weight for future predictions, depending on the type of action and on the time the user needed to perform the translation. Every action, pause or movement is relevant for improving future translations.
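The kind of activity data described here can be summarised with a short Python sketch. The `Event` record, its two action kinds and the `summarize` function are hypothetical. The kind `"accept"` marks a clicked prediction, and `"type"` marks text the user wrote. The sketch computes the pause before each action and the share of actions that were accepted predictions. This share is a ratio of the kind reported in the preliminary studies. The article does not describe how Caitra actually weights these signals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    """One logged user action."""
    time: float  # seconds since the session started
    kind: str    # "accept" (prediction clicked) or "type" (own text entered)


def summarize(events: List[Event]) -> Tuple[float, List[float]]:
    """Return (share of accepted predictions, pause in seconds before each action)."""
    if not events:
        return 0.0, []
    ordered = sorted(events, key=lambda e: e.time)
    accepted = sum(1 for e in ordered if e.kind == "accept")
    pauses = []
    previous = 0.0
    for event in ordered:
        pauses.append(event.time - previous)  # time spent before this action
        previous = event.time
    return accepted / len(ordered), pauses


log = [Event(1.5, "accept"), Event(4.0, "type"), Event(5.0, "accept"), Event(9.0, "accept")]
print(summarize(log))  # (0.75, [1.5, 2.5, 1.0, 4.0])
```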
External links
- Ruby on Rails official website
- Caitra Official website
- Statistical Machine Translation Group at the University of Edinburgh
- Moses official website, University of Edinburgh
See also
- Machine translation
- Computer-assisted translation
- Computer-assisted reviewing