Computer-assisted translation
Computer-assisted translation, computer-aided translation, or CAT is a form of translation wherein a human translator translates texts using computer software designed to support and facilitate the translation process.

Computer-assisted translation is sometimes called machine-assisted, or machine-aided, translation.

Computer-assisted translation and machine translation

Some advanced computer-assisted translation solutions include controlled machine translation (MT). This type of technology is widely known among professional translators and terminologists, and is also available to individual translators who wish to invest in it. Higher-priced MT modules generally give the translator a more complex set of tools, which may include terminology management features and various other linguistic tools and utilities. Carefully customized user dictionaries based on correct terminology significantly improve the accuracy of MT and, as a result, can increase the efficiency of the entire translation process.

Overview

Computer-assisted translation is a broad and imprecise term covering a range of tools, from the fairly simple to the more complicated. These can include:
  • Spell checkers, either built into word processing software, or add-on programs;
  • Grammar checkers, again either built into word processing software, or add-on programs;
  • Terminology managers, allowing translators to manage their own terminology bank in an electronic form. This can range from a simple table created in the translator's word processing software or spreadsheet, to a database created in a program such as FileMaker Pro, to, for more robust (and more expensive) solutions, specialized software packages such as LogiTerm, MultiTerm and Termex.
  • Dictionaries on CD-ROM, either unilingual or bilingual
  • Terminology databases, either on CD-ROM or accessible through the Internet (such as TERMIUM Plus or the Grand dictionnaire terminologique from the Office québécois de la langue française)
  • Full-text search tools (or indexers), which allow the user to query already translated texts or reference documents of various kinds. In the translation industry one finds such indexers as Naturel, ISYS Search Software and dtSearch.
  • Concordancers, which are programs that retrieve instances of a word or an expression and their respective context in a monolingual, bilingual or multilingual corpus, such as a bitext or a translation memory.
  • Bitexts, a fairly recent development, the result of merging a source text and its translation, which can then be analyzed using a full-text search tool or a concordancer.
  • Project management software that allows linguists to structure complex translation projects, assign the various tasks to different people, and track the progress of each of these tasks.
  • Translation memory tools (TM tools), consisting of a database of text segments in a source language and their translations in one or more target languages.
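
As a rough illustration of the concordancers mentioned in the list above, the following sketch retrieves each occurrence of a word together with a few words of context on either side (a keyword-in-context listing). The function name `concordance`, the `width` parameter and the sample text are hypothetical, and the simple word tokenization is a simplifying assumption; real concordancers also handle multi-word expressions and bilingual corpora.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left context, match, right context) for each hit of keyword."""
    tokens = re.findall(r"\w+", text)  # naive tokenization: runs of word characters
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Made-up sample corpus, for illustration only.
sample = "The translator opens the memory. The memory returns a match."
for left, word, right in concordance(sample, "memory"):
    print(f"{left:>25} [{word}] {right}")
```

Each hit is shown with up to `width` words on each side, which is usually enough for a translator to judge how the term was used.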

Translation memory software

Translation memory programs store previously translated source texts and their equivalent target texts in a database and retrieve related segments during the translation of new texts.

Such programs split the source text into manageable units known as "segments". A source-text sentence or sentence-like unit (a heading, a title or an element in a list) may be considered a segment, or texts may be segmented into larger units such as paragraphs or smaller ones such as clauses. As the translator works through a document, the software displays each source segment in turn and, if it finds a matching source segment in its database, offers the previous translation for re-use. If it does not, the program allows the translator to enter a translation for the new segment. Once the translation of a segment is completed, the program stores the new translation and moves on to the next segment.

In the dominant paradigm, the translation memory is, in principle, a simple database of fields containing the source-language segment, the translation of the segment, and other information such as segment creation date, last access, translator name, and so on. Another translation memory approach does not involve the creation of a database, relying on aligned reference documents instead.
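
The database-backed workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design of any particular product: the names `TMEntry`, `TranslationMemory` and `split_segments` are hypothetical, segmentation is a naive split on sentence-final punctuation, only identical segments are matched, and the sample sentences are made up.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TMEntry:
    # Fields mirror the record described in the text: segment, translation, metadata.
    source: str
    target: str
    translator: str
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class TranslationMemory:
    def __init__(self):
        self._entries = {}  # source segment -> TMEntry

    def lookup(self, segment):
        """Return the stored entry for an identical source segment, or None."""
        return self._entries.get(segment)

    def store(self, source, target, translator):
        """Save (or overwrite) the translation of a source segment."""
        self._entries[source] = TMEntry(source, target, translator)

def split_segments(text):
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

tm = TranslationMemory()
tm.store("Press the button.", "Appuyez sur le bouton.", "example-translator")

for segment in split_segments("Press the button. Close the lid."):
    entry = tm.lookup(segment)
    if entry:
        print(f"{segment!r}: reuse {entry.target!r}")
    else:
        print(f"{segment!r}: no match, translator enters a new translation")
```

Keying the store on the source segment keeps exact lookups cheap; retrieving merely similar segments would need a similarity measure on top of this.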

Some translation memory programs function as standalone environments, while others function as an add-on or macro to commercially available word-processing or other business software programs. Add-on programs allow source documents from other formats, such as desktop publishing files, spreadsheets, or HTML code, to be handled using the TM program.

Language search engine software

New to the translation industry, language search engine software is typically an Internet-based system that works similarly to Internet search engines. Rather than searching the Internet, however, a language search engine searches a large repository of translation memories to find previously translated sentence fragments, phrases, whole sentences, even complete paragraphs that match source document segments.

Language search engines are designed to leverage modern search technology to conduct searches based on the source words in context to ensure that the search results match the meaning of the source segments. As with traditional TM tools, the value of a language search engine rests heavily on the translation memory repository it searches against.
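
To give a feel for searching a repository for segments that resemble, rather than equal, a source segment, the sketch below ranks stored segments by character-level similarity using Python's standard `difflib.SequenceMatcher`. This is only a stand-in: the text does not say how language search engines score matches, and the function name `search_repository`, the `threshold` value and the sample repository are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def search_repository(query, repository, threshold=0.6):
    """Return (score, source, target) tuples at or above the threshold, best first."""
    results = []
    for source, target in repository:
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score >= threshold:
            results.append((score, source, target))
    return sorted(results, reverse=True)

# Made-up repository of (source segment, translation) pairs.
repository = [
    ("Press the red button.", "Appuyez sur le bouton rouge."),
    ("Close the lid.", "Fermez le couvercle."),
]
for score, source, target in search_repository("Press the green button.", repository):
    print(f"{score:.2f}  {source}  ->  {target}")
```

A character-based score like this ignores meaning and context, which is exactly the gap that context-aware search is meant to close.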

Terminology management software

Terminology management software gives the translator a means of automatically searching a given terminology database for terms appearing in a document, either by automatically displaying terms in the translation memory software interface window or through the use of hot keys to view the entry in the terminology database. Some programs have other hot key combinations allowing the translator to add new terminology pairs to the terminology database on the fly during translation. Some of the more advanced systems enable translators to check, either interactively or in batch mode, whether the correct source/target term combination has been used within and across the translation memory segments in a given project. Independent terminology management systems also exist. These can provide workflow functionality and visual taxonomy, work as a type of term checker (similar to a spell checker, flagging terms that have not been used correctly), and support other types of multilingual term facet classifications such as pictures, videos, or sound.
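
The batch-mode term check described above can be illustrated as follows. This is a minimal sketch, assuming a termbase held as a plain dictionary and case-insensitive substring matching; the name `check_terms` and the sample data are hypothetical, and real term checkers also have to deal with inflected forms.

```python
def check_terms(segments, termbase):
    """Flag segment pairs where a source term occurs but its approved target term does not."""
    problems = []
    for index, (source, target) in enumerate(segments):
        for src_term, tgt_term in termbase.items():
            # Substring matching is a simplification: it misses inflected forms.
            if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
                problems.append((index, src_term, tgt_term))
    return problems

# Made-up termbase and project segments (source, target).
termbase = {"translation memory": "mémoire de traduction"}
segments = [
    ("Open the translation memory.", "Ouvrez la mémoire de traduction."),
    ("Save the translation memory.", "Enregistrez la base de traduction."),
]
print(check_terms(segments, termbase))
```

Here the second segment is flagged because the approved target term is missing from its translation.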

Alignment software

Alignment programs take completed translations, divide both source and target texts into segments, and attempt to determine which segments belong together in order to build a translation memory database with the content. Many alignment programs allow translators to manually realign mismatched segments. The resulting translation memory file can then be imported into a translation memory program for future translations.
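
A deliberately naive version of this process is sketched below: both texts are split into sentences, segments are paired by position, and pairs whose lengths differ sharply are marked for the manual realignment mentioned above. The positional pairing and the length-ratio check are simplifying assumptions, not the method used by any particular alignment program, and the names `align`, `split_segments` and `max_ratio` are hypothetical.

```python
import re

def split_segments(text):
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def align(source_text, target_text, max_ratio=2.0):
    """Pair segments by position and mark pairs whose lengths differ suspiciously.

    Returns (pairs, leftover source segments, leftover target segments).
    """
    src = split_segments(source_text)
    tgt = split_segments(target_text)
    pairs = []
    for s, t in zip(src, tgt):
        ratio = max(len(s), len(t)) / max(1, min(len(s), len(t)))
        pairs.append({"source": s, "target": t, "needs_review": ratio > max_ratio})
    # Segments without a partner are returned so they can be realigned by hand.
    return pairs, src[len(tgt):], tgt[len(src):]

pairs, extra_src, extra_tgt = align(
    "Press the button. Close the lid.",
    "Appuyez sur le bouton. Fermez le couvercle.",
)
for pair in pairs:
    print(pair)
```

The resulting pairs could then be stored as translation memory entries once a translator has reviewed the flagged ones.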

Interactive machine translation

Interactive machine translation is a paradigm in which the automatic system attempts to predict the translation the human translator is going to produce by suggesting translation hypotheses. These hypotheses may be either the complete sentence or the part of the sentence that is yet to be translated.
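
The interaction can be pictured with the sketch below, which returns the part of a hypothesis that the translator has not yet typed. The list of hypotheses stands in for the output of a machine translation system and is made up for illustration; the name `suggest_completion` is hypothetical, and real systems generate new hypotheses for each prefix rather than filtering a fixed list.

```python
def suggest_completion(hypotheses, typed_prefix):
    """Return the remainder of the first hypothesis that starts with what was typed."""
    for hypothesis in hypotheses:  # assumed to be ranked best first
        if hypothesis.startswith(typed_prefix):
            return hypothesis[len(typed_prefix):]
    return None  # no hypothesis is consistent with the prefix

# Made-up hypotheses for one source sentence.
hypotheses = [
    "Appuyez sur le bouton rouge.",
    "Pressez le bouton rouge.",
]
print(suggest_completion(hypotheses, ""))          # nothing typed: whole sentence is suggested
print(suggest_completion(hypotheses, "Pressez "))  # prefix typed: only the remainder is suggested
```

With an empty prefix the suggestion is the complete sentence; once the translator has typed part of the sentence, only the rest is proposed.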

Comparison of various translation memory tools

(Alphabetical order, free software first, proprietary solutions second.)
| Tool | Supported file formats | OS | Price | License |
|---|---|---|---|---|
| GlobalSight | Text ANSI / ASCII / Unicode for Windows, Text for Apple Macintosh, HTML, XML (ASP.NET, ASP, JSP, XSL), SGML, SVG (Scalable Vector Graphics), MS Word for Windows, MS Excel, MS PowerPoint, RTF, RC, QuarkXPress, Adobe FrameMaker, Adobe PageMaker, Interleaf / Quicksilver, Adobe InDesign | Cross-platform (Java) | | Apache License 2.0 |
| gtranslator | PO | POSIX | | GPL |
| Lokalize | Gettext PO, Qt ts, XLIFF | Cross-platform | | GPL |
| OmegaT | Plain text, HTML, XHTML, StarOffice, OpenOffice.org, OpenDocument (ODF), MS Office Open XML, Help & Manual, HTML Help Compiler (HCC), LaTeX, DokuWiki, QuarkXPress CopyFlow Gold, DocBook, Android Resource, Java Properties, Typo3 LocManager, Mozilla DTD, Windows RC, WiX, ResX, INI files, XLIFF, Gettext PO, SubRip Subtitles, SVG Images | Cross-platform (Java) | | GPL |
| Open Language Tools | XLIFF, HTML/XHTML, XML, DocBook SGML, ASCII, StarOffice/OpenOffice/ODF, .po (gettext), .properties, .java (ResourceBundle), .msg/.tmsg (catgets) | Cross-platform (Java) | | CDDL |
| Poedit | Gettext PO | Cross-platform | | MIT license |
| Pootle | Gettext PO, XLIFF, OpenOffice GSI files (.sdf), TMX, TBX, Java Properties, DTD, CSV, HTML, XHTML, Plain Text | Cross-platform (Python) | | GPL |
| Virtaal | XLIFF, Gettext PO and MO, TMX, TBX, Wordfast TM, Qt ts; many others via converters in the Translate Toolkit | Cross-platform (Python) | | GPL |

Proprietary software

| Tool | Supported file formats | OS | Price | License |
|---|---|---|---|---|
| AnyMem | MS Word | Windows (MS Word plug-in) | €89 (1 license) | Proprietary |
| SDL Trados | Features four translation environments: dedicated TagEditor, MS Word interface, SDLX, and, as the latest development, the new SDL Trados Studio 2009. Additional filters for translating with TagEditor available: Word, Excel, PowerPoint, OpenOffice, InDesign, QuarkXPress, PageMaker, Interleaf, FrameMaker, HTML, SGML, XML, SVG, ... Includes SDL MultiTerm for terminology management and Project Management Dashboard for automating tasks and tracking. | Windows | €795 (freelance) - €4995 (LSP; floating license) | Proprietary |
| Web Translate It | MS Word .docx, PowerPoint .pptx, Gettext .po/.pot, XLIFF, Java .properties, Qt .ts, JSON, MS .resx/.aspx, Ruby .yml, Apple .strings, Android .xml, Blackberry .rrc, HTML, PHP .ini/.conf, PHP array .php, NSIS .nsh, text files (plain text, textile, Markdown), YouTube .sbv, Subrip .srt... | Any (web based) | From €14 to €149 per month | Proprietary |
| Wordfast Classic | MS Word, Excel, PowerPoint (for Windows and Mac); tagged documents | Microsoft Office Word add-in | €330/165 (for both Wordfast Classic and Wordfast Pro bundle); 50% discount for translators from low-income countries | Proprietary |


According to a survey conducted by Imperial College London in 2006, the most popular systems were (in decreasing order):
  • Trados
  • Déjà Vu
  • Wordfast
  • SDL Trados 2006
  • SDLX
  • Star Transit
  • OmegaT

See also

  • Computational linguistics
  • Computer-assisted reviewing
  • Fuzzy matching
  • Open Translation Engine
  • Translation

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 