Language Weaver
SDL Language Weaver is a Los Angeles, California-based company that was founded in 2002 by Kevin Knight and Daniel Marcu of the University of Southern California to commercialize a statistical approach to automatic language translation and natural language processing, now known globally as statistical machine translation software (SMTS).
SDL Language Weaver's statistical translation software is an instance of a recent advance in automated translation. Earlier machine translation technology relied on collections of linguistic rules to analyze the source sentence and then map its syntactic and semantic structure into the target language. SDL Language Weaver instead uses statistical techniques from cryptography, applying machine learning algorithms that automatically acquire statistical models from existing parallel collections of human translations. Because these models are learned directly from real translations, they are more likely to be up to date, appropriate, and idiomatic. The software can also be quickly customized to any subject area or style and can fully translate previously unseen text.
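As an illustration of how a statistical model can be acquired from parallel text, the sketch below trains IBM Model 1, a classic word-based translation model, with expectation-maximization. It is not SDL Language Weaver's actual method, which the article does not describe in detail. The function name `train_ibm_model1` and the three-sentence German-English corpus are assumptions made for the example.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(target | source) with EM.

    Illustrative sketch only: the NULL source token and all smoothing
    are omitted to keep the example short.
    """
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {w for _, tgt in corpus for w in tgt}
    # Start from a uniform distribution over the target vocabulary.
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target word, source word)
        total = defaultdict(float)  # expected count of each source word
        for src, tgt in corpus:
            for tw in tgt:
                # E-step: spread each target word over the source words
                # in proportion to the current translation probabilities.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {k: c / total[k[1]] for k, c in count.items()})
    return t

# Hypothetical toy parallel corpus (source sentence, human translation).
pairs = [("das Haus", "the house"), ("das Buch", "the book"), ("ein Buch", "a book")]
model = train_ibm_model1(pairs)
for (tw, sw), p in sorted(model.items(), key=lambda kv: (kv[0][1], -kv[1])):
    print(f"t({tw} | {sw}) = {p:.3f}")
```

No dictionary or grammar rule is supplied: the word correspondences emerge purely from co-occurrence statistics in the translated sentence pairs, which is the core idea behind learning from real translations.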
Statistical MT was once thought appropriate only for languages with very large amounts of pre-translated data. However, with new advances in SMT, SDL Language Weaver has also been able to create translation systems for languages with smaller amounts of parallel data. Additionally, with customization, SMT can "learn" to accurately translate highly technical material.
SDL Language Weaver's primary product is its translation software. The company currently offers 24 bidirectional language pairs. These include English to and from French, Italian, Danish, Greek, Spanish, German, Dutch, Portuguese, Swedish, Russian, Czech, Romanian, Polish, Arabic, Persian, Simplified and Traditional Chinese, Korean, and Hindi. Several non-English language pairs are also available, such as Arabic-Spanish, Arabic-French, Spanish-French, and French-German.
The current language pairs all use phrase-based statistical MT. However, the company is also working on syntax-based statistical MT for certain language pairs to improve the overall translation quality.
SDL Language Weaver can also create customized (domain-specific) language pairs for particular companies. It uses a customer's existing, pre-translated data to "train" a new translation system that statistically models how to translate that customer's information, so that new data can be translated in less time and edited as needed prior to publication.
In addition to its primary translation software, SDL Language Weaver has several other products. Its Alignment Tool is a translation memory generator: users enter previously translated documents and align them at the segment level, producing a translation memory file. The company also offers Customizer, a tool that allows users to fine-tune the translation system using small amounts (up to 2 million words) of pre-translated data in a specific subject area. This tool allows for incremental improvements over time and gives users more control of the process. However, some customer feedback indicates that while vocabulary may improve, the fluency of the translation can be negatively affected.
In July 2010, Language Weaver was acquired by SDL plc (SDL International) for $42.5 million, and the company was renamed SDL Language Weaver.