Asia Online
Asia Online is a privately owned company backed by individual investors and institutional venture capital. Its corporate headquarters are in Singapore, and it has significant operations in Bangkok, Thailand, with R&D activities throughout Asia and expanding sales operations in Europe and North America.

Asia Online is undertaking what it calls the world's largest literacy project by translating vast quantities of the world's English-language knowledge into Asian languages. This is achieved using statistical machine translation (SMT) technologies developed and enhanced in Thailand with a specific focus on Asian languages. Despite its name, Asia Online is not limited to Asian languages; it also supports translation between all 23 official EU languages.

It was founded in 2007 by the University of Edinburgh's Philipp Koehn; Gregory Binger, a leading technologist and IT/IP lawyer; and former Gartner senior analysts Bob Hayward and Dion Wiggins.

Asia Online’s statistically based translation software is an instance of recent advances in automated translation. Until the early 1990s, almost all production-level machine translation technology relied on collections of linguistic rules to analyze the source sentence and then map its syntactic and semantic structure into the target language. Asia Online uses statistical techniques from cryptography, applying machine learning algorithms that automatically acquire statistical models from existing parallel collections of human translations, in the same way as Google Translate and the systems built with Koehn's own open source Moses tool for SMT.
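
As a rough illustration of how a statistical model can be acquired from parallel text, the following minimal sketch estimates word translation probabilities with IBM Model 1 and expectation-maximization. This is a textbook technique chosen for brevity, not Asia Online's or Moses's actual training pipeline. The three-sentence English-German corpus and the names train_ibm_model1 and best_translation are invented for the example.

```python
from collections import defaultdict


def train_ibm_model1(sentence_pairs, iterations=20):
    """Estimate t(target_word | source_word) from tokenized sentence pairs."""
    target_vocab = {w for _, tgt in sentence_pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start with no preference: every pairing is equally likely.
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (target, source) links
        total = defaultdict(float)   # expected count of links per source word
        for src, tgt in sentence_pairs:
            for tw in tgt:
                # E-step: share each target word among the source words
                # of the same sentence, in proportion to the current t.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: re-estimate the probabilities from the expected counts.
        t = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )
    return t


def best_translation(t, src_word):
    """Return the target word with the highest probability for src_word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == src_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus: (English tokens, German tokens).
corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
model = train_ibm_model1(corpus)
```

Calling best_translation(model, "book") then returns the German word the model considers most likely. No dictionary is supplied; the pairing emerges only from which words co-occur across sentence pairs, which is why the quality and quantity of the parallel data matter so much for SMT.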

Portal Initiatives

On January 7, 2011, Asia Online launched its Thai-language consumer portal. The launch was funded in part by CAT Telecom and the Thai Ministry of ICT. All 3.6 million English-language Wikipedia articles were translated from English into Thai. Then-Prime Minister Abhisit Vejjajiva and Minister of ICT Chuti Krairiksh launched the site as part of Thailand’s Children’s Day celebrations. A crowdsourcing approach is being taken to proofread the articles after they have been machine translated.

Differences from other approaches

Google, Microsoft and SDL Language Weaver have also created SMT systems, some publicly accessible. Asia Online claims there are flaws in the existing processes and techniques of SMT and has worked to resolve these issues. It claims three key differences from traditional SMT approaches:
  • Clean data – The traditional approach leveraged content found on the web in corporate sites, news articles and other similar sources where the same content was available in multiple languages. The quality of the data was very low. Asia Online has focused machine and human resources in this area to ensure that the data is as clean and as accurate as possible. Data is sourced from high-quality translations provided by book publishers and translation companies and is aligned at the segment level (usually sentences) and converted into a consistent format in order to be processed by the learning software. This step includes:
    • Extracting segments from files and documents if they are not in a TMX format.
    • Aligning segments (if necessary) once they have been extracted. While this is automated by machines, humans are also used to validate the accuracy.
    • Converting data to a base UTF-8 encoding for training the SMT system.
    • Extracting small subsets from the data to guide training.
    • Reviewing, cleaning and analyzing the data to ensure optimal training impact.
  • Multiple domains – Extensive effort has gone into a system that can be trained for many domains. This is done by extending a base set of information with multiple additional learning sources.
  • Real-time corrections
  • Languages available – Asia Online currently has more than 520 language pairs available in a baseline form and is progressively deploying 15 domains across each language pair. Another 200+ language pairs are under development. These systems are currently used to build customized translation systems for corporate and language service provider (LSP) customers who add their bilingual parallel corpus to the existing data to create higher quality translation systems.
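
The data preparation steps listed under "Clean data" can be sketched roughly as follows: extract segments from a TMX file, normalize them, discard incomplete units, and write the result as UTF-8. This is only an assumption about what such a step might look like, using the Python standard library. The function names, the sample TMX content and the tab-separated output format are hypothetical, and sentence alignment and human validation are not covered.

```python
import unicodedata
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def clean_segment(text):
    """Normalize Unicode to NFC and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def extract_pairs(tmx_text, src_lang, tgt_lang):
    """Return cleaned (source, target) segment pairs from a TMX string."""
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # Older TMX files may use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segments[lang] = clean_segment("".join(seg.itertext()))
        src, tgt = segments.get(src_lang), segments.get(tgt_lang)
        if src and tgt:  # drop units where either side is missing or empty
            pairs.append((src, tgt))
    return pairs


def to_utf8_lines(pairs):
    """Serialize pairs as tab-separated lines encoded as UTF-8 bytes."""
    return "".join(f"{src}\t{tgt}\n" for src, tgt in pairs).encode("utf-8")


# Hypothetical sample: one complete unit and one with no Thai translation.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>  Hello   world </seg></tuv>
      <tuv xml:lang="th"><seg>สวัสดี</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>No translation yet</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = extract_pairs(SAMPLE_TMX, "en", "th")
utf8_data = to_utf8_lines(pairs)
```

Here the second translation unit is dropped because it has no Thai side, and the stray whitespace in the first English segment is collapsed before the pair is kept.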


The company characterizes its products as a "platform", meaning a suite of tools and products that can work independently or together. Some are installed locally and some are available only through its SaaS offering. This is described in the CSA blog entry.

The Language Studio product suite was reviewed by Common Sense Advisory, a translation industry market research firm, in their Global Watchtower blog shown in the link below.

Supported Languages

Supported languages are added on a regular basis.


Asian Languages

Available Now: Arabic (AR), Chinese (ZH), Japanese (JA), Bahasa Indonesia (ID), Korean (KO) and Thai (TH).

Under Development: Bahasa Malay (MS), Bengali (BN), Gujarati (GU), Hindi (HI), Punjabi (PA), Tagalog (TL), Tamil (TA), Vietnamese (VI).

European Languages

Available Now: Bulgarian (BG), Czech (CS), Danish (DA), Dutch (NL), English (EN), Estonian (ET), Finnish (FI), French (FR), German (DE), Greek (EL), Hungarian (HU), Irish (GA), Italian (IT), Latvian (LV), Lithuanian (LT), Maltese (MT), Norwegian (NO), Polish (PL), Portuguese (PT), Romanian (RO), Slovak (SK), Slovene (SL), Spanish (ES), Swedish (SV) and Russian (RU).

See also

  • Comparison of machine translation applications
  • Google Translate
  • Eurotra

External links

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 