Speech Translation
Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language. This differs from phrase translation, where the system only translates a fixed and finite set of phrases that have been manually entered into the system. Speech translation technology enables speakers of different languages to communicate. It is thus of tremendous value for humankind in terms of science, cross-cultural exchange and global business.



Perhaps first popularized by the fictional Universal Translator on Star Trek, this capability of instant translation of conversations has moved from being a future dream to a deployed product.

How They Work

Speech translation systems typically integrate three software technologies:
automatic speech recognition (ASR), machine translation (MT) and speech synthesis (text-to-speech, TTS).

The speaker of language A speaks into a microphone and the speech recognition module recognizes the utterance. It compares the input with a phonological model, consisting of a large corpus of speech data from multiple speakers.
The input is then converted into a string of words, using the dictionary and grammar of language A, based on a massive corpus of text in language A.
The machine translation module then translates this string. Early systems replaced every word with a corresponding word in language B. Current systems do not use word-for-word translation, but rather take into account the entire context of the input to generate the appropriate translation. The generated translation utterance is sent to the speech synthesis module, which estimates the pronunciation and intonation matching the string of words based on a corpus of speech data in language B. Waveforms matching the text are selected from this database, and the speech synthesis module concatenates and outputs them.
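
As a rough illustration of this three-stage pipeline, the sketch below chains hypothetical ASR, MT and TTS components. The class and function names are placeholders rather than any real product's API; in practice each stage would be backed by the trained models and corpora described above.

```python
# Minimal sketch of the ASR -> MT -> TTS pipeline described above.
# All classes here are hypothetical placeholders, not a real library's API.

class SpeechRecognizer:
    """ASR: compares audio against an acoustic/phonological model and a
    language model of language A, returning the recognized word string."""
    def recognize(self, audio: bytes) -> str:
        raise NotImplementedError

class MachineTranslator:
    """MT: translates the recognized string from language A to language B,
    using the whole utterance as context rather than word-for-word lookup."""
    def translate(self, text_a: str) -> str:
        raise NotImplementedError

class SpeechSynthesizer:
    """TTS: estimates pronunciation and intonation for the translated string
    and outputs a waveform assembled from language-B speech data."""
    def synthesize(self, text_b: str) -> bytes:
        raise NotImplementedError

def speech_translate(audio_a: bytes,
                     asr: SpeechRecognizer,
                     mt: MachineTranslator,
                     tts: SpeechSynthesizer) -> bytes:
    """Chain the three modules: speech in language A -> speech in language B."""
    text_a = asr.recognize(audio_a)   # utterance -> word string (language A)
    text_b = mt.translate(text_a)     # word string A -> word string B
    return tts.synthesize(text_b)     # word string B -> waveform (language B)
```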

History

- In 1983, NEC Corporation demonstrated speech translation as a concept exhibit at the ITU Telecom World (Telecom '83).

- The first individual generally credited with developing and deploying a commercialized speech translation system capable of translating continuous free speech is Robert Palmquist, with his release of an English-Spanish large-vocabulary system in 1997. This effort was funded in part by the Office of Naval Research. To further develop and deploy speech translation systems, in 2001 he formed SpeechGear, which holds broad patents covering speech translation systems.

- In 2003, SpeechGear developed and deployed the world's first commercial mobile device with on-board Japanese-to-English speech translation.

- One of the first translation systems using a mobile phone, "Interpreter", was released by SpeechGear in 2004.

- In 2006, NEC developed another mobile device with on-board Japanese-to-English speech translation.

- Another speech translation service using a mobile phone, “shabette honyaku”, was released by ATR-Trek in 2007.

- In 2009 SpeechGear released version 4.0 of their Compadre:Interact speech translation product. This version provides instant translation of conversations between English and approximately 35 other languages.
- Today, there are a number of speech translation applications for smartphones, e.g. Jibbigo, which offers a self-contained mobile app in eight language pairs for Apple's App Store and the Android Market.




Research and Development

Research and development has gradually progressed from relatively simple to more advanced translation. International evaluation workshops were established to support the development of speech-translation technology. They allow research institutes to cooperate and compete against each other at the same time. The concept of these workshops is a kind of contest: a common dataset is provided by the organizers, and the participating research institutes create systems that are evaluated on it. In this way, efficient research is promoted.


The International Workshop on Spoken Language Translation (IWSLT), organized by C-STAR, an international consortium for research on speech translation, has been held since 2004. “Every year, the number of participating institutes increases, and it has become a key event for speech translation research.”

Standards

As more countries begin to research and develop speech translation, it will be necessary to standardize interfaces and data formats to ensure that the systems are mutually compatible. International joint research is being fostered by speech translation consortiums (e.g. the C-STAR international consortium for joint research of speech translation and A-STAR for the Asia-Pacific region). They were founded as “international joint-research organization[s] to design formats of bilingual corpora that are essential to advance the research and development of this technology (...) and to standardize interfaces and data formats to connect speech translation module internationally”.
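
For illustration only (this is not the C-STAR or A-STAR specification), a standardized bilingual speech-corpus record might pair an audio recording with aligned transcriptions in both languages, along the lines of the following sketch:

```python
# Hypothetical sketch of a bilingual speech-corpus entry; the field names are
# illustrative, not any consortium's actual data format.
from dataclasses import dataclass

@dataclass
class BilingualUtterance:
    utterance_id: str   # unique identifier within the corpus
    audio_path: str     # recording of the source-language utterance
    source_lang: str    # e.g. "ja"
    target_lang: str    # e.g. "en"
    source_text: str    # transcription in the source language
    target_text: str    # reference translation in the target language

example = BilingualUtterance(
    utterance_id="utt-000123",
    audio_path="corpus/audio/utt-000123.wav",
    source_lang="ja",
    target_lang="en",
    source_text="おはようございます",
    target_text="Good morning",
)
```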

Applications

Today, speech translation systems are being used throughout the world. Examples include medical facilities, schools, police, hotels, retail stores, and factories. These systems are applicable anywhere that spoken language is being used to communicate.

Challenges and Future Prospects

Currently, speech translation technology is available as products that instantly translate free-form, multilingual conversations. These systems instantly translate continuous speech. Challenges in accomplishing this include overcoming speaker-dependent variations in style of speaking or pronunciation, which must be dealt with in order to provide high-quality translation for all users. Moreover, speech recognition systems must be able to cope with external factors such as acoustic noise or speech by other speakers in real-world use of speech translation systems.

Because the user does not understand the target language when speech translation is used, a method "must be provided for the user to check whether the translation is correct, by such means as translating it again back into the user's language".
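
One simple way to provide such a check is a round-trip ("back") translation that is shown to the user. The sketch below is only illustrative: translate is a hypothetical MT function, not a specific library's API, and the similarity score is a crude surface-level measure rather than a guarantee of correctness.

```python
# Illustrative round-trip check: translate A -> B, then B -> A, and show the
# result to the user so they can judge whether the meaning survived.
from difflib import SequenceMatcher

def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder for a real machine translation call."""
    raise NotImplementedError

def round_trip_check(text_a: str, lang_a: str, lang_b: str) -> tuple[str, float]:
    """Return the back-translated text and a rough similarity score in [0, 1]."""
    text_b = translate(text_a, src=lang_a, tgt=lang_b)          # A -> B
    back_a = translate(text_b, src=lang_b, tgt=lang_a)          # B -> A
    similarity = SequenceMatcher(None, text_a, back_a).ratio()  # surface comparison
    return back_a, similarity
```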
In order to achieve the goal of erasing the language barrier worldwide, multiple languages have to be supported. This requires speech corpora, bilingual corpora and text corpora for each of the estimated 6,000 languages said to exist on our planet today.

As the collection of corpora is extremely expensive, collecting data from the Web would be an alternative to conventional methods. “Secondary use of news or other media published in multiple languages would be an effective way to improve performance of speech translation.” However, “current copyright law does not take secondary uses such as these types of corpora into account” and thus “it will be necessary to revise it so that it is more flexible.”