History of machine translation
The history of machine translation generally starts in the 1950s, although work can be found from earlier periods. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The experiment was a great success and ushered in an era of significant funding for machine translation research in the United States. The authors claimed that within three to five years, machine translation would be a solved problem. In the Soviet Union, similar experiments were performed shortly after.
The real progress was much slower, however, and after the ALPAC report of 1966, which found that ten years of research had failed to fulfill expectations, funding was dramatically reduced. Starting in the late 1980s, as computational power increased and became less expensive, more interest began to be shown in statistical models for machine translation.
Today there is still no system that provides the holy grail of "fully automatic high quality translation of unrestricted text" (FAHQUT). However, there are many programs now available that are capable of providing useful output within strict constraints; several of them are available online, such as Google Translate and the SYSTRAN system which powers AltaVista's (Yahoo's since May 9, 2008) Babel Fish.
The beginning

The history of machine translation dates back to the seventeenth century, when philosophers such as Leibniz and Descartes put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine.

The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni, was simply an automatic bilingual dictionary using paper tape. The other proposal, by the Russian Peter Troyanskii, was more detailed. It included both the bilingual dictionary and a method for dealing with grammatical roles between languages, based on Esperanto. The system was split into three stages: the first was for a native-speaking editor in the source language to organise the words into their logical forms and syntactic functions; the second was for the machine to "translate" these forms into the target language; and the third was for a native-speaking editor in the target language to normalise this output. His scheme remained unknown until the late 1950s, by which time computers were well known.
The early years

The first proposals for machine translation using computers were put forward by Warren Weaver, a researcher at the Rockefeller Foundation, in his July 1949 memorandum. These proposals were based on information theory, successes of code breaking during the Second World War, and speculation about universal underlying principles of natural language.
A few years after these proposals, research began in earnest at many universities in the United States. On 7 January 1954, the Georgetown-IBM experiment, the first public demonstration of an MT system, was held in New York at the head office of IBM. The demonstration was widely reported in the newspapers and received much public interest. The system itself, however, was no more than what today would be called a "toy" system, having just 250 words and translating just 49 carefully selected Russian sentences into English — mainly in the field of chemistry. Nevertheless, it encouraged the view that machine translation was imminent — and in particular stimulated the financing of research, not just in the US but worldwide.
Early systems used large bilingual dictionaries and hand-coded rules for fixing the word order in the final output. This was eventually found to be too restrictive, and developments in linguistics at the time, for example generative linguistics and transformational grammar, were proposed as ways to improve the quality of translations.
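Conceptually, this early "direct" approach can be sketched in a few lines: a word-for-word dictionary lookup followed by a hand-coded reordering rule. The miniature Spanish-English dictionary and the single noun-adjective rule below are invented for illustration; real systems relied on dictionaries of tens of thousands of entries and many such rules.

```python
# Toy sketch of early "direct" machine translation: dictionary lookup
# plus one hand-coded word-order rule. All vocabulary is invented.

DICTIONARY = {
    "el": "the", "gato": "cat", "negro": "black", "come": "eats",
}
ADJECTIVES = {"negro"}  # flagged so the reordering rule can find them

def translate(sentence):
    words = sentence.lower().split()
    # Rule: Spanish noun + adjective becomes English adjective + noun.
    for i in range(len(words) - 1):
        if words[i + 1] in ADJECTIVES:
            words[i], words[i + 1] = words[i + 1], words[i]
    # Word-for-word lookup; unknown words pass through unchanged.
    return " ".join(DICTIONARY.get(w, w) for w in words)

print(translate("el gato negro come"))  # -> "the black cat eats"
```

The restrictiveness the text mentions is easy to see here: every new syntactic pattern needs another hand-written rule.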
During this time, operational systems were installed. The United States Air Force used a system produced by IBM and Washington University, while the Atomic Energy Commission in the United States and Euratom in Italy used a system developed at Georgetown University. While the quality of the output was poor, it nevertheless met many of the customers' needs, chiefly in terms of speed.
At the end of the 1950s, an argument against the possibility of "Fully Automatic High Quality Translation" by machines was put forward by Yehoshua Bar-Hillel, a researcher asked by the US government to look into machine translation. The argument is one of semantic ambiguity or double meaning. Consider the following sentence:

- Little John was looking for his toy box. Finally he found it. The box was in the pen.

The word pen may have two meanings: the first, an instrument used for writing; the second, an enclosure or container of some kind. To a human the meaning is obvious, but Bar-Hillel claimed that without a "universal encyclopedia" a machine would never be able to deal with this problem. Today, this type of semantic ambiguity can be addressed by writing source texts for machine translation in a controlled language that uses a vocabulary in which each word has exactly one meaning.
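A rough sketch of how such a controlled language is enforced in practice: an authoring check that flags any word outside the approved one-sense-per-word vocabulary before the text is sent to the translator. The word lists below are invented for illustration.

```python
# Toy sketch of a controlled-language vocabulary check. Each approved
# word carries exactly one sense; anything outside the approved list
# (including ambiguous words such as "pen") is flagged so the author
# can rewrite it before machine translation. All entries are invented.

APPROVED = {
    "the": "definite article",
    "box": "container",
    "was": "past tense of 'be'",
    "in": "inside of",
    "enclosure": "fenced area",
}

def flag_unapproved(sentence):
    """Return the words an author must rewrite before translation."""
    return [w for w in sentence.lower().split() if w not in APPROVED]

print(flag_unapproved("the box was in the enclosure"))  # -> []
print(flag_unapproved("the box was in the pen"))        # -> ['pen']
```

The author would be prompted to replace "pen" with the single-sense word "enclosure", removing the ambiguity Bar-Hillel identified.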
The 1960s, the ALPAC report and the seventies

Research in the 1960s in both the Soviet Union and the United States concentrated mainly on the Russian-English language pair. The objects of translation were chiefly scientific and technical documents, such as articles from scientific journals. The rough translations produced were sufficient to get a basic understanding of the articles. If an article discussed a subject deemed to be of security interest, it was sent to a human translator for a complete translation; if not, it was discarded.
A great blow came to machine translation research in 1966 with the publication of the ALPAC report. The report was commissioned by the US government and produced by ALPAC, the Automatic Language Processing Advisory Committee, a group of seven scientists convened in 1964. The government was concerned about the lack of progress being made despite significant expenditure. The report concluded that machine translation was more expensive, less accurate and slower than human translation, and that despite the expense, machine translation was not likely to reach the quality of a human translator in the near future. It recommended, however, that tools be developed to aid translators — automatic dictionaries, for example — and that some research in computational linguistics should continue to be supported.
The publication of the report had a profound impact on research into machine translation in the United States, and to a lesser extent the Soviet Union and United Kingdom. Research, at least in the US, was almost completely abandoned for over a decade. In Canada, France and Germany, however, research continued. In the US the main exceptions were the founders of Systran (Peter Toma) and Logos (Bernard Scott), who established their companies in 1968 and 1970 respectively and served the US Department of Defense. In 1970, the Systran system was installed for the United States Air Force, and subsequently, in 1976, for the Commission of the European Communities. The METEO System, developed at the Université de Montréal, was installed in Canada in 1977 to translate weather forecasts from English to French, and was translating close to 80,000 words per day, or 30 million words per year, until it was replaced by a competitor's system on 30 September 2001.
While research in the 1960s concentrated on limited language pairs and input, demand in the 1970s was for low-cost systems that could translate a range of technical and commercial documents. This demand was spurred by increasing globalisation and the demand for translation in Canada, Europe, and Japan.
The 1980s and early 1990s

By the 1980s, both the diversity and the number of installed systems for machine translation had increased. A number of systems relying on mainframe technology were in use, such as Systran, Logos, Ariane-G5, and Metal.
As a result of the improved availability of microcomputers, there was a market for lower-end machine translation systems. Many companies took advantage of this in Europe, Japan, and the USA. Systems were also brought onto the market in China, Eastern Europe, Korea, and the Soviet Union.
During the 1980s there was a great deal of MT activity in Japan especially. With the Fifth Generation Computer project, Japan intended to leap over its competition in computer hardware and software, and one project that many large Japanese electronics firms found themselves involved in was creating software for translating to and from English (Fujitsu, Toshiba, NTT, Brother, Catena, Matsushita, Mitsubishi, Sharp, Sanyo, Hitachi, NEC, Panasonic, Kodensha, Nova, Oki).
Research during the 1980s typically relied on translation through some variety of intermediary linguistic representation involving morphological, syntactic, and semantic analysis.
At the end of the 1980s there was a surge in novel methods for machine translation. One system, developed at IBM, was based on statistical methods. Makoto Nagao and his group used methods based on large numbers of example translations, a technique which is now termed example-based machine translation. A defining feature of both of these approaches was the lack of syntactic and semantic rules and a reliance instead on the manipulation of large text corpora.
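The flavour of these corpus-driven methods can be sketched with a toy word co-occurrence count over an invented three-sentence French-English parallel corpus. This is a crude stand-in for the actual IBM alignment models, not an implementation of them, but it shows the key idea: translation knowledge emerges from counting over parallel text, with no hand-written rules.

```python
# Toy flavour of corpus-based statistical MT: estimate word-level
# translation preferences purely from co-occurrence counts in a tiny,
# invented parallel corpus. Real systems (e.g. the IBM models) refine
# such counts iteratively; here a single counting pass suffices.
from collections import Counter, defaultdict

corpus = [
    ("la maison", "the house"),
    ("la fleur", "the flower"),
    ("une maison", "a house"),
]

# Count how often each source word co-occurs with each target word.
counts = defaultdict(Counter)
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            counts[s][t] += 1

def best_translation(word):
    """Pick the target word that co-occurred most often with `word`."""
    return counts[word].most_common(1)[0][0]

print(best_translation("maison"))  # -> "house"
print(best_translation("la"))      # -> "the"
```

Even on three sentence pairs, "maison" co-occurs with "house" twice but with "the" and "a" only once each, so the counts alone recover the correct pairing.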
During the 1990s, encouraged by successes in speech recognition and speech synthesis, research began into speech translation with the development of the German Verbmobil project.
There was significant growth in the use of machine translation as a result of the advent of low-cost and more powerful computers. It was in the early 1990s that machine translation began to make the transition away from large mainframe computers toward personal computers and workstations. Two companies that led the PC market for a time were Globalink and MicroTac; the two companies merged in December 1994, a move found to be in the corporate interest of both. Intergraph and Systran also began to offer PC versions around this time. Sites also became available on the internet, such as AltaVista's Babel Fish (using Systran technology) and Google Language Tools (also initially using Systran technology exclusively).
Recent research has focused on statistical machine translation and example-based machine translation.
In the area of speech translation, research has focused on moving from domain-limited systems to domain-unlimited translation systems. In various research projects in Europe and in the United States (STR-DUST among them), solutions for automatically translating parliamentary speeches and broadcast news have been developed. In these scenarios the domain of the content is no longer limited to any special area; rather, the speeches to be translated cover a variety of topics.
More recently, the French-German project Quaero has investigated possibilities for making use of machine translation for a multilingual internet. The project seeks to translate not only webpages, but also videos and audio files found on the internet.
Today, only a few companies use statistical machine translation commercially, e.g. Asia Online, SDL International / Language Weaver (sells translation products and services), Google (uses its proprietary statistical MT system for some language combinations in Google's language tools), Microsoft (uses its proprietary statistical MT system to translate knowledge base articles), and Ta with you (offers a domain-adapted machine translation solution based on statistical MT with some linguistic knowledge). There has been a renewed interest in hybridisation, with researchers combining syntactic and morphological (i.e., linguistic) knowledge with statistical systems, as well as combining statistics with existing rule-based systems.
Georgetown-IBM experiment
The Georgetown-IBM experiment was an influential demonstration of machine translation, which was performed during January 7, 1954. Developed jointly by the Georgetown University and IBM, the experiment involved completely automatic translation of more than sixty Russian sentences into...
in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The experiment was a great success and ushered in an era of significant funding for machine translation research in the United States. The authors claimed that within three or five years, machine translation would be a solved problem. In the Soviet Union, similar experiments were performed shortly after.
However, the real progress was much slower, and after the ALPAC report
ALPAC
ALPAC was a committee of seven scientists led by John R. Pierce, established in 1964 by the U. S. Government in order to evaluate the progress in computational linguistics in general and machine translation in particular...
in 1966, which found that the ten years of research had failed to fulfill the expectations, and funding was dramatically reduced. Starting in the late 1980s, as computational power increased and became less expensive, more interest began to be shown in statistical models for machine translation
Statistical machine translation
Statistical machine translation is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora...
.
Today there is still no system that provides the holy grail of "fully automatic high quality translation of unrestricted text" (FAHQUT). However, there are many programs now available that are capable of providing useful output within strict constraints; several of them are available online, such as Google Translate
Google Translate
Google Translate is a free statistical machine translation service provided by Google Inc. to translate a section of text, document or webpage, into another language.The service was introduced in April 28, 2006 for the Arabic language...
and the SYSTRAN
SYSTRAN
SYSTRAN, founded by Dr. Peter Toma in 1968, is one of the oldest machine translation companies. SYSTRAN has done extensive work for the United States Department of Defense and the European Commission....
system which powers AltaVista's (Yahoo's since May 9, 2008) BabelFish.
The beginning
The history of machine translation dates back to the seventeenth century, when philosophers such as Leibniz and Descartes put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine.The first patents for "translating machines" were applied for in the mid 1930s. One proposal, by Georges Artsrouni was simply an automatic bilingual dictionary using paper tape. The other proposal, by Peter Troyanskii, a Russian
Russians
The Russian people are an East Slavic ethnic group native to Russia, speaking the Russian language and primarily living in Russia and neighboring countries....
, was more detailed. It included both the bilingual dictionary, and a method for dealing with grammatical roles between languages, based on Esperanto
Esperanto
is the most widely spoken constructed international auxiliary language. Its name derives from Doktoro Esperanto , the pseudonym under which L. L. Zamenhof published the first book detailing Esperanto, the Unua Libro, in 1887...
. The system was split up into three stages: the first was for a native-speaking editor in the sources language to organise the words into their logical form
Logical form
In logic, the logical form of a sentence or set of sentences is the form obtained by abstracting from the subject matter of its content terms or by regarding the content terms as mere placeholders or blanks on a form...
s and syntactic functions; the second was for the machine to "translate" these forms into the target language; and the third was for a native-speaking editor in the target language to normalise this output. His scheme remained unknown until the late 1950s, by which time computers were well-known.
The early years
The first proposals for machine translation using computers were put forward by Warren WeaverWarren Weaver
Warren Weaver was an American scientist, mathematician, and science administrator...
, a researcher at the Rockefeller Foundation
Rockefeller Foundation
The Rockefeller Foundation is a prominent philanthropic organization and private foundation based at 420 Fifth Avenue, New York City. The preeminent institution established by the six-generation Rockefeller family, it was founded by John D. Rockefeller , along with his son John D. Rockefeller, Jr...
, in his July, 1949 memorandum. These proposals were based on information theory
Information theory
Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and...
, successes of code breaking
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...
during the second world war
World War II
World War II, or the Second World War , was a global conflict lasting from 1939 to 1945, involving most of the world's nations—including all of the great powers—eventually forming two opposing military alliances: the Allies and the Axis...
and speculation about universal underlying principles of natural language.
A few years after these proposals, research began in earnest at many universities in the United States
United States
The United States of America is a federal constitutional republic comprising fifty states and a federal district...
. On 7 January 1954, the Georgetown-IBM experiment
Georgetown-IBM experiment
The Georgetown-IBM experiment was an influential demonstration of machine translation, which was performed during January 7, 1954. Developed jointly by the Georgetown University and IBM, the experiment involved completely automatic translation of more than sixty Russian sentences into...
, the first public demonstration of an MT system, was held in New York at the head office of IBM. The demonstration was widely reported in the newspapers and received much public interest. The system itself, however, was no more than what today would be called a "toy" system, having just 250 words and translating just 49 carefully selected Russian sentences into English — mainly in the field of chemistry
Chemistry
Chemistry is the science of matter, especially its chemical reactions, but also its composition, structure and properties. Chemistry is concerned with atoms and their interactions with other atoms, and particularly with the properties of chemical bonds....
. Nevertheless it encouraged the view that machine translation was imminent — and in particular stimulated the financing of the research, not just in the US but worldwide.
Early systems used large bilingual dictionaries and hand-coded rules for fixing the word order in the final output. This was eventually found to be too restrictive, and developments in linguistics at the time, for example generative linguistics
Generative linguistics
Generative linguistics is a school of thought within linguistics that makes use of the concept of a generative grammar. The term "generative grammar" is used in different ways by different people, and the term "generative linguistics" therefore has a range of different, though overlapping,...
and transformational grammar
Transformational grammar
In linguistics, a transformational grammar or transformational-generative grammar is a generative grammar, especially of a natural language, that has been developed in the Chomskyan tradition of phrase structure grammars...
were proposed to improve the quality of translations.
During this time, operational systems were installed. The United States Air Force
United States Air Force
The United States Air Force is the aerial warfare service branch of the United States Armed Forces and one of the American uniformed services. Initially part of the United States Army, the USAF was formed as a separate branch of the military on September 18, 1947 under the National Security Act of...
used a system produced by IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
and Washington University, while the Atomic Energy Commission
United States Atomic Energy Commission
The United States Atomic Energy Commission was an agency of the United States government established after World War II by Congress to foster and control the peace time development of atomic science and technology. President Harry S...
in the United States
United States
The United States of America is a federal constitutional republic comprising fifty states and a federal district...
and Euratom in Italy
Italy
Italy , officially the Italian Republic languages]] under the European Charter for Regional or Minority Languages. In each of these, Italy's official name is as follows:;;;;;;;;), is a unitary parliamentary republic in South-Central Europe. To the north it borders France, Switzerland, Austria and...
used a system developed at Georgetown University
Georgetown University
Georgetown University is a private, Jesuit, research university whose main campus is in the Georgetown neighborhood of Washington, D.C. Founded in 1789, it is the oldest Catholic university in the United States...
. While the quality of the output was poor, it nevertheless met many of the customers' needs, chiefly in terms of speed.
At the end of the 1950s, an argument was put forward by Yehoshua Bar-Hillel
Yehoshua Bar-Hillel
Yehoshua Bar-Hillel was an Israeli philosopher, mathematician, and linguist at the Hebrew University of Jerusalem, best known for his pioneering work in machine translation and formal linguistics.- Biography :...
, a researcher asked by the US government to look into machine translation against the possibility of "Fully Automatic High Quality Translation" by machines. The argument is one of semantic ambiguity or double-meaning. Consider the following sentence:
- Little John was looking for his toy box. Finally he found it. The box was in the pen.
The word pen may have two meanings, the first meaning something you use to write with, the second meaning a container of some kind. To a human, the meaning is obvious, but he claimed that without a "universal encyclopedia" a machine would never be able to deal with this problem. Today, this type of semantic ambiguity can be solved by writing source texts for machine translation in a controlled language
Controlled natural language
Controlled natural languages are subsets of natural languages, obtained byrestricting the grammar and vocabulary in orderto reduce or eliminate ambiguity and complexity.Traditionally, controlled languages fall into two major types:...
that uses a vocabulary
Vocabulary
A person's vocabulary is the set of words within a language that are familiar to that person. A vocabulary usually develops with age, and serves as a useful and fundamental tool for communication and acquiring knowledge...
in which each word has exactly one meaning.
The 1960s, the ALPAC report and the seventies
Research in the 1960s in both the Soviet UnionSoviet Union
The Soviet Union , officially the Union of Soviet Socialist Republics , was a constitutionally socialist state that existed in Eurasia between 1922 and 1991....
and the United States concentrated mainly on the Russian
Russian language
Russian is a Slavic language used primarily in Russia, Belarus, Uzbekistan, Kazakhstan, Tajikistan and Kyrgyzstan. It is an unofficial but widely spoken language in Ukraine, Moldova, Latvia, Turkmenistan and Estonia and, to a lesser extent, the other countries that were once constituent republics...
-English language
English language
English is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...
pair. Chiefly the objects of translation were scientific and technical documents, such as articles from scientific journal
Scientific journal
In academic publishing, a scientific journal is a periodical publication intended to further the progress of science, usually by reporting new research. There are thousands of scientific journals in publication, and many more have been published at various points in the past...
s. The rough translations produced were sufficient to get a basic understanding of the articles. If an article discussed a subject deemed to be of security interest, it was sent to a human translator for a complete translation; if not, it was discarded.
A great blow came to machine translation research in 1966 with the publication of the ALPAC report. The report was commissioned by the US government and performed by ALPAC
ALPAC
ALPAC was a committee of seven scientists led by John R. Pierce, established in 1964 by the U. S. Government in order to evaluate the progress in computational linguistics in general and machine translation in particular...
, the Automatic Language Processing Advisory Committee, a group of seven scientists convened by the US government in 1964. The US government was concerned that there was a lack of progress being made despite significant expenditure. It concluded that machine translation was more expensive, less accurate and slower than human translation, and that despite the expenses, machine translation was not likely to reach the quality of a human translator in the near future.
The report, however, recommended that tools be developed to aid translators — automatic dictionaries, for example — and that some research in computational linguistics should continue to be supported.
The publication of the report had a profound impact on research into machine translation in the United States, and to a lesser extent the Soviet Union
Soviet Union
The Soviet Union , officially the Union of Soviet Socialist Republics , was a constitutionally socialist state that existed in Eurasia between 1922 and 1991....
and United Kingdom
United Kingdom
The United Kingdom of Great Britain and Northern IrelandIn the United Kingdom and Dependencies, other languages have been officially recognised as legitimate autochthonous languages under the European Charter for Regional or Minority Languages...
. Research, at least in the US, was almost completely abandoned for over a decade. In Canada
Canada
Canada is a North American country consisting of ten provinces and three territories. Located in the northern part of the continent, it extends from the Atlantic Ocean in the east to the Pacific Ocean in the west, and northward into the Arctic Ocean...
, France
France
The French Republic , The French Republic , The French Republic , (commonly known as France , is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France...
and Germany
Germany
Germany , officially the Federal Republic of Germany , is a federal parliamentary republic in Europe. The country consists of 16 states while the capital and largest city is Berlin. Germany covers an area of 357,021 km2 and has a largely temperate seasonal climate...
, however, research continued. In the US the main exceptions were the founders of Systran (Peter Toma
Peter Toma
Dr. Peter Toma is a Hungarian-born computer scientist and linguistics researcher.Toma developed and commercialised the SYSTRAN machine translation system...
) and Logos
OpenLogos
OpenLogos is the Open Source version of the Logos Machine Translation System, one of the earliest and longest running commercial machine translation products in the world. It was developed by Logos Corporation in the United States, with additional development teams in Germany and Italy...
(Bernard Scott), who established their companies in 1968 and 1970 respectively and served the US Dept of Defense. In 1970, the Systran
SYSTRAN
SYSTRAN, founded by Dr. Peter Toma in 1968, is one of the oldest machine translation companies. SYSTRAN has done extensive work for the United States Department of Defense and the European Commission....
system was installed for the United States Air Force
United States Air Force
The United States Air Force is the aerial warfare service branch of the United States Armed Forces and one of the American uniformed services. Initially part of the United States Army, the USAF was formed as a separate branch of the military on September 18, 1947 under the National Security Act of...
and subsequently in 1976 by the Commission of the European Communities. The METEO System
METEO System
The METEO System is a machine translation system specifically designed for the translation of the weather forecasts issued daily by Environment Canada. The system was used from 1981 to the 30th of September 2001 by Environment Canada to translate forecasts issued in French in the province of Quebec...
, developed at the Université de Montréal
Université de Montréal
The Université de Montréal is a public francophone research university in Montreal, Quebec, Canada. It comprises thirteen faculties, more than sixty departments and two affiliated schools: the École Polytechnique and HEC Montréal...
, was installed in Canada
Canada
Canada is a North American country consisting of ten provinces and three territories. Located in the northern part of the continent, it extends from the Atlantic Ocean in the east to the Pacific Ocean in the west, and northward into the Arctic Ocean...
in 1977 to translate weather forecasts from English to French, and was translating close to 80,000 words per day or 30 million words per year until it was replaced by a competitor's system on the 30th September, 2001.
While research in the 1960s concentrated on limited language pairs and input, demand in the 1970s was for low-cost systems that could translate a range of technical and commercial documents. This demand was spurred by the increase of globalisation and the demand for translation in Canada
Canada
Canada is a North American country consisting of ten provinces and three territories. Located in the northern part of the continent, it extends from the Atlantic Ocean in the east to the Pacific Ocean in the west, and northward into the Arctic Ocean...
, Europe
Europe
Europe is, by convention, one of the world's seven continents. Comprising the westernmost peninsula of Eurasia, Europe is generally 'divided' from Asia to its east by the watershed divides of the Ural and Caucasus Mountains, the Ural River, the Caspian and Black Seas, and the waterways connecting...
, and Japan
Japan
Japan is an island nation in East Asia. Located in the Pacific Ocean, it lies to the east of the Sea of Japan, China, North Korea, South Korea and Russia, stretching from the Sea of Okhotsk in the north to the East China Sea and Taiwan in the south...
.
The 1980s and early 1990s
By the 1980s, both the diversity and the number of installed systems for machine translation had increased. A number of systems relying on mainframeMainframe computer
Mainframes are powerful computers used primarily by corporate and governmental organizations for critical applications, bulk data processing such as census, industry and consumer statistics, enterprise resource planning, and financial transaction processing.The term originally referred to the...
technology were in use, such as Systran
SYSTRAN
SYSTRAN, founded by Dr. Peter Toma in 1968, is one of the oldest machine translation companies. SYSTRAN has done extensive work for the United States Department of Defense and the European Commission....
, Logos
OpenLogos
OpenLogos is the Open Source version of the Logos Machine Translation System, one of the earliest and longest running commercial machine translation products in the world. It was developed by Logos Corporation in the United States, with additional development teams in Germany and Italy...
, Ariane-G5, and Metal
METAL MT
A machine translation system developed at the University of Texas and at Siemens which ran on Lisp Machines.- Background :Originally titled the Linguistics Research System , it was later renamed METAL...
.
As a result of the improved availability of microcomputers, there was a market for lower-end machine translation systems. Many companies took advantage of this in Europe, Japan, and the USA. Systems were also brought onto the market in China, Eastern Europe, Korea, and the Soviet Union.
During the 1980s there was a great deal of MT activity, especially in Japan. With the Fifth Generation Computer project, Japan intended to leap over its competition in computer hardware and software, and one project that many large Japanese electronics firms found themselves involved in was creating software for translating to and from English (Fujitsu, Toshiba, NTT, Brother, Catena, Matsushita, Mitsubishi, Sharp, Sanyo, Hitachi, NEC, Panasonic, Kodensha, Nova, Oki).
Research during the 1980s typically relied on translation through some variety of intermediary linguistic representation involving morphological, syntactic, and semantic analysis.
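The analysis-transfer-generation idea behind these systems can be sketched in a few lines. The lexicons, rules, and representation below are invented for illustration and do not come from any historical system; real implementations used far richer morphological and syntactic machinery.

```python
# Toy illustration of 1980s-style translation via an intermediary
# representation: analyse the source sentence into a language-neutral
# form, then generate the target sentence from it. All lexicons and
# rules here are invented for the example.

# Morphological/lexical analysis: map English surface forms to concepts.
EN_LEXICON = {"the": None, "cat": "CAT", "sees": ("SEE", "PRES"), "dog": "DOG"}

# Generation lexicons: map concepts to German surface forms.
DE_NOUNS = {"CAT": "Katze", "DOG": "Hund"}
DE_VERBS = {("SEE", "PRES"): "sieht"}

def analyse(sentence):
    """Return an intermediary triple (agent, predicate, patient)."""
    words = [w for w in sentence.lower().split() if EN_LEXICON.get(w) is not None]
    agent, pred, patient = words  # assumes a simple SVO sentence
    return (EN_LEXICON[agent], EN_LEXICON[pred], EN_LEXICON[patient])

def generate_de(ir):
    """Generate a German main clause from the intermediary form."""
    agent, pred, patient = ir
    # Toy handling of gender and case via fixed articles.
    return f"Die {DE_NOUNS[agent]} {DE_VERBS[pred]} den {DE_NOUNS[patient]}."

print(generate_de(analyse("the cat sees the dog")))
```

Because translation passes through the neutral middle representation, adding a new target language in such a design only requires a new generation component, not a new analyser per language pair.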
At the end of the 1980s there was a surge of novel methods for machine translation. One system, developed at IBM, was based on statistical methods. Makoto Nagao and his group used methods based on large numbers of example translations, a technique now termed example-based machine translation. A defining feature of both of these approaches was the lack of syntactic and semantic rules and a reliance instead on the manipulation of large text corpora.
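The corpus-driven idea can be illustrated with a drastically simplified sketch: translation knowledge is estimated from a parallel corpus by counting, with no hand-written rules. Real statistical systems (such as the IBM models) estimate word-alignment probabilities with expectation-maximisation; this toy version uses only raw co-occurrence counts, and the corpus is invented.

```python
# Minimal sketch of corpus-based translation: learn word correspondences
# purely from a (tiny, invented) French-English parallel corpus.
from collections import Counter, defaultdict

parallel_corpus = [
    ("la maison", "the house"),
    ("la fleur", "the flower"),
    ("une maison", "a house"),
]

# Count how often each source word co-occurs with each target word.
cooc = defaultdict(Counter)
for src, tgt in parallel_corpus:
    for s in src.split():
        for t in tgt.split():
            cooc[s][t] += 1

def best_translation(word):
    """Return the most frequently co-occurring target word."""
    return cooc[word].most_common(1)[0][0]

print(best_translation("maison"))
```

Even this crude count correctly pairs "maison" with "house", because "house" appears in both sentence pairs containing "maison" while "the" and "a" appear in only one each; scaling the same principle to millions of sentence pairs is what made the statistical approach viable.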
During the 1990s, encouraged by successes in speech recognition and speech synthesis, research began into speech translation with the development of the German Verbmobil project.
There was significant growth in the use of machine translation as a result of the advent of low-cost, more powerful computers. It was in the early 1990s that machine translation began to make the transition away from large mainframe computers toward personal computers and workstations. Two companies that led the PC market for a time were Globalink and MicroTac, which merged in December 1994 in the corporate interest of both. Intergraph and Systran also began to offer PC versions around this time. Sites also became available on the internet, such as AltaVista's Babel Fish (using SYSTRAN technology) and Google Language Tools (also initially using SYSTRAN technology exclusively).
Recent research
The field of machine translation has seen major changes in the last few years. Currently a large amount of research is being done into statistical machine translation and example-based machine translation.
In the area of speech translation, research has focused on moving from domain-limited systems to domain-unlimited translation systems. In various research projects in Europe and in the United States (such as STR-DUST), solutions for automatically translating parliamentary speeches and broadcast news have been developed. In these scenarios the domain of the content is no longer limited to any special area; instead, the speeches to be translated cover a variety of topics.
More recently, the French-German project Quaero has investigated the possibility of using machine translation for a multilingual internet. The project seeks to translate not only webpages, but also videos and audio files found on the internet.
Today, only a few companies use statistical machine translation commercially, e.g. Asia Online, SDL International / Language Weaver (sells translation products and services), Google (uses its proprietary statistical MT system for some language combinations in its language tools), Microsoft (uses its proprietary statistical MT system to translate knowledge-base articles), and Ta with you (offers a domain-adapted machine translation solution based on statistical MT with some linguistic knowledge). There has been renewed interest in hybridisation, with researchers combining syntactic and morphological (i.e., linguistic) knowledge with statistical systems, as well as combining statistics with existing rule-based systems.
See also
- ALPAC report
- Computer-aided translation
- Lighthill report
- Machine translation
Further reading
- Hutchins, J. (1986) Machine Translation: past, present, future (Chichester: Ellis Horwood) ISBN 0-85312-788-3