Eurotra
Eurotra was an ambitious machine translation project established and funded by the European Commission from the late 1970s until 1994.
Emboldened by modest success with SYSTRAN, an older, commercially developed machine translation (MT) system, a large network of European computational linguists embarked on the Eurotra project in the hope of creating a state-of-the-art MT system for the then seven, later nine, official languages of the European Community.
Over time, however, expectations were tempered: "Fully Automatic High Quality Translation" was not a reasonably attainable goal. Eurotra's true character was eventually acknowledged to be pre-competitive research rather than prototype development.
The project was motivated by one of the founding principles of the EU: that all citizens had the right to read any and all proceedings of the Commission in their own language. As more countries joined, this produced a combinatorial explosion in the number of language pairs involved, and the need to translate every paper, speech and even set of meeting minutes produced by the EU into the other eight languages meant that translation rapidly became the overwhelming component of the administrative budget. Eurotra was devised to solve this problem.
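The growth in language pairs can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are invented here, and the second function assumes a design in which each language needs just one analysis and one generation component feeding a shared representation, which is the idea behind the Intermediate Representation described later in this article.

```python
def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


def shared_representation_components(n_languages: int) -> int:
    """Components needed if every language has one analysis and one
    generation module around a shared representation (an assumption
    made for this sketch)."""
    return 2 * n_languages


for n in (7, 9):
    print(n, directed_pairs(n), shared_representation_components(n))
# prints:
# 7 42 14
# 9 72 18
```

With nine official languages there are 72 translation directions, which is why pairwise translation cost grew so much faster than the number of member languages.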
The project was unusual in that, rather than consisting of a single research team, it had member groups distributed around the member countries and organised along language rather than national lines (for example, groups in Leuven and Utrecht worked closely together), with the secretariat based at the European Commission in Luxembourg. While this contributed significantly to the culture of the project, it also graphically illustrated Brooks's observation in The Mythical Man-Month that adding personnel to a late project makes it later: the more groups involved, the more time is spent on administration and communication rather than on actual research.
Eurotra's design was unusual for an MT project. Older systems, such as SYSTRAN, were heavily dictionary-based, with minor support for rearranging word order. More recent systems have often taken a probabilistic approach based on parallel corpora. Eurotra instead addressed the constituent structure of the text to be translated: a first syntactic parse was followed by a second parse that produced a dependency structure, and then by a final parse with a third grammar that produced what was referred to internally as the Intermediate Representation (IR). Since all three modules were implemented as Prolog programs, it would in principle be possible to run this structure backwards through the corresponding modules for another language to produce a translated text in any of the other languages. In practice, however, this was not how language pairs were implemented.
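The "in principle" idea of analysing into a shared representation and then running the target language's modules backwards can be sketched in a few lines. This is a toy under loud assumptions, not Eurotra's implementation: it is written in Python rather than Prolog, each "stage" is trivial, the lexicons and role names are invented, and the verb-final target "language" is made up purely to show a different surface order.

```python
# Toy sketch only: stage logic, lexicons and the target language are invented.

def parse_constituents(sentence):
    """Stage 1 (toy): split the sentence into constituents."""
    return sentence.split()


def to_dependencies(constituents, order):
    """Stage 2 (toy): label each constituent with a role via word order."""
    return dict(zip(order, constituents))


def to_ir(dependencies, lexicon):
    """Stage 3 (toy): replace surface words with neutral concept labels."""
    return {role: lexicon[word] for role, word in dependencies.items()}


def analyse(sentence, language):
    """Run the three stages forwards for the source language."""
    constituents = parse_constituents(sentence)
    dependencies = to_dependencies(constituents, language["order"])
    return to_ir(dependencies, language["lexicon"])


def generate(ir, language):
    """Run the three stages backwards for the target language."""
    inverse = {concept: word for word, concept in language["lexicon"].items()}
    dependencies = {role: inverse[c] for role, c in ir.items()}   # stage 3 reversed
    constituents = [dependencies[r] for r in language["order"]]   # stage 2 reversed
    return " ".join(constituents)                                 # stage 1 reversed


ENGLISH = {
    "order": ("subject", "verb", "object"),
    "lexicon": {"Japan": "JAPAN", "makes": "MAKE", "computers": "COMPUTERS"},
}
TOY_VERB_FINAL = {  # an invented language, not Danish
    "order": ("subject", "object", "verb"),
    "lexicon": {"Yapan": "JAPAN", "makka": "MAKE", "komputoj": "COMPUTERS"},
}

ir = analyse("Japan makes computers", ENGLISH)
print(ir)
print(generate(ir, TOY_VERB_FINAL))
print(generate(ir, ENGLISH))
```

The point of the design is visible in `generate`: nothing in it refers to the source language, so any language with its own modules could in principle be a target.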
The first "live" translation occupied a 4 MB MicroVAX running Ultrix and C-Prolog for a complete weekend some time in early 1987. The sentence, translated from English into Danish, was "Japan makes computers". The main problem faced by the system was the generation of so-called "parse forests": often a large number of different grammar rules could be applied to any particular phrase, producing hundreds, even thousands, of (often identical) parse trees. This consumed huge quantities of memory and slowed the whole process down unnecessarily.
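To give a feel for how quickly parse forests can grow, the sketch below counts the distinct binary bracketings of a phrase of n words. It assumes a maximally ambiguous toy grammar in which any two adjacent constituents may combine; this is an assumption for illustration, not a description of Eurotra's grammars, and the function name is invented here.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_binary_parses(n_words: int) -> int:
    """Count binary trees over n_words >= 1 adjacent words, assuming every
    adjacent pair of constituents is allowed to combine."""
    if n_words == 1:
        return 1
    # Split the span into a left part of k words and a right part of the rest.
    return sum(
        count_binary_parses(k) * count_binary_parses(n_words - k)
        for k in range(1, n_words)
    )


for n in (3, 5, 10, 15):
    print(n, count_binary_parses(n))
# prints:
# 3 2
# 5 14
# 10 4862
# 15 2674440
```

Counting with memoisation is cheap, but a system that materialises every tree separately has to store and process all of them, which matches the memory and speed problems described above.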
While Eurotra never delivered a "working" MT system, the project had a far-reaching, long-term impact on the nascent language industries in European member states, in particular in the southern countries of Greece, Italy, Spain, and Portugal. At least one commercial MT system (developed by an academic/commercial consortium in Denmark) is derived from Eurotra technology.