Loquendo
Loquendo is a multinational computer software technology corporation, headquartered in Turin, Italy, that provides speech recognition, speech synthesis, speaker verification and identification applications. Loquendo, which was founded in 2001 under the Telecom Italia Lab, also has offices in the United Kingdom, Spain, Germany, France, and the United States.
Current business products can be found in portable and in-car navigation devices, assistive devices for people with disabilities, smartphones, ebook readers, talking ATMs, computer games, voice-controlled domestic appliances and others. The voice synthesis and speech recognition systems are used in a new e-health application as part of the virtual assistant of the health services of Spain's Junta de Andalucía government.
Loquendo's products have received several awards, including being named a Speech Technologies Speech Engine Leader in 2007, 2008, and 2009. The company was rated as 'Market Leader' by Speech Technologies in 2009 and 2010.
On September 30, 2011, Nuance (one of Loquendo's main competitors) announced that it had acquired Loquendo.
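History
The group that in the following years worked on speech synthesis, recognition and encoding was created in the middle of the Seventies, on the initiative of some in the IRI-STET group, at the CSELT laboratories in Turin, which were already internationally renowned.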
Speech synthesis
On the advice of the University of Padua, and by applying the so-called diphone technique (the union of a consonant and a vowel, 150 in total in the case of Italian), the first speech synthesizer with high intelligibility was created in 1975. It was MUSA (MUltichannel Speaking Automaton), which showed everyone what computers could achieve with the technology of the time. The results achieved in those years were condensed onto a 45 rpm record that was distributed in thousands of copies through the mass media. What aroused the greatest astonishment was the Italian version of the song Frère Jacques, performed in polyphony by several singing voices (MUSA could manage up to 8 synthesis channels in parallel).
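As a rough illustration of the diphone idea, the sketch below splits a sequence of phones into adjacent pairs, which are the units a diphone synthesizer would look up and concatenate. It is not MUSA's actual procedure: the function name `to_diphones`, the phone symbols and the `_` silence marker are assumptions made for the example.

```python
def to_diphones(phones):
    """Return the adjacent phone pairs of a sequence, padded with silence.

    The "_" silence marker and the "a-b" naming are illustrative assumptions.
    """
    padded = ["_"] + list(phones) + ["_"]
    # Pair each phone with its successor: P1-P2, P2-P3, ...
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical phone sequence for the Italian word "casa"
print(to_diphones(["k", "a", "s", "a"]))
# ['_-k', 'k-a', 'a-s', 's-a', 'a-_']
```

A sequence of n phones yields n + 1 units once silence is added at both ends, which is why the size of the diphone inventory grows with the number of phone pairs a language allows.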
The evolution of that prototype, with an increased number of diphones (about 1000), sharper linguistic analysis tools (which, for example, can automatically distinguish the Italian words "àncora" and "ancòra", homographs that differ only in stress and mean "anchor" and "again" respectively) and better waveform management, improved the synthetic voice over the following years. This wave produced the "sintetizzatore voce" chip, developed internally at CSELT and added to the SGS (later SGS-Thomson and now STMicroelectronics) catalog as a peripheral for Zilog's Z80 microprocessor (with the code M8950).
In the Nineties ELOQUENS ® was born, a multi-platform speech synthesizer for various operating systems (DOS, Windows, System 7, Unix, OS/2) and for telephone boards with a very large number of channels.
Such boards were used by the Italian telephone operator to build the reverse subscriber information service (which returns a subscriber's identity and address from the telephone number).
Towards the end of the second millennium the synthesis technology changed completely, moving from the diphone approach to "variable length unit selection and concatenation". This approach was made possible by increased computing power and, above all, by the increased capacity of mass storage systems. Thus was born ACTOR ® - The human sounding voice - which began to reach a large audience thanks to the number of telephone services and applications for people with disabilities created by companies connected to Loquendo.
In the two-thousands, with the spin-off of the research group into the new company Loquendo, the synthesizer changed its name, taking that of the company, and in the following years it was enriched with an impressive number of languages and voices (after ten years it offered more than 30 languages and 70 voices, male and female).
Having left the research labs and become a commercial product, the synthesizer was equipped with a number of editing tools for producing synthetic audio enriched with emotions (a typical case is audio books for blind people made with the DAISY technology). It was also offered as a software library for building a wide variety of products, from small portable devices (mobile phones, navigators and palm computers) to multichannel, multilingual telephone servers for (semi)automatic call centers.
Speech recognition
Just after speech synthesis research began, research on speech recognition started as well, and by the beginning of the Eighties a first prototype had been produced that could understand the ten digits and a few simple commands.
By applying hidden Markov models, a recognizer for connected words and sentences was developed in 1984 in collaboration with ELSAG, another company in the IRI-STET group.
The need to produce a speaker-independent speech recognizer for telephone applications led to the collection of speech databases containing the recorded voices of hundreds of different people. In 1987 the first large database was created by recording over the phone, with an automatic procedure, the voices of more than 1000 people calling from all over Italy into a telephone server set up in the CSELT labs.
The material recorded in this way made it possible to train hidden Markov models and, using sophisticated algorithms, to develop AURIS ®, the first speech recognizer that could run on a wide variety of devices equipped with a DSP (Digital Signal Processor).
In the Nineties the large European collaborations began (within the framework of European funded projects) and, together with about twenty other companies and universities, very large speech databases were collected across Europe (in total more than 65,000 people were contacted).
This huge body of material, combined with a new hybrid approach mixing hidden Markov models and neural networks, led to FLEXUS ®, the first flexible-vocabulary speech recognizer, which allowed many different telephone services to use automatic speech recognition in their human interfaces.
Bringing FLEXUS and ACTOR together in a single dialog system, DIALOGOS ®, made it possible to build telephone services that were extremely avant-garde for those years: the Telephone Subscribers Information Service (servizio 12) and the Railway Information Service (servizio FS Informa).
The two-thousands brought a change of name for the speech recognizer too, which took that of the newly born company Loquendo, along with the development of a large number of languages and the release of the engine as a software library for building a wide variety of telephone applications.
Various systems for writing finite-state grammars and natural language models were introduced.
The speech database recording campaigns continued, moving beyond Europe to the Mediterranean countries, to South, Central and North America and, finally, to the Far East. In total a very large number of hours of speech was recorded by contacting hundreds of thousands of people in the countries of the listed regions. The recordings were made over fixed telephone networks, in moving vehicles for mobile phones, and in domestic environments with high quality microphones for consumer applications (video games, appliances and home automation in general).
Speaker recognition
Research activities on speaker recognition began only recently, in the middle of the two-thousands, when speech databases tailored to this task became available. In collaboration with the Politecnico of Turin, experiments were started on two different topics: speaker "identification" and "verification".
The success of these experiments pushed the company to develop products targeted at this task, marketed both as software libraries and through the enabling platform described below.
Speech coding
Research on speech coding started even before the work on speech recognition and synthesis. Its aim was to build equipment (codecs) and echo cancellers able to increase as much as possible the number of telephone conversations that can flow through a single cable (or satellite connection) without losing voice intelligibility.
At the end of the Seventies, studies and experiments led to an algorithm for encoding the telephone speech signal and helped establish the CCITT standard adopted in Europe, known as A-law encoding (8-bit logarithmic encoding following the "A" law, for an audio signal sampled at 8 kHz). The standard was then used in the codecs for 64 kbit/s ISDN telephone lines.
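The logarithmic idea behind A-law can be sketched with the continuous companding formula, which gives quiet samples more resolution than loud ones before they are reduced to 8 bits. This is a simplified illustration, not the bit-exact segmented encoder of the standard: the uniform quantizer `quantize_8bit` and the function names are assumptions made for the example, while A = 87.6 is the value conventionally used for A-law.

```python
import math

A = 87.6  # compression parameter conventionally used for A-law


def alaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))               # linear region near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # logarithmic region
    return math.copysign(y, x)


def alaw_expand(y: float) -> float:
    """Inverse of alaw_compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)


def quantize_8bit(y: float) -> int:
    """Illustrative uniform quantizer to a signed 8-bit code (an assumption)."""
    return max(-127, min(127, round(y * 127)))


# 8000 samples per second at 8 bits each gives the 64 kbit/s channel rate.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
bitrate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
```

With this sketch a quiet sample such as 0.01 is assigned a much larger code after companding than it would get from plain uniform quantization, which is the reason speech stays intelligible at only 8 bits per sample.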
In the following years more powerful codecs were built (used in telephone exchanges) and, within the pan-European GSM consortium, the codec to be used in second generation mobile phones.
At the same time, codecs were built to transmit a high quality signal in spite of the 8 kHz band limit of telephone cables, useful for audio and video conference applications.
When Loquendo was born, all studies on coding were left to other CSELT groups (in the meantime renamed TILab, Telecom Italia Lab), which went on to contribute, among other things, to the MP3 standard for encoding high quality audio in MPEG video streams (used in the DVD videos that have totally replaced VHS tapes). MP3 then became synonymous with digital music and file sharing with the arrival of Napster, as early as the end of the Nineties.
With the development of the internet in its currently known form (browsable hypertexts, hosted on different servers, that span the planet as a single great network), the need arose to make these texts available by voice over the telephone as well.
At the same time IVR (Interactive Voice Response) systems became more and more widespread, and hardware and software tools for the fast development of new telephone applications and services became essential. It was evident to everybody that the previous development models, which had produced complex systems such as the automation of the telephone directory or the Railway Information Service, were too rigid and did not allow new applications to be developed easily.
There was therefore a need for enabling platforms for automatic voice telephone systems that were both scalable and easily programmable. A dedicated working group was created which, by combining the efforts of all the other groups, developed a voice browser prototype that was shown to the public at SMAU 2000 under the name VoxNauta ®. The success was so great that Telecom Italia decided to spin the development group and the platform off from the original research labs, and therefore created Loquendo on 1 February 2001.
Over the years, VoxNauta ® was further developed in various scalable forms, from small servers to huge enterprise systems with thousands of lines, and it was installed in hundreds of companies all around the world (depending on the languages and voices available at the time).
The birth of standards for writing telephone services (VoiceXML) and of protocols (MRCP) for connecting the servers hosting the speech technologies to the servers hosting the telephone boards pushed the development of a software-only Speech Server, hosting the text-to-speech and speech recognition engines from Loquendo.
This constant research and development activity has made Loquendo one of the best-known players in speech technology.
graphic department. The three small waves over the letter "O", in the animated GIF version of the logo, light up in sequence, suggesting emitted sound.
The name was certainly a brilliant idea in terms of originality and memorability; in fact, at the time of its registration as a company brand, it did not appear in any search engine except in rare Latin texts. Its uniqueness meant that over the years it became synonymous with Italian speech technologies (although often erroneously identified with text-to-speech alone). The marketing decision of the early two-thousands to drop the historical brand names Actor ® and Flexus ® and bet everything on the company name itself also contributed to this result: thus Loquendo TTS and Loquendo ASR were born.
The brand has not been strongly protected by the company, and this is one of the causes of its enormous spread, also to the detriment of competitors' brands. It is sufficient to search for it on YouTube to find it associated with hundreds of amusing and ironic videos (although sometimes of doubtful taste) in which the vocal track is produced with one of the voices of the Turin synthesizer; the video authors have in fact decided to leave the word Loquendo in the titles of their works to help identify them as videos with an artificial voice.
The same is true of Facebook, where hundreds of profiles all around the world use the Loquendo brand to identify profiles of non-real persons.
In conclusion, ten years after its creation, and paraphrasing the slogan of another well-known Italian company, these days we could certainly say: where there is a voice, there is Loquendo.
The last were those of the summer of 2011, when the interest of two US-based multinationals, Nuance and Avaya, in the Turin company was announced.
As the first was a direct competitor of the Italian company, it was viewed with some concern by Loquendo's workers, who were worried about the possible break-up of the research and development group and the definitive loss of the knowledge acquired in forty years of activity.
The second company appeared more interesting because it was complementary to the activity carried on by Loquendo; Avaya in fact did not own any speech technology engines and therefore could have been very interested in developing these technologies in house rather than continuing to buy them from outside (the classic "make or buy" dilemma).
This news was followed carefully by the workers, the Turin and Piedmont governments and the entire international scientific community.
In the end, however, on 13 August 2011, Telecom Italia publicly announced the sale of its whole stake to the American company Nuance for 53 million euros.
Speech synthesisSpeech synthesisSpeech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware...
On advices from the University of PaduaUniversity of Padua
The University of Padua is a premier Italian university located in the city of Padua, Italy. The University of Padua was founded in 1222 as a school of law and was one of the most prominent universities in early modern Europe. It is among the earliest universities of the world and the second...
, by applying the so-called diphones
Diphone
In phonetics, a diphone is an adjacent pair of phones. It is usually used to refer to a recording of the transition between two phones.In the following diagram, a stream of phones are represented by P1, P2, etc., and the corresponding diphones are represented by D1-2, D2-3, etc:...
technique (union of a consonant and a vowel, 150 in total in the case of the Italian language) in the 1975 was created the first speech synthesizer with high intelligibility; it was MUSA (MUltichannel Speaking Automaton) that shows everybody the target that, with the technology in those days, the computers could reach. The results achieved in those years were condensed in a 45 rpm record that was spread in thousands of copies with the mass communication media. It was mainly the Italian version of the small song Frère Jacques
Frère Jacques
"Frère Jacques" , in English sometimes called "Brother John" or "Brother Peter", is a French nursery melody. The song is traditionally sung in a round. When the first singer reaches the end of the first line the next person starts at the beginning...
carried out in polyphony with more singing voices (MUSA could manage up to 8 synthesis channels in parallel) to arouse the maximum astonishment.
The evolution from that prototype, with the increased number of diphones (about 1000), the sharpening of the linguistic analysis tools (that allow, for example in Italian, to automatically distinguish "àncora" from "ancòra" - homograph words where only the stress changes and that mean, respectively, "anchor" and "again") and a better waveform management, brought in the next years towards the improvement of the synthetic voice. It was born in this wave the chip "sintetizzatore voce" developed internally in CSELT and added to the SGS (then SGS-Thompson and now STMicroelectronics
STMicroelectronics
STMicroelectronics is an Italian-French electronics and semiconductor manufacturer headquartered in Geneva, Switzerland.While STMicroelectronics corporate headquarters and the headquarters for EMEA region are based in Geneva, the holding company, STMicroelectronics N.V. is registered in Amsterdam,...
) catalog as Zilog
Zilog
Zilog, Inc., previously known as ZiLOG , is a manufacturer of 8-bit and 24-bit microcontrollers, and is most famous for its Intel 8080-compatible Z80 series.-History:...
's Z80 microprocessor's peripheral (with the code M8950).
In the Nineties
1990s
File:1990s decade montage.png|From left, clockwise: The Hubble Space Telescope floats in space after it was taken up in 1990; American F-16s and F-15s fly over burning oil fields and the USA Lexie in Operation Desert Storm, also known as the 1991 Gulf War; The signing of the Oslo Accords on...
ELOQUENS ® was born, speech synthesizer multi-platform for various operating systems (DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...
, Windows, System 7
System 7
System 7 is the name of a Macintosh operating system introduced in 1991.System 7 may also refer to:* System 7 , a British dance/ambient band* System 7 , 1991 album* IBM System/7, a 1970s computer system...
, Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...
, OS/2
OS/2
OS/2 is a computer operating system, initially created by Microsoft and IBM, then later developed by IBM exclusively. The name stands for "Operating System/2," because it was introduced as part of the same generation change release as IBM's "Personal System/2 " line of second-generation personal...
) or telephone boards with very large number of channels.
Such boards are used by the Italian telephone operator to build the reverse telephoner subscribers information service (to get, from the telephone number, subscriber's identity and address).
Towards the end of the second millennium the synthesis technology changes totally, by moving from the diphones approach to the "variable length unit selection and concatenation". This approach was made possible thank to the increased computer strength and mainly the increased capacity of the mass storage systems. So it was born ACTOR ® - The human sounding voice - that begin to have big audience thanks to the number of telephone services and application for disable people created by connected to Loquendo companies.
In the two-thousands, with the spin-off of the research group in the new company Loquendo, the synthesizer changes name, by getting the one of the company and is enriched in the following years of an impressive number of languages and voice (by reaching after ten years to have more than 30 languages and 70 voices, males and females).
The synthesizer, came out from the research labs and became a commercial product, endows itself with a number of editing tools to produce synthetic audio enriched with emotions (typical is the case of audio books for blind people achieved with the DAISY
DAISY Digital Talking Book
DAISY is a standard for digital talking books. DAISY books are typically used by people have "print disabilities," including blindness, impaired vision, dyslexia...
technology) and furthermore it present itself as a SW library to produce the most various products, from small portable devices - mobile phones, navigators and palm computers, to telephone servers multichannel/multilingual for (semi)automatic call centers.
Speech recognitionSpeech recognitionSpeech recognition converts spoken words to text. The term "voice recognition" is sometimes used to refer to recognition systems that must be trained to a particular speaker—as is the case for most desktop recognition software...
Just after the beginning of the speech synthesisSpeech synthesis
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware...
research, they started the ones on the speech recognition
Speech recognition
Speech recognition converts spoken words to text. The term "voice recognition" is sometimes used to refer to recognition systems that must be trained to a particular speaker—as is the case for most desktop recognition software...
and, yet at the beginning of the Eighties
1980s
File:1980s decade montage.png|thumb|400px|From left, clockwise: The first Space Shuttle, Columbia, lifted off in 1981; American President Ronald Reagan and Soviet leader Mikhail Gorbachev eased tensions between the two superpowers, leading to the end of the Cold War; The Fall of the Berlin Wall in...
it was produced a first prototype able to understand the ten digits and a few simple commands.
By applying the Hidden Markov models in the 1984 it was developed a connected words and sentences speech recognizer in collaboration with ELSAG, another company in the IRI
Istituto per la Ricostruzione Industriale
The Istituto per la Ricostruzione Industriale was an Italian public company set up by the fascist government in 1933 to combat the effects of the global depression on the Italian economy...
-STET group.
The need to produce speech recognizer speaker independent for telephone applications brought to carry out speech databases with the recorded voice from hundreds of different people and in the 1987 it was created the first large database obtained by recording over the phone, with an automatic procedure, the voice of more than 1000 people calling, from all over Italy, a telephone server got ready in the CSELT labs.
The so recorded material allows to train Hidden Markov models and, by using sophisticated algorithms, to develop AURIS ® the first speech recognizer that could run in the most varied devices with DSP - Digital Signal Processor
DSP
- Computing :* Digital signal processing, the study and implementation of signals in digital computing and their processing methods* Digital signal processor, a specialized microprocessor designed specifically for digital signal processing...
.
In the Nineties
1990s
File:1990s decade montage.png|From left, clockwise: The Hubble Space Telescope floats in space after it was taken up in 1990; American F-16s and F-15s fly over burning oil fields and the USA Lexie in Operation Desert Storm, also known as the 1991 Gulf War; The signing of the Oslo Accords on...
they begin the big European collaborations (in the framework of the European funded projects) and, together about twenty other companies and universities in the whole Europe, very large speech databases are collected in the whole Europe (on the whole more than 65000 people are contacted).
This huge material, combined with a new mixed approach Hidden Markov models - Neural networks
Neural Networks
Neural Networks is the official journal of the three oldest societies dedicated to research in neural networks: International Neural Network Society, European Neural Network Society and Japanese Neural Network Society, published by Elsevier...
brought to realize FLEXUS ®, the first flexible vocabulary speech recognizer, that allows many varied telephone services to use automatic speech recognition in their human interfaces.
By bringing together FLEXUS and ACTOR in a unique dialog system, DIALOGOS ®, allows to realize extremely avant-garde telephone services for those years: the Telephone Subscribers Information Service (servizio 12) and the Railway Information Services (servizio FS Informa).
The two-thousands brought also for the speech recognizer the change of the name, by getting the one of the just born company Loquendo, the development of a large number of languages and the release of the engine also in the form of a SW library to realize the most varied telephone applications.
They are introduced varied sistems to write state-finite grammars and natural language models systems.
They continue the speech databases recording campaigns, by exiting from Europe and by moving to the Mediterranean countries, to the South, Center and North America and, finally, in the Far East countries. In total it has been recorded infinite hours of speech by contacting hundred thousands of people in the countries of the listed regions. The recordings have been executed both for fixed telephone networks, as well as in moving vehicles for mobile phones and also in domestic environment with high quality microphones for consumer applications (video games, appliances and domotics in general)
Speaker recognitionSpeaker recognitionSpeaker recognition is the computing task of validating a user's claimed identity using characteristics extracted from their voices .There is a difference between speaker recognition and speech recognition . These two terms are frequently confused, as is voice recognition...
Research activities on speaker recognitionSpeaker recognition
Speaker recognition is the computing task of validating a user's claimed identity using characteristics extracted from their voices .There is a difference between speaker recognition and speech recognition . These two terms are frequently confused, as is voice recognition...
initiated very recently, at the meddle of two-thousands when speech databases tailored for this task have been available. In collaboration with Politecnico of Turin started some experiments on two different topic: speaker "identification" and "verification".
The successful pushed the company to move forward to the development of products targeted for this task commercialized both as SW libraries than trough the enabling platform described below.
Speech codingSpeech codingSpeech coding is the application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting...
The research activities on Speech codingSpeech coding
Speech coding is the application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting...
started even before the ones on speech recognition and synthesis, aiming to build equipment (CODEC
Codec
A codec is a device or computer program capable of encoding or decoding a digital data stream or signal. The word codec is a portmanteau of "compressor-decompressor" or, more commonly, "coder-decoder"...
) and echo canceler able to increase as much as possible the number of telephone conversation that can flow trough a single cable (or satellite connection) without losing voice intelligibility.
At the end of the 1970s, studies and experiments led to an algorithm for encoding the telephone speech signal and to the European CCITT standard known as A-law encoding (an 8-bit logarithmic "A" encoding law for audio signals sampled at 8 kHz). The standard was then used in the codecs for 64 kbit/s ISDN telephone lines.
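As a rough illustration of the logarithmic A-law encoding mentioned above, the following Python sketch applies the continuous A-law companding curve (with the conventional parameter A = 87.6) followed by a simple uniform 8-bit quantizer. It is not the bit-exact segmented encoder specified in G.711, and all function names are hypothetical. It also shows where the 64 kbit/s figure comes from, assuming the usual telephony rate of 8,000 samples per second.

```python
import math

A = 87.6  # conventional A-law compression parameter


def alaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] through the continuous A-law curve."""
    ax = abs(x)
    if ax < 1.0 / A:
        # Linear segment for very quiet samples
        y = A * ax / (1.0 + math.log(A))
    else:
        # Logarithmic segment for louder samples
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)


def alaw_expand(y: float) -> float:
    """Inverse of alaw_compress (before any quantization)."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)


def quantize_8bit(y: float) -> int:
    """Illustrative uniform quantizer: companded value to a signed 8-bit code."""
    return max(-127, min(127, round(y * 127)))


def bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of an uncompressed PCM stream, in bit/s."""
    return sample_rate_hz * bits_per_sample
```

Calling `bit_rate(8000, 8)` gives 64,000 bit/s, which matches the capacity of one ISDN channel. The logarithmic curve spends more of the 256 available codes on quiet samples, where the ear is most sensitive to quantization noise.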
In the following years, more powerful codecs were built (used in telephone exchanges) and, within the pan-European GSM consortium, the codec used in second-generation mobile phones.
At the same time, codecs were built to transmit high-quality signals despite the 8 kHz band limit of telephone cables, useful for audio and video conferencing applications.
When Loquendo was founded, all studies on coding were left to other CSELT groups (by then renamed TILab, Telecom Italia Lab), whose work contributed, among other things, to the MP3 standard for encoding high-quality audio signals in MPEG video streams (used in the DVD video format that has replaced VHS video tapes). MP3 then became synonymous with digital music and file sharing with the arrival of Napster, already at the end of the 1990s.
Enabling platforms
Towards the end of the 1990s, with the development of the internet in its currently known form (browsable hypertexts residing on different servers that span the planet as a single great network), the need arose to make these texts available by voice over the telephone as well.
At the same time, IVR (Interactive Voice Response) systems became increasingly widespread, and hardware and software tools for the rapid development of new telephone applications and services became essential. It was evident to everyone that the previous development models, which had produced complex systems such as the automation of the telephone directory or the Railway Information Service, were too rigid and did not allow the easy development of new applications.
The need was therefore felt for enabling platforms for automatic voice telephone systems that were both scalable and easily programmable. A dedicated working group was created which, by combining the efforts of all the other groups, developed a voice browser prototype that was shown to the public at SMAU 2000 under the name VoxNauta ®. The success was so great that Telecom Italia decided to spin off the development group and the platform from the original research labs, and it therefore created Loquendo on 1 February 2001.
Over the years, VoxNauta ® was further developed in various scalable forms, from small servers to huge enterprise systems with thousands of lines, and it was installed in hundreds of companies around the world (depending on the languages and voices available at the time).
The birth of standards for writing telephone services (VoiceXML) and of protocols (MRCP) for connecting the servers hosting the speech technologies to the servers hosting the telephone boards pushed the development of a software-only Speech Server, hosting the text-to-speech and speech recognition engines from Loquendo.
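To give an idea of what a telephone service written in VoiceXML looks like, the Python sketch below builds a minimal VoiceXML 2.0 page containing a single spoken prompt. The function name, the form id and the greeting text are hypothetical, and the page is not taken from any Loquendo product; a voice browser would fetch such a page and hand the prompt text to a text-to-speech engine.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 specification
VXML_NS = "http://www.w3.org/2001/vxml"


def build_greeting_page(text: str) -> str:
    """Return a minimal VoiceXML 2.0 document that speaks `text` to the caller."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "greeting"})  # hypothetical id
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = text  # rendered by the text-to-speech engine
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_greeting_page("Welcome to the railway information service."))
```

The page plays the same role for a voice browser that an HTML page plays for a visual one: the dialogue is described declaratively, so a new service can be written without changing the platform itself.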
This constant research and development activity made Loquendo one of the best-known players in speech technology.
The brand
The origin of the name Loquendo ® ..., while the logo was created by the Telecom Italia graphic department. In the animated GIF version of the logo, the three small waves over the letter "O" light up in sequence, suggesting the emission of sound.
The name was certainly a brilliant idea in terms of originality and memorability; in fact, at the time of its registration as a company brand, it did not appear in any search engine except in rare Latin texts. Its uniqueness meant that over the years it became synonymous with Italian speech technologies (although often erroneously identified with text-to-speech alone). The marketing decision in the early 2000s to drop the historical brand names Actor ® and Flexus ® and bet everything on the company name itself also contributed to this result: this is how Loquendo TTS and Loquendo ASR were born.
The company has not protected the brand with strong emphasis, and this is one of the causes of its enormous spread, also to the detriment of competitors' brands. A search on YouTube is enough to find it associated with hundreds of amusing and ironic videos (although sometimes of doubtful taste) whose vocal track is produced with one of the voices of the Turin synthesizer; the video authors have in fact chosen to keep the word Loquendo in the titles of their works to help identify them as videos with an artificial voice.
The same is true of Facebook, where hundreds of profiles around the world use the Loquendo brand to identify themselves as profiles of non-real persons.
In conclusion, ten years after its creation, and paraphrasing the slogan of another well-known Italian company, one could say today: Where there is a voice, there is Loquendo.
Sale of the company
Over the years there were various rumors about the sale of Loquendo to other companies. The latest came in the summer of 2011, when two US-based multinationals were reported to be interested in the Turin company: Nuance and Avaya.
As the first was a direct competitor of the Italian company, it was viewed with some concern by Loquendo employees, who feared the possible dissolution of the research and development group and the definitive loss of the knowledge acquired in forty years of activity.
The second company appeared more interesting because its business was complementary to that of Loquendo; Avaya in fact did not own any speech technology engines and therefore could have been very interested in developing these technologies in-house rather than continuing to buy them from outside (the classic "make or buy" dilemma).
This news was followed closely by the workers, the Turin and Piedmont governments, and the entire international scientific community.
In the end, however, on 13 August 2011, Telecom Italia publicly announced the sale of its entire stake to the American company Nuance for 53 million euros.
Products
- speech synthesis
- speech recognition
- speaker verification
- voice browser