CereProc
CereProc is a speech synthesis company based in Edinburgh, Scotland, founded in 2005. The company specialises in creating natural and expressive-sounding text-to-speech voices, synthesis voices with regional accents, and voice cloning.

Voice building technology

CereProc creates voices using two different voice-building technologies: unit selection synthesis and HMM-based synthesis (HTS).

CereProc's unit selection voices are built from large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, syllables, morphemes, words, phrases, and sentences. The division into segments is done using a specially modified speech recognizer. An index of the units in the speech database is then created based on the segmentation and on acoustic parameters such as fundamental frequency (pitch), duration, position in the syllable, and neighbouring phones. At runtime, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection). Unit selection provides the greatest naturalness, because it applies digital signal processing (DSP) to the recorded speech only at concatenation points. DSP often makes recorded speech sound less natural.
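The search for the "best chain of candidate units" is commonly framed as a shortest-path problem. The following is a minimal sketch under stated assumptions, not CereProc's implementation: the `Unit` fields, the two cost functions, and their weights are all hypothetical, chosen only to show how a target cost (how well a unit matches the requested pitch and duration) and a join cost (how smoothly two units concatenate) can be combined in a Viterbi-style search over an index keyed by phone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str       # phone label of the recorded segment
    f0: float        # mean fundamental frequency in Hz
    duration: float  # duration in seconds


def target_cost(spec, unit):
    """Mismatch between the requested (f0, duration) and a candidate unit.

    The weights are illustrative assumptions, not tuned values.
    """
    want_f0, want_dur = spec
    return abs(unit.f0 - want_f0) / 100.0 + abs(unit.duration - want_dur) * 10.0


def join_cost(prev, nxt):
    """Penalty for a pitch jump at the concatenation point (assumed form)."""
    return abs(prev.f0 - nxt.f0) / 50.0


def select_units(targets, index):
    """Viterbi search for the cheapest chain of units.

    targets: list of (phone, (f0, duration)) specifications
    index:   dict mapping phone label -> list of candidate Units
    """
    # best[i][j] = (cost of cheapest chain ending in candidate j, backpointer)
    best = []
    for i, (phone, spec) in enumerate(targets):
        row = []
        for unit in index[phone]:
            tc = target_cost(spec, unit)
            if i == 0:
                row.append((tc, None))
            else:
                prev_units = index[targets[i - 1][0]]
                cost, back = min(
                    (best[i - 1][k][0] + join_cost(p, unit), k)
                    for k, p in enumerate(prev_units)
                )
                row.append((cost + tc, back))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(index[targets[i][0]][j])
        j = best[i][j][1]
    return chain[::-1]
```

In this sketch the join cost can override the locally best match: a unit that fits its own target slightly worse may still be chosen if it concatenates more smoothly with its neighbours.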

CereProc's HTS voices synthesise speech using hidden Markov models (HMMs). In this system, the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modelled simultaneously by HMMs. Speech waveforms are generated from the HMMs themselves based on the maximum likelihood criterion. Critically, HTS voices can be built from significantly less recorded speech than unit selection voices and have a much smaller footprint when installed.
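As a rough illustration of generation under the maximum likelihood criterion, consider a heavily simplified case in which each HMM state has one Gaussian over F0 and one Gaussian over duration. A Gaussian density peaks at its mean, so the most likely output is each state's mean F0 repeated for its mean number of frames. The sketch below assumes exactly that; the `State` class and its fields are hypothetical, and a real HMM-based synthesiser additionally models spectral parameters, uses dynamic features to smooth the trajectory, and passes the result through a vocoder to produce a waveform.

```python
from dataclasses import dataclass


@dataclass
class State:
    mean_f0: float      # mean of the state's Gaussian over F0, in Hz
    mean_frames: float  # mean of the state's Gaussian duration model, in frames


def generate_f0(states):
    """Most likely F0 track when every state is a single Gaussian.

    Simplifying assumption: no dynamic (delta) features, so the result is
    piecewise constant rather than a smooth contour.
    """
    track = []
    for s in states:
        n = max(1, round(s.mean_frames))  # most likely whole-frame duration
        track.extend([s.mean_f0] * n)
    return track
```

Because only means (and, in a fuller model, variances) are stored rather than recordings, a model of this kind is compact, which is consistent with the small installed footprint noted above.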

Voices and languages

CereProc has fifteen generally available voices that speak five languages in a number of different regional accents:


American English: Katherine, Adam

British English: Sarah, William

Scottish English: Heather, Kirsty, Stuart

West Midlands English: Sue

French: Suzanne

Castilian Spanish: Sara

Italian: Laura

German: Gudrun, Alex

Austrian German: Leopold

French-accented English: Nicole


In addition, the company has developed a number of celebrity voices that are not generally available to the public. These include George W. Bush, Barack Obama and Arnold Schwarzenegger.

Voice cloning

In 2009, film critic Roger Ebert employed CereProc to create a synthetic version of his voice. Ebert had lost the power of speech following surgery to treat thyroid cancer. CereProc mined tapes and DVD commentaries featuring Ebert's voice to create a text-to-speech voice that sounded more like his own. Ebert used the voice in his March 2, 2010 appearance on The Oprah Winfrey Show.

CereProc voice cloning technology is currently being used in the UK by people with motor neurone disease (MND) to create synthesis voices before they lose the power of speech. This process was featured in a BBC Radio 4 documentary, Giving the Critic Back His Voice, broadcast in August 2010.

System compatibility

CereProc voices can be deployed on different operating systems and on different types of devices. CereProc desktop voices are compatible with Microsoft Windows and Apple Mac OS X. They install as system voices and can be used by other speech-enabled applications. CereProc's client/server system cServer, aimed principally at the corporate IVR market, can be run on Windows and Linux. CereProc Mobile voices can be deployed on Android and Apple iOS.

See also

  • Speech synthesis

  • Language

  • Natural language processing

  • Speech processing

  • Speech recognition

  • List of screen readers

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.