CereProc - AbsoluteAstronomy.com

CereProc is a speech synthesis company, founded in 2005 and based in Edinburgh, Scotland. The company specialises in natural, expressive-sounding text-to-speech voices, in synthesis voices with regional accents, and in voice cloning.

Voice building technology

CereProc creates voices using two different voice-building technologies: unit selection synthesis and HTS (the HMM-based Speech Synthesis System).

CereProc's unit selection voices are built from large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, syllables, morphemes, words, phrases, and sentences. The segmentation is done with a specially modified speech recognizer. An index of the units in the speech database is then built from the segmentation and from acoustic parameters such as the fundamental frequency (pitch), duration, position in the syllable, and neighbouring phones. At runtime, the desired target utterance is created by finding the best chain of candidate units in the database (unit selection). Unit selection gives the most natural-sounding output because it applies digital signal processing (DSP) to the recorded speech only at the concatenation points, and DSP often makes recorded speech sound less natural.
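
The indexing and unit selection steps above can be sketched as a dynamic-programming (Viterbi) search that picks, for each target phone, the candidate unit minimising a target cost (distance from the desired pitch and duration) plus a join cost (how badly two units fit together at the concatenation point). This is a minimal illustration under stated assumptions, not CereProc's implementation: the `Unit` fields, the cost functions and their weights are all hypothetical, and a production system would use far richer features (phonetic context, spectral distances at the joins).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical database unit: one recorded phone plus a few acoustic features."""
    unit_id: int     # position in the recordings; consecutive ids were recorded together
    phone: str
    f0: float        # mean fundamental frequency (Hz)
    duration: float  # seconds
    start_f0: float  # f0 at the left edge, used by the join cost
    end_f0: float    # f0 at the right edge, used by the join cost


def build_index(units):
    """Index units by phone label so candidates for a target can be looked up quickly."""
    index = defaultdict(list)
    for unit in units:
        index[unit.phone].append(unit)
    return index


def target_cost(unit, target):
    """Distance between a candidate and the desired (phone, f0, duration); weights are illustrative."""
    _, f0, duration = target
    return abs(unit.f0 - f0) / 10.0 + abs(unit.duration - duration) * 100.0


def join_cost(left, right):
    """Penalty for concatenating two units; zero if they were adjacent in the original recording."""
    if right.unit_id == left.unit_id + 1:
        return 0.0
    return abs(left.end_f0 - right.start_f0) / 10.0 + 1.0


def select_units(index, targets):
    """Viterbi search for the chain of units with the lowest total target + join cost."""
    candidates = [index[phone] for phone, _, _ in targets]
    if any(not c for c in candidates):
        raise ValueError("no candidate units for some target phone")
    # costs[i][j]: best cumulative cost ending in candidate j of target i; back[i][j]: its predecessor.
    costs = [[target_cost(u, targets[0]) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[i]:
            best_j = min(
                range(len(candidates[i - 1])),
                key=lambda j: costs[i - 1][j] + join_cost(candidates[i - 1][j], u),
            )
            total = (costs[i - 1][best_j]
                     + join_cost(candidates[i - 1][best_j], u)
                     + target_cost(u, targets[i]))
            row_cost.append(total)
            row_back.append(best_j)
        costs.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest candidate for the final target.
    j = min(range(len(costs[-1])), key=lambda k: costs[-1][k])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return list(reversed(path))


# Toy database: units 0 and 1 were recorded next to each other, unit 2 comes from elsewhere.
units = [
    Unit(0, "h", 120.0, 0.06, 118.0, 121.0),
    Unit(1, "i", 125.0, 0.10, 121.0, 128.0),
    Unit(2, "i", 180.0, 0.10, 175.0, 185.0),
]
index = build_index(units)
chain = select_units(index, [("h", 120.0, 0.06), ("i", 125.0, 0.10)])
print([u.unit_id for u in chain])
```

Because the join cost is zero for units that were adjacent in the recordings, the search naturally favours long stretches of original speech, which is one reason unit selection needs DSP only at the remaining joins.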

CereProc's HTS voices generate speech using hidden Markov models (HMMs). In this system, the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modelled simultaneously by HMMs. Speech waveforms are then generated from the HMMs themselves, based on the maximum likelihood criterion. A key advantage is that HTS voices can be built from significantly less recorded speech than unit selection voices, and they have a much smaller footprint when installed.
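
The maximum-likelihood generation step can be illustrated with the standard parameter-generation idea used in HMM-based synthesis. Given per-frame Gaussian means and variances for a static feature and for its delta (rate of change), the trajectory c that maximises the likelihood is the solution of (W^T S^-1 W) c = W^T S^-1 mu, where W maps c to its static and delta values. The sketch below is a minimal one-dimensional version that assumes diagonal covariances, a simple one-frame delta window and state durations that are already known; the function names and the example values are hypothetical, and it is not CereProc's code. A real system generates full spectral and f0 vectors this way and then passes them to a vocoder to produce the waveform.

```python
import numpy as np


def expand_states(state_values, durations):
    """Repeat each HMM state's value for the number of frames the state lasts."""
    return np.repeat(np.asarray(state_values, dtype=float), durations)


def mlpg(static_mean, static_var, delta_mean, delta_var):
    """Maximum-likelihood trajectory for one feature with static + delta constraints.

    Solves (W^T S^-1 W) c = W^T S^-1 mu, where W stacks an identity (static rows)
    on top of a delta operator with delta c_t = 0.5 * (c_{t+1} - c_{t-1}).
    Frames outside the utterance are clamped to the nearest edge frame.
    """
    T = len(static_mean)
    W = np.zeros((2 * T, T))
    W[:T, :T] = np.eye(T)
    for t in range(T):
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)
        W[T + t, nxt] += 0.5
        W[T + t, prev] -= 0.5
    mu = np.concatenate([static_mean, delta_mean])
    precision = 1.0 / np.concatenate([static_var, delta_var])  # diagonal of S^-1
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mu)
    return np.linalg.solve(A, b)


# Hypothetical example: three states of an f0 contour (Hz) with known durations in frames.
durations = [3, 4, 3]
static_mean = expand_states([120.0, 150.0, 130.0], durations)
static_var = expand_states([25.0, 25.0, 25.0], durations)
delta_mean = np.zeros_like(static_mean)     # the model expects a locally flat contour
delta_var = np.full_like(static_mean, 4.0)  # small variance: deltas are trusted strongly

trajectory = mlpg(static_mean, static_var, delta_mean, delta_var)
print(np.round(trajectory, 1))
```

Without the delta terms the result would simply be the stepwise state means; the delta constraints are what turn those steps into a smooth contour, which is part of why a compact statistical model can still produce fluent speech.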

Voices and languages

CereProc has the following generally available voices, which speak five languages in a number of different regional accents:

American English: Katherine, Adam

British English: Sarah, William

Scottish English: Heather, Kirsty, Stuart

West Midlands English: Sue

French: Suzanne

Castilian Spanish: Sara

Italian: Laura

German: Gudrun, Alex

Austrian German: Leopold

French-accented English: Nicole

In addition, the company has developed a number of celebrity voices that are not generally available to the public. These include George W. Bush, Barack Obama and Arnold Schwarzenegger.

Voice cloning

In 2009, film critic Roger Ebert commissioned CereProc to create a synthetic version of his voice. Ebert had lost the power of speech following surgery to treat thyroid cancer. CereProc drew on tapes and DVD commentaries featuring Ebert's voice to create a text-to-speech voice that sounded more like his own. Ebert used the voice in his March 2, 2010 appearance on The Oprah Winfrey Show.

CereProc's voice cloning technology has been used in the UK by people with motor neurone disease (MND) to create synthesis voices before they lose the power of speech. This process was featured in a BBC Radio 4 documentary, Giving the Critic Back His Voice, broadcast in August 2010.

System compatibility

CereProc voices can be deployed on different operating systems and on different types of devices. CereProc desktop voices are compatible with Microsoft Windows and Apple Mac OS X. They install as system voices and can be used by other speech-enabled applications. CereProc's client/server system, cServer, is aimed principally at the corporate interactive voice response (IVR) market and can run on Windows and Linux. CereProc Mobile voices can be deployed on Android and Apple iOS.

External links

CereProc on-line.
Roger Ebert demonstrates his CereProc voice at TED2011: http://www.ted.com/talks/roger_ebert_remaking_my_voice.html

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
