Source-filter model of speech production
The source–filter model of speech production represents speech as the combination of a sound source, such as the vocal cords, and a linear acoustic filter, namely the vocal tract together with the radiation characteristic. An assumption often made when using the model is that the source and the filter are independent. In such cases, the model is more accurately called the "independent source-filter model".
While only an approximation, the model is widely used in a number of applications because of its relative simplicity. To varying degrees, different phonemes can be distinguished by the properties of their source(s) and their spectral shape. Voiced sounds (e.g., vowels) have at least one source due to mostly periodic glottal excitation, which can be approximated by an impulse train in the time domain and by harmonics in the frequency domain, and a filter that depends on, for example, tongue position and lip protrusion. Fricatives, on the other hand, have at least one source due to turbulent noise produced at a constriction in the oral cavity (e.g., the sounds represented orthographically by "s" and "f"). So-called voiced fricatives (such as "z" and "v") have two sources: one at the glottis and one at the supra-glottal constriction.
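The correspondence between an impulse train in the time domain and harmonics in the frequency domain can be checked numerically. The following is a minimal sketch using only the Python standard library; the sampling rate, the fundamental frequency, and the helper names `impulse_train` and `dft_magnitude` are assumptions chosen for illustration and are not part of the model itself.

```python
import cmath
import math


def impulse_train(n_samples, period):
    """Return a list with 1.0 every `period` samples and 0.0 elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def dft_magnitude(x, n_bins):
    """Magnitude of the first `n_bins` DFT bins (naive O(N * n_bins) sum)."""
    n = len(x)
    mags = []
    for k in range(n_bins):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


# Assumed illustrative values: 4 kHz sampling rate, 100 Hz fundamental.
fs = 4000
f0 = 100
period = fs // f0          # 40 samples between glottal pulses
n = 10 * period            # analyse exactly 10 pitch periods

x = impulse_train(n, period)
mag = dft_magnitude(x, n // 2)

# Bins that carry energy, converted to frequencies in Hz.
harmonic_bins = [k for k, m in enumerate(mag) if m > 1e-6]
harmonic_hz = [k * fs / n for k in harmonic_bins]
print(harmonic_hz)
```

Under these assumptions, the only bins with non-negligible magnitude fall at integer multiples of the fundamental frequency, which is the harmonic structure the vocal tract filter then shapes.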
The source-filter model is used in both speech synthesis and speech analysis, and is related to linear prediction. The development of the model is due, in large part, to the early work of Gunnar Fant, although others, notably Ken Stevens, have also contributed substantially to the models underlying acoustic analysis of speech and speech synthesis.
In implementations of the source-filter model of speech production, the sound source, or excitation signal, is often modelled as a periodic impulse train for voiced speech or as white noise for unvoiced speech. The vocal tract filter is, in the simplest case, approximated by an all-pole filter whose coefficients are obtained by linear prediction, that is, by minimizing the mean-squared prediction error over the speech signal to be reproduced. Convolving the excitation signal with the filter's impulse response then produces the synthesised speech.
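The implementation described above can be sketched as follows. This is an illustrative example under stated assumptions, not a reference implementation: the single 500 Hz resonance, the pole radius of 0.9, the 8 kHz sampling rate, and all function names are hypothetical choices. The all-pole filter is applied recursively, which is equivalent to convolving the excitation with the filter's (infinite) impulse response. The linear-prediction step uses the autocorrelation method with the Levinson-Durbin recursion, which is one common way to minimize the mean-squared prediction error.

```python
import math
import random


def impulse_train(n_samples, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Unvoiced excitation: Gaussian white noise from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(excitation, a):
    """Compute s[n] = e[n] + sum_k a[k-1] * s[n-k] (predictor convention)."""
    s = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * s[n - k]
        s.append(acc)
    return s


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """Predictor coefficients a_1..a_p via Levinson-Durbin; also returns residual energy."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)    # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Assumed toy vocal tract: one resonance (formant) at 500 Hz, pole radius 0.9.
fs = 8000
theta = 2 * math.pi * 500.0 / fs
a_true = [2 * 0.9 * math.cos(theta), -(0.9 ** 2)]

voiced = all_pole_filter(impulse_train(800, 80), a_true)   # 100 Hz pitch
unvoiced = all_pole_filter(white_noise(800), a_true)

# Analysis: estimate the filter back from the synthesised voiced signal.
a_est, residual_energy = lpc(voiced, order=2)
print(a_true, a_est)
```

In this toy setting the estimated coefficients should lie close to the ones used for synthesis. Real speech needs a higher filter order, short-time windowing, and a more realistic glottal source, none of which this sketch attempts.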