Phase vocoder
A phase vocoder is a type of vocoder which can scale both the frequency and time domains of audio signals by using phase information. The computer algorithm allows frequency-domain modifications to a digital sound file (typically time expansion/compression and pitch shifting).

At the heart of the phase vocoder is the short-time Fourier transform (STFT), typically coded using fast Fourier transforms. The STFT converts a time domain representation of sound into a time-frequency representation (the "analysis" phase), allowing modifications to the amplitudes or phases of specific frequency components of the sound, before resynthesis of the frequency domain representation into the time domain by the inverse STFT. The time evolution of the resynthesized sound can be changed by modifying the time position of the STFT frames prior to the resynthesis operation, allowing for time-scale modification of the original sound file.
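
The analysis and resynthesis steps can be illustrated with a minimal sketch. It is not taken from any particular implementation: the function names (`dft`, `idft`, `hann`, `stft`, `istft`), the frame size, and the hop size are all assumptions chosen for illustration, and a naive discrete Fourier transform stands in for the FFT that a real implementation would use. With a periodic Hann window and a hop of half the frame length, the shifted windows sum to one, so plain overlap-add recovers the input wherever two frames overlap.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform (a stand-in for an FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each output sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft(signal, n_fft, hop):
    """Analysis: window each overlapping frame and transform it."""
    win = hann(n_fft)
    return [dft([signal[start + i] * win[i] for i in range(n_fft)])
            for start in range(0, len(signal) - n_fft + 1, hop)]

def istft(frames, n_fft, hop):
    """Resynthesis: inverse-transform each frame and overlap-add the results."""
    out = [0.0] * (hop * (len(frames) - 1) + n_fft)
    for m, spectrum in enumerate(frames):
        for i, value in enumerate(idft(spectrum)):
            out[m * hop + i] += value
    return out

# Round trip on a short test tone (illustrative sizes only).
signal = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
frames = stft(signal, n_fft=16, hop=8)
# Amplitudes or phases of individual bins in `frames` would be modified here.
rebuilt = istft(frames, n_fft=16, hop=8)
```

In this sketch the first and last half-frames are covered by only one window and are therefore attenuated; only the interior samples are reconstructed exactly.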

Phase coherence problem

The main problem that has to be solved in every manipulation of the STFT is that individual signal components (sinusoids, impulses) are spread over multiple frames and multiple STFT frequency locations (bins). This is because the STFT analysis is done using overlapping analysis windows. The windowing results in spectral leakage, so that the information of an individual sinusoidal component is spread over adjacent STFT bins. To avoid border effects caused by the tapering of the analysis windows, the STFT analysis windows overlap in time. Because of this overlap, adjacent STFT analysis frames are strongly correlated (a sinusoid present in the analysis frame at time "t" will be present in the subsequent frames as well). Signal transformation with the phase vocoder therefore requires that every modification of the STFT representation preserve the appropriate correlation between adjacent frequency bins (vertical coherence) and between adjacent time frames (horizontal coherence). Except for extremely simple synthetic sounds, these correlations can only be preserved approximately, and since the invention of the phase vocoder research has mainly been concerned with finding algorithms that preserve the vertical and horizontal coherence of the STFT representation after modification. For time scaling operations amplitude coherence is only a minor problem, because shifting analysis frames in time has only a minor impact on the amplitude. The phase coherence problem was studied for a long time before appropriate solutions emerged.
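
Horizontal coherence is commonly handled by estimating, for each bin, the frequency implied by the phase advance between two consecutive analysis frames, and then advancing the synthesis phase by that frequency over the (possibly different) synthesis hop. The following is a minimal sketch of that idea for a single bin. The function names and parameters are hypothetical, and the sketch says nothing about vertical coherence.

```python
import math

def princarg(phase):
    """Wrap a phase value to the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi

def bin_frequency(phase_prev, phase_curr, k, n_fft, hop):
    """Estimate the angular frequency (radians per sample) present in bin k.

    phase_prev and phase_curr are the phases of bin k in two consecutive
    analysis frames that start `hop` samples apart.
    """
    omega_k = 2 * math.pi * k / n_fft  # centre frequency of bin k
    # Deviation of the measured phase advance from the advance that a
    # sinusoid exactly at the bin centre would produce.
    deviation = princarg(phase_curr - phase_prev - omega_k * hop)
    return omega_k + deviation / hop

def propagate_phase(synth_phase_prev, frequency, synth_hop):
    """Advance the synthesis phase by the estimated frequency over synth_hop."""
    return princarg(synth_phase_prev + frequency * synth_hop)
```

The estimate is unambiguous only while the deviation from the bin centre, multiplied by the hop size, stays below pi in magnitude, which is one reason heavily overlapping frames are used.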

History

The phase vocoder was introduced in 1966 by Flanagan as an algorithm that would preserve horizontal coherence between the phases of bins that represent sinusoidal components. This original phase vocoder did not take into account the vertical coherence between adjacent frequency bins, and therefore time stretching with this system produced sound signals that lacked clarity.

The optimal reconstruction of the sound signal from the STFT after amplitude modifications was proposed by Griffin and Lim in 1984. This algorithm does not address the problem of producing a coherent STFT, but it finds the sound signal whose STFT is as close as possible to the modified STFT, even if the modified STFT is not coherent (does not represent any signal).
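
The least-squares reconstruction step can be sketched as a window-weighted overlap-add: each output sample is the sum of the windowed frame samples that cover it, divided by the sum of the squared window values at that position. The code below is an illustrative sketch under that reading, not the authors' implementation. It assumes the frames are already in the time domain (each one the inverse transform of a modified STFT frame), and the names `least_squares_overlap_add` and `hann` are hypothetical.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def least_squares_overlap_add(frames, window, hop):
    """Combine time-domain frames into one signal in the least-squares sense."""
    n = len(window)
    length = hop * (len(frames) - 1) + n
    num = [0.0] * length  # sum of window * frame sample
    den = [0.0] * length  # sum of window squared
    for m, frame in enumerate(frames):
        for i in range(n):
            num[m * hop + i] += window[i] * frame[i]
            den[m * hop + i] += window[i] ** 2
    # Samples that no window covers with non-zero weight are set to zero.
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]

# If the frames are unmodified windowed segments, the signal is recovered.
signal = [math.sin(0.3 * t) for t in range(32)]
window = hann(8)
frames = [[signal[s + i] * window[i] for i in range(8)] for s in range(0, 25, 2)]
rebuilt = least_squares_overlap_add(frames, window, hop=2)
```

When the frames come from an incoherent modified STFT, the same formula still returns a signal, which is the sense in which the reconstruction is "as close as possible" rather than exact.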

The problem of vertical coherence remained a major issue for the quality of time scaling operations until 1999, when Laroche and Dolson proposed a rather simple means of preserving phase consistency across spectral bins. Their proposal can be seen as a turning point in phase vocoder history: it showed that ensuring vertical phase consistency yields very high quality time scaling transformations.
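
One scheme from this line of work is usually called identity phase locking: spectral peaks are located, only the peak bins receive a propagated synthesis phase, and every other bin keeps the phase offset it had relative to its peak in the analysis frame. The sketch below illustrates that idea under simplifying assumptions. In particular, assigning each bin to its nearest peak is a simplification chosen here for brevity, and the function names are hypothetical.

```python
def find_peaks(magnitudes):
    """Indices of bins that are strictly larger than both neighbours."""
    return [k for k in range(1, len(magnitudes) - 1)
            if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] > magnitudes[k + 1]]

def lock_phases(magnitudes, analysis_phase, peak_synth_phase):
    """Return synthesis phases that keep analysis phase offsets around peaks.

    peak_synth_phase maps each peak bin index to the synthesis phase already
    computed for it (for example by horizontal phase propagation).
    """
    peaks = find_peaks(magnitudes)
    locked = list(analysis_phase)
    if not peaks:
        return locked  # nothing to lock to; leave the phases unchanged
    for k in range(len(magnitudes)):
        # Simplification: the bin belongs to the closest peak.
        p = min(peaks, key=lambda q: abs(q - k))
        locked[k] = peak_synth_phase[p] + analysis_phase[k] - analysis_phase[p]
    return locked
```

Because the phase differences between a peak and its neighbouring bins are copied from the analysis frame, the relationship across bins that the analysis window imposed on each sinusoid is carried over to the modified frame.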

The algorithm proposed by Laroche and Dolson did not preserve horizontal phase coherence at sound onsets (note onsets). A solution to this problem was proposed by Roebel.

One software implementation of phase vocoder based signal transformation that uses means similar to those described above to achieve high quality signal transformation is Ircam's SuperVP.

Use in music

British composer Trevor Wishart used phase vocoder analyses and transformations of a human voice as the basis for his composition VOX 5 (part of his larger VOX Cycle). Transfigured Wind by American composer Roger Reynolds uses the phase vocoder to perform time-stretching of flute sounds.

The proprietary Auto-Tune pitch-correcting software, widely used in commercial music production, is based on the phase vocoder principle.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 