Independent component analysis
Independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents, assuming the mutual statistical independence of the non-Gaussian source signals. It is a special case of blind source separation.

Introduction

When the independence assumption is correct, blind ICA separation of a mixed signal gives very good results. ICA is also used, for analysis purposes, on signals that are not supposed to have been generated by mixing. A simple application of ICA is the "cocktail party problem", where the underlying speech signals are separated from sample data consisting of people talking simultaneously in a room. Usually the problem is simplified by assuming no time delays or echoes. Note that if N sources are present, at least N observations (e.g. microphones) are needed to recover the original signals. This constitutes the square case (J = D, where D is the input dimension of the data and J is the dimension of the model). The underdetermined (J < D) and overdetermined (J > D) cases have also been investigated.
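As an illustration, here is a minimal sketch of the square (two-source, two-microphone) cocktail-party setting in Python, assuming NumPy and scikit-learn's FastICA are available; the synthetic "speech" signals and the mixing matrix are purely illustrative:

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    s1 = np.sign(np.sin(3 * t))            # source 1: square wave
    s2 = np.sin(5 * t) ** 3                # source 2: skewed sinusoid
    S = np.c_[s1, s2]                      # true sources, shape (2000, 2)

    A = np.array([[1.0, 0.5],
                  [0.3, 1.0]])             # unknown mixing ("room acoustics")
    X = S @ A.T                            # the two microphone recordings

    ica = FastICA(n_components=2, random_state=0)
    S_est = ica.fit_transform(X)           # recovered sources, up to order/scale/sign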

Defining Component Independence

ICA finds the independent components (also called factors, latent variables or sources) by maximizing the statistical independence of the estimated components. We may choose one of many ways to define independence, and this choice governs the form of the ICA algorithm. The two broadest definitions of independence for ICA are

1) Minimization of Mutual Information

2) Maximization of non-Gaussianity

The Minimization-of-Mutual-Information (MMI) family of ICA algorithms uses measures such as the Kullback–Leibler divergence and maximum entropy. The non-Gaussianity family of ICA algorithms, motivated by the central limit theorem, uses kurtosis and negentropy.
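Both measures are straightforward to estimate from samples. A sketch in Python, assuming NumPy; the Gaussian baseline for the log-cosh negentropy approximation is estimated here by Monte Carlo:

    import numpy as np

    def excess_kurtosis(y):
        """Fourth moment minus 3 after standardization; 0 for a Gaussian."""
        y = (y - y.mean()) / y.std()
        return np.mean(y ** 4) - 3.0

    def negentropy_logcosh(y, rng=np.random.default_rng(0)):
        """Hyvarinen-style approximation J(y) ~ (E[G(y)] - E[G(nu)])^2,
        with G(u) = log cosh(u) and nu a standard normal variable."""
        y = (y - y.mean()) / y.std()
        G = lambda u: np.log(np.cosh(u))
        nu = rng.standard_normal(100_000)   # Monte Carlo Gaussian reference
        return (G(y).mean() - G(nu).mean()) ** 2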

Typical algorithms for ICA use centering, whitening (usually with the eigenvalue decomposition), and dimensionality reduction as preprocessing steps in order to simplify and reduce the complexity of the problem for the actual iterative algorithm. Whitening and dimension reduction can be achieved with principal component analysis or singular value decomposition. Whitening ensures that all dimensions are treated equally a priori before the algorithm is run. Algorithms for ICA include infomax, FastICA, and JADE, but there are many others as well.
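A minimal sketch of this preprocessing in Python, assuming NumPy; the function name and data shapes are illustrative:

    import numpy as np

    def whiten(X):
        """Center and whiten X of shape (n_samples, n_features) via the
        eigenvalue decomposition of the covariance matrix (PCA whitening)."""
        Xc = X - X.mean(axis=0)                  # centering
        eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        V = eigvecs / np.sqrt(eigvals)           # scale each eigenvector column
        Z = Xc @ V                               # whitened data: cov(Z) ~ identity
        return Z, V

Dimensionality reduction fits naturally into the same step: dropping the columns of V associated with the smallest eigenvalues before forming Z projects the data onto its principal subspace.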

In general, ICA cannot identify the actual number of source signals, a uniquely correct ordering of the source signals, or the proper scaling (including sign) of the source signals.

ICA is important to blind signal separation and has many practical applications. It is closely related to (or even a special case of) the search for a factorial code of the data, i.e., a new vector-valued representation of each data vector such that it gets uniquely encoded by the resulting code vector (loss-free coding), but the code components are statistically independent.

Mathematical definitions

Linear independent component analysis can be divided into noiseless and noisy cases,
where noiseless ICA is a special case of noisy ICA. Nonlinear ICA should be considered as a separate case.

General definition

The data is represented by the random vector $x = (x_1, \ldots, x_m)^T$ and the components as the random vector $s = (s_1, \ldots, s_n)^T$. The task is to transform the observed data $x$, using a linear static transformation $W$, as

$s = W x,$

into maximally independent components $s$ measured by some function $F(s_1, \ldots, s_n)$ of independence.

Linear noiseless ICA

The components $x_i$ of the observed random vector $x = (x_1, \ldots, x_m)^T$ are generated as a sum of the independent components $s_k$, $k = 1, \ldots, n$:

$x_i = a_{i,1} s_1 + \cdots + a_{i,k} s_k + \cdots + a_{i,n} s_n,$

weighted by the mixing weights $a_{i,k}$.

The same generative model can be written in vectorial form as $x = \sum_{k=1}^{n} s_k a_k$, where the observed random vector $x$ is represented by the basis vectors $a_k = (a_{1,k}, \ldots, a_{m,k})^T$. The basis vectors $a_k$ form the columns of the mixing matrix $A = (a_1, \ldots, a_n)$ and the generative formula can be written as $x = As$, where $s = (s_1, \ldots, s_n)^T$.

Given the model and realizations (samples) $x_1, \ldots, x_N$ of the random vector $x$, the task is to estimate both the mixing matrix $A$ and the sources $s$. This is done by adaptively calculating the $w$ vectors and setting up a cost function which either maximizes the non-Gaussianity of the calculated $s_k = w^T x$ or minimizes the mutual information. In some cases, a priori knowledge of the probability distributions of the sources can be used in the cost function.

The original sources $s$ can be recovered by multiplying the observed signals $x$ with the inverse of the mixing matrix $W = A^{-1}$, also known as the unmixing matrix. Here it is assumed that the mixing matrix is square ($n = m$). If the number of basis vectors is greater than the dimensionality of the observed vectors, $n > m$, the task is overcomplete but is still solvable with the pseudoinverse.
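A sketch of this generative model and its inversion in Python, assuming NumPy; note that $A$ is taken as known here purely to illustrate the model, whereas ICA must estimate it from $x$ alone:

    import numpy as np

    rng = np.random.default_rng(0)
    S = rng.laplace(size=(2, 1000))        # non-Gaussian independent sources
    A = np.array([[2.0, 1.0],
                  [1.0, 1.5]])             # square mixing matrix (n = m)
    X = A @ S                              # observed mixtures, x = A s
    W = np.linalg.inv(A)                   # unmixing matrix W = A^{-1}
    S_rec = W @ X                          # equals S in the noiseless square case

In the overcomplete case ($n > m$) the inverse is replaced by the pseudoinverse, e.g. np.linalg.pinv(A).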

Linear noisy ICA

With the added assumption of zero-mean and uncorrelated Gaussian noise $n \sim N(0, \operatorname{diag}(\Sigma))$, the ICA model takes the form $x = As + n$.

Nonlinear ICA

The mixing of the sources does not need to be linear. Using a nonlinear mixing function $f(\cdot\,|\,\theta)$ with parameters $\theta$, the nonlinear ICA model is $x = f(s|\theta) + n$.
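For instance, a hypothetical post-nonlinear mixing can be sketched in Python, assuming NumPy; the choice of tanh as the nonlinearity and the noise level are illustrative only:

    import numpy as np

    rng = np.random.default_rng(0)
    S = rng.laplace(size=(2, 1000))                 # independent sources
    A = np.array([[2.0, 1.0],
                  [1.0, 1.5]])                      # linear part of f(.|theta)
    # x = f(s|theta) + n, with f a componentwise nonlinearity after mixing
    X = np.tanh(A @ S) + 0.05 * rng.standard_normal((2, 1000))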

Identifiability

The independent components are identifiable up to a permutation and scaling of the sources. This identifiability requires that:
  • At most one of the sources $s_k$ is Gaussian,
  • The number of observed mixtures, $m$, must be at least as large as the number of estimated components $n$: $m \ge n$. Equivalently, the mixing matrix $A$ must be of full rank for its inverse to exist.

Binary independent component analysis

A special variant of ICA is binary ICA, in which both the signal sources and the monitors are in binary form and the observations from the monitors are disjunctive mixtures of binary independent sources. The problem has been shown to have applications in many domains, including medical diagnosis, multi-cluster assignment, network tomography and internet resource management.

Let $x_1, x_2, \ldots, x_m$ be the set of binary variables from $m$ monitors and $y_1, y_2, \ldots, y_n$ be the set of binary variables from $n$ sources. Source-monitor connections are represented by the (unknown) mixing matrix $G$, where $g_{ij} = 1$ indicates that signal from the $i$-th source can be observed by the $j$-th monitor. The system works as follows: at any time, if a source $i$ is active ($y_i = 1$) and it is connected to the monitor $j$ ($g_{ij} = 1$), then the monitor $j$ will observe some activity ($x_j = 1$). Formally we have:

$x_j = \bigvee_{i=1}^{n} (g_{ij} \wedge y_i), \quad j = 1, 2, \ldots, m,$

where $\wedge$ is Boolean AND and $\vee$ is Boolean OR. Note that noise is not explicitly modeled; rather, it can be treated as independent sources.
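A sketch of this disjunctive generative model in Python, assuming NumPy; the dimensions are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, T = 3, 4, 10                     # sources, monitors, time steps
    G = rng.integers(0, 2, size=(n, m))    # g_ij = 1: source i reaches monitor j
    Y = rng.integers(0, 2, size=(n, T))    # binary source activity y_i(t)
    # x_j(t) = OR over i of (g_ij AND y_i(t))
    X = (G.T[:, :, None] & Y[None, :, :]).any(axis=1).astype(int)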

The above problem can be heuristically solved by assuming the variables are continuous and running FastICA on the binary observation data to get the mixing matrix $G$ (real values), then applying rounding techniques on $G$ to obtain the binary values. This approach has been shown to produce highly inaccurate results.

Another method is to use dynamic programming: recursively breaking the observation matrix $X$ into its sub-matrices and running the inference algorithm on these sub-matrices. The key observation which leads to this algorithm is the sub-matrix $X^0$ of $X$ where $x_{ij} = 0, \forall j$, which corresponds to the unbiased observation matrix of the hidden components that have no connection to the $i$-th monitor. Experimental results show that this approach is accurate under moderate noise levels.

See also

  • Blind deconvolution
  • Blind signal separation (BSS)
  • Factor analysis
  • Factorial codes
  • Hilbert spectrum
  • Image processing
  • Non-negative matrix factorization (NMF)
  • Nonlinear dimensionality reduction
  • Principal component analysis (PCA)
  • Projection pursuit
  • Redundancy reduction
  • Signal processing
  • Singular value decomposition (SVD)
  • Varimax rotation
