Bhattacharyya distance
In statistics, the Bhattacharyya distance measures the similarity of two discrete or continuous probability distributions. It is closely related to the Bhattacharyya coefficient, which is a measure of the amount of overlap between two statistical samples or populations. Both measures are named after Anil Kumar Bhattacharyya, a statistician who worked in the 1930s at the Indian Statistical Institute. The coefficient can be used to determine the relative closeness of the two samples being considered. It is also used to measure the separability of classes in classification.

Definition

For discrete probability distributions p and q over the same domain X, the Bhattacharyya distance is defined as:

$D_B(p, q) = -\ln\left( BC(p, q) \right)$

where

$BC(p, q) = \sum_{x \in X} \sqrt{p(x)\, q(x)}$

is the Bhattacharyya coefficient.

For continuous probability distributions, the Bhattacharyya coefficient is defined as:

$BC(p, q) = \int \sqrt{p(x)\, q(x)}\, dx$
In either case, $0 \le BC \le 1$ and $0 \le D_B \le \infty$. The Bhattacharyya distance $D_B$ does not obey the triangle inequality, but the Hellinger distance $\sqrt{1 - BC}$ does obey the triangle inequality.
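
As an illustration of these definitions, here is a minimal Python sketch (not part of the original article; function names are chosen for clarity) that computes the Bhattacharyya coefficient and distance for two discrete distributions given as probability vectors over the same domain:

import numpy as np

def bhattacharyya_coefficient(p, q):
    # BC(p, q) = sum over the domain of sqrt(p(x) * q(x))
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya_distance(p, q):
    # D_B(p, q) = -ln(BC); infinite when the supports are disjoint (BC = 0)
    bc = bhattacharyya_coefficient(p, q)
    return float("inf") if bc == 0.0 else -np.log(bc)

# Example: two distributions over the same four outcomes.
p = [0.1, 0.4, 0.4, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
print(bhattacharyya_coefficient(p, q))  # close to 1: the distributions overlap heavily
print(bhattacharyya_distance(p, q))     # small and non-negative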

For multivariate Gaussian distributions $p_i = \mathcal{N}(\mu_i, \Sigma_i)$, $i = 1, 2$,

$D_B = \frac{1}{8} (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2) + \frac{1}{2} \ln\left( \frac{\det \Sigma}{\sqrt{\det \Sigma_1 \, \det \Sigma_2}} \right)$

where $\mu_i$ and $\Sigma_i$ are the means and covariances of the distributions, and $\Sigma = \frac{\Sigma_1 + \Sigma_2}{2}$.
Note that, in this case, the first term in the Bhattacharyya distance is related to the Mahalanobis distance.
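
A minimal sketch of this closed-form expression in Python, assuming NumPy and covariance matrices that are symmetric positive definite (the function name is illustrative, not from the article):

import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    # Bhattacharyya distance between two multivariate normal distributions.
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    cov = (cov1 + cov2) / 2.0                      # averaged covariance Sigma
    diff = mu1 - mu2
    # First term: one eighth of the Mahalanobis-like quadratic form under Sigma.
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Second term: log-ratio of determinants, via slogdet for numerical stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term2 = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term1 + term2

# Example: two 2-D Gaussians with different means and covariances.
print(bhattacharyya_gaussian([0.0, 0.0], np.eye(2), [1.0, 1.0], 2.0 * np.eye(2)))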

Bhattacharyya coefficient

The Bhattacharyya coefficient is an approximate measurement of the amount of overlap between two statistical samples. The coefficient can be used to determine the relative closeness of the two samples being considered.

Calculating the Bhattacharyya coefficient involves a rudimentary form of integration of the overlap of the two samples. The interval of the values of the two samples is split into a chosen number of partitions, and the number of members of each sample in each partition is used in the following formula:

$BC(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} \sqrt{a_i\, b_i}$

where, considering the samples a and b, n is the number of partitions, and $a_i$, $b_i$ are the numbers of members of samples a and b in the i-th partition.

This formula is therefore larger for each partition that has members from both samples, and larger still for partitions in which the two samples' members overlap heavily. The choice of the number of partitions depends on the number of members in each sample: too few partitions lose accuracy by over-estimating the overlap region, while too many partitions lose accuracy by creating empty partitions inside an otherwise populated region of the sample space.

The Bhattacharyya coefficient will be 0 if there is no overlap at all, since every partition then contributes a product of zero. This means that the distance between fully separated samples is not revealed by this coefficient alone.
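
A minimal sketch of this procedure in Python, assuming NumPy; unlike the raw-count formula above, it normalizes the per-partition counts to fractions of each sample so that the coefficient stays between 0 and 1:

import numpy as np

def bhattacharyya_from_samples(a, b, n_partitions=10):
    # Approximate the Bhattacharyya coefficient of two 1-D samples by splitting
    # their common range into equal-width partitions, as described above.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, n_partitions + 1)
    # Fraction of each sample's members falling into each partition.
    a_counts, _ = np.histogram(a, bins=edges)
    b_counts, _ = np.histogram(b, bins=edges)
    a_frac = a_counts / a_counts.sum()
    b_frac = b_counts / b_counts.sum()
    return float(np.sum(np.sqrt(a_frac * b_frac)))

# Example: two overlapping Gaussian samples.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 1000)
b = rng.normal(0.5, 1.0, 1000)
print(bhattacharyya_from_samples(a, b, n_partitions=20))  # close to 1 for heavy overlap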

See also

  • Kullback-Leibler divergence
  • Hellinger distance
  • Mahalanobis distance
  • Chernoff bound
  • Rényi entropy

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.