Sørensen similarity index

The Sørensen index, also known as Sørensen's similarity coefficient, is a statistic used for comparing the similarity of two samples. It was developed by the botanist Thorvald Sørensen and published in 1948.

It is often misspelled as Sorenson index, Soerenson index and Sörenson index; each of these variants also appears with the correct ending -sen.

Sørensen's original formula was intended to be applied to presence/absence data, and is

$$QS = \frac{2C}{A + B}$$

where A and B are the numbers of species in samples A and B, respectively, and C is the number of species shared by the two samples; QS is the quotient of similarity and ranges from 0 to 1. This expression is easily extended to abundance instead of presence/absence of species. This quantitative version of the Sørensen index is also known as the Czekanowski index. The Sørensen index is identical to Dice's coefficient, which always lies in the range [0, 1]. The Sørensen index used as a distance measure, 1 − QS, is identical to Bray–Curtis dissimilarity when applied to quantitative data. It is sometimes also equated with Hellinger distance, although Hellinger distance is defined on the square roots of proportions and is not the same quantity in general.
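As a rough illustration of the formula, the sketch below computes QS for presence/absence data and for abundance data. The function names and the species counts are made up for the example, and the abundance version assumes the usual min-based extension (twice the summed per-species minimum abundance, divided by the combined total abundance), which the text above does not spell out.

```python
def sorensen_index(sample_a, sample_b):
    """Presence/absence Sørensen index: QS = 2C / (A + B)."""
    a, b = set(sample_a), set(sample_b)
    if not a and not b:
        raise ValueError("QS is undefined when both samples are empty")
    shared = len(a & b)  # C: species present in both samples
    return 2 * shared / (len(a) + len(b))


def quantitative_sorensen(abund_a, abund_b):
    """Abundance version (assumed form): 2 * sum(min) / (total_a + total_b)."""
    species = set(abund_a) | set(abund_b)
    # Per-species overlap is the smaller of the two abundances.
    shared = sum(min(abund_a.get(s, 0), abund_b.get(s, 0)) for s in species)
    total = sum(abund_a.values()) + sum(abund_b.values())
    if total == 0:
        raise ValueError("QS is undefined when both samples are empty")
    return 2 * shared / total


# Hypothetical samples: 3 and 4 species, 2 of them shared.
qs = sorensen_index({"oak", "birch", "pine"}, {"oak", "pine", "ash", "elm"})

# Hypothetical abundance counts per species.
site_a = {"oak": 6, "birch": 7, "pine": 4}
site_b = {"oak": 10, "ash": 6}
qs_quant = quantitative_sorensen(site_a, site_b)
distance = 1 - qs_quant  # the index used as a distance measure
```

Here `qs` is 2·2/(3 + 4) = 4/7, and `qs_quant` is 2·6/(17 + 16) = 12/33, since oak is the only species with non-zero abundance in both sites.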

The Sørensen coefficient is mainly useful for ecological community data (e.g. Looman & Campbell, 1960). Justification for its use is primarily empirical rather than theoretical (although it can be justified theoretically as the intersection of two fuzzy sets). As compared to Euclidean distance, Sørensen distance retains sensitivity in more heterogeneous data sets and gives less weight to outliers.
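The contrast with Euclidean distance can be illustrated with a small, made-up example. The abundance vectors and function names below are hypothetical, and the Sørensen distance is assumed to be 1 − QS in its min-based abundance form. Sample `b` has the same species as `a` but one very large count, while sample `c` shares no species with `a` at all.

```python
import math


def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def sorensen_distance(x, y):
    """1 - QS for abundance vectors listed in the same species order."""
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("distance is undefined when both samples are empty")
    shared = sum(min(xi, yi) for xi, yi in zip(x, y))
    return 1 - 2 * shared / total


# Hypothetical abundances for six species, in the same order in each sample.
a = [10, 10, 10, 0, 0, 0]
b = [10, 10, 100, 0, 0, 0]   # same species as a, one outlying count
c = [0, 0, 0, 10, 10, 10]    # no species in common with a

eucl_ab = euclidean_distance(a, b)
eucl_ac = euclidean_distance(a, c)
sor_ab = sorensen_distance(a, b)
sor_ac = sorensen_distance(a, c)
```

In this example the single outlying count makes `a` look farther from `b` than from `c` under Euclidean distance (90 versus about 24.5), whereas the Sørensen distance stays bounded and ranks `b` (0.6) closer to `a` than `c` (1.0).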

See also