Rand index
The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. The adjusted-for-chance form of the Rand index is the adjusted Rand index.

Definition

Given a set of $n$ elements $S = \{o_1, \ldots, o_n\}$ and two partitions of $S$ to compare, $U = \{U_1, \ldots, U_r\}$ and $V = \{V_1, \ldots, V_s\}$, the following is defined:
  • $a$, the number of pairs of elements in $S$ that are in the same set in $U$ and in the same set in $V$

  • $b$, the number of pairs of elements in $S$ that are in different sets in $U$ and in different sets in $V$

  • $c$, the number of pairs of elements in $S$ that are in the same set in $U$ and in different sets in $V$

  • $d$, the number of pairs of elements in $S$ that are in different sets in $U$ and in the same set in $V$


The Rand index, $R$, is:

$$R = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$$

Intuitively, $a + b$ can be considered as the number of agreements between $U$ and $V$ and $c + d$ as the number of disagreements between $U$ and $V$.
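
The pair counts above can be sketched in Python by brute force over all pairs of elements. This is only an illustration under stated assumptions: each partition is represented as a list of cluster labels (one label per element, in the same element order), and the function names `pair_counts` and `rand_index` are made up for this sketch rather than taken from any library.

```python
from itertools import combinations


def pair_counts(labels_u, labels_v):
    """Return (a, b, c, d) for two labelings of the same elements.

    labels_u[i] and labels_v[i] are the cluster labels of element i
    in the first and second partition (an assumed representation).
    """
    if len(labels_u) != len(labels_v):
        raise ValueError("both labelings must cover the same elements")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_u)), 2):
        same_u = labels_u[i] == labels_u[j]
        same_v = labels_v[i] == labels_v[j]
        if same_u and same_v:
            a += 1  # together in both partitions
        elif not same_u and not same_v:
            b += 1  # separated in both partitions
        elif same_u:
            c += 1  # together in the first, separated in the second
        else:
            d += 1  # separated in the first, together in the second
    return a, b, c, d


def rand_index(labels_u, labels_v):
    """Rand index: agreeing pairs divided by all pairs."""
    a, b, c, d = pair_counts(labels_u, labels_v)
    total = a + b + c + d
    if total == 0:
        raise ValueError("need at least two elements")
    return (a + b) / total
```

For instance, `pair_counts([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])` gives `(2, 8, 4, 1)`, so the Rand index of these two labelings is 10/15. The loop visits every pair, so the running time grows quadratically with the number of elements.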

Properties

The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the data clusterings are exactly the same.
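
Both extremes can be seen on a small made-up example (the three-element set is an assumption chosen for illustration, with $a$, $b$ counting agreeing pairs and $c$, $d$ counting disagreeing pairs). Take one clustering that puts all elements together and another that separates them all:

$$U = \{\{o_1, o_2, o_3\}\}, \qquad V = \{\{o_1\}, \{o_2\}, \{o_3\}\}$$

Each of the $\binom{3}{2} = 3$ pairs is in the same set in $U$ and in different sets in $V$, so $c = 3$ and $a = b = d = 0$:

$$R(U, V) = \frac{0 + 0}{3} = 0$$

Comparing $U$ with itself instead gives $a = 3$ and $b = c = d = 0$:

$$R(U, U) = \frac{3 + 0}{3} = 1$$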

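In mathematical terms, for partitions $U = \{U_1, \ldots, U_r\}$ and $V = \{V_1, \ldots, V_s\}$ of $S = \{o_1, \ldots, o_n\}$, a, b, c, d are defined as follows:
  • $a = |S_a|$, where $S_a = \{(o_i, o_j) \mid o_i, o_j \in U_k,\ o_i, o_j \in V_l\}$

  • $b = |S_b|$, where $S_b = \{(o_i, o_j) \mid o_i \in U_{k_1},\ o_j \in U_{k_2},\ o_i \in V_{l_1},\ o_j \in V_{l_2}\}$

  • $c = |S_c|$, where $S_c = \{(o_i, o_j) \mid o_i, o_j \in U_k,\ o_i \in V_{l_1},\ o_j \in V_{l_2}\}$

  • $d = |S_d|$, where $S_d = \{(o_i, o_j) \mid o_i \in U_{k_1},\ o_j \in U_{k_2},\ o_i, o_j \in V_l\}$


for some $1 \le i < j \le n$, $1 \le k, k_1, k_2 \le r$ with $k_1 \ne k_2$, and $1 \le l, l_1, l_2 \le s$ with $l_1 \ne l_2$.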

Adjusted Rand index

The adjusted Rand index is the corrected-for-chance version of the Rand index.

The contingency table

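Given a set of $n$ data points and two groupings (e.g. clusterings) of these points, namely $U = \{U_1, \ldots, U_r\}$ and $V = \{V_1, \ldots, V_s\}$, the overlap between $U$ and $V$ can be summarized in a contingency table $[n_{ij}]$, where $n_{ij}$ denotes the number of objects common to groups $U_i$ and $V_j$: $n_{ij} = |U_i \cap V_j|$. The row sums are written $a_i$ and the column sums $b_j$.

$$\begin{array}{c|cccc|c}
U \backslash V & V_1 & V_2 & \cdots & V_s & \text{Sums} \\
\hline
U_1 & n_{11} & n_{12} & \cdots & n_{1s} & a_1 \\
U_2 & n_{21} & n_{22} & \cdots & n_{2s} & a_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
U_r & n_{r1} & n_{r2} & \cdots & n_{rs} & a_r \\
\hline
\text{Sums} & b_1 & b_2 & \cdots & b_s &
\end{array}$$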
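
A contingency table of this kind can be built with a few lines of Python. The sketch below assumes each grouping is given as a list of labels, one per data point and in the same order, and that the labels can be sorted; the function name `contingency_table` is hypothetical.

```python
from collections import Counter


def contingency_table(labels_u, labels_v):
    """Return (table, row_sums, col_sums) for two labelings.

    table[i][j] is the number of points with the i-th label of the
    first grouping and the j-th label of the second grouping, with
    labels taken in sorted order (an assumption of this sketch).
    """
    if len(labels_u) != len(labels_v):
        raise ValueError("both labelings must cover the same points")
    rows = sorted(set(labels_u))
    cols = sorted(set(labels_v))
    # Count how often each (row label, column label) pair occurs.
    counts = Counter(zip(labels_u, labels_v))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return table, row_sums, col_sums
```

For the labelings `[0, 0, 0, 1, 1, 1]` and `[0, 0, 1, 1, 2, 2]` this returns the table `[[2, 1, 0], [0, 1, 2]]` with row sums `[3, 3]` and column sums `[2, 2, 2]`.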

Definition

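The adjusted form of the Rand index, the adjusted Rand index, is

$$\text{AdjustedIndex} = \frac{\text{Index} - \text{ExpectedIndex}}{\text{MaxIndex} - \text{ExpectedIndex}},$$

more specifically

$$ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2}}$$

where $n_{ij}$, $a_i$ (row sums) and $b_j$ (column sums) are values from the contingency table and $n$ is the total number of data points.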
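
The formula can be evaluated directly from label counts, as in the following Python sketch. It assumes the two groupings are given as equal-length lists of labels; the function name `adjusted_rand_index` is made up here, and returning 1.0 when the denominator vanishes is a convention chosen for this sketch, not something stated in the article.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_u, labels_v):
    """Adjusted Rand index from the contingency counts."""
    n = len(labels_u)
    if n != len(labels_v):
        raise ValueError("both labelings must cover the same points")
    if n < 2:
        raise ValueError("need at least two points")
    n_ij = Counter(zip(labels_u, labels_v))  # table cells
    a_i = Counter(labels_u)                  # row sums
    b_j = Counter(labels_v)                  # column sums
    sum_cells = sum(comb(v, 2) for v in n_ij.values())
    sum_rows = sum(comb(v, 2) for v in a_i.values())
    sum_cols = sum(comb(v, 2) for v in b_j.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # ExpectedIndex
    maximum = (sum_rows + sum_cols) / 2          # MaxIndex
    if maximum == expected:
        # Denominator is zero only when both groupings are a single
        # cluster or both are all singletons; this sketch returns 1.0
        # there by assumption, since the groupings are then identical.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)
```

For `[0, 0, 0, 1, 1, 1]` and `[0, 0, 1, 1, 2, 2]` the three sums are 2, 6 and 3, giving (2 − 1.2) / (4.5 − 1.2) = 8/33. The value does not depend on how the clusters are named: `[0, 0, 1, 1]` against `[1, 1, 0, 0]` gives 1, while `[0, 0, 1, 1]` against `[0, 1, 0, 1]` gives −0.5, so the adjusted index can fall below zero.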

Properties

The maximum value of the adjusted Rand index is 1, attained when the two groupings are identical.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 