RandIndexFS

RandIndexFS calculates Rand-type indices to compare two partitions

Syntax

AR=RandIndexFS(c1,c2)
AR=RandIndexFS(c1,c2, noisecluster)
[AR,RI]=RandIndexFS(___)
[AR,RI,MI]=RandIndexFS(___)
[AR,RI,MI,HI]=RandIndexFS(___)

Description

Suppose we want to compare two partitions summarized by the contingency table $T=[n_{ij}]$, where $i=1, 2, ..., r$, $j=1, ..., c$ and $n_{ij}$ denotes the number of data points which are in cluster i in the first partition and in cluster j in the second partition. Let A denote the number of pairs of data points which are either put into the same cluster by both partitions or put into different clusters by both partitions. Conversely, let D denote the number of pairs of data points that are put into the same cluster by one partition but into different clusters by the other. The partitions agree on the A pairs and disagree on the D pairs, and A+D=totcomp is the total number of comparisons, that is n(n-1)/2 for n data points.
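As a small illustration of these definitions, the following sketch builds the contingency table T from two label vectors and counts A and D by looping over all pairs. It uses only base MATLAB functions and the labels of the two-vector example in the Examples section. The variable names i1, i2, same1 and same2 are illustrative and are not part of RandIndexFS.

```matlab
% Labels taken from the two-vector example on this page
c1 = [1 1 2 2 2 2 3 3 3 3]';
c2 = [1 2 1 2 2 3 3 3 3 3]';
n  = numel(c1);

% Contingency table: T(i,j) = number of units in cluster i of c1 and cluster j of c2
[~, ~, i1] = unique(c1);
[~, ~, i2] = unique(c2);
T = accumarray([i1 i2], 1);

% Brute-force count over all n*(n-1)/2 pairs of units
A = 0;
D = 0;
for p = 1:n-1
    for q = p+1:n
        same1 = (c1(p) == c1(q));   % pair together in the first partition?
        same2 = (c2(p) == c2(q));   % pair together in the second partition?
        if same1 == same2
            A = A + 1;              % the two partitions agree on this pair
        else
            D = D + 1;              % the two partitions disagree on this pair
        end
    end
end
totcomp = n*(n-1)/2;
```

For these labels T equals [1 1 0; 1 2 1; 0 0 4], A is 32 and D is 13, so A+D equals totcomp=45.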

We can measure the agreement by the Rand index A/(A+D)=A/(totcomp) which is invariant with respect to permutations of the columns or rows of T.
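The Rand index can also be obtained directly from T, without looping over pairs, through the pair-counting identity A = totcomp + 2*(pairs together in both partitions) - (pairs together in the first partition) - (pairs together in the second partition). The sketch below is a base MATLAB illustration of that identity, not the RandIndexFS source code; the helper names pairs and randFromTable are hypothetical. It also checks the invariance to row and column permutations.

```matlab
% pairs(x): sum of "x choose 2" over all entries of x
pairs = @(x) sum(x(:).*(x(:)-1))/2;

% Rand index from a contingency table (hypothetical helper, not part of FSDA)
randFromTable = @(T) (pairs(sum(T(:))) + 2*pairs(T) ...
    - pairs(sum(T,2)) - pairs(sum(T,1))) / pairs(sum(T(:)));

T  = [1 1 0; 1 2 1; 0 0 4];
RI = randFromTable(T);

% Permuting rows and columns only relabels the clusters, so RI is unchanged
Tperm  = T([3 1 2], [2 3 1]);
RIperm = randFromTable(Tperm);
```

For this table RI is 32/45 (about 0.711), both before and after the permutation.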

The index has to be corrected for agreement by chance if the sizes of the clusters are not uniform (which is usually the case). Since the Rand index lies between 0 and 1, its expected value (which is not constant) must be greater than or equal to 0. The adjusted Rand index, in contrast, has expected value 0 and maximum value 1. Hence the adjusted Rand index can take on a wider range of values, which increases the sensitivity of the index. The formula of the adjusted Rand index (AR) is given below

$AR= \frac{\mbox{RI} - \mbox{Expected value of RI}}{\mbox{Max Index} - \mbox{Expected value of RI}}$
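A minimal sketch of this correction, assuming the pair-count form of Hubert and Arabie (1985) with cluster sizes treated as fixed: the index being corrected is the number of pairs placed together by both partitions, which is a linear function of RI for fixed cluster sizes, so the ratio is the same. The code uses base MATLAB only and is not the RandIndexFS source; the names pairs, nij, ai, bj, expected and maxindex are illustrative.

```matlab
T = [1 1 0; 1 2 1; 0 0 4];
pairs = @(x) sum(x(:).*(x(:)-1))/2;   % sum of "x choose 2" over the entries of x

nij = pairs(T);            % pairs placed together by both partitions
ai  = pairs(sum(T,2));     % pairs placed together by the first partition
bj  = pairs(sum(T,1));     % pairs placed together by the second partition
tot = pairs(sum(T(:)));    % total number of pairs, n*(n-1)/2

expected = ai*bj/tot;      % expected value of the index under random labelling
maxindex = (ai + bj)/2;    % maximum value of the index
AR = (nij - expected)/(maxindex - expected);
```

For this table nij=7, ai=13, bj=14 and tot=45, which gives AR=266/851 (about 0.313), noticeably lower than the unadjusted Rand index 32/45 of the same table.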

AR =RandIndexFS(c1, c2) returns the adjusted Rand index between the partitions c1 and c2. If c1 is a contingency table, c2 is not needed.

AR =RandIndexFS(c1, c2, noisecluster) excludes the points labelled noisecluster from the computation.

[AR, RI] =RandIndexFS(___) also returns the (unadjusted) Rand index RI.

[AR, RI, MI] =RandIndexFS(___) also returns Mirkin's index MI.

[AR, RI, MI, HI] =RandIndexFS(___) also returns the Hubert index HI.

Examples


RandIndexFS with the contingency table as input.

T=[1 1 0;
1 2 1;
0 0 4];
ARI=RandIndexFS(T);

RandIndexFS with the two vectors as input.

c=[1 1;
1 2;
2 1;
2 2;
2 2;
2 3;
3 3;
3 3;
3 3;
3 3];
% c1= numeric vector containing the labels of the first partition
c1=c(:,1);
% c2= numeric vector containing the labels of the second partition
c2=c(:,2);
ARI=RandIndexFS(c1,c2);

RandIndexFS with the two vectors as input, returning all four indices.

c=[1 1;
1 2;
2 1;
2 2;
2 2;
2 3;
3 3;
3 3;
3 3;
3 3];
% c1= numeric vector containing the labels of the first partition
c1=c(:,1);
% c2= numeric vector containing the labels of the second partition
c2=c(:,2);
% Computation of ARI, RI, MI and HI.
[ARI,RI,MI,HI]=RandIndexFS(c1,c2);
disp('Adjusted Rand index')
disp(ARI)
disp('Rand index (RI)')
disp(RI)
disp('Mirkin index = 1-RI')
disp(MI)
disp('Hubert index = RI-MI')
disp(HI)

Compare ARI for iris data (true classification against tclust classification).

load fisheriris
% first partition c1 is the true partition
c1=species;
% second partition c2 is the output of tclust clustering procedure
out=tclust(meas,3,0,100,'msg',0);
c2=out.idx;
[ARI,RI,MI,HI]=RandIndexFS(c1,c2);

Compare ARI for iris data (exclude unassigned units from tclust).

load fisheriris
% first partition c1 is the true partition
c1=species;
% second partition c2 is the output of tclust clustering procedure
out=tclust(meas,3,0.1,100,'msg',0);
c2=out.idx;
% Units in c2 labelled 0 are the trimmed (unassigned) observations
noisecluster=0;
[ARI,RI,MI,HI]=RandIndexFS(c1,c2,noisecluster);

Input Arguments


`c1` — labels of first partition or contingency table. A numeric or character vector containing the class labels of the first partition, or a 2-dimensional numeric matrix which contains the cross-tabulation of cluster assignments.

Data Types: single | double | char | logical

`c2` — labels of second partition. A numeric or character vector containing the class labels of the second partition.

The length of vector c2 must be equal to the length of vector c1. This second input is required only if c1 is not a 2-dimensional numeric matrix (contingency table).

Data Types: single | double | char | logical

Optional Arguments

`noisecluster` — label or number associated with the 'noise class' or 'noise level'. Scalar, numeric or character.

Number or character label which denotes the points which do not belong to any cluster. These points are not taken into account in the computation of the Rand-type indices. By default, all points are used to compute the indices.

Example: 0 (in this case the units which have class label 0 in both partitions are not taken into account in the index calculations)

Data Types: double | char
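The sketch below illustrates what removing noise points before the comparison can look like, in base MATLAB and with made-up labels. Whether a unit is dropped when it carries the noise label in either partition or only when it carries it in both is treated here as an open assumption, so both masks are shown; neither is confirmed to be the rule used inside RandIndexFS. The names isNoise1, isNoise2, keepEither and keepBoth are hypothetical.

```matlab
% Made-up labels; 0 marks units that do not belong to any cluster
c1 = [1 1 2 2 0 3]';
c2 = [1 0 2 2 0 3]';
noisecluster = 0;

isNoise1 = (c1 == noisecluster);
isNoise2 = (c2 == noisecluster);

% Assumption 1: drop a unit if it is noise in either partition
keepEither = ~(isNoise1 | isNoise2);
% Assumption 2: drop a unit only if it is noise in both partitions
keepBoth   = ~(isNoise1 & isNoise2);

% Filtered label vectors under assumption 1, ready to be compared
c1f = c1(keepEither);
c2f = c2(keepEither);
```

With these labels assumption 1 keeps four units and assumption 2 keeps five, so the choice of rule changes the contingency table on which the indices are computed.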

Output Arguments


`AR` — Adjusted Rand index. Scalar

A number between -1 and 1.

The adjusted Rand index is the corrected-for-chance version of the Rand index.

`RI` — Rand index (unadjusted). Scalar

A number between 0 and 1.

The Rand index is the fraction of pairs of objects for which both classification methods agree.

RI ranges from 0 (no pair classified in the same way under both clusterings) to 1 (identical clusterings).

`MI` — Mirkin's index. Scalar

A number between 0 and 1.

Mirkin's index is the fraction of pairs of objects for which both classification methods disagree. MI=1-RI.

`HI` — Hubert index. Scalar

A number between -1 and 1.

The Hubert index is equal to the fraction of pairs of objects for which both classification methods agree minus the fraction of pairs of objects for which both classification methods disagree. HI=RI-MI.
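In terms of the pair counts A and D used in the Description, the three unadjusted indices are linked as follows. The numerical values are worked out by hand for the contingency table T=[1 1 0; 1 2 1; 0 0 4] of the first example, for which A=32 and D=13; they are an illustration, not output copied from RandIndexFS.

$RI=\frac{A}{A+D}, \quad MI=\frac{D}{A+D}=1-RI, \quad HI=\frac{A-D}{A+D}=RI-MI=2\,RI-1$

$RI=\frac{32}{45}\approx 0.711, \quad MI=\frac{13}{45}\approx 0.289, \quad HI=\frac{19}{45}\approx 0.422$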

References

Hubert L. and Arabie P. (1985), Comparing Partitions, "Journal of Classification", Vol. 2, pp. 193-218.

Acknowledgements

This function follows the lines of MATLAB code developed by David Corney (2000) D.Corney@cs.ucl.ac.uk
