RandIndexFS

RandIndexFS calculates Rand-type indices to compare two partitions

Syntax

  • AR=RandIndexFS(c1,c2)
  • AR=RandIndexFS(c1,c2,noisecluster)
  • [AR,RI]=RandIndexFS(___)
  • [AR,RI,MI]=RandIndexFS(___)
  • [AR,RI,MI,HI]=RandIndexFS(___)

Description

Suppose we want to compare two partitions summarized by the contingency table $T=[n_{ij}]$, where $i=1, 2, ..., r$, $j=1, 2, ..., c$, and $n_{ij}$ denotes the number of data points that are in cluster i in the first partition and in cluster j in the second partition. Let A denote the number of pairs of data points that are either put into the same cluster by both partitions or put into different clusters by both partitions. Conversely, let D denote the number of pairs of data points that are put into the same cluster by one partition but into different clusters by the other. The partitions agree on the A pairs and disagree on the D pairs, and A+D=totcomp, the total number of pairwise comparisons (n(n-1)/2 for n data points).

We can measure the agreement by the Rand index A/(A+D)=A/(totcomp) which is invariant with respect to permutations of the columns or rows of T.
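The counts A and D can be read off the contingency table. The following is a minimal sketch in base MATLAB, not the function's actual implementation; the names pairs, sameBoth, sameRow and sameCol are illustrative and do not come from RandIndexFS. It uses the 3-by-3 table from the first example on this page.

```matlab
% Contingency table taken from the first example on this page
T = [1 1 0;
     1 2 1;
     0 0 4];

n       = sum(T(:));                      % number of data points
totcomp = n*(n-1)/2;                      % total number of pairs of points
pairs   = @(x) sum(x(:).*(x(:)-1))/2;     % sum of "x choose 2" over all entries of x

sameBoth = pairs(T);                      % pairs placed together by both partitions
sameRow  = pairs(sum(T,2));               % pairs placed together by the first partition
sameCol  = pairs(sum(T,1));               % pairs placed together by the second partition

A  = totcomp + 2*sameBoth - sameRow - sameCol;   % pairs on which the partitions agree
D  = sameRow + sameCol - 2*sameBoth;             % pairs on which the partitions disagree
RI = A/totcomp;                                  % Rand index
```

For this table the sketch gives A=32, D=13 and totcomp=45, so RI=32/45, roughly 0.711.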

The index has to be corrected for agreement by chance if the sizes of the clusters are not uniform (which is usually the case). Since the Rand index lies between 0 and 1, its expected value (although not constant) must be greater than or equal to 0. The adjusted Rand index, on the other hand, has expected value zero, while its maximum value is still 1. Hence the adjusted Rand index can take on a wider range of values, which increases the sensitivity of the index. The formula of the adjusted Rand index (AR) is given below.

\[ AR= \frac{\mbox{RI- Expected value of RI}}{\mbox{Max Index - Expected value of RI}} \]
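Written in terms of the contingency table, and assuming the function follows the form given by Hubert and Arabie (1985) with Max Index equal to 1, the same quantity can be expressed as follows. The symbols $n_{i\cdot}$ (row sums of T), $n_{\cdot j}$ (column sums of T) and $n$ (number of data points) are notation introduced here for illustration.

\[ AR= \frac{\sum_{i,j}\binom{n_{ij}}{2} - \left[\sum_{i}\binom{n_{i\cdot}}{2}\sum_{j}\binom{n_{\cdot j}}{2}\right]\Big/\binom{n}{2}}{\frac{1}{2}\left[\sum_{i}\binom{n_{i\cdot}}{2}+\sum_{j}\binom{n_{\cdot j}}{2}\right] - \left[\sum_{i}\binom{n_{i\cdot}}{2}\sum_{j}\binom{n_{\cdot j}}{2}\right]\Big/\binom{n}{2}} \]

For the 3-by-3 table used in the first example of the Examples section, the three sums of binomial coefficients are 7, 13 and 14 and $\binom{n}{2}=45$, so $AR=(7-182/45)/(13.5-182/45)\approx 0.3126$.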

AR = RandIndexFS(c1, c2). Example: RandIndexFS with the contingency table as input.

AR = RandIndexFS(c1, c2, noisecluster). Example: RandIndexFS with the two vectors as input.

[AR, RI] = RandIndexFS(___). Example: RandIndexFS with the two vectors as input.

[AR, RI, MI] = RandIndexFS(___). Example: Compare ARI for iris data (true classification against tclust classification).

[AR, RI, MI, HI] = RandIndexFS(___). Example: Compare ARI for iris data (exclude unassigned units from tclust).

Examples

  • RandIndexFS with the contingency table as input.
  • T=[1 1 0;
    1 2 1;
    0 0 4];
    ARI=RandIndexFS(T);

  • RandIndexFS with the two vectors as input.
  • c=[1 1;
    1 2;
    2 1;
    2 2;
    2 2;
    2 3;
    3 3;
    3 3;
    3 3;
    3 3];
    % c1= numeric vector containing the labels of the first partition
    c1=c(:,1);
    % c2= numeric vector containing the labels of the second partition
    c2=c(:,2);
    ARI=RandIndexFS(c1,c2);

  • RandIndexFS with the two vectors as input.
  • c=[1 1;
    1 2;
    2 1;
    2 2;
    2 2;
    2 3;
    3 3;
    3 3;
    3 3;
    3 3];
    % c1= numeric vector containing the labels of the first partition
    c1=c(:,1);
    % c2= numeric vector containing the labels of the second partition
    c2=c(:,2);
    % Computation of ARI, RI, MI and HI.
    [ARI,RI,MI,HI]=RandIndexFS(c1,c2);
    disp('Adjusted Rand index')
    disp(ARI)
    disp('Rand index (RI)')
    disp(RI)
    disp('Mirkin index = 1-RI')
    disp(MI)
    disp('Hubert index = RI-MI ')
    disp(HI)

  • Compare ARI for iris data (true classification against tclust classification).
  • load fisheriris
    % first partition c1 is the true partition
    c1=species;
    % second partition c2 is the output of tclust clustering procedure
    out=tclust(meas,3,0,100,'msg',0);
    c2=out.idx;
    [ARI,RI,MI,HI]=RandIndexFS(c1,c2);

  • Compare ARI for iris data (exclude unassigned units from tclust).
  • load fisheriris
    % first partition c1 is the true partition
    c1=species;
    % second partition c2 is the output of tclust clustering procedure
    out=tclust(meas,3,0.1,100,'msg',0);
    c2=out.idx;
    % Units in c2 labelled 0 are the trimmed observations
    noisecluster=0;
    [ARI,RI,MI,HI]=RandIndexFS(c1,c2,noisecluster);

    Input Arguments

    c1 — labels of first partition or contingency table. A numeric or character vector containing the class labels of the first partition, or a 2-dimensional numeric matrix containing the cross-tabulation of cluster assignments.

    Data Types: single | double | char | logical (label vector); single | double (contingency table)

    c2 — labels of second partition. A numeric or character vector containing the class labels of the second partition.

    The length of vector c2 must be equal to the length of vector c1. This second input is required only if c1 is not a 2-dimensional numeric matrix.

    Data Types: single | double | char | logical
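The two input forms carry the same information. As a hedged sketch, using only the base MATLAB functions unique and accumarray (the names i1 and i2 are illustrative, and this is not necessarily how RandIndexFS builds the table internally), two label vectors can be cross-tabulated as follows. The labels are those of the two-vector example in the Examples section.

```matlab
% Label vectors from the two-vector example on this page
c1 = [1 1 2 2 2 2 3 3 3 3]';
c2 = [1 2 1 2 2 3 3 3 3 3]';

% Map the labels of each partition to consecutive integers 1, 2, ...
[~,~,i1] = unique(c1);
[~,~,i2] = unique(c2);

% T(i,j) counts the units in cluster i of the first partition
% and in cluster j of the second partition
T = accumarray([i1(:) i2(:)], 1);
```

The resulting T is [1 1 0; 1 2 1; 0 0 4], the contingency table used in the first example, so both calling forms describe the same pair of partitions.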

    Optional Arguments

    noisecluster — label or number associated with the 'noise class' or 'noise level'. Scalar, numeric or character.

    Number or character label which denotes the points that do not belong to any cluster. These points are not taken into account in the computation of the Rand-type indices. The default is to use all points to compute the indices.

    Example: 0 (in this case the units labelled 0 in either of the two partitions are not taken into account in the index calculations)

    Data Types: double | char
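One way to picture the effect of noisecluster is to filter the label vectors before any index is computed. The sketch below assumes, as the example text suggests, that a unit is dropped when it carries the noise label in either partition. The label vectors are made up for illustration, the names keep, c1k and c2k are hypothetical, and the code handles numeric labels only. It is not necessarily how RandIndexFS proceeds internally.

```matlab
% Hypothetical numeric label vectors; 0 marks units assigned to no cluster
c1 = [1 1 2 2 3 3 0]';
c2 = [1 1 2 0 3 3 3]';
noisecluster = 0;

% Keep only the units that are not noise in either partition
keep = (c1 ~= noisecluster) & (c2 ~= noisecluster);
c1k  = c1(keep);
c2k  = c2(keep);

% With the FSDA toolbox on the path, the indices could then be computed as
% ARI = RandIndexFS(c1k, c2k);
```

Here units 4 and 7 are excluded, leaving five units on which the two partitions are compared.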

    Output Arguments

    AR — Adjusted Rand index. Scalar

    A number between -1 and 1.

    The adjusted Rand index is the corrected-for-chance version of the Rand index.

    RI — Rand index (unadjusted). Scalar

    A number between 0 and 1.

    The Rand index is the fraction of pairs of objects on which both classification methods agree.

    RI ranges from 0 (no pair classified in the same way under both clusterings) to 1 (identical clusterings).

    MI — Mirkin's index. Scalar

    A number between 0 and 1.

    Mirkin's index is the fraction of pairs of objects on which the two classification methods disagree. MI=1-RI.

    HI — Hubert index. Scalar

    A number between -1 and 1.

    The Hubert index is equal to the fraction of pairs of objects on which the two classification methods agree minus the fraction of pairs on which they disagree. HI=RI-MI.
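The relations among the three unadjusted indices can be checked with a few lines of base MATLAB. This is a small sketch under the definitions above; the value of RI is hand-computed for the 3-by-3 contingency table of the first example (32 agreeing pairs out of 45) and is not taken from the function's output.

```matlab
RI = 32/45;     % Rand index, hand-computed for the table [1 1 0; 1 2 1; 0 0 4]
MI = 1 - RI;    % Mirkin's index: fraction of pairs on which the partitions disagree
HI = RI - MI;   % Hubert index, which is the same as 2*RI - 1
```

Because MI=1-RI, the Hubert index reduces to 2*RI-1, which is why it ranges over [-1, 1] while RI and MI range over [0, 1].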

    References

    Hubert L. and Arabie P. (1985), Comparing Partitions, "Journal of Classification", Vol. 2, pp. 193-218.

    Acknowledgements

    This function follows the lines of MATLAB code developed by David Corney (2000) D.Corney@cs.ucl.ac.uk

    This page has been automatically generated by our routine publishFS