FowlkesMallowsIndex

FowlkesMallowsIndex computes the Fowlkes and Mallows index.

Syntax

  • ABk=FowlkesMallowsIndex(c1,c2)
  • ABk=FowlkesMallowsIndex(c1,c2,noisecluster)
  • [ABk,Bk]=FowlkesMallowsIndex(___)
  • [ABk,Bk,EBk]=FowlkesMallowsIndex(___)
  • [ABk,Bk,EBk,VarBk]=FowlkesMallowsIndex(___)

Description

The Fowlkes-Mallows index (see References) is an external evaluation measure of the similarity between two clusterings (partitions produced by a clustering algorithm). The comparison can be between two hierarchical clusterings, or between a clustering and a benchmark classification. A higher value of the Fowlkes-Mallows index indicates greater similarity between the clusters and the benchmark classification.

This index can be used to compare either two cluster label sets or a cluster label set with a true label set. The formula of the adjusted Fowlkes-Mallows index (ABk) is:

\[ ABk= \frac{\mbox{Bk- Expected value of Bk}}{\mbox{Max Index - Expected value of Bk}} \]
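
The formula does not spell out Bk, its expected value, or the maximum. As a sketch following the cited Fowlkes and Mallows (1983) paper: let \(n_{ij}\) be the number of units assigned to group i of the first partition and group j of the second, with row totals \(n_{i\cdot}\), column totals \(n_{\cdot j}\), and n units overall. The symbols \(T_k\), \(P_k\) and \(Q_k\) are notation from that paper rather than from this page, and taking Max Index = 1 is an assumption based on the stated range of Bk (between 0 and 1).

\[ T_k = \sum_{i}\sum_{j} n_{ij}^2 - n, \qquad P_k = \sum_{i} n_{i\cdot}^2 - n, \qquad Q_k = \sum_{j} n_{\cdot j}^2 - n \]

\[ Bk = \frac{T_k}{\sqrt{P_k Q_k}}, \qquad E(Bk) = \frac{\sqrt{P_k Q_k}}{n(n-1)}, \qquad ABk = \frac{Bk - E(Bk)}{1 - E(Bk)} \]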

ABk =FowlkesMallowsIndex(c1, c2) returns the adjusted Fowlkes-Mallows index for the two label vectors c1 and c2.

ABk =FowlkesMallowsIndex(c1, c2, noisecluster) excludes from the computation the units that carry the label noisecluster in either partition.

[ABk, Bk] =FowlkesMallowsIndex(___) also returns the unadjusted Fowlkes-Mallows index Bk.

[ABk, Bk, EBk] =FowlkesMallowsIndex(___) also returns the expected value of Bk under the null hypothesis of no relation.

[ABk, Bk, EBk, VarBk] =FowlkesMallowsIndex(___) also returns the variance of Bk under the null hypothesis of no relation.

Examples


  • FowlkesMallowsIndex (adjusted) with the two vectors as input.
  • % FowlkesMallowsIndex (adjusted) with the two vectors as input.
    c=[1 1;
    1 2;
    2 1;
    2 2;
    2 2;
    2 3;
    3 3;
    3 3;
    3 3;
    3 3];
    % c1= numeric vector containing the labels of the first partition
    c1=c(:,1);
    % c2= numeric vector containing the labels of the second partition
    c2=c(:,2);
    FM=FowlkesMallowsIndex(c1,c2);

  • FM index (adjusted) with the contingency table as input.
  • T=[1 1 0;
    1 2 1;
    0 0 4];
    FM=FowlkesMallowsIndex(T);

  • Compare FM (unadjusted) for iris data (true classification against tclust classification).
  • load fisheriris
    % first partition c1 is the true partition
    c1=species;
    % second partition c2 is the output of tclust clustering procedure
    out=tclust(meas,3,0,100,'msg',0);
    c2=out.idx;
    [~,FM,EFM,VARFM]=FowlkesMallowsIndex(c1,c2);

  • Compare FM index (unadjusted) for iris data (exclude unassigned units from tclust).
  • load fisheriris
    % first partition c1 is the true partition
    c1=species;
    % second partition c2 is the output of tclust clustering procedure
    out=tclust(meas,3,0.1,100,'msg',0);
    c2=out.idx;
    % Units labeled 0 in c2 are the trimmed (unassigned) observations
    noisecluster=0;
    [~,FM,EFM,VARFM]=FowlkesMallowsIndex(c1,c2,noisecluster);

  • FM index (unadjusted) for iris data with 3 groups coming from single linkage.
  • % FM index between true and empirical classification.

    load fisheriris
    d = pdist(meas);
    Z = linkage(d);
    C = cluster(Z,'maxclust',3);
    [AFM,FM,FMexp,FMvar]=FowlkesMallowsIndex(C,species);
    disp('FM index is equal to')
    disp(FM)
    disp('Expectation of FM index is')
    disp(FMexp)
    disp('Variance of FM index is')
    disp(FMvar)
    disp('Adjusted FM index is equal to')
    disp(AFM)

Related Examples

  • Monitoring of (adjusted) FM index for iris data using true classification as benchmark.
  • load fisheriris
    d = pdist(meas);
    Z = linkage(d);
    kk=1:15;
    % Cut the hierarchical cluster tree Z into 1,...,15 groups
    C = cluster(Z,'maxclust',kk);
    % Preallocate one entry per number of groups
    FM=zeros(length(kk),1);
    for j=kk
        FM(j)=FowlkesMallowsIndex(C(:,j),species);
    end
    plot(kk,FM)
    xlabel('Number of groups')
    ylabel('Fowlkes and Mallows Index')

Input Arguments

    c1 — labels of first partition or contingency table. Numeric or character vector.

    A numeric or character vector containing the class labels of the first partition, or a 2-dimensional numeric matrix containing the cross-tabulation of cluster assignments.

    Data Types: single | double | char | logical


    c2 — labels of second partition. Numeric or character vector.

    A numeric or character vector containing the class labels of the second partition. The length of vector c2 must be equal to the length of vector c1. This second input is required only if c1 is not a 2-dimensional numeric matrix.

    Data Types: single | double | char | logical


Optional Arguments

    noisecluster — label or number associated with the 'noise class' or 'noise level'. Scalar, numeric or character.

    Number or character label which denotes the points that do not belong to any cluster.

    These points are not taken into account in the computation of the Fowlkes and Mallows index.

    Example: 0 (in this case the units which have class 0 in either of the two partitions are not taken into account in the index calculations)

    Data Types: double | char
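
The effect of noisecluster can be illustrated with a minimal base-MATLAB sketch. It assumes numeric labels and that a unit is dropped when it carries the noise label in either partition, as the example text states. The variable names keep, c1k and c2k are hypothetical, and this is not the FSDA source code.

```matlab
% Sketch: drop the units flagged as noise before comparing two partitions.
% Assumption: numeric labels; a unit is excluded if it has the noise label
% in either partition.
c1 = [1 1 2 2 0 3];
c2 = [1 2 2 2 3 0];
noisecluster = 0;
% Logical mask of the units that are kept
keep = (c1 ~= noisecluster) & (c2 ~= noisecluster);
c1k = c1(keep);   % labels of the first partition without noise units
c2k = c2(keep);   % labels of the second partition without noise units
```

Only the four units that have a regular label in both partitions would enter the index computation here.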

Output Arguments

    ABk — Adjusted Fowlkes and Mallows index. Scalar

    A number between -1 and 1.

    The adjusted Fowlkes and Mallows index is the corrected-for-chance version of the Fowlkes and Mallows index.

    Bk — Value of the Fowlkes and Mallows index. Scalar

    A number between 0 and 1.

    EBk — Expectation of the Fowlkes and Mallows index. Scalar

    Expected value of the index computed under the null hypothesis of no relation.

    VarBk — Variance of the Fowlkes and Mallows index. Scalar

    Variance of the index computed under the null hypothesis of no relation.
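
How the first three outputs relate to each other can be sketched by hand in base MATLAB from a contingency table. The sketch uses the standard definitions from the cited Fowlkes and Mallows (1983) paper and takes 1 as the maximum of the index. Both are assumptions about what the function computes, not a copy of the FSDA implementation, and the intermediate names n, Tk, Pk and Qk are introduced here for illustration. The variance is omitted.

```matlab
% Hand computation of Bk, EBk and ABk from a contingency table.
% Assumption: definitions from Fowlkes and Mallows (1983); not FSDA code.
T = [1 1 0;
     1 2 1;
     0 0 4];                    % cross-tabulation of the two partitions
n  = sum(T(:));                 % total number of units
Tk = sum(T(:).^2) - n;          % twice the number of pairs grouped together in both partitions
Pk = sum(sum(T,2).^2) - n;      % same quantity for the first partition (row totals)
Qk = sum(sum(T,1).^2) - n;      % same quantity for the second partition (column totals)
Bk  = Tk/sqrt(Pk*Qk);           % unadjusted index
EBk = sqrt(Pk*Qk)/(n*(n-1));    % expectation under the hypothesis of no relation
ABk = (Bk-EBk)/(1-EBk);         % adjusted index, taking 1 as the maximum
fprintf('Bk = %.4f, EBk = %.4f, ABk = %.4f\n', Bk, EBk, ABk)
```

If the function follows the same definitions, these values should agree with the outputs of FowlkesMallowsIndex(T) for the same table.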

References

    Fowlkes, E.B. and Mallows, C.L. (1983), A Method for Comparing Two Hierarchical Clusterings, "Journal of the American Statistical Association", Vol. 78, pp. 553-569.

    [ https://en.wikipedia.org/wiki/Fowlkes-Mallows_index ]


    This page has been automatically generated by our routine publishFS