# FowlkesMallowsIndex

FowlkesMallowsIndex computes the Fowlkes and Mallows index.

## Syntax

• ABk=FowlkesMallowsIndex(c1,c2)
• [ABk,Bk]=FowlkesMallowsIndex(___)
• [ABk,Bk,EBk]=FowlkesMallowsIndex(___)
• [ABk,Bk,EBk,VarBk]=FowlkesMallowsIndex(___)

## Description

The Fowlkes-Mallows index (see References) is an external evaluation measure of the similarity between two clusterings, that is, two partitions of the same units obtained from clustering algorithms. The comparison can be between two hierarchical clusterings or between a clustering and a benchmark classification. A higher value of the index indicates greater similarity between the two partitions.
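To make the notion of similarity concrete, the unadjusted index can be read as a pair-counting measure: the number of pairs of units grouped together in both partitions, divided by the geometric mean of the numbers of pairs grouped together in each partition. The following is a minimal sketch in base MATLAB (R2016b or later, for implicit expansion), assuming numeric label vectors. The variable names are illustrative and this is not the toolbox implementation.

```matlab
% Two partitions of the same 10 units (numeric labels, column vectors)
c1 = [1 1 2 2 2 2 3 3 3 3]';
c2 = [1 2 1 2 2 3 3 3 3 3]';
n  = numel(c1);

% Logical n-by-n matrices: true where units i and j share a label
same1 = c1 == c1';
same2 = c2 == c2';

% Consider each unordered pair of distinct units once (strict upper triangle)
pairs = triu(true(n), 1);

n11 = sum(same1(pairs) & same2(pairs));   % pairs together in both partitions
n1  = sum(same1(pairs));                  % pairs together in c1
n2  = sum(same2(pairs));                  % pairs together in c2

% Unadjusted Fowlkes-Mallows index
Bk = n11 / sqrt(n1 * n2);
```

The pairwise matrices need memory proportional to the square of the number of units, so this form is meant for illustration on small data sets only.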

This index can be used to compare either two sets of cluster labels or a set of cluster labels with a set of true labels. The adjusted Fowlkes-Mallows index (ABk) is defined as:

$ABk= \frac{\mbox{Bk- Expected value of Bk}}{\mbox{Max Index - Expected value of Bk}}$
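As a worked instance of the formula above, the sketch below computes Bk and its expectation from a contingency table and then applies the adjustment. It assumes the standard definitions in Fowlkes and Mallows (1983) and takes 1 as the maximum value of the index. The names Tk, Pk and Qk follow that paper; the code is illustrative rather than the toolbox implementation.

```matlab
% Contingency table: rows = groups of partition 1, columns = groups of partition 2
T = [1 1 0;
     1 2 1;
     0 0 4];
n = sum(T(:));                       % total number of units

Tk = sum(T(:).^2) - n;               % twice the number of pairs together in both partitions
Pk = sum(sum(T, 2).^2) - n;          % twice the number of pairs together in partition 1
Qk = sum(sum(T, 1).^2) - n;          % twice the number of pairs together in partition 2

Bk  = Tk / sqrt(Pk * Qk);            % unadjusted index
EBk = sqrt(Pk * Qk) / (n * (n - 1)); % expectation under no relation between partitions
ABk = (Bk - EBk) / (1 - EBk);        % adjusted index, assuming a maximum of 1
```

When the two partitions agree no better than chance, Bk is close to EBk and the adjusted value is close to 0; identical partitions give Bk = 1 and therefore ABk = 1.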

 ABk =FowlkesMallowsIndex(c1, c2) returns the adjusted Fowlkes-Mallows index for the two partitions whose labels are given in the vectors c1 and c2. If c1 is a contingency table, c2 is not required.

 [ABk, Bk] =FowlkesMallowsIndex(___) also returns Bk, the unadjusted Fowlkes-Mallows index.

 [ABk, Bk, EBk] =FowlkesMallowsIndex(___) also returns EBk, the expectation of the index under the null hypothesis of no relation.

 [ABk, Bk, EBk, VarBk] =FowlkesMallowsIndex(___) also returns VarBk, the variance of the index under the same null hypothesis.

## Examples

### FowlkesMallowsIndex (adjusted) with the two vectors as input.

% FowlkesMallowsIndex (adjusted) with the two vectors as input.
c=[1 1;
1 2;
2 1;
2 2 ;
2 2;
2 3;
3 3;
3 3;
3 3;
3 3];
% c1= numeric vector containing the labels of the first partition
c1=c(:,1);
% c2= numeric vector containing the labels of the second partition
c2=c(:,2);
FM=FowlkesMallowsIndex(c1,c2);

### FM index (adjusted) with the contingency table as input.

T=[1 1 0;
1 2 1;
0 0 4];
FM=FowlkesMallowsIndex(T);

### Compare FM (unadjusted) for iris data (true classification against tclust classification).

load fisheriris
% first partition c1 is the true partition
c1=species;
% second partition c2 is the output of tclust clustering procedure
out=tclust(meas,3,0,100,'msg',0);
c2=out.idx;
[~,FM,EFM,VARFM]=FowlkesMallowsIndex(c1,c2);

### Compare FM index (unadjusted) for iris data (exclude unassigned units from tclust).

load fisheriris
% first partition c1 is the true partition
c1=species;
% second partition c2 is the output of tclust clustering procedure
out=tclust(meas,3,0.1,100,'msg',0);
c2=out.idx;
% Units in c2 labelled 0 are the trimmed (unassigned) observations
noisecluster=0;
% Drop the trimmed units from both partitions before computing the index
keep=c2~=noisecluster;
[~,FM,EFM,VARFM]=FowlkesMallowsIndex(c1(keep),c2(keep));

### FM index (unadjusted) for iris data with 3 groups coming from single linkage.

FM index between true and empirical classification

load fisheriris
d = pdist(meas);
% Single linkage hierarchical cluster tree
Z = linkage(d,'single');
C = cluster(Z,'maxclust',3);
[AFM,FM,FMexp,FMvar]=FowlkesMallowsIndex(C,species);
disp('FM index is equal to')
disp(FM)
disp('Expectation of FM index is')
disp(FMexp)
disp('Variance of FM index is')
disp(FMvar)
disp('Adjusted FM index is equal to')
disp(AFM)

### Monitoring of (adjusted) FM index for iris data using true classification as benchmark.

load fisheriris
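d = pdist(meas);
% Produce agglomerative hierarchical cluster tree (default single linkage)
Z = linkage(d);
kk=1:15;
% One column of cluster labels for each number of groups in kk
C = cluster(Z,'maxclust',kk);
FM=zeros(length(kk),1);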
for j=kk
FM(j)=FowlkesMallowsIndex(C(:,j),species);
end
plot(kk,FM)
xlabel('Number of groups')
ylabel('Fowlkes and Mallows Index')

## Input Arguments

### c1 — labels of first partition or contingency table. Numeric or character vector.

A numeric or character vector containing the class labels of the first partition, or a 2-dimensional numeric matrix containing the cross-tabulation of the cluster assignments.

Data Types: single | double | char | logical

### c2 — labels of second partition. Numeric or character vector.

A numeric or character vector containing the class labels of the second partition. The length of c2 must be equal to the length of c1. This second input is required only if c1 is not a 2-dimensional numeric matrix.

Data Types: single | double | char | logical
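
Since c1 may be either a label vector or a cross-tabulation, it can help to see how the two forms of input relate. The sketch below builds the contingency table from two label vectors in base MATLAB. It is an illustration with assumed variable names, not the code the function uses internally.

```matlab
% Two label vectors of equal length
c1 = [1 1 2 2 2 2 3 3 3 3]';
c2 = [1 2 1 2 2 3 3 3 3 3]';

% Map arbitrary labels to consecutive integers 1..k
[~, ~, g1] = unique(c1);
[~, ~, g2] = unique(c2);

% T(i,j) = number of units in group i of c1 and group j of c2
T = accumarray([g1(:) g2(:)], 1);
```

With the Statistics and Machine Learning Toolbox, crosstab(c1,c2) produces the same table.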

## Output Arguments

### ABk — Adjusted Fowlkes and Mallows index. Scalar

A number between -1 and 1.

The adjusted Fowlkes and Mallows index is the corrected-for-chance version of the Fowlkes and Mallows index.

### Bk — Value of the Fowlkes and Mallows index. Scalar

A number between 0 and 1.

### EBk — Expectation of the Fowlkes and Mallows index. Scalar

Expected value of the index computed under the null hypothesis of no relation between the two partitions.

### VarBk — Variance of the Fowlkes and Mallows index. Scalar

Variance of the index computed under the null hypothesis of no relation between the two partitions.

## References

Fowlkes, E.B. and Mallows, C.L. (1983), A Method for Comparing Two Hierarchical Clusterings, "Journal of the American Statistical Association", Vol. 78, pp. 553-569.