FowlkesMallowsIndex

FowlkesMallowsIndex computes the Fowlkes and Mallows index.

Syntax

ABk=FowlkesMallowsIndex(c1,c2)
ABk=FowlkesMallowsIndex(c1,c2,noisecluster)
[ABk,Bk]=FowlkesMallowsIndex(___)
[ABk,Bk,EBk]=FowlkesMallowsIndex(___)
[ABk,Bk,EBk,VarBk]=FowlkesMallowsIndex(___)

Description

The Fowlkes-Mallows index (see References) is an external evaluation measure of the similarity between two clusterings, that is, between two partitions produced by clustering algorithms. The comparison can be between two hierarchical clusterings or between a clustering and a benchmark classification. A higher value of the Fowlkes-Mallows index indicates greater similarity between the clusters and the benchmark classification.

This index can be used to compare either two sets of cluster labels or a set of cluster labels with a set of true labels. The adjusted Fowlkes-Mallows index (ABk) is defined as:

$ABk = \frac{Bk - \mbox{Expected value of } Bk}{\mbox{Max Index} - \mbox{Expected value of } Bk}$
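The adjustment can be illustrated with a short sketch that works directly on a contingency table. It uses the standard definitions from Fowlkes and Mallows (1983) for Bk and its expectation, and it assumes that the maximum value of the index is 1. The variable names (Tk, Pk, Qk, maxIndex) are illustrative, and the code is not necessarily how FowlkesMallowsIndex is implemented internally.

```matlab
% Contingency table (same table as in the contingency-table example on this page)
T = [1 1 0;
     1 2 1;
     0 0 4];
n  = sum(T(:));                 % total number of units
Tk = sum(T(:).^2) - n;          % term built from the cell counts
Pk = sum(sum(T,2).^2) - n;      % term built from the row totals
Qk = sum(sum(T,1).^2) - n;      % term built from the column totals
% Unadjusted Fowlkes-Mallows index
Bk  = Tk / sqrt(Pk*Qk);
% Expectation of Bk under random allocation with fixed margins
EBk = sqrt(Pk*Qk) / (n*(n-1));
% Assumption: the maximum attainable value of the index is 1
maxIndex = 1;
% Adjusted index, following the formula above
ABk = (Bk - EBk) / (maxIndex - EBk);
disp([Bk EBk ABk])
```

If the two partitions agree exactly, Bk equals 1 and so does ABk under this assumption.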

ABk = FowlkesMallowsIndex(c1, c2) returns the adjusted Fowlkes-Mallows index for the two partitions whose labels are given in the vectors c1 and c2.

ABk = FowlkesMallowsIndex(c1, c2, noisecluster) excludes from the computation the units labelled noisecluster.

[ABk, Bk] = FowlkesMallowsIndex(___) also returns the unadjusted Fowlkes-Mallows index Bk.

[ABk, Bk, EBk] = FowlkesMallowsIndex(___) also returns the expectation EBk of the index.

[ABk, Bk, EBk, VarBk] = FowlkesMallowsIndex(___) also returns the variance VarBk of the index.

Examples

FowlkesMallowsIndex (adjusted) with the two vectors as input.

% FowlkesMallowsIndex (adjusted) with the two vectors as input.
c=[1 1;
1 2;
2 1;
2 2;
2 2;
2 3;
3 3;
3 3;
3 3;
3 3];
% c1= numeric vector containing the labels of the first partition
c1=c(:,1);
% c2= numeric vector containing the labels of the second partition
c2=c(:,2);
FM=FowlkesMallowsIndex(c1,c2);

FM index (adjusted) with the contingency table as input.

T=[1 1 0;
1 2 1;
0 0 4];
FM=FowlkesMallowsIndex(T);

Compare FM (unadjusted) for iris data (true classification against tclust classification).

load fisheriris
% first partition c1 is the true partition
c1=species;
% second partition c2 is the output of tclust clustering procedure
out=tclust(meas,3,0,100,'msg',0);
c2=out.idx;
[~,FM,EFM,VARFM]=FowlkesMallowsIndex(c1,c2);

Compare FM index (unadjusted) for iris data (exclude unassigned units from tclust).

load fisheriris
% first partition c1 is the true partition
c1=species;
% second partition c2 is the output of tclust clustering procedure
out=tclust(meas,3,0.1,100,'msg',0);
c2=out.idx;
% Units labelled 0 in c2 are the trimmed observations
noisecluster=0;
[~,FM,EFM,VARFM]=FowlkesMallowsIndex(c1,c2,noisecluster);

FM index (unadjusted) for iris data with 3 groups coming from single linkage.

FM index between true and empirical classification

load fisheriris
d = pdist(meas);
Z = linkage(d);
C = cluster(Z,'maxclust',3);
[AFM,FM,FMexp,FMvar]=FowlkesMallowsIndex(C,species);
disp('FM index is equal to')
disp(FM)
disp('Expectation of FM index is')
disp(FMexp)
disp('Variance of FM index is')
disp(FMvar)
disp('Adjusted FM index is equal to')
disp(AFM)

Related Examples

Monitoring of (adjusted) FM index for iris data using true classification as benchmark.

load fisheriris
d = pdist(meas);
% Produce agglomerative hierarchical cluster tree
Z = linkage(d);
kk=1:15;
% Cut the tree into 1,...,15 groups (one column of C for each value of kk)
C = cluster(Z,'maxclust',kk);
% Preallocate one entry for each value of kk
FM=zeros(length(kk),1);
for j=kk
FM(j)=FowlkesMallowsIndex(C(:,j),species);
end
plot(kk,FM)
xlabel('Number of groups')
ylabel('Fowlkes and Mallows Index')

Input Arguments

`c1` — labels of first partition or contingency table. Numeric or character vector.

A numeric or character vector containing the class labels of the first partition, or a 2-dimensional numeric matrix containing the cross-tabulation of the cluster assignments.

Data Types: single | double | char | logical

`c2` — labels of second partition. Numeric or character vector.

A numeric or character vector containing the class labels of the second partition. The length of vector c2 must be equal to the length of vector c1. This second input is required only if c1 is not a 2-dimensional numeric matrix.

Data Types: single | double | char | logical
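The relation between the two input forms (a pair of label vectors, or a single cross-tabulation matrix) can be sketched with base MATLAB functions. The snippet below builds the contingency table from the label vectors used in the first example; the use of unique and accumarray is an illustrative choice and is not necessarily what FowlkesMallowsIndex does internally.

```matlab
% Label vectors from the first example on this page
c1 = [1 1 2 2 2 2 3 3 3 3]';
c2 = [1 2 1 2 2 3 3 3 3 3]';
% Map the labels of each partition to consecutive integers 1,2,...
[~,~,i1] = unique(c1);
[~,~,i2] = unique(c2);
% Count how many units fall in each (row label, column label) pair
T = accumarray([i1 i2], 1);
disp(T)
```

The resulting matrix T is the same table that is passed as a single input in the contingency-table example.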

Optional Arguments

`noisecluster` — label or number associated with the 'noise class' or 'noise level'. Scalar, numeric or character.

Number or character label which denotes the points that do not belong to any cluster.

These points are not taken into account in the computation of the Fowlkes and Mallows index.

Example: 0 (in this case the units which have class 0 in either of the two partitions are not taken into account in the index calculations)

Data Types: double | char
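The effect of noisecluster described above can be sketched as a filtering step applied before the index is computed. The label vectors below are hypothetical, with 0 marking trimmed units in the second partition, and the variable names keep, c1kept and c2kept are illustrative only; the function itself may handle this step differently.

```matlab
% Hypothetical label vectors; 0 in c2 marks units not assigned to any cluster
c1 = [1 1 2 2 2 2 3 3 3 3]';
c2 = [1 2 1 0 2 3 3 0 3 3]';
noisecluster = 0;
% Keep only the units which are not labelled as noise in either partition
keep = (c1 ~= noisecluster) & (c2 ~= noisecluster);
c1kept = c1(keep);
c2kept = c2(keep);
% The index would then be computed on the retained units only
fprintf('%d of %d units retained\n', numel(c1kept), numel(c1));
```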

References

Fowlkes, E.B. and Mallows, C.L. (1983), "A Method for Comparing Two Hierarchical Clusterings", Journal of the American Statistical Association, Vol. 78, pp. 553-569.

https://en.wikipedia.org/wiki/Fowlkes-Mallows_index

Output Arguments

`ABk` — Adjusted Fowlkes and Mallows index. Scalar.

`Bk` — Value of the Fowlkes and Mallows index. Scalar.

`EBk` — Expectation of the Fowlkes and Mallows index. Scalar.

`VarBk` — Variance of the Fowlkes and Mallows index. Scalar.
