overlap

Finding exact overlap for the Iris data.

load fisheriris;
Y         = meas;
[Mu , SS] = grpstats(Y,species,{'mean',@cov});
S1        = permute(SS,[3,2,1]); % S1 should be equal to S
S = zeros(4,4,3);
S(:,:,1) = cov(Y(1:50,:));
S(:,:,2) = cov(Y(51:100,:));
S(:,:,3) = cov(Y(101:150,:));
pigen=ones(3,1)/3;
k=3;
p=4;
[OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax]=overlap(k,p,pigen,Mu,S)
disp('OmegaMap= k-by-k matrix which contains misclassification probabilities')
disp(OmegaMap);
disp('Average overlap')
disp(BarOmega)
disp('Maximum overlap')
disp(MaxOmega)
disp('Groups with maximum overlap')
disp(rcMax)

OmegaMap =

          1.00          0.00             0
          0.00          1.00          0.02
             0          0.03          1.00


BarOmega =

          0.02


MaxOmega =

          0.05


StdOmega =

          0.03


rcMax =

          2.00
          3.00

OmegaMap= k-by-k matrix which contains misclassification probabilities
          1.00          0.00             0
          0.00          1.00          0.02
             0          0.03          1.00

Average overlap
          0.02

Maximum overlap
          0.05

Groups with maximum overlap
          2.00
          3.00

Example of use of option tol.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap]=overlap(k,p,pigen,Mu,S,1e-05)

Example of use of option lim.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap]=overlap(k,p,pigen,Mu,S,[],10000)

Example of use of options lim and tol together.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap]=overlap(k,p,pigen,Mu,S,1e-08,100000)

Display BarOmega and MaxOmega.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap, BarOmega, MaxOmega]=overlap(k,p,pigen,Mu,S)

Display StdOmega.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap, BarOmega, MaxOmega, StdOmega]=overlap(k,p,pigen,Mu,S)

Display rcMax.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax]=overlap(k,p,pigen,Mu,S)

Input Arguments

`k` — number of components (groups). Integer.

Scalar specifying the number of groups.

Data Types: int16|int32|int64|single|double

`v` — dimensionality (number of variables). Integer.

Scalar specifying the number of variables in the data matrix.

Data Types: int16|int32|int64|single|double

`Pi` — Mixing proportions. Vector.

Vector of size k containing mixing proportions. The sum of the elements of Pi is 1.
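
As a small illustration of the constraint on Pi, the sketch below derives mixing proportions from group sizes and checks that they are non-negative and sum to 1. The label vector and the names labels and isValid are hypothetical and are not part of the overlap interface.

```matlab
% Hypothetical group labels for 6 observations in k = 3 groups
labels = [1 1 2 3 3 3]';
k = max(labels);

% Relative group sizes as a k-by-1 vector of mixing proportions
Pi = accumarray(labels, 1, [k 1]) / numel(labels);

% Pi is usable only if it is non-negative and sums to 1
isValid = all(Pi >= 0) && abs(sum(Pi) - 1) < 1e-12;
disp(Pi.');
disp(isValid);
```

With balanced groups, as in the Iris examples, this reduces to ones(k,1)/k.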

Data Types: single|double

`Mu` — centroids. Matrix.

Matrix of size k-by-v containing (in the rows) the centroids of the k groups.

Data Types: single|double

`S` — Covariance matrices. 3D array.

3D array of size v-by-v-by-k containing covariance matrices of the k groups.
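
The examples build Mu and S with hard-coded row ranges. A possible alternative, sketched here under the assumption that group membership is given by a label vector, fills both arrays in a loop so that the k-by-v and v-by-v-by-k shapes are explicit. The names names, g and Yj are introduced only for this illustration.

```matlab
load fisheriris;                          % meas (150-by-4), species (150-by-1 cell)
Y = meas;

% Map the species names to integer group indices 1..k
[names, ~, g] = unique(species, 'stable');
k = numel(names);
v = size(Y, 2);

Mu = zeros(k, v);                         % centroids in the rows
S  = zeros(v, v, k);                      % one covariance matrix per group
for j = 1:k
    Yj = Y(g == j, :);                    % observations of group j
    Mu(j, :)   = mean(Yj, 1);
    S(:, :, j) = cov(Yj);
end
disp(size(Mu));
disp(size(S));
```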

Data Types: single|double

Optional Arguments

`tol` — tolerance. Scalar.

Default is 1e-06.

Optional parameters tol and lim are passed to function ncx2mixtcdf, which computes the cdf of a linear combination of non-central chi-square random variables. This cdf gives the probability of misclassification.

Example: overlap(k,v,Pi,Mu,S,0.0001)

Data Types: double

`lim` — maximum number of integration terms. Scalar.

Default is 1000000.

Optional parameters tol and lim are passed to function ncx2mixtcdf, which computes the cdf of a linear combination of non-central chi-square random variables. This cdf gives the probability of misclassification.

Example: overlap(k,v,Pi,Mu,S,[],1000)

Data Types: double
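
One way to see the effect of tol is to compare the map obtained with the default settings against one obtained with a looser tolerance. This is only a sketch: it assumes overlap is on the path and reuses the Iris setup from the examples, the value 1e-3 is an arbitrary choice, and the names OmegaDefault, OmegaLoose and maxDiff are hypothetical.

```matlab
load fisheriris;
Y = meas;
pigen = ones(3,1)/3;
k = 3;
p = 4;
Mu = grpstats(Y, species);
S = zeros(4,4,3);
S(:,:,1) = cov(Y(1:50,:));
S(:,:,2) = cov(Y(51:100,:));
S(:,:,3) = cov(Y(101:150,:));

% Default tol and lim
OmegaDefault = overlap(k, p, pigen, Mu, S);
% Looser tolerance, passed positionally as the sixth argument
OmegaLoose = overlap(k, p, pigen, Mu, S, 1e-3);

% Largest absolute change in any misclassification probability
maxDiff = max(abs(OmegaDefault(:) - OmegaLoose(:)));
disp(maxDiff);
```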

Output Arguments

`OmegaMap` — map of misclassification probabilities. Matrix.

k-by-k matrix containing the map of misclassification probabilities.

More precisely, for $i \neq j$ with $i,j = 1, 2, \ldots, k$, $OmegaMap(i,j) = w_{j|i}$ is the probability that X coming from the $i$-th component (group) is classified to the $j$-th component.

The probability of overlapping (called $p_{ij}$) between groups $i$ and $j$ is given by $p_{ij} = p_{ji} = w_{j|i} + w_{i|j}, \qquad i,j = 1, 2, \ldots, k$.
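
The pairwise overlaps defined above can be sketched by adding OmegaMap to its transpose and keeping the strict upper triangle. The matrix below is hypothetical (illustrative values, not output of overlap), and the names P, mask and pij are introduced only for this example.

```matlab
% Hypothetical 3-by-3 map of misclassification probabilities
OmegaMap = [1      0.001  0;
            0.002  1      0.02;
            0      0.03   1];
k = size(OmegaMap, 1);

% Entry (i,j) of P is OmegaMap(i,j) + OmegaMap(j,i)
P = OmegaMap + OmegaMap.';

% Keep the 0.5*k*(k-1) distinct pairs i < j (column-major order)
mask = triu(true(k), 1);
pij = P(mask);
disp(pij.');
```

For k = 3 the vector pij holds the overlaps of pairs (1,2), (1,3) and (2,3), in that order.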

`BarOmega` — Average overlap. Scalar.

Scalar giving the average overlap. BarOmega is computed as (sum(sum(OmegaMap))-k)/(0.5*k*(k-1)).
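
The formula can be checked by hand on a small matrix. The sketch below uses a hypothetical OmegaMap (illustrative values only) whose diagonal is 1, as in the Iris output.

```matlab
% Hypothetical 3-by-3 map of misclassification probabilities
OmegaMap = [1      0.001  0;
            0.002  1      0.02;
            0      0.03   1];
k = size(OmegaMap, 1);

% Subtracting k removes the unit diagonal; the remaining sum is divided
% by the number of distinct pairs of groups, 0.5*k*(k-1)
BarOmega = (sum(sum(OmegaMap)) - k) / (0.5*k*(k-1));
disp(BarOmega);
```

Because each off-diagonal entry belongs to exactly one pair, this value is the mean of the 0.5*k*(k-1) pairwise overlaps.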

`MaxOmega` — Maximum overlap. Scalar.

Scalar giving the maximum overlap. MaxOmega is the maximum of OmegaMap(i,j)+OmegaMap(j,i) over all pairs i ~= j, with i, j = 1, 2, ..., k.

`StdOmega` — Standard deviation of overlap. Scalar.

Scalar giving the standard deviation of overlap, that is, the standard deviation of the 0.5*k*(k-1) probabilities of overlapping pij.
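
A rough sketch of this quantity on a hypothetical OmegaMap follows. It assumes MATLAB's default normalization of std (division by n-1); the text above does not say which normalization overlap uses, so the value may differ from the function's output if another convention applies.

```matlab
% Hypothetical 3-by-3 map of misclassification probabilities
OmegaMap = [1      0.001  0;
            0.002  1      0.02;
            0      0.03   1];
k = size(OmegaMap, 1);

% Pairwise overlaps for the 0.5*k*(k-1) pairs i < j
P = OmegaMap + OmegaMap.';
pij = P(triu(true(k), 1));

% Assumption: sample standard deviation (n-1 in the denominator)
StdOmega = std(pij);
disp(StdOmega);
```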

`rcMax` — pair with largest overlap. Vector.

Column vector of length 2 containing the indexes of the pair of components producing the highest overlap (largest off diagonal element of matrix OmegaMap).
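
The sketch below shows one way to locate such a pair on a hypothetical OmegaMap. It assumes the pair of interest is the one attaining the maximum of OmegaMap(i,j)+OmegaMap(j,i), and it sorts the two indexes in ascending order for readability; both choices are assumptions of this example.

```matlab
% Hypothetical 3-by-3 map of misclassification probabilities
OmegaMap = [1      0.001  0;
            0.002  1      0.02;
            0      0.03   1];
k = size(OmegaMap, 1);

% Symmetric matrix of pairwise overlaps, diagonal excluded from the search
P = OmegaMap + OmegaMap.';
P(1:k+1:end) = -Inf;

% Largest pairwise overlap and the position where it occurs
[MaxOmega, idx] = max(P(:));
[r, c] = ind2sub([k k], idx);
rcMax = sort([r; c]);
disp(MaxOmega);
disp(rcMax);
```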

References

Maitra, R. and Melnykov, V. (2010), Simulating data to study performance of finite mixture modeling and clustering algorithms, "The Journal of Computational and Graphical Statistics", Vol. 19, pp. 354-376. [to refer to this publication we will use "MM2010 JCGS"]

Melnykov, V., Chen, W.-C. and Maitra, R. (2012), MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms, "Journal of Statistical Software", Vol. 51, pp. 1-25.

Davies, R. (1980), The distribution of a linear combination of chi-square random variables, "Applied Statistics", Vol. 29, pp. 323-333.
