# overlap

overlap computes the exact overlap between the components of a mixture, given the parameters of the mixture (mixing proportions, centroids and covariance matrices).

## Syntax

• OmegaMap=overlap(k, v, Pi, Mu, S)
• OmegaMap=overlap(k, v, Pi, Mu, S, tol)
• OmegaMap=overlap(k, v, Pi, Mu, S, tol, lim)
• [OmegaMap, BarOmega]=overlap(___)
• [OmegaMap, BarOmega, MaxOmega]=overlap(___)
• [OmegaMap, BarOmega, MaxOmega, StdOmega]=overlap(___)
• [OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax]=overlap(___)

## Description

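OmegaMap = overlap(k, v, Pi, Mu, S) returns the k-by-k map of misclassification probabilities for a mixture with k components in v dimensions, mixing proportions Pi, centroids Mu and covariance matrices S. See the example "Finding exact overlap for the Iris data."

OmegaMap = overlap(k, v, Pi, Mu, S, tol) also specifies the tolerance tol. See "Example of use of option tol."

OmegaMap = overlap(k, v, Pi, Mu, S, tol, lim) also specifies lim, the maximum number of integration terms. See "Example of use of option lim."

[OmegaMap, BarOmega] = overlap(___) also returns the average overlap BarOmega. See "Example of use of options lim and tol together."

[OmegaMap, BarOmega, MaxOmega] = overlap(___) also returns the maximum overlap MaxOmega. See "Display BarOmega and MaxOmega."

[OmegaMap, BarOmega, MaxOmega, StdOmega] = overlap(___) also returns the standard deviation of overlap StdOmega. See "Display StdOmega."

[OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax] = overlap(___) also returns rcMax, the pair of components with the largest overlap. See "Display rcMax."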

## Examples

### Finding exact overlap for the Iris data.

load fisheriris;
Y         = meas;
[Mu , SS] = grpstats(Y,species,{'mean',@cov});
S1        = permute(SS,[3,2,1]); % S1 should be equal to S
S = zeros(4,4,3);
S(:,:,1) = cov(Y(1:50,:));
S(:,:,2) = cov(Y(51:100,:));
S(:,:,3) = cov(Y(101:150,:));
pigen=ones(3,1)/3;
k=3;
p=4;
[OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax]=overlap(k,p,pigen,Mu,S)
disp('OmegaMap= k-by-k matrix which contains misclassification probabilities')
disp(OmegaMap);
disp('Average overlap')
disp(BarOmega)
disp('Maximum overlap')
disp(MaxOmega)
disp('Groups with maximum overlap')
disp(rcMax)
OmegaMap =

1.0000    0.0000         0
0.0000    1.0000    0.0230
0    0.0263    1.0000

BarOmega =

0.0164

MaxOmega =

0.0493

StdOmega =

0.0285

rcMax =

2
3

OmegaMap= k-by-k matrix which contains misclassification probabilities
1.0000    0.0000         0
0.0000    1.0000    0.0230
0    0.0263    1.0000

Average overlap
0.0164

Maximum overlap
0.0493

Groups with maximum overlap
2
3



### Example of use of option tol.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap]=overlap(k,p,pigen,Mu,S,1e-05)

### Example of use of option lim.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap]=overlap(k,p,pigen,Mu,S,[],10000)

### Example of use of options lim and tol together.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap]=overlap(k,p,pigen,Mu,S,1e-08,100000)

### Display BarOmega and MaxOmega.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap, BarOmega, MaxOmega]=overlap(k,p,pigen,Mu,S)

### Display StdOmega.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap, BarOmega, MaxOmega, StdOmega]=overlap(k,p,pigen,Mu,S)

### Display rcMax.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax]=overlap(k,p,pigen,Mu,S)

## Input Arguments

### k — number of components (groups). Integer.

Scalar specifying the number of groups.

Data Types: int16|int32|int64|single|double

### v — dimensionality (number of variables). Integer.

Scalar specifying the number of variables of the data matrix.

Data Types: int16|int32|int64|single|double

### Pi — Mixing proportions. Vector.

Vector of size k containing mixing proportions. The sum of the elements of Pi is 1.

Data Types: single| double

### Mu — centroids. Matrix.

Matrix of size k-by-v containing (in the rows) the centroids of the k groups.

Data Types: single| double

### S — Covariance matrices. 3D array.

3D array of size v-by-v-by-k containing covariance matrices of the k groups.

Data Types: single| double
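
As a rough illustration of the shapes expected for Pi, Mu and S, the following sketch builds them from a data matrix and a vector of group labels using only base MATLAB functions. The synthetic data and the names Y, g and Yj are assumptions made for this illustration; they are not part of overlap.

```matlab
% Hypothetical synthetic data: k groups of n observations in v dimensions
n = 30; v = 2; k = 3;
g = kron((1:k).', ones(n,1));            % group labels 1..k, one per row of Y
Y = randn(n*k, v) + 3*repmat(g, 1, v);   % shift each group so the centroids differ

Mu = zeros(k, v);      % k-by-v matrix, one centroid per row
S  = zeros(v, v, k);   % v-by-v-by-k array, one covariance matrix per page
for j = 1:k
    Yj       = Y(g == j, :);   % observations belonging to group j
    Mu(j,:)  = mean(Yj, 1);
    S(:,:,j) = cov(Yj);
end

% Empirical mixing proportions: a vector of size k whose elements sum to 1
Pi = accumarray(g, 1) / numel(g);
```

With these variables, the documented call would be overlap(k, v, Pi, Mu, S).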

### tol — tolerance. Scalar.

Default is 1e-06.

The optional parameters tol and lim are passed to the function ncx2mixtcdf, which computes the cdf of a linear combination of non-central chi-square random variables. This cdf gives the probability of misclassification.

Example: 0.0001

Data Types: double

### lim — maximum number of integration terms. Scalar.

Default is 1000000.

The optional parameters tol and lim are passed to the function ncx2mixtcdf, which computes the cdf of a linear combination of non-central chi-square random variables. This cdf gives the probability of misclassification.

Example: 1000

Data Types: double

## Output Arguments

### OmegaMap — Map of misclassification probabilities. Matrix.

k-by-k matrix containing map of misclassification probabilities.

More precisely, for $i \neq j$ with $i, j = 1, 2, \ldots, k$, $\mathrm{OmegaMap}(i,j) = w_{j|i}$ is the probability that X coming from the i-th component (group) is classified to the j-th component.

The probability of overlapping between groups i and j (called pij) is given by $p_{ij} = p_{ji} = w_{j|i} + w_{i|j}, \qquad i, j = 1, 2, \ldots, k$.
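
The relation between OmegaMap and the pairwise overlaps can be sketched as follows. The matrix below copies the rounded values printed in the Iris example, so entries shown there as 0.0000 are entered as 0; the names P, upper, rowIdx and colIdx are assumptions introduced for illustration.

```matlab
% OmegaMap as printed (rounded to 4 decimals) in the Iris example
OmegaMap = [1      0      0;
            0      1      0.0230;
            0      0.0263 1];
k = size(OmegaMap, 1);

% Symmetric matrix of pairwise overlaps: P(i,j) = w_{j|i} + w_{i|j}
P = OmegaMap + OmegaMap.';
P(1:k+1:end) = NaN;            % the diagonal is not an overlap, so mask it

% One value per unordered pair (i<j), that is 0.5*k*(k-1) values
upper = triu(true(k), 1);
[rowIdx, colIdx] = find(upper);
pij = P(upper);

% Each row shows: i, j, pij
disp([rowIdx colIdx pij])
```

For k=3 this lists the pairs (1,2), (1,3) and (2,3); only the last one has a non-negligible overlap in the Iris data.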

### BarOmega — Average overlap. Scalar.

Scalar associated with average overlap. BarOmega is computed as (sum(sum(OmegaMap))-k)/(0.5*k*(k-1)).
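
As a quick check of this formula, the sketch below applies it to the rounded OmegaMap printed in the Iris example. Subtracting k removes the ones on the diagonal, and 0.5*k*(k-1) is the number of pairs of groups.

```matlab
% OmegaMap as printed (rounded to 4 decimals) in the Iris example
OmegaMap = [1      0      0;
            0      1      0.0230;
            0      0.0263 1];
k = size(OmegaMap, 1);

% Sum of the off-diagonal entries divided by the number of pairs
BarOmega = (sum(sum(OmegaMap)) - k) / (0.5*k*(k-1));
disp(BarOmega)
```

The result is 0.0493/3, which is approximately 0.0164 and agrees with the BarOmega shown in the Iris example.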

### MaxOmega — Maximum overlap. Scalar.

Scalar associated with maximum overlap. MaxOmega is the maximum of OmegaMap(i,j)+OmegaMap(j,i) over all pairs with i ~= j, where i, j = 1, 2, ..., k.

### StdOmega — Standard deviation of overlap. Scalar.

Scalar associated with the standard deviation of overlap, that is, the standard deviation of the 0.5*k*(k-1) probabilities of overlapping pij.
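
A minimal sketch of this computation, using the rounded OmegaMap printed in the Iris example, is given below. It assumes that the standard deviation is normalized by the number of pairs minus one (the default of std); this assumption is consistent with the StdOmega printed in that example but is not stated on this page.

```matlab
% OmegaMap as printed (rounded to 4 decimals) in the Iris example
OmegaMap = [1      0      0;
            0      1      0.0230;
            0      0.0263 1];
k = size(OmegaMap, 1);

% Pairwise overlaps pij for the 0.5*k*(k-1) pairs with i<j
P   = OmegaMap + OmegaMap.';
pij = P(triu(true(k), 1));

% Assumed normalization: std divides by (number of pairs - 1)
StdOmega = std(pij);
disp(StdOmega)
```

With these rounded inputs the result is approximately 0.0285.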

### rcMax — Pair with largest overlap. Vector.

Column vector of length 2 containing the indexes associated with the pair of components producing the highest overlap (largest off-diagonal element of matrix OmegaMap).
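
One possible way to recover MaxOmega and rcMax from OmegaMap is sketched below, again with the rounded values printed in the Iris example. The sketch assumes that the pair is selected by the largest sum OmegaMap(i,j)+OmegaMap(j,i); the function itself may select it differently. The names P, linIdx, r and c are introduced for illustration only.

```matlab
% OmegaMap as printed (rounded to 4 decimals) in the Iris example
OmegaMap = [1      0      0;
            0      1      0.0230;
            0      0.0263 1];
k = size(OmegaMap, 1);

% Pairwise overlaps; entries outside the strict upper triangle are excluded
P = OmegaMap + OmegaMap.';
P(~triu(true(k), 1)) = -Inf;

% Largest pairwise overlap and the pair of groups (r < c) that produces it
[MaxOmega, linIdx] = max(P(:));
[r, c] = ind2sub([k k], linIdx);
rcMax  = [r; c];

disp(MaxOmega)
disp(rcMax)
```

For these inputs the pair is groups 2 and 3, with a maximum overlap of about 0.0493, as in the Iris example.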

## References

Maitra, R. and Melnykov, V. (2010), Simulating data to study performance of finite mixture modeling and clustering algorithms, "The Journal of Computational and Graphical Statistics", Vol. 19, pp. 354-376. [to refer to this publication we will use "MM2010 JCGS"]

Melnykov, V., Chen, W.-C. and Maitra, R. (2012), MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms, "Journal of Statistical Software", Vol. 51, pp. 1-25.

Davies, R. (1980), The distribution of a linear combination of chi-square random variables, "Applied Statistics", Vol. 29, pp. 323-333.