# overlap

overlap computes the exact overlap between the components of a mixture, given the mixture parameters.

## Syntax

• OmegaMap=overlap(k, v, Pi, Mu, S)
• OmegaMap=overlap(k, v, Pi, Mu, S, tol)
• OmegaMap=overlap(k, v, Pi, Mu, S, tol, lim)
• [OmegaMap, BarOmega]=overlap(___)
• [OmegaMap, BarOmega, MaxOmega]=overlap(___)
• [OmegaMap, BarOmega, MaxOmega, StdOmega]=overlap(___)
• [OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax]=overlap(___)

## Description

 OmegaMap =overlap(k, v, Pi, Mu, S) Finding exact overlap for the Iris data.

 OmegaMap =overlap(k, v, Pi, Mu, S, tol) Example of use of option tol.

 OmegaMap =overlap(k, v, Pi, Mu, S, tol, lim) Example of use of option lim.

 [OmegaMap, BarOmega] =overlap(___) Example of use of options lim and tol together.

 [OmegaMap, BarOmega, MaxOmega] =overlap(___) Display BarOmega and MaxOmega.

 [OmegaMap, BarOmega, MaxOmega, StdOmega] =overlap(___) Display StdOmega.

 [OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax] =overlap(___) Display rcMax.

## Examples

### Finding exact overlap for the Iris data.

load fisheriris;
Y         = meas;
[Mu , SS] = grpstats(Y,species,{'mean',@cov});
S1        = permute(SS,[3,2,1]); % S1 should be equal to S
S = zeros(4,4,3);
S(:,:,1) = cov(Y(1:50,:));
S(:,:,2) = cov(Y(51:100,:));
S(:,:,3) = cov(Y(101:150,:));
pigen=ones(3,1)/3;
k=3;
p=4;
[OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax]=overlap(k,p,pigen,Mu,S)
disp('OmegaMap= k-by-k matrix which contains misclassification probabilities')
disp(OmegaMap);
disp('Average overlap')
disp(BarOmega)
disp('Maximum overlap')
disp(MaxOmega)
disp('Groups with maximum overlap')
disp(rcMax)
OmegaMap =

1.0000    0.0000         0
0.0000    1.0000    0.0230
0    0.0263    1.0000

BarOmega =

0.0164

MaxOmega =

0.0493

StdOmega =

0.0285

rcMax =

2
3

OmegaMap= k-by-k matrix which contains misclassification probabilities
1.0000    0.0000         0
0.0000    1.0000    0.0230
0    0.0263    1.0000

Average overlap
0.0164

Maximum overlap
0.0493

Groups with maximum overlap
2
3



### Example of use of option tol.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap]=overlap(k,p,pigen,Mu,S,1e-05)

### Example of use of option lim.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap]=overlap(k,p,pigen,Mu,S,[],10000)

### Example of use of options lim and tol together.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap]=overlap(k,p,pigen,Mu,S,1e-08,100000)

### Display BarOmega and MaxOmega.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap, BarOmega, MaxOmega]=overlap(k,p,pigen,Mu,S)

### Display StdOmega.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap, BarOmega, MaxOmega, StdOmega]=overlap(k,p,pigen,Mu,S)

### Display rcMax.

load fisheriris;
Y=meas;
pigen=ones(3,1)/3;
k=3;
p=4;
Mu=grpstats(Y,species);
S=zeros(4,4,3);
S(:,:,1)=cov(Y(1:50,:));
S(:,:,2)=cov(Y(51:100,:));
S(:,:,3)=cov(Y(101:150,:));
[OmegaMap, BarOmega, MaxOmega, StdOmega, rcMax]=overlap(k,p,pigen,Mu,S)

## Input Arguments

### k — number of components (groups). Integer.

Scalar giving the number of groups.

Data Types: int16|int32|int64|single|double

### v — dimensionality (number of variables). Integer.

Scalar giving the number of variables of the data matrix.

Data Types: int16|int32|int64|single|double

### Pi — Mixing proportions. Vector.

Vector of size k containing mixing proportions. The sum of the elements of Pi is 1.

Data Types: single| double

### Mu — centroids. Matrix.

Matrix of size k-by-v containing (in the rows) the centroids of the k groups.

Data Types: single| double

### S — Covariance matrices. 3D array.

3D array of size v-by-v-by-k containing covariance matrices of the k groups.

Data Types: single| double
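The size requirements on Pi, Mu and S stated above can be checked before calling overlap. The following is a minimal sketch on made-up inputs (the values of k, v, Mu and S are illustrative assumptions, not taken from this page); the call to overlap itself is left as a comment because it requires the toolbox that provides it.

```matlab
% Hypothetical toy mixture: 3 groups in 2 dimensions (illustrative values)
k  = 3;                          % number of components
v  = 2;                          % number of variables
Pi = ones(k,1)/k;                % equal mixing proportions
Mu = [0 0; 3 0; 0 3];            % one centroid per row, k-by-v
S  = repmat(eye(v), [1 1 k]);    % identity covariance per group, v-by-v-by-k

% Checks mirroring the documented input requirements
assert(abs(sum(Pi) - 1) < 1e-12, 'Pi must sum to 1');
assert(isequal(size(Mu), [k v]), 'Mu must be k-by-v');
assert(isequal(size(S), [v v k]), 'S must be v-by-v-by-k');

% With the inputs validated, the call would be:
% OmegaMap = overlap(k, v, Pi, Mu, S);
```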

### tol — tolerance. Scalar.

Default is 1e-06.

The optional parameters tol and lim are passed to the function ncx2mixtcdf, which computes the cdf of a linear combination of noncentral chi-square random variables. This cdf gives the probability of misclassification.

Example: 0.0001

Data Types: double

### lim — maximum number of integration terms. Scalar.

Default is 1000000.

The optional parameters tol and lim are passed to the function ncx2mixtcdf, which computes the cdf of a linear combination of noncentral chi-square random variables. This cdf gives the probability of misclassification.

Example: 1000

Data Types: double

## Output Arguments

### OmegaMap — map of misclassification probabilities. Matrix

k-by-k matrix containing map of misclassification probabilities.

More precisely, for $i \neq j$ with $i, j = 1, 2, \ldots, k$, $\mathrm{OmegaMap}(i,j) = w_{j|i}$ is the probability that X coming from the i-th component (group) is classified to the j-th component.

The probability of overlapping (called $p_{ij}$) between groups i and j is given by $p_{ij} = p_{ji} = w_{j|i} + w_{i|j}, \qquad i, j = 1, 2, \ldots, k$.
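As an illustration of the definition of the pairwise overlap, the sketch below builds the probabilities of overlapping from an OmegaMap matrix. The matrix entries are the rounded values displayed in the Iris example, so the results are only approximate; the variable names P and pij are introduced here for illustration.

```matlab
% OmegaMap copied (rounded to 4 decimals) from the Iris example output
OmegaMap = [1 0 0; 0 1 0.0230; 0 0.0263 1];
k = size(OmegaMap, 1);

% p_ij = w_{j|i} + w_{i|j}: add the matrix to its transpose
P = OmegaMap + OmegaMap';

% Keep one entry per unordered pair (i < j), in the order (1,2), (1,3), (2,3)
pij = P(triu(true(k), 1));
disp(pij')
```

For k=3 this gives 0.5*k*(k-1) = 3 pairwise values, the largest of which belongs to groups 2 and 3.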

### BarOmega — Average overlap. Scalar

Scalar associated with average overlap. BarOmega is computed as (sum(sum(OmegaMap))-k)/(0.5*k*(k-1)).
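The formula for BarOmega can be checked by hand against the Iris example. This is a small sketch using the rounded OmegaMap values displayed in that example, so the result agrees with the reported BarOmega only to the displayed precision.

```matlab
% OmegaMap copied (rounded to 4 decimals) from the Iris example output
OmegaMap = [1 0 0; 0 1 0.0230; 0 0.0263 1];
k = size(OmegaMap, 1);

% Subtract the k diagonal ones, then average over the 0.5*k*(k-1) pairs
BarOmega = (sum(sum(OmegaMap)) - k) / (0.5*k*(k-1));
fprintf('BarOmega = %.4f\n', BarOmega);   % prints BarOmega = 0.0164
```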

### MaxOmega — Maximum overlap. Scalar

Scalar associated with maximum overlap. MaxOmega is the maximum of OmegaMap(i,j)+OmegaMap(j,i) over all pairs i ~= j, with i, j = 1, 2, ..., k.

### StdOmega — Std of overlap. Scalar

Scalar associated with the standard deviation of overlap, that is, the standard deviation of the 0.5*k*(k-1) probabilities of overlapping pij.
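The sketch below recomputes MaxOmega and StdOmega from the rounded OmegaMap values of the Iris example. It assumes that the standard deviation uses MATLAB's default n-1 normalization; this is an inference from the fact that it reproduces the displayed 0.0285, not something this page states.

```matlab
% OmegaMap copied (rounded to 4 decimals) from the Iris example output
OmegaMap = [1 0 0; 0 1 0.0230; 0 0.0263 1];
k = size(OmegaMap, 1);

% Pairwise overlaps OmegaMap(i,j)+OmegaMap(j,i), one value per pair i < j
P   = OmegaMap + OmegaMap';
pij = P(triu(true(k), 1));

MaxOmega = max(pij);   % largest pairwise overlap
StdOmega = std(pij);   % assumption: default n-1 normalization

fprintf('MaxOmega = %.4f\n', MaxOmega);
fprintf('StdOmega = %.4f\n', StdOmega);
```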

### rcMax — pair with largest overlap. Vector

Column vector of length equal to 2 containing the indexes associated with the pair of components producing the highest overlap (largest off diagonal element of matrix OmegaMap).
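One way to locate the pair of components with the highest overlap is sketched below, again on the rounded OmegaMap values of the Iris example. The sketch assumes the pair is selected on the summed overlap OmegaMap(i,j)+OmegaMap(j,i) and reported with the smaller index first, which reproduces the rcMax displayed in that example; the selection rule used inside overlap may differ.

```matlab
% OmegaMap copied (rounded to 4 decimals) from the Iris example output
OmegaMap = [1 0 0; 0 1 0.0230; 0 0.0263 1];
k = size(OmegaMap, 1);

% Summed overlaps, keeping only the strictly upper triangle (i < j)
P = triu(OmegaMap + OmegaMap', 1);

% Position of the largest pairwise overlap, converted to (row, column)
[MaxOmega, idx] = max(P(:));
[r, c] = ind2sub([k k], idx);
rcMax = [r; c];
disp(rcMax)
```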

## References

Maitra, R. and Melnykov, V. (2010), Simulating data to study performance of finite mixture modeling and clustering algorithms, "The Journal of Computational and Graphical Statistics", Vol. 19, pp. 354-376. [to refer to this publication we will use "MM2010 JCGS"]

Melnykov, V., Chen, W.-C. and Maitra, R. (2012), MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms, "Journal of Statistical Software", Vol. 51, pp. 1-25.

Davies, R. (1980), The distribution of a linear combination of chi-square random variables, "Applied Statistics", Vol. 29, pp. 323-333.