# GowerIndex

GowerIndex computes the matrix of similarity indexes using the Gower metric.

## Syntax

• S=GowerIndex(Y)
• S=GowerIndex(Y,Name,Value)
• [S, Stable]=GowerIndex(___)

## Description

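This function computes the matrix of Gower similarity indexes.

 S = GowerIndex(Y) computes the matrix of Gower similarity indexes between the rows of Y using all default options.

 S = GowerIndex(Y, Name, Value) computes the matrix with additional options specified by one or more Name,Value pair arguments, for example the type of each variable through option l.

 [S, Stable] = GowerIndex(___) also returns the matrix of Gower similarity indexes in table format.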

## Examples


### Example of matrix of Gower indexes using all default options.

```matlab
Y=randn(10,4);
S=GowerIndex(Y);
```

### Example 1 of use of option l.

```matlab
p=3;
n=30;
Y=randi(20,n,p);
% Specify that all variables are quantitative.
l=ones(p,1);
S=GowerIndex(Y,'l',l);
```

### An example of GowerIndex called with two output arguments.

```matlab
X=[380  700 1 0 0 3
   500 1800 1 1 1 3
   310  480 0 0 0 2];
% The first two variables are quantitative, variables 3:5 are
% dichotomous (binary) and the last one is polytomous (categorical).
[S,Stable]=GowerIndex(X,'l',[1 1 2 2 2 3]);
```

## Related Examples


### Example 2 of use of option l.

```matlab
p=3;
n=50;
Y=randi(120,n,p);
% Specify that the first variable is quantitative and the other two are categorical.
l=[1 3 3];
S=GowerIndex(Y,'l',l);
```

### Example where input is a table with categorical variables containing numbers.

The categorical variables are coded with numbers.

```matlab
X=[380  700 1 0 0 3
   500 1800 1 1 1 3
   310  480 0 0 0 2];
NameRows={'AEG' 'BOSCH' 'IGNIS'};
NameCols={'Capacity' 'Price' 'Alarm' 'Dispenser' 'Display' 'Certificate'};
Xtable=array2table(X,'RowNames',NameRows,'VariableNames',NameCols);
S=GowerIndex(Xtable,'l',[1 1 2 2 2 3]);
```

### Example where input is a table with categorical variables containing labels.

```matlab
NameRows={'AEG' 'BOSCH' 'IGNIS'};
Capacity=[380; 500; 310];
Price=[700; 1800; 480];
Alarm={'Yes'; 'Yes'; 'No'};
Dispenser={'No'; 'Yes'; 'No'};
Display={'No'; 'Yes'; 'No'};
Certificate={'World';'World';'Europe'};
% For binary variables, the value 'yes' or 'Yes'
% is coded as 1 (presence).
Xtable=table(Capacity,Price,Alarm,Dispenser,Display,Certificate,'RowNames',NameRows);
[S,Stable]=GowerIndex(Xtable,'l',[1 1 2 2 2 3]);
disp('Matrix of Gower similarity indexes')
disp(Stable)
```

The output is:

```
Matrix of Gower similarity indexes
               AEG       BOSCH      IGNIS
             _______    _______    _______

    AEG            1    0.42251    0.36623
    BOSCH    0.42251          1          0
    IGNIS    0.36623          0          1
```
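The values in this table can be checked by hand. The sketch below uses base MATLAB only (it does not call GowerIndex) and transcribes Gower's formula, given at the end of this page, into a double loop over a numeric recoding of the same three appliances. The coding 1 = 'Yes', 0 = 'No' for the binary labels and 1 = 'World', 2 = 'Europe' for Certificate is our own choice for illustration, and the loop is not the FSDA implementation.

```matlab
% Numeric recoding of the appliance data (our own coding, see lead-in).
% Columns 1-2 quantitative, 3-5 binary, 6 categorical.
Z = [380  700 1 0 0 1
     500 1800 1 1 1 1
     310  480 0 0 0 2];
l = [1 1 2 2 2 3];
n = size(Z,1);
quant = find(l==1);   % quantitative columns (p1 of them)
bin   = find(l==2);   % binary columns (p2 of them)
categ = find(l==3);   % categorical columns (p3 of them)
R = max(Z(:,quant)) - min(Z(:,quant));   % ranges R_h of the quantitative variables
S = ones(n);
for i = 1:n
    for j = 1:n
        % Quantitative part: sum of 1 - |z_ih - z_jh| / R_h
        q = sum(1 - abs(Z(i,quant) - Z(j,quant)) ./ R);
        % Binary part: positive matches a and negative matches d
        a = sum(Z(i,bin) == 1 & Z(j,bin) == 1);
        d = sum(Z(i,bin) == 0 & Z(j,bin) == 0);
        % Categorical part: number of matching categories (alpha)
        nmatch = sum(Z(i,categ) == Z(j,categ));
        % Negative binary matches are dropped from the denominator
        S(i,j) = (q + a + nmatch) / (numel(quant) + numel(bin) - d + numel(categ));
    end
end
disp(S)
```

For instance, AEG and BOSCH agree on Alarm (a = 1) and on Certificate (alpha = 1) and have no negative binary matches (d = 0), so the denominator is 2 + 3 + 1 = 6; AEG and IGNIS share two negative matches (d = 2), which lowers the denominator to 4.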



## Input Arguments

### Y — Input data. 2D array or MATLAB table.

n x p data matrix; n observations and p variables. Rows of Y represent observations, and columns represent variables.

Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

Data Types: single|double
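The exclusion of rows with missing or infinite values can be illustrated with a few lines of base MATLAB on a small, made-up numeric matrix. This only shows which rows would be kept under the rule stated above; how GowerIndex reports the excluded rows in S is not described here, so the sketch should not be read as its exact behaviour.

```matlab
% Made-up numeric data: row 2 contains a NaN, row 3 contains an Inf.
Y = [1.2  3.4  0
     NaN  2.0  1
     0.7  Inf  1
     2.5  1.1  0];
keep = all(isfinite(Y), 2);   % true only for rows whose entries are all finite
Yclean = Y(keep, :);          % rows that would enter the computations
disp(find(keep)')             % indexes of the retained rows
```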

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example:  'l',[3 3 1] 

### l — Type of variable. Vector.

Vector of length p which specifies the type of variable for each column of the input matrix Y.

$l(j)=1$ => $j$-th variable assumes orderable values (quantitative variable).

$l(j)=2$ => $j$-th variable is binary (just values 0 and 1).

$l(j)=3$ => $j$-th variable assumes categorical (unorderable) values.

$j =1, 2, \ldots, p$.

If l is omitted or empty, the routine tries to detect the type of each variable automatically. For example, a variable that assumes just the values 0 and 1 is classified as binary. A column that assumes just integer values (without decimal points) and has no more than 20 unique elements is classified as categorical; otherwise, the variable is classified as quantitative.

Example:  'l',[3 3 1] 

Data Types: double
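The automatic detection rule for l can be sketched in base MATLAB as below. This is an illustration of the rule as described in the text, not the FSDA code: it assumes a numeric input matrix and checks the binary case before the categorical one, which is our reading of the description (a 0/1 column is also an integer column with fewer than 20 levels). The data matrix is made up.

```matlab
% Made-up data: two 0/1 columns, one integer column, one decimal column.
Y = [ 0  1   3  2.5
      1  0   7  0.4
      1  1   3  1.9
      0  1  12  3.3];
p = size(Y,2);
l = zeros(1,p);
for j = 1:p
    x = Y(:,j);
    if all(x == 0 | x == 1)
        l(j) = 2;   % only values 0 and 1: binary
    elseif all(x == round(x)) && numel(unique(x)) <= 20
        l(j) = 3;   % integers with at most 20 unique values: categorical
    else
        l(j) = 1;   % anything else: quantitative
    end
end
disp(l)
```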

## Output Arguments

### S — Matrix with Gower similarity coefficients. n-by-n symmetric matrix

n-by-n symmetric matrix whose (i,j) entry contains the Gower similarity index between row i and row j of the input Y.

### Stable — Matrix with Gower similarity coefficients in table format. n-by-n table

n-by-n table whose (i,j) entry contains the Gower similarity index between row i and row j of the input Y.

## More About

A very popular metric for mixtures of quantitative, multistate categorical and binary variables is based on Gower's general similarity coefficient (Gower, 1971), which unifies Jaccard's coefficient (binary variables), the simple matching coefficient (multistate categorical variables) and the normalized city block distance (quantitative variables).

More specifically, given two $p$-dimensional vectors $z_{i}$ and $z_{j}$, Gower's similarity coefficient is defined as

$s_{ij}=\frac{\sum_{h=1}^{p_{1}}\left(1-|z_{ih}-z_{jh}|/R_{h}\right)+a+\alpha} {p_{1}+(p_{2}-d)+p_{3}}, \quad 0\leq s_{ij}\leq 1,$

where $p=p_{1}+p_{2}+p_{3}$, $p_{1}$ is the number of continuous variables, $a$ and $d$ are the number of positive and negative matches, respectively, for the $p_{2}$ binary variables, $\alpha$ is the number of matches for the $p_{3}$ multi-state categorical variables, and $R_{h}$ is the range of the $h$-th continuous variable.
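Setting two of the three groups of variables to zero in the formula above shows how it reduces to the coefficients it unifies. In this short derivation, $b$ and $c$ denote the two kinds of binary mismatches (1-0 and 0-1), notation not used elsewhere on this page, so that $p_{2}=a+b+c+d$:

$$
\begin{aligned}
p_{1}=p_{3}=0:&\quad s_{ij}=\frac{a}{p_{2}-d}=\frac{a}{a+b+c} &&\text{(Jaccard's coefficient)}\\
p_{1}=p_{2}=0:&\quad s_{ij}=\frac{\alpha}{p_{3}} &&\text{(simple matching coefficient)}\\
p_{2}=p_{3}=0:&\quad s_{ij}=1-\frac{1}{p_{1}}\sum_{h=1}^{p_{1}}\frac{|z_{ih}-z_{jh}|}{R_{h}} &&\text{(one minus the normalized city block distance)}
\end{aligned}
$$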

## References

Gower, J. C. (1971), "A general coefficient of similarity and some of its properties", Biometrics, pp. 857-871.

Grané, A., and Romera, R. (2018), "On Visualizing Mixed-Type Data: A Joint Metric Approach to Profile Construction and Outlier Detection", Sociological Methods & Research, Vol. 47, pp. 207-239.

## Acknowledgements

This function has been written jointly with Professor Aurea Grané, Universidad Carlos III de Madrid, Statistics Department.