GowerIndex

GowerIndex computes the matrix of similarity indexes using the Gower metric

Syntax

S=GowerIndex(Y)
S=GowerIndex(Y,Name,Value)
[S, Stable]=GowerIndex(___)

Description

This function computes the matrix of Gower similarity indexes.

S = GowerIndex(Y) returns the matrix of Gower similarity indexes using all default options.

S = GowerIndex(Y, Name, Value) specifies additional options using name-value pair arguments, for example the type of each variable through option l.

[S, Stable] = GowerIndex(___) also returns the matrix of similarity indexes in table format.

Examples

Example of matrix of Gower indexes using all default options.

Y=randn(10,4);
S=GowerIndex(Y);

Example 1 of use of option l.

p=3;
n=30;
Y=randi(20,n,p);
% Specify that all variables are quantitative.
l=ones(p,1);
S=GowerIndex(Y,'l',l);

An example of GowerIndex called with two output arguments.

X=[380   700  1  0  0  3
   500  1800  1  1  1  3
   310   480  0  0  0  2];
% The first two variables are quantitative, variables 3:5 are
% dichotomous and the last one is polytomous.
[S,Stable]=GowerIndex(X,'l',[1 1 2 2 2 3]);

Related Examples

Example 2 of use of option l.

p=3;
n=50;
Y=randi(120,n,p);
% Specify that first variable is quantitative and the other 2 are categorical.
l=[1 3 3];
S=GowerIndex(Y,'l',l);

Example where input is a table with categorical variables containing numbers.

For the categorical variables, numbers are supplied.

X=[380  700      1  0  0 3
500  1800  1  1  1 3
310  480      0  0  0 2];
NameRows={'AEG' 'BOSCH' 'IGNIS'};
NameCols={'Capacity' 'Price' 'Alarm' 'Dispenser' 'Display' 'Certificate'}; 
Xtable=array2table(X,'RowNames',NameRows,'VariableNames',NameCols);
S=GowerIndex(Xtable,'l',[ 1 1 2 2 2 3]);

Example where input is a table with categorical variables containing labels.

NameRows={'AEG' 'BOSCH' 'IGNIS'};
Capacity=[380; 500; 310];
Price=[700; 1800; 480];
Alarm={'Yes'; 'Yes'; 'No'};
Dispenser={'No'; 'Yes'; 'No'};
Display={'No'; 'Yes'; 'No'};
Certificate={'World';'World';'Europe'};
% For binary variables supplied as labels, the value 'yes' or 'Yes'
% is coded as 1 (presence).
Xtable=table(Capacity,Price,Alarm,Dispenser,Display,Certificate,'RowNames',NameRows);
[S,Stable]=GowerIndex(Xtable,'l',[ 1 1 2 2 2 3]);
disp('Matrix of Gower similarity indexes')
disp(Stable)

Matrix of Gower similarity indexes
               AEG       BOSCH      IGNIS 
             _______    _______    _______

    AEG            1    0.42251    0.36623
    BOSCH    0.42251          1          0
    IGNIS    0.36623          0          1

Input Arguments

`Y` — Input data. 2D array or MATLAB table.

n x p data matrix; n observations and p variables. Rows of Y represent observations, and columns represent variables.

Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

Data Types: single|double
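To see in advance which observations would be affected by the exclusion rule above, the rows containing NaN or Inf can be identified before calling the function. This is only a minimal sketch of that check, not the code used inside GowerIndex; the names `keep` and `Yclean` and the toy matrix are illustrative assumptions.

```matlab
% Toy data (assumed for illustration): rows 2 and 3 contain NaN or Inf
Y = [1   2
     NaN 3
     4   Inf
     5   6];

% A row is kept only if every entry is finite (neither NaN nor +/-Inf)
keep = all(isfinite(Y), 2);

% Rows that would take part in the computations under the rule above
Yclean = Y(keep, :);
disp(Yclean)
```

Running the check beforehand makes it easy to map the rows of the similarity matrix back to the original observations.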

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'l',[3 3 1]

`l` — Type of variable. Vector.

Vector of length p which specifies the type of variable for each column of the input matrix Y.

$l(j)=1$ => $j$ -th variable assumes orderable values (quantitative variable).

$l(j)=2$ => $j$ -th variable is binary (just values 0 and 1).

$l(j)=3$ => $j$ -th variable assumes categorical (unorderable) values.

$j =1, 2, \ldots, p$ .

If l is omitted or is empty the routine automatically tries to detect the type of variable. For example, if a variable assumes just values 0 and 1 the corresponding variable is classified as binary. If a column of the input dataset assumes just integer values (without decimal points) and the number of unique elements is not greater than 20 the corresponding variable is classified as categorical, else the variable is classified as quantitative.

Example: 'l',[3 3 1]

Data Types: double
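The automatic detection rule described for option l can be sketched as follows. This is a hedged illustration of the rule as stated in the text (0/1 columns are binary, integer columns with at most 20 distinct values are categorical, everything else is quantitative); it is not the toolbox's internal code, and the variable `lGuess` and the toy matrix are assumptions made for the example.

```matlab
% Toy data (assumed): a 0/1 column, a non-integer column, an integer column
Y = [0 1.5 3
     1 2.0 7
     0 3.7 3
     1 4.1 9];

p = size(Y, 2);
lGuess = zeros(1, p);             % hypothetical name for the detected types

for j = 1:p
    col = Y(:, j);
    u = unique(col);
    if all(ismember(u, [0 1]))
        lGuess(j) = 2;            % only 0 and 1: binary
    elseif all(col == round(col)) && numel(u) <= 20
        lGuess(j) = 3;            % integers, at most 20 distinct values: categorical
    else
        lGuess(j) = 1;            % otherwise: quantitative
    end
end

disp(lGuess)
```

Under this rule an integer-valued measurement with few distinct values, such as `randi(20,n,p)`, would be treated as categorical, which is why passing l explicitly is useful when such a column is really quantitative.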

Output Arguments

`S` — Matrix with Gower similarity coefficients. n-by-n symmetric matrix.

n-by-n matrix whose (i,j)-th entry contains the Gower similarity index between row i and row j of input matrix Y.

`Stable` — Matrix with Gower similarity coefficients in table format. n-by-n table.

n-by-n table whose (i,j)-th entry contains the Gower similarity index between row i and row j of input matrix Y.

More About

Additional Details

A very popular metric for mixtures of quantitative, multistate categorical, and binary variables is the one based on Gower's general similarity coefficient (Gower, 1971) that substantially unifies Jaccard's coefficient (binary variables), the simple matching coefficient (multistate categorical variables) and normalized city block distance (quantitative variables).

More specifically, given two $p$ -dimensional vectors $z_{i}$ and $z_{j}$ , Gower's similarity coefficient is defined as

$s_{ij}=\frac{\sum_{h=1}^{p_{1}}\left(1-|z_{ih}-z_{jh}|/R_{h}\right)+a+\alpha} {p_{1}+(p_{2}-d)+p_{3}}, \quad 0\leq s_{ij}\leq 1,$

where $p=p_{1}+p_{2}+p_{3}$ , $p_{1}$ is the number of continuous variables, $a$ and $d$ are the number of positive and negative matches, respectively, for the $p_{2}$ binary variables, $\alpha$ is the number of matches for the $p_{3}$ multi-state categorical variables, and $R_{h}$ is the range of the $h$ -th continuous variable.
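The definition above can be checked by hand on the appliance data used in the examples. The script below is a minimal sketch that evaluates the formula for a single pair of rows; it is not the implementation inside GowerIndex, and all variable names (`sQuant`, `a`, `d`, `alpha`, `sij`) are chosen here for illustration. It assumes every quantitative column has a nonzero range and that the denominator is positive.

```matlab
% Appliance data: 2 quantitative, 3 binary and 1 categorical variable
X = [380   700  1  0  0  3
     500  1800  1  1  1  3
     310   480  0  0  0  2];
l = [1 1 2 2 2 3];
i = 1; j = 2;                     % compare row 1 (AEG) with row 2 (BOSCH)

zi = X(i, :);
zj = X(j, :);
quant = (l == 1);
bin   = (l == 2);
categ = (l == 3);

% Quantitative part: sum of 1 - |z_ih - z_jh| / R_h, R_h = range of column h
R = max(X(:, quant), [], 1) - min(X(:, quant), [], 1);
sQuant = sum(1 - abs(zi(quant) - zj(quant)) ./ R);

% Binary part: a = positive matches (1,1), d = negative matches (0,0)
a = sum(zi(bin) == 1 & zj(bin) == 1);
d = sum(zi(bin) == 0 & zj(bin) == 0);

% Categorical part: alpha = number of variables with equal categories
alpha = sum(zi(categ) == zj(categ));

% Gower similarity: negative matches are removed from the denominator
sij = (sQuant + a + alpha) / (sum(quant) + (sum(bin) - d) + sum(categ));
fprintf('s(%d,%d) = %.5f\n', i, j, sij);
```

For this pair the quantitative terms are $1-120/190$ and $1-1100/1320$, with $a=1$, $d=0$ and $\alpha=1$, so the result should agree with the AEG/BOSCH entry 0.42251 shown in the table example. Setting `j = 3` gives the AEG/IGNIS entry, where $d=2$ reduces the denominator from 6 to 4.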

References

Gower, J. C. (1971), "A general coefficient of similarity and some of its properties", Biometrics, Vol. 27, pp. 857-871.

Grané, A., and Romera, R. (2018), "On Visualizing Mixed-Type Data: A Joint Metric Approach to Profile Construction and Outlier Detection", Sociological Methods & Research, Vol. 47, pp. 207-239.

Acknowledgements

This function has been written jointly with Professor Aurea Grané, Universidad Carlos III de Madrid, Statistics Department.
