GowerIndex

GowerIndex computes the matrix of similarity indexes using the Gower metric

Syntax

Description

This function computes the matrix of Gower similarity indexes between the rows of the input data.

S = GowerIndex(Y) computes the matrix of Gower indexes using all default options.

S = GowerIndex(Y, Name, Value) specifies additional options as Name, Value pairs, for example option l.

[S, Stable] = GowerIndex(___) also returns the matrix of Gower indexes in table format.

Examples

  • Example 1 of use of option l.
  • p=3;
    n=30;
    Y=randi(20,n,p);
    % Specify that all variables are quantitative.
    l=ones(p,1);
    S=GowerIndex(Y,'l',l);

  • An example of GowerIndex called with two output arguments.
  • X=[380	700	    1	0	0  3 
    500	1800	1	1	1  3
    310	480	    0	0	0  2];
    % The first two variables are quantitative, variables 3:5 are
    % dichotomous and the last one is polytomous.
    [S,Stable]=GowerIndex(X,'l',[ 1 1 2 2 2 3]);

    Related Examples

  • Example 2 of use of option l.
  • p=3;
    n=50;
    Y=randi(120,n,p);
    % Specify that first variable is quantitative and the other 2 are categorical.
    l=[1 3 3];
    S=GowerIndex(Y,'l',l);

  • Example where input is a table with categorical variables containing numbers.
  • Numbers are supplied for the categorical variables.

    X=[380	700	    1	0	0 3
    500	1800	1	1	1 3
    310	480	    0	0	0 2];
    NameRows={'AEG' 'BOSCH' 'IGNIS'};
    NameCols={'Capacity' 'Price' 'Alarm' 'Dispenser' 'Display' 'Certificate'}; 
    Xtable=array2table(X,'RowNames',NameRows,'VariableNames',NameCols);
    S=GowerIndex(Xtable,'l',[ 1 1 2 2 2 3]);

  • Example where input is a table with categorical variables containing labels.
  • NameRows={'AEG' 'BOSCH' 'IGNIS'};
    Capacity=[380; 500; 310];
    Price=[700; 1800; 480];
    Alarm={'Yes'; 'Yes'; 'No'};
    Dispenser={'No'; 'Yes'; 'No'};
    Display={'No'; 'Yes'; 'No'};
    Certificate={'World';'World';'Europe'};
    % For a binary variable, the value 'yes' or 'Yes' is coded
    % as 1 (presence).
    Xtable=table(Capacity,Price,Alarm,Dispenser,Display,Certificate,'RowNames',NameRows);
    [S,Stable]=GowerIndex(Xtable,'l',[ 1 1 2 2 2 3]);
    disp('Matrix of Gower similarity indexes')
    disp(Stable)
    Matrix of Gower similarity indexes
                   AEG       BOSCH      IGNIS 
                 _______    _______    _______
    
        AEG            1    0.42251    0.36623
        BOSCH    0.42251          1          0
        IGNIS    0.36623          0          1
    
    

    Input Arguments

    Y — Input data. 2D array or MATLAB table.

    n x p data matrix; n observations and p variables. Rows of Y represent observations, and columns represent variables.

    Missing values (NaN) and infinite values (Inf) are allowed: observations (rows) containing missing or infinite values are automatically excluded from the computations.

    Data Types: single|double
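
The row exclusion described above can be pictured with a minimal sketch in base MATLAB. It only illustrates the stated behaviour (dropping every row that contains a NaN or an Inf) and is not taken from the GowerIndex source; the variable names keep and Yclean are hypothetical.

```matlab
% Toy data: row 2 contains a NaN and row 3 contains an Inf.
Y = [ 1   2   3
     NaN  5   6
      7  Inf  9
     10  11  12];

% keep(i) is true only if every entry of row i is finite.
keep = all(isfinite(Y), 2);

% Rows with missing or infinite values are dropped before any computation.
Yclean = Y(keep, :);
disp(Yclean)
```

With this toy matrix only the first and last rows survive, so a similarity matrix computed on Yclean would be 2-by-2 rather than 4-by-4.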

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'l',[3 3 1]

    l — Type of variable. Vector.

    Vector of length p which specifies the type of variable for each column of the input matrix Y.

    $l(j)=1$ => $j$-th variable assumes orderable values (quantitative variable).

    $l(j)=2$ => $j$-th variable is binary (just values 0 and 1).

    $l(j)=3$ => $j$-th variable assumes categorical (unorderable) values.

    $j =1, 2, \ldots, p$.

    If l is omitted or empty, the routine tries to detect the type of each variable automatically. A variable that assumes only the values 0 and 1 is classified as binary. A column that contains only integer values and has no more than 20 unique elements is classified as categorical; any other column is classified as quantitative.

    Example: 'l',[3 3 1]

    Data Types: double
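
The detection rule described above can be sketched in a few lines of base MATLAB. This is an illustration of the stated rule under simple assumptions (numeric input without missing values), not the code used inside GowerIndex; the name lguess and the toy matrix are hypothetical.

```matlab
% Toy data with four columns:
%   1) non-integer values              -> expected quantitative (1)
%   2) only 0 and 1                    -> expected binary (2)
%   3) integers with 4 unique values   -> expected categorical (3)
%   4) integers with 30 unique values  -> expected quantitative (1)
k = (1:30)';
Y = [k + 0.5, mod(k, 2), mod(k, 4) + 2, k];

p = size(Y, 2);
lguess = zeros(1, p);
for j = 1:p
    col = Y(:, j);
    if all(col == 0 | col == 1)
        % Only the values 0 and 1: binary variable.
        lguess(j) = 2;
    elseif all(col == round(col)) && numel(unique(col)) <= 20
        % Integer values and at most 20 distinct values: categorical.
        lguess(j) = 3;
    else
        % Everything else is treated as quantitative.
        lguess(j) = 1;
    end
end
disp(lguess)
```

Because a column of small integer codes is classified as categorical under this rule, passing l explicitly is the safer choice when an integer-valued variable is in fact a count or a score.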

    Output Arguments

    S — Matrix with Gower similarity coefficients. n-by-n symmetric matrix.

    n-by-n matrix whose (i,j)-th entry contains the Gower similarity index between row i and row j of input matrix Y.

    Stable — Matrix with Gower similarity coefficients in table format. n-by-n table.

    n-by-n table whose (i,j)-th entry contains the Gower similarity index between row i and row j of input matrix Y.
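
To make the relation between the two outputs concrete, the sketch below builds a labelled table from a numeric similarity matrix with array2table. The values are copied from the output displayed in the labels example; how GowerIndex assembles Stable internally is not documented here, so the use of the observation labels as both row and variable names is an assumption based on that display, and Stab is a hypothetical name.

```matlab
% Similarity values copied from the displayed output of the labels example.
S = [1       0.42251 0.36623
     0.42251 1       0
     0.36623 0       1];
names = {'AEG' 'BOSCH' 'IGNIS'};

% Assumption: observation labels serve as both row and variable names.
Stab = array2table(S, 'RowNames', names, 'VariableNames', names);
disp(Stab)

% Entries of the table can be read by label instead of by position.
sAB = Stab{'AEG', 'BOSCH'};
```

The numeric matrix is convenient for further computation, while the table form is easier to read when the observations carry names.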

    More About

    Additional Details

    A popular metric for data that mix quantitative, multistate categorical, and binary variables is based on Gower's general similarity coefficient (Gower, 1971), which unifies Jaccard's coefficient (binary variables), the simple matching coefficient (multistate categorical variables) and the normalized city block distance (quantitative variables).

    More specifically, given two $p$-dimensional vectors $z_{i}$ and $z_{j}$, Gower's similarity coefficient is defined as

    \[ s_{ij}=\frac{\sum_{h=1}^{p_{1}}\left(1-|z_{ih}-z_{jh}|/R_{h}\right)+a+\alpha} {p_{1}+(p_{2}-d)+p_{3}}, \quad 0\leq s_{ij}\leq 1, \]

    where $p=p_{1}+p_{2}+p_{3}$, $p_{1}$ is the number of continuous variables, $a$ and $d$ are the numbers of positive and negative matches, respectively, for the $p_{2}$ binary variables, $\alpha$ is the number of matches for the $p_{3}$ multi-state categorical variables, and $R_{h}$ is the range of the $h$-th continuous variable.
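
The formula can be checked by hand on the three-row refrigerator data used in the examples. The following is a minimal sketch in base MATLAB that applies the definition term by term; it is not the GowerIndex implementation, it assumes every quantitative variable has a nonzero range, and the names Shand, quant, bin and categ are hypothetical.

```matlab
% Columns: Capacity, Price, Alarm, Dispenser, Display, Certificate.
X = [380  700 1 0 0 3
     500 1800 1 1 1 3
     310  480 0 0 0 2];
l = [1 1 2 2 2 3];   % 1 = quantitative, 2 = binary, 3 = categorical

quant = X(:, l == 1);
bin   = X(:, l == 2);
categ = X(:, l == 3);
p1 = size(quant, 2);
p2 = size(bin, 2);
p3 = size(categ, 2);

% Range R_h of each quantitative variable (assumed nonzero).
R = max(quant, [], 1) - min(quant, [], 1);

n = size(X, 1);
Shand = zeros(n);
for i = 1:n
    for j = 1:n
        % Quantitative part: sum over h of 1 - |z_ih - z_jh| / R_h.
        sq = sum(1 - abs(quant(i, :) - quant(j, :)) ./ R);
        % Binary part: a counts (1,1) matches, d counts (0,0) matches.
        a = sum(bin(i, :) == 1 & bin(j, :) == 1);
        d = sum(bin(i, :) == 0 & bin(j, :) == 0);
        % Categorical part: alpha counts equal categories.
        alpha = sum(categ(i, :) == categ(j, :));
        Shand(i, j) = (sq + a + alpha) / (p1 + (p2 - d) + p3);
    end
end
disp(round(Shand, 5))
```

For AEG and BOSCH, for instance, the quantitative terms are 70/190 and 220/1320, there is one positive match (Alarm), no negative match, and one categorical match, giving (0.53509 + 1 + 1)/6 = 0.42251, which agrees with the table displayed in the labels example. Negative matches enter only through the denominator, so two observations that share the absence of a feature are not counted as more similar because of it.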

    References

    Gower, J. C. (1971), "A general coefficient of similarity and some of its properties", Biometrics, Vol. 27, pp. 857-871.

    Grané, A., and Romera, R. (2018), "On Visualizing Mixed-Type Data: A Joint Metric Approach to Profile Construction and Outlier Detection", Sociological Methods & Research, Vol. 47, pp. 207-239.

    Acknowledgements

    This function has been written jointly with Professor Aurea Grané, Universidad Carlos III de Madrid, Statistics Department.

    This page has been automatically generated by our routine publishFS