corrOrdinal

corrOrdinal measures the strength of association between two ordered categorical variables.


Syntax

out=corrOrdinal(N)
out=corrOrdinal(N,Name,Value)

Description

corrOrdinal computes Goodman-Kruskal's $\gamma$, Kendall's $\tau_a$ and $\tau_b$, Stuart-Kendall's $\tau_c$ and Somers' $d_{y|x}$.

All these indexes measure the association between two ordered qualitative variables and range from -1 to 1. The sign of the coefficient indicates the direction of the relationship and its absolute value indicates its strength, with larger absolute values indicating stronger relationships: values close to -1 or +1 indicate a strong relationship between the two variables, while values close to 0 indicate little or no relationship.

More in detail, Goodman-Kruskal's $\gamma$ is a symmetric measure of association that ignores tied pairs.

Kendall's $\tau_a$ is a symmetric measure of association that does not take ties into account. A pair of observations is tied when both observations have the same value on at least one of the two variables.

Kendall's $\tau_b$ is a symmetric measure of association which takes ties into account. Although $\tau_b$ ranges from -1 to 1, the values -1 and +1 can be attained only in square tables.

$\tau_c$ (also called Stuart-Kendall $\tau_c$) is a symmetric measure of association which makes an adjustment for table size in addition to a correction for ties. Unlike $\tau_b$, $\tau_c$ can attain values equal or very close to -1 or +1 also in rectangular (non-square) tables.

Somers' $d$ is an asymmetric extension of $\tau_b$: it uses a correction only for pairs that are tied on the independent variable, which in this implementation is assumed to be on the rows of the contingency table.

Additional details about these indexes can be found in the "More About" section of this document.

out = corrOrdinal(N) computes the indexes with all the default options.

out = corrOrdinal(N,Name,Value) computes the indexes with additional options specified by one or more Name,Value pair arguments.

Examples


corrOrdinal with all the default options.

Rows of N indicate the results of a written test with levels 'Sufficient', 'Good' and 'Very good'; columns of N indicate the results of an oral test with the same levels 'Sufficient', 'Good' and 'Very good'.

N=[20    40    20;
10    45    45;
0     5    15];
out=corrOrdinal(N);
% Because the asymptotic 95 per cent confidence limits do not contain
% zero, this indicates a significant positive association between the
% written and the oral examination.

Test of H_0: independence between rows and columns
The standard errors are computed under H_0
              Coeff        se       zscore       pval   
             _______    ________    ______    __________

    gamma        0.5    0.098239    5.0896     3.588e-07
    taua     0.18342    0.047553    3.8571    0.00011474
    taub     0.30557    0.060038    5.0896     3.588e-07
    tauc     0.27375    0.053786    5.0896     3.588e-07
    dyx      0.31466    0.061823    5.0896     3.588e-07

-----------------------------------------
Indexes and 95% confidence limits
The standard error are computed under H_1
              Value     StandardError    ConflimL    ConflimU
             _______    _____________    ________    ________

    gamma        0.5        0.0876       0.32831     0.67169 
    taua     0.18342      0.011904       0.16009     0.20675 
    taub     0.30557       0.05852       0.19087     0.42027 
    tauc     0.27375      0.053786       0.16833     0.37917 
    dyx      0.31466      0.059899       0.19726     0.43205

Compare the calculation of tau-b with that of the MATLAB function corr.

% Starting from a contingency table, create the original data matrix to
% be able to call corr.
N=[20    23    20;
21    25    22;
18     18    19];
n11=N(1,1); n12=N(1,2); n13=N(1,3);
n21=N(2,1); n22=N(2,2); n23=N(2,3);
n31=N(3,1); n32=N(3,2); n33=N(3,3);
x11=[1*ones(n11,1) 1*ones(n11,1)];
x12=[1*ones(n12,1) 2*ones(n12,1)];
x13=[1*ones(n13,1) 3*ones(n13,1)];
x21=[2*ones(n21,1) 1*ones(n21,1)];
x22=[2*ones(n22,1) 2*ones(n22,1)];
x23=[2*ones(n23,1) 3*ones(n23,1)];
x31=[3*ones(n31,1) 1*ones(n31,1)];
x32=[3*ones(n32,1) 2*ones(n32,1)];
x33=[3*ones(n33,1) 3*ones(n33,1)];
% X original data matrix
X=[x11; x12; x13; x21; x22; x23; x31; x32; x33];
% Find taub and pvalue of taub using MATLAB routine corr
[RHO,pval]=corr(X,'type','Kendall');
% Compute tau-b using FSDA corrOrdinal routine.
out=corrOrdinal(X,'datamatrix',true,'dispresults',false);
disp(['tau-b from MATLAB routine corr=' num2str(RHO(1,2))])
disp(['tau-b from FSDA routine corrOrdinal=' num2str(out.taub(1))])
% Note that the p-values are slightly different
disp(['pvalue of H0:taub=0 from MATLAB routine corr=' num2str(pval(1,2))])
disp(['pvalue of H0:taub=0 from FSDA routine corrOrdinal=' num2str(out.taub(4))])

tau-b from MATLAB routine corr=0.0083449
tau-b from FSDA routine corrOrdinal=0.0083449
pvalue of H0:taub=0 from MATLAB routine corr=0.89952
pvalue of H0:taub=0 from FSDA routine corrOrdinal=0.89914
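The nine blocks x11, ..., x33 above can also be built in a single step. A minimal sketch, assuming MATLAB R2015a or later (for repelem): the row and column index of each cell are repeated as many times as the cell count. Rows of X come out in column-major cell order instead of the row-wise order used above, which should not affect rank-based statistics such as Kendall's tau.

```matlab
% Contingency table of the example above
N = [20 23 20;
     21 25 22;
     18 18 19];
[I, J] = size(N);
% Row and column index of every cell, in column-major order
[R, K] = ndgrid(1:I, 1:J);
% Repeat each (row, column) pair N(i,j) times to obtain the n-by-2 data matrix
X = [repelem(R(:), N(:)) repelem(K(:), N(:))];
% Counting the pairs in X again gives back N
Ncheck = accumarray(X, 1, [I J]);
disp(isequal(Ncheck, N))
```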

Related Examples


corrOrdinal with option conflev.

N=[26 26 23 18  9;
6  7  9 14 23];
out=corrOrdinal(N,'conflev',0.999);

corrOrdinal with option NoStandardErrors.

N=[26 26 23 18  9;
6  7  9 14 23];
out=corrOrdinal(N,'NoStandardErrors',true);

Income and job satisfaction.

Relationship between income (with levels '< 5000', '5000-25000' and '>25000') and job satisfaction (with levels 'Dissatisfied', 'Moderately satisfied' and 'Very satisfied') for a sample of 300 persons. The input data is the MATLAB table Ntable:

N = [24 23 30;19 43 57;13 33 58];
rownam={'LessThan5000',  'Between5000And25000' 'GreaterThan25000'};
colnam= {'Dissatisfied' 'ModeratelySatisfied' 'VerySatisfied'};
Ntable=array2table(N,'RowNames',matlab.lang.makeValidName(rownam),'VariableNames',matlab.lang.makeValidName(colnam));
% Check relationship
out=corrOrdinal(Ntable);

Input is the contingency table in matrix format, labels for rows and columns are supplied.

N=[20    40    20;
10    45    45;
0     5    15];
% labels for rows and columns
labels_rows= {'Sufficient' 'Good' 'Very_good'};
labels_columns= {'Sufficient' 'Good' 'Very_good'};
out=corrOrdinal(N,'Lr',labels_rows,'Lc',labels_columns,'dispresults',false);
% out.Ntable uses labels for rows and columns which are supplied
disp(out.Ntable)

Example 1 of use of option plots.

Exercise Frequency vs. Self-Reported Health

load SportHealth.mat
out=corrOrdinal(SportHealth,'plots',true);
% The positive relationship between 'Self-Reported Health'
% and 'Exercise Frequency' is clear

Test of H_0: independence between rows and columns
The standard errors are computed under H_0
              Coeff        se       zscore    pval
             _______    ________    ______    ____

    gamma    0.59385    0.053088    11.186     0  
    taua     0.33958     0.03852    8.8157     0  
    taub     0.45635    0.040796    11.186     0  
    tauc     0.45128    0.040343    11.186     0  
    dyx       0.4525    0.040451    11.186     0  

-----------------------------------------
Indexes and 95% confidence limits
The standard error are computed under H_1
              Value     StandardError    ConflimL    ConflimU
             _______    _____________    ________    ________

    gamma    0.59385       0.04837       0.49905     0.68866 
    taua     0.33958      0.011291       0.31745     0.36171 
    taub     0.45635      0.040331        0.3773      0.5354 
    tauc     0.45128      0.040343       0.37221     0.53036 
    dyx       0.4525      0.040106       0.37389     0.53111


Example 2 of use of option plots.

Opinion on the movie watched and age interval

load cinema.mat
out=corrOrdinal(cinema,'plots',true);
% The negative relationship between age and satisfaction
% towards the watched movie is clear

Test of H_0: independence between rows and columns
The standard errors are computed under H_0
              Coeff         se       zscore        pval   
             ________    ________    _______    __________

    gamma    -0.22239    0.033693    -6.6004    4.0994e-11
    taua     -0.10884    0.018121    -6.0064    1.8967e-09
    taub     -0.15598    0.023632    -6.6004    4.0994e-11
    tauc     -0.14501     0.02197    -6.6004    4.0994e-11
    dyx      -0.12946    0.019614    -6.6004    4.0994e-11

-----------------------------------------
Indexes and 95% confidence limits
The standard error are computed under H_1
             Value    StandardError    ConflimL    ConflimU
             _____    _____________    ________    ________

    gamma      0         0.033312         0           0    
    taua       0        0.0035492         0           0    
    taub       0         0.023518         0           0    
    tauc       0          0.02197         0           0    
    dyx        0         0.019609         0           0

Input Arguments


`N` — Contingency table (default) or n-by-2 input dataset. Matrix or Table.

Matrix or table which contains the input contingency table (say of size I-by-J) or the original n-by-2 data matrix.

In the latter case (option datamatrix set to true) the contingency table is computed as crosstab(N(:,1),N(:,2)). By default, the procedure assumes that the input is a contingency table.
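When crosstab (Statistics and Machine Learning Toolbox) is not at hand, the same table can be obtained with core MATLAB functions. A minimal sketch, assuming both columns of the data matrix hold numeric ordinal codes whose ascending order is the category order; the matrix X below is hypothetical data. With numeric codes the result should match crosstab(X(:,1),X(:,2)), which also sorts the levels.

```matlab
% Hypothetical n-by-2 data matrix: column 1 = row variable, column 2 = column variable
X = [1 1; 1 2; 2 2; 2 3; 3 3; 3 3; 1 1];
% Map each observed code to the position of its level (levels sorted ascending)
[rowLevels, ~, ir] = unique(X(:,1));
[colLevels, ~, ic] = unique(X(:,2));
% Count the observations falling in each (row level, column level) cell
Ntab = accumarray([ir(:) ic(:)], 1, [numel(rowLevels) numel(colLevels)]);
disp(Ntab)
```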

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example:

'NoStandardErrors',true, 'dispresults',false, 'Lr',{'a' 'b' 'c'}, 'Lc',{'c1' 'c2' 'c3' 'c4'}, 'datamatrix',true, 'conflev',0.99, 'plots',true

`NoStandardErrors` — Just indexes without standard errors and p-values. Boolean.

If NoStandardErrors is true, only the indexes are computed, without standard errors and p-values; that is, no inferential measure is given. The default value of NoStandardErrors is false.

Example: 'NoStandardErrors',true

Data Types: Boolean

`dispresults` — Display results on the screen. Boolean.

If dispresults is true (default), all the summary results of the analysis are displayed on the screen.

Example: 'dispresults',false

Data Types: Boolean

`Lr` — Vector of row labels. Cell.

Cell containing the labels of the rows of the input contingency matrix N. This option is unnecessary if N is a table, because in this case Lr=N.Properties.RowNames.

Example: 'Lr',{'a' 'b' 'c'}

Data Types: cell array of strings

`Lc` — Vector of column labels. Cell.

Cell containing the labels of the columns of the input contingency matrix N. This option is unnecessary if N is a table because in this case Lc=N.Properties.VariableNames;

Example: 'Lc',{'c1' 'c2' 'c3' 'c4'}

Data Types: cell array of strings

`datamatrix` — Data matrix or contingency table. Boolean.

If datamatrix is true, the first input argument N is interpreted as an n-by-2 data matrix; if it is false, N is treated as a contingency table. The default value of datamatrix is false, that is, the procedure considers N as a contingency table.

Example: 'datamatrix',true

Data Types: logical

`conflev` — Confidence level to be used to compute confidence intervals. Scalar.

The default value of conflev is 0.95, that is, 95 per cent confidence intervals are computed for all the indexes (note that this option is ignored if NoStandardErrors=true).
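The limits printed in the confidence-limit tables of the examples are consistent with symmetric normal (Wald) intervals, Value ± z × StandardError, where z is the (1+conflev)/2 quantile of the standard normal. The sketch below reproduces the $\gamma$ interval of the first example from its printed value and standard error; that every index is treated this way is an assumption based on those tables. The quantile is written with erfinv so that no toolbox is needed.

```matlab
conflev = 0.95;
% Standard normal quantile of order (1+conflev)/2, since Phi(z) = (1+erf(z/sqrt(2)))/2
z = sqrt(2) * erfinv(conflev);
% Value and standard error (not assuming H_0) of gamma from the first example
val = 0.5;
se  = 0.0876;
ci = [val - z*se, val + z*se];
fprintf('z = %.6f, CI = [%.5f, %.5f]\n', z, ci(1), ci(2));
```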

Example: 'conflev',0.99

Data Types: double

`plots` — Balloon plot and Pareto plot of individual contributions to $C-D$. Boolean.

If plots is true, the following two plots of the individual contributions to the numerator of the correlation indexes are shown on the screen:

1) a bubble plot (balloon plot);

2) a Pareto plot.

In both plots, the entries of the contingency table associated with positive association (that is, number of concordant pairs greater than number of discordant pairs) are shown in blue, while those associated with negative association (that is, number of concordant pairs smaller than number of discordant pairs) are shown in red. The default value of plots is false.

Example: 'plots',true

Data Types: Boolean

Output Arguments


`out` — description. Structure

Structure which contains the following fields:

Value	Description
`N`	$I$-by-$J$ array containing the contingency table referred to the active rows (i.e. the rows which participated in the fit). The $(i,j)$-th element is equal to $n_{ij}$, $i=1, 2, \ldots, I$ and $j=1, 2, \ldots, J$. The sum of the elements of out.N is $n$ (the grand total).
`Ntable`	Same as out.N but in table format (with row and column names). This output is present only if your MATLAB version is R2013b or later.
`gam`	1 x 4 vector which contains Goodman and Kruskal's gamma index, standard error, test and p-value.
`taua`	1 x 4 vector which contains index $\tau_a$ , standard error, test and p-value.
`taub`	1 x 4 vector which contains index $\tau_b$ , standard error, test and p-value.
`tauc`	1 x 4 vector which contains index $\tau_c$ , standard error, test and p-value.
`som`	1 x 4 vector which contains Somers' index $d_{y|x}$, standard error, test and p-value.
`TestInd`	5-by-4 matrix containing index values (first column), standard errors (second column), zscores (third column), p-values (fourth column). Note that the standard errors in this matrix are computed assuming the null hypothesis of independence.
`TestIndtable`	5-by-4 table containing index values (first column), standard errors (second column), zscores (third column), p-values (fourth column). Note that the standard errors in this table are computed assuming the null hypothesis of independence.
`ConfLim`	5-by-4 matrix containing index values (first column), standard errors (second column), lower confidence limit (third column), upper confidence limit (fourth column). Note that the standard errors in this matrix are computed without assuming the null hypothesis of independence.
`ConfLimtable`	5-by-4 table containing index values (first column), standard errors (second column), lower confidence limit (third column), upper confidence limit (fourth column). Note that the standard errors in this table are computed without assuming the null hypothesis of independence.
`Contrib2CminusD`	IxJ array containing individual contributions to the common numerator of all the indexes above (namely C-D).
`Contrib2CminusDtable`	IxJ table containing individual contributions to the common numerator of all the indexes above (namely C-D).
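As a sketch of what the Contrib2CminusD fields represent, the code below computes for each cell $n_{ij} d_{ij}/2$, with $d_{ij}=a_{ij}-b_{ij}$ built from the concordant and discordant counts defined in the "More About" section; halving makes the cell contributions add up exactly to $C-D$. That corrOrdinal uses precisely this scaling is an assumption. Positive entries correspond to cells with more concordant than discordant pairs, which the plots option shows in blue.

```matlab
% Contingency table of the first example
N = [20 40 20; 10 45 45; 0 5 15];
[I, J] = size(N);
dd = zeros(I, J);                  % d_ij = a_ij - b_ij
for i = 1:I
    for j = 1:J
        % a: cells above-left plus below-right; b: below-left plus above-right
        a = sum(sum(N(1:i-1, 1:j-1))) + sum(sum(N(i+1:I, j+1:J)));
        b = sum(sum(N(i+1:I, 1:j-1))) + sum(sum(N(1:i-1, j+1:J)));
        dd(i,j) = a - b;
    end
end
% Cell contributions, scaled so that they sum to C - D (assumed scaling)
contrib = N .* dd / 2;
disp(contrib)
fprintf('sum of contributions = %g\n', sum(contrib(:)));
```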

More About


Additional Details

All these indexes are based on concordant and discordant pairs.

A pair of observations is concordant if the subject who is higher on one variable also is higher on the other variable, and a pair of observations is discordant if the subject who is higher on one variable is lower on the other variable.

More formally, a pair of observations $(i,j)$, with $i,j=1, 2, \ldots, n$ and $i \neq j$, is concordant if $(x(i)-x(j)) \times (y(i)-y(j))>0$.

It is discordant if $(x(i)-x(j) ) \times (y(i)-y(j))<0$ .

Let $C$ be the total number of concordant pairs (concordances) and $D$ the total number of discordant pairs (discordances). If $C > D$ the variables have a positive association, while if $C < D$ they have a negative association.
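The pairwise definition can be applied directly to raw data by examining every pair of observations once. A minimal sketch with hypothetical scores for five subjects; it runs in $O(n^2)$ time, so for large samples the contingency-table formulas of this section are preferable.

```matlab
% Hypothetical paired ordinal scores for five subjects
x = [1 2 2 3 3]';
y = [1 1 3 2 3]';
n = numel(x);
C = 0;  D = 0;
for i = 1:n-1
    for j = i+1:n                          % each unordered pair once
        s = (x(i) - x(j)) * (y(i) - y(j));
        if s > 0
            C = C + 1;                     % concordant pair
        elseif s < 0
            D = D + 1;                     % discordant pair
        end                                % s == 0: pair tied on x or on y
    end
end
fprintf('C = %d, D = %d\n', C, D);
```

Here 5 of the 10 pairs are concordant, 1 is discordant and 4 are tied, so $C > D$ and the association is positive.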

In symbols, given an $I \times J$ contingency table, the number of pairs concordant with cell $(i,j)$ is

$a_{ij} = \sum_{k<i} \sum_{l<j} n_{kl} + \sum_{k>i} \sum_{l>j} n_{kl}$

and the number of pairs discordant with cell $(i,j)$ is

$b_{ij} = \sum_{k>i} \sum_{l<j} n_{kl} + \sum_{k<i} \sum_{l>j} n_{kl}$

Twice the number of concordances, $C$, is given by:

$2 \times C = \sum_{i=1}^I \sum_{j=1}^J n_{ij} a_{ij}$

Twice the number of discordances, $D$, is given by:

$2 \times D = \sum_{i=1}^I \sum_{j=1}^J n_{ij} b_{ij}$

Goodman-Kruskal's $\gamma$ statistic is equal to the ratio:

$\gamma= \frac{C-D}{C+D}$
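The formulas for $a_{ij}$, $b_{ij}$, $C$, $D$ and $\gamma$ translate directly into a double loop over the cells. A minimal sketch, using the contingency table of the first example; the variable is called gam to avoid shadowing the MATLAB function gamma.

```matlab
% Contingency table of the first example (rows: written test, columns: oral test)
N = [20 40 20;
     10 45 45;
      0  5 15];
[I, J] = size(N);
A = zeros(I, J);   % a_ij: pairs concordant with cell (i,j)
B = zeros(I, J);   % b_ij: pairs discordant with cell (i,j)
for i = 1:I
    for j = 1:J
        % above-left block plus below-right block
        A(i,j) = sum(sum(N(1:i-1, 1:j-1))) + sum(sum(N(i+1:I, j+1:J)));
        % below-left block plus above-right block
        B(i,j) = sum(sum(N(i+1:I, 1:j-1))) + sum(sum(N(1:i-1, j+1:J)));
    end
end
C = sum(sum(N .* A)) / 2;   % number of concordances
D = sum(sum(N .* B)) / 2;   % number of discordances
gam = (C - D) / (C + D);
fprintf('C = %g, D = %g, gamma = %g\n', C, D, gam);
```

For this table $C = 5475$ and $D = 1825$, so $\gamma = 3650/7300 = 0.5$, in agreement with the printed output of the first example.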

$\tau_a$ is equal to concordant minus discordant pairs, divided by a factor which takes into account the total number of pairs.

$\tau_a= \frac{C-D}{0.5 n(n-1)}$

$\tau_b$ is equal to concordant minus discordant pairs divided by a term representing the geometric mean between the number of pairs not tied on x and the number not tied on y.

More precisely:

$\tau_b= \frac{C-D}{\sqrt{ (0.5 n(n-1)-T_x)(0.5 n(n-1)-T_y)}}$

where $T_x= \sum_{i=1}^I 0.5 n_{i.}(n_{i.}-1)$ and $T_y=\sum_{j=1}^J 0.5 n_{.j}(n_{.j}-1)$ are the numbers of pairs tied on x and on y, respectively. Note that $|\tau_b| \leq |\gamma|$.

$\tau_c$ is equal to concordant minus discordant pairs multiplied by a factor that adjusts for table size.

$\tau_c= \frac{C-D}{ n^2(m-1)/(2m)}$

where $m= \min(I,J)$.

Somers' $d_{y|x}$ is an asymmetric extension of $\gamma$ that differs only in the inclusion of the number of pairs not tied on the independent variable. More precisely

$d_{y|x} = \frac{C-D}{0.5 n(n-1)-T_x}$
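All five indexes share the numerator $C-D$ and differ only in the denominator. The sketch below evaluates the formulas above for the contingency table of the first example; here $C$ and $D$ are accumulated by pairing each cell only with the cells below it, which counts every pair once.

```matlab
N = [20 40 20; 10 45 45; 0 5 15];
[I, J] = size(N);
n = sum(N(:));
m = min(I, J);
% Concordances C and discordances D, each pair counted once
C = 0;  D = 0;
for i = 1:I
    for j = 1:J
        C = C + N(i,j) * sum(sum(N(i+1:I, j+1:J)));   % cells below-right
        D = D + N(i,j) * sum(sum(N(i+1:I, 1:j-1)));   % cells below-left
    end
end
P  = n*(n-1)/2;                      % total number of pairs
ni = sum(N, 2);  nj = sum(N, 1);     % row and column totals
Tx = sum(ni.*(ni-1))/2;              % pairs tied on the row variable
Ty = sum(nj.*(nj-1))/2;              % pairs tied on the column variable
gam  = (C - D) / (C + D);
taua = (C - D) / P;
taub = (C - D) / sqrt((P - Tx)*(P - Ty));
tauc = (C - D) / (n^2*(m-1)/(2*m));
dyx  = (C - D) / (P - Tx);
disp([gam taua taub tauc dyx])
```

The result agrees with the printed values 0.5, 0.18342, 0.30557, 0.27375 and 0.31466 of the first example.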

Null hypothesis: corresponding index = 0. Alternative hypothesis (two-sided): index < 0 or index > 0.

In order to compute confidence intervals and test hypotheses, this routine computes the standard error of the various indexes.

Note that the expression of the standard errors which is used to compute the confidence intervals is different from the expression which is used to test the null hypothesis of no association (no relationship or independence) between the two variables.
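To illustrate the difference, the sketch below evaluates, for the first example, the two variance expressions for $\gamma$ given in this section, together with the expression for $d_{y|x}$ that does not assume independence. The resulting standard errors (0.0876 and 0.098239 for $\gamma$, 0.059899 for $d_{y|x}$) match the two printed tables of that example.

```matlab
N = [20 40 20; 10 45 45; 0 5 15];
[I, J] = size(N);
n = sum(N(:));
A = zeros(I, J);  B = zeros(I, J);
for i = 1:I
    for j = 1:J
        A(i,j) = sum(sum(N(1:i-1, 1:j-1))) + sum(sum(N(i+1:I, j+1:J)));
        B(i,j) = sum(sum(N(i+1:I, 1:j-1))) + sum(sum(N(1:i-1, j+1:J)));
    end
end
C = sum(sum(N .* A)) / 2;
D = sum(sum(N .* B)) / 2;
dd = A - B;                                   % d_ij
% Standard error of gamma without assuming independence (confidence limits)
seH1 = sqrt(4/(C+D)^4 * sum(sum(N .* (D*A - C*B).^2)));
% Standard error of gamma under H_0 of independence (test)
seH0 = sqrt((sum(sum(N .* dd.^2)) - 4*(C-D)^2/n) / (C+D)^2);
% Standard error of Somers' d_{y|x} without assuming independence
ni = sum(N, 2);
wr = n^2 - sum(ni.^2);
seDyx = sqrt(4/wr^4 * sum(sum(N .* (wr*dd - 2*(C-D)*repmat(n - ni, 1, J)).^2)));
fprintf('gamma: se under H1 = %.5f, se under H0 = %.6f\n', seH1, seH0);
fprintf('d_yx:  se under H1 = %.6f\n', seDyx);
```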

As concerns Goodman-Kruskal's $\gamma$ index, we have that:

$var(\gamma) = \frac{4}{(C + D)^4} \sum_{i=1}^I \sum_{j=1}^J n_{ij} (D a_{ij} - C b_{ij} )^2$

Let $d_{ij}=a_{ij}- b_{ij}$. The variance of $\gamma$ assuming the independence hypothesis is:

$var_0(\gamma) =\frac{1}{(C + D)^2} \left( \sum_{i=1}^I \sum_{j=1}^J n_{ij} d_{ij}^2 -4(C-D)^2/n \right)$

As concerns $\tau_a$, we have that:

$var(\tau_a)= \frac{2}{n(n-1)} \left\{ \frac{2(n-2)}{n(n-1)^2} \sum_{i=1}^I \sum_{j=1}^J (d_{ij} - \overline d)^2 + 1 - \tau_a^2 \right\} \qquad \mbox{with $i,j$ such that $N(i,j)>0$}$

where

$\overline d = \sum_{i=1}^I \sum_{j=1}^J d_{ij} /n \qquad \mbox{with $i,j$ such that $N(i,j)>0$}$

The variance of $\tau_a$ assuming the independence hypothesis is:

$var_0(\tau_a) =\frac{2 (2n+5)}{9n(n-1) }$

As concerns $\tau_b$, we have that:

$var(\tau_b)= \frac{n}{w^4} \left\{ n \sum_{i=1}^I \sum_{j=1}^J n_{ij} \tau_{ij}^2 - \left( \sum_{i=1}^I \sum_{j=1}^J n_{ij}\tau_{ij}\right)^2 \right\}$

where

$\tau_{ij} = 2n d_{ij} +2(C-D) n_{.j} w /n^3+2(C-D) (n_{i.}/n) \sqrt{ w_c/w_r} \qquad \mbox{and} \qquad w= \sqrt{w_rw_c}$

with $w_r= n^2- \sum_{i=1}^I n_{i.}^2$ and $w_c= n^2- \sum_{j=1}^J n_{.j}^2$. The variance of $\tau_b$ assuming the independence hypothesis is:

$var_0(\tau_b) =\frac{4}{w_r w_c} \left\{ \sum_{i=1}^I \sum_{j=1}^J n_{ij} d_{ij} ^2 -4(C-D)^2/n \right\}$

As concerns Stuart's $\tau_c$, we have that:

$var(\tau_c)= \frac{4m^2}{(m-1)^2 n^4} \left\{ \sum_{i=1}^I \sum_{j=1}^J n_{ij} d_{ij} ^2 -4(C-D)^2/n \right\}$

The variance of $\tau_c$ assuming the independence hypothesis coincides with it:

$var_0(\tau_c) =var(\tau_c)$

As concerns $d_{y|x}$, we have that:

$var( d_{y|x})= \frac{4}{w_r^4} \sum_{i=1}^I \sum_{j=1}^J n_{ij} \left\{ w_r d_{ij} -2(C-D) (n-n_{i.}) \right\}^2$

with $w_r$ defined as above. The variance of $d_{y|x}$ assuming the independence hypothesis is:

$var_0(d_{y|x}) = \frac{4}{w_r^2} \left\{ \sum_{i=1}^I \sum_{j=1}^J n_{ij} d_{ij} ^2 -4(C-D)^2/n \right\}$

From a theoretical point of view, Simon (1978) showed that all sample measures having the same numerator $(C-D)$ have the same efficacy, and hence the same local power, for testing independence.
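The independence-test columns of the first example (standard errors under $H_0$, z-scores and two-sided p-values) can be reproduced from the $var_0$ expressions above. A minimal sketch; it uses $\tau_b = 2(C-D)/\sqrt{w_r w_c}$ and $d_{y|x} = 2(C-D)/w_r$, which equal the earlier definitions because $0.5n(n-1)-T_x = w_r/2$. It also shows the remark above in numbers: $\gamma$, $\tau_b$, $\tau_c$ and $d_{y|x}$ all give the same z-score $(C-D)/\sqrt{S}$, with $S$ the common bracketed term, while $\tau_a$ differs because its $var_0$ does not depend on the table.

```matlab
N = [20 40 20; 10 45 45; 0 5 15];
[I, J] = size(N);
n = sum(N(:));
m = min(I, J);
A = zeros(I, J);  B = zeros(I, J);
for i = 1:I
    for j = 1:J
        A(i,j) = sum(sum(N(1:i-1, 1:j-1))) + sum(sum(N(i+1:I, j+1:J)));
        B(i,j) = sum(sum(N(i+1:I, 1:j-1))) + sum(sum(N(1:i-1, j+1:J)));
    end
end
C = sum(sum(N .* A)) / 2;
D = sum(sum(N .* B)) / 2;
dd = A - B;
ni = sum(N, 2);  nj = sum(N, 1);
wr = n^2 - sum(ni.^2);
wc = n^2 - sum(nj.^2);
% Common term sum n_ij d_ij^2 - 4(C-D)^2/n of the H_0 variances
S = sum(sum(N .* dd.^2)) - 4*(C - D)^2/n;
% Indexes in the order gamma, tau_a, tau_b, tau_c, d_{y|x}
est = [(C-D)/(C+D); (C-D)/(n*(n-1)/2); 2*(C-D)/sqrt(wr*wc); ...
       (C-D)/(n^2*(m-1)/(2*m)); 2*(C-D)/wr];
% Standard errors under H_0
se0 = [sqrt(S)/(C+D); sqrt(2*(2*n+5)/(9*n*(n-1))); sqrt(4*S/(wr*wc)); ...
       sqrt(4*m^2*S/((m-1)^2*n^4)); sqrt(4*S/wr^2)];
z = est ./ se0;
% Two-sided p-value 2*(1 - Phi(|z|)), written with erfc (no toolbox needed)
pval = erfc(abs(z)/sqrt(2));
disp([est se0 z pval])
```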

References

Agresti, A. (2002), "Categorical Data Analysis", John Wiley & Sons, pp. 57-59.

Agresti, A. (2010), "Analysis of Ordinal Categorical Data", Second Edition, Wiley, New York, pp. 194-195.

Goktas, A. and Oznur, I. (2011), A comparison of the most commonly used measures of association for doubly ordered square contingency tables via simulation, "Metodoloski zvezki", Vol. 8, pp. 17-37.

Goodman, L.A. and Kruskal, W.H. (1954), Measures of association for cross classifications, "Journal of the American Statistical Association", Vol. 49, pp. 732-764.

Goodman, L.A. and Kruskal, W.H. (1959), Measures of association for cross classifications II: Further Discussion and References, "Journal of the American Statistical Association", Vol. 54, pp. 123-163.

Goodman, L.A. and Kruskal, W.H. (1963), Measures of association for cross classifications III: Approximate Sampling Theory, "Journal of the American Statistical Association", Vol. 58, pp. 310-364.

Goodman, L.A. and Kruskal, W.H. (1972), Measures of association for cross classifications IV: Simplification of Asymptotic Variances, "Journal of the American Statistical Association", Vol. 67, pp. 415-421.

Hollander, M., Wolfe, D.A. and Chicken, E. (2014), "Nonparametric Statistical Methods", Third Edition, Wiley.

Liebetrau, A.M. (1983), "Measures of Association", Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-004, Newbury Park, CA: Sage. [pp. 49-56]

SAS documentation (2009), See http://support.sas.com/documentation/cdl/en/statugfreq/63124/PDF/default/statugfreq.pdf, pp. 1738-1740.

Brown, M.B. and Benedetti, J.K. (1977), Sampling Behavior of Tests for Correlation in Two-Way Contingency Tables, "Journal of the American Statistical Association", Vol. 72, pp. 309-315.

Simon, G. (1978), Alternative analysis for the singly ordered contingency table, "Journal of the American Statistical Association", Vol. 69, pp. 971-976.

Acknowledgements

This file was inspired by Trujillo-Ortiz, A. and R. Hernandez-Walls, gkgammatst: Goodman-Kruskal's gamma test. URL address http://www.mathworks.com/matlabcentral/fileexchange/42645-gkgammatst
