corrNominal measures the strength of association between two unordered (nominal) categorical variables.
corrNominal computes \chi^2, \Phi, Cramer's V, Goodman-Kruskal's \lambda_{y|x}, Goodman-Kruskal's \tau_{y|x}, and Theil's H_{y|x} (uncertainty coefficient).
All these indexes measure the association between two unordered qualitative variables.
If the input table is 2-by-2, the indexes theta (cross product ratio), Q=(theta-1)/(theta+1) and U=(sqrt(theta)-1)/(sqrt(theta)+1) are also computed. Additional details about these indexes can be found in the "More About" section or in the "Output" section of this document.
Example of option conflev.
out=corrNominal(N, Name, Value)
Rows of N indicate the type of Bachelor degree: 'Economics' 'Law' 'Literature'. Columns of N indicate the employment type: 'Private_firm' 'Public_firm' 'Freelance' 'Unemployed'.
N=[150 80 20 50
80 250 30 140
30 50 0 120];
out=corrNominal(N);
Chi2 index            221.2405
pvalue Chi2 index     5.6588e-45
Phi index             0.4704
Cramer's V            0.3326
Test of H_0: independence between rows and columns
               Coeff       se          zscore    pval
               ________    ________    ______    __________
CramerV        0.3326      0.024431    13.614    0
GKlambdayx     0.22581     0.028383    7.9556    1.7764e-15
tauyx          0.091674    0.013524    6.7788    1.2121e-11
Hyx            0.08716     0.011265    7.7374    1.0214e-14
-----------------------------------------
Indexes and 95% confidence limits
               Value       StandardError    ConflimL    ConflimU
               ________    _____________    ________    ________
CramerV        0.3326      0.024431         0.28471     0.37287
GKlambdayx     0.22581     0.028383         0.17018     0.28144
tauyx          0.091674    0.013524         0.065168    0.11818
Hyx            0.08716     0.011265         0.065082    0.10924
Use data from Goodman and Kruskal (1954).
N=[1768 807 189 47
946 1387 746 53
115 438 288 16];
out=corrNominal(N,'conflev',0.99);
Chi2 index            1.0735e+03
pvalue Chi2 index     1.1244e-228
Phi index             0.3973
Cramer's V            0.2810
Test of H_0: independence between rows and columns
               Coeff       se           zscore    pval
               ________    _________    ______    ____
CramerV        0.28095     0.0088396    31.784    0
GKlambdayx     0.19239     0.012158     15.825    0
tauyx          0.080883    0.0046282    17.476    0
Hyx            0.075341    0.0041619    18.102    0
-----------------------------------------
Indexes and 99% confidence limits
               Value       StandardError    ConflimL    ConflimU
               ________    _____________    ________    ________
CramerV        0.28095     0.0088396        0.25818     0.30241
GKlambdayx     0.19239     0.012158         0.16108     0.22371
tauyx          0.080883    0.0046282        0.068962    0.092805
Hyx            0.075341    0.0041619        0.064621    0.086061
N=[ 6 14 17 9;
30 32 17 3];
out=corrNominal(N,'dispresults',false);
N=[26 26 23 18 9;
6 7 9 14 23];
% From the contingency table reconstruct the original data matrix.
n11=N(1,1); n12=N(1,2); n13=N(1,3); n14=N(1,4); n15=N(1,5);
n21=N(2,1); n22=N(2,2); n23=N(2,3); n24=N(2,4); n25=N(2,5);
x11=[1*ones(n11,1) 1*ones(n11,1)];
x12=[1*ones(n12,1) 2*ones(n12,1)];
x13=[1*ones(n13,1) 3*ones(n13,1)];
x14=[1*ones(n14,1) 4*ones(n14,1)];
x15=[1*ones(n15,1) 5*ones(n15,1)];
x21=[2*ones(n21,1) 1*ones(n21,1)];
x22=[2*ones(n22,1) 2*ones(n22,1)];
x23=[2*ones(n23,1) 3*ones(n23,1)];
x24=[2*ones(n24,1) 4*ones(n24,1)];
x25=[2*ones(n25,1) 5*ones(n25,1)];
% X original data matrix (in this case an array)
X=[x11; x12; x13; x14; x15; x21; x22; x23; x24; x25];
out=corrNominal(X,'datamatrix',true);
Initial contingency matrix (2D array).
N=[75 126
76 203
40 129
36 125
24 110
41 222
19 141];
% Labels of the contingency matrix
Party={'ACTIVIST DEMOCRATIC', 'DEMOCRATIC', ...
'SIMPATIZING DEMOCRATIC', 'INDEPENDENT', ...
'LIKING REPUBLICAN', 'REPUBLICAN', ...
'ACTIVIST REPUBLICAN'};
DeathPenalty={'AGAINST' 'FAVORABLE'};
Ntable=array2table(N,'RowNames',Party,'VariableNames',DeathPenalty);
% From the contingency table reconstruct the original data matrix now
% using FSDA function
% The output is a cell array
Xcell=crosstab2datamatrix(Ntable);
Xtable=cell2table(Xcell);
% call function corrNominal using first argument as input data matrix
% in table format and option datamatrix set to true
out=corrNominal(Xtable,'datamatrix',true);
Use the 4 possible methods
method={'ncchisq', 'ncchisqadj', 'fisher' 'fisheradj'};
% Use a contingency table referred to type of job vs wine delivery
rownam={'Butcher' 'Carpenter' 'Carter' 'Farmer' 'Hunter' 'Miller' 'Taylor'};
colnam={'Wine not delivered' 'Wine delivered'};
N=[85 9
214 56
212 19
100 17
139 15
109 16
172 29];
Ntable=array2table(N,'RowNames',rownam,'VariableNames',colnam);
ConfintV=zeros(4,2);
for i=1:4
out=corrNominal(Ntable,'conflimMethodCramerV',method{i});
ConfintV(i,:)=out.ConfLimtable{'CramerV',3:4};
end
disp(array2table(ConfintV,'RowNames',method,'VariableNames',{'Lower' 'Upper'}))
Chi2 index            21.0290
pvalue Chi2 index     0.0018
Phi index             0.1328
Cramer's V            0.1328
Test of H_0: independence between rows and columns
               Coeff       se           zscore    pval
               ________    _________    ______    _________
CramerV        0.13282     0.04149      3.2013    0.0013679
GKlambdayx     0           0            NaN       NaN
tauyx          0.017642    0.0078826    2.2381    0.025218
Hyx            0.021875    0.0095422    2.2924    0.021883
-----------------------------------------
Indexes and 95% confidence limits
               Value       StandardError    ConflimL     ConflimU
               ________    _____________    _________    ________
CramerV        0.13282     0.04149          0.051504     0.17582
GKlambdayx     0           0                0            0
tauyx          0.017642    0.0078826        0.0021921    0.033091
Hyx            0.021875    0.0095422        0.0031721    0.040577

Chi2 index            21.0290
pvalue Chi2 index     0.0018
Phi index             0.1328
Cramer's V            0.1328
Test of H_0: independence between rows and columns
               Coeff       se           zscore    pval
               ________    _________    ______    __________
CramerV        0.13282     0.023037     5.7657    8.1331e-09
GKlambdayx     0           0            NaN       NaN
tauyx          0.017642    0.0078826    2.2381    0.025218
Hyx            0.021875    0.0095422    2.2924    0.021883
-----------------------------------------
Indexes and 95% confidence limits
               Value       StandardError    ConflimL     ConflimU
               ________    _____________    _________    ________
CramerV        0.13282     0.023037         0.087671     0.18959
GKlambdayx     0           0                0            0
tauyx          0.017642    0.0078826        0.0021921    0.033091
Hyx            0.021875    0.0095422        0.0031721    0.040577

Chi2 index            21.0290
pvalue Chi2 index     0.0018
Phi index             0.1328
Cramer's V            0.1328
Test of H_0: independence between rows and columns
               Coeff       se           zscore    pval
               ________    _________    ______    _________
CramerV        0.13282     0.028675     4.632     3.621e-06
GKlambdayx     0           0            NaN       NaN
tauyx          0.017642    0.0078826    2.2381    0.025218
Hyx            0.021875    0.0095422    2.2924    0.021883
-----------------------------------------
Indexes and 95% confidence limits
               Value       StandardError    ConflimL     ConflimU
               ________    _____________    _________    ________
CramerV        0.13282     0.028675         0.076621     0.18818
GKlambdayx     0           0                0            0
tauyx          0.017642    0.0078826        0.0021921    0.033091
Hyx            0.021875    0.0095422        0.0031721    0.040577

Chi2 index            21.0290
pvalue Chi2 index     0.0018
Phi index             0.1328
Cramer's V            0.1328
Test of H_0: independence between rows and columns
               Coeff       se           zscore    pval
               ________    _________    ______    __________
CramerV        0.13282     0.028646     4.6366    3.5418e-06
GKlambdayx     0           0            NaN       NaN
tauyx          0.017642    0.0078826    2.2381    0.025218
Hyx            0.021875    0.0095422    2.2924    0.021883
-----------------------------------------
Indexes and 95% confidence limits
               Value       StandardError    ConflimL     ConflimU
               ________    _____________    _________    ________
CramerV        0.13282     0.028646         0.076676     0.18824
GKlambdayx     0           0                0            0
tauyx          0.017642    0.0078826        0.0021921    0.033091
Hyx            0.021875    0.0095422        0.0031721    0.040577

               Lower       Upper
               ________    _______
ncchisq        0.051504    0.17582
ncchisqadj     0.087671    0.18959
fisher         0.076621    0.18818
fisheradj      0.076676    0.18824
Indexes theta=cross product ratio, Q and U are also computed.
% X=advertisment memory (rows)
% Y=product purchase (columns)
N= [87 188;
42 406];
nam=["Yes" "No"];
Ntable=array2table(N,"RowNames",nam,"VariableNames",nam);
disp('Input 2x2 contingency table')
table(Ntable,RowNames=["X=advertisment memory" "advertisment memory "],VariableNames="Y=Product purchase")
out=corrNominal(Ntable)
Input 2x2 contingency table
ans =
  2×1 table
                                 Y=Product purchase
                                 __________________
                                 Yes    No
                                 ___    ___
X=advertisment memory    Yes     87     188
advertisment memory      No      42     406

Chi2 index            57.6071
pvalue Chi2 index     3.2006e-14
Phi index             0.2823
Cramer's V            0.2823
-------------------------------
2x2 contingency table indexes
th=cross product ratio                                                            4.4734
Cross product ratio in the interval [-1 1]. Index Q=(th-1)/(th+1)                 0.6346
Cross product ratio in the interval [-1 1]. Index U=(sqrt(th)-1)/(sqrt(th)+1)     0.3580
-------------------------------
Test of H_0: independence between rows and columns
               Coeff       se          zscore    pval
               ________    ________    ______    __________
CramerV        0.28227     0.037189    7.5902    3.1974e-14
GKlambdayx     0           0           NaN       NaN
tauyx          0.079678    0.020787    3.8331    0.00012653
Hyx            0.082782    0.021327    3.8816    0.00010376
-----------------------------------------
Indexes and 95% confidence limits
               Value       StandardError    ConflimL    ConflimU
               ________    _____________    ________    ________
CramerV        0.28227     0.037189         0.20938     0.35516
GKlambdayx     0           0                0           0
tauyx          0.079678    0.020787         0.038937    0.12042
Hyx            0.082782    0.021327         0.040983    0.12458

out =
  struct with fields:
                     N: [2×2 double]
                Ntable: [2×2 table]
                  Chi2: 57.6071
              Chi2pval: 3.2006e-14
                   Phi: 0.2823
               CramerV: [0.2823 0.0372 7.5902 3.1974e-14]
            GKlambdayx: [0 0 NaN NaN]
                 tauyx: [0.0797 0.0208 3.8331 1.2653e-04]
                   Hyx: [0.0828 0.0213 3.8816 1.0376e-04]
               ConfLim: [4×4 double]
          ConfLimtable: [4×4 table]
               TestInd: [4×4 double]
          TestIndtable: [4×4 table]
                 theta: 4.4734
                     Q: 0.6346
                     U: 0.3580
          Contrib2Chi2: [2×2 double]
     Contrib2Chi2table: [2×2 table]
           Contrib2Hyx: [2×2 double]
      Contrib2Hyxtable: [2×2 table]
         Contrib2tauyx: [2×2 double]
    Contrib2tauyxtable: [2×2 table]
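The three 2-by-2 indexes in this output can be reproduced by hand from the definitions given on this page. The sketch below uses only base MATLAB; it is an illustration of the formulas, not the code used inside corrNominal.

```matlab
% 2-by-2 table of the example (advertisement memory by product purchase)
N=[87 188;
   42 406];
% Cross product ratio: n11*n22/(n12*n21)
theta=(N(1,1)*N(2,2))/(N(1,2)*N(2,1));
% Rescalings of theta to the interval [-1 1]
Q=(theta-1)/(theta+1);
U=(sqrt(theta)-1)/(sqrt(theta)+1);
fprintf('theta=%.4f Q=%.4f U=%.4f\n',theta,Q,U)
```

The printed values should agree, up to rounding, with out.theta, out.Q and out.U shown in the output (4.4734, 0.6346 and 0.3580).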
Load the Housetasks data (a contingency table containing the frequency of execution of 13 house tasks in the couple).
N=[156 14 2 4;
124 20 5 4;
77 11 7 13;
82 36 15 7;
53 11 1 57;
32 24 4 53;
33 23 9 55;
12 46 23 15;
10 51 75 3;
13 13 21 66;
8 1 53 77;
0 3 160 2;
0 1 6 153];
rowslab={'Laundry' 'Main-meal' 'Dinner' 'Breakfast' 'Tidying' 'Dishes' ...
'Shopping' 'Official' 'Driving' 'Finances' 'Insurance'...
'Repairs' 'Holidays'};
colslab={'Wife' 'Alternating' 'Husband' 'Jointly'};
Ntable=array2table(N,'VariableNames',colslab,'RowNames',rowslab);
% Call to corrNominal with option 'plots',true
corrNominal(Ntable,'plots',true);
Chi2 index            1.9445e+03
pvalue Chi2 index     0
Phi index             1.0559
Cramer's V            0.6096
Test of H_0: independence between rows and columns
               Coeff      se          zscore    pval
               _______    ________    ______    ____
CramerV        0.60963    0.016701    36.502    0
GKlambdayx     0.50787    0.018427    27.561    0
tauyx          0.40671    0.013787    29.5      0
Hyx            0.40833    0.013767    29.659    0
-----------------------------------------
Indexes and 95% confidence limits
               Value      StandardError    ConflimL    ConflimU
               _______    _____________    ________    ________
CramerV        0.60963    0.016701         0.57689     0.63133
GKlambdayx     0.50787    0.018427         0.47175     0.54398
tauyx          0.40671    0.013787         0.37969     0.43374
Hyx            0.40833    0.013767         0.38134     0.43531
N
— Contingency table (default) or n-by-2 input dataset.
Matrix or table. Matrix or table which contains the input contingency table (say of size I-by-J) or the original data matrix.
In the latter case the contingency table is obtained as N=crosstab(N(:,1),N(:,2)). By default the procedure assumes that the input is a contingency table.
If N is a data matrix (supplied as an n-by-2 cell array of strings, an n-by-2 array or an n-by-2 table), the optional input datamatrix must be set to true.
Data Types: single| double
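To make the relation between the two input formats concrete, the following sketch rebuilds an n-by-2 data matrix from the 2-by-5 table used in one of the examples above and then tabulates it again. It relies on base MATLAB only: accumarray is used here as a stand-in for the crosstab call mentioned in the description, and the variable names are illustrative assumptions.

```matlab
% Contingency table taken from one of the examples above
N=[26 26 23 18 9;
    6  7  9 14 23];
[I,J]=size(N);
% Row and column index of every cell, in column-major order
[rowidx,colidx]=ndgrid(1:I,1:J);
% Repeat each (row,column) pair n_ij times to obtain the n-by-2 data matrix
X=[repelem(rowidx(:),N(:)) repelem(colidx(:),N(:))];
% Tabulate the data matrix again: the result should reproduce N
Ncheck=accumarray(X,1,[I J]);
disp(isequal(Ncheck,N))
```

A matrix X built in this way is the kind of input that is expected when option datamatrix is set to true.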
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'conflev',0.99, 'conflimMethodCramerV','fisheradj', 'dispresults',false, 'Lr',{'a' 'b' 'c'}, 'Lc',{'c1' 'c2' 'c3' 'c4'}, 'datamatrix',true, 'NoStandardErrors',true, 'plots',true
conflev
— Confidence level to be used to compute confidence intervals.
Scalar. The default value of conflev is 0.95, that is 95 per cent confidence intervals are computed for all the indexes (note that this option is ignored if NoStandardErrors=true).
Example: 'conflev',0.99
Data Types: double
conflimMethodCramerV
— Method to compute the confidence interval for Cramer's V.
Character. Character vector which identifies the method used to compute the confidence interval for Cramer's V index. The default value is 'ncchisq'. Possible values are 'ncchisq', 'ncchisqadj', 'fisher' or 'fisheradj'. 'ncchisq' uses the non central chi2. 'ncchisqadj' uses the non central chi2 adjusted for the degrees of freedom. 'fisher' uses the Fisher z-transformation and 'fisheradj' uses the Fisher z-transformation with bias correction.
Example: 'conflimMethodCramerV','fisheradj'
Data Types: character
dispresults
— Display results on the screen.
Boolean. If dispresults is true (default) all the summary results of the analysis are displayed on the screen.
Example: 'dispresults',false
Data Types: Boolean
Lr
— Vector of row labels.
Cell. Cell containing the labels of the rows of the input contingency matrix N. This option is unnecessary if N is a table, because in this case Lr=N.Properties.RowNames.
Example: 'Lr',{'a' 'b' 'c'}
Data Types: cell array of strings
Lc
— Vector of column labels.
Cell. Cell containing the labels of the columns of the input contingency matrix N. This option is unnecessary if N is a table, because in this case Lc=N.Properties.VariableNames.
Example: 'Lc',{'c1' 'c2' 'c3' 'c4'}
Data Types: cell array of strings
datamatrix
— Data matrix or contingency table.
Boolean. If datamatrix is true the first input argument N is interpreted as a data matrix; if datamatrix is false N is treated as a contingency table. The default value of datamatrix is false, that is the procedure automatically considers N as a contingency table. If datamatrix is true, N can be a cell of size n-by-2 containing the two grouping variables, a numeric array of size n-by-2 or a table of size n-by-2.
Example: 'datamatrix',true
Data Types: logical
NoStandardErrors
— Just indexes without standard errors and p-values.
Boolean. If NoStandardErrors is true just the indexes are computed, without standard errors and p-values, that is no inferential measure is given. The default value of NoStandardErrors is false.
Example: 'NoStandardErrors',true
Data Types: Boolean
plots
— Balloonplot of squared Pearson residuals and Pareto plot of squared Pearson residuals.
Boolean. If plots is true the following two plots of squared Pearson residuals are shown on the screen:
1) a bubble plot (balloonplot);
2) a Pareto plot.
In both plots entries of the contingency table associated with positive association (positive residuals) are shown in blue while those associated with negative association (negative residuals) are shown in red.
The default value of plots is false.
Example: 'plots',true
Data Types: Boolean
out
— description
Structure. Structure which contains the following fields:
Value | Description |
---|---|
N |
I-by-J array containing the contingency table referred to the active rows (i.e. the rows which participated in the fit). The (i,j)-th element is equal to n_{ij}, i=1, 2, \ldots, I and j=1, 2, \ldots, J. The sum of the elements of out.N is n (the grand total). |
Ntable |
same as out.N but in table format (with row and column names). This output is present only if your MATLAB version is R2013b or later. |
Chi2 |
scalar containing \chi^2 index. |
Chi2pval |
scalar containing pvalue of the \chi^2 index. |
Phi |
\Phi index. Phi is a chi-square-based measure of association obtained by dividing the chi-square statistic by the sample size and taking the square root of the result. More precisely \Phi= \sqrt{ \frac{\chi^2}{n} } This index lies in the interval [0 , \sqrt{\min[(I-1),(J-1)]}]. |
CramerV |
1 x 4 vector which contains Cramer's V index, standard error, z test, and p-value. Cramer's V index is index \Phi divided by its maximum. More precisely V= \sqrt{\frac{\Phi^2}{\min[(I-1),(J-1)]}}=\sqrt{\frac{\chi^2}{n \min[(I-1),(J-1)]}} The range of Cramer's V index is [0, 1]. A Cramer's V in the range [0, 0.3] is considered weak, in [0.3, 0.7] medium and > 0.7 strong. The way in which the confidence interval for this index is computed is specified in input option conflimMethodCramerV. If conflimMethodCramerV is 'ncchisq' or 'ncchisqadj', we first find a confidence interval [\Delta_L \Delta_U] for the non centrality parameter \Delta of the \chi^2 distribution with df=(I-1)(J-1) degrees of freedom (see Smithson (2003), pp. 39-41). If input option conflimMethodCramerV is 'ncchisq', the confidence interval for \Delta is transformed into one for V by the following transformation V_L=\sqrt{\frac{\Delta_L }{n \min[(I-1),(J-1)]}} and V_U=\sqrt{\frac{\Delta_U }{n \min[(I-1),(J-1)]}} If input option conflimMethodCramerV is 'ncchisqadj', the confidence interval for \Delta is transformed into one for V by the following transformation V_L=\sqrt{\frac{\Delta_L+ df }{n \min[(I-1),(J-1)]}} and V_U=\sqrt{\frac{\Delta_U+ df }{n \min[(I-1),(J-1)]}} |
GKlambdayx |
1 x 4 vector which contains index \lambda_{y|x} of Goodman and Kruskal, standard error, z test, and p-value. \lambda_{y|x} = \frac{\sum_{i=1}^I r_i- r}{n-r} \qquad r_i =\max_j(n_{ij}) \qquad r =\max_j(n_{.j}) |
tauyx |
1 x 4 vector which contains tau index \tau_{y|x}, standard error, z test and p-value. \tau_{y|x}= \frac{\sum_{i=1}^I \sum_{j=1}^J f_{ij}^2/f_{i.} -\sum_{j=1}^J f_{.j}^2 }{1-\sum_{j=1}^J f_{.j}^2 } |
Hyx |
1 x 4 vector which contains the uncertainty coefficient index (proposed by Theil) H_{y|x}, standard error, z test and p-value. H_{y|x}= \frac{\sum_{i=1}^I \sum_{j=1}^J f_{ij} \log( f_{ij}/ (f_{i.}f_{.j}))}{-\sum_{j=1}^J f_{.j} \log f_{.j} } |
TestInd |
4-by-4 array containing index values (first column), standard errors (second column), zscores (third column), p-values (fourth column). |
TestIndtable |
4-by-4 table containing index values (first column), standard errors (second column), zscores (third column), p-values (fourth column). |
ConfLim |
4-by-4 array containing index values (first column), standard errors (second column), lower confidence limit (third column), upper confidence limit (fourth column). |
ConfLimtable |
4-by-4 table containing index values (first column), standard errors (second column), lower confidence limit (third column), upper confidence limit (fourth column). |
theta |
cross product ratio. This index is computed only if the input table is 2-by-2. |
Q |
cross product ratio in the interval [-1 1] using the Q rescaling Q=(th-1)/(th+1). This index is computed only if the input table is 2-by-2. |
U |
cross product ratio in the interval [-1 1] using the U rescaling U=(sqrt(th)-1)/(sqrt(th)+1). This index is computed only if the input table is 2-by-2. |
Contrib2Chi2 |
I x J array containing the contributions (with sign) to the Chi2 index. Note that sum(abs(out.Contrib2Chi2),'all')=out.Chi2. |
Contrib2Chi2table |
same as out.Contrib2Chi2 but in table format |
Contrib2Hyx |
I x J array containing the contributions to the Hyx index. Note that sum(out.Contrib2Hyx,'all')=out.Hyx. |
Contrib2Hyxtable |
same as out.Contrib2Hyx but in table format |
Contrib2tauyx |
I x J array containing the contributions to the tauyx index. Note that sum(out.Contrib2tauyx,'all')=out.tauyx. |
Contrib2tauyxtable |
same as out.Contrib2tauyx but in table format |
In the contingency table N of size I \times J, whose i,j entry is n_{ij}, the Pearson residuals are defined as \frac{n_{ij}-n_{ij}^*}{\sqrt{n_{ij}^*}} where n_{ij}^* is the theoretical frequency under the independence hypothesis. The sum of the squares of the Pearson residuals is equal to the \chi^2 statistic to test the independence between rows and columns of the contingency table.
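As a rough illustration of this definition, the sketch below computes the theoretical frequencies, the Pearson residuals and the resulting \chi^2, \Phi and Cramer's V for the first example table of this page, using base MATLAB only. The variable names (Nstar, res and so on) are illustrative assumptions and are not taken from the corrNominal source.

```matlab
% Contingency table of the first example (degree type by employment type)
N=[150  80 20  50
    80 250 30 140
    30  50  0 120];
n=sum(N,'all');                      % grand total
% Theoretical frequencies under independence: n_i. * n_.j / n
Nstar=sum(N,2)*sum(N,1)/n;
% Pearson residuals
res=(N-Nstar)./sqrt(Nstar);
% Chi2 is the sum of the squared Pearson residuals
Chi2=sum(res.^2,'all');
% Phi and Cramer's V follow from Chi2
Phi=sqrt(Chi2/n);
[I,J]=size(N);
CramerV=Phi/sqrt(min(I-1,J-1));
fprintf('Chi2=%.4f Phi=%.4f V=%.4f\n',Chi2,Phi,CramerV)
```

Up to rounding, the three values should match those displayed for the first example (Chi2 index 221.2405, Phi index 0.4704, Cramer's V 0.3326).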
\lambda_{y|x} is a measure of association that reflects the proportional reduction in error when values of the independent variable (variable in the rows of the contingency table) are used to predict values of the dependent variable (variable in the columns of the contingency table). The range of \lambda_{y|x} is [0, 1]. A value of 1 means that the independent variable perfectly predicts the dependent variable. On the other hand, a value of 0 means that the independent variable does not help in predicting the dependent variable.
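A minimal sketch of this proportional reduction in error, again for the first example table and in base MATLAB only (variable names are illustrative assumptions): without knowledge of x the best guess for y is the column with the largest total, while with knowledge of x it is the largest cell of each row.

```matlab
% Contingency table of the first example (rows = x, columns = y)
N=[150  80 20  50
    80 250 30 140
    30  50  0 120];
n=sum(N,'all');
ri=max(N,[],2);        % largest frequency in each row
r=max(sum(N,1));       % largest column total
% Proportional reduction in prediction error
lambdayx=(sum(ri)-r)/(n-r);
fprintf('lambda_{y|x}=%.5f\n',lambdayx)
```

Here sum(ri)=520, r=380 and n=1000, so the index equals 140/620, in agreement with the value of GKlambdayx (0.22581) shown in the first example.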
More generally, let V(y) be a measure of variation for the marginal distribution (f_{.1}=n_{.1}/n, ..., f_{.J}=n_{.J}/n) of the response y and let V(y|i) denote the same measure computed for the conditional distribution (f_{1|i}=n_{i1}/n_{i.}, ..., f_{J|i}=n_{iJ}/n_{i.}) of y at the i-th setting of the explanatory variable x. A proportional reduction in variation measure has the form
\frac{V(y) - E[V(y|x)]}{V(y)}
where E[V(y|x)] is the expectation of the conditional variation taken with respect to the distribution of x. When x is a categorical variable having marginal distribution (f_{1.}, \ldots, f_{I.}),
E[V(y|x)]= \sum_{i=1}^I (n_{i.}/n) V(y|i) = \sum_{i=1}^I f_{i.} V(y|i)
If we take as measure of variation V(y) the Gini coefficient
V(y)=1 -\sum_{j=1}^J f_{.j}^2 \qquad V(y|i)=1 -\sum_{j=1}^J f_{j|i}^2
we obtain the index of proportional reduction in variation \tau_{y|x} of Goodman and Kruskal.
\tau_{y|x}= \frac{\sum_{i=1}^I \sum_{j=1}^J f_{ij}^2/f_{i.} -\sum_{j=1}^J f_{.j}^2 }{1-\sum_{j=1}^J f_{.j}^2 }
If, on the other hand, we take as measure of variation V(y) the entropy index
V(y)=-\sum_{j=1}^J f_{.j} \log f_{.j} \qquad V(y|i)= -\sum_{j=1}^J f_{j|i} \log f_{j|i}
we obtain the index H_{y|x} (uncertainty coefficient of Theil).
H_{y|x}= \frac{\sum_{i=1}^I \sum_{j=1}^J f_{ij} \log( f_{ij}/ (f_{i.}f_{.j}))}{-\sum_{j=1}^J f_{.j} \log f_{.j} }
The range of \tau_{y|x} and H_{y|x} is [0, 1].
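The two indexes can be sketched in a few lines of base MATLAB for the first example table. This is an illustration under stated assumptions rather than the corrNominal implementation: the denominator of H_{y|x} is taken as the entropy of the column marginals, -\sum_j f_{.j} \log f_{.j}, so that the index is non-negative, and empty cells are assumed to contribute zero (0 log 0 = 0).

```matlab
% Contingency table of the first example (rows = x, columns = y)
N=[150  80 20  50
    80 250 30 140
    30  50  0 120];
F=N/sum(N,'all');      % relative frequencies f_ij
fi=sum(F,2);           % row marginals f_i.
fj=sum(F,1);           % column marginals f_.j
% Goodman-Kruskal tau (variation measured by the Gini index)
tauyx=(sum(F.^2./fi,'all')-sum(fj.^2))/(1-sum(fj.^2));
% Theil uncertainty coefficient (variation measured by the entropy)
T=F.*log(F./(fi*fj));
T(F==0)=0;             % empty cells contribute zero
Hyx=sum(T,'all')/(-sum(fj.*log(fj)));
fprintf('tau_{y|x}=%.6f H_{y|x}=%.5f\n',tauyx,Hyx)
```

Up to rounding, the results should agree with the values of tauyx (0.091674) and Hyx (0.08716) reported in the first example.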
A large value of the index represents a strong association, in the sense that we can guess y much better when we know x than when we do not.
In other words, \tau_{y|x}=H_{y|x} =1 is equivalent to no conditional variation, in the sense that for each i, f_{j|i}=1 for some j. For example:
\tau_{y|x}=0.85 indicates that knowledge of x reduces error in predicting values of y by 85 per cent (when the variation measure which is used is Gini's index).
H_{y|x}=0.85 indicates that knowledge of x reduces error in predicting values of y by 85 per cent (when the variation measure which is used is the entropy index).
Remark: if the contingency table is of size 2x2, the following indexes are also computed: theta=cross product ratio, index Q
Q= \frac{\theta-1}{\theta+1}
and index U
U= \frac{\sqrt{\theta}-1}{\sqrt{\theta}+1}
Agresti, A. (2002), "Categorical Data Analysis", John Wiley & Sons. [pp. 23-26]
Goodman, L.A. and Kruskal, W.H. (1959), Measures of association for cross classifications II: Further Discussion and References, "Journal of the American Statistical Association", Vol. 54, pp. 123-163.
Goodman, L.A. and Kruskal, W.H. (1963), Measures of association for cross classifications III: Approximate Sampling Theory, "Journal of the American Statistical Association", Vol. 58, pp. 310-364.
Goodman, L.A. and Kruskal, W.H. (1972), Measures of association for cross classifications IV: Simplification of Asymptotic Variances, "Journal of the American Statistical Association", Vol. 67, pp. 415-421.
Liebetrau, A.M. (1983), "Measures of Association", Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-004, Newbury Park, CA: Sage. [pp. 49-56]
Smithson, M.J. (2003), "Confidence Intervals", Quantitative Applications in the Social Sciences Series, No. 140. Thousand Oaks, CA: Sage. [pp. 39-41]
crosstab | rcontFS | CressieRead | corr | corrOrdinal