SparseTableTest computes an independence test for large and sparse contingency tables
This function implements a new test of independence between the distribution of the row variable ('outcomes') and the column variable ('treatments'), which is especially suited to the analysis of large and sparse $I$-by-$J$ contingency tables. The procedure collapses the original table into a set of 2-by-2 tables, one for each cell of the original table whose count is at least a small number (set by the optional input parameter 'threshold'), and tests each collapsed table for independence. Any test can be used: the Fisher exact test (default), the Barnard test, or one of the tests in the power divergence family of Cressie and Read.
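The collapsing step can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the toolbox code. It assumes that the 2-by-2 table for cell $(i, j)$ contrasts that cell with the rest of its row, the rest of its column, and all remaining cells, and that a cell is tested when its count is greater than or equal to the threshold. The names `collapse_cell` and `collapsed_tables` and the example table are hypothetical.

```python
def collapse_cell(table, i, j):
    """Collapse an I-by-J table into the 2-by-2 table for cell (i, j).

    Assumed layout (not confirmed by the function's documentation):
        [[n_ij,            rest of row i   ],
         [rest of col j,   everything else ]]
    """
    n_ij = table[i][j]
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    grand_total = sum(sum(row) for row in table)
    return [
        [n_ij, row_total - n_ij],
        [col_total - n_ij, grand_total - row_total - col_total + n_ij],
    ]


def collapsed_tables(table, threshold):
    """Return {(i, j): 2-by-2 table} for every cell with count >= threshold."""
    return {
        (i, j): collapse_cell(table, i, j)
        for i, row in enumerate(table)
        for j, count in enumerate(row)
        if count >= threshold
    }


# Small made-up sparse table: only three cells reach the threshold.
N = [[12, 0, 1],
     [0, 3, 0],
     [2, 0, 9]]
tables = collapsed_tables(N, threshold=3)
for cell, t in sorted(tables.items()):
    print(cell, t)
```

Each collapsed table keeps the grand total of the original table, so a test applied to it compares one cell against everything else rather than against a single other cell.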
By the Bonferroni inequality, a sufficient condition for the overall test to attain significance level $\alpha$ (i.e., for the probability of detecting a positive association between two levels of the response variables when in fact there is no such association to be at most $\alpha$) is that the test for each cell of the $I$-by-$J$ table rejects at significance level $\alpha$ divided by the number of comparisons made. An additional benefit of the procedure is that it highlights which individual entries of the original $I$-by-$J$ two-way table contribute most to the association. The original idea of this test is due to Spyros Arsenis (Joint Research Centre of the European Commission), and it has been successfully applied to the analysis of contingency tables arising from international trade data.
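The Bonferroni step can be sketched as follows, again in Python for illustration only and not as the toolbox implementation. The sketch assumes a one-sided (right-tail) Fisher exact test on each collapsed 2-by-2 table, in line with the "positive association" wording above, and takes the number of comparisons to be the number of tables tested. The function name `fisher_right_tail` and the example counts are hypothetical.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value for the 2-by-2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the observed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical collapsed tables, keyed by the cell they were built from.
collapsed = {
    (0, 0): (12, 1, 2, 12),
    (1, 1): (3, 0, 0, 24),
    (2, 2): (9, 2, 1, 15),
    (3, 3): (5, 5, 5, 5),
}

alpha = 0.05
m = len(collapsed)       # number of comparisons (assumed: one per tested cell)
cutoff = alpha / m       # Bonferroni-adjusted per-cell level

pvalues = {cell: fisher_right_tail(*t) for cell, t in collapsed.items()}
significant = sorted(cell for cell, p in pvalues.items() if p <= cutoff)
print(cutoff, significant)
```

Cells listed in `significant` are the ones whose individual test rejects at level $\alpha/m$; by the Bonferroni inequality, the chance of flagging at least one cell when no association exists is then at most $\alpha$.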
Cressie and Read test on collapsed contingency table.
out=SparseTableTest(N, Name, Value)
Arsenis, S. and Riani, M. (2019), Data mining large contingency tables standard approaches and a new method, in preparation.