RobRegrSize provides the appropriate threshold for robust estimators so that the empirical size of the test is close to the 1 per cent nominal size
thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim,Tallis)
Find the threshold for the MM estimator, Tukey biweight rho function with efficiency 0.87 (simultaneous size)
n=232; p=10; bdp=''; robest='MM'; eff=0.87; rhofunc='TB'; sizesim=1; thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim);
Find the threshold for the MM estimator, taking an average threshold over all rho functions, with efficiency 0.85 (simultaneous size)
n=93; p=5; bdp=''; eff=0.85; robest='MM'; rhofunc='ST'; sizesim=1; thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim);
Find the threshold for LTS estimator, use Tallis correction to infer a threshold for bdp equal to 0.27 (simultaneous size)
n=72; p=10; bdp=0.27; robest='LTS'; eff=''; rhofunc=''; sizesim=1; Tallis=1; thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim,Tallis);
Find the threshold for S estimator and hyperbolic rho function, use Tallis correction to infer a threshold for bdp equal to 0.3 (simultaneous size)
n=100; p=5; bdp=0.3; robest='S'; eff=''; rhofunc='HY'; sizesim=1; Tallis=1; thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim,Tallis);
n
— sample size.
Scalar integer. Number of units of the regression dataset.
REMARK - simulations have been done for n=50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500.
For other values of n, the thresholds are found by interpolation, using the two closest simulated values, one smaller and one greater than the supplied n.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
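The interpolation in n described in the REMARK above can be sketched as follows. This is only an illustration, not the code used inside RobRegrSize: the threshold values in threshGrid are hypothetical, and linear interpolation is an assumption (the text only says "interpolation"). The sketch also assumes 50 <= n <= 500.

```matlab
% Sample sizes for which simulations exist (from the REMARK above).
nGrid = [50 60 70 80 90 100 150 200 250 300 400 500];

% HYPOTHETICAL thresholds, one per grid value, for illustration only;
% the real values are stored in the MAT files used by RobRegrSize.
threshGrid = linspace(3.2, 2.8, numel(nGrid));

n = 232;                          % sample size that is not on the grid

% Two closest simulated sample sizes, one below and one above n.
nLow  = max(nGrid(nGrid <= n));
nHigh = min(nGrid(nGrid >= n));
tLow  = threshGrid(nGrid == nLow);
tHigh = threshGrid(nGrid == nHigh);

% Linear interpolation between the two neighbouring thresholds
% (ASSUMPTION: the documentation does not state the interpolation rule).
if nLow == nHigh
    threshN = tLow;               % n coincides with a simulated value
else
    w = (n - nLow) / (nHigh - nLow);
    threshN = (1 - w) * tLow + w * tHigh;
end
```

For n=232 the two neighbours are n=200 and n=250, so the result lies between the thresholds stored for those two sample sizes.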
p
— number of variables.
Scalar integer. Number of explanatory variables.
REMARK - simulations have been done for p=2, 3, ..., 10. If the user supplies a value of p greater than 10, the correction factors are extrapolated by fitting a simple quadratic model in p.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
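A minimal sketch of the quadratic extrapolation in p mentioned in the REMARK above. The correction factors in corrGrid are hypothetical values chosen for illustration, and the use of polyfit and polyval is an assumption about how such a fit could be done, not a description of the internal code.

```matlab
% Values of p for which simulations exist (from the REMARK above).
pGrid = 2:10;

% HYPOTHETICAL correction factors, one per simulated p (illustration only).
corrGrid = 1 + 0.02*pGrid + 0.001*pGrid.^2;

% Fit a quadratic model in p to the simulated correction factors.
coef = polyfit(pGrid, corrGrid, 2);

% Extrapolate the correction factor to a value of p greater than 10.
pNew    = 12;
corrNew = polyval(coef, pNew);
```

Because the quadratic is fitted only on p=2, ..., 10, the extrapolated value becomes less reliable the further p is from 10.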
robest
— robust estimator.
String. String which identifies the robust estimator which is used. Possible values are:
'S' S estimators;
'MM' MM estimators;
'LTS' Least trimmed squares estimator;
'LTSr' Least trimmed squares estimator reweighted.
If robest is missing, the MM estimator is used.
Data Types: char
rhofunc
— Weight function.
String. String which identifies the weight function which has been used for S or MM.
Possible values are:
'TB', for Tukey biweight rho function;
'HA', for Hampel rho function;
'HY', for hyperbolic rho function;
'OP', for optimal rho function;
'ST', for soft trimming estimators (in this case an average threshold based on TB, HY, HA and OP is used).
REMARK - this value is ignored if robest is 'LTS' or 'LTSr'.
If rhofunc is missing and robest is 'S' or 'MM', the default value of rhofunc is 'ST'.
Data Types: char
bdp
— breakdown point.
Scalar. Scalar between 0 and 0.5. If robest is 'S', 'LTS' or 'LTSr' and bdp is missing, a value of 0.5 is used as default.
REMARK - simulations have been done for bdp=0.25 and 0.50. If the user supplies a value of bdp smaller than 0.25, the threshold found for bdp=0.25 is used. In this case a warning is produced which alerts the user that the test is likely to be conservative. If, on the other hand, bdp is a value in the interval (0.25, 0.5), the average of the thresholds for bdp=0.25 and bdp=0.5 is used (for a more refined correction please see input option Tallis).
Data Types: single | double
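The rule for bdp described in the REMARK above (without the Tallis option) can be sketched as below. The two thresholds and the warning identifier are hypothetical and serve only to make the example self-contained.

```matlab
bdp = 0.3;                 % breakdown point supplied by the user

% HYPOTHETICAL simulated thresholds for the two available breakdown points.
thresh025 = 3.0;           % threshold simulated with bdp=0.25
thresh050 = 3.4;           % threshold simulated with bdp=0.50

if bdp < 0.25
    % Below the simulated range: reuse the bdp=0.25 threshold and warn.
    % The identifier 'sketch:bdpTooSmall' is made up for this example.
    warning('sketch:bdpTooSmall', ...
        'bdp is smaller than 0.25: the test is likely to be conservative');
    thresh = thresh025;
elseif bdp == 0.25
    thresh = thresh025;
elseif bdp < 0.5
    % Strictly between 0.25 and 0.5: simple average of the two thresholds.
    thresh = (thresh025 + thresh050) / 2;
else
    thresh = thresh050;
end
```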
eff
— nominal efficiency.
Scalar. Scalar between 0.5 and 1-epsilon (if robest is 'MM').
REMARK - simulations have been done for eff = 0.85, 0.90 and 0.95. If the user supplies a value of eff smaller than 0.85 (greater than 0.95), the threshold found for eff=0.85 (eff=0.95) is used. In all the other cases an average is taken using the two closest values of eff.
Data Types: single | double
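As an illustration of the rule for eff in the REMARK above, the sketch below picks the threshold for an efficiency that was not simulated. The thresholds in threshEff are hypothetical, and treating a value of eff that lies exactly on the grid as a direct lookup is an assumption.

```matlab
% Efficiencies for which simulations exist (from the REMARK above).
effGrid   = [0.85 0.90 0.95];

% HYPOTHETICAL thresholds, one per simulated efficiency.
threshEff = [3.1 3.0 2.9];

eff   = 0.87;              % efficiency supplied by the user
onGrid = abs(effGrid - eff) < 1e-12;

if eff <= effGrid(1)
    thresh = threshEff(1);             % eff <= 0.85: use eff=0.85
elseif eff >= effGrid(end)
    thresh = threshEff(end);           % eff >= 0.95: use eff=0.95
elseif any(onGrid)
    thresh = threshEff(onGrid);        % ASSUMPTION: direct lookup on the grid
else
    % Average of the thresholds of the two closest simulated efficiencies.
    [~, idx] = sort(abs(effGrid - eff));
    thresh   = mean(threshEff(idx(1:2)));
end
```

With eff=0.87 the two closest simulated efficiencies are 0.85 and 0.90, so the result is the mean of their two thresholds.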
sizesim
— simultaneous or individual size.
Scalar. Scalar which specifies whether simultaneous (sizesim=1) or individual size is used. If sizesim is missing or equal to 1, a simultaneous size is used.
Data Types: single | double
Tallis
— need to interpolate.
Scalar. Scalar which has an effect only if bdp is not equal to 0.25 or 0.5. If Tallis=1, the program computes the ratio between the asymptotic consistency factor for the breakdown point supplied by the user and the consistency factor for the closest breakdown point for which simulations exist. Therefore if, for example, the supplied breakdown point is smaller than 0.25, the program multiplies the empirical threshold obtained for bdp=0.25 by a number smaller than 1.
Similarly, if bdp>0.375, the program multiplies the empirical threshold obtained for bdp=0.5 by a number smaller than 1. If the supplied bdp is very close to 0.25 or 0.5 we suggest using this option; otherwise it is better to take a simple average of the thresholds associated with the two closest breakdown points for which simulations exist. The default value of Tallis is 0.
Data Types: single | double
thresh
— Empirical threshold.
Scalar. Empirical threshold which can be used in order to have a test with an empirical size close to the nominal size (1 per cent, individual or simultaneous).
We assume that the two input MAT files Ind_ThreshSm.mat and Sim_ThreshSm.mat are in the same folder or in the MATLAB path.
Ind_ThreshSm.mat contains a 3D array with the thresholds to use when an individual size is requested.
Sim_ThreshSm.mat contains a 3D array with the thresholds to use when a simultaneous size is requested.
The two 3D arrays have dimension 12-by-9-by-24.
The 12 rows refer to the 12 sample sizes which have been considered, namely n=50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500.
The 9 columns refer to the number of variables which have been considered, namely p=2, 3, ..., 10.
The third dimension is associated with the 24 estimators which have been used. The order of the estimators is:
' 1' 'LTSbdp050' ;
' 2' 'LTSbdp025' ;
' 3' 'LTSrbdp050';
' 4' 'LTSrbdp025';
' 5' 'Sbdp025TB' ;
' 6' 'Sbdp050TB' ;
' 7' 'MMeff085TB';
' 8' 'MMeff090TB';
' 9' 'MMeff095TB';
'10' 'Sbdp025OP' ;
'11' 'Sbdp050OP' ;
'12' 'MMeff085OP';
'13' 'MMeff090OP';
'14' 'MMeff095OP';
'15' 'Sbdp025HY' ;
'16' 'Sbdp050HY' ;
'17' 'MMeff085HY';
'18' 'MMeff090HY';
'19' 'MMeff095HY';
'20' 'Sbdp025HA' ;
'21' 'Sbdp050HA' ;
'22' 'MMeff085HA';
'23' 'MMeff090HA';
'24' 'MMeff095HA'.
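The layout described above suggests the following way of locating one entry of the 12-by-9-by-24 array. This is a sketch under stated assumptions: the name of the variable stored inside the MAT files is not given here, so the array Thresh3D below is a hypothetical stand-in filled with consecutive integers rather than real thresholds.

```matlab
% Grids of sample sizes and numbers of variables, as listed above.
nGrid = [50 60 70 80 90 100 150 200 250 300 400 500];
pGrid = 2:10;

% Estimator labels in the order of the third dimension, as listed above.
estNames = {'LTSbdp050',  'LTSbdp025',  'LTSrbdp050', 'LTSrbdp025', ...
            'Sbdp025TB',  'Sbdp050TB',  'MMeff085TB', 'MMeff090TB', ...
            'MMeff095TB', 'Sbdp025OP',  'Sbdp050OP',  'MMeff085OP', ...
            'MMeff090OP', 'MMeff095OP', 'Sbdp025HY',  'Sbdp050HY',  ...
            'MMeff085HY', 'MMeff090HY', 'MMeff095HY', 'Sbdp025HA',  ...
            'Sbdp050HA',  'MMeff085HA', 'MMeff090HA', 'MMeff095HA'};

% HYPOTHETICAL stand-in for the stored array (illustration only).
Thresh3D = reshape(1:(12*9*24), 12, 9, 24);

% Locate the entry for n=100, p=5, MM estimator with eff=0.85 and TB rho.
i = find(nGrid == 100);                      % row index (sample size)
j = find(pGrid == 5);                        % column index (variables)
k = find(strcmp(estNames, 'MMeff085TB'));    % page index (estimator)

t = Thresh3D(i, j, k);
```

The same three indices apply to both arrays, since the individual and simultaneous thresholds share the 12-by-9-by-24 layout.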
Salini S., Cerioli A., Laurini F. and Riani M. (2014), Reliable Robust Regression Diagnostics, "International Statistical Review", Vol. 84, pp. 99-127.