# RobRegrSize

RobRegrSize provides the threshold that gives robust estimators an empirical size close to the 1 per cent nominal size.

## Syntax

• thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim,Tallis)

## Description

 thresh = RobRegrSize(n, p, robest, rhofunc, bdp, eff, sizesim, Tallis) returns the empirical threshold for the requested estimator and type of size.

## Examples


### RobRegrSize with all default options.

Find the threshold for MM estimator, Tukey biweight rho function with efficiency 0.87 (simultaneous size)

n=232;
p=10;
bdp='';
robest='MM';
eff=0.87;
rhofunc='TB';
sizesim=1;
thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim);

## Related Examples


Find the threshold for MM estimator, taking an average threshold over all rho functions and using efficiency 0.85 (simultaneous size)

n=93;
p=5;
bdp='';
eff=0.85;
robest='MM';
rhofunc='ST';
sizesim=1;
thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim);

Find the threshold for LTS estimator, use Tallis correction to infer a threshold for bdp equal to 0.27 (simultaneous size)

n=72;
p=10;
bdp=0.27;
robest='LTS';
eff='';
rhofunc='';
sizesim=1;
Tallis=1;
thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim,Tallis);

Find the threshold for S estimator and hyperbolic rho function, use Tallis correction to infer a threshold for bdp equal to 0.3 (simultaneous size)

n=100;
p=5;
bdp=0.3;
robest='S';
eff='';
rhofunc='HY';
sizesim=1;
Tallis=1;
thresh=RobRegrSize(n,p,robest,rhofunc,bdp,eff,sizesim,Tallis);

## Input Arguments

### n — sample size. Scalar integer.

Number of units of the regression dataset.

REMARK - simulations have been done for n=50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500.

For other values of n the threshold is found by interpolation between the two closest simulated values of n, one smaller and one greater than the supplied value.
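The interpolation over n can be pictured with a minimal sketch. It assumes the interpolation is linear, which the remark above does not state, and it uses placeholder thresholds (`threshGrid`) that are not taken from the stored MAT files.

```matlab
% Sample sizes for which simulated thresholds exist (from the remark above).
nGrid = [50 60 70 80 90 100 150 200 250 300 400 500];

% Hypothetical thresholds, one per grid value. These are placeholder
% numbers, NOT values from Ind_ThreshSm.mat or Sim_ThreshSm.mat.
threshGrid = linspace(3.2, 2.8, numel(nGrid));

n = 232;   % requested sample size, which is not on the grid

% Interpolate between the two closest simulated sample sizes (200 and 250).
% Linear interpolation is an assumption of this sketch.
threshN = interp1(nGrid, threshGrid, n, 'linear');
```

With n=232 the result lies between the placeholder thresholds for n=200 and n=250, and is closer to the one for n=250.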

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

### p — number of variables. Scalar integer.

Number of explanatory variables.

REMARK - simulations have been done for p=2, 3, ..., 10. If the user supplies a value of p greater than 10 the correction factors are extrapolated by fitting a simple quadratic model in p.
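As an illustration of the extrapolation in p, the sketch below fits a quadratic to correction factors on the simulated grid and evaluates it at p=12. The factors in `corrGrid` are invented for the example, and the use of `polyfit` and `polyval` is an assumption about how such a fit could be done, not a description of the function's internals.

```matlab
% Values of p for which simulations exist (from the remark above).
pGrid = 2:10;

% Hypothetical correction factors on the grid (placeholder numbers).
% They are exactly quadratic in p so that the fit is easy to check.
corrGrid = 1 + 0.02 * pGrid + 0.001 * pGrid.^2;

% Fit a quadratic model in p by least squares.
coef = polyfit(pGrid, corrGrid, 2);

% Extrapolate to a value of p outside the simulated range.
p = 12;
corrP = polyval(coef, p);
```

Extrapolation becomes less reliable as p moves further from 10, because the quadratic is only supported by the nine simulated points.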

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

### robest — robust estimator. String.

String which identifies the robust estimator which is used. Possible values are:

'S' S estimators;

'MM' MM estimators;

'LTS' Least trimmed squares estimator;

'LTSr' Least trimmed squares estimator reweighted.

If robest is missing, the MM estimator is used.

Data Types: char

### rhofunc — Weight function. String.

String which identifies the weight function which has been used for S or MM. Possible values are:

'TB' Tukey biweight rho function;

'HA' Hampel rho function;

'HY' hyperbolic rho function;

'OP' optimal rho function;

'ST' Soft trimming estimator (in this case an average threshold based on TB, HY, HA and OP is used).

REMARK - this value is ignored if robest is 'LTS' or 'LTSr'.

If rhofunc is missing and robest is 'S' or 'MM', the default value of rhofunc is 'ST'.

Data Types: char

### bdp — breakdown point. Scalar.

Scalar between 0 and 0.5. If robest is S, LTS or LTSr and bdp is missing a value of 0.5 is used as default.

REMARK - simulations have been done for bdp=0.25 and 0.50. If the user supplies a value of bdp smaller than 0.25, the threshold found for bdp=0.25 is used. In this case a warning is produced which alerts the user that the test is likely to be conservative. If, on the other hand, bdp is a value in the open interval (0.25, 0.5), the average of the thresholds for bdp=0.25 and bdp=0.5 is used (for a more refined correction please see input option Tallis).
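The rule for bdp when Tallis=0 can be written as a short branch. This is a sketch of the rule as described in the remark, not the function's source code: the two thresholds and the warning identifier are hypothetical.

```matlab
bdp = 0.27;        % breakdown point supplied by the user

% Hypothetical simulated thresholds (placeholder numbers).
thresh025 = 2.9;   % threshold simulated for bdp=0.25
thresh050 = 3.1;   % threshold simulated for bdp=0.50

if bdp < 0.25
    % Below the simulated range: reuse the bdp=0.25 threshold and warn.
    warning('Sketch:conservativeTest', ...
        'bdp is smaller than 0.25: the test is likely to be conservative.');
    threshBdp = thresh025;
elseif bdp == 0.25
    threshBdp = thresh025;
elseif bdp < 0.5
    % Strictly between the two simulated values: simple average.
    threshBdp = (thresh025 + thresh050) / 2;
else
    threshBdp = thresh050;
end
```

Under this rule every bdp strictly between 0.25 and 0.5 receives the same averaged threshold, which is why the Tallis option is offered as a finer correction.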

Data Types: single | double

### eff — nominal efficiency. Scalar.

Scalar between 0.5 and 1-epsilon (if robest is 'MM').

REMARK - simulations have been done for eff = 0.85, 0.90 and 0.95. If the user supplies a value of eff smaller than 0.85 (greater than 0.95), the threshold found for eff=0.85 (eff=0.95) is used. In all the other cases an average is taken using the thresholds for the two closest values of eff.
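The rule for eff follows the same pattern and can be sketched as below. The thresholds in `threshEff` are placeholders, and the unweighted mean of the two neighbouring thresholds is an assumption about what "an average" means here.

```matlab
% Efficiencies for which simulations exist (from the remark above).
effGrid = [0.85 0.90 0.95];

% Hypothetical thresholds, one per simulated efficiency (placeholders).
threshEff = [2.95 3.00 3.05];

eff = 0.87;   % efficiency supplied by the user

% Values outside the simulated range are moved to the nearest endpoint.
effClamped = min(max(eff, effGrid(1)), effGrid(end));

onGrid = abs(effGrid - effClamped) < 1e-12;
if any(onGrid)
    % A simulated efficiency: use its threshold directly.
    threshMM = threshEff(onGrid);
else
    % Otherwise average the thresholds of the two closest efficiencies.
    lowerIdx = find(effGrid < effClamped, 1, 'last');
    threshMM = mean(threshEff([lowerIdx, lowerIdx + 1]));
end
```

For eff=0.87 the two closest simulated efficiencies are 0.85 and 0.90, so the sketch returns the mean of their placeholder thresholds.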

Data Types: single | double

### sizesim — simultaneous or individual size. Scalar.

Scalar which specifies whether simultaneous (sizesim=1) or individual size is used. If sizesim is missing or equal to 1 a simultaneous size is used.

Data Types: single | double

### Tallis — need to interpolate. Scalar.

Scalar which has an effect only if bdp is not equal to 0.25 or 0.5. If Tallis=1 the program computes the ratio between the asymptotic consistency factor for the breakdown point supplied by the user and the consistency factor for the closest breakdown point for which simulations exist. For example, if the supplied breakdown point is smaller than 0.25, the program multiplies the empirical threshold for bdp=0.25 by a number smaller than 1.

Similarly, if bdp>0.375 the program multiplies the empirical threshold for bdp=0.5 by a number smaller than 1. If the supplied bdp is very close to 0.25 or 0.5 we suggest using this option; otherwise it is better to take a simple average of the thresholds associated with the two closest breakdown points for which simulations exist. The default value of Tallis is 0.
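The sketch below illustrates how such a ratio of consistency factors behaves. It assumes the consistency factor of a trimmed scale estimate under the normal model (the LTS case), written here on the scale rather than the squared-scale level. The exact factor used by the function, in particular for S estimators, may differ, and `consFactor`, `thresh025` and `thresh050` are hypothetical names with placeholder values.

```matlab
% Standard normal quantile and density using base MATLAB functions only.
normQuantile = @(prob) -sqrt(2) * erfcinv(2 * prob);
normDensity  = @(z) exp(-z.^2 / 2) / sqrt(2 * pi);

% Assumed asymptotic consistency factor for a scale estimate based on the
% central fraction (1 - bdp) of a standard normal sample.
consFactor = @(bdp) 1 ./ sqrt(1 - 2 * normQuantile(0.5 * (2 - bdp)) ...
    .* normDensity(normQuantile(0.5 * (2 - bdp))) ./ (1 - bdp));

bdp = 0.27;        % breakdown point supplied by the user

% Hypothetical simulated thresholds (placeholder numbers).
thresh025 = 2.9;
thresh050 = 3.1;

% Pick the closest breakdown point for which simulations exist.
if bdp <= 0.375
    bdpSim = 0.25;
    threshSim = thresh025;
else
    bdpSim = 0.5;
    threshSim = thresh050;
end

% Rescale the simulated threshold by the ratio of consistency factors.
ratio = consFactor(bdp) / consFactor(bdpSim);
threshTallis = ratio * threshSim;
```

Under this assumed factor the ratio is below 1 for bdp<0.25 and for bdp between 0.375 and 0.5, in line with the description above, and above 1 for bdp between 0.25 and 0.375.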

Data Types: single | double

## Output Arguments

### thresh — Empirical threshold. Scalar.

Empirical threshold which can be used in order to have a test with an empirical size close to the nominal size (1 per cent, individual or simultaneous).

We assume that the two input MAT files Ind_ThreshSm.mat and Sim_ThreshSm.mat are in the same folder or in the MATLAB path.

Ind_ThreshSm.mat contains a 3D array with the thresholds used when an individual size is requested. Sim_ThreshSm.mat contains a 3D array with the thresholds used when a simultaneous size is requested. The two 3D arrays have dimension 12-by-9-by-24. The 12 rows refer to the 12 sample sizes which have been considered, namely n=50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500.

The 9 columns refer to the number of variables which have been considered, namely p=2, 3, ..., 10.

The third dimension is associated with the 24 estimators which have been used. The order of the estimators is:

' 1' 'LTSbdp050' ;

' 2' 'LTSbdp025' ;

' 3' 'LTSrbdp050';

' 4' 'LTSrbdp025';

' 5' 'Sbdp025TB' ;

' 6' 'Sbdp050TB' ;

' 7' 'MMeff085TB';

' 8' 'MMeff090TB';

' 9' 'MMeff095TB';

'10' 'Sbdp025OP' ;

'11' 'Sbdp050OP' ;

'12' 'MMeff085OP';

'13' 'MMeff090OP';

'14' 'MMeff095OP';

'15' 'Sbdp025HY' ;

'16' 'Sbdp050HY' ;

'17' 'MMeff085HY';

'18' 'MMeff090HY';

'19' 'MMeff095HY';

'20' 'Sbdp025HA' ;

'21' 'Sbdp050HA' ;

'22' 'MMeff085HA';

'23' 'MMeff090HA';

'24' 'MMeff095HA'.
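The layout described above can be mimicked with a small lookup sketch. A zero-filled placeholder array (`Thresh`, a hypothetical name) stands in for the stored 3D arrays, since the variable names inside the MAT files are not given here; only the grids and the estimator order are taken from this page.

```matlab
% Grids and estimator order as listed above.
nGrid = [50 60 70 80 90 100 150 200 250 300 400 500];
pGrid = 2:10;
estLabels = {'LTSbdp050', 'LTSbdp025', 'LTSrbdp050', 'LTSrbdp025', ...
    'Sbdp025TB', 'Sbdp050TB', 'MMeff085TB', 'MMeff090TB', 'MMeff095TB', ...
    'Sbdp025OP', 'Sbdp050OP', 'MMeff085OP', 'MMeff090OP', 'MMeff095OP', ...
    'Sbdp025HY', 'Sbdp050HY', 'MMeff085HY', 'MMeff090HY', 'MMeff095HY', ...
    'Sbdp025HA', 'Sbdp050HA', 'MMeff085HA', 'MMeff090HA', 'MMeff095HA'};

% Placeholder array with the documented 12-by-9-by-24 shape.
% It does NOT contain the real simulated thresholds.
Thresh = zeros(numel(nGrid), numel(pGrid), numel(estLabels));

% Locate the entry for n=100, p=5 and the MM estimator with
% efficiency 0.85 and Tukey biweight rho function.
iN   = find(nGrid == 100);
iP   = find(pGrid == 5);
iEst = find(strcmp(estLabels, 'MMeff085TB'));

value = Thresh(iN, iP, iEst);
```

Looking labels up by name avoids hard-coding the position of an estimator along the third dimension, where the order of the bdp=0.25 and bdp=0.50 entries differs between the LTS and S blocks.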

## References

Salini S., Cerioli A., Laurini F. and Riani M. (2016), Reliable Robust Regression Diagnostics, "International Statistical Review", Vol. 84, pp. 99-127.