Mlocsca computes M estimator of location and scale in univariate samples
This routines compute simultaneous M estimates of location and scale in univariate samples.
Example of use of Mlocsca with four input arguments.outIRWLS
=Mlocsca(y
,
psifunc
,
initialmu
,
initialscale
)
Example of M estimate of location and scale using TB function.
psifuncTB=struct; psifuncTB.class='TB'; bdp=0.5; c=TBbdp(bdp,1); % kc = E(rho) = sup(rho)*bdp kc=c^2/6*bdp; psifuncTB.c1=c; psifuncTB.kc1=kc; % trueloc and truescale are the true values of the two parameters trueloc=5; truescale=2; % fraction of units to contaminate fraccontamination=0.3; % shift in contamination shift=100; % n is sample size n=2000; rng(1000) u=truescale*randn(n,1)+trueloc; n1=round(n*fraccontamination); % shift contamination u(1:n1)=u(1:n1)+shift; % Specify initial estimate of location. initialmu=10; out=Mlocsca(u,psifuncTB,initialmu); disp('Robust estimate of location') disp(out.location) disp('Robust estimate of scale') disp(out.scale)
M estimate of location ans scale using Hampel rho function with a value of c associated to a breakdown point of 0.5
psifunc=struct; psifunc.class='HA'; abc=[1.5 3.5 8]; bdp=0.5; c=HAbdp(bdp,1,abc); psifunc.c1=[c abc]; % kc = E(rho) = sup(rho)*bdp kc=HArho(c*abc(3),[c, abc])*bdp; psifunc.c1=[c, abc]; psifunc.kc1=kc; % trueloc and truescale are the true values of the two parameters trueloc=5; truescale=2; % n is sample size n=20000; u=truescale*randn(n,1)+trueloc; % Just 2 outliers with point mass contamination u(1:2)=10000000; out=Mlocsca(u,psifunc); disp('Robust estimate of location') disp(out.location) disp('Robust estimate of scale') disp(out.scale)
M estimate of location and scale using optimal rho function with a value of c associated to a breakdown point of 0.3
psifuncOPT=struct; psifuncOPT.class='OPT'; bdp=0.3; cOPT=OPTbdp(bdp,1); rhoOPTsup=OPTrho(200000,1); % rhoOPTsup=1; psifuncOPT.c1=cOPT; % kc = E(rho) = sup(rho)*bdp psifuncOPT.kc1=rhoOPTsup*bdp; % trueloc and truescale are the true values of the two parameters trueloc=5; truescale=2; % n is sample size n=200; u=truescale*randn(n,1)+trueloc; % Just 10 outliers with point mass contamination u(1:10)=10000000; % 10 and 8 are our initial estimates of location and scale out=Mlocsca(u,psifuncOPT,10,8); disp('Robust estimate of location') disp(out.location) disp('Robust estimate of scale') disp(out.scale)
M estimate of location ans scale using power divergence rho function with a value of c associated to a breakdown point of 0.2
psifuncPD=struct; psifuncPD.class='PD'; bdp=0.2; c1=PDbdp(bdp); psifuncPD.c1=c1; psifuncPD.kc1=bdp; % trueloc and truescale are the true values of the two parameters trueloc=5; truescale=2; % n is sample size n=200; u=truescale*randn(n,1)+trueloc; % Just 10 outliers with point mass contamination u(1:20)=10000000; % 10 and 8 are our initial estimates of location and scale % Set the tolerance tol=1e-20; outIRWLS=Mlocsca(u,psifuncPD,10,8,tol); disp('Robust estimate of location') disp(outIRWLS.location) disp('Robust estimate of scale') disp(outIRWLS.scale)
Robust estimate of location 5.0739 Robust estimate of scale 3.1579
M estimate of location and scale using hyperbolic rho function with a value of c associated to a breakdown point of 0.11
psifuncHYP=struct; psifuncHYP.class='HYP'; bdp=0.11; [cHYP,A,B,d]=HYPbdp(bdp,1); k=6; rhoHYPsup=HYPrho(200000,[cHYP,k,A,B,d]); % rhoHAsup=1; psifuncHYP.c1=[cHYP,k,A,B,d]; % kc = E(rho) = sup(rho)*bdp psifuncHYP.kc1=rhoHYPsup*bdp; % trueloc and truescale are the true values of the two parameters trueloc=5; truescale=2; % n is sample size n=200; u=truescale*randn(n,1)+trueloc; % Just 10 outliers with point mass contamination u(1:20)=10000000; % 10 and 8 are our initial estimates of location and scale % Set the tolerance tol=1e-20; % Set maximum number of iterations maxiter=500; out=Mlocsca(u,psifuncHYP,10,8,tol,maxiter); disp('Robust estimate of location') disp(out.location) disp('Robust estimate of scale') disp(out.scale)
y
— A vector with n elements which contains the response variable.
It can be both a row or column vector.
Data Types: single| double
psifunc
— a structure specifying the class of rho function to use, the
consistency factor, and the value associated with the
Expectation of rho in correspondence of the consistency
factor.
struct.Psifunc is a struct which contains the following fields
Value | Description |
---|---|
class |
string identyfing the rho (psi) function to use. Admissible values for class are 'bisquare', 'optimal' 'hyperbolic' and 'hampel' |
c1 |
consistency factor associated to required breakdown point More precisely, psifunc.c1(1) contains consistency factor associated to required breakdown point or nominal efficiency psifunc.c1(2:end) may contain other parameters associated with the rho (psi) function. For example, if psifunc.class is 'hampel', c1(2:4) must contain parameters (a, b and c) of Hampel rho function. If length(psifunc.c1)=1 psifunc.class is HA, then default parameters for Hampel rho function are used. Remark: if class is 'hyperbolic' it is also possible to specify parameters k (sup CVC), A, B and d |
kc1 |
Expectation of rho associated with c1 |
Data Types: struct
initialmu
— starting value of the location estimate.
Scalar.The initial estimate of location to use in the first iteration.
If not defined, initialmu is set equal to median(y).
Example: 0.34
Data Types: double
initialscale
— starting value of the dispersion estimate.
Scalar.The initial estimate of the scale to use in the first iteration.
If not defined, initialscale is set equal to 1.4826*mad(y,1).
Example: 1.32
Data Types: double
tol
— scalar.
The tolerance for controlling convergence.If not defined, tol is fixed to 1e-7.
Example: 1e-10
Data Types: double
maxiter
— scalar.
Maximum number of iterations to find the location estimate.If not defined, maxiter is fixed to 200.
Example: 100
Data Types: double
outIRWLS
— description
StructureA structure containing the following fields:
Value | Description |
---|---|
location |
Location estimate. Estimate of location after refsteps refining steps |
scale |
scale estimate. Estimate of scale after refsteps refining step |
weights |
n-by-1 vector. Weights assigned to each observation |
In the IRWLS (iterative reweighted least square) procedure the value of location and the value of the scale are updated in each step. In order to find simultaneous estimates of location and dispersion we need to solve the system of two equations \[ \sum_{i=1}^n \psi_\text{loc}\left( \frac{y_i - \hat{\mu}}{\hat{\sigma}}\right) = 0 \] \[ \sum_{i=1}^n \rho_\text{scale}\left( \frac{y_i - \hat{\mu}}{\hat{\sigma}}\right) = K. \]
In the two equations above we distinguish between $\rho_\text{scale}$ used for scale estimation and $\psi_\text{loc}$ and its derivatives used for location. The two need not be different and are, indeed, often the same. In this routine we assume that they are the same and are specified in input parameter psifunc.
$K$ corresponds to input parameter psifunc.kc1.
Given starting values $\hat{\mu}_0$ and $\hat{\sigma}_0$ the pair of reweighting equations moves forward from stage $k$ using the calculations
Note that in order to compute $\rho_\text{scale}$ we need psifunc.c1, the tuning constant associated to a nominal value of breakdown point or efficiency. $K$ corresponds to input parameter psifunc.kc1.
is less than a prespecified tolerance (optional input argument tol) or $k$ is equal to optional input parameter maxiter (maximum number of iterations).
This alternating algorithm converges to a point with zero derivatives, which may be a minimum, a maximum or a saddle point.
Maronna, R.A., Martin D. and Yohai V.J. (2006), "Robust Statistics, Theory and Methods", Wiley, New York.