
Technical introduction to Robust methods in multivariate analysis

Notation

Classical multivariate methods of estimation are based on the assumption of an i.i.d. sample of observations, with each y_i having a v-variate normal distribution N_v(\mu, \Sigma) with density f(y) given by

 

f(y)=\frac{1}{(2\pi)^{v/2}\sqrt{|\Sigma|}}\exp \Big( -\frac{1}{2}(y-\mu)' \Sigma^{-1}(y-\mu)\Big)

 

where \Sigma = \textrm{var}(y) and |\Sigma| stands for the determinant of \Sigma. If y is multivariate normal, then for any constant vector a the linear combination a'y is normally distributed. In addition, since the conditional expectation of any subgroup of coordinates given any other subgroup is a linear function of the latter, the dependence among variables is linear. Thus, methods based on multivariate normality only give information about linear relationships among coordinates. As in the univariate case, the main reason for this assumption is simplicity.
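
As a quick numerical check of the density formula above, here is a minimal NumPy sketch (the function name mvn_density is ours; scipy.stats.multivariate_normal provides an equivalent, well-tested implementation):

import numpy as np

def mvn_density(y, mu, Sigma):
    # Evaluate the N_v(mu, Sigma) density at the point y.
    v = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(Sigma, diff)          # (y-mu)' Sigma^{-1} (y-mu)
    const = (2 * np.pi) ** (v / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / const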

If one dispersion matrix is a scalar multiple of another, i.e. \Sigma_2=k\Sigma_1, we say that they have the same shape but a different size. In what follows we denote by \hat \mu the estimate of location, by \hat \Gamma the estimate of the shape matrix and by \hat \Sigma the estimate of the covariance matrix, where \hat \Sigma= \hat \sigma^2 \hat \Gamma.

As in the univariate case, one may consider the approach of outlier detection. We are concerned with the case where a certain proportion of the observations may not follow the model above. The Mahalanobis distance (in squared units) between the vectors y and \mu with respect to the matrix \Sigma is defined as

d(y,\mu,\Sigma)=(y-\mu)'\Sigma^{-1}(y-\mu)

Remark: the Mahalanobis distances from a given centroid and a scatter matrix (which are not necessarily the sample mean and the sample covariance matrix) can be computed with function mahalFS.
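
The toolbox routine mahalFS is a MATLAB function; purely as a language-neutral illustration of the formula for d(y,\mu,\Sigma), a minimal NumPy sketch (the name mahal_sq is hypothetical) could be:

import numpy as np

def mahal_sq(Y, mu, Sigma):
    # Squared Mahalanobis distances of the rows of Y from the centroid mu,
    # with respect to the scatter matrix Sigma (solve instead of inverting).
    diff = Y - mu
    sol = np.linalg.solve(Sigma, diff.T)                # Sigma^{-1}(y_i - mu), columnwise
    return np.sum(diff.T * sol, axis=0)

# Example: distances from the sample mean and the sample covariance matrix.
rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 3))
d2 = mahal_sq(Y, Y.mean(axis=0), np.cov(Y, rowvar=False))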

 

Affine equivariance

Let (\tilde \mu(Y),\tilde \Sigma(Y)) be the location and dispersion estimates corresponding to a sample Y=\{y_1,y_2,\dots,y_n\}. The estimates are affine equivariant if, for each constant vector b and each nonsingular matrix A,

\tilde \mu (AY+b)=A\tilde \mu (Y)+b \quad \quad \quad \tilde \Sigma(AY+b)=A\tilde \Sigma (Y)A'

 

Affine equivariance is a natural property in those situations where it is desirable that the results remain essentially unchanged under any nonsingular linear transformation, as in linear discriminant analysis, canonical correlation and factor analysis.
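
For instance, the sample mean and the sample covariance matrix are affine equivariant. A short NumPy check of the two identities above (purely illustrative):

import numpy as np

rng = np.random.default_rng(1)
n, v = 200, 3
Y = rng.standard_normal((n, v))
A = rng.standard_normal((v, v))                 # almost surely nonsingular
b = rng.standard_normal(v)
Z = Y @ A.T + b                                 # rows z_i = A y_i + b

mu_Y, S_Y = Y.mean(axis=0), np.cov(Y, rowvar=False)
mu_Z, S_Z = Z.mean(axis=0), np.cov(Z, rowvar=False)

assert np.allclose(mu_Z, A @ mu_Y + b)          # location equivariance
assert np.allclose(S_Z, A @ S_Y @ A.T)          # dispersion equivariance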

 

The multivariate M-estimate

In the regression case it was possible to define separately robust equivariant estimates of location and of dispersion. This is more complicated to do in the multivariate case, and if we require equivariant estimates it is better to estimate location and dispersion simultaneously.

The multivariate M-estimates of location and dispersion are defined as the solution of the following system of equations (the estimating equations)

\begin{cases} \sum_{i=1}^n W_1(d_i)(y_i-\hat \mu)=0\\ \\ \frac{1}{n}\sum_{i=1}^n W_2(d_i)(y_i-\hat \mu)(y_i-\hat \mu)'=\hat \Sigma \\ \end{cases}

where the functions W_1 and W_2 need not be equal.

Remark 1: if the function W_2 is nondecreasing, the solutions to this system of equations are called monotone multivariate M-estimates, while if W_2 is redescending the solutions are called redescending multivariate M-estimates and must be defined through the minimization of some objective function, as happens in the case of S-estimates (which are introduced in the next section).

Remark 2: it is possible to show that the multivariate M-estimates are affine equivariant and asymptotically have a multivariate normal distribution.
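
The estimating equations suggest a fixed-point (iteratively reweighted) algorithm: alternate between computing the distances and recomputing the weighted mean and weighted scatter. A minimal sketch, assuming Huber-type weights W_1 = W_2 = \min(1, c/d) applied to the (non-squared) distances, median starting values, and a fixed number of iterations (actual implementations differ in weight functions, starting values and convergence checks):

import numpy as np

def m_estimate(Y, c=3.0, n_iter=100):
    # Fixed-point iteration for the multivariate M-estimating equations.
    n, v = Y.shape
    mu = np.median(Y, axis=0)                   # robust starting values
    Sigma = np.cov(Y, rowvar=False)
    for _ in range(n_iter):
        diff = Y - mu
        d = np.sqrt(np.sum(diff.T * np.linalg.solve(Sigma, diff.T), axis=0))
        w = np.minimum(1.0, c / np.maximum(d, 1e-12))   # here W_1 = W_2
        mu = (w[:, None] * Y).sum(axis=0) / w.sum()     # weighted mean
        diff = Y - mu
        Sigma = (w[:, None] * diff).T @ diff / n        # weighted scatter
    return mu, Sigma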

 

The multivariate S-estimates and the MVE estimate

Just as with the regression estimates, where we aimed at making the residuals "small", we shall define multivariate estimates of location and dispersion that make the distances d_i small. To this purpose we look for \hat \mu and \hat \Sigma = \hat \sigma^2 \hat \Gamma minimizing some measure of "largeness" of d(y,\hat \mu,\hat \Sigma). From the expression of the Mahalanobis distance it is clear that the minimum can be trivially attained by letting the smallest eigenvalue of \hat \Sigma tend to zero. To prevent this, it is customary to impose the constraint |\hat \Gamma|=1. Let S_v be the set of symmetric positive definite v \times v shape matrices. For a dataset Y call d(Y,\hat \mu, \hat \Gamma) the vector with elements d(y_i,\hat \mu, \hat \Gamma), i=1,2,\dots,n, and let \hat \sigma be a robust scale estimate.

Then we define the estimates \hat \mu and \hat \Gamma by

\min \limits_{\hat \mu \in R^v, \; \hat \Gamma \in S_v \; \textrm{with} \; |\hat \Gamma|=1} \hat \sigma \Big(d(Y,\hat \mu,\hat \Gamma) \Big)

If we take for \hat \sigma an M-scale estimate that satisfies

\frac{1}{n} \sum_{i=1}^n \rho \Bigg(\frac{d_i}{\hat \sigma} \Bigg) = K

where \rho is a smooth bounded \rho-function, we obtain the class of S-estimates. If \rho is differentiable, it is easy to show that the solution of the minimization problem must satisfy the M-estimating equations above. So an S-estimate (\hat \mu_S, \hat \Sigma_S) is also an M-estimate.

In other words, for any \tilde \mu, \tilde \Gamma with |\tilde \Gamma|=1, and with \hat \sigma_S = \hat \sigma_S \Big(d(Y,\hat \mu_S, \hat \Gamma_S) \Big), the S-estimates \hat \mu_S and \hat \Gamma_S satisfy the following inequality

\sum_{i=1}^n \rho \Bigg( \frac{d(y_i,\hat \mu_S, \hat \Gamma_S)}{\hat \sigma_S}\Bigg) \leq \sum_{i=1}^n \rho \Bigg( \frac{d(y_i,\tilde \mu,\tilde \Gamma)}{\hat \sigma_S} \Bigg)

where \hat \sigma_S is the same in the denominator on both sides of the inequality.

The S-estimates of location \hat \mu_S, scale \hat \sigma_S and shape \hat \Gamma_S, and consequently the S-estimate of the covariance matrix \hat \Sigma_S=\hat \sigma_S^2 \hat \Gamma_S, can be found using routine Smult.
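
For a fixed vector of distances, the M-scale \hat \sigma defined by the equation above can be obtained with a one-dimensional root search. A minimal sketch, assuming Tukey's biweight \rho (rescaled so that \rho(t)=1 for |t| \geq c) and K = 0.5, a common choice giving 50% breakdown (tuning constants are illustrative):

import numpy as np
from scipy.optimize import brentq

def rho_biweight(t, c=1.547):
    # Tukey biweight rho, rescaled so that rho(t) = 1 for |t| >= c.
    u = np.minimum(np.abs(t) / c, 1.0)
    return u**2 * (3 - 3 * u**2 + u**4)

def m_scale(d, K=0.5, c=1.547):
    # Solve (1/n) * sum_i rho(d_i / sigma) = K for sigma.
    f = lambda s: rho_biweight(d / s, c).mean() - K
    return brentq(f, 1e-8, 1e8)                 # assumes the root lies in this bracket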

 

If we take for \hat \sigma (to mimic the approach that results in the LMS in regression) the sample median of the Mahalanobis distances, the resulting location and dispersion matrix estimate is called the minimum volume ellipsoid (MVE) estimate. The name comes from the fact that, among all ellipsoids \{x: d(x,\mu,\Sigma) \leq c\} containing at least half of the data points, the one given by the MVE estimate has minimum volume (i.e. minimum |\Sigma|).

The MVE estimate of location \hat \mu_{MVE} and of the covariance matrix \hat \Sigma_{MVE} can be found using routine mve.
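
A common way to approximate the MVE estimate is subsampling: draw many elemental subsets of v+1 observations, rescale each subset covariance matrix so that the corresponding ellipsoid contains half of the data (multiply it by the median squared distance), and keep the candidate with the smallest determinant. A rough sketch under these assumptions (not necessarily the algorithm used by routine mve):

import numpy as np

def mve_approx(Y, n_sub=500, seed=0):
    # Crude subsampling approximation to the MVE estimate.
    rng = np.random.default_rng(seed)
    n, v = Y.shape
    best_det, best = np.inf, None
    for _ in range(n_sub):
        idx = rng.choice(n, size=v + 1, replace=False)
        mu = Y[idx].mean(axis=0)
        S = np.cov(Y[idx], rowvar=False)
        if np.linalg.det(S) < 1e-12:            # skip (nearly) singular subsets
            continue
        diff = Y - mu
        d2 = np.sum(diff.T * np.linalg.solve(S, diff.T), axis=0)
        S_half = np.median(d2) * S              # ellipsoid now covers half the data
        if np.linalg.det(S_half) < best_det:
            best_det, best = np.linalg.det(S_half), (mu, S_half)
    return best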

 

 

The multivariate L-estimates of scale and the MCD estimate

As an alternative to using an M-scale or the median of the distances, we can use a trimmed scale for \hat \sigma (as was done in regression to define the LTS estimate). More formally, let

d_{(1)} \leq d_{(2)} \leq \ldots \leq d_{(n)}

be the ordered values of the squared distances d_i = d(y_i, \mu, \Gamma), and for 1 < h < n define the trimmed scale of the squared distances as:

\hat \sigma = \sum_{i=1}^h d_{(i)}

An estimate (\hat \mu, \hat \Sigma) based on this trimmed scale is called a minimum covariance determinant (MCD) estimate. The reason for this name is the following: for each ellipsoid \{x: d(x, t, V) \leq c\} we can compute the covariance matrix C of the data points in the ellipsoid. If (\hat \mu, \hat \Sigma) is an MCD estimate, the ellipsoid with t=\hat \mu_{MCD} and V equal to a scalar multiple of \hat \Sigma_{MCD} minimizes |\textrm{C}|.

The MCD estimate of location \hat \mu_{MCD} and of the covariance matrix \hat \Sigma_{MCD} can be found using routine mcd. For each elemental subset which is extracted (and which satisfies certain conditions), the determinant of the covariance matrix is computed, checking for singularity. For the subsets with the bestr (by default 5) smallest values of the determinant, a series of refining steps (C-steps) is carried out.
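
The core of each refining step is the concentration step: given the current subset of h observations, compute its mean and covariance matrix, and replace the subset with the h observations whose resulting distances are smallest; it can be shown that the determinant of the subset covariance never increases. A minimal illustrative sketch of one C-step (not the toolbox code):

import numpy as np

def c_step(Y, subset, h):
    # One concentration step: return the indices of the h observations
    # closest to the location/scatter computed on the current subset.
    mu = Y[subset].mean(axis=0)
    S = np.cov(Y[subset], rowvar=False)
    diff = Y - mu
    d2 = np.sum(diff.T * np.linalg.solve(S, diff.T), axis=0)
    return np.argsort(d2)[:h]

Iterating c_step until the subset no longer changes yields a local minimum of the determinant; the full algorithm repeats this from many random starts and fully refines only the best candidates.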

The multivariate MM-estimate

The multivariate MM-estimator of location and shape is defined as the minimizer of the following function f

f(\hat \mu_{MM},\hat \Gamma_{MM})= \min \limits_{\mu \in R^v, \; \Gamma \in S_v \; \textrm{with} \; |\Gamma|=1} \frac{1}{n}\sum_{i=1}^n \rho_2 \Bigg( \frac{d(y_i,\mu,\Gamma)}{\hat \sigma} \Bigg )

where \rho_2 is possibly another \rho-function, one which provides higher efficiency than the previous \rho at the null multivariate normal model. Function f is minimized with respect to \mu and \Gamma for fixed \hat \sigma. In this equation \hat \sigma is any auxiliary robust scale estimate; however, it is common to use \hat \sigma_S and, as starting values of location and shape, those which come from the S-estimator (that is, \hat \mu_S and \hat \Gamma_S).
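
With \hat \sigma held fixed, the minimization can again be carried out by reweighting: the first-order conditions give weights proportional to \rho_2'(d_i/\hat \sigma)/(d_i/\hat \sigma). A heuristic sketch of one such step, assuming \rho_2' is the derivative of Tukey's biweight with an illustrative tuning constant, and renormalizing the shape matrix so that its determinant is 1 (convergence control and the exact updates used by MMmultcore are not reproduced here):

import numpy as np

def psi_biweight(t, c=4.685):
    # Derivative of Tukey's biweight rho_2 (illustrative tuning constant).
    return np.where(np.abs(t) <= c, t * (1 - (t / c) ** 2) ** 2, 0.0)

def mm_step(Y, mu, Gamma, sigma, psi=psi_biweight):
    # One reweighting step for the MM-estimate, with sigma held fixed.
    n, v = Y.shape
    diff = Y - mu
    d = np.sqrt(np.sum(diff.T * np.linalg.solve(Gamma, diff.T), axis=0))
    t = d / sigma
    w = np.where(t > 1e-12, psi(t) / np.maximum(t, 1e-12), 1.0)  # psi(t)/t -> 1 as t -> 0
    mu_new = (w[:, None] * Y).sum(axis=0) / w.sum()
    diff = Y - mu_new
    S = (w[:, None] * diff).T @ diff / n
    Gamma_new = S / np.linalg.det(S) ** (1.0 / v)   # enforce |Gamma| = 1
    return mu_new, Gamma_new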

\hat \mu_{MM} and \hat \Gamma_{MM} can be computed using routines MMmult and MMmultcore. More precisely, MMmultcore assumes that the user supplies estimates of \sigma, \mu and \Gamma; on the other hand, function MMmult uses function Smult to compute preliminary estimates of \sigma, \mu and \Gamma.

 

