Introduction to transformations in multivariate analysis

As we have already seen for transformations in regression, the analysis of data is often improved by using a transformation of the response rather than the original response itself.

In the extension of the Box and Cox (1964) family to multivariate responses there is a vector $\lambda$ of $v$ transformation parameters $\lambda_j$, one for each of the $v$ responses. As before, $y_i$ is the $v \times 1$ vector of responses at observation $i$, with $y_{ij}$ the observation on response $j$. The normalized transformation of $y_{ij}$ is given by

\begin{eqnarray} z_{ij}(\lambda_j) & = & \frac{y_{ij}^{\lambda_j}-1}{\lambda_j \dot{y}_{j}^{\lambda_j-1}} \quad \quad (\lambda_j \neq 0)\\ & = & \dot{y}_{j} \log y_{ij} \quad \quad \quad \quad (\lambda_j = 0), \end{eqnarray}

where $\dot{y}_{j}$ is the geometric mean of the $j$-th response. The value $\lambda_j=1 \; (j=1,\dots,v)$ corresponds to no transformation of any of the responses. If the transformed observations are normally distributed with mean vector $\mu_i$ for the $i$-th observation and covariance matrix $\Sigma$, twice the profile loglikelihood of the observations is given by

\begin{eqnarray} 2L_{\max}(\lambda) & = & \textrm{const} - n\log |\hat \Sigma(\lambda)|-\sum_{i=1}^n \{z_i(\lambda)-\hat \mu_i(\lambda)\}^T\hat\Sigma^{-1}(\lambda)\{z_i(\lambda)-\hat \mu_i(\lambda)\}\\ & = & \textrm{const} - n\log |\hat \Sigma(\lambda)|-\sum_{i=1}^n e_i(\lambda)^T \hat \Sigma(\lambda)^{-1}e_i(\lambda). \end{eqnarray}

In this equation the parameter estimates $\hat \mu_i(\lambda)$ and $\hat \Sigma(\lambda)$ are found for fixed $\lambda$, and $e_i(\lambda)$ is the $v \times 1$ vector of residuals for observation $i$ for the same value of $\lambda$. As in the analysis of univariate transformations, it makes no difference in likelihood ratio tests for the value of $\lambda$ whether we use the maximum likelihood estimator of $\Sigma$ or the unbiased estimator $\hat \Sigma_u$. Suppose the maximum likelihood estimator is used, so

$$ n\hat \Sigma (\lambda) = \sum_{i=1}^n e_ie_i^T.$$

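When this estimate is substituted in the previous equation, the quadratic form no longer depends on $\lambda$, since $\sum_{i=1}^n e_i^T \hat \Sigma^{-1}(\lambda) e_i = \textrm{tr}\{\hat \Sigma^{-1}(\lambda) \sum_{i=1}^n e_ie_i^T\} = \textrm{tr}(n I_v) = nv$. The profile loglikelihood therefore reduces to $$ 2L_{\max}(\lambda) = \textrm{const}' -n \log |\hat \Sigma (\lambda)|. $$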
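
A minimal NumPy sketch of the normalized transformation and of $2L_{\max}(\lambda)$ without its constant is given below. It assumes, for illustration only, that there are no explanatory variables, so that $\hat\mu_i(\lambda)$ is the vector of column means of the transformed data; the function names are hypothetical. It also checks numerically that the quadratic form equals $nv$, and that switching from the maximum likelihood divisor $n$ to the unbiased divisor (`ddof=1`) leaves differences in $2L_{\max}$ between two values of $\lambda$ unchanged.

```python
import numpy as np


def normalized_boxcox(Y, lam):
    """Normalized Box-Cox transform of each column of the n x v array Y (entries > 0)."""
    Y = np.asarray(Y, dtype=float)
    gm = np.exp(np.log(Y).mean(axis=0))  # geometric mean of each response
    Z = np.empty_like(Y)
    for j, lj in enumerate(lam):
        if abs(lj) < 1e-8:  # treat a (numerically) zero lambda as the log case
            Z[:, j] = gm[j] * np.log(Y[:, j])
        else:
            Z[:, j] = (Y[:, j] ** lj - 1.0) / (lj * gm[j] ** (lj - 1.0))
    return Z


def twice_profile_loglik(Y, lam, ddof=0):
    """-n log|Sigma_hat(lambda)|, i.e. 2 L_max(lambda) without its constant.

    Assumes a common mean vector (no regressors), estimated by the column means.
    """
    Z = normalized_boxcox(Y, lam)
    n = Z.shape[0]
    E = Z - Z.mean(axis=0)          # residuals e_i(lambda)
    Sigma = E.T @ E / (n - ddof)    # ddof=0 gives the maximum likelihood estimator
    _, logdet = np.linalg.slogdet(Sigma)
    return -n * logdet


rng = np.random.default_rng(0)
Y = np.exp(rng.multivariate_normal([1.0, 2.0], [[0.5, 0.2], [0.2, 0.4]], size=100))

# With the ML estimator the quadratic form equals n * v for any lambda.
Z = normalized_boxcox(Y, [0.5, 0.0])
E = Z - Z.mean(axis=0)
Sigma_ml = E.T @ E / len(Z)
quad = np.einsum("ij,jk,ik->", E, np.linalg.inv(Sigma_ml), E)
print(quad)  # 200 up to rounding error
```

Because the Jacobian is absorbed by the geometric-mean scaling, comparing `twice_profile_loglik` across values of `lam` is all that is needed for likelihood ratio tests.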

So, to test the hypothesis $\lambda = \lambda_0$, the likelihood ratio test statistic for the transformation,

$$T_{LR} =n \log \{ |\hat \Sigma (\lambda_0)|/|\hat \Sigma(\hat \lambda)|\}, $$

is compared with the $\chi^2$ distribution on $v$ degrees of freedom. In this equation $\hat \lambda$ is the vector of $v$ parameter estimates maximising $L_{\max}$, which is found by numerical search. Replacing $\hat \Sigma (\lambda)$ by the unbiased estimator multiplies each determinant by the same factor, which cancels in the ratio, leaving the value of the statistic unchanged. There are two special cases in which tests of the form of $T_{LR}$ are on one degree of freedom. In the first we test that all responses should have the same, unspecified, transformation, that is, that $\lambda_1 = \lambda_2 = \dots = \lambda_v = \lambda$. The second is the test of just one component of $\lambda$ when all others are kept at some specified value. In both cases we sometimes plot $T_N$, the signed square root form of these tests.
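
The statistic $T_{LR}$ can be sketched as follows, assuming no explanatory variables. The numerical search for $\hat\lambda$ is here a simple coordinate-wise golden-section search restricted to $[-3, 3]$ that never moves to a worse point; this optimizer, the bounds and all helper names are illustrative choices, not part of the method. For $v = 2$ the 5% point of $\chi^2_2$ is $-2\log 0.05 \approx 5.99$.

```python
import numpy as np


def normalized_boxcox(Y, lam):
    Y = np.asarray(Y, dtype=float)
    gm = np.exp(np.log(Y).mean(axis=0))  # geometric means
    Z = np.empty_like(Y)
    for j, lj in enumerate(lam):
        if abs(lj) < 1e-8:
            Z[:, j] = gm[j] * np.log(Y[:, j])
        else:
            Z[:, j] = (Y[:, j] ** lj - 1.0) / (lj * gm[j] ** (lj - 1.0))
    return Z


def logdet_sigma(Y, lam):
    """log|Sigma_hat(lambda)| with divisor n and a common mean vector."""
    Z = normalized_boxcox(Y, lam)
    E = Z - Z.mean(axis=0)
    return np.linalg.slogdet(E.T @ E / len(Z))[1]


def golden_max(f, a, b, tol=1e-6):
    """Maximize a scalar function on [a, b], assuming it is unimodal there."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def estimate_lambda(Y, start, bounds=(-3.0, 3.0), sweeps=10):
    """Coordinate-wise maximization of L_max, starting from `start`."""
    lam = np.array(start, dtype=float)
    for _ in range(sweeps):
        for j in range(lam.size):
            def f(t):
                trial = lam.copy()
                trial[j] = t
                return -logdet_sigma(Y, trial)
            t = golden_max(f, *bounds)
            if f(t) > f(lam[j]):  # only accept an improvement
                lam[j] = t
    return lam


def lr_test(Y, lam0):
    """T_LR = n log{|Sigma_hat(lam0)| / |Sigma_hat(lam_hat)|}, on v degrees of freedom."""
    lam_hat = estimate_lambda(Y, start=lam0)
    n = Y.shape[0]
    T = n * (logdet_sigma(Y, lam0) - logdet_sigma(Y, lam_hat))
    return T, lam_hat


rng = np.random.default_rng(1)
Y = np.exp(rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=300))
T_none, lam_hat = lr_test(Y, [1.0, 1.0])  # H0: no transformation
T_log, _ = lr_test(Y, [0.0, 0.0])         # H0: log of both responses
print(lam_hat, T_none, T_log)
```

For these simulated lognormal data `lam_hat` should be close to zero and `T_none` far above 5.99, while `T_log` is a draw from approximately $\chi^2_2$.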

For transformations of a single variable we use forward plots of the score statistic $T_p(\lambda)$. However, influential observations may be evident for some transformations of the data but not for others. We therefore run the forward search both on the untransformed data and on data subject to various transformations. With just one variable for transformation it is easy to use the fan plot from the five forward searches with standard values of $\lambda$ to find satisfactory transformations, if such exist, and to discover the observations that are influential in their choice. However, with $v$ variables for transformation, there are $5^v$ combinations of the five standard values to be investigated. Whether or not the calculations are time consuming, absorbing and sorting the resulting information would be difficult. We therefore suggest three steps to help structure the search for a multivariate transformation:

  1. Run a forward search through the untransformed data, ordering the observations at each subset size $m$ by Mahalanobis distances calculated from the untransformed observations. Estimate $\lambda$ at each value of $m$. Use the forward plot of $\hat \lambda$ to select a set of transformation parameters.
  2. Rerun the forward search, now ordering the observations by distances calculated with the parameters selected in the first step; $\lambda$ is again estimated for each $m$. As the search is now on transformed data, the order in which the observations enter the subset will have changed from that in Step 1. Again monitor the values of $\hat \lambda$ and of the likelihood ratio test for the transformation. If a correct transformation has been found, the parameter estimates, if well defined, will be stable until near the end of the search, when any outliers start to enter; at this point the value of the test statistic may increase rapidly. Whether the parameter estimates are well defined can be judged from plots of the profile loglikelihood against the individual values of $\lambda$ for various values of $m$: a flat likelihood would explain an estimate of $\lambda$ that behaves erratically. If some change in $\lambda$ is suggested, perhaps because outliers appear to be entering before the end of the search, repeat Step 2 until a reasonable set of transformations has been found. Let this be $\lambda_R$.
  3. Confirmatory testing of the suggested transformation. We expand each transformation parameter in turn around the five common values of $\lambda$ ($-1, -0.5, 0, 0.5, 1$), using the values of the vector $\lambda_R$ for transforming the other $v-1$ variables. In this way we turn a multivariate problem into a series of $v$ univariate ones. In each search we can test the transformation by comparing the likelihood ratio test with $\chi^2$ on 1 degree of freedom. However, we use the signed square root of the likelihood ratio in order to learn whether lower or higher values of $\lambda$ are indicated. The plot is thus a version of the fan plot.

In Step 1 the search using untransformed data could be replaced by one using a preliminary estimate of the vector of transformations, perhaps from univariate estimation on each response separately. Some iteration may be needed between Steps 2 and 3.
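
Steps 1 and 2 share the same loop: order the observations with a fixed transformation, grow the subset one observation at a time, and re-estimate $\lambda$ on each subset. The sketch below rests on simplifying assumptions that are ours, not part of the procedure described above: the search starts from the $m_0$ observations closest to the coordinatewise median (scaled by the MAD) rather than from a robust fit, there are no explanatory variables, and $\hat\lambda$ is found by a coarse coordinate-wise grid search over $[-2, 2]$ in steps of 0.1. All function names and the default `m0=10` are hypothetical.

```python
import numpy as np

GRID = np.linspace(-2.0, 2.0, 41)  # coarse grid for lambda (an assumption)


def normalized_boxcox(Y, lam):
    Y = np.asarray(Y, dtype=float)
    gm = np.exp(np.log(Y).mean(axis=0))
    Z = np.empty_like(Y)
    for j, lj in enumerate(lam):
        if abs(lj) < 1e-8:
            Z[:, j] = gm[j] * np.log(Y[:, j])
        else:
            Z[:, j] = (Y[:, j] ** lj - 1.0) / (lj * gm[j] ** (lj - 1.0))
    return Z


def logdet_sigma(Z):
    E = Z - Z.mean(axis=0)
    return np.linalg.slogdet(E.T @ E / len(Z))[1]


def estimate_lambda(Y, sweeps=3):
    """Coordinate-wise grid maximization of L_max on the rows of Y."""
    lam = np.ones(Y.shape[1])
    for _ in range(sweeps):
        for j in range(lam.size):
            scores = []
            for t in GRID:
                trial = lam.copy()
                trial[j] = t
                scores.append(-logdet_sigma(normalized_boxcox(Y, trial)))
            lam[j] = GRID[int(np.argmax(scores))]
    return lam


def forward_search_lambda(Y, lam_order, m0=10):
    """Order units by Mahalanobis distance on data transformed with lam_order;
    return the subset sizes m = m0, ..., n and lambda_hat(m) for each."""
    n = Y.shape[0]
    # Per-column shifts and scalings do not change Mahalanobis distances.
    Z = normalized_boxcox(Y, lam_order)
    med = np.median(Z, axis=0)
    mad = np.median(np.abs(Z - med), axis=0)
    subset = np.argsort((((Z - med) / mad) ** 2).sum(axis=1))[:m0]
    ms, lams = [], []
    for m in range(m0, n + 1):
        ms.append(m)
        lams.append(estimate_lambda(Y[subset]))
        if m == n:
            break
        mu = Z[subset].mean(axis=0)
        E = Z[subset] - mu
        S_inv = np.linalg.inv(E.T @ E / m)
        R = Z - mu
        d2 = np.einsum("ij,jk,ik->i", R, S_inv, R)  # squared Mahalanobis distances
        subset = np.argsort(d2)[: m + 1]
    return np.array(ms), np.array(lams)


rng = np.random.default_rng(2)
Y = np.exp(rng.multivariate_normal([1.0, 1.0], [[0.3, 0.1], [0.1, 0.3]], size=60))
ms, lams = forward_search_lambda(Y, lam_order=[1.0, 1.0])  # Step 1: untransformed ordering
```

Plotting each column of `lams` against `ms` gives the forward plot of $\hat\lambda$; Step 2 would call `forward_search_lambda(Y, lam_order=...)` again with the values selected from that plot.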

In summary, for the analysis of multivariate transformations we monitor forward plots of parameter estimates and likelihood ratio tests for the vector parameter $\lambda$. We produce the fan plot using the multivariate version of the signed square root form of the univariate likelihood ratio test, calculating a set of tests by varying each component of $\lambda$ about $\lambda_0$. Suppose we require a fan plot for $\lambda_j$. Let $\lambda_{0(j)}$ be the vector of all parameters in $\lambda_0$ except $\lambda_j$. Then $\lambda_{0j}^S = (\lambda_{0(j)}:\lambda_j^S)$ is the vector of parameter values in which $\lambda_{j}$ takes one of the five standard values $\lambda^S$ while the other parameters keep their values in $\lambda_0$. To form the likelihood ratio test we also require the estimator $\hat \lambda_{0j}$ found by maximization only over $\lambda_j$; more explicitly, $\hat \lambda_{0j}=(\lambda_{0(j)}:\hat \lambda_j)$. Then, for a subset of size $m$, the multivariate version of the signed square root likelihood ratio test is

$$ T_N(\lambda_{0j}^S) = \textrm{sign}(\hat \lambda_j-\lambda_j^S)\sqrt{m \log \{|\hat \Sigma_m(\lambda_{0j}^S)|/|\hat \Sigma_m(\hat \lambda_{0j})|\}}.$$
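
The signed square root statistic for one component can be sketched as below, at the full sample size ($m = n$) for simplicity; inside a forward search the same computation would be applied to each subset. We assume no explanatory variables, and $\hat\lambda_j$ is found by a golden-section search over $[-3, 3]$, an illustrative choice; `fan_plot_values` and the other names are hypothetical.

```python
import numpy as np

STANDARD = (-1.0, -0.5, 0.0, 0.5, 1.0)  # the five standard values lambda^S


def normalized_boxcox(Y, lam):
    Y = np.asarray(Y, dtype=float)
    gm = np.exp(np.log(Y).mean(axis=0))
    Z = np.empty_like(Y)
    for j, lj in enumerate(lam):
        if abs(lj) < 1e-8:
            Z[:, j] = gm[j] * np.log(Y[:, j])
        else:
            Z[:, j] = (Y[:, j] ** lj - 1.0) / (lj * gm[j] ** (lj - 1.0))
    return Z


def logdet_sigma(Y, lam):
    Z = normalized_boxcox(Y, lam)
    E = Z - Z.mean(axis=0)
    return np.linalg.slogdet(E.T @ E / len(Z))[1]


def golden_max(f, a, b, tol=1e-6):
    """Maximize a scalar function on [a, b], assuming it is unimodal there."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def fan_plot_values(Y, lam0, j, bounds=(-3.0, 3.0)):
    """T_N(lambda_0j^S) for each standard value, varying only component j of lam0."""
    m = Y.shape[0]
    lam0 = np.asarray(lam0, dtype=float)

    def with_j(t):
        lam = lam0.copy()
        lam[j] = t
        return lam

    lam_j_hat = golden_max(lambda t: -logdet_sigma(Y, with_j(t)), *bounds)
    best = logdet_sigma(Y, with_j(lam_j_hat))
    out = {}
    for s in STANDARD:
        # clip at 0 in case the numerical search slightly missed the maximum
        lr = max(m * (logdet_sigma(Y, with_j(s)) - best), 0.0)
        out[s] = np.sign(lam_j_hat - s) * np.sqrt(lr)
    return lam_j_hat, out


rng = np.random.default_rng(3)
Y = np.exp(rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=200))
lam_hat_1, tn = fan_plot_values(Y, lam0=[0.0, 0.0], j=0)
for s, t in tn.items():
    print(f"lambda_1 = {s:+.1f}: T_N = {t:+.2f}")
```

With the sign convention of the formula, standard values above $\hat\lambda_j$ give negative $T_N$ and values below it give positive $T_N$.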

We produce $v$ fan plots, one for each variable, by letting $\lambda_j$, $j = 1,\dots,v$, take each of the five standard values. Alternatively, particularly if numerical maximization of the likelihood is time consuming, we could produce the fan plot from the signed square root of the score test.
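
The saving relative to an exhaustive search over all $5^v$ combinations of standard values is easy to quantify: varying one component at a time, with the others held at $\lambda_R$, needs only $5v$ searches. A small illustrative count in Python (the function names are hypothetical):

```python
from itertools import product

STANDARD = (-1.0, -0.5, 0.0, 0.5, 1.0)  # the five standard values of lambda


def all_combinations(v):
    """Every vector of v standard values: 5**v forward searches."""
    return list(product(STANDARD, repeat=v))


def one_at_a_time(lam_r):
    """Vary one component of lam_r over the standard values: 5*v forward searches."""
    out = []
    for j in range(len(lam_r)):
        for s in STANDARD:
            lam = list(lam_r)
            lam[j] = s
            out.append(tuple(lam))
    return out


for v in (2, 3, 5):
    print(v, len(all_combinations(v)), len(one_at_a_time([1.0] * v)))
# 2 25 10
# 3 125 15
# 5 3125 25
```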

This chapter considers the following procedures.