As we have already seen in the analysis of transformations in regression, the analysis of data is often improved by using a transformation of the response rather than the original response itself.
In the extension of the Box and Cox (1964) family to multivariate responses there is a vector $\lambda$ of $v$ transformation parameters $\lambda_j$, one for each of the $v$ responses. As before, $y_i$ is the $v \times 1$ vector of responses at observation $i$, with $y_{ij}$ the observation on response $j$. The normalized transformation of $y_{ij}$ is given by
\begin{eqnarray} z_{ij}(\lambda_j) & = & (y_{ij}^{\lambda_j}-1)/(\lambda_j \dot{y}_{j}^{\lambda_j-1}) \quad \quad (\lambda_j \neq 0)\\ & = & \dot{y}_{j} \log y_{ij} \quad \quad \quad \quad \quad \quad \quad \quad (\lambda_j = 0), \end{eqnarray}
where $\dot{y}_{j}$ is the geometric mean of the $j$-th response. The value $\lambda_j=1 \; (j=1,\dots,v)$ corresponds to no transformation of any of the responses. If the transformed observations are normally distributed with vector mean $\mu_i$ for the $i$-th observation and covariance matrix $\Sigma$, twice the profile loglikelihood of the observations is given by
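To make the definition concrete, here is a minimal sketch in Python of the column-wise normalized transformation; the function name `boxcox_normalized` and the NumPy-based layout are our own illustrative choices, not part of the original formulation.

```python
import numpy as np

def boxcox_normalized(Y, lam):
    """Normalized Box-Cox transform, column by column, of an n x v matrix Y.

    Y   : (n, v) array of positive responses y_ij
    lam : length-v vector of transformation parameters lambda_j
    """
    Y = np.asarray(Y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    gm = np.exp(np.log(Y).mean(axis=0))      # geometric mean of each response
    Z = np.empty_like(Y)
    for j, lj in enumerate(lam):
        if lj == 0:                           # lambda_j = 0: log case
            Z[:, j] = gm[j] * np.log(Y[:, j])
        else:                                 # lambda_j != 0
            Z[:, j] = (Y[:, j]**lj - 1.0) / (lj * gm[j]**(lj - 1.0))
    return Z
```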
\begin{eqnarray} 2L_{\max}(\lambda) & = & \textrm{const} - n\log |\hat \Sigma(\lambda)|-\sum_{i=1}^n \{z_i(\lambda)-\hat \mu_i(\lambda)\}^T\hat\Sigma^{-1}(\lambda)\{z_i(\lambda)-\hat \mu_i(\lambda)\}\\ & = & \textrm{const} - n\log |\hat \Sigma(\lambda)|-\sum_{i=1}^n e_i(\lambda)^T \hat \Sigma^{-1}(\lambda)e_i(\lambda). \end{eqnarray}In this equation the parameter estimates $\hat \mu_i(\lambda)$ and $\hat \Sigma(\lambda)$ are found for fixed $\lambda$ and $e_i(\lambda)$ is the $v \times 1$ vector of residuals for observation $i$ for the same value of $\lambda$. As in the analysis of univariate transformations, it makes no difference in likelihood ratio tests for the value of $\lambda$ whether we use the maximum likelihood estimator of $\Sigma$ or the unbiased estimator $\hat \Sigma_u$. Suppose the maximum likelihood estimator is used, so
$$ n\hat \Sigma (\lambda) = \sum_{i=1}^n e_i(\lambda)e_i(\lambda)^T.$$
When this estimate is substituted in the previous equation, the profile loglikelihood reduces to $$ 2L_{\max}(\lambda) = \textrm{const}' -n \log |\hat \Sigma (\lambda)|. $$So, to test the hypothesis $\lambda = \lambda_0$, the likelihood ratio test statistic for transformation becomes
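The reduction above translates directly into code. The sketch below computes $2L_{\max}(\lambda)$ up to the constant, assuming the simplest model in which every $\mu_i$ equals a common mean; for a regression model the residuals $e_i(\lambda)$ would come from the fitted linear predictor instead. It reuses `boxcox_normalized` from the sketch above.

```python
import numpy as np

def profile_loglik(Y, lam):
    """Return 2*L_max(lambda) - const = -n log|Sigma_hat(lambda)|."""
    Z = boxcox_normalized(Y, lam)       # transformed data z_i(lambda)
    E = Z - Z.mean(axis=0)              # residuals e_i(lambda), here mu_i = mu
    n = Z.shape[0]
    Sigma_hat = E.T @ E / n             # maximum likelihood estimate of Sigma
    _, logdet = np.linalg.slogdet(Sigma_hat)
    return -n * logdet
```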
$$T_{LR} = n \log \{ |\hat \Sigma (\lambda_0)|/|\hat \Sigma(\hat \lambda)|\}, $$which is compared with the $\chi^2$ distribution on $v$ degrees of freedom. In this equation $\hat \lambda$ is the vector of $v$ parameter estimates maximising $L_{\max}(\lambda)$, found by numerical search. Replacement of $\hat \Sigma (\lambda)$ by the unbiased estimator multiplies each determinant by a factor which cancels, leaving the value of the statistic unchanged. There are two special cases in which tests of the form of $T_{LR}$ are on one degree of freedom. In the first we test that all responses should have the same, unspecified, transformation, that is $\lambda_1 = \lambda_2 = \dots = \lambda_v = \lambda$. The second is the test of just one component of $\lambda$ when all others are kept at some specified value. In both cases we sometimes plot $T_N$, the signed square root form of these tests.
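A sketch of the test follows; the choice of optimizer (Nelder-Mead from SciPy) and of starting value are ours, not prescribed by the text, and `profile_loglik` is the function sketched above.

```python
import numpy as np
from scipy.optimize import minimize

def lr_test(Y, lam0):
    """T_LR = n log{ |Sigma_hat(lam0)| / |Sigma_hat(lam_hat)| }."""
    # maximize 2*L_max(lambda) over lambda by numerical search
    res = minimize(lambda lam: -profile_loglik(Y, lam),
                   x0=np.asarray(lam0, dtype=float), method="Nelder-Mead")
    lam_hat = res.x
    T_LR = profile_loglik(Y, lam_hat) - profile_loglik(Y, lam0)
    return T_LR, lam_hat   # refer T_LR to chi-squared on v degrees of freedom
```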
For transformations of a single variable we use forward plots of the score statistic $T_p(\lambda)$. However, influential observations may be evident for some transformations of the data but not for others. We therefore employ the forward search on untransformed data and on data subject to various transformations. With just one variable for transformation it is easy to use the fan plot from the five forward searches with standard values of $\lambda$ to find satisfactory transformations, if such exist, and to discover the observations that are influential in their choice. With $v$ variables for transformation, however, there are $5^v$ combinations of the five standard values to be investigated. Even if the calculations were not time consuming, absorbing and sorting the resulting information would be difficult. We therefore suggest three steps to help structure the search for a multivariate transformation:
In summary, for the analysis of multivariate transformations we monitor forward plots of parameter estimates and likelihood ratio tests for the vector parameter $\lambda$. We produce the fan plot using the multivariate version of the signed square root form of the univariate likelihood ratio test. We calculate a set of tests by varying each component of $\lambda$ about $\lambda_0$. Suppose we require a fan plot for $\lambda_j$. Let $\lambda_{0(j)}$ be the vector of all parameters in $\lambda_0$ except $\lambda_j$. Then $\lambda_{0(j)}^S = (\lambda_{0(j)}:\lambda_j^S)$ is the vector of parameter values in which $\lambda_{j}$ takes one of the five standard values $\lambda^S$ while the other parameters keep their values in $\lambda_0$. To form the likelihood ratio test we also require the estimator $\hat \lambda_{0(j)}$ found by maximization only over $\lambda_j$; more explicitly, $\hat \lambda_{0(j)}=(\lambda_{0(j)}:\hat \lambda_j)$. Then the version, for multivariate data, of the signed square root likelihood ratio test is
$$ T_N(\lambda) = \textrm{sign}(\hat \lambda_j-\lambda_j^S)\sqrt{m \log \{|\hat \Sigma_m(\lambda_{0(j)}^S)|/|\hat \Sigma_m(\hat \lambda_{0(j)})|\}},$$ where $\hat \Sigma_m$ is the estimate of $\Sigma$ computed from the subset of $m$ observations used at that step of the forward search.
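In code the statistic might look as follows; the bounded search interval for $\lambda_j$ is an arbitrary illustrative choice, and for the forward search `Y` would be the subset of $m$ observations in use at the current step. The sketch reuses `profile_loglik` from above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def signed_sqrt_lr(Y, lam0, j, lam_S):
    """T_N for testing lambda_j = lam_S, other components fixed at lam0."""
    def two_loglik(lj):                # 2*L_max with lambda_j = lj, rest fixed
        lam = np.array(lam0, dtype=float)
        lam[j] = lj
        return profile_loglik(Y, lam)

    # maximize only over lambda_j; the interval (-3, 3) is an assumption
    lam_hat_j = minimize_scalar(lambda lj: -two_loglik(lj),
                                bounds=(-3.0, 3.0), method="bounded").x
    T2 = two_loglik(lam_hat_j) - two_loglik(lam_S)   # = m log{|S(lam_S)|/|S(lam_hat)|}
    return np.sign(lam_hat_j - lam_S) * np.sqrt(max(T2, 0.0))
```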
We produce $v$ fan plots, one for each variable, by letting $\lambda_j$, $j = 1,\dots,v$, take each of the five standard values. Alternatively, particularly if numerical maximization of the likelihood is time consuming, we could produce the fan plot from the signed square root of the score test.
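As an illustration, the grid of tests behind the $v$ fan plots can be computed with the sketches above. Here it is evaluated once on simulated lognormal data rather than monitored along a forward search, and we take the five standard values to be the conventional $-1, -0.5, 0, 0.5, 1$; the text does not list them, so this is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = np.exp(rng.normal(size=(100, 3)))     # simulated positive data, v = 3
lam0 = np.zeros(3)                        # all components at the log transformation
standard_values = [-1.0, -0.5, 0.0, 0.5, 1.0]

# tests[j, k]: T_N for variable j with lambda_j set to the k-th standard value
tests = np.array([[signed_sqrt_lr(Y, lam0, j, s) for s in standard_values]
                  for j in range(Y.shape[1])])
```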
This chapter considers the following procedures.
FSMtra implements the monitoring of
maximum likelihood estimates of transformation parameters
and the likelihood ratio test statistic for transformation.
It is also possible to analyze, for selected steps of the search, the profile loglikelihoods
of the transformation parameters. In summary, FSMtra implements steps 1 and 2
described above.