# ace

ace computes alternating conditional expectations

## Syntax

• out=ace(y,X)
• out=ace(y,X,Name,Value)

## Description

This function uses the alternating conditional expectation algorithm to find the transformations of y and X that maximise the proportion of variation in y explained by X. When X is a matrix, it is transformed so that its columns are equally weighted when predicting y.
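In the notation of Breiman and Friedman (1985), with $\theta$ the transformation of y and $\phi_j$ the transformation of the j-th column of X (symbols from that paper, not identifiers used by ace), maximising the explained proportion of variation is equivalent to minimising the unexplained fraction

$$
e^2(\theta,\phi_1,\ldots,\phi_p) = \frac{E\left\{\left[\theta(Y) - \sum_{j=1}^{p}\phi_j(X_j)\right]^2\right\}}{E\left[\theta^2(Y)\right]}
$$

The algorithm alternates between two conditional-expectation updates, holding all other transformations fixed in each step:

$$
\phi_j(X_j) \leftarrow E\left[\theta(Y) - \sum_{i \neq j}\phi_i(X_i) \,\middle|\, X_j\right],
\qquad
\theta(Y) \leftarrow \frac{E\left[\sum_{j=1}^{p}\phi_j(X_j) \,\middle|\, Y\right]}{\left\| E\left[\sum_{j=1}^{p}\phi_j(X_j) \,\middle|\, Y\right] \right\|}
$$

With finite samples the conditional expectations are estimated by smoothing the data, and $R^2 = 1 - e^2$ for the standardised $\theta$.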

out = ace(y, X) Example of the use of ace based on the Wang and Murphy data.

out = ace(y, X, Name, Value) Example 1 from TIB88: brain body weight data.

## Examples

### Example of the use of ace based on the Wang and Murphy data.

So that the results can be replicated in R with the acepack library, the random data are generated with function mtR.

rng('default')
seed=11;
negstate=-30;
n=200;
X1 = mtR(n,0,seed)*2-1;
X2 = mtR(n,0,negstate)*2-1;
X3 = mtR(n,0,negstate)*2-1;
X4 = mtR(n,0,negstate)*2-1;
res=mtR(n,1,negstate);
% Generate y
y = log(4 + sin(3*X1) + abs(X2) + X3.^2 + X4 + .1*res );
X = [X1 X2 X3 X4];
% Apply the ace algorithm
out= ace(y,X);
% Show the output graphically using function aceplot
aceplot(out)

### Example 1 from TIB88: brain body weight data.

Comparison between ace and avas.

YY=load('animals.txt');
y=YY(1:62,2);
X=YY(1:62,1);
out=ace(y,X);
aceplot(out)
out=avas(y,X);
aceplot(out)
% https://vincentarelbundock.github.io/Rdatasets/doc/robustbase/Animals2.html
% ## The same' plot for Rousseeuw's subset:
% data(Animals, package = "MASS")
% brain <- Animals[c(1:24, 26:25, 27:28),]
% plotbb(bbdat = brain)

## Input Arguments

### y — Response variable. Vector.

Response variable, specified as a vector of length n, where n is the number of observations. Each entry in y is the response for the corresponding row of X.

Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

Data Types: single | double

### X — Predictor variables. Matrix.

Matrix of explanatory variables (also called 'regressors') of dimension n x p, where p denotes the number of explanatory variables.

Rows of X represent observations, and columns represent variables. By default, there is a constant term in the model, unless you explicitly remove it using input option intercept, so do not include a column of 1s in X. Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

Data Types: single | double
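The row-wise exclusion of missing and infinite values described for y and X can be reproduced by hand in base MATLAB. The following is a minimal sketch of the idea, not the code that ace runs internally; the toy values and the names ok, yclean and Xclean are illustrative assumptions.

```matlab
% Toy data with one NaN in y and one Inf in X (illustrative values only)
y = [1.2; NaN; 0.7; 2.1; 1.5];
X = [0.3 1.0; 0.5 2.0; Inf 1.5; 0.9 0.4; 0.1 0.8];

% Keep only the rows in which y and every column of X are finite
ok = isfinite(y) & all(isfinite(X), 2);
yclean = y(ok);
Xclean = X(ok, :);

% Rows 2 and 3 are dropped, so 3 observations remain
excluded = find(~ok);
disp(excluded')
```

Doing this explicitly is useful mainly when you need to know which rows of the original data correspond to the rows of the returned transformations.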

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as  Name1,Value1,...,NameN,ValueN.

Example:  'l',[3 3 1] , 'w',1:n , 'nterm',5 , 'delrsq',0.001 , 'maxit',30 

### l — Type of transformation. Vector.

Vector of length p+1 which specifies the type of transformation for the explanatory variables and the response. The first p elements of this vector refer to the p explanatory variables; the last element refers to the response.

l(j)=1 => j-th variable assumes orderable values.

l(j)=2 => j-th variable assumes circular (periodic) values in the range (0.0,1.0) with period 1.0.

l(j)=3 => j-th variable transformation is to be monotone.

l(j)=4 => j-th variable transformation is to be linear.

l(j)=5 => j-th variable assumes categorical (unorderable) values.

$j = 1, 2, \ldots, p+1$.

The default value of l is a vector of ones of length p+1, that is, the procedure assumes that both the explanatory variables and the response have orderable values.

Example:  'l',[3 3 1] 

Data Types: double
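As a sketch of how the vector l might be assembled, assume a hypothetical data set with p = 3 explanatory variables in which the second one is categorical and the response should receive a monotone transformation. The call to ace is left as a comment because it needs FSDA on the path and data y and X in the workspace.

```matlab
% Hypothetical setting: p = 3 explanatory variables plus the response
p = 3;
l = ones(1, p+1);   % default: every variable is treated as orderable
l(2) = 5;           % second explanatory variable: categorical
l(end) = 3;         % last element is the response: monotone transformation

disp(l)             % 1 5 1 3

% Requires FSDA, an n x 1 response y and an n x 3 matrix X:
% out = ace(y, X, 'l', l);
```

Starting from the default vector of ones and overwriting single positions keeps the length of l tied to p, which avoids a mismatch when columns are added to or removed from X.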

### w — Weights for the observations. Vector.

Row or column vector of length n containing the weights associated with the observations. If w is not specified, we assume $w_i=1$ for $i=1, 2, \ldots, n$.

Example:  'w',1:n 

Data Types: double

### nterm — Minimum number of consecutive iterations below the threshold to terminate the outer loop. Positive scalar.

This value specifies how many consecutive iterations below the threshold are necessary to declare convergence in the outer loop. The default value of nterm is 3.

Example:  'nterm',5 

Data Types: double

### delrsq — Termination threshold. Scalar.

Iteration (in the outer loop) stops when rsq changes by less than delrsq in nterm consecutive iterations. The default value of delrsq is 0.01.

Example:  'delrsq',0.001 

Data Types: double

### maxit — Maximum number of iterations for the outer loop. Scalar.

The default maximum number of iterations before exiting the outer loop is 20.

Example:  'maxit',30 

Data Types: double
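The interplay of nterm, delrsq and maxit can be illustrated with a small stand-alone loop. This is one plausible reading of the stopping rule (stop once the last nterm values of rsq span less than delrsq, or when maxit is reached), not a transcription of the code inside ace; the sequence rsqpath is made up for the illustration.

```matlab
% Made-up sequence of R-squared values, one per outer iteration
rsqpath = [0.50 0.70 0.80 0.805 0.808 0.809 0.8095];

nterm  = 3;      % default number of consecutive iterations
delrsq = 0.01;   % default termination threshold
maxit  = 20;     % default maximum number of outer iterations

niter = 0;
converged = false;
for iter = 1:min(maxit, numel(rsqpath))
    niter = iter;
    if iter >= nterm
        % Range of rsq over the last nterm iterations
        window = rsqpath(iter-nterm+1:iter);
        if max(window) - min(window) < delrsq
            converged = true;
            break
        end
    end
end

fprintf('converged = %d after %d iterations\n', converged, niter)
```

With these values the loop stops at iteration 5, the first point at which three consecutive values of rsq differ by less than 0.01. Lowering delrsq or raising nterm makes the rule stricter, so maxit may need to be increased accordingly.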

## Output Arguments

### out — Description. Structure.

Structure which contains the following fields:

| Value | Description |
| --- | --- |
| ty | n x 1 vector containing the transformed y values. |
| tX | n x p matrix containing the transformed X matrix. |
| rsq | The multiple R-squared value for the transformed values in the last iteration of the outer loop. |
| y | n x 1 vector containing the original y values. |
| X | n x p matrix containing the original X matrix. |
| niter | Scalar. Number of iterations which have been necessary to achieve convergence. |
| outliers | k x 1 vector containing the units declared as outliers when the procedure is called with input option rob set to true. If rob is false, out.outliers=[]. |
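A short sketch of how the documented fields might be inspected after a fit is given below. The simulated data are an assumption made for illustration, and the block only runs when FSDA's ace is found on the MATLAB path. The squared correlation between out.ty and the row sums of out.tX is printed next to out.rsq purely for comparison; no exact equality is claimed.

```matlab
% Runs only if FSDA's ace is available on the MATLAB path
if exist('ace', 'file') == 2
    rng(1)
    n = 100;
    x = rand(n, 1);
    y = exp(2*x + 0.1*randn(n, 1));   % illustrative data: log(y) is linear in x

    out = ace(y, x);

    % Additive fit on the transformed scale: sum of the transformed columns
    fitted = sum(out.tX, 2);
    c = corrcoef(out.ty, fitted);

    fprintf('out.rsq = %.3f, squared correlation = %.3f, out.niter = %d\n', ...
        out.rsq, c(1,2)^2, out.niter)
end
```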

## References

Breiman, L. and Friedman, J.H. (1985), Estimating optimal transformations for multiple regression and correlation, "Journal of the American Statistical Association", Vol. 80, pp. 580-597.

Wang, D. and Murphy, M. (2005), Identifying nonlinear relationships in regression using the ACE algorithm, "Journal of Applied Statistics", Vol. 32, pp. 243-258.