# avas

avas computes additivity and variance stabilization for regression

## Syntax

• out=avas(y,X)
• out=avas(y,X,Name,Value)

## Description

This function differs from ace in that it uses a (nonparametric) variance-stabilizing transformation for the response variable.

 out = avas(y,X) computes the additivity and variance stabilizing transformations of the response y and of the explanatory variables in X, using the default options.

 out = avas(y,X,Name,Value) computes the transformations with additional options specified by one or more Name,Value pair arguments.
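As a minimal sketch (assuming the FSDA toolbox, which provides avas and aceplot, is on the path; the data-generating model here is illustrative, not from the original examples):

```matlab
% Minimal usage sketch: transform a response and one predictor.
n = 100;
x = rand(n,1)*2*pi;                 % predictor
y = exp(sin(x) + 0.1*randn(n,1));   % nonlinear, heteroscedastic response
out = avas(y, x);                   % default options
disp(out.rsq)                       % R-squared for the transformed variables
aceplot(out)                        % plot the estimated transformations
```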

## Examples


### Example of the use of avas based on the Wang and Murphy data.

Function mtR is used to generate the random data, so that the results can be replicated in R with the acepack library.

rng('default')
seed=11;
negstate=-30;
n=200;
X1 = mtR(n,0,seed)*2-1;
X2 = mtR(n,0,negstate)*2-1;
X3 = mtR(n,0,negstate)*2-1;
X4 = mtR(n,0,negstate)*2-1;
res=mtR(n,1,negstate);
% Generate y
y = log(4 + sin(3*X1) + abs(X2) + X3.^2 + X4 + .1*res );
X = [X1 X2 X3 X4];
% Apply the avas algorithm
out= avas(y,X);
% Show the output graphically using function aceplot
aceplot(out)

### Example 1 from TIB88: brain body weight data.

Comparison between ace and avas.

YY=load('animals.txt');
y=YY(1:62,2);
X=YY(1:62,1);
out=ace(y,X);
aceplot(out)
out=avas(y,X);
aceplot(out)
% https://vincentarelbundock.github.io/Rdatasets/doc/robustbase/Animals2.html
% ## The same plot for Rousseeuw's subset:
% data(Animals, package = "MASS")
% brain <- Animals[c(1:24, 26:25, 27:28),]
% plotbb(bbdat = brain)

## Related Examples


### Example 3 from TIB88: unequal variances.

n=200;
rng('default')
seed=100;
negstate=-30;
x = mtR(n,0,seed)*3;
z = mtR(n,1,negstate);
y=x+0.1*x.*z;
X=x;
nr=1;
nc=2;
outACE=ace(y,X);
outAVAS=avas(y,X);
subplot(nr,nc,1)
yy=[y, outACE.ty outAVAS.ty];
yysor=sortrows(yy,1);
plot(yysor(:,1),yysor(:,2:3))
% plot(y,[outACE.ty outAVAS.ty])
title('Compare ACE and AVAS')
xlabel('y')
ylabel('ty')
legend({'ACE' 'AVAS'},'Location','Best')
subplot(nr,nc,2)
XX=[X, outACE.tX outAVAS.tX];
XXsor=sortrows(XX,1);
plot(XXsor(:,1),XXsor(:,2:3))
% plot(y,[outACE.ty outAVAS.ty])
title('Compare ACE and AVAS')
xlabel('x')
ylabel('tX')
legend({'ACE' 'AVAS'},'Location','Best')
% For R users, the R code to reproduce the example above is given
% below.
% set.seed(100)
% n=200
% x <- runif(n)*3
% z <- rnorm(n)
% y=x+0.1*x*z;
% out=ace(x,y)
% out=avas(x,y)

### Example 4 from TIB88: non-constant underlying variance.

close all
negstate=-100;
rng('default')
seed=100;
n=200;
x1=mtR(n,1,seed);
x2=mtR(n,1,negstate);
z=mtR(n,1,negstate);
% x1=randn(n,1);
% x2=randn(n,1);
% z=randn(n,1);
absx1_x2=abs(x1-x2);
y=x1+x2+(1/3)*absx1_x2.*z;
X=[x1 x2];
out=avas(y,X);
tyhat=sum(out.tX,2);
ty=out.ty;
absty_tyhat=abs(ty-tyhat);
% hold('on')
plot(absx1_x2,absty_tyhat,'o');
lsline
% For R users, the R code to reproduce the example above is given
% below.
% # Example 4 non constant underlying variance
%  set.seed(100)
% n=200;
% x1=rnorm(n);
% x2=rnorm(n);
% z=rnorm(n);
% absx1_x2=abs(x1-x2);
% y=x1+x2+(1/3)*absx1_x2*z;
% X=cbind(x1,x2);
% out=avas(X,y);

### Example 5 from TIB88: missing group variable.

rng('default')
seed=1000;
n=100;
x=mtR(n,0,seed)*10-5;
z=mtR(n,1,-seed);
%x=rand(n,1)*10-5;
%z=randn(n,1);
y1=3+5*x+z;
y2=-3+5*x+z;
y=[y1;y2];
X=[x;x];
out=avas(y,X);
aceplot(out);
% For R users, the R code to reproduce the example above is given
% below.
% # Example 5 missing group variable
% set.seed(1000)
% n=100;
% x=runif(n)*10-5;
% z=rnorm(n);
% y1=3+5*x+z;
% y2=-3+5*x+z;
% y=c(y1,y2);
% X=c(x,x)
% out=avas(X,y);

### Example 6 from TIB88: binary.

seed=20;
n=50;
y1=exp(-0.5+0.5*mtR(n,1,seed));
y2=exp(0.5+0.5*mtR(n,1,-seed));
y=[y1;y2];
X=[-0.5*ones(n,1); 0.5*ones(n,1)];
out=avas(y,X);
aceplot(out)
% For R users, the R code to reproduce the example above is given
% below.
% Example 6 binary
% set.seed(20)
% n=50;
% y1=exp(-0.5+0.5*rnorm(n));
% y2=exp(0.5+0.5*rnorm(n));
% y=c(y1,y2);
% X=c(-0.5*rep(1,n), 0.5*rep(1,n));
% out=avas(X,y);

### Example 9 from TIB88: Nonmonotone function of X.

n=200;
x=rand(n,1)*2*pi;
z=randn(n,1);
y=exp(sin(x)+0.2*z);
X=x;
out=avas(y,X);
aceplot(out)

## Input Arguments

### y — Response variable. Vector.

Response variable, specified as a vector of length n, where n is the number of observations. Each entry in y is the response for the corresponding row of X.

Missing values (NaNs) and infinite values (Infs) are allowed: observations (rows) with missing or infinite values are automatically excluded from the computations.

Data Types: single | double

### X — Predictor variables. Matrix.

Matrix of explanatory variables (also called 'regressors') of dimension n x (p-1) where p denotes the number of explanatory variables including the intercept.

Rows of X represent observations, and columns represent variables. By default, there is a constant term in the model, unless you explicitly remove it using the input option intercept, so do not include a column of 1s in X. Missing values (NaNs) and infinite values (Infs) are allowed: observations (rows) with missing or infinite values are automatically excluded from the computations.

Data Types: single | double

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as  Name1,Value1,...,NameN,ValueN.

Example:  'l',[3 3 1] , 'w',1:n , 'nterm',5 , 'delrsq',0.001 , 'maxit',30 

### l — Type of transformation. Vector.

Vector of length p which specifies the type of transformation for the explanatory variables.

l(j)=1 => j-th variable assumes orderable values.

l(j)=2 => j-th variable assumes circular (periodic) values in the range (0.0,1.0) with period 1.0.

l(j)=3 => j-th variable transformation is to be monotone.

l(j)=4 => j-th variable transformation is to be linear.

l(j)=5 => j-th variable assumes categorical (unorderable) values.

j = 1, 2, ..., p.

The default value of l is a vector of ones of length p, that is, the procedure assumes that the explanatory variables have orderable values. Note that in the avas procedure the response is always transformed.

Example:  'l',[3 3 1] 

Data Types: double
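For instance, with a three-column X, the following sketch (illustrative data, values as in the Example line above) requests monotone transformations for the first two predictors and leaves the third as orderable:

```matlab
% Illustrative call: l = [3 3 1] asks for monotone transformations of
% columns 1-2 of X and an unconstrained (orderable) transformation of column 3.
n = 100;
X = randn(n,3);
y = X(:,1).^2 + abs(X(:,2)) + X(:,3) + 0.1*randn(n,1);
out = avas(y, X, 'l', [3 3 1]);
```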

### w — Weights for the observations. Vector.

Row or column vector of length n containing the weights associated with each observation. If w is not specified, we assume $w_i=1$ for $i=1, 2, \ldots, n$.

Example:  'w',1:n 

Data Types: double

### nterm — Minimum number of consecutive iterations below the threshold needed to terminate the outer loop. Positive scalar.

This value specifies how many consecutive iterations below the threshold are necessary to declare convergence in the outer loop. The default value of nterm is 3.

Example:  'nterm',5 

Data Types: double

### delrsq — Termination threshold. Scalar.

Iterations in the outer loop stop when rsq changes by less than delrsq over nterm consecutive iterations. The default value of delrsq is 0.01.

Example:  'delrsq',0.001 

Data Types: double

### maxit — Maximum number of iterations for the outer loop. Scalar.

The default maximum number of iterations before exiting the outer loop is 20.

Example:  'maxit',30 

Data Types: double
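The options above can be combined in a single call. A sketch with illustrative data, mirroring the Example lines:

```matlab
% Sketch combining several options: explicit unit weights, a stricter
% convergence requirement (5 consecutive iterations below a tighter
% threshold), and a larger iteration budget.
n = 200;
X = randn(n,2);
y = X(:,1) + abs(X(:,2)) + 0.1*randn(n,1);
out = avas(y, X, 'w', ones(n,1), 'nterm', 5, 'delrsq', 0.001, 'maxit', 30);
```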

## Output Arguments

### out — Output. Structure.

Structure which contains the following fields:

| Value | Description |
|---|---|
| ty | n x 1 vector containing the transformed y values. |
| tX | n x p matrix containing the transformed X matrix. |
| rsq | Multiple R-squared value for the transformed values in the last iteration of the outer loop. |
| y | n x 1 vector containing the original y values. |
| X | n x p matrix containing the original X matrix. |
| niter | Scalar. Number of iterations needed to achieve convergence. |
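A short sketch (illustrative data) of how the output fields might be used to check the additivity of the fitted model:

```matlab
% After fitting, the transformed response out.ty should be close to the
% sum of the columns of out.tX if the additive model fits well.
n = 200;
X = [randn(n,1) randn(n,1)];
y = X(:,1).^2 + X(:,2) + 0.1*randn(n,1);
out = avas(y, X);
tyhat = sum(out.tX, 2);      % fitted transformed response
plot(out.ty, tyhat, 'o')     % points near the 45-degree line indicate additivity
xlabel('ty'), ylabel('sum of tX columns')
fprintf('rsq = %.3f after %d iterations\n', out.rsq, out.niter)
```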

## References

Tibshirani R. (1988), Estimating transformations for regression via additivity and variance stabilization, "Journal of the American Statistical Association", Vol. 83, pp. 394-405.

Wang D. and Murphy M. (2005), Identifying nonlinear relationships in regression using the ACE algorithm, "Journal of Applied Statistics", Vol. 32, pp. 243-258.