# rlsmo

rlsmo computes a running-lines smoother with global cross-validation.

## Syntax

• smo=rlsmo(x,y)
• smo=rlsmo(x,y,w)
• smo=rlsmo(x,y,w,span)
• [smo,span]=rlsmo(___)

## Description

This function is called at each step of the avas function, but it can also be called directly whenever a set of values needs to be smoothed using local regressions. Note that the x values must be non-decreasing.

smo=rlsmo(x,y): rlsmo with all the default arguments.

smo=rlsmo(x,y,w): rlsmo with weights.

smo=rlsmo(x,y,w,span): rlsmo with span value supplied as input.

[smo,span]=rlsmo(___): rlsmo called with two outputs.
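Because the x values must be non-decreasing, unsorted data have to be ordered before the call. The following is a minimal sketch, assuming hypothetical toy data and that rlsmo (from FSDA) is on the path; the call is guarded so that the sorting logic runs even without it.

```matlab
% Hypothetical unsorted data: x is a permutation of 0..49 (7 and 50 are coprime)
x = mod(7*(1:50)', 50);
y = x.^2;

[xs, ord] = sort(x);     % xs is non-decreasing
ys = y(ord);             % keep each y paired with its x

% Call rlsmo only if it is available (assumption: FSDA is installed)
if exist('rlsmo', 'file') == 2
    smos = rlsmo(xs, ys);        % smoothed values in sorted order
    smo = zeros(size(smos));
    smo(ord) = smos;             % back to the original order of x
end
```

The last assignment places the smoothed value for sorted position k at the original position ord(k).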

## Examples

### rlsmo with all the default arguments.

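```matlab
n=200;
x=sort(randn(n,1));            % x must be non-decreasing
y=2*x.^2-3*x+2*randn(n,1);     % quadratic trend plus Gaussian noise
ysmo=rlsmo(x,y);               % span chosen by cross-validation
plot(x,[y ysmo])
```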

### rlsmo with weights.

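```matlab
n=200;
x=sort(randn(n,1));            % x must be non-decreasing
y=2*x.^2-3*x+2*randn(n,1);     % quadratic trend plus Gaussian noise
w=(1:n)';                      % weights increasing with the index
[ysmo,span]=rlsmo(x,y,w);      % span chosen by cross-validation
plot(x,[y ysmo])
title(['span chosen by cross validation= ' num2str(span)])
```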

### rlsmo with span value supplied as input.

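```matlab
n=200;
x=sort(randn(n,1))*10;                         % x must be non-decreasing
y=3*x.^3-2*x.^2-4*x+10000*randn(n,1);          % cubic trend plus noise
[ysmo,span]=rlsmo(x,y,[],0.5);                 % default weights, span fixed at 0.5
plot(x,[y ysmo])
title(['Fixed value of span = ' num2str(span)])
```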

### rlsmo called with two outputs.

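```matlab
n=200;
x=sort(randn(n,1));            % x must be non-decreasing
y=2*x.^2-3*x+2*randn(n,1);     % quadratic trend plus Gaussian noise
[ysmo,span]=rlsmo(x,y);        % second output is the span actually used
plot(x,[y ysmo])
title(['span chosen by cross validation= ' num2str(span)])
```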

## Input Arguments

### x — Predictor variable sorted. Vector.

Ordered abscissa values.

Note that the x values are assumed to be non-decreasing.

Data Types: single | double

### y — Response variable. Vector.

Response variable which has to be smoothed, specified as a vector of length n, where n is the number of observations.

Data Types: single | double

### w — Weights for the observations. Vector.

Row or column vector of length n containing the weight associated with each observation. If w is not specified, we assume $w_i=1$ for $i=1, 2, \ldots, n$.

Example: 1:n 

Data Types: double

### span — Length of the local regressions. Scalar.

Scalar in the interval [0, 1] which specifies the length of the local regressions. If span is 0 (default value), the span is chosen by cross-validation: the numbers of observations considered for computing the local regressions are roughly $cvspan=n \times [0.3, 0.4, 0.5, 0.6, 0.7, 1.0]$.

The element of $cvspan$ associated with the smallest cross-validation residual sum of squares is chosen. The smoothing procedure is then called using this best value of $cvspan$, and the smoothed values are found without cross-validation.

If span is not 0 but is a value in the interval (0, 1], the local regressions have length roughly n*span and the smoothed values are found without cross-validation.

Example: 0.4 

Data Types: double
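The selection of span by cross-validation can be illustrated with a minimal sketch. This is not the code of rlsmo or smth: the toy data, the truncation of the windows at the boundaries, the use of floor for the integer part, and the treatment of span=1 with the same window formula are all assumptions made for illustration. Names such as cvrss and bestspan are hypothetical.

```matlab
% Deterministic toy data (hypothetical, for illustration only)
n = 200;
x = linspace(-2, 2, n)';                 % already non-decreasing
y = 2*x.^2 - 3*x + 0.5*sin(37*x);        % smooth trend plus a fast wiggle
w = ones(n, 1);                          % unit weights

cvspan = [0.3 0.4 0.5 0.6 0.7 1.0];      % candidate spans listed in the text
cvrss  = zeros(size(cvspan));            % cross-validation RSS per candidate

for j = 1:numel(cvspan)
    m = max(floor(n*cvspan(j)/2), 1);    % half-width of the local window
    for i = 1:n
        idx = max(1, i-m):min(n, i+m);   % window truncated at the boundaries
        idx(idx == i) = [];              % leave unit i out
        sw = sqrt(w(idx));               % weighted least squares via sqrt weights
        X = [ones(numel(idx), 1) x(idx)];
        beta = (X .* sw) \ (y(idx) .* sw);   % local weighted line (needs implicit expansion)
        pred = [1 x(i)] * beta;          % leave-one-out prediction at x(i)
        cvrss(j) = cvrss(j) + w(i)*(y(i) - pred)^2;
    end
end

[~, jbest] = min(cvrss);                 % smallest cross-validation RSS
bestspan = cvspan(jbest);
fprintf('span chosen by this sketch: %.1f\n', bestspan);
```

Once bestspan is selected, the final smooth would be recomputed with that span and without leaving any unit out, as described above.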

## Output Arguments

### smo — Smoothed values. Vector.

A vector with the same dimension as y containing the smoothed values, that is, the y values on the fitted curve. The smoothed values come from local linear regressions whose length is specified by the input parameter span.

### span — Length of the local regressions. Scalar.

Scalar in the interval [0, 1] which specifies the length of the local regressions that were used. For example, if span=0.3, approximately 30 per cent of consecutive observations are used to compute each local regression.

This function makes use of subroutine smth.

The syntax of $smth$ is $[smo] = smth(x,y,w,span,cross)$. $x$, $y$ and $w$ are three vectors of length $n$ containing, respectively, the $x$ coordinates, the $y$ coordinates and the weights. Input parameter $span$ is a scalar in the interval (0, 1] which defines the length of the local regressions.

More precisely, if $span$ is in (0, 1), each local regression uses $2m+1$ observations, where $m = \max([(n \times span)/2], 1)$, which ensures that the minimum length of a local regression is 3. The symbol $[ \cdot ]$ denotes the integer part.
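As a small worked example of the formula above, the following sketch computes the half-width $m$ and the window length $2m+1$ for a few span values. It assumes that the integer part is taken with floor; the variable names are hypothetical.

```matlab
n = 200;                          % number of observations
span = [0.001 0.25 0.5];          % example span values in (0, 1)
m = max(floor(n*span/2), 1);      % half-width, never below 1
winlen = 2*m + 1;                 % observations used in each local regression
disp([span(:) m(:) winlen(:)])    % one row per span: span, m, window length
```

For the very small span the half-width is clamped to 1, so the window still contains 3 observations.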

Parameter $cross$ is a Boolean scalar. If it is set to true, unit $i$ is deleted when computing the local regression centered on unit $i$. For example:

[1] If the window length $2m+1$ is 3 and $cross$ is true, the smoothed value for observation $i$ uses a local regression with $x$ coordinates $(x(i-1), x(i+1))$, $y$ coordinates $(y(i-1), y(i+1))$ and weights $(w(i-1), w(i+1))$, for $i=2, \ldots, n-1$. The smoothed value for observation 1 is $y(2)$ and the smoothed value for observation $n$ is $y(n-1)$.

[2] If the window length $2m+1$ is 3 and $cross$ is false, the smoothed value for observation $i$ is based on a local regression with $x$ coordinates $(x(i-1), x(i), x(i+1))$, $y$ coordinates $(y(i-1), y(i), y(i+1))$ and weights $(w(i-1), w(i), w(i+1))$, for $i=2, \ldots, n-1$. The smoothed value for observation 1 uses a local regression based on $(x(1), x(2))$, $(y(1), y(2))$ and $(w(1), w(2))$, while the smoothed value for observation $n$ uses a local regression based on $(x(n-1), x(n))$, $(y(n-1), y(n))$ and $(w(n-1), w(n))$.

[3] If the window length $2m+1$ is 5 and $cross$ is true, the smoothed value for observation $i$ uses a local regression based on observations $(i-2), (i-1), (i+1), (i+2)$, for $i=3, \ldots, n-2$. The smoothed value for observation 1 uses observations 2 and 3, the smoothed value for observation 2 uses observations 1, 3 and 4 ...

[4] If the window length $2m+1$ is 5 and $cross$ is false, the smoothed value for observation $i$ uses a local regression based on observations $(i-2), (i-1), i, (i+1), (i+2)$, for $i=3, \ldots, n-2$. The smoothed value for observation 1 uses observations 1, 2 and 3, the smoothed value for observation 2 uses observations 1, 2, 3 and 4 ...
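The neighbourhoods in cases [3] and [4] can be reproduced with a short sketch. It is not the code of smth: it assumes that the window is simply truncated at the boundaries, which matches the cases listed above, and the values of n and the variable names are hypothetical.

```matlab
n = 10;                          % hypothetical number of observations
m = 2;                           % half-width, so the window length 2*m+1 is 5
units = [1 2 5];                 % two boundary units and one interior unit
crossvals = [true false];
used = cell(2, numel(units));    % row 1: cross=true, row 2: cross=false

for r = 1:2
    for c = 1:numel(units)
        i = units(c);
        idx = max(1, i-m):min(n, i+m);   % window truncated at the boundaries
        if crossvals(r)
            idx(idx == i) = [];          % leave unit i out
        end
        used{r, c} = idx;
        fprintf('cross=%d, i=%d: units %s\n', double(crossvals(r)), i, mat2str(idx));
    end
end
```

Setting m = 1 in the same sketch gives the neighbourhoods of cases [1] and [2].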

## References

Tibshirani, R. (1988), Estimating transformations for regression via additivity and variance stabilization, "Journal of the American Statistical Association", Vol. 83, pp. 394-405.

Hastie, T., and Tibshirani, R. (1986), Generalized Additive Models (with discussion), "Statistical Science", Vol. 1, pp. 297-318.