# LTStsVarSel

LTStsVarSel does variable selection in the robust time series model LTSts

## Syntax

• reduced_est=LTStsVarSel(y)example
• reduced_est=LTStsVarSel(y,Name,Value)example
• [reduced_est, reduced_model]=LTStsVarSel(___)example
• [reduced_est, reduced_model, msgstr]=LTStsVarSel(___)example

## Description

LTSts requires variable selection when the optimal model parameters are not known in advance. This happens in particular when the function has to be applied to many heterogeneous datasets in an automatic way, possibly on a regular basis (that is the model parameters are expected to change over time even for datasets associated to the same phenomenon).

The approach consists in iteratively eliminating the less significant estimated model parameter, starting from an over-parametrized model. The model parameters are re-estimated with LTSts at each step, until all the p-values are below a given threshold. Then, the output is a reduced time series model with significant parameters only.

 reduced_est =LTStsVarSel(y) run LTStsVarSel with all default options.

 reduced_est =LTStsVarSel(y, Name, Value) run LTStsVarSel starting from a specific over-parametrized model.

 [reduced_est, reduced_model] =LTStsVarSel(___) run LTStsVarSel starting from over-parametrized model with autoregressive components.

 [reduced_est, reduced_model, msgstr] =LTStsVarSel(___) run LTStsVarSel with default options and return warning messages.

## Examples

expand all

### run LTStsVarSel with all default options.

data model

model=struct;
model.trend=1;                  % linear trend
model.trendb=[0 1];             % parameters of the linear trend
model.s=12;                     % monthly time series
model.seasonal=1;               % 1 harmonic with linear trend
model.seasonalb=[10 10];        % parameter for one harmonic with linear trend
model.lshiftb=100;              % level shift amplitude
model.lshift= 30;               % level shift amplitude
model.signal2noiseratio = 100;  % signal to noise
rng('default')
n = 100;                        % sample size
tmp = rand(n,1);
model.X = tmp.*[1:n]';          % a extra covariate
model.Xb = 1;                   % beta coefficient of the covariate
% generate data
out_sim=simulateTS(n,'plots',1,'model',model);
%run LTStsVarSel with all default options
rng(1);
[out_model_0, out_reduced_0] = LTStsVarSel(out_sim.y);
% optional: add a FS step to the LTSts estimator
% outFS = FSRts(out_sim.y,'model',out_model_0);
% To be fixed: 'Non existent user option found-> '    'ARp'

### run LTStsVarSel starting from a specific over-parametrized model.

sample size

n = 100;
tmp = rand(n,1);
% data model
model=struct;
model.trend=1;                  % linear trend
model.trendb=[0 1];             % parameters of the linear trend
model.s=12;                     % monthly time series
model.seasonal=1;               % 1 harmonic with linear trend
model.seasonalb=[10 10];        % parameter for one harmonic with linear trend
model.lshiftb=100;              % level shift amplitude
model.lshift= 30;               % level shift amplitude
model.signal2noiseratio = 100;  % signal to noise
model.X = tmp.*[1:n]';          % a extra covariate
model.Xb = 1;                   % beta coefficient of the covariate
out_sim=simulateTS(n,'plots',1,'model',model);
% complete model to be tested.
overmodel=struct;
overmodel.s=12;                 % monthly time series
overmodel.seasonal=303;         % number of harmonics
overmodel.lshift=4;             % position where to start monitoring level shift
overmodel.X=tmp.*[1:n]';
% pval threshold
thPval=0.01;
[out_model_1, out_reduced_1] = LTStsVarSel(out_sim.y,'model',overmodel,'thPval',thPval,'plots',1);

### run LTStsVarSel starting from over-parametrized model with autoregressive components.

add three autoregressive components to the complete model.

n = 100;                        % sample size
tmp = rand(n,1);
model.X = tmp.*[1:n]';          % a extra covariate
model.Xb = 1;                   % beta coefficient of the covariate
out_sim=simulateTS(n,'plots',1,'model',model);
% complete model to be tested.
overmodel=struct;
overmodel.s=12;                 % monthly time series
overmodel.seasonal=303;         % number of harmonics
overmodel.lshift=4;             % position where to start monitoring level shift
overmodel.X=tmp.*[1:n]';
overmodel.ARp=3;
% pval threshold
thPval=0.01;
[out_model_2, out_reduced_2] = LTStsVarSel(out_sim.y,'model',overmodel,'thPval',thPval);

### run LTStsVarSel with default options and return warning messages.

data model

model=struct;
model.trend=1;                  % linear trend
model.trendb=[0 1];             % parameters of the linear trend
model.s=12;                     % monthly time series
model.seasonal=1;               % 1 harmonic with linear trend
model.seasonalb=[10 10];        % parameter for one harmonic with linear trend
model.lshiftb=100;              % level shift amplitude
model.lshift= 30;               % level shift amplitude
model.signal2noiseratio = 100;  % signal to noise
n = 100;                        % sample size
tmp = rand(n,1);
model.X = tmp.*[1:n]';          % a extra covariate
model.Xb = 1;                   % beta coefficient of the covariate
% generate data
out_sim=simulateTS(n,'plots',1,'model',model);
[out_model_3, out_reduced_3, messages] = LTStsVarSel(out_sim.y);

## Input Arguments

### y — Time series to analyze. Vector.

A row or a column vector with T elements, which contains the time series.

Data Types: single| double

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as  Name1,Value1,...,NameN,ValueN.

Example:  'model', model , 'thPval',0.05 , 'plots',1 , 'msg',1 , 'dispresults',true 

### model —model type.structure.

A structure which specifies the (over-parametrized) model which will be used to initialise the variable selection process. The model structure is identical to the one defined for function LTSts: for convenience, we list the fields also here:

Value Description
s

scalar (length of seasonal period). For monthly data s=12 (default), for quartely data s=4, ...

trend

scalar (order of the trend component).

trend = 0 implies no trend trend = 1 implies linear trend with intercept (default), trend = 2 implies quadratic trend trend = 3 implies cubic trend Admissible values for trend are, 0, 1, 2 and 3.

In the paper RPRH to denote the order of the trend symbol A is used. If this field is not present into input structure model, model.trend=2 is used.

seasonal

scalar (integer specifying number of frequencies, i.e. harmonics, in the seasonal component. Possible values for seasonal are $1, 2, ..., [s/2]$, where $[s/2]=floor(s/2)$.

For example:

if seasonal =1 (default) we have:

$\beta_1 \cos( 2 \pi t/s) + \beta_2 sin ( 2 \pi t/s)$;

if seasonal =2 we have:

$\beta_1 \cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s) + \beta_3 \cos(4 \pi t/s) + \beta_4 \sin (4 \pi t/s)$.

Note that when $s$ is even the sine term disappears for $j=s/2$ and so the maximum number of trigonometric parameters is $s-1$.

If seasonal is a number greater than 100 then it is possible to specify how the seasonal component grows over time.

For example, seasonal = 101 implies a seasonal component which just uses one frequency which grows linearly over time as follows:

$(1+\beta_3 t)\times ( \beta_1 cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s))$.

For example, seasonal =201 implies a seasonal component which just uses one frequency which grows in a quadratic way over time as follows:

$(1+\beta_3 t + \beta_4 t^2)\times( \beta_1 \cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s))$.

seasonal =0 implies a non seasonal model.

In the paper RPRH to denote the number of frequencies of the seasonal component symbol B is used, while symbol G is used to denote the order of the trend of the seasonal component.

Therefore, for example, model.seasonal=201 corresponds to B=1 and G=2, while model.seasonal=3 corresponds to B=3 and G=0;

X

matrix of size T-by-nexpl containing the values of nexpl extra covariates which are likely to affect y.

lshift

scalar greater or equal than 0 which specifies whether it is necessary to include a level shift component. lshift = 0 (default) implies no level shift component. If lshift is an interger greater then 0 then it is possible to specify the moment to start considering level shifts. For example if lshift =13 then the following additional parameters are estimated $\beta_{LS1}* I(t \geq beta_{LS2})$ where $\beta_{LS1}$ is a real number and $\beta_{LS2}$ is an integer which assumes values 14, 14, ..., T-13.

In general, the level shift which are considered are referred to times (lshift+1):(T-lshift).

In the paper RPRH $\beta_{LS1}$ is denoted with symbol $\delta_1$, while, $\beta_{LS2}$ is denoted with symbol $\delta_2$.

ARp

scalar greater or equal than 0 which specifies the length of the autoregressive component. The default value of model.ARp is 0, that is there is no autoregressive component.

Remark: the default overparametrized model is for monthly data with a quadratic trend (3 parameters) + seasonal component with just one harmonic (2 parameters), no additional explanatory variables, no level shift and no AR component that is model=struct;

model.s=12;

model.trend=1;

model.seasonal=1;

model.X='';

model.lshift=0;

model.ARp=0;

Using the notation of the paper RPRH we have A=1, B=1; and $\delta_1=0$.

Example:  'model', model 

Data Types: struct

### thPval —threshold for pvalues.scalar.

A value between 0 and 1.

An estimated parameter/variable is eliminated if the associated pvalue is below thPval. Default is thPval=0.01.

Example:  'thPval',0.05 

Data Types: double

### plots —Plots on the screen.scalar.

If plots = 1, the typical LTSts plots will be shown on the screen. The default value of plot is 0 i.e. no plot is shown on the screen.

Example:  'plots',1 

Data Types: double

### msg —Messages on the screen.scalar.

Scalar which controls whether LTSts will display or not messages on the screen. Deafault is msg=0, that is no messages are displayed on the screen. If msg==1 messages displayed on the screen are about estimated time to compute the estimator and the warnings about 'MATLAB:rankDeficientMatrix', 'MATLAB:singularMatrix' and 'MATLAB:nearlySingularMatrix'.

Example:  'msg',1 

Data Types: double

### dispresults —Display results of final fit.boolean.

If dispresults is true, labels of coefficients, estimated coefficients, standard errors, tstat and p-values are shown on the screen in a fully formatted way. The default value of dispresults is false.

Example:  'dispresults',true 

Data Types: logical

## Output Arguments

### reduced_est —A reduced model structure obtained by eliminating parameters that are non-significant. It is a structure containing the typical input model fields for function LTSts (refer to LTSts for details): model.s = the optimal length of seasonal period

model.trend = the optimal order of the trend.

model.seasonal = the optimal number of frequencies in the seasonal component.

model.lshift = the optimal level shift position.

model.X = a matrix containing the values of the extra covariates which are likely to affect y. If the imput model specifies autoregressive components in model.ARp, then the selected ones will be also included in model.X.

### reduced_model —Output fields of the optimal model. Structure

The fields are those of function LTSts (refer to LTSts for details):

out.B = matrix of estimated beta coefficients.

out.h = number of observations that have determined the initial LTS estimator.

out.bs = vector of the units with the smallest squared residuals before the reweighting step.

out.Hsubset = matrix of the units forming best H subset for each tentative level shift considered.

out.numscale2 = matrix of the values of the lts.bestr smallest values of the target function.

out.BestIndexes = matrix of indexes associated with the best nbestindexes solutions.

out.Likloc = matrix containing local sum of squares of residuals determining the best position of level shift.

out.RES = matrix containing scaled residuals for all the units of the original time series monitored in steps lshift+1, lshift+2, ....

out.yhat = vector of fitted values after final step.

out.residuals = vector of scaled residuals from after final NLS step.

out.weights = vector of weights after adaptive reweighting.

out.scale = final scale estimate of the residuals using final weights.

out.conflev = confidence level used to declare outliers.

out.outliers = vector of the units declared outliers.

out.singsub = number of subsets wihtout full rank.

out.y = response vector y.

out.X = data matrix X containing trend, seasonal, expl (with autoregressive component) and lshift.

out.class = 'LTSts'.

Optional Output:

### msgstr —Last warning message. String

This relates to the execution of the LTS.

## References

Rousseeuw, P.J., Perrotta D., Riani M. and Hubert, M. (2018), Robust Monitoring of Many Time Series with Application to Fraud Detection, "Econometrics and Statistics". [RPRH]