LTStsVarSel

LTStsVarSel does variable selection in the robust time series model LTSts

Syntax

reduced_est=LTStsVarSel(y)example
reduced_est=LTStsVarSel(y,Name,Value)example
[reduced_est, reduced_model]=LTStsVarSel(___)example
[reduced_est, reduced_model, msgstr]=LTStsVarSel(___)example

Description

LTSts requires variable selection when the optimal model parameters are not known in advance. This happens in particular when the function has to be applied to many heterogeneous datasets in an automatic way, possibly on a regular basis (that is the model parameters are expected to change over time even for datasets associated to the same phenomenon).

The approach consists in iteratively eliminating the less significant estimated model parameter, starting from an over-parametrized model. The model parameters are re-estimated with LTSts at each step, until all the p-values are below a given threshold. Then, the output is a reduced time series model with significant parameters only.

example

reduced_est =LTStsVarSel(y) run LTStsVarSel with all default options.

example

reduced_est =LTStsVarSel(y, Name, Value) run LTStsVarSel starting from a specific over-parametrized model.

example

[reduced_est, reduced_model] =LTStsVarSel(___) run LTStsVarSel starting from over-parametrized model with autoregressive components.

example

[reduced_est, reduced_model, msgstr] =LTStsVarSel(___) run LTStsVarSel with default options and return warning messages.

Examples

expand all

run LTStsVarSel with all default options.

rng('default')
rng(1);
% data model
model=struct;
model.trend=1;                  % linear trend
model.trendb=[0 1];             % parameters of the linear trend
model.s=12;                     % monthly time series
model.seasonal=1;               % 1 harmonic
model.seasonalb=[10 10];        % parameter for one harmonic
model.signal2noiseratio = 100;  % signal to noise ratio
n = 100;                        % sample size
% generate data
out_sim=simulateTS(n,'plots',1,'model',model);
%run LTStsVarSel with all default options
[out_model_0, out_reduced_0] = LTStsVarSel(out_sim.y);
% optional: add a FS step to the LTSts estimator
% outFS = FSRts(out_sim.y,'model',out_model_0);
% To be fixed: 'Non existent user option found-> '    'ARp'

run LTStsVarSel starting from a specific over-parametrized model.

rng('default')
rng(1);
% sample size
n = 100;                       
tmp = rand(n,1);
% data model
model=struct;
model.trend=1;                  % linear trend
model.trendb=[0 1];             % parameters of the linear trend
model.s=12;                     % monthly time series
model.seasonal=1;               % 1 harmonic
model.seasonalb=[10 10];        % parameter for one harmonic
model.lshiftb=100;              % level shift amplitude
model.lshift= 30;               % level shift position
model.signal2noiseratio = 100;  % signal to noise ratio
model.X = tmp.*[1:n]';          % a extra covariate
model.Xb = 1;                   % beta coefficient of the covariate
out_sim=simulateTS(n,'plots',1,'model',model);
% complete model to be tested.
overmodel=struct;
overmodel.trend=2;              % quadratic trend
overmodel.s=12;                 % monthly time series
overmodel.seasonal=303;         % number of harmonics
overmodel.lshift=4:n-4;         % positions of level shift which have to be considered
overmodel.X=tmp.*[1:n]';
% pval threshold
thPval=0.01;
[out_model_1, out_reduced_1] = LTStsVarSel(out_sim.y,'model',overmodel,'thPval',thPval,'plots',1);

run LTStsVarSel starting from over-parametrized model with autoregressive components.

rng('default')
rng(1);
% add three autoregressive components to the complete model.
n = 100;                        % sample size
tmp = rand(n,1);
model=struct;
model.trend=1;                  % linear trend
model.trendb=[0 1];             % parameters of the linear trend
model.s=12;                     % monthly time series
model.seasonal=1;               % 1 harmonic
model.seasonalb=[10 10];
model.X = tmp.*[1:n]';          % a extra covariate
model.Xb = 1;                   % beta coefficient of the covariate
model.signal2noiseratio = 100;  % signal to noise ratio
out_sim=simulateTS(n,'plots',1,'model',model);
% complete model to be tested.
overmodel=struct;
overmodel.trend=2;              % quadratic trend
overmodel.s=12;                 % monthly time series
overmodel.seasonal=303;         % number of harmonics
overmodel.lshift=0;             % no level shift
overmodel.X=tmp.*[1:n]';
overmodel.ARp=1:3;
% pval threshold
thPval=0.01;
[out_model_2, out_reduced_2] = LTStsVarSel(out_sim.y,'model',overmodel,'thPval',thPval);

run LTStsVarSel with default options and return warning messages.

rng('default')
rng(1);
% data model
model=struct;
model.trend=1;                  % linear trend
model.trendb=[0 1];             % parameters of the linear trend
model.s=12;                     % monthly time series
model.seasonal=1;               % 1 harmonic
model.seasonalb=[10 10];        % parameter for one harmonic
model.lshiftb=100;              % level shift amplitude
model.lshift= 30;               % level shift position
model.signal2noiseratio = 100;  % signal to noise ratio
n = 100;                        % sample size
tmp = rand(n,1);
model.X = tmp.*[1:n]';          % a extra covariate
model.Xb = 1;                   % beta coefficient of the covariate
% generate data
out_sim=simulateTS(n,'plots',1,'model',model);
[out_model_3, out_reduced_3, messages] = LTStsVarSel(out_sim.y);

Related Examples

expand all

An example of the use of option firstTestLS rng('default') rng(10); data model model=struct; model.

rng('default')
rng(10);
% data model
model=struct;
model.trend=1;                  % linear trend
model.trendb=[0 1];             % parameters of the linear trend
model.s=12;                     % monthly time series
model.seasonal=1;               % 1 harmonic
model.seasonalb=[2 3];          % parameter for one harmonic
model.lshiftb=50;               % level shift amplitude
model.lshift= 35;               % level shift position
model.signal2noiseratio = 30;   % signal to noise ratio
model.ARp=1;
model.ARb=0.9;
n = 50;                         % sample size
tmp = rand(n,1);
model.X = tmp;                  % a extra covariate
model.Xb = 1;                   % beta coefficient of the covariate
% generate data
out_sim=simulateTS(n,'plots',1,'model',model);
overmodel=model;
overmodel.trend=2;              % quadratic trend
overmodel.s=12;                 % monthly time series
overmodel.seasonal=303;         % number of harmonics
overmodel.lshift=-1;
overmodel=rmfield(overmodel,"trendb");
overmodel=rmfield(overmodel,"seasonalb");
overmodel=rmfield(overmodel,"signal2noiseratio");
overmodel=rmfield(overmodel,"lshiftb");
overmodel=rmfield(overmodel,"Xb");
overmodel=rmfield(overmodel,"ARb");
nsamp=100;
tic;
[out_model_3, out_reduced_3] = LTStsVarSel(out_sim.y, ...
'model',overmodel,'nsamp',nsamp);
compTime=toc; 
disp(compTime)
disp('Final selected model without option firstTestLS')
disp(out_model_3)
[out_model_3N, out_reduced_3N] = LTStsVarSel(out_sim.y, ...
'model',overmodel,'firstTestLS',true,'nsamp',nsamp);
disp('Final selected model with option firstTestLS')
disp(out_model_3N)

Input Arguments

expand all

`y` — Time series to analyze. Vector.

A row or a column vector with T elements, which contains the time series.

Data Types: single| double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example:

 'firstTestLS', false
, 'model', model
, 'nsamp',500
, 'thPval',0.05
, 'plots',1
, 'msg',1
, 'dispresults',true

`firstTestLS` —initial test for presence of level shift.boolean.

If firstTestLS is true, we immediately find the position of the level shift in a model which does not contain autoregressive terms, the seasonal specification is 101. If the level shift component is significant we pass the level shift component in fixed position to the variable selection procedure. Note also that the units declared as outliers with a p-value smaller than 0.001 are used to form model.ARtentout. model.ARtentout is used in the subsequent steps of the variable selection procedure, every time there is a call to LTSts with an autoregressive component. The default value of firstTestLS is false.

Example: 'firstTestLS', false

Data Types: logical

`model` —model type.structure.

A structure which specifies the (over-parametrized) model which will be used to initialise the variable selection process. The model structure is identical to the one defined for function LTSts: for convenience, we list the fields also here:

Value	Description
`s`	scalar (length of seasonal period). For monthly data s=12 (default), for quarterly data s=4, ...
`trend`	scalar (order of the trend component). trend = 0 implies no trend, trend = 1 implies linear trend with intercept, trend = 2 implies quadratic trend, trend = 3 implies cubic trend. Admissible values for trend are, 0, 1, 2 and 3. In the paper RPRH to denote the order of the trend, symbol A is used. If this field is not present into input structure model, model.trend=2 is used.
`seasonal`	scalar (integer specifying number of frequencies), i.e. harmonics, in the seasonal component. Possible values for seasonal are $1, 2, ..., [s/2]$ , where $[s/2]=floor(s/2)$ . For example: if seasonal =1 we have: $\beta_1 \cos( 2 \pi t/s) + \beta_2 sin ( 2 \pi t/s)$ ; if seasonal =2 we have: $\beta_1 \cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s) + \beta_3 \cos(4 \pi t/s) + \beta_4 \sin (4 \pi t/s)$ . Note that when $s$ is even, the sine term disappears for $j=s/2$ and so the maximum number of trigonometric parameters is $s-1$ . If seasonal is a number greater than 100, then it is possible to specify how the seasonal component grows over time. For example, seasonal = 101 implies a seasonal component which just uses one frequency which grows linearly over time as follows: $(1+\beta_3 t)\times ( \beta_1 cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s))$ . For example, seasonal =201 implies a seasonal component which just uses one frequency which grows in a quadratic way over time as follows: $(1+\beta_3 t + \beta_4 t^2)\times( \beta_1 \cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s))$ . seasonal =0 implies a non seasonal model. In the paper RPRH, to denote the number of frequencies of the seasonal component, symbol B is used, while symbol G is used to denote the order of the trend of the seasonal component. Therefore, for example, model.seasonal=201 corresponds to B=1 and G=2, while model.seasonal=3 corresponds to B=3 and G=0. If this field is not present into input structure model, model.seasonal=303 is used.
`X`	matrix of size T-by-nexpl containing the values of nexpl extra covariates which are likely to affect y.
`lshift`	scalar or vector associated to level shift component. lshift=0 (default) implies no level shift component. If model.lshift is vector of positive integers, then it is associated to the positions of level shifts which have to be considered. The most significant one is included in the fitted model. For example, if model.lshift =[13 20 36], a tentative level shift is imposed in position $t=13$ , $t=20$ and $t=36$ . The most significant among these positions is included in the final model. In other words, the following extra parameters are added to the final model: $\beta_{LS1}* I(t \geq \beta_{LS2})$ where $\beta_{LS1}$ is a real number (associated with the magnitude of the level shift) and $\beta_{LS2}$ is an integer which assumes values 13, 20 or 36 and and $I$ denotes the indicator function. As a particular case, if model.lshift =13 then a level shift in position $t=13$ is added to the model. In other words the following additional parameters are added: $\beta_{LS1}* I(t \geq 13)$ where $\beta_{LS1}$ is a real number and $I$ denotes the indicator function. If lshift = -1 tentative level shifts are considered for positions $p+1,p+2, ..., T-p$ and the most significant one is included in the final model ( $p$ is the total number of parameters in the fitted model). Note that lshift=-1 is not supported for C-coder translation. In the paper RPRH $\beta_{LS1}$ is denoted with symbol $\delta_1$ , while, $\beta_{LS2}$ is denoted with symbol $\delta_2$ .
`ARp`	vector with non negative integer numbers specifying the autoregressive components. For example: model.ARp=[1 2] means a AR(2) process; model.ARp=2 means just the lag 2 component; model.ARp=[1 2 5 8] means AR(2) + lag 5 + lag 8; model.ARp=0 (default) means no autoregressive component.
`ARtentout`	matrix of size r-by-2 containing the list of the units declared as outliers (first column) and corresponding fitted values (second column) or empty scalar. If model.ARtentout is not empty, when the autoregressive component is present, the y values which are used to compute the autoregressive component are replaced by model.tentout(:,2) for the units contained in model.tentout(:,1) Remark: the default overparametrized model is for monthly data with a quadratic trend (3 parameters) + seasonal component with three harmonics which grows in a cubic way over time (9 parameters), no additional explanatory variables, no level shift and no AR component that is: model=struct; model.s=12; model.trend=2; model.seasonal=303; model.X=''; model.lshift=0; model.ARp=0; Using the notation of the paper RPRH we have A=2, B=3; G=3; and $\delta_1=0$ .

Example: 'model', model

Data Types: struct

`nsamp` —number of subsamples to extract.scalar | vector of length 2.

Vector of length 1 or 2 which controls the number of subsamples which will be extracted to find the robust estimator. If lshift is not equal to 0 then nsamp(1) controls the number of subsets which have to be extracted to find the solution for t=lshift(1). nsamp(2) controls the number of subsets which have to be extracted to find the solution for t=lshift(2), lshift(3), ..., lshift(end).

Note that nsamp(2) is generally smaller than nsamp(1) because in order to compute the best solution for t=lshift(2), lshift(3), ..., lshift(end), we use the lts.bestr/2 best solutions from previous t (after shifting the position of the level shift in the estimator of beta). If lshift is a vector of positive integers the default value of nsamp is (500 250). If lshift is a vector of positive integers and nsamp is supplied as a scalar the default is to extract [nsamp/2] subsamples for t=lshift(1), lshift(2), ... Therefore, for example, in order to extract 600 subsamples for t=lshift(1) and 300 subsamples for t= lshift(2) ... you can use nsamp =600 or nsamp=[600 300].

The default value of nsamp is 1000;

Remark: if nsamp=0 ,all subsets will be extracted.

They will be (n choose p).

Example: 'nsamp',500

Data Types: double

`thPval` —threshold for pvalues.scalar.

A value between 0 and 1.

An estimated parameter/variable is eliminated if the associated pvalue is below thPval. Default is thPval=0.01.

Example: 'thPval',0.05

Data Types: double

`plots` —Plots on the screen.scalar.

If plots = 1, the typical LTSts plots will be shown on the screen. The default value of plot is 0 i.e. no plot is shown on the screen.

Example: 'plots',1

Data Types: double

`msg` —Messages on the screen.scalar.

Scalar which controls whether LTSts will display or not messages on the screen. Default is msg=0, that is no messages are displayed on the screen. If msg==1, messages displayed on the screen are about estimated time to compute the estimator and the warnings about 'MATLAB:rankDeficientMatrix', 'MATLAB:singularMatrix' and 'MATLAB:nearlySingularMatrix'.

Example: 'msg',1

Data Types: double

`dispresults` —Display results of final fit.boolean.

If dispresults is true, labels of coefficients, estimated coefficients, standard errors, tstat and p-values are shown on the screen in a fully formatted way. The default value of dispresults is false.

Example: 'dispresults',true

Data Types: logical

Output Arguments

expand all

`reduced_est` —A reduced model structure obtained by eliminating parameters that are non-significant. It is a structure containing the typical input model fields for function LTSts (refer to LTSts for details): model.s = the optimal length of seasonal period

model.trend = the optimal order of the trend.

model.seasonal = the optimal number of frequencies in the seasonal component.

model.lshift = the optimal level shift position.

model.X = a matrix containing the values of the extra covariates which are likely to affect y. If the input model specifies autoregressive components in model.ARp, then the selected ones will be also included in model.X.

`reduced_model` —Output fields of the optimal model. Structure

The fields are those of function LTSts (refer to LTSts for details): out.B = matrix of estimated beta coefficients.

out.h = number of observations that have determined the initial LTS estimator.

out.bs = vector of the units with the smallest squared residuals before the reweighting step.

out.Hsubset = matrix of the units forming best H subset for each tentative level shift considered.

out.numscale2 = matrix of the values of the lts.bestr smallest values of the target function.

out.BestIndexes = matrix of indexes associated with the best nbestindexes solutions.

out.Likloc = matrix containing local sum of squares of residuals determining the best position of level shift.

out.RES = matrix containing scaled residuals for all the units of the original time series monitored in steps lshift+1, lshift+2, ....

out.yhat = vector of fitted values after final step.

out.residuals = vector of scaled residuals from after final NLS step.

out.weights = vector of weights after adaptive reweighting.

out.scale = final scale estimate of the residuals using final weights.

out.conflev = confidence level used to declare outliers.

out.outliers = vector of the units declared outliers.

out.singsub = number of subsets without full rank.

out.y = response vector y.

out.X = data matrix X containing trend, seasonal, expl (with autoregressive component) and lshift.

out.class = 'LTSts'.

Optional Output:

`msgstr` —Last warning message. String

This relates to the execution of the LTS.

References

Rousseeuw, P.J., Perrotta D., Riani M. and Hubert, M. (2018), Robust Monitoring of Many Time Series with Application to Fraud Detection, "Econometrics and Statistics". [RPRH]

Documentation

LTStsVarSel

Syntax

Description

Examples

run LTStsVarSel with all default options.

run LTStsVarSel starting from a specific over-parametrized model.

run LTStsVarSel starting from over-parametrized model with autoregressive components.

run LTStsVarSel with default options and return warning messages.

Related Examples

An example of the use of option firstTestLS rng('default') rng(10); data model model=struct; model.

Input Arguments

`y` — Time series to analyze. Vector.

Name-Value Pair Arguments

`firstTestLS` —initial test for presence of level shift.boolean.

`model` —model type.structure.

`nsamp` —number of subsamples to extract.scalar | vector of length 2.

`thPval` —threshold for pvalues.scalar.

`plots` —Plots on the screen.scalar.

`msg` —Messages on the screen.scalar.

`dispresults` —Display results of final fit.boolean.

Output Arguments

`reduced_est` —A reduced model structure obtained by eliminating parameters that are non-significant. It is a structure containing the typical input model fields for function LTSts (refer to LTSts for details): model.s = the optimal length of seasonal period

`reduced_model` —Output fields of the optimal model. Structure

`msgstr` —Last warning message. String

References

See Also

Documentation

LTStsVarSel

Syntax

Description

Examples

run LTStsVarSel with all default options.

run LTStsVarSel starting from a specific over-parametrized model.

run LTStsVarSel starting from over-parametrized model with autoregressive components.

run LTStsVarSel with default options and return warning messages.

Related Examples

An example of the use of option firstTestLS rng('default') rng(10); data model model=struct; model.

Input Arguments

y — Time series to analyze. Vector.

Name-Value Pair Arguments

firstTestLS —initial test for presence of level shift.boolean.

model —model type.structure.

nsamp —number of subsamples to extract.scalar | vector of length 2.

thPval —threshold for pvalues.scalar.

plots —Plots on the screen.scalar.

msg —Messages on the screen.scalar.

dispresults —Display results of final fit.boolean.

Output Arguments

reduced_est —A reduced model structure obtained by eliminating parameters that are non-significant. It is a structure containing the typical input model fields for function LTSts (refer to LTSts for details): model.s = the optimal length of seasonal period

reduced_model —Output fields of the optimal model. Structure

msgstr —Last warning message. String

References

See Also

`y` — Time series to analyze. Vector.

`firstTestLS` —initial test for presence of level shift.boolean.

`model` —model type.structure.

`nsamp` —number of subsamples to extract.scalar | vector of length 2.

`thPval` —threshold for pvalues.scalar.

`plots` —Plots on the screen.scalar.

`msg` —Messages on the screen.scalar.

`dispresults` —Display results of final fit.boolean.

`reduced_est` —A reduced model structure obtained by eliminating parameters that are non-significant. It is a structure containing the typical input model fields for function LTSts (refer to LTSts for details): model.s = the optimal length of seasonal period

`reduced_model` —Output fields of the optimal model. Structure

`msgstr` —Last warning message. String