LTStsVarSel

LTStsVarSel does variable selection in the robust time series model LTSts

Syntax

  • reduced_est=LTStsVarSel(y)example
  • reduced_est=LTStsVarSel(y,Name,Value)example
  • [reduced_est, reduced_model]=LTStsVarSel(___)example
  • [reduced_est, reduced_model, msgstr]=LTStsVarSel(___)example

Description

LTSts requires variable selection when the optimal model parameters are not known in advance. This happens in particular when the function has to be applied to many heterogeneous datasets in an automatic way, possibly on a regular basis (that is the model parameters are expected to change over time even for datasets associated to the same phenomenon).

The approach consists in iteratively eliminating the less significant estimated model parameter, starting from an over-parametrized model. The model parameters are re-estimated with LTSts at each step, until all the p-values are below a given threshold. Then, the output is a reduced time series model with significant parameters only.

example

reduced_est =LTStsVarSel(y) run LTStsVarSel with all default options.

example

reduced_est =LTStsVarSel(y, Name, Value) run LTStsVarSel starting from a specific over-parametrized model.

example

[reduced_est, reduced_model] =LTStsVarSel(___) run LTStsVarSel starting from over-parametrized model with autoregressive components.

example

[reduced_est, reduced_model, msgstr] =LTStsVarSel(___) run LTStsVarSel with default options and return warning messages.

Examples

expand all

  • run LTStsVarSel with all default options.
  • rng('default')
    rng(1);
    % data model
    model=struct;
    model.trend=1;                  % linear trend
    model.trendb=[0 1];             % parameters of the linear trend
    model.s=12;                     % monthly time series
    model.seasonal=1;               % 1 harmonic
    model.seasonalb=[10 10];        % parameter for one harmonic
    model.signal2noiseratio = 100;  % signal to noise ratio
    n = 100;                        % sample size
    % generate data
    out_sim=simulateTS(n,'plots',1,'model',model);
    %run LTStsVarSel with all default options
    [out_model_0, out_reduced_0] = LTStsVarSel(out_sim.y);
    % optional: add a FS step to the LTSts estimator
    % outFS = FSRts(out_sim.y,'model',out_model_0);
    % To be fixed: 'Non existent user option found-> '    'ARp'

  • run LTStsVarSel starting from a specific over-parametrized model.
  • rng('default')
    rng(1);
    % sample size
    n = 100;                       
    tmp = rand(n,1);
    % data model
    model=struct;
    model.trend=1;                  % linear trend
    model.trendb=[0 1];             % parameters of the linear trend
    model.s=12;                     % monthly time series
    model.seasonal=1;               % 1 harmonic
    model.seasonalb=[10 10];        % parameter for one harmonic
    model.lshiftb=100;              % level shift amplitude
    model.lshift= 30;               % level shift position
    model.signal2noiseratio = 100;  % signal to noise ratio
    model.X = tmp.*[1:n]';          % a extra covariate
    model.Xb = 1;                   % beta coefficient of the covariate
    out_sim=simulateTS(n,'plots',1,'model',model);
    % complete model to be tested.
    overmodel=struct;
    overmodel.trend=2;              % quadratic trend
    overmodel.s=12;                 % monthly time series
    overmodel.seasonal=303;         % number of harmonics
    overmodel.lshift=4:n-4;         % positions of level shift which have to be considered
    overmodel.X=tmp.*[1:n]';
    % pval threshold
    thPval=0.01;
    [out_model_1, out_reduced_1] = LTStsVarSel(out_sim.y,'model',overmodel,'thPval',thPval,'plots',1);

  • run LTStsVarSel starting from over-parametrized model with autoregressive components.
  • rng('default')
    rng(1);
    % add three autoregressive components to the complete model.
    n = 100;                        % sample size
    tmp = rand(n,1);
    model=struct;
    model.trend=1;                  % linear trend
    model.trendb=[0 1];             % parameters of the linear trend
    model.s=12;                     % monthly time series
    model.seasonal=1;               % 1 harmonic
    model.seasonalb=[10 10];
    model.X = tmp.*[1:n]';          % a extra covariate
    model.Xb = 1;                   % beta coefficient of the covariate
    model.signal2noiseratio = 100;  % signal to noise ratio
    out_sim=simulateTS(n,'plots',1,'model',model);
    % complete model to be tested.
    overmodel=struct;
    overmodel.trend=2;              % quadratic trend
    overmodel.s=12;                 % monthly time series
    overmodel.seasonal=303;         % number of harmonics
    overmodel.lshift=0;             % no level shift
    overmodel.X=tmp.*[1:n]';
    overmodel.ARp=1:3;
    % pval threshold
    thPval=0.01;
    [out_model_2, out_reduced_2] = LTStsVarSel(out_sim.y,'model',overmodel,'thPval',thPval);

  • run LTStsVarSel with default options and return warning messages.
  • rng('default')
    rng(1);
    % data model
    model=struct;
    model.trend=1;                  % linear trend
    model.trendb=[0 1];             % parameters of the linear trend
    model.s=12;                     % monthly time series
    model.seasonal=1;               % 1 harmonic
    model.seasonalb=[10 10];        % parameter for one harmonic
    model.lshiftb=100;              % level shift amplitude
    model.lshift= 30;               % level shift position
    model.signal2noiseratio = 100;  % signal to noise ratio
    n = 100;                        % sample size
    tmp = rand(n,1);
    model.X = tmp.*[1:n]';          % a extra covariate
    model.Xb = 1;                   % beta coefficient of the covariate
    % generate data
    out_sim=simulateTS(n,'plots',1,'model',model);
    [out_model_3, out_reduced_3, messages] = LTStsVarSel(out_sim.y);

    Related Examples

    expand all

  • An example of the use of option firstTestLS rng('default') rng(10); data model model=struct; model.
  • rng('default')
    rng(10);
    % data model
    model=struct;
    model.trend=1;                  % linear trend
    model.trendb=[0 1];             % parameters of the linear trend
    model.s=12;                     % monthly time series
    model.seasonal=1;               % 1 harmonic
    model.seasonalb=[2 3];        % parameter for one harmonic
    model.lshiftb=50;              % level shift amplitude
    model.lshift= 35;               % level shift position
    model.signal2noiseratio = 30;  % signal to noise ratio
    model.ARp=1;
    model.ARb=0.9;
    n = 50;                        % sample size
    tmp = rand(n,1);
    model.X = tmp;          % a extra covariate
    model.Xb = 1;                   % beta coefficient of the covariate
    % generate data
    out_sim=simulateTS(n,'plots',1,'model',model);
    overmodel=model;
    overmodel.trend=2;              % quadratic trend
    overmodel.s=12;                 % monthly time series
    overmodel.seasonal=303;         % number of harmonics
    overmodel.lshift=-1;
    overmodel=rmfield(overmodel,"trendb");
    overmodel=rmfield(overmodel,"seasonalb");
    overmodel=rmfield(overmodel,"signal2noiseratio");
    overmodel=rmfield(overmodel,"lshiftb");
    overmodel=rmfield(overmodel,"Xb");
    overmodel=rmfield(overmodel,"ARb");
    nsamp=100;
    tic;
    [out_model_3, out_reduced_3] = LTStsVarSel(out_sim.y, ...
    'model',overmodel,'nsamp',nsamp);
    compTime=toc; 
    disp(compTime)
    disp('Final selected model without option firstTestLS')
    disp(out_model_3)
    [out_model_3N, out_reduced_3N] = LTStsVarSel(out_sim.y, ...
    'model',overmodel,'firstTestLS',true,'nsamp',nsamp);
    disp('Final selected model with option firstTestLS')
    disp(out_model_3N)

    Input Arguments

    expand all

    y — Time series to analyze. Vector.

    A row or a column vector with T elements, which contains the time series.

    Data Types: single| double

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'firstTestLS', false , 'model', model , 'nsamp',500 , 'thPval',0.05 , 'plots',1 , 'msg',1 , 'dispresults',true

    firstTestLS —initial test for presence of level shift.boolean.

    if firstTestLS is true, we immediately find the position of the level shift in a model which does not contain autoregressive terms, the seasonal specification is 101 If the level shift component is significant we pass the level shift component in fixed position to the variable selection procedure. Note also that the units declared as outliers with a p-value smaller than 0.001 are used to form model.ARtentout. model.ARtentout is used in the subsequent steps of the variable selection procedure, every time there is a call to LTSts with an autoregressive component. The default value of firstTestLS is false.

    Example: 'firstTestLS', false

    Data Types: logical

    model —model type.structure.

    A structure which specifies the (over-parametrized) model which will be used to initialise the variable selection process. The model structure is identical to the one defined for function LTSts: for convenience, we list the fields also here:

    Value Description
    s

    scalar (length of seasonal period). For monthly data s=12 (default), for quartely data s=4, ...

    trend

    scalar (order of the trend component).

    trend = 0 implies no trend, trend = 1 implies linear trend with intercept, trend = 2 implies quadratic trend, trend = 3 implies cubic trend.

    Admissible values for trend are, 0, 1, 2 and 3.

    In the paper RPRH to denote the order of the trend symbol A is used. If this field is not present into input structure model, model.trend=2 is used.

    seasonal

    scalar (integer specifying number of frequencies, i.e. harmonics, in the seasonal component. Possible values for seasonal are $1, 2, ..., [s/2]$, where $[s/2]=floor(s/2)$.

    For example: if seasonal =1 we have: $\beta_1 \cos( 2 \pi t/s) + \beta_2 sin ( 2 \pi t/s)$;

    if seasonal =2 we have: $\beta_1 \cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s) + \beta_3 \cos(4 \pi t/s) + \beta_4 \sin (4 \pi t/s)$.

    Note that when $s$ is even the sine term disappears for $j=s/2$ and so the maximum number of trigonometric parameters is $s-1$.

    If seasonal is a number greater than 100 then it is possible to specify how the seasonal component grows over time.

    For example, seasonal = 101 implies a seasonal component which just uses one frequency which grows linearly over time as follows: $(1+\beta_3 t)\times ( \beta_1 cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s))$.

    For example, seasonal =201 implies a seasonal component which just uses one frequency which grows in a quadratic way over time as follows: $(1+\beta_3 t + \beta_4 t^2)\times( \beta_1 \cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s))$.

    seasonal =0 implies a non seasonal model.

    In the paper RPRH to denote the number of frequencies of the seasonal component symbol B is used, while symbol G is used to denote the order of the trend of the seasonal component.

    Therefore, for example, model.seasonal=201 corresponds to B=1 and G=2, while model.seasonal=3 corresponds to B=3 and G=0.

    If this field is not present into input structure model, model.seasonal=303 is used.

    X

    matrix of size T-by-nexpl containing the values of nexpl extra covariates which are likely to affect y.

    lshift

    scalar or vector associated to level shift component. lshift=0 (default) implies no level shift component.

    If model.lshift is vector of positive integers, then it is associated to the positions of level shifts which have to be considered. The most significant one is included in the fitted model.

    For example if model.lshift =[13 20 36] a tentative level shift is imposed in position $t=13$, $t=20$ and $t=36$. The most significant among these positions in included in the final model. In other words, the following extra parameters are added to the final model: $\beta_{LS1}* I(t \geq \beta_{LS2})$ where $\beta_{LS1}$ is a real number (associated with the magnitude of the level shift) and $\beta_{LS2}$ is an integer which assumes values 13, 20 or 36 and and $I$ denotes the indicator function.

    As a particular case, if model.lshift =13 then a level shift in position $t=13$ is added to the model. In other words the following additional parameters are added: $\beta_{LS1}* I(t \geq 13)$ where $\beta_{LS1}$ is a real number and $I$ denotes the indicator function.

    If lshift = -1 tentative level shifts are considered for positions $p+1,p+2, ..., T-p$ and the most significant one is included in the final model ($p$ is the total number of parameters in the fitted model). Note that lshift=-1 is not supported for C-coder translation.

    In the paper RPRH $\beta_{LS1}$ is denoted with symbol $\delta_1$, while, $\beta_{LS2}$ is denoted with symbol $\delta_2$.

    ARp

    vector with non negative integer numbers specifying the autoregressive components. For example: model.ARp=[1 2] means a AR(2) process;

    model.ARp=2 means just the lag 2 component;

    model.ARp=[1 2 5 8] means AR(2) + lag 5 + lag 8;

    model.ARp=0 (default) means no autoregressive component.

    ARtentout

    matrix of size r-by-2 containing the list of the units declared as outliers (first column) and corresponding fitted values (second column) or empty scalar. If model.ARtentout is not empty, when the autoregressive component is present, the y values which are used to compute the autoregressive component are replaced by model.tentout(:,2) for the units contained in model.tentout(:,1) Remark: the default overparametrized model is for monthly data with a quadratic trend (3 parameters) + seasonal component with three harmonics which grows in a cubic way over time (9 parameters), no additional explanatory variables, no level shift and no AR component that is model=struct;

    model.s=12;

    model.trend=2;

    model.seasonal=303;

    model.X='';

    model.lshift=0;

    model.ARp=0;

    Using the notation of the paper RPRH we have A=2, B=3; G=3; and $\delta_1=0$.

    Example: 'model', model

    Data Types: struct

    nsamp —number of subsamples to extract.scalar | vector of length 2.

    Vector of length 1 or 2 which controls the number of subsamples which will be extracted to find the robust estimator. If lshift is not equal to 0 then nsamp(1) controls the number of subsets which have to be extracted to find the solution for t=lshift(1). nsamp(2) controls the number of subsets which have to be extracted to find the solution for t=lshift(2), lshift(3), ..., lshift(end).

    Note that nsamp(2) is generally smaller than nsamp(1) because in order to compute the best solution for t=lshift(2), lshift(3), ..., lshift(end), we use the lts.bestr/2 best solutions from previous t (after shifting the position of the level shift in the estimator of beta). If lshift is a vector of positive integers the default value of nsamp is (500 250). If lshift is a vector of positive integers and nsamp is supplied as a scalar the default is to extract [nsamp/2] subsamples for t=lshift(1), lshift(2), ... Therefore, for example, in order to extract 600 subsamples for t=lshift(1) and 300 subsamples for t= lshift(2) ... you can use nsamp =600 or nsamp=[600 300].

    The default value of nsamp is 1000;

    Remark: if nsamp=0 all subsets will be extracted.

    They will be (n choose p).

    Example: 'nsamp',500

    Data Types: double

    thPval —threshold for pvalues.scalar.

    A value between 0 and 1.

    An estimated parameter/variable is eliminated if the associated pvalue is below thPval. Default is thPval=0.01.

    Example: 'thPval',0.05

    Data Types: double

    plots —Plots on the screen.scalar.

    If plots = 1, the typical LTSts plots will be shown on the screen. The default value of plot is 0 i.e. no plot is shown on the screen.

    Example: 'plots',1

    Data Types: double

    msg —Messages on the screen.scalar.

    Scalar which controls whether LTSts will display or not messages on the screen. Deafault is msg=0, that is no messages are displayed on the screen. If msg==1 messages displayed on the screen are about estimated time to compute the estimator and the warnings about 'MATLAB:rankDeficientMatrix', 'MATLAB:singularMatrix' and 'MATLAB:nearlySingularMatrix'.

    Example: 'msg',1

    Data Types: double

    dispresults —Display results of final fit.boolean.

    If dispresults is true, labels of coefficients, estimated coefficients, standard errors, tstat and p-values are shown on the screen in a fully formatted way. The default value of dispresults is false.

    Example: 'dispresults',true

    Data Types: logical

    Output Arguments

    expand all

    reduced_est —A reduced model structure obtained by eliminating parameters that are non-significant. It is a structure containing the typical input model fields for function LTSts (refer to LTSts for details): model.s = the optimal length of seasonal period

    model.trend = the optimal order of the trend.

    model.seasonal = the optimal number of frequencies in the seasonal component.

    model.lshift = the optimal level shift position.

    model.X = a matrix containing the values of the extra covariates which are likely to affect y. If the imput model specifies autoregressive components in model.ARp, then the selected ones will be also included in model.X.

    reduced_model —Output fields of the optimal model. Structure

    The fields are those of function LTSts (refer to LTSts for details): out.B = matrix of estimated beta coefficients.

    out.h = number of observations that have determined the initial LTS estimator.

    out.bs = vector of the units with the smallest squared residuals before the reweighting step.

    out.Hsubset = matrix of the units forming best H subset for each tentative level shift considered.

    out.numscale2 = matrix of the values of the lts.bestr smallest values of the target function.

    out.BestIndexes = matrix of indexes associated with the best nbestindexes solutions.

    out.Likloc = matrix containing local sum of squares of residuals determining the best position of level shift.

    out.RES = matrix containing scaled residuals for all the units of the original time series monitored in steps lshift+1, lshift+2, ....

    out.yhat = vector of fitted values after final step.

    out.residuals = vector of scaled residuals from after final NLS step.

    out.weights = vector of weights after adaptive reweighting.

    out.scale = final scale estimate of the residuals using final weights.

    out.conflev = confidence level used to declare outliers.

    out.outliers = vector of the units declared outliers.

    out.singsub = number of subsets wihtout full rank.

    out.y = response vector y.

    out.X = data matrix X containing trend, seasonal, expl (with autoregressive component) and lshift.

    out.class = 'LTSts'.

    Optional Output:

    msgstr —Last warning message. String

    This relates to the execution of the LTS.

    References

    Rousseeuw, P.J., Perrotta D., Riani M. and Hubert, M. (2018), Robust Monitoring of Many Time Series with Application to Fraud Detection, "Econometrics and Statistics". [RPRH]

    See Also

    This page has been automatically generated by our routine publishFS