LTStsVarSel

LTStsVarSel does variable selection in the robust time series model LTSts

Syntax

  • reduced_est=LTStsVarSel(y)example
  • reduced_est=LTStsVarSel(y,Name,Value)example
  • [reduced_est, reduced_model]=LTStsVarSel(___)example
  • [reduced_est, reduced_model, msgstr]=LTStsVarSel(___)example

Description

LTSts requires variable selection when the optimal model parameters are not known in advance. This happens in particular when the function has to be applied to many heterogeneous datasets in an automatic way, possibly on a regular basis (that is the model parameters are expected to change over time even for datasets associated to the same phenomenon).

The approach consists in iteratively eliminating the less significant estimated model parameter, starting from an over-parametrized model. The model parameters are re-estimated with LTSts at each step, until all the p-values are below a given threshold. Then, the output is a reduced time series model with significant parameters only.

example

reduced_est =LTStsVarSel(y) run LTStsVarSel with all default options.

example

reduced_est =LTStsVarSel(y, Name, Value) run LTStsVarSel starting from a specific over-parametrized model.

example

[reduced_est, reduced_model] =LTStsVarSel(___) run LTStsVarSel starting from over-parametrized model with autoregressive components.

example

[reduced_est, reduced_model, msgstr] =LTStsVarSel(___) run LTStsVarSel with default options and return warning messages.

Examples

expand all

  • run LTStsVarSel with all default options.
  • data model

    model=struct;
    model.trend=1;                  % linear trend
    model.trendb=[0 1];             % parameters of the linear trend
    model.s=12;                     % monthly time series
    model.seasonal=1;               % 1 harmonic with linear trend
    model.seasonalb=[10 10];        % parameter for one harmonic with linear trend
    model.lshiftb=100;              % level shift amplitude
    model.lshift= 30;               % level shift amplitude
    model.signal2noiseratio = 100;  % signal to noise
    rng('default')
    n = 100;                        % sample size
    tmp = rand(n,1);
    model.X = tmp.*[1:n]';          % a extra covariate
    model.Xb = 1;                   % beta coefficient of the covariate
    % generate data
    out_sim=simulateTS(n,'plots',1,'model',model);
    %run LTStsVarSel with all default options
    rng(1);
    [out_model_0, out_reduced_0] = LTStsVarSel(out_sim.y);
    % optional: add a FS step to the LTSts estimator
    % outFS = FSRts(out_sim.y,'model',out_model_0);
    % To be fixed: 'Non existent user option found-> '    'ARp'

  • run LTStsVarSel starting from a specific over-parametrized model.
  • sample size

    n = 100;                       
    tmp = rand(n,1);
    % data model
    model=struct;
    model.trend=1;                  % linear trend
    model.trendb=[0 1];             % parameters of the linear trend
    model.s=12;                     % monthly time series
    model.seasonal=1;               % 1 harmonic with linear trend
    model.seasonalb=[10 10];        % parameter for one harmonic with linear trend
    model.lshiftb=100;              % level shift amplitude
    model.lshift= 30;               % level shift amplitude
    model.signal2noiseratio = 100;  % signal to noise
    model.X = tmp.*[1:n]';          % a extra covariate
    model.Xb = 1;                   % beta coefficient of the covariate
    out_sim=simulateTS(n,'plots',1,'model',model);
    % complete model to be tested.
    overmodel=struct;
    overmodel.trend=2;              % quadratic trend
    overmodel.s=12;                 % monthly time series
    overmodel.seasonal=303;         % number of harmonics
    overmodel.lshift=4;             % position where to start monitoring level shift
    overmodel.X=tmp.*[1:n]';
    % pval threshold
    thPval=0.01;
    [out_model_1, out_reduced_1] = LTStsVarSel(out_sim.y,'model',overmodel,'thPval',thPval,'plots',1);

  • run LTStsVarSel starting from over-parametrized model with autoregressive components.
  • add three autoregressive components to the complete model.

    n = 100;                        % sample size
    tmp = rand(n,1);
    model.X = tmp.*[1:n]';          % a extra covariate
    model.Xb = 1;                   % beta coefficient of the covariate
    out_sim=simulateTS(n,'plots',1,'model',model);
    % complete model to be tested.
    overmodel=struct;
    overmodel.trend=2;              % quadratic trend
    overmodel.s=12;                 % monthly time series
    overmodel.seasonal=303;         % number of harmonics
    overmodel.lshift=4;             % position where to start monitoring level shift
    overmodel.X=tmp.*[1:n]';
    overmodel.ARp=3;
    % pval threshold
    thPval=0.01;
    [out_model_2, out_reduced_2] = LTStsVarSel(out_sim.y,'model',overmodel,'thPval',thPval);

  • run LTStsVarSel with default options and return warning messages.
  • data model

    model=struct;
    model.trend=1;                  % linear trend
    model.trendb=[0 1];             % parameters of the linear trend
    model.s=12;                     % monthly time series
    model.seasonal=1;               % 1 harmonic with linear trend
    model.seasonalb=[10 10];        % parameter for one harmonic with linear trend
    model.lshiftb=100;              % level shift amplitude
    model.lshift= 30;               % level shift amplitude
    model.signal2noiseratio = 100;  % signal to noise
    n = 100;                        % sample size
    tmp = rand(n,1);
    model.X = tmp.*[1:n]';          % a extra covariate
    model.Xb = 1;                   % beta coefficient of the covariate
    % generate data
    out_sim=simulateTS(n,'plots',1,'model',model);
    [out_model_3, out_reduced_3, messages] = LTStsVarSel(out_sim.y);

    Input Arguments

    expand all

    y — Time series to analyze. Vector.

    A row or a column vector with T elements, which contains the time series.

    Data Types: single| double

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'model', model , 'thPval',0.05 , 'plots',1 , 'msg',1 , 'dispresults',true

    model —model type.structure.

    A structure which specifies the (over-parametrized) model which will be used to initialise the variable selection process. The model structure is identical to the one defined for function LTSts: for convenience, we list the fields also here:

    Value Description
    s

    scalar (length of seasonal period). For monthly data s=12 (default), for quartely data s=4, ...

    trend

    scalar (order of the trend component).

    trend = 0 implies no trend trend = 1 implies linear trend with intercept (default), trend = 2 implies quadratic trend trend = 3 implies cubic trend Admissible values for trend are, 0, 1, 2 and 3.

    In the paper RPRH to denote the order of the trend symbol A is used. If this field is not present into input structure model, model.trend=2 is used.

    seasonal

    scalar (integer specifying number of frequencies, i.e. harmonics, in the seasonal component. Possible values for seasonal are $1, 2, ..., [s/2]$, where $[s/2]=floor(s/2)$.

    For example:

    if seasonal =1 (default) we have:

    $\beta_1 \cos( 2 \pi t/s) + \beta_2 sin ( 2 \pi t/s)$;

    if seasonal =2 we have:

    $\beta_1 \cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s) + \beta_3 \cos(4 \pi t/s) + \beta_4 \sin (4 \pi t/s)$.

    Note that when $s$ is even the sine term disappears for $j=s/2$ and so the maximum number of trigonometric parameters is $s-1$.

    If seasonal is a number greater than 100 then it is possible to specify how the seasonal component grows over time.

    For example, seasonal = 101 implies a seasonal component which just uses one frequency which grows linearly over time as follows:

    $(1+\beta_3 t)\times ( \beta_1 cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s))$.

    For example, seasonal =201 implies a seasonal component which just uses one frequency which grows in a quadratic way over time as follows:

    $(1+\beta_3 t + \beta_4 t^2)\times( \beta_1 \cos( 2 \pi t/s) + \beta_2 \sin ( 2 \pi t/s))$.

    seasonal =0 implies a non seasonal model.

    In the paper RPRH to denote the number of frequencies of the seasonal component symbol B is used, while symbol G is used to denote the order of the trend of the seasonal component.

    Therefore, for example, model.seasonal=201 corresponds to B=1 and G=2, while model.seasonal=3 corresponds to B=3 and G=0;

    X

    matrix of size T-by-nexpl containing the values of nexpl extra covariates which are likely to affect y.

    lshift

    scalar greater or equal than 0 which specifies whether it is necessary to include a level shift component. lshift = 0 (default) implies no level shift component. If lshift is an interger greater then 0 then it is possible to specify the moment to start considering level shifts. For example if lshift =13 then the following additional parameters are estimated $\beta_{LS1}* I(t \geq beta_{LS2})$ where $\beta_{LS1}$ is a real number and $\beta_{LS2}$ is an integer which assumes values 14, 14, ..., T-13.

    In general, the level shift which are considered are referred to times (lshift+1):(T-lshift).

    In the paper RPRH $\beta_{LS1}$ is denoted with symbol $\delta_1$, while, $\beta_{LS2}$ is denoted with symbol $\delta_2$.

    ARp

    scalar greater or equal than 0 which specifies the length of the autoregressive component. The default value of model.ARp is 0, that is there is no autoregressive component.

    Remark: the default overparametrized model is for monthly data with a quadratic trend (3 parameters) + seasonal component with just one harmonic (2 parameters), no additional explanatory variables, no level shift and no AR component that is model=struct;

    model.s=12;

    model.trend=1;

    model.seasonal=1;

    model.X='';

    model.lshift=0;

    model.ARp=0;

    Using the notation of the paper RPRH we have A=1, B=1; and $\delta_1=0$.

    Example: 'model', model

    Data Types: struct

    thPval —threshold for pvalues.scalar.

    A value between 0 and 1.

    An estimated parameter/variable is eliminated if the associated pvalue is below thPval. Default is thPval=0.01.

    Example: 'thPval',0.05

    Data Types: double

    plots —Plots on the screen.scalar.

    If plots = 1, the typical LTSts plots will be shown on the screen. The default value of plot is 0 i.e. no plot is shown on the screen.

    Example: 'plots',1

    Data Types: double

    msg —Messages on the screen.scalar.

    Scalar which controls whether LTSts will display or not messages on the screen. Deafault is msg=0, that is no messages are displayed on the screen. If msg==1 messages displayed on the screen are about estimated time to compute the estimator and the warnings about 'MATLAB:rankDeficientMatrix', 'MATLAB:singularMatrix' and 'MATLAB:nearlySingularMatrix'.

    Example: 'msg',1

    Data Types: double

    dispresults —Display results of final fit.boolean.

    If dispresults is true, labels of coefficients, estimated coefficients, standard errors, tstat and p-values are shown on the screen in a fully formatted way. The default value of dispresults is false.

    Example: 'dispresults',true

    Data Types: logical

    Output Arguments

    expand all

    reduced_est —A reduced model structure obtained by eliminating parameters that are non-significant. It is a structure containing the typical input model fields for function LTSts (refer to LTSts for details): model.s = the optimal length of seasonal period

    model.trend = the optimal order of the trend.

    model.seasonal = the optimal number of frequencies in the seasonal component.

    model.lshift = the optimal level shift position.

    model.X = a matrix containing the values of the extra covariates which are likely to affect y. If the imput model specifies autoregressive components in model.ARp, then the selected ones will be also included in model.X.

    reduced_model —Output fields of the optimal model. Structure

    The fields are those of function LTSts (refer to LTSts for details):

    out.B = matrix of estimated beta coefficients.

    out.h = number of observations that have determined the initial LTS estimator.

    out.bs = vector of the units with the smallest squared residuals before the reweighting step.

    out.Hsubset = matrix of the units forming best H subset for each tentative level shift considered.

    out.numscale2 = matrix of the values of the lts.bestr smallest values of the target function.

    out.BestIndexes = matrix of indexes associated with the best nbestindexes solutions.

    out.Likloc = matrix containing local sum of squares of residuals determining the best position of level shift.

    out.RES = matrix containing scaled residuals for all the units of the original time series monitored in steps lshift+1, lshift+2, ....

    out.yhat = vector of fitted values after final step.

    out.residuals = vector of scaled residuals from after final NLS step.

    out.weights = vector of weights after adaptive reweighting.

    out.scale = final scale estimate of the residuals using final weights.

    out.conflev = confidence level used to declare outliers.

    out.outliers = vector of the units declared outliers.

    out.singsub = number of subsets wihtout full rank.

    out.y = response vector y.

    out.X = data matrix X containing trend, seasonal, expl (with autoregressive component) and lshift.

    out.class = 'LTSts'.

    Optional Output:

    msgstr —Last warning message. String

    This relates to the execution of the LTS.

    References

    Rousseeuw, P.J., Perrotta D., Riani M. and Hubert, M. (2018), Robust Monitoring of Many Time Series with Application to Fraud Detection, "Econometrics and Statistics". [RPRH]

    See Also

    This page has been automatically generated by our routine publishFS