SETARX

SETARX implements threshold autoregressive models with two regimes.

Syntax

• out=SETARX(y, p, d)
• out=SETARX(y, p, d,Name,Value)
• [out, reg]=SETARX(___)
• [out, reg, input]=SETARX(___)

Description

 out =SETARX(y, p, d) Example 1: Estimation from simulated data.

 out =SETARX(y, p, d, Name, Value) Example 2: Estimation from simulated data of example 1 with an extra constant column as regressor.

 [out, reg] =SETARX(___) Example 3: variant from example 1.

 [out, reg, input] =SETARX(___)

Examples


Example 1: Estimation from simulated data.

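Simulate a two-regime model with threshold 0.5 on $y_{t-2}$, using coefficients $\beta_1=(0.7, -0.5, -0.6, 0.3, 0.3)^{\prime}$ in regime 1 and $\beta_2=(-0.1, -0.5, 0.6, 0.4, 0)^{\prime}$ in regime 2 (order: two lags of y, two columns of X1, intercept).
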
% SETAR with all the default options.
% Use simulated data.
rng('default')
rng(10)
n = 50;
y = randn(n,1);
X1 = randn(n,2);
for i = 3:n
    y(i) = (y(i-2) < 0.5)*(0.3+0.7*y(i-1)-0.5*y(i-2)-0.6*X1(i,1)+0.3*X1(i,2))+...
        (y(i-2) >= 0.5)*(-0.1*y(i-1)-0.5*y(i-2)+0.6*X1(i,1)+0.4*X1(i,2));
end
p = 2;
d = 2;
[out1] = SETARX(y, p ,d, 'X',X1);

Example 2: Estimation from simulated data of example 1 with an extra constant column as regressor.

n = 50;
y = randn(n,1);
X1 = randn(n,2);
X2 = [repmat(0.3,n,1) X1];
p = 2;
d = 2;
[out2] = SETARX(y, p ,d, 'X',X2);

Example 3: variant from example 1.

Estimation from simulated data of example 1 with an extra column as regressor, half zeros and half ones. This will produce a warning (column of zeros removed) during the loop for the estimation of the threshold value. Check out3.setarx.rmv_col_loop.

n = 50;
y = randn(n,1);
X1 = randn(n,2);
p = 2;
d = 2;
X3 = [[repmat(0,25,1);repmat(1,25,1)] X1];
[out3] = SETARX(y, p ,d, 'X',X3);

Input Arguments

y — Response variable. Vector.

Response variable, specified as a vector of length n, where n is the number of observations.

Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations. Data type - double.

Data Types: single| double

p — Autoregressive order of y in regimes. Scalar.

If p = 0, the AR part is not present in the regimes, so an error is raised when 'X' and 'Z' are empty and 'intercept' is false.

Data type - non negative integer.

Data Types: single| double

d — Lag of the threshold variable $y_{t-d}$. Scalar.

Data type - positive integer.

Data Types: single| double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as  Name1,Value1,...,NameN,ValueN.


trim — Minimum fraction of observations contained in each regime. Scalar.

The trimming parameter should be set between 0.05 and 0.45. It is the fraction of observations trimmed from the tails of the threshold variable, which ensures a sufficient number of observations around the true threshold parameter so that it can be identified (usually set between 0.10 and 0.15). Default is 0.15. If the number of observations to be trimmed is less than the total number of regressors, an error is given.
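
As a rough illustration of how a trimming fraction restricts the search, the sketch below builds a set of candidate thresholds in base MATLAB. It assumes that candidates are the sorted values of the threshold variable and that each regime must keep at least ceil(trim*n) observations; the names yd, nmin and cand are hypothetical, and this is not the code used inside SETARX.

```matlab
% Illustrative only: restrict candidate thresholds with a trimming fraction.
rng(1);
yd   = randn(40,1);          % hypothetical threshold variable y(t-d)
trim = 0.15;                 % documented default
n    = numel(yd);
nmin = ceil(trim*n);         % assumed minimum number of observations per regime
ys   = sort(yd);
% Regime 1 is yd <= gamma, so gamma = ys(i) places i observations in regime 1
% (ties aside) and n-i observations in regime 2.
cand = ys(nmin:n-nmin);
n1   = arrayfun(@(g) sum(yd <= g), cand);   % regime 1 size for each candidate
n2   = n - n1;                              % regime 2 size for each candidate
```

With a larger trim the candidate set shrinks from both ends, which is why values close to 0.45 leave very few thresholds to evaluate.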


intercept — Indicator for constant term. true (default) | false.

Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'intercept' and either true to include or false to remove the constant term from the model.


X — Data matrix of explanatory variables. Matrix of exogenous regressors of dimension n x k1, where k1 denotes the number of regressors excluding the intercept.

Rows of X represent observations, and columns represent variables. Each entry in y is the response for the corresponding row of X. By default, there is a constant term in the model, unless you explicitly remove it using input option intercept, so do not include a column of 1s in X. Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.


Z — Deterministic variables (including dummies). Matrix of deterministic regressors of dimension n x k2, where k2 denotes the number of regressors excluding the intercept.

Rows of Z represent observations, and columns represent variables. Each entry in y is the response for the corresponding row of Z. By default, there is a constant term in the model, unless you explicitly remove it using input option intercept, so do not include a column of 1s in Z. Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.


Output Arguments

out — description Structure

A structure with the results of the SETARX model estimation containing the following fields:

Value Description
regime1

A substructure containing the results of the OLS estimation of the linear regression model applied to the data in regime 1. The estimation is performed with the function estregimeTAR.

Additional details are in the description of the outputs in the output structure reg.

regime2

A substructure containing the results of the OLS estimation of the linear regression model applied to the data in regime 2. The estimation is performed with the function estregimeTAR (see sections 'Outputs' and 'More about' of estregimeTAR).

rmv_col_loop

Warnings collected for regimes 1 and 2 in the loop that searches for the optimal threshold value. Matrix of strings of dimension n x 2. Warnings show the indices of columns removed from the matrix of regressors before the model estimation. Columns containing only zeros are removed. Then, to avoid multicollinearity when several non-zero constant columns are present, the code keeps only the first constant column.

thrhat

Estimated threshold value. Scalar. It is the threshold value that minimizes the joint RSS.

thrvar_ord

Index series after reordering the threshold variable yd. Vector.

sigma_2

Estimated residual variance of SETARX model. Scalar.

RSSj

Joint Residual Sum of Squares of SETARX model. Scalar.

yjhat

Fitted values of SETARX model. Vector.

resj

Residuals of the SETARX model. Vector.

yjhat_full

Fitted values of SETARX model with observations (rows) with missing or infinite values reinserted as NaNs. This is to obtain the same length of the initial input vector y defined by the user.

resj_full

Residuals of the SETARX model with observations (rows) with missing or infinite values reinserted as NaNs. This is to obtain the same length of the initial input vector y defined by the user.
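
The idea behind the "_full" fields can be sketched in a few lines of base MATLAB. The example assumes a simple mask of finite rows and a stand-in for the fitted values; it ignores any initial rows lost to lagging, and the variable names are hypothetical rather than taken from the SETARX source.

```matlab
% Illustrative only: pad fitted values back to the length of the original y.
y    = [1.2; NaN; 0.7; Inf; -0.3; 0.9];  % hypothetical input with bad rows
ok   = isfinite(y);                      % false for NaN and +/-Inf
yhat = 0.5*y(ok);                        % stand-in for fitted values on clean rows
yhat_full     = NaN(size(y));            % same length as the user's y
yhat_full(ok) = yhat;                    % excluded rows stay NaN
rmv_obs = find(~ok);                     % indices of excluded rows
```

Keeping the original length makes it straightforward to align fitted values and residuals with the user's time index.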

reg — description Structure

A structure with the results of the OLS estimation of the linear regression model (benchmark), containing the following fields.

Value Description
beta

Estimated parameters of the regression model. Vector.

See out.covar.

se

Estimated heteroskedasticity-consistent (HC) standard errors. Vector.

covar

Estimated variance-covariance matrix. Matrix. It is the heteroskedasticity-consistent (HC) covariance matrix. See section 'More about'.

sigma_2

Estimated residual variance. Scalar.

yhat

Fitted values. Vector.

res

Residuals of the regression model. Vector.

RSS

Residual Sum of Squares. Scalar.

TSS

Total Sum of Squares. Scalar.

R_2

R^2. Scalar.

n

Number of observations entering in the estimation.

Scalar.

k

Number of regressors in the model left after the checks. It is the number of betas to be estimated by OLS. The betas corresponding to the removed columns of X will be set to 0 (see section 'More about' of estregimeTAR). Scalar.

rmv_col

Indices of columns removed from X before the model estimation. Scalar or vector.

Columns containing only zeros are removed. Then, to avoid multicollinearity when several non-zero constant columns are present, the code keeps only the first constant column (see section 'More about' of estregimeTAR).

rk_warning

Warning for skipped estimation. String. If the matrix X is singular after the adjustments, the OLS estimation is skipped, the parameters are set to NaN and a warning is produced.

yhat_full

Fitted values of the estimated linear regression model with observations (rows) with missing or infinite values reinserted as NaNs.

This is to obtain the same length of the initial input vector y defined by the user.

res_full

Residuals of the estimated linear regression model with observations (rows) with missing or infinite values reinserted as NaNs.

This is to obtain the same length of the initial input vector y defined by the user.
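
The column checks described for rmv_col can be illustrated with a small base MATLAB sketch. It assumes the rule exactly as stated above (drop all-zero columns, then keep only the first non-zero constant column); the matrix and variable names are hypothetical, and the comparison with X(1,:) relies on implicit expansion (R2016b or later).

```matlab
% Illustrative only: drop all-zero columns, then keep the first constant column.
X = [ones(5,1), zeros(5,1), (1:5)', 0.3*ones(5,1)];  % hypothetical regressors
isZero   = all(X == 0, 1);                  % columns made only of zeros
isConst  = all(X == X(1,:), 1) & ~isZero;   % non-zero constant columns
constIdx = find(isConst);
rmv_col  = sort([find(isZero), constIdx(2:end)]);  % columns to remove
Xred     = X(:, setdiff(1:size(X,2), rmv_col));    % regressors kept for OLS
```

A dummy variable that is not constant over the full sample can still become an all-zero column inside one regime, which is the situation reported through the warnings collected during the threshold search.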

input — description Structure

A structure containing the following fields.

Value Description
y

Response without missing and infs. Vector. The new response variable, with observations (rows) with missing or infinite values excluded.

X

Predictor variables without infs and missings. Matrix.

The new matrix of explanatory variables, with missing or infinite values excluded, to be used for the model estimation. It is the matrix [L X Z intercept] where L is the lagged matrix n x p of y (if p > 0), X is the matrix of exogenous regressors defined by the user, Z is the matrix of deterministic regressors and the last column is the intercept (if any).

yd

Threshold variable without missing and infs. Vector. The new threshold variable, with observations (rows) with missing or infinite values excluded.

rmv_obs

Indices of removed observations/rows (because of missings or infs). Scalar vector.

y_full

Response y after adjustments by chkinputTAR but with observations (rows) with missing or infinite values included.

X_full

Matrix X after adjustments by chkinputTAR but with observations (rows) with missing or infinite values included.

yd_full

Threshold variable after adjustments by chkinputTAR but with observations (rows) with missing or infinite values included.
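
To make the layout [L X Z intercept] and the threshold variable concrete, here is a minimal base MATLAB sketch. It assumes that rows without a complete set of lags are dropped and that all series are aligned on the remaining time index; Xu, Z, t0 and the other names are hypothetical and do not come from chkinputTAR.

```matlab
% Illustrative only: assemble [L X Z intercept] and the threshold variable.
rng(2);
n = 12; p = 2; d = 1;
y  = randn(n,1);
Xu = randn(n,1);             % hypothetical exogenous regressor (user X)
Z  = double((1:n)' > 6);     % hypothetical dummy (user Z)
t0 = max(p,d) + 1;           % first row with all lags available (assumption)
t  = (t0:n)';
L  = zeros(numel(t), p);
for j = 1:p
    L(:,j) = y(t-j);         % j-th lag of y
end
Xfull = [L, Xu(t), Z(t), ones(numel(t),1)];   % lags, X, Z, intercept
yd    = y(t-d);              % threshold variable y(t-d)
yt    = y(t);                % response aligned with Xfull
```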

More About

Given a time series $y_t$, a two-regime Self-Exciting Threshold AutoRegressive model SETARX($p$,$d$) with exogenous regressors is specified as $$ y_t= \begin{cases} {\bf x}_{t} {\boldsymbol{\beta}}_1 + {\bf z}_{t} {\boldsymbol{\lambda}}_1 + \varepsilon_{1t}, & \textrm{if } y_{t-d}\leq \gamma \\ {\bf x}_{t} {\boldsymbol{\beta}}_2 + {\bf z}_{t} {\boldsymbol{\lambda}}_2 + \varepsilon_{2t}, & \textrm{if } y_{t-d}> \gamma \end{cases}$$ for $t=\max(p,d)+1,\ldots,N$, where $y_{t-d}$ is the threshold variable with $d\geq 1$ and $\gamma$ is the threshold value. The relation between $y_{t-d}$ and $\gamma$ determines whether $y_t$ is observed in regime 1 or 2. ${\boldsymbol{\beta}}_j$ is the vector of autoregressive parameters for regime $j=1,2$ and ${\bf x}_{t}$ is the $t$-th row of the $(N\times p)$ matrix ${\bf X}$ comprising $p$ lagged variables of $y_t$. ${\boldsymbol{\lambda}}_j$ is the vector of parameters corresponding to exogenous regressors and/or dummies contained in the $(N \times r)$ matrix ${\bf Z}$ whose $t$-th row is ${\bf z}_t$. Errors $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are assumed to be independent and to follow distributions $\mathrm{iid}(0,\sigma_{\varepsilon,1})$ and $\mathrm{iid}(0,\sigma_{\varepsilon,2})$ respectively.

Estimation of SETAR models

In general the value of the threshold $\gamma$ is unknown, so the parameters to estimate are ${\boldsymbol{\theta}}_1=({\boldsymbol{\beta}}^{\prime}_1, {\boldsymbol{\lambda}}^{\prime}_1)^{\prime}$, ${\boldsymbol{\theta}}_2=({\boldsymbol{\beta}}^{\prime}_2,{\boldsymbol{\lambda}}^{\prime}_2)^{\prime}$, $\gamma$, $\sigma_{\varepsilon,1}$ and $\sigma_{\varepsilon,2}$.

Parameters can be estimated by sequential conditional least squares. For a fixed threshold $\gamma$ the observations may be divided into two samples $\{y_t |y_{t-d}\leq \gamma\}$ and $\{y_t |y_{t-d}> \gamma\}$: the data can be denoted respectively as $\mathbf{y}_j=(y_{ji_1},y_{ji_2},...,y_{ji_{N_j}})^{\prime}$ in regimes $j=1,2$, with $N_1$ and $N_2$ the regime sample sizes and $N_1+N_2=N-\max(p,d)$.

Parameters ${\boldsymbol{\theta}}_1$ and ${\boldsymbol{\theta}}_2$ can be estimated by OLS as $$\label{eqn:par} \hat{\boldsymbol{\theta}}_j=\left({\mathbf{X}^*_j}^{\prime}\mathbf{X}^*_j\right)^{-1}{\mathbf{X}^*_j}^{\prime}\mathbf{y}_j\,$$ for $j=1,2$ where $\mathbf{X}^*_j=(\mathbf{X}_j,\mathbf{Z}_j)=(({\bf x}_{ji_1}^{\prime},...,{\bf x}_{ji_{N_j}}^{\prime})^{\prime},({\bf z}_{ji_1}^{\prime},...,{\bf z}_{ji_{N_j}}^{\prime})^{\prime})$ is the $(N_j \times (p+r))$ matrix of regressors for each regime. The variance estimates can be calculated as $\hat{\sigma}_{\varepsilon,j}={\bf r}_j^{\prime}{\bf r}_j /(N_j - (p+r))$, with ${\bf r}_j={\bf y}_j-\mathbf{X}^*_j\hat{\boldsymbol{\theta}}_j$.

The least squares estimate of $\gamma$ is obtained by minimizing the joint residual sum of squares $$ \hat{\gamma}=\arg\min_{\gamma\in\Gamma}\sum_{j=1}^2 {\bf r}_j^{\prime}{\bf r}_j $$ over a set $\Gamma$ of allowable threshold values chosen so that each regime contains at least a given fraction $\varphi$ (ranging from 0.05 to 0.3) of all observations.
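
The sequential conditional least squares procedure can be sketched in base MATLAB as follows. This is a minimal illustration under stated assumptions, not the SETARX implementation: the data are simulated from a hypothetical SETAR(2,2) with Gaussian noise, the candidate set is built from sorted values of the threshold variable with a trimming fraction of 0.15, and all variable names are chosen for the example.

```matlab
% Illustrative only: grid search for the threshold by conditional least squares.
rng(10);
n = 200; p = 2; d = 2; trim = 0.15;
y = randn(n,1);
for i = 3:n                                  % simulate a two-regime SETAR(2,2)
    if y(i-d) <= 0.5
        y(i) = 0.3 + 0.7*y(i-1) - 0.5*y(i-2) + 0.5*randn;
    else
        y(i) = -0.1*y(i-1) - 0.5*y(i-2) + 0.5*randn;
    end
end
t  = (max(p,d)+1:n)';
W  = [y(t-1), y(t-2), ones(numel(t),1)];     % regressors: two lags and intercept
yt = y(t);                                   % response
yd = y(t-d);                                 % threshold variable
N  = numel(t);
nmin = ceil(trim*N);                         % assumed minimum regime size
ys   = sort(yd);
cand = ys(nmin:N-nmin);                      % candidate thresholds (set Gamma)
rssj = zeros(numel(cand),1);
for c = 1:numel(cand)
    in1 = yd <= cand(c);                     % regime 1 indicator
    r1  = yt(in1)  - W(in1,:)  * (W(in1,:)  \ yt(in1));   % regime 1 residuals
    r2  = yt(~in1) - W(~in1,:) * (W(~in1,:) \ yt(~in1));  % regime 2 residuals
    rssj(c) = r1'*r1 + r2'*r2;               % joint residual sum of squares
end
[RSSj, best] = min(rssj);
thrhat = cand(best);                         % least squares threshold estimate
in1    = yd <= thrhat;
theta1 = W(in1,:)  \ yt(in1);                % OLS coefficients, regime 1
theta2 = W(~in1,:) \ yt(~in1);               % OLS coefficients, regime 2
rlin   = yt - W*(W\yt);                      % single-regime linear benchmark
RSSlin = rlin'*rlin;
```

Because the two-regime fit nests the single-regime regression, RSSj cannot exceed RSSlin; how close thrhat lands to the simulating threshold depends on the sample and on the noise level.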

References

Franses, P.H. and van Dijk, D. (2000), "Nonlinear Time Series Models in Empirical Finance", Cambridge: Cambridge University Press.

Grossi, L. and Nan, F. (2019), Robust forecasting of electricity prices: simulations, models and the impact of renewable sources, "Technological Forecasting & Social Change", Vol. 141, pp. 305-318.