SETARX implements threshold autoregressive models with two regimes.

`out` = SETARX(`y`, `p`, `d`, `Name, Value`)

Example 1: SETARX with all the default options, using simulated data with $\beta_1=(0.7, -0.5, -0.6, 0.3, 0.3)^{\prime}$ and $\beta_2=(-0.1, -0.5, 0.6, 0.4, 0)^{\prime}$.

```
% Use simulated data.
rng('default')
rng(10)
n = 50;
y = randn(n,1);
X1 = randn(n,2);
for i = 3:n
    y(i) = (y(i-2) < 0.5)*(0.3+0.7*y(i-1)-0.5*y(i-2)-0.6*X1(i,1)+0.3*X1(i,2))+...
        (y(i-2) >= 0.5)*(-0.1*y(i-1)-0.5*y(i-2)+0.6*X1(i,1)+0.4*X1(i,2));
end
p = 2;
d = 2;
[out, reg, input] = SETARX(y, p, d, 'X', X1);
```

```
Warning: observations in positions 1 2 have been removed!
Searching for the optimal threshold: loop 1 of 34
Searching for the optimal threshold: loop 2 of 34
...
Searching for the optimal threshold: loop 34 of 34
```

Example 2: Estimation from simulated data of example 1 with an extra constant column as regressor.

```
% y and X1 are redrawn here; to reuse the Example 1 series, skip the next two assignments.
n = 50;
y = randn(n,1);
X1 = randn(n,2);
X2 = [repmat(0.3,n,1) X1];   % extra constant column
p = 2;
d = 2;
[out2] = SETARX(y, p, d, 'X', X2);
```

Example 3: Estimation from simulated data of example 1 with an extra column as regressor, half zeros and half ones. This will produce a warning (column of zeros removed) during the loop for the estimation of the threshold value. Check out3.setarx.rmv_col_loop.

```
% y and X1 are redrawn here; to reuse the Example 1 series, skip the next two assignments.
n = 50;
y = randn(n,1);
X1 = randn(n,2);
p = 2;
d = 2;
X3 = [[repmat(0,25,1);repmat(1,25,1)] X1];   % half zeros, half ones
[out3] = SETARX(y, p, d, 'X', X3);
```

`y` — Response variable. Vector.

Response variable, specified as a vector of length n, where n is the number of observations.

Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations. Data type: double.

**Data Types:** `single | double`

`p` — Autoregressive order of y in regimes. Scalar.

If p = 0, the AR part is not present in the regimes, so an error is given if 'X' and 'Z' are empty and 'intercept' is false.

Data type: non-negative integer.

**Data Types:** `single | double`

`d` — Lag of the threshold variable $y_{t-d}$. Scalar.

Data type: positive integer.

**Data Types:** `single | double`

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.
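As a rough illustration of the name-value syntax, the call below passes two of the options documented on this page ('trim' and 'X') in arbitrary order. It assumes the FSDA toolbox is on the MATLAB path; the data are hypothetical pure noise, chosen only so that the call has something to fit.

```matlab
% Illustrative call; requires the FSDA toolbox on the MATLAB path.
rng(10);
n  = 100;
y  = randn(n,1);            % hypothetical response (pure noise)
X1 = randn(n,2);            % hypothetical exogenous regressors
p  = 2;
d  = 2;
% Name-value pairs may be given in any order after the required inputs.
out = SETARX(y, p, d, 'trim', 0.10, 'X', X1);
disp(out.thrhat)            % estimated threshold value
```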

`trim` — Minimum fraction of observations contained in each regime. Scalar.

The trimming parameter should be set between 0.05 and 0.45. It is the fraction of observations to trim from the tails of the threshold variable, in order to ensure a sufficient number of observations around the true threshold parameter so that it can be identified (usually set between 0.10 and 0.15). Default is 0.15. If the number of observations to be trimmed is less than the total number of regressors, an error is given.
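The effect of the trimming fraction can be sketched as follows. This is a minimal illustration, not the SETARX source: it assumes that candidate thresholds are taken from the sorted threshold variable and that the minimum regime size is rounded up. The rounding rule actually used by SETARX is not documented here and may differ, and the names `yd`, `nmin` and `cand` are hypothetical.

```matlab
% Illustrative only: candidate thresholds under a trimming fraction.
rng(10);
yd   = randn(48,1);          % hypothetical threshold variable y_{t-d}
trim = 0.15;                 % documented default
n    = numel(yd);
nmin = ceil(trim*n);         % assumed minimum number of observations per regime
yds  = sort(yd);
% Keep only thresholds that leave at least nmin observations in each regime
% (regime 1: yd <= threshold, regime 2: yd > threshold).
cand = yds(nmin:(n-nmin));
fprintf('%d candidate thresholds out of %d observations\n', numel(cand), n);
```

A larger `trim` shrinks `cand` from both tails, which is why very small or very large thresholds are never evaluated.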

`intercept` — Indicator for constant term. true (default) | false.

Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'intercept' and either true to include or false to remove the constant term from the model.

`X` — Data matrix of explanatory variables. Matrix.

Matrix of exogenous regressors of dimension n x k1, where k1 denotes the number of regressors excluding the intercept. Rows of X represent observations, and columns represent variables. Each entry in y is the response for the corresponding row of X. By default, there is a constant term in the model, unless you explicitly remove it using the input option intercept, so do not include a column of 1s in X. Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

`Z` — Deterministic variables (including dummies). Matrix.

Matrix of deterministic regressors of dimension n x k2, where k2 denotes the number of regressors excluding the intercept. Rows of Z represent observations, and columns represent variables. Each entry in y is the response for the corresponding row of Z. By default, there is a constant term in the model, unless you explicitly remove it using the input option intercept, so do not include a column of 1s in Z. Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

`out` — Structure.

A structure with the results of the SETARX model estimation, containing the following fields:

Value | Description |
---|---|
`regime1` | A sub-structure containing the results of the OLS estimation of the linear regression model applied to the data in regime 1. The estimation is performed with the function estregimeTAR. Additional details are in the description of the outputs in the output structure reg. |
`regime2` | A sub-structure containing the results of the OLS estimation of the linear regression model applied to the data in regime 2. The estimation is performed with the function estregimeTAR (see sections 'Outputs' and 'More about' of estregimeTAR). |
`rmv_col_loop` | Warnings collected for regimes 1 and 2 in the loop for the search of the optimal threshold value. Matrix of strings of dimension n x 2. Warnings show the indices of columns removed from the matrix of regressors before the model estimation. Columns containing only zeros are removed. Then, to avoid multicollinearity when there are multiple non-zero constant columns, the code keeps only the first constant column. |
`thrhat` | Estimated threshold value. Scalar. It is the threshold value that minimizes the joint RSS. |
`thrvar_ord` | Index series after reordering the threshold variable yd. Vector. |
`sigmaj_2` | Estimated residual variance of the SETARX model. Scalar. |
`RSSj` | Joint residual sum of squares of the SETARX model. Scalar. |
`yjhat` | Fitted values of the SETARX model. Vector. |
`resj` | Residuals of the SETARX model. Vector. |
`yjhat_full` | Fitted values of the SETARX model with observations (rows) with missing or infinite values reinserted as NaNs. This gives a vector of the same length as the initial input vector y defined by the user. |
`resj_full` | Residuals of the SETARX model with observations (rows) with missing or infinite values reinserted as NaNs. This gives a vector of the same length as the initial input vector y defined by the user. |
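The relation between the estimation-sample vectors and their `_full` counterparts can be sketched in plain MATLAB. This is only an illustration of the idea of dropping rows with NaN or Inf and padding the results back to the original length; the stand-in "fitted values" and all variable names are hypothetical and are not taken from the SETARX source.

```matlab
% Illustrative only: drop rows with NaN/Inf, fit, then pad results back to full length.
y    = [1.0; NaN; 2.0; 3.5; Inf; 4.0];   % hypothetical response with bad rows
ok   = isfinite(y);                      % rows that would enter the estimation
yfit = y(ok) - 0.1;                      % stand-in for fitted values on the clean rows
yhat_full     = NaN(size(y));            % same length as the user-supplied y
yhat_full(ok) = yfit;                    % removed rows remain NaN
rmv_obs = find(~ok)';                    % indices of the removed observations
```

Here `yhat_full` has one entry per original observation, so it can be plotted or compared directly against the user's `y`.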

`reg` — Structure.

A structure with the results of the OLS estimation of the linear regression model (benchmark), containing the following fields:

Value | Description |
---|---|
`beta` | Estimated parameters of the regression model. Vector. See `covar`. |
`se` | Estimated heteroskedasticity-consistent (HC) standard errors. Vector. |
`covar` | Estimated variance-covariance matrix. Matrix. It is the heteroskedasticity-consistent (HC) covariance matrix. See section 'More about'. |
`sigma_2` | Estimated residual variance. Scalar. |
`yhat` | Fitted values. Vector. |
`res` | Residuals of the regression model. Vector. |
`RSS` | Residual sum of squares. Scalar. |
`TSS` | Total sum of squares. Scalar. |
`R_2` | $R^2$. Scalar. |
`n` | Number of observations entering the estimation. Scalar. |
`k` | Number of regressors left in the model after the checks. It is the number of betas to be estimated by OLS. The betas corresponding to the removed columns of X will be set to 0 (see section 'More about' of estregimeTAR). Scalar. |
`rmv_col` | Indices of columns removed from X before the model estimation. Scalar or vector. Columns containing only zeros are removed. Then, to avoid multicollinearity when there are multiple non-zero constant columns, the code keeps only the first constant column (see section 'More about' of estregimeTAR). |
`rk_warning` | Warning for skipped estimation. String. If the matrix X is singular after the adjustments, the OLS estimation is skipped, the parameters are set to NaN and a warning is produced. |
`yhat_full` | Fitted values of the estimated linear regression model with observations (rows) with missing or infinite values reinserted as NaNs. This gives a vector of the same length as the initial input vector y defined by the user. |
`res_full` | Residuals of the estimated linear regression model with observations (rows) with missing or infinite values reinserted as NaNs. This gives a vector of the same length as the initial input vector y defined by the user. |

`input` — Structure.

A structure containing the following fields:

Value | Description |
---|---|
`y` | Response without missing values and infs. Vector. The new response variable, with observations (rows) with missing or infinite values excluded. |
`X` | Predictor variables without infs and missing values. Matrix. The new matrix of explanatory variables, with missing or infinite values excluded, to be used for the model estimation. It is the matrix [L X Z intercept] where L is the n x p lagged matrix of y (if p > 0), X is the matrix of exogenous regressors defined by the user, Z is the matrix of deterministic regressors and the last column is the intercept (if any). |
`yd` | Threshold variable without missing values and infs. Vector. The new threshold variable, with observations (rows) with missing or infinite values excluded. |
`rmv_obs` | Indices of removed observations/rows (because of missing values or infs). Scalar or vector. |
`y_full` | Response y after adjustments by chkinputTAR but with observations (rows) with missing or infinite values included. |
`X_full` | Matrix X after adjustments by chkinputTAR but with observations (rows) with missing or infinite values included. |
`yd_full` | Threshold variable after adjustments by chkinputTAR but with observations (rows) with missing or infinite values included. |

More about:

Given a time series $y_t$, a two-regime Self-Exciting Threshold AutoRegressive model SETARX($p$,$d$) with exogenous regressors is specified as

\begin{equation}\label{eqn:setar}
y_t=
\begin{cases}
{\bf x}_{t} {\boldsymbol{\beta}}_1 + {\bf z}_{t} {\boldsymbol{\lambda}}_1 + \varepsilon_{1t}, & \textrm{if } y_{t-d}\leq \gamma \\
{\bf x}_{t} {\boldsymbol{\beta}}_2 + {\bf z}_{t} {\boldsymbol{\lambda}}_2 + \varepsilon_{2t}, & \textrm{if } y_{t-d}> \gamma
\end{cases}
\end{equation}

for $t=\max(p,d)+1,\ldots,N$, where $y_{t-d}$ is the threshold variable with $d\geq 1$ and $\gamma$ is the threshold value. The relation between $y_{t-d}$ and $\gamma$ determines whether $y_t$ is observed in regime 1 or 2. ${\boldsymbol{\beta}}_j$ is the vector of autoregressive parameters for regime $j=1,2$ and ${\bf x}_{t}$ is the $t$-th row of the $(N\times p)$ matrix ${\bf X}$ comprising $p$ lagged variables of $y_t$. ${\boldsymbol{\lambda}}_j$ is the vector of parameters corresponding to exogenous regressors and/or dummies contained in the $(N \times r)$ matrix ${\bf Z}$ whose $t$-th row is ${\bf z}_t$. Errors $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are assumed to be independent and to follow distributions $\mathrm{iid}(0,\sigma^2_{\varepsilon,1})$ and $\mathrm{iid}(0,\sigma^2_{\varepsilon,2})$ respectively.

**Estimation of SETAR models**

In general the value of the threshold $\gamma$ is unknown, so that the parameters to estimate become ${\boldsymbol{\theta}}_1=({\boldsymbol{\beta}}^{\prime}_1, {\boldsymbol{\lambda}}^{\prime}_1)^{\prime}$, ${\boldsymbol{\theta}}_2=({\boldsymbol{\beta}}^{\prime}_2,{\boldsymbol{\lambda}}^{\prime}_2)^{\prime}$, $\gamma$, $\sigma^2_{\varepsilon,1}$ and $\sigma^2_{\varepsilon,2}$. Parameters can be estimated by sequential conditional least squares.

For a fixed threshold $\gamma$ the observations may be divided into two samples $\{y_t |y_{t-d}\leq \gamma\}$ and $\{y_t |y_{t-d}> \gamma\}$: the data can be denoted respectively as $\mathbf{y}_j=(y_{ji_1},y_{ji_2},\ldots,y_{ji_{N_j}})^{\prime}$ in regimes $j=1,2$, with $N_1$ and $N_2$ the regime sample sizes and $N_1+N_2=N-\max(p,d)$. Parameters ${\boldsymbol{\theta}}_1$ and ${\boldsymbol{\theta}}_2$ can be estimated by OLS as

\begin{equation}\label{eqn:par}
\hat{\boldsymbol{\theta}}_j=\left({\mathbf{X}^*_j}^{\prime}\mathbf{X}^*_j\right)^{-1}{\mathbf{X}^*_j}^{\prime}\mathbf{y}_j
\end{equation}

for $j=1,2$, where $\mathbf{X}^*_j=(\mathbf{X}_j,\mathbf{Z}_j)=(({\bf x}_{ji_1}^{\prime},\ldots,{\bf x}_{ji_{N_j}}^{\prime})^{\prime},({\bf z}_{ji_1}^{\prime},\ldots,{\bf z}_{ji_{N_j}}^{\prime})^{\prime})$ is the $(N_j \times (p+r))$ matrix of regressors for each regime. The variance estimates can be calculated as $\hat{\sigma}^2_{\varepsilon,j}={\bf r}_j^{\prime}{\bf r}_j /(N_j - (p+r))$, with ${\bf r}_j={\bf y}_j-\mathbf{X}^*_j\hat{\boldsymbol{\theta}}_j$. The least squares estimate of $\gamma$ is obtained by minimizing the joint residual sum of squares

\begin{equation}\label{eqn:gamma}
\hat{\gamma}=\arg\min_{\gamma\in\Gamma}\sum_{j=1}^2 {\bf r}_j^{\prime}{\bf r}_j
\end{equation}

over a set $\Gamma$ of allowable threshold values so that each regime contains at least a given fraction $\varphi$ (ranging from 0.05 to 0.3) of all observations.
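The sequential conditional least squares procedure described above can be sketched in plain MATLAB for a SETAR($p$,$d$) model with an intercept and no exogenous regressors. This is a minimal illustration under stated assumptions, not the FSDA implementation: the simulation settings, the rounding of the minimum regime size and all variable names are hypothetical.

```matlab
% Illustrative sketch of sequential conditional least squares for a
% two-regime SETAR(p,d) model with intercept; not the FSDA implementation.
rng(10);
n = 200; p = 2; d = 2; gam0 = 0.5;            % hypothetical simulation settings
y = randn(n,1);
for t = 3:n
    if y(t-d) <= gam0
        y(t) = 0.3 + 0.7*y(t-1) - 0.5*y(t-2) + 0.5*randn;
    else
        y(t) = -0.1*y(t-1) - 0.5*y(t-2) + 0.5*randn;
    end
end
m   = max(p,d);
idx = (m+1:n)';                               % usable observations
Y   = y(idx);
L   = zeros(numel(idx), p);
for j = 1:p
    L(:,j) = y(idx-j);                        % lagged regressors y_{t-1},...,y_{t-p}
end
Xs  = [L ones(numel(idx),1)];                 % regressors: lags plus intercept
yd  = y(idx-d);                               % threshold variable y_{t-d}
phi  = 0.15;                                  % minimum fraction per regime
N    = numel(Y);
nmin = ceil(phi*N);                           % assumed rounding rule
yds  = sort(yd);
Gamma = yds(nmin:(N-nmin));                   % allowable threshold values
rss  = zeros(numel(Gamma),1);
for g = 1:numel(Gamma)
    r1 = yd <= Gamma(g);                      % regime 1 membership
    r2 = ~r1;                                 % regime 2 membership
    e1 = Y(r1) - Xs(r1,:)*(Xs(r1,:)\Y(r1));   % OLS residuals, regime 1
    e2 = Y(r2) - Xs(r2,:)*(Xs(r2,:)\Y(r2));   % OLS residuals, regime 2
    rss(g) = e1'*e1 + e2'*e2;                 % joint residual sum of squares
end
[rssmin, gbest] = min(rss);
gamhat = Gamma(gbest);                        % least squares estimate of gamma
```

The backslash operator solves each regime's least squares problem without forming the inverse explicitly, which is numerically preferable to evaluating the closed-form OLS expression directly.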

Franses and van Dijk (2000), "Nonlinear Time Series Models in Empirical Finance", Cambridge: Cambridge University Press.

Grossi, L. and Nan, F. (2019), Robust forecasting of electricity prices: simulations, models and the impact of renewable sources, "Technological Forecasting & Social Change", Vol. 141, pp. 305-318.