regressB computes Bayesian estimates of regression parameters
Common part to all examples: definition of the prior information.
rng('default')
rng(100)  % set seed for replicability
p=3;
n0=30;
X0=[ones(n0,1) randn(n0,p-1)];
R=X0'*X0;
beta0=zeros(p,1);
tau0=1;
% SAMPLE INFORMATION
n1=100;
X=randn(n1,p-1);
y=randn(n1,1);
% Run regressB using all n1 data and use the default value of c=1
out=regressB(y, X, beta0, R, tau0, n0);
disp(out)
       beta1: [3×1 double]
          a1: 63.5000
          b1: 68.1135
        tau1: 0.9323
    covbeta1: [3×3 double]
     sigma21: 1.0898
         res: [100×2 double]
Run regressB and compute new estimate of beta using just the first 20 observations and use a value of c equal to 1.2.
rng('default')
rng(100)  % set seed for replicability
p=3;
n0=30;
X0=[ones(n0,1) randn(n0,p-1)];
R=X0'*X0;
beta0=zeros(p,1);
tau0=1;
% SAMPLE INFORMATION
n1=100;
X=randn(n1,p-1);
y=randn(n1,1);
bsb=1:20;
c=1.2;
out=regressB(y, X, beta0, R, tau0, n0,'bsb',bsb,'c',c);
% Run regressB with 'stats',true to compute the additional statistics,
% using just the first 20 observations.
rng('default')
rng(100)  % set seed for replicability
p=3;
n0=30;
X0=[ones(n0,1) randn(n0,p-1)];
R=X0'*X0;
beta0=zeros(p,1);
tau0=1;
% SAMPLE INFORMATION
n1=100;
X=randn(n1,p-1);
y=randn(n1,1);
bsb=1:20;
stats=true;
out=regressB(y, X, beta0, R, tau0, n0,'bsb',bsb,'stats',stats);
Compare the output with Table 3.3 of Koop (2003), p. 52.
load hprice.txt;
% setup parameters
n=size(hprice,1);
y=hprice(:,1);
X=hprice(:,2:5);
n0=5;
% set \beta components
beta0=0*ones(5,1);
beta0(2,1)=10;
beta0(3,1)=5000;
beta0(4,1)=10000;
beta0(5,1)=10000;
% \tau
s02=1/4.0e-8;
tau0=1/s02;
% R prior settings
R=2.4*eye(5);
R(2,2)=6e-7;
R(3,3)=.15;
R(4,4)=.6;
R(5,5)=.6;
R=inv(R);
out=regressB(y, X, beta0, R, tau0, n0,'stats',1);
disp(out)
         beta1: [5×1 double]
            a1: 273
            b1: 9.0326e+10
          tau1: 3.0224e-09
      covbeta1: [5×5 double]
       sigma21: 3.3208e+08
           res: [546×2 double]
      beta1HPD: [5×4 double]
       tau1HPD: [2.6745e-09 3.3913e-09 2.5720e-09 3.5143e-09]
    sigma21HPD: [2.9487e+08 3.7391e+08 2.8455e+08 3.8880e+08]
         Bpval: [5×1 double]
      postodds: [5×1 double]
     modelprob: [5×1 double]
y
— Response variable.
Vector. Response variable, specified as a vector of length n1, where n1 is the number of observations. Each entry in y is the response for the corresponding row of X.
Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.
Data Types: single | double
X
— Predictor variables.
Matrix. Matrix of explanatory variables (also called 'regressors') of dimension n1 x (p-1), where p denotes the number of explanatory variables including the intercept.
Rows of X represent observations, and columns represent variables. By default, there is a constant term in the model, unless you explicitly remove it using input option intercept, so do not include a column of 1s in X. Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.
Remark: note that here we use the symbol n1 instead of the traditional symbol n because we want to separate clearly the sample information, coming from the n1 observations, from the prior information, coming from n0 previous experiments.
beta0
— Prior mean of $\beta$.
Vector. PRIOR INFORMATION.
$\beta$ is assumed to have a normal distribution with mean $\beta_0$ and (conditional on $\tau_0$) covariance $(1/\tau_0) (X_0'X_0)^{-1}$.
$\beta \sim N( \beta_0, (1/\tau_0) (X_0'X_0)^{-1} )$
Data Types: single | double
R
— Matrix associated with covariance matrix of $\beta$.
p-times-p
positive definite matrix. It can be interpreted as X0'X0, where X0 is an n0 x p matrix coming from previous experiments (assuming that the intercept is included in the model). The prior distribution of $\tau_0$ is a gamma distribution with parameters $a_0$ and $b_0$, that is: \[ p(\tau_0) \propto \tau^{a_0-1} \exp (-b_0 \tau), \qquad E(\tau_0)= a_0/b_0 \]
Data Types: single | double
tau0
— Prior estimate of tau.
Scalar. Prior estimate of $\tau=1/ \sigma^2 = a_0/b_0$.
Data Types: single | double
n0
— Number of previous experiments.
Scalar. Sometimes it helps to think of the prior information as coming from n0 previous experiments. Therefore, we assume that the matrix X0 (which defines R) was made up of n0 observations.
Data Types: single | double
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'intercept',false, 'bsb',[3 5], 'c',1.2, 'stats',1, 'conflev',[0.99 0.999], 'nocheck',true
intercept
— Indicator for constant term. true (default) | false. Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'intercept' and either true to include or false to remove the constant term from the model.
Example: 'intercept',false
Data Types: boolean
bsb
— Units forming subset. Vector. m x 1 vector of indices of the units used to compute beta1.
The default value of bsb is 1:n1, that is, all n1 units are used to compute beta1. Remark: if bsb='' (empty value), just prior information is used.
Example: 'bsb',[3 5]
Data Types: double
c
— It can be used to control the prior information about beta. Scalar between 0 (excluded) and 1 (included).
The covariance matrix of the prior distribution of $\beta$ is
\[ (1/\tau_0)(c X_0'X_0)^{-1} = (1/\tau_0)(cR)^{-1} \]therefore multiplication of $R$ by $c$ (with $c<1$) increases the variance of $\beta$.
The default value of $c$ is 1. Note also that if we adjust the information for $\beta$ multiplying its covariance matrix by $c$ we need to make a similar adjustment to the variance of $\tau$. Simple mathematical calculations show that the new variance of $\tau$ is equal to the old variance of $\tau$ divided by $c$. So, once again, given that $c < 1$, the new variance of $\tau$ increases.
The value of $c$ will usually be the default, i.e., 1.
It was included to clarify the difference from Zellner's procedure, and it has the effect of changing n0: we can write n0' = c n0 and use n0' in the calculations. However, it may be useful for scaling X0 if the prior information is chosen according to a design that is not of the appropriate size to represent the amount of prior knowledge.
Example: 'c',1.2
Data Types: double
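The scaling above can be checked numerically. The following Python/numpy sketch (illustrative only, not FSDA code; the variable names mirror the documentation) shows that replacing $R$ by $cR$ with $c<1$ inflates every prior variance of $\beta$ by the factor $1/c$:

```python
import numpy as np

# Illustrative sketch of the effect of c on the prior covariance of beta,
# cov = (1/tau0) * inv(c*R), with R = X0'X0 built as in the examples above.
rng = np.random.default_rng(0)
p, n0 = 3, 30
X0 = np.column_stack([np.ones(n0), rng.standard_normal((n0, p - 1))])
R = X0.T @ X0
tau0 = 1.0

def prior_cov(c):
    """Prior covariance of beta when R is scaled by c."""
    return (1.0 / tau0) * np.linalg.inv(c * R)

# Shrinking c inflates every prior variance by the factor 1/c.
v_full = np.diag(prior_cov(1.0))
v_half = np.diag(prior_cov(0.5))
print(np.allclose(v_half, 2.0 * v_full))  # True
```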
stats
— Additional statistics. Scalar. If stats=1, the following additional statistics are computed: 1) Bayesian p-values;
2) highest posterior density intervals (HPDI) for each value of input option conflev;
3) posterior odds for $\beta_j=0$;
4) posterior model probability of the model which excludes variable j.
Example: 'stats',1
Data Types: double
conflev
— Confidence levels to be used to compute HPDI. Vector. This input option is used just if input stats=1. The default value of conflev is [0.95 0.99], that is, 95 per cent and 99 per cent HPDI confidence intervals are computed.
Example: 'conflev',[0.99 0.999]
Data Types: double
nocheck
— Check input arguments. Boolean. If nocheck is equal to true, no check is performed on vector y and matrix X, and y and X are left unchanged. In other words, the additional column of ones for the intercept is not added. The default is nocheck=false.
Example: 'nocheck',true
Data Types: boolean
out
— Output structure.
Structure. Structure which contains the following fields:
Value | Description |
---|---|
beta1 | p x 1 vector containing the posterior mean (conditional on $\tau_0$) of $\beta$ (regression coefficients): $\beta_1 = (c \times R + X'X)^{-1} (c \times R \times \beta_0 + X'y)$ |
a1 | Shape parameter of the posterior gamma distribution of $\tau$ ($\sigma^2$). Scalar. $a_1 = 0.5 (c n_0 + n_1)$. The posterior distribution of $\tau$ is a gamma distribution with parameters $a_1$ and $b_1$; the posterior distribution of $\sigma^2$ is an inverse-gamma distribution with parameters $a_1$ and $b_1$. |
b1 | Scale parameter of the posterior gamma distribution of $\tau$ ($\sigma^2$). Scalar. \[ b_1 = 0.5 \left( n_0 / \tau_0 + (y-X\beta_1)'y + (\beta_0-\beta_1)' c R \beta_0 \right) \] The posterior distribution of $\tau$ is a gamma distribution with parameters $a_1$ and $b_1$; the posterior distribution of $\sigma^2$ is an inverse-gamma distribution with parameters $a_1$ and $b_1$. Remark: if bsb is supplied, X'X, X'y and n1 are replaced by Xm'Xm, Xm'ym and m, where Xm=X(bsb,:), ym=y(bsb) and m=length(bsb); therefore all the previous quantities are estimated using just the units forming the subset. |
tau1 | Posterior estimate of $\tau$. Scalar. $\tau_1$ is obtained as $a_1/b_1$. |
covbeta1 | p x p matrix containing the covariance matrix (conditional on $\tau_1$) of $\beta_1$: $covbeta1 = cov(\beta_1) = (1/\tau_1) (cR + X'X)^{-1}$, where $\tau_1$ is obtained as $a_1/b_1$ (that is, through the gamma parameters of the posterior distribution of $\tau$). |
sigma21 | Posterior estimate of $\sigma^2$. Scalar. $\sigma^2_1$ is obtained as $b_1/(a_1-1)$ (that is, through the inverse-gamma parameters of the posterior distribution of $\sigma^2$). Remember that if $X\sim IG(a,b)$, then $E(X)=b/(a-1)$. |
res | n1-times-2 matrix. 1st column = raw residuals: res(i,1) is computed as y(i) - X(i,:)*beta1, where beta1 is computed using the units forming the subset. In the Bayesian approach these are the posterior means of the $\epsilon_i$ and can be interpreted as point estimates of the $\epsilon_i$. 2nd column = deletion residuals (computed just for the units which do not form the subset): res(i,2), with $i \not \in$ subset, is computed as (y(i)-X(i,:)*beta1) * sqrt((a1/b1)/(1+hii)), where hii = X(i,:)*(c*R + Xm'*Xm)^{-1}*X(i,:)'. The additional output which follows is produced just if input scalar stats is equal to 1. |
beta1HPD | p-by-2*length(conflev) matrix containing the HPDI of $\hat \beta$ (HPDI stands for Highest Posterior Density Interval). 1st column = lower bound of the HPDI associated with conflev(1); 2nd column = upper bound of the HPDI associated with conflev(1); ...; column 2*length(conflev)-1 = lower bound of the HPDI associated with conflev(end); column 2*length(conflev) (last column) = upper bound of the HPDI associated with conflev(end). |
tau1HPD | 1-by-2*length(conflev) matrix containing the HPDI of $\hat \tau_1$. 1st element = lower bound of the HPDI associated with conflev(1); 2nd element = upper bound of the HPDI associated with conflev(1); ...; element 2*length(conflev)-1 = lower bound of the HPDI associated with conflev(end); element 2*length(conflev) (last element) = upper bound of the HPDI associated with conflev(end). Remark: confidence levels are based on the gamma distribution. |
sigma21HPD | 1-by-2*length(conflev) matrix containing the HPDI of $\hat \sigma^2_1$. 1st element = lower bound of the HPDI associated with conflev(1); 2nd element = upper bound of the HPDI associated with conflev(1); ...; element 2*length(conflev)-1 = lower bound of the HPDI associated with conflev(end); element 2*length(conflev) (last element) = upper bound of the HPDI associated with conflev(end). Remark: confidence levels are based on the inverse-gamma distribution. |
Bpval | p-by-1 vector containing Bayesian p-values: p-value $= P(|t| > |\hat \beta_j / se(\hat \beta_j)|)$ = probability that $\beta_j$ is different from 0. |
postodds | p-by-1 vector which contains the posterior odds for $\beta_j=0$. For example, the posterior odds for $\beta_j=0$ is p(y| model which contains all explanatory variables except the one associated with $\beta_j$) divided by p(y| model which contains all explanatory variables). Warning: postodds can be computed just if n1>=p. |
modelprob | p-by-1 vector which contains the posterior model probability of the model which excludes variable j. For example, if modelprob(j)=0.28, that is if the probability of the model which does not contain variable j is equal to 0.28, there is a 28 per cent chance that $\beta_j=0$ and a 72 per cent chance that it is not. Warning: modelprob can be computed just if n1>=p. |
The density of the gamma distribution (which we denote with $G(a,b)$) with parameters $a$ and $b$ is \[ f_G(x,a,b)= \frac{b^a}{\Gamma(a)} x^{a-1} \exp(-b x) \]
With this parametrization $G(a,b)$ has mean $a/b$ and variance $a/b^2$.
The density of the inverse gamma distribution defined over the support $x>0$ with shape parameter $a$ and scale parameter $b$, which we denote with $IG(a, b)$ is
\[ f_{IG}(x, a, b) = \frac{b^a}{\Gamma(a)} (1/x)^{a +1} \exp (-b/x) \]The mean (for $a>1$) is $b/(a-1)$ and the variance (for $a>2$) is $\frac{b^2}{(a-1)^2(a-2)}$.
With the above parameterizations if $X\sim G(a,b)$, then $Y=1/X$ has an $IG(a, b)$.
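A quick Monte Carlo sketch (illustrative, with arbitrary $a$ and $b$) checks this relationship. Note that numpy parameterizes the gamma by shape and scale, so scale must be set to $1/b$ to match the $G(a,b)$ density above:

```python
import numpy as np

# If X ~ G(a, b) with density b^a/Gamma(a) x^{a-1} exp(-b x),
# then Y = 1/X ~ IG(a, b) with mean b/(a-1).
rng = np.random.default_rng(0)
a, b = 5.0, 4.0

# numpy's gamma uses (shape, scale); scale = 1/b gives mean a/b.
x = rng.gamma(shape=a, scale=1.0 / b, size=200_000)
y = 1.0 / x

print(abs(x.mean() - a / b) < 0.02)        # True: mean of G(a,b) is a/b
print(abs(y.mean() - b / (a - 1)) < 0.02)  # True: mean of IG(a,b) is b/(a-1)
```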
The posterior distribution of $\beta$ conditional on $\tau$ is $N(\hat \beta_1, (1/\tau) (R+ X^TX)^{-1})$ where \begin{eqnarray*} \hat \beta_1 &=& (R+ X^TX)^{-1} (R \beta_0+X^T y) \\ &=& (R+ X^TX)^{-1} (R \beta_0 + X^TX \hat \beta) \\ &=& (I-A) \beta_0 + A \hat \beta, \end{eqnarray*} with $A= (R+ X^TX)^{-1} X^TX$. This last expression shows that the posterior estimate $\hat \beta_1$ is a matrix weighted average of the prior mean $\beta_0$ and the classical OLS estimate $\hat \beta$, with weights $I-A$ and $A$. If prior information is strong, the elements of $R$ will be large and $A$ will be small, so that the posterior mean gives most weight to the prior mean. In the classical approach these weights are fixed, while with the forward search technique, as the subset size grows, the weight assigned to $A$ becomes stronger and stronger, so we can dynamically see how the estimate changes as the effect of the prior becomes less important.
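The matrix-weighted-average identity above can be verified directly. The following sketch (synthetic data, arbitrary illustrative prior) checks that $\hat \beta_1 = (I-A) \beta_0 + A \hat \beta$:

```python
import numpy as np

# Verify beta1_hat = (I - A) beta0 + A beta_ols with A = (R + X'X)^{-1} X'X.
rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
y = rng.standard_normal(n)
R = 2.0 * np.eye(p)                 # illustrative prior precision matrix
beta0 = np.ones(p)                  # illustrative prior mean

XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)                # classical OLS estimate
beta1 = np.linalg.solve(R + XtX, R @ beta0 + X.T @ y)   # posterior mean
A = np.linalg.solve(R + XtX, XtX)                       # weight on the OLS estimate
print(np.allclose(beta1, (np.eye(p) - A) @ beta0 + A @ beta_ols))  # True
```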
The posterior distribution of $\tau$ is $G(a_1, b_1)$ where \begin{equation}\label{a1} a_1=a+\frac{n}{2}= \frac{1}{2} (n_0 + n -p) \end{equation} and \begin{equation}\label{b1} b_1 = 0.5 \left( \frac{n_0-p}{\tau_0} + (y-X \beta_1)'y +(\beta_0-\beta_1)'R\beta_0 \right) \end{equation} and, analogously, the posterior distribution of $\sigma^2$ is $IG(a_1, b_1)$.
The posterior estimates of $\tau$ and $\sigma^2$ are respectively:
\[ \hat \tau_1 = a_1/b_1, \qquad \hat \sigma^2_1 = b_1 /(a_1-1) \] The posterior marginal distribution of $\beta$ is multivariate $T$ with parameters \[ \hat \beta_1, \quad (1/\tau_1) \{R + X^T X \}^{-1}, \quad n_0 + n - p \]
Chaloner, K. and Brant, R. (1988), A Bayesian Approach to Outlier Detection and Residual Analysis, "Biometrika", Vol. 75, pp. 651-659.
Koop G. (2003), "Bayesian Econometrics", Wiley, New York. [Chapt. 3]
See also: regress