# regressB

regressB computes Bayesian estimates of regression parameters

## Syntax

• out=regressB(y, X, beta0, R, tau0, n0)
• out=regressB(y, X, beta0, R, tau0, n0, Name, Value)

## Description

out = regressB(y, X, beta0, R, tau0, n0) computes Bayesian estimates of the regression parameters using all default options.

out = regressB(y, X, beta0, R, tau0, n0, Name, Value) computes Bayesian estimates of the regression parameters with additional options specified by one or more Name, Value pair arguments.

## Examples


### regressB with all default options.

Common part to all examples: definition of the prior information.

```matlab
rng('default')
rng(100) % set seed for replicability
p=3;
n0=30;
X0=[ones(n0,1) randn(n0,p-1)];
R=X0'*X0;
beta0=zeros(p,1);
tau0=1;
% SAMPLE INFORMATION
n1=100;
X=randn(n1,p-1);
y=randn(n1,1);
% Run regressB using all n1 data and use the default value of c=1
out=regressB(y, X, beta0, R, tau0, n0);
disp(out)
```

```
       beta1: [3×1 double]
          a1: 63.5000
          b1: 68.1135
        tau1: 0.9323
    covbeta1: [3×3 double]
     sigma21: 1.0898
         res: [100×2 double]
```



### regressB with optional arguments.

Run regressB and compute new estimate of beta using just the first 20 observations and use a value of c equal to 1.2.

```matlab
rng('default')
rng(100) % set seed for replicability
p=3;
n0=30;
X0=[ones(n0,1) randn(n0,p-1)];
R=X0'*X0;
beta0=zeros(p,1);
tau0=1;
% SAMPLE INFORMATION
n1=100;
X=randn(n1,p-1);
y=randn(n1,1);
bsb=1:20;
c=1.2;
out=regressB(y, X, beta0, R, tau0, n0,'bsb',bsb,'c',c);
```

## Related Examples


### Example of the use of input option stats.

```matlab
rng('default')
rng(100) % set seed for replicability
p=3;
n0=30;
X0=[ones(n0,1) randn(n0,p-1)];
R=X0'*X0;
beta0=zeros(p,1);
tau0=1;
% SAMPLE INFORMATION
n1=100;
X=randn(n1,p-1);
y=randn(n1,1);
bsb=1:20;
stats=true;
out=regressB(y, X, beta0, R, tau0, n0,'bsb',bsb,'stats',stats);
```

### Example of Houses Price.

Compare the output with Table 3.3 of Koop (2003), p. 52.

```matlab
load hprice.txt;
% setup parameters
n=size(hprice,1);
y=hprice(:,1);
X=hprice(:,2:5);
n0=5;
% set \beta components
beta0=0*ones(5,1);
beta0(2,1)=10;
beta0(3,1)=5000;
beta0(4,1)=10000;
beta0(5,1)=10000;
% \tau
s02=1/4.0e-8;
tau0=1/s02;
% R prior settings
R=2.4*eye(5);
R(2,2)=6e-7;
R(3,3)=.15;
R(4,4)=.6;
R(5,5)=.6;
R=inv(R);
out=regressB(y, X, beta0, R, tau0, n0,'stats',1);
disp(out)
```

```
       beta1: [5×1 double]
          a1: 273
          b1: 9.0326e+10
        tau1: 3.0224e-09
    covbeta1: [5×5 double]
     sigma21: 3.3208e+08
         res: [546×2 double]
    beta1HPD: [5×4 double]
     tau1HPD: [2.6745e-09 3.3913e-09 2.5720e-09 3.5143e-09]
  sigma21HPD: [2.9487e+08 3.7391e+08 2.8455e+08 3.8880e+08]
       Bpval: [5×1 double]
    postodds: [5×1 double]
   modelprob: [5×1 double]
```



## Input Arguments

### y — Response variable. Vector.

Response variable, specified as a vector of length n1, where n1 is the number of observations. Each entry in y is the response for the corresponding row of X.

Missing values (NaNs) and infinite values (Infs) are allowed: observations (rows) containing missing or infinite values are automatically excluded from the computations.

Data Types: single | double

### X — Predictor variables. Matrix.

Matrix of explanatory variables (also called 'regressors') of dimension n1 x (p-1) where p denotes the number of explanatory variables including the intercept.

Rows of X represent observations and columns represent variables. By default the model includes a constant term, unless you explicitly remove it using the input option intercept; therefore, do not include a column of ones in X. Missing values (NaNs) and infinite values (Infs) are allowed: rows containing them are automatically excluded from the computations.

Remark: here we use the symbol n1 instead of the traditional n to better separate the sample information, coming from the n1 observations, from the prior information, coming from n0 previous experiments.

PRIOR INFORMATION.

$\beta$ is assumed to have a normal distribution with mean $\beta_0$ and (conditional on $\tau_0$) covariance $(1/\tau_0) (X_0'X_0)^{-1}$.

$\beta \sim N( \beta_0, (1/\tau_0) (X_0'X_0)^{-1} )$
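As an illustration, the prior covariance implied by R and tau0 can be computed directly. This is a minimal sketch using the same quantities defined in the examples above; X0 is a hypothetical design matrix standing in for the n0 previous experiments.

```matlab
% Sketch: prior covariance of beta implied by R = X0'*X0 and tau0.
n0 = 30; p = 3;
X0 = [ones(n0,1) randn(n0,p-1)];   % hypothetical prior design matrix
R  = X0'*X0;                        % matrix associated with the prior covariance
tau0 = 1;                           % prior estimate of tau = 1/sigma^2
priorCov = (1/tau0) * inv(R);       % covariance of the normal prior on beta
disp(priorCov)                      % symmetric p x p matrix
```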

Data Types: single | double

### beta0 — Prior mean of $\beta$. p-times-1 vector.

Data Types: single | double

### R — Matrix associated with covariance matrix of $\beta$. p-times-p positive definite matrix.

It can be interpreted as X0'X0, where X0 is an n0 x p matrix coming from previous experiments (assuming that the intercept is included in the model). The prior distribution of $\tau_0$ is a gamma distribution with parameters $a_0$ and $b_0$, that is:

$p(\tau_0) \propto \tau_0^{a_0-1} \exp (-b_0 \tau_0) \qquad E(\tau_0)= a_0/b_0$

Data Types: single | double

### tau0 — Prior estimate of tau. Scalar.

Prior estimate of $\tau=1/ \sigma^2 =a_0/b_0$.

Data Types: single | double

### n0 — Number of previous experiments. Scalar.

Sometimes it helps to think of the prior information as coming from n0 previous experiments. Therefore we assume that matrix X0 (which defines R), was made up of n0 observations.

Data Types: single | double

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as  Name1,Value1,...,NameN,ValueN.

Example:  'intercept',false , 'bsb',[3 5] , 'c',1.2 , 'stats',1 , 'conflev',[0.99 0.999] , 'nocheck',true 

### intercept — Indicator for constant term. true (default) | false.

Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'intercept' and either true to include or false to remove the constant term from the model.

Example:  'intercept',false 

Data Types: boolean

### bsb — Units forming subset. Vector.

m x 1 vector.

The default value of bsb is 1:n1, that is, all n1 units are used to compute beta1. Remark: if bsb='' (empty value), only the prior information is used.

Example:  'bsb',[3 5] 

Data Types: double

### c — Parameter which controls the prior information about beta. Scalar.

Scalar between 0 (excluded) and 1 (included).

The covariance matrix of the prior distribution of $\beta$ is

$(1/\tau_0)(c X_0'X_0)^{-1} = (1/\tau_0)(cR)^{-1}$

therefore multiplication of $R$ by $c$ (with $c<1$) increases the variance of $\beta$.

The default value of $c$ is 1. Note also that if we adjust the information for $\beta$ multiplying its covariance matrix by $c$ we need to make a similar adjustment to the variance of $\tau$. Simple mathematical calculations show that the new variance of $\tau$ is equal to the old variance of $\tau$ divided by $c$. So, once again, given that $c < 1$, the new variance of $\tau$ increases.

The value of $c$ will usually be the default, i.e., 1.

It was included to clarify the difference from Zellner's procedure and has the effect of changing $n_0$: in fact, we can write $n_0' = c n_0$ and use $n_0'$ in the calculations. However, it may be useful for scaling $X_0$ if the prior information comes from a design that is not of the appropriate size to represent the amount of prior knowledge.
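The scaling can be checked numerically. In this sketch R and tau0 are arbitrary illustrative values; multiplying R by c < 1 inflates every entry of the prior covariance by the factor 1/c.

```matlab
% Sketch: effect of c on the prior covariance (1/tau0)*(c*R)^{-1}.
tau0 = 2; R = [2 0.5; 0.5 1];   % arbitrary illustrative prior settings
c = 0.5;
V1 = (1/tau0) * inv(R);          % prior covariance with c = 1
Vc = (1/tau0) * inv(c*R);        % prior covariance with c = 0.5
% With c = 0.5 every entry of the covariance matrix doubles:
disp(Vc ./ V1)                   % all entries equal 1/c = 2
```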

Example:  'c',1.2 

Data Types: double

### stats — Additional statistics. Scalar.

If stats=1 the following additional statistics are computed:

1) Bayesian p-values.

2) highest posterior density intervals (HPDI) for each value of input option conflev.

3) posterior odds for $\beta_j=0$.

4) posterior model probability of the model which excludes variable $j$.

Example:  'stats',1 

Data Types: double

### conflev — Confidence levels to be used to compute HPDI. Vector.

This input option is used only if stats=1. The default value of conflev is [0.95 0.99], that is, 95 per cent and 99 per cent HPDIs are computed.

Example:  'conflev',[0.99 0.999] 

Data Types: double

### nocheck — Check input arguments. Boolean.

If nocheck is equal to true, no check is performed on y and X, and y and X are left unchanged; in other words, the additional column of ones for the intercept is not added. The default is nocheck=false.

Example:  'nocheck',true 

Data Types: boolean

## Output Arguments

### out — Output structure. Structure.

Structure which contains the following fields:
beta1

p x 1 vector containing the posterior mean (conditional on $\tau_0$) of $\beta$ (regression coefficients): $\beta_1 = (c R + X'X)^{-1} (c R \beta_0 + X'y)$.

a1

shape parameter of the posterior gamma distribution of $\tau$ ($\sigma^2$). Scalar.

$a_1 = \frac{1}{2} (c\, n_0 + n_1 - p)$ (for example, in the first example above, $a_1 = (30+100-3)/2 = 63.5$). The posterior distribution of $\tau$ is a gamma distribution with parameters $a_1$ and $b_1$.

The posterior distribution of $\sigma^2$ is an inverse-gamma distribution with parameters $a_1$ and $b_1$.

b1

scale parameter of the posterior gamma distribution of $\tau$ ($\sigma^2$). Scalar.

$b_1 = \frac{1}{2} \left( \frac{n_0}{\tau_0} + (y-X\beta_1)'y + (\beta_0-\beta_1)' c R \beta_0 \right)$

The posterior distribution of $\tau$ is a gamma distribution with parameters $a_1$ and $b_1$.

The posterior distribution of $\sigma^2$ is an inverse-gamma distribution with parameters $a_1$ and $b_1$.

Remark: if bsb is supplied, X'X, X'y and n1 are replaced by Xm'Xm, Xm'ym and m, where Xm=X(bsb,:), ym=y(bsb) and m=length(bsb); therefore all the previous quantities are estimated using just the units forming the subset.

tau1

posterior estimate of $\tau$. Scalar.

$\tau_1$ is obtained as $a_1/b_1$.

covbeta1

p x p matrix containing the covariance matrix (conditional on $\tau_1$) of $\beta_1$: $\mathrm{cov}(\beta_1) = (1/\tau_1)(cR + X'X)^{-1}$, where $\tau_1 = a_1/b_1$ (that is, obtained through the gamma parameters of the posterior distribution of $\tau$).

sigma21

posterior estimate of $\sigma^2$. Scalar. $\sigma^2_1$ is obtained as $b_1/(a_1-1)$ (that is through the inverse gamma parameters of the posterior distribution of $\sigma^2$).

Remember that if $X\sim IG(a,b)$, then $E(X)=b/(a-1)$.
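These two relations can be checked directly against the values printed in the first example on this page (a1 = 63.5, b1 = 68.1135):

```matlab
% Check tau1 = a1/b1 and sigma21 = b1/(a1-1) using the values
% shown in the first example of this page.
a1 = 63.5; b1 = 68.1135;
tau1    = a1/b1;        % posterior estimate of tau
sigma21 = b1/(a1-1);    % posterior estimate of sigma^2
fprintf('tau1 = %.4f, sigma21 = %.4f\n', tau1, sigma21)
% tau1 = 0.9323, sigma21 = 1.0898 (as in the displayed out structure)
```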

res

n1-times-2 matrix.

1st column = raw residuals: res(i,1) is computed as y(i) - X(i,:)*beta1, where beta1 is computed using the units forming the subset.

In the Bayesian approach they are the posterior means of the $\epsilon_i$ and can be interpreted as point estimates of the $\epsilon_i$.

2nd column = deletion residuals (computed only for the units which do not form the subset).

res(i,2), with i not in the subset, is computed as (y(i)-X(i,:)*beta1) * sqrt((a1/b1)/(1+hii)), where hii = X(i,:)*(c*R + Xm'*Xm)^{-1}*X(i,:)'.

The additional output which follows is produced only if the input option stats is equal to 1.

beta1HPD

p-by-2*length(conflev) matrix containing the HPDIs of $\hat \beta$ (HPDI stands for Highest Posterior Density Interval).

1st column = lower bound of the HPDI associated with conflev(1).

2nd column = upper bound of the HPDI associated with conflev(1).

...

(2*length(conflev)-1)-th column = lower bound of the HPDI associated with conflev(end).

2*length(conflev)-th column (last column) = upper bound of the HPDI associated with conflev(end).
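The column layout can be indexed as follows. This sketch uses a dummy matrix M in place of out.beta1HPD, so it runs without the FSDA toolbox; with a real out structure, replace M by out.beta1HPD.

```matlab
% Sketch: extracting lower/upper HPDI bounds for each confidence level.
conflev = [0.95 0.99];
p = 3;
M = reshape(1:p*2*numel(conflev), p, []);  % dummy stand-in for out.beta1HPD
lower95 = M(:,1);  upper95 = M(:,2);       % bounds for conflev(1) = 0.95
lower99 = M(:,3);  upper99 = M(:,4);       % bounds for conflev(2) = 0.99
% In general: lower bounds sit in the odd columns, upper bounds in the even ones.
lowerAll = M(:, 1:2:end);
upperAll = M(:, 2:2:end);
```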

tau1HPD

1-by-2*length(conflev) matrix containing the HPDIs of $\hat \tau_1$.

1st element = lower bound of the HPDI associated with conflev(1).

2nd element = upper bound of the HPDI associated with conflev(1).

...

(2*length(conflev)-1)-th element = lower bound of the HPDI associated with conflev(end).

2*length(conflev)-th element (last element) = upper bound of the HPDI associated with conflev(end).

Remark: confidence levels are based on the gamma distribution.

sigma21HPD

1-by-2*length(conflev) matrix containing the HPDIs of $\hat \sigma^2_1$.

1st element = lower bound of the HPDI associated with conflev(1).

2nd element = upper bound of the HPDI associated with conflev(1).

...

(2*length(conflev)-1)-th element = lower bound of the HPDI associated with conflev(end).

2*length(conflev)-th element (last element) = upper bound of the HPDI associated with conflev(end).

Remark: confidence levels are based on the inverse-gamma distribution.

Bpval

p-by-1 vector containing Bayesian p-values.

p-value $= P(|t| > |\hat \beta_j / \mathrm{se}(\hat \beta_j)|)$ = probability that $\beta_j$ is different from 0.

postodds

p-by-1 vector which contains the posterior odds for $\beta_j=0$.

For example, the posterior odds of $\beta_j=0$ is p(y | model which contains all explanatory variables except the one associated with $\beta_j$) divided by p(y | model which contains all explanatory variables).

Warning: postodds can be computed only if n1 >= p.

modelprob

p-by-1 vector which contains the posterior model probability of the model which excludes variable j.

For example, if modelprob(j) = 0.28, that is, if the probability of the model which does not contain variable j is 0.28, then there is a 28 per cent chance that $\beta_j=0$ and a 72 per cent chance that it is not.

Warning: modelprob can be computed only if n1 >= p.

## More About

The density of the gamma distribution (which we denote by $G(a,b)$) with shape parameter $a$ and rate parameter $b$ is $f_G(x; a,b)= \frac{b^a}{\Gamma(a)} x^{a-1} \exp(-b x)$.

With this parametrization $G(a,b)$ has mean $a/b$ and variance $a/b^2$.
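This parametrization can be verified numerically with base MATLAB's integral; a and b below are arbitrary illustrative values.

```matlab
% Sketch: check that G(a,b), parametrized with rate b as above, has mean a/b.
a = 3; b = 2;                               % arbitrary shape and rate
fG = @(x) b^a/gamma(a) .* x.^(a-1) .* exp(-b*x);
total = integral(fG, 0, Inf);               % density integrates to 1
m     = integral(@(x) x.*fG(x), 0, Inf);    % mean equals a/b = 1.5
fprintf('integral = %.4f, mean = %.4f\n', total, m)
```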

The density of the inverse gamma distribution defined over the support $x>0$, with shape parameter $a$ and scale parameter $b$, which we denote by $IG(a, b)$, is

$f_{IG}(x, a, b) = \frac{b^a}{\Gamma(a)} (1/x)^{a +1} \exp (-b/x)$

The mean (for $a>1$) is $b/(a-1)$ and the variance (for $a>2$) is $\frac{b^2}{(a-1)^2(a-2)}$.

With the above parameterizations if $X\sim G(a,b)$, then $Y=1/X$ has an $IG(a, b)$.

The posterior distribution of $\beta$ conditional on $\tau$ is $N(\hat \beta_1, (1/\tau) (R+ X^TX)^{-1})$, where

\begin{eqnarray*} \hat \beta_1 &=& (R+ X^TX)^{-1} (R \beta_0+X^T y) \\ &=& (R+ X^TX)^{-1} (R \beta_0 + X^TX \hat \beta) \\ &=& (I-A) \beta_0 + A \hat \beta, \end{eqnarray*}

with $A= (R+ X^TX)^{-1} X^TX$. This last expression shows that the posterior estimate $\hat \beta_1$ is a matrix-weighted average of the prior mean $\beta_0$ and the classical OLS estimate $\hat \beta$, with weights $I-A$ and $A$. If the prior information is strong, the elements of $R$ will be large and $A$ will be small, so that the posterior mean gives most weight to the prior mean. In the classical approach these weights are fixed, while with the forward search technique, as the subset size grows, the weight assigned to $A$ becomes stronger and stronger, so we can dynamically see how the estimate changes as the effect of the prior becomes less important.
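The matrix-weighted-average identity can be verified on simulated data. This is a sketch with arbitrary R and beta0; it only checks that the two algebraic forms of the posterior mean coincide.

```matlab
% Sketch: beta1 = (I-A)*beta0 + A*betahat equals the direct formula.
rng(1)
p = 3; n = 50;
X = randn(n, p); y = randn(n, 1);
R = eye(p); beta0 = ones(p, 1);          % arbitrary prior settings
betahat = X \ y;                         % classical OLS estimate
A = (R + X'*X) \ (X'*X);                 % weight given to the data
beta1a = (eye(p) - A)*beta0 + A*betahat; % weighted-average form
beta1b = (R + X'*X) \ (R*beta0 + X'*y);  % direct posterior-mean formula
disp(norm(beta1a - beta1b))              % ~0 up to rounding error
```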

The posterior distribution of $\tau$ is $G(a_1, b_1)$, where $$a_1=a_0+\frac{n_1}{2}= \frac{1}{2} (n_0 + n_1 -p)$$ and $$b_1 = \frac{1}{2} \left( \frac{n_0-p}{\tau_0} + (y-X \beta_1)'y +(\beta_0-\beta_1)'R\beta_0 \right)$$ and, analogously, the posterior distribution of $\sigma^2$ is $IG(a_1, b_1)$.

The posterior estimates of $\tau$ and $\sigma^2$ are respectively:

$\hat \tau_1 = a_1/b_1, \qquad \hat \sigma^2_1 = b_1 /(a_1-1)$

The posterior marginal distribution of $\beta$ is a multivariate $T$ with parameters $\hat \beta_1$, $(1/\tau_1) \{R + X^T X \}^{-1}$, $n_0 + n_1 - p$.

## References

Chaloner, K. and Brant, R. (1988), A Bayesian Approach to Outlier Detection and Residual Analysis, "Biometrika", Vol. 75, pp. 651-659.

Koop G. (2003), "Bayesian Econometrics", Wiley, New York. [Chapt. 3]