regressCens

regressCens computes estimates of regression parameters under the censored (Tobit) model.

Syntax

out=regressCens(y,X)
out=regressCens(y,X,Name,Value)

Description

The model is estimated by Maximum Likelihood (ML), assuming a Gaussian (normal) distribution of the error term. The likelihood function is maximized with function fminunc of the Optimization Toolbox, or with fminsearch if that toolbox is not present.

out =regressCens(y, X) Tobit regression using the affairs dataset.

out =regressCens(y, X, Name, Value) Example of right censoring.

Examples

Tobit regression using the affairs dataset.

In the example of Kleiber and Zeileis (2008, p. 142), the number of a person's extramarital sexual intercourses ("affairs") in the past year is regressed on the person's age, number of years married, religiousness, occupation, and own rating of the marriage. The dependent variable is left-censored at zero and not right-censored. Hence this is a standard Tobit model, which can be estimated by the following lines:

load affairs.mat
% Show the description of this table
wraptextFS(affairs.Properties.Description,'width',80)
% Define X and y 
X=affairs(:,["age" "yearsmarried" "religiousness" "occupation" "rating"]);
y=affairs(:,"affairs");
out=regressCens(y,X,"dispresults",true);

ans =

    'fidelity data, known as Fair’s Affairs. Cross-section data from a survey
     conducted by Psychology Today in 1969. The variables are
     gender: Factor indicating gender.age: Numeric variable coding age in years17.5
     = under 20, 22 = 20–24, 27 = 25–29, 32 = 30–34, 37 = 35–39, 42 = 40–44, 47 =
     45–49, 52 = 50–54, 57 = 55 or over.
     yearsmarried: Numeric variable coding number of years married: 0.125 = 3 months
     or less, 0.417 = 4–6 months, 0.75 = 6 months–1 year, 1.5 = 1–2 years, 4 = 3–5
     years, 7 = 6–8 years, 10 = 9–11 years, 15 = 12 or more years.
     children factor: Are there children in the marriage?
     religiousness: Numeric variable coding religiousness: 1 = anti, 2 = not at all,
     3 = slightly, 4 = somewhat, 5 = very.
     education: Numeric variable coding level of education: 9 = grade school, 12 =
     high school graduate, 14 = some college, 16 = college graduate, 17 = some
     graduate work, 18 = master’s degree, 20 = Ph.D., M.D., or other advanced
     degree.
     occupation: Numeric variable coding occupation according to Hollingshead
     classification (reverse numbering).
     affairs (y): Numeric variable. How often engaged in extramarital sexual
     intercourse during the past year? 0 = none, 1 = once, 2 = twice, 3 = 3 times, 7
     = 4–10 times, 12 = monthly, 12 = weekly, 12 = daily.
     In the example of Kleiber and Zeileis (2008, p. 142), the number of a person's
     extramarital sexual inter-courses ('affairs') in the past year is regressed on
     the person's age, number of years married, religiousness, occupation, and won
     rating of the marriage. The dependent variable is left-censored at zero and not
     right-censored. Hence this is a standard Tobit model which be estimated by
     functions regressCens and  FSRedaCens
     '

Observations:
    Total    LeftCensored    Uncensored    RightCensored
    _____    ____________    __________    _____________

     601         451            150              0      

Coefficients
                     betaout     sebetaout     tout         pval   
                     ________    _________    _______    __________

    (Intercept)        8.1756      2.7425      2.9811     0.0029896
    age              -0.17936    0.079165     -2.2656      0.023833
    yearsmarried      0.55417     0.13463      4.1162    4.3966e-05
    religiousness     -1.6863     0.40385     -4.1756    3.4171e-05
    occupation        0.32604     0.25444      1.2814       0.20054
    rating            -2.2851     0.40801     -5.6006    3.2686e-08
    sigma              8.2472     0.55408      14.885     8.126e-43

Number of observations: 601, Error degrees of freedom:594
Log-likelihood: -705.5762
R-squared: 0.36215

Example of right censoring.

Setting left=-Inf and right=0 indicates that there is no left-censoring but there is right-censoring at 0. The same model as above, but with the negative number of affairs as the dependent variable, can be estimated by

load affairs.mat
X=affairs(:,["age" "yearsmarried" "religiousness" "occupation" "rating"]);
y=affairs(:,"affairs").*(-1);
out=regressCens(y,X,"dispresults",true,"left",-Inf,"right",0);
% This estimation returns beta parameters that have the opposite sign of
% the beta parameters in the original model, but the estimate of sigma does
% not change.

Observations:
    Total    LeftCensored    Uncensored    RightCensored
    _____    ____________    __________    _____________

     601          0             150             451     

Coefficients
                     betaout     sebetaout     tout         pval   
                     ________    _________    _______    __________

    (Intercept)       -8.1756      2.7425     -2.9811     0.0029896
    age               0.17936    0.079165      2.2656      0.023833
    yearsmarried     -0.55417     0.13463     -4.1162    4.3966e-05
    religiousness      1.6863     0.40385      4.1756    3.4171e-05
    occupation       -0.32604     0.25444     -1.2814       0.20054
    rating             2.2851     0.40801      5.6006    3.2686e-08
    sigma              8.2472     0.55408      14.885     8.126e-43

Number of observations: 601, Error degrees of freedom:594
Log-likelihood: -705.5762
R-squared: 0.36215
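The sign flip of the coefficients with an unchanged sigma can be read off the latent equation. The short derivation below is a sketch that uses the symbols $\beta$, $\sigma$ and $\varepsilon_i$ of the model described under Additional Details; it assumes only that the error term is normal with mean 0.

```latex
\begin{aligned}
y_i^* &= x_i^{\prime}\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\sigma^2) \\
-y_i^* &= x_i^{\prime}(-\beta) + (-\varepsilon_i), \qquad -\varepsilon_i \sim N(0,\sigma^2) \\
y_i &= \max(y_i^*, 0) \;\Longleftrightarrow\; -y_i = \min(-y_i^*, 0)
\end{aligned}
```

Left-censoring of $y$ at 0 is therefore the same problem as right-censoring of $-y$ at 0 with coefficient vector $-\beta$ and the same $\sigma$, which matches the two coefficient tables above.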

Related Examples

Example of right censoring with X a table with categorical variables.

load affairs.mat
X=affairs(:,["age" "yearsmarried" "religiousness" "occupation" "rating"]);
y=affairs(:,"affairs").*(-1);
out=regressCens(y,X,"dispresults",true,"left",-Inf,"right",0);
% This estimation returns beta parameters that have the opposite sign of
% the beta parameters in the original model, but the estimate of sigma does
% not change.

Observations:
    Total    LeftCensored    Uncensored    RightCensored
    _____    ____________    __________    _____________

     601          0             150             451     

Coefficients
                     betaout     sebetaout     tout         pval   
                     ________    _________    _______    __________

    (Intercept)       -8.1756      2.7425     -2.9811     0.0029896
    age               0.17936    0.079165      2.2656      0.023833
    yearsmarried     -0.55417     0.13463     -4.1162    4.3966e-05
    religiousness      1.6863     0.40385      4.1756    3.4171e-05
    occupation       -0.32604     0.25444     -1.2814       0.20054
    rating             2.2851     0.40801      5.6006    3.2686e-08
    sigma              8.2472     0.55408      14.885     8.126e-43

Number of observations: 601, Error degrees of freedom:594
Log-likelihood: -705.5762
R-squared: 0.36215

Another example with right censoring.

This example is taken from https://stats.oarc.ucla.edu/r/dae/tobit-models/. Consider the situation in which we have a measure of academic aptitude (scaled 200-800) which we want to model using reading and math test scores, as well as the type of program the student is enrolled in (academic, general, or vocational). The problem here is that students who answer all questions on the academic aptitude test correctly receive a score of 800, even though it is likely that these students are not “truly” equal in aptitude. The same is true of students who answer all of the questions incorrectly. All such students would have a score of 200, although they may not all be of equal aptitude.

tabname="https://stats.idre.ucla.edu/stat/data/tobit.csv";
XX=readtable(tabname,"ReadRowNames",true);
XX.prog=categorical(XX.prog);
% The dataset contains 200 observations. The academic aptitude variable is
% "apt"; the reading and math test scores are "read" and "math" respectively.
% The variable prog is the type of program the student is in. It is a
% categorical (nominal) variable that takes on three values: academic,
% general, and vocational (prog = 3).
% The scatterplot matrix is shown below
spmplot(XX(:,[1 2 4]));
% Now let’s look at the data descriptively. Note that in this dataset, the
% lowest value of apt is 352. That is, no students received a score of 200
% (the lowest score possible), meaning that even though censoring from
% below was possible, it does not occur in the dataset.
summary(XX)
% Define y and X
y=XX(:,"apt");
X=XX(:,["read", "math" "prog"]);
% Call regressCens
out=regressCens(y,X,'right',800,'left',-Inf,'dispresults',true);
% Show the plot of fitted vs residuals
fitted=out.X*out.Beta(1:end-1,1);
residuals=(y{:,1}-fitted)/out.Beta(end,1);
figure
scatter(fitted,residuals)
xlabel('Fitted values (yhat)')
ylabel('Residuals')

XX: 200×4 table

Variables:

    read: double
    math: double
    prog: categorical (3 categories)
    apt: double

Statistics for applicable variables:

            NumMissing      Min      Median       Max          Mean            Std    

    read        0            28         50         76         52.2300        10.2529  
    math        0            33         52         75         52.6450         9.3684  
    prog        0                                                                     
    apt         0           352        633        800        640.0350        99.2190  

Observations:
    Total    LeftCensored    Uncensored    RightCensored
    _____    ____________    __________    _____________

     200          0             183             17      

Coefficients
                       betaout    sebetaout     tout         pval   
                       _______    _________    _______    __________

    (Intercept)         209.57      32.774      6.3943    1.1763e-09
    read                2.6979     0.61879        4.36    2.1086e-05
    math                5.9145      0.7099      8.3314    1.4324e-14
    prog_general       -12.715      12.406     -1.0249       0.30669
    prog_vocational    -46.144      13.724     -3.3623    0.00093117
    sigma               65.677      3.4827      18.858    9.5608e-46

Number of observations: 200, Error degrees of freedom:194
Log-likelihood: -1041.0629
R-squared: 0.78247

Input Arguments

`y` — Response variable. Vector.

Response variable, specified as a vector of length n, where n is the number of observations. Each entry in y is the response for the corresponding row of X.

Data Types: array or table

`X` — Predictor variables in the regression equation. Matrix.

Matrix of explanatory variables (also called 'regressors') of dimension n x (p-1) where p denotes the number of explanatory variables including the intercept. Rows of X represent observations, and columns represent variables. By default, there is a constant term in the model, unless you explicitly remove it using input option intercept, so do not include a column of 1s in X. Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

Data Types: array or table
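The exclusion of rows with missing or infinite values can be pictured with a few lines of base MATLAB. This is only a sketch of the idea on made-up numbers, not the code that regressCens runs internally; the names ok, yclean and Xclean are illustrative.

```matlab
% Toy data: row 2 has a missing predictor, row 4 an infinite response
y = [1.2; 0.0; 3.4; Inf; 2.1];
X = [1 10; NaN 20; 3 30; 4 40; 5 50];

% Keep only the rows where y and every column of X are finite
ok = all(isfinite([y X]), 2);
yclean = y(ok);
Xclean = X(ok, :);

fprintf('%d of %d rows kept\n', sum(ok), numel(y));
```

Checking y and X together keeps the two inputs aligned row by row after the filtering.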

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'bsb',[3 5 20:30], 'dispresults',true, 'left',1, 'initialbeta',[3 8], 'intercept',false, 'nocheck',false, 'right',800, 'optmin.Display','off'

`bsb` — Units forming subset. Vector.

m x 1 vector.

The default value of bsb is 1:length(y), that is all units are used to compute parameter estimates.

Example: 'bsb',[3 5 20:30]

Data Types: double

`dispresults` — Display results of final fit. Boolean.

If dispresults is true, labels of coefficients, estimated coefficients, standard errors, tstat and p-values are shown on the screen in a fully formatted way. The default value of dispresults is false.

Example: 'dispresults',true

Data Types: logical

`left` — Left limit for the censored dependent variable. Scalar.

If set to -Inf, the dependent variable is assumed to be not left-censored; default value of left is zero (classical Tobit model).

Example: 'left',1

Data Types: double

`initialbeta` — Initial estimate of beta. Vector.

p x 1 vector. If initialbeta is not supplied (default), standard least squares is used to find the initial estimate of beta.

Example: 'initialbeta',[3 8]

Data Types: double
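A least squares starting value of the kind mentioned above could be computed as in the sketch below. The data are made up and noise free, and the code is an illustration of ordinary least squares with an intercept, not an extract from regressCens.

```matlab
% Toy data generated without noise, so least squares recovers beta exactly
n = 6;
X = [1 2; 2 1; 3 5; 4 3; 5 8; 6 4];   % predictors, no column of ones
betaTrue = [3; 8; -1];                % intercept and two slopes (illustrative)
y = [ones(n,1) X] * betaTrue;

% Ordinary least squares via the backslash operator
beta0 = [ones(n,1) X] \ y;
disp(beta0')
```

With censored data the least squares fit is biased, so a value like beta0 only serves as a place for the optimizer to start.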

`intercept` — Indicator for constant term. true (default) | false.

Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'intercept' and either true to include or false to remove the constant term from the model.

Example: 'intercept',false

Data Types: logical

`nocheck` — Check input arguments. Boolean.

If nocheck is equal to true, no check is performed on the supplied input arguments.

Example: 'nocheck',false

Data Types: logical

`right` — Right limit for the censored dependent variable. Scalar.

If set to Inf, the dependent variable is assumed to be not right-censored; default value of right is Inf (classical Tobit model).

Example: 'right',800

Data Types: double
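The roles of left and right can be illustrated by counting how many observations fall at each limit, in the spirit of the "Observations" table printed in the examples. The sketch below uses made-up values and assumes that observations at or beyond a limit count as censored; whether regressCens uses exactly this rule is an assumption.

```matlab
% Made-up response values with limits left = 0 and right = 10
y = [0; 2.5; 10; 0; 7; 10; 10; 4];
left = 0;
right = 10;

% Observations sitting at or beyond a limit are treated as censored
isLeft   = y <= left;
isRight  = y >= right;
isUncens = ~isLeft & ~isRight;

counts = [numel(y) sum(isLeft) sum(isUncens) sum(isRight)];
disp(array2table(counts, 'VariableNames', ...
    {'Total','LeftCensored','Uncensored','RightCensored'}));
```

Setting left to -Inf or right to Inf makes the corresponding indicator false for every finite observation, which is how a one-sided model arises.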

`optmin` — Options for the maximization algorithm. Structure.

Use optimset to set these options.

Note that the maximization algorithm is fminunc if the Optimization Toolbox is present, otherwise fminsearch.

Example: 'optmin.Display','off'

Data Types: struct
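The choice between the two optimizers can be sketched on a toy objective. The snippet below is an illustration under stated assumptions: the test for the toolbox (exist on fminunc) and the quadratic objective are made up for the example and are not taken from the regressCens source.

```matlab
% Options are built with optimset, as the description of optmin suggests
opts = optimset('Display', 'off', 'TolX', 1e-8, 'TolFun', 1e-8);

% Toy objective with its minimum at (2, -1)
fun = @(t) (t(1) - 2)^2 + (t(2) + 1)^2;
t0 = [0; 0];

% Prefer fminunc when it is on the path, otherwise fall back to fminsearch
if exist('fminunc', 'file') == 2
    [tHat, fval, exitflag] = fminunc(fun, t0, opts);
else
    [tHat, fval, exitflag] = fminsearch(fun, t0, opts);
end
```

Both functions minimize, so a log-likelihood has to be passed with its sign reversed.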

Output Arguments

`out` — description Structure

Structure which contains the following fields

Value	Description
`Beta`	p-by-3 matrix containing: 1st col = Estimates of regression coefficients; 2nd col = Standard errors of the estimates of regression coefficients; 3rd col = t-tests of the estimates of regression coefficients;
`LogL`	scalar. Value of the maximized log likelihood (ignoring constants)
`Exflag`	Reason fminunc stopped. Integer. out.Exflag is equal to 1 if the maximization procedure did not produce warnings or the warning was different from "ILL Conditioned Jacobian". For any other warning which is produced (for example, "Overparameterized", "IterationLimitExceeded", "MATLAB:rankDeficientMatrix") out.Exflag is equal to -1.
`X`	Design matrix which has been used. This output is present only if X is a table containing categorical variables.
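The layout of out.Beta can be illustrated with the numbers printed in the affairs example. The sketch below copies the first three columns of that coefficient table into a matrix named Beta (a stand-in for out.Beta, so no call to regressCens is needed) and follows the examples above in treating the last row as sigma.

```matlab
% First three columns of the coefficient table printed in the affairs example
% (estimate, standard error, t statistic); the last row is sigma
Beta = [ 8.1756   2.7425    2.9811
        -0.17936  0.079165 -2.2656
         0.55417  0.13463   4.1162
        -1.6863   0.40385  -4.1756
         0.32604  0.25444   1.2814
        -2.2851   0.40801  -5.6006
         8.2472   0.55408  14.885 ];

betaHat  = Beta(1:end-1, 1);         % regression coefficients
sigmaHat = Beta(end, 1);             % scale of the error term
tCheck   = Beta(:,1) ./ Beta(:,2);   % estimate divided by standard error
```

The ratios in tCheck agree with the third column up to the rounding of the printed figures.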

More About

Additional Details

The issue is one where data is censored such that while we observe the value, it is not the true value, which would extend beyond the range of the observed data. This is very commonly seen in cases where the dependent variable has been given some arbitrary cutoff at the lower or upper end of the range, often resulting in floor or ceiling effects respectively. The conceptual idea is that we are interested in modeling the underlying latent variable that would not have such restriction if it was actually observed.

In the standard Tobit model (Tobin 1958), we have a dependent variable $y$ that is left-censored at zero: $\begin{eqnarray} y_i^* & = & x_i^{\prime} \beta+\varepsilon_i \\ y_i & = & 0 \qquad \text { if } y_i^* \leq 0 \\ & = & y_i^* \qquad \text { if } y_i^*>0 \end{eqnarray}$ Here the subscript $i=1, \ldots, n$ indicates the observation, $y_i^*$ is an unobserved ("latent") variable, $x_i$ is a vector of explanatory variables, $\beta$ is a vector of unknown parameters, and $\varepsilon_i$ is a disturbance term.

The censored regression model is a generalisation of the standard Tobit model.

The dependent variable can be either left-censored, right-censored, or both left-censored and right-censored, where the lower and/or upper limit of the dependent variable can be any number: $\begin{eqnarray} y_i^* & = & x_i^{\prime} \beta+\varepsilon_i \\ y_i & = & a \qquad \text { if } y_i^* \leq a \\ & = & y_i^* \qquad \text { if } a<y_i^*<b \\ & = & b \qquad \text { if } y_i^* \geq b \end{eqnarray}$ Here $a$ is the lower limit and $b$ is the upper limit of the dependent variable. If $a=-\infty$ or $b=\infty$, the dependent variable is not left-censored or right-censored, respectively.
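The censoring mechanism can be made concrete by simulating a latent variable and clipping it at the two limits. The sketch below uses made-up parameter values (beta, sigma, a, b) and base MATLAB only.

```matlab
rng(1);                          % fixed seed so the run is repeatable
n = 200;
x = randn(n, 1);
beta = [1; 2];                   % illustrative intercept and slope
sigma = 1.5;
a = 0;                           % lower limit
b = 4;                           % upper limit

% Latent variable, then censoring at the two limits
ystar = [ones(n,1) x] * beta + sigma * randn(n, 1);
y = min(max(ystar, a), b);

fprintf('left-censored: %d, right-censored: %d\n', sum(y == a), sum(y == b));
```

Only y and x would be available to the analyst; ystar is shown here to make the latent-variable idea visible.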

Censored regression models (including the standard Tobit model) are usually estimated by the Maximum Likelihood (ML) method. Assuming that the disturbance term $\varepsilon$ follows a normal distribution with mean 0 and variance $\sigma^2$, the log-likelihood function is $\begin{aligned} \log L=\sum_{i=1}^n & \left[ I_i^a \log \Phi\left(\frac{a-x_i^{\prime} \beta}{\sigma}\right)+I_i^b \log \Phi\left(\frac{x_i^{\prime} \beta-b}{\sigma}\right) \right. \\ & \left.+\left(1-I_i^a-I_i^b\right)\left(\log \phi\left(\frac{y_i-x_i^{\prime} \beta}{\sigma}\right)-\log \sigma\right)\right], \end{aligned}$ where $\phi( \cdot)$ and $\Phi( \cdot )$ denote the probability density function and the cumulative distribution function, respectively, of the standard normal distribution, and $I_i^a$ and $I_i^b$ are indicator functions with $\begin{eqnarray} I_i^a & = & 1 \text { if } y_i=a \\ & = & 0 \text { if } y_i>a \\ I_i^b & = & 1 \text { if } y_i=b \\ & = & 0 \text { if } y_i<b \end{eqnarray}$ In this file the censored log-likelihood above is maximized with respect to the parameter vector $(\beta', \sigma)$ using routine fminunc of the Optimization Toolbox.
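The log-likelihood can be evaluated directly for given parameter values. The sketch below does this on four made-up observations with base MATLAB only; the standard normal cdf is written with erfc, the indicators are taken as y <= a and y >= b, and none of the variable names come from the regressCens source.

```matlab
% Standard normal cdf and log-density using base MATLAB only
Phi    = @(z) 0.5 * erfc(-z / sqrt(2));
logphi = @(z) -0.5 * z.^2 - 0.5 * log(2*pi);

% Toy data (made up): left limit a = 0, no right censoring
y = [0; 0; 1.5; 3];
X = [ones(4,1) (1:4)'];
a = 0;
b = Inf;

% Trial parameter values, theta = (beta', sigma)
beta = [-1; 1];
sigma = 1;

% Indicators I^a, I^b and the uncensored remainder
Ia = (y <= a);
Ib = (y >= b);
Iu = ~Ia & ~Ib;

% Three pieces of the log-likelihood, summed over the matching rows
xb = X * beta;
logL = sum(log(Phi((a - xb(Ia)) / sigma))) ...
     + sum(log(Phi((xb(Ib) - b) / sigma))) ...
     + sum(logphi((y(Iu) - xb(Iu)) / sigma) - log(sigma));
```

Selecting rows with logical indexing, rather than multiplying by the indicators, avoids products of the form 0 times -Inf when a limit is infinite. Passing the negative of logL to an optimizer would give the ML estimates.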

References

Greene, W.H. (2008), "Econometric Analysis, Sixth Edition", Prentice Hall, pp. 871-875.

Henningsen, A. (2012), Estimating Censored Regression Models in R using the censReg Package, https://cran.r-project.org/web/packages/censReg/vignettes/censReg.pdf

Kleiber C., Zeileis A. (2008), "Applied Econometrics with R", Springer, New York.

Tobin, J. (1958), Estimation of Relationships for Limited Dependent Variables, "Econometrica", 26, pp. 24-36.
