ctsub computes numerical integration from x(1) to z(i) of y=f(x).
For a function defined with n pairs (x_1,y_1) \ldots (x_n,y_n), with x_1 \leq x_2 \leq, \ldots, \leq x_n this routine computes the (approximate) integral using the trapezoidal rule a_i = \int_{x_1}^{z_i} f(x) dx For further details see the section More About.
n=1000;
x=sort(randn(n,1))+10;
y=1+2*x+ randn(n,1);
% y(50:70)=-5;
z=randn(n,1);
a=ctsub(x,y,z);
subplot(2,1,1)
plot(x,y)
subplot(2,1,2)
plot(x,a)
n=1000;
x=sort(randn(n,1));
y=randn(n,1);
% If the third argument of ctsub is equal to the first
% argument then the output of cumtraps and ctsub is exactly the same.
res=cumtrapz(x,y);
res1=ctsub(x,y,x);
disp(max(abs(res-res1)))
n=1000;
x=10*sort(randn(n,1))+10;
% The variance of y depends on x.
y=1+1*x+ 10*x.*randn(n,1);
% Upper limits of integration.
z=rand(n,1)*100-50;
a=ctsub(x,y,z);
subplot(2,1,1)
plot(x,y)
title('Original data')
subplot(2,1,2)
plot(x,a)
title('Transformed data after the variance stabilizing transformation')
Generate the data
rng(2000)
la=0;
n=100;
X=10*rand(n,1);
sigma=0.1;
a=2;
b=0.3;
y=a+b*X+sigma*randn(n,1);
% The correct transformation is log
y=normBoxCox(y,1,la,'inverse',true);
% Data standardization
y=zscore(y);
X=zscore(X);
out=fitlm(X,y);
yhat=out.Fitted;
res=y-yhat;
figure
nr=2;
nc=3;
subplot(nr,nc,1)
plot(X,y,'o')
hold('on')
[xsor,ordx]=sort(X);
yordx=yhat(ordx);
plot(xsor,yordx)
xlabel('Original x (standardized)')
ylabel('Original y (standardized)')
title(['R2=' num2str(out.Rsquared.Ordinary)])
subplot(nr,nc,2)
[yhatsor,ordyhat]=sort(yhat);
% Residuals sorted using the indexes of sorted fitted values
resordyhat=res(ordyhat);
yordyhat=y(ordyhat);
scatter(yhatsor,resordyhat)
xlabel('Sorted fitted values')
ylabel('Residuals (e) in correspondence of sorted fitted values')
% z2 = log abs value of the residuals
% ztar_sorted = log abs value of the residuals (sorted using ordering based on yhat)
z2=log(sqrt(res.^2));
ztar_sorted=z2(ordyhat);
% Smooth the log of (the sample version of) the |residuals|
% against fitted values. Use the ordering based on fitted values.
[smo,yspan]=rlsmo(yhatsor,ztar_sorted);
scatter(yhatsor,ztar_sorted)
hold('on')
plot(yhatsor,smo)
xlabel('Sorted fitted values')
ylabel('log |e| and smoothed values')
subplot(nr,nc,3)
z7=exp(-smo);
plot(yhatsor,z7)
xlabel('Sorted fitted values')
ylabel('exp(-(log |e| smoothed)=smoothed 1/|e| ')
subplot(nr,nc,4)
z7=exp(-smo);
ytnew=ctsub(yhatsor,z7,yordyhat);
plot(yhatsor,ytnew,'o')
xlabel('Sorted fitted values')
ylabel('Transformed y (after integration)')
tynew=y;
tynew(ordyhat)=ytnew;
subplot(nr,nc,5)
tynew=zscore(tynew);
plot(y,tynew,'o')
xlabel('Original y')
ylabel('Transformed y (standardized)')
refline(1,0)
subplot(nr,nc,6)
outAVASOneIteration=fitlm(X,tynew);
yhatt=outAVASOneIteration.Fitted;
plot(X,tynew,'o')
hold('on')
plot(xsor,yhatt(ordx))
xlabel('Original x (standardized)')
ylabel('Transformed y (standardized) and fitted values')
title(['R2=' num2str(outAVASOneIteration.Rsquared.Ordinary)])
n=10;
x=(1:n)';
y=3*ones(n,1);
y(10)=5;
y(1)=1;
% The overall area in the interval [1 10] is equal to 27
disp(['Overall area: ' num2str(ctsub(x,y,10))])
disp('Area when x=11 using the rectangular hypothesis Hp:f(11)=5')
disp(ctsub(x,y,11,false))
disp('Area when x=11 using the trapezoidal hypothesis Hp:f(11)=3')
disp(ctsub(x,y,11,true))
x
— Predictor variable sorted.
Vector.Vector of length n containing ordered abscissa values. Note that the length of x must be equal to the length of y.
Note that the x values are assumed non decreasing.
Data Types: single| double
y
— Response variable.
Vector.Vector of length n containing ordinate values. Note that the length of x must be equal to the length of y.
Data Types: single| double
z
— Upper limits of integration.
Vector.Vector of length k containing the upper integration limits. Note that length(z) is not necessarily equal to length(x).
Data Types: single| double
Trapezoid
— strategy for evaluating z points that lie outside the domain of x.
Boolean.This options specifies the hypothesis to assume for the cases in which z(i)<x(1) or z(i)>x(n). If this argument is omitted or it is false we assume a rectangular hypothesis (default). In other words, we assume that below x(1) the function is constant and equal to y(1). Similarly, we assume that beyond x(n) the function is constant and equal to y(n).
Therefore for example for if z(i)<min(x) the result of the integral is a(i) =(z(i)-x(1))*y(1) (rectangular hypothesis).
If this argument is true we assume that the y coordinate z(i) is equal to mean(y).
Therefore for example for if z(i)<min(x) the result of the integral is a(i) =(z(i)-x(1))*(y(1)+mean(y))/2 (trapezoidal hypothesis).
Example: true
Data Types: logical
a
—Result of numerical integration.
VectorVector of length k containing the results of the k numerical integrations.
The i-th element of a_i with i=1, 2, \ldots, n is equal to:
a_i = \int_{x_1}^{z_i} f(x) dxThis function estimates the integral of Y via the trapezoidal method.
For a function defined y=f(x) with n pairs (x_1,y_1) \ldots (x_n,y_n), with x_1 \leq x_2 \leq, \ldots, \leq x_n, if x_i<z_i \leq x_{i+1}, i=1, \ldots, n-1, this routine computes the (approximate) integral using the trapezoidal rule:
a_i = \int_{x_1}^{z_i} f(x) dx More precisely \begin{equation}\label{ai} a_i= \sum_{j=2}^{i} 0.5 (x_j-x_{j-1})(y_j+y_{j-1}) +0.5 (z_i-x_i) \left\{ 2y_i+(z_i-x_i) \frac{y_{i+1}-y_i}{x_{i+1}-x_i} \right\} \end{equation} The last term of the equation is the area of the trapezoid with coordinates (x_i, y_i),(z_i, f(z_i)) and f(z_i)=y_i+(z_i-x_i) \frac{y_{i+1}-y_i}{x_{i+1}-x_i}is found by linear interpolation.
If Trapezoid is omitted or is false, when z_i>x_n the function for x >x_n is assumed constant and equal to y_n therefore, to the expression of a_i computed as described above, we must add (z_i-x_n) y_n.
On the other hand, if z_i<x_1 the function for x<x_1 is assumed constant and equal to y_1 therefore a_i is computed as:
a_i = -(x_1-z_i) y_1Note that a_i in this last case (if y_1 is positive) becomes negative.
On the other hand, if Trapezoid is true when z_i>x_n or when z_i<x_1, we assume f(z_i)=\sum_{i=1}^n y_i/n.
This routine in called in every step of the outer loop by function avas in order to compute a new set of transformed values for the response which have approximately constant variance.
Inside avas, the x coordinates are the fitted values ordered, the y coordinates are the reciprocal of the smoothed absolute values of residuals sorted using the ordering of fitted values, while the upper values of the range of integration are given by the elements vector of transformed values in the previous iteration. The output is the new vector of transformed values ty.
Tibshirani R. (1987), Estimating optimal transformations for regression, "Journal of the American Statistical Association", Vol. 83, 394-405.