ScoreYJmle

ScoreYJmle computes the likelihood ratio test of H_0: lambdaP=lambdaP0 and lambdaN=lambdaN0

Syntax

  • outSC=ScoreYJmle(y,X)
  • outSC=ScoreYJmle(y,X,Name,Value)

Description

The transformations for negative and positive responses were determined by Yeo and Johnson (2000) by imposing the smoothness condition that the second derivative of zYJ(λ) with respect to y be continuous at y = 0. However, some authors, for example Weisberg (2005), query the physical interpretability of this constraint, which is often violated in data analysis. Accordingly, Atkinson et al. (2019, 2020) extend the Yeo-Johnson transformation to allow two values of the transformation parameter: λN for negative observations and λP for non-negative ones.

$\lambda$ is the transformation parameter (scalar) for all the observations (positive and negative).

$\lambda_P$ is the transformation parameter for non-negative observations.

$\lambda_N$ is the transformation parameter for negative observations.
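
For reference, the extended transformation can be sketched as the standard Yeo-Johnson transformation with a separate parameter on each side of zero. The expression below is the unnormalized form, written under that assumption; the normalized version $z(\lambda)$ used in the regressions also involves the Jacobian of the transformation, which is omitted here.

$$
y(\lambda_P,\lambda_N)=
\begin{cases}
\{(y+1)^{\lambda_P}-1\}/\lambda_P & y\ge 0,\ \lambda_P\ne 0\\
\log(y+1) & y\ge 0,\ \lambda_P=0\\
-\{(1-y)^{2-\lambda_N}-1\}/(2-\lambda_N) & y<0,\ \lambda_N\ne 2\\
-\log(1-y) & y<0,\ \lambda_N=2
\end{cases}
$$

Setting $\lambda_P=\lambda_N=\lambda$ gives back the one-parameter Yeo-Johnson transformation.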

SSR is the residual sum of squares of the model which regresses $z(\lambda)$ against $X$.

SSF is the residual sum of squares of the model which regresses $z(\hat\lambda_{MLE})$ against $X$, where $\hat\lambda_{MLE}$ is the vector of length 2 which contains the MLEs of $\lambda_P$ and $\lambda_N$.

ScoreYJmle computes Num/Den, where Num and Den are defined as follows:

Num=(SSR-SSF)/2 and Den=SSF/(n-p-2), where p is the number of columns of matrix X (including the intercept).
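
Written as a single expression, with the symbol $F$ introduced here only for illustration, the ratio is:

$$
F=\frac{\mathrm{Num}}{\mathrm{Den}}=\frac{(SSR-SSF)/2}{SSF/(n-p-2)}
$$

The divisor 2 in the numerator matches the two parameters ($\lambda_P$ and $\lambda_N$) estimated in the full model. The second example on this page compares the statistic with an F distribution with 2 numerator degrees of freedom.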

outSC = ScoreYJmle(y, X)

outSC = ScoreYJmle(y, X, Name, Value) Example in which positive and negative observations require different lambdas.

Examples


  • rng('default')
    rng(1)
    n=100;
    yori=randn(n,1);
    % Apply the inverse transformation to check whether the analysis
    % recovers the true value of the transformation parameter
    la=0.5;
    ytra=normYJ(yori,[],la,'inverse',true);
    % Start the analysis
    X=ones(n,1);
    [outSCmle]=ScoreYJmle(ytra,X,'intercept',0);
    la=[-1 -0.5 0 0.5 1]';
    Comb=[la outSCmle.Score(:,1)];
    CombT=array2table(Comb,'VariableNames',{'la', 'FtestLR'});
    disp(CombT)

  • Example in which positive and negative observations require different lambdas.
  • rng(1000)
    n=100;
    y=randn(n,1);
    % Transform positive and negative values with different parameters
    lapos=0;
    ytrapos=normYJ(y(y>=0),[],lapos,'inverse',true);
    laneg=1;
    ytraneg=normYJ(y(y<0),[],laneg,'inverse',true);
    ytra=[ytrapos; ytraneg];
    % Start the analysis
    X=ones(n,1);
    la=(-1:0.25:1)';
    [outSCmle]=ScoreYJmle(ytra,X,'intercept',0,'la',la);
    Pval=fcdf(outSCmle.Score,2,n-2,'upper');
    Comb=[la outSCmle.Score(:,1) Pval];
    CombT=array2table(Comb,'VariableNames',{'la','FtestLR' 'Pvalues'});
    disp(CombT)
    disp('The test is significant for all values of lambda')
    disp('This may indicate that the data need two separate lambdas for pos and neg observations')
         la      FtestLR     Pvalues  
        _____    _______    __________
    
           -1    350.59     2.1902e-45
        -0.75    195.62     6.0722e-35
         -0.5    106.44      2.711e-25
        -0.25    54.822     1.0505e-16
            0    25.305     1.3794e-09
         0.25    9.5895     0.00015719
          0.5    3.6966       0.028332
         0.75    6.7847      0.0017393
            1    21.293     2.0943e-08
    
    The test is significant for all values of lambda
    This may indicate that the data need two separate lambdas for pos and neg observations
    

    Input Arguments


    y — Response variable. Vector.

    A vector with n elements that contains the response variable. It can be either a row or a column vector.

    Data Types: single | double

    X — Predictor variables. Matrix.

    Data matrix of explanatory variables (also called 'regressors') of dimension (n x p-1). Rows of X represent observations, and columns represent variables.

    Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

    Data Types: single | double

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'intercept',false , 'la',[0 0.5] , 'usefmin',true , 'sseReducedModel',[20.2 30.3 12.8] , 'nocheck',1

    intercept — Indicator for constant term. true (default) | false.

    Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'intercept' and either true to include or false to remove the constant term from the model.

    Example: 'intercept',false

    Data Types: boolean

    la — Transformation parameter. Vector.

    Values of the transformation parameter for which the test is computed. The default is la=[-1 -0.5 0 0.5 1], that is, the five most common values of lambda.

    Example: 'la',[0 0.5]

    Data Types: double

    usefmin — Use solver to find MLE of lambda. Boolean | struct.

    If usefmin is true or usefmin is a struct, the MATLAB solvers fminsearch or fminunc are used to find the maximum likelihood estimates of $\lambda_P$ and $\lambda_N$. The default value of usefmin is false, that is, no solver is used and the likelihood is evaluated on a grid of points with step 0.01.

    If usefmin is a structure it may contain the following fields:

    usefmin.MaxIter = Maximum number of iterations (default is 1000).

    usefmin.TolX = Termination tolerance for the parameters (default is 1e-7).

    usefmin.solver = name of the solver. Possible values are 'fminsearch' (default) and 'fminunc'. fminunc needs the optimization toolbox.

    usefmin.displayLevel = amount of information displayed by the algorithm. Possible values are 'off' (displays no information, this is the default), 'final' (displays just the final output) and 'iter' (displays iterative output to the command window).

    Example: 'usefmin',true

    Data Types: boolean or struct
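
As an illustration, the struct form of usefmin could be set up as in the sketch below. It uses only the fields listed above and the same simulated data as the first example on this page. The tolerance and iteration values are arbitrary choices made for the illustration, not recommended settings, and the FSDA toolbox is assumed to be on the MATLAB path.

```matlab
% Simulated data: inverse transformation with la=0.5, as in the first example
rng(1)
n = 100;
y = normYJ(randn(n,1), [], 0.5, 'inverse', true);
X = ones(n,1);

% Solver options (illustrative values, chosen only for this sketch)
opts = struct();
opts.MaxIter = 500;            % maximum number of iterations
opts.TolX = 1e-6;              % termination tolerance on the parameters
opts.solver = 'fminsearch';    % 'fminunc' needs the Optimization Toolbox
opts.displayLevel = 'off';     % no solver output

% MLE of lambda_P and lambda_N found by the solver instead of the grid
out = ScoreYJmle(y, X, 'intercept', 0, 'usefmin', opts);
disp(out.laMLE)
```

Passing 'usefmin',true instead of a struct selects the solver with its default settings.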

    sseReducedModel — Sum of squares of residuals of reduced model. Vector.

    Vector with the same length as input vector la containing the sum of squares of residuals of the reduced model. The default value of sseReducedModel is empty, in which case this quantity is computed by this routine.

    Example: 'sseReducedModel',[20.2 30.3 12.8]

    Data Types: empty value or double

    nocheck — Check input arguments. Scalar.

    If nocheck is equal to 1, no check is performed on vector y and matrix X. Notice that y and X are left unchanged. In other words, the additional column of ones for the intercept is not added. By default nocheck=0.

    Example: 'nocheck',1

    Data Types: double

    Output Arguments


    outSC — Output structure.

    Structure containing the following fields:

    Value Description
    Score

    Likelihood ratio tests. Vector.

    Column vector of length(la) which contains the value of the likelihood ratio test for each value of lambda specified in optional input parameter la.

    laMLE

    Maximum likelihood estimates of the transformation parameters. Matrix.

    Matrix of dimension length(la)-by-2 which contains the maximum likelihood estimates of $\lambda_P$ and $\lambda_N$ for each value of lambda specified in optional input parameter la. The first column refers to $\lambda_P$ and the second column to $\lambda_N$.
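
The two output fields can be collected in a single table, as in the sketch below. It reuses the simulated data of the first example on this page and assumes the FSDA toolbox is on the MATLAB path. The column names laP_MLE and laN_MLE are labels invented for this illustration.

```matlab
% Simulated data: inverse transformation with la=0.5, as in the first example
rng(1)
n = 100;
y = normYJ(randn(n,1), [], 0.5, 'inverse', true);
X = ones(n,1);

% Values of lambda to test
la = [0 0.5 1]';
out = ScoreYJmle(y, X, 'intercept', 0, 'la', la);

% One row per tested lambda: test statistic, then the MLEs of
% lambda_P (first column of laMLE) and lambda_N (second column)
T = array2table([la out.Score(:,1) out.laMLE], ...
    'VariableNames', {'la', 'FtestLR', 'laP_MLE', 'laN_MLE'});
disp(T)
```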

    References

    Yeo, I.K. and Johnson, R. (2000), A new family of power transformations to improve normality or symmetry, "Biometrika", Vol. 87, pp. 954-959.

    Atkinson, A.C., Riani, M. and Corbellini, A. (2019), The analysis of transformations for profit-and-loss data, Journal of the Royal Statistical Society, Series C, "Applied Statistics", https://doi.org/10.1111/rssc.12389

    Atkinson, A.C., Riani, M. and Corbellini, A. (2020), The Box-Cox Transformation: Review and Extensions, "Statistical Science", in press.
