ScoreYJall

Computes all four score tests for the Yeo-Johnson (YJ) transformation

Syntax

  • outSC=ScoreYJall(y,X)
  • outSC=ScoreYJall(y,X,Name,Value)

Description

The transformations for negative and positive responses were determined by Yeo and Johnson (2000) by imposing the smoothness condition that the second derivative of zYJ(λ) with respect to y be smooth at y = 0. However, some authors, for example Weisberg (2005), query the physical interpretability of this constraint, which is often violated in data analysis. Accordingly, Atkinson et al. (2019, 2021) extend the Yeo-Johnson transformation to allow two values of the transformation parameter: λN for negative observations and λP for non-negative ones.
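For reference, the two-parameter family can be written as below. This is a sketch based on the standard (unnormalized) Yeo-Johnson definition, with λ replaced by λP for y ≥ 0 and by λN for y < 0. The symbol y(λP, λN) is notation assumed here, and the normalization by the Jacobian that leads to zYJ(λ) is not shown.

```latex
y(\lambda_P,\lambda_N) =
\begin{cases}
\{(y+1)^{\lambda_P}-1\}/\lambda_P,          & y \ge 0,\ \lambda_P \ne 0,\\
\log(y+1),                                  & y \ge 0,\ \lambda_P = 0,\\
-\{(1-y)^{2-\lambda_N}-1\}/(2-\lambda_N),   & y < 0,\ \lambda_N \ne 2,\\
-\log(1-y),                                 & y < 0,\ \lambda_N = 2.
\end{cases}
```

Setting λP = λN = λ recovers the original one-parameter Yeo-Johnson transformation.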

ScoreYJall computes:

1) the global t test associated with the constructed variable for λ=λP=λN;

2) the t test associated with the constructed variable computed assuming a different transformation for positive observations, keeping the value of the transformation parameter for negative observations fixed. In short, we call this the "test for positive observations";

3) the t test associated with the constructed variable computed assuming a different transformation for negative observations, keeping the value of the transformation parameter for positive observations fixed. In short, we call this the "test for negative observations";

4) the F test for the joint presence of the two constructed variables described in points 2) and 3).
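The t and F tests listed above are added-variable tests. The following is a minimal, self-contained sketch of that generic mechanism on toy data. It is not the FSDA implementation: w1 and w2 are arbitrary stand-ins for the constructed variables (whose actual form is not derived here), z stands in for the transformed response, and no claim is made about the sign convention ScoreYJall uses.

```matlab
% Toy illustration of added-variable t and F tests (not FSDA code).
n  = 60;
t  = (1:n)';
X  = [ones(n,1) t/n];                    % regressors, intercept included
w1 = sin(t);                             % stand-in for the first constructed variable
w2 = cos(2*t);                           % stand-in for the second constructed variable
z  = X*[1; 2] + 0.3*w1 + 0.1*sin(7*t);   % toy "transformed response"

% t test for one added variable: regress z on [X w1]
Xa    = [X w1];
b     = Xa\z;                            % least-squares coefficients
res   = z - Xa*b;                        % residuals
s2    = (res'*res)/(n - size(Xa,2));     % residual variance estimate
covb  = s2*inv(Xa'*Xa);                  % covariance matrix of b
tstat = b(end)/sqrt(covb(end,end));      % t statistic for the coefficient of w1

% F test for the joint presence of w1 and w2
rss   = @(A) sum((z - A*(A\z)).^2);      % residual sum of squares of z on A
Xf    = [X w1 w2];
q     = 2;                               % number of added variables
Fstat = ((rss(X) - rss(Xf))/q) / (rss(Xf)/(n - size(Xf,2)));

% With a single added variable, the F statistic equals the squared t statistic
F1 = (rss(X) - rss(Xa)) / (rss(Xa)/(n - size(Xa,2)));
fprintf('t = %.4f, t^2 = %.4f, F (1 var) = %.4f, F (2 vars) = %.4f\n', ...
        tstat, tstat^2, F1, Fstat);
```

The last lines check the identity F = t² for a single added variable, which is why a separate F test is only informative when both constructed variables enter jointly.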

outSC = ScoreYJall(y, X) Example in which positive and negative observations require the same lambda.

outSC = ScoreYJall(y, X, Name, Value) Example in which positive and negative observations require different lambda.

Examples

  • Example in which positive and negative observations require the same lambda.
    rng('default')
    rng(100)
    n=100;
    y=randn(n,1);
    % Transform the value to find out if we can recover the true value of
    % the transformation parameter.
    la=0.5;
    ytra=normYJ(y,[],la,'inverse',true);
    % Start the analysis
    X=ones(n,1);
    [outSC]=ScoreYJall(ytra,X,'intercept',false);
    la=[-1 -0.5 0 0.5 1]';
    Sco=[la outSC.Score];
    Scotable=array2table(Sco,'VariableNames',{'lambda','Tall','Tp','Tn','Ftest'});
    disp(Scotable)
    % Comment: if we consider the 5 most common values of lambda, the value
    % of the score test when lambda=0.5 is the only one that is not
    % significant. Both values of the score test for positive and negative
    % observations and the Ftest confirm that this value of the
    % transformation parameter is OK for both sides of the distribution.
        lambda     Tall        Tp         Tn       Ftest 
        ______    _______    _______    _______    ______
    
           -1      21.177     32.887     14.266     575.2
         -0.5      12.662     15.573      9.192    122.47
            0      5.0377     5.3971     3.9745    14.416
          0.5     -1.6988    -1.4896    -1.7737    1.5805
            1     -8.4803    -7.4925    -9.3782    43.761
    
    

  • Example in which positive and negative observations require different lambda.
    rng(2000)
    n=100;
    y=randn(n,1);
    % Transform in a different way positive and negative values.
    lapos=0;
    ytrapos=normYJ(y(y>=0),[],lapos,'inverse',true);
    laneg=1;
    ytraneg=normYJ(y(y<0),[],laneg,'inverse',true);
    ytra=[ytrapos; ytraneg];
    % Start the analysis.
    X=ones(n,1);
    % Also compute the likelihood ratio test based on the MLE of laP and laN.
    scoremle=true;
    [outSC]=ScoreYJall(ytra,X,'intercept',false,'scoremle',scoremle);
    la=[-1 -0.5 0 0.5 1]';
    Sco=[la outSC.Score];
    Scotable=array2table(Sco,'VariableNames',{'lambda','Tall','Tp','Tn','FtestPN','FtestLR'});
    disp(Scotable)
    % Comment: if we consider the 5 most common values of lambda,
    % the value of the global score test when lambda=0.5 is the only one that
    % is not significant. However, when lambda=0.5, the score test for
    % positive observations (Tp=-2.642) is significant.
    disp('Difference between the test for positive and the test for negative')
    disp(abs(Scotable{4,3}-Scotable{4,4}))
    % which is very large.
    % This indicates that the two tails need a different value of the
    % transformation parameter.
        lambda     Tall        Tp         Tn       FtestPN    FtestLR
        ______    _______    _______    _______    _______    _______
    
           -1      36.467      55.13     25.519    1610.7     198.99 
         -0.5      20.184     22.481     16.391    250.17     61.484 
            0      7.8511     6.7795      8.197    33.341     12.708 
          0.5     -1.5618     -2.642    0.61876     14.23     3.0175 
            1     -11.647    -11.721    -9.2192    68.153     27.268 
    
    Difference between the test for positive and the test for negative
        3.2608
    
    

    Input Arguments

    y — Response variable. Vector.

    A vector with n elements that contains the response variable.

    It can be either a row or a column vector.

    Data Types: single | double

    X — Predictor variables. Matrix.

    Data matrix of explanatory variables (also called 'regressors') of dimension (n x p-1). Rows of X represent observations, and columns represent variables.

    Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

    Data Types: single | double

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'intercept',false , 'la',[0 0.5] , 'scoremle',true , 'usefmin',true , 'nocheck',true

    intercept — Indicator for constant term. true (default) | false.

    Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'intercept' and either true to include or false to remove the constant term from the model.

    Example: 'intercept',false

    Data Types: logical

    la — Transformation parameter. Vector.

    Values of the transformation parameter for which the score tests are computed. The default is la=[-1 -0.5 0 0.5 1], the five most common values of lambda.

    Example: 'la',[0 0.5]

    Data Types: double
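As an illustration of the la option, the sketch below evaluates the tests on a finer grid than the default. It uses only the call signature shown on this page. The toy data and the rule of picking the lambda whose global statistic is closest to zero are assumptions made for illustration, not a documented procedure. The call is guarded so that the snippet still runs when FSDA is not on the path.

```matlab
n  = 100;
y  = randn(n,1);                      % toy response with both signs
X  = ones(n,1);                       % constant only, passed explicitly
la = -1:0.25:1;                       % finer grid than the default five values

if exist('ScoreYJall','file') == 2    % run only if FSDA is on the path
    out = ScoreYJall(y,X,'intercept',false,'la',la);
    % First column of out.Score is the global test (one row per value of la)
    [~,idx] = min(abs(out.Score(:,1)));
    fprintf('Global test closest to zero at lambda = %.2f\n', la(idx));
end
```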

    scoremle — Likelihood ratio test for the two different transformation parameters \lambda_P and \lambda_N. Boolean.

    If scoremle is true, the likelihood ratio test is also computed. In this case, the residual sum of squares of the null model, based on a single transformation parameter \lambda, is compared with the residual sum of squares of the model based on data transformed using the MLE of \lambda_P and \lambda_N. When scoremle is true, the option usefmin described next controls the parameters of the optimization routine.

    Example: 'scoremle',true

    Data Types: logical

    usefmin — Use solver to find MLE of lambda. Boolean | struct.

    If usefmin is true or usefmin is a struct, the MATLAB solvers fminsearch or fminunc are used to find the maximum likelihood estimates of \lambda_P and \lambda_N. The default value of usefmin is false, which means that the solver is not used and the likelihood is evaluated on a grid of points with step 0.01.

    If usefmin is a structure, it may contain the following fields:

    Field           Description
    MaxIter         Maximum number of iterations (default is 1000).
    TolX            Termination tolerance for the parameters (default is 1e-7).
    solver          Name of the solver. Possible values are 'fminsearch' (default) and 'fminunc'. fminunc needs the Optimization Toolbox.
    displayLevel    Amount of information displayed by the algorithm. Possible values are 'off' (displays no information, this is the default), 'final' (displays just the final output) and 'iter' (displays iterative output to the command window).

    Example: 'usefmin',true

    Data Types: logical | struct
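A minimal sketch of passing usefmin as a structure, using only the four fields documented above. The particular values and the toy data are arbitrary choices for illustration. The order of the two elements of laMLE is assumed to follow the documented description (\lambda_P, then \lambda_N). The call is guarded so that the snippet still runs when FSDA is not on the path.

```matlab
% Solver options, using the fields documented for usefmin
opt = struct;
opt.MaxIter      = 500;            % cap on iterations (documented default: 1000)
opt.TolX         = 1e-6;           % tolerance on the parameters (documented default: 1e-7)
opt.solver       = 'fminsearch';   % 'fminunc' needs the Optimization Toolbox
opt.displayLevel = 'final';        % 'off' | 'final' | 'iter'

n = 100;
y = randn(n,1);                    % toy response with both signs
X = ones(n,1);                     % constant only, passed explicitly

if exist('ScoreYJall','file') == 2 % run only if FSDA is on the path
    out = ScoreYJall(y,X,'intercept',false,'scoremle',true,'usefmin',opt);
    disp(out.laMLE)                % MLE of lambda_P and lambda_N
    disp(out.Score(:,end))         % last column: likelihood ratio test
end
```

Leaving usefmin at its default (false) keeps the grid evaluation with step 0.01, which avoids any dependence on solver settings.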

    nocheck — Check input arguments. Boolean.

    If nocheck is equal to true, no check is performed on vector y and matrix X. Notice that y and X are left unchanged. In other words, the additional column of ones for the intercept is not added. The default is nocheck=false.

    Example: 'nocheck',true

    Data Types: logical

    Output Arguments

    outSC — Score test results. Structure.

    Structure containing the following fields:

    Score — Score tests. Matrix.

    Matrix of size length(la)-by-4 (length(la)-by-5 if input option scoremle is true) that contains the value of the score tests for each value of lambda specified in optional input parameter la. The first column refers to the global test, the second to the test for positive observations, the third to the test for negative observations and the fourth to the F test for the joint presence of the two constructed variables.

    If input option scoremle is true, the fifth column contains the exact likelihood ratio test based on the maximum likelihood estimates of \lambda_P and \lambda_N.

    If la is not specified, outSC.Score has 5 rows, which contain the values of the score tests for the 5 most common values of lambda.

    laMLE — MLE of lambda. Vector.

    Vector of dimension 2 that contains the maximum likelihood estimates of \lambda_P and \lambda_N. This output is present only if input option scoremle is true.

    References

    Yeo, I.K. and Johnson, R. (2000), A new family of power transformations to improve normality or symmetry, "Biometrika", Vol. 87, pp. 954-959.

    Atkinson, A.C., Riani, M. and Corbellini, A. (2019), The analysis of transformations for profit-and-loss data, "Journal of the Royal Statistical Society, Series C (Applied Statistics)", https://doi.org/10.1111/rssc.12389

    Atkinson, A.C., Riani, M. and Corbellini, A. (2021), The Box–Cox Transformation: Review and Extensions, "Statistical Science", Vol. 36, pp. 239-255, https://doi.org/10.1214/20-STS778
