mdTEM

mdTEM EM algorithm with trimming (TEM) for data with missing values.

Syntax

Description

The algorithm:

- At each iteration, compute adjusted partial Mahalanobis distances.
- Rank them and set weights w_i = 1 for the n*(1-alpha) rows with the smallest distances, and w_i = 0 otherwise.
- Run the E-step and M-step using these weights.
- Repeat until convergence or until maxiter iterations are reached.
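
The ranking and weighting step can be sketched in base MATLAB as follows. This is only an illustration under stated assumptions, not the mdTEM source: mu and Sigma stand in for the current estimates, the 'pri' adjustment listed under option method is used, and rounding n*(1-alpha) down with floor is an assumption.

```matlab
% Hypothetical one-pass illustration of the trimming step (not the mdTEM code).
% Row 4 is a gross outlier with one missing entry.
Y = [ 0.1  0.2  NaN;
     -0.3  NaN  0.4;
      0.5 -0.1  0.2;
      6.0  5.0  NaN;
     -0.2  0.3 -0.4;
      NaN  0.1  0.0];
[n, p] = size(Y);
alpha = 0.25;                 % trimming proportion
mu    = zeros(p, 1);          % current location estimate (assumed)
Sigma = eye(p);               % current scatter estimate (assumed)

d2adj = zeros(n, 1);
for i = 1:n
    o    = ~isnan(Y(i, :));               % observed coordinates of row i
    pobs = sum(o);
    r    = Y(i, o)' - mu(o);
    d2partial = r' * (Sigma(o, o) \ r);   % partial squared Mahalanobis distance
    d2adj(i)  = d2partial + (p - pobs);   % 'pri' adjustment from option method
end

h = floor(n * (1 - alpha));   % number of rows kept (rounding rule is an assumption)
[~, idx] = sort(d2adj);       % ascending order
w = zeros(n, 1);
w(idx(1:h)) = 1;              % weight 1 for the h smallest distances, 0 otherwise
```

The E-step and M-step would then use w as row weights; that part is not reproduced here.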

out = mdTEM(Y) Call to mdTEM with all the default options.

out = mdTEM(Y, Name, Value) Example of use of option condmeanimp.

Examples

  • Call to mdTEM with all the default options.

    % True model (choose something correlated)
    p=5; n=200;
    A = randn(p);
    SigmaTrue = A'*A;
    D = diag(1 ./ sqrt(diag(SigmaTrue)));
    SigmaTrue = D * SigmaTrue * D;      % "correlation-like"
    muTrue = linspace(-1,1,p)';
    %  generate complete data
    Yfull = mvnrnd(muTrue', SigmaTrue, n);             % n x p
    missRate = 0.25;     % MCAR missing probability per entry
    missMask = rand(n,p) < missRate;
    Y=Yfull;
    Y(missMask) = NaN;
    out=mdTEM(Y);
    % Compare the estimated means with the true means
    scatter(out.loc,muTrue)
    refline(1)
    xlabel('Estimated means')
    ylabel('True means')

  • Example of use of option condmeanimp.

    % number of variables
    p = 15;
    % number of observations
    n = 1000;            
    % target pairwise correlation (0<rho<1)
    rho = 0.9;            
    % Covariance matrix (unit variances)
    Sigma = (1-rho)*eye(p) + rho*ones(p);
    R = chol(Sigma);      % upper-triangular such that Sigma = R'*R
    % Generate samples ~ N(0,Sigma)
    Yfull = randn(n,p) * R;   % Strong positive correlation between the vars
    missRate = 0.25;     % MCAR missing probability per entry
    missMask = rand(n,p) < missRate;
    Y=Yfull;
    Y(missMask) = NaN;
    % md with missing imputation
    out=mdTEM(Y,'condmeanimp',true);
    % Mahalanobis distances using original matrix
    d2Ori=mahalFS(Yfull,mean(Yfull),cov(Yfull));
    % Calculate the Mahalanobis distance for the imputed data
    d2Imp = mahalFS(out.Yimp, mean(out.Yimp), cov(out.Yimp));
    % Compare the original distances with those computed on the imputed data
    scatter(d2Ori,d2Imp)
    % Add axis labels
    xlabel('Original Mahalanobis Distances');
    ylabel('Imputed Mahalanobis Distances');
    grid on

    Input Arguments

    Y — Input data. Matrix.

    n x p data matrix; n observations and p variables, possibly with missing values (NaNs). Rows of Y represent observations, and columns represent variables.

    Data Types: single | double

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'alpha',0.1 , 'mus',[] , 'sigs',eye(p) , 'maxiter',50 , 'tol',1e-10 , 'tol_sigma',false , 'method','chiMap' , 'condmeanimp',true , 'stochimp',true , 'storeobj',true

    alpha — Proportion to trim. Real number in the interval [0 0.5] or empty value.

    At each iteration, the adjusted partial Mahalanobis distances are computed and the weights are set to w_i = 1 for the n*(1-alpha) rows with the smallest distances (e.g., 0.5 -> keep the 50% of rows with the smallest distances). If alpha is empty, the default value 0.5 is used.

    Example: 'alpha',0.1

    Data Types: single | double

    mus — Initial mean. p x 1 vector | empty double.

    Initial mean vector. If empty (default), the column means computed ignoring NaNs are used.

    Example: 'mus',[]

    Data Types: single | double
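
    As a small illustration of the default, the column means ignoring NaNs can be computed in base MATLAB as below. The variable name mus0 is hypothetical, and whether mdTEM uses this exact call internally is an assumption.

    ```matlab
    % Toy data with missing entries
    Y = [1 NaN; 3 4; NaN 8];
    % p x 1 vector of column means, skipping NaN entries
    mus0 = mean(Y, 1, 'omitnan')';
    ```

    A vector such as mus0 could then be passed explicitly as 'mus',mus0.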

    sigs — Initial covariance matrix. p x p matrix | empty double.

    Initial p x p covariance matrix.

    If empty, the covariance matrix is computed from the data ignoring missing values (nan-cov).

    Example: 'sigs',eye(p)

    Data Types: single | double
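
    The page does not say how the nan-cov default is computed. One plausible reading, shown here only as an assumption, is a pairwise-complete covariance, which base MATLAB provides through the 'partialrows' flag of cov. The variable name sigs0 is hypothetical.

    ```matlab
    % Toy data with missing entries
    Y = [1 2; 2 4; 3 NaN; NaN 1; 4 8];
    % Each entry (j,k) uses only the rows where columns j and k are both observed.
    % Assumption: this mimics the "nan-cov" default; mdTEM may do something else.
    sigs0 = cov(Y, 'partialrows');
    ```

    Pairwise-complete estimates are not guaranteed to be positive definite, so it may be worth checking the result before passing it as 'sigs',sigs0.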

    maxiter — Maximum number of iterations. Positive integer.

    The default value is 100.

    Example: 'maxiter',50

    Data Types: single | double

    tol — Tolerance for convergence. Positive real number.

    The default value of the tolerance is 1e-5.

    Example: 'tol',1e-10

    Data Types: single | double

    tol_sigma — Use the tolerance for both mu and sigs. Boolean.

    If true, the differences in both mu and sigma are used in the convergence check (the default is true).

    Example: 'tol_sigma',false

    Data Types: logical
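
    A stopping rule of this kind could look like the sketch below. The norm (largest absolute change) and the behaviour when tol_sigma is false (checking mu only) are assumptions, since the page does not specify them; all variable names other than tol and tol_sigma are hypothetical.

    ```matlab
    % Hypothetical stopping rule; the exact criterion used by mdTEM is not documented here.
    tol       = 1e-5;
    tol_sigma = true;
    muOld = [0; 0];       SigmaOld = eye(2);
    muNew = [1e-7; 0];    SigmaNew = eye(2) + 1e-3;

    dmu    = max(abs(muNew - muOld));                  % largest change in the mean
    dsigma = max(abs(SigmaNew(:) - SigmaOld(:)));      % largest change in the covariance
    if tol_sigma
        converged = (dmu < tol) && (dsigma < tol);     % both must be small
    else
        converged = dmu < tol;                         % mean only (assumed)
    end
    ```

    With these numbers the mean has settled but the covariance has not, so the rule reports no convergence when tol_sigma is true.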

    method — Method used to rescale the distances. String scalar | char vector.

    Possible values are:

    'pri' = principled EM rescaling (default), d2_partial + (p - pobs).

    'expScale' = expectation scaling, d2_partial * (p / pobs).

    'zMap' = standardization mapping, p + sqrt(2*p) * ((d2_partial - pobs) ./ sqrt(2*pobs)).

    'detMap' = determinant-based rescaling, d2_partial * (p / pobs) * (g_full / g_obs).

    'chiMap' = chi-square quantile mapping. Use the cdf and inverse of the cdf of Chi2 distribution.

    'betaMap' = Beta quantile mapping. Use the cdf and inverse of the cdf of Beta distribution.

    'impMD' = MD on EM-imputed data.

    Example: 'method','chiMap'

    Data Types: string scalar | char vector
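
    The three closed-form rescalings listed above ('pri', 'expScale' and 'zMap') can be evaluated directly from their formulas. The sketch below uses made-up values for d2_partial and pobs; the quantile-based and determinant-based methods are left out because the page does not give enough detail to reproduce them.

    ```matlab
    p         = 5;                 % total number of variables
    pobs      = [5; 3; 2];         % number of observed variables in each row
    d2partial = [4.0; 3.0; 1.0];   % partial squared distances (made-up values)

    % Formulas copied from the option descriptions above
    d2pri      = d2partial + (p - pobs);
    d2expScale = d2partial .* (p ./ pobs);
    d2zMap     = p + sqrt(2*p) .* ((d2partial - pobs) ./ sqrt(2*pobs));
    ```

    For a fully observed row (pobs equal to p, the first row here) all three formulas return the partial distance unchanged, which is a quick sanity check on any implementation.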

    condmeanimp — Also give the matrix of conditional mean imputed values. Boolean.

    If true, structure out also contains the matrix of imputed values called Yimp.

    The default value of condmeanimp is false.

    Example: 'condmeanimp',true

    Data Types: logical
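
    Conditional mean imputation under a multivariate normal model replaces the missing part of a row by its expectation given the observed part. The sketch below shows the textbook formula for a single row; mu and Sigma are assumed to be the final estimates, and whether mdTEM computes Yimp in exactly this way is an assumption.

    ```matlab
    mu    = [0; 0; 0];
    Sigma = [1.0 0.9 0.9;
             0.9 1.0 0.9;
             0.9 0.9 1.0];
    y = [1.0 NaN 1.0];          % one row with the second entry missing

    m = isnan(y);               % missing coordinates
    o = ~m;                     % observed coordinates
    % E[y_m | y_o] = mu_m + Sigma_mo * inv(Sigma_oo) * (y_o - mu_o)
    yimp    = y;
    yimp(m) = (mu(m) + Sigma(m, o) * (Sigma(o, o) \ (y(o)' - mu(o))))';
    ```

    Because the variables are strongly correlated, the imputed value is pulled close to the observed neighbours (here 1.8/1.9).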

    stochimp — Also give the matrix of stochastic imputed values. Boolean.

    If true, structure out also contains the matrix of imputed values called stochYimp.

    The default value of stochimp is false.

    Example: 'stochimp',true

    Data Types: logical
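
    A common way to make imputation stochastic is to add a random draw from the conditional distribution of the missing part instead of using its mean alone. The sketch below illustrates that construction for one row; it is an assumption that stochYimp is built this way, and the names condMean, condCov and ystoch are hypothetical.

    ```matlab
    rng(1);                     % fix the seed so the draw is reproducible
    mu    = [0; 0; 0];
    Sigma = [1.0 0.9 0.9;
             0.9 1.0 0.9;
             0.9 0.9 1.0];
    y = [1.0 NaN 1.0];          % one row with the second entry missing

    m = isnan(y);
    o = ~m;
    % Conditional mean and covariance of the missing part given the observed part
    condMean = mu(m) + Sigma(m, o) * (Sigma(o, o) \ (y(o)' - mu(o)));
    condCov  = Sigma(m, m) - Sigma(m, o) * (Sigma(o, o) \ Sigma(o, m));
    % Draw from N(condMean, condCov) through a Cholesky factor of condCov
    C = chol(condCov, 'lower');
    ystoch    = y;
    ystoch(m) = (condMean + C * randn(sum(m), 1))';
    ```

    Unlike conditional mean imputation, repeated draws differ from each other, which preserves the variability of the imputed entries.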

    storeobj — Compute the value of the objective function in each iteration. Boolean.

    If true, structure out also contains the value of the objective function in each iteration, stored in field obj.

    The default value of storeobj is false.

    Example: 'storeobj',true

    Data Types: logical

    Output Arguments

    out — description Structure

    Structure which contains the following fields:

    Field        Description
    loc          Final estimates of the means.
    cov          Final estimate of the covariance matrix.
    iter         Number of iterations to convergence.
    Yimp         Empty value or matrix Y with imputed values (depending on input option condmeanimp).
    stochYimp    Empty value or matrix Y with imputed values (only if input option stochimp is true).
    obj          Empty value or value of the objective function (trimmed sum of the smallest MD) in each iteration (only if input option storeobj is true).

    References

    Little, R. J. A., & Rubin, D. B. (2019). Statistical Analysis with Missing Data (3rd ed.). Hoboken, NJ: John Wiley & Sons.

    Templ, M. (2023). Visualization and Imputation of Missing Values: With Applications in R. Cham, Switzerland: Springer Nature.
