mdEM

mdEM: EM algorithm for data with missing values (no trimming)


Syntax

out=mdEM(Y)
out=mdEM(Y,Name,Value)

Description


out = mdEM(Y) estimates the mean vector and the covariance matrix of Y with the EM algorithm, using all the default options.


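out = mdEM(Y,Name,Value) specifies additional options using one or more name-value pair arguments. For example, 'condmeanimp',true also returns the data matrix with conditional mean imputed values.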
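For orientation, the following base-MATLAB sketch shows the textbook EM iteration for a multivariate normal sample with missing values (Little and Rubin, 2019). It is an illustration under stated assumptions, not the mdEM source code: the starting values, the convergence rule and the variable names (Yhat, C, B) are choices made only for this sketch. The E-step replaces each missing entry by its conditional mean given the observed entries of the same row and accumulates the conditional covariance of the missing block; the M-step recomputes the mean vector and the maximum likelihood covariance matrix.

```matlab
% Hedged sketch of EM for multivariate normal data with NaNs (not mdEM's code)
rng(1);                                   % reproducible example
n = 500; p = 3;
Sigma = 0.5*eye(p) + 0.5*ones(p);         % true covariance (unit variances)
Y = randn(n,p)*chol(Sigma);               % complete data with true mean zero
Y(rand(n,p) < 0.2) = NaN;                 % MCAR missing entries
Y(all(isnan(Y),2),:) = [];                % drop rows with no observed value
n = size(Y,1);

mu = mean(Y,'omitnan');                   % 1 x p starting mean (assumed)
S  = cov(Y,'partialrows');                % p x p starting covariance (assumed)
tol = 1e-8; maxiter = 500;
for iter = 1:maxiter
    Yhat = Y;                             % E-step: data with conditional means filled in
    C = zeros(p);                         % sum of conditional covariances of missing blocks
    for i = 1:n
        m = isnan(Y(i,:)); o = ~m;
        if any(m)
            B = S(m,o) / S(o,o);          % regression coefficients of missing on observed
            Yhat(i,m) = mu(m) + (Y(i,o) - mu(o)) * B';
            C(m,m) = C(m,m) + S(m,m) - B*S(o,m);
        end
    end
    muNew = mean(Yhat);                   % M-step: mean vector
    Dev = Yhat - muNew;
    SNew = (Dev'*Dev + C) / n;            % M-step: ML covariance including the correction C
    converged = max(abs(muNew - mu)) < tol && max(abs(SNew(:) - S(:))) < tol;
    mu = muNew; S = SNew;
    if converged, break; end
end
```

The correction term C is what distinguishes EM from simply computing the covariance of the conditionally imputed data, which would understate the variability of the missing entries.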

Examples


Call to mdEM with all the default options.

% True model (choose something correlated)

p=5; n=200;
A = randn(p);
SigmaTrue = A'*A;
D = diag(1 ./ sqrt(diag(SigmaTrue)));
SigmaTrue = D * SigmaTrue * D;      % "correlation-like"
muTrue = linspace(-1,1,p)';
%  generate complete data
Yfull = mvnrnd(muTrue', SigmaTrue, n);             % n x p
missRate = 0.25;     % MCAR missing probability per entry
missMask = rand(n,p) < missRate;
Y=Yfull;
Y(missMask) = NaN;
out=mdEM(Y);
% Compare the true means with the EM estimates of the means
scatter(out.loc,muTrue)
xlabel('Estimated means (EM)')
ylabel('True means')

Example of use of option condmeanimp.

% number of variables

p = 15;                
% number of observations
n = 1000;            
% target pairwise correlation (0<rho<1)
rho = 0.9;            
% Covariance matrix (unit variances)
Sigma = (1-rho)*eye(p) + rho*ones(p);
R = chol(Sigma);      % upper-triangular such that Sigma = R'*R
% Generate samples ~ N(0,Sigma)
Yfull = randn(n,p) * R;   % Strong positive correlation between the vars
missRate = 0.25;     % MCAR missing probability per entry
missMask = rand(n,p) < missRate;
Y=Yfull;
Y(missMask) = NaN;
% md with missing imputation
out=mdEM(Y,'condmeanimp',true);
% Mahalanobis distances using original matrix
d2Ori=mahalFS(Yfull,mean(Yfull),cov(Yfull));
% Calculate the Mahalanobis distance for the imputed data
d2Imp = mahalFS(out.Yimp, mean(out.Yimp), cov(out.Yimp));
% Compare the original distances with those computed on the imputed data
scatter(d2Ori,d2Imp)
% Add axis labels
xlabel('Original Mahalanobis Distances');
ylabel('Imputed Mahalanobis Distances');
grid on


Related Examples


Example of use of option stochimp.

% number of variables
p = 15;
% number of observations
n = 1000;
% target pairwise correlation (0<rho<1)
rho = 0.9;
% Covariance matrix (unit variances)
Sigma = (1-rho)*eye(p) + rho*ones(p);
R = chol(Sigma);      % upper-triangular such that Sigma = R'*R
% Generate samples ~ N(0,Sigma)
Yfull = randn(n,p) * R;   % Strong positive correlation between the vars
missRate = 0.25;     % MCAR missing probability per entry
missMask = rand(n,p) < missRate;
Y=Yfull;
Y(missMask) = NaN;
% EM estimation with stochastic imputation of the missing values
out=mdEM(Y,'stochimp',true);
% Mahalanobis distances using original matrix
d2Ori=mahalFS(Yfull,mean(Yfull),cov(Yfull));
% Calculate the Mahalanobis distance for the stochastically imputed data
d2Imp = mahalFS(out.stochYimp, mean(out.stochYimp), cov(out.stochYimp));
% Compare the original distances with those computed on the imputed data
scatter(d2Ori,d2Imp)
% Add axis labels
xlabel('Original Mahalanobis Distances');
ylabel('Imputed Mahalanobis Distances (stochastic imputation)');
grid on

Example of use of options Patterns and idxPatterns.

% number of variables
p = 3;
% number of observations
n = 50000;
% target pairwise correlation (0<rho<1)
rho = 0.9;
% Covariance matrix (unit variances)
Sigma = (1-rho)*eye(p) + rho*ones(p);
R = chol(Sigma);      % upper-triangular such that Sigma = R'*R
% Generate samples ~ N(0,Sigma)
Yfull = randn(n,p) * R;   % Strong positive correlation between the vars
missRate = 0.25;     % MCAR missing probability per entry
missMask = rand(n,p) < missRate;
Y=Yfull;
Y(missMask) = NaN;
M=ismissing(Y);
[Patterns, ~, idxPatterns] = unique(M, 'rows', 'stable');
disp('Computational time using missing patterns')
tic
outWITHPAT=mdEM(Y,'Patterns',Patterns,'idxPatterns',idxPatterns);
toc
disp('Computational time neglecting missing patterns')
tic
outNOPAT=mdEM(Y);
toc

Computational time using missing patterns
Elapsed time is 0.077916 seconds.
Computational time neglecting missing patterns
Elapsed time is 2.477940 seconds.

Input Arguments


`Y` — Input data. Matrix.

n x p data matrix with n observations and p variables, possibly containing missing values (NaNs). Rows of Y represent observations, and columns represent variables.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'mus',[], 'Patterns',Patterns, 'idxPatterns',idxPatterns, 'maxiter',50, 'sigs',eye(p), 'tol',1e-10, 'tol_sigma',false, 'condmeanimp',true, 'stochimp',true

`mus` — Initial mean. p x 1 vector | empty double.

Initial mean vector. If empty (default), the column means of Y computed ignoring NaNs are used.

Example: 'mus',[]

Data Types: single | double

`Patterns` — Matrix of missingness patterns. 2D matrix of size k x p.

Logical or numeric matrix of size k x p, where each row identifies a distinct pattern of missing values in Y. A true (or 1) entry indicates that the corresponding variable is missing, while a false (or 0) entry indicates that it is observed. If empty (default), missingness patterns are computed internally from Y. Supplying Patterns can save computing time when mdEM is called repeatedly on data with the same missingness structure.

Example: 'Patterns',Patterns

Data Types: logical | double

`idxPatterns` — Group membership for missingness patterns. Numeric vector of length n.

idxPatterns contains, for each row of Y, the index of the corresponding row of Patterns. In other words, idxPatterns(i) = g means that row i of Y has missingness pattern equal to Patterns(g,:). If empty (default), idxPatterns is computed internally together with Patterns. Supplying idxPatterns can save computing time when mdEM is called repeatedly on data with the same missingness structure.

Example: 'idxPatterns',idxPatterns

Data Types: single | double
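A minimal sketch of how Patterns and idxPatterns relate, using the same unique call as the Patterns example on a small hypothetical matrix: every row of Y points to one row of Patterns, so Patterns(idxPatterns,:) rebuilds the full missingness indicator matrix. Grouping rows this way means that pattern-specific quantities (for instance the inverse of the observed block of the covariance matrix) can be computed once per pattern instead of once per row; whether mdEM exploits the grouping in exactly this way is an assumption of the sketch.

```matlab
% Small hypothetical data set with NaNs
Y = [1 NaN 3; 4 5 NaN; NaN 2 7; 8 NaN 1];
M = isnan(Y);                                   % true = missing entry
[Patterns, ~, idxPatterns] = unique(M, 'rows', 'stable');
% Each row of Y points to one row of Patterns
Mrebuilt = Patterns(idxPatterns,:);             % identical to M
% Number of rows of Y sharing each pattern
counts = accumarray(idxPatterns, 1);            % counts is [2; 1; 1]
disp(Patterns); disp(counts')
```

Rows 1 and 4 share the same pattern (only the second variable missing), so three distinct patterns describe four rows.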

`maxiter` — Maximum number of iterations. Positive integer.

Maximum number of EM iterations. The default value is 100.

Example: 'maxiter',50

Data Types: single | double

`sigs` — Initial covariance matrix. p x p matrix | empty double.

Initial p x p covariance matrix. If empty, the covariance matrix of Y computed ignoring NaNs (nan-cov) is used.

Example: 'sigs',eye(p)

Data Types: single | double
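The empty defaults for mus and sigs can be mimicked in base MATLAB on a small hypothetical matrix. The column means ignoring NaNs follow directly from the description of mus. For sigs the page only says "nan-cov", so this sketch shows two common ways of computing a covariance matrix in the presence of NaNs; which of the two mdEM uses is an assumption that the sketch does not settle.

```matlab
% Starting values from a small hypothetical matrix with NaNs
Y = [1 2 3; 2 NaN 5; 4 6 NaN; 3 5 7; 0 1 1];
mus = mean(Y, 'omitnan')';             % column means ignoring NaNs, as a p x 1 vector
sigsComplete = cov(Y, 'omitrows');     % reading 1: complete rows only (rows 1, 4 and 5)
sigsPairwise = cov(Y, 'partialrows');  % reading 2: pairwise-complete observations
```

A covariance matrix built from complete rows is always positive semidefinite but discards more data; the pairwise version uses more observations but is not guaranteed to be positive semidefinite.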

`tol` — Tolerance for convergence. Positive real number.

Convergence tolerance. The default value is 1e-5.

Example: 'tol',1e-10

Data Types: single | double

`tol_sigma` — Use the tolerance for both mu and sigs. Boolean.

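If true (default), convergence is assessed on the changes of both the mean vector and the covariance matrix between successive iterations; if false, only the change of the mean vector is used.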

Example: 'tol_sigma',false

Data Types: logical
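The page does not state which distance between successive iterates is compared with tol. As a hedged sketch, assume it is the largest absolute change of the mean vector and, when tol_sigma is true, also of the covariance matrix; the variable names and numbers below are hypothetical.

```matlab
% Hypothetical convergence rule (mdEM's exact distance measure is an assumption here)
tol = 1e-5;
muOld = [0 0];  muNew = [0.5e-5 -0.2e-5];       % mean changed by at most 5e-6
SOld = eye(2);  SNew = eye(2) + 3e-5*eye(2);    % variances changed by 3e-5
dmu = max(abs(muNew - muOld));
dS  = max(abs(SNew(:) - SOld(:)));
% tol_sigma = true requires both changes below tol; false checks the mean only
isConverged = @(tol_sigma) dmu < tol && (~tol_sigma || dS < tol);
fprintf('tol_sigma=true: %d, tol_sigma=false: %d\n', isConverged(true), isConverged(false))
```

With these numbers the mean has already settled while the covariance has not, so the rule reports convergence only when tol_sigma is false.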

`condmeanimp` — Also give the matrix of conditional mean imputed values. Boolean.

If true, the output structure out also contains the field Yimp, the matrix Y with each missing value replaced by its conditional mean given the observed values in the same row. The default value of condmeanimp is false.

Example: 'condmeanimp',true

Data Types: logical
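Under a multivariate normal model with mean mu and covariance S, the conditional mean of the missing block m given the observed block o of a row is mu_m + (y_o - mu_o) S_oo^(-1) S_om. The sketch below applies this standard formula to one hypothetical row; mu and S stand in for the estimates that mdEM would supply, and it is not the mdEM source code.

```matlab
% Conditional mean imputation of one row given mean mu and covariance S (hypothetical values)
mu = [0 0 0];
S  = [1 0.8 0.5; 0.8 1 0.4; 0.5 0.4 1];
y  = [1 NaN 2];                              % second variable missing
m = isnan(y); o = ~m;
yimp = y;
% mu_m + (y_o - mu_o) * inv(S_oo) * S_om, written with mrdivide instead of inv
yimp(m) = mu(m) + (y(o) - mu(o)) / S(o,o) * S(o,m);
disp(yimp)                                   % yimp(2) equals 0.8 up to rounding
```

Using the slash operator instead of inv solves the linear system directly, which is numerically preferable when S_oo is ill-conditioned.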

`stochimp` — Also give the matrix of stochastic imputed values. Boolean.

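If true, the output structure out also contains the field stochYimp, the matrix Y with missing values replaced by stochastic imputations, that is, random draws rather than conditional means. The default value of stochimp is false.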

Example: 'stochimp',true

Data Types: logical
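Stochastic imputation is commonly defined as the conditional mean plus a random draw from the conditional normal distribution N(0, S_mm - S_mo S_oo^(-1) S_om) of the missing block. The following sketch applies that textbook definition to one hypothetical row; whether mdEM draws the noise exactly this way is an assumption.

```matlab
% Hedged sketch of stochastic imputation for one row (not mdEM's code)
rng(2);                                      % reproducible draw
mu = [0 0 0];                                % current mean estimate (hypothetical values)
S  = [1 0.8 0.5; 0.8 1 0.4; 0.5 0.4 1];      % current covariance estimate
y  = [NaN NaN 2];                            % first two variables missing
m = isnan(y); o = ~m;
B = S(m,o) / S(o,o);                         % regression of missing on observed
condMean = mu(m) + (y(o) - mu(o)) * B';      % conditional mean of the missing block
condCov  = S(m,m) - B * S(o,m);              % conditional covariance of the missing block
ystoch = y;
ystoch(m) = condMean + randn(1, nnz(m)) * chol(condCov);   % add a N(0, condCov) draw
```

Because chol returns an upper-triangular R with R'*R equal to condCov, a row vector of standard normal draws multiplied by R has covariance condCov, so the imputations preserve the conditional variability that conditional mean imputation removes.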

Output Arguments


`out` — Estimation results. Structure.

Structure which contains the following fields:

Field	Description
`loc`	Final estimate of the mean vector.
`cov`	Final estimate of the covariance matrix.
`iter`	Number of iterations to convergence.
`Yimp`	Matrix Y with missing values replaced by conditional mean imputations (only if input option condmeanimp is true).
`stochYimp`	Matrix Y with missing values replaced by stochastic imputations (only if input option stochimp is true).

References

Little, R. J. A., & Rubin, D. B. (2019). Statistical Analysis with Missing Data (3rd ed.). Hoboken, NJ: John Wiley & Sons.

Templ, M. (2023). Visualization and Imputation of Missing Values: With Applications in R. Cham, Switzerland: Springer Nature.
