zscoreFS

zscoreFS computes (robust) standardized z-scores

Description

X can be a vector of length n, a data matrix containing n observations on v variables, or a 3D array of size n-by-v-by-r.

Z = zscoreFS(X) returns a centered, scaled version of X, with the same size as X. For vector input X, Z is the vector of z-scores (X-median(X)) ./ (1.4826*mad(X,1)), where mad(X,1) is the median absolute deviation and 1.4826 is approximately 1/norminv(3/4).
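To make the default formula concrete, here is a minimal Python sketch (standard library only, not the FSDA MATLAB implementation) for a vector input. The function name zscore_median_mad is hypothetical. It returns the z-scores together with the location and scale that were used, mirroring the [Z,mu,sigma] outputs.

```python
from statistics import NormalDist, median


def zscore_median_mad(x):
    """Robust z-scores of a 1-D sequence using the median and the corrected MAD."""
    mu = median(x)
    # Consistency factor 1/norminv(3/4), approximately 1.4826
    c = 1.0 / NormalDist().inv_cdf(0.75)
    sigma = c * median([abs(xi - mu) for xi in x])
    z = [(xi - mu) / sigma for xi in x]
    return z, mu, sigma


z, mu, sigma = zscore_median_mad([1.0, 2.0, 3.0, 4.0, 100.0])
```

The outlying value 100 gets a z-score of about 65, while the median and the MAD (and hence the other z-scores) are hardly affected by it.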

Z=zscoreFS(X,loc,scale) returns a centered, scaled version of X, with the same size as X, using the location and scale measures specified in input parameters loc and scale. For vector input X, Z is the vector of z-scores (X-location(X)) ./ scale(X),

where scale(X) is the corrected estimator of scale (corrected in the sense that it is multiplied by a coefficient to achieve consistency for normally distributed data).
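As an illustration of what "corrected" means for the mad, the following Python sketch (standard library only, with an arbitrary seed and sample size chosen for illustration) computes the factor 1/norminv(3/4) and checks empirically that the corrected MAD of a large normal sample is close to the true standard deviation. For a normal distribution the population MAD equals norminv(3/4) times sigma, which is why dividing by norminv(3/4) gives a consistent estimator.

```python
import random
from statistics import NormalDist, median

# Population MAD of N(mu, sigma^2) is norminv(3/4) * sigma
q75 = NormalDist().inv_cdf(0.75)   # approximately 0.6745
factor = 1.0 / q75                 # approximately 1.4826

rng = random.Random(2024)          # fixed seed for reproducibility
true_sigma = 2.0
x = [rng.gauss(0.0, true_sigma) for _ in range(100_000)]

m = median(x)
raw_mad = median([abs(xi - m) for xi in x])
corrected_mad = factor * raw_mad   # should be close to true_sigma
print(round(factor, 4), round(corrected_mad, 2))
```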

Z=zscoreFS(X,loc,scale) computes robust standardized z-scores using the estimates of location and scale specified in the loc and scale strings. If X is a 2D matrix, z-scores are computed using loc and scale along each column of X. If X is a 3D array, z-scores are computed along the first non-singleton dimension. For example, if X is n-by-v-by-r (with n>1) and loc='median', v-by-r medians are computed, one for each of the v columns of X and each of the r slices along the third dimension.
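The column-wise behaviour for a 2D matrix can be sketched in Python as follows. This is an illustration only, not the FSDA code. The matrix is a list of equal-length rows, only 'median'/'mean' and 'mad'/'std' are handled, and the helper names location, dispersion and zscore_columns are hypothetical.

```python
from statistics import NormalDist, mean, median, stdev

MAD_FACTOR = 1.0 / NormalDist().inv_cdf(0.75)


def location(values, loc):
    return median(values) if loc == "median" else mean(values)


def dispersion(values, scale):
    if scale == "std":
        return stdev(values)  # n-1 normalization, like MATLAB std
    m = median(values)
    return MAD_FACTOR * median([abs(v - m) for v in values])


def zscore_columns(X, loc="median", scale="mad"):
    """Standardize each column of a matrix given as a list of equal-length rows."""
    n, v = len(X), len(X[0])
    mu, sigma = [], []
    for j in range(v):
        col = [X[i][j] for i in range(n)]
        mu.append(location(col, loc))
        sigma.append(dispersion(col, scale))
    Z = [[(X[i][j] - mu[j]) / sigma[j] for j in range(v)] for i in range(n)]
    return Z, mu, sigma


X = [[1, 10], [2, 20], [3, 30], [4, 40], [50, 500]]
Z, mu, sigma = zscore_columns(X)  # mu is [3, 30]
```

Because the second column is ten times the first, both columns yield identical robust z-scores, and the gross value in the last row does not affect the median or the MAD.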

Z=zscoreFS(X,loc) computes standardized z-scores using the estimate of location specified in loc and the mad as the measure of dispersion.

[Z,mu,sigma] = zscoreFS(X) also returns median(X) in mu and mad in sigma.

[Z,mu,sigma] = zscoreFS(X,loc,scale) also returns the estimates of location in mu and of scale in sigma as specified in loc and scale strings.

Z=zscoreFS(X,loc,scale,dim) computes robust standardized z-scores along the dimension dim of X using the estimates of location and scale specified in the loc and scale strings. For example, if X is a two-dimensional matrix, dim=1 standardizes the columns of X (this is the default when X has more than one row), while dim=2 standardizes the rows. If X is a three-dimensional array, dim=1 standardizes the columns, dim=2 standardizes the rows and dim=3 standardizes along the third dimension.
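The meaning of dim for a 3D array can be illustrated with a small Python sketch (not the FSDA code). It assumes the array is a nested list indexed X[i][j][k] with shape n-by-v-by-r, uses MATLAB-style dim values 1, 2 and 3, and always uses median and mad. The names robust_z and zscore_3d are hypothetical. Each one-dimensional "fiber" along dim is standardized separately, consistent with the 3D examples on this page. Fibers of length 1 or with zero MAD would cause a division by zero and are not handled.

```python
import copy
import itertools
from statistics import NormalDist, median

MAD_FACTOR = 1.0 / NormalDist().inv_cdf(0.75)


def robust_z(values):
    m = median(values)
    s = MAD_FACTOR * median([abs(v - m) for v in values])
    return [(v - m) / s for v in values]


def zscore_3d(X, dim=None):
    """Median/MAD z-scores of a nested n-by-v-by-r list along MATLAB-style dim."""
    shape = (len(X), len(X[0]), len(X[0][0]))
    if dim is None:
        # Default: first dimension whose size is not 1
        dim = next((d + 1 for d, s in enumerate(shape) if s != 1), 1)
    axis = dim - 1
    others = [d for d in range(3) if d != axis]
    Z = copy.deepcopy(X)
    # Loop over every combination of the two remaining indices
    for a, b in itertools.product(range(shape[others[0]]), range(shape[others[1]])):
        idx = [0, 0, 0]
        idx[others[0]], idx[others[1]] = a, b
        positions = []
        for t in range(shape[axis]):
            p = list(idx)
            p[axis] = t
            positions.append(tuple(p))
        fiber = [X[i][j][k] for i, j, k in positions]
        for (i, j, k), z in zip(positions, robust_z(fiber)):
            Z[i][j][k] = z
    return Z
```

For example, zscore_3d(X, 1) standardizes every column X[:, j, k], and calling zscore_3d(Y) on a 1-by-3-by-2 array defaults to dim=2.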

zscoreFS extends the function zscore of the Statistics Toolbox, because it makes it possible to specify alternative measures of location and scale.

Z = zscoreFS(X)  (example: scale using medians and mads)

Z = zscoreFS(X, loc)  (example: scale using mean and mads)

Z = zscoreFS(X, loc, scale)  (example: remove the medians and divide by Qn)

Z = zscoreFS(X, loc, scale, dim)  (example: 3D arrays)

[Z, mu] = zscoreFS(___)  (example: report also the location and scale measures which have been used)

[Z, mu, sigma] = zscoreFS(___)  (example: 3D arrays with dim=1, dim=2 and dim=3)

Examples


zscoreFS with all default options (that is, remove the medians and divide by the mads)

n=200;
v=3;
randn('state', 123456);
Y=randn(n,v);
% Contaminated data
Ycont=Y;
Ycont(1:5,:)=Ycont(1:5,:)+10;
[out]=zscoreFS(Ycont);

Compute standardized z-scores using the mean as the estimate of location and the mad as the measure of dispersion.

loc='mean';
X=randn(10,2);
Z=zscoreFS(X,loc);

Remove the medians and divide by Qn.

n=200;
v=1;
randn('state', 123456);
Y=randn(n,v);
% Contaminated data
Ycont=Y;
Ycont(1:5,:)=Ycont(1:5,:)+10;
[out]=zscoreFS(Ycont,[],'Qn');
% Alternatively it is possible to use the following syntax
[out]=zscoreFS(Ycont,'median','Qn');

Examples with 3D arrays.

n=200;
v=3;
q=5;
randn('state', 123456);
Y=randn(n,v,q);
% Contaminated data
Ycont=Y;
Ycont(1:5,:,:)=Ycont(1:5,:,:)+10;
[out1,Mu,Sigma]=zscoreFS(Ycont,[],'Sn',1);

Report also the location and scale measures which have been used.

zscoreFS produces the same output as function zscore of the Statistics Toolbox if the location measure is the arithmetic mean and the scale measure is the standard deviation.

X=randn(10,3,6);
[Z,mu,sig]=zscoreFS(X,'mean','std',3);
[Z1,mu1,sig1]=zscore(X,[],3);
if isequal(Z,Z1) && isequal(mu,mu1) && isequal(sig,sig1)
disp('Everything is equal')
else
disp('Equality not reached')
end

3D arrays with dim=1, dim=2 and dim=3.

n=200;
v=3;
q=5;
randn('state', 123456);
Y=randn(n,v,q);
% Contaminated data
Ycont=Y;
Ycont(1:5,:,:)=Ycont(1:5,:,:)+10;
scale='Qn';
loc='mean';
dim=2; % work along rows
[Z,Mu1,Sigma1]=zscoreFS(Ycont,loc,scale,dim);
isequal(Z(3,:,2),zscoreFS(Ycont(3,:,2),loc,scale))
scale='Qn';
loc='median';
dim=1; % work along columns
[Z,Mu1,Sigma1]=zscoreFS(Ycont,loc,scale,dim);
isequal(Z(:,2,4),zscoreFS(Ycont(:,2,4),loc,scale))
scale='Sn';
loc='median';
dim=3; % work along third dimension
[Z,Mu1,Sigma1]=zscoreFS(Ycont,loc,scale,dim);
isequal(squeeze(Z(7,2,:)),zscoreFS(squeeze(Ycont(7,2,:)),loc,scale))

Related Examples


Example of use of modmad as a scale measure.

p=3;
X=randn(100,p);
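loc='median';
% Modified MAD, using as p the number of columns of the original data X
scale=['modmad' num2str(p)];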
% Project the data using v vectors
v=10;
proj=randn(p,v);
Y=X*proj;
% Standardize the n projected points using median and modified MAD
% Note that Y has v columns but the original matrix X has p columns
[Z,Mu1,Sigma1]=zscoreFS(Y,loc,scale);

Input Arguments

X — Input data. Vector or Matrix or 3D array.

Vector of length n or data matrix containing n observations on v variables or 3D array of size n-by-v-by-r.

Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.
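For a vector, the exclusion of missing and infinite values from the estimates can be sketched in Python as below. This is an illustration only. How zscoreFS fills the output entries that correspond to NaN or Inf inputs is not stated here, so the sketch simply applies the formula to every entry, which leaves NaN as NaN and Inf as Inf. The name zscore_ignore_nonfinite is hypothetical.

```python
import math
from statistics import NormalDist, median

MAD_FACTOR = 1.0 / NormalDist().inv_cdf(0.75)


def zscore_ignore_nonfinite(x):
    """Median/MAD z-scores where NaN and +/-Inf entries are left out of the estimates."""
    finite = [v for v in x if math.isfinite(v)]
    mu = median(finite)
    sigma = MAD_FACTOR * median([abs(v - mu) for v in finite])
    # Assumption: non-finite inputs just propagate (NaN stays NaN, +/-Inf stays +/-Inf)
    return [(v - mu) / sigma for v in x], mu, sigma


z, mu, sigma = zscore_ignore_nonfinite([1.0, 2.0, 3.0, float("nan"), float("inf")])
```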

Data Types: single|double

loc — location measure to use. 'median' (default) or 'mean'.

String which specifies the location measure to use. The default value is 'median'.

Example: 'median'

Data Types: character

scale — scale measure to use. 'mad' (default) or 'Qn' or 'Sn' or 'std' or 'modmadp'.

String which specifies the dispersion measure to use. 'mad' is the default. The traditional (corrected) mad is $Me(|x_i-Me(X)|)/norminv(3/4)$;

'Qn' first quartile of interpoint distances $|x_i-x_j|$ corrected by the consistency factor. See function Qn.m;
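The definition of Qn can be illustrated with a naive O(n^2) Python sketch (not FSDA's Qn.m). Following Rousseeuw and Croux, it takes the k-th smallest pairwise distance with h = floor(n/2)+1 and k = C(h,2), which is roughly the first quartile of the C(n,2) distances, and multiplies it by the asymptotic factor 1/(sqrt(2)*norminv(5/8)), approximately 2.219. It assumes n >= 2 and omits the finite-sample correction factors that a full implementation applies, so small-sample values will differ. The name qn_naive is hypothetical.

```python
from itertools import combinations
from statistics import NormalDist


def qn_naive(x):
    """O(n^2) Qn scale estimator with the asymptotic consistency factor only."""
    n = len(x)
    h = n // 2 + 1
    k = h * (h - 1) // 2  # k = C(h, 2)
    dists = sorted(abs(a - b) for a, b in combinations(x, 2))
    # Asymptotic consistency factor for normal data, approximately 2.219
    d = 1.0 / (2 ** 0.5 * NormalDist().inv_cdf(5 / 8))
    return d * dists[k - 1]  # k-th order statistic (1-based)
```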

'Sn' robust Gini's average difference index corrected by the consistency factor. See function Sn.m;
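Similarly, a naive O(n^2) Python sketch of Sn (not FSDA's Sn.m) computes, for each observation, the high median of its distances to all observations, takes the low median of these values and multiplies by the asymptotic consistency constant 1.1926 from Rousseeuw and Croux. Finite-sample corrections are omitted. The name sn_naive is hypothetical. statistics.median_low and median_high return the lower and upper middle values, matching the low and high medians in that definition.

```python
from statistics import median_high, median_low


def sn_naive(x, c=1.1926):
    """O(n^2) Sn estimator: c * lomed_i himed_j |x_i - x_j| (no small-sample correction)."""
    inner = [median_high([abs(xi - xj) for xj in x]) for xi in x]
    return c * median_low(inner)
```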

'std' Standard deviation normalized by n-1 (the square root of the unbiased variance estimator). See function std.m;

'modmadp' Modified mad, where the trailing character(s) p of the string 'modmadp' is (are) a number, converted to a string, needed to compute the modified MAD.

Modified MAD = $\dfrac{d_{(\lceil (n+p-1)/2 \rceil)} + d_{(\lfloor (n+p-1)/2+1 \rfloor)}}{2\sigma}$, where $d_{(k)}$ denotes the $k$-th order statistic of $|x_i-Me(X)|$ and $\sigma = norminv\left(0.5\left(\frac{n+p-1}{2n}+1\right)\right)$.
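The modified MAD formula translates directly into the following Python sketch (standard library only, not the FSDA code). The name modified_mad is hypothetical, and p is assumed small relative to n so that both 1-based order-statistic indices stay within 1..n. With p=1 the two indices coincide with the median position(s) and sigma becomes norminv(3/4), so the result reduces to the traditional corrected mad.

```python
import math
from statistics import NormalDist, median


def modified_mad(x, p):
    """Modified MAD: average of two order statistics of |x_i - Me(X)|, divided by sigma."""
    n = len(x)
    me = median(x)
    d = sorted(abs(xi - me) for xi in x)
    lo = math.ceil((n + p - 1) / 2)        # 1-based order statistic indices
    hi = math.floor((n + p - 1) / 2 + 1)
    sigma = NormalDist().inv_cdf(0.5 * ((n + p - 1) / (2 * n) + 1))
    return (d[lo - 1] + d[hi - 1]) / (2 * sigma)


x = [1.0, 2.0, 3.0, 4.0, 100.0]
print(modified_mad(x, 1), modified_mad(x, 3))
```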

Note that $p$ is different from $v$ (columns of X if X is a matrix) and must be supplied by the user.

For example, if p=5 the user can supply the string 'modmad5' as follows: p=5; modmadp=['modmad' num2str(p)];

Data Types: character

dim — Dimension to operate along. Positive integer scalar.

Dimension to operate along, specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.

Example: 2

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Output Arguments

Z — centered, scaled version of X. Array with the same size as input X.

Array with the same size as X, computed using the location and scale measures specified in input parameters loc and scale. For vector input X, Z is the vector of z-scores (X-location(X)) ./ scale(X).

mu — location estimate. Scalar, vector or matrix depending on the size of input X.

Estimates of location specified in loc input string.

sigma — scale estimate. Scalar, vector or matrix depending on the size of input X.

Estimates of scale specified in scale input string.