FSMedaeasy is equivalent to FSMeda but much less efficient.
Run the FS on a simulated dataset by choosing an initial subset formed by the three observations with the smallest Mahalanobis Distance.
    n=100;
    v=3;
    m0=3;
    Y=randn(n,v);
    % Contaminated data
    Ycont=Y;
    Ycont(1:5,:)=Ycont(1:5,:)+3;
    [fre]=unibiv(Y);
    % Create an initial subset with the 3 observations with the lowest
    % Mahalanobis distance
    fre=sortrows(fre,4);
    bs=fre(1:m0,1);
    [out]=FSMedaeasy(Ycont,bs);
Monitoring the evolution of minimum Mahalanobis distance.
    n=100;
    v=3;
    m0=3;
    Y=randn(n,v);
    % Contaminated data
    Ycont=Y;
    Ycont(1:5,:)=Ycont(1:5,:)+3;
    [fre]=unibiv(Y);
    % Create an initial subset with the 3 observations with the lowest
    % Mahalanobis distance
    fre=sortrows(fre,4);
    bs=fre(1:m0,1);
    [out]=FSMedaeasy(Ycont,bs,'plots',1);
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 1.641939e-16.
Monitoring the minimum Mahalanobis distance and the centroid position.
    % In this example the figures of minimum Mahalanobis distance are closed.
    Y=load('sixty_eighty.txt');
    G = 60*ones(140,1);
    G(1:80)=80;
    n = size(Y,1);
    init = floor(n/2);
    % start from G80
    bs80 = [1,2,3];
    [out80]=FSMedaeasy(Y,bs80,'plots',2,'init',init,'scaled',1);
    close(findobj('tag','pl_mmd'));
    title('Start from G80','Fontsize',14);
    % start from G60
    bs60 = [81,82,83];
    [out60]=FSMedaeasy(Y,bs60,'plots',2,'init',init,'scaled',1);
    close(findobj('tag','pl_mmd'));
    title('Start from G60','Fontsize',14);
    % start in an optimal way automatically
    [fre]=unibiv(Y,'plots',0,'rf',0.5);
    fre=sortrows(fre,4);
    init=2;
    bs=fre(1:init,1);
    [out]=FSMedaeasy(Y,bs,'plots',2,'init',init,'scaled',1);
    close(findobj('tag','pl_mmd'));
    title('Start carefully chosen using unibiv','Fontsize',14);
    Warning: interchange greater than 10 when m=92
    Number of units which entered=25
    Attention : init1 should be larger than v. It is set to v+1.
    Warning: interchange greater than 10 when m=92
    Number of units which entered=25
    load('emilia2001')
    Y=emilia2001{:,:};
    [fre]=unibiv(Y);
    % Create an initial subset with the 30 observations with the lowest
    % Mahalanobis distance
    fre=sortrows(fre,4);
    m0=30;
    bs=fre(1:m0,1);
    [out]=FSMedaeasy(Y,bs,'init',100);
    % Minimum Mahalanobis distance
    % Compare the plot with Figure 1.12 p. 21, ARC (2004)
    mmdplot(out,'ylimy',[6 14])
    % Analysis of the last 16 units to enter the forward search
    % Compare the results with Table 1.3 p. 21
    disp(out.Un(end-15:end,:));
    load('emilia2001')
    Y=emilia2001{:,:};
    % Replace zeros with min values for variables specified in sel
    sel=[6 10 12 13 19 21];
    for i=sel
        Y(Y(:,i)==0,i)=min(Y(Y(:,i)>0,i));
    end
    % Modify variables y16 y23 y25 y26
    sel=[16 23 25 26];
    sel=[25 26];
    Y1=Y;
    Y1(:,sel)=100-Y1(:,sel);
    la0demo=[0.5,0.25,0,1,0.25,0,0,0.25,0.5];
    la0weal=[0.25,0.5,0.5,1,1,0.5,-1/3,0.25,0.25,-1];
    la0work=[0.25,0,1,0,0,0.25,1,1,1];
    la0C2=[la0demo(1:5) la0work(1:4) la0demo(6:9) la0weal la0work(5:9)];
    Y1tr=normBoxCox(Y1,1:28,la0C2);
    [fre]=unibiv(Y1tr);
    % Create an initial subset with the 30 observations with the lowest
    % Mahalanobis distance
    fre=sortrows(fre,4);
    m0=30;
    bs=fre(1:m0,1);
    [out]=FSMeda(Y1tr,bs,'init',100,'scaled',1);
    % Minimum Mahalanobis distance
    [out]=FSMedaeasy(Y1tr,bs,'init',100);
    mmdplot(out,'ylimy',[5 26])
    standard=struct;
    standard.ylim=[4 17];
    malfwdplot(out,'standard',standard);
Y
— Input data.
Matrix. n x v data matrix; n observations and v variables. Rows of Y represent observations, and columns represent variables. Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.
Data Types: single | double
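The row exclusion described above can be pictured with a minimal base-MATLAB sketch. It only illustrates the effect of dropping rows that contain NaN or Inf; it is not the toolbox's own checking code, and the matrix and the names `ok` and `Yclean` are hypothetical.

```matlab
% Hypothetical 5 x 2 data matrix with one NaN and one Inf entry
Y = [1 2; NaN 3; 4 5; 6 Inf; 7 8];

% Keep only the rows in which every entry is finite
ok = all(isfinite(Y),2);
Yclean = Y(ok,:);

% Number of observations that would remain in the computations
nclean = size(Yclean,1);
```

Unit indices supplied in bsb are presumably best checked against the data actually used whenever rows are dropped; this is an assumption, not a documented behaviour.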
bsb
— Units forming subset.
Vector. List of units forming the initial subset.
If bsb=0 (default), the procedure starts with v randomly chosen units; otherwise the search starts with m0=length(bsb) units.
Data Types: single | double
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'init',50, 'plots',0, 'msg',0, 'scaled',0, 'nocheck',1
init
— Point where to start monitoring required diagnostics.
Scalar. Note that if bsb is supplied, init>=length(bsb). If init is not specified, it is set to floor(n*0.6).
Example: 'init',50
Data Types: double
plots
— Whether to produce plots of the monitoring of minMD.
Scalar. If plots=0 (default), all plots are suppressed.
If plots=1, a plot of the monitoring of minMD among the units not belonging to the subset is produced on the screen with 1 per cent, 50 per cent and 99 per cent confidence bands.
If plots=2, the monitoring is also extended to the centroid, which is tracked in the data scatterplot.
Example: 'plots',0
Data Types: double
msg
— Whether to display messages about large interchanges on the screen.
Scalar. If msg==1 (default), messages are displayed on the screen; otherwise no message is displayed.
Example: 'msg',0
Data Types: double
scaled
— Whether to monitor scaled Mahalanobis distances.
Scalar. If scaled=1, the Mahalanobis distances monitored during the search are scaled using the ratio of determinants.
If scaled=2, the Mahalanobis distances monitored during the search are scaled using the asymptotic consistency factor.
The default value is 0, that is, Mahalanobis distances are not scaled.
Example: 'scaled',0
Data Types: double
nocheck
— Whether to perform checks on matrix Y.
Scalar. If nocheck is equal to 1, no check is performed on matrix Y. By default nocheck=0.
Example: 'nocheck',1
Data Types: double
out
— description
Structure. Structure which contains the following fields:
Value | Description |
---|---|
MAL | n x (n-init+1) matrix containing the monitoring of Mahalanobis distances. 1st row = distance for first unit; ...; nth row = distance for nth unit. |
BB | n x (n-init+1) matrix containing the information about the units belonging to the subset at each step of the forward search. 1st col = indexes of the units forming subset in the initial step; ...; last column = units forming subset in the final step (all units). |
mmd | (n-init) x 3 matrix which contains the monitoring of minimum MD or (m+1)th ordered MD at each step of the forward search. 1st col = fwd search index (from init to n-1); 2nd col = minimum MD; 3rd col = (m+1)th ordered MD. |
msr | (n-init+1) x 3 matrix which contains the monitoring of maximum MD or mth ordered MD. 1st col = fwd search index (from init to n); 2nd col = maximum MD; 3rd col = mth ordered MD. |
gap | (n-init+1) x 3 matrix which contains the monitoring of the gap (difference between minimum MD outside the subset and maximum MD inside it). 1st col = fwd search index (from init to n); 2nd col = min MD - max MD; 3rd col = (m+1)th ordered MD - mth ordered MD. |
Loc | (n-init+1) x (v+1) matrix containing the monitoring of the estimated means for each variable in each step of the forward search. |
S2cov | (n-init+1) x (v*(v+1)/2+1) matrix containing the monitoring of the elements of the covariance matrix in each step of the forward search. 1st col = fwd search index (from init to n); 2nd col = monitoring of S(1,1); 3rd col = monitoring of S(1,2); ...; last col = monitoring of S(v,v). |
detS | (n-init+1) x 2 matrix containing the monitoring of the determinant of the covariance matrix in each step of the forward search. |
Un | (n-init) x 11 matrix which contains the unit(s) included in the subset at each step of the forward search. REMARK: in every step the new subset is compared with the old subset. Un contains the unit(s) present in the new subset but not in the old one. Un(1,2), for example, contains the unit included in step init+1; Un(end,2) contains the unit included in the final step of the search. |
Y | Original data input matrix. |
class | 'FSMeda' |
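To make the quantities monitored in mmd, msr and gap concrete, here is a minimal base-MATLAB sketch of a single forward-search step. It is not the toolbox's implementation: the subset `bsb`, the seed and all variable names are assumptions chosen for illustration, and no scaling of the distances is applied.

```matlab
% Illustrative single step of a forward search (base MATLAB only)
rng(1);                          % fix the seed so the sketch is reproducible
n = 100; v = 3;
Y = randn(n,v);
bsb = (1:10)';                   % hypothetical current subset, size m = 10
m = numel(bsb);

% Location and covariance estimated from the units in the subset only
ym = mean(Y(bsb,:),1);
S  = cov(Y(bsb,:));

% Mahalanobis distances of all n units from the subset centroid
Yc = bsxfun(@minus,Y,ym);
d  = sqrt(sum((Yc/S).*Yc,2));

% Units outside the subset
outside = true(n,1);
outside(bsb) = false;

minMD = min(d(outside));         % analogue of the 2nd column of out.mmd
maxMD = max(d(bsb));             % analogue of the 2nd column of out.msr

% Ordered distances: (m+1)th and mth smallest
[ds,idx] = sort(d);
gapStep = [minMD-maxMD, ds(m+1)-ds(m)];   % analogue of cols 2-3 of out.gap

% Next subset: the m+1 units with the smallest distances
newbsb = idx(1:m+1);

% Units entering at this step (present in the new subset, not in the old)
entered = setdiff(newbsb,bsb);
```

More than one unit can appear in `entered` when units also leave the subset, which is the situation the "interchange" messages refer to.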
Atkinson, A.C., Riani, M. and Cerioli, A. (2004), "Exploring multivariate data with the forward search", Springer Verlag, New York.