malindexplot plots the Mahalanobis distances for each row of the input data matrix
The numbers are drawn from the chi2 distribution with 5 degrees of freedom.
MCDenv=malindexplot(chi2rnd(5,100,1),5);
load('stack_loss.txt');
X=stack_loss(:,1:3);
[n,v]=size(X);
% Define confidence level
conflev=[0.95,0.99];
figure;
h1=subplot(2,1,1);
% Compute traditional Mahalanobis distances
mdtrad=mahal(X,X);
malindexplot(mdtrad,v,'h',h1,'conflev',conflev,'labx','Index number','laby','Traditional md');
% Compute robust md
[out]=FSM(X,'init',5,'plots',0);
seq=1:size(X,1);
good=setdiff(seq,out.outliers);
mdrob=mahal(X,X(good,:));
h2=subplot(2,1,2);
malindexplot(mdrob,v,'h',h2,'conflev',conflev,'labx','Index number','laby','Robust md','title','');
n=200;
v=3;
randn('state', 123456);
Y=randn(n,v);
% Contaminated data
Ycont=Y;
Ycont(1:5,:)=Ycont(1:5,:)+3;
[RAW,REW]=mcd(Ycont);
RAW.Y=Ycont;
malindexplot(RAW,v,'databrush',1)
n=200;
v=3;
randn('state', 123456);
Y=randn(n,v);
% Contaminated data
Ycont=Y;
Ycont(1:5,:)=Ycont(1:5,:)+3;
[RAW,REW]=mcd(Ycont);
RAW.Y=Ycont;
databrush=struct;
databrush.selectionmode='Brush'; % Brush selection
databrush.persist='on'; % Enable repeated mouse selections
databrush.Label='on'; % Write labels of the units while selecting
databrush.RemoveLabels='on'; % Remove labels after selection
databrush.RemoveTool = 'off'; % Do not remove yellow tool after selection
databrush.RemoveFlagged = 'off'; % Do not remove filled red color for selected points after selection
databrush.labeladd = '1'; % Write number of selected units in the scatter plot matrix
malindexplot(RAW,v,'databrush',databrush)
md
— Mahalanobis distances.
Vector or structure. Vector of Mahalanobis distances (in squared units) or a structure containing fields md and Y. In the latter case, md is a structure with the following fields:
Value | Description |
---|---|
md | contains the Mahalanobis distances (this field is compulsory). |
Y | contains the original data matrix whose Mahalanobis distances have been computed (this field is compulsory if option databrush is used). |
class | this field is not compulsory. If md.class='mcdCorAna', simulated envelopes based on the null hypothesis of independence are used to define the empirical quantiles. Note that if the simulated bands have been precalculated, they can be passed through the second input argument v or through field md.mdStore. |
N | this field is not compulsory. If present, N is the original contingency table in array format, and the procedure also checks whether precalculated Mahalanobis distances to construct the empirical envelopes are present in field md.mdStore. |
Ntable | this field is not compulsory. If present, Ntable is the original contingency table in table or timetable format, and the row labels are taken from RAW.Ntable.Properties.RowTimes (for a timetable) or RAW.Ntable.Properties.RowNames (for a table). |
mdStore | this field is not compulsory. If present, mdStore contains the Mahalanobis distances for the nsimul contingency tables which have been generated. |
Data Types: single|double|struct
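A minimal sketch of the structure form of the first input, assuming FSDA is on the MATLAB path and the Statistics and Machine Learning Toolbox is available for mahal. The simulated data and the variable name RAW are illustrative only; the field names md and Y are those listed in the table above.

```matlab
rng(1);                          % reproducible simulated data (illustrative only)
X = randn(100, 3);               % 100 units, 3 variables

RAW = struct;
RAW.md = mahal(X, X);            % compulsory field: squared Mahalanobis distances
RAW.Y  = X;                      % original data, required only if 'databrush' is used

% Index plot with thresholds based on size(X,2) = 3 variables
malindexplot(RAW, size(X, 2));
```

Field Y can be omitted when no brushing is requested.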
v
— Number of variables or matrix of size n-by-k containing empirical envelope.
Scalar or matrix with length(md) rows. If v is a scalar, it contains the number of variables of the original data matrix used to compute md. The threshold in this case is based on the Chi^2 distribution with v degrees of freedom. If v is a matrix with size(v,1)=length(md), the precalculated empirical envelopes in v are used to obtain the confidence bands. Note that when the input is a struct with field N, the precalculated envelopes can also be passed through field mdStore of the input structure. In this last case this input argument is ignored and can be a missing value.
Data Types: single|double
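To illustrate the scalar case, the sketch below computes Chi^2 quantiles with v degrees of freedom and compares squared Mahalanobis distances against them. It uses only chi2inv and mahal from the Statistics and Machine Learning Toolbox and does not call malindexplot. That the function applies exactly this comparison internally is an assumption, and the data are simulated.

```matlab
rng(1);                          % reproducible simulated data (illustrative only)
X = randn(100, 3);               % n = 100 units
v = size(X, 2);                  % v = 3 variables
conflev = [0.95 0.99];           % confidence levels for the horizontal bands

md = mahal(X, X);                % squared Mahalanobis distances
thresh = chi2inv(conflev, v);    % Chi^2 quantiles with v degrees of freedom

% Units whose squared distance lies above each band
above95 = find(md > thresh(1));
above99 = find(md > thresh(2));
fprintf('Thresholds: %.3f and %.3f\n', thresh(1), thresh(2));
```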
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'modelT',5, 'h',gca, 'x',1:100, 'labx','unit number', 'laby','MD', 'title','Index plot of MD', 'numlab',{3}, 'conflev',0.99, 'FontSize',12, 'SizeAxesNum',12, 'ylimy',[0 3], 'xlimx',[1 30], 'lwdenv',4, 'MarkerSize',4, 'MarkerFaceColor','b', 'tag','indexPlot', 'databrush',1, 'nameY',{'Y_1' 'Y_2'}, 'label',{'UK' ... 'IT'}
modelT
— controls how the consistency factor is applied to account for the effect of trimming. Scalar. It is empty for the classic case when uncontaminated data are assumed to come from a normal distribution (default). If, on the other hand, the data are heavy-tailed and can be modeled by a Student-t distribution, modelT takes a positive value representing the degrees of freedom of the t-distribution. If modelT is zero, the degrees of freedom are estimated from the data (to be implemented).
Example: 'modelT',5
Data Types: double
h
— Where to plot. Axis handle. The handle of the axes in which to draw the malindexplot. This can be used to host the malindexplot in a subplot of a complex figure formed by different panels (e.g. a panel with malindexplot from a classical MLE estimator and another with Mahalanobis distances from a robust analysis, as in the stack loss example above).
Example: 'h',gca
Data Types: graphics handle
x
— x-axis index. Vector. The vector to be plotted on the x-axis.
Default is the sequence 1:length(md).
Example: 'x',1:100
Data Types: numeric
labx
— x label. Character. A label for the x-axis (default: '').
Example: 'labx','unit number'
Data Types: character
laby
— y label. Character. A label for the y-axis (default: '').
Example: 'laby','MD'
Data Types: character
title
— plot title. Character. A label containing the title of the plot.
Default is 'Index plot of Mahalanobis distances'.
Example: 'title','Index plot of MD'
Data Types: character
numlab
— number of points to be labeled in the plot. Vector | cell. If numlab is a cell containing scalar k, the units with the k largest md are labeled in the plot.
If numlab is a vector, the units indexed by the vector are labeled in the plot.
Default is numlab={5}, that is, the units with the 5 largest md are labeled.
Use numlab='' for no labeling. Therefore 'numlab',5 labels unit 5, while 'numlab',{5} labels the units with the 5 largest distances.
Example: 'numlab',{3}
Data Types: numeric vector or cell or missing value.
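The difference between the vector and cell forms of numlab can be sketched as follows, assuming FSDA is on the MATLAB path and the Statistics and Machine Learning Toolbox is available for chi2rnd. The simulated distances are illustrative, and the final lines only show in plain MATLAB which units have the 5 largest distances; they are not taken from the function's internals.

```matlab
rng(1);                                        % reproducible simulated distances
md = chi2rnd(5, 100, 1);                       % 100 distances, 5 degrees of freedom

figure;
h1 = subplot(2,1,1);
malindexplot(md, 5, 'h', h1, 'numlab', 5);     % vector form: label unit number 5
h2 = subplot(2,1,2);
malindexplot(md, 5, 'h', h2, 'numlab', {5});   % cell form: label the 5 largest distances

% Indices of the units with the 5 largest distances (plain MATLAB)
[~, idx] = sort(md, 'descend');
top5 = idx(1:5);
```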
conflev
— confidence levels for the horizontal bands. Vector. It can be a vector of different confidence level values, e.g. [0.95,0.99,0.999]. The confidence bands are based on the chi^2 distribution.
Example: 'conflev',0.99
Data Types: numeric
FontSize
— Labels font size. Scalar. Scalar which controls the font size of the labels of the axes.
Default value is 12.
Example: 'FontSize',12
Data Types: numeric
SizeAxesNum
— Numbers font size. Scalar. Scalar which controls the font size of the numbers of the axes.
Default value is 10.
Example: 'SizeAxesNum',12
Data Types: numeric
ylimy
— y limits. Vector. Vector with two elements controlling minimum and maximum value of the y axis.
Default is '' (automatic scale).
Example: 'ylimy',[0 3]
Data Types: numeric
xlimx
— x limits. Vector. Vector with two elements controlling minimum and maximum value of the x axis.
Default is '' (automatic scale).
Example: 'xlimx',[1 30]
Data Types: numeric
lwdenv
— Envelope line width. Scalar. Scalar which controls the width of the lines associated with the envelopes.
Default is lwdenv=1.
Example: 'lwdenv',4
Data Types: numeric
MarkerSize
— Marker size of points. Scalar. Scalar specifying the size of the marker in points (1 point = 1/72 inch).
Default is MarkerSize = 6.
Example: 'MarkerSize',4
Data Types: numeric
MarkerFaceColor
— Marker fill color of points. Character | length 3 RGB numeric vector. The fill color for markers that are closed shapes (circle, square, diamond, pentagram, hexagram, and the four triangles).
Example: 'MarkerFaceColor','b'
Data Types: numeric | character
tag
— Figure tag. Character. Tag of the figure which will host the malindexplot.
The default tag is pl_malindex.
Example: 'tag','indexPlot'
Data Types: character
databrush
— interactive mouse brushing. Empty value, scalar | structure. If databrush is an empty value (default), no brushing is done. Activating this option (databrush is a scalar or a structure) lets the user select a set of points in the current plot and see them highlighted in the scatter plot matrix (spm). If the spm does not exist, it is created automatically.
DATABRUSH IS A SCALAR.
If databrush is a scalar, the default selection tool is a rectangular brush and it is possible to brush only once (that is, persist='').
DATABRUSH IS A STRUCTURE.
If databrush is a structure, it is possible to use all optional arguments of function selectdataFS and the following optional arguments:
databrush.persist = persistent brushing.
persist is an empty value or one of the character vectors 'on' or 'off'.
The default value of persist is '', that is, brushing is allowed only once.
If persist is 'on' or 'off', brushing can be done as many times as the user requires.
If persist='on', the unit(s) currently brushed are added to those previously brushed. Every time a new brushing is done, it is possible to use a different color for the brushed units.
If persist='off', every time a new brushing is performed the units previously brushed are removed.
databrush.labeladd = add labels. If this option is '1', the units of the last selected group are labeled in the scatter plot matrix with their row index in matrix Y. The default value is labeladd='', i.e. no label is added.
REMARK: the options which follow work in connection with the previous option databrush and produce their effect on the scatter plot matrix of the original data.
Example: 'databrush',1
Data Types: single | double | struct
nameY
— variable labels of the original data matrix. Cell. Cell array of strings containing the labels of the variables. By default, the labels are Y1, ..., Yv. This option is used only if the previous option databrush is not empty.
Example: 'nameY',{'Y_1' 'Y_2'}
Data Types: character
label
— row labels. Cell | vector of strings. Cell or vector of strings of length n containing the labels of the rows.
Example: 'label',{'UK' ... 'IT'}
Data Types: cell or characters or vector of strings
MCDenv
— Empirical envelopes.
Array. Matrix with size n-by-length(conflev) which contains the empirical confidence envelopes, or vector of length length(conflev) containing the quantiles of the reference distribution.
Rousseeuw P.J., Leroy A.M. (1987), "Robust regression and outlier detection", Wiley.