malindexplot plots the Mahalanobis distances versus a selected variable.
Numbers are drawn from the chi2 distribution with 5 degrees of freedom.
MCDenv=malindexplot(chi2rnd(5,100,1),5);
load('stack_loss.txt');
X=stack_loss(:,1:3);
[n,v]=size(X);
% Define confidence level
conflev=[0.95,0.99];
figure;
h1=subplot(2,1,1);
% Compute traditional Mahalanobis distances
mdtrad=mahal(X,X);
malindexplot(mdtrad,v,'h',h1,'conflev',conflev,'labx','Index number','laby','Traditional md');
% Compute robust md
[out]=FSM(X,'init',5,'plots',0);
seq=1:size(X,1);
good=setdiff(seq,out.outliers);
mdrob=mahal(X,X(good,:));
h2=subplot(2,1,2);
malindexplot(mdrob,v,'h',h2,'conflev',conflev,'labx','Index number','laby','Robust md','title','');
n=200;
v=3;
randn('state', 123456);
Y=randn(n,v);
% Contaminated data
Ycont=Y;
Ycont(1:5,:)=Ycont(1:5,:)+3;
[RAW,REW]=mcd(Ycont);
RAW.Y=Ycont;
malindexplot(RAW,v,'databrush',1)
n=200;
v=3;
randn('state', 123456);
Y=randn(n,v);
% Contaminated data
Ycont=Y;
Ycont(1:5,:)=Ycont(1:5,:)+3;
[RAW,REW]=mcd(Ycont);
RAW.Y=Ycont;
databrush=struct;
databrush.selectionmode='Brush'; % Brush selection
databrush.persist='on'; % Enable repeated mouse selections
databrush.Label='on'; % Write labels of the units while selecting
databrush.RemoveLabels='on'; % Remove labels after selection
databrush.RemoveTool = 'off'; % Do not remove yellow tool after selection
databrush.RemoveFlagged = 'off'; % Do not remove filled red color for selected points after selection
databrush.labeladd = '1'; % Write number of selected units in the scatter plot matrix
malindexplot(RAW,v,'databrush',databrush)
md
— Mahalanobis distances.
Vector or structure. Vector of Mahalanobis distances (in squared units) or a structure containing fields md and Y. In the second case, md is a structure with the following fields:
Value | Description |
---|---|
md | contains the Mahalanobis distances (this field is compulsory); |
Y | contains the original data matrix whose Mahalanobis distances have been computed (this field is compulsory if option databrush is used). |
class | this field is not compulsory. In the case of md.class='mcdCorAna', simulated envelopes are used to define the empirical quantiles. Note that if the simulated bands have been precalculated, they can be passed through the second input argument v. |
Data Types: single|double
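The structure form of md described above can be assembled by hand. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox function mahal (which returns squared distances) is available; the data Y and the variable name S are illustrative and not part of the original page. The malindexplot call is left as a comment because it needs FSDA on the path and opens an interactive figure.

```matlab
% Sketch: build the structure form of the first input argument.
rng(1);              % fix the seed so the example is reproducible
Y = randn(50,3);     % illustrative data: 50 units, 3 variables
S = struct;
S.md = mahal(Y,Y);   % squared Mahalanobis distances (compulsory field)
S.Y  = Y;            % original data (needed only when databrush is used)
% With FSDA on the path, the structure can then be passed as:
% malindexplot(S,size(Y,2),'databrush',1)
```

The optional field class is left out here, so the thresholds would come from the chi^2 distribution rather than from simulated envelopes.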
v
— Number of variables or matrix of size n-by-k containing empirical envelope.
Scalar or matrix with length(md) rows. If v is a scalar, it contains the number of variables of the original data matrix which have been used to compute md. The threshold in this case is based on the Chi^2 distribution with v degrees of freedom. If v is a matrix with size(v,1)=length(md), the precalculated empirical envelopes in v are used to obtain the confidence bands.
Data Types: single|double
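When v is a scalar, the thresholds are chi^2 quantiles with v degrees of freedom. The sketch below shows how such thresholds can be computed and compared with squared distances. It assumes the Statistics and Machine Learning Toolbox (chi2rnd, chi2inv); the names thresh and flagged are illustrative, and this is not necessarily the code malindexplot runs internally.

```matlab
% Sketch: chi^2 thresholds for squared Mahalanobis distances.
rng(1);                          % reproducible random numbers
v = 3;                           % number of variables
n = 100;                         % number of units
md = chi2rnd(v,n,1);             % stand-in for squared Mahalanobis distances
conflev = [0.95 0.99];           % confidence levels
thresh = chi2inv(conflev,v);     % one threshold per confidence level
flagged = find(md > thresh(2));  % units above the 99 per cent threshold
```

For v=3 the two thresholds are approximately 7.81 and 11.34.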
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'modelT',5, 'h',gca, 'x',1:100, 'labx','unit number', 'laby','MD', 'title','Index plot of MD', 'numlab',{3}, 'conflev',0.99, 'FontSize',12, 'SizeAxesNum',12, 'ylimy',[-3 3], 'xlimx',[1 30], 'lwdenv',4, 'MarkerSize',4, 'MarkerFaceColor','b', 'tag','indexPlot', 'databrush',1, 'nameY',{'Y_1' 'Y_2'}, 'label',{'UK' ... 'IT'}
modelT
— Controls how the consistency factor is applied to account for the effect of trimming.
Scalar. It is empty for the classic case when uncontaminated data are assumed to come from a normal distribution (default). If, on the other hand, the data are heavy-tailed and can be modelled by a Student-t distribution, modelT takes a positive value representing the degrees of freedom of the t-distribution.
If modelT is zero, then the degrees of freedom are estimated from the data (to be implemented).
Example: 'modelT',5
Data Types: double
h
— Where to plot.
Axis handle. The axis handle of the figure in which to draw the malindexplot. This can be used to host the malindexplot in a subplot of a complex figure formed by different panels (e.g. a panel with malindexplot from a classical MLE estimator and another with Mahalanobis distances from a robust analysis, as in the stack loss example above).
Example: 'h',gca
Data Types: graphics handle
x
— x-axis index.
Vector. The vector to be plotted on the x-axis.
Default is the sequence 1:length(md).
Example: 'x',1:100
Data Types: numeric
labx
— x label.
Character. A label for the x-axis (default: '').
Example: 'labx','unit number'
Data Types: character
laby
— y label.
Character. A label for the y-axis (default: '').
Example: 'laby','MD'
Data Types: character
title
— Plot title.
Character. A label containing the title of the plot.
Default is 'Index plot of Mahalanobis distances'.
Example: 'title','Index plot of MD'
Data Types: character
numlab
— Number of points to be labelled in the plot.
Vector | cell. If numlab is a cell containing a scalar k, the units with the k largest md are labelled in the plot.
If numlab is a vector, the units indexed by the vector are labelled in the plot.
Default is numlab={5}, that is, the units with the 5 largest md are labelled.
Use numlab='' for no labelling.
Example: 'numlab',{3}
Data Types: numeric vector or cell.
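The rule behind numlab={k} (label the units with the k largest md) can be illustrated with base MATLAB. This is only a sketch of the selection rule as described above, with made-up distances and illustrative names (ord, lab); it is not taken from the malindexplot source.

```matlab
% Sketch: find the k units with the largest distances.
md = [2.1; 9.7; 0.4; 15.2; 3.3; 7.8];  % illustrative squared distances
k = 3;                                 % as in numlab={3}
[~,ord] = sort(md,'descend');          % unit indices, largest md first
lab = ord(1:k);                        % units 4, 2 and 6 would be labelled
```

Passing a plain vector such as numlab=[4 2 6] would instead label exactly those units, whatever their distances.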
conflev
— Confidence levels for the horizontal bands.
Vector. It can be a vector of different confidence levels, e.g. [0.95,0.99,0.999]. When v is a scalar, the bands are based on the chi^2 distribution with v degrees of freedom.
Example: 'conflev',0.99
Data Types: numeric
FontSize
— Labels font size.
Scalar. Scalar which controls the font size of the labels of the axes.
Default value is 12.
Example: 'FontSize',12
Data Types: numeric
SizeAxesNum
— Numbers font size.
Scalar. Scalar which controls the font size of the numbers of the axes.
Default value is 10.
Example: 'SizeAxesNum',12
Data Types: numeric
ylimy
— y limits.
Vector. Vector with two elements controlling the minimum and maximum values of the y axis.
Default is '' (automatic scale).
Example: 'ylimy',[-3 3]
Data Types: numeric
xlimx
— x limits.
Vector. Vector with two elements controlling the minimum and maximum values of the x axis.
Default is '' (automatic scale).
Example: 'xlimx',[1 30]
Data Types: numeric
lwdenv
— Envelope line width.
Scalar. Scalar which controls the width of the lines associated with the envelopes.
Default is lwdenv=1.
Example: 'lwdenv',4
Data Types: numeric
MarkerSize
— Marker size of points.
Scalar. Scalar specifying the size of the marker in points (1 point = 1/72 inch).
Default is MarkerSize = 6.
Example: 'MarkerSize',4
Data Types: numeric
MarkerFaceColor
— Marker fill color of points.
Character | length-3 RGB numeric vector. The fill color for markers that are closed shapes (circle, square, diamond, pentagram, hexagram, and the four triangles).
Example: 'MarkerFaceColor','b'
Data Types: numeric | character
tag
— Figure tag.
Character. Tag of the figure which will host the malindexplot.
The default tag is pl_malindex.
Example: 'tag','indexPlot'
Data Types: character
databrush
— Interactive mouse brushing.
Empty value | scalar | structure. If databrush is an empty value (default), no brushing is done. Activating this option (databrush is a scalar or a structure) lets the user select a set of points in the current plot and see them highlighted in the scatter plot matrix (spm). If the spm does not exist, it is created automatically.
DATABRUSH IS A SCALAR.
If databrush is a scalar, the default selection tool is a rectangular brush and brushing is possible only once (that is, persist='').
DATABRUSH IS A STRUCTURE.
If databrush is a structure, it is possible to use all optional arguments of function selectdataFS and the following optional arguments:
databrush.persist = persistent brushing.
persist is an empty value or one of the character strings 'on' or 'off'.
The default value of persist is '', that is, brushing is allowed only once.
If persist is 'on' or 'off', brushing can be done as many times as the user requires.
If persist='on', the unit(s) currently brushed are added to those previously brushed. Every time a new brushing is done, it is possible to use a different color for the brushed units.
If persist='off', every time a new brushing is performed the units previously brushed are removed.
databrush.labeladd = add labels. If this option is '1', the units of the last selected group are labelled in the scatter plot matrix with their row index in matrix Y. The default value is labeladd='', i.e. no label is added.
REMARK: the options which follow work in connection with previous option databrush and produce their effect on the scatter plot matrix of the original data.
Example: 'databrush',1
Data Types: single | double | struct
nameY
— Variable labels of the original data matrix.
Cell. Cell array of strings containing the labels of the variables. By default, the labels added are Y1, ..., Yv. This option is used only if the previous option databrush is not empty.
Example: 'nameY',{'Y_1' 'Y_2'}
Data Types: character
label
— Row labels.
Cell. Cell of length n containing the labels of the rows.
Example: 'label',{'UK' ... 'IT'}
Data Types: cell
MCDenv
—Empirical envelopes.
Array. Matrix of size n-by-length(conflev) containing the empirical confidence envelopes, or vector of length length(conflev) containing the quantiles of the reference distribution.
Rousseeuw P.J., Leroy A.M. (1987), "Robust regression and outlier detection", Wiley.