malindexplot

malindexplot plots the Mahalanobis distances for each row of the input data matrix

Syntax

  • MCDenv=malindexplot(md,v)
  • MCDenv=malindexplot(md,v,Name,Value)

Description

MCDenv = malindexplot(md, v) Mahalanobis distance plot of 100 random numbers.

MCDenv = malindexplot(md, v, Name, Value) Compare traditional md with robust md for the stack loss data.

Examples

  • Mahalanobis distance plot of 100 random numbers.
  • Numbers are from the chi2 distribution with 5 degrees of freedom.

    MCDenv=malindexplot(chi2rnd(5,100,1),5);

  • Compare traditional md with robust md for the stack loss data.

    load('stack_loss.txt');
    X=stack_loss(:,1:3);
    [n,v]=size(X);
    % Define confidence level
    conflev=[0.95,0.99];
    figure;
    h1=subplot(2,1,1);
    % Compute traditional Mahalanobis distances
    mdtrad=mahal(X,X);
    malindexplot(mdtrad,v,'h',h1,'conflev',conflev,'labx','Index number','laby','Traditional md');
    % Compute robust md
    [out]=FSM(X,'init',5,'plots',0);
    seq=1:size(X,1);
    good=setdiff(seq,out.outliers);
    mdrob=mahal(X,X(good,:));
    h2=subplot(2,1,2);
    malindexplot(mdrob,v,'h',h2,'conflev',conflev,'labx','Index number','laby','Robust md','title','');

    Related Examples

  • Interactive example 1. Index plot of Mahalanobis distances with databrush option.

    n=200;
    v=3;
    randn('state', 123456);
    Y=randn(n,v);
    % Contaminated data
    Ycont=Y;
    Ycont(1:5,:)=Ycont(1:5,:)+3;
    [RAW,REW]=mcd(Ycont);
    RAW.Y=Ycont;
    malindexplot(RAW,v,'databrush',1)

  • Interactive example 2. Index plot of Mahalanobis distances with personalized databrush option.

    n=200;
    v=3;
    randn('state', 123456);
    Y=randn(n,v);
    % Contaminated data
    Ycont=Y;
    Ycont(1:5,:)=Ycont(1:5,:)+3;
    [RAW,REW]=mcd(Ycont);
    RAW.Y=Ycont;
    databrush=struct;
    databrush.selectionmode='Brush'; % Brush selection
    databrush.persist='on'; % Enable repeated mouse selections
    databrush.Label='on'; % Write labels of the units while selecting
    databrush.RemoveLabels='on'; % Remove labels after selection
    databrush.RemoveTool    = 'off'; % Do not remove yellow tool after selection
    databrush.RemoveFlagged = 'off'; % Do not remove filled red color for selected points after selection
    databrush.labeladd = '1'; % Label the selected units in the scatter plot matrix with their row index
    malindexplot(RAW,v,'databrush',databrush)

    Input Arguments

    md — Mahalanobis distances. Vector or structure.

    Vector of Mahalanobis distances (in squared units) or a structure containing fields md and Y. In the latter case, md is a structure with the following fields:

    Field     Description
    md        Mahalanobis distances (this field is compulsory).
    Y         Original data matrix whose Mahalanobis distances have been computed (this field is compulsory if option databrush is used).
    class     Optional. If md.class='mcdCorAna', simulated envelopes based on the null hypothesis of independence are used to define the empirical quantiles. If the simulated bands have been precalculated, they can be passed through the second input argument v or through field md.mdStore.
    N         Optional. If present, N is the original contingency table in array format. In this case the procedure also checks whether precalculated Mahalanobis distances to construct the empirical envelopes are present in field md.mdStore.
    Ntable    Optional. If present, Ntable is the original contingency table in table or timetable format. In this case the row labels are taken from md.Ntable.Properties.RowTimes (for a timetable) or md.Ntable.Properties.RowNames (for a table).
    mdStore   Optional. If present, mdStore contains the Mahalanobis distances for the nsimul contingency tables which have been generated.

    Data Types: single|double|struct
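
    When the distances do not come from an FSDA estimator, the structure form of md can be assembled by hand. The sketch below is illustrative only: the data matrix X and the variable name in are made up, and mahal comes from the Statistics and Machine Learning Toolbox, not from FSDA. The malindexplot call is left commented out because it needs FSDA and opens an interactive figure.

```matlab
% Small made-up data matrix (n=6 units, v=2 variables)
X = [1 2; 2 1; 3 5; 4 3; 5 8; 6 4];
[n, v] = size(X);

% Build the structure input: md is compulsory, Y is needed for databrush
in = struct;
in.md = mahal(X, X);   % squared Mahalanobis distances from the sample mean
in.Y  = X;             % original data, used by the scatter plot matrix

% Sanity check: with the sample covariance, squared distances sum to (n-1)*v
sumCheck = abs(sum(in.md) - (n-1)*v);

% malindexplot(in, v, 'databrush', 1)   % requires FSDA; opens a figure
```

    The sanity check relies on a general property of squared Mahalanobis distances computed from the sample mean and covariance, so it is a quick way to confirm that the distances are in squared units, as this argument expects.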

    v — Number of variables or matrix of size n-by-k containing empirical envelopes. Scalar or matrix with length(md) rows.

    If v is a scalar, it contains the number of variables of the original data matrix which have been used to compute md. The threshold in this case is based on the Chi^2 distribution with v degrees of freedom. If v is a matrix with size(v,1)=length(md), the precalculated empirical envelopes in v are used to obtain the confidence bands. Note that, when the input is a struct with field N, the precalculated envelopes can also be passed through field mdStore of the input structure. In this last case this input argument is ignored and can be a missing value.

    Data Types: single|double
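
    For the scalar case, the chi-square thresholds can be reproduced by hand. This is a minimal sketch under the assumption that the bands are the plain chi-square quantiles with v degrees of freedom; malindexplot may apply further adjustments (for example through modelT), and the distances in md below are made-up values. chi2inv belongs to the Statistics and Machine Learning Toolbox.

```matlab
v = 5;                              % number of variables used to compute md
conflev = [0.95 0.99];              % confidence levels for the bands
md = [1.2; 3.4; 12.5; 16.0; 0.7];   % made-up squared Mahalanobis distances

% Chi-square quantiles with v degrees of freedom, one per confidence level
thresh = chi2inv(conflev, v);

% Number of distances lying above each band
nExceed = [sum(md > thresh(1)), sum(md > thresh(2))];
```

    Units counted in nExceed are those that would lie above the corresponding horizontal band in the index plot under this assumption.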

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'modelT',5 , 'h',gca , 'x',1:100 , 'labx','unit number' , 'laby','MD' , 'title','Index plot of MD' , 'numlab',{3} , 'conflev',0.99 , 'FontSize',12 , 'SizeAxesNum',12 , 'ylimy',[0 3] , 'xlimx',[1 30] , 'lwdenv',4 , 'MarkerSize',4 , 'MarkerFaceColor','b' , 'tag','indexPlot' , 'databrush',1 , 'nameY',{'Y_1' 'Y_2'} , 'label',{'UK' ... 'IT'}

    modelT — Controls how the consistency factor is applied to account for the effect of trimming. Scalar.

    It is empty (default) for the classic case when uncontaminated data are assumed to come from a normal distribution. If, on the other hand, the data are heavy-tailed and can be modeled by a Student-t distribution, modelT takes a positive value representing the degrees of freedom of the t-distribution. If modelT is zero, the degrees of freedom are estimated from the data (to be implemented).

    Example: 'modelT',5

    Data Types: double

    h — Where to plot. Axis handle.

    Handle of the axes in which to draw the malindexplot. This can be used to host the malindexplot in a subplot of a complex figure formed by different panels (e.g. a panel with the malindexplot from a classical ML estimator and another with Mahalanobis distances from a robust analysis, as in the stack loss example).

    Example: 'h',gca

    Data Types: graphics handle

    x — x-axis index. Vector.

    The vector to be plotted on the x-axis.

    Default is the sequence 1:length(md).

    Example: 'x',1:100

    Data Types: numeric

    labx — x label. Character.

    A label for the x-axis (default: '').

    Example: 'labx','unit number'

    Data Types: character

    laby — y label. Character.

    A label for the y-axis (default: '').

    Example: 'laby','MD'

    Data Types: character

    title — Plot title. Character.

    A label containing the title of the plot.

    Default is 'Index plot of Mahalanobis distances'.

    Example: 'title','Index plot of MD'

    Data Types: character

    numlab — Number of points to be labeled in the plot. Vector | cell.

    If numlab is a cell containing scalar k, the units with the k largest md are labeled in the plot.

    If numlab is a vector, the units indexed by the vector are labeled in the plot.

    Default is numlab={5}, that is, the units with the 5 largest md are labeled.

    Use numlab='' for no labeling. Thus 'numlab',5 labels unit 5, while 'numlab',{5} labels the units with the 5 largest distances.

    Example: 'numlab',{3}

    Data Types: numeric vector or cell or missing value.
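
    The difference between the cell and the vector form of numlab can be illustrated with a few lines of plain MATLAB. This is only a sketch of the rule described above, not the code used inside malindexplot; the distances in md and the variable names are hypothetical.

```matlab
md = [0.5; 7.2; 1.1; 9.8; 3.3; 12.4];   % made-up squared distances
numlab = {3};                            % cell: label the 3 largest distances

if iscell(numlab)
    % Cell input: pick the units with the k largest distances
    k = numlab{1};
    [~, order] = sort(md, 'descend');
    units = sort(order(1:k))';
elseif isempty(numlab)
    % Empty input: no unit is labeled
    units = [];
else
    % Vector input: the entries are the unit indices themselves
    units = numlab(:)';
end
```

    With numlab = 3 instead of {3}, the same logic would return only unit 3.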

    conflev — Confidence levels for the horizontal bands. Vector.

    It can be a vector of different confidence level values, e.g. [0.95,0.99,0.999]. The confidence bands are based on the chi^2 distribution.

    Example: 'conflev',0.99

    Data Types: numeric

    FontSize — Labels font size. Scalar.

    Scalar which controls the font size of the labels of the axes.

    Default value is 12.

    Example: 'FontSize',12

    Data Types: numeric

    SizeAxesNum — Numbers font size. Scalar.

    Scalar which controls the font size of the numbers of the axes.

    Default value is 10.

    Example: 'SizeAxesNum',12

    Data Types: numeric

    ylimy — y limits. Vector.

    Vector with two elements controlling minimum and maximum value of the y axis.

    Default is '' (automatic scale).

    Example: 'ylimy',[0 3]

    Data Types: numeric

    xlimx — x limits. Vector.

    Vector with two elements controlling minimum and maximum value of the x axis.

    Default is '' (automatic scale).

    Example: 'xlimx',[1 30]

    Data Types: numeric

    lwdenv — Envelope line width. Scalar.

    Scalar which controls the width of the lines associated with the envelopes.

    Default is lwdenv=1.

    Example: 'lwdenv',4

    Data Types: numeric

    MarkerSize — Marker size of points. Scalar.

    Scalar specifying the size of the marker in points (1 point = 1/72 inch).

    Default is MarkerSize = 6.

    Example: 'MarkerSize',4

    Data Types: numeric

    MarkerFaceColor — Marker fill color of points. Character | length 3 RGB numeric vector.

    The fill color for markers that are closed shapes (circle, square, diamond, pentagram, hexagram, and the four triangles).

    Example: 'MarkerFaceColor','b'

    Data Types: numeric | character

    tag — Figure tag. Character.

    Tag of the figure which will host the malindexplot.

    The default tag is pl_malindex.

    Example: 'tag','indexPlot'

    Data Types: character

    databrush — Interactive mouse brushing. Empty value, scalar | structure.

    If databrush is an empty value (default), no brushing is done. Activating this option (databrush is a scalar or a structure) enables the user to select a set of points in the current plot and to see them highlighted in the scatter plot matrix (spm). If the spm does not exist, it is created automatically.

    DATABRUSH IS A SCALAR.

    If databrush is a scalar, the default selection tool is a rectangular brush and it is possible to brush only once (that is, persist='').

    DATABRUSH IS A STRUCTURE.

    If databrush is a structure, it is possible to use all optional arguments of function selectdataFS and the following optional arguments:

    databrush.persist = persistent brushing. persist is an empty value or a character vector equal to 'on' or 'off'. The default value of persist is '', that is, brushing is allowed only once. If persist is 'on' or 'off', brushing can be done as many times as the user requires. If persist='on', the unit(s) currently brushed are added to those previously brushed, and a different color can be used for the brushed units every time a new brushing is done. If persist='off', every time a new brushing is performed the units previously brushed are removed.

    databrush.labeladd = add labels. If this option is '1', the units of the last selected group are labeled in the scatter plot matrix with their row index in matrix Y. The default value is labeladd='', i.e. no label is added.

    REMARK: the options which follow work in connection with option databrush and produce their effect on the scatter plot matrix of the original data.

    Example: 'databrush',1

    Data Types: single | double | struct

    nameY — Variable labels of the original data matrix. Cell.

    Cell array of strings containing the labels of the variables. By default, the labels which are added are Y1, ..., Yv. This option is used only if option databrush is not empty.

    Example: 'nameY',{'Y_1' 'Y_2'}

    Data Types: character

    label — Row labels. Cell | vector of strings.

    Cell or vector of strings of length n containing the labels of the rows.

    Example: 'label',{'UK' ... 'IT'}

    Data Types: cell or characters or vector of strings

    Output Arguments

    MCDenv — Empirical envelopes. Array.

    Matrix with size n-by-length(conflev) which contains the empirical confidence envelopes or vector of length length(conflev) containing the quantiles of the reference distribution.

    References

    Rousseeuw P.J., Leroy A.M. (1987), "Robust regression and outlier detection", Wiley.

    This page has been automatically generated by our routine publishFS