The problem of representing data, or any quantitative information, in a graphical form suitable for human interpretation and exploration has deep roots and has been addressed in statistics and in many other scientific disciplines (Friendly 2005). One of the biggest challenges statisticians face when working on applied problems with non-statisticians is presenting and communicating statistical results effectively (Spence 2001, Tufte 1983). In this toolbox we have developed new interactive tools that dynamically connect the information coming from different “robust plots”.
The Flexible Statistics Data Analysis Toolbox™ extends the data visualization functions already present in the Statistics Toolbox and in MATLAB. The extensions concern a series of robust statistics plots which we have made dynamic and interactive. These new features not only help to present the data to a non-statistical audience, but also enable the researcher to highlight the presence of hidden subgroups of data.
MATLAB includes a data brushing facility for marking observations on graphs and allowing the user to remove them or save them to new variables. There is also a related data linking facility that connects graphs with workspace variables, so that the graphs are updated automatically and interactively. In the Forward Search context, however, these functions are of limited use, because the link function can connect two graphs, and therefore brush them jointly, only when they refer to the same variables in the workspace.
For example, linking the Monitoring Residuals Plot with the Scatter Plot Matrix is not possible, as the former refers to the residuals of the data units at each step of the search, while the latter refers to the unit values. Moreover, plots created with the line function, and those produced by the gplotmatrix function in particular, cannot be brushed with the standard MATLAB brush function.
For these reasons, in the FSDA toolbox we have re-implemented the brushing and linking facilities. In particular, we have adopted (and adapted to our needs) the powerful selectdata function by John D'Errico, available on the MATLAB Central File Exchange (http://www.mathworks.com/matlabcentral/fileexchange/13857).
Brushing is implemented in the FSDA toolbox in two modalities: a non-persistent modality, where the user can make a selection only once, and a persistent modality, where the selection can be repeated multiple times. Persistent brushing in turn comes in two variants: a non-cumulative option, where every new brushing action removes the previous selections, and a cumulative option, where each selection remains highlighted and is reported in the legend of the graphs involved.
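The three brushing behaviours described above can be summarised with a small state model. The following Python sketch is purely illustrative and is not the toolbox's MATLAB implementation: the class name BrushState, the mode labels '', 'off' and 'on', and the legend wording are all assumptions made for this example.

```python
# Hypothetical model of the brushing modes; names and labels are assumptions.

class BrushState:
    """Tracks which units stay highlighted after each brushing action."""

    def __init__(self, persist=""):
        # "" : non-persistent, only one selection is allowed
        # "off": persistent, non-cumulative (new selection replaces the old one)
        # "on" : persistent, cumulative (every selection is retained)
        if persist not in ("", "on", "off"):
            raise ValueError("persist must be '', 'on' or 'off'")
        self.persist = persist
        self.groups = []       # one sorted list of unit indices per retained selection
        self.finished = False  # becomes True once no further brushing is allowed

    def brush(self, units):
        """Apply one brushing action; return False if brushing has ended."""
        if self.finished:
            return False
        selection = sorted(set(units))
        if self.persist == "on":
            self.groups.append(selection)   # keep earlier selections
        else:
            self.groups = [selection]       # discard earlier selections
        if self.persist == "":
            self.finished = True            # single-shot mode
        return True

    def legend(self):
        """One legend entry per retained selection."""
        return [f"Brushed units {k}" for k in range(1, len(self.groups) + 1)]
```

With this model, a cumulative session that brushes units {1, 2} and then unit {7} keeps two groups and two legend entries, whereas the non-cumulative variant keeps only the last group.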
The above features are complemented by our customized version of the standard MATLAB datatip option which, once a point in a forward trajectory is selected, reports in a tooltip box relevant information about the associated unit(s) and the related statistics.
The Monitoring Residuals plot is a basic visualization tool for identifying outliers and, more generally, groups of observations which behave differently from the rest of the data. In this plot the option datatooltip can be used to provide information about the unit selected, the step in which the unit enters the search and the associated label. The option databrush, on the other hand, enables the user to select a set of residual trajectories and to see them highlighted in the y|X plot, i.e. a matrix of scatter plots of y against each column of X, grouped according to the selection(s) made by brushing. If the y|X plot does not exist, it is created automatically. In addition, brushed units are automatically highlighted in other forward plots (e.g. the minimum deletion residual plot) if they are already open.
The Minimum Deletion Residual plot monitors the value of the minimum deletion residual at each step of the forward search. If one or more atypical observations are present in the data, the plot of the minimum deletion residual will show a peak in the step prior to the inclusion of the first outlier. The plot may show a subsequent decrease, due to the effect of masking, as further outliers enter the subset. In this plot the option datatooltip can be used to provide information about the unit(s) which enter the search in the selected step. The option databrush, on the other hand, enables the user to select a part of the minimum deletion residual curve and to see the corresponding units highlighted in the scatter plot matrix and in other forward plots.
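To make the monitored quantity concrete, the following Python sketch computes the minimum absolute deletion residual for a single step of the search. It is an illustration only, not the toolbox code. It assumes a simple regression with an intercept and one explanatory variable, and uses the usual definition of the deletion residual for a unit i outside the fitted subset: e_i / (s * sqrt(1 + h_i)), with e_i the residual of unit i, s the residual standard deviation estimated on the subset and h_i the leverage of unit i relative to the subset. The function name and the toy data in the usage note are hypothetical.

```python
import math

def min_deletion_residual(x, y, subset):
    """Return (value, unit) for the smallest absolute deletion residual
    among the units NOT in `subset`, or None if every unit is in the subset.

    The model y = a + b*x is fitted by least squares on `subset` only.
    """
    m = len(subset)
    if m < 3:
        raise ValueError("need at least 3 units to estimate the residual variance")
    xs = [x[i] for i in subset]
    ys = [y[i] for i in subset]
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    sxx = sum((u - xbar) ** 2 for u in xs)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys))
    b = sxy / sxx                       # slope estimated on the subset
    a = ybar - b * xbar                 # intercept estimated on the subset
    rss = sum((v - a - b * u) ** 2 for u, v in zip(xs, ys))
    s = math.sqrt(rss / (m - 2))        # residual standard deviation (assumed > 0)
    in_subset = set(subset)
    best = None
    for i in range(len(x)):
        if i in in_subset:
            continue
        h = 1.0 / m + (x[i] - xbar) ** 2 / sxx      # leverage w.r.t. the subset
        r = abs(y[i] - a - b * x[i]) / (s * math.sqrt(1.0 + h))
        if best is None or r < best[0]:
            best = (r, i)
    return best
```

As a usage example, take x = [0, 1, 2, 3, 4, 5] and y = [0.1, 0.9, 2.1, 2.9, 4.0, 15.0], where the last unit is an outlier. With subset [0, 1, 2, 3] the minimum is attained by unit 4 and is small; with subset [0, 1, 2, 3, 4] only the outlier is left outside and the value is much larger, which corresponds to the peak in the step before the outlier enters.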
The fan plot helps you identify the best value of the Box-Cox transformation parameter for the response of your regression model. The fan plot allows assessment of the proportion of the data supporting a particular transformation, information not available from other methods of analysis. In this plot the option datatooltip can be used to provide information about the unit(s) which enter the search in the selected step. The option databrush, on the other hand, enables the user to select a part of the curve(s) which make up the fan plot and to see the corresponding units highlighted in the scatter plot matrix of transformed and untransformed data and in other forward plots.
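For readers unfamiliar with the transformation family that the fan plot compares, the Python sketch below applies the standard Box-Cox transformation for a given parameter value. It illustrates only the transformation itself, not the test statistic that the fan plot monitors along the search, and it is not the toolbox implementation. The function name box_cox and the grid of parameter values in the usage note are assumptions made for the example.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of strictly positive responses.

    Returns log(y) when lam == 0 and (y**lam - 1) / lam otherwise.
    """
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive responses")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]
```

A typical exploration would evaluate box_cox(y, lam) for a small grid such as -1, -0.5, 0, 0.5 and 1 (an illustrative choice), corresponding to the reciprocal, reciprocal square root, logarithm, square root and no transformation.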
yXplot plots the dependent variable against the columns of the independent variables in the input dataset. The function, based on the MATLAB gplotmatrix function, allows the user to select units in one of the scatter plots and automatically produces a Monitoring Residuals plot in which labels are shown for the units that fulfill various user-defined criteria.