Introduction to Dynamic Statistical Visualization

The problem of representing data, or any quantitative information, in a graphical form suitable for human interpretation and exploration has deep roots and has been addressed in statistics and in many other scientific disciplines (Friendly 2005). One of the biggest challenges statisticians face when working on applied problems with non-statisticians is presenting and communicating statistical results effectively (Spence 2001, Tufte 1983). In this toolbox we have developed new interactive tools that dynamically connect the information coming from different “robust plots”.

The Flexible Statistics Data Analysis Toolbox™ extends the data visualization functions already present in the Statistics Toolbox and in MATLAB. The extensions concern a series of robust statistics plots which we have made dynamic and interactive. These new features not only help to present the data more clearly to a non-statistical audience, but also enable the researcher to highlight the presence of hidden subgroups of data.

MATLAB includes a data brushing facility for marking observations on graphs and allowing the user to remove them or save them to new variables. There is also a related data linking facility, which connects graphs with workspace variables so that the graphs are updated automatically and interactively. In the Forward Search context the applicability of these powerful functions is limited, because the linking facility can connect two graphs, and therefore allow them to be brushed together, only when they refer to the same variables in the workspace.
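As a minimal sketch of the standard facilities described above, the following lines turn on brushing and linking for an ordinary scatter plot. The variables x and y are illustrative and not part of the toolbox; the example assumes an interactive MATLAB session with a display.

```matlab
% Illustrative workspace variables (not taken from the toolbox).
x = (1:20)';
y = 2*x + 1;

fig = figure;
plot(x, y, 'o');

% Standard MATLAB brushing: mark observations interactively on the graph.
brush(fig, 'on');

% Standard MATLAB linking: the graph is tied to the workspace variables
% it displays, so changes to x or y are reflected in the plot.
linkdata(fig, 'on');
```

A second graph is brushed together with this one only if it also displays x or y, which is the limitation discussed in this section.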

For example, linking the Monitoring Residuals Plot with the Scatter Plot Matrix is not possible, as the former refers to the residuals of the data units at each step of the search, while the latter refers to the unit values. Moreover, line plots created with the line function, and plots produced by the gplotmatrix function in particular, cannot be brushed with the standard MATLAB brush function.

For these reasons, in the FSDA toolbox we have re-implemented the brushing and linking facilities. In particular, we have adopted (and adapted to our needs) the powerful selectdata function by John D'Errico, available from the MATLAB Central File Exchange (http://www.mathworks.com/matlabcentral/fileexchange/13857).

Brushing is implemented in the FSDA toolbox in two modalities: a non-persistent modality, where the user can make a selection only once, and a persistent modality, where the selection can be repeated multiple times. Persistent brushing in turn comes in two variants: a non-cumulative one, where every new brushing action removes the previous selections, and a cumulative one, where each selection remains highlighted and is appropriately reported in the legend of the graphs involved.
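The difference between the two persistent variants can be illustrated with a small bookkeeping sketch. This is not the toolbox's implementation: the variable names (selections, groupNonCum, groupCum) are hypothetical, and the rule that a later selection relabels units already brushed is an assumption made only for the example.

```matlab
% Hypothetical sketch: one group label per unit.
% 0 = not selected; k > 0 = unit selected in the k-th brushing action.
n = 10;                          % number of units (illustrative)
selections = {[2 3], [3 7 8]};   % units picked in two successive brushing actions

% Persistent, non-cumulative: each action discards the previous selection.
groupNonCum = zeros(n, 1);
for k = 1:numel(selections)
    groupNonCum(:) = 0;                 % remove earlier selections
    groupNonCum(selections{k}) = 1;     % keep only the current one
end

% Persistent, cumulative: earlier selections are kept, each with its own label.
groupCum = zeros(n, 1);
for k = 1:numel(selections)
    groupCum(selections{k}) = k;        % assumption: later actions relabel overlapping units
end

disp(find(groupNonCum)');   % units still highlighted in the non-cumulative case
disp(groupCum');            % group label of every unit in the cumulative case
```

In the cumulative case the distinct positive labels correspond to the groups that would be listed in the legend; in the non-cumulative case only the most recent group survives.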

The above features are complemented by our customized version of the standard MATLAB datatip option which, once a point in a forward trajectory is selected, reports in a tooltip box relevant information about the associated unit(s) and the related statistics.