Dynamic Statistical Visualization in the Forward Search


The forward search is a powerful general method for detecting multiple masked outliers and for determining their effect on inferences about models fitted to data. By monitoring a series of statistics computed on subsets of the data of increasing size, we obtain multiple views of any hidden structure.

 

Interesting features often emerge unexpectedly during the progression of the forward search, and only when a specific combination of forward plots is inspected at the same time. The analyst should therefore be able to interact with the plots and to redefine or refine the links among them. Without dynamic linking and interaction tools, the analyst risks missing relevant hidden information. For this reason the toolbox provides the user with flexible interaction tools.

 

Using the option databrush in the graphical functions (e.g. resfwdplot or mdrplot), it is possible to select trajectories in the monitoring residuals plot and to see the corresponding units highlighted in the minimum deletion residual plot and in the scatter plot matrix. Similarly, it is possible to start brushing from the minimum deletion residual plot or from the scatter plot matrix and to see the corresponding trajectories highlighted in the monitoring residuals plot.
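
As a minimal sketch of this workflow, the lines below simulate a small regression data set, run the forward search and open the monitoring residuals plot with brushing enabled. The simulated data, the contamination of the first five units and the variable names are illustrative assumptions. The calls to LXS and FSReda follow the usual FSDA pattern, but their options may differ across toolbox versions.

```matlab
% Illustrative data (an assumption, not taken from the toolbox):
% 100 units, 3 explanatory variables, first 5 responses shifted
% so that they behave as outliers.
rng(1);
n = 100;
X = randn(n,3);
y = X*[1; 1; 1] + randn(n,1);
y(1:5) = y(1:5) + 6;

% Robust starting subset (least median of squares), then the forward search.
outLXS = LXS(y, X, 'nsamp', 1000);
out    = FSReda(y, X, outLXS.bs);

% Open the monitoring residuals plot with brushing enabled.
% Select some trajectories with the mouse to see the corresponding
% units highlighted in the linked plots.
resfwdplot(out, 'databrush', 1);
```

To start from the minimum deletion residual plot instead, the last line would presumably become mdrplot(out, 'databrush', 1). Both calls are interactive and wait for a mouse selection.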

MATLAB includes a data brushing facility for marking observations on graphs and allowing the user to remove them or save them to new variables. There is also a related data linking facility that connects graphs with workspace variables, so that the graphs are updated automatically and interactively. In the forward search context the applicability of these functions is limited, because with the link function two graphs can be connected, and therefore brushed, only when they refer to the same variables in the workspace.

For example, linking the monitoring residuals plot with the scatter plot matrix is not possible, as the former refers to the residuals of the units at each step of the search, while the latter refers to the values of the units themselves. Moreover, plots created with line, and those produced by the gplotmatrix function in particular, cannot be brushed with the standard MATLAB brush function.

For these reasons, in the FSDA toolbox we have re-implemented the brushing and linking facilities. In particular, we have adopted (and adapted to our needs) the powerful selectdata function by John D'Errico, available from the MATLAB Central File Exchange (http://www.mathworks.com/matlabcentral/fileexchange/13857).

Brushing is implemented in the FSDA toolbox in two modalities: a non-persistent one, where the user can make a selection only once, and a persistent one, where the selection can be repeated multiple times. Persistent brushing in turn comes in two variants: a non-cumulative one, where each new brushing action removes the previous selections, and a cumulative one, where each selection remains highlighted and is reported in the legend of the graphs involved.
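
The three modes can be sketched as follows. This assumes that databrush accepts a structure with a persist field, where 'off' requests repeated non-cumulative brushing and 'on' requests repeated cumulative brushing, while a scalar value gives a single selection. These field names and values are an assumption based on the FSDA documentation and should be checked against the installed version. The data are simulated for illustration only.

```matlab
% Illustrative setup (assumed data): run the forward search once.
rng(1);
X = randn(100,3);
y = X*[1; 1; 1] + randn(100,1);
outLXS = LXS(y, X, 'nsamp', 1000);
out    = FSReda(y, X, outLXS.bs);

% 1) Non-persistent: a single selection, then brushing ends.
resfwdplot(out, 'databrush', 1);

% 2) Persistent, non-cumulative (assumed value 'off'):
%    each new selection replaces the previous one.
db = struct;
db.persist = 'off';
resfwdplot(out, 'databrush', db);

% 3) Persistent, cumulative (assumed value 'on'):
%    each new selection is added to the previous ones
%    and listed in the legends of the linked plots.
db.persist = 'on';
resfwdplot(out, 'databrush', db);
```

Each call is interactive, so in practice one would run only the variant of interest rather than all three in sequence.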

The above features are complemented by our customized version of the standard MATLAB datatip option: once a point in a forward trajectory is selected, a tooltip box reports relevant information about the associated unit(s) and the related statistics.