The FSDA group: milestones and synergies

The group ...

The core of the FSDA group consists of members of the division of Statistics and Computing of the University of Parma and of the Department of Statistics of the London School of Economics (hereafter ParmaLSE): Anthony Atkinson (LSE), Marco Riani and Andrea Cerioli (Parma). Their collaboration has resulted in two books published by Springer-Verlag New York (Atkinson and Riani, 2000; Atkinson, Riani and Cerioli, 2004) and many papers in prestigious international journals. The last 20 years of joint research by ParmaLSE and their research partners can be summarized as follows: 

1994-2002. Introduction of the new approach based on the forward search. At that time we were completely unaware of much of the potential of the new approach, and we did not think of it as a new philosophy. The emphasis was on a new technique of exploratory data analysis in linear and nonlinear regression and generalized linear models. In the case of a single population with contaminated data, the new approach proved to be a powerful tool for detecting the presence of outliers, both individual and grouped, and for evaluating their effect on the results of traditional analyses (Riani and Atkinson, 2000). 


2002-2004. Extension of this technique to the analysis of multivariate data, based on a combination of statistical modelling and graphical diagnostics. In these years we also realized the importance of monitoring statistics along the search and developed diagnostic tools based specifically on monitoring plots. Our "monitoring philosophy" started in these years.


2005-2008. In this period we began to use the Forward Search not only as an exploratory tool but also for inferential purposes, and we developed a theory for the construction of robust confidence bands for the curves monitored along the search. This work laid the foundation for the automatic application of the procedure in multivariate analysis (Riani et al. 2009). Automation of the procedure is clearly necessary in concrete applications where the Forward Search has to be repeated many times. One of the objectives of the team is to extend this automated procedure to multiple regression or, more generally, to the analysis of complex data. The first step in this direction can be found in Riani et al. (2008).


2005. Well before the FSDA toolbox, Dr. Konis implemented the Forward Search in S-Plus and R for the analysis of multivariate data (e.g. the Rfwdmd software, downloadable from CRAN). Unfortunately, lack of resources prevented these implementations from being maintained. Almost ten years later, Luca Scrucca revitalized the package ‘forward’. 


2008. The need to routinely apply the Forward Search and robust methods to huge amounts of international trade data and related antifraud problems led to a new collaboration between the ParmaLSE team and the European Commission's Joint Research Centre. The people originally involved in this project at the JRC were Domenico Perrotta, Francesca Torti (who also worked for several years for the University of Parma), Vytis Kopustinskas and the head of the group, Spyros Arsenis. The joint activities of the two teams have continued to the present day, with ever-new stimuli and challenging problems. The FSDA toolbox is one of the main outcomes of this collaboration. 


2010. We started to include in the toolbox all the traditional robust estimators (S, MM and tau) with the different weight functions (Hampel, hyperbolic, optimal and biweight), together with high breakdown estimators (LMS, LTS, MVE and MCD) and their reweighted versions. We also implemented accurate approximations to the finite sample distributions of the corresponding diagnostic tools (Cerioli, Riani and Atkinson, 2009; Cerioli, 2010). In this way FSDA became a general tool for robust data analysis (regression and multivariate) and even changed its name to "Flexible Statistics and Data Analysis toolbox".


2014. The concept of monitoring was applied to the traditional robust estimators (such as S, MM and tau) and to high breakdown estimators based on trimming, both in regression (such as LTS and LMS) and in the multivariate context (e.g. MVE, MCD). 


2015. The main robust clustering methods were introduced into the toolbox: tclust, tkmeans, rlga, and the non-robust counterpart lga. One year later the set of robust clustering tools was enriched with clusterwise regression methods (tclust-reg) and cluster-weighted models (tcwrm). For these extensions we benefited from the valuable help of Agustín Mayo-Iscar.


2016. We have always given special importance to the role of simulations and benchmark experiments in statistical practice. For this reason we introduced into FSDA sound methods for generating mixtures of populations with a user-specified overlap: MixSim and MixSimReg. This quite demanding project benefited from valuable interactions with Volodymyr Melnykov at the Model-Based Clustering workshop series organised in Catania by Salvatore Ingrassia.

2017. We released FSDA with a new documentation system, generated automatically from the header of the .m files. The "documentation project" started at least two years earlier and came to fruition only in 2017. The project continues: our ambition is to make our documentation tools friendly enough to be used by anyone who wants to do the same with their own toolbox. 

2019. We migrated the FSDA version control to GitHub. 



... and its synergies

In recent years the ParmaLSE and JRC teams have created many synergies with new research groups, considerably extending the network of scientists interested in (robust) solutions to antifraud problems and in the Forward Search as an original approach to data analysis. With apologies for any unintentional omissions, we mention some of them here.


2009. Fruitful contacts were established with the University of Rome La Sapienza (Pierluigi Conti and Alessio Farcomeni) with a view to grounding the Forward Search in the theory of empirical processes. The objective is to find the limiting trajectories to which the curves monitored in the Forward Search might converge. 


2009. Collaboration with the University of Verona (Grossi and Laurini, 2009) produced extensions of the new philosophy to the analysis of correlated observations and more specifically to the context of extreme value theory.


2010. New studies on the Forward Search philosophy were initiated by PhD students (now part of the Parma team) on topics such as "The forward search in Bayesian statistics (Aldo Corbellini)" and "Robust classification and mixtures (Gianluca Morelli)". 


2011. A fruitful collaboration started with the robust clustering group of the University of Valladolid (Carlos Matrán, Alfonso Gordaliza, Agustín Mayo-Iscar, Luis Angel García-Escudero), with regular meetings organised in Ispra, Parma and Valladolid. The link with the Valladolid group arose naturally, because of the close relation between their trimming approach (considered e.g. in García-Escudero and Gordaliza, 2007) and the Forward Search. Relevant outcomes of the collaboration include methodological studies (e.g. on the determination of the number of groups in TCLUST), applications of TCLUST and RLGA in antifraud, and new FSDA functions for trimming-based clustering methods, including tools for generating simulated clustering data with a prespecified level of overlap (MixSim). 


2012. FSDA routines were adopted by ISTAT, the Italian National Institute of Statistics, for the correction and control of Italian Agriculture Census data (Reale, Bianchi, Francescangeli, Ruocco and Manzari, ISTAT).


2013-2014. This may become an important milestone, because we are convinced that the lack of good statistical software is a major factor limiting the diffusion of the new philosophy and of robust statistics in general. The milestone concerns the porting of the main FSDA Forward Search functions to SAS IML and SAS IML Studio. The work, initiated in 2008 by the JRC trainee Tomas Demcenko, was revitalized and completed at the JRC by Francesca Torti in collaboration with SAS staff Koen Knapen and Jos Polfliet. Thanks to the porting, the Forward Search can now be run on the SAS platform for both inferential and exploratory data analysis, including several of the interactive and dynamic plotting features that characterize FSDA.


2013-2015. A second factor potentially limiting the diffusion of the new philosophy was the lack of a sound asymptotic theory for the Forward Search monitoring process. Johansen and Nielsen (2013, 2014), building on several meetings with the ParmaLSE team, developed pointwise and simultaneous confidence bands, respectively, for estimators and forward residuals. Nielsen also implemented the underlying asymptotic theory in the ‘ForwardSearch’ R package.


2017. The FSDA team started collaborating with Valentin Todorov on porting the key FSDA regression routines to an R package. 

2022. Mara Sabina Bernardi joined the JRC and the FSDA team, bringing expertise in functional data and time series analysis. 

It is worth mentioning other people who share the same philosophy, who have been in contact from time to time with the “ParmaLSE team” and who have already made important theoretical and applied contributions to the forward search literature. For example, Fabio Crosilla of the University of Udine has extended the forward routines to photogrammetric data, considerably improving the classification of spatial objects. Nadia Solaro is currently working to extend this new philosophy to multilevel models. Daniela Calò has investigated the use of forward search routines for finding groups of homogeneous observations in the context of robust cluster analysis. Silvia Salini has recently worked on robust extensions of calibration models using forward search tools. Matilde Bini has used the forward search to develop robust transformations of percentage data, while Bruno Bertaccini, together with Roberta Varriale, has extended the new philosophy to the robust analysis of variance. Moustaki and Mavridis extended the forward search approach to robust factor analysis. Recently, C. Chakraborty and S. S. Dhar extended the asymptotic theory of the monitoring approach to elliptical distributions, while M. Baragilly created new tools, based on the idea of monitoring, for assessing the number of clusters in the presence of non-elliptical distributions.


Finally, we hope for further interaction with researchers using approaches based on dynamic models, which have strong points of contact with our philosophy.


In summary, the growth of the Forward Search from a method into a philosophy and practical approach to data analysis is largely due to the initiative and leadership of Anthony Atkinson, Marco Riani and Andrea Cerioli who, when the method was introduced, were able to move the research well beyond the scientific boundaries as then understood. However, we also believe that the success of this philosophy of data analysis relies on a wider combination of circumstances and people, including: