Getting Started with FSDA Toolbox

Analyze and model data using robust statistics estimators

Key Features

FSDA Toolbox™ provides statisticians, engineers, scientists, researchers, and financial analysts with a comprehensive set of tools to assess and understand their data. Flexible Statistics Data Analysis Toolbox™ software includes functions and interactive tools for analyzing and modeling data, and for learning and teaching statistics.

The Flexible Statistics Data Analysis Toolbox™ provides a set of routines for robust and efficient regression analysis. In addition, it offers a rich set of interactive graphical tools for exploring the connections among the features shown in the different forward plots.

All Flexible Statistics Data Analysis Toolbox™ functions are written in the open MATLAB® language. This means that you can inspect the algorithms, modify the source code, and create your own custom functions.

Analyze and model data using flexible, robust statistical methods

FSDA is a software library that supports robust and efficient statistical analysis of data sets, so that the output is not unduly affected by anomalies in the data or by deviations from model assumptions. The tool:

  • Is especially useful for detecting potential anomalies (outliers) in data, even when they occur in groups.

  • Can be used to identify sub-groups in heterogeneous data.

  • Extends functionality in key statistical domains requiring robust analysis (cluster analysis, discriminant analysis, model selection, data transformation).

  • Integrates instruments for interactive data visualization and modern exploratory data analysis, designed to simplify the interpretation of the statistical results by the end user.

  • Provides statisticians, engineers, scientists, and financial analysts with a comprehensive set of tools to assess and understand their data.

  • Provides practitioners, students and teachers with functions and graphical tools for modeling complex data, learning and teaching statistics.

FSDA is developed for wide applicability. Because it addresses problems that center on anomalies in the data, it is expected to be used in applications such as anti-fraud, detection of computer network intrusions, e-commerce and credit card fraud, customer and market segmentation, detection of spurious signals in data acquisition systems, chemometrics (a wide field covering biochemistry, medicine, biology and chemical engineering), and issues related to the production of official statistics (e.g. imputation and data quality checks).

System requirements

FSDA:

  • Works with MATLAB releases from the current one back to releases up to 5 years older (for example, if the current release is R2022a, MATLAB is supported from release R2017a) and uses the Statistics Toolbox.

  • Has been tested on Microsoft Windows as well as UNIX (Linux and Mac OS X) platforms.

  • Can be installed on Windows platforms automatically, with a setup program that also opens, inside the MATLAB editor, files containing a series of examples of use in typical statistical problems.

  • Can be installed on UNIX platforms manually, from a compressed tar file, or automatically from a shell script.

  • Works in GNU Octave without major porting adaptations, but with limitations on the interactive graphical functions. Octave is a software development environment mostly compatible with MATLAB, freely redistributable under the terms of the GNU General Public License (GPL) of the Free Software Foundation (http://www.gnu.org/).

Main features include:

  1. Different categories of robust statistical functions, covering:
  • Regression analysis;

  • Multivariate analysis;

  • Data transformations in regression and multivariate applications;

  • Model selection;

  • Clustering;

  • Correspondence analysis;

  • Interactive statistical visualization.

  2. Graphical user interfaces that allow exploring the main interactive graphics features on plots generated by running different statistical functions.

  3. Didactic material in the form of movies (with audio) that can be run in a browser connected to the Internet.

  4. A rich collection of popular datasets provided in different data formats and fully documented.

  5. A comprehensive documentation system:

  • The head of each m-function describes the function purpose, the input-output parameters, the bibliographic references, possible function dependencies, third party acknowledgments and self-contained examples of use.
  • The corresponding html documentation pages are obtained using a parser, publishFS.m, which recognizes the key elements of the head of each m-function (provided they conform to the syntactical rules documented in publishFS.m). Using publishFS.m, FSDA users can generate documentation pages for their own new functions.
  • The extensive html documentation is integrated in the documentation system of MATLAB and can be conveniently explored from the standard MATLAB search box.

Description of use

FSDA is a MATLAB toolbox and is therefore used within MATLAB. Typically, the user writes new scripts and/or functions containing FSDA statements in an ordinary text file with the '.m' extension, and executes the code from the standard MATLAB Command Window with a single command (the file name, without the '.m' extension).

Many FSDA scripts automating steps of typical robust statistics tasks are available for trial in the head of each FSDA m-function and in example files.

Like all MATLAB functions, FSDA functions accept input arguments and return output arguments, for example:

function [out] = FSReda(y,X,bsb,varargin)

For many functions the set of input/output parameters is so rich that it is neither convenient nor possible to treat them comprehensively in these introductory pages. Details on a specific option are retrieved from the MATLAB Command Window by typing

docsearchFS(file_name)

The order in which the optional input parameters are set does not matter.
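
The order independence of optional parameters can be illustrated with a minimal sketch in plain MATLAB. The option names and default values below are hypothetical and are not taken from any FSDA function; the loop only mimics the general idea of overwriting defaults with the name/value pairs supplied by the user.

```matlab
% Hypothetical default options (illustrative names, not an FSDA API).
defaults = struct('intercept', true, 'plots', 0, 'conflev', 0.95);

% The same two options supplied in two different orders.
argsA = {'plots', 1, 'intercept', false};
argsB = {'intercept', false, 'plots', 1};

% Overwrite the defaults with each name/value pair in turn.
optA = defaults;
for k = 1:2:numel(argsA)
    optA.(argsA{k}) = argsA{k+1};
end

optB = defaults;
for k = 1:2:numel(argsB)
    optB.(argsB{k}) = argsB{k+1};
end

% Both orderings lead to the same set of options.
sameOptions = isequal(optA, optB);
```

Because each pair is matched by name rather than by position, the final options depend only on which names were supplied, and any option left out keeps its default.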

Typically, even a well-trained practitioner will use only a few of the optional parameters available. On the other hand, a researcher can experiment with many internal variables controlled by optional parameters, without being forced to touch the source code.

The use of very flexible, and thus elaborate, options is simplified by the adoption of data types of increasing complexity. For example, option databrush, which controls the interactive brushing features of the FSDA dynamic statistical visualization tools, can simply be omitted if the purpose is to produce traditional static plots. It can be set to a scalar (e.g. databrush=1) to make a single data selection, or to a MATLAB structure (e.g. databrush.persist='on') to make an indefinite number of selections. In general, when an option becomes a structure, the list of possible fields is automatically set to default values and the user only has to set what is of interest.
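
The way a structure-valued option can be completed with defaults is sketched below in plain MATLAB. This is an illustration of the general mechanism, not the FSDA implementation: only the field persist comes from the text above, while the other field names and all default values are assumptions made for the example.

```matlab
% Hypothetical defaults for a structure-valued option. Only 'persist' is
% mentioned in the text; 'color' and 'labeladd' are illustrative assumptions.
defaultBrush = struct('persist', '', 'color', 'r', 'labeladd', '');

% The user sets only the field of interest.
databrush = struct();
databrush.persist = 'on';

% Start from the defaults and overwrite the fields the user supplied.
brush = defaultBrush;
userFields = fieldnames(databrush);
for k = 1:numel(userFields)
    brush.(userFields{k}) = databrush.(userFields{k});
end
```

After the loop, brush holds the user's choice for persist and the default values for every other field.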

The output parameters are handled according to the same principle: when a function generates a lot of information, it is organized in an output structure, so that the user can extract only the fields of major interest.
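
A minimal sketch of working with an output structure follows. It assumes that FSDA is on the MATLAB path (the block is skipped otherwise), uses simulated data invented for the example, and relies on the LXS field bs and the FSReda field RES as described in the FSDA documentation; the current field list should be checked with docsearchFS.

```matlab
% Runs only if FSDA is on the MATLAB path; otherwise the block is skipped.
if exist('FSReda', 'file') == 2
    rng(1);                          % reproducible simulated data (illustrative)
    n = 100;
    p = 3;
    X = randn(n, p);
    y = randn(n, 1);
    y(1:5) = y(1:5) + 6;             % shift five responses to act as outliers

    % Robust fit used to choose the initial subset for the forward search.
    outLXS = LXS(y, X, 'nsamp', 1000);

    % Forward search monitoring, started from the units in outLXS.bs.
    out = FSReda(y, X, outLXS.bs);

    % List what the output structure contains, then keep one field only.
    availableFields = fieldnames(out);
    RES = out.RES;                   % scaled residuals monitored along the search
end
```

Listing the field names first is a convenient way to see what a function returns before deciding which fields to keep.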

A second way to use FSDA is through graphical user interfaces (GUIs), which perform tasks interactively through controls such as buttons and sliders. The user can develop GUIs for FSDA using the standard MATLAB instruments. A few GUIs are integrated in the FSDA distribution, but they are mainly designed for demonstration purposes.

Generate the documentation of your own FSDA functions

Almost all FSDA functions are written in the open MATLAB language. The use of mex files obtained by compiling code written in C or other languages is minimal and is always accompanied by the corresponding MATLAB function. This is to facilitate the understanding of the algorithms implemented and to encourage the user to enrich the toolbox with new functions.

Of course, each new function should be documented. It is customary for a MATLAB user to document new functions in the head of the .m file. Only rarely is the user prepared to duplicate the effort and work on the corresponding .html documentation file. This is understandable, since the complete integration of new .html files in the standard MATLAB documentation system is not facilitated by built-in tools. To help the user in this time-consuming but valuable task, FSDA provides some tools, which should be used in the following order:

  1. publishFS. This is a parser that generates the .html documentation page of a structured .m file.
    • The head of the .m file must contain documentation written in compliance with the syntactical rules given in the documentation of publishFS. The syntactical rules are rather intuitive, but they are many and not easy to remember if applied only occasionally. For this reason, it may be good practice to write documentation starting from an existing FSDA .m file, and to refer to the documented syntactical rules only when the parsing errors and warnings seem difficult to interpret.
    • The tail of the .m file (i.e. any line after the end function statement) must contain a tag that identifies the category to which the new function is thought to belong. If the tag is not present, the parser does not generate the documentation page.
  2. makecontentsfileFS.m. This function generates personalized .contents files of the functions given in a folder and selected subfolders. The files to include inside the contents can be filtered according to their filename or content. makecontentsfileFS also returns an output structure containing the list of the files together with information on their location, creation dates, and so on.
    The .m function publishFSallFiles calls routine publishFS for the list of files generated by makecontentsfileFS.

  3. publishFunctionAlpha.m and publishFunctionCate.m. These functions generate, respectively, the alphabetical and categorical index pages of the documentation system, starting from the list generated by makecontentsfileFS. An option of publishFunctionAlpha.m produces a .txt file (named function-alpha.txt) which contains the names of all files indexed by makecontentsfileFS, separated by commas. Each HTML page created automatically by the parser publishFS includes a JavaScript script which reads function-alpha.txt and adds a navigation bar to the previous and next file in alphabetical order.

  4. publishBibliography.m. This function generates the bibliography page starting from the output of routine publishFSallFiles.

Do you want to contribute to the FSDA project?

If you have reached the point of writing new functions and documentation pages compliant with the FSDA philosophy, you have enough energy to take part in the FSDA project. In this case, please check our websites (https://github.com/UniprJRC/FSDA and http://rosa.unipr.it/) for open projects and feel free to contact us at fsda@unipr.it.

