pcaFS performs Principal Component Analysis (PCA) on raw data.
The main differences with respect to the MATLAB function pca are that pcaFS: 1) accepts the input matrix also as a table;
2) returns, in table format, the individual and cumulative percentage of variance explained by the components, together with the associated scree plot, to help decide how many components to retain;
3) returns the loadings in table format and shows them graphically;
4) provides guidelines about the automatic choice of the number of components;
5) returns the communalities for each variable with respect to the first k principal components in table format;
6) returns the orthogonal distance (OD_i) of each observation to the PCA subspace. For example, if the subspace is defined by the first two principal components, OD_i is computed as:
OD_i=|| z_i- V_{(2)} V_{(2)}' z_i ||
where z_i is the i-th row of the original centered data matrix Z of dimension n \times v and V_{(2)}=(v_1 v_2) is the matrix of size v \times 2 containing the first two eigenvectors of Z'Z/(n-1). Observations with large OD_i are not well represented in the space of the principal components.
7) returns the score distance SD_i of each observation. For example, if the subspace is defined by the first two principal components, SD_i is computed as:
SD_i=\sqrt{(z_i'v_1)^2/l_1+ (z_i'v_2)^2/l_2 }
where l_1 and l_2 are the first two eigenvalues of Z'Z/(n-1).
8) calls the app biplotFS, which produces an interactive biplot in which points, row labels or arrows can be shown or hidden. The app also lets the user control the length of the arrows and the position of the row points through two interactive slider bars, and color the row points according to the orthogonal distance (OD_i) of each observation to the PCA subspace. If the optional input argument bsb or bdp is specified, the app shows two tabs which let the user select the breakdown point of the analysis or the subset size to use in the svd. Units declared as outliers, or units outside the subset, are shown in the biplot with filled circles.
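The orthogonal distance and the score distance described in items 6) and 7) can be illustrated with a minimal sketch in base MATLAB. This is not the code of pcaFS: the data are simulated, the variable names (Vk, T, OD, SD) are chosen here for illustration, and the data are only centered, as in the formulas above.

```matlab
% Sketch: orthogonal and score distances for the subspace spanned by the
% first k principal components. Simulated data and names are assumptions.
rng(1);                               % reproducible illustrative data
n = 50; v = 4; k = 2;                 % n observations, v variables, k components
Y = randn(n, v) * [2 0 0 0; 1 1 0 0; 0 0 0.5 0; 0 0 0 0.2];

Z = Y - mean(Y, 1);                   % centered data matrix Z (n-by-v)
S = (Z' * Z) / (n - 1);               % covariance matrix Z'Z/(n-1)

[V, L] = eig(S);                      % eigenvectors and eigenvalues of S
[l, idx] = sort(diag(L), 'descend');  % order eigenvalues from largest to smallest
V  = V(:, idx);
Vk = V(:, 1:k);                       % v-by-k matrix of the first k eigenvectors

T  = Z * Vk;                          % scores z_i'v_j (n-by-k)
OD = sqrt(sum((Z - T * Vk').^2, 2));  % OD_i = || z_i - V_(k) V_(k)' z_i ||
SD = sqrt(sum(T.^2 ./ l(1:k)', 2));   % SD_i = sqrt( sum_j (z_i'v_j)^2 / l_j )
```

Two quick checks follow from the definitions: for every observation OD_i^2 plus the squared norm of its scores equals ||z_i||^2, and the sum of SD_i^2 over all observations equals k(n-1).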
Use of pcaFS on the ingredients dataset.
out=pcaFS(Y, Name, Value)