grpstatsFS calls grpstats and reshapes its output into a more readable table
The output of grpstatsFS is a table with a number of rows equal to the number of variables for which statistics are computed. The number of columns of this table is equal to the number of statistics which are computed. By default, six statistics are computed: the non robust and robust indexes of location (mean and median), the non robust and robust indexes of spread (standard deviation and scaled MAD) and the non robust and robust indexes of skewness (skewness and medcouple).
The robust index of skewness is the medcouple. The scaled MAD is defined as 1.4826*med(|x-med(x)|).
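As a minimal sketch of the scaled MAD definition above, the quantity can be computed with base MATLAB functions only. The data vector is made up for illustration and this is not taken from the grpstatsFS source code.

```matlab
% Illustrative data (made up for this sketch): one large outlier
x = [1 2 3 4 100];

% Median of the absolute deviations from the median
rawMAD = median(abs(x - median(x)));

% Rescale by 1.4826, the factor used in the definition above
scaledMAD = 1.4826*rawMAD;

disp(scaledMAD)
```

With the Statistics and Machine Learning Toolbox, the same value should be returned by 1.4826*mad(x,1), which is the form used as a function handle in one of the examples on this page.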
In the presence of a grouping variable, the number of rows of the output table remains the same, but the number of columns is equal to the number of statistics multiplied by the number of groups. With option 'OutputFormat','nested', which is similar to option 'OutputFormat' in function pivot, the results are displayed using nested tables.
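The way the flat column labels combine statistic names and group categories can be sketched as follows. This only illustrates the naming pattern visible in the example outputs (meanN, medianN, ..., meanCS, medianCS, ...); it is an assumption about the layout, not the actual grpstatsFS implementation, and the variable names are hypothetical.

```matlab
stats  = ["mean" "median"];   % requested statistics (row vector)
groups = ["N"; "CS"];         % categories of the grouping variable (column vector)

% Implicit expansion of string arrays: one row of labels per group
% labels = [meanN medianN; meanCS medianCS]
labels = stats + groups;

% Read the labels group by group: all statistics for N, then all for CS
flatNames = reshape(labels.', 1, []);

% Number of columns = number of statistics times number of groups
disp(flatNames)
disp(numel(flatNames))
```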
grpstatsFS with just one input argument.
statTable = grpstatsFS(TBL, groupvars, whichstats)
grpstatsFS with second input the grouping variable.
statTable = grpstatsFS(TBL, groupvars, whichstats, Name, Value)
Load a table
load citiesItaly.mat
% Compute mean, median, std, MAD, skewness and medcouple
% for the 7 variables of the input table citiesItaly
TBL=grpstatsFS(citiesItaly);
disp(TBL)
                    mean    median       std       MAD    skewness    medcouple
                  ______    ______    ______    ______    ________    _________
    addedval       18096     18469    4941.6    6001.8      0.1079     -0.12451
    depos         7769.1    7661.1    2841.4    3150.5      1.0734     -0.03612
    pensions       10044    9975.2    1230.8    1170.6     0.31692     0.089163
    unemploy      10.173      6.42    7.8789    4.8778      1.0687      0.61846
    export         23.11     21.53    15.642    16.457      0.5797     0.066667
    bankrup       30.467     29.62     12.11     11.49      1.0067    -0.010969
    billsoverd    44.614      40.6    22.783    19.496     0.98031       0.1586
load citiesItaly.mat
% The first 46 rows refer to provinces located in northern Italy and
% the remaining rows to provinces in centre-south Italy.
zone=[repelem("N",46) repelem("CS",57)]';
% Add zone to citiesItaly
citiesItaly.zone=zone;
TBL=grpstatsFS(citiesItaly,"zone");
% The first 6 columns refer to the provinces of the north.
% The remaining columns refer to the other provinces.
disp(TBL)
                   meanN    medianN      stdN      MADN    skewnessN    medcoupleN    meanCS    medianCS     stdCS     MADCS    skewnessCS    medcoupleCS
                  ______    _______    ______    ______    _________    __________    ______    ________    ______    ______    __________    ___________
    addedval       22071      21936    2926.4    2136.5      0.78292     -0.024878     14888       14049    3760.7    3500.3       0.83573        0.23097
    depos         9614.6     9407.5    2214.3    1052.6       2.9736     -0.054686    6279.7      5506.5    2389.6    1810.8        1.5434        0.37286
    pensions       10798      10658    891.71    716.05       0.8251       0.14446    9434.9      9465.1    1129.5    1103.6       0.95725       -0.15654
    unemploy      4.4713       4.34    1.6834    1.7643      0.81049     0.0022075    14.774       15.25    7.9086    10.126       0.35304       -0.10805
    export        30.366     29.025    13.042     13.84      0.48928      0.088586    17.255       12.94    15.193    14.366        1.1879        0.29376
    bankrup       27.855     27.745    10.451    10.215      0.73877      -0.10754    32.575       31.64    13.009    12.691        1.0051       0.010132
    billsoverd    31.775       30.2    14.368    10.156       1.2639      0.056604    54.975       52.34    23.127    19.422       0.68542       0.069632
The second input argument is empty, that is, there is no grouping variable.
load citiesItaly.mat
% Just compute mean and median
stats=["mean" "median"];
TBL=grpstatsFS(citiesItaly,[],stats);
disp(TBL)
                    mean    median
                  ______    ______
    addedval       18096     18469
    depos         7769.1    7661.1
    pensions       10044    9975.2
    unemploy      10.173      6.42
    export         23.11     21.53
    bankrup       30.467     29.62
    billsoverd    44.614      40.6
load citiesItaly.mat
% Multiple types of summary statistics specified as a cell
stats={"mean","std",@skewness};
TBL=grpstatsFS(citiesItaly,[],stats);
disp(TBL)
                    mean       std    skewness
                  ______    ______    ________
    addedval       18096    4941.6      0.1079
    depos         7769.1    2841.4      1.0734
    pensions       10044    1230.8     0.31692
    unemploy      10.173    7.8789      1.0687
    export         23.11    15.642      0.5797
    bankrup       30.467     12.11      1.0067
    billsoverd    44.614    22.783     0.98031
load citiesItaly.mat
% Note that in this case meanci has in output two columns
stats={"meanci" 'mean'};
% Confidence interval for the sample means
TBL=grpstatsFS(citiesItaly,[],stats);
disp(TBL)
                  meanCIinf    meanCIsup      mean
                  _________    _________    ______
    addedval          17131        19062     18096
    depos            7213.7       8324.4    7769.1
    pensions         9803.2        10284     10044
    unemploy         8.6327       11.712    10.173
    export           20.053       26.167     23.11
    bankrup          28.101       32.834    30.467
    billsoverd       40.161       49.066    44.614
load citiesItaly.mat
% Note that in this case meanci has in output two columns
stats={"meanci" 'mean'};
% The first 46 rows are referred to provinces located in northern Italy and
% the remaining in centre-south Italy.
zone=[repelem("N",46) repelem("CS",57)]';
% Add zone to citiesItaly
citiesItaly.zone=zone;
% Confidence interval for the sample means separated for the 2 groups
TBL=grpstatsFS(citiesItaly,"zone",stats);
disp(TBL)
                  meanCIinfN    meanCIsupN     meanN    meanCIinfCS    meanCIsupCS    meanCS
                  __________    __________    ______    ___________    ___________    ______
    addedval           21202         22940     22071          13891          15886     14888
    depos               8957         10272    9614.6         5645.6         6913.7    6279.7
    pensions           10533         11063     10798         9135.3         9734.6    9434.9
    unemploy          3.9714        4.9712    4.4713         12.675         16.872    14.774
    export            26.493        34.239    30.366         13.224         21.286    17.255
    bankrup           24.752        30.959    27.855         29.124         36.027    32.575
    billsoverd        27.508        36.042    31.775         48.839         61.111    54.975
load citiesItaly.mat
% Note that in this case meanci has in output two columns
stats={"meanci" 'mean'};
% The first 46 rows are referred to provinces located in northern Italy and
% the remaining in centre-south Italy.
zone=[repelem("N",46) repelem("CS",57)]';
% Add zone to citiesItaly
citiesItaly.zone=zone;
% 99 per cent confidence intervals for the sample means separated for the 2 groups
TBL=grpstatsFS(citiesItaly,"zone",stats,'Alpha',0.01);
disp(TBL)
                  meanCIinfN    meanCIsupN     meanN    meanCIinfCS    meanCIsupCS    meanCS
                  __________    __________    ______    ___________    ___________    ______
    addedval           20911         23232     22071          13560          16217     14888
    depos             8736.5         10493    9614.6         5435.7         7123.6    6279.7
    pensions           10445         11152     10798           9036         9833.9    9434.9
    unemploy          3.8037        5.1389    4.4713          11.98         17.567    14.774
    export            25.194        35.538    30.366         11.889         22.621    17.255
    bankrup           23.711            32    27.855         27.981          37.17    32.575
    billsoverd        26.077        37.473    31.775         46.807         63.143    54.975
load citiesItaly.mat
% Note that in this case meanci has in output two columns
stats={"meanci" 'mean'};
% The first 46 rows are referred to provinces located in northern Italy and
% the remaining in centre-south Italy.
zone=[repelem("N",46) repelem("CS",57)]';
% Add zone to citiesItaly
citiesItaly.zone=zone;
% Confidence interval for the sample means separated for the 2 groups
TBL=grpstatsFS(citiesItaly,"zone",stats,'DataVars',["addedval" "unemploy"]);
disp(TBL)
                meanCIinfN    meanCIsupN     meanN    meanCIinfCS    meanCIsupCS    meanCS
                __________    __________    ______    ___________    ___________    ______
    addedval         21202         22940     22071          13891          15886     14888
    unemploy        3.9714        4.9712    4.4713         12.675         16.872    14.774
load citiesItaly.mat
% Median and mean for variables 1, 2 and 5, with custom column names
stats={"median" 'mean'};
TBL=grpstatsFS(citiesItaly,[],stats, ...
    'DataVars',[1 2 5],'VarNames', ...
    ["Robust location" "Non robust location"]);
disp(TBL)
                Robust location    Non robust location
                _______________    ___________________
    addedval              18469                  18096
    depos                7661.1                 7769.1
    export                21.53                  23.11
load citiesItaly.mat
% Note that in this case meanci has in output two columns
stats={"meanci" 'mean'};
% The first 46 rows are referred to provinces located in northern Italy and
% the remaining in centre-south Italy.
zone=[repelem("N",46) repelem("CS",57)]';
% Add zone to citiesItaly
citiesItaly.zone=zone;
% Confidence interval for the sample means separated for the 2 groups
TBL=grpstatsFS(citiesItaly,"zone",stats, ...
    'DataVars',["addedval" "unemploy"],'VarNames', ...
    ["Mean: lower confidence interval" "Mean: upper confidence interval" "Sample mean"]);
disp(TBL)
                Mean: lower confidence intervalN    Mean: upper confidence intervalN    Sample meanN    Mean: lower confidence intervalCS    Mean: upper confidence intervalCS    Sample meanCS
                ________________________________    ________________________________    ____________    _________________________________    _________________________________    _____________
    addedval                               21202                               22940           22071                                13891                                15886            14888
    unemploy                              3.9714                              4.9712          4.4713                               12.675                               16.872           14.774
load citiesItaly.mat
% Two measures of location and dispersion
stats={'mean' 'median' 'std' @(x)1.4826*mad(x,1)};
% The first 46 rows are referred to provinces located in northern Italy and
% the remaining in centre-south Italy.
zone=[repelem("N",46) repelem("CS",57)]';
% Add zone to citiesItaly
citiesItaly.zone=zone;
% Requested statistics for the 2 groups using nested tables
TBL=grpstatsFS(citiesItaly,"zone",stats,'OutputFormat','nested');
format bank
disp(TBL)
                          mean                   median                std           (x)1.4826*mad(x,1)
                  ____________________    ____________________    __________________    __________________
                         N          CS           N          CS          N         CS          N         CS
                  ________    ________    ________    ________    _______    _______    _______    _______
    addedval      22071.47    14888.47    21936.30    14049.16    2926.39    3760.72    2136.45    3500.31
    depos          9614.60     6279.66     9407.50     5506.50    2214.32    2389.55    1052.64    1810.80
    pensions      10798.16     9434.94    10658.22     9465.15     891.71    1129.48     716.05    1103.65
    unemploy          4.47       14.77        4.34       15.25       1.68       7.91       1.76      10.13
    export           30.37       17.25       29.02       12.94      13.04      15.19      13.84      14.37
    bankrup          27.86       32.58       27.75       31.64      10.45      13.01      10.22      12.69
    billsoverd       31.77       54.97       30.20       52.34      14.37      23.13      10.16      19.42
TBL
— Input data.
Table. Table containing n observations on p variables.
Rows of TBL represent observations, and columns represent variables. If it is necessary to compute the statistics for subgroups, TBL must include at least one grouping variable, which you specify using groupvars.
Data Types: single | double
groupvars
— Grouping variable.
Identifiers for the grouping variables in input TBL. If groupvars is [], the output refers to the overall sample. For additional information on groupvars see the help of grpstats.
Example - 'groupvars',2
Data Types: character vector | string array | cell array of character vectors | vector of positive integers | logical vector | []
whichstats
— Types of summary statistics.
Names of the statistics which have to be computed. For additional information on whichstats see the help of grpstats. If whichstats is empty or not specified, the summary statistics which are computed are ["mean" "median" "std" "MAD" "skewness" "medcouple"].
Example - ["mean" "std"]
Data Types: character vector | string array | function handle | cell array of character vectors or function handles.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Alpha',0.01, 'DataVars',[2 4], 'VarNames',["location" "robust location"], 'OutputFormat','nested'
Alpha
— Significance level.
Scalar in [0 1). Significance level for confidence and prediction intervals.
For additional information on Alpha see the help of grpstats.
Example: 'Alpha',0.01
Data Types: double
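To make the role of Alpha concrete, the sketch below computes a 100(1-Alpha) per cent confidence interval for a sample mean by hand. It assumes that the "meanci" statistic is the usual Student t interval, mean plus or minus the t quantile times the standard error; that is an assumption about grpstats, so check its help for the exact definition. The data vector is made up, and tinv requires the Statistics and Machine Learning Toolbox.

```matlab
x     = [4.2 5.1 3.8 6.0 5.5 4.9 5.3 4.4];  % made-up sample for illustration
alpha = 0.01;                               % same value as in 'Alpha',0.01

n  = numel(x);
m  = mean(x);
se = std(x)/sqrt(n);              % standard error of the mean
tq = tinv(1 - alpha/2, n - 1);    % Student t quantile with n-1 degrees of freedom

% Assumed form of the interval: lower and upper limits around the mean
ci = [m - tq*se, m + tq*se];
disp(ci)
```

A smaller Alpha gives a wider interval, which is consistent with the example above where the limits obtained with 'Alpha',0.01 lie further from the group means than those of the default call.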
DataVars
— Table variables for which to compute summary statistics.
For additional information on DataVars see the help of grpstats.
Example: 'DataVars',[2 4]
Data Types: character vector | string array | cell array of character vectors | vector of positive integers | logical vector
VarNames
— Variable names for output table.
Cell array of character vectors | string array. Variable (column) names for the output table statTable, specified as a string array or a cell array of character vectors. The length of VarNames must be equal to the number of columns produced by the requested statistics (for example, "meanci" produces two columns). By default, grpstatsFS removes the @ if it is present in the name of the statistic. In the presence of a grouping variable, grpstatsFS appends the name corresponding to each category of the groups.
Example: 'VarNames',["location" "robust location"];
Data Types: character vector | string array | cell array of character vectors
OutputFormat
— Column hierarchy output format.
'flat' (default) | 'nested'. In the presence of one or more grouping variables, this option specifies whether the label of each statistic must be concatenated with the categories of the grouping variables ('flat') or the overall output table must contain nested tables, each referred to a different statistic ('nested').
Example: 'OutputFormat','nested';
Data Types: character | string
statTable
— Summary statistics for the table input TBL.
Table with p rows.
The rows refer to the variables of the input table and the columns to the requested statistics.
The number of columns of statTable is equal to the number of requested statistics multiplied by the number of groups.