(Evangelos Grigoroudis, Yannis Siskos (Auth.) ) Cus
(Evangelos Grigoroudis, Yannis Siskos (Auth.) ) Cus
(Evangelos Grigoroudis, Yannis Siskos (Auth.) ) Cus
Rosa Arboretti · Arne Bathke
Stefano Bonnini · Paolo Bordignon
Eleonora Carrozzo · Livio Corain
Luigi Salmaso
Parametric and
Nonparametric
Statistics for
Sample Surveys
and Customer
Satisfaction Data
123
SpringerBriefs in Statistics
More information about this series at http://www.springer.com/series/8921
Rosa Arboretti • Arne Bathke • Stefano Bonnini
Paolo Bordignon • Eleonora Carrozzo
Livio Corain • Luigi Salmaso
Parametric and
Nonparametric Statistics for
Sample Surveys and
Customer Satisfaction Data
123
Rosa Arboretti Arne Bathke
Department of Civil and Environmental Natural Science
Engineering University of Salzburg
University of Padova Salzburg, Austria
Padova, Italy
Paolo Bordignon
Stefano Bonnini Department of Management and Engineering
Department of Economics and Management University of Padova
University of Ferrara Padova, Italy
Ferrara, Italy
Livio Corain
Eleonora Carrozzo Department of Management and Engineering
Department of Management and Engineering University of Padova
University of Padova Padova, Italy
Padova, Italy
Luigi Salmaso
Department of Management and Engineering
University of Padova
Padova, Italy
© The Author(s), under exclusive licence to Springer International Publishing AG, part of Springer
Nature 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer International Publishing AG part
of Springer Nature.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This book deals with problems related to the evaluation of customer satisfaction in
very different contexts and in many different ways. Analyzing satisfaction is not an
easy issue since it represents a complex phenomenon which is not directly measur-
able. Often satisfaction about a product or service is investigated through suitable
surveys which try to capture the satisfaction about several partial aspects which
characterize the perceived quality of that product or service.
In this book we present a series of statistical techniques adopted to analyze data
from real situations where customer satisfaction surveys were performed. The aim
is to give a simple guide of the variety of analysis that can be performed when
analyzing data from surveys.
Experiencing satisfaction when customers buy products or services is an index
of how a company is operating. Are goods or services appreciated by customers?
Are customers satisfied with goods or services of a specific company?
How they respond to similar questions is a crucial point in order to evaluate
and analyze the answers. For this purpose preference evaluation methods are good
candidates to understand how customers react to the evaluated items. A promising
and theory-based method is called CUB model. In the first chapter of this book
CUB model has been adopted in order to evaluate two latent variables, feelings and
uncertainty, that are supposed to be involved in the choice process of an item. The
application field refers to a genuine study conducted in Italy and in Austria about
the satisfaction level of customers about food packaging at the grocery store.
The second chapter deals with the concept of heterogeneity in satisfaction. Iden-
tifying customer groups characterized by “within homogeneity” and “between het-
erogeneity” could be a useful starting point of market segmentation. In this chapter
the main heterogeneity indices are introduced and testing methods for comparing
the satisfaction heterogeneities of two or more customer populations are described
also with different practical examples.
In the field of satisfaction assessment it is quite common that the final objective
is obtaining an appropriate ordering of different products or services under compar-
v
vi Preface
ison. From a statistical point of view the issue of ranking several populations from
the best to the worse on the basis of one or more aspects of interest is not so easy. In
the third chapter of this book different examples of contexts where the problem of
ranking occurs are described and a nonparametric inferential approach is presented
with application to the field of food sensory analysis.
Another way to assess satisfaction is represented by the so-called composite in-
dicators which aggregate different dimensions of satisfaction into a single overall
indicator. How to suitably compute such indicator is the topic of Chap. 4. In this
chapter the construction of a composite indicator is discussed in general and a non-
parametric composite indicator which includes different benchmarks of satisfaction
is developed. The properties of the proposed indicator are shown by analyzing data
from a university students’ satisfaction survey.
Finally Chap. 5 describes some rank-based procedures for analyzing surveys data
with the help of a useful R package.
The work was also partially supported by the University of Ferrara, which funded
the FIR (Research Incentive Fund) project “Advanced Statistical Methods for Data
Analysis in Complex Problems.”
vii
Contents
ix
x Contents
The CUB model [12], where CUB stands for Combination of a discrete Uniform
and a shifted Binomial distributions assumes the involvement of two latent variables
during an evaluation process, that have been called feeling and uncertainty. In order
to justify the names for latent variables, consider the way you choose an evaluation
grade from a set of 9. The final choice reflects your feeling about the evaluated item,
your past experience, your knowledge about it, and so on. On the other hand, there
are some aspects concern with a basic uncertainty about the evaluated item, for ex-
ample you are asked to deal with it for the first time and you don’t know what grade
to choose, maybe the task is too difficult or the task is annoying you. These two
main components are supposed to move your final choice and they are supposed
to follow respectively a shifted Binomial distribution and a Uniform distribution
[12, 18]. The CUB models were first described as a suitable method for preference
evaluation [12]. There are two main approaches for evaluating preferences i.e. stated
and revealed preferences (see e.g. [1, 22]) and CUB models have been considered
a stated preference approach [18]. The probabilistic structure described by D’Elia
and Piccolo [12] considers the psychological process of evaluating a specific item
in a survey, where subjects are usually asked to evaluate some items (products, ser-
vices, etc.) by means of ranking or rating scales. The second one is usually the
most preferred by respondents. One of the first applications of CUB models was
described by Piccolo and D’Elia [25] where models were applied to a large data-set
on preferences about smoked salmons. Subjects were asked to evaluate five brands
of smoked salmons on a 9-point scale. In a first step, the authors estimated the two
latent variables, feeling and uncertainty, subsequently they introduced subject’s and
object’s covariates into the model in order to link some relevant information to the
latent variables. The CUB model with covariates has been extensively applied in
many studies after their introduction [5, 6]. CUB models have also been applied for
customer satisfaction investigations and they have been proven to help at dealing
with marketing questions. For instance, in a study on wine consumers CUB model
with covariates have been applied in order to identify which wine characteristics are
more relevant for Italian wine consumers [8]. For other examples of applications to
marketing issues see [20, 21]. The following section will describe the CUB model
structure and some extensions.
The CUB model is a mixture model that aims at evaluating two latent variables,
feeling and uncertainty, that are supposed to be involved in the choice process of an
item. The process of selecting a grade among m can be represented by the mixture
of two components: the liking—disliking towards the evaluating item (products or
services) and an inherent uncertainty that belongs to any human choice [12]. The
two components can be described by a mixture model of two random variables,
feeling and uncertainty, with a shifted Binomial and a Uniform distribution respec-
tively [12, 25]. When respondents are involved in the choice process of an evaluation
grade, they synthesize what they feel so that it has been adopted the wide term feel-
ing to cover the several psychological aspects surrounding the final choice, such
as motivation, awareness, attraction, knowledge, etc. A shifted Binomial random
variable is supposed to simulate the mechanism of selecting an evaluation grade
by means of paired comparisons among the grades [9]. Let us consider the scale
y = 1, . . . , m. In the choice of a grade y, you show to prefer that point over the lower
and higher ones because they are not matching what you feel. Let p and 1 − p be
the probability of rejecting a point respectively because it is too low and too high
compared to y, so the probability of choosing a grade y can be described by a shifted
Binomial distribution as follows [19]:
m − 1 y−1
Pr(Y = y) = p (1 − p)my (1.1)
y−1
y = 1, 2, . . . , m, where Y varies from 1 to m, ξ ∈ [0, 1], π ∈ (0, 1]. Iannario [16] shows
that the model is identified for m > 3. The parameter vector θ = (π , ξ ) belongs to
the parameter space Ω (θ ) = {(π , ξ ) : 0 < π ≤ 1, 0 ≤ ξ ≤ 1}. This is a very flexible
distribution because it can assume many different shapes considering we have only
two parameters, π and ξ [12, 23].
The parameter π is an estimate of uncertainty and (1 − π ) is considered a mea-
sure of uncertainty with (1 − π )/m the uncertainty shared by response categories.
Parameter ξ estimates the feeling component that affects a choice process, whose
interpretation depends on scale coding. Let us consider a 9-point scale with y = 1 as
minimum, then an estimate of ξ close to 1 indicates a feeling component very close
to minimum. In this case, the distribution of observed values has a mode close to
or at the point y = 1, so that (1 − ξ ) can be considered a measure of feeling. In this
case, high values of (1 − ξ ), i.e. values close to 1, indicate a mode close to y = 9
which means a high feeling/liking. In order to show the flexibility of the model, a
graphic (see Fig. 1.1) is aimed to show the distribution shapes of CUB model vary-
ing π and ξ parameter values. When ξ = 0.5 the distribution is symmetric with
mode on the central value. As (1 − ξ ) parameter increases from 0 to 1, the mode
values increase showing that (1 − ξ ) can be considered a direct measure of liking
[18]. About uncertainty, when π increases the distribution assumes a bell shape and
values on y axis increase rapidly. A pattern like this would mean a low uncertainty
among subjects’ responses.
Fig. 1.1 The distribution shapes of CUB model varying π and ξ parameter values
4 1 The CUB Models
D’Elia [10] and Piccolo [24] developed an E-M algorithm for the maximum like-
lihood estimate of parameters π and ξ and they present a detailed description of the
E-M algorithm.
Since their introduction, CUB models have been supplied with several extensions.
The main scope of extensions was to let CUB models take into considerations some
aspects affecting the choice process, for instance the introduction of covariates into
the model. D’Elia [10] and Piccolo [24] provide a formal description of CUB model
with covariates. Information about subjects or objects can be introduced into the
model and linked to uncertainty and feeling by means of two logistic functions.
Consider the following specification of CUB model:
m−1 (1 − πi )
Pr(Yi = yi ) = πi (1 − ξi )yi −1 ξim−yi + (1.3)
yi − 1 m
and
1 1
ξi = = . (1.5)
1 + exp−γ0 −β1 wi1 −...−γq wiq 1 + exp−γ0 −wi γ
The CUB(p,q) model indicates p covariates for π and q covariates for ξ . By this
way, parameters can be linked to subjects’ covariates (gender, age, income, etc.)
or objects’ covariates (some characteristic of the item), showing which informa-
tion affects feeling ad uncertainty. The CUB model with covariates is an effective
approach to identify which objects’ or subjects’ characteristics are relevant in the
choice process. In particular, the approach can delineate clusters based on signifi-
cant covariates, which outline groups of subjects that behave differently in terms of
feeling and uncertainty. For example, Piccolo and D’Elia [25] applying CUB mod-
els with covariates identified a relation between gender and age in the likeness of
smoked salmons. Moreover they showed how sensory perception changes with re-
spect to different chemical characteristics of smoked salmon. Furthermore Iannario
and Piccolo [18] applied CUB models to investigate the link between satisfaction
and some relevant characteristics of a product. Iannario [15] introduced the con-
cept of shelter choice that has been supposed to be present in surveys with atypical
patterns of observed frequencies. An atypical pattern can be the results of an over-
selection of a specific grade because of unwillingness to respond or privacy reasons.
1.1 Description of CUB Models Structure 5
The CUB model with shelter choice describes the probability distribution of a
random variable Y , with θ = (π1 , π2 , ξ ) the parameter vector belonging to the pa-
rameter space defined as Ω (θ ) = {(π1 , π2 , ξ ) : π1 > 0, π2 ≥ 0, π1 + π2 ≤ 1, 0 ≤ ξ ≤
(c) (c) (c)
1}. Dy is a degenerate random variable with Dy = 1 when y = c and Dy = 0
elsewhere. The equivalence δ = 1 − π1 − π2 quantity the shelter effect at Y = c. The
model can be formulated again considering the parameter vector Θ = (π , ξ , δ ) as
follows:
(c)
Pr(Y = y) = (1 − δ )[π by (ξ ) + (1 − π )Uy ] + δ Dy (1.7)
where by is the shifted Binomial distribution and Uy is the discrete Uniform distri-
bution. Considering the two formulations of the CUB model with shelter effect, the
following relation among parameters can be determined:
π1 = (1 − δ )π π = π1π+1π2
⇐⇒
π2 = (1 − δ )(1 − π ) δ = 1 − π1 − π2
model. The model with a Beta-binomial distribution has been called CUBE model
(Combination of a Uniform and a shifted BEta-binomial).
The model
1
Pr(Y = y) = π · be(ξ , φ ) + (1 − π ) , y = 1, . . . , m (1.8)
m
represents the probability distribution of a random variable Y with the parameter
vector θ = (π , ξ , φ ) belonging to the parameter space Ω (θ ) = (π , ξ , φ ) : 0 < π ≤ 1,
0 ≤ ξ ≤ 1, 0 ≤ φ < ∞, and with the parameter φ = 0 indicating an overdispersion
effect. In CUBE model, a relevant part is represented by a Beta-binomial distribution
be(ξ , φ ) whose derivation is explained in [17]. The Beta-binomial distribution is as
follows,
Pr(Y = y) = be(ξ , φ ) =
y m−y+1
m−1 ∏k=1 [1 − ξ + φ (k − 1)] ∏k=1 [ξ + φ (k − 1)]
= x (1.9)
y − 1 [1 − ξ + φ (y − 1)] [ξ + φ (m − y)] ∏m−1
k=1 [1 + φ (k − 1)]
y = 1, . . . , m, where parameters ξ and φ are linked to feeling and overdispersion
respectively.
Let us consider the two central moments of first order of a Beta-binomial random
φ
variable E(Y ) = ξ + m(1 − ξ ) and Var(Y ) = (m − 1)ξ (1 − ξ )(1 + (m−2)1+φ ) variance
increases of an amount depending on parameter φ > 0, that is linked to overdisper-
sion. The CUB models described so far give an overview of their great power in
explaining more than one behaviors related to a choice process. We stressed par-
ticularly on conceptual meaning of CUB models, notwithstanding we gave specific
references for a more detailed description of statistical derivation of models.
Models are usually considered useful tools for studying phenomena, but are they
adequately representing the object observed? In order to evaluate if a model is use-
ful, we need to have a measure of model fitting to observed data. In fact, all models
are wrongs but some are useful [4]. CUB models estimate probabilities given a pa-
rameter vector θ . The estimated probabilities py (Y = y|θ ) should be very close to
observed probabilities fy when the model is good. The following index of fitting,
called normalized dissimilarity index, is a measure of distance between estimated
and observed frequencies:
m
Diss = 0.5 ∑ fy − py (θ ) . (1.10)
y=1
portion of respondents that should change their choice in order to get a perfect fit-
ting [7]. While CUB model with shelter choice and CUBE model allow a Diss in-
dex, an index for CUB models with covariates cannot be derived. When comparing
CUB models with covariates it should be compared likelihood measures between
CUB model without and with covariates. The log-likelihood can be used to com-
pare nested models [13] and all CUB model extensions are nested into the CUB
model that estimates parameters ξ and π . In order to measure the goodness of a
CUB model with covariates, we have to rely on log-likelihood comparison. Let us
consider the relation
(θ0 ) ≤ (θ ) ≤ (θsat ) (1.11)
with (θ0 ) the likelihood of null model (only the constant), (θ ) the model es-
timated likelihood and (θsat ) the saturated model likelihood [13]. This relation
implies that greater is likelihood and better is an estimated model so that it make
sense to compare a CUB model (0, 0), i.e. θ (π , ξ ), with a CUB model (p, q), i.e.
θ = (βi , γ j ), i = 1, . . . , p + 1 and j = 1, . . . , q + 1.
The Likelihood Ratio Test, LRT = −2((θ ) − (θ )), is a measure of deviance
between log-likelihoods of two models with one model nested in another. The prob-
ability distribution of LRT follows a χ 2 distribution with degrees of freedom equal
to the difference between parameters of compared models, as it shows Table 1.1.
Such a test gives an indication on how good is a nested model compared to the
basic one. With respect to shelter choice and overdispersion, LRT should be taken
into account the distribution χ 2 with 1 degree of freedom when comparing log-
likelihoods. Moreover, because of a particular parametrization, the p-value should
be halved [15, 17]. In the next sections will be described applications of CUB mod-
els to a case study on food packaging.
been conducted in Italy and in Austria, collecting opinions and satisfaction grades
of 209 subjects. The survey was centred on questions about packaging attributes like
capacity to preserve food, resealability and easy peel. About the buying behaviour,
subjects were asked to rate attention paid to several aspects, e.g. brand, packaging,
price, etc. (Table 1.2) shows the main variables and coding of measurement scale.
Subjects were also asked demographic and habit information, some of them in-
troduced as covariates (see Table 1.3) to improve the CUB model fitting. The sur-
vey included questions on attention level about some aspects related to products
respondents usually buy at grocery stores (attention variables). Respondents were
also asked to evaluate their satisfaction level about the following specific packaging
characteristics: the ability to preserve foods, the resealability and easy peel. More-
over, they were asked to express opinions about packaging and preservatives that
are usually involved in food preservation (satisfaction and opinion variables).
CUB models have been applied to attention variables and results are shown in Ta-
ble 1.4. Results show parameter estimate, dissimilarity index and log-likelihood of
each variable. Dissimilarity indexes are lower than 0.12 except for Advertisement.
CUB model has two advantages: the first one is a model with only two parameters
indicating we have a parsimonious model, the second one is a useful visual descrip-
tion by representing parameters into two dimensional space (Fig. 1.2).
Figure 1.2 indicates that the variable with the highest attention and the lowest
uncertainty is price, while in the opposite position there is innovation. Whereas
price and discounted price get the highest attention and the lowest uncertainty, re-
spondents don’t pay much attention to nutrition facts and innovation revealing very
different behavior (high uncertainty). About packaging, they pay attention to it but
uncertainty is high indicating very different attentional levels among respondents.
Let us take a look at Table 1.5. CUB models with and without covariates are applied
and Log-likelihoods are compared by Chi-square tests.
1.3 A Food Packaging Survey 9
p-Values in Table 1.5 show that log-likelihoods for CUB models with covariates
increase. Significant covariates can help to understand how behaviours of subject’s
subgroups differ in terms of attention paid to attributes. The introduction of covari-
ates in the CUB model is supposed to be significant when log-likelihoods between
CUB model without and with covariates are significantly different. Table 1.6 de-
scribes which direction takes 1 − ξ (attention) once covariates are introduced.
10 1 The CUB Models
Fig. 1.2 Attention variables with increasing attention (feeling) and uncertainty when parameters
tend to 1
CUB models
1.0 7 34
6
0.8
10
Feeling (1-ξ)
0.6
5
0.4
2
0.2 9
1 8
0.0
Uncertainty (1-π)
In Table 1.6 we realize that older subjects paid more attention to provenance than
the younger ones and males (coded as 0) paid more attention to brand than females.
Another interesting results is that Italians seem to pay less attention to no GMO food
than Austrian. Let us take a look at subgroups that come from crossing the categor-
ical variables gender and education. We saw that covariate nutrition facts improve
log-likelihoods of CUB models for all subgroups but with different implications.
The logistic functions (1.4)–(1.5) estimate parameters π and ξ for each subgroup so
that to have rating distribution estimations (Table 1.7).
The expected value E(Y ) is derived as follows (see [14]):
1 (m + 1)
E(Y ) = π (m − 1)( − ξ ) + . (1.12)
2 2
Females have higher uncertainty than males and their responses present higher dis-
persion. About education, the higher education affect attention paid to nutrition
facts. The expected values of females are quite similar whereas there is a clear
gap between high school (E(Y ) = 2.61) and graduate (E(Y ) = 4.07) conditions for
males. Figure 1.3 shows estimated probability distributions about covariates gender
and education.
The distributions in left panel (females) are flatter than ones in right panel (males)
indicating high uncertainty. Moreover, we cannot discriminate between elementary
and intermediate because of overlapping distributions. Respondents with a degree
clearly are connected with a higher level of attention to the nutrition facts. CUB
models can take into account several choice behaviors and in order to give an exam-
ple let us consider the variable advertisement. The dissimilarity index is quite high
(Diss = 0.166) meaning a bad match between estimated and observed distributions.
About observed relative frequencies of advertisement, we see a Mode at y = 1. The
score 1 has a high frequency and could indicate a shelter choice at c = 1. The hy-
pothesis is that some respondents have chosen the lowest score in order to simplify
the task and in order to test it we apply the CUB model with shelter choice at c = 1
(Table 1.8).
12 1 The CUB Models
Fig. 1.3 Estimated probability distributions of responses to nutrition facts for females (left panel)
and males (right panel)
0.5 Females 0.5 Males
Elem Elem
Interm Interm
High High
0.4 0.4
Grad Grad
probability
probability
0.3 0.3
0.2 0.2
0.1 0.1
0.0 0.0
1 2 3 4 5 6 1 2 3 4 5 6
rating rating
Table 1.9 CUB model estimates for satisfaction and opinion variables
Coding Variable name π (s.e.) ξ (s.e.) Diss. index (0,0)
1 Preservation 0.695(0.065) 0.366(0.018) 0.123 −384.280
2 Resealability 0.668(0.066) 0.333(0.019) 0.122 −383.851
3 Easy peel 0.628(0.072) 0.418(0.021) 0.085 −393.884
4 Preservatives 0.536(0.071) 0.293(0.023) 0.104 −399.914
5 Packaging 0.553(0.074) 0.322(0.025) 0.164 −400.412
1.3 A Food Packaging Survey 13
Parameter estimates, as coordinates, describe points in Fig. 1.4 and suggests two
clusters, the satisfaction variables (1, 2, 3) and the opinion variables (4, 5) with
the second ones in the upper right corner of the space. Opinion variable positions
suggest high uncertainty and high feeling.
Fig. 1.4 Satisfaction and opinion variables as coded in Table 1.9 with increasing feeling and un-
certainty when parameters tend to 1
CUB models
0.8
4
0.7 5
2
Feeling (1-ξ)
0.6 3
0.5
0.4
Uncertainty (1-π)
Fig. 1.5 Estimated probability distributions about food preservation for frequent buyers of pack-
aged products (left panel) and not frequent buyers (right panel)
Frequency buyers Not frequent buyers
0.20 0.20
probability
probability
0.15 0.15
0.10 0.10
0.05 0.05
0.00 0.00
2 4 6 8 10 2 4 6 8 10
rating rating
From Fig. 1.5 we see that age affects uncertainty: older the respondents flatter the
distribution. Older respondents display greater uncertainty whereas frequent buyers
seem to be more satisfied than not frequent buyers about packaged fresh foods.
The main aim of CUB models was to explain the psychological mechanism un-
derlying choice processes [10, 12]. Moreover, in order to take into account several
choice behavior, model extensions have been developed [16]. Within the framework
of preference evaluation methods, CUB models are considered a stated preference
method and suited to many real cases [25, 7, 6, 20], confirming CUB models as
useful and theorem based [19] statistical models. Moreover CUB models have been
applied in combination and integration with other methods like conjoint analysis
[3], indicating that they are also very flexible models. feeling and uncertainty as
References 15
latent variables are supposed to be involved in the choice process of an item. The in-
terpretation is wide with feeling indicating constructs (satisfaction, preference or at-
tention) that are linked to the measurement scale adopted. In the real case study just
described, CUB models have been applied to specific questions relating to the level
of attention respondents usually pay to specific characteristics at grocery stores.
Moreover questions regarded also satisfaction level and opinions about some food
packaging characteristics. Results showed high levels of attention for price, dis-
counted price, seasonality and provenance but with different levels of uncertainty.
Packaging as a variable received medium-high grades of attention but uncertainty
was high: respondents seem to pay very different levels of attention about packaging
when they buy foods at supermarket. Packaging was not affected by any covariates
so the high uncertainty could indicate an attribute (packaging) respondents are not
used to evaluate or to consider when they buy products at the supermarket. This
could mean a low knowledge of the real utility/importance of packaging. Some de-
mographic characteristics introduced as covariates revealed the presence of clusters
of subjects whose responses were quite different in terms of feeling and uncertainty.
For example, males and females don’t pay the same attention to brands or again the
product provenance seems to be affected by age, older respondents are more inter-
ested product provenance than younger ones. The CUB model extension with shelter
choice has been applied to variable advertisement. The model fitting improved a lot
with a shelter at c = 1. From this results we reasonably state that respondents tend to
simplify the answer. Maybe choosing a grade reflecting how much attention is paid
to advertised products is not a simple task. Respondents were more satisfied with
preservation of the food and the resealability of the packaging than with packaging
with easy peel. Finally, from introducing covariates, we saw that a high frequency of
packaged-food product purchasing and satisfaction for food preservation are linked
in some ways. Concluding, the CUB models have proven to be very useful, flex-
ible and constantly evolving. They have a wide range of possible applications not
only as a single method but also in combination or integration with others statistical
methods.
References
4. Box, G.E.P., Draper, N.R.: Empirical Model Building and Response Surfaces.
Wiley, New York (1987)
5. Capecchi, S., Endrizzi, I., Gasperi, F., Piccolo, D.: A multi-product approach
for detecting subjects and objects’ covariates’ in consumer preferences. Br.
Food J. 118(3), 515–526 (2016)
6. Cicia, G., Corduas M., Del Giudice T., Piccolo, D.: Valuing consumer prefer-
ences with the CUB model: a case study of fair trade coffee. Int. J. Food Syst.
Dyn. 1, 82–93 (2010)
7. Corduas, M., Iannario, M., Piccolo, D.: A class of statistical models for
evaluating services and performances. In: Monari, P., Bini, M., Piccolo, D.,
Salmaso, L. (eds.), Statistical Methods for the Evaluation of Educational Ser-
vices and Quality of Products, pp. 99–117. Physica-Verlag HD, Heidelberg
(2009)
8. Corduas, M., Cinquanta, L., Ievoli, C.: The importance of wine attributes for
purchase decisions: a study of Italian consumers’ perception. Food Qual. Pre-
fer. 28, 407–418 (2013)
9. D’Elia, A.: Il meccanismo dei confronti appaiati nella modellistica per gradu-
atorie: sviluppi statistici ed aspetti critici. Quad. Stat. 2, 173–203 (2000)
10. D’Elia, A.: A mixture model with covariates for ranks data: some inferential
developments. Quad. Stat. 5, 1–25 (2003)
11. D’Elia, A.: New developments in ranks data modelling with covariate. Atti
della XLII Riunione Scientifica SIS, pp. 233–244. CLEUP, Padova (2004)
12. D’Elia, A., Piccolo, D.: A mixture model for preferences data analysis. Com-
put. Stat. Data Anal. 49, 917–934 (2005)
13. Iannario, M.: Fitting measures for ordinal data models. Quad. Stat. 11, 39–72
(2009)
14. Iannario, M.: On the identifiability of a mixture model for ordinal data.
METRON Int. J. Stat. LXVIII(1), 87–94 (2010)
15. Iannario, M. Modelling shelter choices in a class of mixture models for ordinal
responses. Stat. Method Appl. 21, 1–22 (2012)
16. Iannario, M.: A class of model for ordinal data analysis: statistical issues and
new developments. Mixture and latent variable models for casual inference
and analysis of socio-economic data, FIRB meeting, Perugia (2013)
17. Iannario, M.: Modelling uncertainty and overdispersion in ordinal data. Com-
mun. Stat. Theory Method 43, 771–786 (2014)
18. Iannario, M., Piccolo, D.: CUB models: statistical methods and empirical evi-
dence. In: Kennet, S.R., Salini, S. (eds.) Modern Analysis of Customer Satis-
faction Surveys: With Applications Using R, pp. 231–258. Wiley, Chichester
(2012)
19. Iannario, M., Piccolo, D.: A theorem on CUB models for rank data. Stat.
Probab. Lett. 91, 27–31 (2014)
20. Iannario, M., Manisera, M., Piccolo, D., Zuccolotto, P.: Sensory analysis in
the food industry as a tool for marketing decisions. Adv. Data Anal. Classif. 6,
303–321 (2012)
References 17
21. Kennet, S.R., Salini, S.: Modern analysis of customer satisfaction surveys:
comparison of models and integrated analysis. Appl. Stoch. Model. Bus. Ind.
27, 465–475 (2011)
22. Louviere, J.J., Hensher, D.A., Swait, J.D.: Stated Choice Methods. Analysis
and Applications. Cambridge University Press, Cambridge (2000)
23. Piccolo, D.: On the moments of a mixture of uniform and shifted binomial
random variables. Quad. Stat. 5, 85–104 (2003)
24. Piccolo, D.: Computational issues in the E-M algorithm for ranks model esti-
mation with covariates. Quad. Stat. 5, 1–22 (2003)
25. Piccolo, D., D’ Elia A.: A new approach for modelling consumers preferences.
Food Qual. Prefer. 19, 247–259 (2008)
26. Rundh, B.: Packaging design: creating competitive advantage with product
packaging. Br. Food J. 111(9), 988–1002 (2009)
27. Silayoi, P., Speece, M.: The importance of packaging attributes: a conjoint
analysis approach. Eur. J. Mark. 41, 1495–1517 (2007)
Chapter 2
Customer Satisfaction Heterogeneity
The measurement of the customer satisfaction concerns the gap between the cus-
tomer expectations about the product or service and the perceptions of the customer
after the consumption or use. In other words, the customer satisfaction is closely
related to the concept of “perceived quality”. According to the definition of Mont-
gomery [24], it depends on how much the products or services meet the require-
ments of the consumers/users and it is directly connected to the homogeneity of the
performance of the production process or service provision process.
When the performance is represented by a numerical variable, for instance when
the customers express their satisfaction degree with a numerical score, quality is in-
versely related to variability. High variability of the process means inhomogeneous
outputs and great percentage of waste. As a matter of fact, the process capacity in-
dices are measures used to evaluate the ability of the process to produce little waste
and are an inverse function of the standard deviation. Other important tools for pro-
cess quality control are the so called control charts and among them the R charts
and the S charts are commonly used to control range and standard deviation as main
indices of variability.
In many cases the customer satisfaction is measured through categorical judg-
ments, by using a Likert scale or a set of ordered categorical evaluations. Often, the
number of possible levels used to represent the different satisfaction degrees are 4,
5, 7 or 10. Even if sometimes, in the customer satisfaction questionnaires, the pos-
sible response alternatives are linked to integer numbers which represent the ranks
of the evaluations with respect to the judgment scale, the nature of the information
provided by the answers of the respondents is categorical. For this reason, range,
variance, standard deviation and other indices used to measure the variability of
the satisfaction, after transformation of the ordinal assessments into numeric scores,
often are not the suitable way to measure satisfaction inhomogeneity. The transfor-
mation of the original data can change the information provided by the statistical
surveys and cause a bias in the estimation of the customer satisfaction parameters,
leading to unreliable results because the transformed data do not reflect the real
customers opinions.
Hence, the application of suitable statistical techniques for categorical variables
is preferable to the transformation of data into numeric scores and consequent ap-
plication of inferential methods for variability parameters (e.g. F test for variance
comparison). In this framework, interesting methodological tools to measure satis-
faction inhomogeneity are represented by the indices of heterogeneity for categori-
cal variables, such as Gini’s index, Shannon’s entropy, Rényi’s family of measures
and others. These indices can be used for inferential purposes, concerning hetero-
geneity estimation and test of hypothesis for comparing the heterogeneity of two or
more populations. Another good reason why the use of indices of heterogeneity for
categorical variables is preferable to data transformation and application of indices
of variability, is that the detection of customer groups characterized by “within ho-
mogeneity” and “between heterogeneity” could be a useful starting point of market
segmentation and product/service differentiation strategies.
The present chapter is dedicated to the description of testing methods, for com-
paring the (categorical) satisfaction heterogeneities of two or more customer popula-
tions. These methods are based on a nonparametric approach and on the comparison
of the Pareto diagrams of the probability distributions. In the next section the het-
erogeneity of categorical variables is defined and suitable indices of heterogeneity
are presented. In the section after that, the theory of the two-sample test for het-
erogeneity comparisons is introduced and some real applications shown. The final
section includes the extension of the method under study to the comparison of more
than two populations, from both theoretical and application point of view.
The notion of statistical heterogeneity for categorical data finds several applica-
tions in different disciplines such as genetics, physics, engineering, environmental
sciences, sociology, economics, and others. According to the problem and the sci-
entific framework, it is associated to the concept of diversity, entropy, mutability,
dispersion, differentiation, dissimilarity or uniform distribution. For instance, it can
be used for studying market segmentation [10], biodiversity [26, 32, 33, 7], process
capability [11], clustering [38], genetic differentiation [6], customer satisfaction [5]
and many other phenomena typical of the mentioned disciplines.
Let us assume that the random variable X j , representing the customer satisfaction
of population j ( j = 1, . . . ,C;C ≥ 2), may take m categories (judgments) v1 ,v2 ,. . . ,vm
and f jh denotes the absolute frequency of the j-th sample for the h-th category
( j = 1, . . . ,C; h = 1, . . . , m). A 2 × m contingency table of the absolute frequencies
[ f jh ] is observed. In other words the categorical response variable X j takes cate-
gories in {v1 , . . . , vm }, with unobserved probability distribution Pr{X j = vh } = π jh ,
h = 1, . . . , m.
The heterogeneity of X j or, equivalently, the heterogeneity of its distribution, is
minimum (in other words the homogeneity is maximum) when in the j-th population
2.1 Heterogeneity Indices 21
one modality is observed with probability 1 (certain event) and all the other modal-
ities with probability 0 (impossible event). In this population there is full judgment
homogeneity and then the distribution is degenerate. Conversely the heterogeneity
of X j is maximum when all the modalities/categories are observed with the same
probability. In this population the distribution is uniform over the set of modalities,
that is π jh = 1/m, ∀h. Thus, heterogeneity depends on the concentration of proba-
bilities over the categories v1 ,v2 ,. . . ,vm .
An suitable measure of heterogeneity η j = het(X j ) must satisfy the following
properties:
1. it takes its minimum when the distribution is degenerate, i.e. when there is an
integer r ∈ {1, . . . , m} such that π jr = 1 and π jh = 0, ∀h = r;
2. the farther the distribution from the degenerate case and the closer to the uniform
case, the greater the index value;
3. it takes its maximum in case of uniform distribution, i.e. π jh = 1/m, ∀h ∈
{1, . . . , m}.
The properties listed above hold for many different index types (for a review see
[31]). Each of this indices can be used as a measure of the degree of heterogeneity.
By assuming that log(·) correspond to the natural logarithm and that 0 · log(0) = 0,
let us consider the following indices:
(S)
Shannon entropy ([35]): η j = − ∑m
h=1 π jh log(π jh ),
(G)
Gini heterogeneity ([17]): η j h=1 π jh (1 − π jh ) = 1 − ∑h=1 π jh ,
= ∑m m 2
(L) −π jh
Leti index ([20]): η j h=1 (π jh )
= ∏m ,
(Fe)
Frosini index - euclidean distance ([16]): η j = h=1 (π jh − 1/m) ,
∑m 2
(Fm)
Frosini index - Manhattan distance ([16]): η j h=1 |π jh − 1/m|,
= ∑m
(Rδ ) δ
Rényi index of order δ ([34]): η j = 1
1−δ h=1 π jh .
log ∑m
(G) m (G)
η̃ j = ·η .
m−1 j
The Gini’s index can be interpreted as arithmetic mean of 1 − π jh (h = 1, · · · , m),
which can be considered measures of heterogeneity of the single attributes vh
(h = 1, · · · , m). The logic underlying the index of Leti is similar, because the
Leti’s measure of heterogeneity corresponds to the geometric mean of 1/π jh (h =
1, · · · , m). This index is a monotonic transformation of Shannon’s entropy since
(L) (S)
η j = exp(η j ). Its normalized version is
(L)
(L) ηj −1
η̃ j .
=
m−1
Quite a general approach was followed by Frosini, whose indices of homogene-
ity are defined as distances between the vector of probabilities (π j1 , π j2 , . . . , π jm ) ,
which characterize the distribution of X j (observed relative frequencies in descrip-
tive statistics), and the vector of expected probabilitites in case of uniform distri-
bution (1/m, 1/m, . . . , 1/m) . The normalized version of the index computed as eu-
clidean distance is
(Fe) m (G)2
η̃ j = 1 − ·η ,
m−1 j
(Fe)
which is an increasing function of the Gini’s index. It can be proved that η̃ j =
(G)
1− 1 − η̃ j . Both indices are a decreasing function of ∑h π 2jh , which is commonly
used as measure of homogeneity by several contributions.
The normalized Frosini’s index based on Manhattan distance is
(Fm) m (Fm)
η̃ j = 1− ·η .
2(m − 1) j
(R∞ ) δ
• ηj h=1 π jh )] = − log[suph (π jh )].
= limδ →∞ [ 1−1 δ log(∑m
Table 2.1 shows the values of some of the most common indices of heterogeneity
(in both non-normalized and normalized versions) related to specific probability
distributions in the case of 4 modalities or satisfaction levels. Three aspects are
evident from the table: the monotonic relationship between index value and degree
of heterogeneity, the similar normalized values of η (G) and η (S) and the normalized
values of η (R∞ ) , which are much different from those of the other indices.
2.2 Two-Sample Test for Dominance in Heterogeneity 23
Let us consider the two-sample test where the hypothesis under study consists in the
comparison between the heterogeneities of the customer satisfactions of two popu-
lations, by assuming that the satisfaction is represented by the categorical variable
X j ( j = 1, 2), with support given by the set of modalities {v1 , v2 , . . . , vm }. We could
be interested in assessing the plausibility of the hypothesis that the heterogeneity of
one population is greater than that of the other (one-sided test) or the hypothesis that
the two heterogeneities are not equal (two-sided test). To this end, let us suppose that
one random sample from each of the two populations is selected and denote the size
of sample j with N j ( j = 1, 2). As indicated in the previous section, the probability
distribution of X j is {π jh , h = 1, . . . , m}, with π jh = Pr{X j = vh } ≥ 0 and therefore
∑h π jh = 1. The probabilities π jh are unknown parameters of the two populations.
If we denote the heterogeneity degree of the judgments of the j-th population
with het(X j , the hypotheses of the one-sided testing problem can be formally defined
as follows:
H0 : het(X1 ) = het(X2 )
against
H1 : het(X1 ) > het(X2 ).
Table 2.1 Probability distributions in the case of m = 4 modalities and measures of heterogeneity
Modalities Index Normalized index
v1 v2 v3 v4 η (S) η (G) η (R3 ) η (R∞ ) η̃ (S) η̃ (G) η̃ (R3 ) η̃ (R∞ )
1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.70 0.20 0.10 0.00 0.46 0.80 0.52 0.36 0.61 0.58 0.38 0.26
0.50 0.30 0.15 0.05 0.64 1.14 0.93 0.69 0.85 0.82 0.67 0.50
0.40 0.30 0.20 0.10 0.70 1.28 1.15 0.92 0.93 0.92 0.83 0.66
0.30 0.25 0.25 0.20 0.75 1.38 1.36 1.20 0.99 0.99 0.98 0.87
0.25 0.25 0.25 0.25 0.75 1.39 1.39 1.39 1.00 1.00 1.00 1.00
Let us denote with π j(1) ≥ π j(2) ≥ · · · ≥ π j(m) the ordered probabilities of the
j-th population. All the indices defined in the previous section are order invariant,
i.e. their values do not change if they are computed with the ordered probabilities
instead of the unordered ones. Formally, if we denote with η j any index to measure
het(X j ) and we indicate it as a function of the unknown parameters, we can state that
η j (π j(1) , π j(2) , . . . , π j(m) ) = η j (π j1 , π j2 , . . . , π jm ),
thus we can express het(X j ) by using the ordered probabilities.
Let us observe that:
In other words, two populations with the same ordered distributions, are equally
heterogeneous. Moreover, if π1(h) = π2(h) ∀h = 1, 2, . . . , m, exchangeability holds and
the permutation testing principle applies. According to it, the null hypothesis of the
testing problem can be represented as:
H0 : π1(h) = π2(h) ∀h = 1, 2, . . . , m
or equivalently as
H0 : Π1(h) = Π2(h) ∀h = 1, 2, . . . , m − 1,
where Π j(h) = ∑hs=1 π j(s) are the cumulative ordered probabilities, with Π j(m) =
h=1 π j(h) = 1.
∑m
The alternative hypothesis of the problem can be written as:
H1 : Π1(h) ≤ Π2(h) ∀h = 1, 2, . . . , m − 1,
For problems of stochastic dominance many exact and approximate solutions have
been proposed in the literature (see [1, 18, 19, 22, 21, 23, 25]). For the univariate
case several authors proposed solutions based on the restricted maximum likelihood
ratio test [15, 36, 40]. According to these proposals, under the null and alternative
hypothesis, the distributions of the test statistics asymptotically are mixtures of chi-
squared whose weights essentially depend on the unknown population distribution.
Nonparametric solutions are proposed by other authors [39, 13, 27, 28, 29].
For comparing the heterogeneities of two populations, it is reasonable to choose,
as test statistic, the difference between the sampling values of an index η , as follows
Tη = η̂1 − η̂2 ,
where η̂ j = η j (p j(1) , p j(2) , . . . , p j(m) ), p j(h) = f j(h) /N j , with f j(h) and p j(h) ordered
relative frequencies and ordered absolute frequencies respectively, observed on the
two samples (h = 1, . . . , m; j = 1, 2). The ordered relative frequencies p j(h) = π̂ j(h)
are point estimates of the unknown ordered probabilities π j(h) as well as the sample
2.2 Two-Sample Test for Dominance in Heterogeneity 25
indices η̂ j represent suitable point estimates of the indices η j . The null hypothesis
is rejected in favour of the alternative for large values of the test statistic.
(obs) (obs)
In order to compute the p-value λ = Pr{Tη ≥ Tη |H0 }, where Tη is the
observed value of the test statistic, we need to know the sampling null distribution
of the test statistic. Arboretti et al. [6] proposed a permutation solution that assumes
exchangeability under the null hypothesis. The authors consider different options
as test statistic, considering alternatively the indices of Gini, Shannon and Rény
of order 3 and ∞. The Rény’s index of order 2 is the most commonly used but,
according to the property mentioned above, it is permutationally equivalent to the
index of Gini.
Exchangeability under H0 holds if the true order of the unknown probabilities
were known. Since in practice the true order is not known, it must be estimated with
the sample data. Hence exchangeability under H0 is not exact, because the order
of the probabilities is estimated with the order of the observed frequencies (empir-
ical order) which presents a random component and it is subjected to the sampling
variations. Thus the solution is data driven. Anyhow, estimated and true order are
asymptotically equal with probability one, according to the Glivenko-Cantelli the-
orem [37], thus exchangeability holds asymptotically and the permutation test is
approximate for finite sample sizes.
For each sample, the observed relative frequencies are sorted in decreasing or-
der, as well as in the Pareto diagram, and the table of ordered relative frequencies
p j(h) (ordered table) is determined. The original variables X j are then transformed
in such a way that, within the j-th sample, vh is replaced by the integer number r if
(obs)
p jh = p j(r) . The observed value of the test statistic Tη is computed as a function
of the frequencies of the ordered table. The null distribution of the test statistic is
obtained by considering all the permutations of the transformed dataset or (for com-
putational convenience) a random sample of them. In the latter case the permutation
p-value is not exact but estimated with the conditional Monte Carlo method. For
each permutation a new (permuted) table of the ordered frequencies is obtained and
the corresponding permutation value of the test statistic is computed. The permuta-
tion p-value is then
∗(b) (obs)
(Tη ≥ Tη |X)
λη = ,
B
∗(b)
where Tη is the permutation value of the test statistic corresponding to the b-th
permutation, B is the number of permutations and the numerator is the number of
permutation values of the test statistic greater than or equal to the observed one,
given the observed dataset X. Thus the inferential result depends on the space gen-
erated by the permutations of X, that is the orbit associated with X. As usual, the null
hypothesis is rejected when λη ≤ α , where α is the significance level of the test.
Let us consider the following practical example, where 350 customers of a food
outlet were interviewed to evaluate a sweet snack bought in that shop. The evalua-
tion consisted in choosing a categorical judgment within a Likert scale with seven
satisfaction levels, where level 1 corresponds to “not at all satisfied” and level 7
corresponds to “very much satisfied”.
26 2 Customer Satisfaction Heterogeneity
Table 2.2 Contingency table of observed absolute frequencies in the customer satisfaction survey
about a sweet snack
Satisfaction Gender Total
Male Female
1 = not at all 18 5 23
2 25 17 42
3 31 28 59
4 35 36 71
5 27 39 66
6 22 28 50
7 = very much 17 22 39
Total 175 175 350
We wish to test the hypothesis that the satisfaction of males is more heterogeneous
than that of females at significance level α = 0.05. The observed contingency table
is shown in Table 2.2.
It is evident that the sample modal satisfaction for males corresponds to level 4
and for females to level 5. Hence, from a descriptive point of view and by using the
mode as location index, the satisfaction of females for the sweet snack seems to be
greater than that of males.
Since we are interested in the comparison of the heterogeneities, we can con-
sider the Pareto diagrams of the frequencies distributions, where the two cumulative
ordered frequency polygons can be represented to compare the frequency concen-
trations (see Fig. 2.1).
The frequencies of the customer satisfaction distribution for males seem to be
less concentrated, supporting the hypothesis of greater heterogeneity of judgements
for this category. The computation of the sample indices of heterogeneity of Gini,
Fig. 2.1 Pareto diagrams of the customer satisfaction survey about a sweet snack
Males Females
100%
100%
150
150
C u m u la tiv e P e rc e n ta g e
C u m u la ti v e P e rc e n ta g e
75%
75%
F re q u e n c y
F re q u e n c y
100
100
50%
50%
50
50
25%
25%
0%
0%
0
G
F
G
F
A
E
C
D
C
2.3 Two-Sided Test 27
Table 2.3 Sample indices of heterogeneity in the customer satisfaction survey about a sweet snack
Index Gender
Male Female
Gini 0.849 0.831
Shannon 1.916 1.835
Rényi-3 1.862 1.739
Rényi-∞ 1.609 1.501
Shannon, Rény of order 3 and Rény of order ∞, conforms this idea as shown in
Table 2.3.
To test whether the observed positive differences between the indices of the two
groups are significant, we apply the permutation test described above, by estimating
the permutation p-values with B = 10,000 conditional Monte Carlo simulations and
using all the four measures of heterogeneity just mentioned. Table 2.4 reports the
observed values of the test statistics and the corresponding p-values:
According to the table, the observed difference of the sample indices is (positively)
significant when we use the indices of Gini, Shannon and Rény of order 3. It is non-
significant when we use the index of Rény of order ∞. Thus, according to the first
three procedures, the heterogeneity of males’ judgements is greater than the hetero-
geneity of females. This result is consistent with some evidences of the literature.
Table 2.4 One-sided test on heterogeneity in the customer satisfaction survey about a sweet snack
(obs)
Index Tη p-value
Gini 0.018 0.028
Shannon 0.081 0.012
Rényi-3 0.123 0.046
Rényi-∞ 0.108 0.226
Arboretti et al. [6] prove that, in general, the test based on the index of Rényi-∞
is the least powerful among the four considered, and this is more evident for high
degrees of heterogeneity. As a matter of fact the degrees of heterogeneity in the
problem presented are very high. The normalized values of the Gini’s indices for
males and females are 0.991 and 0.970 respectively; the normalized values of the
Shannon’s indices are 0.985 and 0.943; for the index of Rényi-3 we have 0.957 and
0.894 and for the index of Rényi-∞ the values are 0.827 and 0.771.
Sometimes, in the two-sample problem, the interest of the study concerns the test of
a two-sided hypothesis. For example let us consider the case of a survey to evaluate
the satisfaction of students about professors’ teaching effectiveness in an academic
28 2 Customer Satisfaction Heterogeneity
course. The students were divided into two groups that attended the lectures sepa-
rately at different times. A a matter of fact, the course was repeated twice and the
same classes were held by the same professors separately for the two groups.
Table 2.5 Contingency table of observed absolute frequencies of students’ satisfaction about
teaching effectiveness
Satisfaction Group Total
A B
Very dissatisfied 5 12 17
Moderately dissatisfied 18 35 53
Moderately satisfied 28 26 54
Very satisfied 9 9 18
Total 60 82 142
100%
60
80
50
75%
75%
C u m u l a ti v e P e rc e n ta g e
C u m u l a ti v e P e rc e n ta g e
60
40
F re q u e n c y
F re q u e n c y
50%
50%
30
40
20
25%
25%
20
10
0%
0%
0
0
A
C
D
B
A
C
The Pareto diagrams of the data presented in Table 2.5 are shown in Fig. 2.2.
From a descriptive point of view, to determine which of the two heterogeneities is
greater is not simple. The sample values of the heterogeneity indices can help in this
assessment.
2.3 Two-Sided Test 29
For the two-sided test, Arboretti and Bonnini [2] and Arboretti et al. [4] propose
the application of the permutation solution described in the previous section by us-
ing, as a test statistic, the difference squared of the sample indices. Formally the
problem can be represented by the null hypothesis
H0 : het(X1 ) = het(X2 )
H1 : het(X1 ) = het(X2 ),
where X1 and X2 represent the satisfaction of the two compared groups. By denoting
with η̃ j a given sample index of heterogeneity, the test statistic for this problem is
Tη = (η̃1 − η̃2 )2 ,
and large values of Tη lead to the rejection of the null hypothesis in favour of the
alternative. Under the null hypothesis, exchangeability holds, even if, as remarked
above, it is approximate for finite sample sizes and exact only asymptotically. Thus
the permutation distribution of Tη can be estimated, the p-value of the test computed
and the usual decision rule applied to decide whether reject or not the null hypoth-
esis. In the mentioned publications, a further index, consistent with the two-sided
nature of the alternative hypothesis, is taken into account as alternative to the other
indices listed above. This is the well known chi-squared index
(χ 2 )
m (N p jh − Nm )2
ηj = ∑ N
.
h=1 m
However, the test based on the chi-squared index is not well approximated as well
as the test statistics based on Shannon’s and Gini’s indices, because it tends to be
anticonservative under the null hypothesis. Furthermore it is less powerful than the
other two tests under the alternative hypothesis.
Table 2.6 Sampe indices of heterogeneity of students’ satisfaction about teaching effectiveness
Index Group
A B
Gini 0.663 0.684
Shannon 1.209 1.251
Rényi-3 1.010 1.085
Rényi-∞ 0.762 0.851
Table 2.7 Two-sided test on heterogeneity of students’ satisfaction about teaching effectiveness
(obs)
Index Tη p-value
Gini 0.021 0.624
Shannon 0.043 0.627
Rényi-3 0.075 0.615
Rényi-∞ 0.089 0.672
In Table 2.7 we can see that all the p-values are much greater than the signifi-
cance level, hence we cannot reject the null hypothesis of equal satisfaction hetero-
geneities.
When the compared categorical variables are more than two, a sort of ANOVA can
be applied to test the hypothesis that the heterogeneities of the C ≥ 3 distributions
are not equal (see [3]). An example is provided in [14], where the results of a cus-
tomer satisfaction survey on facilitis services in Terminal 2 at Tampere Airport are
published.
Table 2.8 Contingency table of observed absolute frequencies of customer satisfaction about fa-
cilities services in Terminal 2 at Tampere Airport
Satisfaction Terminal area
Entrance Departure Arrival lounge
concourse lounge
Highly dissatisfied 4 4 3
Somewhat dissatisfied 7 9 9
Neutral 42 35 48
Somewhat satisfied 50 50 38
Highly satisfied 27 32 32
The data of the frequency distributions of the customer satisfaction about facil-
ities services are reported in Table 2.8. The total number of customers, travellers
who took part to the customer satisfaction survey, is 130 for all the areas under
evaluation, i.e. “Entrance course”, “Departure lounge” and “Arrival lounge”. We are
interested in comparing the heterogeneities of the judgements for the three areas
(α = 0.10).
In generale, given C distributions with C ≥ 3, the C-sample testing problem with
null hypothesis of equality in heterogeneity of the C distributions and alternative
hypothesis of inequality in heterogeneity of some of them, can be formally written as
H0 : het(X1 ) = het(X2 ) = · · · = het(XC )
versus
H1 : ∃(i, j)|het(Xi ) = het(X j ), i = j = 1, 2, · · · ,C.
2.4 Multisample Test 31
The alternative hypothesis states that at least one couple of categorical random
variables which represents customer satisfactions present different heterogeneities.
Even in the multisample case, the hypotheses of the problem can be defined by
comparing the cumulative ordered probabilities, that is the Pareto diagrams of the
probability distributions, as follows:
versus
with Π j(h) = ∑hs=1 π j(s) , where π j(s) is the s-th ordered probability in the j-th distri-
bution, that is in the probability distribution of the categorical variable X j .
If the null hypothesis is true, asymptotically exact or, for finite sample, approx-
imate exchangeability holds. Thus the permutation distribution of a suitable test
statistic can be estimated and the p-value can be computed or estimated as usual. A
suitable test statistic for the multisample problem is:
C
Tη = ∑ (η̂ j − η̂• )2 ,
j=1
where η̂ j is a sample index of heterogeneity for the j-th sample and η̂• is the same
index of heterogeneity computed for the pooled sample. The algorythm for the ex-
ecution of the test is the same described above, except for the test statistic and the
number of samples.
Fig. 2.3 Pareto diagrams of customer satisfaction about facilities services in Terminal 2 at Tampere
Airport
Entrance concourse Departure lounge Arrival lounge
100%
100%
100%
120
120
120
100
100
100
75%
75%
75%
C u m u la tiv e P e r c e n ta g e
C u m u la tiv e P e r c e n ta g e
C u m u la tiv e P e r c e n ta g e
80
80
80
F re q u e n c y
F re q u e n c y
F re q u e n c y
50%
50%
50%
60
60
60
40
40
40
25%
25%
25%
20
20
20
0%
0%
0%
0
0
E
A
D
A
C
D
32 2 Customer Satisfaction Heterogeneity
In the presented application example, from the descriptive point of view, the com-
parison of the sample Pareto diagrams does not reveal evident differences in the
frequency concentration over the categories (judgements) among the three samples
(see Fig. 2.3).
Table 2.9 Sampe indices of heterogeneity of customer satisfaction about facilities services in Ter-
minal 2 at Tampere Airport
Index Terminal area
Entrance Departure lounge Arrival lounge
concourse
Gini 0.700 0.713 0.712
Shannon 1.323 1.358 1.344
Rényi-3 1.152 1.195 1.201
Rényi-∞ 0.956 0.956 0.996
The values of the sample indices of heterogeneity, as shown in Table 2.9, seem
to be consistent with the hypothesis of equality in heterogeneity. The described per-
mutation test can be applied to test whether the differences of the sample indices
between samples are significant.
Table 2.10 Multisample test on heterogeneity of customer satisfaction about facilities services in
Terminal 2 at Tampere Airport
(obs)
Index Tη p-value
Since the p-values of all the tests, reported in Table 2.10, are greater than α , than
we cannot reject the null hypothesis, that is there is not empirical evidence in favour
of the hypothesis of inequality in heterogeneity between some of the three compared
distributions.
The permutation test described in this chapter can be applied by choosing one of the
several indices of heterogeneity. In general the index of Shannon and the index of
Gini are preferable because the corresponding tests are well approximated (rarely
anticonservative under the null hypothesis) and in many cases more powerful than
other tests. These tests are unbiased, consistent and the convergence to one of the
power is fast.
References 33
References
18. Han, K.E., Catalano, P.J., Senchaudhuri, P., Mehta, C.: Exact analysis of dose-
response for multiple correlated binary outcomes. Biometrics 60, 216–224
(2004)
19. Hirotsu, C.: Cumulative chi-squared statistic as a tool for testing goodness-of-
fit. Biometrika 73, 165–173 (1986)
20. Leti, G.: Entropy, a Gini index and other heterogeneity measures. Metron
XXIV, 1–4 (1965)
21. Loughin, T.M.: A systematic comparison of methods for combining p-values
from independent tests. Comput. Stat. Data Anal. 47,467–485 (2004)
22. Loughin, T.M., Scherer, P.N.: Testing for association in contingency tables
with multiple column responses. Biometrics 54, 630–637 (1998)
23. Lumley, T.: Generalized estimating equations for ordinal data: a note on work-
ing correlation structures. Biometrics 52, 354–361 (1996)
24. Montgomery, D.C.: Introduction to statistical quality control. Wiley, Hoboken
(2013)
25. Nettleton, D., Banerjee, T.: Testing the equality of distributions of random
vectors with categorical components. Comput. Stat. Data Anal. 37, 195–208
(2001)
26. Patil, G.P., Taillie, C.: Diversity as a concept and its measurement (with dis-
cussion). J. Am. Stat. Assoc. 77, 548–567 (1982)
27. Pesarin, F.: Goodness-of-fit testing for ordered discrete distributions by resam-
pling techniques. Metron LII, 57–71 (1994)
28. Pesarin, F.: Multivariate Permutation Test With Application to Biostatistics.
Wiley, Chichester (2001)
29. Pesarin, F., Salmaso, L.: Permutation tests for univariate and multivariate or-
dered categorical data. Aust. J. Stat. 35, 315–324 (2006)
30. Pesarin, F., Salmaso, L.: Permutation Tests for Complex Data: Theory, Appli-
cations and Software. Wiley, Chichester (2010)
31. Piccolo, D.: Statistica. Il Mulino, Bologna (2000)
32. Pielou, E.C.: Ecological Diversity. Wiley, New York (1975)
33. Pielou, E.C.: Mathematical Ecology. Wiley, New York (1977)
34. Rényi, A.: Calculus des probabilités. Dunod, Paris (1966)
35. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J.
27, 379–423, 623–656 (1948)
36. Silvapulle, M.J., Sen, P.K.: Constrained Statistical Inference, Inequality, Order,
and Shape Restrictions. Wiley, New York (2005)
37. Shorack, G.R., Wellner, J.A.: Empirical Processes with Applications to Statis-
tics. Wiley, New York (1986)
38. Thornton-Wells, T.A., Moore, J.,H., Haines, J.L.: Dissecting trait heterogene-
ity: a comparison of three clustering methods applied to genotypic data. BMC
Bioinf. 7, 204 (2006)
39. Troendle, J.F.: A likelihood ratio test for the nonparametric Behrens-Fisher
problem. Biom. J. 44(7), 813–824 (2002)
40. Wang, Y.: A likelihood ratio test against stochastic ordering in several popula-
tions. J. Am. Stat. Assoc. 91, 1676–1683 (1996)
Chapter 3
Ranking Multivariate Populations
are ordered categorical the difficulties of the traditional methods based on contin-
gency tables may become insurmountable. Nonparametric inference based on the
NPC—NonParametric Combination of several dependent permutation test statistics
(see [10]), allows us to overcome most of these limitations, without the necessity
of referring to assume any specified random distribution. The main advantages of
using the permutation and combination approach to classify and rank several multi-
variate populations is that it is the only one testing method which allow us to derive
multivariate directional p-values that can be calculated also when the number of
response variables are much more larger than the number of replications (so-called
finite-sample consistency of combined permutation tests). It is worth noting that in
this situation, which can be common in many real applications, all traditional para-
metric and nonparametric testing procedures are not appropriate at all. For deeper
introduction on the topic of ranking of multivariate populations we refer the reader
to Corain and Salmaso [5].
Following results in Arboretti er al.[1], let us assume that data were drawn from each
of C multivariate populations Π1 , . . . , ΠC (i.e. items/groups/treatments), C > 2, by
means of a sampling procedure, so as to make inference on their possible equality
and in case of rejection of this hypothesis to classify those populations in order to
obtain a relative ranking from the ‘best’ to the ‘worst’ according to a pre-specified
meaningful criterion. We use the term relative ranking because we want to under-
line that it is not an absolute ranking but a kind of ordering that is only refereed
to the C populations at hand. Let Y be the p-dimensional response variable rep-
resented as a p-vector of the observed data from population Π and let us assume,
without loss of generality, that large values of each univariate aspect Y correspond to
a better marginal performance, so that when comparing two populations the possi-
ble marginal stochastic superiority should result in a high ranking position. In other
words, we are assuming the criterion “the larger the better”. The term “large val-
ues” has a clear meaning in case of continuous responses, while in case of binary or
ordered categorical responses, this should be intended in term of “large proportion”
and of “large frequencies of high score categories” respectively. The marginal uni-
variate components of Y are not restricted to belong to the same type, in other words
we can consider also the situation of mixed variables (some continuous/binary and
some others ordered categorical). We recall that our goal is to classify and ranking
Π1 , . . . , ΠC multivariate populations with respect to p marginal variables where are
available C samples, from each one population, of n j independent replicates repre-
sented by the random variables Y1 , ..,YC , j = 1, . . . ,C. In other words we are looking
for an estimate r̂(Π j ) of the rank r(Π j ), i.e. the relative stochastic ordering of each
population when compared among all other populations, i.e. more formally
d d
r j = r(Π j ) = 1 + ∑ I(Y j < Yh ) = 1 + {#Y j < Yh }, j, h = 1, . . . ,C, j = h (3.1)
j=h
3.1 Ranking Methods 39
where I(·) is the indicator function and # means the number of times. This definition
extends into a nonparametric multivariate framework the traditional definition of
ranking, hence it is consistent with the ranking problem literature (see [7] and [8]).
Under the hypothesis of distributional equality of the C populations, all true global
rankings would necessarily be equal to one, hence they would be in a full ex-aequo
situation, that is
d
r(Π j |H0 ) = {1 + #Y j < Yh , h = 1, . . . ,C, j = h} = 1, ∀ j (3.2)
This situation of equal ranking where all populations belong to just one ranking
class may be formally represented in a testing-like framework where the hypotheses
of interest are: ⎧
⎨H0 : Y1 =d d d
Y2 = . . . = YC
d (3.3)
⎩H : ∃Y = Y , j, h = 1, . . . ,C, j = h
1 j h
Note that a rejection of at least one hypothesis H0( jh) implies that we are not in
an equal ranking situation, that is at least one multivariate population has a greater
ranking position than some others. Note that, as usual in the framework of the C-
sample inference, the rejection of the global null hypothesis is not informative on
the specific alternative has caused the rejection so that post-hoc analysis is needed
to look for which alternative is more likely. In this connection, to make inference on
which marginal variable(s) that inequality is mostly due to, it is useful considering
the inferences on univariate pairwise comparisons between populations, defined as:
⎧
⎨H d
0k( jh) : Y jk = Yhk
(3.5)
⎩H d d
: (Y < Y ) (Y > Y ), j, h = 1, . . . ,C, j = h
1k( jh) jk hk jk hk
d d
because when Y jk = Yhk is true, then one and only one between Y jk < Yhk and
d
Y jk > Yhk is true, i.e. they cannot be jointly true. Looking at the univariate alter-
native hypothesis H1k( jh) , note that we are mostly interested in deciding whether
a population is either greater or smaller than another one (not only establishing if
they are different). In this connection, we can take into account separately of the di-
rectional type alternatives, namely those that are suitable for testing both one-sided
alternatives (see [10, p. 163] and [2]). Let p+
k( jh)
and p−
k( jh)
be the permutation-based
marginal directional p-value statistics related to the stochastic inferiority or supe-
40 3 Ranking Multivariate Populations
+ d − d
riority alternatives H1k( jh)
: Y jk > Yhk and H1k( jh)
: Y jk < Yhk , respectively. Since
+ − −
by definition pk( jh) = 1 − pk( jh) = pk(h j) , note that all one-sided inferential results
related to the hypotheses (3.5) can be represented as follows:
⎡ ⎤
− p+
1(1,2)
p+
1(1,3)
... p+1(1,C)
⎢ p+ − p+ ... p+ ⎥
⎢ 1(2,1) 1(2,3) 1(2,C) ⎥
+ ⎢ ⎥
P = ⎢ ... ... − ... ... ⎥
⎢ + ⎥
⎣ p1(C−1,1) p+
1(C−1,2)
... − p+
1(C−1,C) ⎦
p+1(C,1)
p+
1(C,2)
+
. . . p1(C,C−1) −
⎡ ⎤
− p+
p(1,2)
+
p p(1,3) ... +
p p(1,C)
⎢ p+ − +
... p p(2,C) ⎥
+
⎢ p(2,1) p p(2,3) ⎥
⎢ ⎥
= ⎢ ... ... − ... ... ⎥
⎢ + ⎥
⎣ p p(C−1,1) p+
p(C−1,2)
... − +
p p(C−1,C) ⎦
p+p(C,1)
p+
p(C,2)
+
. . . p p(C,C−1) −
Finally, let be p+
•( j,h)
the directional p-value statistics calculated via nonparametric
combination methodology [10]. All the C × (C − 1) p+ •( j,h)
can be represented as
follows: ⎡ ⎤
− p+•(1,2)
p+
•(1,3)
... p+•(1,C)
⎢ p+ − p+ ... p+ ⎥
⎢ •(2,1) •(2,3) •(2,C) ⎥
+ ⎢ ⎥
P• = ⎢ . . . ... − ... ... ⎥
⎢ + + + ⎥
⎣ p•(C−1,1) p•(C−1,2) . . . − p•(C−1,C) ⎦
+ + +
p•(C,1) p•(C,2) . . . p•(C,C−1) −
Now, let α be the chosen significance α -level and let S the C ×C matrix which trans-
forms the adjusted (by multiplicity) p-values p+ •( j,h)ad j
into 0-and-1 scores where
each element s j,h takes the value of 0 if p+•( j,h)ad j
> α /2, otherwise it takes 1 if
+
p•( j,h)ad j ≤ α /2, that is
⎡ ⎤
−S(1,2) S(1,3) . . . S(1,C)
⎢ S(2,1) − S(2,3) . . . S(2,C) ⎥
⎢ ⎥
S=⎢
⎢ S(3,1) S(3,2) − ... S(3,C) ⎥
⎥
⎣ ... ... ... − ... ⎦
S(C,1) S(C,2) . . . S(C,C−1) −
In practice, S is nothing more than a synthetic representation of results from all
multivariate directional pairwise comparisons suitable for estimating the possible
pairwise dominances. If we consider either the sum of the s( j, h) 0 − 1 scores along
the h-th column or the j-th row, then we are respectively counting the number of
populations which, at the chosen significance α -level, are considered to be stochas-
tically larger or smaller than the h-th population or the j-th row. That is, we are
defining an estimate r̂(Πh ) and r̂(Π j ) of the rank r(Πh ) or r(Π j ), i.e. the relative
3.2 Set-Up of the Multivariate Ranking Problem 41
stochastic ordering of each population when compared with all other populations,
i.e. more formally
C
r̂hD = 1 + ∑ S( j,h) , h = 1, . . . ,C (3.6)
j=1
C C
r̂Uj = 1 + {#(C − ∑ S( j,h) ) > (C − ∑ S( j ,h) ), j , j = 1, . . . ,C; j = j (3.7)
h=1 h=1
where D and U and stands for downward and upward rank estimates respectively.
We note that the ranking estimators defined in (3.6) and (3.7) are deriving by count-
ing, on the basis of empirical evidence, how many populations are significantly
stochastically larger/smaller than the h-th/ j-th population at the chosen significance
α -level. The two estimated rankings r̂D and r̂U of the true rank r are intentionally
denoted with a different notation in order to highlight that sometimes they could
provide different rank estimates for the same population. We remark that, as we out-
lined in our literature review, the use of a pairwise matrix as to derive a ranking is
quite common, especially in the algorithmic ranking literature.
In Sect. 3.1 we formalized our approach to solve the problem we called the multi-
variate ranking problem, i.e. that of ranking several multivariate populations from
the ‘best’ to the ‘worst’ according to a given pre-specified criterion when a sample
from each population is available and for each marginal univariate response there
is a natural preferable direction. Since the key element of our solution is a testing
procedure suitable for multivariate one-sided alternatives, the NPC methodology
represents our main methodological reference framework. In fact, to the best of our
knowledge, the nonparametric combination of dependent permutation tests, the so-
called NPC Tests, is the only method proposed by the literature suitable to achieve
this goal. Moreover, when deriving the multivariate one-sided p-values we can also
benefit from the flexibility of the method for obtaining a series of advantages: NPC
methodology allows to handle with all type of response variable, i.e. numeric, bi-
nary and ordered categorical even in the presence of missing data (at random or not
at random, i.e. non-informative or informative) and this can be done also when the
number of response variables are much more larger than that of units without the
need of having to worry about the curse of dimensionality or the problem of the re-
duction of degrees of freedom. On the contrary, thanks to the so-called finite-sample
consistency of combined permutation tests, the power function does not decrease for
any added variable which makes larger standardized noncentrality. It is worth noting
that in this situation, which can be common in many real applications, all traditional
parametric and nonparametric testing procedures are not at all appropriate (also in
42 3 Ranking Multivariate Populations
the case all multivariate alternatives were of two-sided type). Finally, the NPC ap-
proach has a lot of nice feature: it is very low demanding in terms of assumptions
and provides always an exact solution for whatever finite sample size whenever the
permutation principle applies, i.e. when the null hypothesis implies data exchange-
ability. We recall that our goal is to classify and ranking C multivariate populations
with respect to several marginal variables where samples from each population are
available. Note that the multivariate ranking problem is essentially related to a post-
hoc comparative multivariate C-sample problem where the populations of interest
are treatments or groups or items to be investigated by an experimental or observa-
tion study.
It is worth noting that within the NPC framework an optimal statistic cannot exist
because it is function of the population distributions which is unknown by definition
[10]. For this reason it is important to consider for each type of response variable a
number of different test statistics. We recall that each univariate partial test statistic
we are presenting must be suitable for one-sided alternatives with respect to the
hypotheses H0k( jh) vs. H1k( jh) . When the univariate marginal response variable is
continuous or binary, within the permutation framework we can use a number of
test statistics suitable for one-sided alternatives. In this context, we underline that
the test statistics should obviously be not permutationally equivalent, for example
we can refer to the difference of sample means:
TDM,k( jh) = ∑ Yi jk /n j − ∑ Yihk /nh . (3.8)
i i
When the univariate marginal response variable is ordered categorical with S ordinal
categories, within the permutation framework we can use a number of test statistics
suitable for directional alternatives. Some examples of suitable test statistics are the
Anderson-Darling test statistics:
S−1
∑ Nhsk · [N·sk · (n jk + nhk − N·sk )]− 2
1
TAD,k( jh) = (3.9)
s=1
where N·sk = N jsk + Nhsk are the cumulative frequencies, and the Multi-Focus test
statistics:
The following sensory study aims at investigating the relation of wine quality sen-
sory assessments with the related wine physicochemical properties on a large sample
of red variants of the Portuguese “Vinho Verde” wine. This dataset is public avail-
able for research, the details are described in Cortez et al. [6]. The dataset is made
by 1599 records referring to independent red wine testing sessions where at least
three evaluations were made by wine experts. Each expert graded the wine quality
between 0 (very bad) and 10 (very excellent) and, as final output of wine quality, the
related median of scores was considered. Moreover, for each one tested red wine, a
number of 11 physicochemical properties were recorded:
1. fixed acidity (tartaric acid—g/dm3 )
2. volatile acidity (acetic acid—g/dm3 )
3. citric acid (g/dm3 )
4. residual sugar (g/dm3 )
5. chlorides (sodium chloride—g/dm3 )
6. free sulfur dioxide (mg/dm3 )
7. total sulfur dioxide (mg/dm3 )
8. density (g/cm3 )
9. pH
10. sulphates (potassium sulphate—g/dm3 )
11. alcohol (% by volume)
First of all we represented in Fig. 3.1 the scatterplots of quality scores (median of
scores provided by experts) versus each physicochemical property, along with the
fitted line calculated by the simple linear regression.
According to the sign of the fitted regression lines, we classified the physicochem-
ical properties into two main domains, i.e. the direct and the inverse set of quality-
related physicochemical properties. In the fist domain there are “fixed acidity”, “cit-
ric acid”, “residual sugar”, “sulphates” and “alcohol”. In the second domain we
found “volatile acidity”, “chlorides”, “free sulfur dioxide”, “total sulfur dioxide”,
44 3 Ranking Multivariate Populations
6
4
2
5 10 15 5 10 15 0.5 1.0 1.5 0.0 0.5 1.0
residual.sugar chlorides free.sulfur.dioxide total.sulfur.dioxide
10
8
6
4
2
0 8 10.00 0.25 0.50 0 40 800 150 300
density pH sulphates alcohol
10
8
6
4
2
0.990 0.995 1.000 3.0 3.5 4.0 0.6 1.2 1.8 10.0 12.5 15.0
“density” and “pH”. After that, we defined four quality groups according to the
values of the quality score:
– Group Quality=1: scores equal or less to 4; 63 records met this condition;
– Group Quality=2: scores equal to 5; 681 records met this condition;
– Group Quality=1: scores equal to 6; 638 records met this condition;
– Group Quality=1: scores equal or greater to 7; 217 records met this condition.
Figure 3.2 displays the box plots of some of the most correlated properties by group
property.
As expected, when we move from group 1 to 4 the physicochemical property in-
creases or decreases accordingly to the positive or negative sign of the fitted regres-
sion line. We set up the ranking analysis problem as follows:
– type of design: one-way MANOVA design (four independent samples, i.e. four
multivariate populations to be ranked);
– domain analysis: yes (two domains, direct and inverse quality-related physico-
chemical properties);
– type of response variables: numerical continuous (11 numerical responses);
– Ranking rule: “the higher the better” and “the lower the better” for the first and
second domain respectively;
– Combining function: Fisher;
– B—Number of permutations: 2000;
– Significance α -level: 0.01.
3.3 Application to Food Sensory Analysis 45
Fig. 3.2 Boxplot of “volatile acidity”, “density”, “sulphates” and “alcohol” vs. Group Quality
volatile.acidity density
1.6 1.005
1.2
1.000
0.8
0.995
0.4
0.0 0.990
sulphates alcohol
2.0
14
1.5
12
1.0
10
0.5
8
1 2 3 4 1 2 3 4
Group Quality
When applying the ranking method to wine quality data using directional paramet-
ric p-values performed via B = 2000 permutations we obtain results reported in
Tables 3.1 and 3.2, where directional multivariate permutation p-values in Table 3.1
have been adjusted by using the Bonferroni-Holm-Shaffer method [11].
It is worth noting that ranking analysis results allow to strongly support that the
sensorial perceived quality can be associated with higher or lower values of specific
direct and inverse quality-related physicochemical properties.
The goal of this sensory study it to investigate the influence of fat content on sensory
properties and consumer perception of dairy products of ten cream cheese. From the
full set of ten products investigated in Bro et al. [4] we selected a subset set of five
products whose description is detailed in Table 3.3.
From the full list of sensory descriptors as in Bro et al. [4], we focused on a sub-
set set of six sensory descriptors, belonging to two main domain i.e. firmness and
creaminess (Table 3.4).
Sensory evaluations was performed in three replicates by a panel consisting of eight
panellists. Figures 3.3 and 3.4 show the boxplot and the sample mean of sensory
data by each sensory descriptor and product.
46 3 Ranking Multivariate Populations
Table 3.1 Direct quality-related physicochemical properties: global ranking results (α = 1%) and
related adjusted directional multivariate and directional (raw-unadjusted) univariate permutation
p-values for the Wine Quality sensory study (α = 1%)
Direct quality-related physicochemical properties
Global ranking 4 3 2 1
Group Quality 1 2 3 4
1 – 0.636 1.000 1.000
2 0.003 – 1.000 1.000
3 0.001 0.001 – 1.000
4 0.001 0.001 0.000 –
Fixed acidity
Group Quality 1 2 3 4
1 – 0.931 0.981 1.000
2 0.70 – 0.976 1.000
3 0.020 0.024 – 1.000
4 0.000 0.000 0.001 –
Citric acid
Group Quality 1 2 3 4
1 – 1.000 1.000 1.000
2 0.000 – 0.997 1.000
3 0.000 0.004 – 1.000
4 0.000 0.000 0.000 –
Residual sugar
Group Quality 1 2 3 4
1 – 0.179 0.142 0.531
2 0.821 – 0.268 0.949
3 0.859 0.733 – 0.966
4 0.471 0.052 0.035 –
Sulphates
Group Quality 1 2 3 4
1 – 0.906 1.000 1.000
2 0.097 – 1.000 1.000
3 0.000 0.000 – 1.000
4 0.000 0.000 0.001 –
Alcohol
Group Quality 1 2 3 4
1 – 0.002 0.999 1.000
2 0.999 – 1.000 1.000
3 0.002 0.000 – 1.000
4 0.000 0.000 0.000 –
3.3 Application to Food Sensory Analysis 47
Table 3.2 Inverse quality-related physicochemical properties: Global ranking results (α = 1%)
and related adjusted directional multivariate and directional (raw-unadjusted) univariate permuta-
tion p-values for the Wine Quality sensory study (α = 1%)
Inverse quality-related physicochemical
properties
Global ranking 3 3 2 1
Group Quality 1 2 3 4
1 – 0.003 0.001 0.001
2 0.003 – 0.001 0.001
3 0.177 1.000 – 0.000
4 1.000 1.000 1.000 –
Volatile acidity
Group Quality 1 2 3 4
1 – 0.000 0.000 0.000
2 1.000 – 0.000 0.000
3 1.000 1.000 – 0.000
4 1.000 1.000 1.000 –
Chlorides
Group Quality 1 2 3 4
1 – 0.315 0.052 0.002
2 0.686 – 0.003 0.000
3 0.950 0.998 – 0.000
4 0.998 1.000 1.000 –
Free sulfur dioxide
Group Quality 1 2 3 4
1 - 1.000 0.999 0.917
2 0.000 – 0.012 0.000
3 0.002 0.988 – 0.020
4 0.088 1.000 0.980 –
Total sulfur dioxide
Group Quality 1 2 3 4
1 - 1.000 0.980 0.512
2 0.000 – 0.000 0.000
3 0.021 1.000 – 0.005
4 0.490 1.000 0.996 –
Density
Group Quality 1 2 3 4
1 – 0.981 0.386 0.013
2 0.020 – 0.000 0.000
3 0.615 1.000 – 0.001
4 0.988 1.000 1.000 –
(continued)
48 3 Ranking Multivariate Populations
It seems that product 1 and 2 are the best ones, especially for the first domain, while
the remaining products look like not so different one each other.
We set up the ranking analysis problem as follows:
– type of design: multivariate randomized complete block design (five related sam-
ples, i.e. five multivariate populations to be ranked);
– domain analysis: yes (two domains);
– type of response variables: numerical continuous (six numerical responses);
– Ranking rule: “the higher the better”;
– Combining function: Fisher;
– B—Number of permutations: 2000;
– Significance α -level: 0.05.
3.3 Application to Food Sensory Analysis 49
0 * *
M-Creaminess M-Cream N-Cream
12
9
*
6
* * * *
3
* * * *
0
1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
Product ID
2
Mean
0
M-Creaminess M-Cream N-Cream
10
0
1 2 3 4 5 1 2 3 4 5
Product ID
When applying the ranking method to wine quality data using directional paramet-
ric p-values performed via B = 2000 permutations we obtain results reported in
Tables 3.5 and 3.6, where directional multivariate permutation p-values in Table 3.1
have been adjusted by using the Bonferroni-Holm-Shaffer method [11].
50 3 Ranking Multivariate Populations
Table 3.5 Firmness domain: Global ranking results (α = 1%) and related adjusted directional mul-
tivariate and directional (raw-unadjusted) univariate permutation p-values for the Cream Cheese
sensory study
Firmness: all descriptors
Global ranking 2 1 3 3 3
Product ID 1 2 3 4 5
1 – 0.925 0.005 0.003 0.003
2 0.005 – 0.003 0.003 0.002
3 1.000 1.000 – 1.000 1.000
4 1.000 1.000 0.369 – 1.000
5 1.000 1.000 0.231 1.000 –
H-Resistence
Product ID 1 2 3 4 5
1 – 0.551 0.000 0.000 0.000
2 0.462 – 0.000 0.000 0.000
3 1.000 1.000 – 0.986 0.997
4 1.000 1.000 0.016 – 0.868
5 1.000 1.000 0.003 0.132 –
M-Firm
Product ID 1 2 3 4 5
1 – 0.994 0.000 0.000 0.000
2 0.007 – 0.000 0.000 0.000
3 1.000 1.000 – 0.883 0.724
4 1.000 1.000 0.127 – 0.217
5 1.000 1.000 0.292 0.792 –
M-Resistence
Product ID 1 2 3 4 5
1 – 1.000 0.012 0.000 0.000
2 0.000 – 0.000 0.000 0.000
3 0.989 1.000 – 0.315 0.155
4 1.000 1.000 0.698 – 0.272
5 1.000 1.000 0.850 0.744 –
It is worth noting the ranking results from the two domains are similar but some-
what different. Product 2 is the best product for firmness domain whose descriptors
allow. in general to strongly discriminate among panellist assessment. In the case of
creaminess domain there only very few significant differences and the best product
seems to be Product 1.
3.3 Application to Food Sensory Analysis 51
Table 3.6 Creaminess domain: Global ranking results (α = 1%) and related adjusted direc-
tional multivariate and directional (raw-unadjusted) univariate permutation p-values for the Cream
Cheese sensory study
Creaminess: all descriptors
Global ranking 1 2 2 5 2
Product ID 1 2 3 4 5
1 – 0.597 0.123 0.005 0.093
2 1.000 – 1.000 0.360 0.450
3 1.000 1.000 – 0.648 0.861
4 1.000 1.000 1.000 – 0.951
5 1.000 0.753 0.720 0.360 –
M-Creaminess
Product ID 1 2 3 4 5
1 – 0.453 0.163 0.001 0.006
2 0.559 – 0.171 0.006 0.007
3 0.849 0.840 – 0.046 0.087
4 1.000 0.995 0.958 – 0.733
5 0.997 0.994 0.920 0.280 –
M-Cream
Product ID 1 2 3 4 5
1 – 0.043 0.018 0.019 0.078
2 0.963 – 0.543 0.471 0.686
3 0.984 0.478 – 0.419 0.674
4 0.983 0.548 0.597 – 0.734
5 0.930 0.327 0.342 0.282 –
N-Cream
Product ID 1 2 3 4 5
1 – 0.204 0.182 0.179 0.898
2 0.807 – 0.523 0.504 0.984
3 0.830 0.493 – 0.491 0.987
4 0.832 0.504 0.521 – 0.992
5 0.107 0.017 0.016 0.010 –
Five different breads were baked in two replicates giving a total of ten samples.
Eight different judges assessed the breads with respect to eleven different attributes
in a fixed vocabulary profiling analysis. The data was kindly provided by Prof.
Magni Martens (KVL, DK) and come from a student project in Sensory Science
[3]. Sensory evaluations was performed in two replicates by a panel consisting of
eight panellists. Figures 3.5 and 3.6 show the boxplot and the sample mean of sen-
sory data by each sensory descriptor and product.
52 3 Ranking Multivariate Populations
Fig. 3.5 Box plot of sensory attribute scores by bread. The empty dots represent the actual ob-
served pannelist assessments
Colour Bread ID
5 1
4 2
3 3
2
* 4
1 5
0
Moisture *
5
4
3
2
*
1
0
*
Tough
5
4
*
3
*
2
1
0
*
1 2 3 4 5
Bread ID
Fig. 3.6 Chart of sample means by sensory attribute score and bread
Colour Bread ID
5 1
4 2
3 3
2 4
5
1
0
Moisture
5
Mean
4
3
2
1
0
Tough
5
4
3
2
1
0
1 2 3 4 5
Bread ID
At a first sight, product 4 and 1 look as the best and the worst one, while the remain-
ing products seems to differ only for some attributes. We set up the analysis ranking
problem as follows:
3.3 Application to Food Sensory Analysis 53
– type of design: multivariate randomized complete block design (five related sam-
ples, i.e. five multivariate populations to be ranked);
– domain analysis: no;
– type of response variables: ordered categorical (three ordered categorical re-
sponses in a 0-5 rating scale);
– Ranking rule: “the higher the better”;
– Combining function: Fisher;
– B—Number of permutations: 2000;
– Significance α -level: 0.01.
When applying the ranking analysis using directional permutation p-values (α =
5%), calculated via Anderson-Darling test statistic and performed with 2000 per-
mutations and using the Fisher combining function, we obtain the following results
(Table 3.7).
Table 3.7 Global ranking results (α = 1%) and related adjusted directional multivariate and direc-
tional (raw-unadjusted) univariate permutation p-values for the Five Breads sensory study
All attributes
Global ranking 5 2 3 1 4
Product ID 1 2 3 4 5
1 – 1.000 1.000 1.000 1.000
2 0.005 – 0.762 1.000 0.005
3 0.003 1.000 – 1.000 0.114
4 0.003 0.003 0.002 – 0.003
5 0.003 1.000 0.446 0.999 –
Colour
Product ID 1 2 3 4 5
1 – 0.997 0.986 1.000 0.151
2 0.006 – 0.490 0.991 0.000
3 0.027 0.643 – 0.990 0.001
4 0.000 0.026 0.023 – 0.000
5 0.932 1.000 1.000 1.000 –
Moisture
Product ID 1 2 3 4 5
1 – 1.000 1.000 1.000 1.000
2 0.000 – 0.046 0.824 0.148
3 0.000 0.985 – 0.998 0.852
4 0.000 0.258 0.006 – 0.028
5 0.000 0.928 0.268 0.988 –
(continued)
54 3 Ranking Multivariate Populations
Ranking analysis confirms the significant much more better multivariate assess-
ments of bread 4 over the other four breads. This result is mainly explained by the
better performance of bread 4 in the assessments of attribute Tough. In fact, this is
actually the attribute with the strongest significant differences (smallest p-values).
References
1. Arboretti Giancristofaro, R., Bonnini, S., Corain, L., Salmaso, L.: A permuta-
tion approach for ranking of multivariate populations. J. Multivar. Anal. 132,
39–57 (2014)
2. Bertoluzzo, F., Pesarin, F., Salmaso, L.: On multi-sided permutation tests.
Commun. Stat. Simul. Comput. 42(6), 1380–1390 (2013)
3. Bro, R.: Multi-way Analysis in the Food Industry. Models, Algorithms, and
Applications. 1998. Ph.D. thesis, University of Amsterdam (NL) & Royal Vet-
erinary and Agricultural University (DK) (1998)
4. Bro, R., Qannari, El M., Kiers, H.A.L., Næs, T., Frøst M.B.: Multiway models
for sensory profiling data. J. Chemom. 22, 36–45 (2008)
5. Corain, L., Salmaso, L.: Improving power of multivariate combination-based
permutation tests. Stat. Comput. 25(2), 203–214 (2015)
6. Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J.: Modeling wine pref-
erences by data mining from physicochemical properties. Decis. Support Syst.
47(4), 547–553 (2009)
7. Gupta, S.S., Panchapakesan, S.: Multiple Decision Procedures: Theory and
Methodology of Selecting and Ranking Populations. SIAM-Society for Indus-
trial and Applied Mathematics, Philadelphia (2002)
8. Hall, P., Miller, H.: Using the bootstrap to quantify the authority of an empiri-
cal ranking. Ann. Stat. 37(6B), 3929–3959 (2009)
References 55
9. Meilgaard, M., Civille, G., Carr, B.: Sensory Evaluation Techniques, 4th edn.
CRC Press, Boca Raton (2006)
10. Pesarin, F., Salmaso, L.: Permutation Tests for Complex Data: Theory, Appli-
cations and Software. Wiley, Chichester (2010)
11. Shaffer, J.P.: Modified sequentially rejective multiple test procedure. J. Am.
Stat. Assoc. 81, 826–831 (1986)
Chapter 4
Composite Indicators and Satisfaction
Profiles
The main purpose of NPC method is obtaining a single criterion for statistical
units under study, which summarize many partial aspects. In order to formalize
let us consider a k-dimensional variable X = [X1 , . . . , Xk ] where the marginal vari-
able Xi , i = 1, . . . , k assumes mi ordered modalities v1 , . . . , vmi or discrete scores,
h = 1, . . . , mi , mi ∈ N \{0}, mi > 1. If Xi is a categorical variable, then a numeri-
cal transformation of modalities v1 , . . . , vmi into scores is needed. Large values of
h correspond to higher satisfaction/quality rates. In order to simplify the notation,
let us assume that mi = m for every i, but it is not necessary that all variables have
the same number of modalities/scores. Let us suppose that these variables are given
different (non-negative) degree of importance:
0 < wi ≤ 1, i = 1, . . . , k.
Such weights are thought to reflect the different roles of the variables in repre-
senting indicators of the specific quality aspect under evaluation and are provided by
responsible experts or from results of surveys previously carried out in the specific
context.
4.1 NPC-Based Composite Indicator 59
Suppose that N subjects give their judgement for k dependent variables each repre-
senting a specific aspect under evaluation. Thus the methodological problem we are
going to face concerns obtaining a composite indicator which represents a global
index of satisfaction for each subject starting from that k dependent variables. We
introduce a set of minimum reasonable conditions related to variables Xi , i = 1, . . . , k:
1. for each of the k informative variables a partial ordering criterion is well estab-
lished, i.e. “large is better”.
2. Regression relationships within the k informative variables are monotonic (in-
creasing or decreasing).
3. The marginal distribution of each informative variables is non-degenerate.
It is worth to note that we do not need assuming continuity of Xi , i = 1, . . . , k so the
probability of ex-equo can be positive. As link function we consider a real function
φ from a class of Φ of real combining functions satisfying the following minimum
properties:
1. φ must be continuous in all 2k arguments, in that small variations is any subset
of arguments imply a small variation in the φ -index;
2. φ must be monotone non-decreasing with respect to each argument:
φ (. . . , Xi , . . . ; wi , . . . , wk ) ≥ φ (. . . , Xi , . . . , wi , . . . , wk )
if 1 > Xi > Xi > 0, i = 1, . . . , k;
Property 1 is obvious. Property 2 means that if, for instance, two subjects have ex-
actly the same values for all Xs except for the ith, then the one with Xi > Xi must
have at least the same satisfaction φ -index assigned to it. Property 3 states that any
combining function φ must be invariant with respect to the order in which infor-
mative variables are processed. As link function, the Fisher’s combining function
defined as: φ = − ∑ki=1 wi × log(1 − λi ), where λi = (Xi + 0.5)/(m + 1) are normal-
ized scores defined in the open interval [0, 1], satisfies the three described properties.
When dealing with assessment of satisfaction the Fisher’s combining function
appears more sensitive to assess higher satisfaction than to assess lower satisfac-
tion, i.e. small differences in the lower satisfaction region seem to be identified with
greater difficulty that those in the higher satisfaction region [9].
60 4 Composite Indicators and Satisfaction Profiles
An aspect that should be considered when attempting to find a global index of sat-
isfaction are extreme units, in the sense that the relevant question may not be the
achievement of the absolute rank, but rather a more realistic expected one [4]. For
this perspective we propose an extension of the NPC ranking method to the case
of ordered categorical variables based on extreme satisfaction profiles [3]. Extreme
satisfaction profiles are defined a priori on a hypothetical frequency distribution
of variables Xi , i = 1, . . . , k. Let us consider data X, where the rule “large is bet-
ter” holds for all variables. Observed values for the k variables are denoted by
x ji , i = 1, . . . , k; j = 1, . . . , N. Examples of extreme satisfaction profiles are given
below.
The strong satisfaction profile is defined as follows.
Maximum satisfaction is obtained when all subjects have the highest value of
satisfaction for all variables:
1 for h = m
fhi = ∀i, i = 1, . . . , k
0 otherwise
Minimum satisfaction is obtained when subjects have the smallest value of satisfac-
tion with relative frequencies varying across the variables:
li for h = 1
fhi =
lhi otherwise, where ∑mh=2 lhi = (1 − li ) i = 1, . . . , k
where ui and li represent realistic achievable targets that can be fixed observing
past experience or motivational targets established by managers or organizers in the
strategic and business planning.
4.1 NPC-Based Composite Indicator 61
h + fhi × 0.5h = m − t + 1, . . . , m; i = 1, . . . , k.
h + (1 − fhi ) × 0.5h = 1, . . . , m − t; i = 1, . . . , k.
where
1 if x ji = h
Ih (x j i) =
0 if x ji = h
For the first (m − t) values of h corresponding to judgments of dissatisfaction, the
transformed values of x ji are defined as:
m−t
z ji = x ji + ∑ Ih (x ji ) × (1 − fih ) × 0.5, i = 1, . . . , k; j = 1, . . . , N.
h=1
(z ji − zi min ) + 0.5
λ ji = , i = 1, . . . , k; j = 1, . . . , N
(zi max − zi min ) + 1
62 4 Composite Indicators and Satisfaction Profiles
T j = φ (λ j1 , . . . , λ jk ; w1 , . . . , wk ), j = 1, . . . , N.
In order for the global index to vary in the interval [0, 1] we put:
T j − Tmin
Sj = , j = 1, . . . , N,
Tmax − Tmin
where
Tmin = φ (λ1 min , . . . , λk min ; w1 , . . . , wk ),
Tmax = φ (λ1 max , . . . , λk max ; w1 , . . . , wk ),
and λi min and λi max are obtained according to the extreme satisfaction profiles:
Note that value Tmin represents the unpreferred value of the satisfaction index since
it is calculated from (λi min , . . . , λk min ), while Tmax represents the preferred value
since it is calculated from (λi max , . . . , λk max ). Tmin and Tmax are reference values to
evaluate the distance of the observed satisfaction values from the situation of highest
satisfaction defined according to the extreme satisfaction profile.
At the end of each teaching course, the students of the School of Engineering of
the University of Padova in Italy are required to complete a questionnaire on their
satisfaction about it. The questionnaire covers different aspects of satisfaction such
as organizational aspects, teaching activities and infrastructures. Finally students
were asked to give a judgement on their overall satisfaction. Questions of different
domains are shown in Table 4.1.
Answers consist of scores in Likert scale 1–10 intended as “greater is better”. We an-
alyze data from this survey adopting the NPC-based composite indicator described
in previous section. It is common in practice to consider the simple mean of the
answers to the question related to the overall satisfaction (D09 of Table 4.1) as in-
dicator of global satisfaction about a teaching course. In the following sections we
aim at showing the advantages that NPC-based composite indicator brings in under-
standing the real students’ satisfaction.
64 4 Composite Indicators and Satisfaction Profiles
We also investigate the impact of single aspects of satisfaction both on overall sat-
isfaction intended as the answers to D09 and on the NPC composite indicator, by
means of a multiple linear regression model (other model could be also adopted as
latent classes, multilevel models, etc.). We would expect that the overall satisfaction
is influenced simultaneously by all partial aspects. Actually when we are asked to
express a judgement about something a complex mechanism is activated in our mind
and it is common to be guided only by a particular aspect or few aspects. Indeed an-
alyzing data from students’ satisfaction we found that the overall satisfaction seems
guided by the satisfaction on how much teacher motivates the interest in the sub-
4.2 A Students’ Satisfaction Survey 65
Fig. 4.1 Distribution of the NPC-based composite indicator using different satisfaction profiles.
Red dashed lines represent the point of sufficient satisfaction; black dashed lines represent the
median of the composite indicator
Weak Satisfaction Profile
15
Frequency
10
5
0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
10
5
0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
ject (D05). Table 4.2 shows a representative extract of the results after adopting for
all teaching courses a regression model putting in relation the overall satisfaction
as indicated in D09 with partial aspects D01-D06. Since answers to D07-D09 pre-
sented a lot of missing values, we exclude them from this analysis. As we can see
satisfaction about D05 is present for each teaching course whereas other aspects do
not always influence the satisfaction. It is more evident in Fig. 4.2 where we report
the histogram showing the percentage of time when each partial aspect is resulted
significant (at a significance level α = 0.05) in the multivariate regression model
when putted in relation with D05 and NPC composite indicator. We can see that the
aspect related to the motivation of the teacher (D05) in the 80% of times impacts
on the overall satisfaction as intended by D09, followed by aspect related to the
teaching material (D04) and aspect related to teacher explanation (D06) both with
about 40%. Whereas the NPC composite indicator obviously (for its construction)
is influenced by all partial aspects (the distribution is almost uniform).
66 4 Composite Indicators and Satisfaction Profiles
Table 4.2 Extract of significant partial aspects in the regression model which put in relation the
overall satisfaction (D09) with partial aspects in the questionnaire
Teaching Course ID D01 D02 D03 D04 D05 D06
TC ID1 * *
TC ID2 * *
TC ID3 * * *
TC ID4 *
TC ID5 * *
TC ID6 * *
TC ID7 * * *
TC ID8 * * *
TC ID9 *
... ... ... ... ... ...
The star means that the aspect is significative at a significance level α = 0.05
Fig. 4.2 Percentage of time when each partial aspect results significant (α = 0.05) in determining
the overall satisfaction
Overall Satisfaction (D09)
1.0
0.8
0.6
0.4
0.0 0.2
In order to emphasize how the NPC-based composite indicator well explains the sat-
isfaction structure of the respondents we show for a teaching course the distribution
of the scores of each partial aspects, of the overall satisfaction and of the compos-
ite indicator (see Fig. 4.3). What we can see is that for example for this teaching
course, the distribution of the scores of the overall satisfaction (D9) is very much
similar to that related to teaching explanation (D6) confirming that the overall sat-
isfaction seems to be guided by a particular aspect. On the other hand the scores of
the composite indicator, converted on a scale 1–10, does not follow a specific aspect
but it mediate all partial aspects.
Fig. 4.3 Example of distribution of the scores of a teaching course for each partial aspect, overall
satisfaction and composite indicator
D01 D02 D03
0.6
0.6
0.6
0.0
0.0
0.0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
0.6
0.0
0.0
0.0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
0.0
0.0
0.0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
References
3. Arboretti Giancristofaro, R., Bonnini, S., Salmaso, L.: Employment status and
education/employment relationship of PhD graduates from the University of
Ferrara. J. Appl. Stat. 36(12), 1329–1344 (2009)
4. Bird, S.M., Cox, D., Farawell, V.T., Goldstain, H., Holt, T., Smith, P.C.: Per-
formance indicators: good, bad and ugly. J. R. Stat. Soc. Ser. A Stat. Soc. 168,
1–27 (2005)
5. Boccuzzo, G., Gianecchini, M.: Measuring young graduates’ job quality
through a composite indicator. Soc. Indic. Res. 122(2), 453–478 (2015)
6. Coromaldi, M., Zoli, M.: Deriving multidimensional poverty indicators:
methodological issues and an empirical analysis for Italy. Soc. Indic. Res.
107(1), 37–54 (2012)
7. Del Carmen Bas, M., Tarantola, S., Carot, J.M., Conchado, A.: Sensitivity
analysis: a necessary ingredient for measuring the quality of a teaching ac-
tivity index. Soc. Indic. Res. 131(3), 931–946 (2017)
8. Fayers, P.M., Hand, D.J.: Casual variables, indicator variables and measure-
ment scales: an example from quality of life. J. R. Stat. Soc. Ser. A Stat. Soc.
165(2), 233–253 (2002)
9. Finos, L., Salmaso, L.: Nonparametric multi-focus analysis for categorical
variables. Commun. Stat. Theory Methods 33(8), 1931–1941 (2004)
10. Giambona, F., Vassallo, E.: Composite indicators of social inclusion for Euro-
pean countries. Soc. Indic. Res. 116(1), 269–293 (2014)
11. Gnaldi, M., Ranalli, M.G.: Measuring university performance by means of
composite indicators: a robustness analysis of the composite measure used for
benchmark of Italian universities. Soc. Indic. Res. 129(2), 659–675 (2016)
12. Hoskins, B.L., Mascherini, M.: Measuring active citizenship through the de-
velopment of a composite indicator. Soc. Indic. Res. 90(3), 459–488 (2009)
13. Joint Research Center - European Commission: Handbook on Constructing
Composite Indicators: Methodology and User Guide. OECD Publishing, Paris
(2008)
14. Krishnan, V.: Development of a multidimensional living conditions index
(LCI). Soc. Indic. Res. 120(2), 455–481 (2015)
15. Marozzi, M.: A composite indicator dimension reduction procedure with ap-
plication to university student satisfaction. Stat. Neerl. 63(3), 258–268 (2009)
16. Marozzi, M.: Construction, dimension reduction and uncertainty analysis of
an index of trust in public institutions. Qual. Quant. 48(2), 939–953 (2014)
17. Munda, G.: Choosing aggregation rules for composite indicators. Soc. Indic.
Res. 109(3), 337–354 (2012)
18. Munda, G., Nardo, M., Saisana, M., Srebotnjak, T.: Measuring uncertainties
in composite indicators of sustainability. Int. J. Environ. Technol. Manag. 11,
7–26 (2009)
19. Murias, P., De Miguel, J.C., Rodrı́guez, D.: A composite indicator for uni-
versity quality assessment: the case of Spanish higher education system. Soc.
Indic. Res. 89(1), 129–146 (2008)
20. Paruolo, P., Saisana M., Saltelli, A.: Rating and rankings: voodoo or science?
J. R. Statist. Soc. A 176, 609–634 (2013)
References 69
21. Pesarin, F., Salmaso, L.: Permutation tests for complex data, theory, applica-
tions and software. Wiley, Chichester (2010)
22. Royuela, V., López-Tamayo, J., Suriñach, J.: Results of a quality of work life
index in Spain. A comparison of survey results and aggregate social indicators.
Soc. Indic. Res. 90(2), 225–241 (2009)
23. Saisana, M., d’Hombres, B., Saltelli, A., Rickety numbers: volatility of uni-
versity rankings and policy implications. Res. Policy 40, 165–177 (2011)
24. Saltelli, A.: Composite indicators between analysis and advocacy. Soc. Indic.
Res. 81(1), 65–77 (2007)
25. Saltelli, A., Tarantola, S.: On the relative importance of input factors in math-
ematical models: safety assessment for nuclear waste disposal. J. Am. Stat.
Assoc.
26. Somarriba, N., Pena, B.: Synthetic indicators of quality of life in Europe. Soc.
Indic. Res. 94(1), 115–133 (2009)
27. Tarabusi, C.E., Guarini, G.: An unbalance adjustment method for development
indicators. Soc. Indic. Res. 112, 19–45 (2013)
Chapter 5
Analyzing Survey Data Using
Multivariate Rank-Based Inference
Data from customer satisfaction surveys are multivariate—there are several ques-
tions resulting in as many endpoints. Furthermore, survey data typically don’t fit
into simple parametric models. Indeed, the endpoints or response variables may be
measured on different types of scales (metric, ordinal, binary). For these two rea-
sons, one requires multivariate inference methods, and specifically methods that can
deal with a mix of response variable types. Additionally, it would be advantageous if
the procedures also performed well for small to moderate numbers of respondents,
as not every survey can afford to obtain responses from hundreds of participants.
Appropriate inference procedures fulfilling all these requirements for multivari-
ate data are rare. However, they do exist within the framework of the multivariate,
robust rank-based approach developed in the last decade through the publications
[1, 2, 4, 3, 11, 10, 12, 15], and implemented in the R package npmv [9, 19]. This
approach constitutes a nonparametric extension of parametric multivariate analysis
of variance (MANOVA), and it doesn’t suffer from the severe limitations of those
classical approaches. Another set of appropriate inference procedures for multivari-
ate non-normal data is described in the book by Pesarin and Salmaso [18] and refer-
ences cited therein. In this chapter, we will focus on the former approach, the latter
is described in other chapters of the book.
Recall that the classical MANOVA model assumes that observation vectors are real-
izations of multivariate normal random variables. Furthermore, the classical model
assumes that the covariances structure between the different outcomes doesn’t differ
between groups. That is, no matter which group of respondents is considered, the
endpoints have the same variances and covariances. This latter assumption is called
homoscedasticity assumption.
To be mathematically precise, the classical model assumes the following.
(1) (k)
(Xi j , . . . , Xi j ) ∼ Nk (μi , Σ ), i = 1, . . . , a; j = 1, . . . , ni ,
degree of variation in answering the two questions, and the same correlation be-
tween the two responses. These are some rather strong assumptions. Why would
anyone who has no information regarding equal means, and thus wanting to test
their equality, believe in equal variances and correlations a priori? Even worse, vi-
olation of this so-called homoscedasticity assumption has rather detrimental effects
on the performance of classical MANOVA methods. Just as the two-sample t-test
with equal variance assumption performs poorly when variances are not equal and
samples not balanced, its extensions to ANOVA and MANOVA suffer from the same
problems. They may become very conservative (large type II error probability) or
very liberal (large type I error probability), depending on the configuration of sam-
ple sizes and underlying variances.
Concluding, there are obvious situations where classical MANOVA doesn’t make
sense for the data at hand because the response variables are not metric. However,
even for metric data, there is no guarantee that the actual error rates of classical
MANOVA procedures are close to the nominal error rates when the strong assump-
tions of multivariate normality and homoscedasticity are not fulfilled. And, if the
error rates are not reliable, those methods should not be used.
A fully nonparametric model for the multivariate observation vectors can be formu-
lated as follows.
(1) (k)
(Xi j , . . . , Xi j ) ∼ Fi , i = 1, . . . , a; j = 1, . . . , ni ,
where the Xi j are again independent random vectors of dimension k, and there is a
total of N = ∑ai=1 ni respondents or response vectors.
The details of this nonparametric model formula look a bit simpler compared to
the parametric model above. In fact, there are no assumptions regarding any particu-
lar distribution, such as a multivariate normal distribution. Instead, the model simply
states that the observations from group i follow some multivariate distribution that
is denoted as Fi . Each group may have a different multivariate distribution. It may
be multivariate normal in one group, and multivariate exponential in another. Or, it
may even be some discrete or absolutely continuous distribution without name. In
fact, one endpoint may be ordinal, another endpoint may be quantitative (metric),
and a third endpoint may be binary. Clearly, the data-generating model could not be
more general.
Also, there are no assumptions on specific covariance matrix structures. In gen-
eral, the covariance structure between the endpoints may be different for each
group. However, the generality of this model comes along with a rather stringent
null hypothesis. For inference in the sense of hypothesis testing, the null hypoth-
esis is formulated in terms of equality of the multivariate distributions, namely
74 5 Analyzing Survey Data Using Multivariate Rank-Based Inference
H0F : F1 = · · · = Fa . Thus, under this global null hypothesis, all multivariate distribu-
tions are equal. This implies that under null hypothesis, the means (if well-defined),
the medians, but also the dispersions are equal in each group. A bit later, when we
discuss the details on simultaneously testing subsets of the endpoints and of the
groups, this stringent null hypothesis will be softened somewhat.
Finally, the approach explained in this chapter does not require that group effects
point into the same direction for each variable, an assumption made in the test by
[17] that is widely cited in the medical literature. For example, an improvement in
a treatment, or a more favorable view of one sub-population, may in reality corre-
spond to increasing values of one outcome, and decreasing values of another, while
even other response variables may be completely unaffected. Such a situation would
present a challenge for methods designed similarly as the ‘O”Brien test. However,
in order to apply the methods presented here, it is not at all of importance. That is,
it is also not necessary to transform the variable scales in such a way that a favor-
able outcome always corresponds to larger values. In fact, no transformation at all is
needed in order to perform inference using the nonparametric multivariate methods
described here.
The global null hypothesis states that all k-variate distributions are equal for all a
groups that are being compared, namely H0F : F1 = · · · = Fa . Valid inference for this
global null hypothesis can be performed using the following steps. All the steps
described here in detail can conveniently be performed automatically using the R
package npmv. However, we think it is instructive and helpful in a knowledgeable
interpretation of the results to know what is implemented in the software, and in
part also why. We will illustrate the correct use of the R package in the subsequent
section.
1. Ranking.
Replace the observations for each endpoint by their mid-ranks. That is, for each
response variable separately, the observations (across all groups) are ranked from
smallest (rank 1) to largest (rank N). This is called variable-wise ranking. In case
of ties, mid-ranks are recommended. Table 5.1 illustrates the rank calculation for
a simple example with three variables (k = 3) and two groups (a = 2) of size
n1 = 4 and n2 = 3, respectively. The example is deliberately chosen to include
outcomes measured on three different scales, namely a metric, an ordinal, and a
binary (dichotomous) variable. All three scales typically appear in surveys, and
the approach presented here is general enough to allow for valid inference in the
presence of even a mixture of the three. As long as variable-wise ranking is pos-
sible, the respective response variables can be included. The resulting ranks are
assembled in the matrix R given in Table 5.2. This matrix has k rows, correspond-
5.3 How Can Inference Be Performed for the Global Null Hypothesis? 75
ing to the k endpoints, as well as N columns. Each column represents the ranks
obtained for one specific respondent. The blocks indicate different groups of re-
spondents (for example, males vs. females, or different nationalities, or different
types of instruction experienced).
Table 5.1 Illustration of variable-wise rank calculations for k = 3 variables and a = 2 groups
Original observations (Mid-)ranks
Group 1 Group 2 Group 1 Group 2
(1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1)
X11 X12 X13 X14 X21 X22 X23 R11 R12 R13 R14 R21 R22 R23
3 2 1 4 2 2 1 6 4 1.5 7 4 4 1.5
(2) (2) (2) (2) (2) (2) (2) (2) (2) (2) (2) (2) (2) (2)
X11 X12 X13 X14 X21 X22 X23 R11 R12 R13 R14 R21 R22 R23
-0.03 0.64 0.75 -0.80 0.97 0.24 -1.64 3 5 6 2 7 4 1
(3) (3) (3) (3) (3) (3) (3) (3) (3) (3) (3) (3) (3) (3)
X11 X12 X13 X14 X21 X22 X23 R11 R12 R13 R14 R21 R22 R23
0 0 0 1 1 0 1 2.5 2.5 2.5 6 6 2.5 6
The two groups have n1 = 4 and n2 = 3 respondents, respectively. The first outcome is measured
on an ordinal scale with categories coded as numbers 1–5. The second outcome is metric, rounded
to two digits, while the third endpoint is binary. In case of ties, midranks are calculated. The
inferential methods presented in the text allow for such a mixture of outcome scales
All the steps outlined in Sect. 5.3 can be performed at once using the R-package
npmv (Ellis et al. [9]). The test statistics to be calculated (Wilks’ Λ or ANOVA-type)
can be selected by the user. In fact, there are two more test statistics implemented
in npmv whose performance is typically somewhat similar to that of Wilks’ Λ (the
Lawley-Hotelling and the Bartlett-Nanda-Pillai tests). By default, p-values based on
both approximations to the sampling distribution are provided (F-distribution and
approximated permutation distribution).
For the simple illustrative example above, the data can be entered into an R data
frame using the following few lines.
X1=c(3,2,1,4,2,2,1,4,4,3,4)
X2=c(-0.03,0.64,0.75,-0.80,0.97,0.24,-1.64,-1.79,
-1.14,-0.29,-0.41)
X3=c(0,0,0,1,1,0,1,0,0,0,0)
group=as.factor(c(rep(1,4),rep(2,3),rep(3,4)))
X=cbind(group,X1,X2,X3)
X=as.data.frame(X)
In addition to the numbers given in Table 5.1, we have added a third group of four
three-variate observations. However, please note that this should really be regarded
as a toy example for pure illustrative purposes. In practice, sample sizes of n1 = 4,
n2 = 3, and n3 = 4 would be too small to draw reliable conclusions—even if the
methods presented here provide interpretable answers.
The analysis using npmv can simply be performed with the following two lines
of R-code, assuming that the package has been installed.
library(npmv)
nonpartest(X1|X2|X3 ˜ group, X, plots=FALSE,
permtest=FALSE)
In the output, no significant differences are revealed for this data set. In other
words, the data have not provided sufficient evidence against the null hypothesis
that the three (a = 3) three-variate (k = 3) distributions F1 , F2 , and F3 differ from
each other. Given the small sample sizes, this is not a surprise.
The output also provides the following descriptive summary of the data in terms
of nonparametric relative effects.
$releffects
X1 X2 X3
1 0.44318 0.63636 0.48864
2 0.24242 0.59091 0.69697
3 0.75000 0.29545 0.36364
This is to be interpreted as follows. Regarding variable X2 , the estimated marginal
nonparametric relative effect of group 3 equals about 0.3. This means that if a re-
spondent A was randomly chosen from group 3, and if another respondent B was
5.5 Which Groups Differ from Each Other? 79
randomly chosen from among all 11 participants in the study, then, regarding vari-
able X2 , the probability is 0.3 that A’s outcome is at least as large as B’s. Thus, the
nonparametric relative effects provide for an intuitive descriptive interpretation that
can be used along with the inferential results.
For data with ties, the wording “at least as large” is interpreted as “at least as
large, where equality is only given half weight”. This is much easier said with a
mathematical formula, namely P(XA > XB ) + 12 P(XA = XB ).
Here, we would like to investigate how differences between particular groups may
be investigated, in addition to an overall analysis of all groups. In classical terms,
this is often referred to as a post-hoc analysis.
In order to have a data set at hand that is somewhat more realistic in terms of its
size and usefulness, let us generate one using the random generator functions in R.
set.seed(0)
X1=c(rnorm(15),rnorm(15)+0.5,rnorm(15)+1,rnorm(15)+2)
X2=rnorm(60)
X3=X1+X2+0.5*rnorm(60)
group=as.factor(c(rep(1,15),rep(2,15),rep(3,15),
rep(4,15)))
X=cbind(group,X1,X2,X3)
X=as.data.frame(X)
library(npmv)
nonpartest(X1|X2|X3 ˜ group, X, permtest=FALSE)
80 5 Analyzing Survey Data Using Multivariate Rank-Based Inference
This function performs a closed testing procedure which reveals all combinations
of groups that exhibit a significant difference between them. Ultimately, it shows all
pairwise group comparisons that are significant, if there are any. The whole proce-
dure maintains the familywise error rate. That is, all decisions are made simultane-
ously at the prespecified α , which per default is set to 0.05.
This is reported to the user with the following statement in the output. “All appro-
priate subsets using factor levels have been checked using a closed multiple testing
procedure, which controls the maximum overall type I error rate at alpha= 0.05.”
In this data example, significant pairwise differences are found between the fol-
lowing pairs of groups: 1–4, 2–3, 2–4.
Considering how the group differences are introduced in the simulated data in
variable X1 , it is not surprising that those groups whose labels are furthest apart from
each other (1,4) show a significant difference, but the procedure is also capable of
detecting two more significant pairwise differences.
In the function call above, the option factors.and.variables = TRUE
ensures that group differences are actually evaluated. Since there are fewer variables
(k = 3) than groups (a = 4), if this option is not specified, the algorithm only looks
for differences between variables and not between groups.
Differences between variables are considered in the next Section.
If a global difference is detected, one would usually like to know which of the out-
comes have driven this significance. This can be considered a variable selection
problem, namely trying to answer the question which variables are most important
in distinguishing the groups from one another.
Let us use the same synthetic data already considered in the previous section. The
global test indeed revealed a significant difference, so it is natural and legitimate to
look for the sources for this difference.
The same code as already presented in the previous section also provides the
answers to this quest. The procedure is a modification of closed testing, therefore
the wording of the final sentence in the output is somewhat different. Most impor-
tant is however that also this procedure maintains the familywise error rate at the
nominal level.
5.7 How to Interpret the Results? 81
“All appropriate subsets using response variables have been checked using a mul-
tiple testing procedure, which controls the maximum overall type I error rate at al-
pha= 0.05.”
For the simulated data set being analyzed as an example, the procedure has re-
vealed variables X1 and X3 as driving the significant differences. Again, the result
is not surprising because in the synthetic data, group differences are introduced in
variable X1 . Variable X2 is only noise, so there should be no effect to be detected.
And X3 is highly influenced by X1 , so the differences among groups that are manifest
in X1 also carry over to X3 .
As in most statistical analyses, there are two major components, each requiring care-
ful interpretation. Namely, a descriptive, and an inferential component. In the de-
scriptive part, graphical and numerical summaries of the data are provided. A good
visualization should always be the first step of any data analysis, after the typically
lengthy data cleaning. In fact, a good visualization may even provide hints that data
cleaning needs to continue. Inconsistencies in the data may best be discovered using
visual methods. In this chapter, we will not provide detailed advice on how to dis-
play data graphically, as most introductory statistics textbooks devote ample space
to this topic (see, e.g., [6, 7, 20]).
Part of the descriptive analysis is also the calculation of numerical summary mea-
sures. In the context of a nonparametric approach to data analysis, statistics such as
means, variances, and standardized mean differences (often referred to as Cohen’s
d or Hedges’ g) are not appropriate summaries. They do not make sense for ordinal
or highly skewed data, and they have in general no relation to the conclusions from
nonparametric inference methods, unless very specific models are assumed (e.g.,
location shift models).
The most appropriate summary measure is the nonparametric relative effect
which is the statistical functional underlying the most important classical nonpara-
metric tests for two and more samples [8, 21, 16, 13, 14]. The same functional is
also the basis for the nonparametric multivariate inference methods described in this
chapter. For two random variables X and Y , the relative effect is basically the prob-
ability that the first one takes a smaller value than the second one. More precisely,
1
pXY = P(X < Y ) + P(X = Y ).
2
Its empirical analog in case of two samples is the proportion of (X,Y )-pairs from all
possible such pairs from the X- and Y -samples where the X-value is smaller than the
Y -value. In case of ties (equal values), one needs to add one half of the proportion
of pairs where both X and Y take the same value. In other words, the estimated
relative effect in two samples is the probability that a randomly chosen observation
from the first sample takes a smaller value than a randomly chosen observation
from the second sample, in case of ties corrected by one half of the probability of
an equal value.
82 5 Analyzing Survey Data Using Multivariate Rank-Based Inference
When there are three and more groups of experimental units, one compares the
observations in each group with those from a reference sample. Here, a useful ref-
erence sample is the combined sample of all respondents (from all groups) included
in the study. As described in Sect. 5.4, the estimated relative effect of group i is the
probability that a subject chosen randomly from all subjects in the study yields a
smaller value than a subject chosen randomly from all subjects in group i. Again, in
the presence of tied values, one half of the probability of an equal value is added.
“Randomly chosen” here means that each respondent in the respective sets has the
same probability of being chosen.
These estimated effects are provided in the output of the R-package npmv (Ellis
et al. [9]). They indicate a tendency of observations within a particular group to take
larger (or smaller) values, as compared to the other groups. A major advantage of
the relative effect is that it does not require metric responses. As long as a “smaller”
or “greater” relation can be assessed, the relative effect makes sense. Clearly, larger
values of the relative effect in one group indicate that this group tends to larger ob-
served values. A value of 1/2 for the relative effect can be interpreted as no tendency
to larger or smaller values in that particular group.
For the inferential analysis, the relative effects provide an important piece of
supplementary information. If the multivariate test reports a significant value, it is
instructive to look at the estimated relative effects, in order to be able to interpret
the significance.
Regarding the statistical inference itself, we recommend use of Wilks’ Λ when-
ever possible. If this test statistic cannot be calculated, the ANOVA-type statistic
should be used. If the p-value for an overall multivariate test based on the chosen
test statistic is small, there is evidence that in at least one outcome variable there is
at least one group whose observations tend to take smaller or larger values than in
the other groups, and this evidence is beyond mere chance.
Depending on whether interest focuses on groups or on outcome variables, in a
next step, a multiple testing procedure should be performed in order to identify the
groups or endpoints that are responsible for the detected significance. The method
implemented in the R-package npmv automatically adjusts for the multiplicity of
testing. Thus, the probability of making at least one false rejection is controlled
throughout the procedure.
Typically, the algorithm yields one or more endpoints or groups responsible for
the significant effect. Now, a descriptive reporting of the corresponding nonpara-
metric relative effects aids in interpreting the magnitude and direction of the effect.
Finally, a visualization involving the identified groups and endpoints is advisable.
References
2. Bathke, A.C., Harrar, S.W.: Rank-based inference for multivariate data in fac-
torial designs. In: Robust Rank-Based and Nonparametric Methods, pp. 121–
139. Springer, Berlin (2016)
3. Bathke, A.C., Harrar, S.W., Madden, L.V.: How to compare small multivariate
samples using nonparametric tests. Comput. Stat. Data Anal. 52(11), 4951–
4965 (2008)
4. Bathke, A.C., Harrar, S.W., Ahmad, M.R.: Some contributions to the analysis
of multivariate data. Biom. J. 51(2), 285–303 (2009). https://doi.org/10.1002/
bimj.200800196
5. Box, G.E., et al.: Some theorems on quadratic forms applied in the study of
analysis of variance problems, II. effects of inequality of variance and of cor-
relation between errors in the two-way classification. Ann. Math. Stat. 25(3),
484–498 (1954)
6. Crawley, M.J.: The R Book. Wiley, Chichester (2007)
7. Dalgaard, P.: Introductory Statistics with R. Springer, Berlin (2008)
8. Deuchler, G.: Über die Methoden der Korrelationsrechnung in der Pädagogik
und Psychologie. Zeitschrift für pädagogische Psychologie und experimentelle
Pädagogik 15, 114–31 (1914)
9. Ellis, A.R., Burchett, W.W., Harrar, S.W., Bathke, A.C.: Nonparametric infer-
ence for multivariate data: the R package npmv. J. Stat. Softw. 76(4), 1–18
(2017)
10. Harrar, S.W., Bathke, A.C.: Nonparametric methods for unbalanced multivari-
ate data and many factor levels. J. Multivar. Anal. 99,8, 1635–1664 (2008)
11. Harrar, S.W., Bathke, A.C.: A nonparametric version of the Bartlett-Nanda-
Pillai multivariate test. Asymptotics, approximations, and applications. Am. J.
Math. Manag. Sci. 28(3–4), 309–335 (2008)
12. Harrar, S.W., Bathke, A.C.: A modified two-factor multivariate analysis of
variance: asymptotics and small sample approximations (and erratum). Ann.
Inst. Stat. Math. 64(1 and 5), 135–165, 1087 (2012)
13. Kruskal, W.: A nonparametric test for the several sample problem. Ann. Math.
Stat. 23, 525–540 (1952)
14. Kruskal, W.H., Wallis, W.A.: Use of ranks in one-criterion variance analysis.
J. Am. Stat. Assoc. 47(260), 583–621 (1952)
15. Liu, C., Bathke, A.C., Harrar, S.W.: A nonparametric version of Wilks’ lambda
– asymptotic results and small sample approximations. Stat. Probability Lett.
81, 1502–1506 (2011)
16. Mann, H.B., Whitney, D.R.: On a test of whether one of two random variables
is stochastically larger than the other. Ann. Math. Stat. 18, 50–60 (1947)
17. O’Brien, P.C.: Procedures for comparing samples with multiple endpoints.
Biometrics 40, 1079–1087 (1984)
18. Pesarin, F., Salmaso, L.: Permutation tests for complex data: theory, applica-
tions and software. Wiley, New York (2010)
19. R Core Team: R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Wien, Austria (2017). URL http://www.
R-project.org
84 5 Analyzing Survey Data Using Multivariate Rank-Based Inference
20. Verzani, J.: Using R for Introductory Statistics. CRC Press, Boca Raton (2014)
21. Wilcoxon, F.: Individual comparisons by ranking methods. Biom. Bull. 1(6),
80–83 (1945)