Abstract
We design a probability distribution for ordinal data by modeling the process generating data, which is assumed to rely only on order comparisons between categories. Contrariwise, most competitors often either forget the order information or add a non-existent distance information. The data generating process is assumed, from optimality arguments, to be a stochastic binary search algorithm in a sorted table. The resulting distribution is natively governed by two meaningful parameters (position and precision) and has very appealing properties: decrease around the mode, shape tuning from uniformity to a Dirac, identifiability. Moreover, it is easily estimated by an EM algorithm since the path in the stochastic binary search algorithm can be considered as missing values. Using then the classical latent class assumption, the previous univariate ordinal model is straightforwardly extended to model-based clustering for multivariate ordinal data. Parameters of this mixture model are estimated by an AECM algorithm. Both simulated and real data sets illustrate the great potential of this model by its ability to parsimoniously identify particularly relevant clusters which were unsuspected by some traditional competitors.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
Ranking data occur when some judges are asking to sort m objects, quoted by \(1,\ldots ,m\), according to a preference order. Thus, a ranking data is a permutation of the whole data set \(\{1,\ldots ,m\}\). Ordinal data occur when some judges are asking to give a note among the set of ordered notes \(1<\cdots <m\). Thus, an ordinal data is a unique element of \(\{1,\ldots ,m\}\).
We will do the slight abuse of notation \(\hbox {p}(x;\mu ,\pi )=\hbox {p}\Big (\{x\};\mu ,\pi \Big )\).
For avoiding the label cluster dependency, we keep the best distance over all label permutations.
Note that a two component multinomial model is not performed in this univariate situation since it is not identifiable.
All partitions are better displayed in the first and the third axes than in the first and second axes.
References
Agresti, A.: Analysis of Ordinal Categorical Data. Wiley Series in Probability and Statistics. Wiley-Interscience, New York (2010)
Allman, E.S., Matias, C., Rhodes, J.A.: Identifiability of parameters in latent structure models with many observed variables. Ann. Stat. 37(6A), 3099–3132 (2009)
Bartholomew, D.J., Knott, M., Moustaki, I.: Latent Variable Models and Factor Analysis: A Unified Approach. Wiley, Hoboken (2011)
Biernacki, C., Jacques, J.: A generative model for rank data based on sorting algorithm. Comput. Stat. Data Anal. 58, 162–176 (2013)
Celeux, G., Govaert, G.: Gaussian parsimonious clustering models. J. Pattern Recogn. Soc. 28, 781–793 (1995)
D’Elia, A., Piccolo, D.: A mixture model for preferences data analysis. Comput. Stat. Data Anal. 49(3), 917–934 (2005)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39(1), 1–38 (1977). With discussion
Fraley, C., Raftery, A.E.: Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Soc. 97, 611–612 (2002)
Giordan, M., Diana, G.: A clustering method for categorical ordinal data. Commun. Stat. Theory Methods 40, 1315–1334 (2011)
Goodman, L.A., Kruskal, W.H.: Measures of association for cross classifications. J. Am. Stat. Assoc. 49, 732–764 (1954)
Goodman, L.A.: Explanatory latent structure models using both identifiable and unidentifiable models. Biometrika 61, 215–231 (1974)
Gouget, C.: Utilisation des modèles de mélange pour la classification automatique de données ordinales. Ph.D. thesis, Université de Technologie de Compiègne (2006)
Hartigan, J.A., Wong, M.A.: Algorithm as 1326: a k-means clustering algorithm. Appl. Stat. 28, 100–108 (1978)
Iannario, M., Piccolo, D.: A program in r for cub models inference. Technical report, Universita di Napoli Federico II, Version 2.0, www.dipstat.unina.it (2009)
Jacques, J., Biernacki, C.: Model-based clustering for multivariate partial ranking data. J. Stat. Plan. Inference 149, 201–217 (2014)
Jacques, J., Grimonprez, Q., Biernacki, C.: Rankcluster: an r package for clustering multivariate partial rankings. R J. (in press) (2014)
Jacques, J., Preda, C.: Functional data clustering: a survey. Adv. Data Anal. Classif. 8(3), 231–255 (2014)
Jollois, F-X., Nadif, M.: Classification de données ordinales : modèles et algorithmes. In: Proceedings of the 41th Conference of the French Statistical Society, Bordeaux, France (2011)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Hoboken (1990)
Kendall, M.G.: The treatment of ties in ranking problems. Biometrika 33, 239–251 (1945)
Knuth, D.E.: Sorting and Searching: The Art of Computer Programming, vol. 3, 2nd edn. Addison-Wesley Professional, Boston (1998)
Lewis, S.J.G., Foltynie, T., Blackwell, A.D., Robbins, T.W., Owen, A.M., Barker, R.A.: Heterogeneity of parkinson’s disease in the early clinical stages using a data driven approach. J. Neurol. Neurosurg. Psychiatry 76, 343–348 (2003)
Lipsitz, S.R., Kim, K., Zhao, L.: Analysis of repeated categorical data using generalized estimating equations. Stat. Med. 13, 1149–1163 (1994)
Manisera, M., Zuccolotto, P.: Modeling rating data with nonlinear CUB models. Comput. Stat. Data Anal. 78, 100–118 (2014)
Marbac, M., Biernacki, C., Vandewalle, V.: Model-based clustering of Gaussian copulas for mixed data. ArXiv e-prints, May (2014)
Matechou, E., Liu, I., Pledger, S., Arnold, R.: Biclustering models for ordinal data. In: NZSA 2011 Conference, Auckland, New-Zealand (2011)
McCullagh, P.: Regression models for ordinal data. J. R. Stat. Soc. Ser. B 42, 109–142 (1980)
McLachlan, G., Peel, D.: Applied Probability and Statistics. Finite Mixture Models. Wiley Series in Probability and Statistics. Wiley-Interscience, New York (2000)
McParland, D., Gormley, C.: Algorithms from and for Nature and Life: Studies in Classification, Data Analysis, and Knowledge Organization, chapter Clustering Ordinal Data via Latent Variable Models. Springer, New York (2013)
Melnykov, V., Maitra, R.: Finite mixture models and model-based clustering. Stat. Surv. 4, 80–116 (2010)
Nelsen, R.B.: An Introduction to Copulas. Springer, New York (1999)
Podani, J.: Braun-blanquet’s legacy and data analysis in vegetation science. J. Veg. Sci. 17, 113–117 (2006)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Somers, R.H.: A new assymmetric measure of association for ordinal variables. Am. Sociol. Rev. 27, 799–811 (1962)
Stevens, S.S.: On the theory of scales of measurement. Science 103(2684), 677–680 (1946)
Vermunt, J.K.: A general class of nonparametric models for ordinal categorical data. Sociol. Methodol. 29, 187–223 (1999)
Vermunt, J.K., Magidson, J.: Technical Guide for Latent GOLD 4.0: Basic and Advanced. Statistical Innovations Inc., Belmont (2005)
Wolfe, J.H.: Pattern clustering by multivariate mixture analysis. Multivar. Behav. Res. 5, 329–350 (1970)
Xu, R., Wunsch, D.C.: Clustering. Wiley, Hoboken (2009)
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix 1: The BOS properties: statements and proofs
From Equations (7), (8) and (10) in Sect. 2.2, it is easily seen that the BOS model has a polynomial expression related to \(\pi \) of degree less than or equal to \(m-1\):
From this statement, we have developed a formal Matlab code computing the exact expression of each coefficient \(a_j(m,\mu ,x)\) for any x, \(\mu \), and m values. For instance, for \(m=5\), \(\mu =2\), and \(x=4\), we obtain the following exact expression:
This formal calculus tool is used in the proofs of all the following properties. Its advantage is to provide very straightforward proofs. Its drawback is to provide proofs for only some selected values of m. However, we have chosen \(m\in \{1,\ldots ,8\}\) since it answers to most practical situationsFootnote 7. For instance in the reference book Agresti (2010) all ordinal data have less than 8 categories.
Proposition 5.1
(\(\pi =0\): Uniformity.) If \(\pi =0\), then \(\forall (\mu ,x) \in \{1,\ldots ,m\}^2\), \(\hbox {p}(x;\mu ,\pi )=m^{-1}\).
Proof
The formal calculus leads to \(a_0(m,\mu ,x)=m^{-1}\) for all \(m\in \{1,\ldots ,8\}\). Conclusion follows by setting \(\pi =0\) in (18).
Proposition 5.2
(\(\pi =1\): Dirac in \(\mu \).) If \(\pi =1\), then \(\forall (x,\mu ) \in \{1,\ldots ,m\}^2\), \(\hbox {p}(x;\mu ,\pi )={\mathbb I}(x=\mu )\).
Proof
The formal calculus leads to \(\sum _{j=0}^{m-1}a_j(m,\mu ,\mu )=1\) for all \(m\in \{1,\ldots ,8\}\). Conclusion follows by setting \(\pi =1\) in (18).
Proposition 5.3
(\(\mu \): mode if \(\pi >0\).) If \(\pi >0\), then \(\forall (\mu ,x) \in \{1,\ldots ,m\}^2\) such that \(\mu \ne x\), \(\hbox {p}(\mu ;\mu ,\pi ) > \hbox {p}(x;\mu ,\pi )\).
Proof
For \(\pi =1\), Proposition 5.2 already established that \(\mu \) is the mode. We consider now \(0<\pi <1\). From (18), \(\hbox {p}(\mu ;\mu ,\pi ) - \hbox {p}(x;\mu ,\pi )\) is a polynomial of degree less than or equal to \(m-1\). Moreover, from Proposition 5.1, \(\pi =0\) is a root of \(\hbox {p}(\mu ;\mu ,\pi ) - \hbox {p}(x;\mu ,\pi )\), thus we can factorize it by \(\pi \)
Since \(0<\pi < 1\) implies that \(0<\pi ^j< 1\) for any \(j>1\), the following lower bound is directly obtained
where \(b_j^-(m,\mu ,x)=\min (0,b_j(m,\mu ,x))\). All \(b_j(m,\mu ,x)\) are obtained from a formal polynomial Euclidian division for all \(m\in \{1,\ldots ,8\}\), allowing to show by formal calculus that \( b_0(m,\mu ,x) + \sum _{j=1}^{m-2} b_j^-(m,\mu ,x) > 0\) for all \(m\in \{1,\ldots ,8\}\). Conclusion follows.
Proposition 5.4
(\(\mu \): absolute growing of its probability with \(\pi \). ) \(\forall \mu \in \{1,\ldots ,m\}\), \(\hbox {p}(\mu ;\mu ,\pi )\) is an increasing function of \(\pi \).
Proof
The formal calculus leads to \(a_j(m,\mu ,\mu )\ge 0\) for all \(m\in \{1,\ldots ,8\}\) and for all \(j\in \{1,\ldots ,m-1\}\). Conclusion follows directly from (18).
Proposition 5.5
( \(\mu \): relative growing of its probability with \(\pi \).) \(\forall (\mu ,x) \in \{1,\ldots ,m\}^2\) such that \(\mu \ne x\), \(\hbox {p}(\mu ;\mu ,\pi )- \hbox {p}(x;\mu ,\pi )\) is an increasing function of \(\pi \).
Proof
The key is to prove the differential (in \(\pi \)) expression \(\hbox {p}'(\mu ;\mu ,\pi )- \hbox {p}'(x;\mu ,\pi )>0\). From (18), \(\hbox {p}'(\mu ;\mu ,\pi ) - \hbox {p}'(x;\mu ,\pi )\) is a polynomial of degree less than or equal to \(m-2\)
Since \(0<\pi < 1\) implies that \(0<\pi ^j< 1\) for any \(j>1\), the following lower bound is directly obtained
where \(b_j^-(m,\mu ,x)=\min (0,b_j(m,\mu ,x))\). All \(b_j(m,\mu ,x)\) are obtained from a formal derivation of a polynomial for all \(m\in \{1,\ldots ,8\}\), allowing to show by formal calculus again that \( b_0(m,\mu ,x) + \sum _{j=1}^{m-2} b_j^-(m,\mu ,x) > 0\) for all \(m\in \{1,\ldots ,8\}\). Conclusion follows.
Proposition 5.6
(Decreasing around \(\mu \) if \(0<\pi <1\).) \(\forall (x,x') \in \{1,\ldots ,m\}^2\) such that \(x'<x<\mu \) or \(\mu >x>x'\), \(\hbox {p}(x;\mu ,\pi )> \hbox {p}(x';\mu ,\pi )\).
Proof
From (18), \(\hbox {p}(x;\mu ,\pi ) - \hbox {p}(x';\mu ,\pi )\) is a polynomial of degree less than or equal to \(m-1\). Moreover, from Propositions 5.1 and 5.2 respectively, \(\pi =0\) and \(\pi =1\) are roots of \(\hbox {p}(x;\mu ,\pi ) - \hbox {p}(x';\mu ,\pi )\), thus we can factorize it by \(\pi (1-\pi )\)
Since \(0<\pi < 1\) implies that \(0<\pi ^j< 1\) for any \(j>1\), the following lower bound is directly obtained
where \(b_j^-(m,\mu ,x,x')=\min (0,b_j(m,\mu ,x,x'))\). All \(b_j\) \((m,\mu ,x,x')\) are obtained from a formal polynomial Euclidian division for all \(m\in \{1,\ldots ,8\}\), allowing to show by formal calculus again that \( b_0(m,\mu ,x,x') + \sum _{j=1}^{m-2} b_j^-(m,\mu ,x,\) \(x') > 0\) for all \(m\in \{1,\ldots ,8\}\). Conclusion follows.
Proposition 5.7
(Identifiability) If \(0<\pi \le 1\), identifiability holds.
Proof
The identifiability problem could concern \(\mu \) and/or \(\pi \).
-
First, there exists no couple \((\mu ,\mu ')\in \{1,\ldots ,m\}^2\) with \(\mu \ne \mu '\) such that \(\hbox {p}(x;\mu ,\pi )=\hbox {p}(x;\mu ',\pi ')\) for any \(x\in \{1,\ldots ,m\}\) and any \((\pi ,\pi ')\in (0,1]^2\). Indeed, otherwise both \(\mu \) and \(\mu '\) should be modes of both distributions, which is impossible by unicity of the mode (Propositions 5.3 and 5.6).
-
Second, there exists no couple \((\pi ,\pi ')\in (0,1]^2\) with \(\pi \ne \pi '\) such that \(\hbox {p}(x;\mu ,\pi )=\hbox {p}(x;\mu ,\pi ')\) for any \(x\in \{1,\ldots ,m\}\) and any \(\mu \in \{1,\ldots ,m\}\) since \(\hbox {p}(x;\mu ,\pi )\) is a polynomial of order at least one. This last statement comes from the fact that \(\hbox {p}(x;\mu ,\pi )\) varies from 1 / m to 0 or 1 when \(\pi \) varies (respectively when \(x\ne \mu \) and \(x=\mu \)) from Propositions 5.1 and 5.2, thus depends on the value of \(\pi \).
Appendix 2: AERES data
See Table 11.
Rights and permissions
About this article
Cite this article
Biernacki, C., Jacques, J. Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm. Stat Comput 26, 929–943 (2016). https://doi.org/10.1007/s11222-015-9585-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11222-015-9585-2