Efficient and accurate inference for mixtures of Mallows models with Spearman distance

Marta Crispino¹^na1,
Cristina Mollica²^na1,
Valerio Astuti¹ &
…
Luca Tardella²

304 Accesses
3 Citations
Explore all metrics

Abstract

The Mallows model (MM) occupies a central role in parametric modelling of ranking data to learn preferences of a population of judges. Despite the wide range of metrics for rankings that can be considered in the model specification, the choice is typically limited to the Kendall, Cayley or Hamming distances, due to the closed-form expression of the related model normalizing constant. This work instead focuses on the Mallows model with Spearman distance (MMS). A novel approximation of the normalizing constant is introduced to allow inference even with a large number of items. This allows us to develop and implement an efficient and accurate EM algorithm for estimating finite mixtures of MMS aimed at i) enlarging the applicability to samples drawn from heterogeneous populations, and ii) dealing with partial rankings affected by diverse forms of censoring. These novelties encompass the critical inferential steps that traditionally limited the use of this distance in practice, and render the MMS comparable (or even preferable) to the MMs with other metrics in terms of computational burden. The inferential ability of the EM scheme and the effectiveness of the approximation are assessed by extensive simulation studies. Finally, we show that the application to three real-world datasets endorses our proposals also in the comparison with competing mixtures of ranking models.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Bayesian Plackett–Luce Mixture Models for Partially Ranked Data

Article 12 October 2016

Bayesian analysis of ranking data with the Extended Plackett–Luce model

Article 12 March 2020

Statistical Inference for Incomplete Ranking Data: A Comparison of Two Likelihood-Based Estimators

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Notes

Precisely, they are the unnormalized versions of the corresponding correlation coefficients.
This version is available at https://arxiv.org/abs/1405.7945v3. See in particular Section 5.1 and Appendix C.1.
Note that the definition of $Z(\theta )$ corresponds to the canonical partition function of statistical mechanics $Z(T) = \sum _i \exp \{-E_i/k_\beta T\} = \sum _Eg(E)\exp \{-E/k_\beta T\}$, where $k_\beta $ is the Boltzmann constant, T is the temperature of a system, $E_i$ is the energy of the i-th state and g(E) is the density of the states.
In the original notation, the author proved the existence and differentiability of the function $Z\left( f, \theta \right) = \lim _{n\rightarrow \infty } \frac{1}{n}\log \text {E}\left[ \exp (n\, \theta \, \nu _{\pi }\left[ f\right] )\right] $. By setting $f(x,y)=(x - y)^2$ and $\theta =k$, we obtain the desired result.
Note that this criterion is useful only for automatizing the simulations: when choosing the number of clusters in applications the researcher typically uses the BIC, as well as other relevant measures, as a guidance, and decides the number of groups based also on other heuristic criteria (such as the size of the clusters, their separation, or prior information depending on the application at hand).

References

Ali, A., Meilă, M.: Experiments with Kemeny ranking: What works when? Math. Soc. Sci. 64(1), 28–40 (2012)
MathSciNet MATH Google Scholar
Alvo, M., Yu, P.L.H.: Statistical methods for ranking data. Frontiers in Probability and the Statistical Sciences. Springer, New York, USA (2014)
MATH Google Scholar
Andrieu, C., Roberts, G.O.: The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Stat. 37(2), 697–725 (2009)
MathSciNet MATH Google Scholar
Bartholdi, J.J., Tovey, C.A., Trick, M.A.: The computational difficulty of manipulating an election. Soc. Choice Welf. 6(3), 227–241 (1989)
MathSciNet MATH Google Scholar
Beckett, L.A.: Maximum likelihood estimation in mallows’s model using partially ranked data. In: Fligner, M.A., Verducci, J.S. (eds.) Probability Models and Statistical Analyses for Ranking Data, pp. 92–107. Springer, New York (1993)
Google Scholar
Busse, L.M., Orbanz, P., Buhmann, J.M.: Cluster analysis of heterogeneous rank data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 113–120. (2007)
Caron, F., Teh, Y.W., Murphy, T.B.: Bayesian nonparametric Plackett–Luce models for the analysis of preferences for college degree programmes. Ann. Appl. Stat. 8(2), 1145–1181 (2014)
MathSciNet MATH Google Scholar
Chen, Q.R., Vansant, G., Oades, K., Pickering, M., Wei, J.S., Song, Y.K., Monforte, J., Khan, J.: Diagnosis of the small round blue cell tumors using multiplex polymerase chain reaction. J. Mol. Diagn. 9(1), 80–88 (2007)
Google Scholar
Cramér, H.: Sur un nouveau théoreme-limite de la théorie des probabilités. Actualités scientifiques et industrielles 736, 2–23 (1938)
MATH Google Scholar
Crispino, M., Modugno, L., Mollica, C.: The Mallows model with respondents’ covariates. Preliminary draft available upon request. (2023)
Croux, C., Dehon, C.: Influence functions of the Spearman and Kendall correlation measures. Stat. Methods Appl. 19, 497–515 (2010)
MathSciNet MATH Google Scholar
DeConde, R.P., Hawley, S., Falcon, S., Clegg, N., Knudsen, B., Etzioni, R.: Combining results of microarray experiments: a rank aggregation approach. Stat. Appl. Genet. Mol. Biol. 5(1), 23 (2006)
MathSciNet MATH Google Scholar
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Method. 39(1), 1–38 (1977)
MathSciNet MATH Google Scholar
Diaconis, P.: Group representations in probability and statistics, Volume 11 of Lecture Notes–Monograph Series. Hayward, USA: Institute of Mathematical Statistics. (1988)
Eliseussen, E., Fleischer, T., Vitelli, V.: Rank-based Bayesian variable selection for genome-wide transcriptomic analyses. Stat. Med. 41(23), 4532–4553 (2022)
MathSciNet Google Scholar
Ellis, R.S.: Large deviations for a general class of random vectors. Ann. Prob. 12(1), 1–12 (1984)
MathSciNet MATH Google Scholar
Feigin, P., Cohen, A.: On a model for concordance between judges. J. R. Stat. Soc. Ser. B Method. 40(2), 203–213 (1978)
MATH Google Scholar
Fligner, M.A., Verducci, J.S.: Distance based ranking models. J. R. Stat. Soc. B 48(3), 359–369 (1986)
MathSciNet MATH Google Scholar
Gärtner, J.: On large deviations from the invariant measure. Theory Prob. Appl. 22(1), 24–39 (1977)
MathSciNet MATH Google Scholar
Goibert, M., Calauzènes, C., Irurozki, E., Clémençon, S.: (2023). Robust consensus in ranking data analysis: Definitions, properties and computational issues. arXiv preprint arXiv:2303.12878
Goibert, M., Clémençon, S., Irurozki, E., Mozharovskyi, P.: (2022). Statistical depth functions for ranking distributions: definitions, statistical learning and applications. arXiv preprint arXiv:2201.08105
Gormley, I.C., Murphy, T.B.: Exploring voting blocs within the Irish electorate: a mixture modeling approach. J. Am. Stat. Assoc. 103(483), 1014–1027 (2008)
MathSciNet MATH Google Scholar
Gormley, I.C., Murphy, T.B.: A mixture of experts model for rank data with applications in election studies. Ann. Appl. Stat. 2(4), 1452–1477 (2008)
MathSciNet MATH Google Scholar
Henery, R.J.: Permutation probabilities as models for horse races. J. R. Stat. Soc. Ser. B Stat. Method. 43(1), 86–91 (1981)
MathSciNet MATH Google Scholar
Irurozki, E.: Sampling and learning distance-based probability models for permutation spaces. Ph. D. thesis, Department of Computer Science and Artificial Intelligence of the University of the Basque Country. (2014)
Irurozki, E., Calvo, B., Lozano, A.: PerMallows: an R package for Mallows and generalized Mallows models. J. Stat. Softw. 71(12), 1–30 (2016)
MATH Google Scholar
Irurozki, E., Calvo, B., Lozano, A.: Sampling and learning the Mallows and generalized Mallows models under the Cayley distance. Method. Comput. Appl. Prob. 20(1), 1–35 (2018)
MathSciNet MATH Google Scholar
Irurozki, E., Calvo, B., Lozano, J.A.: Mallows and generalized Mallows model for matchings. Bernoulli 25(2), 1160–1188 (2019)
MathSciNet MATH Google Scholar
Jacques, J., Grimonprez, Q., Biernacki, C.: Rankcluster: An R package for clustering multivariate partial rankings. R J. 6(1), 101–110 (2014)
Google Scholar
Kendall, M.G.: Rank correlation methods, 4th edn. Griffin, London (1970)
MATH Google Scholar
Khan, J., Wei, J.S., Ringnér, M., Saal, L.H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C.R., Peterson, C., Meltzer, P.S.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 7(6), 673–679 (2001)
Google Scholar
Lee, P.H., Yu, P.L.H.: Distance-based tree models for ranking data. Comput. Stat. Data Anal. 54(6), 1672–1682 (2010)
MathSciNet MATH Google Scholar
Lee, P.H., Yu, P.L.H.: Mixtures of weighted distance-based models for ranking data with applications in political studies. Comput. Stat. Data Anal. 56(8), 2486–2500 (2012)
MathSciNet MATH Google Scholar
Lee, P.H., Yu, P.L.H.: An R package for analyzing and modeling ranking data. BMC Med. Res. Method. 3(1), 1–11 (2013)
Google Scholar
Liu, Q., Reiner, A., Frigessi, A., Scheel, I.: Diverse personalized recommendations with uncertainty from implicit preference data with the Bayesian Mallows model. Knowl. Based Syst. 186, 104960 (2019)
Google Scholar
Mallows, C.L.: Non-null ranking models. I. Biometrika 44(1/2), 114–130 (1957)
MathSciNet MATH Google Scholar
Marden, J. I.: Analyzing and Modeling Rank Data, Volume 64 of Monographs on Statistics and Applied Probability. Cambridge, USA: Chapman & Hall. (1995)
McCullagh, P.: Models on spheres and models for permutations. In: Fligner, M.A., Verducci, J.S. (eds.) Probability Models and Statistical Analyses for Ranking Data, pp. 278–283. Springer, New York (1993)
Google Scholar
Meilǎ, M., Bao, L.: An exponential model for infinite rankings. J. Mach. Learn. Res. 11, 3481–3518 (2010)
MathSciNet MATH Google Scholar
Møller, J., Pettitt, A.N., Reeves, R., Berthelsen, K.K.: An efficient Markov Chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93(2), 451–458 (2006)
MathSciNet MATH Google Scholar
Mollica, C., Tardella, L.: Epitope profiling via mixture modeling for ranked data. Stat. Med. 33(21), 3738–3758 (2014)
MathSciNet Google Scholar
Mukherjee, S.: Estimation in exponential families on permutations. Ann. Stat. 44(2), 853–875 (2016)
MathSciNet MATH Google Scholar
Murphy, T.B., Martin, D.: Mixtures of distance-based models for ranking data. Comput. Stat. Data Anal. 41(3–4), 645–655 (2003)
MathSciNet MATH Google Scholar
Murray, I., Ghahramani, Z., MacKay, D.: MCMC for doubly-intractable distributions. (2012) arXiv preprint arXiv:1206.6848
Qian, Z., Yu, P.L.H.: Weighted distance-based models for ranking data using the R package rankdist. J. Stat. Softw. 90(5), 1–31 (2019)
Google Scholar
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
MathSciNet MATH Google Scholar
Shi, C., Wei, B., Wei, S., Wang, W., Liu, H., Liu, J.: A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm. EURASIP J. Wireless Commun. Network. 2021(1), 31 (2021)
Google Scholar
Sloane, N. J. A.: The Encyclopedia of Integer Sequences. (2017)
Sørensen, Ø., Crispino, M., Liu, Q., Vitelli, V.: BayesMallows: an R Package for the Bayesian Mallows Model. R J. 12(1), 324–342 (2020)
Google Scholar
Tarsitano, A. et al. Comparing the effectiveness of rank correlation statistics. P: Dip. di Economia e Statistica, University of della Calabria. (2009)
Varadhan, S.S.: Asymptotic probabilities and differential equations. Commun. Pure Appl. Math. 19(3), 261–286 (1966)
MathSciNet MATH Google Scholar
Vitelli, V., Sørensen, Ø., Crispino, M., Frigessi, A., Arjas, E.: Probabilistic preference learning with the Mallows rank model. J. Mach. Learn. Res. 18(158), 1–49 (2018)
MathSciNet MATH Google Scholar
Xu, H., Alvo, M., Yu, P.L.H.: Angle-based models for ranking data. Comput. Stati. Data Anal. 121, 113–136 (2018)
MathSciNet MATH Google Scholar
Zhao, Q., Hautamäki, V.: Knee Point Detection in BIC for Detecting the Number of Clusters. In Blanc-Talon, J., Bourennane, S., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2008. Lecture Notes in Computer Science, vol 5259. Springer, Berlin, Heidelberg, pp. 664–673. (2008)

Download references

Acknowledgements

The authors would like to thank an anonymous reviewer for her/his constructive and insightful comments that greatly improved the manuscript. The authors wish also to thank prof. Enrico Casadio Tarabusi for his precious suggestions on the approximation of the Spearman distance distribution. This work was supported by the grant N. RP12117A89FFE8A0 funded by Sapienza University of Rome.

The opinions expressed are those of the authors and do not necessarily reflect the views of the Bank of Italy or the Eurosystem.

Author information

Marta Crispino and Cristina Mollica have contributed equally to this work.

Authors and Affiliations

DG of Economics, Statistics and Research, Bank of Italy, Rome, Italy
Marta Crispino & Valerio Astuti
Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Cristina Mollica & Luca Tardella

Authors

Marta Crispino
View author publications
You can also search for this author in PubMed Google Scholar
Cristina Mollica
View author publications
You can also search for this author in PubMed Google Scholar
Valerio Astuti
View author publications
You can also search for this author in PubMed Google Scholar
Luca Tardella
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Crispino and Mollica are both first authors as they contributed equally to the development of this work as well as to write the main manuscript text. Astuti developed and described the new approximation in Section 3 and Supplementary Material A. Tardella mainly contributed to implement and computationally optimize the EM algorithm for statistical inference described in Section 2. All authors reviewed the manuscript.

Corresponding author

Correspondence to Cristina Mollica.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 255 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Crispino, M., Mollica, C., Astuti, V. et al. Efficient and accurate inference for mixtures of Mallows models with Spearman distance. Stat Comput 33, 98 (2023). https://doi.org/10.1007/s11222-023-10266-8

Download citation

Received: 20 September 2022
Accepted: 13 June 2023
Published: 05 July 2023
DOI: https://doi.org/10.1007/s11222-023-10266-8

Efficient and accurate inference for mixtures of Mallows models with Spearman distance

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Bayesian Plackett–Luce Mixture Models for Partially Ranked Data

Bayesian analysis of ranking data with the Extended Plackett–Luce model

Statistical Inference for Incomplete Ranking Data: A Comparison of Two Likelihood-Based Estimators

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Supplementary Information

Supplementary file 1 (pdf 255 KB)

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Efficient and accurate inference for mixtures of Mallows models with Spearman distance

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Bayesian Plackett–Luce Mixture Models for Partially Ranked Data

Bayesian analysis of ranking data with the Extended Plackett–Luce model

Statistical Inference for Incomplete Ranking Data: A Comparison of Two Likelihood-Based Estimators

Explore related subjects

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Supplementary Information

Supplementary file 1 (pdf 255 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation