research-article

Open access

Learning the Markov Order of Paths in Graphs

Authors:

Luka V. Petrovic,

Ingo ScholtesAuthors Info & Claims

WWW '22: Proceedings of the ACM Web Conference 2022

Pages 1559 - 1569

https://doi.org/10.1145/3485447.3512091

Published: 25 April 2022 Publication History

All formats PDF

Abstract

We address the problem of learning the Markov order in categorical sequences that represent paths in a network, i.e., sequences of variable lengths where transitions between states are constrained to a known graph. Such data pose challenges for standard Markov order detection methods and demand modeling techniques that explicitly account for the graph constraint. Adopting a multi-order modeling framework for paths, we develop a Bayesian learning technique that (i) detects the correct Markov order more reliably than a competing method based on the likelihood ratio test, (ii) requires considerably less data than methods using AIC or BIC, and (iii) is robust against partial knowledge of the underlying constraints. We further show that a recently published method that uses a likelihood ratio test exhibits a tendency to overfit the true Markov order of paths, which is not the case for our Bayesian technique. Our method is important for data scientists analyzing patterns in categorical sequence data that are subject to (partially) known constraints, e.g. click stream data or other behavioral data on the Web, information propagation in social networks, mobility trajectories, or pathway data in bioinformatics. Addressing the key challenge of model selection, our work is also relevant for the growing body of research that emphasizes the need for higher-order models in network analysis.

References

[1]

Hirotugu Akaike. 1974. A new look at the statistical model identification. IEEE transactions on automatic control 19, 6 (1974), 716–723.

[2]

Theodore W Anderson and Leo A Goodman. 1957. Statistical inference about Markov chains. The Annals of Mathematical Statistics(1957), 89–110.

[3]

Anonymous. [n.d.]. Python implementation of Bayesian Multi-Order Networks. https://doi.org/10.5281/zenodo.4809434

[4]

M. Arlitt and T. Jin. 2000. A workload characterization study of the 1998 World Cup Web site. IEEE Network 14, 3 (2000), 30–37. https://doi.org/10.1109/65.844498

Digital Library

[5]

Angel Rodolfo Baigorri, Cátia Regina Goncalvez, and Paulo Angelo Alves Resende. 2014. Markov chain order estimation based on the chi-square divergence. The Canadian Journal of Statistics / La Revue Canadienne de Statistique 42, 4(2014), 563–578. http://www.jstor.org/stable/43185202

[6]

Maurice S Bartlett. 1951. The frequency goodness of fit test for probability chains. In Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 47. Cambridge University Press, 86–95.

[7]

Federico Battiston, Giulia Cencetti, Iacopo Iacopini, Vito Latora, Maxime Lucas, Alice Patania, Jean-Gabriel Young, and Giovanni Petri. 2020. Networks beyond pairwise interactions: structure and dynamics. Physics Reports (2020).

[8]

Austin R Benson, David F Gleich, and Desmond J Higham. 2021. Higher-order Network Analysis Takes Off, Fueled by Old Ideas and New Data. https://sinews.siam.org/Details-Page/higher-order-network-analysis-takes-off-fueled-by-old-ideas-and-new-data

[9]

Patrick Billingsley. 1961. Statistical methods in Markov chains. The Annals of Mathematical Statistics(1961), 12–40.

[10]

Béla Bollobás. 2013. Modern graph theory. Vol. 184. Springer Science & Business Media.

[11]

Igor Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, and Steven White. 2000. Visualization of navigation patterns on a web site using model-based clustering. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. 280–284.

Digital Library

[12]

Christopher Chatfield. 1973. Statistical inference regarding Markov chain models. Journal of the Royal Statistical Society: Series C (Applied Statistics) 22, 1(1973), 7–20.

[13]

Flavio Chierichetti, Ravi Kumar, Prabhakar Raghavan, and Tamas Sarlos. 2012. Are Web Users Really Markovian?. In Proceedings of the 21st International Conference on World Wide Web (Lyon, France) (WWW ’12). Association for Computing Machinery, New York, NY, USA, 609–618. https://doi.org/10.1145/2187836.2187919

Digital Library

[14]

Débora Cristina Corrêa, Jack Murdoch Moore, Thomas Jüngling, and Michael Small. 2020. Constrained Markov order surrogates. Physica D: Nonlinear Phenomena(2020), 132437.

[15]

Imre Csiszár, Paul C Shields, 2000. The consistency of the BIC Markov order estimator. The Annals of Statistics 28, 6 (2000), 1601–1619.

[16]

Daniel Dalevi and Devdatt Dubhashi. 2005. The peres-shields order estimator for fixed and variable length markov models with applications to DNA sequence similarity. In International Workshop on Algorithms in Bioinformatics. Springer.

Digital Library

[17]

Brian D Davison. 2004. Learning web request patterns. In Web dynamics. Springer, 435–459.

[18]

Chang C. Y Dorea. 2008. Optimal Penalty Term for EDC Markov Chain Order Estimator. Annales de l’ISUP (2008).

[19]

Paul G Hoel. 1954. A test for Markoff chains. Biometrika 41, 3/4 (1954), 430–433.

[20]

Søren Jespersen, Torben Bach Pedersen, and Jesper Thorhauge. 2003. Evaluating the markov assumption for web usage mining. In Proceedings of the 5th ACM international workshop on Web information and data management. 82–89.

Digital Library

[21]

Robert E Kass and Adrian E Raftery. 1995. Bayes factors. Journal of the american statistical association 90, 430(1995), 773–795.

[22]

Richard W Katz. 1981. On some criteria for estimating the order of a Markov chain. Technometrics 23, 3 (1981), 243–249.

[23]

Ron Kohavi, Carla E. Brodley, Brian Frasca, Llew Mason, and Zijian Zheng. 2000. KDD-Cup 2000 Organizers’ Report: Peeling the Onion. SIGKDD Explor. Newsl. 2, 2 (dec 2000), 86–93. https://doi.org/10.1145/380995.381033

Digital Library

[24]

Renaud Lambiotte, Martin Rosvall, and Ingo Scholtes. 2019. From networks to optimal higher-order models of complex systems. Nature Physics (2019), 313–320.

[25]

Timothy LaRock, Ingo Scholtes, and Tina Eliassi-Rad. 2021. Sequential Motifs in Observed Walks. arXiv preprint 2112.05642(2021). arxiv:2112.05642 [physics.soc-ph]

[26]

Jun S. Liu and Charles E. Lawrence. 1999. Bayesian inference on biopolymer models.Bioinformatics (Oxford, England) 15, 1 (1999), 38–52.

[27]

David JC MacKay and David JC Mac Kay. 2003. Information theory, inference and learning algorithms. Cambridge University Press.

Digital Library

[28]

AA Markov. 1906. Extension of law of big numbers on variables, depending from each other. Izvestiya Fiziko-Matematicheskogo Obschestva pri Kazanskom Universitete 2 (1906), 135–156.

[29]

AA Markov. 1913. Example of statistical research on text of “Eugene Onegin”, illustrating interconnection of trials in chain. Izvestiya Akademii Nauk SPb 6 (1913), 153–162.

[30]

ML Menéndez, L Pardo, MC Pardo, and Konstantinos Zografos. 2011. Testing the order of Markov dependence in DNA sequences. Methodology and computing in applied probability 13, 1(2011), 59–74.

[31]

Jack Murdoch Moore, Débora Cristina Corrêa, and Michael Small. 2018. Is Bach’s brain a Markov chain? Recurrence quantification to assess Markov order for short, symbolic, musical compositions. Chaos: An Interdisciplinary Journal of Nonlinear Science 28, 8(2018), 085715.

[32]

Maria Papapetrou and Dimitris Kugiumtzis. 2013. Markov chain order estimation with conditional mutual information. Physica A: Statistical Mechanics and its Applications 392, 7(2013), 1593–1601.

[33]

Maria Papapetrou and Dimitris Kugiumtzis. 2016. Markov chain order estimation with parametric significance tests of conditional mutual information. Simulation Modelling Practice and Theory 61 (2016), 1–13.

[34]

Tiago P Peixoto and Martin Rosvall. 2017. Modelling sequences and temporal networks with dynamic community structures. Nature communications 8, 1 (2017), 582.

[35]

Yuval Peres and Paul Shields. 2005. Two new Markov order estimators. arXiv preprint math/0506080(2005).

[36]

Vincenzo Perri and Ingo Scholtes. 2020. HOTVis: Higher-Order Time-Aware Visualisation of Dynamic Graphs. In Graph Drawing and Network Visualization - 28th International Symposium, GD 2020, Vancouver, BC, Canada, September 16-18, 2020, Revised Selected Papers(Lecture Notes in Computer Science, Vol. 12590), David Auber and Pavel Valtr (Eds.). Springer, 99–114. https://doi.org/10.1007/978-3-030-68766-3_8

Digital Library

[37]

SD Pethel and DW Hahs. 2014. Exact significance test for Markov order. Physica D: Nonlinear Phenomena 269 (2014), 42–47.

[38]

Peter Pirolli and James E. Pitkow. 1999. Distributions of Surfers’ Paths Through the World Wide Web: Empirical Characterizations. World Wide Web 2, 1-2 (1999), 29–45. https://doi.org/10.1023/A:1019288403823

Digital Library

[39]

Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing Personalized Markov Chains for Next-Basket Recommendation. In Proceedings of the 19th International Conference on World Wide Web (Raleigh, North Carolina, USA) (WWW ’10). Association for Computing Machinery, New York, NY, USA, 811–820. https://doi.org/10.1145/1772690.1772773

Digital Library

[40]

[40] RITA TransStat.2014. Origin and Destination Survey database. http://www.transtats.bts.gov/Tables.asp?DB_ID=125

[41]

Martin Rosvall, Alcides V Esquivel, Andrea Lancichinetti, Jevin D West, and Renaud Lambiotte. 2014. Memory in network flows and its effects on spreading dynamics and community detection. Nature communications 5(2014), 4630.

[42]

Ingo Scholtes. 2017. When is a network a network?: Multi-order graphical model selection in pathways and temporal networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1037–1046.

Digital Library

[43]

Gideon Schwarz 1978. Estimating the dimension of a model. The annals of statistics 6, 2 (1978), 461–464.

[44]

Philipp Singer, Denis Helic, Andreas Hotho, and Markus Strohmaier. 2015. Hyptrails: A bayesian approach for comparing hypotheses about human trails on the web. In Proceedings of the 24th International Conference on World Wide Web.

Digital Library

[45]

Philipp Singer, Denis Helic, Behnam Taraghi, and Markus Strohmaier. 2014. Detecting memory and structure in human navigation patterns using markov chain models of varying order. PloS one 9, 7 (2014), e102070.

[46]

Christopher C Strelioff, James P Crutchfield, and Alfred W Hübler. 2007. Inferring Markov chains: Bayesian estimation, model comparison, entropy rate, and out-of-class modeling. Physical Review E 76, 1 (2007), 011106.

[47]

Howell Tong. 1975. Determination of the order of a Markov chain by Akaike’s information criterion. Journal of applied probability 12, 3 (1975), 488–497.

[48]

Leo Torres, Ann S Blevins, Danielle S Bassett, and Tina Eliassi-Rad. 2020. The why, how, and when of representations for complex systems. arXiv preprint arXiv:2006.02870(2020).

[49]

[49] Transport for London.2014. Rolling Origin and Destination Survey (RODS) database. http://www.tfl.gov.uk/info-for/open-data-users/our-feeds

[50]

Marcel J Van der Heyden, Cees GC Diks, Bart PT Hoekstra, and Jacob DeGoede. 1998. Testing the order of discrete Markov chains using surrogate data. Physica D: Nonlinear Phenomena 117, 1-4 (1998), 299–313.

[51]

Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st World Wide Web Conference 2012, WWW 2012, Lyon, France, April 16-20, 2012, Alain Mille, Fabien Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab (Eds.). ACM, 619–628. https://doi.org/10.1145/2187836.2187920

Digital Library

[52]

LC Zhao, CCY Dorea, and CR Gonçalves. 2001. On determination of the order of a Markov chain. Statistical inference for stochastic processes 4, 3 (2001), 273–282.

Cited By

Perri VPetrović LScholtes I(2023)Bayesian inference of transition matrices from incomplete graph data with a topological priorEPJ Data Science10.1140/epjds/s13688-023-00416-312:1Online publication date: 11-Oct-2023
https://doi.org/10.1140/epjds/s13688-023-00416-3
Klimm F(2022)Quantifying the ‘end of history’ through a Bayesian Markov-chain approachRoyal Society Open Science10.1098/rsos.2211319:11Online publication date: 30-Nov-2022
https://doi.org/10.1098/rsos.221131

Index Terms

Learning the Markov Order of Paths in Graphs
1. Theory of computation

Index terms have been assigned to the content through auto-classification.

Recommendations

The infinite hidden Markov random field model

Hidden Markov random field (HMRF) models are widely used for image segmentation, as they appear naturally in problems where a spatially constrained clustering scheme is asked for. A major limitation of HMRF models concerns the automatic selection of the ...
Ockham's Razor
Abstract
Ockham's Razor is the philosophical principle that entities or assumptions should not be introduced unnecessarily. This is often interpreted as stating that one should seek the simplest explanation for a phenomenon. The purposes of this note are ...
Factorial Markov Random Fields
ECCV '02: Proceedings of the 7th European Conference on Computer Vision-Part III

In this paper we propose an extension to the standard Markov Random Field (MRF) model in order to handle layers. Our extension, which we call a Factorial MRF (FMRF), is analogous to the extension from Hidden Markov Models (HMM's) to Factorial HMM's. We ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

WWW '22: Proceedings of the ACM Web Conference 2022

April 2022

3764 pages

ISBN:9781450390965

DOI:10.1145/3485447

Editors:
Frédérique Laforest
INSA Lyon, France
,
Raphaël Troncy
EURECOM, France
,
Elena Simperl
King’s College London, UK
,
Deepak Agarwal
Pinterest, USA
,
Aristides Gionis
KTH Royal Institute of Technology, Sweden
,
Ivan Herman
W3C / retired
,
Lionel Médini
Université Lyon 1, France

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International 4.0 License.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 April 2022

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '22

Sponsor:

SIGWEB

WWW '22: The ACM Web Conference 2022

April 25 - 29, 2022

Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
1,194
Total Downloads

Downloads (Last 12 months)373
Downloads (Last 6 weeks)35

Reflects downloads up to 23 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Perri VPetrović LScholtes I(2023)Bayesian inference of transition matrices from incomplete graph data with a topological priorEPJ Data Science10.1140/epjds/s13688-023-00416-312:1Online publication date: 11-Oct-2023
https://doi.org/10.1140/epjds/s13688-023-00416-3
Klimm F(2022)Quantifying the ‘end of history’ through a Bayesian Markov-chain approachRoyal Society Open Science10.1098/rsos.2211319:11Online publication date: 30-Nov-2022
https://doi.org/10.1098/rsos.221131

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents