research-article
DOI: 10.1145/3394486.3403240

A Non-Iterative Quantile Change Detection Method in Mixture Model with Heavy-Tailed Components

Published: 20 August 2020

Abstract

Estimating the parameters of a mixture model has wide applications, ranging from classification problems to the estimation of complex distributions. Most of the current literature on estimating the parameters of mixture densities is based on iterative Expectation Maximization (EM) type algorithms, which require either taking expectations over the latent label variables or generating samples from the conditional distribution of those latent labels using Bayes' rule. Moreover, when the number of components is unknown, the problem becomes computationally more demanding due to well-known label-switching issues [28]. In this paper, we propose a robust and quick approach based on change-point methods to determine the number of mixture components; it works for almost any location-scale family, even when the components are heavy-tailed (e.g., Cauchy). We present several numerical illustrations comparing our method with some popular methods available in the literature, using simulated data and real case studies. The proposed method is shown to be as much as 500 times faster than some of the competing methods and is also shown, by goodness-of-fit tests, to be more accurate in estimating the mixture distributions.
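
To make concrete the per-iteration step that the abstract attributes to EM-type algorithms, the following is a minimal sketch of applying Bayes' rule to obtain the conditional distribution of a latent label in a Cauchy mixture. It illustrates only the baseline the abstract contrasts against, not the proposed change-point method. The function names and the two-component parameter values are hypothetical and chosen purely for illustration.

```python
import math


def cauchy_pdf(x, loc, scale):
    """Density of a Cauchy(loc, scale) distribution evaluated at x."""
    z = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))


def responsibilities(x, weights, locs, scales):
    """Posterior probability of each latent label given x, via Bayes' rule.

    This is the quantity an EM-type algorithm recomputes at every iteration.
    """
    joint = [w * cauchy_pdf(x, m, s) for w, m, s in zip(weights, locs, scales)]
    total = sum(joint)  # mixture density at x
    return [j / total for j in joint]


# Hypothetical two-component Cauchy mixture (illustrative values only).
weights = [0.5, 0.5]
locs = [-5.0, 5.0]
scales = [1.0, 1.0]

r_mid = responsibilities(0.0, weights, locs, scales)    # equidistant point
r_right = responsibilities(5.0, weights, locs, scales)  # at the second center
```

A point midway between the two centers is assigned to each component with equal probability, while a point at the second center is assigned to it almost entirely. An iterative algorithm repeats this computation for every observation on every pass, which is the cost a non-iterative approach aims to avoid.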

References

[1]
Murray Aitkin, Duy Vu, and Brian Francis. 2015. A new Bayesian approach for determining the number of components in a finite mixture. Metron, Vol. 73, 2 (2015), 155--176.
[2]
Hirotugu Akaike. 1987. Factor analysis and AIC. In Selected Papers of Hirotugu Akaike. Springer, 371--386.
[3]
Felipe M Aparicio and Javier Estrada. 2001. Empirical distributions of stock returns: European securities markets, 1990--95. The European Journal of Finance, Vol. 7, 1 (2001), 1--21.
[4]
Tatiana Benaglia, Didier Chauveau, and David R Hunter. 2009. An EM-like algorithm for semi-and nonparametric estimation in multivariate mixtures. Journal of Computational and Graphical Statistics, Vol. 18, 2 (2009), 505--526.
[5]
Tatiana Benaglia, Didier Chauveau, and David R Hunter. 2011. Bandwidth selection in an EM-like algorithm for nonparametric multivariate mixtures. In Nonparametric Statistics And Mixture Models: A Festschrift in Honor of Thomas P Hettmansperger. World Scientific, 15--27.
[6]
Kenneth P Burnham and David R Anderson. 2004. Multimodel inference: understanding AIC and BIC in model selection. Sociological methods & research, Vol. 33, 2 (2004), 261--304.
[7]
Bradley P Carlin and Siddhartha Chib. 1995. Bayesian model choice via Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Methodological), Vol. 57, 3 (1995), 473--484.
[8]
Arthur P Dempster, Nan M Laird, and Donald B Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society. Series B (methodological) (1977), 1--38.
[9]
Mathias Drton and Martyn Plummer. 2017. A Bayesian information criterion for singular models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 79, 2 (2017), 323--380.
[10]
Nader Ebrahimi and Sujit K Ghosh. 2001. Bayesian and frequentist methods in change-point problems. Handbook of statistics, Vol. 20 (2001), 777--787.
[11]
Paul Fearnhead. 2005. Direct simulation for discrete mixture distributions. Statistics and Computing, Vol. 15, 2 (2005), 125--133.
[12]
DAS Fraser. 1963. On sufficiency and the exponential family. Journal of the Royal Statistical Society: Series B (Methodological), Vol. 25, 1 (1963), 115--123.
[13]
Stuart Geman and Donald Geman. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on pattern analysis and machine intelligence 6 (1984), 721--741.
[14]
John Geweke. 2007. Interpretation and inference in mixture models: Simple MCMC works. Computational Statistics & Data Analysis, Vol. 51, 7 (2007), 3529--3550.
[15]
Sanford J Grossman and Robert J Shiller. 1980. The determinants of the variability of stock market prices.
[16]
Ling Hu. 2006. Dependence patterns across financial markets: a mixed copula approach. Applied financial economics, Vol. 16, 10 (2006), 717--729.
[17]
JD Humphrey and KR Rajagopal. 2002. A constrained mixture model for growth and remodeling of soft tissues. Mathematical models and methods in applied sciences, Vol. 12, 03 (2002), 407--430.
[18]
Rebecca Killick, Paul Fearnhead, and Idris A Eckley. 2012. Optimal detection of changepoints with a linear computational cost. J. Amer. Statist. Assoc., Vol. 107, 500 (2012), 1590--1598.
[19]
Tze-San Lee. 2010. Change-point problems: bibliography and review. Journal of Statistical Theory and Practice, Vol. 4, 4 (2010), 643--662.
[20]
Jiayi Ma, Junjun Jiang, Chengyin Liu, and Yansheng Li. 2017. Feature guided Gaussian mixture model with semi-supervised EM and local geometric constraint for retinal image registration. Information Sciences, Vol. 417 (2017), 128--142.
[21]
Andrew McCallum. 1999. Multi-label text classification with a mixture model trained by EM. In AAAI workshop on Text Learning. 1--7.
[22]
Bengt Muthén and Kerby Shedden. 1999. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, Vol. 55, 2 (1999), 463--469.
[23]
Kazem Nasserinejad, Joost van Rosmalen, Wim de Kort, and Emmanuel Lesaffre. 2017. Comparison of criteria for choosing the number of classes in Bayesian finite mixture models. PloS one, Vol. 12, 1 (2017), e0168838.
[24]
Peter Neal and Theodore Kypraios. 2015. Exact Bayesian inference via data augmentation. Statistics and Computing, Vol. 25, 2 (2015), 333--347.
[25]
Ewa Nowakowska, Jacek Koronacki, and Stan Lipovetsky. 2014. Tractable measure of component overlap for Gaussian mixture models. arXiv preprint arXiv:1407.7172 (2014).
[26]
Ewan S Page. 1954. Continuous inspection schemes. Biometrika, Vol. 41, 1/2 (1954), 100--115.
[27]
Byung-Jung Park, Dominique Lord, and Chungwon Lee. 2014. Finite mixture modeling for vehicle crash data with application to hotspot identification. Accident Analysis & Prevention, Vol. 71 (2014), 319--326.
[28]
Sylvia Richardson and Peter J Green. 1997. On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society: series B (statistical methodology), Vol. 59, 4 (1997), 731--792.
[29]
Kathryn Roeder. 1994. A graphical technique for determining the number of components in a mixture of normals. J. Amer. Statist. Assoc., Vol. 89, 426 (1994), 487--495.
[30]
Judith Rousseau and Kerrie Mengersen. 2011. Asymptotic behaviour of the posterior distribution in overfitted mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 73, 5 (2011), 689--710.
[31]
Jianfeng Si, Arjun Mukherjee, Bing Liu, Qing Li, Huayi Li, and Xiaotie Deng. 2013. Exploiting topic based Twitter sentiment for stock prediction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 24--29.
[32]
Chao Wang, Mohammed A Quddus, and Stephen G Ison. 2011. Predicting accident frequency at their severity levels and its application in site ranking using a two-stage mixed multivariate model. Accident Analysis & Prevention, Vol. 43, 6 (2011), 1979--1990.
[33]
Tingting Wang, Yi-Ping Phoebe Chen, Phil J Bowman, Michael E Goddard, and Ben J Hayes. 2016. A hybrid expectation maximisation and MCMC sampling algorithm to implement Bayesian mixture model based genomic prediction and QTL mapping. BMC genomics, Vol. 17, 1 (2016), 744.
[34]
Eric J Ward. 2008. A review and comparison of four commonly used Bayesian and maximum likelihood model selection tools. Ecological Modelling, Vol. 211, 1--2 (2008), 1--10.
[35]
CS Wong, WS Chan, and PL Kam. 2009. A Student t-mixture autoregressive model with applications to heavy-tailed financial data. Biometrika, Vol. 96, 3 (2009), 751--760.
[36]
Dong Yin, Jia Pan, Peng Chen, and Rong Zhang. 2008. Medical image categorization based on Gaussian mixture model. In 2008 International Conference on BioMedical Engineering and Informatics, Vol. 2. IEEE, 128--131.
[37]
Ming-Heng Zhang and Qian-Sheng Cheng. 2004. Determine the number of components in a mixture model by the extended KS test. Pattern recognition letters, Vol. 25, 2 (2004), 211--216.

    Published In

    KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    August 2020
    3664 pages
    ISBN:9781450379984
    DOI:10.1145/3394486

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cauchy distribution
    2. heavy-tailed distribution
    3. mixture model
    4. stock data

    Qualifiers

    • Research-article

    Conference

    KDD '20

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
