DOI: 10.1145/2063576.2063735
research-article

Pattern change discovery between high dimensional data sets

Published: 24 October 2011

Abstract

This paper investigates the general problem of discovering pattern changes between high-dimensional data sets. Existing methods either focus mainly on detecting magnitude changes in low-dimensional data sets or operate under supervised frameworks. We introduce the principal angles between subspaces as a measure of the subspace difference between two high-dimensional data sets. Principal angles have the property of isolating subspace change from magnitude change. Because computing the principal angles directly is challenging, we use matrix factorization as a statistical framework and develop the principle of dominant subspace mapping, which transfers principal-angle-based detection to a matrix factorization problem. We show how matrix factorization can be naturally embedded into the likelihood ratio test based on linear models. The proposed method is unsupervised and addresses the statistical significance of pattern changes between high-dimensional data sets. We apply the method to several real-world applications to demonstrate its power and effectiveness.
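
To make the claim that principal angles isolate subspace change from magnitude change concrete, the following is a minimal sketch, not the paper's method. It uses the standard direct computation (orthonormalize each basis with QR, then take the singular values of the product of the two bases) rather than the matrix-factorization route the abstract describes. The function name `principal_angles`, the random test data, and the mixing matrix `M` are assumptions made purely for illustration, and the sketch assumes both input matrices have full column rank.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Illustrative helper; assumes A and B have full column rank.
    """
    # Orthonormal bases for the two column spaces.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # The singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against values slightly above 1 from rounding.
    return np.arccos(np.clip(cosines, -1.0, 1.0))

rng = np.random.default_rng(0)

# A 3-dimensional subspace of a 50-dimensional space.
X = rng.standard_normal((50, 3))

# Magnitude change only: rescale and mix the columns with an invertible
# matrix, which leaves the spanned subspace unchanged.
M = np.array([[2.0, 0.5, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 10.0]])
same_span = X @ M

# Subspace change: an unrelated random 3-dimensional subspace.
other_span = rng.standard_normal((50, 3))

angles_same = principal_angles(X, same_span)
angles_diff = principal_angles(X, other_span)

print("magnitude change only:", angles_same)
print("subspace change:      ", angles_diff)
```

In this sketch the angles for the rescaled data stay at zero up to rounding error, while the angles for the unrelated subspace are clearly nonzero, which is the separation between magnitude change and subspace change that the abstract refers to.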




    Published In

    CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
    October 2011
    2712 pages
    ISBN:9781450307178
    DOI:10.1145/2063576
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. matrix factorization
    2. pattern change detection
    3. principal angles
    4. principle of dominant subspace mapping
    5. unsupervised learning

    Qualifiers

    • Research-article

    Conference

    CIKM '11
    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2017) Unsupervised Detection and Analysis of Changes in Everyday Physical Activity Data. Advances in Biomedical Informatics, pages 97-122. DOI: 10.1007/978-3-319-67513-8_6. Online publication date: 20-Oct-2017
    • (2016) Unsupervised detection and analysis of changes in everyday physical activity data. Journal of Biomedical Informatics, 63:C, pages 54-65. DOI: 10.1016/j.jbi.2016.07.020. Online publication date: 1-Oct-2016
    • (2015) Ranking educational videos: The impact of social presence. 2015 IEEE 9th International Conference on Research Challenges in Information Science (RCIS), pages 342-350. DOI: 10.1109/RCIS.2015.7128895. Online publication date: May-2015
    • (2013) Nonnegative Local Coordinate Factorization for Image Representation. IEEE Transactions on Image Processing, 22:3, pages 969-979. DOI: 10.1109/TIP.2012.2224357. Online publication date: 1-Mar-2013
