Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1150402.1150445acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Beyond streams and graphs: dynamic tensor analysis

Published: 20 August 2006 Publication History

Abstract

How do we find patterns in author-keyword associations, evolving over time? Or in Data Cubes, with product-branch-customer sales information? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks and many more. However, they have only two orders, like author and keyword, in the above example.We propose to envision such higher order data as tensors,and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce the dynamic tensor analysis (DTA) method, and its variants. DTA provides a compact summary for high-order and high-dimensional data, and it also reveals the hidden correlations. Algorithmically, we designed DTA very carefully so that it is (a) scalable, (b) space efficient (it does not need to store the past) and (c) fully automatic with no need for user defined parameters. Moreover, we propose STA, a streaming tensor analysis method, which provides a fast, streaming approximation to DTA.We implemented all our methods, and applied them in two real settings, namely, anomaly detection and multi-way latent semantic indexing. We used two real, large datasets, one on network flow data (100GB over 1 month) and one from DBLP (200MB over 25 years). Our experiments show that our methods are fast, accurate and that they find interesting patterns and outliers on the real datasets.

References

[1]
http://www.informatik.uni-trier.de/ ley/db/.
[2]
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In SIGMOD, pages 207--216, 1993.
[3]
S. Brin and L. Page. The anatomy of a large-scale hypertextual (web) search engine. In WWW, pages 107--117, 1998.
[4]
J. D. Carroll and J. Chang. Analysis of individual differences in multi dimensional scaling via an n-way generalization of 'eckart-young' decomposition. Psychometrika, 35(3), 1970.
[5]
L. de Lathauwer. Signal Processing Based on Multilinear Algebra. PhD thesis, Katholieke University of Leuven, Belgium, 1997.
[6]
I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In KDD, 2003.
[7]
Chris Ding and Jieping Ye. Two-dimensional singular value decomposition (2dsvd) for 2d maps and images. In SDM, 2005.
[8]
Petros Drineas and Michael W. Mahoney. A randomized algorithm for a tensor-based generalization of the svd. technical report.
[9]
Peter W. Foltz and Susan T. Dumais. Personalized information delivery: An analysis of information filtering methods. Comm. of ACM (CACM), 35(12), 1992.
[10]
R. A. Harshman. Foundations of the parafac procedure:model and conditions for an explanatory multi-mode factor analysis. UCLA working papers in phonetics, 16, 1970.
[11]
S. Haykin. Adaptive Filter Theory. Prentice Hall, 1992.
[12]
Xiaofei He, Deng Cai, and Partha Niyogi. Tensor subspace analysis. In NIPS, 2005.
[13]
Geoff Hulten, Laurie Spencer, and Pedro Domingos. Mining time-changing data streams. In KDD, 2001.
[14]
Piotr Indyk, Nick Koudas, and S. Muthukrishnan. Identifying representative trends in massive time series data sets using sketches. In VLDB, 2000.
[15]
I. T. Jolliffe. Principal Component Analysis. Springer, 2002.
[16]
R. Kannan, S. Vempala, and A. Vetta. On clusterings-good, bad and spectral. In FOCS, 2000.
[17]
A. Kapteyn, H. Neudecker, and T. Wansbeek. An approach to n-mode component analysis. Psychometrika, 51(2), 1986.
[18]
George Karypis and Vipin Kumar. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed Computing, 48(1), 1998.
[19]
Jon Kleinberg. Authoritative sources in a hyperlinked environment. In SODA, 1998.
[20]
Tamara G. Kolda, Brett W. Bader, and Joseph P. Kenny. Higher-order web link analysis using multilinear algebra. In ICDM, 2005.
[21]
Flip Korn, Alexandros Labrinidis, Yannis Kotidis, and Christos Faloutsos. Quantifiable data mining using ratio rules. VLDB Journal, 8(3-4):254--266, 2000.
[22]
P. Kroonenberg and J. D. Leeuw. Principal component analysis of three-mode data by means of alternating least square algorithms. Psychometrika, 45, 1980.
[23]
S. Muthukrishnan. Data streams: algorithms and applications, volume 1. Foundations and Trends. in Theoretical Computer Science, 2005.
[24]
Christos H. Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, and Santosh Vempala. Latent semantic indexing: A probabilistic analysis. In PODS, 1998.
[25]
Spiros Papadimitriou, Jimeng Sun, and Christos Faloutsos. Streaming pattern discovery in multiple time-series. In VLDB, 2005.
[26]
Amnon Shashua and Anat Levin. Linear image coding for regression and classification using the tensor-rank principle. In CVPR, number 1, 2001.
[27]
L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3), 1966.
[28]
M. A. O. Vasilescu and D. Terzopoulos. Multilinear analysis of image ensembles: Tensorfaces. In ECCV, 2002.
[29]
N. Viereck, M. Dyrby, and S. B. Engelsen. Monitoring Thermal Processes by NMR Technology. Elsevier Academic Press, 2006.
[30]
Dong Xu, Shuicheng Yan, Lei Zhang, Hong-Jiang Zhang, Zhengkai Liu, and Heung-Yeung Shum. Concurrent subspaces analysis. In CVPR, 2005.
[31]
Jieping Ye. Generalized low rank approximations of matrices. Machine Learning, 61, 2004.
[32]
Lizhuang Zhao and Mohammed J. Zaki. Tricluster: An effective algorithm for mining coherent clusters in 3d microarray data. In SIGMOD, 2005.

Cited By

View all
  • (2024)Matrix stretchingOpen Mathematics10.1515/math-2024-003122:1Online publication date: 3-Aug-2024
  • (2024)Anomaly Detection in Dynamic Graphs: A Comprehensive SurveyACM Transactions on Knowledge Discovery from Data10.1145/366990618:8(1-44)Online publication date: 29-May-2024
  • (2024)A Searchable Symmetric Encryption-Based Privacy Protection Scheme for Cloud-Assisted Mobile CrowdsourcingIEEE Internet of Things Journal10.1109/JIOT.2023.332066611:2(1910-1924)Online publication date: 15-Jan-2024
  • Show More Cited By

Index Terms

  1. Beyond streams and graphs: dynamic tensor analysis

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2006
    986 pages
    ISBN:1595933395
    DOI:10.1145/1150402
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 August 2006

    Permissions

    Request permissions for this article.

    Check for updates

    Qualifiers

    • Article

    Conference

    KDD06

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)70
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 16 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Matrix stretchingOpen Mathematics10.1515/math-2024-003122:1Online publication date: 3-Aug-2024
    • (2024)Anomaly Detection in Dynamic Graphs: A Comprehensive SurveyACM Transactions on Knowledge Discovery from Data10.1145/366990618:8(1-44)Online publication date: 29-May-2024
    • (2024)A Searchable Symmetric Encryption-Based Privacy Protection Scheme for Cloud-Assisted Mobile CrowdsourcingIEEE Internet of Things Journal10.1109/JIOT.2023.332066611:2(1910-1924)Online publication date: 15-Jan-2024
    • (2024)Spatiotemporal Group Anomaly Detection via Graph Total Variation on TensorsICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)10.1109/ICASSP48485.2024.10448399(7035-7039)Online publication date: 14-Apr-2024
    • (2024)Scale-variant structural feature construction of EEG stream via component-increased Dynamic Tensor DecompositionKnowledge-Based Systems10.1016/j.knosys.2024.111747294(111747)Online publication date: Jun-2024
    • (2024)GPUTucker: Large-Scale GPU-Based Tucker Decomposition Using Tensor PartitioningExpert Systems with Applications10.1016/j.eswa.2023.121445237(121445)Online publication date: Mar-2024
    • (2024)MPCA and MDA via Einstein productComputational and Applied Mathematics10.1007/s40314-024-02866-543:6Online publication date: 2-Aug-2024
    • (2024)Incremental algorithms for truncated higher-order singular value decompositionsBIT Numerical Mathematics10.1007/s10543-023-01004-764:1Online publication date: 8-Jan-2024
    • (2024)Classifying Malware Using Tensor DecompositionMalware10.1007/978-3-031-66245-4_1(3-36)Online publication date: 5-Jul-2024
    • (2023)Anonymous Edge Representation for Inductive Anomaly Detection in Dynamic Bipartite GraphProceedings of the VLDB Endowment10.14778/3579075.357908816:5(1154-1167)Online publication date: 1-Jan-2023
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media