Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3637528.3671723acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Open access

Low Rank Multi-Dictionary Selection at Scale

Published: 24 August 2024 Publication History

Abstract

The sparse dictionary coding framework represents signals as a linear combination of a few predefined dictionary atoms. It has been employed for images, time series, graph signals and recently for 2-way (or 2D) spatio-temporal data employing jointly temporal and spatial dictionaries. Large and over-complete dictionaries enable high-quality models, but also pose scalability challenges which are exacerbated in multi-dictionary settings. Hence, an important problem that we address in this paper is: How to scale multi-dictionary coding for large dictionaries and datasets?
We propose a multi-dictionary atom selection technique for low-rank sparse coding named LRMDS. To enable scalability to large dictionaries and datasets, it progressively selects groups of row-column atom pairs based on their alignment with the data and performs convex relaxation coding via the corresponding sub-dictionaries. We demonstrate both theoretically and experimentally that when the data has a low-rank encoding with a sparse subset of the atoms, LRMDS is able to select them with strong guarantees under mild assumptions. Furthermore, we demonstrate the scalability and quality of LRMDS in both synthetic and real-world datasets and for a range of coding dictionaries. It achieves 3 times to 10 times speed-up compared to baselines, while obtaining up to two orders of magnitude improvement in representation quality on some of the real world datasets given a fixed target number of atoms.

Supplemental Material

MP4 File - LRMDS-video
In this video we introduce our low-rank multi-dictionary selection model. We first discuss the basic low-rank multi-dictionary coding model (TGSD), its wide range of applications and its main drawback: limited scalability with large over-complete dictionaries. We introduce the general idea of how we address this limitation by sub-selecting dictionary atoms informed by the input data. We preview some experimental results demonstrating the ability of our approach to accurately discard a large fraction of unnecessary atoms and enable significant reduction in both the representation error and running time.

References

[1]
[n. d.]. Wikipedia Page Views Statistics http://dumps.wikimedia.org/other/ pagecounts-raw/.
[2]
Peter Bickel, Chao Chen, Jaimie Kwon, John Rice, and Erik Zwet. 2002. Traffic Flow on a Freeway Network. (01 2002). https://doi.org/10.1007/978-0-387-21579-2_5
[3]
T Tony Cai and Lie Wang. 2011. Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Transactions on Information theory, Vol. 57, 7 (2011), 4680--4688.
[4]
Scott Shaobing Chen, David L Donoho, and Michael A Saunders. 2001. Atomic decomposition by basis pursuit. SIAM review, Vol. 43, 1 (2001), 129--159.
[5]
Nilson Maciel de Paiva, Elaine Crespo Marques, and Lirida Alves de Barros Naviner. 2017. Sparsity analysis using a mixed approach with greedy and LS algorithms on channel estimation. In 2017 3rd International Conference on Frontiers of Signal Processing (ICFSP). IEEE, 91--95.
[6]
Xiaowen Dong, Dorina Thanou, Laura Toni, Michael Bronstein, and Pascal Frossard. 2020. Graph signal processing for machine learning: A review and new perspectives. IEEE Signal processing magazine, Vol. 37, 6 (2020), 117--127.
[7]
Michael Elad and Michal Aharon. 2006. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image processing, Vol. 15, 12 (2006), 3736--3745.
[8]
Yong Fang, JiaJi Wu, and BorMin Huang. 2012. 2D sparse signal recovery via 2D orthogonal matching pursuit. Science China Information Sciences, Vol. 55 (2012), 889--897.
[9]
Kaito Fujii and Tasuku Soma. 2018. Fast greedy algorithms for dictionary selection with generalized sparsity constraints. Advances in Neural Information Processing Systems, Vol. 31 (2018).
[10]
Aboozar Ghaffari, Massoud Babaie-Zadeh, and Christian Jutten. 2009. Sparse decomposition of two dimensional signals. In 2009 IEEE international conference on acoustics, speech and signal processing. IEEE, 3157--3160.
[11]
Thomas Nall Eden Greville. 1966. Note on the generalized inverse of a matrix product. Siam Review, Vol. 8, 4 (1966), 518--521.
[12]
Shihao Ji, Ya Xue, and Lawrence Carin. 2008. Bayesian compressive sensing. IEEE Transactions on signal processing, Vol. 56, 6 (2008), 2346--2356.
[13]
Samue Kemp, Jason W Howel, and Peter C Lu. 2020. Bing COVID-19 Tracker. www.bing.com/covid
[14]
Jaeseok Lee, Jun Won Choi, and Byonghyo Shim. 2016. Sparse signal recovery via tree search matching pursuit. Journal of Communications and Networks, Vol. 18, 5 (2016), 699--712.
[15]
Boya Ma, Maxwell McNeil, and Petko Bogdanov. 2023. GIST: Graph Inference for Structured Time Series. In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM). SIAM, 433--441.
[16]
Boya Ma, Maxwell McNeil, Abram Magner, and Petko Bogdanov. 2024. Low Rank Multi-Dictionary Selection at Scale. arxiv: 2406.06960 [cs.LG]
[17]
Elaine Crespo Marques, Nilson Maciel, Lirida Naviner, Hao Cai, and Jun Yang. 2018. A review of sparse recovery algorithms. IEEE access, Vol. 7 (2018), 1300--1322.
[18]
Andreas Maurer, Massi Pontil, and Bernardino Romera-Paredes. 2013. Sparse coding for multitask and transfer learning. In International conference on machine learning. PMLR, 343--351.
[19]
Maxwell McNeil and Petko Bogdanov. 2023. Multi-dictionary tensor decomposition. In 2023 IEEE International Conference on Data Mining (ICDM). IEEE, 1217--1222.
[20]
Maxwell McNeil, Boya Ma, and Petko Bogdanov. 2022. SAGA: Signal-aware graph aggregation. In Proceedings of the 2022 SIAM International Conference on Data Mining (SDM). SIAM, 136--144.
[21]
Maxwell McNeil, Carolina Mattsson, Frank W Takes, and Petko Bogdanov. 2023. CADENCE: Community-Aware Detection of Dynamic Network States. In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM). SIAM, 1--9.
[22]
Maxwell J McNeil, Lin Zhang, and Petko Bogdanov. 2021. Temporal Graph Signal Decomposition. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 1191--1201.
[23]
Yagyensh Chandra Pati, Ramin Rezaiifar, and Perinkulam Sambamurthy Krishnaprasad. 1993. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of 27th Asilomar conference on signals, systems and computers. IEEE, 40--44.
[24]
Wei Qiu, Jianxiong Zhou, and Qiang Fu. 2019. Jointly using low-rank and sparsity priors for sparse inverse synthetic aperture radar imaging. IEEE Transactions on Image Processing, Vol. 29 (2019), 100--115.
[25]
Jérémie Rappaz, Julian McAuley, and Karl Aberer. 2021. Recommendation on Live-Streaming Platforms: Dynamic Availability and Repeat Consumption. In Fifteenth ACM Conference on Recommender Systems. 390--399.
[26]
Ron Rubinstein, Alfred M Bruckstein, and Michael Elad. 2010. Dictionaries for sparse representation modeling. Proc. IEEE, Vol. 98, 6 (2010), 1045--1057.
[27]
David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. 2013. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE signal processing magazine, Vol. 30, 3 (2013), 83--98.
[28]
Srikanth V. Tenneti and P. P. Vaidyanathan. 2015. Nested Periodic Matrices and Dictionaries: New Signal Representations for Period Estimation. IEEE Trans. Signal Processing, Vol. 63, 14 (2015), 3736--3750. https://doi.org/10.1109/TSP.2015.2434318
[29]
Ivana Tovsić and Pascal Frossard. 2011. Dictionary learning. IEEE Signal Processing Magazine, Vol. 28, 2 (2011), 27--38.
[30]
Jian Wang, Seokbeop Kwon, and Byonghyo Shim. 2012. Generalized orthogonal matching pursuit. IEEE Transactions on signal processing, Vol. 60, 12 (2012), 6202--6216.
[31]
John Wright, Allen Y Yang, Arvind Ganesh, S Shankar Sastry, and Yi Ma. 2008. Robust face recognition via sparse representation. IEEE transactions on pattern analysis and machine intelligence, Vol. 31, 2 (2008), 210--227.
[32]
Zhen James Xiang, Yun Wang, and Peter J Ramadge. 2016. Screening tests for lasso problems. IEEE transactions on pattern analysis and machine intelligence, Vol. 39, 5 (2016), 1008--1027.
[33]
Dong Zhang, Yongshun Zhang, and Cunqian Feng. 2017. Joint-2D-SL0 Algorithm for Joint Sparse Matrix Reconstruction. International Journal of Antennas and Propagation, Vol. 2017, 1 (2017), 6862852.
[34]
Lin Zhang, Wenyu Zhang, Maxwell J McNeil, Nachuan Chengwang, David S Matteson, and Petko Bogdanov. 2021. AURORA: A Unified fRamework fOR Anomaly detection on multivariate time series. Data Mining and Knowledge Discovery, Vol. 35, 5 (2021), 1882--1905.
[35]
Zheng Zhang, Yong Xu, Jian Yang, Xuelong Li, and David Zhang. 2015. A survey of sparse representation: algorithms and applications. IEEE access, Vol. 3 (2015), 490--530.

Index Terms

  1. Low Rank Multi-Dictionary Selection at Scale
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2024
    6901 pages
    ISBN:9798400704901
    DOI:10.1145/3637528
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 August 2024

    Check for updates

    Author Tags

    1. dictionary selection
    2. low rank methods
    3. sparse coding

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    KDD '24
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 88
      Total Downloads
    • Downloads (Last 12 months)88
    • Downloads (Last 6 weeks)40
    Reflects downloads up to 19 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media