Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Two-dimensional data partitioning for non-negative matrix tri-factorization

Published: 18 October 2024 Publication History

Abstract

As a two-sided clustering and dimensionality reduction paradigm, Non-negative Matrix Tri-Factorization (NMTF) has attracted much attention in machine learning and data mining researchers due to its excellent performance and reliable theoretical support. Unlike Non-negative Matrix Factorization (NMF) methods applicable to one-sided clustering only, NMTF introduces an additional factor matrix and uses the inherent duality of data to realize the mutual promotion of sample clustering and feature clustering, thus showing great advantages in many scenarios (e.g., text co-clustering). However, the existing methods for solving NMTF usually involve intensive matrix multiplication, which is characterized by high time and space complexities, that is, there are limitations of slow convergence of the multiplicative update rules and high memory overhead. In order to solve the above problems, this paper develops a distributed parallel algorithm with a 2-dimensional data partition scheme for NMTF (i.e., PNMTF-2D). Experiments on multiple text datasets show that the proposed PNMTF-2D can substantially improve the computational efficiency of NMTF (e.g., the average iteration time is reduced by up to 99.7% on Amazon) while ensuring the effectiveness of convergence and co-clustering.

References

[1]
I.S. Dhillon, Co-clustering documents and words using bipartite spectral graph partitioning, in: KDD, 2001, pp. 269–274.
[2]
H. Wang, F. Nie, H. Huang, F. Makedon, Fast nonnegative matrix tri-factorization for large-scale data co-clustering, in: AAAI, 2011, pp. 1553–1558.
[3]
K. Allab, L. Labiod, M. Nadif, A semi-nmf-pca unified framework for data clustering, IEEE Transactions on Knowledge and Data Engineering 29 (1) (2016) 2–16.
[4]
B. Long, Z. Zhang, P.S. Yu, Co-clustering by block value decomposition, in: KDD, 2005, pp. 635–640.
[5]
C. Ding, T. Li, W. Peng, H. Park, Orthogonal nonnegative matrix t-factorizations for clustering, in: KDD, 2006, pp. 126–135.
[6]
Y. Chen, Z. Lei, Y. Rao, H. Xie, F.L. Wang, J. Yin, Q. Li, Parallel non-negative matrix tri-factorization for text data co-clustering, IEEE Transactions on Knowledge and Data Engineering 35 (5) (2023) 5132–5146.
[7]
Y.-X. Wang, Y.-J. Zhang, Nonnegative matrix factorization: a comprehensive review, IEEE Transactions on Knowledge and Data Engineering 25 (6) (2012) 1336–1353.
[8]
D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in: NIPS, 2000, pp. 556–562.
[9]
D.D. Lee, H.S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788–791.
[10]
R.M. Butler, E.L. Lusk, Monitors, messages, and clusters: the p4 parallel programming system, Parallel Computing 20 (4) (1994) 547–564.
[11]
L. Clarke, I. Glendinning, R. Hempel, The mpi message passing interface standard, in: Programming Environments for Massively Parallel Distributed Systems, 1994, pp. 213–218.
[12]
C. Dong, H. Zhao, W. Wang, Parallel nonnegative matrix factorization algorithm on the distributed memory platform, International Journal of Parallel Programming 38 (2010) 117–137.
[13]
M. Flatz, M. Vajteršic, Parallel nonnegative matrix factorization via Newton iteration, Parallel Processing Letters 26 (03) (2016).
[14]
R. Kannan, G. Ballard, H. Park, Mpi-faun: an mpi-based framework for alternating-updating nonnegative matrix factorization, IEEE Transactions on Knowledge and Data Engineering 30 (3) (2017) 544–558.
[15]
E. Chan, M. Heimlich, A. Purkayastha, R. Van De Geijn, Collective communication: theory, practice, and experience, Concurrency and Computation 19 (13) (2007) 1749–1783.
[16]
J.L. Gustafson, Reevaluating Amdahl's law, Communications of the ACM 31 (5) (1988) 532–533.
[17]
J. Han, K. Song, F. Nie, X. Li, Bilateral k-means algorithm for fast co-clustering, in: AAAI, Vol. 31, 2017, pp. 1969–1975.
[18]
S. Wang, A. Huang, Penalized nonnegative matrix tri-factorization for co-clustering, Expert Systems with Applications 78 (2017) 64–73.
[19]
A. Salah, M. Ailem, M. Nadif, Word co-occurrence regularized non-negative matrix tri-factorization for text data co-clustering, in: AAAI, 2018, pp. 3992–3999.
[20]
A. Strehl, J. Ghosh, Cluster ensembles—a knowledge reuse framework for combining multiple partitions, Journal of Machine Learning Research 3 (Dec) (2002) 583–617.
[21]
G. Bouma, Normalized (pointwise) mutual information in collocation extraction, in: GSCL 30, 2009, pp. 31–40.
[22]
M. Röder, A. Both, A. Hinneburg, Exploring the space of topic coherence measures, in: WSDM, 2015, pp. 399–408.
[23]
X. Qin, Y. Chen, Y. Rao, H. Xie, M.L. Wong, F.L. Wang, A constrained optimization approach for cross-domain emotion distribution learning, Knowledge-Based Systems 227 (2021).
[24]
Z. Lei, H. Liu, J. Yan, Y. Rao, Q. Li, Nmtf-ltm: towards an alignment of semantics for lifelong topic modeling, IEEE Transactions on Knowledge and Data Engineering 35 (10) (2023) 10616–10632.

Index Terms

  1. Two-dimensional data partitioning for non-negative matrix tri-factorization
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Big Data Research
    Big Data Research  Volume 37, Issue C
    Aug 2024
    82 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Publication History

    Published: 18 October 2024

    Author Tags

    1. Non-negative matrix tri-factorization
    2. Text co-clustering
    3. 2-Dimensional data partitioning

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 0
      Total Downloads
    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 29 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media