Nothing Special   »   [go: up one dir, main page]

Skip to main content

Advertisement

Log in

SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering

  • Published:
Journal of Global Optimization Aims and scope Submit manuscript

Abstract

Nonnegative matrix factorization (NMF) provides a lower rank approximation of a matrix by a product of two nonnegative factors. NMF has been shown to produce clustering results that are often superior to those by other methods such as K-means. In this paper, we provide further interpretation of NMF as a clustering method and study an extended formulation for graph clustering called Symmetric NMF (SymNMF). In contrast to NMF that takes a data matrix as an input, SymNMF takes a nonnegative similarity matrix as an input, and a symmetric nonnegative lower rank approximation is computed. We show that SymNMF is related to spectral clustering, justify SymNMF as a general graph clustering method, and discuss the strengths and shortcomings of SymNMF and spectral clustering. We propose two optimization algorithms for SymNMF and discuss their convergence properties and computational efficiencies. Our experiments on document clustering, image clustering, and image segmentation support SymNMF as a graph clustering method that captures latent linear and nonlinear relationships in the data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. http://projects.ldc.upenn.edu/TDT2/.

  2. http://www.daviddlewis.com/resources/testcollections/reuters21578/.

  3. http://jmlr.csail.mit.edu/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm.

  4. http://robotics.stanford.edu/~gal/data.html.

  5. http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php.

  6. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

  7. http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html.

  8. http://www.ri.cmu.edu/research_project_detail.html?project_id=418&menu_id=261.

  9. http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html.

  10. http://www.cs.berkeley.edu/~fowlkes/BSE/.

References

  1. Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 898–916 (2011)

    Article  Google Scholar 

  2. Arora, R., Gupta, M.R., Kapila, A., Fazel, M.: Clustering by left-stochastic matrix factorization. In: ICML ’11: Proceedings of the 28nd International Conference on Machine Learning (2011)

  3. Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von Mises–Fisher distributions. J. Mach. Learn. Res. 6, 1345–1382 (2005)

    MATH  MathSciNet  Google Scholar 

  4. Berman, A., Plemmons, R.J.: Nonnegative Matrices in the Mathematical Sciences. SIAM, Philadelphia, PA (1994)

    Book  MATH  Google Scholar 

  5. Berman, A., Shaked-Monderer, N.: Completely Positive Matrices. World Scientific, River Edge, NJ (2003)

    Book  MATH  Google Scholar 

  6. Bertsekas, D.P.: Projected newton methods for optimization problems with simple constraints. SIAM J. Control Optim. 20(2), 221–246 (1982)

    Article  MATH  MathSciNet  Google Scholar 

  7. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont, MA (1999)

    MATH  Google Scholar 

  8. Cai, D., He, X., Han, J., Huang, T.S.: Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1548–1560 (2011)

    Article  Google Scholar 

  9. Catral, M., Han, L., Neumann, M., Plemmons, R.J.: On reduced rank nonnegative matrix factorization for symmetric matrices. Linear Algebra Appl. 393, 107–126 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  10. Chan, P., Schlag, M., Zien, J.: Spectral k-way ratio-cut partitioning and clustering. IEEE Trans. CAD Integr. Circuits Syst. 13(9), 1088–1096 (1994)

    Article  Google Scholar 

  11. Choo, J., Lee, C., Reddy, C.K., Park, H.: Utopian: User-driven topic modeling based on interactive nonnegative matrix factorization. IEEE Trans. Vis. Comput. Graph. 19(12), 1992–2001 (2013)

    Article  Google Scholar 

  12. Cour, T., Benezit, F., Shi, J.: Spectral segmentation with multiscale graph decomposition. In: CVPR ’05: Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1124–1131 (2005)

  13. Dhillon, I., Guan, Y., Kulis, B.: A Unified View of Kernel K-Means, Spectral Clustering and Graph Cuts. Technical Report TR-04-25, University of Texas at Austin (2005)

  14. Dhillon, I., Modha, D.S.: Concept decompositions for large sparse text data using clustering. Mach. Learn. 42, 143–175 (2001)

    Article  MATH  Google Scholar 

  15. Ding, C., He, X., Simon, H.D.: On the equivalence of nonnegative matrix factorization and spectral clustering. In: SDM ’05: Proceedings of the SIAM International Conference on Data Mining, pp. 606–610 (2005)

  16. Ding, C., Li, T., Jordan, M.: Nonnegative matrix factorization for combinatorial optimization: spectral clustering, graph matching, and clique finding. In: ICDM ’08: Proceedings of the 8th IEEE International Conference on Data Mining, pp. 183–192 (2008)

  17. Ding, C., Li, T., Jordan, M.I.: Convex and semi-nonnegative matrix factorization. IEEE Trans. Pattern Anal. Mach. Intell. 32(1), 45–55 (2010)

    Article  Google Scholar 

  18. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley-Interscience, New York (2000)

    Google Scholar 

  19. Fowlkes, C., Malik, J.: How Much Does Globalization Help Segmentation? Technical Report UCB/CSD-4-1340. University of California, Berkeley (2004)

  20. Gillis, N., Kuang, D., Park, H.: Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization. IEEE Trans. Geosci. Remote Sens. 53(4):2066–2078, (2015)

  21. Globerson, A., Chechik, G., Pereira, F., Tishby, N.: Euclidean embedding of co-occurrence data. J. Mach. Learn. Res. 8, 2265–2295 (2007)

    MATH  MathSciNet  Google Scholar 

  22. Gonzales, E.F., Zhang, Y.: Accelerating the Lee–Seung Algorithm for Non-Negative Matrix Factorization. Technical Report TR05-02, Rice University (2005)

  23. He, Z., Xie, S., Zdunek, R., Zhou, G., Cichocki, A.: Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering. IEEE Trans. Neural Netw. 22(12), 2117–2131 (2011)

    Article  Google Scholar 

  24. Ho, N.D.: Nonnegative Matrix Factorization Algorithms and Applications. Ph.D. thesis, Universite catholique de Louvain (2008)

  25. Kannan, R., Ishteva, M., Park, H.: Bounded matrix factorization for recommender system. Knowl. Inf. Syst. 39(3), 491–511 (2014)

    Article  Google Scholar 

  26. Kelley, C.T.: Iterative Methods for Linear and Nonlinear Equations. SIAM, Philadelphia, PA (1995)

    Book  MATH  Google Scholar 

  27. Kim, H., Park, H.: Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 23(12), 1495–1502 (2007)

    Article  Google Scholar 

  28. Kim, H., Park, H.: Nonnegative matrix factorization based on alternating non-negativity-constrained least squares and the active set method. SIAM J. Matrix. Anal. Appl. 30(2), 713–730 (2008)

    Article  MATH  MathSciNet  Google Scholar 

  29. Kim, J., He, Y., Park, H.: Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework. J. Glob. Optim. 58(2), 285–319 (2014)

    Article  MATH  MathSciNet  Google Scholar 

  30. Kim, J., Park, H.: Sparse Nonnegative Matrix Factorization for Clustering. Technical Report GT-CSE-08-01, Georgia Institute of Technology (2008)

  31. Kim, J., Park, H.: Toward faster nonnegative matrix factorization: a new algorithm and comparisons. In: ICDM ’08: Proceedings of the 8th IEEE International Conference on Data Mining, pp. 353–362 (2008)

  32. Kim, J., Park, H.: Fast nonnegative matrix factorization: An active-set-like method and comparisons. SIAM J. Sci. Comput. 33(6), 3261–3281 (2011)

    Article  MATH  MathSciNet  Google Scholar 

  33. Kleinberg, J.: An impossibility theorem for clustering. Adv. Neural Inf. Process. Syst. 15, 446–453 (2002)

    Google Scholar 

  34. Kuang, D., Ding, C., Park, H.: Symmetric nonnegative matrix factorization for graph clustering. In: SDM ’12: Proceedings of the SIAM International Conference on Data Mining, pp. 106–117 (2012)

  35. Kuang, D., Park, H.: Fast rank-2 nonnegative matrix factorization for hierarchical document clustering. In: KDD ’13: Proceedings of the 19th ACM International Conference on Knowledge Discovery and Data Mining, pp. 739–747 (2013)

  36. Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised graph clustering: a kernel approach. In: ICML ’05: Proceedings of the 22nd Internatioal Conference on Machine Learning, pp. 457–464 (2005)

  37. Lawson, C.L., Hanson, R.J.: Solving Least Squares Problems. Prentice-Hall, Englewood Cliffs, NJ (1974)

    MATH  Google Scholar 

  38. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)

    Article  Google Scholar 

  39. Lee, K.C., Ho, J., Kriegman, D.: Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 684–698 (2005)

    Article  Google Scholar 

  40. Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: Rcv1: A new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)

    Google Scholar 

  41. Li, T., Ding, C., Jordan, M.I.: Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In: ICDM ’07: Proceedings of the 7th IEEE International Conference on Data Mining, pp. 577–582 (2007)

  42. Lin, C.J.: On the convergence of multiplicative update algorithms for nonnegative matrix factorization. Trans. Neural Netw. 18(6), 1589–1596 (2007)

    Article  Google Scholar 

  43. Lin, C.J.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19(10), 2756–2779 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  44. Lucena, A., Ribeiro, C., Santos, A.C.: A hybrid heuristic for the diameter constrained minimum spanning tree problem. J. Glob. Optim. 46(3), 363–381 (2010)

    Article  MATH  MathSciNet  Google Scholar 

  45. Ma, X., Gao, L., Yong, X., Fu, L.: Semi-supervised clustering algorithm for community structure detection in complex networks. Phys. A Stat. Mech. Appl. 389(1), 187–197 (2010)

    Article  Google Scholar 

  46. Malik, J., Belongie, S., Leung, T., Shi, J.: Contour and texture analysis for image segmentation. Int. J. Comput. Vis. 43(1), 7–27 (2001)

    Article  MATH  Google Scholar 

  47. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York, NY (2008)

    Book  MATH  Google Scholar 

  48. Martin, D., Fowlkes, C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. 26(5), 530–549 (2004)

    Article  Google Scholar 

  49. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: ICCV ’01: Proceedings of the 8th IEEE International Conference on Computer Vision, vol. 2, pp. 416–423 (2001)

  50. Nepusz, T., Petróczi, A., Négyessy, L., Bazsó, F.: Fuzzy communities and the concept of bridgeness in complex networks. Phys. Rev. E 77(1), 016,107 (2008)

  51. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 14, 849–856 (2001)

    Google Scholar 

  52. Pauca, V.P., Shahnaz, F., Berry, M.W., Plemmons, R.J.: Text mining using non-negative matrix factorizations. In: SDM ’04: Proceedings of the SIAM International Conference on Data Mining, pp. 452–456 (2004)

  53. Rinnooy Kan, A., Timmer, G.: Stochastic global optimization methods, part II: multi level methods. Math. Program. 39, 57–78 (1987)

    Article  MATH  MathSciNet  Google Scholar 

  54. Rinnooy Kan, A., Timmer, G.: Global optimization. In: Kan, R., Todds, M.J. (eds.) Handbooks in Operations Research and Management Science, vol. 1, pp. 631–662. North Holland, Amsterdam (1989)

  55. Shahnaz, F., Berry, M.W., Pauca, V.P., Plemmons, R.J.: Document clustering using nonnegative matrix factorization. Inf. Process. Manag. 42, 373–386 (2006)

    Article  MATH  Google Scholar 

  56. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)

    Article  Google Scholar 

  57. Sim, T., Baker, S., Bsat, M.: The CMU pose, illumination, and expression database. IEEE Trans. Pattern Anal. Mach. Intell. 25(12), 1615–1618 (2003)

    Article  Google Scholar 

  58. Stewart, G.W., Sun, J.G.: Matrix Perturbation Theory. Academic Press, New York (1990)

    MATH  Google Scholar 

  59. Strogatz, S.H.: Exploring complex networks. Nature 410, 268–276 (2001)

    Article  Google Scholar 

  60. van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)

    MATH  Google Scholar 

  61. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

    Article  MathSciNet  Google Scholar 

  62. Wang, F., Li, T., Wang, X., Zhu, S., Ding, C.: Community discovery using nonnegative matrix factorization. Data Min. Knowl. Discov. 22(3), 493–521 (2011)

    Article  MATH  MathSciNet  Google Scholar 

  63. Wild, S., Curry, J., Dougherty, A.: Improving non-negative matrix factorizations through structured initialization. Pattern Recognit. 37, 2217–2232 (2004)

    Article  Google Scholar 

  64. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: SIGIR ’03: Proceedings of the 26th International ACM Conference on Research and Development in Informaion Retrieval, pp. 267–273 (2003)

  65. Yang, Z., Hao, T., Dikmen, O., Chen, X., Oja, E.: Clustering by nonnegative matrix factorization using graph random walk. Adv. Neural Inf. Process. Syst. 25, 1088–1096 (2012)

    Google Scholar 

  66. Yang, Z., Oja, E.: Clustering by low-rank doubly stochastic matrix decomposition. In: ICML ’12: Proceedings of the 29nd International Conference on Machine Learning (2012)

  67. Yu, S.X., Shi, J.: Multiclass spectral clustering. In: ICCV ’03: Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 313–319 (2003)

  68. Yun, S., Tseng, P., Toh, K.C.: A block coordinate gradient descent method for regularized convex separable optimization and covariance selection. Math. Program. 129, 331–355 (2011)

    Article  MATH  MathSciNet  Google Scholar 

  69. Zass, R., Shashua, A.: A unifying approach to hard and probabilistic clustering. In: ICCV ’05: Proceedings of the 10th IEEE International Conference on Computer Vision, pp. 294–301 (2005)

  70. Zelnik-Manor, L., Perona, P.: Self-tuning spectral clustering. Adv. Neural Inf. Process. Syst. 17, 1601–1608 (2004)

    Google Scholar 

  71. Zhang, Y., Yeung, D.Y.: Overlapping community detection via bounded nonnegative matrix tri-factorization. In: KDD ’12: Proceedings of the 18th ACM International Conference on Knowledge Discovery and Data Mining, pp. 606–614 (2012)

  72. Zhang, Z.Y., Wang, Y., Ahn, Y.Y.: Overlapping community detection in complex networks using symmetric binary matrix factorization. Phys. Rev. E 87(6), 062,803 (2013)

Download references

Acknowledgments

The work of the first and third authors was supported in part by the National Science Foundation (NSF) Grants CCF-0808863 and the Defense Advanced Research Projects Agency (DARPA) XDATA program Grant FA8750-12-2-0309. The work of the second author was supported by the TJ Park Science Fellowship of POSCO TJ Park Foundation. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF, the DARPA, or the NRF.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Haesun Park.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kuang, D., Yun, S. & Park, H. SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering. J Glob Optim 62, 545–574 (2015). https://doi.org/10.1007/s10898-014-0247-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10898-014-0247-2

Keywords

Navigation