Abstract
Complex networks exist in a wide range of real world systems, such as social networks, technological networks, and biological networks. During the last decades, many researchers have concentrated on exploring some common things contained in those large networks include the small-world property, power-law degree distributions, and network connectivity. In this paper, we will investigate another important issue, community discovery, in network analysis. We choose Nonnegative Matrix Factorization (NMF) as our tool to find the communities because of its powerful interpretability and close relationship between clustering methods. Targeting different types of networks (undirected, directed and compound), we propose three NMF techniques (Symmetric NMF, Asymmetric NMF and Joint NMF). The correctness and convergence properties of those algorithms are also studied. Finally the experiments on real world networks are presented to show the effectiveness of the proposed methods.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Albert R, Jeong H, Barabasi A-L (1999) The diameter of the world wide web. Nature 401: 130
Amaral LAN, Scala A, Bartheélémy M, Stanley HE (2000) Classes of small-world networks. Proc Natl Acad Sci USA 97: 11149–11152
Barthelemy M, Amaral LAN (1999) Small-world networks: evidence for a crossover picture. Phys Rev Lett 82: 3180
Brunet JP, Tamayo P, Golub TR, Mesirov JP (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci USA 102(12): 4164–4169
Chen G, Wang F, Zhang C (2007) Collaborative filtering using orthogonal nonnegative matrix tri-factorization. In: ICDM workshops on high performance computing, pp 303–308
Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of SIAM international conference on data mining, pp 606–610
Ding C, Li T, Jordan MI (2006a) Convex and semi-nonnegative matrix factorizations. LBNL Tech Report 60428
Ding C, Li T, Peng W, Park H (2006b) Orthogonal nonnegative matrix tri-factorizations for clustering. In: SIGKDD, pp 126–135
Ding C, Li T, Peng W (2008) On the equivalence between nonnegative matrix factorization and probabilistic latent semantic indexing. Comput Stat Data Anal 52(1): 155–173
Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. In: SIGCOMM, pp 251–262
Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: SIGKDD, pp 150–160
Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12): 7821–7826
Hofmann T, Puzicha J (1999) Latent class models for collaborative filtering. In: IJCAI, pp 688–693
Ino H, Kudo M, Nakamura A (2005) Partitioning of web graphs by community topology. In: WWW, pp 661–669
Lee DD, Seung HS (1999) Learning the parts of objects by nonnegative matrix factorization. Nature 401: 788–791
Lee DD, Seung HS (2000) Algorithms for non-negative matrix factorization, pp 556–562
Long B, Zhang Z, Wu X, Yu PS (2007) Relational clustering by symmetric convex coding. In: Proceedings of the 24th international conference on machine learning, pp 569–576
Miao G, Song Y, Zhang D, Bai H (2008) Parallel spectral clustering algorithm for large-scale community data mining. In: The 17th WWW workshop on social web search and mining (SWSM)
Newman MEJ (2004a) Detecting community structure in networks. Eur Phys J B 38: 321–330
Newman MEJ (2004b) Fast algorithm for detecting community structure in very large networks. Phys Rev E 69
Paatero P, Tapper U (1994) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics 5: 111–126
Palla G, Derényi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043):814–818
Pauca VP, Shahnaz F, Berry MW, Plemmons RJ (2004) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. In: The 4th SIAM international conference on data mining, pp 452–456
Pennock DM, Horvitz E, Lawrence S, Giles CL (2000) Collaborative filtering by personality diagnosis: a hybrid memory- and model-based approach. In: UAI
Priebe CE, Conroy JM, Marchette DJ, Park Y (2005) Scan statistics on Enron graphs. In: Proceedings of SIAM international conference on data mining
Resnick P., Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) Grouplens: an open architecture for collaborative filtering of netnews, pp 175–186
Ruan J, Zhang W (2007) An efficient spectral algorithm for network community discovery and its applications to biological and social networks. In: ICDM
Scott J (2000) Social network analysis: a handbook, 2nd edn. Sage Publications, London
Sharan R et al (2005) From the cover: conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA 102(6): 1974–1979
Strehl A, Ghosh J, Cardie C (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3: 583–617
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4): 395–416
Wang F, Ma S, Yang L, Li T (2006) Recommendation on item graphs. In: ICDM, pp 1119–1123
Wang D, Li T, Zhu S, Ding CHQ (2008a) Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 307–314
Wang F, Li T, Zhang C (2008b) Semi-supervised clustering via matrix factorization. In: The 8th SIAM international conference on data mining
Wasserman S, Faust K (1994) Social network analysis. Cambridge University Press, Cambridge
Watts DJ, Strogatz SH (1998) Collective dynamics of ’small-world’ networks. Nature 393: 440–442
Weiss Y (1999) Segmentation using eigenvectors: a unifying view. In: ICCV, pp 975–982
Xie YL, Hopke PK, Paatero P (1999) Positive matrix factorization applied to a curve resolution problem. J Chemometr 12(6): 357–364
Yu K, Tresp V (2005) Learning to learn and collaborative filtering. In: NIPS workshop on inductive transfer: 10 years later
Zhou D, Huang J, Schölkopf B (2005) Learning from labeled and unlabeled data on a directed graph. In: Proceedings of the 22nd international conference on machine learning, pp 1036–1043
Zhang H, Giles CL, Foley HC, Yen J (2007) Probabilistic community discovery using hierarchical latent gaussian mixture model. In: AAAI, pp 663–668
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Eamonn Keogh.
Rights and permissions
About this article
Cite this article
Wang, F., Li, T., Wang, X. et al. Community discovery using nonnegative matrix factorization. Data Min Knowl Disc 22, 493–521 (2011). https://doi.org/10.1007/s10618-010-0181-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-010-0181-y