Abstract
Network motif discovery is the problem of finding subgraphs of a network that occur more frequently than expected, according to some reasonable null hypothesis. Such subgraphs may indicate small scale interaction features in genomic interaction networks or intriguing relationships involving actors or a relationship among airlines. When nodes are labeled, they can carry information such as the genomic entity under study or the dominant genre of an actor. For that reason, labeled subgraphs convey information beyond structure and could therefore enjoy more applications. To identify statistically significant motifs in a given network, we propose an analytical method (i.e. simulation-free) that extends the works of Picard et al. (J Comput Biol 15(1):1–20, 2008) and Schbath et al. (J Bioinform Syst Biol 2009(1):616234, 2009) to label-dependent scale-free graph models. We provide an analytical expression of the mean and variance of the count under the Expected Degree Distribution random graph model. Our model deals with both induced and non-induced motifs. We have tested our methodology on a wide set of graphs ranging from protein–protein interaction networks to movie networks. The analytical model is a fast (usually faster by orders of magnitude) alternative to simulation. This advantage increases as graphs grow in size.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Adamic LA, Glance N (2005) The political blogosphere and the 2004 U.S. election: divided they blog. In: Proceedings of the 3rd international workshop on link discovery, ACM, New York, pp 36–43
Ahmed NK, Neville J, Rossi RA, Duffield NG, Willke TL (2017) Graphlet decomposition: framework, algorithms, and applications. Knowl Inf Syst 50(3):689–722
Ashburner M, Ball CA, Blake JA (2000) Gene ontology: tool for the unification of biology. Nat Genet 25(1):25–29
Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Batagelj V, Mrvar M, Zavesnik M (2002) Network analysis of dictionaries. In: Language technologies, pp 135–142
Bindea G, Mlecnik B, Hackl H (2009) ClueGO: a cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 25(8):1091–1093
Chen J, Yuan B (2006) Detecting functional modules in the yeast protein–protein interaction network. Bioinformatics 22(18):2283–2290
Chen J, Hsu W, Lee ML, Ng S (2006) NeMoFinder: dissecting genome-wide protein–protein interactions with meso-scale network motifs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 106–115
Chung F, Lu L (2002) The average distances in random graphs with given expected degrees. Proc Natl Acad Sci 99(25):15879–15882
Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183
Davis M, Liu W, Miller P, Hunter RF, Kee F (2014) Agwan: a generative model for labelled, weighted graphs. In: New frontiers in mining complex patterns: second international workshop, NFMCP 2013, pp 181–200
De Domenico M, Omodei E, Arenas A (2016) Quantifying the diaspora of knowledge in the last century. Appl Netw Sci 1:15
Durak N, Pinar A, Kolda TG, Seshadhri C (2012) Degree relations of triangles in real-world networks and graph models. In: Proceedings of the 21st ACM international conference on Information and knowledge management (CIKM’12), pp 1712–1716
Erdös P, Rényi A (1959) On random graphs. Publ Math 6:290–297
Johnson NL, Kotz S, Kemp AW (1992) Univariate discrete distributions, 2nd edn. Wiley, New York
Kim M, Leskovec J (2011) Modeling social networks with node attributes using the multiplicative attribute graph model. In: Proceedings of the twenty-seventh conference on uncertainty in artificial intelligence, pp 400–409
Knuth DE (1993) The Stanford GraphBase: a platform for combinatorial computing. ACM Press, New York
Ley M (2002) The DBLP computer science bibliography: evolution, research issues, perspectives. In: Proceedings of the international symposium on string processing and information retrieval, vol 2476, pp. 1–10
Maere S, Heymans K, Kuiper M (2005) BiNGO: a cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21(16):3448–3449
Meira LAA, Maximo VR, Fazenda AL, Conceicao AFD (2014) Acc-Motif: Accelerated Network Motif Detection. IEEE/ACM Trans Comput Biol Bioinform 11(5):853–862
Milo R, Shen-Orr S, Itzkovitz S et al (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827
Milo R, Kashtan N, Itzkovitz S (2004) On the uniform generation of random graphs with prescibed degree sequences. arXiv:cond-mat/0312028
Newman MEJ, Strogatz SH, Watts DJ (2001) Random graphs with arbitrary degree distributions and their applications. Phys Rev E 64:026118
Nowicki K, Snijders T (2001) Estimation and prediction for stochastic block structures. J Am Stat Assoc 96:1077–1087
Opsahl T (2011) Why anchorage is not (that) important: binary ties and sample selection. http://toreopsahl.com/2011/08/12
Park J, Newman M (2003) The origin of degree correlations in the internet and other networks. Phys Rev E 68:026112
Park J, Newman MEJ (2004) Statistical mechanics of networks. Phys Rev E 70(6):066117
Pfeiffer III JJ, Moreno S, La Fond T, Neville J, Gallagher B (2014) Attributed graph models: modeling network structure with correlated attributes. In: Proceedings of the 23rd international conference on world wide web, pp 831–842
Picard F, Daudin JJ, Koskas M (2008) Assessing the exceptionality of network motifs. J Comput Biol 15(1):1–20
Prasad TSK, Goel R, Kandasamy K, Keerthikumar S (2009) Human protein reference database—2009 update. Nucleic Acids Res 37(1):D767–D772
Prill R, Iglesias PA, Levchenko A (2005) Dynamic properties of network motifs contribute to biological network organization. PLoS Biol 3(11):e343
Ribeiro P, Silva F (2014) G-Tries: a data structure for storing and finding subgraphs. Data Min Knowl Discov 28(2):337–377
Ruepp A, Zollner A, Maier D, Albermann K, Hani J, Mokrejs M, Tetko I, Gldener U, Mannhaupt G, Mnsterktter M, Mewes HW (2004) The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res 32(18):5539–5545
Schbath S, Lacroix V, Sagot MF (2009) Assessing the exceptionality of coloured motifs in networks. J Bioinform Syst Biol 2009(1):616234
Seshadhri C, Kolda TG, Pinar A (2012) Community structure and scale-free collections of Erdos–Renyi graphs. Phys Rev E 85(5):056109
Shen-Orr SS, Milo R, Mangan S (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31:64–68
Sinha A, Shen Z, Song Y, Ma H, Eide D, Hsu B, Wang K (2015) An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th international conference on world wide web (WWW 15 Companion), pp 243–246
Squartini T, Garlaschelli D (2011) Analytical maximum-likelihood method to detect patterns in real networks. New J Phys 13(8):083001
Varshney LR, Chen BL, Paniagua E (2011) Structural properties of the Caenorhabditis elegans neuronal network. PLoS Comput Biol 7(2):e1001066
von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P (2002) Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417:399–403
Wernicke S (2006) Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform 3(4):347–359
Acknowledgements
The authors would like to thank Simone Severini for insightful discussions.
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Srinivasan Parthasarathy.
This work has been partially supported by the U.S. National Science Foundation and National Institutes of Health under Grants NSF: MCB-1158273, IOS-1339362, MCB-1412232, MCB-1355462, IOS-0922738, MCB-0929338, and NIH: 2R01GM032877-25A1. This work has been also partially supported by the Italian MIUR Projects: PRISMA—CUP E61H12000140005 and CLARA—CUP E64G14000190008.
Rights and permissions
About this article
Cite this article
Micale, G., Giugno, R., Ferro, A. et al. Fast analytical methods for finding significant labeled graph motifs. Data Min Knowl Disc 32, 504–531 (2018). https://doi.org/10.1007/s10618-017-0544-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-017-0544-8