Abstract
Discovery of frequent subgraphs of a network is a challenging and time-consuming process. Several heuristics and improvements have been proposed before. However, when the size of subgraphs or the size of network is big, the process cannot be done in feasible time on a single machine. One of the promising solutions is using the processing power of available parallel and distributed systems. In this paper, we present a distributed solution for discovery of frequent subgraphs of a network using the MapReduce framework. The solution is named MRSUB and is developed to run over the Hadoop framework. MRSUB uses a novel and load-balanced parallel subgraph enumeration algorithm and fits it into the MapReduce framework. Also, a fast subgraph isomorphism detection heuristic is used which accelerates the whole process further. We executed MRSUB on a private cloud infrastructure with 40 machines and performed several experiments with different networks. Experimental results show that MRSUB scales well and offers an effective solution for discovery of frequent subgraphs of networks which are not possible on a single machine in feasible time.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Milo R, Shen-Orr S, Itzkovitz S et al (2002) Network motifs: simple building blocks of complex networks. Science 298:824–827
Wernicke S (2006) Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform 3:347–359
Wernicke S, Rasche F (2006) FANMOD: a tool for fast network motif detection. Bioinformatics 22:1152–1153
Kashani ZRM, Ahrabian H, Elahi E et al (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinform 10:318
Ribeiro P, Silva F, Lopes L (2012) Parallel discovery of network motifs. J Parallel Distrib Comput 72:144–154
Grochow J, Kellis M (2007) Network motif discovery using subgraph enumeration and symmetry-breaking. In: Speed T, Huang H (eds) Research in computational molecular biology. Springer, Berlin, pp 92–106
Rudi AG, Shahrivari S, Jalili S, Kashani ZRM (2013) RANGI: a fast list-colored graph motif finding algorithm. IEEE/ACM Trans Comput Biol Bioinform 10:504–513
Masoudi-Nejad A, Schreiber F, Kashani ZRM (2012) Building blocks of biological networks: a review on major network motif discovery algorithms. Syst Biol 6:164–174
Wong E, Baur B, Quader S, Huang C-H (2012) Biological network motif detection: principles and practice. Brief Bioinform 13:202–215
Kang U, Tsourakakis CE, Faloutsos C (2011) PEGASUS: mining peta-scale graphs. Knowl Inf Syst 27:303–325
Inokuchi A, Washio T, Motoda H (2003) Complete mining of frequent patterns from graphs: mining graph data. Mach Learn 50:321–354
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceedings of IEEE international conference on data mining, pp 313–320
Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of IEEE international conference on data mining, pp 721–724
Schreiber F, Schwbbermeyer H (2004) Towards motif detection in networks: frequency concepts and flexible search. In: Proceedings of the international workshop on network tools and applications in biology. NETTAB, pp 91–102
Kuramochi M, Karypis G (2004) GREW—a scalable frequent subgraph discovery algorithm. In: Proceedings of fourth IEEE international conference on data mining ICDM ’04, pp 439–442
Holder LB, Cook DJ, Djoko S (1994) Substructure discovery in the subdue system. In: Proceedings of the workshop on knowledge discovery in databases, pp 169–180
Afrati FN, Fotakis D, Ullman JD (2013) Enumerating subgraph instances using map-reduce. In: Proceedings of IEEE 29th international conference on data engineering (ICDE), pp 62–73
Cohen J (2009) Graph twiddling in a MapReduce world. Comput Sci Eng 11:29–41
Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: Proceedings of the 20th international conference on world wide web. ACM, NY, pp 607–614
Kolda TG, Pinar A, Plantenga T et al (2013) Counting triangles in massive graphs with MapReduce. arXiv:1301.5887
Pagh R, Tsourakakis CE (2012) Colorful triangle counting and a mapreduce implementation. Inf Process Lett 112:277–281
Zhao Z, Wang G, Butt AR et al (2012) Sahad: subgraph analysis in massive networks using hadoop. In: Proceedings of IEEE international parallel & distributed processing symposium (IPDPS). IEEE, pp 390–401
Kang U, Tsourakakis CE, Appel AP, Faloutsos C, Leskovec J (2011) Hadi: mining radii of large graphs. ACM Trans Knowl Discov Data 5(2):1–24
Johnson DS (2005) The NP-completeness column. ACM Trans Algorithms 1:160–176
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:1–13
Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM 53:72–77
White T (2012) Hadoop: the definitive guide. Yahoo Press, California
McKay BD (1981) Practical graph isomorphism. Department of Computer Science, Vanderbilt University, Tennessee, US
Junttila T, Kaski P (2007) Engineering an efficient canonical labeling tool for large and sparse graphs. In: Applegate D, Brodal GS, Panario D, Sedgewick R (eds) Proceedings of the ninth workshop on algorithm engineering and experiments and the fourth workshop on analytic algorithms and combinatorics. SIAM, pp 135–149
West DB (2001) Introduction to graph theory. Prentice Hall, Englewood Cliffs
Kashtan N, Itzkovitz S, Milo R, Alon U (2004) Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20:1746–1758
Srihari S, Leong HW (2013) A survey of computational methods for protein complex prediction from protein interaction networks. J Bioinform Comput Biol 11:12–30
Pablo MG, Danon L (2003) Community structure in jazz. Adv Complex Syst 6:565–573
Stehlé J, Voirin N, Barrat A et al (2011) High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6:e23176
Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on World wide web, pp 641–650
Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. In: Proceedings of advances in neural information processing systems (NIPS), pp 539–547
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):1–41
Albert R, Jeong H, Barabási A-L (1999) Internet: diameter of the world-wide web. Nature 401:130–131
Stoica I, Morris R, Karger D et al (2001) Chord: a scalable peer-to-peer lookup service for internet applications. SIGCOMM Comput Commun Rev 31:149–160
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Shahrivari, S., Jalili, S. Distributed discovery of frequent subgraphs of a network using MapReduce. Computing 97, 1101–1120 (2015). https://doi.org/10.1007/s00607-015-0446-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00607-015-0446-9