Abstract
Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale, graph-structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, users need to be skilled at managing and tuning a distributed system in a cluster, which is a non-trivial job for ordinary users. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. Several recent works proposed non-distributed graph processing platforms as complements to distributed platforms. In fact, efficient non-distributed platforms require less hardware resource and can achieve better energy efficiency than distributed ones. GraphChi is a representative non-distributed platform that is disk-based and can process billions of edges on CPUs in a single PC. However, the design drawbacks of GraphChi on I/O and computation model have limited its parallelism and performance. In this paper, we propose a general, disk-based graph engine called gGraph to process billion-scale graphs efficiently by utilizing both CPUs and GPUs in a single PC. GGraph exploits full parallelism and full overlap of computation and I/O processing as much as possible. Experiment results show that gGraph outperforms GraphChi and PowerGraph. In addition, gGraph achieves the best energy efficiency among all evaluated platforms.
Similar content being viewed by others
References
Addario-Berry L, Kennedy WS, King AD, Li Z, Reed B (2010) Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph. Discret Appl Math 158(7):765–770
Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput 10(2):188–192
Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. In: Arnold D, Ruiter Bd (eds.) Computer graphics forum, vol. 8. Wiley Online Library, pp 3–11
Avery C (2011) Giraph: large-scale graph processing infrastruction on Hadoop. In: Proceedings of Hadoop Summit
Boldi P, Santini M, Vigna S (2008) A large time-aware web graph. In: ACM SIGIR forum, vol 42. ACM, pp 33–38
Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: SIAM international conference on data mining (SDM), vol 4. SIAM, pp 442–446
Chan A, Dehne F, Taylor R (2005) Cgmgraph/cgmlib: implementing and testing cgm graph algorithms on PC clusters and shared memory machines. Int J High Perform Comput Appl 19(1):81–97
Che S, Beckmann B, Reinhardt S, Skadron K (2013) Pannotia: understanding irregular GPGPU graph applications. In: IEEE international symposium on workload characterization (IISWC), pp 185–195
Fackbook (2015) Facebook statistics. http://www.statisticbrain.com/facebook-statistics/
Fang J, Varbanescu AL, Sips H (2011) A comprehensive performance comparison of CUDA and OpenCL. In: 2011 international conference on parallel processing (ICPP). IEEE, pp 216–225
Ferrer R, Planas J, Bellens P, Duran A, Gonzalez M, Martorell X, Badia RM, Ayguade E, Labarta J (2011) Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL. In: Cooper K, Mellor-Crummey J, Sarkar V (eds.) Languages and compilers for parallel computing. Springer, Berlin, pp 215–229
Gharaibeh A, Costa LB, Santos-Neto E, Ripeanu M (2013) On graphs, GPUs, and blind dating: a workload to processor matchmaking quest. In: 2013 IEEE 27th international symposium on parallel and distributed processing (IPDPS). IEEE, pp 851–862
Gharaibeh A, Santos-Neto E, Costa LB, Ripeanu M (2013) Efficient large-scale graph processing on hybrid CPU and GPU systems. arXiv:1312.3018
Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX symposium on operating systems design and implementation (OSDI), pp 17–30
Graph 500 Steering Committee and others: Graph 500 benchmark. http://www.graph500.org/
Gregor D, Lumsdaine A (2005) The parallel bgl: a generic library for distributed graph computations. In: Parallel Object-Oriented Scientific Computing (POOSC)
Harish P, Vineet V, Narayanan P (2009) Large graph algorithms for massively multithreaded architectures. Centre for Visual Information Technology, I. Institute of Information Technology, Hyderabad, India, Tech. Rep. IIIT/TR/2009/74
ILK Research Group, Tilburg University (2015) The size of the world wide web (the internet). http://www.worldwidewebsize.com
Isard M, Budiu M, Yu Y, Birrell A, Fetterly D (2007) Dryad: distributed data-parallel programs from sequential building blocks. ACM SIGOPS Oper Syst Rev 41(3):59–72
Kang U, Tong H, Sun J, Lin CY, Faloutsos C (2012) Gbase: an efficient analysis platform for large graphs. VLDB J 21(5):637–650
Kang U, Tsourakakis CE, Faloutsos C (2009) Pegasus: a peta-scale graph mining system implementation and observations. In: Ninth IEEE international conference on data mining (ICDM). IEEE, pp 229–238
Karypis G, Kumar V (1999) Parallel multilevel series k-way partitioning scheme for irregular graphs. SIAM Rev 41(2):278–300
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on World wide web. ACM, pp 591–600
Kyrola A, Blelloch G, Guestrin C (2012) Graphchi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX symposium on operating systems design and implementation (OSDI), vol 8, pp 31–46
Lee J, Han WS, Kasperovics R, Lee JH (2012) An in-depth comparison of subgraph isomorphism algorithms in graph databases. In: Proceedings of the 39th international conference on very large data bases. VLDB Endowment, pp 133–144
Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein JM (2012) Distributed graphlab: a framework for machine learning and data mining in the cloud. In: Proceedings of the very large data bases (VLDB) endowment, vol 5, issue 8, pp 716–727
Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM international conference on management of data (SIGMOD). ACM, pp 135–146
Maserrat H, Pei J (2010) Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM international conference on Knowledge discovery and data mining (SIGKDD). ACM, pp 533–542
Merrill D, Garland M, Grimshaw A (2012) Scalable GPU graph traversal. In: ACM SIGPLAN notices, vol 47. ACM, pp 117–128
Neo Technology, inc. (2015) Learning neo4j. http://neo4j.com/book-learning-neo4j/
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the web
Pearce R, Gokhale M, Amato NM (2010) Multithreaded asynchronous graph traversal for in-memory and semi-external memory. In: Proceedings of the 2010 ACM/IEEE international conference for high performance computing, networking, storage and analysis. IEEE Computer Society, pp 1–11
Salihoglu S, Widom J (2013) Gps: a graph processing system. In: Proceedings of the 25th international conference on scientific and statistical database management. ACM, p 22
Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 international conference on Management of data. ACM, pp 505–516
Shen J, Fang J, Sips H, Varbanescu AL (2012) Performance gaps between OpenMP and OpenCL for multi-core CPUs. In: 2012 41st international conference on parallel processing workshops (ICPPW). IEEE, pp 116–125
Shiloach Y, Vishkin U (1982) An O(log(n)) parallel connectivity algorithm. J Algorithms 3(1):57–67
Shun J, Blelloch GE (2013) Ligra: a lightweight graph processing framework for shared memory. In: Proceedings of the 18th ACM symposium on principles and practice of parallel programming (SIGPLAN). ACM, pp 135–146
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111
Vineet V, Harish P, Patidar S, Narayanan P (2009) Fast minimum spanning tree for large graphs on the GPU. In: Proceedings of the conference on high performance graphics 2009. ACM, pp 167–171
Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In: Proceedings of the ACM SIGKDD workshop on mining data semantics. ACM, p 3
Acknowledgments
The authors would like to thank anonymous reviewers for their fruitful feedback and comments that have helped them improve the quality of this work. This work is partly supported by the National Natural Science Foundation of China (Grant No. 61202026, No. 61332001 and No. 61373155) and Program of China National 1000 Young Talent Plan.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zhang, T., Zhang, J., Shu, W. et al. Efficient graph computation on hybrid CPU and GPU systems. J Supercomput 71, 1563–1586 (2015). https://doi.org/10.1007/s11227-015-1378-z
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-015-1378-z