Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3559009.3569647acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article
Public Access

HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures

Published: 27 January 2023 Publication History

Abstract

Influence maximization aims to select k most-influential vertices or seeds in a network, where influence is defined by a given diffusion process. Although computing optimal seed set is NP-Hard, efficient approximation algorithms exist. However, even state-of-the-art parallel implementations are limited by a sampling step that incurs large memory footprints. This in turn limits the problem size reach and approximation quality. In this work, we study the memory footprint of the sampling process collecting reverse reachability information in the IMM (Influence Maximization via Martingales) algorithm over large real-world social networks. We present a memory-efficient optimization approach (called HBMax) based on Ripples, a state-of-the-art multi-threaded parallel influence maximization solution. Our approach, HBMax, uses a portion of the reverse reachable (RR) sets collected by the algorithm to learn the characteristics of the graph. Then, it compresses the intermediate reverse reachability information with Huffman coding or bitmap coding, and queries on the partially decoded data, or directly on the compressed data to preserve the memory savings obtained through compression. Considering a NUMA architecture, we scale up our solution on 64 CPU cores and reduce the memory footprint by up to 82.1% with average 6.3% speedup (encoding overhead is offset by performance gain from memory reduction) without loss of accuracy. For the largest tested graph Twitter7 (with 1.4 billion edges), HBMax achieves 5.9× compression ratio and 2.2× speedup.

References

[1]
Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 44--54.
[2]
Maciej Besta, Simon Weber, Lukas Gianinazzi, Robert Gerstenberger, Andrey Ivanov, Yishai Oltchik, and Torsten Hoefler. 2019. Slim graph: Practical lossy graph compression for approximate graph processing, storage, and analytics. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--25.
[3]
Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna. 2004. UbiCrawler: A Scalable Fully Distributed Web Crawler. Software: Practice & Experience 34, 8 (2004), 711--726.
[4]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks. In Proceedings of the 20th international conference on World Wide Web, Sadagopan Srinivasan, Krithi Ramamritham, Arun Kumar, M. P. Ravindra, Elisa Bertino, and Ravi Kumar (Eds.). ACM Press, 587--596.
[5]
Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compression Techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, 595--601.
[6]
Christian Borgs, Michael Brautbar, Jennifer Chayes, and Brendan Lucier. 2014. Maximizing social influence in nearly optimal time. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms. SIAM, 946--957.
[7]
Bridge-2 system. 2020. https://www.psc.edu/resources/bridges-2/. (Accessed on 12/22/2021).
[8]
CAIDA UCSD. 2008. The CAIDA UCSD Macroscopic Skitter Topology Dataset. https://www.caida.org/catalog/software/skitter [Online; accessed 24-January-2022].
[9]
Timothy A Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS) 38, 1 (2011), 1--25.
[10]
Domenico Ficara, Andrea Di Pietro, Stefano Giordano, Gregorio Procissi, and Fabio Vitucci. 2010. Enhancing counting Bloom filters through Huffman-coded multilayer structures. IEEE/ACM Transactions On Networking 18, 6 (2010), 1977--1987.
[11]
James D Foley, Foley Dan Van, Andries Van Dam, Steven K Feiner, John F Hughes, and J Hughes. 1996. Computer graphics: principles and practice. Vol. 12110. Addison-Wesley Professional.
[12]
Sayan Ghosh, Nathan R Tallent, Marco Minutoli, Mahantesh Halappanavar, Ramesh Peri, and Ananth Kalyanaraman. 2021. Single-node partitioned-memory for huge graph analytics: cost and performance trade-offs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--14.
[13]
David A Huffman. 1952. A method for the construction of minimum-redundancy codes. Proceedings of the IRE 40, 9 (1952), 1098--1101.
[14]
David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. 137--146.
[15]
Christoph Lameter. 2013. An overview of non-uniform memory access. Commun. ACM 56, 9 (2013), 59--54.
[16]
Jure Leskovec, Lada A Adamic, and Bernardo A Huberman. 2007. The dynamics of viral marketing. ACM Transactions on the Web (TWEB) 1, 1 (2007), 5--es.
[17]
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
[18]
Jiesong Liu, Feng Zhang, Hourun Li, Dalin Wang, Weitao Wan, Xiaokun Fang, Jidong Zhai, and Xiaoyong Du. 2022. Exploring Query Processing on CPU-GPU Integrated Edge Device. IEEE Transactions on Parallel and Distributed Systems (2022).
[19]
Peter Macko, Virendra J Marathe, Daniel W Margo, and Margo I Seltzer. 2015. Llama: Efficient graph analytics using large multiversioned arrays. In 2015 IEEE 31st International Conference on Data Engineering. IEEE, 363--374.
[20]
Norbert Martínez-Bazan, M Ángel Águila-Lorente, Victor Muntés-Mulero, David Dominguez-Sal, Sergio Gómez-Villamor, and Josep-L Larriba-Pey. 2012. Efficient graph management based on bitmap indices. In Proceedings of the 16th International Database Engineering & Applications Sysmposium. 110--119.
[21]
Marco Minutoli, Maurizio Drocco, Mahantesh Halappanavar, Antonino Tumeo, and Ananth Kalyanaraman. 2020. cuRipples: Influence maximization on multi-GPU systems. In Proceedings of the 34th ACM International Conference on Super-computing. 1--11.
[22]
Marco Minutoli, Mahantesh Halappanavar, Ananth Kalyanaraman, Arun Sathanur, Ryan Mcclure, and Jason McDermott. 2019. Fast and scalable implementations of influence maximization algorithms. In 2019 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 1--12.
[23]
Marco Minutoli, Prathyush Sambaturu, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyananaraman, and Anil Vullikanti. 2020. Preempt: scalable epidemic interventions using submodular optimization on multi-GPU systems. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--15.
[24]
Alan Mislove, Massimiliano Marcon, Krishna P Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement. 29--42.
[25]
Mohammad Hasanzadeh Mofrad, Rami Melhem, Yousuf Ahmad, and Mohammad Hammoud. 2019. Efficient distributed graph analytics using triply compressed sparse format. In 2019 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 1--11.
[26]
George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. 1978. An analysis of approximations for maximizing submodular set functions---I. Mathematical programming 14, 1 (1978), 265--294.
[27]
Zaifeng Pan, Feng Zhang, Yanliang Zhou, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du. 2021. Exploring data analytics without decompression on embedded GPU systems. IEEE Transactions on Parallel and Distributed Systems 33, 7 (2021), 1553--1568.
[28]
Keith H Randall, Raymie Stata, Rajiv G Wickremesinghe, and Janet L Wiener. 2002. The link database: Fast access to graphs of the web. In Proceedings DCC 2002. Data Compression Conference. IEEE, 122--131.
[29]
Ripples. 2022. https://github.com/pnnl/ripples. (Accessed on 4/20/2022).
[30]
Mark A Roth and Scott J Van Horn. 1993. Database compression. ACM Sigmod Record 22, 3 (1993), 31--39.
[31]
Julian Shun and Guy E Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming. 135--146.
[32]
Janne Suontausta and Jilei Tian. 2003. Low memory decision tree method for text-to-phoneme mapping. In 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No. 03EX721). IEEE, 135--140.
[33]
Lubos Takac and Michal Zabovsky. 2012. Data analysis in public social networks. In International scientific conference and international workshop present day trends of innovations, Vol. 1.
[34]
Youze Tang, Yanchen Shi, and Xiaokui Xiao. 2015. Influence maximization in near-linear time: A martingale approach. In Proceedings of the 2015 ACM SIGMOD international conference on management of data. 1539--1554.
[35]
Vijay V Vazirani. 2001. Approximation algorithms. Vol. 1. Springer.
[36]
Jianguo Wang, Chunbin Lin, Yannis Papakonstantinou, and Steven Swanson. 2017. An experimental study of bitmap compression vs. inverted list compression. In Proceedings of the 2017 ACM International Conference on Management of Data. 993--1008.
[37]
William Webber, Alistair Moffat, and Justin Zobel. 2010. A similarity measure for indefinite rankings. ACM Transactions on Information Systems (TOIS) 28, 4 (2010), 1--38.
[38]
Jaewon Yang and Jure Leskovec. 2011. Patterns of temporal variation in online media. In Proceedings of the fourth ACM international conference on Web search and data mining. 177--186.
[39]
Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42, 1 (2015), 181--213.
[40]
Feng Zhang, Zaifeng Pan, Yanliang Zhou, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du. 2021. G-TADOC: Enabling efficient GPU-based text analytics without decompression. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 1679--1690.
[41]
Feng Zhang, Weitao Wan, Chenyang Zhang, Jidong Zhai, Yunpeng Chai, Haixiang Li, and Xiaoyong Du. 2022. CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases. In Proceedings of the 2022 International Conference on Management of Data. 1655--1669.
[42]
Feng Zhang, Jidong Zhai, Xipeng Shen, Dalin Wang, Zheng Chen, Onur Mutlu, Wenguang Chen, and Xiaoyong Du. 2021. TADOC: Text analytics directly on compression. The VLDB Journal 30, 2 (2021), 163--188.

Cited By

View all
  • (2024)Data-Aware Adaptive Compression for Stream ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337771036:9(4531-4549)Online publication date: Sep-2024
  • (2024)RIMR: Reverse Influence Maximization Rank2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00146(805-814)Online publication date: 27-May-2024
  • (2023)CompressStreamDB: Fine-Grained Adaptive Stream Processing without Decompression2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00038(408-422)Online publication date: Apr-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques
October 2022
569 pages
ISBN:9781450398688
DOI:10.1145/3559009
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

In-Cooperation

  • IFIP WG 10.3: IFIP WG 10.3
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 January 2023

Check for updates

Badges

Author Tags

  1. compression
  2. influence maximization
  3. multicore architectures

Qualifiers

  • Research-article

Funding Sources

Conference

PACT '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)128
  • Downloads (Last 6 weeks)7
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Data-Aware Adaptive Compression for Stream ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337771036:9(4531-4549)Online publication date: Sep-2024
  • (2024)RIMR: Reverse Influence Maximization Rank2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00146(805-814)Online publication date: 27-May-2024
  • (2023)CompressStreamDB: Fine-Grained Adaptive Stream Processing without Decompression2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00038(408-422)Online publication date: Apr-2023

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media