Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2925426.2926254acmconferencesArticle/Chapter ViewAbstractPublication PagesicsConference Proceedingsconference-collections
research-article

Graph Prefetching Using Data Structure Knowledge

Published: 01 June 2016 Publication History

Abstract

Searches on large graphs are heavily memory latency bound, as a result of many high latency DRAM accesses. Due to the highly irregular nature of the access patterns involved, caches and prefetchers, both hardware and software, perform poorly on graph workloads. This leads to CPU stalling for the majority of the time. However, in many cases the data access pattern is well defined and predictable in advance, many falling into a small set of simple patterns. Although existing implicit prefetchers cannot bring significant benefit, a prefetcher armed with knowledge of the data structures and access patterns could accurately anticipate applications' traversals to bring in the appropriate data.
This paper presents a design of an explicitly configured prefetcher to improve performance for breadth-first searches and sequential iteration on the efficient and commonly-used compressed sparse row graph format. By snooping L1 cache accesses from the core and reacting to data returned from its own prefetches, the prefetcher can schedule timely loads of data in advance of the application needing it. For a range of applications and graph sizes, our prefetcher achieves average speedups of 2.3x, and up to 3.3x, with little impact on memory bandwidth requirements.

References

[1]
D. Ajwani, U. Dementiev, R. Meyer, and V. Osipov. Breadth first search on massive graphs. In 9th DIMACS Implementation Challenge Workshop: Shortest Paths, 2006.
[2]
Hassan Al-Sukhni, Ian Bratt, and Daniel A. Connors. Compiler-directed content-aware prefetching for dynamic data structures. In PACT, 2003.
[3]
Jiří Barnat and Ivana Černá. Distributed breadth-first search ltl model checking. Form. Methods Syst. Des., 29(2), September 2006.
[4]
Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. The gem5 simulator. SIGARCH Comput. Archit. News, 39(2), August 2011.
[5]
Ulrik Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25:163--177, 2001.
[6]
Robert Cooksey, Stephan Jourdan, and Dirk Grunwald. A stateless, content-directed data prefetching mechanism. In ASPLOS, 2002.
[7]
Ben Coppin. Artificial Intelligence Illuminated. Jones and Bartlett Publishers, 2004.
[8]
P. Demosthenous, N. Nicolaou, and J. Georgiou. A hardware-efficient lowpass filter design for biomedical applications. In BioCAS, 2010.
[9]
Jonathan Eastep. Evolve: a preliminary multicore architecture for introspective computing. Master's thesis, MIT, 2007.
[10]
E. Ebrahimi, O. Mutlu, and Y.N. Patt. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. In HPCA, 2009.
[11]
Jack Edmonds and Richard M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. J. ACM, 19(2), April 1972.
[12]
Babak Falsafi and Thomas F. Wenisch. A primer on hardware prefetching. Synthesis Lectures on Computer Architecture, 9(1), 2014.
[13]
Adi Fuchs, Shie Mannor, Uri Weiser, and Yoav Etsion. Loop-aware memory prefetching using code block working sets. In MICRO, 2014.
[14]
Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. Green-marl: A dsl for easy and efficient graph analysis. In ASPLOS, 2012.
[15]
Doug Joseph and Dirk Grunwald. Prefetching using markov predictors. In ISCA, 1997.
[16]
Dongkeun Kim and Donald Yeung. Design and evaluation of compiler algorithms for pre-execution. SIGPLAN Not., 37(10), October 2002.
[17]
Onur Kocberber, Babak Falsafi, Kevin Lim, Parthasarathy Ranganathan, and Stavros Harizopoulos. Dark silicon accelerators for database indexing. In 1st Dark Silicon Workshop (DaSi), 2012.
[18]
Onur Kocberber, Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, and Parthasarathy Ranganathan. Meet the walkers: Accelerating index traversals for in-memory databases. In MICRO, 2013.
[19]
Snehasish Kumar, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, and Jordon Phillips. SQRL: Hardware accelerator for collecting software data structures. In PACT, 2014.
[20]
M. Kurant, A. Markopoulou, and P. Thiran. On the bias of BFS (breadth first search). In ITC, 2010.
[21]
Shih-Chang Lai and Shih-Lien Lu. Hardware-based pointer data prefetcher. In ICCD, 2003.
[22]
N.B. Lakshminarayana and Hyesoon Kim. Spare register aware prefetching for graph algorithms on gpus. In HPCA, 2014.
[23]
Eric Lau, Jason E. Miller, Inseok Choi, Donald Yeung, Saman Amarasinghe, and Anant Agarwal. Multicore performance optimization using partner cores. In HotPar, 2011.
[24]
Charles E. Leiserson and Tao B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In SPAA, 2010.
[25]
Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
[26]
Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. In VLDB, 2012.
[27]
Andrew Lumsdaine, Douglas Gregor, Bruce Hendrickson, and Jonathan Berry. Challenges in parallel graph processing. Parallel Processing Letters, 17(01):5--20, 2007.
[28]
V. Malhotra and C. Kozyrakis. Library-based prefetching for pointer-intensive applications. Technical report, Computer Systems Laboratory, Stanford University, 2006.
[29]
C. McNairy and D. Soltis. Itanium 2 processor microarchitecture. In MICRO, 2003.
[30]
Frank McSherry, Michael Isard, and Derek G Murray. Scalability! but at what cost? In HotOS XV, May 2015.
[31]
Todd C. Mowry, Monica S. Lam, and Anoop Gupta. Design and evaluation of a compiler algorithm for prefetching. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS V, 1992.
[32]
Richard C. Murphy, Kyle B. Wheeler, Brian W. Barrett, and James A. Ang. Introducing the graph 500. Cray User's Group (CUG), May 5, 2010.
[33]
Marc Najork and Janet L. Wiener. Breadth-first crawling yields high-quality pages. In WWW, 2001.
[34]
Karthik Nilakant, Valentin Dalibard, Amitabha Roy, and Eiko Yoneki. Prefedge: Ssd prefetcher for large-scale graph traversal. In SYSTOR, 2014.
[35]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, 1999.
[36]
Leeor Peled, ShieMannor, UriWeiser, and Yoav Etsion. Semantic locality and context-based prefetching using reinforcement learning. In ISCA, 2015.
[37]
Dana Ron. Algorithmic and analysis techniques in property testing. Foundations and Trends in Theoretical Computer Science, 5(2):73--205, 2009.
[38]
Amir Roth, Andreas Moshovos, and Gurindar S. Sohi. Dependence based prefetching for linked data structures. In ASPLOS, 1998.
[39]
Nadathur Satish, Changkyu Kim, Jatin Chhugani, and Pradeep Dubey. Large-scale energy-efficient graph traversal: A path to efficient data-intensive supercomputing. In SC, 2012.
[40]
V. Shkapenyuk and Torsten Suel. Design and implementation of a high-performance distributed web crawler. In ICDE, 2002.
[41]
Jeremy Siek, Lie-Quan Lee, and Andrew Lumsdaine. The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley Longman Publishing Co., Inc., 2002.
[42]
Vish Viswanathan. Disclosure of h/w prefetcher control on some intel processors. https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors, September 2014.
[43]
Chia-Lin Yang and Alvin R. Lebeck. Push vs. pull: Data movement for linked data structures. In ICS, 2000.
[44]
Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, and Srinivas Devadas. IMP: Indirect memory prefetcher. In MICRO, 2015.

Cited By

View all
  • (2024)Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program TracesACM Transactions on Architecture and Code Optimization10.1145/365011021:2(1-23)Online publication date: 21-May-2024
  • (2024)Limoncello: Prefetchers for ScaleProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651373(577-590)Online publication date: 27-Apr-2024
  • (2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ICS '16: Proceedings of the 2016 International Conference on Supercomputing
June 2016
547 pages
ISBN:9781450343619
DOI:10.1145/2925426
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Graphs
  2. Prefetching

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

ICS '16
Sponsor:

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)139
  • Downloads (Last 6 weeks)10
Reflects downloads up to 09 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program TracesACM Transactions on Architecture and Code Optimization10.1145/365011021:2(1-23)Online publication date: 21-May-2024
  • (2024)Limoncello: Prefetchers for ScaleProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651373(577-590)Online publication date: 27-Apr-2024
  • (2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: Apr-2024
  • (2024)Application-Level Checkpoint/Restart for Large-Scale Attack and Compliance GraphsSoutheastCon 202410.1109/SoutheastCon52093.2024.10500065(1450-1455)Online publication date: 15-Mar-2024
  • (2024)Scalar Vector Runahead2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00101(1367-1381)Online publication date: 2-Nov-2024
  • (2024)Leviathan: A Unified System for General-Purpose Near-Data Computing2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00095(1278-1294)Online publication date: 2-Nov-2024
  • (2024)Terminus: A Programmable Accelerator for Read and Update Operations on Sparse Data Structures2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00092(1233-1246)Online publication date: 2-Nov-2024
  • (2024)Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00090(1202-1216)Online publication date: 29-Jun-2024
  • (2024)Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00017(88-102)Online publication date: 29-Jun-2024
  • (2024)Practically Tackling Memory Bottlenecks of Graph-Processing Workloads2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00096(1034-1045)Online publication date: 27-May-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media