Hardware support for fine-grained event-driven computation in Anton 2

Published: 16 March 2013

Abstract

Exploiting parallelism to accelerate a computation typically involves dividing it into many small tasks that can be assigned to different processing elements. An efficient execution schedule for these tasks can be difficult or impossible to determine in advance, however, if there is uncertainty as to when each task's input data will be available. Ideally, each task would run in direct response to the arrival of its input data, thus allowing the computation to proceed in a fine-grained event-driven manner. Realizing this ideal is difficult in practice, and typically requires sacrificing flexibility for performance.
In Anton 2, a massively parallel special-purpose supercomputer for molecular dynamics simulations, we addressed this challenge by including a hardware block, called the dispatch unit, that provides flexible and efficient support for fine-grained event-driven computation. Its novel features include a many-to-many mapping from input data to a set of synchronization counters, and the ability to prioritize tasks based on their type. To solve the additional problem of using a fixed set of synchronization counters to track input data for a potentially large number of tasks, we created a software library that allows programmers to treat Anton 2 as an idealized machine with infinitely many synchronization counters. The dispatch unit, together with this library, made it possible to simplify our molecular dynamics software by expressing it as a collection of independent tasks, and the resulting fine-grained execution schedule improved overall performance by up to 16% relative to a coarse-grained schedule for precisely the same computation.
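
The counter-based mechanism described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Anton 2 hardware interface or the authors' library: the class `DispatchModel`, its methods, the data tags, and the task names are all hypothetical. It shows the two features the abstract highlights: one arriving datum may advance several synchronization counters while each counter waits on several data (a many-to-many mapping), and ready tasks are served by priority rather than by arrival order.

```python
import heapq
from collections import defaultdict


class DispatchModel:
    """Toy software model of counter-based, event-driven task dispatch.

    All names here are hypothetical; this is not the Anton 2 interface.
    """

    def __init__(self):
        self.remaining = {}                    # counter id -> arrivals still expected
        self.task_for = {}                     # counter id -> (priority, task name)
        self.counters_for = defaultdict(list)  # data tag -> counters it advances
        self.ready = []                        # min-heap of (priority, seq, task name)
        self.seq = 0                           # tie-breaker: FIFO within one priority

    def add_task(self, counter, name, priority, inputs):
        """Register a task that becomes ready once every tag in `inputs` arrives.

        `inputs` must be non-empty in this sketch.
        """
        self.remaining[counter] = len(inputs)
        self.task_for[counter] = (priority, name)
        for tag in inputs:
            self.counters_for[tag].append(counter)

    def arrive(self, tag):
        """Record the arrival of one datum; it may advance several counters."""
        for counter in self.counters_for[tag]:
            self.remaining[counter] -= 1
            if self.remaining[counter] == 0:
                priority, name = self.task_for[counter]
                heapq.heappush(self.ready, (priority, self.seq, name))
                self.seq += 1

    def next_task(self):
        """Return the ready task with the lowest priority number, or None."""
        return heapq.heappop(self.ready)[2] if self.ready else None


# Usage: "pos_b" feeds both counters, and counter 0 waits on two data.
d = DispatchModel()
d.add_task(0, "bonded_forces", priority=1, inputs=["pos_a", "pos_b"])
d.add_task(1, "send_positions", priority=0, inputs=["pos_b"])
d.arrive("pos_a")  # counter 0 still waits for pos_b
d.arrive("pos_b")  # both counters reach zero in the same step
order = [d.next_task(), d.next_task()]
print(order)
```

In this model, both tasks become ready on the same arrival, and the one with the lower priority number is dispatched first even though its counter reached zero second. The model has as many counters as tasks, so it does not address the fixed-counter limitation that the abstract says the software library works around.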

      Published In

      ACM SIGPLAN Notices, Volume 48, Issue 4
      ASPLOS '13
      April 2013
      540 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2499368
        ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
        March 2013
        574 pages
        ISBN:9781450318709
        DOI:10.1145/2451116
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. anton 2
      2. dispatch unit
      3. event-driven
      4. parallel
      5. task scheduling

      Qualifiers

      • Research-article
