DOI: 10.1145/3470496.3527387
research-article

A scalable architecture for reprioritizing ordered parallelism

Published: 11 June 2022

Abstract

Many algorithms schedule their work, or tasks, according to a priority order for correctness or faster convergence. While priority schedulers commonly implement task enqueue and dequeueMin operations, some algorithms need a priority update operation that alters the scheduling metadata for a task. Prior software and hardware systems that support scheduling with priority updates compromise on either parallelism, work-efficiency, or both, leading to missed performance opportunities. Moreover, incorrectly navigating these compromises violates correctness in those algorithms that are not resilient to relaxing priority order.
We present Hive, a task-based execution model and multicore architecture that extracts abundant fine-grain parallelism from algorithms with priority updates, while retaining their strict priority schedules. Like prior hardware systems for ordered parallelism, Hive uses data- and control-dependence speculation and a large speculative window to execute tasks in parallel and out of order. Hive improves on prior work by (i) directly supporting updates in the interface, (ii) identifying the novel scheduler-carried dependence, and (iii) speculating on such dependences with task versioning, distinct from data versioning. Hive enables safe speculative updates to the schedule and avoids spurious conflicts among tasks to better utilize speculation tracking resources and efficiently uncover more parallelism. Across a suite of nine benchmarks, Hive improves performance at 256 cores by up to 2.8× over the next best hardware solution, and even more over software-only parallel schedulers.
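
To make the three scheduler operations named in the abstract concrete (task enqueue, dequeueMin, and priority update), here is a minimal sequential sketch in Python. It is not Hive's mechanism: the class name `UpdatableScheduler`, its method names, and the lazy-invalidation approach are all assumptions chosen for illustration, and the version counter is an ordinary software trick for discarding stale heap entries, not the hardware task versioning the abstract describes.

```python
import heapq
import itertools


class UpdatableScheduler:
    """Sequential priority scheduler with enqueue, dequeue_min, and update.

    Hypothetical illustration only: updates are handled by pushing a fresh
    heap entry and lazily discarding the stale one on dequeue.
    """

    def __init__(self):
        self._heap = []                    # entries: (priority, version, task)
        self._live = {}                    # task -> version of its valid entry
        self._versions = itertools.count() # globally unique, increasing versions

    def enqueue(self, task, priority):
        # A unique version also breaks ties, so tasks are never compared.
        version = next(self._versions)
        self._live[task] = version
        heapq.heappush(self._heap, (priority, version, task))

    def update(self, task, priority):
        # Re-prioritize a pending task; the old entry becomes stale.
        if task not in self._live:
            raise KeyError(task)
        self.enqueue(task, priority)

    def dequeue_min(self):
        # Pop until we find an entry whose version is still the live one.
        while self._heap:
            priority, version, task = heapq.heappop(self._heap)
            if self._live.get(task) == version:
                del self._live[task]
                return task, priority
        raise IndexError("dequeue_min from an empty scheduler")

    def __len__(self):
        return len(self._live)


if __name__ == "__main__":
    sched = UpdatableScheduler()
    sched.enqueue("a", 5)
    sched.enqueue("b", 3)
    sched.update("a", 1)        # "a" now precedes "b"
    print(sched.dequeue_min())  # ('a', 1)
    print(sched.dequeue_min())  # ('b', 3)
```

The sketch preserves strict priority order but is entirely serial; the abstract's point is that extracting parallelism while keeping this order, and while supporting `update`, is the hard part.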

Cited By

  • (2024) MuchiSim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 48-60. DOI: 10.1109/ISPASS61541.2024.00015. Online publication date: 5-May-2024

    Published In

    ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture
    June 2022
    1097 pages
    ISBN:9781450386104
    DOI:10.1145/3470496
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    In-Cooperation

    • IEEE CS TCCA: IEEE Computer Society Technical Committee on Computer Architecture

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ordered irregular parallelism
    2. priority scheduling
    3. priority updates
    4. speculative execution
    5. task-level parallelism

    Qualifiers

    • Research-article

    Conference

    ISCA '22
    Acceptance Rates

    ISCA '22 Paper Acceptance Rate 67 of 400 submissions, 17%;
    Overall Acceptance Rate 543 of 3,203 submissions, 17%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 143
    • Downloads (Last 6 weeks): 18
    Reflects downloads up to 21 Nov 2024
