Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3384441.3395976acmconferencesArticle/Chapter ViewAbstractPublication PagespadsConference Proceedingsconference-collections
research-article
Public Access

Demand-Driven PDES: Exploiting Locality in Simulation Models

Published: 15 June 2020 Publication History

Abstract

Traditional parallel discrete event simulation (PDES) systems treat each simulation thread in the same manner, regardless of whether a thread has events to process in its input queue or not. At the same time, many real-life simulation models exhibit significant execution locality, where only part of the model (and thus a subset of threads) are actively sending or receiving messages in a given time period. These inactive threads still continuously check their queues and participate in simulation-wide time synchronization mechanisms, such as computing Global Virtual Time (GVT). This wastes resources, ties up CPU cores with threads that offer no contribution to event processing and limits the performance and scalability of the simulation.
In this paper, we propose a new paradigm for managing PDES threads that we call Demand-Driven PDES (DD-PDES). The key idea behind DD-PDES is to identify threads that have no events to process and de-schedule them from the CPU until they receive a message requiring event processing. Furthermore, these inactive threads are also excluded from participation in the GVT computation, accelerating that process as a result. DD-PDES ensures that the CPU cycles are mostly spent on actual event processing, resulting in performance improvements. This architecture allows for significant over-subscription of threads by exceeding the number of available hardware thread contexts on the chip. We demonstrate that on a Knights Landing processor, DD-PDES significantly outperforms the traditional simulation equipped with the best currently proposed GVT algorithms.

References

[1]
K. Bahulkar, J. Wang, N. Abu-Ghazaleh, and D. Ponomarev. 2012. Partitioning on Dynamic Bahavior for Parallel Discrete Event Simulation. In 26th IEEE/ACM/SCS Workshop on Principles of Advanced and Distributed Simulations (PADS).
[2]
Peter D Barnes Jr, Christopher D Carothers, David R Jefferson, and Justin M LaPre. 2013. Warp speed: executing time warp on 1,966,080 cores. In Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. ACM, 327--336.
[3]
D. Bauer, C. Carothers, and A. Holder. 2009. Scalable Time Warp on Bluegene Supercomputer. In Proc. of the ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS).
[4]
S. CarnÃă, S. Ferracci, E. De Santis, A. Pellegrini, and F. Quaglia. 2019. Hardware- Assisted Incremental Checkpointing in Speculative Parallel Discrete Event Simulation. In 2019 Winter Simulation Conference (WSC). 2759--2770.
[5]
C. Carothers, D. Bauer, and S. Pearce. 2000. ROSS: A High-Performance, Low Memory, Modular Time Warp System. In Proc of the 11th Workshop on Parallel and Distributed Simulation (PADS).
[6]
K. M. Chandy and L. Lamport. 1985. Distributed Snapshots: Determining Global States of Distributed Systems. ACM Transactions on Computer Systems 3, 1 (Feb. 1985), 63--75.
[7]
H. Chen, Y.Yao, andW. Tang. 2015. Can MIC Find Its Place in theWorld of PDES?. In Proceedings of International Symposium on Distributed Simulation and Real Time Systems (DS-RT).
[8]
Matthew Curtis-Maury, Xiaoning Ding, Christos D. Antonopoulos, and Dimitrios S. Nikolopoulos. 2005. An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors. In Proceedings of the 2005 and 2006 International Conference on OpenMP Shared Memory Parallel Programming (Eugene, OR, USA) (IWOMPâ??05/IWOMPâ??06). Springer-Verlag, Berlin, Heidelberg, 133â??144.
[9]
Ali Eker, Barry Williams, Kenneth Chiu, and Dmitry Ponomarev. 2019. Controlled Asynchrnous GVT: Accelerating Parallel Discrete Event Simulation on Many- Core Clusters. In Proceedings of 48th International Conference on Parallel Processing (ICPP 2019). 1--10.
[10]
Ali Eker, Barry Williams, Nitesh Mishra, Dushyant Thakur, Kenneth Chiu, Dmitry Ponomarev, and Nael Abu-Ghazaleh. 2018. Performance Implications of Global Virtual Time Algorithms on a Knights Landing Processor. In 2018 IEEE/ACM 22nd International Symposium on Distributed Simulation and Real Time Applications (DS-RT). IEEE, 1--10.
[11]
R. Fujimoto. 1990. Parallel Discrete Event Simulation. Commun. ACM 33, 10 (Oct. 1990), 30--53.
[12]
R. Fujimoto. 1990. Performance of Time Warp under synthetic workloads. Proceedings of the SCS Multiconference on Distributed Simulation 22, 1 (Jan. 1990), 23--28.
[13]
G.Chrysos. 2012. Intel Xeon Phi x100 Family Coprocessor - the Architecture. In Intel white paper.
[14]
S. Gupta and P. A. Wilsey. 2014. Lock-Free Pending Event Set Management in Time Warp. In ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (PADS).
[15]
Chao Huang, Orion Lawlor, and L. KalÃÍ. 2004. Adaptive MPI. 306--322. https: //doi.org/10.1007/978-3-540-24644-2_20
[16]
C. Iancu, S. Hofmeyr, F. Blagojevi-ÄĞ, and Y. Zheng. 2010. Oversubscription on multicore processors. In 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS). 1--11.
[17]
M. Ianni, R. Marotta, D. Cingolani, A. Pellegrini, and F. Quaglia. 2018. The Ultimate Share-Everything PDES System. In 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. 73--84.
[18]
M. Ianni, R. Marotta, A. Pellegrini, and F. Quaglia. 2017. A non-blocking global virtual time algorithm with logarithmic number of memory operations. In 2017 IEEE/ACM 21st International Symposium on Distributed Simulation and Real Time Applications (DS-RT). 1--8. https://doi.org/10.1109/DISTRA.2017.8167662
[19]
Deepak Jagtap, Ketan Bahulkar, Dmitry Ponomarev, and Nael Abu-Ghazaleh. 2012. Characterizing and understanding pdes behavior on tilera architecture. In 2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation. IEEE, 53--62.
[20]
D. Jagtap, N.Abu-Ghazaleh, and D.Ponomarev. 2012. Optimization of Parallel Discrete Event Simulator for Multi-core Systems. In International Parallel and Distributed Processing Symposium.
[21]
D. Jefferson. 1985. Virtual Time. ACM Transactions on Programming Languages and Systems 7, 3 (July 1985), 405--425.
[22]
Z. Lin and Y. Yao. 2015. An asynchronous GVT computing algorithm in neuron time warp-multi thread. In 2015 Winter Simulation Conference (WSC). 1115--1126. https://doi.org/10.1109/WSC.2015.7408238
[23]
Jonatan Linden, Pavol Bauer, Stefan Engblom, and Bengt Jonsson. 2019. Exposing Inter-process Information for Efficient PDES of Spatial Stochastic Systems on Multicores. ACM Transactions on Modeling and Computer Simulation 29, 2, 0--25.
[24]
F. Mattern. 1993. Efficient Algorithms for Distributed Snapshots and Global Virtual Time Approximation. J. Parallel and Distrib. Comput. 18, 4 (Aug. 1993), 423--434.
[25]
Eric Mikida and Laxmikant V Kale. 2018. Adaptive methods for irregular parallel discrete event simulation workloads. In SIGSIM-PADS 2018 - Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. Association for Computing Machinery, Inc, 189--200. https://doi.org/10.1145/3200921. 3200936
[26]
Eric Mikida and Laxmikant V Kale. 2019. An adaptive non-blocking GVT algorithm. In SIGSIM-PADS 2019 - Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. 25--36. https://doi.org/10.1145/ 3316480.3322896
[27]
Alessandro Pellegrini and Francesco Quaglia. 2014. Wait-free global virtual time computation in shared memory timewarp systems. In Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on. IEEE, 9--16.
[28]
Kalyan S. Perumalla, Alfred J. Park, and Vinod Tipparaju. 2011. GVT Algorithms and Discrete Event Dynamics on 129K+ Processor Cores. In Proceedings of the 2011 18th International Conference on High Performance Computing. IEEE Computer Society, 1--11.
[29]
Kalyan S. Perumalla, Alfred J. Park, and Vinod Tipparaju. 2014. Discrete Event Execution with One-Sided and Two-Sided GVT Algorithms on 216,000 Processor Cores. ACM Trans. Model. Comput. Simul. 24, 3, Article 16 (June 2014), 25 pages. https://doi.org/10.1145/2611561
[30]
B. Samadi. 1985. Distributed Simulation, Algorithms and Performance Analysis. Ph.D. Dissertation. Computer Science Department, University of California, Los Angeles, CA.
[31]
A. Sodani, R. Gramunt, J. Corbal, H. Kim, K. Vinod, S. Chinthamani, S. HUtsell, R. Agarwal, and Y. Liu. 2016. Knights Landing: Second-Generation Intel Xeon Phi Product. In IEEE Micro.
[32]
D Terpstra, H Jagode, H You, and J Dongarra. 2010. Collecting Performance Data with PAPI-C. In Tools for High Performance Computing 2009, 3rd Parallel Tools Workshop. Springer Berlin / Heidelberg, Dresden, Germany, 157--173.
[33]
Jingjing Wang, Deepak Jagtap, Nael Abu-Ghazaleh, and Dmitry Ponomarev. 2014. Parallel discrete event simulation for multi-core systems: Analysis and optimization. IEEE Transactions on Parallel and Distributed Systems 25, 6 (2014), 1574--1584.
[34]
Barry Williams, Dmitry Ponomarev, Nael Abu-Ghazaleh, and Philip Wilsey. 2017. Performance characterization of parallel discrete event simulation on knights landing processor. In Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. ACM, 121--132.
[35]
Linda F Wilson and Wei Shen. 1998. Experiments in load migration and dynamic load balancing in speedes. In 1998 Winter Simulation Conference. Proceedings (Cat. No. 98CH36274), Vol. 1. IEEE, 483--490.
[36]
Q. Xiong, E. Ates, M. C. Herbordt, and A. K. Coskun. 2018. Tangram: Colocating HPC Applications with Oversubscription. In 2018 IEEE High Performance extreme Computing Conference (HPEC). 1--7.

Cited By

View all
  • (2021)GVT-Guided Demand-Driven Scheduling in Parallel Discrete Event SimulationProceedings of the 50th International Conference on Parallel Processing10.1145/3472456.3472470(1-10)Online publication date: 9-Aug-2021

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGSIM-PADS '20: Proceedings of the 2020 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
June 2020
204 pages
ISBN:9781450375924
DOI:10.1145/3384441
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 June 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. global virtual time
  2. intel xeon phi
  3. knights landing
  4. locality
  5. manycore architecture
  6. modeling
  7. parallel discrete event simulation
  8. performance
  9. simulation

Qualifiers

  • Research-article

Funding Sources

  • AFOSR

Conference

SIGSIM-PADS '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 398 of 779 submissions, 51%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)129
  • Downloads (Last 6 weeks)28
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2021)GVT-Guided Demand-Driven Scheduling in Parallel Discrete Event SimulationProceedings of the 50th International Conference on Parallel Processing10.1145/3472456.3472470(1-10)Online publication date: 9-Aug-2021

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media