Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Free access
Just Accepted

Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines

Online AM: 08 January 2024 Publication History

Abstract

Shared-memory multi-processor/multi-core machines have become a reference for many application contexts. In particular, the recent literature on speculative parallel discrete event simulation has reshuffled the architectural organization of simulation systems in order to deeply exploit the main features of this type of machines. A core aspect dealt with has been the full sharing of the workload at the level of individual simulation events, which enables keeping the rollback incidence minimal. However, making each worker thread continuously switch its execution between events destined to different simulation objects does not favor locality. This problem appears even more evident in the case of Non-Uniform-Memory-Access (NUMA) machines, where memory accesses generating a cache miss to be served by a far NUMA node give rise to both higher latency and higher traffic at the level of the NUMA interconnection. In this article, we propose a workload-sharing algorithm where the worker threads can have short-term binding with specific simulation objects to favor spatial locality. The new bindings—carried out when a thread decides to switch its execution to other simulation objects—are based on both (a) the timeline according to which the object states have passed through the caching hierarchy and (b) the (dynamic) placement of objects within the NUMA architecture. At the same time, our solution still enables the worker threads to focus their activities on the events to be processed whose timestamps are closer to the simulation commit horizon—hence we exploit temporal locality along virtual time and keep the rollback incidence minimal. In our design we exploit lock-free constructs to support scalable thread synchronization while accessing the shared event pool. Furthermore, we exploit a multi-view approach of the event pool content, which additionally favors local accesses to the parts of the event pool that are currently relevant for the thread activity. Our solution has been released as an integration within the USE (Ultimate-Share-Everything) open source speculative simulation platform available to the community. Furthermore, in this article we report the results of an experimental study that shows the effectiveness of our proposal.

References

[1]
Francesco Antonacci, Alessandro Pellegrini, and Francesco Quaglia. 2013. Consistent and efficient output-streams management in optimistic simulation platforms. In SIGSIM Principles of Advanced Discrete Simulation, SIGSIM-PADS ’13, Montreal, QC, Canada, May 19-22, 2013, Margaret L. Loper and Gabriel A. Wainer (Eds.). ACM, New York, NY, USA, 315–326. https://doi.org/10.1145/2486092.2486133
[2]
Rassul Ayani and Hassan Rajaei. 1994. Parallel simulation based on conservative time windows: A performance study. Concurr. Pract. Exp. 6, 2 (1994), 119–142. https://doi.org/10.1002/cpe.4330060204
[3]
Christopher D. Carothers and Richard Fujimoto. 2000. Efficient Execution of Time Warp Programs on Heterogeneous, NOW Platforms. IEEE Trans. Parallel Distributed Syst. 11, 3 (2000), 299–317. https://doi.org/10.1109/71.841745
[4]
Christopher D. Carothers, Kalyan S. Perumalla, and Richard Fujimoto. 1999. Efficient optimistic parallel simulations using reverse computation. ACM Trans. Model. Comput. Simul. 9, 3 (1999), 224–253. https://doi.org/10.1145/347823.347828
[5]
Christopher D. Carothers, Kalyan S. Perumalla, and Richard M. Fujimoto. 1999. The Effect of State-Saving in Optimistic Simulation on a Cache-Coherent Non-Uniform Memory Access Architecture. In Proceedings of the 31st Conference on Winter Simulation: Simulation—a Bridge to the Future - Volume 2 (Phoenix, Arizona, USA) (WSC ’99). Association for Computing Machinery, New York, NY, USA, 1624–1633. https://doi.org/10.1145/324898.325340
[6]
Li-li Chen, Ya-shuai Lu, Yi-ping Yao, Shao-liang Peng, and Ling-da Wu. 2011. A Well-Balanced Time Warp System on Multi-Core Environments. In Proceedings of the 2011 IEEE Workshop on Principles of Advanced and Distributed Simulation (PADS ’11). IEEE Computer Society, USA, 1–9. https://doi.org/10.1109/PADS.2011.5936752
[7]
Davide Cingolani, Alessandro Pellegrini, and Francesco Quaglia. 2017. Transparently Mixing Undo Logs and Software Reversibility for State Recovery in Optimistic PDES. ACM Trans. Model. Comput. Simul. 27, 2 (2017), 11:1–11:26. https://doi.org/10.1145/3077583
[8]
George Dantzig. 1957. Discrete-variable extremum problems. Operational Research 5(1957).
[9]
Phillip M. Dickens, David M. Nicol, Paul F. Reynolds Jr., and John Mark Duva. 1993. The impact of adding aggressiveness to a non-aggressive windowing protocol. In Proceedings of the 25th Winter Simulation Conference, Los Angeles, California, USA, December 12-15, 1993, Gerald W. Evans, Mansooreh Mollaghasemi, Edward C. Russell, and William E. Biles (Eds.). ACM Press, New York, NY, USA, 731–739. https://doi.org/10.1145/256563.256833
[10]
Phillip M. Dickens, David M. Nicol, Paul F. Reynolds Jr., and John Mark Duva. 1996. Analysis of Bounded Time Warp and Comparison with YAWNS. ACM Trans. Model. Comput. Simul. 6, 4 (1996), 297–320. https://doi.org/10.1145/240896.240913
[11]
Tom Dickman, Sounak Gupta, and Philip A. Wilsey. 2013. Event pool structures for PDES on many-core Beowulf clusters. In Proceedings of the 2013 ACM/SIGSIM Conference on Principles of Advanced Discrete Simulation. Association for Computing Machinery, New York, NY, USA, 103–114. https://doi.org/10.1145/2486092.2486106
[12]
Shlomi Dolev, Danny Hendler, and Adi Suissa. 2008. CAR-STM: scheduling-based collision avoidance and resolution for software transactional memory. In Proceedings of the Twenty-Seventh Annual ACM Symposium on Principles of Distributed Computing, PODC 2008, Toronto, Canada, August 18-21, 2008, Rida A. Bazzi and Boaz Patt-Shamir (Eds.). Association for Computing Machinery, New York, NY, USA, 125–134. https://doi.org/10.1145/1400751.1400769
[13]
Steve Franks, Fabian Gomes, Brian W. Unger, and John G. Cleary. 1997. State Saving for Interactive Optimistic Simulation. In Proceedings of the Eleventh Workshop on Parallel and Distributed Simulation, PADS ’97, Lockenhaus, Austria, June 10-13, 1997, Alois Ferscha, Rassul Ayani, and Carl Tropper (Eds.). IEEE Computer Society, USA, 72–79. https://doi.org/10.1109/PADS.1997.594589
[14]
Yaosheng Fu. 2017. Architectural Support for Large-scale Shared Memory Systems. Ph. D. Dissertation. Princeton University. https://dataspace.princeton.edu/handle/88435/dsp01pv63g290r
[15]
Richard Fujimoto. 1989. Time Warp on a Shared Memory Multiprocessor. In Proceedings of the International Conference on Parallel Processing, ICPP ’89, The Pennsylvania State University, University Park, PA, USA, August 1989. Volume 3: Algorithms and Applications. Pennsylvania State University Press, USA, 242–249.
[16]
Richard M Fujimoto. 1990. Performance of Time Warp Under Synthetic Workloads. In Proceedings of the Multiconference on Distributed Simulation. Society for Computer Simulation, USA, 23–28.
[17]
Richard M. Fujimoto and Maria Hybinette. 1997. Computing Global Virtual Time in Shared-Memory Multiprocessors. ACM Transactions on Modeling and Computer Simulation 7, 4 (1997), 425–446. https://doi.org/10.1145/268403.268404
[18]
Richard M. Fujimoto and Kiran S. Panesar. 1995. Buffer Management in Shared-Memory Time Warp Systems. In Proceedings of the Ninth Workshop on Parallel and Distributed Simulation (Lake Placid, New York, USA) (PADS ’95). IEEE Computer Society, USA, 149–156. https://doi.org/10.1145/214282.214330
[19]
David W. Glazer and Carl Tropper. 1993. On Process Migration and Load Balancing in Time Warp. IEEE Trans. Parallel Distributed Syst. 4, 3 (1993), 318–327. https://doi.org/10.1109/71.210814
[20]
Sounak Gupta and Philip A. Wilsey. 2014. Lock-Free Pending Event Set Management in Time Warp. In Proceedings of the 2nd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (Denver, Colorado, USA) (SIGSIM PADS ’14). Association for Computing Machinery, New York, NY, USA, 15–26. https://doi.org/10.1145/2601381.2601393
[21]
Joshua Hay and Philip A. Wilsey. 2015. Experiments with hardware-based transactional memory in parallel simulation. In Proceedings of the 2015 ACM/SIGSIM Conference on Principles of Advanced Discrete Simulation (PADS). Association for Computing Machinery, New York, NY, USA, 75–86. https://doi.org/10.1145/2769458.2769462
[22]
Yuxiong He, Charles E. Leiserson, and William M. Leiserson. 2010. The Cilkview Scalability Analyzer. In Proceedings of the Twenty-Second Annual ACM Symposium on Parallelism in Algorithms and Architectures (Thira, Santorini, Greece) (SPAA ’10). Association for Computing Machinery, New York, NY, USA, 145–156. https://doi.org/10.1145/1810479.1810509
[23]
Mauro Ianni, Romolo Marotta, Davide Cingolani, Alessandro Pellegrini, and Francesco Quaglia. 2018. The Ultimate Share-Everything PDES System. In Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (Rome, Italy) (SIGSIM-PADS ’18). Association for Computing Machinery, New York, NY, USA, 73–84. https://doi.org/10.1145/3200921.3200931
[24]
David R. Jefferson. 1985. Virtual Time. ACM Trans. Program. Lang. Syst. 7, 3 (1985), 404–425. https://doi.org/10.1145/3916.3988
[25]
Sunil Kandukuri and Stephen Boyd. 2002. Optimal Power Control in Interference-Limited Fading Wireless Channels with Outage-Probability Specifications. IEEE Transactions on Wireless Communications 1, 1(2002), 46–55.
[26]
Jonatan Lindén and Bengt Jonsson. 2013. A Skiplist-Based Concurrent Priority Queue with Minimal Memory Contention. In Proceedings of the 17th International Conference on Principles of Distributed Systems - Volume 8304 (Nice, France) (OPODIS 2013). Springer-Verlag, Berlin, Heidelberg, 206–220. https://doi.org/10.1007/978-3-319-03850-6_15
[27]
Romolo Marotta, Mauro Ianni, Alessandro Pellegrini, and Francesco Quaglia. 2017. A Conflict-Resilient Lock-Free Calendar Queue for Scalable Share-Everything PDES Platforms. In Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (Singapore, Republic of Singapore) (SIGSIM-PADS ’17). Association for Computing Machinery, New York, NY, USA, 15–26. https://doi.org/10.1145/3064911.3064926
[28]
John M. Mellor-Crummey and Michael L. Scott. 1991. Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Trans. Comput. Syst. 9, 1 (1991), 21–65. https://doi.org/10.1145/103727.103729
[29]
Cristina Montañola-Sales, Joan Francesc Gilabert-Navarro, Josep Casanovas-Garcia, Clara Prats, Daniel López, Joaquim Valls, Pere-Joan Cardona, and Cristina Vilaplana. 2015. Modeling tuberculosis in Barcelona: a solution to speed-up agent-based simulations. In Proceedings of the 2015 Winter Simulation Conference, Huntington Beach, CA, USA, December 6-9, 2015. IEEE/ACM, USA, 1295–1306. https://doi.org/10.1109/WSC.2015.7408254
[30]
Federica Montesano, Romolo Marotta, and Francesco Quaglia. 2022. Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines. In SIGSIM-PADS ’22: SIGSIM Conference on Principles of Advanced Discrete Simulation, Atlanta, GA, USA, June 8 - 10, 2022, Kalyan Perumalla, Margaret Loper, Dong (Kevin) Jin, and Christopher D. Carothers (Eds.). ACM, New York, NY, USA, 81–92. https://doi.org/10.1145/3518997.3531026
[31]
Federica Montesano, Romolo Marotta, and Francesco Quaglia. 2023. Artifact of ”Spatial/Temporal Locality-based Load- sharing in Speculative Discrete Event Simulation on Multi-core Machines” (TOMACS). https://doi.org/10.5281/zenodo.7342950
[32]
Avinash C. Palaniswamy and Philip A. Wilsey. 1993. Adaptive bounded time windows in an optimistically synchronized simulator. In Third Great Lakes Symposium on Design Automation of High Performance VLSI Systems, Kalamazoo, MI, USA, March 5-6, 1993. IEEE, USA, 114–118. https://doi.org/10.1109/GLSV.1993.224467
[33]
Alessandro Pellegrini and Francesco Quaglia. 2014. Wait-free Global Virtual Time computation in shared memory Time-Warp systems. In Proceedings of the 26th International Conference on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE Computer Society, USA, 9–16. https://doi.org/10.1109/SBAC-PAD.2014.38
[34]
Alessandro Pellegrini and Francesco Quaglia. 2015. NUMA Time Warp. In Proceedings of the 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (London, United Kingdom) (SIGSIM PADS ’15). Association for Computing Machinery, New York, NY, USA, 59–70. https://doi.org/10.1145/2769458.2769479
[35]
Alessandro Pellegrini and Francesco Quaglia. 2019. Cross-State Events: a New Approach to Parallel Discrete Event Simulation and its Speculative Runtime Support. J. Parallel and Distrib. Comput. 132 (2019), 48–68. https://doi.org/10.1016/j.jpdc.2019.05.003
[36]
Bruno R. Preiss, Wayne M. Loucks, and Ian D. MacIntyre. 1994. Effects of the Checkpoint Interval on Time and Space in Time Warp. ACM Trans. Model. Comput. Simul. 4, 3 (1994), 223–253. https://doi.org/10.1145/189443.189444
[37]
Francesco Quaglia. 1998. Event History Based Sparse State Saving in Time Warp. In Proceedings of the 12th Workshop on Parallel and Distributed Simulation, PADS ’98, Banff, Alberta, Canada, May 26-29, 1998, Brian W. Unger and Alois Ferscha (Eds.). IEEE Computer Society, USA, 72–79. https://doi.org/10.1109/PADS.1998.685272
[38]
Francesco Quaglia. 2015. A low-overhead constant-time lowest-timestamp-first CPU scheduler for high-performance optimistic simulation platforms. Simulation Modelling Practice and Theory 53 (2015), 103–122.
[39]
Francesco Quaglia and Vittorio Cortellessa. 2002. On the processor scheduling problem in time warp synchronization. ACM Trans. Model. Comput. Simul. 12, 3 (2002), 143–175. https://doi.org/10.1145/643114.643115
[40]
Maryan Rab, Romolo Marotta, Mauro Ianni, Alessandro Pellegrini, and Francesco Quaglia. 2021. NUMA-Aware Non-Blocking Calendar Queue. In Proceedings of the IEEE/ACM 24th International Symposium on Distributed Simulation and Real Time Applications (Prague, Czech Republic) (DS-RT ’20). IEEE Press, USA, 59–67.
[41]
Dhananjai Madhava Rao and Julius Higiro. 2019. Managing Pending Events in Sequential and Parallel Simulations Using Three-tier Heap and Two-tier Ladder Queue. ACM Trans. Model. Comput. Simul. 29, 2 (2019), 9:1–9:28. https://doi.org/10.1145/3265750
[42]
Daniel A. Reed and Jack J. Dongarra. 2015. Exascale computing and big data. Commun. ACM 58, 7 (2015), 56–68. https://doi.org/10.1145/2699414
[43]
Peter L Reiher and David Jefferson. 1990. Virtual time based dynamic load management in the time warp operating system. Transactions of the Society for Computer Simulation 7, 2 (1990), 91–120.
[44]
Robert Rönngren, Michael Liljenstam, Rassul Ayani, and Johan Montagnat. 1996. Transparent Incremental State Saving in Time Warp Parallel Discrete Event Simulation. In Proceedings of the Tenth Workshop on Parallel and Distributed Simulation, PADS ’96, Philadelphia, PA, USA, May 22-24, 1996, Wayne M. Loucks and Bruno R. Preiss (Eds.). IEEE Computer Society, USA, 70–77. https://doi.org/10.1109/PADS.1996.761564
[45]
Caitlin J. Ross, Christopher D. Carothers, Misbah Mubarak, Robert B. Ross, Jianping Kelvin Li, and Kwan-Liu Ma. 2018. Leveraging Shared Memory in the Ross Time Warp Simulator for Complex Network simulations. In 2018 Winter Simulation Conference, WSC 2018, Gothenburg, Sweden, December 9-12, 2018, Björn Johansson and Sanjay Jain (Eds.). IEEE, USA, 3837–3848. https://doi.org/10.1109/WSC.2018.8632333
[46]
Markus Schordan, Tomas Oppelstrup, David R. Jefferson, and Peter D. Barnes Jr. 2018. Generation of Reversible C++ Code for Optimistic Parallel Discrete Event Simulation. New Gener. Comput. 36, 3 (2018), 257–280. https://doi.org/10.1007/s00354-018-0038-2
[47]
Emiliano Silvestri, Alessandro Pellegrini, Pierangelo di Sanzo, and Francesco Quaglia. 2022. Effective Runtime Management of Tasks and Priorities in GNU OpenMP Applications. IEEE Trans. Computers 71, 10 (2022), 2632–2645. https://doi.org/10.1109/TC.2021.3139463
[48]
Sudhir Srinivasan and Paul F. Reynolds Jr. 1998. Elastic Time. ACM Trans. Model. Comput. Simul. 8, 2 (1998), 103–139. https://doi.org/10.1145/280265.280267
[49]
Brian Paul Swenson and George F. Riley. 2012. A New Approach to Zero-Copy Message Passing with Reversible Memory Allocation in Multi-Core Architectures. In Proceedings of the 2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation (PADS ’12). IEEE Computer Society, USA, 44–52. https://doi.org/10.1109/PADS.2012.3
[50]
Wenjie Tang, Yiping Yao, Tianlin Li, Xiao Song, and Feng Zhu. 2020. An Adaptive Persistence and Work-stealing Combined Algorithm for Load Balancing on Parallel Discrete Event Simulation. ACM Trans. Model. Comput. Simul. 30, 2 (2020), 12:1–12:26. https://doi.org/10.1145/3364218
[51]
W. T. Tang, R. S. M. Goh, and I. L. Thng. 2005. Ladder queue: An O(1) priority queue structure for large-scale discrete event simulation. ACM Transactions on Modeling and Computer Simulation 15, 3(2005), 175–204.
[52]
R Kent Treiber. 1986. Systems programming: Coping with parallelism. Technical Report. International Business Machines Incorporated, Thomas J. Watson Research.
[53]
Roberto Vitali, Alessandro Pellegrini, and Gionata Cerasuolo. 2012. Cache-aware memory manager for optimistic simulations. In International ICST Conference on Simulation Tools and Techniques, SIMUTOOLS ’12, Sirmione-Desenzano, Italy, March 19-23, 2012, George F. Riley, Francesco Quaglia, and Jan Himmelspach (Eds.). ICST/ACM, Brussels, BEL, 129–138. https://doi.org/10.4108/icst.simutools.2012.247766
[54]
Roberto Vitali, Alessandro Pellegrini, and Francesco Quaglia. 2012. Load sharing for optimistic parallel simulations on multi core machines. ACM SIGMETRICS Performance Evaluation Review 40, 3 (jan 2012), 2–11. https://doi.org/10.1145/2425248.2425250
[55]
Roberto Vitali, Alessandro Pellegrini, and Francesco Quaglia. 2012. Towards Symmetric Multi-threaded Optimistic Simulation Kernels. In 26th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation, PADS 2012, Zhangjiajie, China, July 15-19, 2012. IEEE Computer Society, USA, 211–220. https://doi.org/10.1109/PADS.2012.46
[56]
Jingjing Wang, Deepak Jagtap, Nael B. Abu-Ghazaleh, and Dmitry Ponomarev. 2014. Parallel discrete event simulation for multi-core systems: Analysis and optimization. IEEE Transactions on Parallel and Distributed Systems 25, 6 (2014), 1574–1584. https://doi.org/10.1109/TPDS.2013.193
[57]
Zhonge Xiao, Fabian Gomes, Brian W. Unger, and John G. Cleary. 1995. A fast asynchronous GVT algorithm for shared memory multiprocessor architectures. In Proceedings of the Ninth Workshop on Parallel and Distributed Simulation, PADS’95, Lake Placid, New York, USA, June 14-16, 1995. IEEE Computer Society, USA, 203–208. https://doi.org/10.1109/PADS.1995.404296

Cited By

View all
  • (2024)Full-Stack Revision of Memory and Data Management in PDES on Multi-Core MachinesProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658831(417-420)Online publication date: 3-Jun-2024
  • (2024)Towards the Optimization of Memory and Data Management in Speculative PDES on Multi-Core MachinesProceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation10.1145/3615979.3662150(63-64)Online publication date: 24-Jun-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Modeling and Computer Simulation
ACM Transactions on Modeling and Computer Simulation Just Accepted
EISSN:1558-1195
Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 08 January 2024
Accepted: 07 December 2023
Revised: 25 November 2023
Received: 21 November 2022

Check for updates

Author Tags

  1. PDES
  2. load-sharing
  3. shared-memory
  4. multi-core

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)83
  • Downloads (Last 6 weeks)19
Reflects downloads up to 22 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Full-Stack Revision of Memory and Data Management in PDES on Multi-Core MachinesProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658831(417-420)Online publication date: 3-Jun-2024
  • (2024)Towards the Optimization of Memory and Data Management in Speculative PDES on Multi-Core MachinesProceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation10.1145/3615979.3662150(63-64)Online publication date: 24-Jun-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media