DOI: 10.1109/DS-RT.2016.11
tutorial

Fault-Tolerant Adaptive Parallel and Distributed Simulation

Published: 21 September 2016

Abstract

Discrete Event Simulation is a widely used technique for modeling and analyzing complex systems in many fields of science and engineering. The increasingly large size of simulation models poses a serious computational challenge, since the time needed to run a simulation can be prohibitively long. For this reason, Parallel and Distributed Simulation techniques have been proposed to take advantage of the multiple execution units found in multicore processors, clusters of workstations, or HPC systems. The current generation of HPC systems includes hundreds of thousands of computing nodes and a vast number of ancillary components. Despite improvements in manufacturing processes, component failures are frequent, and the situation will get worse as larger systems are built. In this paper we describe FT-GAIA, a software-based fault-tolerant extension of the GAIA/ARTÌS parallel simulation middleware. FT-GAIA transparently replicates simulation entities and distributes them across multiple execution nodes. This allows the simulation to tolerate crash failures of computing nodes. Furthermore, FT-GAIA offers some protection against Byzantine failures: because synchronization messages are replicated as well, the receiving entity can identify and discard corrupted messages. We provide an experimental evaluation of FT-GAIA on a running prototype. Results show that a high degree of fault tolerance can be achieved, at the cost of a moderate increase in the computational load of the execution units.
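
The abstract says that replicated messages let a receiving entity identify and discard corrupted copies. The following minimal Python sketch illustrates one way such filtering could work. It assumes strict-majority voting over the copies sent by the replicas of the sending entity. The names `Copy` and `vote` and the majority threshold are illustrative assumptions, not FT-GAIA's actual API or protocol.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, Iterable, Optional


@dataclass(frozen=True)
class Copy:
    """One copy of a logical message, sent by one replica of the sender."""
    msg_id: int        # hypothetical identifier shared by all copies
    replica: int       # index of the replica that sent this copy
    payload: Hashable  # message content; must be hashable to be counted


def vote(copies: Iterable[Copy], num_replicas: int) -> Optional[Hashable]:
    """Return the payload sent by a strict majority of replicas, else None."""
    # Keep at most one copy per replica so a faulty replica cannot vote twice.
    first_by_replica = {}
    for c in copies:
        first_by_replica.setdefault(c.replica, c.payload)
    if not first_by_replica:
        return None
    # Pick the most frequent payload and accept it only with a strict majority
    # of all replicas (assumed rule, not taken from the paper).
    payload, count = Counter(first_by_replica.values()).most_common(1)[0]
    return payload if count > num_replicas // 2 else None
```

Under this assumed rule, three replicas mask one corrupted copy: two matching payloads outvote the third. With fewer matching copies than a majority, the receiver keeps waiting or drops the message rather than guessing.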


Cited By

  • (2018) Anonymity and confidentiality in secure distributed simulation. Proceedings of the 22nd International Symposium on Distributed Simulation and Real Time Applications, pp. 71-78. DOI: 10.5555/3330299.3330308. Online publication date: 15-Oct-2018

Published In

DS-RT '16: Proceedings of the 20th International Symposium on Distributed Simulation and Real-Time Applications
September 2016
205 pages
ISBN: 9781509035045

Publisher

IEEE Press

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

DS-RT '16
