Abstract
As the high performance computing (HPC) community continues to push for ever larger machines, reliability remains a serious obstacle. Further, as feature sizes and voltages decrease, the rate of transient soft errors is rising. Today's HPC programmers already have to deal with these faults to a small degree, and the problem is expected to grow as systems continue to scale.
In this paper we present SEFI, the Soft Error Fault Injection framework, a tool for profiling software for its susceptibility to soft errors. We focus here on logic soft error injection. Using QEMU, an open source virtual machine and processor emulator, we demonstrate how emulated machine instructions can be modified to introduce soft errors. Because the experiments modify the virtual machine itself, they do not require intimate knowledge of the tested application. With this technique, we show that we can inject simulated soft errors into the logic operations of a target application without affecting other applications or the operating system sharing the VM. We present some initial results and discuss where we think this work will be useful in next generation hardware/software co-design.
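To make the idea of injecting faults at the emulated-instruction level concrete, the following is a minimal sketch and not SEFI's or QEMU's actual code. It assumes a toy emulator whose multiply handler knows which process issued the instruction; the names `ToyEmulator`, `emulate_mul`, `target_pid`, and `fault_probability` are hypothetical and chosen only for illustration. The handler flips one bit of the result, but only for the targeted process, so other processes sharing the emulator see correct arithmetic.

```python
import random

WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def flip_bit(value, bit):
    """Return value with one bit inverted, kept within a 64-bit word."""
    return (value ^ (1 << bit)) & WORD_MASK


class ToyEmulator:
    """Hypothetical stand-in for an emulator's instruction handlers."""

    def __init__(self, target_pid, fault_probability, seed=0):
        self.target_pid = target_pid                # only this process is corrupted
        self.fault_probability = fault_probability  # chance of a fault per instruction
        self.rng = random.Random(seed)              # seeded so runs are repeatable
        self.injected = 0                           # number of faults injected so far

    def emulate_mul(self, pid, a, b):
        """Emulate an integer multiply issued by the process identified by pid."""
        result = (a * b) & WORD_MASK
        # Inject a single-bit error only into the target application's results.
        if pid == self.target_pid and self.rng.random() < self.fault_probability:
            result = flip_bit(result, self.rng.randrange(WORD_BITS))
            self.injected += 1
        return result


# Usage: process 7 is untouched, process 42 is the injection target.
emu = ToyEmulator(target_pid=42, fault_probability=1.0)
clean = emu.emulate_mul(pid=7, a=6, b=7)    # correct product
faulty = emu.emulate_mul(pid=42, a=6, b=7)  # product with one flipped bit
```

Filtering on a process identifier inside the handler is one plausible way to confine faults to a single application; how a real emulator identifies the running guest process is outside the scope of this sketch.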
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
DeBardeleben, N., Blanchard, S., Guan, Q., Zhang, Z., Fu, S. (2012). Experimental Framework for Injecting Logic Errors in a Virtual Machine to Profile Applications for Soft Error Resilience. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29740-3_32
DOI: https://doi.org/10.1007/978-3-642-29740-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29739-7
Online ISBN: 978-3-642-29740-3