Research article, ISMM conference proceedings
DOI: 10.1145/1542431.1542439

Self-recovery in server programs

Published: 19 June 2009

Abstract

Long-running server programs must remain available despite software failures. Server programs do fail, however, and memory errors are one of the important causes. Software bugs in the server code, such as buffer overflows and integer overflows, are exposed by certain user requests and lead to memory corruption, which can often result in crashes. One safe way of recovering from these crashes is to periodically checkpoint program state and roll back to the most recent checkpoint on a crash. However, periodic checkpointing can be quite expensive. Furthermore, since recovery can involve rolling back considerable state in addition to replaying several benign user requests, the throughput and response time of the server can be reduced significantly during rollback recovery.
In this paper, we first conduct a detailed study of how memory corruption propagates in server programs. Our study shows that corruption introduced while processing a user request generally does not propagate across user requests. On the contrary, the corrupted memory locations are generally cleansed automatically, either when memory (stack or heap) is deallocated or when it is overwritten with uncorrupted values. This self-cleansing property of server programs led us to believe that recovering from crashes does not necessarily require an expensive rollback of state. Motivated by this observation, we propose SRS, a technique for self-recovery in server programs that takes advantage of self-cleansing to recover from crashes. Memory locations that are not fully cleansed are restored in a demand-driven fashion, which makes SRS very efficient. Thus, when a crash occurs in SRS, instead of rolling back to a safe state, the crash is suppressed and the program is made to execute forward past the crash; a mechanism called crash suppression prevents further crashes from recurring as execution proceeds. Experiments on real-world server programs with real bugs show that in each case the server program recovered efficiently from the crash and the faulty user request was isolated from future benign user requests.
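The self-cleansing idea in the abstract can be illustrated with a toy model. The sketch below is not the SRS implementation; it only assumes a simple shadow set that marks which locations hold corrupted values. The class `ShadowMemory`, the function `simulate`, and the address labels such as `stack:0` are hypothetical names chosen for this illustration.

```python
class ShadowMemory:
    """Toy model (an assumption for illustration, not SRS itself):
    values plus a shadow set of addresses that hold corrupted data."""

    def __init__(self):
        self.values = {}
        self.corrupted = set()

    def write(self, addr, value, tainted=False):
        # A corrupted write marks the location; a clean write cleanses it.
        self.values[addr] = value
        if tainted:
            self.corrupted.add(addr)
        else:
            self.corrupted.discard(addr)

    def copy(self, src, dst):
        # Copying from a corrupted location propagates the corruption.
        self.write(dst, self.values[src], tainted=src in self.corrupted)

    def free(self, addrs):
        # Deallocation (e.g. a stack frame being popped) cleanses locations.
        for addr in addrs:
            self.values.pop(addr, None)
            self.corrupted.discard(addr)


def simulate():
    mem = ShadowMemory()

    # Faulty request: an overflow corrupts two stack slots, and one
    # corrupted value is copied into a heap slot.
    mem.write("stack:0", 0xDEAD, tainted=True)
    mem.write("stack:1", 0xBEEF, tainted=True)
    mem.copy("stack:0", "heap:7")
    after_fault = len(mem.corrupted)

    # The request ends and its stack frame is deallocated.
    mem.free(["stack:0", "stack:1"])
    after_dealloc = len(mem.corrupted)

    # A later benign request overwrites the heap slot with a clean value.
    mem.write("heap:7", 42)
    after_overwrite = len(mem.corrupted)

    return after_fault, after_dealloc, after_overwrite


print(simulate())
```

In this model the number of corrupted locations shrinks through both routes the abstract names: deallocation and overwriting with uncorrupted values. Any location still marked when it is next read would be the kind that needs demand-driven restoration.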

Published In

ISMM '09: Proceedings of the 2009 international symposium on Memory management
June 2009
158 pages
ISBN: 9781605583471
DOI: 10.1145/1542431
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. memory propagation
  2. self cleansing
  3. self recovery

Qualifiers

  • Research-article

Conference

ISMM '09

Acceptance Rates

ISMM '09 Paper Acceptance Rate 15 of 32 submissions, 47%;
Overall Acceptance Rate 72 of 156 submissions, 46%

Cited By

  • (2016) Automatic runtime recovery via error handler synthesis. Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pages 684-695. DOI: 10.1145/2970276.2970360. Online publication date: 25-Aug-2016.
  • (2016) Assessing Dependability with Software Fault Injection. ACM Computing Surveys, 48:3, pages 1-55. DOI: 10.1145/2841425. Online publication date: 8-Feb-2016.
  • (2016) Improving Reliability of Dynamic Software Updating Using Runtime Recovery. 2016 23rd Asia-Pacific Software Engineering Conference (APSEC), pages 257-264. DOI: 10.1109/APSEC.2016.044. Online publication date: 2016.
  • (2015) Automatic error elimination by horizontal code transfer across multiple applications. ACM SIGPLAN Notices, 50:6, pages 43-54. DOI: 10.1145/2813885.2737988. Online publication date: 3-Jun-2015.
  • (2015) An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. Proceedings of the 2015 International Symposium on Software Testing and Analysis, pages 24-36. DOI: 10.1145/2771783.2771791. Online publication date: 13-Jul-2015.
  • (2015) Automatic error elimination by horizontal code transfer across multiple applications. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 43-54. DOI: 10.1145/2737924.2737988. Online publication date: 3-Jun-2015.
  • (2014) Automatic runtime error repair and containment via recovery shepherding. ACM SIGPLAN Notices, 49:6, pages 227-238. DOI: 10.1145/2666356.2594337. Online publication date: 9-Jun-2014.
  • (2014) Automatic runtime error repair and containment via recovery shepherding. Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 227-238. DOI: 10.1145/2594291.2594337. Online publication date: 9-Jun-2014.
  • (2014) On the Soundness of Silence. Proceedings of the 2014 Tenth European Dependable Computing Conference, pages 118-129. DOI: 10.1109/EDCC.2014.16. Online publication date: 13-May-2014.
  • (2011) Detecting and escaping infinite loops with Jolt. Proceedings of the 25th European conference on Object-oriented programming, pages 609-633. DOI: 10.5555/2032497.2032537. Online publication date: 25-Jul-2011.
