Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2837614.2837648acmconferencesArticle/Chapter ViewAbstractPublication PagespoplConference Proceedingsconference-collections
research-article
Public Access

Reducing crash recoverability to reachability

Published: 11 January 2016 Publication History

Abstract

Software applications run on a variety of platforms (filesystems, virtual slices, mobile hardware, etc.) that do not provide 100% uptime. As such, these applications may crash at any unfortunate moment losing volatile data and, when re-launched, they must be able to correctly recover from potentially inconsistent states left on persistent storage. From a verification perspective, crash recovery bugs can be particularly frustrating because, even when it has been formally proved for a program that it satisfies a property, the proof is foiled by these external events that crash and restart the program. In this paper we first provide a hierarchical formal model of what it means for a program to be crash recoverable. Our model captures the recoverability of many real world programs, including those in our evaluation which use sophisticated recovery algorithms such as shadow paging and write-ahead logging. Next, we introduce a novel technique capable of automatically proving that a program correctly recovers from a crash via a reduction to reachability. Our technique takes an input control-flow automaton and transforms it into an encoding that blends the capture of snapshots of pre-crash states into a symbolic search for a proof that recovery terminates and every recovered execution simulates some crash-free execution. Our encoding is designed to enable one to apply existing abstraction techniques in order to do the work that is necessary to prove recoverability. We have implemented our technique in a tool called Eleven82, capable of analyzing C programs to detect recoverability bugs or prove their absence. We have applied our tool to benchmark examples drawn from industrial file systems and databases, including GDBM, LevelDB, LMDB, PostgreSQL, SQLite, VMware and ZooKeeper. Within minutes, our tool is able to discover bugs or prove that these fragments are crash recoverable.

References

[1]
R. Alur, T. A. Henzinger, F. Y. C. Mang, S. Qadeer, S. K. Rajamani, and S. Tasiran. MOCHA: modularity in model checking. In Computer Aided Verification, 10th International Conference, CAV ’98, Proceedings, pages 521–525, 1998.
[2]
L. N. Bairavasundaram. PhD thesis, Characteristics, Impact, and Tolerance of Partial Disk Failures, 2008.
[3]
G. Barthe, P. R. D’Argenio, and T. Rezk. Secure information flow by self-composition. Mathematical Structures in Computer Science, 21(6):1207–1252, 2011.
[4]
T. Beyene, C. Popeea, and A. Rybalchenko. Solving existentially quantified horn clauses. In CAV’11, 2013.
[5]
D. Beyer and M. E. Keremoglu. Cpachecker: A tool for configurable software verification. In G. Gopalakrishnan and S. Qadeer, editors, Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings, volume 6806, pages 184–190. Springer, 2011.
[6]
M. Castro, B. Liskov, et al. Practical byzantine fault tolerance. In OSDI, volume 99, pages 173–186, 1999.
[7]
H. Chen, D. Ziegler, T. Chajed, A. Chlipala, M. F. Kaashoek, and N. Zeldovich. Using crash hoare logic for certifying the FSCQ file system. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, pages 18–37. ACM, 2015.
[8]
H. Chen, D. Ziegler, A. Chlipala, M. F. Kaashoek, E. Kohler, and N. Zeldovich. Specifying crash safety for storage systems. In 15th Workshop on Hot Topics in Operating Systems, 2015.
[9]
J. Christ, J. Hoenicke, and A. Nutz. Smtinterpol: An interpolating SMT solver. In A. F. Donaldson and D. Parker, editors, Model Checking Software - 19th International Workshop, SPIN 2012, Oxford, UK, July 23-24, 2012. Proceedings, volume 7385 of Lecture Notes in Computer Science, pages 248–254. Springer, 2012.
[10]
A. Cimatti, E. M. Clarke, F. Giunchiglia, and M. Roveri. NUSMV: A new symbolic model checker. STTT, 2(4):410–425, 2000.
[11]
A. Cimatti, A. Griggio, B. Schaafsma, and R. Sebastiani. The Math-SAT5 SMT Solver. In N. Piterman and S. Smolka, editors, Proceedings of TACAS, volume 7795 of LNCS. Springer, 2013.
[12]
B. Cook and E. Koskinen. Making prophecies with decision predicates. In T. Ball and M. Sagiv, editors, Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011, pages 399–410. ACM, 2011.
[13]
B. Cook and E. Koskinen. Reasoning about nondeterminism in programs. In PLDI’13. ACM, 2013.
[14]
B. Cook, E. Koskinen, and M. Y. Vardi. Temporal property verification as a program analysis task. In CAV’11, pages 333–348, 2011.
[15]
B. Cook, A. Podelski, and A. Rybalchenko. Termination proofs for systems code. In PLDI’06, pages 415–426, 2006.
[16]
H. Cui, G. Hu, J. Wu, and J. Yang. Verifying systems rules using ruledirected symbolic execution. In Eighteenth International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS ’13), 2013.
[17]
D. Dietsch, M. Heizmann, V. Langenfeld, and A. Podelski. Fairness modulo theory: A new approach to LTL software model checking. In CAV, 2015.
[18]
G. R. Ganger and Y. N. Patt. Soft updates: A solution to the metadata update problem in file systems. Technical report, University of Michigan, 1995.
[19]
P. Gardner, G. Ntzik, and A. Wright. Local reasoning for the POSIX file system. In Z. Shao, editor, Proceedings of the 23rd European Symposium on Programming, ESOP 2014, volume 8410 of Lecture Notes in Computer Science, pages 169–188. Springer, 2014.
[20]
J. Gray, P. McJones, M. Blasgen, B. Lindsay, R. Lorie, T. Price, F. Putzolu, and I. Traiger. The recovery manager of the system r database manager. ACM Comput. Surv., 13(2):223–242, June 1981.
[21]
R. Gu, J. Koenig, T. Ramananandro, Z. Shao, X. N. Wu, S.-C. Weng, H. Zhang, and Y. Guo. Deep specifications and certified abstraction layers. In PRoceedings of the42nd ACM Symposium on Principles of Programming Languages (POPL’15), 2015.
[22]
R. Guerraoui and M. Kapalka. On the correctness of transactional memory. In S. Chatterjee and M. L. Scott, editors, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2008, Salt Lake City, UT, USA, February 20-23, 2008, pages 175–184. ACM, 2008.
[23]
H. S. Gunawi, A. Rajimwale, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Sqck: A declarative file system checker. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI’08, pages 131–146, 2008.
[24]
T. A. Henzinger, R. Jhala, R. Majumdar, G. C. Necula, G. Sutre, and W. Weimer. Temporal-safety proofs for systems code. In E. Brinksma and K. G. Larsen, editors, Proceedings of the 14th International Conference on Computer Aided Verification (CAV’02), volume 2404, pages 526–538. Springer, 2002.
[25]
C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. Aries: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst., 17(1):94–162, Mar. 1992.
[26]
G. Ntzik, P. da Rocha Pinto, and P. Gardner. Fault-tolerant resource reasoning. Proceedings of the 13th Asian Symposium on Programming Languages and Systems (APLAS), Pohang, South Korea, 2015.
[27]
T. S. Pillai, V. Chidambaram, R. Alagappan, S. Al-Kiswany, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI ’14), Broomfield, CO, October 2014.
[28]
H. E. Ramadan, I. Roy, M. Herlihy, and E. Witchel. Committing conflicting transactions in an STM. In D. A. Reed and V. Sarkar, editors, Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2009, Raleigh, NC, USA, February 14-18, 2009, pages 163–172. ACM, 2009.
[29]
Y. Raz. The principle of commitment ordering, or guaranteeing serializability in a heterogeneous environment of multiple autonomous resource mangers using atomic commitment. In Proceedings of the 18th International Conference on Very Large Data Bases, pages 292– 312. Morgan Kaufmann Publishers Inc., 1992.
[30]
T. Ridge, D. Sheets, T. Tuerk, A. Giugliano, A. Madhavapeddy, and P. Sewell. Sibylfs: formal specification and oracle-based testing for POSIX and real-world file systems. In Proceedings of the 25th Symp. on Operating Systems Principles, SOSP 2015, pg. 38–53, 2015.
[31]
T. Terauchi and A. Aiken. Secure information flow as a safety problem. In Static Analysis, 12th International Symposium, SAS 2005, London, UK, September 7-9, 2005, Proceedings, pages 352–367, 2005.
[32]
S. C. Tweedie. Journaling the Linux ext2fs File System. In the Fourth Annual Linux Expo, May 1998.
[33]
J. Yang, C. Sar, and D. Engler. Explode: a lightweight, general system for finding serious storage system errors. In Proceedings of the Seventh Symposium on Operating Systems Design and Implementation (OSDI ’06), pages 131–146, Nov. 2006.
[34]
J. Yang, P. Twohey, D. Engler, and M. Musuvathi. Using model checking to find serious file system errors. In Proceedings of the Sixth Symposium on Operating Systems Design and Implementation (OSDI ’04), pages 273–288, Dec. 2004.
[35]
J. Yang, P. Twohey, D. Engler, and M. Musuvathi. Using model checking to find serious file system errors. ACM Transactions on Computer Systems, 24(4):393–423, Nov. 2006.
[36]
J. Yang, P. Twohey, B. Pfaff, C. Sar, and D. Engler. eXplode: A lightweight, general approach for finding serious errors in storage systems. In Proceedings of the first Workshop on the Evaluation of Software Defect Detection Tools (BUGS ’05), June 2005.

Cited By

View all
  • (2021)Automatically enforcing fresh and consistent inputs in intermittent systemsProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454081(851-866)Online publication date: 19-Jun-2021
  • (2020)Towards a formal foundation of intermittent computingProceedings of the ACM on Programming Languages10.1145/34282314:OOPSLA(1-31)Online publication date: 13-Nov-2020
  • (2020)Finding Bugs in File Systems with an Extensible Fuzzing FrameworkACM Transactions on Storage10.1145/339120216:2(1-35)Online publication date: 18-May-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2016
815 pages
ISBN:9781450335492
DOI:10.1145/2837614
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 51, Issue 1
    POPL '16
    January 2016
    815 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2914770
    • Editor:
    • Andy Gill
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 January 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Crash recovery
  2. formal verification
  3. program analysis

Qualifiers

  • Research-article

Funding Sources

Conference

POPL '16
Sponsor:

Acceptance Rates

Overall Acceptance Rate 824 of 4,130 submissions, 20%

Upcoming Conference

POPL '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)190
  • Downloads (Last 6 weeks)48
Reflects downloads up to 25 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Automatically enforcing fresh and consistent inputs in intermittent systemsProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454081(851-866)Online publication date: 19-Jun-2021
  • (2020)Towards a formal foundation of intermittent computingProceedings of the ACM on Programming Languages10.1145/34282314:OOPSLA(1-31)Online publication date: 13-Nov-2020
  • (2020)Finding Bugs in File Systems with an Extensible Fuzzing FrameworkACM Transactions on Storage10.1145/339120216:2(1-35)Online publication date: 18-May-2020
  • (2020)Intermittent Computing with Peripherals, Formally VerifiedThe 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3372799.3394365(85-96)Online publication date: 16-Jun-2020
  • (2019)Finding semantic bugs in file systems with an extensible fuzzing frameworkProceedings of the 27th ACM Symposium on Operating Systems Principles10.1145/3341301.3359662(147-161)Online publication date: 27-Oct-2019
  • (2018)FCatchACM SIGPLAN Notices10.1145/3296957.317716153:2(419-431)Online publication date: 19-Mar-2018
  • (2018)An empirical study on crash recovery bugs in large-scale distributed systemsProceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3236024.3236030(539-550)Online publication date: 26-Oct-2018
  • (2018)FCatchProceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3173162.3177161(419-431)Online publication date: 19-Mar-2018
  • (2018)Design and Verification of Restart-Robust Industrial Control SoftwareIntegrated Formal Methods10.1007/978-3-319-98938-9_4(47-68)Online publication date: 9-Aug-2018
  • (2017)Verifying a high-performance crash-safe file system using a tree specificationProceedings of the 26th Symposium on Operating Systems Principles10.1145/3132747.3132776(270-286)Online publication date: 14-Oct-2017
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media