Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/1558977.1558992guideproceedingsArticle/Chapter ViewAbstractPublication PagesnsdiConference Proceedingsconference-collections
Article

MODIST: transparent model checking of unmodified distributed systems

Published: 22 April 2009 Publication History

Abstract

MODIST is the first model checker designed for transparently checking unmodified distributed systems running on unmodified operating systems. It achieves this transparency via a novel architecture: a thin interposition layer exposes all actions in a distributed system and a centralized, OS-independent model checking engine explores these actions systematically. We made MODIST practical through three techniques: an execution engine to simulate consistent, deterministic executions and failures; a virtual clock mechanism to avoid false positives and false negatives; and a state exploration framework to incorporate heuristics for efficient error detection.
We implemented MODIST on Windows and applied it to three well-tested distributed systems: Berkeley DB, a widely used open source database; MPS, a deployed Paxos implementation; and PACIFICA, a primary-backup replication protocol implementation. MODIST found 35 bugs in total. Most importantly, it found protocol-level bugs (i.e., flaws in the core distributed protocols) in every system checked: 10 in total, including 2 in Berkeley DB, 2 in MPS, and 6 in PACIFICA.

References

[1]
T. Ball and S. K. Rajamani. Automatically validating temporal safety properties of interfaces. In Proceedings of the Eighth International SPIN Workshop on Model Checking of Software (SPIN '01), pages 103-122, May 2001.
[2]
W. Bush, J. Pincus, and D. Sielaff. A static analyzer for finding dynamic programming errors. Software: Practice and Experience, 30(7):775-802, 2000.
[3]
C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler. EXE: automatically generating inputs of death. In Proceedings of the 13th ACM conference on Computer and communications security (CCS '06), pages 322-335, Oct.-Nov. 2006.
[4]
C. Cadar, D. Dunbar, and D. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the Eighth Symposium on Operating Systems Design and Implementation (OSDI '08), pages 209-224, Dec. 2008.
[5]
S. Chandra, P. Godefroid, and C. Palm. Software model checking in practice: an industrial case study. In Proceedings of the 24th International Conference on Software Engineering (ICSE '02), pages 431-441, May 2002.
[6]
J. C. Corbett, M. B. Dwyer, J. Hatcliff, S. Laubach, C. S. Pasareanu, Robby, and H. Zheng. Bandera: Extracting finite-state models from Java source code. In Proceedings of the 22nd International Conference on Software Engineering (ICSE '00), pages 439-448, June 2000.
[7]
M. Das, S. Lerner, and M. Seigle. Esp: path-sensitive program verification in polynomial time. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (PLDI '02), pages 57-68, June 2002.
[8]
D. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation (OSDI '00), Sept. 2000.
[9]
C. Flanagan and P. Godefroid. Dynamic partial-order reduction for model checking software. In Proceedings of the 32nd Annual Symposium on Principles of Programming Languages (POPL '05), pages 110-121, Jan. 2005.
[10]
C. Flanagan, K. Leino, M. Lillibridge, G. Nelson, J. Saxe, and R. Stata. Extended static checking for Java. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (PLDI '02), pages 234-245, June 2002.
[11]
D. Geels, G. Altekarz, P. Maniatis, T. Roscoey, and I. Stoicaz. Friday: Global comprehension for distributed replay. In Proceedings of the Fourth Symposium on Networked Systems Design and Implementation (NSDI '07), Apr. 2007.
[12]
P. Godefroid. Model Checking for Programming Languages using VeriSoft. In Proceedings of the 24th Annual Symposium on Principles of Programming Languages (POPL '97), pages 174-186, Jan. 1997.
[13]
P. Godefroid, N. Klarlund, and K. Sen. Dart: Directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation (PLDI '05), pages 213- 223, June 2005.
[14]
C. Gray and D. Cheriton. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In Proceedings of the 12th ACM Symposium on Operating Systems Principles (SOSP '89), pages 202-210, Dec. 1989.
[15]
Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, and Z. Zhang. R2: An application-level kernel for record and replay. In Proceedings of the Eighth Symposium on Operating Systems Design and Implementation (OSDI '08), pages 193-208, Dec. 2008.
[16]
R. Hastings and B. Joyce. Purify: Fast detection of memory leaks and access errors. In Proceedings of the USENIX Winter Technical Conference (USENIX Winter '92), pages 125-138, Dec. 1992.
[17]
G. J. Holzmann. The model checker SPIN. Software Engineering, 23(5): 279-295, 1997.
[18]
G. J. Holzmann. From code to models. In Proceedings of the Second International Conference on Applications of Concurrency to System Design (ACSD '01), June 2001.
[19]
C. Killian, J. W. Anderson, R. Jhala, and A. Vahdat. Life, death, and the critical transition: Finding liveness bugs in systems code. In Proceedings of the Fourth Symposium on Networked Systems Design and Implementation (NSDI '07), pages 243-256, April 2007.
[20]
C. E. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. M. Vahdat. Mace: language support for building distributed systems. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI '07), pages 179-188, June 2007.
[21]
R. Konuru, H. Srinivasan, and J.-D. Choi. Deterministic replay of distributed Java applications. In Proceedings of the 14th International Symposium on Parallel and Distributed Processing (IPDPS '00), pages 219-228, May 2000.
[22]
L. Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133-169, 1998.
[23]
L. Lamport. Time, clocks, and the ordering of events in a distributed system. Comm. ACM, 21(7):558-565, 1978.
[24]
W. Lin, M. Yang, L. Zhang, and L. Zhou. Pacifica: Replication in log-based distributed storage systems. Technical report.
[25]
X. Liu, Z. Guo, X. Wang, F. Chen, X. Lian, J. Tang, M. Wu, M. F. Kaashoek, and Z. Zhang. D3s: Debugging deployed distributed systems. In Proceedings of the Fifth Symposium on Networked Systems Design and Implementation (NSDI '08), pages 423-437, Apr. 2008.
[26]
F. Mattern. Virtual time and global states of distributed systems. In Proceedings of the International Workshop on Parallel and Distributed Algorithms , pages 215-226. 1988.
[27]
M. Musuvathi, D. Y. Park, A. Chou, D. R. Engler, and D. L. Dill. CMC: A pragmatic approach to model checking real code. In Proceedings of the Fifth Symposium on Operating Systems Design and Implementation (OSDI '02), pages 75-88, Dec. 2002.
[28]
M. Musuvathi, S. Qadeer, T. Ball, G. Basler, P. A. Nainar, and I. Neamtiu. Finding and reproducing heisenbugs in concurrent programs. In Proceedings of the Eighth Symposium on Operating Systems Design and Implementation (OSDI '08), pages 267-280, Dec. 2008.
[29]
N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI '07), pages 89-100, June 2007.
[30]
Phoenix. http://research.microsoft.com/Phoenix/.
[31]
K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In 5th joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE'05), pages 263-272, Sept. 2005.
[32]
S. M. Srinivasan, S. Kandula, C. R. Andrews, and Y. Zhou. Flashback: A lightweight extension for rollback and deterministic replay for software debugging. In Proceedings of the USENIX Annual Technical Conference (USENIX '04), pages 29-44, June 2004.
[33]
The Coverity Software Analysis Toolset. http://coverity.com.
[34]
W. Visser, K. Havelund, G. Brat, S. Park, and F. Lerda. Model checking programs. Automated Software Engineering, 10(2):203-232, 2003.
[35]
A. Vo, S. Vakkalanka, M. DeLisi, G. Gopalakrishnan, R. M. Kirby, and R. Thakur. Formal verification of practical mpi programs. In Principles and Practices of Parallel Programming (PPoPP), pages 261-260, Feb. 2009.
[36]
Windows API. http://msdn.microsoft.com/.
[37]
M. Yabandeh, N. Knezevic, D. Kostic, and V. Kuncak. CrystalBall: Predicting and preventing inconsistencies in deployed distributed systems. In Proceedings of the Sixth Symposium on Networked Systems Design and Implementation (NSDI '09), Apr. 2009.
[38]
J. Yang, P. Twohey, D. Engler, and M. Musuvathi. Using model checking to find serious file system errors. In Proceedings of the Sixth Symposium on Operating Systems Design and Implementation (OSDI '04), pages 273- 288, Dec. 2004.
[39]
J. Yang, C. Sar, and D. Engler. Explode: a lightweight, general system for finding serious storage system errors. In Proceedings of the Seventh Symposium on Operating Systems Design and Implementation (OSDI '06), pages 131-146, Nov. 2006.

Cited By

View all
  • (2024)Generalized Concurrency Testing Tool for Distributed SystemsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3685309(1861-1865)Online publication date: 11-Sep-2024
  • (2024)An Empirical Study on Kubernetes Operator BugsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680396(1746-1758)Online publication date: 11-Sep-2024
  • (2022)Who goes first? detecting go concurrency bugs via message reorderingProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507753(888-902)Online publication date: 28-Feb-2022
  • Show More Cited By

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
NSDI'09: Proceedings of the 6th USENIX symposium on Networked systems design and implementation
April 2009
480 pages

Sponsors

  • USENIX Assoc: USENIX Assoc

Publisher

USENIX Association

United States

Publication History

Published: 22 April 2009

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 30 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Generalized Concurrency Testing Tool for Distributed SystemsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3685309(1861-1865)Online publication date: 11-Sep-2024
  • (2024)An Empirical Study on Kubernetes Operator BugsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680396(1746-1758)Online publication date: 11-Sep-2024
  • (2022)Who goes first? detecting go concurrency bugs via message reorderingProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507753(888-902)Online publication date: 28-Feb-2022
  • (2021)QuickSilver: modeling and parameterized verification for distributed agreement-based systemsProceedings of the ACM on Programming Languages10.1145/34855345:OOPSLA(1-31)Online publication date: 15-Oct-2021
  • (2021)A multiparty session typing discipline for fault-tolerant event-driven distributed programmingProceedings of the ACM on Programming Languages10.1145/34855015:OOPSLA(1-30)Online publication date: 15-Oct-2021
  • (2021)Reasoning about modern datacenter infrastructures using partial historiesProceedings of the Workshop on Hot Topics in Operating Systems10.1145/3458336.3465276(213-220)Online publication date: 1-Jun-2021
  • (2021)FoundationDB: A Distributed Unbundled Transactional Key Value StoreProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457559(2653-2666)Online publication date: 9-Jun-2021
  • (2020)Learning-based controlled concurrency testingProceedings of the ACM on Programming Languages10.1145/34282984:OOPSLA(1-31)Online publication date: 13-Nov-2020
  • (2019)Trace aware random testing for distributed systemsProceedings of the ACM on Programming Languages10.1145/33606063:OOPSLA(1-29)Online publication date: 10-Oct-2019
  • (2019)The Case for Learning-and-System Co-designACM SIGOPS Operating Systems Review10.1145/3352020.335203153:1(68-74)Online publication date: 25-Jul-2019
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media