Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3406085.3409009acmconferencesArticle/Chapter ViewAbstractPublication PagesicfpConference Proceedingsconference-collections
research-article

Teaching practical realistic verification of distributed algorithms in Erlang with TLA+

Published: 23 August 2020 Publication History

Abstract

Distributed systems are inherently complex as they need to address the interplay between features like communication, concurrency, and failure. Due to the inherent complexity of these interacting features, it is typically not possible to systematically test these kind of systems; yet, unexpected and unlikely combinations of events might cause corner cases that are hard to find. But since these systems are running typically for long durations, these events are likely to materialize eventually and must be handled correctly. Caught in such a dilemma, students are able to experience the benefits of applying verification tools to check their own algorithms and implementations. Having executable models with automatically generated executions allows them to experiment with different solutions by iteratively adapting and refining their algorithms.
In this experience report, we report on our experience of teaching verification in a (hands-on) distributed systems course. We argue that broadcast algorithms provide a sweet spot in design and verification complexity. To this end, we give an implementation of these algorithms in Erlang and derive a TLA+ specification. TLA+ is a formal language for describing and reasoning about distributed and concurrent systems and provides a model checker, TLC, among other things. Our study reveals interesting parallels between the Erlang and TLA+ code, while exposing the challenges of formally modeling communication and parallelism in distributed systems. Presenting selected aspects of our course design, we aim to motivate the feasibility and need for introducing verification in close correspondence to programming tasks.

References

[1]
Alzahrani, N., Spichkova, M., Blech, J.O. : From temporal models to property-based testing. In: Damiani, E., Spanoudakis, G., Maciaszek, L.A. (eds.) ENASE 2017-Proceedings of the 12th International Conference on Evaluation of Novel Approaches to Software Engineering, Porto, Portugal, April 28-29, 2017. pp. 241-246. SciTePress ( 2017 ), htps://doi.org/10.5220/0006340302410246
[2]
Armstrong, J. : Making reliable distributed systems in the presence of software errors. Ph.D. thesis, Royal Institute of Technology, Stockholm, Sweden ( 2003 ), htp://erlang.org/download/armstrong_thesis_2003. pdf
[3]
Birman, K.P., Joseph, T.A. : Reliable communication in the presence of failures. ACM Trans. Comput. Syst. 5 ( 1 ), 47-76 ( 1987 ), htps://doi.org/ 10.1145/7351.7478
[4]
Cachin, C., Guerraoui, R., Rodrigues, L.E.T. : Introduction to Reliable and Secure Distributed Programming (2. ed.). Springer ( 2011 ), htps: //doi.org/10.1007/978-3-642-15260-3
[5]
Cassar, I., Francalanza, A. : On synchronous and asynchronous monitor instrumentation for actor-based systems. In: Cámara, J., Proença, J. (eds.) Proceedings 13th International Workshop on Foundations of Coordination Languages and Self-Adaptive Systems, FOCLASA 2014, Rome, Italy, 6th September 2014. EPTCS, vol. 175, pp. 54-68 ( 2014 ), htps://doi.org/10.4204/EPTCS.175.4
[6]
Défago, X., Schiper, A., Urbán, P. : Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Comput. Surv. 36 ( 4 ), 372-421 ( 2004 ), htps://doi.org/10.1145/1041680.1041682
[7]
Demirbas, M. : My experience with using tla+ in distributed systems class. htp://muratbufalo.blogspot.com/ 2014 /08/using-tla-forteaching-distributed.html, accessed: 2019-06-14
[8]
Gotovos, A., Christakis, M., Sagonas, K. : Test-driven development of concurrent programs using concuerror. In: Rikitake, K., Stenman, E. (eds.) Proceedings of the 10th ACM SIGPLAN workshop on Erlang, Tokyo, Japan, September 23, 2011. pp. 51-61. ACM ( 2011 ), htps://doi. org/10.1145/2034654.2034664
[9]
Gotovos, A., Christakis, M., Sagonas, K. : Test-driven development of concurrent programs using concuerror. In: Proceedings of the 10th ACM SIGPLAN Workshop on Erlang. p. 51-61. Erlang ' 11, Association for Computing Machinery, New York, NY, USA ( 2011 ), htps://doi.org/ 10.1145/2034654.2034664
[10]
Hadzilacos, V., Toueg, S.: A modular approach to fault-tolerant broadcasts and related problems. Tech. rep., USA ( 1994 )
[11]
Jin, X., Li, X., Zhang, H., Foster, N., Lee, J., Soulé, R., Kim, C., Stoica, I. : Netchain: Scale-free sub-rtt coordination. In: 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). pp. 35-49. USENIX Association, Renton, WA ( 2018 ), htps://www.usenix.org/ conference/nsdi18/presentation/jin
[12]
Lamport, L. : The temporal logic of actions. ACM Trans. Program. Lang. Syst. 16 ( 3 ), 872-923 ( 1994 ), htps://doi.org/10.1145/177492.177726
[13]
Lamport, L. : Specifying Systems, The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley ( 2002 )
[14]
Lamport, L. : Teaching concurrency. SIGACT News 40 ( 1 ), 58-62 ( 2009 ), htps://doi.org/10.1145/1515698.1515713
[15]
Michael, E., Woos, D., Anderson, T., Ernst, M.D., Tatlock, Z. : Teaching rigorous distributed systems with eficient model checking. In: Proceedings of the Fourteenth EuroSys Conference 2019. EuroSys '19, Association for Computing Machinery, New York, NY, USA ( 2019 ), htps://doi.org/10.1145/3302424.3303947
[16]
Mu, S., Nelson, L., Lloyd, W., Li, J. : Consolidating concurrency control and consensus for commits under conflicts. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI'16). pp. 517-532 ( 2016 )
[17]
Newcombe, C. : Why Amazon chose TLA+. In: Abstract State Machines, Alloy, B, TLA, VDM, and Z. pp. 25-39. Springer ( 2014 ). https://doi.org/10.1007/978-3-662-43652-3
[18]
Newcombe, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., Deardeuf, M. : How Amazon web services uses formal methods. Communications of the ACM 58 ( 4 ), 66-73 ( 2015 )
[19]
Raynal, M. : Fault-Tolerant Message-Passing Distributed Systems-An Algorithmic Approach. Springer ( 2018 ), htps://doi.org/10.1007/978-3-319-94141-7
[20]
Schneider, F.B. : Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22 ( 4 ), 299-319 ( 1990 ), htps://doi.org/10.1145/98163.98167
[21]
Svensson, H., Fredlund, L., Earle, C.B. : A unified semantics for future erlang. In: Fritchie, S.L., Sagonas, K. (eds.) Proceedings of the 9th ACM SIGPLAN workshop on Erlang, Baltimore, Maryland, USA, September 30, 2010. pp. 23-32. ACM ( 2010 ), htps://doi.org/10.1145/1863509. 1863514

Cited By

View all
  • (2024)Erla⁺: Translating TLA⁺ Models into Executable Actor-Based ImplementationsProceedings of the 23rd ACM SIGPLAN International Workshop on Erlang10.1145/3677995.3678190(13-23)Online publication date: 28-Aug-2024
  • (2023)Extending PlusCal for Modeling Distributed AlgorithmsiFM 202310.1007/978-3-031-47705-8_17(321-340)Online publication date: 6-Nov-2023
  • (2021)Iterative Approach to TLC Model Checker Application2021 IEEE 2nd KhPI Week on Advanced Technology (KhPIWeek)10.1109/KhPIWeek53812.2021.9570055(283-287)Online publication date: 13-Sep-2021

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
Erlang 2020: Proceedings of the 19th ACM SIGPLAN International Workshop on Erlang
August 2020
52 pages
ISBN:9781450380492
DOI:10.1145/3406085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 August 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Broadcast algorithms
  2. Distributed algorithms
  3. Formal verification
  4. TLA+

Qualifiers

  • Research-article

Funding Sources

Conference

ICFP '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 51 of 68 submissions, 75%

Upcoming Conference

ICFP '25
ACM SIGPLAN International Conference on Functional Programming
October 12 - 18, 2025
Singapore , Singapore

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)17
  • Downloads (Last 6 weeks)1
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Erla⁺: Translating TLA⁺ Models into Executable Actor-Based ImplementationsProceedings of the 23rd ACM SIGPLAN International Workshop on Erlang10.1145/3677995.3678190(13-23)Online publication date: 28-Aug-2024
  • (2023)Extending PlusCal for Modeling Distributed AlgorithmsiFM 202310.1007/978-3-031-47705-8_17(321-340)Online publication date: 6-Nov-2023
  • (2021)Iterative Approach to TLC Model Checker Application2021 IEEE 2nd KhPI Week on Advanced Technology (KhPIWeek)10.1109/KhPIWeek53812.2021.9570055(283-287)Online publication date: 13-Sep-2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media