Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2451116.2451170acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

DDOS: taming nondeterminism in distributed systems

Published: 16 March 2013 Publication History

Abstract

Nondeterminism complicates the development and management of distributed systems, and arises from two main sources: the local behavior of each individual node as well as the behavior of the network connecting them. Taming nondeterminism effectively requires dealing with both sources.
This paper proposes DDOS, a system that leverages prior work on deterministic multithreading to offer: 1) space-efficient record/replay of distributed systems; and 2) fully deterministic distributed behavior. Leveraging deterministic behavior at each node makes outgoing messages strictly a function of explicit inputs. This allows us to record the system by logging just message's arrival time, not the contents. Going further, we propose and implement an algorithm that makes all communication between nodes deterministic by scheduling communication onto a global logical timeline.
We implement both algorithms in a system called DDOS and evaluate our system with parallel scientific applications, an HTTP/memcached system and a distributed microbenchmark with a high volume of peer-to-peer communication. Our results show up to two orders of magnitude reduction in log size of record/replay, and that distributed systems can be made deterministic with an order of magnitude of overhead.

References

[1]
A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient System-Enforced Deterministic Parallelism. In OSDI, 2010.
[2]
M. Basrai and P. M. Chen. Cooperative revirt: Adapting message logging for intrusion analysis. Technical Report CSE-TR-504-04, University of Michigan, 2004.
[3]
T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman. CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution. In ASPLOS, 2010.
[4]
T. Bergan, N. Hunt, L. Ceze, and S. D. Gribble. Deterministic Process Groups in dOS. In OSDI, 2010.
[5]
E. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe and Efficient Concurrent Programming. In OOPSLA, 2009.
[6]
K. P. Birman. The Process Group Approach to Reliable Distributed Computing. Communications of the ACM, 36(12), December 1993.
[7]
J. Choi and H. Srinivasan. Deterministic Replay of Java Multithreaded Applications. In SIGMETRICS SPDT, 1998.
[8]
J. Devietti, B. Lucia, L. Ceze, and M. Oskin. DMP: Deterministic Shared Memory Multiprocessing. In ASPLOS, 2009.
[9]
J. Devietti, J. Nelson, T. Bergan, L. Ceze, and D. Grossman. RCDC: A Relaxed Consistency Deterministic Computer. In ASPLOS, 2011.
[10]
G. Dunlap, S. King, S. Cinar, M. Basrai, and P. Chen. ReVirt: Enabling Intrusion Analysis Through Virtual-Machine Logging and Replay. In OSDI, 2002.
[11]
G. Dunlap, D. Lucchetti, M. Fetterman, and P. Chen. Execution replay of multiprocessor virtual machines. In VEE, 2008.
[12]
S. A. Edwards and O. Tardieu. SHIM: A Deterministic Model for Heterogeneous Embedded Systems. In EMSOFT, 2005.
[13]
M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32:374--382, April 1985.
[14]
D. Geels, G. Altekar, S. Shenker, and I. Stoica. Abstract replay debugging for distributed applications. In USENIX Annual Technical Conference, 2009.
[15]
M. Hill and M. Xu. Racey: A Stress Test for Deterministic Execution. http://www.cs.wisc.edu/ markhill/racey.html.
[16]
D. Hower, P. Dudnik, D. Wood, and M. Hill. Calvin: Deterministic or Not? Free Will to Choose. In HPCA, 2011.
[17]
G. Kahn. The Semantics of a Simple Language for Parallel Programming. Information Processing, pages 471--475, 1974.
[18]
R. Konuru. Deterministic replay of distributed java applications. In In Proceedings of the 14th IEEE International Parallel and Distributed Processing Symposium, pages 219--228, 2000.
[19]
O. Laadan, N. Viennot, and J. Nieh. Transparent, Lightweight Application Execution Replay on Commodity Multiprocessor Operating Systems. In SIGMETRICS, 2010.
[20]
L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7), July 1978.
[21]
L. Lamport. Distributed Snapshots: Determining Global States of Distributed Systems. ACM TOCS, 3(1), 1985.
[22]
L. Lamport. The Part-Time Parliament. ACM TOCS, 16(2), 1998.
[23]
T. J. LeBlanc and J. M. Mellor-Crummey. Debugging Parallel Programs with Instant Replay. IEEE TC, 36(4), 1987.
[24]
B. C. Lee, E. Ipek, O. Mutlu, and D. Burger. Architecting Phase Change Memory as a Scalable DRAM Alternative. In ISCA, 2009.
[25]
NASA Advanced Supercomputing Division. The NAS Parallel Benchmarks. http://www.nas.nasa.gov/Resources/Software/npb.html.
[26]
M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient Deterministic Multithreading in Software. In ASPLOS, 2009.
[27]
D. Ongaro, S. M. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast Crash Recovery in RAMCloud. In SOSP, 2011.
[28]
M. Ronsse and K. D. Bosschere. RecPlay: A Fully Integrated Practical Record/Replay System. ACM TOCS, 17(2), 1999.
[29]
Y. Saito. Jockey: A User-Space Library for Record-Replay Debugging. In International Symposium on Automated Analysis-driven Debugging, 2005.
[30]
A. Thomson and D. J. Abadi. The case for determinism in database systems. In VLDB, 2010.
[31]
R. Van Renesse, K. Birman, and S. Maffeis. Horus: A Flexible Group Communication System. Communications of the ACM, 39(4), April 1996.
[32]
K. Veeraraghavan, D. Lee, B. Wester, J. Ouyang, P. M. Chen, J. Flinn, and S. Narayanasamy. DoublePlay: Parallelizing Sequential Logging and Replay. In ASPLOS, 2011.

Cited By

View all
  • (2023)Halfmoon: Log-Optimal Fault-Tolerant Stateful Serverless ComputingProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613154(314-330)Online publication date: 23-Oct-2023
  • (2022)Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization DeterminismProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569669(223-238)Online publication date: 8-Oct-2022
  • (2020)Reproducible ContainersProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378519(167-182)Online publication date: 9-Mar-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013
574 pages
ISBN:9781450318709
DOI:10.1145/2451116
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 48, Issue 4
    ASPLOS '13
    April 2013
    540 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2499368
    Issue’s Table of Contents
  • cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 41, Issue 1
    ASPLOS '13
    March 2013
    540 pages
    ISSN:0163-5964
    DOI:10.1145/2490301
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 March 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. determinism
  2. distributed systems
  3. record/replay

Qualifiers

  • Research-article

Conference

ASPLOS '13

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)20
  • Downloads (Last 6 weeks)6
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Halfmoon: Log-Optimal Fault-Tolerant Stateful Serverless ComputingProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613154(314-330)Online publication date: 23-Oct-2023
  • (2022)Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization DeterminismProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569669(223-238)Online publication date: 8-Oct-2022
  • (2020)Reproducible ContainersProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378519(167-182)Online publication date: 9-Mar-2020
  • (2019)Lazy Determinism for Faster Deterministic MultithreadingProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3297858.3304047(879-891)Online publication date: 4-Apr-2019
  • (2019)Semantics-aware scheduling policies for synchronization determinismProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295731(242-256)Online publication date: 16-Feb-2019
  • (2017)Monadic composition for deterministic, parallel batch processingProceedings of the ACM on Programming Languages10.1145/31338971:OOPSLA(1-26)Online publication date: 12-Oct-2017
  • (2016)Application-Level Determinism in Distributed Systems2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)10.1109/ICPADS.2016.0132(989-998)Online publication date: Dec-2016
  • (2015)Paxos made transparentProceedings of the 25th Symposium on Operating Systems Principles10.1145/2815400.2815427(105-120)Online publication date: 4-Oct-2015
  • (2015)RepFrameProceedings of the 6th Asia-Pacific Workshop on Systems10.1145/2797022.2797033(1-9)Online publication date: 27-Jul-2015
  • (2015)Marching BandProceedings of the 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing10.1109/PDP.2015.52(43-47)Online publication date: 4-Mar-2015
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media