Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Active Replication of Multithreaded Applications

Published: 01 May 2006 Publication History

Abstract

Software-based active replication is expensive in terms of performance overhead. Multithreading can help improve performance; however, thread scheduling is a source of nondeterminism in replica behavior. To achieve strong replica consistency in multithreaded environments, this paper proposes intercepting mutex lock/unlock operations performed by threads on accessing the shared data and contributes with two algorithmic solutions: 1) a Loose Synchronization Algorithm (LSA), which captures the natural concurrency in a leader replica and projects it on follower replicas through interreplica communication, and 2) a Preemptive Deterministic Scheduler (PDS) algorithm, which removes the need for interreplica communication through the notion of round and by suspending threads when it is unable (yet) to schedule them deterministically. Failure behavior and performance of LSA and PDS implementations are evaluated in a triplicated system and compared with existing solutions. A performance evaluation indicates that LSA and PDS outperform existing solutions, with PDS offering lower throughput than LSA. A fault-injection campaign shows that PDS is more robust to errors due to the absence of interreplica communication. Hence, LSA and PDS represent a trade-off between performance and dependability. Finally, LSA and PDS are demonstrated in replicating the Apache Web server, a substantial real-world application.

References

[1]
M. Cukier et al., “AQuA: An Adaptive Architecture that Provides Dependable Distributed Objects,” Proc. IEEE Symp. Reliable Distributed Systems, pp. 245-253, 1998.
[2]
A. Borg et al., “Fault Tolerance under UNIX,” ACM Trans. Computer Systems, vol. 7, no. 1, pp. 1-24, 1989.
[3]
T.C. Bressoud and F.B. Schneider, “Hypervisor-Based Fault Tolerance,” ACM Trans. Computer Systems, vol. 14, no. 1, pp. 80-107, 1996.
[4]
P.A. Barrett et al., “The Delta-4 Extra Performance Architecture (XPA),” Proc. Int'l Symp. Fault-Tolerant Computing, pp. 481-488, 1990.
[5]
Theory and Practice of Object Systems, vol. 4, no. 2, pp. 81-92, 1998.
[6]
R. Jimenez-Peris, M. Patino-Martinez, and S. Arevalo, “Deterministic Scheduling for Transactional Multithreaded Replicas,” Proc. IEEE Symp. Reliable Distributed Systems, pp. 164-173, 2000.
[7]
C. Basile, K. Whisnant, Z. Kalbarczyk, and R. Iyer, “Loose Synchronization of Multithreaded Replicas,” Proc. IEEE Symp. Reliable Distributed Systems, pp. 250-255, 2002.
[8]
C. Basile, Z. Kalbarczyk, and R. Iyer, “Preemptive Deterministic Scheduling Algorithm for Multithreaded Replicas,” Proc. Int'l Conf. Dependable Systems and Networks, pp. 149-158, 2003.
[9]
M. Hayden, “The Ensemble System,” PhD dissertation, Dept. of Computer Science, Cornell Univ. 1997.
[10]
M.K. Reiter, “The Rampart Toolkit for Building High-Integrity Services,” Lecture Notes in Computer Science, vol. 938, pp. 99-110, 1994.
[11]
G. Chockler, I. Keidar, and R. Vitenberg, “Group Communication Specifications: A Comprehensive Study,” ACM Computing Surveys, vol. 33, no. 4, pp. 427-469, 2001.
[12]
R. Jimenez-Peris, M. Patino-Martinez, and G. Alonso, “Non-Intrusive, Parallel Recovery of Replicated Data,” Proc. IEEE Symp. Reliable Distributed Systems, pp. 150-159, 2002.
[13]
C. Basile, Z. Kalbarczyk, and R. Iyer, “Active Replication of Multithreaded Replicas: Appendix,”
[14]
L. Lamport, “Time, Clocks and the Ordering of Events in Distributed Systems,” Comm. ACM, vol. 21, no. 7, pp. 558-564, 1978.
[15]
F.B. Schneider, “Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial,” ACM Computing Surveys, vol. 22, no. 4, pp. 299-319, 1990.
[16]
C. Basile, Z. Kalbarczyk, K. Whisnant, and R. Iyer, “Active Replication of Multithreaded Applications,” Technical Report CRHC-02-01, Univ. of Illinois at Urbana-Champaign, 2002.
[17]
D. Stott, B. Floering, Z. Kalbarczyk, and R. Iyer, “Dependability Assessment in Distributed Systems with Lightweight Fault Injectors in NFTAPE,” Proc. Int'l Computer Performance and Dependability Symp., 2000.
[18]
E. Fuchs, “Validating the Fail-Silence Assumption of the MARS Architecture,” Proc. Dependable Computing for Critical Applications Conf., pp. 225-247, 1998.
[19]
H. Madeira and J.G. Silva, “Experimental Evaluation of the Fail-Silent Behavior in Computers without Error Masking,” Proc. Int'l Symp. Fault-Tolerant Computing, pp. 350-359, 1994.
[20]
M. Rimen, J. Ohlsson, and J. Torin, “On Microprocessor Error Behavior Modeling,” Proc. Int'l Symp. Fault-Tolerant Computing, pp. 76-85, 1994.
[21]
C. Basile, L. Wang, Z. Kalbarczyk, and R. Iyer, “Group Communication Protocols under Errors,” Proc. IEEE Symp. Reliable Distributed Systems, pp. 35-44, 2003.
[22]
S. Pleisch and A. Schiper, “FATOMAS— A Fault-Tolerant Mobile Agent System Based on the Agent-Dependent Approach,” Proc. Int'l Conf. Dependable Systems and Networks, pp. 215-224, 2001.
[23]
G.D. Parrington, S.K. Shrivastava, S.M. Wheater, and M.C. Little, “The Design and Implementation of Arjuna,” Computing Systems, vol. 8, no. 2, pp. 255-308, 1995.
[24]
R. Guerraoui, P. Felber, B. Garbinato, and K.R. Mazouni, “System Support for Object Groups,” Proc. ACM Conf. Object-Oriented Programming Systems, Languages, and Applications, pp. 244-258, 1998.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems  Volume 17, Issue 5
May 2006
94 pages

Publisher

IEEE Press

Publication History

Published: 01 May 2006

Author Tags

  1. Fault tolerance
  2. fault injection.
  3. multithreading
  4. nondeterminism
  5. replication

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 27 Nov 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media