On group communication in large-scale distributed systems

Published: 11 January 1995

Abstract

An increasing number of applications with reliability requirements are being deployed in distributed systems that span large geographic distances or manage large numbers of objects. We consider the process group mechanism as an appropriate application structuring paradigm in such large-scale distributed systems. We give a formal characterization for the attribute "large scale" as applied to distributed systems and examine the technical problems that need to be solved in making group technology scalable. Our design advocates multiple roles for group membership over a minimal set of abstractions and primitives. The design is currently being implemented on top of "off-the-shelf" technologies for both communication and computation.

Published In

ACM SIGOPS Operating Systems Review, Volume 29, Issue 1
Jan. 1995
94 pages
ISSN: 0163-5980
DOI: 10.1145/202453
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
