DOI: 10.1145/1133373.1133387
Article

Rewind, repair, replay: three R's to dependability

Published: 01 July 2002

Abstract

Motivated by the growth of web and infrastructure services and their susceptibility to human operator-related failures, we introduce system-level undo as a recovery mechanism designed to improve service dependability. Undo enables system operators to recover from their inevitable mistakes and furthermore enables retroactive repair of problems that were not fixed quickly enough to prevent detrimental effects. We present the "three R's", a model of undo that matches the needs of human error recovery and retroactive repair; discuss several of the issues raised by this undo model; and introduce an initial architectural framework for undoable systems using the example of an undoable e-mail service system.
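As a rough illustration of the rewind, repair, replay sequence named in the title, the following Python sketch applies it to a toy in-memory mail store. It is not the paper's architecture: the class `UndoableMailStore`, its methods, and the filter-based failure scenario are all hypothetical names and assumptions chosen for this example, and a real service would need durable logs and would have to handle effects already seen by users.

```python
import copy


class UndoableMailStore:
    """Toy mail store (hypothetical) with checkpoint, rewind, repair, replay."""

    def __init__(self, accept):
        self.accept = accept        # operator-configured filter: body -> bool
        self.mailboxes = {}         # user -> list of delivered message bodies
        self.snapshot = {}          # state saved at the last checkpoint
        self.log = []               # user requests received since the checkpoint

    def checkpoint(self):
        # Save a copy of the current state and start a fresh request log.
        self.snapshot = copy.deepcopy(self.mailboxes)
        self.log = []

    def deliver(self, user, body):
        # Log the request first so it can be replayed even if it is dropped now.
        self.log.append((user, body))
        self._apply(user, body)

    def _apply(self, user, body):
        if self.accept(body):
            self.mailboxes.setdefault(user, []).append(body)

    def rewind(self):
        # Rewind: restore the state saved at the checkpoint.
        self.mailboxes = copy.deepcopy(self.snapshot)

    def repair(self, new_accept):
        # Repair: the operator replaces the faulty configuration.
        self.accept = new_accept

    def replay(self):
        # Replay: re-execute the logged requests against the repaired system.
        for user, body in self.log:
            self._apply(user, body)


# An operator mistake: a filter that silently drops any mail mentioning "invoice".
store = UndoableMailStore(accept=lambda body: "invoice" not in body)
store.checkpoint()
store.deliver("alice", "invoice #17")   # dropped by the bad filter
store.deliver("alice", "lunch?")        # delivered

store.rewind()                          # back to the checkpoint
store.repair(lambda body: True)         # fix the filter
store.replay()                          # both messages are now delivered
```

In this sketch the message that the faulty filter discarded is recovered because requests are logged before the filter runs, so the repaired system can process them again in their original order.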



Published In

EW 10: Proceedings of the 10th workshop on ACM SIGOPS European workshop
July 2002
258 pages
ISBN: 9781450378062
DOI: 10.1145/1133373

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 37 of 37 submissions, 100%

