DOI: 10.1145/1066157.1066160
Article

Fault-tolerance in the Borealis distributed stream processing system

Published: 14 June 2005

Abstract

We present a replication-based approach to fault-tolerant distributed stream processing in the face of node failures, network failures, and network partitions. Our approach aims to reduce the degree of inconsistency in the system while guaranteeing that available inputs capable of being processed are processed within a specified time threshold. This threshold allows a user to trade availability for consistency: a larger time threshold decreases availability but limits inconsistency, while a smaller threshold increases availability but produces more inconsistent results based on partial data. In addition, when failures heal, our scheme corrects previously produced results, ensuring eventual consistency.

Our scheme uses a data-serializing operator to ensure that all replicas process data in the same order, and thus remain consistent in the absence of failures. To regain consistency after a failure heals, we experimentally compare approaches based on checkpoint/redo and undo/redo techniques and illustrate the performance trade-offs between these schemes.
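The abstract compresses three mechanisms into one paragraph: a deterministic ordering of inputs across replicas, a time threshold after which a node stops waiting for missing input and produces tentative results, and a correction step once the failure heals. The Python sketch below is a toy model of how these pieces could fit together under stated assumptions, not the Borealis implementation. Everything in it is assumed for illustration: the names `Node`, `serialize`, `advance` and `tick` are hypothetical; each tuple is `(timestamp, stream_id, value)` and its timestamp doubles as its arrival time; each input stream announces a boundary up to which it is complete; the downstream operator is a running sum; and correction uses the checkpoint/redo variant.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model; none of these names come from Borealis.
Tup = Tuple[int, str, int]  # (timestamp, stream_id, value)


def serialize(tuples: List[Tup]) -> List[Tup]:
    # Deterministic order: replicas holding the same tuples emit the same
    # sequence, whatever order the tuples arrived in.
    return sorted(tuples, key=lambda t: (t[0], t[1]))


@dataclass
class Node:
    """One replica: a serializing input buffer feeding a running-sum operator."""
    streams: List[str]
    threshold: int                       # max wait before going tentative
    boundary: Dict[str, int] = field(default_factory=dict)
    pending: List[Tup] = field(default_factory=list)
    total: int = 0                       # operator state
    checkpoint: Optional[int] = None     # state saved before tentative input
    out: List[Tuple[int, str]] = field(default_factory=list)  # (sum, kind)

    def receive(self, tup: Tup) -> None:
        self.pending.append(tup)

    def advance(self, stream: str, ts: int) -> None:
        # The stream promises no more tuples with timestamp <= ts.
        self.boundary[stream] = max(self.boundary.get(stream, -1), ts)

    def _apply(self, batch: List[Tup], kind: str) -> None:
        for _, _, value in batch:
            self.total += value
            self.out.append((self.total, kind))

    def tick(self, now: int) -> None:
        ordered = serialize(self.pending)
        safe = min(self.boundary.get(s, -1) for s in self.streams)
        stable = [t for t in ordered if t[0] <= safe]
        blocked = [t for t in ordered if t[0] > safe]
        if stable:
            if self.checkpoint is not None:
                # Failure healed (checkpoint/redo): restore the saved state and
                # reprocess the now-complete input, correcting tentative output.
                self.total = self.checkpoint
                self.checkpoint = None
                self._apply(stable, "correction")
            else:
                self._apply(stable, "stable")
            self.pending = blocked
        # Missing input for longer than the threshold: favor availability and
        # process what is buffered as tentative. Simplification: tuples that
        # arrive after this point wait until the failure heals.
        if blocked and self.checkpoint is None and now - blocked[0][0] > self.threshold:
            self.checkpoint = self.total
            self._apply(blocked, "tentative")
```

As an illustration, with `threshold=2` and input streams `A` and `B`, a node whose stream `B` stops advancing emits a tentative sum once the oldest blocked tuple has waited more than two time units; when `B`'s late tuple and boundary arrive, the node restores the checkpointed sum and re-emits the affected results as corrections. An undo/redo variant would instead keep a log of the tentative updates and reverse them (for a running sum, by subtracting), trading the state copy for the log.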

Cited By

  • (2024) Snatch: Online Streaming Analytics at the Network Edge. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 349-369. DOI: 10.1145/3627703.3629577. Online publication date: 22-Apr-2024.
  • (2023) Adaptive Fragment-Based Parallel State Recovery for Stream Processing Systems. IEEE Transactions on Parallel and Distributed Systems, 34(8), pp. 2464-2478. DOI: 10.1109/TPDS.2023.3251997. Online publication date: Aug-2023.
  • (2023) Dynamic Adaptive Checkpoint Mechanism for Streaming Applications Based on Reinforcement Learning. 2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS), pp. 538-545. DOI: 10.1109/ICPADS56603.2022.00076. Online publication date: Jan-2023.

Information

Published In

ACM Conferences
SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
June 2005
990 pages
ISBN:1595930604
DOI:10.1145/1066157
  • Conference Chair: Fatma Ozcan
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2005

Qualifiers

  • Article

Conference

SIGMOD/PODS05

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 30
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 12 Nov 2024.

Citations

  • (2023) A survey on transactional stream processing. The VLDB Journal, 33(2), pp. 451-479. DOI: 10.1007/s00778-023-00814-z. Online publication date: 27-Sep-2023.
  • (2022) Does Social Media Usage Influence Selective Attention. International Journal of Cyber Behavior, Psychology and Learning, 12(1), pp. 1-15. DOI: 10.4018/IJCBPL.304905. Online publication date: 12-Jul-2022.
  • (2022) Enabling efficient and general subpopulation analytics in multidimensional data streams. Proceedings of the VLDB Endowment, 15(11), pp. 3249-3262. DOI: 10.14778/3551793.3551867. Online publication date: 1-Jul-2022.
  • (2022) Incremental Checkpointing for Fault-Tolerant Stream Processing Systems: A Data Structure Approach. IEEE Transactions on Emerging Topics in Computing, 10(1), pp. 124-136. DOI: 10.1109/TETC.2020.2986487. Online publication date: 1-Jan-2022.
  • (2022) S-QUERY: Opening the Black Box of Internal Stream Processor State. 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 1314-1327. DOI: 10.1109/ICDE53745.2022.00103. Online publication date: May-2022.
  • (2022) Radial Basis Function Network with Differential Privacy. Future Generation Computer Systems, 127(C), pp. 473-486. DOI: 10.1016/j.future.2021.09.013. Online publication date: 1-Feb-2022.
  • (2022) A comprehensive study on fault tolerance in stream processing systems. Frontiers of Computer Science: Selected Publications from Chinese Universities, 16(2). DOI: 10.1007/s11704-020-0248-x. Online publication date: 1-Apr-2022.
