MillWheel: fault-tolerant stream processing at internet scale

Published: 01 August 2013, in Proceedings of the VLDB Endowment (PVLDB), Volume 6, Issue 11. Publisher: VLDB Endowment.

Abstract

MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees.
This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.
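
To make the programming model concrete, here is a minimal sketch in Python of a MillWheel-style computation: a per-key counter that emits one-minute window totals once the system's low watermark passes each window's end. The record-handler/timer split mirrors the API described in the paper, but the class, the in-memory harness, and all names below are hypothetical.

    # Illustrative sketch only: counts records per key in one-minute
    # windows and emits each window's total once the low watermark
    # passes the window's end. The record/timer split mirrors the
    # paper's API; this tiny harness and all names are hypothetical.
    from collections import defaultdict

    WINDOW_SEC = 60


    class WindowedCounter:
        def __init__(self):
            # Per-key persistent state; the real system checkpoints this
            # atomically in a backing store such as Bigtable.
            self.counts = defaultdict(lambda: defaultdict(int))
            self.timers = set()  # (fire_at, key), fired by the watermark

        def process_record(self, key, timestamp):
            window = timestamp - (timestamp % WINDOW_SEC)
            self.counts[key][window] += 1
            self.timers.add((window + WINDOW_SEC, key))

        def advance_low_watermark(self, watermark):
            # The framework fires timers whose deadline the watermark has
            # passed; the corresponding windows are then complete.
            due = sorted(t for t in self.timers if t[0] <= watermark)
            self.timers -= set(due)
            for fire_at, key in due:
                window = fire_at - WINDOW_SEC
                total = self.counts[key].pop(window, 0)
                print(f"produce: key={key} window={window} count={total}")


    node = WindowedCounter()
    for ts in (3, 10, 59, 61):
        node.process_record("query:foo", ts)
    node.advance_low_watermark(60)   # produce: ... window=0 count=3
    node.advance_low_watermark(120)  # produce: ... window=60 count=1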

Reviews

Reviewer: Peng Li

In this paper, the authors introduce MillWheel, a framework for processing real-time streaming data that is used at Google. It is based on the streaming-computation paradigm, in which users specify a computation graph and provide code for its individual nodes. Once the computation starts, the application receives continuous data and processes it as it arrives to keep latency low. MillWheel targets latency-sensitive applications, such as the detection of query spikes and dips, and the delivery of ads.

The MillWheel model features persistent storage, low watermarks, and duplicate prevention. Persistent storage holds each node's state, so that if a node fails, the system can resume the computation from the stored state without losing data. The low-watermark subsystem tracks, per node, a bound on the timestamps of records still in flight, which lets a computation know when a time window is complete. With these features, MillWheel processes continuous streams of records, each carrying a key and a timestamp. A user who wants to trigger an event at some future logical time can register it with the optional timer mechanism.

Fault tolerance is a key feature of MillWheel. Since the framework may run continuously on thousands of nodes, failures are frequent. When a node fails, its computed results must be preserved; if the node keeps state, that state must be restored so that subsequent computations remain correct; and once the node restarts, duplicated output must be suppressed. To achieve these goals, MillWheel guarantees exactly-once delivery with the help of persistent storage and record acknowledgment. For non-idempotent computations (in which reprocessing the same input may produce different results), MillWheel provides a mechanism called "strong productions" that checkpoints produced records before they are delivered downstream. This checkpointing can be unnecessary overhead for idempotent computations (in which the same input always yields the same output), so MillWheel offers the option of turning off strong productions and using "weak productions" instead, improving performance.

Computation state is kept both on disk and in memory: disk provides capacity for large amounts of state, while memory provides speed. To ensure consistency, all writes to state are wrapped in per-key atomic operations. Work migration or node restarts can, however, leave behind "zombie" writers and network remnants that issue stale writes; to prevent this, every write is tagged with a sequencer token, and writes carrying a stale sequencer are rejected. Users can also tune the granularity of state modification for performance, according to the expected failure rates.

MillWheel is implemented on Google clusters. Streams are delivered via remote procedure calls, load distribution is handled by a replicated master, and persistent state is kept in a distributed database such as Bigtable or Spanner. Low-watermark values are aggregated conservatively by a central authority; each interested consumer computes its own low watermark from its oldest pending records and the low watermarks of the senders it subscribes to. Experiments show that with weak productions, the median record delay at a node is a few milliseconds; with strong productions and exactly-once delivery enabled, the median delay rises to tens of milliseconds, still well within human reaction time.
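
The low-watermark aggregation just described amounts to a short recursion over the computation graph: a node's low watermark is the minimum of its own oldest pending work and the low watermarks of every node that sends to it. A minimal sketch, with all names and timestamps illustrative:

    # Minimal sketch of the low-watermark recursion: a node's low
    # watermark is the min of its own oldest pending work and the low
    # watermarks of its senders. All names and values are illustrative.

    def low_watermark(node, oldest_pending, senders, memo=None):
        if memo is None:
            memo = {}
        if node not in memo:
            memo[node] = min(
                [oldest_pending.get(node, float("inf"))]
                + [low_watermark(s, oldest_pending, senders, memo)
                   for s in senders.get(node, [])]
            )
        return memo[node]


    # Example pipeline: injector -> window_counter -> anomaly_notifier
    senders = {"window_counter": ["injector"],
               "anomaly_notifier": ["window_counter"]}
    oldest = {"injector": 300,        # has injected data up to t=300
              "window_counter": 290}  # oldest unfinished record: t=290
    print(low_watermark("anomaly_notifier", oldest, senders))  # -> 290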
A framework-level cache further reduces traffic to the storage layer. MillWheel is used in a variety of internal Google systems, such as ad delivery and image processing for Google Street View. The authors point out, however, that some applications do not suit MillWheel well: for example, if an application cannot be parallelized across different keys, bottleneck stages can slow down the whole computation.

Online Computing Reviews Service
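
As a final illustration of the exactly-once mechanism described in the review: a receiver suppresses duplicates by journaling the unique ID of every record it has already applied, so retried deliveries are acknowledged but not re-applied. The sketch below is a toy version with illustrative names; the real system commits the state change and the ID journal entry in one atomic write and screens IDs with a Bloom filter before consulting storage.

    # Toy sketch of receiver-side duplicate suppression by unique
    # record ID. Illustrative only: this version uses a plain
    # in-memory set in place of a persisted, Bloom-filtered journal.
    import uuid


    class Receiver:
        def __init__(self):
            self.seen_ids = set()  # persisted alongside per-key state
            self.total = 0         # the per-key state being updated

        def deliver(self, record_id, value):
            if record_id in self.seen_ids:
                return "duplicate"        # ack the retry, apply nothing
            self.total += value           # state change ...
            self.seen_ids.add(record_id)  # ... and ID journal, atomically
            return "acked"


    r = Receiver()
    rid = uuid.uuid4().hex             # sender assigns the ID at production
    print(r.deliver(rid, 5), r.total)  # acked 5
    print(r.deliver(rid, 5), r.total)  # duplicate 5 (retry is harmless)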
