research-article

Open access

Adaptive online scheduling in storm

Authors:

Leonardo Aniello,

Roberto Baldoni,

Leonardo QuerzoniAuthors Info & Claims

DEBS '13: Proceedings of the 7th ACM international conference on Distributed event-based systems

Pages 207 - 218

https://doi.org/10.1145/2488222.2488267

Published: 29 June 2013 Publication History

Abstract

Today we are witnessing a dramatic shift toward a data-driven economy, where the ability to efficiently and timely analyze huge amounts of data marks the difference between industrial success stories and catastrophic failures. In this scenario Storm, an open source distributed realtime computation system, represents a disruptive technology that is quickly gaining the favor of big players like Twitter and Groupon. A Storm application is modeled as a topology, i.e. a graph where nodes are operators and edges represent data flows among such operators. A key aspect in tuning Storm performance lies in the strategy used to deploy a topology, i.e. how Storm schedules the execution of each topology component on the available computing infrastructure.

In this paper we propose two advanced generic schedulers for Storm that provide improved performance for a wide range of application topologies. The first scheduler works offline by analyzing the topology structure and adapting the deployment to it; the second scheduler enhance the previous approach by continuously monitoring system performance and rescheduling the deployment at run-time to improve overall performance. Experimental results show that these algorithms can produce schedules that achieve significantly better performances compared to those produced by Storm's default scheduler.

References

[1]

Gartner says big data creates big jobs: 4.4 million it jobs globally to support big data by 2015. http://www.gartner.com/newsroom/id/2207915.

[2]

Storm. http://storm-project.net/.

[3]

Streammine3g. https://streammine3g.inf.tu-dresden.de/trac.

[4]

Data never sleeps infographic. http://www.domo.com/learn/7/236#videos-and-infographics, 2012.

[5]

D. J. Abadi, Y. Ahmad, M. Balazinska, M. Cherniack, J. hyon Hwang, W. Lindner, A. S. Maskey, E. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The design of the borealis stream processing engine. In Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research, 2005.

[6]

L. Amini, H. Andrade, R. Bhagwan, F. Eskesen, R. King, P. Selo, Y. Park, and C. Venkatramani. Spc: a distributed, scalable platform for data mining. In Proceedings of the 4th international workshop on Data mining standards, services and platforms, 2006.

Digital Library

[7]

L. Aniello, L. Querzoni, and R. Baldoni. Input data organization for batch processing in time window based computations. In Proceedings of the 28th Symposium On Applied Computing, 2013.

Digital Library

[8]

R. Baldoni, G. A. Di Luna, D. Firmani, and G. Lodi. A model for continuous query latencies in data streams. In Proceedings of the 1st International Workshop on Algorithms and Models for Distributed Event Processing, 2011.

Digital Library

[9]

S. Bansal, P. Kumar, and K. Singh. An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems. IEEE Transactions on Parallel and Distributed Systems, 2003.

Digital Library

[10]

Y. Baram, R. El-Yaniv, and K. Luz. Online choice of active learning algorithms. The Journal of Machine Learning Research, 2004.

Digital Library

[11]

A. Brito, A. Martin, T. Knauth, S. Creutz, D. Becker, S. Weigert, and C. Fetzer. Scalable and low-latency data processing with stream mapreduce. In Proceedings of the 2011 IEEE Third International Conference on Cloud Computing Technology and Science, 2011.

Digital Library

[12]

M. Cammert, C. Heinz, J. Kramer, B. Seeger, S. Vaupel, and U. Wolske. Flexible multi-threaded scheduling for continuous queries over data streams. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop, 2007.

Digital Library

[13]

T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. Mapreduce online. In Proceedings of the 7th USENIX conference on Networked systems design and implementation, 2010.

Digital Library

[14]

J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 2008.

Digital Library

[15]

C. Eaton, D. Deroos, T. Deutsch, G. Lapis, and P. Zikopoulos. Understanding Big Data. Mc Graw Hill, 2012.

[16]

A. H. Hormati, Y. Choi, M. Kudlur, R. Rabbah, T. Mudge, and S. Mahlke. Flextream: Adaptive compilation of streaming applications for heterogeneous architectures. In Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009.

Digital Library

[17]

F. P. Junqueira and B. C. Reed. The life and times of a zookeeper. In Proceedings of the 21st annual symposium on Parallelism in algorithms and architectures, 2009.

Digital Library

[18]

Y.-K. Kwok and I. Ahmad. Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 1996.

Digital Library

[19]

G. T. Lakshmanan, Y. G. Rabinovich, and O. Etzion. A stratified approach for supporting high throughput event processing applications. In Proceedings of the 3rd ACM International Conference on Distributed Event-Based Systems, 2009.

Digital Library

[20]

A. Martin, C. Fetzer, and A. Brito. Active replication at (almost) no cost. In Proceedings of the 2011 IEEE 30th International Symposium on Reliable Distributed Systems, 2011.

Digital Library

[21]

A. Martin, T. Knauth, S. Creutz, D. Becker, S. Weigert, C. Fetzer, and A. Brito. Low-overhead fault tolerance for high-throughput data processing systems. In Proceedings of the 2011 31st International Conference on Distributed Computing Systems, 2011.

Digital Library

[22]

L. A. Moakar, A. Labrinidis, and P. K. Chrysanthis. Adaptive class-based scheduling of continuous queries. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering Workshops, 2012.

Digital Library

[23]

L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, 2010.

Digital Library

[24]

P. Pietzuch, J. Ledlie, J. Shneidman, M. Roussopoulos, M. Welsh, and M. Seltzer. Network-aware operator placement for stream-processing systems. In Proceedings of the 22nd International Conference on Data Engineering, 2006.

Digital Library

[25]

M. A. Sharaf, P. K. Chrysanthis, and A. Labrinidis. Preemptive rate-based operator scheduling in a data stream management system. In Proceedings of the ACS/IEEE 2005 International Conference on Computer Systems and Applications, 2005.

Digital Library

[26]

M. A. Suleman, M. K. Qureshi, Khubaib, and Y. N. Patt. Feedback-directed pipeline parallelism. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, 2010.

Digital Library

[27]

T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2012.

Digital Library

[28]

J. Wolf, N. Bansal, K. Hildrum, S. Parekh, D. Rajan, R. Wagle, K.-L. Wu, and L. Fleischer. Soda: an optimizing scheduler for large-scale stream-based distributed computer systems. In Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware, 2008.

Digital Library

[29]

Y. Xing, J.-H. Hwang, U. Çetintemel, and S. Zdonik. Providing resiliency to load variations in distributed stream processing. In Proceedings of the 32nd international conference on Very large data bases, 2006.

Digital Library

[30]

Y. Xing, S. Zdonik, and J.-H. Hwang. Dynamic load distribution in the borealis stream processor. In Proceedings of the 21st International Conference on Data Engineering, 2005.

Digital Library

Cited By

Chatziliadis XZacharatou EEracar AZeuch SMarkl V(2024)Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed TopologiesProceedings of the VLDB Endowment10.14778/3648160.364818617:6(1501-1514)Online publication date: 1-Feb-2024
https://dl.acm.org/doi/10.14778/3648160.3648186
Kozar ADel Monte BZeuch SMarkl V(2024)Fault Tolerance Placement in the Internet of ThingsProceedings of the ACM on Management of Data10.1145/36549412:3(1-29)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654941
Heinrich RBinnig CKornmayer HLuthra M(2024)Costream: Learned Cost Models for Operator Placement in Edge-Cloud Environments2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00015(96-109)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00015
Show More Cited By

Index Terms

Adaptive online scheduling in storm
1. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Recommendations

R-Storm: Resource-Aware Scheduling in Storm
Middleware '15: Proceedings of the 16th Annual Middleware Conference

The era of big data has led to the emergence of new systems for real-time distributed stream processing, e.g., Apache Storm is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems ...
Scheduling of deteriorating jobs with release dates to minimize the maximum lateness

In this paper, we consider the problem of scheduling n deteriorating jobs with release dates on a single (batching) machine. Each job's processing time is a simple linear function of its starting time. The objective is to minimize the maximum lateness. ...
Fair Online Scheduling for Selfish Jobs on Heterogeneous Machines
SPAA '16: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures

Scheduling jobs on multiple machines has numerous applications and has been a central topic of research in the scheduling literature. Recently, much progress has been made particularly in online scheduling with the development of powerful analysis ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

DEBS '13: Proceedings of the 7th ACM international conference on Distributed event-based systems

June 2013

360 pages

ISBN:9781450317580

DOI:10.1145/2488222

General Chairs:
Sharma Chakravarthy
The University of Texas at Arlington, USA
,
Susan D. Urban
Texas Tech University, USA
,
Program Chairs:
Peter Pietzuch
Imperial College, UK
,
Elke Rundensteiner
Worcester Polytechnic Institute, USA

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 June 2013

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Test of Time Paper

Author Tags

Qualifiers

Research-article

Conference

DEBS '13

Sponsor:

DEBS '13: The 7th ACM International Conference on Distributed Event-Based Systems

June 29 - July 3, 2013

Texas, Arlington, USA

Acceptance Rates

DEBS '13 Paper Acceptance Rate 16 of 58 submissions, 28%;

Overall Acceptance Rate 145 of 583 submissions, 25%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

174
Total Citations
View Citations
1,578
Total Downloads

Downloads (Last 12 months)135
Downloads (Last 6 weeks)11

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Chatziliadis XZacharatou EEracar AZeuch SMarkl V(2024)Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed TopologiesProceedings of the VLDB Endowment10.14778/3648160.364818617:6(1501-1514)Online publication date: 1-Feb-2024
https://dl.acm.org/doi/10.14778/3648160.3648186
Kozar ADel Monte BZeuch SMarkl V(2024)Fault Tolerance Placement in the Internet of ThingsProceedings of the ACM on Management of Data10.1145/36549412:3(1-29)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654941
Heinrich RBinnig CKornmayer HLuthra M(2024)Costream: Learned Cost Models for Operator Placement in Edge-Cloud Environments2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00015(96-109)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00015
Zhang ZJin PXie XWang XLiu RWan S(2024)Online Nonstop Task Management for Storm-Based Distributed Stream Processing EnginesJournal of Computer Science and Technology10.1007/s11390-021-1629-939:1(116-138)Online publication date: 1-Feb-2024
https://dl.acm.org/doi/10.1007/s11390-021-1629-9
Michailidou ABellas CGounaris A(2024)Optimizing task allocation in multi-query edge analyticsCluster Computing10.1007/s10586-024-04427-127:6(8289-8306)Online publication date: 1-Sep-2024
https://dl.acm.org/doi/10.1007/s10586-024-04427-1
Hadian HSharifi M(2024)GT-scheduler: a hybrid graph-partitioning and tabu-search based task scheduler for distributed data stream processing systemsCluster Computing10.1007/s10586-023-04260-y27:5(5815-5832)Online publication date: 13-Feb-2024
https://doi.org/10.1007/s10586-023-04260-y
Sun DWang YSui JGao SRong JBuyya R(2024)Lc‐Stream: An elastic scheduling strategy with latency constraints in geo‐distributed stream computing environmentsConcurrency and Computation: Practice and Experience10.1002/cpe.808536:14Online publication date: 20-Mar-2024
https://doi.org/10.1002/cpe.8085
Nasiri HNasehi SDivband AGoudarzi M(2023)A scheduling algorithm to maximize storm throughput in heterogeneous clusterJournal of Big Data10.1186/s40537-023-00771-y10:1Online publication date: 17-Jun-2023
https://doi.org/10.1186/s40537-023-00771-y
Pham LTran M(2023)Fine-Grained Elasticity for Big Data Stream Processing of IoT Applications2023 RIVF International Conference on Computing and Communication Technologies (RIVF)10.1109/RIVF60135.2023.10471874(487-492)Online publication date: 23-Dec-2023
https://doi.org/10.1109/RIVF60135.2023.10471874
Nasiri HDarjani AKavand NGoudarzi M(2023)H-Storm: A Hybrid CPU-FPGA Architecture to Accelerate Apache StormJournal of Grid Computing10.1007/s10723-023-09692-921:4Online publication date: 7-Nov-2023
https://doi.org/10.1007/s10723-023-09692-9
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents