Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2488222.2488267acmconferencesArticle/Chapter ViewAbstractPublication PagesdebsConference Proceedingsconference-collections
research-article
Open access

Adaptive online scheduling in storm

Published: 29 June 2013 Publication History

Abstract

Today we are witnessing a dramatic shift toward a data-driven economy, where the ability to efficiently and timely analyze huge amounts of data marks the difference between industrial success stories and catastrophic failures. In this scenario Storm, an open source distributed realtime computation system, represents a disruptive technology that is quickly gaining the favor of big players like Twitter and Groupon. A Storm application is modeled as a topology, i.e. a graph where nodes are operators and edges represent data flows among such operators. A key aspect in tuning Storm performance lies in the strategy used to deploy a topology, i.e. how Storm schedules the execution of each topology component on the available computing infrastructure.
In this paper we propose two advanced generic schedulers for Storm that provide improved performance for a wide range of application topologies. The first scheduler works offline by analyzing the topology structure and adapting the deployment to it; the second scheduler enhance the previous approach by continuously monitoring system performance and rescheduling the deployment at run-time to improve overall performance. Experimental results show that these algorithms can produce schedules that achieve significantly better performances compared to those produced by Storm's default scheduler.

References

[1]
Gartner says big data creates big jobs: 4.4 million it jobs globally to support big data by 2015. http://www.gartner.com/newsroom/id/2207915.
[2]
Storm. http://storm-project.net/.
[3]
Streammine3g. https://streammine3g.inf.tu-dresden.de/trac.
[4]
Data never sleeps infographic. http://www.domo.com/learn/7/236#videos-and-infographics, 2012.
[5]
D. J. Abadi, Y. Ahmad, M. Balazinska, M. Cherniack, J. hyon Hwang, W. Lindner, A. S. Maskey, E. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The design of the borealis stream processing engine. In Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research, 2005.
[6]
L. Amini, H. Andrade, R. Bhagwan, F. Eskesen, R. King, P. Selo, Y. Park, and C. Venkatramani. Spc: a distributed, scalable platform for data mining. In Proceedings of the 4th international workshop on Data mining standards, services and platforms, 2006.
[7]
L. Aniello, L. Querzoni, and R. Baldoni. Input data organization for batch processing in time window based computations. In Proceedings of the 28th Symposium On Applied Computing, 2013.
[8]
R. Baldoni, G. A. Di Luna, D. Firmani, and G. Lodi. A model for continuous query latencies in data streams. In Proceedings of the 1st International Workshop on Algorithms and Models for Distributed Event Processing, 2011.
[9]
S. Bansal, P. Kumar, and K. Singh. An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems. IEEE Transactions on Parallel and Distributed Systems, 2003.
[10]
Y. Baram, R. El-Yaniv, and K. Luz. Online choice of active learning algorithms. The Journal of Machine Learning Research, 2004.
[11]
A. Brito, A. Martin, T. Knauth, S. Creutz, D. Becker, S. Weigert, and C. Fetzer. Scalable and low-latency data processing with stream mapreduce. In Proceedings of the 2011 IEEE Third International Conference on Cloud Computing Technology and Science, 2011.
[12]
M. Cammert, C. Heinz, J. Kramer, B. Seeger, S. Vaupel, and U. Wolske. Flexible multi-threaded scheduling for continuous queries over data streams. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop, 2007.
[13]
T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. Mapreduce online. In Proceedings of the 7th USENIX conference on Networked systems design and implementation, 2010.
[14]
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 2008.
[15]
C. Eaton, D. Deroos, T. Deutsch, G. Lapis, and P. Zikopoulos. Understanding Big Data. Mc Graw Hill, 2012.
[16]
A. H. Hormati, Y. Choi, M. Kudlur, R. Rabbah, T. Mudge, and S. Mahlke. Flextream: Adaptive compilation of streaming applications for heterogeneous architectures. In Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009.
[17]
F. P. Junqueira and B. C. Reed. The life and times of a zookeeper. In Proceedings of the 21st annual symposium on Parallelism in algorithms and architectures, 2009.
[18]
Y.-K. Kwok and I. Ahmad. Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 1996.
[19]
G. T. Lakshmanan, Y. G. Rabinovich, and O. Etzion. A stratified approach for supporting high throughput event processing applications. In Proceedings of the 3rd ACM International Conference on Distributed Event-Based Systems, 2009.
[20]
A. Martin, C. Fetzer, and A. Brito. Active replication at (almost) no cost. In Proceedings of the 2011 IEEE 30th International Symposium on Reliable Distributed Systems, 2011.
[21]
A. Martin, T. Knauth, S. Creutz, D. Becker, S. Weigert, C. Fetzer, and A. Brito. Low-overhead fault tolerance for high-throughput data processing systems. In Proceedings of the 2011 31st International Conference on Distributed Computing Systems, 2011.
[22]
L. A. Moakar, A. Labrinidis, and P. K. Chrysanthis. Adaptive class-based scheduling of continuous queries. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering Workshops, 2012.
[23]
L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, 2010.
[24]
P. Pietzuch, J. Ledlie, J. Shneidman, M. Roussopoulos, M. Welsh, and M. Seltzer. Network-aware operator placement for stream-processing systems. In Proceedings of the 22nd International Conference on Data Engineering, 2006.
[25]
M. A. Sharaf, P. K. Chrysanthis, and A. Labrinidis. Preemptive rate-based operator scheduling in a data stream management system. In Proceedings of the ACS/IEEE 2005 International Conference on Computer Systems and Applications, 2005.
[26]
M. A. Suleman, M. K. Qureshi, Khubaib, and Y. N. Patt. Feedback-directed pipeline parallelism. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, 2010.
[27]
T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2012.
[28]
J. Wolf, N. Bansal, K. Hildrum, S. Parekh, D. Rajan, R. Wagle, K.-L. Wu, and L. Fleischer. Soda: an optimizing scheduler for large-scale stream-based distributed computer systems. In Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware, 2008.
[29]
Y. Xing, J.-H. Hwang, U. Çetintemel, and S. Zdonik. Providing resiliency to load variations in distributed stream processing. In Proceedings of the 32nd international conference on Very large data bases, 2006.
[30]
Y. Xing, S. Zdonik, and J.-H. Hwang. Dynamic load distribution in the borealis stream processor. In Proceedings of the 21st International Conference on Data Engineering, 2005.

Cited By

View all
  • (2024)Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed TopologiesProceedings of the VLDB Endowment10.14778/3648160.364818617:6(1501-1514)Online publication date: 1-Feb-2024
  • (2024)Fault Tolerance Placement in the Internet of ThingsProceedings of the ACM on Management of Data10.1145/36549412:3(1-29)Online publication date: 30-May-2024
  • (2024)Costream: Learned Cost Models for Operator Placement in Edge-Cloud Environments2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00015(96-109)Online publication date: 13-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
DEBS '13: Proceedings of the 7th ACM international conference on Distributed event-based systems
June 2013
360 pages
ISBN:9781450317580
DOI:10.1145/2488222
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 June 2013

Permissions

Request permissions for this article.

Check for updates

Badges

  • Test of Time Paper

Author Tags

  1. cep
  2. distributed event processing
  3. scheduling
  4. storm

Qualifiers

  • Research-article

Conference

DEBS '13

Acceptance Rates

DEBS '13 Paper Acceptance Rate 16 of 58 submissions, 28%;
Overall Acceptance Rate 145 of 583 submissions, 25%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)135
  • Downloads (Last 6 weeks)11
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed TopologiesProceedings of the VLDB Endowment10.14778/3648160.364818617:6(1501-1514)Online publication date: 1-Feb-2024
  • (2024)Fault Tolerance Placement in the Internet of ThingsProceedings of the ACM on Management of Data10.1145/36549412:3(1-29)Online publication date: 30-May-2024
  • (2024)Costream: Learned Cost Models for Operator Placement in Edge-Cloud Environments2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00015(96-109)Online publication date: 13-May-2024
  • (2024)Online Nonstop Task Management for Storm-Based Distributed Stream Processing EnginesJournal of Computer Science and Technology10.1007/s11390-021-1629-939:1(116-138)Online publication date: 1-Feb-2024
  • (2024)Optimizing task allocation in multi-query edge analyticsCluster Computing10.1007/s10586-024-04427-127:6(8289-8306)Online publication date: 1-Sep-2024
  • (2024)GT-scheduler: a hybrid graph-partitioning and tabu-search based task scheduler for distributed data stream processing systemsCluster Computing10.1007/s10586-023-04260-y27:5(5815-5832)Online publication date: 13-Feb-2024
  • (2024)Lc‐Stream: An elastic scheduling strategy with latency constraints in geo‐distributed stream computing environmentsConcurrency and Computation: Practice and Experience10.1002/cpe.808536:14Online publication date: 20-Mar-2024
  • (2023)A scheduling algorithm to maximize storm throughput in heterogeneous clusterJournal of Big Data10.1186/s40537-023-00771-y10:1Online publication date: 17-Jun-2023
  • (2023)Fine-Grained Elasticity for Big Data Stream Processing of IoT Applications2023 RIVF International Conference on Computing and Communication Technologies (RIVF)10.1109/RIVF60135.2023.10471874(487-492)Online publication date: 23-Dec-2023
  • (2023)H-Storm: A Hybrid CPU-FPGA Architecture to Accelerate Apache StormJournal of Grid Computing10.1007/s10723-023-09692-921:4Online publication date: 7-Nov-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media