A Stepwise Auto-Profiling Method for Performance Optimization of Streaming Applications

Published: 14 November 2017

Abstract

Data stream management systems (DSMSs) are scalable, highly available, and fault-tolerant systems that aggregate and analyze real-time data in motion. To continuously perform analytics on the fly within the stream, state-of-the-art DSMSs host streaming applications as a set of interconnected operators, with each operator encapsulating the semantics of a specific operation. For parallel execution on a particular platform, these operators need to be appropriately replicated into multiple instances that split and process the workload simultaneously. Because the way operators are partitioned affects the resulting performance of streaming applications, it is essential for DSMSs to have a method to compare different operators and make holistic replication decisions to avoid performance bottlenecks and resource wastage. To this end, we propose a stepwise profiling approach to optimize application performance on a given execution platform. It automatically scales distributed computations over streams based on application features and the processing power of provisioned resources, and it builds the relationship between provisioned resources and application performance metrics to evaluate the efficiency of the resulting configuration. Experimental results confirm that the proposed approach successfully fulfills its goals with minimal profiling overhead.

Published In

ACM Transactions on Autonomous and Adaptive Systems, Volume 12, Issue 4
December 2017
224 pages
ISSN:1556-4665
EISSN:1556-4703
DOI:10.1145/3155314
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 November 2017
Accepted: 01 July 2017
Revised: 01 May 2017
Received: 01 November 2016
Published in TAAS Volume 12, Issue 4

Author Tags

  1. Stream processing
  2. data stream management systems
  3. performance optimization
  4. resource management

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Evaluating Stream Processing Autoscalers. Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, 110-122. DOI: 10.1145/3629104.3666036. Online publication date: 24-Jun-2024
  • (2024) Bayesian-Driven Automated Scaling in Stream Computing With Multiple QoS Targets. IEEE Transactions on Parallel and Distributed Systems 35:7, 1251-1267. DOI: 10.1109/TPDS.2024.3399834. Online publication date: Jul-2024
  • (2023) Hierarchical Auto-scaling Policies for Data Stream Processing on Heterogeneous Resources. ACM Transactions on Autonomous and Adaptive Systems 18:4, 1-44. DOI: 10.1145/3597435. Online publication date: 16-May-2023
  • (2023) An elastic and traffic-aware scheduler for distributed data stream processing in heterogeneous clusters. The Journal of Supercomputing 79:1, 461-498. DOI: 10.1007/s11227-022-04669-z. Online publication date: 1-Jan-2023
  • (2023) On combining system and machine learning performance tuning for distributed data stream applications. Distributed and Parallel Databases 41:3, 411-438. DOI: 10.1007/s10619-023-07434-0. Online publication date: 17-May-2023
  • (2023) Mjolnir: A framework agnostic auto-tuning system with deep reinforcement learning. Applied Intelligence 53:11, 14008-14022. DOI: 10.1007/s10489-022-03956-9. Online publication date: 1-Jun-2023
  • (2023) Online and transparent self-adaptation of stream parallel patterns. Computing 105:5, 1039-1057. DOI: 10.1007/s00607-021-00998-8. Online publication date: 1-May-2023
  • (2022) Runtime Adaptation of Data Stream Processing Systems: The State of the Art. ACM Computing Surveys 54:11s, 1-36. DOI: 10.1145/3514496. Online publication date: 9-Sep-2022
  • (2022) Efficient Runtime Profiling for Black-box Machine Learning Services on Sensor Streams. 2022 IEEE 6th International Conference on Fog and Edge Computing (ICFEC), 88-93. DOI: 10.1109/ICFEC54809.2022.00020. Online publication date: May-2022
  • (2022) Automatic Performance Tuning for Distributed Data Stream Processing Systems. 2022 IEEE 38th International Conference on Data Engineering (ICDE), 3194-3197. DOI: 10.1109/ICDE53745.2022.00296. Online publication date: May-2022
