DOI: 10.1145/2967938.2967957

Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading

Published: 11 September 2016

Abstract

Much research effort is devoted to tuning big data analytics in modern data centers, since even a small percentage of performance improvement immediately translates into huge cost savings at such large scale. Simultaneous multithreading (SMT) has received great interest from the data center community, as it has the potential to boost the performance of big data analytics by increasing processor resource utilization. For example, emerging processor architectures such as POWER8 support up to 8-way multithreading. However, because different big data workloads exhibit disparate architectural characteristics, identifying the SMT configuration that delivers the best performance is challenging, given both complex application behaviors and complex processor architectures. In this paper, we focus specifically on auto-tuning the SMT configuration for Spark-based big data workloads on POWER8; our methodology, however, can be generalized and extended to other software stacks and other architectures.
We propose a prediction-based dynamic SMT threading (PBDST) framework that adjusts the thread count of SMT cores on POWER8 processors using versatile machine learning algorithms. Its innovation lies in adopting online SMT configuration predictions, derived from micro-architecture-level profiling, to regulate thread counts so as to achieve nearly optimal performance. Moreover, it is implemented at the Spark software-stack layer and is transparent to user applications. After evaluating a large set of machine learning algorithms, we choose the most efficient ones to perform online predictions. The experimental results demonstrate that our approach achieves up to 56.3% performance improvement and an average performance gain of 16.2% compared with the default configuration on our system, the maximum SMT configuration (SMT8).
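To make the prediction loop concrete, the sketch below shows one plausible realization of this kind of counter-driven SMT selection, not the paper's actual PBDST implementation: a classifier is trained offline on micro-architecture features gathered with hardware performance counters (e.g., Linux perf), and at runtime each profiling window is mapped to the SMT level predicted to perform best. The feature set, the training labels, the pick_smt_level helper, and the choice of a decision-tree classifier are all illustrative assumptions.

# Illustrative sketch only (assumptions noted above), not the authors' PBDST code.
# Idea: train a model offline that maps counter-derived features to the best SMT
# level, then query it online to pick the thread count for the next interval.
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # one of many candidate models

SMT_LEVELS = (1, 2, 4, 8)  # POWER8 supports up to 8 hardware threads per core

# Hypothetical offline training data: each row is one profiling window described
# by [IPC, L2 miss rate, L3 miss rate, branch mispredict rate]; the label is the
# SMT level that performed best for that window in offline sweeps.
X_train = np.array([
    [1.9, 0.010, 0.004, 0.005],   # compute-bound window
    [0.5, 0.120, 0.060, 0.015],   # memory-bound window
    [1.1, 0.050, 0.020, 0.010],   # mixed window
])
y_train = np.array([8, 2, 4])

model = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)

def pick_smt_level(window_features: np.ndarray) -> int:
    """Predict the SMT level (1/2/4/8) expected to run this window fastest."""
    level = int(model.predict(window_features.reshape(1, -1))[0])
    return level if level in SMT_LEVELS else 8  # fall back to the SMT8 default

# Online use (the profiling and reconfiguration hooks below are hypothetical):
#   features = read_perf_counters()                  # e.g., via perf stat
#   smt = pick_smt_level(features)
#   reconfigure_spark_executor_threads(cores * smt)  # adjust active thread count

In such a design, the classifier itself is cheap to query, so the dominant cost is the periodic counter collection, which keeps the tuning transparent to the Spark application.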




Published In

PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
September 2016
474 pages
ISBN:9781450341219
DOI:10.1145/2967938
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. dynamic smt tuning
  2. power8
  3. spark big data

Qualifiers

  • Research-article

Conference

PACT '16
Sponsor:
  • IFIP WG 10.3
  • IEEE TCCA
  • SIGARCH
  • IEEE CS TCPP

Acceptance Rates

PACT '16 Paper Acceptance Rate 31 of 119 submissions, 26%;
Overall Acceptance Rate 121 of 471 submissions, 26%
