Abstract
We propose a deterministic analytical model that predicts the execution time of Spark applications while accounting for the dynamic allocation of Spark executors. The model uses idle time and backlog time metrics to decide whether to add or remove executors. After each update to the number of executors, the model traverses every stage of the application's directed acyclic graph (DAG) using a graph traversal algorithm, and this process repeats until the total execution time of the Spark application has been calculated. We validate the model against the measured execution times of the Query-52 and K-Means workloads, obtaining error rates of 4.96% and 4.74%, respectively. A comparison with four classic machine learning models shows that our model is more effective than linear regression, neural networks, decision trees, and random forests. To the best of our knowledge, this is the first deterministic analytical model that accounts for the dynamic allocation of executors.
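The loop described above (resize the executor pool from idle and backlog time, then walk the stage DAG, and repeat until the total time is known) can be illustrated with a small deterministic simulator. This is a minimal sketch under stated assumptions, not the authors' model. The two rules loosely follow the dynamic allocation behaviour described in Spark's job scheduling documentation: executors are requested in rounds of 1, 2, 4, ... once tasks have been pending longer than a backlog timeout, and an executor is released once it has been idle longer than an idle timeout. The `Stage` type, the fixed per-task duration, wave-by-wave execution, instant executor startup, strictly sequential stages, and the parameters `backlog_timeout` and `idle_timeout` (stand-ins for `spark.dynamicAllocation.schedulerBacklogTimeout` and `spark.dynamicAllocation.executorIdleTimeout`) are all hypothetical simplifications.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Stage:
    num_tasks: int    # number of tasks in the stage (hypothetical input)
    task_time: float  # assumed fixed duration of every task, in seconds


def topological_order(stages, edges):
    """Kahn's algorithm: return stage names so that every parent precedes its children."""
    indegree = {name: 0 for name in stages}
    for children in edges.values():
        for child in children:
            indegree[child] += 1
    queue = deque(name for name in stages if indegree[name] == 0)
    order = []
    while queue:
        name = queue.popleft()
        order.append(name)
        for child in edges.get(name, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(stages):
        raise ValueError("stage graph is not acyclic")
    return order


def predict_execution_time(stages, edges, cores=2, min_executors=1, max_executors=8,
                           backlog_timeout=1.0, idle_timeout=60.0):
    """Walk the DAG stage by stage, running tasks in waves and resizing the executor pool."""
    executors = min_executors
    total = 0.0
    for name in topological_order(stages, edges):
        stage = stages[name]
        remaining = stage.num_tasks
        request = 1    # executors added per round, doubling: 1, 2, 4, ...
        backlog = 0.0  # time that tasks of this stage have been left pending
        while remaining > 0:
            running = min(executors * cores, remaining)
            idle = executors - math.ceil(running / cores)  # executors with no task this wave
            total += stage.task_time                        # the whole wave finishes together
            remaining -= running
            # Idle rule: release executors that sat idle for at least idle_timeout.
            if idle > 0 and stage.task_time >= idle_timeout:
                executors = max(min_executors, executors - idle)
            # Backlog rule: pending tasks accumulate backlog time; past the timeout, add executors.
            if remaining > 0:
                backlog += stage.task_time
                if backlog >= backlog_timeout and executors < max_executors:
                    executors = min(max_executors, executors + request)
                    request *= 2
                    backlog = 0.0
    return total


if __name__ == "__main__":
    stages = {"map": Stage(10, 2.0), "reduce": Stage(2, 5.0)}
    edges = {"map": ["reduce"]}
    print(predict_execution_time(stages, edges))                   # 11.0 with dynamic allocation
    print(predict_execution_time(stages, edges, max_executors=1))  # 15.0 with one fixed executor
```

In this toy two-stage job, growing the pool from one executor to four shortens the predicted time from 15 s to 11 s. Modelling execution as whole waves keeps the prediction deterministic, at the cost of ignoring stragglers, executor startup latency, and shuffle overheads that a real model would need to capture.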
Acknowledgements
We acknowledge the assistance of undergraduate students Grahi Desai, Yiran Chen, Marc Lima, and Asma Fawzia Kawser Maisha in collecting the results of machine learning models. We would also like to thank NSERC Canada for financial support.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tariq, H., Das, O. (2023). Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors. In: Iacono, M., Scarpa, M., Barbierato, E., Serrano, S., Cerotti, D., Longo, F. (eds) Computer Performance Engineering and Stochastic Modelling. EPEW ASMTA 2023. Lecture Notes in Computer Science, vol 14231. Springer, Cham. https://doi.org/10.1007/978-3-031-43185-2_23
DOI: https://doi.org/10.1007/978-3-031-43185-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43184-5
Online ISBN: 978-3-031-43185-2
eBook Packages: Computer Science, Computer Science (R0)