Abstract
Parallelism is one of the main sources of performance improvement in modern computing environments, but efficient exploitation of the available parallelism depends on a number of parameters. Determining the optimum number of threads for a given data-parallel loop, for example, is a difficult problem that depends on the specific parallel platform. This paper presents a learning-based, cost-aware approach to parallel workload allocation. The approach uses static program features to classify programs and then decides the best workload allocation scheme based on its prior experience with similar programs. Experimental results on 12 Java benchmarks (76 test cases with different workloads in total) show that the approach efficiently allocates the parallel workload among Java threads, achieving an efficiency of 86% on average.
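To make the idea concrete, the sketch below shows one way such instance-based prediction could look: a data-parallel loop is summarised by a few static features, and the thread count that worked best for the nearest previously seen program is reused. This is a minimal illustration, not the authors' implementation; the feature set (iteration count, body size, memory operations per iteration), the Euclidean distance metric, and the training values are assumptions made for the example.

```java
import java.util.*;

/**
 * Minimal sketch of learning-based workload allocation: a loop is described
 * by static features, and the allocation that worked best for the nearest
 * previously seen program is reused. Features, metric, and data are
 * illustrative assumptions, not the paper's actual model.
 */
public class AllocationPredictor {

    /** Static features of a data-parallel loop (hypothetical selection). */
    record LoopFeatures(double iterationCount, double bodySizeInsns, double memOpsPerIter) {}

    /** A previously observed program and its best workload allocation. */
    record TrainingCase(LoopFeatures features, int bestThreadCount) {}

    private final List<TrainingCase> experience = new ArrayList<>();

    void record(LoopFeatures f, int bestThreads) {
        experience.add(new TrainingCase(f, bestThreads));
    }

    /** 1-nearest-neighbour lookup in feature space (Euclidean distance). */
    int predictThreads(LoopFeatures query) {
        TrainingCase best = null;
        double bestDist = Double.MAX_VALUE;
        for (TrainingCase c : experience) {
            double d = distance(c.features(), query);
            if (d < bestDist) { bestDist = d; best = c; }
        }
        // Fall back to a single thread when no prior experience exists.
        return best == null ? 1 : best.bestThreadCount();
    }

    private static double distance(LoopFeatures a, LoopFeatures b) {
        double di = a.iterationCount() - b.iterationCount();
        double ds = a.bodySizeInsns() - b.bodySizeInsns();
        double dm = a.memOpsPerIter() - b.memOpsPerIter();
        return Math.sqrt(di * di + ds * ds + dm * dm);
    }

    public static void main(String[] args) {
        AllocationPredictor p = new AllocationPredictor();
        // Illustrative training cases; feature values and thread counts are made up.
        p.record(new LoopFeatures(1_000_000, 40, 8), 8);
        p.record(new LoopFeatures(500, 12, 2), 1);
        p.record(new LoopFeatures(100_000, 25, 5), 4);

        LoopFeatures unseen = new LoopFeatures(80_000, 30, 6);
        System.out.println("Predicted thread count: " + p.predictThreads(unseen));
    }
}
```

A real system of this kind would also weigh the runtime cost of thread creation and synchronisation against the expected gain, which is what makes the allocation cost-aware.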
Copyright information
© 2007 IFIP International Federation for Information Processing
About this paper
Cite this paper
Long, S., Fursin, G., Franke, B. (2007). A Cost-Aware Parallel Workload Allocation Approach Based on Machine Learning Techniques. In: Li, K., Jesshope, C., Jin, H., Gaudiot, JL. (eds) Network and Parallel Computing. NPC 2007. Lecture Notes in Computer Science, vol 4672. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74784-0_51
DOI: https://doi.org/10.1007/978-3-540-74784-0_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74783-3
Online ISBN: 978-3-540-74784-0