Abstract
As many-core architectures grow more diverse, the challenges of parallel programming and code portability will rise sharply. The EU project PEPPHER addresses these issues with a component-based approach to application development on top of a task-parallel execution model. Central to this approach are multi-architectural components, which encapsulate different implementation variants of application functionality, each tailored to a different core type. An intelligent runtime system selects and dynamically schedules component implementation variants for efficient parallel execution on heterogeneous many-core architectures. On top of this model, we have developed language, compiler, and runtime support for a specific class of applications that can be expressed using the pipeline pattern. We propose C/C++ language annotations for specifying pipeline patterns and describe the associated compilation and runtime infrastructure. Experimental results indicate that our high-level approach can achieve performance comparable to manual parallelization.
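To make the two central ideas concrete, the following is a minimal sketch in plain C++17 of a three-stage pipeline whose middle stage is a component with two implementation variants. It does not use the annotation syntax proposed in the paper or any PEPPHER or StarPU API. The names `Channel`, `Component`, `CoreType`, and `run_pipeline` are hypothetical and introduced only for illustration, and the variant choice is a fixed parameter standing in for the dynamic selection a runtime system would perform.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking FIFO that connects two adjacent pipeline stages.
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signals that no further items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available; returns std::nullopt once the
    // channel has been closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Assumption: two core types are enough to illustrate variant selection.
enum class CoreType { Cpu, Accelerator };

// Hypothetical multi-variant component: one interface, several
// implementations of the same functionality.
struct Component {
    std::function<int(int)> cpu_variant;          // required fallback
    std::function<int(int)> accelerator_variant;  // optional

    int run(int x, CoreType target) const {
        assert(cpu_variant && "a CPU variant is required as fallback");
        if (target == CoreType::Accelerator && accelerator_variant)
            return accelerator_variant(x);
        return cpu_variant(x);
    }
};

// Runs source -> transform -> sink, one thread per stage. With a single
// thread per stage and FIFO channels, output order matches input order.
std::vector<int> run_pipeline(const std::vector<int>& input,
                              const Component& transform,
                              CoreType target) {
    Channel<int> to_transform;
    Channel<int> to_sink;
    std::vector<int> output;

    std::thread source([&] {
        for (int x : input) to_transform.push(x);
        to_transform.close();
    });
    std::thread middle([&] {
        while (auto x = to_transform.pop())
            to_sink.push(transform.run(*x, target));
        to_sink.close();
    });
    std::thread sink([&] {
        while (auto x = to_sink.pop()) output.push_back(*x);
    });

    source.join();
    middle.join();
    sink.join();
    return output;  // safe to read: all stage threads have finished
}
```

Calling `run_pipeline({1, 2, 3}, component, CoreType::Cpu)` pushes each item through all three stages while the stages overlap in time. A real runtime would replace the fixed `target` argument with a per-task scheduling decision; that part is deliberately left out of this sketch.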
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Benkner, S., Bajrovic, E., Marth, E., Sandrieser, M., Namyst, R., Thibault, S. (2012). High-Level Support for Pipeline Parallelism on Many-Core Architectures. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_61
DOI: https://doi.org/10.1007/978-3-642-32820-6_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science (R0)