Abstract
Nested parallel programming models, where the task graph associated to a computation is series-parallel are easy to program and show good analysis properties. These can be exploited for efficient scheduling, accurate cost estimation or automatic mapping to different architectures. Restricting synchronization structures to nested series-parallelism may bring performance losses due to a less parallel solution, as compared to more generic ones based in unstructured models (e.g. message passing).
A new algorithmic technique is presented which allows automatic transformation of the task graph of any unstructured application to a seriesparallel form (nested-parallelism). The tool is applied to random and irregular application task graphs to investigate the potential performance degradation when conveying them into series-parallel form. Results show that a wide range of irregular applications can be expressed using a structured coordination model with a small loss of parallelism.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Skillicorn, D., Talia, D.: Models and languages for parallel computation. ACM Computing Surveys 30 (1998) 123–169 408
Valdés, J., Tarjan, R., Lawler, E.: The recognition of series parallel digraphs. SIAM Journal of Computing 11 (1982) 298–313 408, 411
Lodaya, K., Weil, P.: Series-parallel posets: Algebra, automata, and languages. In: Proc. STACS’98. Volume 1373 of LNCS., Paris, France, Springer (1998) 555–565 408
Gemund, A.v.: The importance of synchronization structure in parallel program optimization. In: Proc. 11th ACM ICS, Vienna (1997) 164–171 408
Sahner, R., Trivedi, K.: Performance and reliability analysis using directed acyclic graphs. IEEE Trans. on Software Eng. 13 (1987) 1105–1114 408
Skillicorn, D.: A cost calculus for parallel functional programming. Journal of Parallel and Distributed Computing 28 (1995) 65–83 408
Blumofe, R., Leiserson, C.: Scheduling multithreaded computations by work stealing. In: Proc. Annual Symposium on FoCS. (1994) 356–368 408
Finta, L., Liu, Z., Milis, I., Bampis, E.: Scheduling UET-UCT series-parallel graphs on two processors. Theoretical Computer Science 162 (1996) 323–340 408
Valiant, L.: A bridging model for parallel computation. Comm.ACM 33 (1990) 103–111 408
Kessler, C.: NestStep: nested parallelism and virtual shared memory for the BSP model. In: Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA’99), Las Vegas (USA) (1999) 408
Bonorden, O., Juurlink, B., von Otte, I., Rieping, I.: The Paderborn University BSP (PUB) library-design, implementation, and performance. In: Proc. IPPS/SPDP’99, San Juan, Puerto Rico, Computer Society, IEEE (1999) 408
Darlington, J., Guo, Y., To, H., Yang, J.: Functional skeletons for parallel coordination. In: Europar’95. LNCS (1995) 55–69 408
Cole, M.: Frame: an imperative coordination language for parallel programming. Technical Report EDI-INF-RR-0026, Division of Informatics, University of Edinburgh (2000) 408
Blumofe, R., Joerg, C., Kuszmaul, B., Leiserson, C., Randall, K., Zhou, Y.: Cilk: An efficient multithreaded runtime system. In: Proc. of 5th PPoPP, ACM (1995) 207–216 408
González-Escribano, A., Gemund, A.v., Cardeñoso-Payo, V., Alonso-López, J., Martín-García, D., Pedrosa-Calvo, A.: Measuring the performance impact of SP-restricted programming in shared-memory machines. In J. M. L. M. Palma, J. Dongarra, V. H., ed.: VECPAR 2000. Number 1981 in LNCS, Porto (Portugal), Springer (2000) 128–728 408, 410
Lin, H.: A general approach for parallelizing the FEM software package DIANA. In: Proc. High Performance Computing Conference’94, National Supercomputing Research Center. National University of Singapur (1994) 229–236 409, 417
Lin, H., Gemund, A.v., Meijdam, J., Nauta, P.: Tgex: a tool for portable parallel and distributed execution of unstructured problems. Volume 1067 of LNCS., Berlin, Springer (1996) 467–474 409, 417
González-Escribano, A., Gemund, A.v., Cardeñoso, V.: Predicting the impact of implementation level aspects on parallel application performance. In: CPC’2001 Ninth Int. Workshop on Compilers for Parallel Computing, Edinburgh, Scotland UK (2001) 367–374 410
Gerbessiotis, A., Valiant, L.: Direct bulk-synchronous parallel algorithms. Technical Report TR-10-92, Center for Research in Computing Technology, Harvard University, Cambridge, Massachussets (1992) 410
Malony, A., Mertsiotakis, V., Quick, A.: Automatic scalability analysis of parallel programs based on modeling techniques. In Haring, G., Kotsis, G., eds.: Comp. Perf. Eval.: Modelling Techniques and Tools (LNCS 794), Berlin, Springer-Verlag (1994) 139–158 410
Blelloch, G., Chatterjee, S., Hardwick, J., Sipelstein, J., Zagha, M.: Implementation of a portable nested data-parallel language. Journal of Parallel and Distributed Computing 21 (1994) 4–14 410
Boeres, C., Rebello, V., Skillicorn, D.: Static scheduling using task replication for LogP and BSP models. In: EuroPar’98. Volume 1480 of LNCS., Springer (1998) 337–346 410
Munier, A., Hanen, C.: Using duplication for scheduling unitary tasks on m processors with unit communication delays. Technical Report LITP 95/47, Laboratoire Informatique Théorique et Programmation, Institut Blaise Pascal, Université Pierre et Marie Curie, 4, place jussieu, 75252 Paris cedex 05 (1995) 410
Eisenbiegler, J., Lówe, W., Wehrenpfennig, A.: On the optimization by redundancy using an extended LogP model. In: Proc. Advances in Parallel and Distributed Computing Conference (APDC’97), IEEE (1997) 410
Bilardi, G., Herley, K., Pietracaprina, A.: BSP vs. LogP. In: Proc. 8th ACM symposium on Parallel algorithms and architectures (SPAA’96), Padua, Italy, ACM (1996) 25–32 410
Ramachandran, V., Grayson, B., Dahlin, M.: Emulations between QSM, BSP and LogP: A framework for general-purpose parallel algorithm design. In: Proc. ACM-SIAM SODA’99. (1999) 957–958 410
González-Escribano, A., Gemund, A.v., Cardeñoso, V.: A new algorithm for mapping DAGs to series-parallel form. Technical Report IT-DI-2002-2, Dpto. Informática, Univ. Valladolid (2002) 411, 413
Bein, W., Kamburowski, J., Stallman, F.: Optimal reductions of two-terminal directed acyclic graphs. SIAM Journal of Computing 6 (1992) 1112–1129 411
Bodlaender, H.: Dynamic algorithms for graphs with treewidth 2. In: Proc. Workshop on Graph-Theoretic Concepts in Computer Science. (1994) 413
González-Escribano, A., Cardeñoso, V., Gemund, A.v.: On the loss of parallelism by imposing synchronization structure. In: Proc. 1st Euro-PDS Int’l Conf. on Parallel and Distributed Systems, Barcelona (1997) 251–256 415
Tobita, T., Kasahara, H.: A standard task graph set for fair evaluation of multiprocessor scheduling algorithms. In: ICS’99 Workshop. (1999) 71–77 416
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
González-Escribano1, A., van Gemund, A.J., Cardeñoso-Payo, V. (2003). Mapping Unstructured Applications into Nested Parallelism Best Student Paper Award: First Prize. In: Palma, J.M.L.M., Sousa, A.A., Dongarra, J., Hernández, V. (eds) High Performance Computing for Computational Science — VECPAR 2002. VECPAR 2002. Lecture Notes in Computer Science, vol 2565. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36569-9_27
Download citation
DOI: https://doi.org/10.1007/3-540-36569-9_27
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00852-1
Online ISBN: 978-3-540-36569-3
eBook Packages: Springer Book Archive