Nothing Special   »   [go: up one dir, main page]

Skip to main content

Mapping Unstructured Applications into Nested Parallelism Best Student Paper Award: First Prize

  • Conference paper
  • First Online:
High Performance Computing for Computational Science — VECPAR 2002 (VECPAR 2002)

Abstract

Nested parallel programming models, where the task graph associated to a computation is series-parallel are easy to program and show good analysis properties. These can be exploited for efficient scheduling, accurate cost estimation or automatic mapping to different architectures. Restricting synchronization structures to nested series-parallelism may bring performance losses due to a less parallel solution, as compared to more generic ones based in unstructured models (e.g. message passing).

A new algorithmic technique is presented which allows automatic transformation of the task graph of any unstructured application to a seriesparallel form (nested-parallelism). The tool is applied to random and irregular application task graphs to investigate the potential performance degradation when conveying them into series-parallel form. Results show that a wide range of irregular applications can be expressed using a structured coordination model with a small loss of parallelism.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Skillicorn, D., Talia, D.: Models and languages for parallel computation. ACM Computing Surveys 30 (1998) 123–169 408

    Article  Google Scholar 

  2. Valdés, J., Tarjan, R., Lawler, E.: The recognition of series parallel digraphs. SIAM Journal of Computing 11 (1982) 298–313 408, 411

    Article  MATH  Google Scholar 

  3. Lodaya, K., Weil, P.: Series-parallel posets: Algebra, automata, and languages. In: Proc. STACS’98. Volume 1373 of LNCS., Paris, France, Springer (1998) 555–565 408

    Google Scholar 

  4. Gemund, A.v.: The importance of synchronization structure in parallel program optimization. In: Proc. 11th ACM ICS, Vienna (1997) 164–171 408

    Google Scholar 

  5. Sahner, R., Trivedi, K.: Performance and reliability analysis using directed acyclic graphs. IEEE Trans. on Software Eng. 13 (1987) 1105–1114 408

    Article  Google Scholar 

  6. Skillicorn, D.: A cost calculus for parallel functional programming. Journal of Parallel and Distributed Computing 28 (1995) 65–83 408

    Article  MATH  Google Scholar 

  7. Blumofe, R., Leiserson, C.: Scheduling multithreaded computations by work stealing. In: Proc. Annual Symposium on FoCS. (1994) 356–368 408

    Google Scholar 

  8. Finta, L., Liu, Z., Milis, I., Bampis, E.: Scheduling UET-UCT series-parallel graphs on two processors. Theoretical Computer Science 162 (1996) 323–340 408

    Article  MATH  MathSciNet  Google Scholar 

  9. Valiant, L.: A bridging model for parallel computation. Comm.ACM 33 (1990) 103–111 408

    Article  Google Scholar 

  10. Kessler, C.: NestStep: nested parallelism and virtual shared memory for the BSP model. In: Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA’99), Las Vegas (USA) (1999) 408

    Google Scholar 

  11. Bonorden, O., Juurlink, B., von Otte, I., Rieping, I.: The Paderborn University BSP (PUB) library-design, implementation, and performance. In: Proc. IPPS/SPDP’99, San Juan, Puerto Rico, Computer Society, IEEE (1999) 408

    Google Scholar 

  12. Darlington, J., Guo, Y., To, H., Yang, J.: Functional skeletons for parallel coordination. In: Europar’95. LNCS (1995) 55–69 408

    Google Scholar 

  13. Cole, M.: Frame: an imperative coordination language for parallel programming. Technical Report EDI-INF-RR-0026, Division of Informatics, University of Edinburgh (2000) 408

    Google Scholar 

  14. Blumofe, R., Joerg, C., Kuszmaul, B., Leiserson, C., Randall, K., Zhou, Y.: Cilk: An efficient multithreaded runtime system. In: Proc. of 5th PPoPP, ACM (1995) 207–216 408

    Google Scholar 

  15. González-Escribano, A., Gemund, A.v., Cardeñoso-Payo, V., Alonso-López, J., Martín-García, D., Pedrosa-Calvo, A.: Measuring the performance impact of SP-restricted programming in shared-memory machines. In J. M. L. M. Palma, J. Dongarra, V. H., ed.: VECPAR 2000. Number 1981 in LNCS, Porto (Portugal), Springer (2000) 128–728 408, 410

    Chapter  Google Scholar 

  16. Lin, H.: A general approach for parallelizing the FEM software package DIANA. In: Proc. High Performance Computing Conference’94, National Supercomputing Research Center. National University of Singapur (1994) 229–236 409, 417

    Google Scholar 

  17. Lin, H., Gemund, A.v., Meijdam, J., Nauta, P.: Tgex: a tool for portable parallel and distributed execution of unstructured problems. Volume 1067 of LNCS., Berlin, Springer (1996) 467–474 409, 417

    Google Scholar 

  18. González-Escribano, A., Gemund, A.v., Cardeñoso, V.: Predicting the impact of implementation level aspects on parallel application performance. In: CPC’2001 Ninth Int. Workshop on Compilers for Parallel Computing, Edinburgh, Scotland UK (2001) 367–374 410

    Google Scholar 

  19. Gerbessiotis, A., Valiant, L.: Direct bulk-synchronous parallel algorithms. Technical Report TR-10-92, Center for Research in Computing Technology, Harvard University, Cambridge, Massachussets (1992) 410

    Google Scholar 

  20. Malony, A., Mertsiotakis, V., Quick, A.: Automatic scalability analysis of parallel programs based on modeling techniques. In Haring, G., Kotsis, G., eds.: Comp. Perf. Eval.: Modelling Techniques and Tools (LNCS 794), Berlin, Springer-Verlag (1994) 139–158 410

    Google Scholar 

  21. Blelloch, G., Chatterjee, S., Hardwick, J., Sipelstein, J., Zagha, M.: Implementation of a portable nested data-parallel language. Journal of Parallel and Distributed Computing 21 (1994) 4–14 410

    Article  Google Scholar 

  22. Boeres, C., Rebello, V., Skillicorn, D.: Static scheduling using task replication for LogP and BSP models. In: EuroPar’98. Volume 1480 of LNCS., Springer (1998) 337–346 410

    Google Scholar 

  23. Munier, A., Hanen, C.: Using duplication for scheduling unitary tasks on m processors with unit communication delays. Technical Report LITP 95/47, Laboratoire Informatique Théorique et Programmation, Institut Blaise Pascal, Université Pierre et Marie Curie, 4, place jussieu, 75252 Paris cedex 05 (1995) 410

    Google Scholar 

  24. Eisenbiegler, J., Lówe, W., Wehrenpfennig, A.: On the optimization by redundancy using an extended LogP model. In: Proc. Advances in Parallel and Distributed Computing Conference (APDC’97), IEEE (1997) 410

    Google Scholar 

  25. Bilardi, G., Herley, K., Pietracaprina, A.: BSP vs. LogP. In: Proc. 8th ACM symposium on Parallel algorithms and architectures (SPAA’96), Padua, Italy, ACM (1996) 25–32 410

    Google Scholar 

  26. Ramachandran, V., Grayson, B., Dahlin, M.: Emulations between QSM, BSP and LogP: A framework for general-purpose parallel algorithm design. In: Proc. ACM-SIAM SODA’99. (1999) 957–958 410

    Google Scholar 

  27. González-Escribano, A., Gemund, A.v., Cardeñoso, V.: A new algorithm for mapping DAGs to series-parallel form. Technical Report IT-DI-2002-2, Dpto. Informática, Univ. Valladolid (2002) 411, 413

    Google Scholar 

  28. Bein, W., Kamburowski, J., Stallman, F.: Optimal reductions of two-terminal directed acyclic graphs. SIAM Journal of Computing 6 (1992) 1112–1129 411

    Article  Google Scholar 

  29. Bodlaender, H.: Dynamic algorithms for graphs with treewidth 2. In: Proc. Workshop on Graph-Theoretic Concepts in Computer Science. (1994) 413

    Google Scholar 

  30. González-Escribano, A., Cardeñoso, V., Gemund, A.v.: On the loss of parallelism by imposing synchronization structure. In: Proc. 1st Euro-PDS Int’l Conf. on Parallel and Distributed Systems, Barcelona (1997) 251–256 415

    Google Scholar 

  31. Tobita, T., Kasahara, H.: A standard task graph set for fair evaluation of multiprocessor scheduling algorithms. In: ICS’99 Workshop. (1999) 71–77 416

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

González-Escribano1, A., van Gemund, A.J., Cardeñoso-Payo, V. (2003). Mapping Unstructured Applications into Nested Parallelism Best Student Paper Award: First Prize. In: Palma, J.M.L.M., Sousa, A.A., Dongarra, J., Hernández, V. (eds) High Performance Computing for Computational Science — VECPAR 2002. VECPAR 2002. Lecture Notes in Computer Science, vol 2565. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36569-9_27

Download citation

  • DOI: https://doi.org/10.1007/3-540-36569-9_27

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00852-1

  • Online ISBN: 978-3-540-36569-3

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics