Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1345206.1345225acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

Programming with tiles

Published: 20 February 2008 Publication History

Abstract

The importance of tiles or blocks in scientific computing cannot be overstated. Many algorithms, both iterative and recursive, can be expressed naturally if tiles are represented explicitly. From the point of view of performance, tiling, either as a code or a data layout transformation, is one of the most effective ways to exploit locality, which is a must to achieve good performance in current computers because of the significant difference in speed between processor and memory. Furthermore, tiles are also useful to express data distribution in parallel computations. However, despite the importance of tiles, most languages do not support them directly. This gives place to bloated programs populated with numerous subscript expressions which make the code difficult to read and coding mistakes more likely.
This paper discusses Hierarchically Tiled Arrays (HTAs), a data type which facilitates the easy manipulation of tiles in object-oriented languages with emphasis on two new features, dynamic partitioning and overlapped tiling. These features facilitate the expression of locality and communication while maintaining the same performance of algorithms written using conventional languages.

References

[1]
NAS Parallel Benchmarks. Website. http://www.nas.nasa.gov/Software/NPB/.
[2]
W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations. IEEE Trans. Comput., 30(5):341--356, 1981.
[3]
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999.
[4]
C. Barton, C. Casc ' aval, G. Almasi, Y. Zheng, M. Farreras, S. Chatterje, and J. N. Amaral. Shared Memory Programming for Large Scale Machines. In PLDI '06: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 108--117, 2006.
[5]
P. Bientinesi, J. A. Gunnels, M. E. Myers, E. S. Quintana-Ortí, and R. A. van de Geijn. The Science of Deriving Dense Linear Algebra Algorithms. ACM Trans. Math. Softw., 31(1):1--26, Mar. 2005.
[6]
P. Bientinesi, E. S. Quintana-Ortí, and R. A. van de Geijn. Representing linear algebra algorithms in code: the FLAME application program interfaces. ACM Trans. Math. Softw., 31(1):27--59, 2005.
[7]
G. Bikshandi. Parallel Programming with Hierarchically Tiled Arrays. PhD thesis, UIUC, 2007.
[8]
G. Bikshandi, J. Guo, D. Hoeflinger, G. Almasi, B. B. Fraguela, M. J. Garzaran, D. Padua, and C. von Praun. Programming for Parallelism and Locality with Hierarchically Tiled Arrays. In PPoPP '06: Proc. of the ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 48--57, 2006.
[9]
G. Bikshandi, J. Guo, C. von Praun, G. Tanase, B. B. Fraguela, M. J. Garzarán, D. Padua, and L. Rauchwerger. Design and use of htalib - a library for Hierarchically Tiled Arrays. In Proc. of the Intl. Workshop on Languages and Compilers for Parallel Computing, 2006.
[10]
W. Carlson, J. Draper, D. Culler, K. Yelick, E. Brooks, and K. Warren. Introduction to UPC and Language Specification. Technical Report CCS-TR-99-157, IDA Center for Computing Sciences, 1999.
[11]
R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, and R. Menon. Parallel programming in OpenMP. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001.
[12]
K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: programming the memory hierarchy. In Supercomputing '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, page 83, 2006.
[13]
G. C. Fox, M. A. Johnson, G. A. Lyzenga, S. W. Otto, J. K. Salmon, and D. W. Walker. Solving Problems on Concurrent Processors. Vol. 1: General Techniques and Regular Problems. Prentice-Hall, Inc., 1988.
[14]
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, page 285, 1999.
[15]
W. Gropp, E. Lusk, and A. Skjellum. Using MPI (2nd ed.): Portable Parallel Programming with the Message-Passing Interface. MIT Press, 1999.
[16]
F. G. Gustavson. High-performance linear algebra algorithms using new generalized data structures for matrices. IBM J. Res. Dev., 47(1):31--55, 2003.
[17]
M. H. Halstead. Elements of Software Science. Elsevier, 1977.
[18]
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiler Optimizations for Fortran D on MIMD Distributed-memory Machines. In Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing, pages 86--100, 1991.
[19]
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD Distributed-memory Machines. Commun. ACM, 35(8):66--80, 1992.
[20]
F. Irigoin and R. Triolet. Supernode Partitioning. In POPL '88: Proc. of the 15th ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 319--329, 1988.
[21]
A. W. Lim, S.-W. Liao, and M. S. Lam. Blocking and Array Contraction Across Arbitrarily Nested Loops Using Affine Partitioning. In PPoPP '01: Proc. of the 8th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 103--112, 2001.
[22]
McCabe. A Complexity Measure. IEEE Transactions on Software Engineering, 2:308--320, 1976.
[23]
A. C. McKellar and J. E. G. Coffman. Organizing Matrices and Matrix Operations for Paged Memory Systems. Communications of the ACM, 12(3):153--165, 1969.
[24]
R. W. Numrich and J. Reid. Co-array Fortran for Parallel Programming. SIGPLAN Fortran Forum, 17(2):1--31, 1998.
[25]
E. S. Quintana-Ortí and R. A. van de Geijn. Formal Derivation of Algorithms: The Triangular Sylvester Equation. ACM Trans. Math. Softw., 29(2):218--243, 2003.
[26]
J. Ramanujam and P. Sadayappan. Tiling Multidimensional Iteration Spaces for Nonshared Memory Machines. In Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing, pages 111--120, 1991.
[27]
J. Reinders. Intel Threading Building Blocks. O'Reilly, 2007.
[28]
J. V. W. Reynders, P. J. Hinker, J. C. Cummings, S. R. Atlas, S. Banerjee, W. F. Humphrey, S. R. Karmesin, K. Keahey, M. Srikant, and M. D. Tholburn. POOMA: A Framework for Scientific Simulations of Paralllel Architectures. In Parallel Programming in C++, pages 547--588. MIT Press, 1996.
[29]
S. Toledo. Locality of Reference in LU Decomposition with Partial Pivoting. SIAM Journal on Matrix Analysis and Applications, 18(4):1065--1081, 1997.
[30]
T. Veldhuizen. Techniques for scientific C++. Technical Report TR542, Department of Computer Science, Indiana University, 2000.
[31]
R. Whaley, A. Petitet, and J. Dongarra. Automated Empirical Optimizations of Sofware and the ATLAS Project. Parallel Computing, 27(1-2):3--35, 2001.
[32]
M. E. Wolf and M. S. Lam. A Data Locality Optimizing Algorithm. In Proc. of the Conf. on Programming Language Design and Implementation, pages 30--44, 1991.
[33]
M. Wolfe. More Iteration Space Tiling. In Supercomputing '89: Proceedings of the 1989 ACM/IEEE conference on Supercomputing, pages 655--664, 1989.

Cited By

View all
  • (2021)Scratch-Style Relational Algebra and Calculus2021 International Conference on Computational Science and Computational Intelligence (CSCI)10.1109/CSCI54926.2021.00021(575-581)Online publication date: Dec-2021
  • (2019)Performance Evaluation of Tree Ensemble Classification Models Towards Challenges of Big Data AnalyticsEmerging Technologies in Computer Engineering: Microservices in Big Data Analytics10.1007/978-981-13-8300-7_12(141-154)Online publication date: 18-May-2019
  • (2018)Program generation for small-scale linear algebra applicationsProceedings of the 2018 International Symposium on Code Generation and Optimization - CGO 201810.1145/3179541.3168812(327-339)Online publication date: 2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
February 2008
308 pages
ISBN:9781595937957
DOI:10.1145/1345206
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data-parallel
  2. locality
  3. parallel programming
  4. tiling

Qualifiers

  • Research-article

Conference

PPoPP08
Sponsor:

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)9
  • Downloads (Last 6 weeks)1
Reflects downloads up to 23 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Scratch-Style Relational Algebra and Calculus2021 International Conference on Computational Science and Computational Intelligence (CSCI)10.1109/CSCI54926.2021.00021(575-581)Online publication date: Dec-2021
  • (2019)Performance Evaluation of Tree Ensemble Classification Models Towards Challenges of Big Data AnalyticsEmerging Technologies in Computer Engineering: Microservices in Big Data Analytics10.1007/978-981-13-8300-7_12(141-154)Online publication date: 18-May-2019
  • (2018)Program generation for small-scale linear algebra applicationsProceedings of the 2018 International Symposium on Code Generation and Optimization - CGO 201810.1145/3179541.3168812(327-339)Online publication date: 2018
  • (2018)Program generation for small-scale linear algebra applicationsProceedings of the 2018 International Symposium on Code Generation and Optimization10.1145/3168812(327-339)Online publication date: 24-Feb-2018
  • (2018)Performance Factor Analysis and Scope of Optimization for Big Data Processing on Cluster2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC)10.1109/PDGC.2018.8745857(418-423)Online publication date: Dec-2018
  • (2018)The locality descriptorProceedings of the 45th Annual International Symposium on Computer Architecture10.1109/ISCA.2018.00074(829-842)Online publication date: 2-Jun-2018
  • (2018)A case for richer cross-layer abstractionsProceedings of the 45th Annual International Symposium on Computer Architecture10.1109/ISCA.2018.00027(207-220)Online publication date: 2-Jun-2018
  • (2015)Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiencyACM SIGPLAN Notices10.1145/2858788.268851450:8(205-214)Online publication date: 24-Jan-2015
  • (2015)Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiencyProceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/2688500.2688514(205-214)Online publication date: 24-Jan-2015
  • (2015)Programming Languages for Scientific ComputingEncyclopedia of Applied and Computational Mathematics10.1007/978-3-540-70529-1_250(1173-1181)Online publication date: 21-Nov-2015
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media