research-article

Programming with tiles

Authors:

Ganesh Bikshandi,

Basilio B. Fraguela,

Maria J. Garzaran,

David PaduaAuthors Info & Claims

PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming

Pages 111 - 122

https://doi.org/10.1145/1345206.1345225

Published: 20 February 2008 Publication History

Abstract

The importance of tiles or blocks in scientific computing cannot be overstated. Many algorithms, both iterative and recursive, can be expressed naturally if tiles are represented explicitly. From the point of view of performance, tiling, either as a code or a data layout transformation, is one of the most effective ways to exploit locality, which is a must to achieve good performance in current computers because of the significant difference in speed between processor and memory. Furthermore, tiles are also useful to express data distribution in parallel computations. However, despite the importance of tiles, most languages do not support them directly. This gives place to bloated programs populated with numerous subscript expressions which make the code difficult to read and coding mistakes more likely.

This paper discusses Hierarchically Tiled Arrays (HTAs), a data type which facilitates the easy manipulation of tiles in object-oriented languages with emphasis on two new features, dynamic partitioning and overlapped tiling. These features facilitate the expression of locality and communication while maintaining the same performance of algorithms written using conventional languages.

References

[1]

NAS Parallel Benchmarks. Website. http://www.nas.nasa.gov/Software/NPB/.

[2]

W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations. IEEE Trans. Comput., 30(5):341--356, 1981.

Digital Library

[3]

E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999.

Digital Library

[4]

C. Barton, C. Casc ' aval, G. Almasi, Y. Zheng, M. Farreras, S. Chatterje, and J. N. Amaral. Shared Memory Programming for Large Scale Machines. In PLDI '06: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 108--117, 2006.

Digital Library

[5]

P. Bientinesi, J. A. Gunnels, M. E. Myers, E. S. Quintana-Ortí, and R. A. van de Geijn. The Science of Deriving Dense Linear Algebra Algorithms. ACM Trans. Math. Softw., 31(1):1--26, Mar. 2005.

Digital Library

[6]

P. Bientinesi, E. S. Quintana-Ortí, and R. A. van de Geijn. Representing linear algebra algorithms in code: the FLAME application program interfaces. ACM Trans. Math. Softw., 31(1):27--59, 2005.

Digital Library

[7]

G. Bikshandi. Parallel Programming with Hierarchically Tiled Arrays. PhD thesis, UIUC, 2007.

[8]

G. Bikshandi, J. Guo, D. Hoeflinger, G. Almasi, B. B. Fraguela, M. J. Garzaran, D. Padua, and C. von Praun. Programming for Parallelism and Locality with Hierarchically Tiled Arrays. In PPoPP '06: Proc. of the ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 48--57, 2006.

Digital Library

[9]

G. Bikshandi, J. Guo, C. von Praun, G. Tanase, B. B. Fraguela, M. J. Garzarán, D. Padua, and L. Rauchwerger. Design and use of htalib - a library for Hierarchically Tiled Arrays. In Proc. of the Intl. Workshop on Languages and Compilers for Parallel Computing, 2006.

Digital Library

[10]

W. Carlson, J. Draper, D. Culler, K. Yelick, E. Brooks, and K. Warren. Introduction to UPC and Language Specification. Technical Report CCS-TR-99-157, IDA Center for Computing Sciences, 1999.

[11]

R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, and R. Menon. Parallel programming in OpenMP. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001.

Digital Library

[12]

K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: programming the memory hierarchy. In Supercomputing '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, page 83, 2006.

Digital Library

[13]

G. C. Fox, M. A. Johnson, G. A. Lyzenga, S. W. Otto, J. K. Salmon, and D. W. Walker. Solving Problems on Concurrent Processors. Vol. 1: General Techniques and Regular Problems. Prentice-Hall, Inc., 1988.

Digital Library

[14]

M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, page 285, 1999.

Digital Library

[15]

W. Gropp, E. Lusk, and A. Skjellum. Using MPI (2nd ed.): Portable Parallel Programming with the Message-Passing Interface. MIT Press, 1999.

Digital Library

[16]

F. G. Gustavson. High-performance linear algebra algorithms using new generalized data structures for matrices. IBM J. Res. Dev., 47(1):31--55, 2003.

Digital Library

[17]

M. H. Halstead. Elements of Software Science. Elsevier, 1977.

Digital Library

[18]

S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiler Optimizations for Fortran D on MIMD Distributed-memory Machines. In Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing, pages 86--100, 1991.

Digital Library

[19]

S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD Distributed-memory Machines. Commun. ACM, 35(8):66--80, 1992.

Digital Library

[20]

F. Irigoin and R. Triolet. Supernode Partitioning. In POPL '88: Proc. of the 15th ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 319--329, 1988.

Digital Library

[21]

A. W. Lim, S.-W. Liao, and M. S. Lam. Blocking and Array Contraction Across Arbitrarily Nested Loops Using Affine Partitioning. In PPoPP '01: Proc. of the 8th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 103--112, 2001.

Digital Library

[22]

McCabe. A Complexity Measure. IEEE Transactions on Software Engineering, 2:308--320, 1976.

Digital Library

[23]

A. C. McKellar and J. E. G. Coffman. Organizing Matrices and Matrix Operations for Paged Memory Systems. Communications of the ACM, 12(3):153--165, 1969.

Digital Library

[24]

R. W. Numrich and J. Reid. Co-array Fortran for Parallel Programming. SIGPLAN Fortran Forum, 17(2):1--31, 1998.

Digital Library

[25]

E. S. Quintana-Ortí and R. A. van de Geijn. Formal Derivation of Algorithms: The Triangular Sylvester Equation. ACM Trans. Math. Softw., 29(2):218--243, 2003.

Digital Library

[26]

J. Ramanujam and P. Sadayappan. Tiling Multidimensional Iteration Spaces for Nonshared Memory Machines. In Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing, pages 111--120, 1991.

Digital Library

[27]

J. Reinders. Intel Threading Building Blocks. O'Reilly, 2007.

Digital Library

[28]

J. V. W. Reynders, P. J. Hinker, J. C. Cummings, S. R. Atlas, S. Banerjee, W. F. Humphrey, S. R. Karmesin, K. Keahey, M. Srikant, and M. D. Tholburn. POOMA: A Framework for Scientific Simulations of Paralllel Architectures. In Parallel Programming in C++, pages 547--588. MIT Press, 1996.

[29]

S. Toledo. Locality of Reference in LU Decomposition with Partial Pivoting. SIAM Journal on Matrix Analysis and Applications, 18(4):1065--1081, 1997.

Digital Library

[30]

T. Veldhuizen. Techniques for scientific C++. Technical Report TR542, Department of Computer Science, Indiana University, 2000.

[31]

R. Whaley, A. Petitet, and J. Dongarra. Automated Empirical Optimizations of Sofware and the ATLAS Project. Parallel Computing, 27(1-2):3--35, 2001.

Digital Library

[32]

M. E. Wolf and M. S. Lam. A Data Locality Optimizing Algorithm. In Proc. of the Conf. on Programming Language Design and Implementation, pages 30--44, 1991.

Digital Library

[33]

M. Wolfe. More Iteration Space Tiling. In Supercomputing '89: Proceedings of the 1989 ACM/IEEE conference on Supercomputing, pages 655--664, 1989.

Digital Library

Cited By

Meston LKawash J(2021)Scratch-Style Relational Algebra and Calculus2021 International Conference on Computational Science and Computational Intelligence (CSCI)10.1109/CSCI54926.2021.00021(575-581)Online publication date: Dec-2021
https://doi.org/10.1109/CSCI54926.2021.00021
Godara HGovil MPilli E(2019)Performance Evaluation of Tree Ensemble Classification Models Towards Challenges of Big Data AnalyticsEmerging Technologies in Computer Engineering: Microservices in Big Data Analytics10.1007/978-981-13-8300-7_12(141-154)Online publication date: 18-May-2019
https://doi.org/10.1007/978-981-13-8300-7_12
Spampinato DFabregat-Traver DBientinesi PPüschel M(2018)Program generation for small-scale linear algebra applicationsProceedings of the 2018 International Symposium on Code Generation and Optimization - CGO 201810.1145/3179541.3168812(327-339)Online publication date: 2018
https://doi.org/10.1145/3179541.3168812
Show More Cited By

Index Terms

Programming with tiles
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
      2. Language types
        Concurrent programming languages

Recommendations

Flow tiles
SCA '04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation

We present <i>flow tiles</i>, a novel technique for representing and designing velocity fields. Unlike existing procedural flow generators, tiling offers a natural user interface for field design. Tilings can be constructed to meet a wide variety of ...
Programming for parallelism and locality with hierarchically tiled arrays
PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming

Tiling has proven to be an effective mechanism to develop high performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and locality in sequential codes or ...
An alternative for Wang tiles: colored edges versus colored corners

In this article we revisit the concept of Wang tiles and introduce corner tiles, square tiles with colored corners. During past years, Wang tiles have become a valuable tool in computer graphics. Important applications of Wang tiles include texture ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming

February 2008

308 pages

ISBN:9781595937957

DOI:10.1145/1345206

General Chair:
Siddhartha Chatterjee
IBM Research USA
,
Program Chair:
Michael L. Scott
University of Rochester USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2008

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

PPoPP08

Sponsor:

PPoPP08: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 20 - 23, 2008

UT, Salt Lake City, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

30
Total Citations
View Citations
794
Total Downloads

Downloads (Last 12 months)9
Downloads (Last 6 weeks)1

Reflects downloads up to 23 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Meston LKawash J(2021)Scratch-Style Relational Algebra and Calculus2021 International Conference on Computational Science and Computational Intelligence (CSCI)10.1109/CSCI54926.2021.00021(575-581)Online publication date: Dec-2021
https://doi.org/10.1109/CSCI54926.2021.00021
Godara HGovil MPilli E(2019)Performance Evaluation of Tree Ensemble Classification Models Towards Challenges of Big Data AnalyticsEmerging Technologies in Computer Engineering: Microservices in Big Data Analytics10.1007/978-981-13-8300-7_12(141-154)Online publication date: 18-May-2019
https://doi.org/10.1007/978-981-13-8300-7_12
Spampinato DFabregat-Traver DBientinesi PPüschel M(2018)Program generation for small-scale linear algebra applicationsProceedings of the 2018 International Symposium on Code Generation and Optimization - CGO 201810.1145/3179541.3168812(327-339)Online publication date: 2018
https://doi.org/10.1145/3179541.3168812
Spampinato DFabregat-Traver DBientinesi PPüschel MKnoop JSchordan MJohnson TO'Boyle M(2018)Program generation for small-scale linear algebra applicationsProceedings of the 2018 International Symposium on Code Generation and Optimization10.1145/3168812(327-339)Online publication date: 24-Feb-2018
https://dl.acm.org/doi/10.1145/3168812
Godara HGovil MPilli E(2018)Performance Factor Analysis and Scope of Optimization for Big Data Processing on Cluster2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC)10.1109/PDGC.2018.8745857(418-423)Online publication date: Dec-2018
https://doi.org/10.1109/PDGC.2018.8745857
Vijaykumar NEbrahimi EHsieh KGibbons PMutlu O(2018)The locality descriptorProceedings of the 45th Annual International Symposium on Computer Architecture10.1109/ISCA.2018.00074(829-842)Online publication date: 2-Jun-2018
https://dl.acm.org/doi/10.1109/ISCA.2018.00074
Vijaykumar NJain AMajumdar DHsieh KPekhimenko GEbrahimi EHajinazar NGibbons PMutlu O(2018)A case for richer cross-layer abstractionsProceedings of the 45th Annual International Symposium on Computer Architecture10.1109/ISCA.2018.00027(207-220)Online publication date: 2-Jun-2018
https://dl.acm.org/doi/10.1109/ISCA.2018.00027
Tang YYou RKan HTithi JGanapathi PChowdhury R(2015)Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiencyACM SIGPLAN Notices10.1145/2858788.268851450:8(205-214)Online publication date: 24-Jan-2015
https://dl.acm.org/doi/10.1145/2858788.2688514
Tang YYou RKan HTithi JGanapathi PChowdhury RCohen AGrove D(2015)Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiencyProceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/2688500.2688514(205-214)Online publication date: 24-Jan-2015
https://dl.acm.org/doi/10.1145/2688500.2688514
Knepley M(2015)Programming Languages for Scientific ComputingEncyclopedia of Applied and Computational Mathematics10.1007/978-3-540-70529-1_250(1173-1181)Online publication date: 21-Nov-2015
https://doi.org/10.1007/978-3-540-70529-1_250
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents