Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Locality Analysis for Parallel C Programs

Published: 01 February 1999 Publication History

Abstract

Many parallel architectures support a memory model where some memory accesses are local and, thus, inexpensive, while other memory accesses are remote and potentially quite expensive. In the case of memory references via pointers, it is often difficult to determine if the memory reference is guaranteed to be local and, thus, can be handled via an inexpensive memory operation. Determining which memory accesses are local can be done by the programmer, the compiler, or a combination of both. The overall goal is to minimize the work required by the programmer and have the compiler automate the process as much as possible. This paper reports on compiler techniques for determining when indirect memory references are local. The locality analysis has been implemented for a parallel dialect of C called EARTH-C, and it uses an algorithm inspired by type inference algorithms for fast points-to analysis. The algorithm statically estimates when an indirect reference via a pointer can be safely assumed to be a local access. The locality inference algorithm is also used to guide the automatic specialization of functions in order to take advantage of locality specific to particular calling contexts. In addition to these purely static techniques, we also suggest fine-grain and coarse-grain dynamic techniques. In this case, dynamic locality checks are inserted into the program and specialized code for the local case is inserted. In the fine-grain case, the checks are put around single memory references, while in the coarse-grain case the checks are put around larger program segments. The static locality analysis and automatic specialization has been implemented in the EARTH-C compiler, which produces low-level threaded code for the EARTH multithreaded architecture. Experimental results are presented for a set of benchmarks that operate on irregular, dynamically allocated data structures. Overall, the techniques give moderate to significant speedups, with the combination of static and dynamic techniques giving the best performance overall.

References

[1]
U. Bruening W.K. Giloi and W. Schroeder-Preikschat, "Latency Hiding in Message-Passing Architectures," Proc. Eighth Int'l Parallel Processing Symp., Cancún, Mexico, pp. 704-709, Apr. 1994.
[2]
M.C. Carlisle, "Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines," PhD thesis, Dept. of Computer Science, Princeton Univ., June 1996.
[3]
L. Hendren C. Donawa M. Emami G. Gao Justiani and B. Sridharan, "Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations," Proc. Fifth Int'l Workshop Languages and Compilers for Parallel Computing, pp. 406-420, New Haven, Conn., Aug. 1992.
[4]
L.J. Hendren X. Tang Y. Zhu G.R. Gao X. Xue H. Cai and P. Ouellet, "Compiling C for the EARTH Multithreaded Architecture," Proc. 1996 Conf. Parallel Architectures and Compilation Techniques (PACT '96), pp. 12-23, Boston, Oct. 1996.
[5]
F. Henglein, "Efficient Type Inference for Higher-Order Binding-Time Analysis," Functional Programming and Computer Architecture, pp. 448-472, 1991.
[6]
H.H.J. Hum O. Maquelin K.B. Theobald X. Tian G.R. Gao and L.J. Hendren, "A Study of the EARTH-MANNA Multithreaded System," Int'l J. Parallel Programming, vol. 24, no. 4, pp. 319-347, Aug. 1996.
[7]
C. Lapkowski and L. Hendren, "Extended SSA Numbering: Introducing SSA Properties to Languages with Multi-Level Pointers," Proc. Seventh Int'l Conf. Compiler Construction, pp. 128-143, Lisbon, Portugal, Apr. 1998.
[8]
O. Maquelin G.R. Gao H.H.J. Hum K.B. Theobald and X.-M. Tian, "Polling Watchdog: Combining Polling and Interrupts for Efficient Message Handling," Proc. 23rd Ann. Intl. Symp. Computer Architecture, pp. 178-188, Philadelphia, May 1996.
[9]
M. Rinard and P. Diniz, "Commutativity Analysis: A New Analysis Framework for Parallelizing Compilers," Proc. ACM SIGPLAN '96 Conf. Programming Language Design and Implementation, pp. 54-67, Philadelphia, May 1996.
[10]
A. Rogers M.C. Carlisle J.H. Reppy and L.J. Hendren, "Supporting Dynamic Data Structures on Distributed-Memory Machines," ACM Trans. Programming Languages and Systems, vol. 17, no. 2, pp. 233-263, Mar. 1995.
[11]
B. Steensgaard, "Points-To Analysis in Almost Linear Time," Conf. Record 23rd ACM SIGPLAN-SIGACT Symp. Principles of Programming Languages, pp. 32-41, St. Petersburg, Fla., Jan. 1996.
[12]
Y. Zhu and L. Hendren, "Communication Optimizations for Parallel C Programs," Proc. ACM SIGPLAN '98 Conf. Programming Language Design and Implementation, pp. 199-211, Montreal, Canada, July 1998.

Cited By

View all
  • (2005)On the parallelization of irregular and dynamic programsParallel Computing10.1016/j.parco.2005.02.01231:6(544-562)Online publication date: 1-Jun-2005
  • (2004)A Framework to Capture Dynamic Data Structures in Pointer-Based CodesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2004.126479815:2(151-166)Online publication date: 1-Feb-2004
  • (2003)Type systems for distributed data sharingProceedings of the 10th international conference on Static analysis10.5555/1760267.1760288(273-294)Online publication date: 11-Jun-2003

Recommendations

Reviews

Daniel Grosu

The authors investigate compiler techniques for parallel programs used to determine when indirect memory references are local. The authors present an algorithm for static locality analysis of parallel C programs based on type-inference techniques. In addition to this algorithm, they propose two dynamic techniques (fine-grain and coarse-grain) in which dynamic locality checks are introduced into the program. In the case of the fine-grain technique, the locality checks are put around single memory references, and in the case of the coarse-grain technique, they are put around program segments. The effectiveness of these techniques is evaluated using a set of benchmarks. The authors find that the static locality analysis results in significant improvements in all cases; the fine-grain technique is seldom beneficial; and the coarse-grain technique can be of some benefit when used carefully. They also find that the combination of static analysis and dynamic locality checks performs best on all benchmarks. This paper is a valuable research work intended for researchers interested in compiler techniques for parallel programs.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems  Volume 10, Issue 2
February 1999
96 pages
ISSN:1045-9219
Issue’s Table of Contents

Publisher

IEEE Press

Publication History

Published: 01 February 1999

Author Tags

  1. Locality analysis
  2. compiling for parallel architectures.
  3. multithreaded architectures

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2005)On the parallelization of irregular and dynamic programsParallel Computing10.1016/j.parco.2005.02.01231:6(544-562)Online publication date: 1-Jun-2005
  • (2004)A Framework to Capture Dynamic Data Structures in Pointer-Based CodesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2004.126479815:2(151-166)Online publication date: 1-Feb-2004
  • (2003)Type systems for distributed data sharingProceedings of the 10th international conference on Static analysis10.5555/1760267.1760288(273-294)Online publication date: 11-Jun-2003

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media