Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/207110.207145acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free access

Unifying data and control transformations for distributed shared-memory machines

Published: 01 June 1995 Publication History

Abstract

We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations involve changing the execution order of programs. We have developed new techniques for compiler optimizations for distributed shared-memory machines, although the same techniques can be used for sequential machines with a memory hierarchy.
Our compiler optimizations are based on an algebraic representation of data mappings and a new data locality model. We present a pure data transformation algorithm and an algorithm unifying data and control transformations. While there has been much work on control transformations, the opportunities for data transformations have been largely neglected. In fact, data transformations have the advantage of being applicable to programs that cannot be optimized with control transformations. The unified algorithm, which performs data and control transformations simultaneously, offers improvement over optimizations obtained by applying data and control transformations separately.
The experimental results using a set of applications on a parallel machine show that the new optimizations improve performance significantly. These results are further analyzed using locality metrics with instrumentation and simulation.

References

[1]
J. M. Anderson and M. Lam. Global optimizations for parallelism and locality on scalable parallel machines. Proceedings of ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, June 1993.
[2]
V. Balasundaram, G. Fox, K. Kennedy, and U. Kremer. An interactive environment for data partitioning and distribution. In Proc. 5th Distributed Memory Comput. Conf., April 1990.
[3]
D. Bau, i. Kodukula, V. Kotlyar, K. Pingali, and P. StodghilI. Solving alignment using elementary linear algebra. Proceedings of the Seventh Annual Workshop on Languages and Compilers for Parallel Computing, 1994.
[4]
R. Bianchini and T. J. LeBlanc. Software caching on cache-coherent multiprocessors. In Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, pages 521-526, December 1992.
[5]
William J. Bolosky, Robert P. Fitzgerald, and Michael L. Scott. Simple but effective techniques for NUMA memory management. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pages 19- 31, Litchfield Park, AZ, December 1989.
[6]
William J. Bolosky and Michael L. Scott. False sharing and its effect on shared memory performance. In Proceedings of fhe Fourth Usenix Symposium on Experiences with Distributed and Multiprocessor Systems, pages 57-71, San Diego, CA, September 1993. Also available as MSR-TR-93-1, Microsoft Research Laboratory, September 1993.
[7]
S. Carr, K. McKinley, and C.-W. Tseng. Compiler optimizations for improving data locality, in Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 252-262, October 1994.
[8]
J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Implementation and performance of Munin. in Proceedings of the Thirteenth ACM Symposium on Operating .Systems Principles, pages 152-164, Pacific Grove, CA, October 1991.
[9]
S. Chatterjee, J. R. Gilbert, R. Schreiber, and S. 71:ng. Automatic array alignment in data-parallel programs. Proceedings of ACM Symposium on Principles of Programming Languages, 20, 1993.
[10]
Alan L. Cox and Robert J. Fowler. The implementation of a coherent memory abstraction on a NUMA multiprocessor: Experiences with PLATINUM. In Proceed~!ngs of the Twelfth ACM Symposium on Operating Systems Principles, pages 32--44, Litchfield Park, AZ, December 1989.
[11]
M. Dubois, J. Skeppstedt, L. Ricciulli, K. Ramamurthy, and P. Stenstr6m. The detection and elimination of useless misses in multiprocessors. In Proceedings of the International Symposium on Computer Architecture, pages 88-97, San Diego, CA, May 1993.
[12]
S. J. Eggers and T E. Jeremiassen. Eliminating false sharing. In Proc. 1991 International Conference on Parallel Processing, 1991.
[13]
C. Eisenbeis, W. Jalby, D. Windheiser, and F. Bodin. A strategy for array management in local memory. In Proc. 3th Annual Workshop on Languages and Compilers, August 1990.
[14]
J. Ferrante, V. Sarkar, and W. Thrash. On estimating and exchange cache effectiveness. In Proceeding.r of the Fourth Workshop on Languages and Compilers for Parallel Computing, August 1991.
[15]
D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformations. Journal of Parallel and Distributed Computing, 5:587-616, 1988.
[16]
M. Gupta and P. Banerjee. Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers. IEEE Transactions on Parallel and Distributed Systems, 3:179-193, 1992.
[17]
J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 1990.
[18]
D. Hudak and S. Abraham. Compiler techniques for data partitioning of sequentially iterated parallel loops. In Proc. ACM Int. Conf. Supercomputing, June 19913.
[19]
K. Knobe, J. Lukas, and G. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102-118, February 1990.
[20]
J. Li and M. Chen. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays. Technical report, Yale University, 1989.
[21]
K. Li and P. Hudak. Memory coherence in shared virtual memory systems. A CM Transactions on Computer Systems, November 1989.
[22]
W. Li and K. Pingali. Access normalization: Loop restructuring for NUMA compilers. A CM Transactions on Computer Systems, 11 (4):353-375, November 1993.
[23]
W. Li and K. Pingali. A singular loop transformation framework based on non-singular matrices. International Journal of Parallel Programming, 22(2), April 1994.
[24]
Wei Li. Compiler cache optimizations for banded matrix problems. In 9th ACM International Conference on Supercomputing, July 1995.
[25]
D. J. Lilja. Cache coherence in large-scale sharedmemory multiprocessors: Issues and comparisons. Computing Surveys, 25(3):303-338, September 1993.
[26]
A. Nagurney, C. F. Nicholson, and P. M. Bishop. Spatial price equilibrium models with discriminatory ad valorem tariffs: formulation and comparative computation using variational inequalities. In Recent Advances in Spatial Equilibrium Modeling: Methodology and Applications. Springer-Verlag, Heidelberg, 1995. forthcoming.
[27]
B. Nitzberg and V. Lo. Distributed shared memory: A survey of issues and algorithms. Computer, 24(8):52- 60, August 1991.
[28]
A. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, May 1989.
[29]
R. P. LaRowe, Jr. and C. S. Ellis. Experimental comparison of memory management policies for NUMA multiprocessors. A CM Transactions on Computer Systems, 9(4):319-363, November 1991.
[30]
J. Ramanujam and P. Sadayappan. Compile-time techniques for data distribution in distributed memory machines. IEEE Transactions on Parallel and Distributed Systems, 2, October 1991.
[31]
Jack E. Veenstra and Robert J. Fowler. Mint: a front end for efficient simulation of shared-memory multiprocessors. Proceedings of the Second International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS '94), pages 201-207, January 1994.
[32]
M. Wolf and M. Lam. A data locality optimizing algorithm. In Proc. ACM SIGPLAN 91 Conference on Programming Language Design and Implementation, pages 30-44, June 1991.

Cited By

View all
  • (2024)TrackFM: Far-out Compiler Support for a Far Memory WorldProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624856(401-419)Online publication date: 27-Apr-2024
  • (2019)Co-optimizing memory-level parallelism and cache-level parallelismProceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3314221.3314599(935-949)Online publication date: 8-Jun-2019
  • (2018)Enhancing computation-to-core assignment with physical location informationACM SIGPLAN Notices10.1145/3296979.319238653:4(312-327)Online publication date: 11-Jun-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
June 1995
335 pages
ISBN:0897916972
DOI:10.1145/207110
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1995

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

PLDI95
Sponsor:

Acceptance Rates

PLDI '95 Paper Acceptance Rate 28 of 105 submissions, 27%;
Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)83
  • Downloads (Last 6 weeks)15
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)TrackFM: Far-out Compiler Support for a Far Memory WorldProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624856(401-419)Online publication date: 27-Apr-2024
  • (2019)Co-optimizing memory-level parallelism and cache-level parallelismProceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3314221.3314599(935-949)Online publication date: 8-Jun-2019
  • (2018)Enhancing computation-to-core assignment with physical location informationACM SIGPLAN Notices10.1145/3296979.319238653:4(312-327)Online publication date: 11-Jun-2018
  • (2018)Computing with Near DataProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/32873212:3(1-30)Online publication date: 21-Dec-2018
  • (2018)Enhancing computation-to-core assignment with physical location informationProceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3192366.3192386(312-327)Online publication date: 11-Jun-2018
  • (2017)Compressing three-dimensional sparse arrays using inter- and intra-task parallelization strategies on Intel Xeon and Xeon PhiThe Journal of Supercomputing10.1007/s11227-016-1820-x73:8(3391-3410)Online publication date: 1-Aug-2017
  • (2016)Optimal Heavy-Traffic Queue Length Scaling in an Incompletely Saturated SwitchACM SIGMETRICS Performance Evaluation Review10.1145/2964791.290146644:1(13-24)Online publication date: 14-Jun-2016
  • (2016)Using Predictions in Online OptimizationACM SIGMETRICS Performance Evaluation Review10.1145/2964791.290146444:1(193-206)Online publication date: 14-Jun-2016
  • (2016)On the Approximation Error of Mean-Field ModelsACM SIGMETRICS Performance Evaluation Review10.1145/2964791.290146344:1(285-297)Online publication date: 14-Jun-2016
  • (2016)The Value of PrivacyACM SIGMETRICS Performance Evaluation Review10.1145/2964791.290146144:1(249-260)Online publication date: 14-Jun-2016
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media