Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Source-level global optimizations for fine-grain distributed shared memory systems

Published: 18 June 2001 Publication History

Abstract

This paper describes and evaluates the use of aggressive static analysis in Jackal, a fine-grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing, source-level compiler rather than the binary rewriting techniques employed by most other fine-grain DSM systems. Source-level analysis makes existing access-check optimizations (e.g., access-check batching) more effective and enables two novel fine-grain DSM optimizations: object-graph aggregation and automatic computation migration.
The compiler detects situations where an access to a root object is followed by accesses to subobjects. Jackal attempts to aggregate all access checks on objects in such object graphs into a single check on the graph's root object. If this check fails, the entire graph is fetched. Object-graph aggregation can reduce the number of network roundtrips and, since it is an advanced form of access-check batching, improves sequential performance.
Computation migration (or function shipping) is used to optimize critical sections in which a single processor owns both the shared data that is accessed and the lock that protects the data. It is usually more efficient to execute such critical sections on the processor that holds the lock and the data than to incur multiple roundtrips for acquiring the lock, fetching the data, writing the data back, and releasing the lock. Jackal's compiler detects such critical sections and optimizes them by generating single-roundtrip computation-migration code rather than standard data-shipping code.
Jackal's optimizations improve both sequential and parallel application performance. On average, sequential execution times of instrumented, optimized programs are within 10% of those of uninstrumented programs. Application speedups usually improve significantly and several Jackal applications perform as well as hand-optimized message-passing programs.

References

[1]
Y. Aridor, M. Factor, and A. Teperman. cJVM: a Single System Image of a JVM on a Cluster. In Proc. of the 1999 Int. Conf. on Parallel Processing, Aizu, Japan, Sept. 1999.]]
[2]
H. Bal, R. Bhoedjang, R. Hofman, C. Jacobs, K. Langendoen, T. R~hl, and M. Kaashoek. Performance Evaluation of the Orca Shared Object System. ACM Trans. on Computer Systems, 16(1):1-40, Feb. 1998.]]
[3]
B. Bershad, M. Zekauskas, and W. Sawdon. The Midway Distributed Shared Memory System. In Proc. of the 38th IEEE Int. Computer Conference, pages 528-537, Los Alamitos, CA, Feb. 1993.]]
[4]
R. Bhoedjang, K. Verstoep, T. R~hl, and H. Bal. Evaluating Design Alternatives for Reliable Communication on High-Speed Networks. In ASPLOS, pages 71-81, Nov. 2000.]]
[5]
N. Boden, D. Cohen, R. Felderman, A. Kulawik, C. Seitz, J. Seizovic, and W. Su. Myrinet: A Gigabit-per-second Local Area Network. IEEE Micro, 15(1):29-36, Feb. 1995.]]
[6]
J. Choi, M. Gupta, M. Serrano, V. Sreedhar, and S. Midkiff. Escape Analysis for Java. In OOPSLA'99, pages 1-19, Denver, CO, Nov. 1999.]]
[7]
O. Doederlein. The Java Performance Report - Part II. Online at http://www.javalobby.org/features/jpr/.]]
[8]
S. Dwarkadas, A. Cox, and W. Zwaenepoel. An Integrated Compile-Time/Run-Time Software Distributed Shared Memory System. In ASPLOS'96, Oct. 1996.]]
[9]
R. Ghiya and L. Hendren. Putting Pointer Analysis to Work. In Symp. on Principles of Programming Languages, pages 121-133, San Diego, CA, Jan. 1998.]]
[10]
J. Gosling, B. Joy, and G. Steele. The Java Language Specification. Addison-Wesley, Reading, MA, 1996.]]
[11]
P. Havlak and K. Kennedy. An Implementation of Interprocedural Bounded Regular Section Analysis. IEEE Transactions on Parallel and Distributed Systems, 2(3):350-360, 1991.]]
[12]
W. Hsieh, M. Kaashoek, and W. Weihl. Dynamic Computation Migration in DSM Systems. In Supercomputing '96, Pitssburgh, PA, Nov. 1996.]]
[13]
W. Hsieh, P. Wang, and W. Weihl. Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems. In PPoPP'93, pages 239-248, May 1993.]]
[14]
M. H. I. Schoinas, B. Falsafi and D. W. J.R. Larus. Sirocco: Cost-Effective Fine-Grain Distributed Shared Memory. In PACT, pages 40-49, Paris, France, Oct. 1998.]]
[15]
K. Johnson, M. Kaashoek, and D. Wallach. CRL: High-Performance All-Software Distributed Shared Memory. In Proc. of the 15th Symp. on Operating Systems Principles, pages 213-226, Copper Mountain, CO, Dec. 1995.]]
[16]
A. A. C. Julian Dolby. An automatic object inlining optimization and its evaluation. In PLDI 2000, pages 345-357, Vancouver, BC, Canada, June 2000.]]
[17]
P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. In Proc. of the Winter 1994 Usenix Conf., pages 115-131, San Francisco, CA, Jan. 1994.]]
[18]
J. Knoop, O. R~thing, and B. Steffen. Lazy code motion. ACM SIGPLAN Notices, 27(7):224-234, 1992.]]
[19]
L. Kontothanassis and M. Scott. Using Memory-Mapped Network Interface to Improve the Performance of Distributed Shared Memory. In Proc. of the 2nd Int. Symp. on High-Performance Computer Architecture, pages 166-177, San Jose, CA, Feb. 1996.]]
[20]
J. Maassen, R. van Nieuwpoort, R. Veldema, H. Bal, and A. Plaat. An Efficient Implementation of Java's Remote Method Invocation. In PPoPP, pages 173-182, May 1999.]]
[21]
M. W. Macbeth, K. A. McGuigan, and P. J. Hatcher. Executing Java Threads in Parallel in a Distributed-Memory Environment. In Proc. CASCON'98, pages 40-54, Missisauga, ON, Canada, 1998.]]
[22]
W. Pugh. Fixing the Java Memory Model. In ACM 1999 Java Grande Conference, pages 89-98, San Francisco, CA, June 1999.]]
[23]
R. Veldema, R. Hofman, R. Bhoedjang and H.E. Bal. Runtime-optimizations for a Java DSM. In ACM 2001 Java Grande Conference, pages 89-98, San Francisco, CA, June 2001.]]
[24]
A. Rogers, M. Carlisle, J. Reppy, and L. Hendren. Supporting Dynamic Data Structures on Distributed-Memory Machines. ACM Trans. on Programming Languages and Systems, 17(2):233-263, Mar. 1995.]]
[25]
D. Scales, K. Gharachorloo, and C. Thekkath. Shasta: A Low Overhead, Software-Only Approach for Supporting Fine-Grain Shared Memory. In ASPLOS, pages 174-185, Oct. 1996.]]
[26]
I. Schoinas, B. Falsafi, A. Lebeck, S. Reinhardt, J. Larus, and D. Wood. Fine-grain Access Control for Distributed Shared Memory. In ASPLOS, pages 297-306, Oct. 1994.]]
[27]
T. Suganuma, T. Ogasawara, M. Takeuchi, T. Yasue, M. Kawahito, K. Ishizaki, H. Komatsu, and T. Nakatani. Overview of the IBM Java Just-in-Time Compiler. IBM Systems Journal, 39(1):175-193, 2000.]]
[28]
J. Whaley and M. Rinard. Compositional Pointer and Escape Analysis for Java Programs. In OOPSLA(14), pages 187-206, Denver, CO, Nov. 1999.]]
[29]
S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proc. of the 22nd Int. Symp. on Computer Architecture, pages 24-36, Santa Margherita Ligure, Italy, June 1995.]]
[30]
Y.C. Hu, W. Yu, D. Wallach, A.L. Cox, W. Zwaenepoel. Run-time Support for Distributed Sharing in Typed Languages. In Proceedings of Languages, Compilers, and Runtimes for Scalable Computing, May 2000.]]
[31]
W. Yu and A. Cox. Java/DSM: A Platform for Heterogeneous Computing. Concurrency: Practice and Experience, 9(11):1213-1224, Nov. 1997.]]
[32]
Y. Zhou, L. Iftode, and K. Li. Performance Evaluation of Two Home-Based Lazy Release-Consistency Protocols for Shared Virtual Memory Systems. In 2nd USENIX Symp. on OSDI, pages 75-88, Seattle, WA, Oct. 1996.]]

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 36, Issue 7
July 2001
143 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/568014
Issue’s Table of Contents
  • cover image ACM Conferences
    PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
    June 2001
    142 pages
    ISBN:1581133464
    DOI:10.1145/379539
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2001
Published in SIGPLAN Volume 36, Issue 7

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 01 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Priority Promotion with Parysian FlairJournal of Computer and System Sciences10.1016/j.jcss.2024.103580(103580)Online publication date: Aug-2024
  • (2015)Benchmarks for Parity GamesFundamentals of Software Engineering10.1007/978-3-319-24644-4_9(127-142)Online publication date: 12-Nov-2015
  • (2007)Esodyp+: Prefetching in the Jackal Software DSMEuro-Par 2007 Parallel Processing10.1007/978-3-540-74466-5_60(563-573)Online publication date: 2007
  • (2006)A platform-independent distributed runtime for standard multithreaded JavaInternational Journal of Parallel Programming10.1007/s10766-006-0007-034:2(113-142)Online publication date: 1-Apr-2006
  • (2004)A distributed runtime for java:yesterday and today18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.10.1109/IPDPS.2004.1303149(159-165)Online publication date: 2004
  • (2003)JavaSplit: a runtime for execution of monolithic Java programs on heterogenous collections of commodity workstationsProceedings IEEE International Conference on Cluster Computing CLUSTR-0310.1109/CLUSTR.2003.1253306(110-117)Online publication date: 2003
  • (2011)Design of Hierarchical Thread Pool Executor for DSM2011 Second International Conference on Intelligent Systems, Modelling and Simulation10.1109/ISMS.2011.50(284-288)Online publication date: Jan-2011
  • (2009)Towards an actor-based concurrent machine modelProceedings of the 4th workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems10.1145/1565824.1565825(4-9)Online publication date: 6-Jul-2009
  • (2009)PleiadProceedings of the 6th ACM conference on Computing frontiers10.1145/1531743.1531761(109-116)Online publication date: 18-May-2009
  • (2009)Path-Analytic Distributed Object PrefetchingProceedings of the 2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks10.1109/I-SPAN.2009.131(98-103)Online publication date: 14-Dec-2009
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media