
A characterization of sharing in parallel programs and its application to coherency protocol evaluation

Published: 17 May 1988

Abstract

In this paper we use trace-driven simulation to analyze the memory reference patterns of write-shared data in several parallel applications. We first develop a characterization of write sharing (based on the notion of a write run) and then examine the traces using metrics derived from the characterization. The results indicate that the amount of write sharing in all programs is small, and that it is characterized by short to medium sequences of per-processor references, with little contention for either data or locks.
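
The abstract does not define a write run, so the following is only a minimal sketch under an assumed definition: a run starts with a write by one processor to a shared address, continues while only that processor references the address, and ends when any other processor references it. The trace format (`Ref` tuples), the function name `write_run_lengths`, and the choice to measure run length in writes are all hypothetical and are not taken from the paper.

```python
from collections import namedtuple

# Hypothetical trace record: processor id, operation ("R" or "W"), address.
Ref = namedtuple("Ref", ["cpu", "op", "addr"])


def write_run_lengths(trace):
    """Return {addr: [number of writes in each run]} under the assumed definition."""
    open_runs = {}  # addr -> [owning cpu, writes so far]
    finished = {}   # addr -> list of completed run lengths

    for ref in trace:
        run = open_runs.get(ref.addr)

        # Assumption: any reference by another processor terminates the run.
        if run is not None and run[0] != ref.cpu:
            finished.setdefault(ref.addr, []).append(run[1])
            del open_runs[ref.addr]
            run = None

        if ref.op == "W":
            if run is None:
                open_runs[ref.addr] = [ref.cpu, 1]  # a write opens a new run
            else:
                run[1] += 1                          # same processor keeps writing

    # Runs still open when the trace ends are reported as they stand.
    for addr, (_cpu, writes) in open_runs.items():
        finished.setdefault(addr, []).append(writes)
    return finished
```

Short lists of small values per address would correspond to the "short to medium sequences of per-processor references" reported above.
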
We determine to what extent this analysis can be used to predict the coherency overhead of write-invalidate and write-broadcast protocols. We develop a simple model of write sharing from the write run characterization. By applying the results of the sharing analysis to the model, weighted by machine-specific cycle costs for carrying out coherency-related bus operations, we can estimate relative protocol performance. We compare these results to those from detailed architectural simulations.
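
As a rough illustration of weighting sharing statistics by cycle costs, the sketch below charges a write-invalidate protocol once per write run (one invalidation plus one later reload by another processor) and charges a write-broadcast protocol once per write. This accounting, the function name `estimate_overhead`, and every cycle cost are assumptions made for illustration; the paper's actual model and machine parameters are not given in the abstract.

```python
def estimate_overhead(run_lengths, inval_cycles, reload_cycles, bcast_cycles):
    """Estimate coherency bus cycles from a list of write-run lengths.

    run_lengths: writes per run (hypothetical input, e.g. from a trace analysis).
    The three *_cycles arguments are hypothetical machine-specific costs.
    """
    runs = len(run_lengths)
    writes = sum(run_lengths)

    # Assumed write-invalidate cost: one invalidation when a run starts and
    # one reload when another processor next references the data.
    invalidate = runs * (inval_cycles + reload_cycles)

    # Assumed write-broadcast cost: every write to shared data is broadcast.
    broadcast = writes * bcast_cycles

    return {"write_invalidate": invalidate, "write_broadcast": broadcast}
```

With these assumed costs, short runs favor broadcast and long runs favor invalidation: `estimate_overhead([2, 1, 1], 3, 10, 4)` gives 39 cycles against 16, while `estimate_overhead([10, 10], 3, 10, 4)` gives 26 against 80.
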
The simulation results indicate that (1) neither protocol dominates in performance, and (2) the write run model is a good predictor of protocol performance when the unit of the coherency operations matches that in the sharing analysis. This is the case for the write-broadcast protocols, in which one word is broadcast for each write to shared data. However, in Berkeley Ownership, a write-invalidate protocol, the unit of coherency is an entire cache block. When the block size is large, performance for this protocol is quite sensitive to the memory reference patterns within the block.
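
The effect of the coherency unit can be illustrated with a small sketch that counts, for a given unit size, the writes that land on a unit last referenced by a different processor. This count is only an assumed proxy for invalidation traffic; the function name `count_invalidations`, the tuple trace format, and the byte sizes are hypothetical and not taken from the paper.

```python
def count_invalidations(trace, unit_bytes):
    """Count writes to a unit whose previous reference came from another processor.

    trace: iterable of (cpu, op, byte_address) tuples, op is "R" or "W".
    unit_bytes: size of the coherency unit (e.g. a word or a cache block).
    """
    last_cpu = {}  # unit index -> processor that referenced it most recently
    count = 0
    for cpu, op, addr in trace:
        unit = addr // unit_bytes  # map the byte address to its coherency unit
        prev = last_cpu.get(unit)
        if op == "W" and prev is not None and prev != cpu:
            count += 1             # assumed proxy: one invalidation per such write
        last_cpu[unit] = cpu
    return count


# Two processors write alternately to different words of the same 32-byte block.
trace = [(0, "W", 0), (1, "W", 4), (0, "W", 0), (1, "W", 4)]
words = count_invalidations(trace, 4)    # word-sized unit: no interference
blocks = count_invalidations(trace, 32)  # block-sized unit: every later write interferes
```

In this example the word-sized unit sees no cross-processor interference, while the block-sized unit sees it on all but the first write, which is one way the reference pattern within a block can matter.
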



Published In

ISCA '88: Proceedings of the 15th Annual International Symposium on Computer Architecture
June 1988
461 pages
ISBN: 0818608617

ACM SIGARCH Computer Architecture News, Volume 16, Issue 2
Special Issue: Proceedings of the 15th Annual International Symposium on Computer Architecture
May 1988
431 pages
ISSN: 0163-5964
DOI: 10.1145/633625

Publisher

IEEE Computer Society Press

Washington, DC, United States


Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%


