Timestamp snooping: an approach for extending SMPs

Published: 12 November 2000

Abstract

Symmetric multiprocessor (SMP) servers provide superior performance for the commercial workloads that dominate the Internet. Our simulation results show that over one-third of cache misses by these applications result in cache-to-cache transfers, where the data is found in another processor's cache rather than in memory. SMPs are optimized for this case by using snooping protocols that broadcast address transactions to all processors. Conversely, directory-based shared-memory systems must indirectly locate the owner and sharers through a directory, resulting in larger average miss latencies.

This paper proposes timestamp snooping, a technique that allows SMPs to (i) utilize high-speed switched interconnection networks and (ii) exploit physical locality by delivering address transactions to processors and memories without regard to order. Traditional snooping requires physical ordering of transactions; timestamp snooping instead processes address transactions in a logical order. Logical time is maintained by adding a few bits per address transaction and having network switches perform a handshake to ensure on-time delivery. Processors and memories then reorder transactions based on their timestamps to establish a total order.

We evaluate timestamp snooping with commercial workloads on a 16-processor SPARC system using the Simics full-system simulator. We simulate both an indirect (butterfly) and a direct (torus) network design. For OLTP, DSS, web serving, web searching, and one scientific application, timestamp snooping with the butterfly network runs 6-28% faster than directories, at a cost of 13-43% more link traffic. Similarly, with the torus network, timestamp snooping runs 6-29% faster for 17-37% more link traffic. Thus, timestamp snooping is worth considering when buying more interconnect bandwidth is easier than reducing interconnect latency.
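The reordering step described in the abstract can be illustrated with a small Python model. This is a hypothetical sketch, not the paper's hardware design. It assumes that each node buffers incoming address transactions in a min-heap keyed by (timestamp, source processor id), and that it snoops a transaction only after the network has promised that no transaction with an equal or smaller timestamp is still in flight. That promise is represented by a hypothetical `guaranteed_time` value. The tie-break by source id and the rule that a switch's promise is the minimum of its input links' promises (`switch_guaranteed_time`) are also illustrative assumptions.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class AddressTxn:
    # Ordering uses only (timestamp, source): logical time first, then the
    # issuing processor's id as a deterministic tie-breaker (an assumption).
    timestamp: int
    source: int
    address: int = field(compare=False)
    kind: str = field(compare=False)  # e.g. "GETS"/"GETX" (hypothetical labels)


def switch_guaranteed_time(input_gts):
    # Hypothetical handshake rule: a switch can only promise delivery up to
    # the earliest promise it has received on any of its input links.
    return min(input_gts)


class TimestampOrderedSnooper:
    """Hypothetical per-node reorder buffer for timestamp snooping."""

    def __init__(self):
        self._pending = []          # min-heap of AddressTxn, earliest first
        self.guaranteed_time = -1   # no txn with timestamp <= this is still in flight
        self.processed = []         # transactions snooped so far, in logical order

    def receive(self, txn):
        # Transactions may arrive in any physical order; just buffer them.
        if txn.timestamp <= self.guaranteed_time:
            raise ValueError("late arrival: on-time delivery guarantee violated")
        heapq.heappush(self._pending, txn)

    def advance_guaranteed_time(self, new_gt):
        # Logical time never moves backwards.
        self.guaranteed_time = max(self.guaranteed_time, new_gt)
        # Everything at or below the guaranteed time is now safe to process,
        # and popping from the heap yields it in (timestamp, source) order.
        while self._pending and self._pending[0].timestamp <= self.guaranteed_time:
            self.processed.append(heapq.heappop(self._pending))


node = TimestampOrderedSnooper()
# Three transactions arrive out of timestamp order.
node.receive(AddressTxn(timestamp=7, source=2, address=0x40, kind="GETX"))
node.receive(AddressTxn(timestamp=3, source=5, address=0x80, kind="GETS"))
node.receive(AddressTxn(timestamp=3, source=1, address=0x40, kind="GETS"))

node.advance_guaranteed_time(switch_guaranteed_time([4, 6, 5]))
print([(t.timestamp, t.source) for t in node.processed])  # [(3, 1), (3, 5)]

node.advance_guaranteed_time(10)
print([(t.timestamp, t.source) for t in node.processed])  # [(3, 1), (3, 5), (7, 2)]
```

In this model, every node receives the same broadcast transactions and applies the same sort key. As a result, all nodes process those transactions in the same logical order even though they arrived in different physical orders. This is the sense in which a total order can emerge without a physically ordered bus.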



                              Information

                              Published In

                              ACM SIGOPS Operating Systems Review  Volume 34, Issue 5
                              Dec. 2000
                              269 pages
                              ISSN:0163-5980
                              DOI:10.1145/384264
                              • ACM Conferences
                                ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
                                November 2000
                                271 pages
                                ISBN:1581133170
                                DOI:10.1145/378993

                              Publisher

                              Association for Computing Machinery

                              New York, NY, United States

                              Publication History

                              Published in SIGOPS Volume 34, Issue 5


                              Qualifiers

                              • Article
