Effects of Multithreading on Cache Performance

Published: 01 February 1999

Abstract

As the performance gap between processor and memory grows, memory latency becomes a major bottleneck in achieving high processor utilization. Multithreading has emerged as one of the most promising techniques for tolerating memory latency by exploiting thread-level parallelism. The question remains, however, of how effective multithreading is at tolerating memory latency. The performance of multithreading depends not only on how well memory latency is overlapped with useful computation, but also strongly on cache behavior and on the overhead of multithreading (e.g., thread management and context-switch costs). In particular, multithreading affects the behavior of caches and, thus, overall performance in a nontrivial fashion. To study these issues, this paper presents the Multithreaded Virtual Processor (MVP) model. MVP integrates the multithreaded programming paradigm and a modern superscalar processor with support for fast context switching and thread scheduling. Our studies with MVP show that, in general, performance improvements come not only from tolerating memory latency but also from lower cache miss rates due to the exploitation of data locality. However, multithreading places additional stress on the memory hierarchy because of interference among threads. Also, the dynamic behavior of multithreaded execution hinders instruction locality, which results in a high number of misses in the L1 instruction cache.
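The trade-off between overlapping latency and paying context-switch overhead can be illustrated with a simplified, textbook-style utilization model. This is only a sketch and is not the MVP model from the paper. The function name and all parameters (`run_cycles`, `miss_latency`, `switch_cost`) are assumptions chosen for illustration. The sketch assumes every thread runs a fixed number of cycles between misses, that a switch is paid on every miss, and that threads do not interfere in the cache, which is exactly the effect the abstract says cannot be ignored in practice.

```python
def utilization(threads: int, run_cycles: float,
                miss_latency: float, switch_cost: float) -> float:
    """Estimate processor utilization for a switch-on-miss multithreaded core.

    Illustrative model only (all parameters are hypothetical):
      threads      - number of hardware thread contexts
      run_cycles   - useful cycles a thread executes between cache misses
      miss_latency - cycles needed to service one miss
      switch_cost  - cycles lost on each context switch
    Cache interference among threads is deliberately ignored.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if run_cycles <= 0 or miss_latency < 0 or switch_cost < 0:
        raise ValueError("run_cycles must be positive; costs must be non-negative")

    # Linear region: too few threads to hide the whole miss latency,
    # so the processor still idles for part of each miss.
    linear = threads * run_cycles / (run_cycles + switch_cost + miss_latency)

    # Saturated region: latency is fully hidden, and only the
    # context-switch overhead limits utilization.
    saturated = run_cycles / (run_cycles + switch_cost)

    return min(linear, saturated)


if __name__ == "__main__":
    # Hypothetical workload: 20 useful cycles per miss, 100-cycle miss,
    # 5-cycle context switch.
    for n in (1, 2, 4, 8):
        print(n, round(utilization(n, 20, 100, 5), 3))
```

Under these assumptions, adding threads helps only until the miss latency is fully hidden. Beyond that point the switch cost sets the ceiling, and any extra cache misses caused by inter-thread interference (not modeled here) would lower it further.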


Published In

IEEE Transactions on Computers, Volume 48, Issue 2
Special issue on cache memory and related problems
February 1999
168 pages
ISSN: 0018-9340

Publisher

IEEE Computer Society

United States

Author Tags

  1. Multithreading
  2. context switching
  3. memory latency
  4. memory tolerance
  5. locality

Qualifiers

  • Research-article
