Effects of Multithreading on Cache Performance

Authors:

Woo-Jong Hahn

IEEE Transactions on Computers, Volume 48, Issue 2

Pages 176 - 184

https://doi.org/10.1109/12.752659

Published: 01 February 1999

Abstract

As the performance gap between processor and memory grows, memory latency becomes a major bottleneck in achieving high processor utilization. Multithreading has emerged as one of the most promising techniques for tolerating memory latency by exploiting thread-level parallelism. The question remains, however, of how effective multithreading is at tolerating memory latency. The performance of multithreading depends not only on the overlap of memory latency with useful computation, but also strongly on cache behavior and the overhead of multithreading (e.g., thread management and context-switch costs). In particular, multithreading affects the behavior of caches and, thus, overall performance in a nontrivial fashion. To study these issues, this paper presents the Multithreaded Virtual Processor (MVP) model. MVP integrates the multithreaded programming paradigm with a modern superscalar processor that supports fast context switching and thread scheduling. Our studies with MVP show that, in general, performance improvements come not only from tolerating memory latency but also from lower cache miss rates due to the exploitation of data locality. However, multithreading places additional stress on the memory hierarchy because of interference among threads. In addition, the dynamic behavior of multithreaded execution hinders instruction locality, which results in a high number of misses in the L1 instruction cache.
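The trade-off the abstract describes, between overlapping memory latency with useful work and paying context-switch overhead, can be illustrated with a simplified textbook-style utilization model. The sketch below is not the MVP model from the paper; the function name, the parameters, and the formula itself are assumptions chosen for illustration. It assumes each thread runs for a fixed number of cycles between cache misses, every miss costs a fixed latency, and every switch costs a fixed number of cycles.

```python
def utilization(threads, run_length, miss_latency, switch_cost):
    """Estimate processor utilization for a coarse-grained multithreaded core.

    Hypothetical, simplified model (not taken from the paper):
      - each thread computes for `run_length` cycles, then misses in the cache
      - a miss takes `miss_latency` cycles to service
      - switching to another thread costs `switch_cost` cycles
    """
    if threads < 1 or run_length <= 0:
        raise ValueError("need at least one thread and a positive run length")
    if miss_latency < 0 or switch_cost < 0:
        raise ValueError("latency and switch cost must be non-negative")

    # Linear region: too few threads to hide the whole miss latency,
    # so useful work grows in proportion to the thread count.
    linear = threads * run_length / (run_length + switch_cost + miss_latency)

    # Saturated region: latency is fully hidden and only the
    # context-switch overhead keeps utilization below 1.
    saturated = run_length / (run_length + switch_cost)

    return min(linear, saturated)


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    for n in (1, 2, 4, 8):
        print(n, round(utilization(n, run_length=20, miss_latency=80, switch_cost=5), 3))
```

In this model, adding threads helps only until the latency is fully hidden, after which the switch cost sets the ceiling. Interference among threads, which the abstract identifies as a source of extra cache misses, would appear here as a shorter `run_length` as the thread count grows; the sketch deliberately leaves that effect out.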

Index Terms

Effects of Multithreading on Cache Performance
1. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Multiprocessing / multiprogramming / multitasking

Published In

IEEE Transactions on Computers Volume 48, Issue 2

Special issue on cache memory and related problems

February 1999

168 pages

ISSN:0018-9340

Editors: Jean-Luc Gaudiot (Univ. of Southern California), Veljko Milutinovic (Univ. of Belgrade, Serbia, Yugoslavia), Mateo Valero (Univ. Politecnica de Catalunya, Barcelona, Spain)

Copyright © 1999 IEEE. All Rights Reserved.

Publisher

IEEE Computer Society

United States
