Article

Learning-Based SMT Processor Resource Distribution via Hill-Climbing

Authors:

Seungryul Choi,

Donald Yeung

ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture

Pages 239 - 251

https://doi.org/10.1109/ISCA.2006.25

Published: 01 May 2006

Abstract

The key to high performance in Simultaneous Multithreaded (SMT) processors lies in optimizing the distribution of shared resources to active threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential performance bottlenecks by observing indicators, like instruction occupancy or cache miss counts, and take actions to try to alleviate them. While the corrective actions are designed to improve performance, their actual performance impact is not known since end performance is never monitored. Consequently, potential performance gains are lost whenever the corrective actions do not effectively address the actual bottlenecks occurring in the pipeline. We propose a different approach to SMT resource distribution that optimizes end performance directly. Our approach observes the impact that resource distribution decisions have on performance at runtime, and feeds this information back to the resource distribution mechanisms to improve future decisions. By evaluating many different resource distributions, our approach tries to learn the best distribution over time. Because we perform learning on-line, learning time is crucial. We develop a hill-climbing algorithm that efficiently learns the best distribution of resources by following the performance gradient within the resource distribution space. This paper conducts an in-depth investigation of learning-based SMT resource distribution. First, we compare existing resource distribution techniques to an ideal learning-based technique that performs learning off-line. This limit study shows learning-based techniques can provide up to 19.2% gain over ICOUNT, 18.0% gain over FLUSH, and 7.6% gain over DCRA across 21 multithreaded workloads. Then, we present an on-line learning algorithm based on hill-climbing. Our evaluation shows hill-climbing provides a 12.4% gain over ICOUNT, 11.3% gain over FLUSH, and 2.4% gain over DCRA across a larger set of 42 multiprogrammed workloads.
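
The abstract describes the on-line learner only as following the performance gradient through the resource distribution space. The Python sketch below illustrates that general idea as generic steepest-ascent hill climbing over a split of one shared resource pool among threads, alongside an exhaustive search that stands in for the off-line limit study. Everything here is an illustrative assumption, not the authors' mechanism: `measure` is a hypothetical callback that would run one epoch with a given split and return a metric such as IPC, `delta`/`step` is an arbitrary granularity, and `toy_perf` is a made-up performance surface.

```python
from typing import Callable, Iterator, List

Partition = List[int]  # entries of one shared resource assigned to each thread


def hill_climb_partition(
    total: int,
    num_threads: int,
    measure: Callable[[Partition], float],
    delta: int = 4,
    max_steps: int = 100,
) -> Partition:
    """Steepest-ascent hill climbing over splits of `total` shared entries.

    `measure(partition)` is a hypothetical stand-in for running one epoch with
    that split and reading back performance; real readings would be noisy.
    """
    # Start from an equal split; any remainder goes to the lowest-numbered threads.
    anchor = [total // num_threads] * num_threads
    for i in range(total % num_threads):
        anchor[i] += 1

    for _ in range(max_steps):
        best_perf = measure(anchor)
        best_part = anchor
        improved = False
        # Neighbors: move `delta` entries from one thread (donor) to another (gainer).
        for gainer in range(num_threads):
            for donor in range(num_threads):
                if donor == gainer or anchor[donor] < delta:
                    continue
                trial = list(anchor)
                trial[gainer] += delta
                trial[donor] -= delta
                perf = measure(trial)
                if perf > best_perf:
                    best_perf, best_part, improved = perf, trial, True
        if not improved:
            break  # local optimum: no single move raises performance
        anchor = best_part  # climb in the direction of the steepest gain
    return anchor


def _compositions(units: int, parts: int) -> Iterator[Partition]:
    """Yield every way to split `units` into `parts` non-negative integers."""
    if parts == 1:
        yield [units]
        return
    for first in range(units + 1):
        for rest in _compositions(units - first, parts - 1):
            yield [first] + rest


def exhaustive_partition(
    total: int,
    num_threads: int,
    measure: Callable[[Partition], float],
    step: int = 4,
) -> Partition:
    """Off-line baseline (assumed): evaluate every split on a grid of `step` entries."""
    assert total % step == 0, "total must be a multiple of step"
    candidates = (
        [u * step for u in c] for c in _compositions(total // step, num_threads)
    )
    return max(candidates, key=measure)


# Made-up performance surface for two threads, peaking at a 100/60 split.
def toy_perf(p: Partition) -> float:
    return -((p[0] - 100) ** 2) - ((p[1] - 60) ** 2)


print(hill_climb_partition(160, 2, toy_perf))  # [100, 60]
print(exhaustive_partition(160, 2, toy_perf))  # [100, 60]
```

Each hill-climbing step costs one measurement for the current split plus roughly N(N-1) neighbor measurements for N threads, whereas the exhaustive baseline needs one measurement per grid point, which grows combinatorially with the number of threads. That cost gap is why an on-line learner would favor gradient following; the trade-off is that on a surface with several peaks, hill climbing can stop at a local optimum.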

References

[1] D. Burger and T. M. Austin. The SimpleScalar Tool Set, Version 2.0. CS TR 1342, University of Wisconsin-Madison, June 1997.

[2] F. J. Cazorla, A. Ramirez, M. Valero, and E. Fernandez. Dynamically Controlled Resource Allocation in SMT Processors. In Proceedings of the 37th International Symposium on Microarchitecture, pages 171-182. IEEE Computer Society, December 2004.

[3] G. K. Dorai and D. Yeung. Transparent Threads: Resource Allocation in SMT Processors for High Single-Thread Performance. In Proceedings of the 11th Annual International Conference on Parallel Architectures and Compilation Techniques, Charlottesville, VA, September 2002.

[4] A. El-Moursy and D. H. Albonesi. Front-End Policies for Improved Issue Efficiency in SMT Processors. In Proceedings of the 9th International Conference on High Performance Computer Architecture, February 2003.

[5] R. Goncalves, E. Ayguade, M. Valero, and P. O. A. Navaux. Performance Evaluation of Decoding and Dispatching Stages in Simultaneous Multithreaded Architectures. In Proceedings of the 13th Symposium on Computer Architecture and High Performance Computing, September 2001.

[6] Intel Pentium 4 Processor. http://www.intel.com/design/Pentium4/index.htm, 2002.

[7] R. N. Kalla, B. Sinharoy, and J. M. Tendler. IBM Power5 Chip: A Dual-Core Multithreaded Processor. IEEE Micro, 24(2):40-47, 2004.

[8] D. Kim and D. Yeung. Design and Evaluation of Compiler Algorithms for Pre-Execution. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), San Jose, CA, October 2002.

[9] F. Latorre, J. Gonzalez, and A. Gonzalez. Back-end Assignment Schemes for Clustered Multithreaded Processors. In Proceedings of the 18th Annual International Conference on Supercomputing, pages 316-325, July 2004.

[10] K. Luo, M. Franklin, S. S. Mukherjee, and A. Seznec. Boosting SMT Performance by Speculation Control. In Proceedings of the International Parallel and Distributed Processing Symposium, San Francisco, CA, April 2001.

[11] K. Luo, J. Gummaraju, and M. Franklin. Balancing Throughput and Fairness in SMT Processors. In Proceedings of the International Symposium on Performance Analysis of Systems and Software, November 2001.

[12] D. Madon, E. Sanchez, and S. Monnier. A Study of a Simultaneous Multithreaded Processor Implementation. In Proceedings of EuroPar '99, pages 716-726, Toulouse, France, August 1999. Springer-Verlag.

[13] D. T. Marr, F. Binns, D. Hill, G. Hinton, D. Koufaty, J. A. Miller, and M. Upton. Hyper-Threading Technology Architecture and Microarchitecture. Intel Technology Journal, 6(1), February 2002.

[14] S. E. Raasch and S. K. Reinhardt. The Impact of Resource Partitioning on SMT Processors. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, September 2003.

[15] T. Sherwood, E. Perelman, and B. Calder. Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications. In Proceedings of the 10th International Conference on Parallel Architectures and Compilation Techniques, September 2001.

[16] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically Characterizing Large Scale Program Behavior. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, October 2002. ACM.

[17] T. Sherwood, S. Sair, and B. Calder. Phase Tracking and Prediction. In Proceedings of the 30th Annual International Symposium on Computer Architecture, pages 336-347, June 2003.

[18] A. Snavely, D. M. Tullsen, and G. Voelker. Symbiotic Jobscheduling with Priorities for a Simultaneous Multithreading Processor. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems, June 2002.

[19] D. M. Tullsen and J. A. Brown. Handling Long-Latency Loads in a Simultaneous Multithreading Processor. In Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture, pages 318-327. IEEE Computer Society, 2001.

[20] D. M. Tullsen, S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, and R. L. Stamm. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor. In Proceedings of the 1996 International Symposium on Computer Architecture, Philadelphia, May 1996.


Index Terms

1. General and reference
  1. Document types
    1. Reference works
2. Hardware



Information & Contributors

Information

Published In


ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture

June 2006

383 pages

ISBN: 076952608X

ACM SIGARCH Computer Architecture News, Volume 34, Issue 2
May 2006
383 pages
ISSN: 0163-5964
DOI: 10.1145/1150019

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 May 2006


Qualifiers

Article

Conference

ISCA06

Sponsor:

SIGARCH

ISCA06: The 33rd Annual International Symposium on Computer Architecture 2006

June 17 - 21, 2006

Acceptance Rates

ISCA '06 Paper Acceptance Rate: 31 of 234 submissions, 13%

Overall Acceptance Rate: 543 of 3,203 submissions, 17%



Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 80
Total Downloads: 592
Downloads (Last 12 months): 24
Downloads (Last 6 weeks): 1

Reflects downloads up to 24 Sep 2024


Citations

Cited By

Gerogiannis G, Torrellas J (2024). Practical Online Reinforcement Learning for Microprocessors With Micro-Armed Bandit. IEEE Micro, 44(4):80-87. Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.1109/MM.2024.3408719
Gerogiannis G, Torrellas J (2023). Micro-Armed Bandit: Lightweight & Reusable Reinforcement Learning for Microarchitecture Decision-Making. 56th Annual IEEE/ACM International Symposium on Microarchitecture, pages 698-713. Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3613424.3623780
Donyanavard B, Mück T, Rahmani A, Dutt N, Sadighi A, Maurer F, Herkersdorf A (2019). SOSA. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 685-698. Online publication date: 12-Oct-2019.
https://dl.acm.org/doi/10.1145/3352460.3358312
Ding Y, Mishra N, Hoffmann H, Manne S, Hunter H, Altman E (2019). Generative and multi-phase learning for computer systems optimization. Proceedings of the 46th International Symposium on Computer Architecture, pages 39-52. Online publication date: 22-Jun-2019.
https://dl.acm.org/doi/10.1145/3307650.3326633
Rahmani A, Donyanavard B, Mück T, Moazzemi K, Jantsch A, Mutlu O, Dutt N (2018). SPECTR. ACM SIGPLAN Notices, 53(2):169-183. Online publication date: 19-Mar-2018.
https://dl.acm.org/doi/10.1145/3296957.3173199
Rahmani A, Donyanavard B, Mück T, Moazzemi K, Jantsch A, Mutlu O, Dutt N, Shen X, Tuck J, Bianchini R, Sarkar V (2018). SPECTR. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 169-183. Online publication date: 19-Mar-2018.
https://dl.acm.org/doi/10.1145/3173162.3173199
Xu Q, Jeon H, Kim K, Ro W, Annavaram M (2016). Warped-slicer. ACM SIGARCH Computer Architecture News, 44(3):230-242. Online publication date: 18-Jun-2016.
https://dl.acm.org/doi/10.1145/3007787.3001161
Zhou M, Du Y, Childers B, Mosse D, Melhem R (2016). Symmetry-Agnostic Coordinated Management of the Memory Hierarchy in Multicore Systems. ACM Transactions on Architecture and Code Optimization, 12(4):1-26. Online publication date: 4-Jan-2016.
https://dl.acm.org/doi/10.1145/2847254
Xu Q, Jeon H, Kim K, Ro W, Annavaram M, Min S, Loh G (2016). Warped-slicer. Proceedings of the 43rd International Symposium on Computer Architecture, pages 230-242. Online publication date: 18-Jun-2016.
https://dl.acm.org/doi/10.1109/ISCA.2016.29
Zhang Y, Lin W (2016). Efficient resource sharing algorithm for physical register file in simultaneous multi-threading processors. Microprocessors & Microsystems, 45(PB):270-282. Online publication date: 1-Sep-2016.
https://dl.acm.org/doi/10.1016/j.micpro.2016.06.002
