DOI: 10.1109/ISCA.2006.25

Learning-Based SMT Processor Resource Distribution via Hill-Climbing

Published: 01 May 2006

Abstract

The key to high performance in Simultaneous Multithreaded (SMT) processors lies in optimizing the distribution of shared resources to active threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential performance bottlenecks by observing indicators, like instruction occupancy or cache miss counts, and take actions to try to alleviate them. While the corrective actions are designed to improve performance, their actual performance impact is not known since end performance is never monitored. Consequently, potential performance gains are lost whenever the corrective actions do not effectively address the actual bottlenecks occurring in the pipeline. We propose a different approach to SMT resource distribution that optimizes end performance directly. Our approach observes the impact that resource distribution decisions have on performance at runtime, and feeds this information back to the resource distribution mechanisms to improve future decisions. By evaluating many different resource distributions, our approach tries to learn the best distribution over time. Because we perform learning on-line, learning time is crucial. We develop a hill-climbing algorithm that efficiently learns the best distribution of resources by following the performance gradient within the resource distribution space. This paper conducts an in-depth investigation of learning-based SMT resource distribution. First, we compare existing resource distribution techniques to an ideal learning-based technique that performs learning off-line. This limit study shows learning-based techniques can provide up to 19.2% gain over ICOUNT, 18.0% gain over FLUSH, and 7.6% gain over DCRA across 21 multithreaded workloads. Then, we present an on-line learning algorithm based on hill-climbing. Our evaluation shows hill-climbing provides a 12.4% gain over ICOUNT, 11.3% gain over FLUSH, and 2.4% gain over DCRA across a larger set of 42 multiprogrammed workloads.
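The idea of following the performance gradient through the space of resource distributions can be illustrated with a small sketch. This is not the paper's algorithm: the names `neighbors`, `hill_climb`, and `toy_throughput`, the step size `delta`, and the synthetic performance model are all assumptions made for illustration. In the approach the abstract describes, the performance value would be observed on the running processor rather than computed from a formula.

```python
import math
from typing import Callable, Iterator, Sequence, Tuple

Partition = Tuple[int, ...]  # resource units assigned to each thread


def neighbors(partition: Partition, delta: int) -> Iterator[Partition]:
    """Yield every partition reached by moving `delta` units between two threads."""
    n = len(partition)
    for src in range(n):
        for dst in range(n):
            if src != dst and partition[src] >= delta:
                moved = list(partition)
                moved[src] -= delta
                moved[dst] += delta
                yield tuple(moved)


def hill_climb(
    start: Sequence[int],
    measure: Callable[[Partition], float],
    delta: int = 1,
    max_epochs: int = 1000,
) -> Tuple[Partition, float]:
    """Steepest-ascent hill-climbing over partitions of a fixed resource budget.

    `measure` stands in for observing end performance during one epoch.
    """
    current = tuple(start)
    current_perf = measure(current)
    for _ in range(max_epochs):
        # Trial each neighboring distribution and remember the best one.
        best_cand, best_perf = current, current_perf
        for cand in neighbors(current, delta):
            perf = measure(cand)
            if perf > best_perf:
                best_cand, best_perf = cand, perf
        if best_cand == current:
            break  # no neighbor improves performance: a local optimum
        current, current_perf = best_cand, best_perf
    return current, current_perf


def toy_throughput(partition: Partition) -> float:
    """Hypothetical concave model: thread 1 benefits twice as much as thread 0."""
    weights = (1.0, 2.0)
    return sum(w * math.sqrt(share) for w, share in zip(weights, partition))


if __name__ == "__main__":
    best, perf = hill_climb((16, 16), toy_throughput)
    print(best, round(perf, 3))
```

Starting from an equal split of 32 units, the search shifts units toward the thread that benefits more until no single move improves the modeled throughput. Because each trial costs one measurement epoch, the number of neighbors evaluated per step is what makes learning time a concern when the search runs on-line.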



Published In

ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture
June 2006
383 pages
ISBN: 076952608X
  • ACM SIGARCH Computer Architecture News, Volume 34, Issue 2
    May 2006
    383 pages
    ISSN: 0163-5964
    DOI: 10.1145/1150019

Publisher

IEEE Computer Society

United States


Qualifiers

  • Article

Conference

ISCA06
Acceptance Rates

ISCA '06 Paper Acceptance Rate: 31 of 234 submissions, 13%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%
