Research article, DOI: 10.1145/2830772.2830815

Long term parking (LTP): criticality-aware resource allocation in OOO processors

Published: 05 December 2015

Abstract

Modern processors employ large structures (IQ, LSQ, register file, etc.) to expose instruction-level parallelism (ILP) and memory-level parallelism (MLP). These resources are typically allocated to instructions in program order. This wastes resources in two ways: they are allocated to instructions that are not yet ready to execute, and they are eagerly allocated to instructions that are not part of the application's critical path.
This work explores allocating pipeline resources only when they are needed to expose MLP, thereby enabling a processor design with significantly smaller structures without sacrificing performance. First, we identify the classes of instructions that should not reserve resources in program order and evaluate the potential performance gains of delaying their allocation. We then use this information to "park" such instructions in a simpler, and therefore more efficient, Long Term Parking (LTP) structure. The LTP stores instructions until they are ready to execute, without allocating pipeline resources, and thereby keeps the pipeline available for instructions that can generate further MLP.
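To make the waste of in-order allocation concrete, here is a toy Python model (Python 3.9+) that is not taken from the paper. Every instruction in a window is assumed to have already received back-end resources in program order, and the function counts how many of those allocations belong to instructions that cannot issue yet. The `urgent` flag is a hypothetical oracle standing in for "needed soon to expose MLP". The paper's actual instruction classes and predictor are not reproduced here, and the 200-cycle miss latency is an assumed value.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    ready_cycle: int  # assumed known: cycle at which all source operands are available
    urgent: bool      # hypothetical oracle flag: needed soon to expose MLP


def idle_entries(window: list[Instr], now: int) -> tuple[int, int]:
    """Every instruction in `window` was allocated back-end resources in program
    order. Return (allocations held by not-ready instructions,
    allocations held by instructions that are neither ready nor urgent)."""
    not_ready = [i for i in window if i.ready_cycle > now]
    parkable = [i for i in not_ready if not i.urgent]
    return len(not_ready), len(parkable)


# The first load misses; its value arrives at cycle 200 (assumed latency).
window = [
    Instr("ld  r1, [r2]", 0, True),
    Instr("add r3, r1, 1", 200, False),
    Instr("ld  r8, [r1]", 200, True),    # address comes from the missing load
    Instr("mul r4, r3, r3", 201, False),
    Instr("ld  r5, [r6]", 0, True),
    Instr("sub r7, r5, r4", 202, False),
]
print(idle_entries(window, now=5))  # (4, 3)
```

At cycle 5, four allocations are held by instructions that cannot issue. Three of them (`add`, `mul`, `sub`) are also non-urgent under the toy flag, which makes them candidates for parking. `ld r8, [r1]` is flagged urgent because, once the miss returns, it can start another memory access.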
LTP can accurately and rapidly identify which instructions to park, park them before they execute, and wake them when needed to preserve performance, all using a simple queue instead of a complex IQ. We show that even a very simple queue-based LTP design allows us to significantly reduce the IQ size (64 → 32 entries) and the register file size (128 → 96 registers) while retaining MLP performance and improving energy efficiency.
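A minimal sketch of a queue-based parking structure follows. It is again a toy Python model and not the paper's hardware. The caller supplies the park/no-park decision, standing in for the identification step the abstract describes. Only the IQ is modeled. The wake-up policy (move the oldest parked instructions into whatever IQ entries are free) is an assumption. The class `ToyPipeline` and its methods are hypothetical names.

```python
from collections import deque


class ToyPipeline:
    """Toy back end: a bounded IQ plus an unbounded FIFO parking queue."""

    def __init__(self, iq_size: int) -> None:
        self.iq_size = iq_size
        self.iq: list[str] = []          # occupied IQ entries
        self.ltp: deque[str] = deque()   # parked instructions, oldest first

    def dispatch(self, inst: str, park: bool) -> bool:
        """Park `inst` or give it an IQ entry; return False if the IQ is full."""
        if park:
            self.ltp.append(inst)        # parked: no IQ entry is consumed
            return True
        if len(self.iq) >= self.iq_size:
            return False                 # dispatch stall
        self.iq.append(inst)
        return True

    def issue(self, inst: str) -> None:
        """Issuing an instruction frees its IQ entry."""
        self.iq.remove(inst)

    def wake(self, n: int) -> list[str]:
        """Move up to `n` parked instructions, oldest first, into free IQ entries."""
        woken = []
        while self.ltp and len(woken) < n and len(self.iq) < self.iq_size:
            inst = self.ltp.popleft()
            self.iq.append(inst)
            woken.append(inst)
        return woken


p = ToyPipeline(iq_size=2)
p.dispatch("ld  r1, [r2]", park=False)
p.dispatch("add r3, r1, 1", park=True)    # dependent of a (presumed) miss
p.dispatch("ld  r5, [r6]", park=False)    # independent load still gets an entry
p.dispatch("mul r4, r3, r3", park=True)
p.issue("ld  r1, [r2]")
print(p.wake(2))  # ['add r3, r1, 1']: only one IQ entry is free
```

In this sketch the parking queue needs no wakeup or select logic of its own. It is only ever drained from its head, which is why a FIFO can be much cheaper per entry than an IQ. The abstract does not say whether the real LTP wakes instructions in exactly this order.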

Published In

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
December 2015
787 pages
ISBN:9781450340342
DOI:10.1145/2830772
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MICRO-48

Acceptance Rates

MICRO-48 paper acceptance rate: 61 of 283 submissions (22%)
Overall acceptance rate: 484 of 2,242 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 68
  • Downloads (last 6 weeks): 9
Reflects downloads up to 13 Feb 2025.

Cited By

  • (2024) Harvesting memory-bound CPU stall cycles in software with MSH. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, pp. 57-75. DOI: 10.5555/3691938.3691942. Online publication date: 10-Jul-2024.
  • (2024) Localizing the Tag Comparisons in the Wakeup Logic to Reduce Energy Consumption of the Issue Queue. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 493-506. DOI: 10.1109/MICRO61859.2024.00044. Online publication date: 2-Nov-2024.
  • (2024) TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA. IEEE Computer Architecture Letters, 23:2, pp. 175-178. DOI: 10.1109/LCA.2023.3289317. Online publication date: Jul-2024.
  • (2024) Multi: Reduce Energy Overhead of Criticality-Aware Dynamic Instruction Scheduling for Energy Efficiency. 2024 IEEE 42nd International Conference on Computer Design (ICCD), pp. 60-67. DOI: 10.1109/ICCD63220.2024.00020. Online publication date: 18-Nov-2024.
  • (2023) Clockhands: Rename-free Instruction Set Architecture for Out-of-order Processors. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-16. DOI: 10.1145/3613424.3614272. Online publication date: 28-Oct-2023.
  • (2023) Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-14. DOI: 10.1145/3579371.3589046. Online publication date: 17-Jun-2023.
  • (2023) Performance Analysis of Criticality-Aware Out-of-Order Cores for Exploiting MLP. 2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), pp. 1-4. DOI: 10.1109/ITC-CSCC58803.2023.10212794. Online publication date: 25-Jun-2023.
  • (2023) Speculative Register Reclamation. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1182-1194. DOI: 10.1109/HPCA56546.2023.10071122. Online publication date: Feb-2023.
  • (2022) Efficient Instruction Scheduling Using Real-time Load Delay Tracking. ACM Transactions on Computer Systems, 40:1-4, pp. 1-21. DOI: 10.1145/3548681. Online publication date: 24-Nov-2022.
  • (2022) Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores. ACM Transactions on Architecture and Code Optimization, 19:2, pp. 1-28. DOI: 10.1145/3506704. Online publication date: 7-Mar-2022.
