Revisiting reorder buffer architecture for next generation high performance computing

The Journal of Supercomputing

Abstract

Modern microprocessors achieve high application performance at an acceptable level of power dissipation. The reorder buffer allows instructions that execute out of order to be committed in program order. It plays a key role in modern microprocessors because performance improvement techniques rely heavily on aggressive speculation to feed wide-issue, out-of-order, deeply pipelined designs. The reorder buffer is particularly important to the power-performance trade-off: enlarging it improves performance, but naive scaling of the conventional reorder buffer architecture can severely increase complexity and power consumption. In this paper, we propose low-power reorder buffer techniques for contemporary microprocessors. First, the separated reorder buffer (SROB) reduces power dissipation through deferred allocation and early release. Deferred allocation delays the SROB allocation of instructions until all their data dependencies are resolved. The instructions are then executed in program order and released from the SROB sooner. The result of an instruction is written into the rename buffers immediately after execution completes, and the result values in the rename buffers are written into the architectural register file at the commit stage. The proposed approaches provide higher resource utilization and lower power consumption.
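
The effect of deferred allocation on buffer occupancy can be illustrated with a toy model. The sketch below is not the authors' simulator or design. It assumes one instruction is dispatched per cycle, execution starts as soon as the source operands are available, and commit happens in program order. The names `Instr` and `rob_entry_cycles`, the latencies, and the four-instruction program are all hypothetical. Early release and the in-order execution inside the SROB are not modeled; only the allocation point changes between the two runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str        # destination register name (hypothetical)
    srcs: tuple      # source register names
    latency: int     # execution latency in cycles (assumed values)


def rob_entry_cycles(program, deferred):
    """Sum the cycles each instruction occupies a reorder buffer entry.

    deferred=False: the entry is allocated at dispatch (conventional).
    deferred=True:  the entry is allocated only once all source
                    operands are available (deferred allocation).
    In both cases the entry is freed at in-order commit.
    """
    done = {}          # register -> cycle at which its producer completes
    prev_commit = 0    # commit cycle of the previous instruction
    total = 0
    for cycle, ins in enumerate(program):  # one dispatch per cycle
        # Operands are ready when every producer has completed.
        ready = max([cycle] + [done.get(s, 0) for s in ins.srcs])
        complete = ready + ins.latency
        # In-order commit: never earlier than the previous commit.
        commit = max(complete, prev_commit)
        alloc = ready if deferred else cycle
        total += commit - alloc
        done[ins.dest] = complete
        prev_commit = commit
    return total


program = [
    Instr("r1", (), 10),       # long-latency producer, e.g. a cache miss
    Instr("r2", ("r1",), 1),   # waits on r1
    Instr("r3", ("r2",), 1),   # waits on r2
    Instr("r4", (), 1),        # independent, but must commit in order
]

print(rob_entry_cycles(program, deferred=False),
      rob_entry_cycles(program, deferred=True))  # prints: 39 21
```

In this toy program the two dependent instructions hold an entry for one cycle each instead of ten, which is the kind of occupancy reduction deferred allocation is meant to achieve. The independent instruction `r4` still waits for in-order commit in both runs, which is the case early release would address.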

Author information

Corresponding author

Correspondence to Young-Sik Jeong.

About this article

Cite this article

Choi, M., Park, J.H. & Jeong, YS. Revisiting reorder buffer architecture for next generation high performance computing. J Supercomput 65, 484–495 (2013). https://doi.org/10.1007/s11227-011-0734-x

  • DOI: https://doi.org/10.1007/s11227-011-0734-x
