DOI: 10.1109/MICRO.2014.31

Architectural Specialization for Inter-Iteration Loop Dependence Patterns

Published: 13 December 2014

Abstract

Hardware specialization is an increasingly common technique to enable improved performance and energy efficiency in spite of the diminished benefits of technology scaling. This paper proposes a new approach called explicit loop specialization (XLOOPS) based on the idea of elegantly encoding inter-iteration loop dependence patterns in the instruction set. XLOOPS supports a variety of inter-iteration data- and control-dependence patterns for both single and nested loops. The XLOOPS hardware/software abstraction requires only lightweight changes to a general-purpose compiler to generate XLOOPS binaries and enables executing these binaries on: (1) traditional microarchitectures with minimal performance impact, (2) specialized microarchitectures to improve performance and/or energy efficiency, and (3) adaptive microarchitectures that can seamlessly migrate loops between traditional and specialized execution to dynamically trade off performance vs. energy efficiency. We evaluate XLOOPS using a vertically integrated research methodology and show compelling performance and energy efficiency improvements compared to both simple and complex general-purpose processors.
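
To make the phrase "inter-iteration loop dependence patterns" concrete, the sketch below shows four ordinary loops that differ only in how one iteration depends on earlier ones. It is an illustration in plain Python, not the XLOOPS instruction encoding or the paper's own taxonomy; all function names and the grouping into four cases are assumptions made for this example.

```python
from typing import List


def scale(xs: List[int], k: int) -> List[int]:
    """No inter-iteration dependence: iteration i reads and writes only index i."""
    out = [0] * len(xs)
    for i in range(len(xs)):
        out[i] = k * xs[i]
    return out


def running_sum(xs: List[int]) -> List[int]:
    """Data dependence through a scalar: iteration i needs `acc` from iteration i - 1."""
    out = [0] * len(xs)
    acc = 0
    for i in range(len(xs)):
        acc += xs[i]
        out[i] = acc
    return out


def carry_through_memory(xs: List[int], distance: int) -> List[int]:
    """Data dependence through memory: iteration i reads the element that
    iteration i - distance wrote (hypothetical example, distance >= 1)."""
    a = list(xs)
    for i in range(distance, len(a)):
        a[i] = a[i] + a[i - distance]
    return a


def find_first(xs: List[int], target: int) -> int:
    """Control dependence: whether iteration i + 1 runs at all depends on
    the comparison made in iteration i."""
    for i in range(len(xs)):
        if xs[i] == target:
            return i
    return -1


if __name__ == "__main__":
    print(scale([1, 2, 3], 2))
    print(running_sum([1, 2, 3, 4]))
    print(carry_through_memory([1, 1, 1, 1, 1], 2))
    print(find_first([5, 7, 9, 7], 7))
```

In the first loop the iterations can run in any order or all at once; in the other three, some value or decision flows from one iteration to a later one, and this is the kind of information the abstract describes encoding in the instruction set.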



Published In

MICRO-47: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture
December 2014
697 pages
ISBN: 9781479969982

Publisher

IEEE Computer Society

United States

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

MICRO-47

Acceptance Rates

MICRO-47 Paper Acceptance Rate 53 of 279 submissions, 19%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%
