An energy-efficient memory hierarchy for multi-issue processors

Authors:

Luigi Carro

DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe

Pages 368-373

Published: 27 March 2017

Abstract

Embedded processors must exploit instruction-level parallelism efficiently to meet the performance and energy demands of modern applications. However, memory bandwidth limits how well the resources available inside the processor can be used, and adding extra memory ports to allow more simultaneous data accesses drastically increases cost and energy. In this paper, we present a novel memory architecture for embedded multi-issue processors that overcomes the limited memory bandwidth without adding ports to the system. We combine software-managed memories (SMM) with the data cache to provide higher throughput with the same number of ports. Compiler-automated code transformations minimize the effort programmers need to benefit from the proposed architecture. Our experimental results show an average speedup of 1.17x, together with 69% lower dynamic energy and, on average, a 74.7% lower energy-delay product for the data memory, compared with a baseline processor.
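As a rough sanity check on how the reported figures relate, the sketch below computes an energy-delay product (EDP) relative to the baseline from the averaged speedup and energy numbers quoted in the abstract. The function name `edp_ratio` and the definition EDP = energy x delay with delay taken as 1/speedup are assumptions made for illustration; the abstract does not state how the paper aggregates its results. Combining the two averages gives roughly 73.5%, close to but not equal to the reported 74.7%, which would be expected if the paper averages EDP per benchmark rather than multiplying averages.

```python
def edp_ratio(speedup: float, energy_reduction: float) -> float:
    """Return EDP relative to the baseline (1.0 means identical to baseline).

    Assumes EDP = energy * delay, with relative delay = 1 / speedup and
    relative energy = 1 - energy_reduction. These definitions are an
    illustrative assumption, not taken from the paper.
    """
    relative_delay = 1.0 / speedup
    relative_energy = 1.0 - energy_reduction
    return relative_energy * relative_delay


# Averaged figures quoted in the abstract: 1.17x speedup, 69% less dynamic energy.
ratio = edp_ratio(speedup=1.17, energy_reduction=0.69)
reduction_pct = (1.0 - ratio) * 100.0
print(f"EDP reduction from averaged figures: {reduction_pct:.1f}%")
```

Because the mean of per-benchmark products generally differs from the product of the means, this estimate should be read only as a plausibility check, not as a reproduction of the paper's result.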



Publisher: European Design and Automation Association, Leuven, Belgium

Proceedings volume: 1814 pages
