DOI: 10.1145/1450135.1450191

Speculative DMA for architecturally visible storage in instruction set extensions

Published: 19 October 2008

Abstract

Instruction set extensions (ISEs) can accelerate embedded processor performance. Many algorithms for ISE generation have shown good potential, and some have recently been extended to include Architecturally Visible Storage (AVS): compiler-controlled memories, similar to scratchpads, that are accessible only to ISEs. Achieving a speedup with AVS requires Direct Memory Access (DMA) transfers to move data from main memory to the AVS. Unfortunately, this creates coherence problems between the AVS and the cache, which previous methods for ISEs with AVS failed to address. These methods must also leave many conservative DMA transfers in place, whose execution significantly limits the achievable speedup. This paper presents a memory coherence scheme for ISEs with AVS that ensures execution correctness and memory consistency with minimal area overhead. We also present a method that speculatively removes redundant DMA transfers. Cycle-accurate experimental results were obtained using an FPGA-emulation platform. These results show that application-specific instruction-set extended processors with speculative DMA-enhanced AVS gain significantly over previous techniques, despite the overhead of the coherence mechanism.
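
The abstract does not spell out the mechanism, so the following is only a toy illustration of the general idea of a redundant DMA transfer, not the paper's scheme. It assumes a single AVS buffer that mirrors one main-memory region and a hypothetical `valid` flag that a CPU store to that region clears. The class `ToyAvs` and every method name are invented for this sketch.

```python
class ToyAvs:
    """Toy model (hypothetical): one AVS buffer mirroring one memory region."""

    def __init__(self, memory, base, size):
        self.memory = memory   # plain list standing in for main memory
        self.base = base       # first address of the mirrored region
        self.size = size       # number of words in the region
        self.data = [0] * size
        self.valid = False     # True while the AVS copy matches memory
        self.dma_count = 0     # number of DMA transfers actually performed

    def dma_in(self):
        """Copy the region from main memory into the AVS."""
        self.data = self.memory[self.base:self.base + self.size]
        self.valid = True
        self.dma_count += 1

    def cpu_store(self, addr, value):
        """Ordinary store; invalidates the AVS copy if it hits the region."""
        self.memory[addr] = value
        if self.base <= addr < self.base + self.size:
            self.valid = False

    def run_ise(self, skip_redundant):
        """Stand-in for an ISE that sums the AVS contents.

        Conservative mode always transfers first; the other mode skips
        the transfer when the AVS copy is still known to be valid.
        """
        if not (skip_redundant and self.valid):
            self.dma_in()
        return sum(self.data)


def count_transfers(skip_redundant):
    """Run the ISE three times, store into the region, then run once more."""
    memory = list(range(16))
    avs = ToyAvs(memory, base=4, size=4)
    results = [avs.run_ise(skip_redundant) for _ in range(3)]
    avs.cpu_store(5, 100)
    results.append(avs.run_ise(skip_redundant))
    return avs.dma_count, results
```

In this toy setting both modes compute the same results, but the conservative mode performs one transfer per ISE invocation while the other mode transfers again only after the intervening store. How the actual hardware detects and recovers from a stale AVS copy is outside what the abstract states.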

Published In

CODES+ISSS '08: Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
October 2008
288 pages
ISBN: 9781605584706
DOI: 10.1145/1450135
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. application-specific processors
  2. architecturally visible storage
  3. instruction set extensions
  4. speculative direct memory access

Qualifiers

  • Research-article

Conference

ESWEEK 08: Fourth Embedded Systems Week
October 19 - 24, 2008
Atlanta, GA, USA

Acceptance Rates

CODES+ISSS '08 Paper Acceptance Rate 44 of 143 submissions, 31%;
Overall Acceptance Rate 280 of 864 submissions, 32%

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 30 Nov 2024

Cited By

  • (2018) Rapid Memory-Aware Selection of Hardware Accelerators in Programmable SoC Design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 26:3 (445-456). DOI: 10.1109/TVLSI.2017.2769125. Online publication date: 1-Mar-2018
  • (2015) Architecture Support for Tightly-Coupled Multi-Core Clusters with Shared-Memory HW Accelerators. IEEE Transactions on Computers, 64:8 (2132-2144). DOI: 10.1109/TC.2014.2360522. Online publication date: 1-Aug-2015
  • (2014) Virtual Ways: Low-Cost Coherence for Instruction Set Extensions with Architecturally Visible Storage. ACM Transactions on Architecture and Code Optimization, 11:2 (1-26). DOI: 10.1145/2576877. Online publication date: 15-Jul-2014
  • (2014) Way Stealing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22:1 (62-75). DOI: 10.1109/TVLSI.2012.2236689. Online publication date: 1-Jan-2014
  • (2014) Improve memory access for achieving both performance and energy efficiencies on heterogeneous systems. 2014 International Conference on Field-Programmable Technology (FPT) (91-98). DOI: 10.1109/FPT.2014.7082759. Online publication date: Dec-2014
  • (2014) A HLS-Based Toolflow to Design Next-Generation Heterogeneous Many-Core Platforms with Shared Memory. Proceedings of the 2014 12th IEEE International Conference on Embedded and Ubiquitous Computing (130-137). DOI: 10.1109/EUC.2014.27. Online publication date: 26-Aug-2014
  • (2013) GPGPU Computing for Cloud Auditing. High Performance Cloud Auditing and Applications (259-282). DOI: 10.1007/978-1-4614-3296-8_10. Online publication date: 1-Aug-2013
  • (2012) Hierarchical Design of an Application-Specific Instruction Set Processor for High-Throughput and Scalable FFT Processing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20:3 (551-563). DOI: 10.1109/TVLSI.2011.2105512. Online publication date: 1-Mar-2012
  • (2012) Optically-Clocked Instruction Set Extensions for High Efficiency Embedded Processors. IEEE Transactions on Circuits and Systems I: Regular Papers, 59:3 (604-615). DOI: 10.1109/TCSI.2011.2169730. Online publication date: Mar-2012
  • (2011) Architecture and design automation for application-specific processors. 2011 9th IEEE International Conference on ASIC (1094-1097). DOI: 10.1109/ASICON.2011.6157399. Online publication date: Oct-2011
