DOI: 10.1145/1837274.1837409
research-article

Xetal-Pro: an ultra-low energy and high throughput SIMD processor

Published: 13 June 2010

Abstract

This paper presents the Xetal-Pro SIMD processor, which is based on Xetal-II, one of the most computationally efficient processors (in terms of GOPS/Watt) available today. Xetal-Pro supports ultra-wide VDD scaling, from the nominal supply down to the sub-threshold region. Although aggressive VDD scaling severely degrades throughput, the massive parallelism of the Xetal family can compensate for this loss. Xetal-II, the predecessor of Xetal-Pro, includes a large on-chip frame memory (FM) that cannot operate reliably at ultra-low voltage. We therefore investigate both alternative FM realizations and alternative memory organizations, and we propose a hybrid memory architecture that reduces non-local memory traffic and enables further VDD scaling. Compared to Xetal-II operating at nominal voltage, we gain more than 10x energy reduction while still delivering a sufficiently high throughput of 0.69 GOPS (counting multiply and add operations only). This work gives new insight into the design of ultra-low-energy SIMD processors, which are suitable for portable streaming applications.
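
To make the GOPS/Watt metric and the effect of VDD scaling concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the paper. The 107 GOPS and 600 mW figures come from the title of the Xetal-II paper (Abbo et al., 2008); the function names are hypothetical, and the quadratic energy model (E proportional to VDD squared) is a first-order textbook assumption for dynamic energy that ignores leakage, which becomes significant near and below threshold.

```python
def energy_per_op_pj(gops: float, power_mw: float) -> float:
    """Energy per operation in picojoules.

    mW / GOPS = (1e-3 J/s) / (1e9 op/s) = 1e-12 J/op = 1 pJ/op,
    so the ratio of the two figures is already in pJ/op.
    """
    return power_mw / gops


def dynamic_energy_ratio(vdd_scaled: float, vdd_nominal: float) -> float:
    """First-order dynamic energy ratio, assuming E ~ C * VDD^2 (leakage ignored)."""
    return (vdd_scaled / vdd_nominal) ** 2


def vdd_ratio_for_reduction(reduction: float) -> float:
    """VDD ratio that yields the given energy reduction under the same assumption."""
    return (1.0 / reduction) ** 0.5


# Xetal-II at nominal voltage, using the figures from its paper title.
xetal2_pj = energy_per_op_pj(107.0, 600.0)  # about 5.6 pJ/op

# Under the quadratic assumption, a 10x energy reduction would need
# VDD scaled to roughly 0.32 of nominal (illustrative only).
ratio_10x = vdd_ratio_for_reduction(10.0)

print(f"Xetal-II energy per op: {xetal2_pj:.2f} pJ")
print(f"VDD ratio for 10x reduction (quadratic model): {ratio_10x:.2f}")
```

The abstract's 0.69 GOPS figure counts multiply and add operations only, so it is not directly comparable to the 107 GOPS headline number without knowing how the latter was counted; the sketch deliberately avoids combining them.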

    Published In

    DAC '10: Proceedings of the 47th Design Automation Conference
    June 2010
    1036 pages
    ISBN: 9781450300025
    DOI: 10.1145/1837274

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Xetal-Pro
    2. SIMD
    3. hybrid memory system
    4. low-energy

    Qualifiers

    • Research-article

    Conference

    DAC '10

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
