A multi-FPGA 10x-real-time high-speed search engine for a 5000-word vocabulary speech recognizer

Authors:

Rob A. Rutenbar

FPGA '09: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

Pages 83 - 92

https://doi.org/10.1145/1508128.1508141

Published: 22 February 2009

Abstract

Today's best-quality speech recognition systems are implemented in software. These systems fully occupy the resources of a high-end server to deliver results at real-time speed: each hour of audio requires a significant fraction of an hour of computation for recognition. This is profoundly limiting for applications that require extreme recognition speed, for example, high-volume tasks such as video indexing (e.g., YouTube), or high-speed tasks such as triage of homeland security intelligence. We describe the architecture and implementation of one critical component, the backend search stage, of a high-speed, large-vocabulary recognizer. Our implementation runs on a multi-FPGA Berkeley Emulation Engine 2 (BEE2) platform and handles a standard 5000-word Wall Street Journal speech benchmark. Running at 100 MHz, a clock rate roughly 30x slower than that of a conventional server, our backend search engine decodes on average 10x faster than real-time with negligible degradation in accuracy. To the best of our knowledge, this is both the most complex and the fastest recognizer ever realized in hardware.
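The speed figures in the abstract can be related with a little arithmetic. The sketch below is illustrative only: the function names, the one-hour example, and the derived quantities (implied server clock, clock cycles available per second of audio) are assumptions computed from the stated 10x, 100 MHz, and ~30x figures, not numbers reported in the paper.

```python
def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """Compute time divided by audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return compute_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """Speedup relative to real time is the reciprocal of the real-time factor."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Figures stated in the abstract.
FPGA_CLOCK_HZ = 100e6   # search engine clock rate
SPEEDUP = 10.0          # average decoding speed relative to real time
CLOCK_RATIO = 30.0      # "~30x slower than a conventional server" (approximate)

# Hypothetical example: one hour of audio decoded at the stated average speedup.
audio_seconds = 3600.0
compute_seconds = audio_seconds / SPEEDUP

rtf = real_time_factor(compute_seconds, audio_seconds)

# Derived (assumed) quantities, not reported in the source.
implied_server_clock_hz = FPGA_CLOCK_HZ * CLOCK_RATIO
cycles_per_audio_second = FPGA_CLOCK_HZ * rtf

print(f"real-time factor: {rtf:.2f}")
print(f"speedup over real time: {speedup_over_real_time(rtf):.1f}x")
print(f"implied server clock: {implied_server_clock_hz / 1e9:.1f} GHz")
print(f"FPGA cycles per second of audio: {cycles_per_audio_second:.0f}")
```

Under these assumptions, an hour of audio takes about six minutes to decode, and the engine has on the order of ten million clock cycles to spend on each second of speech.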

Index Terms

A multi-fpga 10x-real-time high-speed search engine for a 5000-word vocabulary speech recognizer
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Signal processing systems

Information & Contributors

Information

Published In

FPGA '09: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

February 2009

302 pages

ISBN: 9781605584102

DOI: 10.1145/1508128

General Chair: Paul Chow, University of Toronto, Canada

Program Chair: Peter Cheung, Imperial College London, UK

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

FPGA '09: ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 22 - 24, 2009

Monterey, California, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Bibliometrics

Total Citations: 33
Total Downloads: 614
Downloads (last 12 months): 8
Downloads (last 6 weeks): 1

Reflects downloads up to 16 Dec 2024