DOI: 10.1145/1508128.1508141
research-article

A Multi-FPGA 10x-Real-Time High-Speed Search Engine for a 5000-Word Vocabulary Speech Recognizer

Published: 22 February 2009

Abstract

Today's best-quality speech recognition systems are implemented in software. These systems fully occupy the resources of a high-end server to deliver results at real-time speed: each hour of audio requires a significant fraction of an hour of computation for recognition. This is profoundly limiting for applications that require extreme recognition speed, for example high-volume tasks such as video indexing (e.g., YouTube) or high-speed tasks such as triage of homeland security intelligence. We describe the architecture and implementation of one critical component of a high-speed, large-vocabulary recognizer: the backend search stage. Implemented on a multi-FPGA Berkeley Emulation Engine 2 (BEE2) platform, our design handles the standard 5000-word Wall Street Journal speech benchmark. Running at 100 MHz, a clock rate roughly 30x slower than that of a conventional server, our backend search engine decodes on average 10 times faster than real time, with negligible degradation in accuracy. To the best of our knowledge, this is both the most complex and the fastest recognizer yet realized in hardware.
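The speed claims above reduce to simple ratios. The following Python sketch works through them under stated assumptions: the 1-hour test duration, the `DecoderRun` class, and the `per_cycle_advantage` helper are hypothetical names chosen for illustration, and the software baseline is assumed to decode at exactly real time (a real-time factor of 1.0) on a clock 30x faster than 100 MHz, i.e. about 3 GHz. The resulting per-cycle ratio is a back-of-the-envelope figure, not one reported by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderRun:
    """Hypothetical record of one decoding run (illustration only)."""
    audio_seconds: float    # duration of the input speech
    compute_seconds: float  # wall-clock time spent decoding it
    clock_hz: float         # clock frequency of the decoding hardware

    @property
    def real_time_factor(self) -> float:
        """Compute time divided by audio time; below 1.0 means faster than real time."""
        return self.compute_seconds / self.audio_seconds

    @property
    def speedup_over_real_time(self) -> float:
        """How many times faster than real time the run is (10.0 means '10x')."""
        return 1.0 / self.real_time_factor

    @property
    def cycles_per_audio_second(self) -> float:
        """Clock cycles spent per second of input audio."""
        return self.clock_hz * self.real_time_factor


def per_cycle_advantage(fast: DecoderRun, baseline: DecoderRun) -> float:
    """Ratio of cycles the baseline spends per audio second to those spent by `fast`."""
    return baseline.cycles_per_audio_second / fast.cycles_per_audio_second


HOUR = 3600.0

# From the abstract: 10x faster than real time at 100 MHz.
fpga = DecoderRun(audio_seconds=HOUR, compute_seconds=HOUR / 10, clock_hz=100e6)

# ASSUMPTION: software baseline at exactly real time on a clock ~30x faster (3 GHz).
server = DecoderRun(audio_seconds=HOUR, compute_seconds=HOUR, clock_hz=30 * 100e6)

print(f"FPGA RTF: {fpga.real_time_factor:.2f}")                          # prints "FPGA RTF: 0.10"
print(f"1 h of audio decodes in {fpga.compute_seconds / 60:.0f} min")    # prints "1 h of audio decodes in 6 min"
print(f"Per-cycle advantage: {per_cycle_advantage(fpga, server):.0f}x")  # prints "Per-cycle advantage: 300x"
```

The per-cycle figure simply multiplies the two factors in the abstract (10x faster decoding, ~30x slower clock). Because the abstract says software needs only "a significant fraction of an hour" per hour of audio, a real baseline may run faster than real time, in which case this ratio overstates the advantage; it also compares only the search stage, not a complete recognizer.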


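Cited By

  • (2019) A Low-Power, High-Performance Speech Recognition Accelerator. IEEE Transactions on Computers, 68(12):1817-1831. DOI: 10.1109/TC.2019.2937075. Online publication date: 1-Dec-2019.
  • (2019) Parallelism Analysis for a Multi-core Speech Recognition Architecture. 2019 Argentine Conference on Electronics (CAE), pp. 92-97. DOI: 10.1109/CAE.2019.8709282. Online publication date: Mar-2019.
  • (2018) Sharing, protection, and compatibility for reconfigurable fabric with Amorphos. Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, pp. 107-127. DOI: 10.5555/3291168.3291177. Online publication date: 8-Oct-2018.
  • (2017) Fault-Tolerant Preemptive Aperiodic RT Scheduling by Supervisory Control of TDES on Multiprocessors. ACM Transactions on Embedded Computing Systems, 16(3):1-25. DOI: 10.1145/3012278. Online publication date: 11-Apr-2017.
  • (2017) A Multi-Quadcopter Cooperative Cyber-Physical System for Timely Air Pollution Localization. ACM Transactions on Embedded Computing Systems, 16(3):1-23. DOI: 10.1145/3005716. Online publication date: 28-Apr-2017.
  • (2017) Hardware Architectures for Embedded Speaker Recognition Applications. ACM Transactions on Embedded Computing Systems, 16(3):1-28. DOI: 10.1145/2975161. Online publication date: 28-Apr-2017.
  • (2016) An ultra low-power hardware accelerator for automatic speech recognition. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-12. DOI: 10.5555/3195638.3195696. Online publication date: 15-Oct-2016.
  • (2016) Greedily Improving Our Own Closeness Centrality in a Network. ACM Transactions on Knowledge Discovery from Data, 11(1):1-32. DOI: 10.1145/2953882. Online publication date: 20-Jul-2016.
  • (2016) Gender and Performance in Computer Science. ACM Transactions on Computing Education, 16(3):1-16. DOI: 10.1145/2920173. Online publication date: 20-May-2016.
  • (2016) Heuristic Evaluation for Novice Programming Systems. ACM Transactions on Computing Education, 16(3):1-30. DOI: 10.1145/2872521. Online publication date: 8-Jun-2016.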


Published In

FPGA '09: Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays
February 2009
302 pages
ISBN: 9781605584102
DOI: 10.1145/1508128
  • General Chair: Paul Chow
  • Program Chair: Peter Cheung

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

dsp, fpga, in silico vox, speech recognition


Conference

FPGA '09

Acceptance Rates

Overall acceptance rate: 125 of 627 submissions (20%)

