
Real-time translation of English speech through speech feature extraction

  • Original Article
  • Published:
Artificial Life and Robotics

Abstract

Real-time English speech translation is useful in many situations, including business and travel, and this research aims to improve its efficacy. Filter bank (FBank) features were first extracted from the English speech. An enhanced Transformer model was then introduced, with a causal convolution module added at the front end of the encoder so that the captured speech features carry positional information. The performance of the improved model in translating English speech into different target languages was evaluated on the MuST-C dataset. Translation quality varied across target languages: Spanish achieved the highest bilingual evaluation understudy (BLEU) score at 20.84, whereas Russian obtained the lowest at 10.56. The average BLEU score was 18.51, with an average lag of 1202.33 ms. Compared with the conventional Transformer, the improved model achieved higher BLEU scores and lower latency, and it performed best with a 3 × 3 convolution kernel. These results demonstrate the reliability of the improved Transformer model for real-time English speech translation and highlight its practical usefulness.
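
For concreteness, the pipeline the abstract describes (FBank feature extraction followed by a Transformer encoder with a causal convolution front end) can be sketched roughly as below. This is a minimal illustrative sketch using PyTorch and torchaudio, not the authors' implementation: the 80-dimensional FBank features, the 1-D causal layout, the model width of 256, and the names extract_fbank and CausalConvFrontEnd are assumptions introduced here; only the kernel width of 3 echoes the abstract's reported 3 × 3 kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchaudio


def extract_fbank(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Compute 80-dimensional log-Mel filter bank (FBank) features.

    torchaudio's Kaldi-compatible routine expects a (channels, samples) tensor
    and returns a (frames, num_mel_bins) tensor.
    """
    return torchaudio.compliance.kaldi.fbank(
        waveform, num_mel_bins=80, sample_frequency=sample_rate
    )


class CausalConvFrontEnd(nn.Module):
    """Causal 1-D convolution over time: each output frame depends only on the
    current and past frames, which keeps the encoder front end streamable and
    injects local positional information into the speech features."""

    def __init__(self, feat_dim: int = 80, d_model: int = 256, kernel_size: int = 3):
        super().__init__()
        self.left_pad = kernel_size - 1          # pad only on the past side
        self.conv = nn.Conv1d(feat_dim, d_model, kernel_size)

    def forward(self, fbank: torch.Tensor) -> torch.Tensor:
        # fbank: (batch, frames, feat_dim) -> (batch, feat_dim, frames) for Conv1d
        x = fbank.transpose(1, 2)
        x = F.pad(x, (self.left_pad, 0))         # left (causal) padding only
        x = F.relu(self.conv(x))
        return x.transpose(1, 2)                 # (batch, frames, d_model)


# Example: one second of audio -> FBank -> causal front end -> Transformer layer.
fbank = extract_fbank(torch.randn(1, 16000)).unsqueeze(0)            # (1, ~98, 80)
encoder = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
hidden = encoder(CausalConvFrontEnd()(fbank))                        # (1, ~98, 256)
```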


Data availability

The data used and analyzed in this paper are available from the corresponding author upon reasonable request.


Author information

Corresponding author

Correspondence to Xiaoyan Lei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Lei, X. Real-time translation of English speech through speech feature extraction. Artif Life Robotics 29, 410–415 (2024). https://doi.org/10.1007/s10015-024-00951-w


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10015-024-00951-w
