Research Paper:
Speech Feature Extraction in Broadcast Hosting Based on Fluctuating Equation Inversion
Chi Xu
School of Theater Drama and Art College, Shenyang Normal University
No.253 Huanghe North Avenue, Huanggu District, Shenyang, Liaoning 110136, China
Corresponding author
Speech is one of the most sophisticated human motor skills. Speaker identification is the ability of a software or hardware component to acquire a speech signal and identify the speakers present in it. This study proposes a fluctuating equation inversion method with feature extraction for broadcast hosting. Feature extraction aims to derive useful signal features from natural audio that can be applied to various downstream tasks, including recitation, evaluation, and categorization. Data were first collected from the CASIA dataset. The experimental outcomes of the proposed approach were evaluated using mel-frequency cepstral coefficients (MFCCs), gammatone frequency cepstral coefficients (GFCCs), and linear frequency cepstral coefficients (LFCCs). On a publicly accessible dataset, the proposed technique achieved higher recognition accuracy (98%), precision (97%), recall (96.05%), sensitivity (92.56%), and F1-score (95.09%) than conventional feature extraction methods. The proposed approach can be used to improve audio signal quality and user experience in broadcast-hosting applications.
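As a rough illustration of the cepstral feature-extraction and evaluation pipeline summarized above, the following is a minimal Python sketch, assuming librosa for MFCC extraction and scikit-learn for the reported metrics. The `wav_paths` and `labels` variables are hypothetical placeholders for a corpus such as CASIA, and the fluctuating-equation-inversion step itself is not implemented here.

```python
# Minimal sketch of an MFCC feature-extraction baseline with the metrics
# reported in the abstract. Assumptions: librosa and scikit-learn are
# installed; `wav_paths` / `labels` are hypothetical placeholders for the
# corpus; the paper's fluctuating equation inversion is NOT implemented.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def mfcc_features(path, sr=16000, n_mfcc=13):
    """Load one utterance and return a fixed-length MFCC summary vector."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    # Mean and std over time yield a fixed-length utterance-level descriptor.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# wav_paths: list of audio file paths; labels: matching speaker identities.
X = np.stack([mfcc_features(p) for p in wav_paths])
y = np.asarray(labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC().fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred, average="macro"))
print("recall   :", recall_score(y_te, pred, average="macro"))
print("f1-score :", f1_score(y_te, pred, average="macro"))
```

GFCC and LFCC descriptors would slot into the same pipeline by swapping the filterbank used in `mfcc_features`; only the feature function changes, not the evaluation.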
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.