Abstract
Secure Speech Recognition (SR) means accurately capturing and interpreting spoken language while applying security mechanisms that protect the privacy, confidentiality, and integrity of Speech Data (SD). As the Internet of Things (IoT) evolves rapidly, integrating SR capabilities into IoT devices is becoming increasingly important, yet keeping private SD secure after integration remains a critical concern. Implementing the proposed Reptile Search Optimized Hidden Markov Model (RSO-HMM) for SR on IoT devices may be complicated by the diversity of device types, and maintaining the security and privacy of SD in practical IoT settings remains a significant hurdle. Seamless interoperability and robust security measures are therefore essential. We introduce the RSO-HMM for SR, which takes features extracted from speech as its input. SD is gathered from speakers with varied linguistic backgrounds to improve the accuracy of the SR system. Preprocessing applies Z-score normalization to improve robustness and reduce the effect of outliers. The Perceptual Linear Prediction (PLP) technique extracts the essential acoustic features from the speech signal. For data security, Elliptic Curve Cryptography (ECC) is used for encryption because it is well suited to resource-constrained scenarios. We evaluate the SR system using accuracy, precision, recall, and F1 score, with the primary objective of assessing its capability and reliability in transcribing speech signals accurately. The system achieves an accuracy of 96%.
By combining the RSO-HMM for SR, data preprocessing techniques, and ECC encryption in one comprehensive approach, this study advocates wider adoption of SR technology within the IoT ecosystem. Addressing critical data security concerns in this way supports a safer and more efficient interconnected society and encourages broader use of SR technology across applications.
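The two most mechanical steps named in the abstract, Z-score normalization and the four evaluation metrics, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the normalization assumes the population standard deviation, and the metrics assume binary labels (1 = positive), whereas a real SR evaluation would typically be multi-class or word-level.

```python
import math
from typing import Dict, List, Sequence


def z_score_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to zero mean and unit population standard deviation."""
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        # Assumption: a constant feature is mapped to zeros instead of dividing by zero.
        return [0.0] * n
    return [(v - mean) / std for v in values]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (assumed: 1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (empty denominators) are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(z_score_normalize([2.0, 4.0, 6.0]))
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In practice the normalization would be applied per feature dimension, using statistics computed on the training set only, so that test data does not leak into preprocessing.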
Acknowledgements
Key information project of China Southern Power Grid Energy Research Institute Power planning basic database v1.0 construction project of China Southern Power Grid (0006200000081599).
Author information
Contributions
All authors contributed to the study conception and design. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Wang, Z., He, S. & Li, G. Secure speech-recognition data transfer in the internet of things using a power system and a tried-and-true key generation technique. Cluster Comput 27, 14669–14684 (2024). https://doi.org/10.1007/s10586-024-04649-3
DOI: https://doi.org/10.1007/s10586-024-04649-3