Abstract
Speech is the most natural and direct way for humans to communicate with each other, and increasingly with machines. People without disabilities can converse in natural language, whereas people who are deaf or unable to speak must rely on text messages or sign language, and sign language works only when the other person is nearby. Speech recognition is a branch of computer science that enables a computer to recognize spoken language and translate it into text, giving machines the ability to identify and respond to spoken commands. Any spoken or played audio consists of signals, and these signals carry the communication between humans and machines, so information can be recorded as audio and sent to a recipient. The current systems are limited to speech-to-text conversion. The proposed system goes further by converting audio to text as well as text to speech. It also translates between languages, which makes it useful for people who cannot read.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Panapana, P., Pothala, E.R., Nagireddy, S.S.L., Mattaparthi, H.P., Meesala, N. (2023). Automatic Bidirectional Conversion of Audio and Text: A Review from Past Research. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 716. Springer, Cham. https://doi.org/10.1007/978-3-031-35501-1_30
DOI: https://doi.org/10.1007/978-3-031-35501-1_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35500-4
Online ISBN: 978-3-031-35501-1
eBook Packages: Intelligent Technologies and Robotics (R0)