
Detection of Synthetic Speech Based on Spectrum Defects

Published: 10 October 2022

Abstract

Synthetic spoofed speech has become a threat to online communication and to deep-learning-based automatic speaker verification (ASV) systems, because synthesis models can reproduce anyone's voice. The first Audio Deep Synthesis Detection Challenge (ADD 2022) was launched to spur researchers around the world to build new technologies that accelerate research on detecting deep-synthesized and manipulated speech. This paper presents a spoofing detection system submitted to the ADD 2022 Track 3.2 detection task (FG-D). The system has two parts. First, Mel-frequency cepstral coefficients (MFCCs), linear-frequency cepstral coefficients (LFCCs), delta coefficients, and delta-delta coefficients derived from the speech spectrogram are fed into a DenseNet to build the DenseNet detection system (DDS). Second, three algorithms, the mute segment classifier (MSC), the high-frequency classifier (HFC), and the block spectrogram classifier (BSC), are designed to target the defects that synthetic speech leaves on the spectrogram; together they form the spectrum defect detection system (SPECT). The fusion of SPECT and DDS achieves an equal error rate (EER) of 8.5% on ADD FG-D, and our final submission ranks 6th in the evaluation phase of ADD FG-D.
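To make two of the quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes cepstral features (MFCCs or LFCCs) are already available as a NumPy array of shape (frames, coefficients), and shows how delta and delta-delta coefficients can be derived with the common regression formula, and how an EER can be computed from detection scores. The function names `delta`, `stack_dynamic_features`, and `equal_error_rate`, the window width of 2, and the "higher score means more likely genuine" convention are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis.

    features has shape (n_frames, n_coeffs). The window width is an
    assumed value, not one taken from the paper.
    """
    n_frames = features.shape[0]
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_dynamic_features(cepstra: np.ndarray) -> np.ndarray:
    """Concatenate static, delta, and delta-delta coefficients per frame."""
    d1 = delta(cepstra)
    d2 = delta(d1)  # delta-delta is the delta of the delta
    return np.hstack([cepstra, d1, d2])


def equal_error_rate(genuine_scores, spoof_scores) -> float:
    """EER, assuming a higher score means 'more likely genuine'.

    Sweeps every observed score as a threshold and returns the error
    rate at the point where false acceptance and false rejection are
    closest to each other.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection: genuine speech scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: spoofed speech scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2)
```

With 13 static coefficients per frame, `stack_dynamic_features` yields 39 values per frame, which is the kind of input a classifier such as a DenseNet could consume. An EER of 8.5% corresponds to the operating point at which about 8.5% of genuine utterances are rejected and about 8.5% of spoofed utterances are accepted.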

Supplementary Material

MP4 File (DDAM-03.mp4)
Presentation video

      Information & Contributors

      Information

      Published In

      DDAM '22: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia
      October 2022
      107 pages
      ISBN:9781450394963
      DOI:10.1145/3552466
      General Chairs: Jianhua Tao, Haizhou Li, Helen Meng, Dong Yu, Masato Akagi
      Program Chairs: Jiangyan Yi, Cunhang Fan, Ruibo Fu, Shan Lian, Pengyuan Zhang
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. add
      2. deepfake
      3. spectrogram
      4. synthetic speech

      Qualifiers

      • Research-article

      Funding Sources

      • Ningbo Natural Science Foundation
      • Ningbo Science and Technology Innovation Project
      • the National Natural Science Foundation of China

      Conference

      MM '22
      Acceptance Rates

      DDAM '22 Paper Acceptance Rate 12 of 14 submissions, 86%;
      Overall Acceptance Rate 12 of 14 submissions, 86%

      Cited By

      • (2025) A Survey on Speech Deepfake Detection. ACM Computing Surveys 57:7 (1-38). DOI: 10.1145/3714458. Online publication date: 24-Jan-2025
      • (2024) Dynamic Ensemble Teacher-Student Distillation Framework for Light-Weight Fake Audio Detection. IEEE Signal Processing Letters 31 (2305-2309). DOI: 10.1109/LSP.2024.3431936. Online publication date: 2024
      • (2023) UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization. Proceedings of the 31st ACM International Conference on Multimedia (8749-8759). DOI: 10.1145/3581783.3613767. Online publication date: 26-Oct-2023
      • (2023) Learning From Yourself: A Self-Distillation Method For Fake Speech Detection. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1-5). DOI: 10.1109/ICASSP49357.2023.10096837. Online publication date: 4-Jun-2023
      • (2023) A voice spoofing detection framework for IoT systems with feature pyramid and online knowledge distillation. Journal of Systems Architecture 143 (102981). DOI: 10.1016/j.sysarc.2023.102981. Online publication date: Oct-2023
