
DOI: 10.1145/3501409.3501474

Research on Speech Enhancement based on Full-scale Connection

Published: 31 December 2021 Publication History

Abstract

To address the problem that popular encoder-decoder-based monaural speech enhancement models do not make full use of full-scale features, a full-scale-connected speech enhancement model, FSC-SENet, is proposed. First, a speech enhancement model is built on the CRN architecture: a convolutional encoder and decoder extract features and recover the speech signal, while LSTM modules at the model's bottleneck capture temporal dependencies. Then a full-scale connection method and a multi-feature dynamic fusion mechanism are proposed, so that the decoder can exploit features from every scale when recovering clean speech. Experimental results on the TIMIT corpus show that, compared with CRN, FSC-SENet improves the PESQ score by 0.39 and the STOI score by 2.8% under seen-noise conditions, and the PESQ score by 0.43 and the STOI score by 3.1% under unseen-noise conditions, demonstrating that the proposed full-scale connection and dynamic feature fusion mechanism give CRN better speech enhancement performance.
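As a rough illustration (not the authors' code), the full-scale connection with dynamic fusion can be sketched as follows: each decoder stage gathers encoder features from every scale, resamples them to its own time resolution, and combines them with softmax-normalized fusion weights. In the paper these weights are learned dynamically; here they are fixed inputs, and all shapes, channel counts, and function names are hypothetical.

```python
import numpy as np

def resample(feat, target_len):
    """Nearest-neighbour resample a (time, channels) feature map along time."""
    idx = np.linspace(0, feat.shape[0] - 1, target_len).round().astype(int)
    return feat[idx]

def full_scale_fuse(encoder_feats, target_len, weights):
    """Fuse features from all encoder scales at one decoder stage.

    encoder_feats: list of (time_i, channels) arrays, one per scale.
    weights: raw fusion weights, one per scale; softmax-normalized so the
             fusion is a convex combination (stand-in for the paper's
             learned dynamic fusion).
    """
    w = np.exp(weights - np.max(weights))
    w = w / w.sum()
    resampled = [resample(f, target_len) for f in encoder_feats]
    return sum(wi * fi for wi, fi in zip(w, resampled))

# Hypothetical feature maps at three scales (time x channels); a real CRN
# would also vary the channel count per scale.
feats = [np.ones((64, 16)), 2 * np.ones((32, 16)), 3 * np.ones((16, 16))]
fused = full_scale_fuse(feats, target_len=64, weights=np.zeros(3))
# Equal weights -> elementwise average of 1, 2, 3 = 2.0 everywhere
```

The key design point, following the full-scale connection idea, is that the decoder stage sees low-level detail and high-level context simultaneously, rather than only the single matching encoder scale of a plain skip connection.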

References

[1]
Wang Y, Wang D. Towards scaling up classification-based speech separation [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(7): 1381--1390.
[2]
Xu Y, Du J, Dai L-R, et al. A regression approach to speech enhancement based on deep neural networks [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 23(1): 7--19.
[3]
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation [C]. MICCAI 2015, 2015: 234--241.
[4]
Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481--2495.
[5]
Jansson A, Humphrey E, Montecchio N, et al. Singing voice separation with deep U-Net convolutional networks [C]. 18th International Society for Music Information Retrieval Conference (ISMIR), 2017: 23--27.
[6]
Stoller D, Ewert S, Dixon S. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation [C]. 19th International Society for Music Information Retrieval Conference (ISMIR), 2018: 334--340.
[7]
Soni M H, Shah N, Patil H A. Time-frequency masking-based speech enhancement using generative adversarial network [C]. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 5039--5043.
[8]
Park S R, Lee J W. A fully convolutional neural network for speech enhancement [C]. Interspeech 2017, 2017: 1993--1997.
[9]
Tan K, Wang D. A convolutional recurrent neural network for real-time speech enhancement [C]. Interspeech 2018, 2018: 3229--3233.
[10]
Li A, Zheng C, Fan C, et al. A recursive network with dynamic attention for monaural speech enhancement [C]. Interspeech 2020, 2020: 2422--2426.
[11]
Huang H, Lin L, Tong R, et al. UNet 3+: A full-scale connected UNet for medical image segmentation [C]. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 1055--1059.
[12]
Garofolo J S, Lamel L F, Fisher W M, et al. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST Speech Disc 1-1.1 [J]. 1993, 93: 27403.
[13]
Hu G, Wang D. A tandem algorithm for pitch estimation and voiced speech segregation [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(8): 2067--2079.
[14]
Varga A, Steeneken H J. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems [J]. Speech Communication, 1993, 12(3): 247--251.
[15]
Rix A W, Beerends J G, Hollier M P, et al. Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs [C]. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2001: 749--752.
[16]
Taal C H, Hendriks R C, Heusdens R, et al. An algorithm for intelligibility prediction of time-frequency weighted noisy speech [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(7): 2125--2136.



    Published In

    EITCE '21: Proceedings of the 2021 5th International Conference on Electronic Information Technology and Computer Engineering
    October 2021
    1723 pages
    ISBN:9781450384322
    DOI:10.1145/3501409

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Feature fusion
    2. Full-scale skip connection
    3. Speech enhancement

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    EITCE 2021

    Acceptance Rates

EITCE '21 Paper Acceptance Rate: 294 of 531 submissions, 55%
Overall Acceptance Rate: 508 of 972 submissions, 52%
