DOI: 10.5555/2602339.2602352
RESONATE: reverberation environment simulation for improved classification of speech models

Published: 15 April 2014

Abstract

Home monitoring systems currently gather information about people's activities of daily living and about emergencies, but they lack the ability to track speech. Practical speech analysis solutions are needed to help monitor ongoing conditions such as depression, because the amount of social interaction and the vocal affect of a speaker are important for assessing mood and well-being. Although existing solutions can classify the identity and the mood of a speaker, they perform poorly when the acoustic signals are captured in reverberant environments. In this paper, we present a practical reverberation compensation method called RESONATE, which uses simulated room impulse responses to adapt a training corpus for use in multiple real reverberant rooms. We demonstrate that the system creates robust classifiers that perform within 5 -- 10% of the baseline accuracy achieved in non-reverberant environments. We demonstrate and evaluate the performance of this matched-condition strategy using a public dataset, in controlled experiments with six rooms, and in two long-term, uncontrolled real-world deployments. We offer a practical implementation that performs collection, feature extraction, and classification on-node, and training and simulation of training sets on a base station or cloud service.
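The core idea of this matched-condition strategy (convolving a clean training corpus with simulated room impulse responses so that the training data resembles the target reverberant room) can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: `synthetic_rir` below uses a crude exponentially decaying noise model in place of a full room simulation, and all function names are hypothetical.

```python
import numpy as np

def synthetic_rir(rt60, fs=16000, rng=None):
    """Crude synthetic room impulse response: white noise shaped by an
    exponential decay calibrated so the tail falls ~60 dB over rt60
    seconds. A stand-in for a proper room-acoustics simulation."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    decay = np.exp(-6.91 * t / rt60)  # exp(-6.91) ~ 1e-3, i.e. -60 dB
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))

def reverberate(clean, rir):
    """Convolve a clean training utterance with an RIR to produce a
    matched-condition training example, truncated to the input length
    and peak-normalized."""
    wet = np.convolve(clean, rir)[: len(clean)]
    return wet / (np.max(np.abs(wet)) + 1e-12)

# e.g. adapt a (stand-in) clean utterance for a room with RT60 = 0.4 s
fs = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
wet = reverberate(clean, synthetic_rir(0.4, fs))
```

In a pipeline like the one the abstract describes, each clean corpus utterance would be reverberated with an RIR simulated for the target room before feature extraction and classifier training.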




    Published In

    IPSN '14: Proceedings of the 13th International Symposium on Information Processing in Sensor Networks
    April 2014
    368 pages
    ISBN:9781479931460

    Publisher

    IEEE Press

    Author Tags

    1. reverberation compensation
    2. speaker identification

    Qualifiers

    • Research-article

    Acceptance Rates

    IPSN '14 Paper Acceptance Rate: 23 of 111 submissions, 21%
    Overall Acceptance Rate: 143 of 593 submissions, 24%


    Cited By

    • (2021) Emotion Recognition Robust to Indoor Environmental Distortions and Non-targeted Emotions Using Out-of-distribution Detection. ACM Transactions on Computing for Healthcare, 3(2), 1-22. DOI: 10.1145/3492300. Online publication date: 20-Dec-2021.
    • (2019) ARASID: Artificial Reverberation-Adjusted Indoor Speaker Identification Dealing with Variable Distances. Proceedings of the 2019 International Conference on Embedded Wireless Systems and Networks, 154-165. DOI: 10.5555/3324320.3324339. Online publication date: 25-Feb-2019.
    • (2019) SoundSemantics. Proceedings of the 18th International Conference on Information Processing in Sensor Networks, 217-228. DOI: 10.1145/3302506.3310402. Online publication date: 16-Apr-2019.
    • (2017) SoundSifter. Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, 29-41. DOI: 10.1145/3081333.3081338. Online publication date: 16-Jun-2017.
    • (2015) SARRIMA. Proceedings of the Conference on Wireless Health, 1-8. DOI: 10.1145/2811780.2811916. Online publication date: 14-Oct-2015.
