DOI: 10.5555/2602339.2602352
RESONATE: reverberation environment simulation for improved classification of speech models

Published: 15 April 2014

Abstract

Home monitoring systems currently gather information about people's activities of daily living and about emergencies, but they lack the ability to track speech. Practical speech analysis solutions are needed to help monitor ongoing conditions such as depression, because the amount of social interaction and the vocal affect of a speaker are important for assessing mood and well-being. Although existing solutions can classify the identity and the mood of a speaker, they perform poorly when the acoustic signals are captured in reverberant environments. In this paper, we present a practical reverberation compensation method called RESONATE, which uses simulated room impulse responses to adapt a training corpus for use in multiple real reverberant rooms. We demonstrate that the system creates robust classifiers that perform within 5 -- 10% of the baseline accuracy achieved in non-reverberant environments. We demonstrate and evaluate the performance of this matched-condition strategy using a public dataset, in controlled experiments with six rooms, and in two long-term, uncontrolled real-world deployments. We offer a practical implementation that performs collection, feature extraction, and classification on-node, and training and simulation of training sets on a base station or cloud service.
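The core idea of this matched-condition strategy (convolving a clean training corpus with simulated room impulse responses so that the training data resembles the target reverberant room) can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: `synthetic_rir` below uses a crude exponentially decaying noise model in place of a full room simulation, and all function names are hypothetical.

```python
import numpy as np

def synthetic_rir(rt60, fs=16000, rng=None):
    """Crude synthetic room impulse response: white noise shaped by an
    exponential decay calibrated so the tail falls ~60 dB over rt60
    seconds. A stand-in for a proper room-acoustics simulation."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    decay = np.exp(-6.91 * t / rt60)  # exp(-6.91) ~ 1e-3, i.e. -60 dB
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))

def reverberate(clean, rir):
    """Convolve a clean training utterance with an RIR to produce a
    matched-condition training example, truncated to the input length
    and peak-normalized."""
    wet = np.convolve(clean, rir)[: len(clean)]
    return wet / (np.max(np.abs(wet)) + 1e-12)

# e.g. adapt a (stand-in) clean utterance for a room with RT60 = 0.4 s
fs = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
wet = reverberate(clean, synthetic_rir(0.4, fs))
```

In a pipeline like the one the abstract describes, each clean corpus utterance would be reverberated with an RIR simulated for the target room before feature extraction and classifier training.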




    Published In

    IPSN '14: Proceedings of the 13th International Symposium on Information Processing in Sensor Networks
    April 2014
    368 pages
    ISBN:9781479931460

    Publisher

    IEEE Press

    Author Tags

    1. reverberation compensation
    2. speaker identification

    Qualifiers

    • Research-article

    Acceptance Rates

    IPSN '14 Paper Acceptance Rate: 23 of 111 submissions, 21%
    Overall Acceptance Rate: 143 of 593 submissions, 24%


    Cited By

    • (2021) Emotion Recognition Robust to Indoor Environmental Distortions and Non-targeted Emotions Using Out-of-distribution Detection. ACM Transactions on Computing for Healthcare, 3(2), 1-22. DOI: 10.1145/3492300. Online publication date: 20-Dec-2021.
    • (2019) ARASID: Artificial Reverberation-Adjusted Indoor Speaker Identification Dealing with Variable Distances. Proceedings of the 2019 International Conference on Embedded Wireless Systems and Networks, 154-165. DOI: 10.5555/3324320.3324339. Online publication date: 25-Feb-2019.
    • (2019) SoundSemantics. Proceedings of the 18th International Conference on Information Processing in Sensor Networks, 217-228. DOI: 10.1145/3302506.3310402. Online publication date: 16-Apr-2019.
    • (2017) SoundSifter. Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, 29-41. DOI: 10.1145/3081333.3081338. Online publication date: 16-Jun-2017.
    • (2015) SARRIMA. Proceedings of the Conference on Wireless Health, 1-8. DOI: 10.1145/2811780.2811916. Online publication date: 14-Oct-2015.
