Common sounds in bedrooms (CSIBE) corpora for sound event recognition of domestic robots

Abstract

Although sound event recognition has attracted much attention in the scientific community, applications in the robotics domain have received little focus. This paper publishes a new database and evaluates classifiers on it to guide the future practical development of domestic robots. A corpus (CSIBE-RAW) was collected from the internet to build acoustic models that recognize 13 sound events and ignore ambient sounds. As a case study, CSIBE-RAW was rerecorded in four room settings (CSIBE-AIBO) to create reverberation-tolerant classifiers for a Sony ERS-7. Of the eight classifiers reviewed, the convolutional neural network achieved the best accuracy (95.07%) after multi-conditional learning, and it was suitable for real-time classification on the robot. The effects of lossy audio codecs were also studied: lossy encoder-tolerant audio statistics were specified for the feature vector, and the Ogg Vorbis encoder at 128 kbit/s VBR was found to be the best choice for storing big data, avoiding any significant accuracy loss at a compression ratio of 1:8.

Acknowledgements

The Nokia Foundation provided a grant (ID: 201510141) for two months in 2015, during which the CSIBE-RAW dataset was collected, CSIBE-AIBO was recorded, and the initial baseline system was implemented. We give special thanks to Toni Heittola and Annamaria Mesaros from Tampere University of Technology for their invaluable comments, which helped us overcome some problems.

Author information

Authors and Affiliations

University of Tampere, Kalevantie 4, 33100, Tampere, Finland
Csaba Kertész & Markku Turunen

Corresponding author

Correspondence to Csaba Kertész.

About this article

Cite this article

Kertész, C., Turunen, M. Common sounds in bedrooms (CSIBE) corpora for sound event recognition of domestic robots. Intel Serv Robotics 11, 335–346 (2018). https://doi.org/10.1007/s11370-018-0258-9

Received: 02 September 2017
Accepted: 22 August 2018
Published: 28 August 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11370-018-0258-9
