Noise-Robust Sound-Event Classification System with Texture Analysis
Figure 1. Overall system structure of the proposed method.
Figure 2. Conversion of a one-dimensional sound signal into a two-dimensional gray-level image.
Figure 3. Process of extracting a texture image using the dominant neighborhood structure (DNS) algorithm.
Figure 4. Convolutional neural network (CNN) structure for sound-event classification.
Figure 5. Sound waveforms acquired from a railway-point machine: (a) normal event, (b) gravel event, (c) ice-covered event, and (d) unscrewed event. The horizontal axis shows time, and the vertical axis shows the sound signal in dB.
Figure 6. Texture images of different types of events in a railway-point machine: (a) normal event, (b) gravel event, (c) ice-covered event, and (d) unscrewed event.
Figure 7. The two-step process of transforming sound signals into a texture image. The first step converts a sound signal into a 2D gray-level image; the second creates a noise-robust texture image by applying DNS: (a) noise-free (normal event), (b) SNR 18, (c) SNR 0, (d) wind, and (e) rain.
Figure 8. Structural similarity (SSIM) comparison before and after applying DNS to railway-point-machine sound data under various noise conditions.
Figure 9. Sound waveforms acquired for the cases of respiratory diseases: (a) normal (grunt) event, (b) postweaning multisystemic wasting syndrome (PMWS) event, (c) porcine reproductive and respiratory syndrome (PRRS) event, and (d) Mycoplasma hyopneumoniae (MH) event. The horizontal axis shows time, and the vertical axis shows the sound signal in dB.
Figure 10. Texture images of different types of events in the pigsty: (a) normal (grunt) event, (b) PMWS event, (c) PRRS event, and (d) MH event.
Figure 11. Texture images of a normal (grunt) sound event in noisy environments: (a) SNR 18, (b) SNR 0, (c) strong footstep, and (d) door opening.
Figure 12. SSIM comparison before and after applying DNS to porcine sound data under various noise conditions.
Abstract
1. Introduction
2. Classification of Sound Events Using Noise-Robust Systems
2.1. Preprocessing Module
2.2. Texture-Extract Module
2.3. Classification Module
- Convolutional neural network (CNN): CNN is a representative deep-learning model for image classification [19]. It consists of convolution layers, pooling layers, and a fully connected layer [20]. A convolution layer extracts a feature map through a convolution operation on the input image. A pooling layer then applies a subsampling method (max, min, average pooling, etc.) to the extracted features, abstracting the input space so that weak features are suppressed and strong features are retained. The fully connected layer performs the final classification using the features extracted through the alternation of convolution and pooling layers. During training, a back-propagation algorithm propagates error from the last layer back to the first, finding the weights that minimize it; through continuous iterative learning, this gradually sharpens the feature maps and yields high-accuracy models. In this study, the CNN structure was designed as shown in Figure 4, and the same layer structure was used for both data types studied in this work (railway industry and livestock industry).
- Support vector machine (SVM): SVM is widely used for binary classification problems. It classifies by finding an optimal linear decision plane based on the principle of structural risk minimization [21,22]. The decision plane is a weighted combination of training examples, called support vectors, that lie at the boundaries between the classes. For a linearly separable dataset, the goal is to separate the classes with the hyperplane that maximizes the margin to the support vectors; this optimal separating hyperplane, and its support vectors, are obtained by solving a quadratic programming problem. For data that cannot be linearly separated, the input vectors are mapped nonlinearly into a higher-dimensional feature space in which a linear hyperplane is found. Both the objective function and the decision function depend on the data only through inner products of the mapped vectors, so the mapping itself never needs to be computed explicitly: a kernel function satisfying Mercer's condition supplies the inner product directly in place of the explicit mapping. In this study, we used the radial basis function (RBF) as the kernel function.
- k-nearest neighbors algorithm (k-NN): k-NN is a representative nonparametric machine-learning algorithm for data classification [23]. As the name implies, k-NN determines the class of a data point by majority vote among its k closest training points, with distance usually measured by the Euclidean metric.
- C4.5: The C4.5 algorithm [24] is a tree-based classification algorithm that improves on the ID3 algorithm. Because ID3 builds a decision tree, analysts can easily understand and explain its results; however, unlike probabilistic classification algorithms, it only assigns class labels and cannot produce probabilistic predictions. To overcome ID3's shortcomings, C4.5 addresses additional concerns such as handling of numerical attributes, exclusion of nonsignificant attributes, control of tree depth, missing-value processing, and cost consideration.
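The convolution and pooling operations described for the CNN can be sketched in plain NumPy. This is a minimal illustration of the two operations only, not the network used in this study; the image, kernel, and sizes are arbitrary toy values:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation) producing a feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keeps the strongest response per window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    return feature_map[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy 6x6 "gray-level image" and a hand-picked vertical-edge kernel
img = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)

fmap = conv2d(img, kernel)               # 4x4 feature map
pooled = max_pool(np.maximum(fmap, 0.0)) # ReLU, then 2x2 max pooling -> 2x2
print(pooled.shape)                      # (2, 2)
```

A real CNN learns the kernel weights via back-propagation and stacks many such convolution/pooling stages before the fully connected classification layer.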
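The kernel trick mentioned for the SVM can be made concrete: an RBF kernel returns the inner product of two implicitly mapped vectors without ever computing the mapping. The support vectors, multipliers, and bias below are hypothetical illustrative values, not quantities from this study:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2): the inner product <phi(x), phi(z)>
    in an implicit high-dimensional feature space, computed without phi."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svm_decision(x, support_vectors, alphas, labels, b, gamma=1.0):
    """Decision function f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b.
    The sign of f(x) gives the predicted class."""
    return sum(a * y * rbf_kernel(sv, x, gamma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

# Hypothetical support vectors, multipliers, and bias (illustration only)
svs = np.array([[0.0, 0.0], [1.0, 1.0]])
alphas = [1.0, 1.0]
labels = [-1, +1]
b = 0.0

print(svm_decision(np.array([0.9, 0.9]), svs, alphas, labels, b) > 0)  # True
```

In practice the alphas and bias come from solving the quadratic program on the training data; here they are fixed by hand purely to show how the kernel replaces the explicit mapping.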
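The distance-and-vote rule of k-NN fits in a few lines of NumPy. The two toy classes below are hypothetical stand-ins for sound-event feature vectors, not data from the experiments:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points,
    using the Euclidean distance as is conventional for k-NN."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 2D feature vectors for two hypothetical sound-event classes
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # class "normal"
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])  # class "fault"
train_y = np.array(["normal"] * 3 + ["fault"] * 3)

print(knn_predict(train_X, train_y, np.array([0.15, 0.15])))  # "normal"
```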
3. Experimental Results
3.1. Experimental Results on Railway-Point-Machine Sound Data
3.1.1. Experimental Data
3.1.2. Extracting Texture Image and Analysis
3.1.3. Classification Results
3.2. Experimental Results on Porcine Respiratory Sound Data
3.2.1. Experimental Data
3.2.2. Extracting Texture Image and Analysis
3.2.3. Classification Results
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Ozer, I.; Ozer, Z.; Findik, O. Noise Robust Sound Event Classification with Convolutional Neural Network. Neurocomputing 2018, 272, 505–512.
2. Sharan, R.V.; Moir, T.J. Robust Acoustic Event Classification Using Deep Neural Networks. Inf. Sci. 2017, 396, 24–32.
3. Adavanne, S.; Pertilä, P.; Virtanen, T. Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017.
4. Salamon, J.; Bello, J.P. Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification. IEEE Signal Process. Lett. 2017, 24, 279–283.
5. McLoughlin, I.; Zhang, H.; Xie, Z.; Song, Y.; Xiao, W. Robust Sound Event Classification Using Deep Neural Networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2015, 23, 540–552.
6. Zhang, H.; McLoughlin, I.; Song, Y. Robust Sound Event Recognition Using Convolutional Neural Networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, Australia, 19–24 April 2015.
7. Gilchrist, A. Introducing Industry 4.0; Apress: New York, NY, USA, 2016; pp. 195–215.
8. Guarino, M.; Jans, P.; Costa, A.; Aerts, J.M.; Berckmans, D. Field Test of Algorithm for Automatic Cough Detection in Pig Houses. Comput. Electron. Agric. 2008, 62, 22–28.
9. Chung, Y.; Oh, S.; Lee, J.; Park, D.; Chang, H.; Kim, S. Automatic Detection and Recognition of Pig Wasting Diseases Using Sound Data in Audio Surveillance. Sensors 2013, 13, 12929–12942.
10. Lee, J.; Jin, L.; Park, D.; Chung, Y.; Chang, H. Acoustic Features for Pig Wasting Disease Detection. Int. J. Inf. Process. Manag. 2015, 6, 37–46.
11. Asada, T.; Roberts, C.; Koseki, T. An Algorithm for Improved Performance of Railway Condition Monitoring Equipment: Alternating-Current Point Machine Case Study. Transp. Res. C Emerg. Technol. 2013, 30, 81–92.
12. Asada, T.; Roberts, C. Development of an Effective Condition Monitoring System for AC Point Machines. In Proceedings of the 5th IET Conference on Railway Condition Monitoring and Non-Destructive Testing (RCM 2011), Derby, UK, 29–30 November 2011.
13. Kim, H.; Sa, J.; Chung, Y.; Park, D.; Yoon, S. Fault Diagnosis of Railway Point Machines Using Dynamic Time Warping. Electron. Lett. 2016, 52, 818–819.
14. Sa, J.; Choi, Y.; Chung, Y.; Lee, J.; Park, D. Aging Detection of Electrical Point Machines Based on Support Vector Data Description. Symmetry 2017, 9, 290.
15. Lee, J.; Choi, H.; Park, D.; Chung, Y.; Kim, H.Y.; Yoon, S. Fault Detection and Diagnosis of Railway Point Machines by Sound Analysis. Sensors 2016, 16, 549.
16. Sharan, R.V.; Moir, T.J. Noise Robust Audio Surveillance Using Reduced Spectrogram Image Feature and One-Against-All SVM. Neurocomputing 2015, 158, 90–99.
17. Khellah, F. Texture Classification Using Dominant Neighborhood Structure. IEEE Trans. Image Process. 2011, 21, 3270–3279.
18. Khellah, F. Textured Image Denoising Using Dominant Neighborhood Structure. Arab. J. Sci. Eng. 2014, 39, 3759–3770.
19. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
20. Cunningham, R.; Sánchez, M.B.; May, G.; Loram, I. Estimating Full Regional Skeletal Muscle Fibre Orientation from B-Mode Ultrasound Images Using Convolutional, Residual, and Deconvolutional Neural Networks. J. Imaging 2018, 4, 29.
21. Lee, J.; Noh, B.; Jang, S.; Park, D.; Chung, Y.; Chang, H. Stress Detection and Classification of Laying Hens by Sound Analysis. Asian Australas. J. Anim. Sci. 2015, 28, 592–598.
22. Santos, P.; Villa, L.F.; Reñones, A.; Bustillo, A.; Maudes, J. An SVM-Based Solution for Fault Detection in Wind Turbines. Sensors 2015, 15, 5627–5648.
23. Akbulut, Y.; Sengur, A.; Guo, Y.; Smarandache, F. NS-k-NN: Neutrosophic Set-Based k-Nearest Neighbors Classifier. Symmetry 2017, 9, 179.
24. Szarvas, G.; Farkas, R.; Kocsor, A. A Multilingual Named Entity Recognition System Using Boosting and C4.5 Decision Tree Learning Algorithms. In International Conference on Discovery Science; Springer: Berlin/Heidelberg, Germany, 2006.
25. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
26. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques, 3rd ed.; Morgan Kaufmann: San Francisco, CA, USA, 2012.
27. Theodoridis, S.; Koutroumbas, K. Pattern Recognition, 4th ed.; Academic Press: Kidlington, Oxford, UK, 2009.
28. Powers, D.M.W. Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness and Correlation. J. Mach. Learn. Technol. 2011, 2, 2229–3981.
| | Bird Chirping | Helicopter | Wind | Rain |
|---|---|---|---|---|
| SNR (dB) | 38.1146 | 14.5317 | 11.3320 | 8.4212 |
| Mean intensity | −1.5 × 10⁻⁵ | 4.2 × 10⁻⁶ | −1.9 × 10⁻⁵ | −1.3 × 10⁻⁵ |
| Max intensity | 0.0097 | 0.2429 | 0.2849 | 0.2560 |
| Min intensity | −0.0103 | −0.2724 | −0.2559 | −0.2863 |
F1 score of each classifier under each noise condition:

| Noise Conditions | CNN | Support Vector Machine (SVM) | k-Nearest Neighbors (k-NN) | C4.5 |
|---|---|---|---|---|
| SNR 18 | 0.9932 | 0.9861 | 0.9049 | 0.8781 |
| SNR 15 | 0.9932 | 0.9866 | 0.8996 | 0.8666 |
| SNR 12 | 0.9932 | 0.9868 | 0.8971 | 0.8578 |
| SNR 9 | 0.9906 | 0.9851 | 0.8948 | 0.8481 |
| SNR 6 | 0.9906 | 0.9853 | 0.8882 | 0.7993 |
| SNR 3 | 0.9855 | 0.9832 | 0.8821 | 0.7882 |
| SNR 0 | 0.9745 | 0.9732 | 0.8827 | 0.7438 |
| Bird chirping | 0.9915 | 0.9851 | 0.8972 | 0.9617 |
| Helicopter | 0.9898 | 0.9838 | 0.8962 | 0.8521 |
| Wind | 0.9881 | 0.9816 | 0.8867 | 0.8226 |
| Rain | 0.9779 | 0.9731 | 0.8822 | 0.7969 |
| Average | 0.9880 | 0.9827 | 0.8920 | 0.8377 |
| Standard deviation | 0.0063 | 0.0050 | 0.0079 | 0.0576 |
F1 score of the proposed method and baseline features under each noise condition:

| Noise Conditions | Proposed Method | Modulation [16] | Mel-Frequency Cepstral Coefficients (MFCC) [15] | Modulation + MFCC [16] |
|---|---|---|---|---|
| SNR 18 | 0.9932 | 0.5902 | 0.5912 | 0.5953 |
| SNR 15 | 0.9932 | 0.5462 | 0.5465 | 0.5469 |
| SNR 12 | 0.9932 | 0.5206 | 0.5204 | 0.5272 |
| SNR 9 | 0.9906 | 0.2415 | 0.3172 | 0.4366 |
| SNR 6 | 0.9906 | 0.2415 | 0.2415 | 0.2415 |
| SNR 3 | 0.9855 | 0.2415 | 0.2415 | 0.2415 |
| SNR 0 | 0.9745 | 0.2415 | 0.2415 | 0.2415 |
| Bird chirping | 0.9915 | 0.9734 | 0.9949 | 0.9898 |
| Helicopter | 0.9898 | 0.9734 | 0.9727 | 0.9768 |
| Wind | 0.9881 | 0.9624 | 0.9609 | 0.9715 |
| Rain | 0.9779 | 0.3253 | 0.3776 | 0.2415 |
| Average | 0.9880 | 0.5325 | 0.5460 | 0.5464 |
| Standard deviation | 0.0063 | 0.3097 | 0.3029 | 0.3081 |
| | Weak Footsteps | Radio Operation | Strong Footsteps | Door Opening |
|---|---|---|---|---|
| SNR (dB) | 9.1172 | 8.7971 | 7.4681 | 4.6820 |
| Mean intensity | 2.9 × 10⁻⁵ | −9.5 × 10⁻⁶ | −1.1 × 10⁻⁵ | −3.7 × 10⁻⁵ |
| Max intensity | 0.4594 | 0.3682 | 0.9198 | 0.8978 |
| Min intensity | −0.5862 | −0.3615 | −0.9794 | −0.8593 |
F1 score of each classifier under each noise condition:

| Noise Conditions | CNN | SVM | k-NN | C4.5 |
|---|---|---|---|---|
| SNR 18 | 0.9939 | 0.9901 | 0.9919 | 0.9331 |
| SNR 15 | 0.9939 | 0.9896 | 0.9919 | 0.9195 |
| SNR 12 | 0.9939 | 0.9875 | 0.9919 | 0.8891 |
| SNR 9 | 0.9925 | 0.9831 | 0.9897 | 0.8681 |
| SNR 6 | 0.9897 | 0.9548 | 0.9015 | 0.7935 |
| SNR 3 | 0.9709 | 0.8909 | 0.8375 | 0.7856 |
| SNR 0 | 0.8643 | 0.8884 | 0.8271 | 0.7469 |
| Weak footsteps | 0.9877 | 0.9829 | 0.9826 | 0.8834 |
| Radio operation | 0.9410 | 0.9709 | 0.9654 | 0.8564 |
| Strong footsteps | 0.9748 | 0.9554 | 0.9456 | 0.8471 |
| Door opening | 0.9196 | 0.8724 | 0.8859 | 0.8381 |
| Average | 0.9657 | 0.9515 | 0.9374 | 0.8510 |
| Standard deviation | 0.0416 | 0.0453 | 0.0637 | 0.0573 |
F1 score of the proposed method and baseline features under each noise condition:

| Noise Conditions | Proposed Method | Modulation [16] | MFCC [9] | Modulation + MFCC [16] |
|---|---|---|---|---|
| SNR 18 | 0.9939 | 0.8665 | 0.8365 | 0.8993 |
| SNR 15 | 0.9939 | 0.8671 | 0.8161 | 0.8611 |
| SNR 12 | 0.9939 | 0.8343 | 0.7653 | 0.8435 |
| SNR 9 | 0.9925 | 0.8139 | 0.7277 | 0.8089 |
| SNR 6 | 0.9897 | 0.7971 | 0.6752 | 0.7997 |
| SNR 3 | 0.9709 | 0.7377 | 0.6279 | 0.7514 |
| SNR 0 | 0.8643 | 0.7112 | 0.5354 | 0.7191 |
| Weak footsteps | 0.9877 | 0.8833 | 0.7902 | 0.9232 |
| Radio operation | 0.9410 | 0.8051 | 0.7881 | 0.8263 |
| Strong footsteps | 0.9748 | 0.8495 | 0.7638 | 0.8949 |
| Door opening | 0.9196 | 0.7167 | 0.6927 | 0.7258 |
| Average | 0.9657 | 0.8075 | 0.7290 | 0.8230 |
| Standard deviation | 0.0416 | 0.0615 | 0.0899 | 0.0700 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Choi, Y.; Atif, O.; Lee, J.; Park, D.; Chung, Y. Noise-Robust Sound-Event Classification System with Texture Analysis. Symmetry 2018, 10, 402. https://doi.org/10.3390/sym10090402