research-article

Random Fourier feature based music-speech classification

Published: 01 January 2020

Abstract

This paper proposes a Random Kitchen Sink based approach to music/speech classification. Temporal and spectral features such as spectral centroid, spectral roll-off, spectral flux, Mel-frequency cepstral coefficients, entropy, and zero-crossing rate are extracted from the signals. Experimental evaluations and comparisons are performed to demonstrate the competence of the proposed approach. Although speech and music signals differ in their production mechanisms, they share many characteristics, such as a common frequency spectrum and comparatively non-stationary behavior, which makes classification difficult. The proposed approach explicitly maps the data to a feature space in which it is linearly separable. The evaluation results show that the proposed approach achieves scores competitive with methods in the available literature.
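
The explicit mapping described in the abstract can be illustrated with a minimal sketch of random Fourier features followed by a linear regularized least-squares classifier. This is not the authors' implementation: the two-ring toy data stands in for real audio feature vectors, and the names `rff_map`, `make_rings`, and `fit_ridge` and the values of `gamma`, `D`, and `lam` are assumptions chosen only for illustration.

```python
import numpy as np

def rff_map(X, W, b):
    """Map rows of X to random Fourier features z(x) = sqrt(2/D) * cos(W x + b)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def make_rings(n_per_class, rng):
    """Toy stand-in for audio feature vectors: an inner disc and an outer ring in 2-D."""
    angles = rng.uniform(0.0, 2.0 * np.pi, size=2 * n_per_class)
    radii = np.concatenate([
        rng.uniform(0.0, 1.0, size=n_per_class),   # class -1: inner disc
        rng.uniform(2.0, 3.0, size=n_per_class),   # class +1: outer ring
    ])
    X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y

def fit_ridge(Z, y, lam):
    """Regularized least squares: solve (Z^T Z + lam * I) w = Z^T y."""
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

rng = np.random.default_rng(0)
X, y = make_rings(200, rng)   # classes are not linearly separable in the raw 2-D space

gamma = 0.5   # assumed RBF width: k(x, x') = exp(-gamma * ||x - x'||^2)
D = 300       # assumed number of random features

# Frequencies drawn from the Fourier transform of the RBF kernel, phases uniform.
W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(D, X.shape[1]))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Z = rff_map(X, W, b)              # explicit feature map, shape (400, 300)
w = fit_ridge(Z, y, lam=1e-3)     # linear classifier in the mapped space
accuracy = np.mean(np.sign(Z @ w) == y)
print(f"training accuracy with random Fourier features: {accuracy:.3f}")
```

In this sketch the classifier is linear in `Z`, so any separation it achieves comes from the random cosine mapping rather than from a nonlinear decision rule. The accuracy printed is on the training data of the toy problem and says nothing about the scores reported in the paper.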

Published In

Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, Volume 38, Issue 5
Special section: Intelligent data analysis and applications & smart vehicular technology, communications and applications
2020
1353 pages

Publisher

IOS Press

Netherlands

Author Tags

  1. Music/speech
  2. random kitchen sink
  3. feature vector
  4. GTZAN database
  5. S&S database
  6. spectral features
