Abstract
In this paper, we evaluate two popular gated Recurrent Neural Network (RNN) architectures, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), on music classification tasks. We examine their performance on four datasets covering genre, emotion, and dance style recognition. Our key result is a significant improvement in classification accuracy achieved by training the recurrent network on random short subsequences of the vector sequences in the training set. We examine the effect of this training approach on both architectures and discuss the implications for the potential use of RNNs in music information retrieval.
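The idea of training on random short subsequences can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function names (`sample_subsequence`, `make_batch`), the window length of 100 frames, the 13-dimensional feature vectors, and the choice to return short sequences whole are all assumptions made for illustration. Each track is taken to be an array of per-frame feature vectors, and every sampled window inherits the label of its track.

```python
import numpy as np

def sample_subsequence(sequence, length, rng):
    """Return a random contiguous window of `length` frames from `sequence`.

    `sequence` has shape (n_frames, n_features). Sequences with at most
    `length` frames are returned whole (an assumption of this sketch).
    """
    n_frames = sequence.shape[0]
    if n_frames <= length:
        return sequence
    # The upper bound of rng.integers is exclusive, so the last valid
    # start index is n_frames - length.
    start = rng.integers(0, n_frames - length + 1)
    return sequence[start:start + length]

def make_batch(sequences, labels, length, batch_size, rng):
    """Draw random subsequences; each keeps the label of its source track."""
    idx = rng.integers(0, len(sequences), size=batch_size)
    batch = [sample_subsequence(sequences[i], length, rng) for i in idx]
    return batch, [labels[i] for i in idx]

rng = np.random.default_rng(0)
# Toy data (hypothetical): 3 tracks, 13 features per frame, varying duration.
tracks = [rng.normal(size=(n, 13)) for n in (500, 320, 50)]
genres = [0, 1, 2]
batch, batch_labels = make_batch(tracks, genres, length=100,
                                 batch_size=8, rng=rng)
```

Drawing a fresh window every time a track is visited means the network sees a different excerpt of the same recording in each epoch, which is one plausible reading of how such a scheme would be applied during training.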
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Jakubik, J. (2018). Evaluation of Gated Recurrent Neural Networks in Music Classification Tasks. In: Borzemski, L., Świątek, J., Wilimowska, Z. (eds) Information Systems Architecture and Technology: Proceedings of 38th International Conference on Information Systems Architecture and Technology – ISAT 2017. ISAT 2017. Advances in Intelligent Systems and Computing, vol 655. Springer, Cham. https://doi.org/10.1007/978-3-319-67220-5_3
DOI: https://doi.org/10.1007/978-3-319-67220-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67219-9
Online ISBN: 978-3-319-67220-5
eBook Packages: Engineering, Engineering (R0)