DOI: 10.1109/ICASSP.2017.7952601 · Research article

Designing efficient architectures for modeling temporal features with convolutional neural networks

Published: 05 March 2017

Abstract

Many researchers use convolutional neural networks with small rectangular filters for music classification from spectrograms. First, we discuss why there is no reason to use this filter setup by default, and second, we point out that more efficient architectures can be implemented if the characteristics of the musical features are considered during the design process. Specifically, we propose a novel design strategy that might promote more expressive and intuitive deep learning architectures by efficiently exploiting the representational capacity of the first layer: using different filter shapes, adapted to fit musical concepts, within the first layer. The proposed architectures are assessed by measuring their accuracy in predicting the classes of the Ballroom dataset. We also make available<sup>1</sup> the code used (together with the audio data) so that this research is fully reproducible.
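The core idea of the abstract, a first convolutional layer containing filters of several different shapes rather than one small rectangular shape, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the filter shapes, random weights, and global max-pooling below are illustrative assumptions, and a real model would learn the weights and use many filters per shape.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive 2D valid cross-correlation of a spectrogram x with one filter w."""
    H, W = x.shape
    h, fw = w.shape
    out = np.empty((H - h + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + fw] * w)
    return out

def first_layer_features(spec, filter_shapes, rng):
    """Apply one randomly initialised filter per shape and max-pool each
    feature map to a single activation, mimicking a multi-shape first layer."""
    feats = []
    for (fh, fw) in filter_shapes:
        w = rng.standard_normal((fh, fw)) * 0.1
        fmap = conv2d_valid(spec, w)
        feats.append(fmap.max())  # global max-pooling over the feature map
    return np.array(feats)

# toy log-mel spectrogram: 40 frequency bins x 200 time frames
rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 200))

# hypothetical shapes: wide 1-by-n filters spanning time (temporal/rhythmic
# patterns) alongside tall m-by-1 filters spanning frequency (timbral patterns)
shapes = [(1, 60), (1, 100), (32, 1), (40, 1)]
print(first_layer_features(spec, shapes, rng).shape)  # → (4,)
```

The point of the sketch is that each filter shape "sees" a different musical dimension: a 1×60 filter can only respond to structure along the time axis, while a 40×1 filter spans the whole frequency axis of a single frame.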


Cited By

  • (2021) “A Survey of Music Visualization Techniques,” ACM Computing Surveys 54(7), pp. 1–29. DOI: 10.1145/3461835. Online publication date: 18 July 2021.

Published In

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, 6527 pages. Publisher: IEEE Press.