DOI: 10.1109/ICASSP.2017.7952601 · Research article

Designing efficient architectures for modeling temporal features with convolutional neural networks

Published: 05 March 2017

Abstract

Many researchers use convolutional neural networks with small rectangular filters for music classification from spectrograms. First, we discuss why there is no reason to use this filter setup by default, and second, we point out that more efficient architectures can be implemented if the characteristics of the musical features are considered during the design process. Specifically, we propose a novel design strategy that might promote more expressive and intuitive deep learning architectures by efficiently exploiting the representational capacity of the first layer: using different filter shapes, adapted to fit musical concepts, within the first layer. The proposed architectures are assessed by measuring their accuracy in predicting the classes of the Ballroom dataset. We also make available<sup>1</sup> the code used (together with the audio data) so that this research is fully reproducible.
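The core idea of the abstract, a first convolutional layer containing filters of several different shapes rather than one small rectangular shape, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the filter shapes, random weights, and global max-pooling below are illustrative assumptions, and a real model would learn the weights and use many filters per shape.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive 2D valid cross-correlation of a spectrogram x with one filter w."""
    H, W = x.shape
    h, fw = w.shape
    out = np.empty((H - h + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + fw] * w)
    return out

def first_layer_features(spec, filter_shapes, rng):
    """Apply one randomly initialised filter per shape and max-pool each
    feature map to a single activation, mimicking a multi-shape first layer."""
    feats = []
    for (fh, fw) in filter_shapes:
        w = rng.standard_normal((fh, fw)) * 0.1
        fmap = conv2d_valid(spec, w)
        feats.append(fmap.max())  # global max-pooling over the feature map
    return np.array(feats)

# toy log-mel spectrogram: 40 frequency bins x 200 time frames
rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 200))

# hypothetical shapes: wide 1-by-n filters spanning time (temporal/rhythmic
# patterns) alongside tall m-by-1 filters spanning frequency (timbral patterns)
shapes = [(1, 60), (1, 100), (32, 1), (40, 1)]
print(first_layer_features(spec, shapes, rng).shape)  # → (4,)
```

The point of the sketch is that each filter shape "sees" a different musical dimension: a 1×60 filter can only respond to structure along the time axis, while a 40×1 filter spans the whole frequency axis of a single frame.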


Cited By

  • (2021) “A Survey of Music Visualization Techniques,” ACM Computing Surveys 54(7), pp. 1–29. DOI: 10.1145/3461835. Online publication date: 18 July 2021.

Published In

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, 6527 pages. Publisher: IEEE Press.