Open access
DOI: 10.1145/3632754.3632761
Expressivity-aware Music Performance Retrieval using Mid-level Perceptual Features and Emotion Word Embeddings

Published: 12 February 2024

Abstract

This paper explores a specific sub-task of cross-modal music retrieval. We consider the delicate task of retrieving, from a set of different performances of the same musical piece, the rendition that matches a description of its style, expressive character, or emotion. We observe that a general-purpose cross-modal system trained to learn a common text-audio embedding space does not yield optimal results for this task. By introducing two changes, one to the text encoder and one to the audio encoder, we demonstrate improved performance on a dataset of piano performances and associated free-text descriptions. On the text side, we use emotion-enriched word embeddings (EWE), and on the audio side, we extract mid-level perceptual features instead of generic audio embeddings. Our results highlight the effectiveness of mid-level perceptual features learnt from music and emotion-enriched word embeddings learnt from emotion-labelled text in capturing musical expression in a cross-modal setting. Additionally, our interpretable mid-level features provide a route for introducing explainability in the retrieval and downstream recommendation processes.
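To make the retrieval setup concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of ranking several renditions of one piece against a free-text expressive description via cosine similarity in a shared embedding space. All names here (embed_text_ewe, embed_audio_midlevel, rank_performances, the projection matrices) are placeholders introduced for illustration: in the paper, the text side would use emotion-enriched word embeddings and the audio side a small set of mid-level perceptual features, whereas this sketch substitutes simple stand-ins so it runs on its own.

import numpy as np

rng = np.random.default_rng(0)

# Mid-level attribute names as commonly used in the literature; treated as an
# assumption here, since the abstract does not enumerate them.
MIDLEVEL_DIMS = [
    "melodiousness", "articulation", "rhythmic_complexity",
    "rhythmic_stability", "dissonance", "tonal_stability", "minorness",
]

def embed_text_ewe(description: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for an EWE-based text encoder."""
    # Hash words into a fixed-size vector so the sketch runs without any
    # pretrained model; a real encoder would aggregate emotion-enriched
    # word embeddings for the description.
    vec = np.zeros(dim)
    for word in description.lower().split():
        vec += np.array([hash((word, i)) % 1000 for i in range(dim)], dtype=float)
    return vec / (np.linalg.norm(vec) + 1e-9)

def embed_audio_midlevel(performance_id: str) -> np.ndarray:
    """Hypothetical stand-in for a mid-level feature predictor (one value per attribute)."""
    return rng.random(len(MIDLEVEL_DIMS))

def project(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """L2-normalised linear projection into the shared retrieval space."""
    z = x @ w
    return z / (np.linalg.norm(z) + 1e-9)

# In a trained system these projections would be learned (e.g. contrastively);
# random matrices are used here purely so the example executes.
W_TEXT = rng.standard_normal((64, 16))
W_AUDIO = rng.standard_normal((len(MIDLEVEL_DIMS), 16))

def rank_performances(query: str, performance_ids: list) -> list:
    """Rank candidate renditions of one piece by similarity to the text query."""
    q = project(embed_text_ewe(query), W_TEXT)
    scored = [(pid, float(q @ project(embed_audio_midlevel(pid), W_AUDIO)))
              for pid in performance_ids]
    return sorted(scored, key=lambda t: t[1], reverse=True)

if __name__ == "__main__":
    print(rank_performances("dreamy, tender and introspective",
                            ["rendition_A", "rendition_B", "rendition_C"]))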


        Information

        Published In

FIRE '23: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2023, 170 pages
This work is licensed under a Creative Commons Attribution 4.0 International License.

        Publisher

Association for Computing Machinery, New York, NY, United States


        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        FIRE 2023

        Acceptance Rates

Overall Acceptance Rate: 19 of 64 submissions, 30%
