DOI: 10.1145/3632754.3632759
Research article · Open access

Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance

Published: 12 February 2024

Abstract

Semantic embeddings play a crucial role in natural-language-based information retrieval. Embedding models represent words and contexts as vectors whose spatial configuration is derived from the distribution of words in large text corpora. While such representations are generally very powerful, they might fail to capture fine-grained domain-specific nuances. In this article, we investigate this potential shortcoming for the domain of characterizations of expressive piano performance. Using a music research dataset of free-text performance characterizations and a follow-up study sorting the annotations into clusters, we derive a ground truth for a domain-specific semantic similarity structure. We test five embedding models and their similarity structure for correspondence with the ground truth. We further assess the effects of contextualizing prompts, hubness reduction, cross-modal similarity, and k-means clustering. The quality of embedding models varies greatly on this task; more general models perform better than domain-adapted ones, and the best model configurations reach human-level agreement.



Published In

FIRE '23: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2023, 170 pages

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Embeddings
  2. Evaluation
  3. Music Performance
  4. Semantic Similarity

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Conference

FIRE 2023

Acceptance Rates

Overall acceptance rate: 19 of 64 submissions (30%)

