
Getting Closer to the Essence of Music: The Con Espressione Manifesto

Published: 03 October 2016

Abstract

This text offers a personal and very subjective view on the current situation of Music Information Research (MIR). Motivated by the desire to build systems with a somewhat deeper understanding of music than the ones we currently have, I try to sketch a number of challenges for the next decade of MIR research, grouped around six simple truths about music that are probably generally agreed on but often ignored in everyday research.





    Published In

    ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 2
    Survey Paper, Special Issue: Intelligent Music Systems and Applications and Regular Papers
    March 2017, 407 pages
    ISSN: 2157-6904
    EISSN: 2157-6912
    DOI: 10.1145/3004291
    Editor: Yu Zheng
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 October 2016
    Accepted: 01 February 2016
    Revised: 01 January 2016
    Received: 01 October 2015
    Published in TIST Volume 8, Issue 2


    Author Tags

    1. MIR
    2. music perception
    3. musical expressivity

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Austrian Science Fund FWF (Wittgenstein Award 2009)
    • European Research Council (ERC Advanced)


    Cited By

    • (2024) Exploring Variational Auto-encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI. Machine Intelligence Research 21, 1, 29--45. DOI: 10.1007/s11633-023-1457-1
    • (2023) The Internet of Sounds: Convergent Trends, Insights, and Future Directions. IEEE Internet of Things Journal 10, 13, 11264--11292. DOI: 10.1109/JIOT.2023.3253602
    • (2023) Deep learning’s shallow gains: A comparative evaluation of algorithms for automatic music generation. Machine Learning 112, 5, 1785--1822. DOI: 10.1007/s10994-023-06309-w
    • (2022) COSMOS: Computational Shaping and Modeling of Musical Structures. Frontiers in Psychology 13. DOI: 10.3389/fpsyg.2022.527539
    • (2021) Structure, Abstraction and Reference in Artificial Musical Intelligence. In Handbook of Artificial Intelligence for Music, 409--422. DOI: 10.1007/978-3-030-72116-9_15
    • (2020) Cloud-smart Musical Instrument Interactions. ACM Transactions on Internet of Things 1, 3, 1--29. DOI: 10.1145/3377881
    • (2020) The Impact of the Complexity of Harmony on the Acceptability of Music. ACM Transactions on Applied Perception 17, 1, 1--27. DOI: 10.1145/3375014
    • (2019) PerformanceNet. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI’19), 1174--1181. DOI: 10.1609/aaai.v33i01.33011174
    • (2018) Generating music medleys via playing music puzzle games. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI’18), 2281--2288. DOI: 10.5555/3504035.3504313
    • (2018) Computational Models of Expressive Music Performance: A Comprehensive and Critical Review. Frontiers in Digital Humanities 5. DOI: 10.3389/fdigh.2018.00025
