DOI: 10.1145/3503161.3548368
SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias

Published: 10 October 2022

Abstract

Real-time music accompaniment generation has a wide range of applications in the music industry, such as music education and live performance. However, automatic real-time accompaniment generation remains understudied and often faces a trade-off between logical latency and exposure bias. In this paper, we propose SongDriver, a real-time music accompaniment generation system with neither logical latency nor exposure bias. Specifically, SongDriver divides the accompaniment generation task into two phases: 1) the arrangement phase, in which a Transformer model arranges chords for the input melody in real time and caches them for the next phase instead of playing them out; and 2) the prediction phase, in which a CRF model generates playable multi-track accompaniment for the upcoming melody based on the previously cached chords. With this two-phase strategy, SongDriver generates the accompaniment for the upcoming melody directly, achieving zero logical latency. Furthermore, when predicting chords for a time step, SongDriver refers to the cached chords from the first phase rather than to its own previous predictions, which avoids the exposure bias problem. Since the input length is often constrained under real-time conditions, another potential problem is the loss of long-term sequential information. To compensate for this, we extract four musical features from the music preceding the current time step as global information. In our experiments, we train SongDriver on several open-source datasets and on an original àiMusic Dataset built from Chinese-style modern pop music sheets. The results show that SongDriver outperforms existing state-of-the-art (SOTA) models on both objective and subjective metrics while significantly reducing physical latency.
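The two-phase strategy described above can be illustrated with a minimal sketch. All names here are illustrative, and the rule-based chord arranger and triad renderer are toy stand-ins for the paper's Transformer and CRF models; the point is only the control flow: each step first plays accompaniment from a previously cached chord, then arranges and caches a chord for the future, so playback never waits on arrangement (zero logical latency) and prediction never consumes the model's own played output (no exposure bias).

```python
from collections import deque

PITCH_NAMES = "C C# D D# E F F# G G# A A# B".split()


class TwoPhaseAccompanist:
    """Toy sketch of a two-phase real-time accompaniment loop."""

    def __init__(self):
        # Chords arranged ahead of playback (phase 1 output, phase 2 input).
        self.chord_cache = deque()

    def arrange_chord(self, melody_frame):
        # Stand-in for the Transformer arranger: derive a chord root from
        # the frame's first pitch (the real model conditions on context).
        return PITCH_NAMES[melody_frame[0] % 12]

    def render_accompaniment(self, chord):
        # Stand-in for the CRF predictor: expand a cached chord into a
        # playable pattern (here, a simple major triad as pitch classes).
        root = PITCH_NAMES.index(chord)
        return [root, (root + 4) % 12, (root + 7) % 12]

    def step(self, melody_frame):
        """One real-time step: play from the cache, then arrange ahead.

        Returns None on the very first step, before any chord is cached.
        """
        out = (self.render_accompaniment(self.chord_cache.popleft())
               if self.chord_cache else None)
        self.chord_cache.append(self.arrange_chord(melody_frame))
        return out
```

For example, feeding a C-rooted frame and then a D-rooted frame: the first call only caches, and the second call plays the C triad arranged one step earlier, illustrating how the cache decouples arrangement from playback.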

Supplementary Material

MP4 File (MM22-mmfp2843.mp4)


Cited By

  • (2024) Drawlody: Sketch-Based Melody Creation With Enhanced Usability and Interpretability. IEEE Transactions on Multimedia, Vol. 26, 7074–7088. DOI: 10.1109/TMM.2024.3360695. Online publication date: 31 Jan 2024.
  • (2024) SDMuse: Stochastic Differential Music Editing and Generation via Hybrid Representation. IEEE Transactions on Multimedia, Vol. 26, 1681–1689. DOI: 10.1109/TMM.2023.3284996. Online publication date: 1 Jan 2024.


    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022 · 7537 pages
    ISBN: 9781450392037 · DOI: 10.1145/3503161

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. automatic improvisation
    2. music accompaniment generation

    Qualifiers

    • Research-article

    Funding Sources

    • Project of Key Laboratory of Intelligent Processing Technology for Digital Music (Zhejiang Conservatory of Music), Ministry of Culture and Tourism
    • the Key R&D Program of Zhejiang Province
    • the Key Project of Natural Science Foundation of Zhejiang Province

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%

