Bar transformer: a hierarchical model for learning long-term structure and generating impressive pop music

Published: 15 August 2022

Abstract

Many deep learning-based automatic music generation models have been proposed recently. Generating long pieces of pop music with distinctive musical characteristics remains challenging because it relies heavily on musical structure. Some transformer-based models use self-attention to generate long music sequences; however, most pay little attention to well-organized musical structure. In this article, we propose a novel note-to-bar hierarchical model, the Bar Transformer, to address long-term dependency issues and generate impressive, structurally meaningful music. In particular, our note-to-bar approach pre-processes the notes within each individual bar, which imposes a strong structural constraint and increases the model’s awareness of the note-to-bar structure in music. The Bar Transformer follows an encoder-decoder framework with a two-layer encoder and an arrangement decoder. In the two-layer encoder, the bottom layer is a note-level encoder, which outputs embeddings by learning the relations between notes within an individual bar, and the top layer is a bar-level encoder, which uses these embeddings to encode each bar of the melody and chords. The arrangement decoder generalizes the interrelationships among the bars and generates melodies and chords simultaneously. Structural analysis and aural evaluations show that our approach outperforms the Music Transformer model and other regressive models used for music generation.
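
The abstract does not specify how notes are represented or how they are assigned to bars, so the following is only a minimal sketch of what a note-to-bar pre-processing step could look like, not the authors' implementation. It assumes a fixed 4/4 meter and notes given as (onset in beats, MIDI pitch, duration in beats); the names `Note` and `group_notes_by_bar` are hypothetical and introduced purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note record; the paper's actual representation may differ."""
    onset: float     # start time in beats, counted from the start of the piece
    pitch: int       # MIDI pitch number
    duration: float  # length in beats


def group_notes_by_bar(notes: List[Note], beats_per_bar: int = 4) -> List[List[Note]]:
    """Split a flat note sequence into per-bar lists.

    Assumes a constant meter (beats_per_bar). Onsets in the result are
    relative to the start of their bar. Bars without notes are kept as
    empty lists so that bar positions stay aligned.
    """
    if beats_per_bar <= 0:
        raise ValueError("beats_per_bar must be positive")

    bars: Dict[int, List[Note]] = defaultdict(list)
    for note in sorted(notes, key=lambda n: (n.onset, n.pitch)):
        if note.onset < 0:
            raise ValueError("note onsets must be non-negative")
        bar_index = int(note.onset // beats_per_bar)
        relative_onset = note.onset - bar_index * beats_per_bar
        bars[bar_index].append(Note(relative_onset, note.pitch, note.duration))

    if not bars:
        return []
    return [bars.get(i, []) for i in range(max(bars) + 1)]


# Usage: four notes spread over three bars of 4/4.
melody = [Note(0, 60, 1), Note(2, 64, 2), Note(4, 67, 4), Note(9.5, 72, 0.5)]
for index, bar in enumerate(group_notes_by_bar(melody)):
    print(index, bar)
```

In a hierarchical model of the kind the abstract describes, each per-bar list would presumably be fed to a note-level encoder, and the resulting per-bar embeddings to a bar-level encoder; how the Bar Transformer does this in detail is not stated here.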



Published In

Applied Intelligence, Volume 53, Issue 9
May 2023
1640 pages

Publisher

Kluwer Academic Publishers, United States

Publication History

Published: 15 August 2022
Accepted: 27 July 2022

Author Tags

  1. Music generation
  2. Impressive
  3. Long-term structure
  4. Hierarchical

Qualifiers

  • Research-article
