research-article

Bar transformer: a hierarchical model for learning long-term structure and generating impressive pop music

Authors:

Mao YeAuthors Info & Claims

Applied Intelligence, Volume 53, Issue 9

Pages 10130 - 10148

https://doi.org/10.1007/s10489-022-04049-3

Published: 15 August 2022 Publication History

Abstract

Recently many deep learning-based automatic music generation models have been proposed. How to generate long pieces of pop music with distinctive musical characteristics remains a challenging problem, as it relies heavily on musical structures. Some transformer-based models take advantage of self-attention for generating long-sequence music; however, most pay little attention to well-organized musical structures. In this article, we propose a novel note-to-bar hierarchical model named the Bar Transformer to address long-term dependency issues and generate impressive and structurally meaningful music. In particular, we propose a novel note-to-bar approach that pre-processes the notes within each individual bar to provide a strong structural constraint to increase our model’s awareness of the note-to-bar structure in music. The Bar Transformer is constructed using an encoder-decoder framework, including a two-layer encoder and an arrangement decoder. In the two-layer encoder, the bottom is a note-level encoder, which outputs embeddings by learning the relation between notes within an individual bar, and the top is a bar-level encoder, which uses these embeddings to encode each bar from the melody and chord. The decoder is an arrangement decoder used to generalize the interrelationships among the bars and simultaneously generate melodies and chords. The experimental results of the structural analysis and the aural evaluations demonstrate that our approach outperforms the Music Transformer model and other regressive models used for music generation.

References

[1]

Briot JP From artificial neural networks to deep learning for music generation: history, concepts and trends Neural Comput Applic 2021 33 1 39-65

[2]

Briot JP and Pachet F Deep learning for music generation: challenges and directions Neural Comput Applic 2020 32 4 981-993

[3]

Brown T, Mann B, Ryder N et al (2020) Language models are few-shot learners. In: Advances in neural information processing systems (NeurIPS), pp 1877–1901

[4]

Brunner G, Wang Y, Wattenhofer R et al (2017) Jambot: Music theory aware chord based generation of polyphonic music with lstms. In: 2017 IEEE 29th international conference on tools with artificial intelligence (ICTAI), IEEE, pp 519–526.

[5]

Brunner G, Konrad A, Wang Y et al (2018) Midi-vae: Modeling dynamics and instrumentation of music with applications to style transfer. In: Proceedings of the 19th international society for music information retrieval conference(ISMIR), pp 747–754

[6]

Choi K, Hawthorne C, Simon I et al (2020) Encoding musical style with transformer autoencoders. In: International conference on machine learning(ICML), pp 1899–1908

[7]

Chu H, Urtasun R, Fidler S (2017) Song from pi: a musically plausible network for pop music generation. In: 5th International conference on learning representations(ICLR)

[8]

Chuan CH, Herremans D (2018) Modeling temporal tonal relations in polyphonic music through deep networks with a novel image-based representation. In: Proceedings of the AAAI conference on artificial intelligence(AAAI), pp 2159–2166

[9]

Chung J, Ahn S, Bengio Y (2017) Hierarchical multiscale recurrent neural networks. In: 5th International conference on learning representations(ICLR)

[10]

Devlin J, Chang MW, Lee K et al (2019) Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the north american chapter of the association for computational linguistics: human language technologies (NAACL-HLT), pp 4171–4186.

[11]

Dong HW, Yang YH (2018) Convolutional generative adversarial networks with binary neurons for polyphonic music generation. In: Proceedings of the 19th international society for music information retrieval conference (ISMIR), pp 190–196

[12]

Dong HW, Hsiao WY, Yang LC et al (2018) Musegan: multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. In: Proceedings of the AAAI conference on artificial intelligence(AAAI), pp 34–41

[13]

Furner M, Islam MZ, and Li CT Knowledge discovery and visualisation framework using machine learning for music information retrieval from broadcast radio data Expert Syst Appl 2021 182 115,236

[14]

Gao T, Cui Y, Ding F (2021) Seqvae: sequence variational autoencoder with policy gradient. Appl Intell, pp 1–8

[15]

Graves A, Mohamed AR, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE International conference on acoustics, speech and signal processing, IEEE, pp 6645–6649

[16]

Guo Z, Dimos M, Dorien H (2021) Hierarchical recurrent neural networks for conditional melody generation with long-term structure. In: International joint conference on neural networks(IJCNN), pp 1–8.

[17]

Hadjeres G, Pachet F, Nielsen F (2017) Deepbach: a steerable model for bach chorales generation. In: International conference on machine learning(ICML), pp 1362–1371

[18]

Huang CZA, Vaswani A, Uszkoreit J et al (2019) Music transformer: generating music with long-term structure. In: 7th International conference on learning representations(ICLR)

[19]

Huang YS, Yang YH (2020) Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions. In: Proceedings of the 28th ACM international conference on multimedia, pp 1180–1188

[20]

Kingma D, Ba J (2015) Adam: a method for stochastic optimization. In: 3rd International conference on learning representations(ICLR)

[21]

Liang FT, Gotham M, Johnson M et al (2017) Automatic stylistic composition of bach chorales with deep lstm. In: Proceedings of the 18th international society for music information retrieval conference(ISMIR), pp 449–456

[22]

Ockelford A (2017) Repetition in music: Theoretical and metatheoretical perspectives. Routledge

[23]

Oord AVD, Dieleman S, Zen H et al (2016) Wavenet: a generative model for raw audio. In: The 9th ISCA speech synthesis workshop, pp 125

[24]

Pappagari R, Zelasko P, Villalba J et al (2019) Hierarchical transformers for long document classification. In: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), IEEE, pp 838-844.

[25]

Paszke A, Gross S, Massa F et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems(NeurIPS), pp 8024–8035

[26]

Pauwels J, O’Hanlon K, Gómez E et al (2019) 20 years of automatic chord recognition from audio. In: Proceedings of the 20th International society for music information retrieval conference (ISMIR), pp 54–63

[27]

Payne C (2019) Musenet. https://openaicom/blog/musenet

[28]

Roberts A, Engel J, Raffel C et al (2018) A hierarchical latent vector model for learning long-term structure in music. In: International conference on machine learning(ICML), pp 4364–4373

[29]

Roig C, Tardón LJ, Barbancho I, et al. A non-homogeneous beat-based harmony markov model Knowl-Based Syst 2018 142 85-94

[30]

Shaw P, Uszkoreit J, Vaswani A (2018) Self-attention with relative position representations. In: Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies (NAACL-HLT), pp 464–468

[31]

Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. In: Advances in neural information processing systems(NeurIPS), pp 5998–6008

[32]

Villegas R, Yang J, Zou Y et al (2017) Learning to generate long-term future via hierarchical prediction. In: International conference on machine learning(ICML), pp 3560–3569

[33]

Waite E (2016) Project magenta: generating long-term structure in songs and stories. https://magenta.tensorflow.org/2016/07/15/lookback-rnn-attention-rnn/

[34]

Wang Z, Zhang Y, Zhang Y et al (2020) Pianotree vae: Structured representation learning for polyphonic music. In: Proceedings of the 21th international society for music information retrieval conference(ISMIR), pp 368–375

[35]

Wu J, Hu C, Wang Y, et al. A hierarchical recurrent neural network for symbolic melody generation IEEE Trans Cybern 2020 50 6 2749-2757

[36]

Wu J, Liu X, Hu X, et al. Popmnet: Generating structured pop music melodies using neural networks Artif Intell 2020 286 103,303

[37]

Yang LC, Chou SY, Yang YH (2017) Midinet: A convolutional generative adversarial network for symbolic-domain music generation. In: Proceedings of the 18th International society for music information retrieval conference(ISMIR), pp 324–331

[38]

Ycart A and Benetos E Learning and evaluation methodologies for polyphonic music sequence prediction with lstms EEE/ACM Trans Audio, Speech, Language Process 2020 28 1328-1341

[39]

Zhang N (2020) Learning adversarial transformer for symbolic music generation. IEEE Trans Neural Netw Learn Syst, pp 1–10.

[40]

Zhu H, Liu Q, Yuan NJ et al (2018) Xiaoice band: a melody and arrangement generation framework for pop music. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp 2837–2846

[41]

Zhu H, Liu Q, Yuan NJ, et al. Pop music generation: from melody to multi-style arrangement ACM Trans Knowl Discov 2020 14 5 1-31

Cited By

S. BChirchi VKadry SAgoramoorthy MP. GK. ST. A. S(2024)The Road AheadInternational Journal of Intelligent Systems10.1155/2024/40131952024Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1155/2024/4013195

Recommendations

XiaoIce Band: A Melody and Arrangement Generation Framework for Pop Music
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

With the development of knowledge of music composition and the recent increase in demand, an increasing number of companies and research institutes have begun to study the automatic generation of music. However, previous models have limitations when ...
Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
MM '20: Proceedings of the 28th ACM International Conference on Multimedia

A great number of deep learning based models have been recently proposed for automatic music composition. Among these models, the Transformer stands out as a prominent approach for generating expressive classical piano performance with a coherent ...
Pop Music Generation: From Melody to Multi-style Arrangement
Special Issue on KDD 2018, Regular Papers and Survey Paper

Music plays an important role in our daily life. With the development of deep learning and modern generation techniques, researchers have done plenty of works on automatic music generation. However, due to the special requirements of both melody and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Applied Intelligence

Applied Intelligence Volume 53, Issue 9

May 2023

1640 pages

ISSN:0924-669X

Issue’s Table of Contents

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 15 August 2022

Accepted: 27 July 2022

Author Tags

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
National Natural Science Foundation of China
Guangxi Scientific Research Basic Ability Enhancement Program for Young and Middle-aged Teachers
Guangxi Science and Technology Major Project
Guangxi Postdoctoral Special Foundation
Guangxi Key Laboratory of Electrochemical and Magnetochemical Functional Materials

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 16 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

S. BChirchi VKadry SAgoramoorthy MP. GK. ST. A. S(2024)The Road AheadInternational Journal of Intelligent Systems10.1155/2024/40131952024Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1155/2024/4013195

View Options

View options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents