Abstract
With the availability of paired lyrics-melody datasets and advances in artificial intelligence techniques, research on melody generation conditioned on lyrics has become possible. In this work, we propose a novel architecture for lyrics-conditioned melody generation, the Three Branch Conditional (TBC) LSTM-GAN, which consists of an LSTM-based generator and an LSTM-based discriminator. The generator is composed of three identical and independent branches of lyrics-conditioned LSTM-based sub-networks, each responsible for generating one attribute of a melody. To train the GAN on discrete-valued sequences, we leverage the Gumbel-Softmax technique. Through extensive experiments, we show that the proposed model generates tuneful and plausible melodies from the given lyrics and outperforms current state-of-the-art models both quantitatively and qualitatively.
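The abstract relies on the Gumbel-Softmax (Concrete) relaxation to let a GAN generator emit discrete tokens while remaining trainable. The framework-free sketch below shows only the sampling step: it perturbs unnormalised logits with Gumbel(0, 1) noise, divides by a temperature `tau`, and applies a softmax, optionally returning the one-hot "straight-through" forward value. It is not the authors' implementation. The function name `gumbel_softmax`, its parameters, and the five example "pitch" logits are assumptions for illustration; in a real model this would be written in an autodiff framework so gradients can flow through the soft sample.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, hard=False, rng=None):
    """Draw one relaxed categorical sample from unnormalised logits.

    Hypothetical helper for illustration; tau is the softmax temperature.
    """
    if rng is None:
        rng = random.Random()
    eps = 1e-20
    # Add Gumbel(0, 1) noise g = -log(-log(U)), U ~ Uniform[0, 1), to each logit,
    # then scale by the temperature.
    scores = []
    for logit in logits:
        u = rng.random()
        g = -math.log(-math.log(u + eps) + eps)
        scores.append((logit + g) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    # Forward value of the straight-through variant: one-hot at the argmax.
    # (In an autodiff framework the backward pass would use `soft` instead.)
    k = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(soft))]


# Usage: logits over five hypothetical pitch tokens for one melody step.
logits = [2.0, 0.5, -1.0, 0.0, 1.0]
soft_sample = gumbel_softmax(logits, tau=0.5, rng=random.Random(0))
hard_sample = gumbel_softmax(logits, tau=0.5, hard=True, rng=random.Random(0))
```

Lowering `tau` pushes the soft sample toward a one-hot vector at the cost of higher-variance gradients, which is why temperature schedules are often used in practice; the schedule used in this work is not stated here.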
A. Srivastava was involved in this work during his internship at the National Institute of Informatics, Tokyo, Japan.
The second author contributed equally with the first author to this work.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Srivastava, A. et al. (2022). Melody Generation from Lyrics Using Three Branch Conditional LSTM-GAN. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_45
DOI: https://doi.org/10.1007/978-3-030-98358-1_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98357-4
Online ISBN: 978-3-030-98358-1
eBook Packages: Computer Science, Computer Science (R0)