Abstract
Research on neural style transfer and domain translation has demonstrated the ability of deep learning algorithms to manipulate images according to their artistic style. The idea of image translation has been applied to music-style transfer and to timbre transfer between musical instrument recordings; however, the results have not been ideal. The task of instrument timbre transfer generally depends on extracting a separable, manipulable representation of instrument timbre. Because the distinction between a musical note and its timbre is often unclear, samples generated by current timbre-transfer models usually contain irrelevant waveforms. Here, we propose a timbre-transfer method for musical instrument sounds that converts one instrument's sound into another's while preserving note information (duration, pitch, rhythm, etc.). A multichannel attention-guided mechanism enables timbre transfer between spectrograms and guides the generator to capture the most distinguishable components (the harmonic components) during translation. The proposed model also uses a Markov discriminator to optimize the generator, enabling it to accurately learn a spectrogram's higher-order features. Experimental results demonstrate that the proposed instrument timbre-transfer model effectively captures the harmonic components of the target domain and produces explicit high-frequency details.
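To make the two mechanisms named in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: an attention-guided blending head that mixes candidate spectrograms with learned masks (so unattended note content can pass through from the input), and a Markov (patch-based) discriminator that scores local spectrogram patches. The class names, layer sizes, and mask count are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGuidedBlend(nn.Module):
    """Blend n candidate spectrograms with n+1 attention masks.

    The extra (background) mask keeps the input spectrogram, so note
    information (pitch, duration, rhythm) can be preserved while attended
    regions receive the target timbre. Hypothetical sketch, not the paper's code.
    """
    def __init__(self, feat_channels: int, n_masks: int = 4):
        super().__init__()
        self.n_masks = n_masks
        # Content branch: one single-channel candidate spectrogram per foreground mask.
        self.content = nn.Conv2d(feat_channels, n_masks, kernel_size=7, padding=3)
        # Attention branch: n_masks foreground masks plus 1 background mask.
        self.attention = nn.Conv2d(feat_channels, n_masks + 1, kernel_size=1)

    def forward(self, feats: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        masks = F.softmax(self.attention(feats), dim=1)      # (B, n+1, H, W)
        contents = torch.tanh(self.content(feats))            # (B, n,   H, W)
        fg = (masks[:, :self.n_masks] * contents).sum(1, keepdim=True)
        bg = masks[:, self.n_masks:] * x                       # pass input through
        return fg + bg


class PatchDiscriminator(nn.Module):
    """Markov (PatchGAN-style) discriminator: classifies overlapping patches
    of a spectrogram as real/fake rather than producing one global score."""
    def __init__(self, in_channels: int = 1, base: int = 64):
        super().__init__()
        layers, ch = [], in_channels
        for out in (base, base * 2, base * 4):
            layers += [nn.Conv2d(ch, out, kernel_size=4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out
        layers += [nn.Conv2d(ch, 1, kernel_size=4, stride=1, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        return self.net(spec)  # (B, 1, h, w) map of per-patch real/fake logits
```

Because each output logit of `PatchDiscriminator` depends only on a local receptive field, the adversarial loss penalizes local texture statistics of the spectrogram, which is one common way to encourage sharper high-frequency (harmonic) detail in the generator's output.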