Evaluation of the Convolutional NMF for Supervised Polyphonic Music Transcription and Note Isolation

Stanislaw Gorlow¹⁷ &
Jordi Janer¹⁷

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9237)

Included in the following conference series:

International Conference on Latent Variable Analysis and Signal Separation

Abstract

We evaluate the convolutive nonnegative matrix factorization (NMF) in the context of automatic music transcription of polyphonic piano recordings and the associated problem of note isolation. Our aim is to find out whether the temporal continuity of piano notes is faithfully captured by the convolutional kernels and how performance scales with complexity. Systematic studies of this kind are lacking in the existing literature. We use established measures of accuracy and similarity. NMF dictionaries covering the piano’s pitch range are learned from a given sample bank of isolated notes. The kernel size, also called the patch size, is varied. Using a measure of performance advantage, we show that the improvements due to convolved bases do not justify the extra computational effort compared to the standard NMF. This holds in particular for the more realistic case in which the dictionary does not fully correspond to the mixture signal. Further pertinent conclusions are drawn as well.
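
For orientation, the convolutive model referred to in the abstract can be sketched as follows. This is a minimal illustration of the generic convolutive NMF approximation, in which a spectrogram is modeled as a sum over kernel slices W[t] multiplied by the activations H shifted t frames to the right. It is not the authors' implementation. The function names (`shift_right`, `matmul`, `conv_nmf_reconstruct`) and the list-of-lists data layout are assumptions made for this example only.

```python
def shift_right(H, t):
    """Shift the columns of H by t frames to the right, zero-filling on the left."""
    n_cols = len(H[0])
    return [[0.0] * min(t, n_cols) + row[: max(n_cols - t, 0)] for row in H]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def conv_nmf_reconstruct(W, H):
    """Convolutive NMF approximation (illustrative sketch).

    W: list of T kernel slices, each an F x K matrix (T is the patch size).
    H: K x N activation matrix.
    Returns the F x N matrix sum_t W[t] @ shift_right(H, t).
    """
    n_freq, n_frames = len(W[0]), len(H[0])
    V = [[0.0] * n_frames for _ in range(n_freq)]
    for t, W_t in enumerate(W):
        part = matmul(W_t, shift_right(H, t))
        for f in range(n_freq):
            for n in range(n_frames):
                V[f][n] += part[f][n]
    return V


# Toy example: one note template spanning two frames (T = 2), two frequency bins.
W = [[[1.0], [0.0]],   # slice for frame offset 0
     [[0.0], [2.0]]]   # slice for frame offset 1
H = [[1.0, 0.0, 3.0]]  # the note is activated at frames 0 and 2
V_hat = conv_nmf_reconstruct(W, H)
```

With a single slice (T = 1) the sum collapses to the product of W[0] and H, that is, the standard NMF against which the convolutive variant is compared. Each additional slice adds one more matrix product per reconstruction, which is where the extra computational effort comes from.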

S. Gorlow is now with Sony Computer Science Laboratory (CSL) in Paris, France.

This work was funded in part by the Yamaha Corporation.

Notes

1. https://staff.aist.go.jp/m.goto/RWC-MDB/
2. http://www.mpi-inf.mpg.de/resources/SMD/SMD_MIDI-Audio-Piano-Music.html
3. The results shown are representative of what we experienced for different piano recordings.
4. https://code.google.com/p/nmflib/
5. The number was chosen empirically. Above it, no significant improvement was observed.

References

Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)
Smaragdis, P., Brown, J.C.: Non-negative matrix factorization for polyphonic music transcription. In: Proceedings of the WASPAA 2003, pp. 177–180, October 2003
Abdallah, S.A., Plumbley, M.D.: Polyphonic music transcription by non-negative sparse coding of power spectra. In: Proceedings of the ISMIR 2004, pp. 318–325, October 2004
Smaragdis, P.: Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs. In: Puntonet, C.G., Prieto, A.G. (eds.) ICA 2004. LNCS, vol. 3195, pp. 494–499. Springer, Heidelberg (2004)
Smaragdis, P.: Convolutive speech bases and their application to supervised speech separation. IEEE Trans. Audio, Speech, Lang. Process. 15(1), 1–12 (2007)
Huber, R., Kollmeier, B.: PEMO-Q – a new method for objective audio quality assessment using a model of auditory perception. IEEE Trans. Audio, Speech, Lang. Process. 14(6), 1902–1911 (2006)

Author information

Authors and Affiliations

Music Technology Group, Universitat Pompeu Fabra, Roc Boronat 138, 08018, Barcelona, Spain
Stanislaw Gorlow & Jordi Janer

Corresponding author

Correspondence to Stanislaw Gorlow.

Editor information

Editors and Affiliations

Inria, Villers-les-Nancy, France
Emmanuel Vincent
Tel Aviv University, Tel-Aviv, Israel
Arie Yeredor
Technical University of Liberec, Liberec, Czech Republic
Zbyněk Koldovský
The Czech Academy of Sciences, Prague, Czech Republic
Petr Tichavský

About this paper

Cite this paper

Gorlow, S., Janer, J. (2015). Evaluation of the Convolutional NMF for Supervised Polyphonic Music Transcription and Note Isolation. In: Vincent, E., Yeredor, A., Koldovský, Z., Tichavský, P. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2015. Lecture Notes in Computer Science, vol 9237. Springer, Cham. https://doi.org/10.1007/978-3-319-22482-4_51

DOI: https://doi.org/10.1007/978-3-319-22482-4_51
Published: 15 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22481-7
Online ISBN: 978-3-319-22482-4
eBook Packages: Computer Science, Computer Science (R0)
