Abstract
We evaluate convolutive nonnegative matrix factorization (NMF) in the context of automatic music transcription of polyphonic piano recordings and the associated problem of note isolation. Our aim is to find out whether the convolutional kernels faithfully capture the temporal continuity of piano notes, and how performance scales with complexity. Systematic studies of this kind are lacking in the existing literature. We use established measures of accuracy and similarity. NMF dictionaries covering the piano’s pitch range are learned from a given sample bank of isolated notes, and the kernel size (also called the patch size) is varied. Using a measure of performance advantage, we show that the improvements due to convolved bases do not justify the extra computational effort compared with standard NMF. This holds in particular for the more realistic case in which the dictionary does not fully correspond to the mixture signal. Further pertinent conclusions are drawn as well.
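The supervised setting described above (a fixed, pre-learned dictionary, with only the note activations estimated from the mixture) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. It assumes a generalized Kullback-Leibler cost with multiplicative updates in the style of Smaragdis' nonnegative matrix factor deconvolution. It also assumes a Wiener-style soft mask for note isolation, and uses a random dictionary in place of one learned from isolated piano samples. All function names are hypothetical. The kernel (patch) size is the parameter `T`. With `T = 1` the model reduces to standard NMF, and each update performs `T` matrix products, so the per-iteration cost grows roughly linearly with the patch size.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def shift_right(X, t):
    """Delay the columns of X by t frames, zero-filling the first t columns."""
    if t == 0:
        return X.copy()
    Y = np.zeros_like(X)
    Y[:, t:] = X[:, :-t]
    return Y


def shift_left(X, t):
    """Advance the columns of X by t frames, zero-filling the last t columns."""
    if t == 0:
        return X.copy()
    Y = np.zeros_like(X)
    Y[:, :-t] = X[:, t:]
    return Y


def reconstruct(W, H):
    """Convolutive model: Lambda = sum_t W[t] @ shift_right(H, t).

    W has shape (T, F, K): one F x K dictionary slice per kernel (patch) frame.
    H has shape (K, N): note activations over N frames.
    With T == 1 this is the standard NMF product W[0] @ H.
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))


def kl_divergence(V, Lam):
    """Generalized Kullback-Leibler divergence D(V | Lam)."""
    return float(np.sum(V * np.log((V + EPS) / (Lam + EPS)) - V + Lam))


def estimate_activations(V, W, n_iter=200, seed=0):
    """Supervised case (assumed): W stays fixed, only H is updated (KL rule)."""
    T, _, K = W.shape
    N = V.shape[1]
    rng = np.random.default_rng(seed)
    H = rng.random((K, N)) + 0.1  # strictly positive initialization
    ones = np.ones_like(V)
    # The denominator does not depend on H, so it is computed once.
    denom = sum(W[t].T @ shift_left(ones, t) for t in range(T)) + EPS
    for _ in range(n_iter):
        ratio = V / (reconstruct(W, H) + EPS)
        numer = sum(W[t].T @ shift_left(ratio, t) for t in range(T))
        H *= numer / denom
    return H


def isolate_note(V, W, H, k):
    """Soft mask (assumed): scale V by the share of the model due to note k."""
    Hk = np.zeros_like(H)
    Hk[k] = H[k]
    return V * reconstruct(W, Hk) / (reconstruct(W, H) + EPS)
```

Because the model is linear in `H`, the isolated notes returned by `isolate_note` sum back to the mixture spectrogram. This is one simple way to make note isolation consistent with the transcription step.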
S. Gorlow is now with Sony Computer Science Laboratory (CSL) in Paris, France.
This work was funded in part by the Yamaha Corporation.
Notes
- 1.
- 2.
- 3. The results shown are representative of what we observed for different piano recordings.
- 4.
- 5. The number was chosen empirically. Above it, no significant improvement was observed.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gorlow, S., Janer, J. (2015). Evaluation of the Convolutional NMF for Supervised Polyphonic Music Transcription and Note Isolation. In: Vincent, E., Yeredor, A., Koldovský, Z., Tichavský, P. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2015. Lecture Notes in Computer Science, vol 9237. Springer, Cham. https://doi.org/10.1007/978-3-319-22482-4_51
DOI: https://doi.org/10.1007/978-3-319-22482-4_51
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22481-7
Online ISBN: 978-3-319-22482-4
eBook Packages: Computer Science, Computer Science (R0)