Abstract
Detecting multiple pitches (F0s) in monaural recordings of contrapuntal polyphonic music and segregating the instrument lines into separate tracks is a difficult problem in music signal processing. Applications include audio-to-MIDI conversion, automatic music transcription, and audio enhancement and transformation. Past attempts at separation have been limited to two harmonic signals in a contrapuntal duet (Maher, 1990) or several harmonic signals in a single chord (Virtanen and Klapuri, 2001, 2002). Several researchers have attempted polyphonic pitch detection (Klapuri, 2001; Eggink and Brown, 2004a), predominant melody extraction (Goto, 2001; Marolt, 2004; Eggink and Brown, 2004b), and instrument recognition (Eggink and Brown, 2003). Our solution assumes that each instrument can be represented as a time-varying harmonic series and that errors can be corrected using prior knowledge of instrument spectra. The F0s for each time frame are estimated from the input spectral data using an Expectation-Maximization (EM) based algorithm in which Gaussian distributions represent the harmonic series. Collisions (i.e., overlaps) between instrument harmonics, which occur frequently, are predicted from the estimated F0s. The uncollided harmonics are matched to those in a pre-stored spectrum library so that each F0's harmonic series can be assigned to the appropriate instrument. Corrupted harmonics are restored using data taken from the library. Finally, each voice is additively resynthesized to a separate track. The algorithm is demonstrated on a monaural signal containing three contrapuntal musical instrument voices with distinct timbres.
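The collision-prediction step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the collision criterion, so the fixed frequency tolerance (`tolerance_hz`), the number of harmonics considered (`num_harmonics`), and the function names are all assumptions made here for illustration. Given the F0s estimated for one time frame, the sketch flags every pair of harmonics from different voices that fall within the tolerance of each other, and lists the harmonics of a voice that remain uncollided and would therefore be usable for matching against a spectrum library.

```python
def predict_collisions(f0s, num_harmonics=10, tolerance_hz=20.0):
    """Flag harmonic pairs from different voices that lie close in frequency.

    f0s           -- estimated fundamental frequencies (Hz), one per voice
    num_harmonics -- harmonics considered per voice (assumed value)
    tolerance_hz  -- two harmonics "collide" if closer than this (assumed value)

    Returns a list of tuples (voice_i, harmonic_i, voice_j, harmonic_j),
    with 0-based voice indices and 1-based harmonic numbers.
    """
    collisions = []
    for i in range(len(f0s)):
        for j in range(i + 1, len(f0s)):
            for h in range(1, num_harmonics + 1):
                for k in range(1, num_harmonics + 1):
                    if abs(h * f0s[i] - k * f0s[j]) < tolerance_hz:
                        collisions.append((i, h, j, k))
    return collisions


def uncollided_harmonics(voice, collisions, num_harmonics=10):
    """Return the harmonic numbers of `voice` not involved in any collision."""
    collided = set()
    for vi, hi, vj, hj in collisions:
        if vi == voice:
            collided.add(hi)
        if vj == voice:
            collided.add(hj)
    return [h for h in range(1, num_harmonics + 1) if h not in collided]


if __name__ == "__main__":
    # Two voices a perfect fifth apart (220 Hz and 330 Hz):
    # every third harmonic of the lower voice meets every second of the upper.
    frame_f0s = [220.0, 330.0]
    found = predict_collisions(frame_f0s)
    print(found)
    print(uncollided_harmonics(0, found))
```

A fixed tolerance in Hz is the simplest possible choice; a criterion tied to the analysis window's frequency resolution would be an equally plausible alternative.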
References
Beauchamp, J.: Unix Workstation Software for Analysis, Graphics, Modification, and Synthesis of Musical Sounds. Audio Eng. Soc., 1–17 (1993) Preprint No. 3479
Beauchamp, J.W., Horner, A.: Wavetable Interpolation Synthesis Based on Time-Variant Spectral Analysis of Musical Sounds. Audio Eng. Soc., 1–17 (1995) Preprint No. 3960
Eggink, J., Brown, G.J.: A missing feature approach to instrument identification in polyphonic music. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2003), pp. 553–556 (2003)
Eggink, J., Brown, G.J.: Instrument recognition in accompanied sonatas and concertos. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2004), pp. IV217–IV220 (2004a)
Eggink, J., Brown, G.J.: Extracting melody lines from complex audio. In: Proc. 5th Int. Conf. on Music Information Retrieval (ISMIR 2004), pp. 84–91 (2004b)
Fritts, L.: University of Iowa Musical Instrument Samples (1997), http://theremin.music.uiowa.edu/MIS.html
Goto, M.: A predominant-F0 estimation method for CD recordings: MAP estimation using EM algorithm for adaptive tone models. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2001), pp. 3365–3368 (2001)
Klapuri, A.: Multipitch estimation and sound separation by the spectral smoothness principle. In: Proc. ICASSP 2001, pp. 3381–3384 (2001)
Maher, R.: Evaluation of a method for separating digitized duet signals. J. Audio Eng. Soc. 38(12), 957–979 (1990)
Marolt, M.: Gaussian mixture models for extraction of melodic lines from audio recordings. In: Proc. 5th Int. Conf. on Music Information Retrieval (ISMIR 2004), pp. 80–83 (2004)
McAulay, R.J., Quatieri, T.F.: Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. Acoust. Speech, Signal Processing ASSP-34, 744–754 (1986)
Pepper, A.: The Intimate Art Pepper (music CD), tracks 5 & 7 (1996)
Rabiner, L., Juang, B.-H.: Fundamentals of Speech Recognition, pp. 125–128. Prentice-Hall, Englewood Cliffs (1993)
Smith, J.O., Serra, X.: PARSHL: An analysis/synthesis program for nonharmonic sounds based on a sinusoidal representation. In: Proc. 1987 Int. Computer Music Conf., pp. 290–297 (1987)
Thiede, T., Treurniet, W.C., Bitto, R., Schmidmer, C., Sporer, T., Beerends, J.G., Colomes, C., Keyhl, M., Stoll, G., Brandenburg, K., Feiten, B.: PEAQ-The ITU Standard for Objective Measurement of Perceived Audio Quality. J. Audio Eng. Soc. 48(1/2), 3–29 (2000)
Virtanen, T., Klapuri, A.: Separation of harmonic sounds using multipitch analysis and iterative parameter estimation. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2001), pp. 83–86 (2001)
Virtanen, T., Klapuri, A.: Separation of Harmonic Sounds Using Linear Models for the Overtone Series. In: IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2002) (2002)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bay, M., Beauchamp, J.W. (2006). Harmonic Source Separation Using Prestored Spectra. In: Rosca, J., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds) Independent Component Analysis and Blind Signal Separation. ICA 2006. Lecture Notes in Computer Science, vol 3889. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11679363_70
DOI: https://doi.org/10.1007/11679363_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32630-4
Online ISBN: 978-3-540-32631-1
eBook Packages: Computer Science, Computer Science (R0)