Abstract
In this paper we describe an unsupervised, deterministic algorithm for segmenting DJ-mixed Electronic Dance Music (EDM) streams (for example, podcasts, radio shows and live events) into their constituent tracks. We aim to place boundaries as close as possible to those a human domain expert would mark. Because the goal of DJ mixing is to make track boundaries effectively imperceptible to human listeners, the problem is difficult.
We use Dynamic Programming (DP) to optimally segment a cost matrix derived from a similarity matrix. The similarity matrix holds the cosine similarities between the vectors of a time series of kernel-transformed, Fourier-based features designed specifically for EDM streams. The formulation combines long-term self-similarity, treated as a first-class concept, with DP, and the method is qualitatively assessed on a large corpus of long streams hand-labelled by a domain expert.
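The abstract does not specify the feature kernel or exactly how the cost matrix is derived from the similarity matrix, so the following is only a minimal sketch of the general pipeline under stated assumptions. The per-frame feature vectors (for example, Fourier magnitudes) are taken as given and the kernel transform is omitted; the cost of a candidate segment is assumed to be its summed cosine dissimilarity divided by its length; and the number of tracks is assumed to be known in advance. The function names `cosine_similarity_matrix` and `segment_stream` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return S with S[i][j] = cosine of the angle between frames i and j."""
    # Zero vectors get norm 1.0 so they have similarity 0 with everything.
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    n = len(features)
    return [
        [
            sum(a * b for a, b in zip(features[i], features[j])) / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def segment_stream(features: Sequence[Sequence[float]], num_tracks: int) -> List[int]:
    """Split frames into num_tracks contiguous segments; return each segment's start frame.

    Hypothetical cost (an assumption, not the paper's definition): the cost of
    frames [i, j) is the sum of 1 - S[p][q] over the segment's diagonal block
    of S, divided by the segment length.
    """
    s = cosine_similarity_matrix(features)
    n = len(s)
    if not 1 <= num_tracks <= n:
        raise ValueError("num_tracks must be between 1 and the number of frames")

    # 2-D prefix sums of the dissimilarity matrix D = 1 - S, so that any
    # square diagonal block of D can be summed in O(1).
    p = [[0.0] * (n + 1) for _ in range(n + 1)]
    for a in range(n):
        for b in range(n):
            p[a + 1][b + 1] = (1.0 - s[a][b]) + p[a][b + 1] + p[a + 1][b] - p[a][b]

    def cost(i: int, j: int) -> float:
        block = p[j][j] - p[i][j] - p[j][i] + p[i][i]
        return block / (j - i)

    # best[k][j]: minimal total cost of splitting frames [0, j) into k segments.
    best = [[math.inf] * (n + 1) for _ in range(num_tracks + 1)]
    back = [[0] * (n + 1) for _ in range(num_tracks + 1)]
    best[0][0] = 0.0
    for k in range(1, num_tracks + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk the back-pointers from the end of the stream to recover segment starts.
    starts = []
    j = n
    for k in range(num_tracks, 0, -1):
        j = back[k][j]
        starts.append(j)
    return starts[::-1]


# Usage: three synthetic "tracks" with orthogonal feature vectors.
frames = [[1.0, 0.0, 0.0]] * 4 + [[0.0, 1.0, 0.0]] * 3 + [[0.0, 0.0, 1.0]] * 5
print(segment_stream(frames, 3))  # [0, 4, 7]
```

For unit-normalised features this particular cost equals the scatter of a segment's frames about their mean, because the squared distance between two unit vectors is 2 minus twice their cosine. Under this assumption the DP returns the contiguous segmentation with the smallest total within-track scatter, in O(K n²) time for K tracks and n frames once the prefix sums are built.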
Copyright information
© 2013 IFIP International Federation for Information Processing
About this paper
Cite this paper
Scarfe, T., Koolen, W.M., Kalnishkan, Y. (2013). A Long-Range Self-similarity Approach to Segmenting DJ Mixed Music Streams. In: Papadopoulos, H., Andreou, A.S., Iliadis, L., Maglogiannis, I. (eds) Artificial Intelligence Applications and Innovations. AIAI 2013. IFIP Advances in Information and Communication Technology, vol 412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41142-7_24
DOI: https://doi.org/10.1007/978-3-642-41142-7_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41141-0
Online ISBN: 978-3-642-41142-7
eBook Packages: Computer Science, Computer Science (R0)