Abstract.
Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification which employs support vector machines (SVMs). Five audio classes are considered in this paper: silence, music, background sound, pure speech, and non- pure speech which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We have evaluated the performance of SVM on different audio type-pairs classification with testing unit of different- length and compared the performance of SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM). We also evaluated the effectiveness of some new proposed features. Experiments on a database composed of about 4- hour audio data show that the proposed classifier is very efficient on audio classification and segmentation. It also shows the accuracy of the SVM-based method is much better than the method based on KNN and GMM.
Similar content being viewed by others
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Lu, L., Zhang, HJ. & Li, S. Content-based audio classification and segmentation by using support vector machines. Multimedia Systems 8, 482–492 (2003). https://doi.org/10.1007/s00530-002-0065-0
Issue Date:
DOI: https://doi.org/10.1007/s00530-002-0065-0