Abstract
Hypertext and text domains are characterized by tens or even hundreds of thousands of features. This poses a challenge for supervised learning algorithms, which must learn accurate classifiers from a small set of available training examples. In this paper, a fuzzy semi-supervised support vector machines (FSS-SVM) algorithm is proposed to reduce the need for a large labelled training set. To this end, it trains on both labelled and unlabelled data and modulates the influence of the unlabelled data on the learning process. Empirical evaluations on two real-world hypertext datasets showed that, by additionally using unlabelled data, FSS-SVM requires less labelled training data than its supervised counterpart, support vector machines, to reach the same level of classification performance. Moreover, incorporating fuzzy membership values for the unlabelled training patterns into the learning process improved classification performance compared with the crisp variant.
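The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the general idea, built on assumptions that are ours and not necessarily the authors'. Each unlabelled example receives a fuzzy c-means style membership (fuzziness exponent `m`) in each class, measured against the class means of the labelled data. It is then given the label with the highest membership, and that membership, scaled by a hypothetical factor `lam`, weights its hinge loss in a fuzzy-SVM style objective. The SVM is linear and trained by plain subgradient descent rather than a QP solver, and dense Python lists stand in for the sparse term vectors a real hypertext corpus would use. All function and parameter names (`fss_svm`, `train_fuzzy_svm`, `lam`, ...) are hypothetical.

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def class_centres(X, y):
    """Mean vector of the labelled examples of each class (+1 and -1)."""
    centres = {}
    for label in (1, -1):
        members = [x for x, t in zip(X, y) if t == label]
        dim = len(members[0])
        centres[label] = [sum(x[d] for x in members) / len(members) for d in range(dim)]
    return centres

def fuzzy_memberships(x, centres, m=2.0):
    """Fuzzy c-means style membership of x in each class centre; values sum to 1."""
    dists = {c: math.dist(x, v) for c, v in centres.items()}
    for c, d in dists.items():
        if d == 0.0:  # a point lying on a centre belongs fully to that class
            return {k: (1.0 if k == c else 0.0) for k in dists}
    p = 2.0 / (m - 1.0)
    return {c: 1.0 / sum((dists[c] / dists[k]) ** p for k in dists) for c in dists}

def train_fuzzy_svm(X, y, s, C=1.0, lr=0.01, epochs=200):
    """Linear SVM minimising 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i*(w.x_i + b))
    by full-batch subgradient descent; s_i scales each example's hinge loss."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # start from the gradient of the regulariser
        for x, t, si in zip(X, y, s):
            if t * (dot(w, x) + b) < 1.0:  # margin violated: hinge term is active
                for d in range(dim):
                    gw[d] -= C * si * t * x[d]
                gb -= C * si * t
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def fss_svm(X_lab, y_lab, X_unlab, C=1.0, lam=0.5, m=2.0):
    """Hypothetical FSS-SVM-style trainer: labelled examples get weight 1; each
    unlabelled example gets its most likely label and weight lam * membership."""
    centres = class_centres(X_lab, y_lab)
    X, y, s = list(X_lab), list(y_lab), [1.0] * len(X_lab)
    for x in X_unlab:
        u = fuzzy_memberships(x, centres, m)
        label = max(u, key=u.get)
        X.append(x)
        y.append(label)
        s.append(lam * u[label])
    return train_fuzzy_svm(X, y, s, C=C)

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0.0 else -1

# Toy usage: two labelled points and four unlabelled ones.
X_lab = [[2.0, 2.0], [-2.0, -2.0]]
y_lab = [1, -1]
X_unlab = [[1.5, 2.5], [2.5, 1.0], [-1.0, -2.5], [-2.5, -1.5]]
w, b = fss_svm(X_lab, y_lab, X_unlab)
print(predict(w, b, [3.0, 3.0]), predict(w, b, [-3.0, -3.0]))
```

Replacing `lam * u[label]` with a constant `lam` gives a crisp counterpart of the same sketch, which mirrors the fuzzy-versus-crisp comparison reported above; the authors' actual membership and weighting scheme may differ.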
Copyright information
© 2008 International Federation for Information Processing
About this paper
Cite this paper
Benbrahim, H., Bramer, M. (2008). A Fuzzy Semi-Supervised Support Vector Machines Approach to Hypertext Categorization. In: Bramer, M. (ed.) Artificial Intelligence in Theory and Practice II. IFIP AI 2008. IFIP – The International Federation for Information Processing, vol 276. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09695-7_10
DOI: https://doi.org/10.1007/978-0-387-09695-7_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09694-0
Online ISBN: 978-0-387-09695-7
eBook Packages: Computer Science, Computer Science (R0)