One-class SVMs for document classification

Authors:

Larry M. Manevitz,

Malik Yousef

The Journal of Machine Learning Research, Volume 2

Pages 139 - 154

Published: 01 March 2002

Abstract

We implemented versions of the SVM appropriate for one-class classification in the context of information retrieval. The experiments were conducted on the standard Reuters data set. For the SVM implementation we used both the version of Schölkopf et al. and a somewhat different version of one-class SVM based on identifying "outlier" data as representative of the second class. We report on experiments with different kernels for both of these implementations and with different representations of the data, including binary vectors, tf-idf representation, and a modification called "Hadamard" representation. We then compared these with one-class versions of the prototype (Rocchio), nearest neighbor, and naive Bayes algorithms, and finally with a natural one-class neural network classification method based on "bottleneck" compression generated filters.

The SVM approach as represented by Schölkopf was superior to all the methods except the neural network one, to which it was essentially comparable, although occasionally worse. However, the SVM methods turned out to be quite sensitive to the choice of representation and kernel in ways that are not well understood; for the time being, this leaves the neural network approach as the most robust.
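As an illustration of two of the document representations named in the abstract, the following is a minimal sketch of binary and tf-idf vectors over a toy corpus. Everything in it is an assumption made for illustration: the whitespace tokenizer, the helper names (`tokenize`, `build_vocabulary`, `binary_vector`, `tfidf_vectors`), the three sample documents, and the particular tf-idf weighting (raw term count times log(N/df)), which is one common variant and not necessarily the one used in the paper. The "Hadamard" representation is not defined in the abstract and is left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of every distinct term seen in the corpus.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def binary_vector(doc, vocab):
    # 1 if the term occurs in the document, 0 otherwise.
    present = set(tokenize(doc))
    return [1 if term in present else 0 for term in vocab]


def tfidf_vectors(docs, vocab):
    # Assumed weighting: raw term count * log(N / document frequency).
    n_docs = len(docs)
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vectors.append(
            [term_freq[t] * math.log(n_docs / doc_freq[t]) for t in vocab]
        )
    return vectors


# Toy corpus, invented for this sketch (not Reuters data).
docs = ["oil prices rise", "oil exports fall", "wheat prices fall"]
vocab = build_vocabulary(docs)
binary = [binary_vector(d, vocab) for d in docs]
tfidf = tfidf_vectors(docs, vocab)
```

Vectors of either kind could then be passed to a one-class learner trained only on documents of the target category; the abstract reports that the SVM results were sensitive to exactly this choice of representation.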

References

[1]

M. Balabanovic and Y. Shoham. Learning information retrieval agents: Experiments with automated web browsing. In Working Notes of AAAI Spring Symposium Series on Information Gathering from Distributed, Heterogeneous Environments. AAAI-Press, 1995.

[2]

P. Datta. Characteristic Concept Representations. PhD thesis, University of California, Irvine, 1997.

[3]

S.T. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representation for text categorization. In Proceedings of the seventh International Conference on Information and Knowledge Management (CIKM'98), pages 148-155, 1998.

[4]

G.W. Cottrell, P. Munro, and D. Zipser. Image compression by back propagation: an example of extensional programming. In N.E. Sharkey, editor, Advances in Cognitive Science, volume 3. Ablex, 1988.

[5]

N. Japkowicz, C. Myers, and M. Gluck. A novelty detection approach to classification. In Proceedings of the Fourteenth International Conference on Artificial Intelligence, pages 518-523. Montreal, Canada, 1995.

[6]

T. Joachims. A probabilistic analysis of the Rocchio algorithm with TF-IDF for text categorization. Technical Report CMU-CS-96-118, School of Computer Science, Carnegie Mellon University, Pittsburgh, 1996.

[7]

T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European Conference on Machine Learning (ECML), pages 137-142. Springer Verlag, 1998. URL http://www-ai.cs.uni-dortmund.de/DOKIMENTE/Joachims_97a.sp.gz.

[8]

K. Lang. NewsWeeder: Learning to filter news. In Twelfth International Conference on Machine Learning, pages 331-339. Lake Tahoe, CA, 1995.

[9]

D. Lewis. Reuters-21578 text categorization test collection. http://www.research.att.com/~lewis, 1997.

[10]

L. Manevitz and M. Yousef. Document classification via neural networks trained exclusively with positive examples. Technical report, Department of Computer Science, University of Haifa, Haifa, 2001.

[11]

M. Pazzani and D. Billsus. Learning and revising user profiles: The identification of interesting web sites. Machine Learning, 27: 313-331, 1997.

[12]

M. Pazzani, J. Muramatsu, and D. Billsus. Syskill & Webert: Identifying interesting web sites. In AAAI Conference 1996, pages 54-61, 1996.

[13]

B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson. Estimating the support of a high-dimensional distribution. Technical report, Microsoft Research, MSR-TR-99-87, 1999.

[14]

C.J. van Rijsbergen. Information Retrieval. Butterworths, London, second edition, 1979.

[15]

Yiming Yang and Xin Liu. A re-examination of text categorization methods. In Marti A. Hearst, Fredric Gey, and Richard Tong, editors, Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval, pages 42-49, Berkeley, US, 1999. ACM Press, New York, US. URL http://www.cs.cmu.edu/yiming/papers.yy/sigir99.ps.

Information

Published In

The Journal of Machine Learning Research, Volume 2

3/1/2002

735 pages

ISSN: 1532-4435

EISSN: 1533-7928

Publisher

JMLR.org
