Automatic classification of security messages based on text categorization

Authors:

Stéphane Ubéda,

Véronique Legrand

NOTERE '08: Proceedings of the 8th international conference on New technologies in distributed systems

Article No.: 9, Pages 1 - 7

https://doi.org/10.1145/1416729.1416741

Published: 23 June 2008

Abstract

The messages generated by security devices are the data needed to detect malicious activity in an information system. The heterogeneity of these devices and the lack of a standard for security messages make automatic processing difficult: the messages are short, use a very wide vocabulary, and come in different formats. In this article we propose applying text categorization techniques to automatically classify messages from security log files into categories defined by an ontology. We develop a module that extracts message attributes in order to reduce the vocabulary size. We then apply two learning algorithms, k-nearest neighbours and naive Bayes, to two corpora of security log messages.
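The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute patterns, the two category names, and the six training messages are invented for illustration, and the paper's ontology and corpora are not reproduced here. The sketch masks variable attributes (IP addresses, user names, numbers) to shrink the vocabulary, then classifies a message with a multinomial naive Bayes model using add-one smoothing and with a k-nearest-neighbour vote over cosine similarity.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical attribute patterns (assumption): variable fields are replaced
# by placeholder tokens so that the vocabulary stays small.
ATTRIBUTE_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\buser\s+\S+", re.IGNORECASE), "user <user>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]


def tokenize(message):
    """Mask variable attributes, lowercase, and split into tokens."""
    for pattern, placeholder in ATTRIBUTE_PATTERNS:
        message = pattern.sub(placeholder, message)
    return re.findall(r"<\w+>|[a-z]+", message.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for message, label in zip(messages, labels):
            self.token_counts[label].update(tokenize(message))
        self.vocabulary = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, message):
        tokens = tokenize(message)
        total = sum(self.class_counts.values())

        def log_posterior(label):
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(self.class_counts[label] / total)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            return score

        return max(self.class_counts, key=log_posterior)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(train, message, k=3):
    """Majority vote among the k training messages most similar to `message`."""
    query = Counter(tokenize(message))
    ranked = sorted(
        train,
        key=lambda pair: cosine(query, Counter(tokenize(pair[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy corpus (assumption): invented messages and category names.
TRAIN = [
    ("Failed password for user alice from 10.0.0.5 port 22", "authentication"),
    ("Accepted password for user bob from 192.168.1.7 port 2201", "authentication"),
    ("Invalid login attempt for user carol from 10.0.0.9", "authentication"),
    ("Connection denied from 172.16.0.3 to port 445 by firewall rule 12", "network"),
    ("Firewall dropped packet from 10.1.1.1 to port 23", "network"),
    ("Connection allowed from 10.2.2.2 to port 80 by firewall rule 3", "network"),
]

model = NaiveBayes().fit([m for m, _ in TRAIN], [c for _, c in TRAIN])
```

With this toy corpus, `model.predict(...)` and `knn_predict(TRAIN, ...)` can be called on unseen messages. Masking attributes first means that two messages differing only in their IP address or user name map to the same token sequence, which is the vocabulary reduction the abstract refers to.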

Published In

NOTERE '08: Proceedings of the 8th international conference on New technologies in distributed systems

June 2008

399 pages

ISBN:9781595939371

DOI:10.1145/1416729

Conference Chairs: Djamal Benslimane (University of Lyon, France), Aris Ouksel (Northwestern University)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

Lyon 1 University
SIGAPP: ACM Special Interest Group on Applied Computing
Mairie de Villeurbanne
Conseil Général du Rhône
INSA Lyon: Institut National des Sciences Appliquées de Lyon
Conseil Régional Rhône-Alpes
Mutuelle d'assurance MAIF
I.U.T.A LYON 1: Institute of Technology Lyon 1
Ministère de l'Enseignement Supérieur et de la Recherche
Lyon 2 University
ISTASE: High-Level Engineering School in Telecommunication
France Telecom
LIRIS: Lyon Research Center for Images and Intelligent Information Systems

Publisher

Association for Computing Machinery

New York, NY, United States
