Authors:
Carlos-Emiliano González-Gallardo¹; Juan-Manuel Torres-Moreno²; Azucena Montes Rendón³ and Gerardo Sierra⁴
Affiliations:
¹ Université d'Avignon et des Pays de Vaucluse, France;
² École Polytechnique de Montréal and Université d'Avignon et des Pays de Vaucluse, Canada;
³ Centro Nacional de Investigación y Desarrollo Tecnológico, Mexico;
⁴ GIL-Instituto de Ingeniería and Universidad Nacional Autónoma de México, Mexico
Keyword(s):
Text Mining, Machine Learning, Classification, n-grams, POS, Blogs, Tweets, Social Network.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Evolutionary Computing; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Mining Text and Semi-Structured Data; Soft Computing; Symbolic Systems
Abstract:
In this paper we describe a dynamic normalization process applied to multilingual social network documents (Facebook and Twitter) to improve the performance of the author profiling task on short texts. After normalization, character n-grams and POS-tag n-grams are extracted in order to capture the stylistic information encoded in the documents (emoticons, character flooding, capital letters, references to other users, hyperlinks, hashtags, etc.). Experiments with an SVM classifier achieved a performance of up to 90%.
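The pipeline summarized in the abstract (normalize, then extract character and POS n-grams as features) can be illustrated with a minimal sketch. The abstract does not give the actual normalization rules, so everything here is an assumption for illustration: the placeholder tokens `<URL>`, `<USER>` and `<HASHTAG>`, the rule that collapses character flooding to three repetitions, the default n-gram sizes, and the helper names `normalize`, `char_ngrams`, `tag_ngrams` and `featurize` are all hypothetical. POS tags are assumed to come from an external tagger, and the SVM training step is not shown.

```python
import re
from collections import Counter

# Hypothetical normalization rules (not the paper's exact procedure).
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
FLOOD_RE = re.compile(r"(.)\1{2,}")  # any character repeated 3 or more times


def normalize(text: str) -> str:
    """Map social-network artifacts to placeholder tokens so they survive as
    stylistic features instead of producing sparse, one-off n-grams."""
    text = URL_RE.sub("<URL>", text)  # URLs first: they may contain '#' or '@'
    text = USER_RE.sub("<USER>", text)
    text = HASHTAG_RE.sub("<HASHTAG>", text)
    # Collapse flooding ("soooooo" -> "sooo") while keeping it distinct from "so".
    text = FLOOD_RE.sub(r"\1\1\1", text)
    # Case is deliberately preserved: capital letters are a stylistic signal.
    return text


def char_ngrams(text: str, n: int) -> Counter:
    """Count contiguous character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def tag_ngrams(tags: list[str], n: int) -> Counter:
    """Count contiguous n-grams over a sequence of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def featurize(text: str, pos_tags: list[str], n_char: int = 3, n_pos: int = 2) -> Counter:
    """Merge both feature families into one sparse count vector, using
    'c:' and 'p:' prefixes so character and POS features never collide."""
    feats = Counter()
    for gram, count in char_ngrams(normalize(text), n_char).items():
        feats["c:" + gram] += count
    for gram, count in tag_ngrams(pos_tags, n_pos).items():
        feats["p:" + " ".join(gram)] += count
    return feats


if __name__ == "__main__":
    tweet = "Sooooo happy!!! @maria check http://t.co/xyz #AuthorProfiling :)"
    print(normalize(tweet))
    # Sooo happy!!! <USER> check <URL> <HASHTAG> :)
```

Dictionaries of counts like the ones `featurize` returns could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM; that step is outside this sketch.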