Authors:
Mahdi Washha 1; Aziz Qaroush 2; Manel Mezghani 1 and Florence Sèdes 1
Affiliations:
1 University of Toulouse, France; 2 Birzeit University, Palestinian Territory, Occupied
Keyword(s):
Twitter, Social Networks, Spam.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Enterprise Information Systems; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Society, e-Business and e-Government; Software Agents and Internet Computing; Symbolic Systems; User Profiling and Recommender Systems; Web 2.0 and Social Networking Controls; Web Information Systems and Technologies
Abstract:
The popularity of social networks depends mainly on the integrity and quality of user-generated content and on the protection of users' privacy. In particular, Twitter data (e.g. tweets) are valuable for a wide range of applications, such as search engines and recommendation systems, in which working with high-quality information is a prerequisite. However, the presence of ill-intentioned users on Twitter makes it challenging to maintain an acceptable level of data quality. Spammers are a concrete example of such users: they have misused all the services provided by Twitter to post spam content, which leads to serious problems such as polluted search results. In response, various detection methods have been designed that inspect individual tweets or accounts for the existence of spam. For large collections of Twitter users, applying these conventional methods is time-consuming, requiring months to filter out the spam accounts in such collections. Moreover, the Twitter community cannot apply them, whether randomly or sequentially, to every registered user because of the dynamic nature of the Twitter network. These limitations raise the need to make the detection process faster and more systematic. Complementary to the conventional detection methods, our proposal takes the collective perspective of users (or accounts) to provide searchable information for retrieving accounts that have a high potential of being spam. We design an unsupervised, automatic method to predict spammy naming patterns, used as searchable information, that are employed in naming spam accounts. Our experimental evaluation demonstrates the efficiency of predicting spammy naming patterns to retrieve spam accounts in terms of precision, recall, and normalized discounted cumulative gain at different ranks.
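The three ranked-retrieval measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the binary relevance encoding (1 for a retrieved account judged spam, 0 otherwise), and the choice to compute the ideal ranking from the retrieved list alone are all assumptions made for illustration.

```python
import math

# Hypothetical helpers, not taken from the paper.
# `relevance` is a ranked list of binary labels: 1 = spam account, 0 = not spam.

def precision_at_k(relevance, k):
    """Fraction of the top-k retrieved accounts that are spam."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k

def recall_at_k(relevance, k, total_relevant):
    """Fraction of all known spam accounts found in the top-k results."""
    if total_relevant <= 0:
        return 0.0
    return sum(relevance[:k]) / total_relevant

def dcg_at_k(relevance, k):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance[:k]))

def ndcg_at_k(relevance, k):
    """DCG normalized by the DCG of the best reordering of the same list
    (assumption: the ideal ranking is built from the retrieved list only)."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal else 0.0

ranked = [1, 0, 1, 1, 0]  # made-up judgments for five retrieved accounts
p3 = precision_at_k(ranked, 3)
r3 = recall_at_k(ranked, 3, total_relevant=4)
n3 = ndcg_at_k(ranked, 3)
```

Evaluating these functions at several values of `k` gives the "at different ranks" view mentioned in the abstract; the judgments in `ranked` are invented purely to show the calling convention.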