Abstract
Wikipedia, an online encyclopedia where anyone can edit and share information, is a prime example of a Web 2.0 application. However, blatantly unproductive edits greatly undermine its quality, and these irresponsible acts force editors to waste time undoing vandalism. To improve information quality on Wikipedia and free maintainers from such repetitive tasks, machine learning methods have been proposed to detect vandalism automatically. Most of this work, however, has focused on mining new features, of which there seems to be an inexhaustible supply. The question of how to make the best use of these features therefore needs to be addressed. In this paper, we apply feature transformation techniques to analyze the features and propose a framework that uses these techniques to enhance detection. Experimental results on the public PAN-WVC-10 dataset show that our method is effective and offers another useful means of detecting vandalism in Wikipedia.
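The abstract does not say which feature transformations the framework uses. As one possible illustration, the sketch below applies principal component analysis (PCA), a common feature transformation, to a small synthetic feature matrix. The data, the function name `pca_transform`, and the choice of three components are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project the rows of X onto the top principal components.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0              # guard against constant features
    Z = (X - mean) / std             # standardize each feature
    # SVD of the standardized data: rows of Vt are the principal directions,
    # returned in order of decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return Z @ components.T, components, explained[:n_components]

# Synthetic stand-in for edit features: three independent signals plus
# three noisy copies of them, so half of the columns are redundant.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.hstack([base, base + 0.05 * rng.normal(size=(200, 3))])

T, components, ratio = pca_transform(X, n_components=3)
print(T.shape)        # transformed feature matrix, one row per edit
print(ratio.sum())    # share of variance kept by the three components
```

Because the transformed columns are uncorrelated, a downstream classifier would see a compact representation of the redundant original features. Whether this helps on real vandalism features is an empirical question that this sketch does not answer.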
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chang, T., Lin, H., Lin, Y. (2012). Feature Transformation Method Enhanced Vandalism Detection in Wikipedia. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_16
DOI: https://doi.org/10.1007/978-3-642-35341-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)