Abstract
With advances in information and communication technology (ICT) and the popularity of intelligent terminals, online social networks, which offer powerful functions for publishing, disseminating, acquiring, and sharing information, have attracted a huge number of users and become one of the most popular Internet application services. However, their growth has also led to the emergence of cyberbullying. Because information spreads extremely fast through online social networks, the harm caused by cyberbullying grows exponentially over time, so it is critical to detect cyberbullying quickly and efficiently. To address this challenge, we propose an improved TF-IDF based fastText (ITFT) model for effective cyberbullying detection. Specifically, we improve the TF-IDF algorithm by adding a position weight; the keywords extracted by the improved algorithm are then used as input, which filters out noisy data and thereby improves accuracy. We use fastText to construct a binary classifier that categorizes the input data. Extensive experiments demonstrate that the proposed scheme achieves better efficiency and accuracy in cyberbullying detection than the baselines.
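The keyword-extraction step can be illustrated with a minimal sketch. The abstract does not give the position-weight formula, so the linear decay used here, the `boost` parameter, the smoothed IDF, and the function names `position_weight` and `extract_keywords` are all assumptions made for illustration, not the ITFT model's actual definition. The sketch scores each token by TF-IDF scaled by a weight that favors earlier positions and keeps the top-scoring tokens as keywords, which would then serve as input to a classifier such as fastText.

```python
import math
from collections import Counter


def position_weight(index, length, boost=0.5):
    """Hypothetical position weight (an assumption, not the paper's formula):
    decays linearly from 1 + boost for the first token to 1 for the last."""
    if length <= 1:
        return 1.0 + boost
    return 1.0 + boost * (1.0 - index / (length - 1))


def extract_keywords(docs, top_k=3, boost=0.5):
    """Return the top_k tokens of each document ranked by
    position-weighted TF-IDF. Tokenization is plain whitespace splitting."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        scores = Counter()
        for i, tok in enumerate(tokens):
            # Smoothed IDF, so tokens present in every document keep a positive score.
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0
            # Each occurrence contributes 1/len(tokens) to TF, scaled by its position weight.
            scores[tok] += position_weight(i, len(tokens), boost) * idf / len(tokens)
        # Sort by descending score; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    posts = [
        "you are a stupid loser",
        "you are a nice person",
        "have a nice day",
    ]
    for post, keywords in zip(posts, extract_keywords(posts)):
        print(post, "->", keywords)
```

Tokens that occur in every post, such as "a", receive the lowest IDF and tend to be filtered out, which is the noise-reduction effect the abstract describes. The filtered keyword lists could then be written one post per line, with a label prefix, as training data for a supervised fastText classifier.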
This work is supported by the National Natural Science Foundation of China under Grants No. 61872230, No. 61702321, and No. 61572311.
Wu, J., Wen, M., Lu, R. et al. Toward efficient and effective bullying detection in online social network. Peer-to-Peer Netw. Appl. 13, 1567–1576 (2020). https://doi.org/10.1007/s12083-019-00832-1
DOI: https://doi.org/10.1007/s12083-019-00832-1