Abstract
Public health surveillance via social media can be a useful tool for identifying and tracking potential cases of a disease. The aim of this research was to design a method for identifying tweets that describe potential Covid-19 cases. The proposed method uses a Wide & Deep (W&D) architecture, which combines two learning branches fed with different features to improve classification effectiveness. The deep branch uses a BERT-type model, while the wide branch uses two different lexical-based features. The method was evaluated on the data from Task 5 of the Social Media Mining for Health (#SMM4H) 2021 competition. Results show that the proposed W&D method performed better than the wide-only and deep-only models, achieving an F1-score of 0.79, which matches the result of the first-place ensemble model.
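The two-branch idea described in the abstract can be illustrated with a minimal sketch of a Wide & Deep forward pass. This is not the authors' implementation: the lexicon, the layer sizes, the random weights, and the function names are all hypothetical, a random vector stands in for the output of a BERT-type model, and a simple binary term indicator stands in for the lexical features. The fusion shown here (summing both branch contributions before a sigmoid) follows the general Wide & Deep formulation and is only assumed to resemble what the paper uses.

```python
import math
import random

def wide_features(text, lexicon):
    """Wide branch: one binary indicator per lexicon term (stand-in for lexical features)."""
    tokens = set(text.lower().split())
    return [1.0 if term in tokens else 0.0 for term in lexicon]

def deep_features(embedding, weights, biases):
    """Deep branch: one ReLU layer over a dense text embedding (stand-in for BERT output)."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, embedding)) + b)
        for row, b in zip(weights, biases)
    ]

def wide_and_deep_score(wide, deep, w_wide, w_deep, bias):
    """Joint prediction: sigmoid over the summed contributions of both branches."""
    logit = bias
    logit += sum(w * x for w, x in zip(w_wide, wide))
    logit += sum(w * x for w, x in zip(w_deep, deep))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical lexicon and randomly initialised parameters, for illustration only.
rng = random.Random(0)
LEXICON = ["fever", "cough", "tested", "positive"]
EMBED_DIM, HIDDEN_DIM = 8, 4
W_HIDDEN = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)] for _ in range(HIDDEN_DIM)]
B_HIDDEN = [0.0] * HIDDEN_DIM
W_WIDE = [rng.uniform(-0.5, 0.5) for _ in LEXICON]
W_DEEP = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM)]

tweet = "I tested positive and have a fever"
embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]  # placeholder for a BERT vector

wide = wide_features(tweet, LEXICON)
deep = deep_features(embedding, W_HIDDEN, B_HIDDEN)
probability = wide_and_deep_score(wide, deep, W_WIDE, W_DEEP, bias=0.0)
print(wide, round(probability, 3))
```

In a trained model the weights of both branches would be learned jointly, so that the sparse lexical signals and the dense contextual representation can compensate for each other's weaknesses; the untrained weights above only demonstrate how the data flows.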
Notes
1. A model based on BERT_Large pre-trained on a large collection of Covid-19 tweets.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Valdés-Chávez, A., López-Santillán, J.R., Gonzalez-Gurrola, L.C., Ramírez-Alonso, G., Montes-y-Gómez, M. (2022). A Wide & Deep Learning Approach for Covid-19 Tweet Classification. In: Vergara-Villegas, O.O., Cruz-Sánchez, V.G., Sossa-Azuela, J.H., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera-López, J.A. (eds) Pattern Recognition. MCPR 2022. Lecture Notes in Computer Science, vol 13264. Springer, Cham. https://doi.org/10.1007/978-3-031-07750-0_21
DOI: https://doi.org/10.1007/978-3-031-07750-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07749-4
Online ISBN: 978-3-031-07750-0
eBook Packages: Computer Science, Computer Science (R0)