Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating

Minghao Zhu, Youzhe Song, Ge Jin, Keyuan Jiang

Abstract

Post-market surveillance, the practice of monitoring the safe use of pharmaceutical drugs, is an important part of pharmacovigilance. Being able to collect personal experiences related to pharmaceutical product use could help us gain insight into how the human body reacts to different medications. Twitter, a popular social media service, is being considered as an important alternative data source for collecting personal experience information about medications. Identifying personal experience tweets is a challenging classification task in natural language processing. In this study, we used three methods based on Facebook’s Robustly Optimized BERT Pretraining Approach (RoBERTa) to predict personal experience tweets related to medication use: the first combines the pre-trained RoBERTa model with a classifier; the second combines the pre-trained RoBERTa model, updated with a corpus of unlabeled tweets, with a classifier; and the third combines a RoBERTa model trained from scratch on our unlabeled tweets, likewise with a classifier. Our results show that all three approaches outperform the previously published methods (word embedding + LSTM) in classification performance (p < 0.05), and that updating the pre-trained language model with medication-related tweets can improve performance further.
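The abstract does not describe how the pre-trained model is "updated" with unlabeled tweets. One common approach, assumed here rather than taken from the paper, is to continue RoBERTa's masked-language-model (MLM) pre-training on the new corpus; the same objective would apply when training a model from scratch. The sketch below shows the standard BERT/RoBERTa token-masking step for one tokenized tweet. The helper `mask_tokens`, the token ids, `mask_id`, and the vocabulary size are all hypothetical illustrations, not code or values from the paper; in practice, Hugging Face Transformers' `DataCollatorForLanguageModeling` performs this step.

```python
import random
from typing import FrozenSet, List, Optional, Tuple

# Label for positions the model is not asked to predict (a common convention).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: FrozenSet[int] = frozenset(),
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: BERT/RoBERTa-style MLM masking of one sequence.

    Each non-special token is selected with probability mask_prob. A selected
    token is replaced by mask_id 80% of the time, by a random vocabulary id 10%
    of the time, and left unchanged 10% of the time. Labels hold the original
    id at selected positions and IGNORE_INDEX everywhere else.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens such as <s> and </s>.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # else: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical token ids for one tweet; 0 and 2 stand in for <s> and </s>.
    tweet_ids = [0, 713, 4023, 1769, 162, 8, 2]
    for epoch in range(2):
        # A fresh call per pass gives a new masking pattern (dynamic masking).
        masked, labels = mask_tokens(
            tweet_ids, mask_id=50264, vocab_size=50265,
            special_ids=frozenset({0, 2}), rng=rng,
        )
        print(epoch, masked, labels)
```

Calling `mask_tokens` anew for each pass over the corpus yields a different masking pattern every time, which corresponds to RoBERTa's dynamic masking, as opposed to the static masking applied once during preprocessing in the original BERT setup.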

Anthology ID: 2020.louhi-1.14
Volume: Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis
Month: November
Year: 2020
Address: Online
Editors: Eben Holderness, Antonio Jimeno Yepes, Alberto Lavelli, Anne-Lyse Minard, James Pustejovsky, Fabio Rinaldi
Venue: Louhi
Publisher: Association for Computational Linguistics
Pages: 127–137
URL: https://aclanthology.org/2020.louhi-1.14
DOI: 10.18653/v1/2020.louhi-1.14
Cite (ACL): Minghao Zhu, Youzhe Song, Ge Jin, and Keyuan Jiang. 2020. Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating. In Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, pages 127–137, Online. Association for Computational Linguistics.
Cite (Informal): Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating (Zhu et al., Louhi 2020)
PDF: https://aclanthology.org/2020.louhi-1.14.pdf
Video: https://slideslive.com/38940051