
Arabic Dialect Identification and Sentiment Classification using Transformer-based Models

Joseph Attieh, Fadi Hassan


Abstract
In this paper, we present two deep learning approaches based on AraBERT, submitted to the Nuanced Arabic Dialect Identification (NADI) shared task of the Seventh Workshop for Arabic Natural Language Processing (WANLP 2022). NADI consists of two main sub-tasks, namely country-level dialect identification and sentiment identification for dialectal Arabic. We present one system per sub-task. The first system is a multi-task learning model that consists of a shared AraBERT encoder with three task-specific classification layers. This model is trained to jointly learn the country-level dialect of the tweet as well as the region-level and area-level dialects. The second system is a distilled model of an ensemble of models trained using K-fold cross-validation. Each model in the ensemble consists of an AraBERT model and a classifier, fine-tuned on (K-1) folds of the training set. Our team Pythoneers achieved rank 6 on the first test set of the first sub-task, rank 9 on the second test set of the first sub-task, and rank 4 on the test set of the second sub-task.
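The first system (one shared encoder, three task-specific classification layers) can be illustrated with a minimal, dependency-free sketch of the joint objective. This is not the authors' code: the shared AraBERT sentence vector is replaced by a hypothetical fixed feature list, the label inventories (`EG`, `R1`, `A1`, ...) are placeholders rather than the real NADI label sets, and summing the three cross-entropy losses with equal weight is an assumption, since the abstract does not state how the task losses are combined.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical placeholder label sets; the real country/region/area inventories differ.
TASKS: Dict[str, List[str]] = {
    "country": ["EG", "LB", "MA"],
    "region": ["R1", "R2", "R3"],
    "area": ["A1", "A2"],
}

Head = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one head's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One task-specific classification layer: logits = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def multitask_loss(shared_repr: Sequence[float], heads: Dict[str, Head], gold: Dict[str, str]) -> float:
    """Sum of per-task cross-entropy losses, all computed from ONE shared encoder output."""
    total = 0.0
    for task, (weights, bias) in heads.items():
        probs = softmax(linear(shared_repr, weights, bias))
        gold_idx = TASKS[task].index(gold[task])
        total += -math.log(probs[gold_idx])
    return total


# Stand-in for the shared encoder's sentence representation (assumed 2-dimensional here).
shared_repr = [0.5, -1.0]
# Zero-initialised heads, one per task, each sized to its own label set.
heads = {task: ([[0.0] * len(shared_repr) for _ in labels], [0.0] * len(labels))
         for task, labels in TASKS.items()}
loss = multitask_loss(shared_repr, heads, {"country": "EG", "region": "R1", "area": "A1"})
```

With zero-initialised heads every task predicts a uniform distribution, so the joint loss is log 3 + log 3 + log 2. The design point is that gradients from all three heads would flow back into the same encoder, which is what lets the region- and area-level labels act as auxiliary supervision for the country-level task.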
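The second system's pipeline (K models, each fine-tuned on K-1 folds, then distilled into one model) can be sketched in the same dependency-free style. The helper names below are hypothetical, and two details are assumptions not stated in the abstract: the ensemble's teacher distribution is taken as the mean of the members' class probabilities, and the student is trained with plain soft-target cross-entropy (no temperature scaling).

```python
import math
from typing import List, Sequence, Tuple


def kfold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, held_out_idx) for each of k contiguous folds.

    Each ensemble member would be fine-tuned on train_idx, i.e. on K-1 folds.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        held_out = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, held_out))
        start += size
    return splits


def average_probs(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Ensemble teacher distribution: mean of the K members' class probabilities."""
    k = len(member_probs)
    return [sum(p[c] for p in member_probs) / k for c in range(len(member_probs[0]))]


def distillation_loss(student_probs: Sequence[float], teacher_probs: Sequence[float]) -> float:
    """Cross-entropy of the student's prediction against the ensemble's soft targets."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs) if t > 0)


splits = kfold_splits(n=10, k=5)
# Hypothetical sentiment probabilities (e.g. positive/neutral/negative) from two members.
teacher = average_probs([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])
uniform_student = [1 / 3, 1 / 3, 1 / 3]
loss = distillation_loss(uniform_student, teacher)
```

Every example is held out by exactly one fold, so each member sees a different 80% of the data when K = 5. Distilling the averaged predictions into a single student keeps most of the ensemble's smoothing while requiring only one AraBERT forward pass at inference time.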
Anthology ID: 2022.wanlp-1.54
Volume: Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
Venue: WANLP
Publisher: Association for Computational Linguistics
Pages: 485–490
URL: https://aclanthology.org/2022.wanlp-1.54
DOI: 10.18653/v1/2022.wanlp-1.54
Cite (ACL): Joseph Attieh and Fadi Hassan. 2022. Arabic Dialect Identification and Sentiment Classification using Transformer-based Models. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 485–490, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Arabic Dialect Identification and Sentiment Classification using Transformer-based Models (Attieh & Hassan, WANLP 2022)
PDF: https://aclanthology.org/2022.wanlp-1.54.pdf