Arabic Dialect Identification with a Few Labeled Examples Using Generative Adversarial Networks

Mahmoud Yusuf, Marwan Torki, Nagwa El-Makky

Abstract

Given the challenges posed by the variation within Dialectal Arabic (DA), Transformer-based models such as BERT have outperformed other models on the DA identification task. However, fine-tuning these models requires a large corpus, and collecting a large number of high-quality labeled examples for some DA classes is challenging and time-consuming. In this paper, we address the DA identification task. We extend the Transformer-based models ARBERT and MARBERT with unlabeled data in a generative adversarial setting using Semi-Supervised Generative Adversarial Networks (SS-GAN). Our model produced high-quality embeddings for the DA examples and helped the model generalize better on the downstream classification task given few labeled examples. Experimental results showed that our model achieved better performance and faster convergence when only a few labeled examples were available.
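The abstract does not spell out the training objective. As a rough illustration only, the sketch below assumes the standard k+1-class semi-supervised GAN objective: a discriminator on top of the transformer embeddings predicts one of k dialect classes or an extra "fake" class, and a generator is trained with an adversarial term plus feature matching. The function names, the `EPS` constant, and the use of feature matching are assumptions for illustration, not the authors' code.

```python
import math
from typing import List, Sequence

EPS = 1e-8  # assumption: small constant guarding log(0)


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(labeled_logits, labels, real_logits, fake_logits):
    """Hypothetical k+1-way SS-GAN discriminator loss; the last index is the 'fake' class."""
    # Supervised term: cross-entropy on the k dialect classes (labeled batch only).
    sup = -sum(math.log(softmax(z)[y] + EPS)
               for z, y in zip(labeled_logits, labels)) / len(labels)
    # Unsupervised term: real (labeled + unlabeled) examples should not be called 'fake'...
    unsup_real = -sum(math.log(1.0 - softmax(z)[-1] + EPS)
                      for z in real_logits) / len(real_logits)
    # ...while generator outputs should be called 'fake'.
    unsup_fake = -sum(math.log(softmax(z)[-1] + EPS)
                      for z in fake_logits) / len(fake_logits)
    return sup + unsup_real + unsup_fake


def generator_loss(fake_logits, real_features, fake_features):
    """Hypothetical generator loss: adversarial term plus feature matching."""
    # Adversarial term: the generator wants its samples classified as real.
    adv = -sum(math.log(1.0 - softmax(z)[-1] + EPS)
               for z in fake_logits) / len(fake_logits)
    # Feature matching: squared L2 distance between mean discriminator features.
    dim = len(real_features[0])
    mean_real = [sum(f[i] for f in real_features) / len(real_features) for i in range(dim)]
    mean_fake = [sum(f[i] for f in fake_features) / len(fake_features) for i in range(dim)]
    fm = sum((a - b) ** 2 for a, b in zip(mean_real, mean_fake))
    return adv + fm


if __name__ == "__main__":
    # Toy batch with k = 3 dialects, so each logit vector has k + 1 = 4 entries.
    labeled = [[2.0, 0.1, 0.1, -1.0]]
    unlabeled = [[0.5, 0.5, 0.5, -2.0]]
    fake = [[0.0, 0.0, 0.0, 2.0]]
    print(discriminator_loss(labeled, [0], labeled + unlabeled, fake))
    print(generator_loss(fake, [[1.0, 2.0]], [[0.5, 1.5]]))
```

In this formulation the unlabeled dialect examples contribute only through the "real versus fake" term, which is how a setup of this kind can exploit unlabeled data when few labeled examples are available.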

Anthology ID: 2022.aacl-main.16
Volume: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month: November
Year: 2022
Address: Online only
Editors: Yulan He, Heng Ji, Sujian Li, Yang Liu, Chia-Hui Chang
Venues: AACL | IJCNLP
Publisher: Association for Computational Linguistics
Pages: 196–204
URL: https://aclanthology.org/2022.aacl-main.16
Cite (ACL): Mahmoud Yusuf, Marwan Torki, and Nagwa El-Makky. 2022. Arabic Dialect Identification with a Few Labeled Examples Using Generative Adversarial Networks. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 196–204, Online only. Association for Computational Linguistics.
Cite (Informal): Arabic Dialect Identification with a Few Labeled Examples Using Generative Adversarial Networks (Yusuf et al., AACL-IJCNLP 2022)
PDF: https://aclanthology.org/2022.aacl-main.16.pdf
