Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation

Fei Yuan, Yinquan Lu, Wenhao Zhu, Lingpeng Kong, Lei Li, Yu Qiao, Jingjing Xu


Abstract
Multilingual neural machine translation (MNMT) aims to build a unified model for many language directions. Existing monolithic models for MNMT encounter two challenges: parameter interference among languages and inefficient inference for large models. In this paper, we revisit the classic multi-way structures and develop a detachable model by assigning each language (or group of languages) to an individual branch that supports plug-and-play training and inference. To address the need to learn representations for all languages in a unified space, we propose a novel, efficient training recipe, upon which we build an effective detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build a translation benchmark covering 433 languages and 1.3B parallel data. Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU. It even outperforms M2M-100 with 12B parameters. The proposed training recipe brings a 28.2× speedup over the conventional multi-way training method. Code and data repo: https://github.com/CONE-MT/Lego-MT.git.
Anthology ID:
2023.findings-acl.731
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11518–11533
URL:
https://aclanthology.org/2023.findings-acl.731
DOI:
10.18653/v1/2023.findings-acl.731
Cite (ACL):
Fei Yuan, Yinquan Lu, Wenhao Zhu, Lingpeng Kong, Lei Li, Yu Qiao, and Jingjing Xu. 2023. Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 11518–11533, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation (Yuan et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.731.pdf
Video:
https://aclanthology.org/2023.findings-acl.731.mp4