BibTeX
@inproceedings{fusco-antognini-2023-extracting,
title = "Extracting Text Representations for Terms and Phrases in Technical Domains",
author = "Fusco, Francesco and
Antognini, Diego",
editor = "Sitaram, Sunayana and
Beigman Klebanov, Beata and
Williams, Jason D",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-industry.7",
doi = "10.18653/v1/2023.acl-industry.7",
pages = "61--70",
abstract = "Extracting dense representations for terms and phrases is a task of great importance for knowledge discovery platforms targeting highly-technical fields. Dense representations are used as features for downstream components and have multiple applications ranging from ranking results in search to summarization. Common approaches to create dense representations include training domain-specific embeddings with self-supervised setups or using sentence encoder models trained over similarity tasks. In contrast to static embeddings, sentence encoders do not suffer from the out-of-vocabulary (OOV) problem, but impose significant computational costs. In this paper, we propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices. Models trained with this approach can not only match the quality of sentence encoders in technical domains, but are 5 times smaller and up to 10 times faster, even on high-end GPUs.",
}

MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="fusco-antognini-2023-extracting">
    <titleInfo>
      <title>Extracting Text Representations for Terms and Phrases in Technical Domains</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Francesco</namePart>
      <namePart type="family">Fusco</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Diego</namePart>
      <namePart type="family">Antognini</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2023-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Sunayana</namePart>
        <namePart type="family">Sitaram</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Beata</namePart>
        <namePart type="family">Beigman Klebanov</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Jason</namePart>
        <namePart type="given">D</namePart>
        <namePart type="family">Williams</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Toronto, Canada</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Extracting dense representations for terms and phrases is a task of great importance for knowledge discovery platforms targeting highly-technical fields. Dense representations are used as features for downstream components and have multiple applications ranging from ranking results in search to summarization. Common approaches to create dense representations include training domain-specific embeddings with self-supervised setups or using sentence encoder models trained over similarity tasks. In contrast to static embeddings, sentence encoders do not suffer from the out-of-vocabulary (OOV) problem, but impose significant computational costs. In this paper, we propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices. Models trained with this approach can not only match the quality of sentence encoders in technical domains, but are 5 times smaller and up to 10 times faster, even on high-end GPUs.</abstract>
    <identifier type="citekey">fusco-antognini-2023-extracting</identifier>
    <identifier type="doi">10.18653/v1/2023.acl-industry.7</identifier>
    <location>
      <url>https://aclanthology.org/2023.acl-industry.7</url>
    </location>
    <part>
      <date>2023-07</date>
      <extent unit="page">
        <start>61</start>
        <end>70</end>
      </extent>
    </part>
  </mods>
</modsCollection>

Endnote
%0 Conference Proceedings
%T Extracting Text Representations for Terms and Phrases in Technical Domains
%A Fusco, Francesco
%A Antognini, Diego
%Y Sitaram, Sunayana
%Y Beigman Klebanov, Beata
%Y Williams, Jason D.
%S Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F fusco-antognini-2023-extracting
%X Extracting dense representations for terms and phrases is a task of great importance for knowledge discovery platforms targeting highly-technical fields. Dense representations are used as features for downstream components and have multiple applications ranging from ranking results in search to summarization. Common approaches to create dense representations include training domain-specific embeddings with self-supervised setups or using sentence encoder models trained over similarity tasks. In contrast to static embeddings, sentence encoders do not suffer from the out-of-vocabulary (OOV) problem, but impose significant computational costs. In this paper, we propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices. Models trained with this approach can not only match the quality of sentence encoders in technical domains, but are 5 times smaller and up to 10 times faster, even on high-end GPUs.
%R 10.18653/v1/2023.acl-industry.7
%U https://aclanthology.org/2023.acl-industry.7
%U https://doi.org/10.18653/v1/2023.acl-industry.7
%P 61-70

Markdown (Informal)
[Extracting Text Representations for Terms and Phrases in Technical Domains](https://aclanthology.org/2023.acl-industry.7) (Fusco & Antognini, ACL 2023)
ACL
Francesco Fusco and Diego Antognini. 2023. [Extracting Text Representations for Terms and Phrases in Technical Domains](https://aclanthology.org/2023.acl-industry.7). In *Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)*, pages 61–70, Toronto, Canada. Association for Computational Linguistics.