Abstract
We describe the use of linguistic linked data to support a cross-lingual transfer framework for sentiment analysis in the pharmaceutical domain. The proposed system dynamically gathers translations from the Linked Open Data (LOD) cloud, particularly from Apertium RDF, to project a deep learning-based sentiment classifier from one language to another. This enables scalability and avoids the need to retrain the model when it is transferred across languages. We describe the whole pipeline traversed by the multilingual data: their conversion into RDF with a new dynamic and flexible transformation framework, their linking and publication as linked data, and finally their exploitation in this particular use case. Based on experiments on projecting a sentiment classifier from English to Spanish, we demonstrate that linked data techniques can enhance the multilingual capabilities of a deep learning-based approach in a dynamic and scalable way, in a real application scenario from the pharmaceutical domain.
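As an illustration only, the following Python sketch shows one simple way a bilingual lexicon could let a classifier trained on English text consume Spanish input: each Spanish token is replaced by an English translation before classification. This is not the method of the paper, which is not specified at this level of detail in the abstract. The lexicon entries, the function names `project_tokens` and `coverage`, and the `<unk>` placeholder are all hypothetical; in practice the entries would be gathered from a resource such as Apertium RDF.

```python
from typing import Dict, List, Sequence

# Hypothetical toy lexicon (Spanish -> English). Real entries would be
# gathered from a bilingual resource such as Apertium RDF.
TOY_LEXICON: Dict[str, List[str]] = {
    "eficaz": ["effective"],
    "dolor": ["pain", "ache"],
    "grave": ["serious", "severe"],
}


def project_tokens(
    tokens: Sequence[str],
    lexicon: Dict[str, List[str]],
    unknown: str = "<unk>",
) -> List[str]:
    """Replace each target-language token by its first listed translation.

    Tokens without a lexicon entry are mapped to the `unknown` placeholder.
    Picking the first translation is a simplifying assumption; it ignores
    word sense and context.
    """
    projected = []
    for token in tokens:
        candidates = lexicon.get(token.lower(), [])
        projected.append(candidates[0] if candidates else unknown)
    return projected


def coverage(tokens: Sequence[str], lexicon: Dict[str, List[str]]) -> float:
    """Fraction of tokens that have at least one translation in the lexicon."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in lexicon)
    return hits / len(tokens)


if __name__ == "__main__":
    sentence = ["Dolor", "grave", "persistente"]
    print(project_tokens(sentence, TOY_LEXICON))
    print(coverage(sentence, TOY_LEXICON))
```

The coverage figure matters because any token missing from the lexicon carries no signal into the source-language classifier, so the size and domain fit of the bilingual resource bound what such a projection can achieve.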
Notes
- 1.
- 2.
- 3.
See https://www.w3.org/2016/05/ontolex/#core for a diagram and complete description of the OntoLex-lemon core module.
- 4.
See the whole diagram of the vartrans module at https://www.w3.org/2016/05/ontolex/#variation-translation-vartrans.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
Real-world evidence (RWE) is evidence for the effectiveness and safety of a drug product, gathered outside the controlled settings of clinical trials, in order to demonstrate the added value of a drug in terms of improvements in quality of life in specific patient populations.
- 13.
- 14.
The mapping is available as CSV and TSV in GitHub and open to comments and modification by the community. See https://github.com/sid-unizar/apertium-lexinfo-mapping.
- 15.
- 16.
A testing SPARQL endpoint and a number of example queries against the Apertium RDF v2.0 dataset are available at https://doi.org/10.6084/m9.figshare.12355358. A stable version of Apertium RDF v2.0 will be uploaded to http://linguistic.linkeddata.es/apertium/, hosted by Universidad Politécnica de Madrid (UPM) as part of the Prêt-à-LLOD project, and documented through https://lod-cloud.net/.
- 17.
Trained on news text, available from https://drive.google.com/open?id=1GpyF2h0j8K5TKT7y7Aj0OyPgpFc8pMNS.
- 18.
Trained on the PubMed Central corpus, available from http://bio.nlplab.org.
- 19.
Trained on Wikipedia text, available from https://drive.google.com/open?id=1GpyF2h0j8K5TKT7y7Aj0OyPgpFc8pMNS.
- 20.
Trained on the concatenation of the Scielo corpus and a medical subset of Wikipedia text, available from https://zenodo.org/record/2542722#.XeUOo5NKjUK.
- 21.
- 22.
The overlap between these resources amounts to 647 processed entries between Apertium and Bing Liu, but only 54 between Apertium and Pharma, and only 12 between Pharma and Bing Liu.
- 23.
Accuracy is defined as the proportion of correct labels in all labels predicted by the model on the test set.
- 24.
For Apertium, Pharma, and Bing Liu, Table 3 displays only the best-performing configurations of monolingual embeddings.
- 25.
Although the figures are not exactly comparable because the evaluation data are not parallel, the classifiers resulting from the Task Extension setting differ by only 4.3 percentage points in accuracy between source and target language (0.816 vs. 0.773, respectively).
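The accuracy measure defined in note 23 and the source/target gap reported in note 25 can be reproduced with a few lines of Python. This is a minimal sketch: the function names `accuracy` and `gap_in_points` and the example labels are hypothetical, and only the two accuracy values 0.816 and 0.773 are taken from the notes.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Proportion of predicted labels that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def gap_in_points(source_acc: float, target_acc: float) -> float:
    """Difference between two accuracies, in percentage points."""
    return round((source_acc - target_acc) * 100, 1)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    gold = ["pos", "neg", "neu", "pos"]
    predicted = ["pos", "neg", "pos", "pos"]
    print(accuracy(gold, predicted))    # 3 of 4 labels are correct

    # Accuracies reported for the Task Extension setting.
    print(gap_in_points(0.816, 0.773))
```

Expressing the gap in percentage points rather than as a relative change keeps it directly comparable with the accuracy values themselves.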
Acknowledgements
This work was funded by the Prêt-à-LLOD project within the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 825182. It is also based upon work from COST Action CA18209 – NexusLinguarum “European network for Web-centred linguistic data science”, supported by COST (European Cooperation in Science and Technology). It has also been partially supported by the Spanish projects TIN2016-78011-C4-3-R (AEI/FEDER, UE) and DGA/FEDER 2014–2020.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Gracia, J. et al. (2020). Leveraging Linguistic Linked Data for Cross-Lingual Model Transfer in the Pharmaceutical Domain. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_31
DOI: https://doi.org/10.1007/978-3-030-62466-8_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62465-1
Online ISBN: 978-3-030-62466-8
eBook Packages: Computer Science (R0)