Leveraging Linguistic Linked Data for Cross-Lingual Model Transfer in the Pharmaceutical Domain

Jorge Gracia¹⁶,
Christian Fäth¹⁷,
Matthias Hartung¹⁸,
Max Ionov¹⁷,
Julia Bosque-Gil¹⁶,
Susana Veríssimo¹⁸,
Christian Chiarcos¹⁷ &
…
Matthias Orlikowski¹⁸

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12507)

Included in the following conference series:

International Semantic Web Conference


Abstract

We describe the use of linguistic linked data to support a cross-lingual transfer framework for sentiment analysis in the pharmaceutical domain. The proposed system dynamically gathers translations from the Linked Open Data (LOD) cloud, particularly from Apertium RDF, in order to project a deep learning-based sentiment classifier from one language to another, thus enabling scalability and avoiding the need for model retraining when the classifier is transferred across languages. We describe the whole pipeline traversed by the multilingual data: their conversion into RDF by means of a new dynamic and flexible transformation framework, their linking and publication as linked data, and finally their exploitation in this particular use case. Based on experiments on projecting a sentiment classifier from English to Spanish, we demonstrate how linked data techniques can enhance the multilingual capabilities of a deep learning-based approach in a dynamic and scalable way, in a real application scenario from the pharmaceutical domain.
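As an illustration of the translation-gathering step, the following is a minimal sketch of how a SPARQL query for the translations of a single word might be assembled in Python. It assumes that translations are modelled with the OntoLex-lemon core and vartrans vocabularies (lexical entries with canonical forms, senses connected through vartrans:Translation resources); the exact graph shape of Apertium RDF may differ, and the function name build_translation_query is hypothetical, not part of the described system.

```python
# Namespaces of the W3C OntoLex-lemon core and vartrans modules.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
VARTRANS = "http://www.w3.org/ns/lemon/vartrans#"


def build_translation_query(word: str, source_lang: str = "en",
                            target_lang: str = "es") -> str:
    """Return a SPARQL query string asking for translations of `word`.

    Hypothetical helper: the triple patterns below are an assumption
    about how the dataset links entries, senses and translations.
    """
    for tag in (source_lang, target_lang):
        # Accept only simple language tags such as "en" or "pt-BR".
        if not tag.replace("-", "").isalnum():
            raise ValueError(f"invalid language tag: {tag!r}")
    # Escape backslashes and double quotes for use in a SPARQL literal.
    literal = word.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX ontolex: <{ONTOLEX}>
PREFIX vartrans: <{VARTRANS}>

SELECT DISTINCT ?targetForm WHERE {{
  ?sourceEntry ontolex:canonicalForm/ontolex:writtenRep "{literal}"@{source_lang} ;
               ontolex:sense ?sourceSense .
  ?translation a vartrans:Translation ;
               vartrans:source ?sourceSense ;
               vartrans:target ?targetSense .
  ?targetSense ontolex:isSenseOf ?targetEntry .
  ?targetEntry ontolex:canonicalForm/ontolex:writtenRep ?targetForm .
  FILTER (lang(?targetForm) = "{target_lang}")
}}"""


if __name__ == "__main__":
    print(build_translation_query("effective"))
```

The resulting string could be sent to any SPARQL endpoint that serves the data. The sketch only builds the query text and leaves the choice of endpoint and client library open.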


Notes

1. http://linguistic-lod.org/llod-cloud.
2. https://www.w3.org/2016/05/ontolex/.
3. See https://www.w3.org/2016/05/ontolex/#core for a diagram and complete description of the OntoLex-lemon core module.
4. See the whole diagram of the vartrans module at https://www.w3.org/2016/05/ontolex/#variation-translation-vartrans.
5. https://www.apertium.org/.
6. http://wiki.apertium.org/wiki/Main_Page.
7. http://www.meta-net.eu/projects/METANET4U/.
8. http://linguistic.linkeddata.es/resource/id/apertium.
9. https://www.lexinfo.net/ontology/2.0/lexinfo.
10. https://www.pret-a-llod.eu/.
11. https://www.semalytix.com.
12. RWE is evidence for the effectiveness and safety of a drug product, gathered outside of the controlled settings of clinical trials, in order to demonstrate added value of a drug in terms of improvements in quality of life in specific patient populations.
13. https://www.w3.org/2015/09/bpmlod-reports/bilingual-dictionaries/.
14. The mapping is available as CSV and TSV in GitHub and open to comments and modification by the community. See https://github.com/sid-unizar/apertium-lexinfo-mapping.
15. http://ec.europa.eu/isa/actions/01-trusted-information-exchange/1-1action_en.htm.
16. Access to a testing SPARQL endpoint, as well as a number of example queries to the Apertium RDF v2.0 dataset, can be found at 10.6084/m9.figshare.12355358. A stable version of Apertium RDF v2.0 will be uploaded to http://linguistic.linkeddata.es/apertium/ and hosted by Universidad Politécnica de Madrid (UPM) as part of the Prêt-à-LLOD project and documented through https://lod-cloud.net/.
17. Trained on news text, available from https://drive.google.com/open?id=1GpyF2h0j8K5TKT7y7Aj0OyPgpFc8pMNS.
18. Trained on the PubMed Central corpus, available from http://bio.nlplab.org.
19. Trained on Wikipedia text, available from https://drive.google.com/open?id=1GpyF2h0j8K5TKT7y7Aj0OyPgpFc8pMNS.
20. Trained on the concatenation of the Scielo corpus and a medical subset of Wikipedia text, available from https://zenodo.org/record/2542722#.XeUOo5NKjUK.
21. Available from https://github.com/jbarnesspain/blse/tree/master/lexicons/bingliu.
22. The overlap between these resources amounts to 647 processed entries between Apertium and BingLiu, but only 54 between Apertium and Pharma, and only 12 between Pharma and BingLiu.
23. Accuracy is defined as the proportion of correct labels in all labels predicted by the model on the test set.
24. For Apertium, Pharma, and Bing Liu, Table 3 displays only the best-performing configurations of monolingual embeddings.
25. Despite not being exactly comparable due to non-parallel evaluation data, the classifiers resulting from the Task Extension setting differ by only 4.3 points in source vs. target language accuracy (0.816 vs. 0.773, respectively).
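
The accuracy definition in note 23 and the 4.3-point gap in note 25 can be made concrete with a short sketch. The label lists are invented toy data used only to exercise the function; the two accuracy values are the ones quoted in note 25, and the function name is an assumption for illustration.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Proportion of predicted labels that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels (not from the paper): three of four predictions are correct.
toy_gold = ["pos", "neg", "neg", "pos"]
toy_pred = ["pos", "neg", "pos", "pos"]
print(accuracy(toy_gold, toy_pred))  # 0.75

# Gap between the source and target accuracies quoted in note 25,
# expressed in percentage points.
gap_points = round((0.816 - 0.773) * 100, 1)
print(gap_points)  # 4.3
```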


Acknowledgements

This work was funded by the Prêt-à-LLOD project within the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 825182. It is also based upon work from COST Action CA18209 – NexusLinguarum “European network for Web-centred linguistic data science”, supported by COST (European Cooperation in Science and Technology). It has also been partially supported by the Spanish projects TIN2016-78011-C4-3-R (AEI/FEDER, UE) and DGA/FEDER 2014–2020.

Author information

Authors and Affiliations

Aragon Institute of Engineering Research, University of Zaragoza, Zaragoza, Spain
Jorge Gracia & Julia Bosque-Gil
Goethe University Frankfurt, Frankfurt, Germany
Christian Fäth, Max Ionov & Christian Chiarcos
Semalytix GmbH, Bielefeld, Germany
Matthias Hartung, Susana Veríssimo & Matthias Orlikowski

Corresponding author

Correspondence to Jorge Gracia.

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, UK
Jeff Z. Pan
University of Liverpool, Liverpool, UK
Valentina Tamma
University of Bari, Bari, Italy
Claudia d’Amato
University of California, Santa Barbara, Santa Barbara, CA, USA
Krzysztof Janowicz
California State University, Long Beach, Long Beach, CA, USA
Bo Fu
Vienna University of Economics and Business, Vienna, Austria
Axel Polleres
Rensselaer Polytechnic Institute, Troy, NY, USA
Oshani Seneviratne
Massachusetts Institute of Technology, Cambridge, MA, USA
Lalana Kagal

About this paper

Cite this paper

Gracia, J. et al. (2020). Leveraging Linguistic Linked Data for Cross-Lingual Model Transfer in the Pharmaceutical Domain. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_31

DOI: https://doi.org/10.1007/978-3-030-62466-8_31
Published: 01 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62465-1
Online ISBN: 978-3-030-62466-8
eBook Packages: Computer Science, Computer Science (R0)
