Leveraging Linguistic Linked Data for Cross-Lingual Model Transfer in the Pharmaceutical Domain

  • Conference paper
The Semantic Web – ISWC 2020 (ISWC 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12507)

Abstract

We describe the use of linguistic linked data to support a cross-lingual transfer framework for sentiment analysis in the pharmaceutical domain. The proposed system dynamically gathers translations from the Linked Open Data (LOD) cloud, particularly from Apertium RDF, in order to project a deep learning-based sentiment classifier from one language to another, thus enabling scalability and avoiding the need to retrain the model when it is transferred across languages. We describe the whole pipeline traversed by the multilingual data: their conversion into RDF with a new dynamic and flexible transformation framework, their linking and publication as linked data, and finally their exploitation in this particular use case. Based on experiments on projecting a sentiment classifier from English to Spanish, we demonstrate how linked data techniques can enhance the multilingual capabilities of a deep learning-based approach in a dynamic and scalable way, in a real application scenario from the pharmaceutical domain.
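
As a rough illustration of the translation-gathering step, the following Python sketch builds a SPARQL query that asks for target-language written forms of a source-language word. It assumes that the dataset models translations with the OntoLex-lemon core and vartrans modules (`ontolex:sense`, and `vartrans:Translation` with `vartrans:source` and `vartrans:target`). The function name, the example word, and the exact graph pattern are illustrative assumptions, not the authors' actual queries.

```python
# Namespaces of the OntoLex-lemon core and vartrans modules.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
VARTRANS = "http://www.w3.org/ns/lemon/vartrans#"


def build_translation_query(word: str, source_lang: str = "en",
                            target_lang: str = "es") -> str:
    """Return a SPARQL query string for translations of `word`.

    Hypothetical helper: the graph pattern assumes that lexical entries
    are linked to senses via ontolex:sense and that each translation is
    a vartrans:Translation connecting a source and a target sense.
    """
    if not (source_lang.isalpha() and target_lang.isalpha()):
        raise ValueError("language tags must be alphabetic, e.g. 'en'")
    # Escape backslashes and double quotes so the literal stays well-formed.
    safe = word.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX ontolex: <{ONTOLEX}>
PREFIX vartrans: <{VARTRANS}>
SELECT DISTINCT ?targetRep WHERE {{
  ?sourceEntry ontolex:canonicalForm/ontolex:writtenRep "{safe}"@{source_lang} ;
               ontolex:sense ?sourceSense .
  ?translation a vartrans:Translation ;
               vartrans:source ?sourceSense ;
               vartrans:target ?targetSense .
  ?targetEntry ontolex:sense ?targetSense ;
               ontolex:canonicalForm/ontolex:writtenRep ?targetRep .
  FILTER (lang(?targetRep) = "{target_lang}")
}}"""


query = build_translation_query("pain")
print(query)
```

The resulting string could be sent to a SPARQL endpoint serving Apertium RDF; the graph pattern would need to be adapted if the published data are modelled differently.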

Notes

  1. http://linguistic-lod.org/llod-cloud.

  2. https://www.w3.org/2016/05/ontolex/.

  3. See https://www.w3.org/2016/05/ontolex/#core for a diagram and complete description of the OntoLex-lemon core module.

  4. See the whole diagram of the vartrans module at https://www.w3.org/2016/05/ontolex/#variation-translation-vartrans.

  5. https://www.apertium.org/.

  6. http://wiki.apertium.org/wiki/Main_Page.

  7. http://www.meta-net.eu/projects/METANET4U/.

  8. http://linguistic.linkeddata.es/resource/id/apertium.

  9. https://www.lexinfo.net/ontology/2.0/lexinfo.

  10. https://www.pret-a-llod.eu/.

  11. https://www.semalytix.com.

  12. RWE is evidence for the effectiveness and safety of a drug product, gathered outside of the controlled settings of clinical trials, in order to demonstrate added value of a drug in terms of improvements in quality of life in specific patient populations.

  13. https://www.w3.org/2015/09/bpmlod-reports/bilingual-dictionaries/.

  14. The mapping is available as CSV and TSV in GitHub and open to comments and modification by the community. See https://github.com/sid-unizar/apertium-lexinfo-mapping.

  15. http://ec.europa.eu/isa/actions/01-trusted-information-exchange/1-1action_en.htm.

  16. Access to a testing SPARQL endpoint, as well as a number of example queries to the Apertium RDF v2.0 dataset, can be found at 10.6084/m9.figshare.12355358. A stable version of Apertium RDF v2.0 will be uploaded to http://linguistic.linkeddata.es/apertium/ and hosted by Universidad Politécnica de Madrid (UPM) as part of the Prêt-à-LLOD project and documented through https://lod-cloud.net/.

  17. Trained on news text, available from https://drive.google.com/open?id=1GpyF2h0j8K5TKT7y7Aj0OyPgpFc8pMNS.

  18. Trained on the PubMed Central corpus, available from http://bio.nlplab.org.

  19. Trained on Wikipedia text, available from https://drive.google.com/open?id=1GpyF2h0j8K5TKT7y7Aj0OyPgpFc8pMNS.

  20. Trained on the concatenation of the Scielo corpus and a medical subset of Wikipedia text, available from https://zenodo.org/record/2542722#.XeUOo5NKjUK.

  21. Available from https://github.com/jbarnesspain/blse/tree/master/lexicons/bingliu.

  22. The overlap between these resources amounts to 647 processed entries between Apertium and BingLiu, but only 54 between Apertium and Pharma, and only 12 between Pharma and BingLiu.

  23. Accuracy is defined as the proportion of correct labels in all labels predicted by the model on the test set.

  24. For Apertium, Pharma, and Bing Liu, Table 3 displays only the best-performing configurations of monolingual embeddings.

  25. Despite not being exactly comparable due to non-parallel evaluation data, the classifiers resulting from the Task Extension setting differ by only 4.3 points in source vs. target language accuracy (0.816 vs. 0.773, respectively).
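
The accuracy definition in note 23 and the gap quoted in note 25 can be checked with a short Python sketch. The label lists below are made-up toy data and the function name is an assumption; only the two accuracy values 0.816 and 0.773 come from the notes.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Proportion of predicted labels that match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy example (hypothetical labels): 3 of the 4 predictions are correct.
toy_predicted = ["pos", "neg", "pos", "neg"]
toy_gold = ["pos", "neg", "neg", "neg"]
toy_accuracy = accuracy(toy_predicted, toy_gold)

# Gap between the source and target language accuracies quoted in note 25,
# expressed in percentage points.
gap_in_points = round((0.816 - 0.773) * 100, 1)

print(toy_accuracy, gap_in_points)
```

Expressing the gap in percentage points rather than as a relative change matches the wording of note 25.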

References

  1. Barnes, J., Klinger, R., Schulte im Walde, S.: Bilingual sentiment embeddings: joint projection of sentiment across languages. In: Proceedings of ACL (2018)

  2. Chiarcos, C., Fäth, C., Ionov, M.: The ACoLi dictionary graph. In: Proceedings of LREC, pp. 3281–3290. ELRA, Marseille (2020)

  3. Cimiano, P., Buitelaar, P., McCrae, J., Sintek, M.: LexInfo: a declarative model for the lexicon-ontology interface. J. Web Semant. 9(1), 29–51 (2011)

  4. Cimiano, P., Chiarcos, C., McCrae, J.P., Gracia, J.: Linguistic Linked Data: Representation Generation and Applications. Springer International Publishing, Switzerland (2020)

  5. Feng, Y., Wan, X.: Learning bilingual sentiment-specific word embeddings without cross-lingual supervision. In: Proceedings of NAACL:HLT (2019)

  6. Forcada, M.L., et al.: Apertium: a free/open-source platform for rule-based machine translation. Mach. Transl. 25(2), 127–144 (2011)

  7. Francopoulo, G., et al.: Lexical Markup Framework (LMF) for NLP multilingual resources. In: Proceedings of the Workshop on Multilingual Language Resources and Interoperability, pp. 1–8, Sydney (2006)

  8. Fäth, C., Chiarcos, C., Ebbrecht, B., Ionov, M.: Fintan - flexible, integrated transformation and annotation engineering. In: Proceedings of LREC (2020)

  9. Gracia, J., Montiel-Ponsoda, E., Vila-Suero, D., Aguado-de Cea, G.: Enabling language resources to expose translations as linked data on the web. In: Proceedings of LREC, pp. 409–413 (2014)

  10. Gracia, J., Villegas, M., Gómez-Pérez, A., Bel, N.: The Apertium bilingual dictionaries on the web of data. Semant. Web 9(2), 231–240 (2018)

  11. Hartung, M., Orlikowski, M., Veríssimo, S.: Evaluating the impact of bilingual lexical resources on cross-lingual sentiment projection in the pharmaceutical domain. https://doi.org/10.5281/zenodo.3707940 (2020)

  12. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of KDD, pp. 168–177 (2004)

  13. McCrae, J.P., Bosque-Gil, J., Gracia, J., Buitelaar, P., Cimiano, P.: The OntoLex-Lemon Model: development and applications. In: Proceedings of eLex 2017 Electronic lexicography in the 21st century, pp. 587–597 (2017)

  14. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space, January 2013

  15. Mogadala, A., Rettinger, A.: Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification. In: Proceedings of NAACL:HLT (2016)

  16. SRIA-Editorial-Team: Strategic Research and Innovation Agenda for the Multilingual Digital Single Market. Technical report, Cracking the Language Barrier initiative (2016)

  17. Søgaard, A., Vulic, I., Ruder, S., Faruqui, M.: Cross-lingual word embeddings. Morgan Claypool (2019)

  18. Vila-Suero, D., Gómez-Pérez, A., Montiel-Ponsoda, E., Gracia, J., Aguado-de-Cea, G.: Publishing linked data on the web: the multilingual dimension. In: Buitelaar, P., Cimiano, P. (eds.) Towards the Multilingual Semantic Web, pp. 101–117. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43585-4_7

  19. Yarowsky, D., Ngai, G., Wicentowski, R.: Inducing multilingual text analysis tools via robust projection across aligned corpora. In: Proceedings of HLT (2001)

  20. Zennaki, O., Semmar, N., Besacier, L.: Inducing multilingual text analysis tools using bidirectional recurrent neural networks. In: Proceedings of COLING (2016)

  21. Zhou, X., Wan, X., Xiao, J.: Cross-lingual sentiment classification with bilingual document representation learning. In: Proceedings of ACL (2016)

Acknowledgements

This work was funded by the Prêt-à-LLOD project within the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 825182. This work is also based upon work from COST Action CA18209 – NexusLinguarum “European network for Web-centred linguistic data science”, supported by COST (European Cooperation in Science and Technology). It has also been partially supported by the Spanish projects TIN2016-78011-C4-3-R (AEI/FEDER, UE) and DGA/FEDER 2014–2020.

Author information

Corresponding author

Correspondence to Jorge Gracia.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Gracia, J. et al. (2020). Leveraging Linguistic Linked Data for Cross-Lingual Model Transfer in the Pharmaceutical Domain. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_31

  • DOI: https://doi.org/10.1007/978-3-030-62466-8_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62465-1

  • Online ISBN: 978-3-030-62466-8

  • eBook Packages: Computer Science, Computer Science (R0)
