Query Translation for Cross-Lingual Search in the Academic Search Engine PubPsych

  • Conference paper
Metadata and Semantic Research (MTSR 2018)

Abstract

We describe a lexical resource-based process for query translation in PubPsych, a domain-specific, multilingual academic search engine for psychology. PubPsych queries are diverse in language, with a high proportion of informational queries and technical terminology. We present an approach for translating queries into English, German, French, and Spanish. We build a quadrilingual lexicon (the quadlexicon) with aligned terms in the four languages, using MeSH, Wikipedia, and Apertium as our main resources. Our results show that, using the quadlexicon together with some simple translation rules, we can automatically translate 85% of the translatable tokens in PubPsych queries, with a mean adequacy of 1.4 over all the translatable text, measured on a 3-point scale [0, 1, 2].
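The idea of token-level lookup in an aligned four-language lexicon can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy entries, the names `QUADLEXICON`, `INDEX`, `translate_query` and `coverage`, and the whitespace tokenisation are all assumptions made for illustration, and the translation rules mentioned in the abstract are not modelled.

```python
from typing import Dict, List

LANGS = ("en", "de", "fr", "es")

# Hypothetical toy entries. The real quadlexicon is built from MeSH,
# Wikipedia and Apertium; these three rows only illustrate the alignment.
QUADLEXICON: List[Dict[str, str]] = [
    {"en": "depression", "de": "depression", "fr": "dépression", "es": "depresión"},
    {"en": "child", "de": "kind", "fr": "enfant", "es": "niño"},
    {"en": "anxiety", "de": "angst", "fr": "anxiété", "es": "ansiedad"},
]

# Index every surface form, in any of the four languages, to its aligned entry.
# setdefault keeps the first entry seen if two entries share a surface form.
INDEX: Dict[str, Dict[str, str]] = {}
for entry in QUADLEXICON:
    for lang in LANGS:
        INDEX.setdefault(entry[lang].lower(), entry)


def translate_query(query: str) -> Dict[str, List[str]]:
    """Translate each token independently; unknown tokens are copied unchanged."""
    translations: Dict[str, List[str]] = {lang: [] for lang in LANGS}
    for token in query.lower().split():
        entry = INDEX.get(token)
        for lang in LANGS:
            translations[lang].append(entry[lang] if entry else token)
    return translations


def coverage(query: str) -> float:
    """Fraction of tokens in the query that have a lexicon entry."""
    tokens = query.lower().split()
    if not tokens:
        return 0.0
    return sum(token in INDEX for token in tokens) / len(tokens)


if __name__ == "__main__":
    print(translate_query("angst enfant"))
    print(coverage("angst enfant xyz"))
```

Because each token is looked up on its own, a query that mixes languages (here a German and a French token) is still translated token by token, and the share of matched tokens gives a rough coverage figure for a query.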

Notes

  1. http://www.clubs-project.eu.

  2. https://www.pubpsych.eu.

  3. https://www.nlm.nih.gov/bsd/medline.html.

  4. A median of 2 queries was issued over all sessions. PubPsych’s mean query length (3.6 tokens for simple, 4.9 for advanced search) was comparable to other reported numbers (e.g. PubMed 3.5 [15], Citeseer 4.8 [17], ScienceDirect 3.8 [18]).

  5. German-speaking countries’ database for psychology: https://www.psyndex.de.

  6. We also calculated Krippendorff’s alpha and Cohen’s Kappa with the same results.

  7. This method additionally solves the problem of intra-query language shifts, since different tokens in the same query can be matched to different languages.

  8. CM, IT, SH, CT, SW, TI and AB fields in PubPsych.

  9. AGE, EV, PLOC, AU, ISBN, ISSN, PU, SEG, CS, JT, DB, PY, LA, DT and ID.

  10. https://www.nlm.nih.gov/mesh.

  11. https://github.com/clubs-project/MeSHMerger.

  12. https://github.com/cristinae/WikiTailor.

  13. We used models WT0.5-100 or WT0.5-500 depending on the language. Refer to the WikiTailor manual for more details: http://cristinae.github.io/WikiTailor.

  14. http://wiki.apertium.org/wiki/List_of_dictionaries.

  15. https://www.deepl.com. The work took place on 25 Jan. and 1–2 Feb. 2018.
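As an illustration of the agreement measure named in note 6, the following minimal Python sketch computes Cohen's kappa for two annotators who rate the same items on the 3-point scale [0, 1, 2]. The function name `cohens_kappa` and the ratings in the usage example are hypothetical and are not data from the paper.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical adequacy ratings on the scale [0, 1, 2].
    a = [0, 1, 2, 2, 1, 0]
    b = [0, 1, 2, 1, 1, 0]
    print(cohens_kappa(a, b))
```

Kappa corrects the raw agreement rate for the agreement expected by chance, so it is lower than the plain percentage of matching ratings whenever the raters' label distributions overlap.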

References

  1. Alastrué, R.P., Pérez-Llantada, C.: English as a Scientific and Research Language: Debates and Discourses. de Gruyter, Berlin (2015)

  2. Amano, T., González-Varo, J.P., Sutherland, W.J.: Languages are still a major barrier to global science. PLoS Biol. 14(12), e2000933 (2016)

  3. Aula, A., Kellar, M.: Multilingual search strategies. In: Conference on Human Factors in Computing Systems (CHI), pp. 3865–3870. ACM (2009)

  4. Barrón-Cedeño, A., España-Bonet, C., Boldoba, J., Màrquez, L.: A factory of comparable corpora from Wikipedia. In: Proceedings of the 8th Workshop on Building and Using Comparable Corpora (BUCC), pp. 3–13, July 2015

  5. Behnert, C.: Evaluation methods within the LibRank project. Working Paper, LibRank (2016)

  6. Broder, A.: A taxonomy of web search. In: ACM SIGIR Forum, vol. 36, pp. 3–10. ACM (2002)

  7. Capstick, J., Diagne, A.K., Erbach, G., Uszkoreit, H.: MULINEX: multilingual web search and navigation. In: Proceedings of the 14th Twente Workshop on Language Technology (TWLT 14) (1998)

  8. Chowdhury, G.: Introduction to Modern Information Retrieval. Facet, London (2010)

  9. Cleverdon, C.: The Cranfield tests on index language devices. In: Aslib Proceedings, vol. 19, pp. 173–194. MCB UP Ltd. (1967)

  10. Davis, P.M.: Information-seeking behavior of chemists: a transaction log analysis of referral URLs. J. Am. Soc. Inf. Sci. Tech. 55(4), 326–332 (2004)

  11. Diekema, A.R.: Multilinguality in the digital library: a review. Electron. Libr. 30(2), 165–181 (2012)

  12. Forcada, M.L., et al.: Apertium: a free/open-source platform for rule-based machine translation. Mach. Transl. 25(2), 127–144 (2011)

  13. Hedlund, T., Airio, E., Keskustalo, H., Lehtokangas, R., Pirkola, A., Järvelin, K.: Dictionary-based cross-language information retrieval: learning experiences from CLEF 2000–2002. Inf. Retr. 7(1–2), 99–119 (2004)

  14. Hienert, D.: User interests in German social science literature search: a large scale log analysis. In: Conference on Human Information Interaction & Retrieval (CHIIR), pp. 7–16. ACM (2017)

  15. Islamaj Dogan, R., Murray, G.C., Névéol, A., Lu, Z.: Understanding PubMed® user search behavior through log analysis. Database 2009 (2009)

  16. Ke, H.R., Kwakkelaar, R., Tai, Y.M., Chen, L.C.: Exploring behavior of e-journal users in science and technology: transaction log analysis of Elsevier’s ScienceDirect OnSite in Taiwan. Libr. Inf. Sci. Res. 24(3), 265–291 (2002)

  17. Khabsa, M., Wu, Z., Giles, C.L.: Towards better understanding of academic search. In: Joint Conference on Digital Library (JCDL), pp. 111–114. ACM (2016)

  18. Li, X., Schijvenaars, B.J., de Rijke, M.: Investigating queries and search failures in academic search. Inf. Process. Manag. 53(3), 666–683 (2017)

  19. Luca, E.W.D., Hauke, S., Nürnberger, A., Schlechtweg, S.: MultiLexExplorer: combining multilingual web search with multilingual lexical resources. In: Proceedings of Combined Workshop on Language-Enabled Educational Technology and Development and Evaluation of Robust Spoken Dialogue Systems, pp. 17–21 (2006)

  20. Mahoui, M., Cunningham, S.J.: Search behavior in a research-oriented digital library. In: Constantopoulos, P., Sølvberg, I.T. (eds.) ECDL 2001. LNCS, vol. 2163, pp. 13–24. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44796-2_2

  21. McCarn, D.B., Leiter, J.: On-line services in medicine and beyond. Science 181(4097), 318–324 (1973)

  22. Nzomo, P., Ajiferuke, I., Vaughan, L., McKenzie, P.: Multilingual information retrieval & use: perceptions and practices amongst bi/multilingual academic users. J. Acad. Libr. 42(5), 495–502 (2016)

  23. Palotti, J., Hanbury, A., Müller, H., Kahn Jr., C.E.: How users search and what they search for in the medical domain. Inf. Retr. 19(1–2), 189–224 (2016)

  24. Park, M., Lee, T.S.: A longitudinal study of information needs and search behaviors in science and technology: a query analysis. Electron. Libr. 34(1), 83–98 (2016)

  25. Pontis, S., Blandford, A., Greifeneder, E., Attalla, H., Neal, D.: Keeping up to date: an academic researcher’s information journey. J. Am. Soc. Inf. Sci. Tech. 68(1), 22–35 (2017)

  26. Ritchie, A., Teufel, S., Robertson, S.: Creating a test collection for citation-based IR experiments. In: HLT-NAACL 2006, pp. 391–398. ACL (2006)

  27. Stiller, J., Gäde, M., Petras, V.: Ambiguity of queries and the challenges for query language detection. In: CLEF (2010)

  28. Ture, F., Boschee, E.: Learning to translate: a query-specific combination approach for cross-lingual information retrieval. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 589–599. Association for Computational Linguistics, Doha (2014)

  29. Uhl, M.: Survey on European psychology publication issues. Psychol. Sci. Q. 51(1), 19 (2009)

  30. Vanopstal, K., Buysschaert, J., Laureys, G., Stichele, R.V.: Lost in PubMed. Factors influencing the success of medical information retrieval. Expert Syst. Appl. 40(10), 4106–4114 (2013)

  31. Vassilakaki, E., Garoufallou, E., Johnson, F., Hartley, R.J.: An exploration of users’ needs for multilingual information retrieval and access. In: Garoufallou, E., Hartley, R.J., Gaitanou, P. (eds.) MTSR 2015. CCIS, vol. 544, pp. 249–258. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24129-6_22

  32. Waeldin, S.: Results from the PubPsych launch survey: short report. ZPID Sci. Inf. Online 15(2), 3 (2015)

  33. Yi, K., Beheshti, J., Cole, C., Leide, J.E., Large, A.: User search behavior of domain-specific information retrieval systems: an analysis of the query logs from PsycINFO and ABC-Clio’s historical abstracts-America: history and life: research articles. J. Am. Soc. Inf. Sci. Tech. 57(9), 1208–1220 (2006)

  34. Yoo, I., Mosa, A.S.M.: Analysis of PubMed user sessions using a full-day PubMed query log: a comparison of experienced and nonexperienced PubMed users. JMIR Med. Inf. 3(3), e25 (2015)

Acknowledgments

This research was supported by the Leibniz-Gemeinschaft under grant SAW-2016-ZPID-2.

Corresponding author

Correspondence to Cristina España-Bonet.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

España-Bonet, C., Stiller, J., Ramthun, R., van Genabith, J., Petras, V. (2019). Query Translation for Cross-Lingual Search in the Academic Search Engine PubPsych. In: Garoufallou, E., Sartori, F., Siatri, R., Zervas, M. (eds) Metadata and Semantic Research. MTSR 2018. Communications in Computer and Information Science, vol 846. Springer, Cham. https://doi.org/10.1007/978-3-030-14401-2_4

  • DOI: https://doi.org/10.1007/978-3-030-14401-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14400-5

  • Online ISBN: 978-3-030-14401-2

  • eBook Packages: Computer Science (R0)
