Abstract
We describe a lexical resource-based process for query translation in PubPsych, a domain-specific, multilingual academic search engine for psychology. PubPsych queries are linguistically diverse and contain a high proportion of informational queries and technical terminology. We present an approach for translating queries into English, German, French, and Spanish. To this end, we build a quadrilingual lexicon of terms aligned across the four languages, using MeSH, Wikipedia, and Apertium as our main resources. Our results show that, using this quadlexicon together with some simple translation rules, we can automatically translate 85% of the translatable tokens in PubPsych queries, with a mean adequacy of 1.4 over all translatable text when measured on a 3-point scale [0, 1, 2].
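The token-level lookup that a quadrilingual lexicon enables could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy entries in `QUADLEXICON` and the helper names `build_index` and `translate_query` are hypothetical, and the sketch omits the translation rules and the MeSH, Wikipedia, and Apertium extraction steps entirely. Each token is looked up on its own, so no query-level language identification is assumed.

```python
from typing import Dict, List, Tuple

LANGS = ("en", "de", "fr", "es")

# Hypothetical toy lexicon: each entry aligns one term across the four languages.
QUADLEXICON: List[Dict[str, str]] = [
    {"en": "depression", "de": "depression", "fr": "dépression", "es": "depresión"},
    {"en": "child", "de": "kind", "fr": "enfant", "es": "niño"},
    {"en": "anxiety", "de": "angst", "fr": "anxiété", "es": "ansiedad"},
]


def build_index(lexicon: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map every lowercased surface form, in any language, to its aligned entry."""
    index: Dict[str, Dict[str, str]] = {}
    for entry in lexicon:
        for lang in LANGS:
            # setdefault keeps the first entry if two entries share a surface form
            index.setdefault(entry[lang].lower(), entry)
    return index


def translate_query(
    query: str, target: str, index: Dict[str, Dict[str, str]]
) -> Tuple[List[str], float]:
    """Translate a query token by token; return the tokens and the lexicon coverage."""
    tokens = query.lower().split()
    translated: List[str] = []
    hits = 0
    for token in tokens:
        entry = index.get(token)
        if entry is None:
            translated.append(token)  # unknown tokens are passed through unchanged
        else:
            translated.append(entry[target])
            hits += 1
    coverage = hits / len(tokens) if tokens else 0.0
    return translated, coverage


INDEX = build_index(QUADLEXICON)
tokens_en, coverage = translate_query("Angst enfant xyz", "en", INDEX)
```

In this toy run the query mixes a German and a French token; both are resolved through the same index, while the out-of-lexicon token is left as it is and lowers the coverage figure.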
Notes
- 5. German-speaking countries’ database for psychology: https://www.psyndex.de.
- 6. We also calculated Krippendorff’s alpha and Cohen’s kappa, with the same results.
- 7. This method additionally solves the problem of intra-query language shifts, since different tokens in the same query can be matched to different languages.
- 8. CM, IT, SH, CT, SW, TI and AB fields in PubPsych.
- 9. AGE, EV, PLOC, AU, ISBN, ISSN, PU, SEG, CS, JT, DB, PY, LA, DT and ID.
- 13. We used models WT0.5-100 or WT0.5-500, depending on the language. Refer to the WikiTailor manual for more details: http://cristinae.github.io/WikiTailor.
- 15. https://www.deepl.com; the work took place on 25th Jan. and 1st-2nd Feb. 2018.
Acknowledgments
This research was supported by the Leibniz-Gemeinschaft under grant SAW-2016-ZPID-2.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
España-Bonet, C., Stiller, J., Ramthun, R., van Genabith, J., Petras, V. (2019). Query Translation for Cross-Lingual Search in the Academic Search Engine PubPsych. In: Garoufallou, E., Sartori, F., Siatri, R., Zervas, M. (eds) Metadata and Semantic Research. MTSR 2018. Communications in Computer and Information Science, vol 846. Springer, Cham. https://doi.org/10.1007/978-3-030-14401-2_4
DOI: https://doi.org/10.1007/978-3-030-14401-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14400-5
Online ISBN: 978-3-030-14401-2
eBook Packages: Computer Science, Computer Science (R0)