research-article

Type Linking for Query Understanding and Semantic Search

Authors:

Giorgos Stoilos,

Nikos Papasarantopoulos,

Pavlos Vougiouklis,

Patrik Bansky

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 3931 - 3940

https://doi.org/10.1145/3534678.3539067

Published: 14 August 2022

Abstract

Huawei is currently undertaking an effort to build map and web search services using query understanding and semantic search techniques. We present our efforts to build a low-latency type mention detection and linking service for map search. In addition to latency constraints, we had access only to low-quality, biased training data, and we had to support 13 languages. Consequently, our service is based mostly on unsupervised term-based and vector-based methods. Nevertheless, we trained a Transformer-based query tagger, which we integrated with the rest of the pipeline using a reward and penalisation approach. We present techniques that we designed to address challenges with the type dictionary, incompatibilities in scoring between the term-based and vector-based methods, and over-segmentation issues in Thai, Chinese, and Japanese. We have evaluated our approach on the Huawei map search use case as well as on community Question Answering benchmarks.
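The abstract names its ingredients without detailing them. The following is a minimal sketch, not the authors' implementation, of how term-based and vector-based type matching might be combined with a query tagger through a reward and penalty. Everything in it is an assumption for illustration: the toy dictionary `TYPE_DICT`, the toy vectors `WORD_VECS`, the helper names, token Jaccard overlap as the term score, cosine similarity clipped to [0, 1] as the vector score (so both scores share one scale before taking the larger), the multiplicative `reward`/`penalty` factors, and greedy selection of non-overlapping spans. It also splits queries on whitespace, which is exactly what breaks down for Thai, Chinese, and Japanese.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy type dictionary: surface label -> type identifier.
TYPE_DICT: Dict[str, str] = {
    "restaurant": "Restaurant",
    "petrol station": "GasStation",
}

# Hypothetical toy word vectors; a real service would load pretrained embeddings.
WORD_VECS: Dict[str, List[float]] = {
    "restaurant": [1.0, 0.0, 0.0],
    "eatery": [0.8, 0.6, 0.0],
    "petrol": [0.0, 0.0, 1.0],
    "station": [0.0, 0.6, 0.8],
}


def spans(n_tokens: int, max_len: int = 3) -> List[Tuple[int, int]]:
    """All contiguous token spans [start, end) of length 1..max_len."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i + 1, min(i + max_len, n_tokens) + 1)]


def term_score(span: List[str], label: str) -> float:
    """Term-based score: token Jaccard overlap, in [0, 1]."""
    a, b = set(span), set(label.split())
    return len(a & b) / len(a | b)


def mean_vec(tokens: List[str]) -> Optional[List[float]]:
    """Average word vector, or None if any token is out of vocabulary."""
    if not tokens or any(t not in WORD_VECS for t in tokens):
        return None
    return [sum(col) / len(tokens) for col in zip(*(WORD_VECS[t] for t in tokens))]


def vector_score(span: List[str], label: str) -> float:
    """Vector-based score: cosine similarity clipped to [0, 1] to match the term score's scale."""
    u, v = mean_vec(span), mean_vec(label.split())
    if u is None or v is None:
        return 0.0
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        return 0.0
    cos = sum(x * y for x, y in zip(u, v)) / norm
    return max(0.0, min(1.0, cos))


def link_types(query: str, tagged_spans: Set[Tuple[int, int]],
               reward: float = 0.25, penalty: float = 0.25,
               threshold: float = 0.75) -> List[Tuple[str, str, float]]:
    """Return non-overlapping (mention, type, score) links in query order."""
    tokens = query.lower().split()  # whitespace split: unsuitable for Thai/Chinese/Japanese
    candidates = []
    for i, j in spans(len(tokens)):
        span = tokens[i:j]
        best_label, best = None, 0.0
        for label in TYPE_DICT:
            s = max(term_score(span, label), vector_score(span, label))
            if s > best:
                best_label, best = label, s
        if best_label is None:
            continue
        # Reward spans the (hypothetical) tagger marked as type mentions; penalise the rest.
        factor = 1.0 + reward if (i, j) in tagged_spans else 1.0 - penalty
        score = min(1.0, best * factor)
        if score >= threshold:
            candidates.append((score, i, j, TYPE_DICT[best_label]))
    # Greedily keep the highest-scoring spans that do not overlap an already chosen one.
    chosen: List[Tuple[float, int, int, str]] = []
    for score, i, j, type_id in sorted(candidates, key=lambda c: (-c[0], c[1])):
        if all(j <= ci or i >= cj for _, ci, cj, _ in chosen):
            chosen.append((score, i, j, type_id))
    chosen.sort(key=lambda c: c[1])
    return [(" ".join(tokens[i:j]), type_id, round(score, 3)) for score, i, j, type_id in chosen]
```

For example, `link_types("cheap eatery near petrol station", {(1, 2), (3, 5)})` links "eatery" to `Restaurant` (through the vector score alone, boosted by the tagger) and "petrol station" to `GasStation`. With an empty set of tagged spans, the penalty pushes "eatery" below the threshold and only the exact dictionary match "petrol station" survives.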

Cited By

Li J, Zeng W, Cheng S, Ma Y, Tang J, Wang S, Yin D, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). Graph Enhanced BERT for Query Understanding. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3315-3319. DOI: 10.1145/3539618.3591845. Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3591845

Kefalidis S, Punjani D, Tsalapati E, Plas K, Pollali M, Mitsios M, Tsokanaridou M, Koubarakis M, Maret P (2023). Benchmarking Geospatial Question Answering Engines Using the Dataset GeoQuestions1089. The Semantic Web – ISWC 2023, 266-284. DOI: 10.1007/978-3-031-47243-5_15. Online publication date: 6-Nov-2023.
https://dl.acm.org/doi/10.1007/978-3-031-47243-5_15

Index Terms

Type Linking for Query Understanding and Semantic Search
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query intent
    2. Retrieval tasks and goals
      1. Question answering

Information & Contributors

Information

Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 2022

5033 pages

ISBN:9781450393850

DOI:10.1145/3534678

General Chairs: Aidong Zhang (University of Virginia), Huzefa Rangwala (Amazon/George Mason University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 August 2022

Qualifiers

Research-article

Conference

KDD '22

Sponsor:

KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 14 - 18, 2022

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

Total Citations: 2

Total Downloads: 266

Downloads (last 12 months): 43

Downloads (last 6 weeks): 11

Reflects downloads up to 14 Nov 2024
