DOI: 10.1145/3534678.3539067
research-article

Type Linking for Query Understanding and Semantic Search

Published: 14 August 2022 Publication History

Abstract

Huawei is currently undertaking an effort to build map and web search services using query understanding and semantic search techniques. We present our efforts to build a low-latency type mention detection and linking service for map search. In addition to latency challenges, we had access only to low-quality, biased training data, and we had to support 13 languages. Consequently, our service is based mostly on unsupervised term- and vector-based methods. Nevertheless, we trained a Transformer-based query tagger, which we integrated with the rest of the pipeline using a reward and penalisation approach. We present techniques that we designed to address challenges with the type dictionary, incompatibilities in scoring between the term-based and vector-based methods, and over-segmentation issues in Thai, Chinese, and Japanese. We have evaluated our approach on the Huawei map search use case as well as on community Question Answering benchmarks.
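
The abstract does not give the scoring formulas, so the following is only an illustrative sketch of how a term-based score and a vector-based score might be placed on a common scale and then adjusted by a tagger signal. Every name here (`Candidate`, `combined_score`, the `reward` and `penalty` values, the choice of taking the maximum of the two scores) is a hypothetical assumption for illustration, not the method described in the paper.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    mention: str         # span of the query text
    type_id: str         # hypothetical identifier of a type in the dictionary
    term_score: float    # lexical match score, assumed to lie in [0, 1]
    vector_score: float  # cosine similarity, lies in [-1, 1]
    tagger_agrees: bool  # assumed flag: did a query tagger mark this span as a type?


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(c, reward=0.1, penalty=0.1):
    """Rescale both scores to [0, 1], keep the stronger one, then reward or penalise."""
    vec = (c.vector_score + 1.0) / 2.0  # map cosine from [-1, 1] onto [0, 1]
    base = max(c.term_score, vec)       # assumption: the stronger signal wins
    adjusted = base + reward if c.tagger_agrees else base - penalty
    return min(1.0, max(0.0, adjusted))  # clamp back into [0, 1]


def rank(candidates):
    """Order candidates from highest to lowest combined score."""
    return sorted(candidates, key=combined_score, reverse=True)
```

For example, `rank([Candidate("coffee", "coffee_brand", 0.9, 0.0, False), Candidate("coffee shop", "cafe", 0.5, 0.8, True)])` places the `cafe` candidate first under these assumed weights, because the tagger reward outweighs the stronger lexical match of the shorter span.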


Cited By

  • (2023) Graph Enhanced BERT for Query Understanding. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3315-3319. DOI: 10.1145/3539618.3591845. Online publication date: 19-Jul-2023
  • (2023) Benchmarking Geospatial Question Answering Engines Using the Dataset GeoQuestions1089. The Semantic Web – ISWC 2023, 266-284. DOI: 10.1007/978-3-031-47243-5_15. Online publication date: 6-Nov-2023

Information & Contributors

Information

Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. query annotation
  3. semantic search
  4. type linking

Qualifiers

  • Research-article

Conference

KDD '22
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 43
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 14 Nov 2024
