research-article

Matching Restaurant Menus to Crowdsourced Food Data: A Scalable Machine Learning Approach

Authors: Hesam Salehian, Patrick Howell, Chul Lee

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 2001 - 2009

https://doi.org/10.1145/3097983.3098125

Published: 13 August 2017

Abstract

We study the problem of matching a formally structured restaurant menu item to a large database of less structured food items collected via crowdsourcing. At first glance, this looks like a typical text matching problem that could be solved with existing text similarity learning approaches. However, the unique nature of our scenario and the need for scalability restrict the machine learning approaches that we can employ. We propose a novel, practical, and scalable machine learning solution architecture consisting of two major steps. First, a query generation approach based on a Markov Decision Process algorithm reduces the time complexity of searching for matching candidates. Second, a re-ranking step using deep learning techniques meets our required matching quality goals. The proposed solution architecture has already been deployed in a real application system serving tens of millions of users, and it shows great potential for practical cases of matching user-entered text to structured text, especially when scalability is crucial.
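The two-step shape described in the abstract (cheap candidate generation followed by a more careful re-ranking) can be illustrated with a minimal Python sketch. This is not the authors' method: a plain inverted-index lookup stands in for the Markov Decision Process query generation, and Jaccard token overlap stands in for the deep learning re-ranker. All function names and the sample data are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_index(items):
    """Build an inverted index: token -> set of ids of items containing it."""
    index = defaultdict(set)
    for item_id, text in enumerate(items):
        for token in tokenize(text):
            index[token].add(item_id)
    return index


def generate_candidates(menu_item, index):
    """Step 1 stand-in (assumption): collect every item sharing a token with
    the menu item, so only a small subset of the database is scored."""
    candidates = set()
    for token in tokenize(menu_item):
        candidates |= index.get(token, set())
    return candidates


def jaccard(a, b):
    """Step 2 stand-in (assumption): token-set overlap instead of a learned model."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def match(menu_item, items, index, top_k=3):
    """Retrieve candidates, then re-rank them by similarity, best first."""
    candidates = generate_candidates(menu_item, index)
    scored = [(jaccard(menu_item, items[i]), items[i]) for i in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


# Hypothetical crowdsourced food entries.
food_db = [
    "grilled chicken sandwich",
    "chicken caesar salad",
    "veggie burger",
    "chocolate milkshake",
]
index = build_index(food_db)
results = match("Grilled Chicken Sandwich", food_db, index)
```

The point of the split is that the expensive scorer only ever sees the short candidate list, which is what keeps the approach tractable against a large database.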




Information & Contributors

Information

Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2017

2240 pages

ISBN:9781450348874

DOI:10.1145/3097983

General Chairs: Stan Matwin (Dalhousie University), Shipeng Yu (LinkedIn), Faisal Farooq (IBM)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '17: The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 13 - 17, 2017

Halifax, NS, Canada

Acceptance Rates

KDD '17 Paper Acceptance Rate 64 of 748 submissions, 9%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

