DOI: 10.1145/3097983.3098125
research-article

Matching Restaurant Menus to Crowdsourced Food Data: A Scalable Machine Learning Approach

Published: 13 August 2017

Abstract

We study how to match a formally structured restaurant menu item to a large database of less structured food items collected via crowdsourcing. At first glance, this looks like a typical text matching problem that existing text similarity learning approaches might solve. However, the unique nature of our scenario and the need for scalability restrict the machine learning approaches we can employ. We propose a novel, practical, and scalable machine learning solution architecture consisting of two major steps. First, a query generation approach based on a Markov Decision Process algorithm reduces the time complexity of searching for matching candidates. A re-ranking step using deep learning techniques then meets our required matching quality goals. The proposed architecture has already been deployed in a real application system serving tens of millions of users, and it shows great potential for practical cases of matching user-entered text to structured text, especially when scalability is crucial.
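The two-step shape described in the abstract (a cheap candidate search followed by a more expensive re-ranking) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the query generator here simply uses every token of the menu item instead of a learned Markov Decision Process policy, and token-set Jaccard similarity stands in for the deep learning re-ranker. The names `build_index`, `generate_candidates`, `match`, and the `FOODS` sample data are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace (commas become spaces)."""
    return text.lower().replace(",", " ").split()


def build_index(foods):
    """Build an inverted index mapping each token to the ids of foods containing it."""
    index = defaultdict(set)
    for food_id, name in enumerate(foods):
        for token in set(tokenize(name)):
            index[token].add(food_id)
    return index


def generate_candidates(query_terms, index, limit=50):
    """Step 1: look up only the query terms, so the full database is never scanned.

    Candidates are ordered by the number of matched query terms, then by id.
    """
    hits = defaultdict(int)
    for term in query_terms:
        for food_id in index.get(term, ()):
            hits[food_id] += 1
    ranked = sorted(hits, key=lambda food_id: (-hits[food_id], food_id))
    return ranked[:limit]


def jaccard(a, b):
    """Token-set Jaccard similarity; a stand-in for a learned matching model."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def match(menu_item, foods, index, score=jaccard, top_k=3):
    """Step 2: re-rank the small candidate set with the (more expensive) scorer."""
    # Assumption: all tokens are used as the query. A learned query
    # generator would choose which terms to issue instead.
    query_terms = tokenize(menu_item)
    candidates = generate_candidates(query_terms, index)
    reranked = sorted(candidates, key=lambda i: (-score(menu_item, foods[i]), i))
    return [foods[i] for i in reranked[:top_k]]


# Hypothetical crowdsourced food names, for illustration only.
FOODS = [
    "grilled chicken sandwich",
    "chicken caesar salad",
    "homemade grilled cheese",
    "caesar salad no dressing",
]
INDEX = build_index(FOODS)
best = match("Grilled Chicken Caesar Salad", FOODS, INDEX, top_k=2)
print(best)
```

The design point the sketch is meant to show is that the scorer only ever sees the candidates returned by the index lookup, so its cost depends on the candidate limit and not on the size of the food database.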

Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN:9781450348874
DOI:10.1145/3097983
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural networks
  2. markov decision process
  3. nutrition estimation
  4. short text matching

Qualifiers

  • Research-article

Conference

KDD '17

Acceptance Rates

KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 21
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 02 Oct 2024

Cited By

  • (2022) Noisy Interactive Graph Search. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 231-240. DOI: 10.1145/3534678.3539267. Online publication date: 14-Aug-2022
  • (2022) Eat This, Not That! – a Personalised Restaurant Menu Decoder That Helps You Pick the Right Food. 2022 IEEE International Conference on E-health Networking, Application & Services (HealthCom), 43-48. DOI: 10.1109/HealthCom54947.2022.9982770. Online publication date: 17-Oct-2022
  • (2021) Crowdsourcing: Descriptive Study on Algorithms and Frameworks for Prediction. Archives of Computational Methods in Engineering 29:1, 357-374. DOI: 10.1007/s11831-021-09577-8. Online publication date: 4-Apr-2021
  • (2020) Passenger Flow Forecast of Catering Business based on Autoregressive Integrated Moving Average and Smoothing Index Prediction Model. 2020 International Signal Processing, Communications and Engineering Management Conference (ISPCEM), 53-57. DOI: 10.1109/ISPCEM52197.2020.00016. Online publication date: Nov-2020
  • (2020) FILLET - Platform for Intelligent Nutrition. 2020 IEEE/ACS 17th International Conference on Computer Systems and Applications (AICCSA), 1-8. DOI: 10.1109/AICCSA50499.2020.9316490. Online publication date: Nov-2020
  • (2020) An Interactive Virtual E-Learning Framework Using Crowdsourced Analytics. Artificial Intelligence Techniques for Advanced Computing Applications, 127-136. DOI: 10.1007/978-981-15-5329-5_12. Online publication date: 24-Jul-2020
  • (2018) Building Payment Classification Models from Rules and Crowdsourced Labels: A Case Study. Advanced Information Systems Engineering Workshops, 85-97. DOI: 10.1007/978-3-319-92898-2_7. Online publication date: 5-Jun-2018
