research-article

Noisy Interactive Graph Search

Authors:

Yeow Meng CheeAuthors Info & Claims

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 231 - 240

https://doi.org/10.1145/3534678.3539267

Published: 14 August 2022 Publication History

Abstract

The interactive graph search (IGS) problem aims to locate an initially unknown target node leveraging human intelligence. In IGS, we can gradually find the target node by sequentially asking humans some reachability queries like "is the target node reachable from a given node x?". However, human workers may make mistakes when answering these queries. Motivated by this concern, in this paper, we study a noisy version of the IGS problem. Our objective in this problem is to minimize the query complexity while ensuring accuracy. We propose a method to select the query node such that we can push the search process as much as possible and an online method to infer which node is the target after collecting a new answer. By rigorous theoretical analysis, we show that the query complexity of our approach is near-optimal up to a constant factor. The extensive experiments on two real datasets also demonstrate the superiorities of our approach.

Supplemental Material

MP4 File

Presentation video for paper Noisy Interactive Graph Search

Download
64.21 MB

References

[1]

Yael Amsterdamer, Yael Grossman, Tova Milo, and Pierre Senellart. 2013. Crowd Mining. In Proc. ACM SIGMOD. 241--252.

[2]

Yukino Baba and Hisashi Kashima. 2013. Statistical Quality Estimation for General Crowdsourcing Tasks. In Proc. ACM SIGKDD. 554--562.

Digital Library

[3]

Yosi Ben-Asher, Eitan Farchi, and Ilan Newman. 1999. Optimal Search in Trees. SIAM J. Comput. 28, 6 (1999), 2090--2102.

Digital Library

[4]

Venkatesan T Chakaravarthy, Vinayaka Pandit, Sambuddha Roy, Pranjal Awasthi, and Mukesh Mohania. 2007. Decision Trees for Entity Identification: Approximation Algorithms and Hardness Results. In Proc. PODS. 53--62.

Digital Library

[5]

Lydia B Chilton, Greg Little, Darren Edge, Daniel S Weld, and James A Landay. 2013. Cascade: Crowdsourcing Taxonomy Creation. In Proc. ACM SIGCHI. 1999--2008.

Digital Library

[6]

Ferdinando Cicalese, Tobias Jacobs, Eduardo Laber, and Marco Molinaro. 2011. On the Complexity of Searching in Trees and Partially Ordered Structures. Theoretical Computer Science 412, 50 (2011), 6879--6896.

Digital Library

[7]

Ferdinando Cicalese, Tobias Jacobs, Eduardo Laber, and Marco Molinaro. 2014. Improved Approximation Algorithms for the Average-Case Tree Searching Problem. Algorithmica 68, 4 (2014), 1045--1074.

[8]

Qianhao Cong, Jing Tang, Yuming Huang, Lei Chen, and Yeow Meng Chee. 2022. Cost-Effective Algorithms for Average-Case Interactive Graph Search. arXiv:2201.07944 cs.DB.

[9]

Susan B Davidson, Sanjeev Khanna, Tova Milo, and Sudeepa Roy. 2013. Using the Crowd for Top-k and Group-by Queries. In Proc. ICDT. 225--236.

Digital Library

[10]

Jia Deng,Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A Large-scale Hierarchical Image Database. In Proc. IEEE CVPR. 248--255.

[11]

Dariusz Dereniowski. 2008. Edge Ranking and Searching in Partial Orders. Discrete Applied Mathematics 156, 13 (2008), 2493--2500.

Digital Library

[12]

Jing Gao, Qi Li, Bo Zhao, Wei Fan, and Jiawei Han. 2016. Mining Reliable Information from Passively and Actively Crowdsourced Data. In Proc. ACM SIGKDD. 2121--2122.

Digital Library

[13]

Stephen Guo, Aditya Parameswaran, and Hector Garcia-Molina. 2012. So Who Won? Dynamic Max Discovery With the Crowd. In Proc. ACM SIGMOD. 385--396.

[14]

Ruining He and Julian McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-class Collaborative Filtering. In Proc. WWW. 507--517.

Digital Library

[15]

Mengdi Huai, Di Wang, Chenglin Miao, Jinhui Xu, and Aidong Zhang. 2019. Privacy-aware Synthesizing for Crowdsourced Data. In Proc IJCAI. 2542--2548.

[16]

Chao Huang, Xian Wu, and Dong Wang. 2016. Crowdsourcing-based Urban Anomaly Prediction System for Smart Cities. In Proc ACM CIKM. 1969--1972.

Digital Library

[17]

Johan Jensen. 1906. Sur les Fonctions Convexes et les Inégalités entre les Valeurs Moyennes. Acta Mathematica 30, 1 (1906), 175--193.

[18]

Adam Marcus Eugene Wu David Karger and Samuel Madden Robert Miller. 2011. Human-powered Sorts and Joins. Proc. VLDB Endowment 5, 1 (2011), 13--24.

Digital Library

[19]

Donald E. Knuth. 1998. The Art of Computer Programming, Volume 3: (2nd Ed.) Sorting and Searching.

[20]

Shao-Yuan Li, Yuan Jiang, Nitesh V Chawla, and Zhi-Hua Zhou. 2018. Multi-label Learning from Crowds. IEEE Transactions on Knowledge and Data Engineering 31, 7 (2018), 1369--1382.

[21]

Xinke Li, Chongshou Li, Zekun Tong, Andrew Lim, Junsong Yuan, Yuwei Wu, Jing Tang, and Raymond Huang. 2020. Campus3d: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene. In Proc. ACM MM. 238--246.

Digital Library

[22]

Yanying Li, Haipei Sun, and Wendy Hui Wang. 2020. Towards Fair Truth Discovery from Biased Crowdsourced Answers. In Proc. ACM SIGKDD. 599--607.

Digital Library

[23]

Yuanbing Li, Xian Wu, Yifei Jin, Jian Li, and Guoliang Li. 2020. Efficient Algorithms for Crowd-Aided Categorization. Proc. VLDB Endowment 13, 8 (2020), 1221--1233.

Digital Library

[24]

Fenglong Ma, Yaliang Li, Qi Li, Minghui Qiu, Jing Gao, Shi Zhi, Lu Su, Bo Zhao, Heng Ji, and Jiawei Han. 2015. Faitcrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation. In Proc. ACM SIGKDD. 745--754.

Digital Library

[25]

George A Miller. 1998. WordNet: An Electronic Lexical Database. MIT Press.

[26]

Kaixiang Mo, Erheng Zhong, and Qiang Yang. 2013. Cross-task Crowdsourcing. In Proc. ACM SIGKDD. 677--685.

[27]

Shay Mozes, Krzysztof Onak, and OrenWeimann. 2008. Finding an Optimal Tree Searching Strategy in Linear Time. In Proc. SODA. 1096--1105.

[28]

Robert Nowak. 2009. Noisy Generalized Binary Search. In Proc. NeurIPS. 1366--1374.

[29]

Krzysztof Onak and Pawel Parys. 2006. Generalization of Binary Search: Searching in Trees and Forest-like Partial Orders. In Proc. IEEE FOCS. 379--388.

Digital Library

[30]

Aditya Parameswaran, Stephen Boyd, Hector Garcia-Molina, Ashish Gupta, Neoklis Polyzotis, and Jennifer Widom. 2014. Optimal Crowd-Powered Rating and Filtering Algorithms. Proc. VLDB Endowment 7, 9 (2014), 685--696.

Digital Library

[31]

Aditya Parameswaran, Anish Das Sarma, Hector Garcia-Molina, Neoklis Polyzotis, and Jennifer Widom. 2011. Human-Assisted Graph Search: It's Okay to Ask Questions. Proc. VLDB Endowment 4, 5 (2011), 267--278.

Digital Library

[32]

Aditya G Parameswaran, Hector Garcia-Molina, Hyunjung Park, Neoklis Polyzotis, Aditya Ramesh, and Jennifer Widom. 2012. Crowdscreen: Algorithms for Filtering Data with Humans. In Proc. ACM SIGMOD. 361--372.

Digital Library

[33]

Hesam Salehian, Patrick Howell, and Chul Lee. 2017. Matching Restaurant Menus to Crowdsourced Food Data: A Scalable Machine Learning Approach. In Proc. ACM SIGKDD. 2001--2009.

Digital Library

[34]

Yuyin Sun, Adish Singla, Dieter Fox, and Andreas Krause. 2015. Building hierarchies of concepts via crowdsourcing. In Proc. IJCAI. 844--853.

[35]

Yufei Tao, Yuanbing Li, and Guoliang Li. 2019. Interactive Graph Search. In Proc. ACM SIGMOD. 1393--1410.

Digital Library

[36]

Petros Venetis, Hector Garcia-Molina, Kerui Huang, and Neoklis Polyzotis. 2012. Max Algorithms in Crowdsourcing Environments. In Proc. WWW. 989--998.

Digital Library

[37]

Vasilis Verroios and Hector Garcia-Molina. 2015. Entity Resolution with Crowd Rrrors. In Proc. IEEE ICDE. 219--230.

[38]

Norases Vesdapunt, Kedar Bellare, and Nilesh Dalvi. 2014. Crowdsourcing Algorithms for Entity Resolution. Proc. VLDB Endowment 7, 12 (2014), 1071--1082.

Digital Library

[39]

Abraham Wald. 1947. Sequential Analysis. Wiley.

[40]

JiannanWang, Tim Kraska, Michael J Franklin, and Jianhua Feng. 2012. CrowdER: Crowdsourcing Entity Resolution. Proc. VLDB Endowment 5, 11 (2012), 1483--1494.

Digital Library

[41]

YueWang, KeWang, and Chunyan Miao. 2020. Truth Discovery Against Strategic Sybil Attack in Crowdsourcing. In Proc. ACM SIGKDD. 95--104.

[42]

Steven Euijong Whang, Peter Lofgren, and Hector Garcia-Molina. 2013. Question Selection for Crowd Entity Resolution. Proc. VLDB Endowment 6, 6 (2013), 349--360.

Digital Library

[43]

Jacob Whitehill, Ting-fan Wu, Jacob Bergsma, Javier Movellan, and Paul Ruvolo. 2009. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. In Advances in NeurIPS, Vol. 22. 2035--2043.

[44]

Chen Jason Zhang, Lei Chen, Yongxin Tong, and Zheng Liu. 2015. Cleaning Uncertain Data with a Noisy Crowd. In Proc. IEEE ICDE. 6--17.

[45]

Jing Zhang and Xindong Wu. 2018. Multi-label Inference for Crowdsourcing. In Proc. ACM SIGKDD. 2738--2747.

Digital Library

[46]

Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, and Reynold Cheng. 2017. Truth Inference in Crowdsourcing: Is the Problem Solved? Proc. VLDB Endowment 10, 5 (2017), 541--552.

Digital Library

[47]

Xuliang Zhu, Xin Huang, Byron Choi, Jiaxin Jiang, Zhaonian Zou, and Jianliang Xu. 2021. Budget Constrained Interactive Search for Multiple Targets. Proc. VLDB Endowment 14, 6 (2021), 890--902.

Digital Library

[48]

Honglei Zhuang, Aditya Parameswaran, Dan Roth, and Jiawei Han. 2015. Debiasing Crowdsourced Batches. In Proc. ACM SIGKDD. 1593--1602.

Digital Library

Cited By

Zhao ZGan JQi JBao Z(2024)Efficient Example-Guided Interactive Graph Search2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00033(342-354)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00033

Index Terms

Noisy Interactive Graph Search
1. Information systems
  1. World Wide Web
    1. Web applications
      1. Crowdsourcing
2. Theory of computation
  1. Design and analysis of algorithms
    1. Graph algorithms analysis

Recommendations

Interactive Graph Search
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

We study \em interactive graph search (IGS), with the conceptual objective of departing from the conventional "top-down" strategy in searching a poly-hierarchy, a.k.a.\ a decision graph. In IGS, a machine assists a human in looking for a target node z ...
Ranked Reverse Nearest Neighbor Search

Given a set of data points P and a query point q in a multidimensional space, Reverse Nearest Neighbor (RNN) query finds data points in P whose nearest neighbors are q. Reverse k-Nearest Neighbor (RkNN) query (where k ≥ 1) generalizes RNN query to find ...
Distributed Query Plan Generation using Cuckoo Search Algorithm

Query processing in distributed databases involves data transmission amongst sites capable of providing answers to a distributed query. For this, a distributed query processing strategy, which generates efficient query processing plans for a given ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 2022

5033 pages

ISBN:9781450393850

DOI:10.1145/3534678

General Chairs:
Aidong Zhang
University of Virginia
,
Huzefa Rangwala
Amazon/George Mason University

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 August 2022

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Hong Kong RGC RIF
HKUST-NAVER/LINE AI Lab
National Natural Science Foundation of China
Alibaba Group
Hong Kong RGC CRF
Hong Kong RGC Theme-based project
Hong Kong RGC AOE
China NSFC
Guangdong Basic and Applied Basic Research Foundation
National Key Research and Development Program of China
Microsoft Research Asia Collaborative Research Grant
Hong Kong RGC GRF
Didi-HKUST joint research lab
The Hong Kong University of Science and Technology (Guangzhou)
Hong Kong ITC ITF
HKUST-Webank joint research lab grants

Conference

KDD '22

Sponsor:

KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 14 - 18, 2022

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
387
Total Downloads

Downloads (Last 12 months)68
Downloads (Last 6 weeks)3

Reflects downloads up to 02 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhao ZGan JQi JBao Z(2024)Efficient Example-Guided Interactive Graph Search2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00033(342-354)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00033

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents