From Appearance to Essence: Comparing Truth Discovery Methods without Using Ground Truth

Published: 11 September 2020

Abstract

Truth discovery has been widely studied in recent years as a fundamental means of resolving conflicts in multi-source data. Although many truth discovery methods have been proposed based on different considerations and intuitions, investigations show that no single method consistently outperforms the others. To select the right truth discovery method for a specific application scenario, it is essential to evaluate and compare the performance of different methods. A drawback of current research efforts is that they commonly assume that ground truth is available for evaluating the methods. However, ground truth may be very limited or even impossible to obtain, which biases the evaluation. In this article, we present CompTruthHyp, a generic approach for comparing the performance of truth discovery methods without using ground truth. In particular, our approach calculates the probability of the observations in a dataset based on the output of each method. The methods are then ranked by these probabilities to reflect their performance. We review and compare 12 representative truth discovery methods and consider both single-valued and multi-valued objects. Empirical studies on both real-world and synthetic datasets demonstrate the effectiveness of our approach for comparing truth discovery methods.
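
The idea of ranking methods by the probability of the observations can be illustrated with a minimal sketch. This is not the CompTruthHyp formulation, which the abstract does not spell out. It assumes single-valued objects, a per-source accuracy estimated from agreement with each method's own output, and a uniform spread of error mass over a fixed number of false values. The names `log_likelihood`, `rank_methods`, `n_false`, and `smoothing` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def log_likelihood(claims, truths, n_false=10, smoothing=1.0):
    """Log-probability of the observed claims given one method's output.

    claims: list of (source, object, value) triples (the observations).
    truths: dict mapping object -> value the method predicts as true.
    n_false: assumed number of false values per object (hypothetical).
    smoothing: pseudo-count keeping accuracies strictly inside (0, 1).
    """
    # Estimate each source's accuracy as its smoothed rate of agreement
    # with the truths predicted by this method (an assumption of the sketch).
    agree = defaultdict(float)
    total = defaultdict(float)
    for source, obj, value in claims:
        total[source] += 1
        if truths.get(obj) == value:
            agree[source] += 1
    accuracy = {
        s: (agree[s] + smoothing) / (total[s] + 2 * smoothing) for s in total
    }

    # A claim matching the predicted truth has probability accuracy[source];
    # any other claim shares the remaining mass uniformly over n_false values.
    ll = 0.0
    for source, obj, value in claims:
        a = accuracy[source]
        p = a if truths.get(obj) == value else (1 - a) / n_false
        ll += math.log(p)
    return ll


def rank_methods(claims, outputs, **kwargs):
    """Return (method name, log-likelihood) pairs, most probable first."""
    scores = {
        name: log_likelihood(claims, truths, **kwargs)
        for name, truths in outputs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy observations: three sources, two objects.
claims = [
    ("s1", "o1", "A"), ("s2", "o1", "A"), ("s3", "o1", "B"),
    ("s1", "o2", "X"), ("s2", "o2", "X"), ("s3", "o2", "Y"),
]
# Outputs of two hypothetical methods to compare.
outputs = {
    "method_a": {"o1": "A", "o2": "X"},
    "method_b": {"o1": "B", "o2": "Y"},
}
ranking = rank_methods(claims, outputs)
```

In this toy setting, the output that agrees with most sources makes the observed claims more probable, so it ranks first. No ground truth is consulted at any point: only the observations and each method's output are used.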

Cited By

  • (2024) TVD-RA: A Truthful Data Value Discovery-Based Reverse Auction Incentive System for Mobile Crowdsensing. IEEE Internet of Things Journal 11:4 (5826-5839). DOI: 10.1109/JIOT.2023.3308072. Online publication date: 15-Feb-2024
  • (2023) Mapping Irrigated Areas in China Using a Synergy Approach. Water 15:9 (1666). DOI: 10.3390/w15091666. Online publication date: 25-Apr-2023
  • (2023) DLFTI: A deep learning based fast truth inference mechanism for distributed spatiotemporal data in mobile crowd sensing. Information Sciences 644 (119245). DOI: 10.1016/j.ins.2023.119245. Online publication date: Oct-2023

      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 11, Issue 6
      Survey Paper and Regular Paper
      December 2020
      237 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/3424135
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 11 September 2020
      Accepted: 01 July 2020
      Revised: 01 July 2020
      Received: 01 January 2020
      Published in TIST Volume 11, Issue 6

      Author Tags

      1. Web search
      2. multi-valued objects
      3. performance evaluation
      4. single-valued objects
      5. sparse ground truth
      6. truth discovery methods

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Australian Research Council (ARC)
      • Discovery Project

      Article Metrics

      • Downloads (Last 12 months): 22
      • Downloads (Last 6 weeks): 2
      Reflects downloads up to 27 Nov 2024
