research-article

Maximal fusion of facts on the web with credibility guarantee

Published: 01 August 2019

Highlights

A maximal number of factual claims with credibility higher than the precision requirement are extracted from the Web.
The learning model is up to 20 times faster than traditional learning.
The proposed model extracts up to 6 times more highly credible factual claims than a typical information extraction process.
The proposed model requires less than 57% of the label information to extract the same number of highly credible factual claims.
The proposed model is robust to 20% noisy data with only 6% deviation.

Abstract

The Web has become the central medium for valuable sources used in information fusion applications. However, such user-generated resources are often plagued by inaccuracies and misinformation as a result of the inherent openness and uncertainty of the Web. While finding objective data is non-trivial, assessing its credibility with high confidence is even harder because of conflicting information between Web sources. In this work, we consider the novel setting of fusing factual data from the Web with a credibility guarantee and maximal recall. The ultimate goal is not only to extract as much information as possible but also to ensure that its credibility satisfies a threshold requirement. To this end, we formulate the problem of instantiating a maximal set of factual information such that its precision is larger than a pre-defined threshold. Our proposed approach is a learning process that optimizes the parameters of a probabilistic model capturing the relationships between data sources, their contents, and the underlying factual information. The model automatically searches for the best parameters without pre-trained data. Upon convergence, the parameters are used to instantiate as much factual information as possible with a precision guarantee. Our evaluations on real-world datasets show that our approach outperforms the baselines by up to 6 times.
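
The selection problem stated in the abstract (a maximal set of facts whose precision exceeds a threshold) can be illustrated with a minimal sketch. It is not the authors' method: it assumes each claim already carries a calibrated credibility probability, whereas the paper learns these from a probabilistic model, and it approximates precision by the mean credibility of the selected set. The function name, the claim identifiers, and the threshold value are all hypothetical.

```python
from typing import Dict, List


def max_set_with_precision(credibility: Dict[str, float], threshold: float) -> List[str]:
    """Return the largest set of claims whose expected precision is >= threshold.

    Assumption (illustrative only): the expected precision of a set is the
    mean of its claims' credibility probabilities, taken as given inputs.
    """
    # After sorting by credibility (descending), the first k claims have the
    # highest possible mean among all sets of size k.
    ranked = sorted(credibility.items(), key=lambda kv: kv[1], reverse=True)

    best_size = 0
    running_sum = 0.0
    for k, (_claim, prob) in enumerate(ranked, start=1):
        running_sum += prob
        # Remember the largest prefix that still meets the requirement.
        if running_sum / k >= threshold:
            best_size = k

    return [claim for claim, _ in ranked[:best_size]]


# Hypothetical credibility scores for four claims.
scores = {"a": 0.95, "b": 0.90, "c": 0.60, "d": 0.30}
selected = max_set_with_precision(scores, threshold=0.8)
```

Because the prefix means of a descending sequence never increase, taking the longest qualifying prefix yields a maximum-size set under this simplified notion of precision. A stricter guarantee would need the model's uncertainty as well, which this sketch leaves out.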


        Published In

        Information Fusion  Volume 48, Issue C
        Aug 2019
        133 pages

        Publisher

        Elsevier Science Publishers B. V.

        Netherlands

        Author Tags

        1. Information fusion
        2. Knowledge extraction
        3. Precision guarantee
        4. Probabilistic model
        5. Credibility analysis

        Qualifiers

        • Research-article

        Cited By

        • (2023) Efficient Integration of Multi-Order Dynamics and Internal Dynamics in Stock Movement Prediction. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pp. 850-858. doi:10.1145/3539597.3570427. Online publication date: 27-Feb-2023
        • (2023) Learning Holistic Interactions in LBSNs With High-Order, Dynamic, and Multi-Role Contexts. IEEE Transactions on Knowledge and Data Engineering 35:5, pp. 5002-5016. doi:10.1109/TKDE.2022.3150792. Online publication date: 1-May-2023
        • (2022) Are Rumors Always False?: Understanding Rumors Across Domains, Queries, and Ratings. Advanced Data Mining and Applications, pp. 174-189. doi:10.1007/978-3-030-95405-5_13. Online publication date: 2-Feb-2022
        • (2019) Aggregation of uncertainty data based on ordered weighting aggregation and generalized information quality. International Journal of Intelligent Systems 34:7, pp. 1653-1666. doi:10.1002/int.22111. Online publication date: 19-Mar-2019
