Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1807167.1807241acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Lineage processing over correlated probabilistic databases

Published: 06 June 2010 Publication History

Abstract

In this paper, we address the problem of scalably evaluating conjunctive queries over correlated probabilistic databases containing tuple or attribute uncertainties. Like previous work, we adopt a two-phase approach where we first compute lineages of the output tuples, and then compute the probabilities of the lineage formulas. However unlike previous work, we allow for arbitrary and complex correlations to be present in the data, captured via a forest of junction trees. We observe that evaluating even read-once (tree structured) lineages (e.g., those generated by hierarchical conjunctive queries), polynomially computable over tuple independent probabilistic databases, is #P-complete for lightly correlated probabilistic databases like Markov sequences. We characterize the complexity of exact computation of the probability of the lineage formula on a correlated database using a parameter called lwidth (analogous to the notion of treewidth). For lineages that result in low lwidth, we compute exact probabilities using a novel message passing algorithm, and for lineages that induce large lwidths, we develop approximate Monte Carlo algorithms to estimate the result probabilities. We scale our algorithms to very large correlated probabilistic databases using the previously proposed INDSEP data structure. To mitigate the complexity of lineage evaluation, we develop optimization techniques to process a batch of lineages by sharing computation across formulas, and to exploit any independence relationships that may exist in the data. Our experimental study illustrates the benefits of using our algorithms for processing lineage formulas over correlated probabilistic databases.

References

[1]
L. Antova, T. Jansen, C. Koch, and D. Olteanu. Fast and simple relational processing of uncertain data. In ICDE, 2008.
[2]
A. Chechetka and C. Guestrin. Efficient principled learning of thin junction trees. In NIPS, 2007.
[3]
R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In SIGMOD, 2003.
[4]
A. Choi and A. Darwiche. An edge deletion semantics for belief propagation and its practical impact on approximation quality. In AAAI, 2006.
[5]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. 2001.
[6]
R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhater. Probabilistic Networks and Expert Systems. Springer, 1999.
[7]
N. Dalvi et al. A web of concepts. In PODS, 2009.
[8]
N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004.
[9]
X. L. Dong, A. Y. Halevy, and C. Yu. Data integration with uncertainty. In VLDB, 2007.
[10]
J. Finn and J. Frank. Optimal junction trees. In UAI, 1994.
[11]
S. Geman and D. Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE PAMI, 1984.
[12]
M. C. Golumbic, A. Mintz, and U. Rotics. Factoring and recognition of read-once functions using cographs and normality. In DAC, 2001.
[13]
R. Gupta and S. Sarawagi. Creating probabilistic databases from information extraction models. In VLDB, 2006.
[14]
C. Huang and A. Darwiche. Inference in belief networks: A procedural guide. Int. J. Approx. Reasoning, 1996.
[15]
T. S. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar information extraction system. IEEE Data Eng. Bul l., 29(1), 2006.
[16]
B. Kanagal and A. Deshpande. Online filtering, smoothing and probabilistic modeling of streaming data. In ICDE, 2008.
[17]
B. Kanagal and A. Deshpande. Efficient query evaluation over temporally correlated probabilistic streams. In ICDE, 2009.
[18]
B. Kanagal and A. Deshpande. Indexing correlated probabilistic databases. In SIGMOD, 2009.
[19]
U. Kjærulff. Hugs: Combining exact inference and gibbs sampling in junction trees. In UAI, 1995.
[20]
J. Letchner, C. Re, M. Balazinska, and M. Philipose. Access methods for markovian streams. In ICDE, 2009.
[21]
E. Michelakis, R. Krishnamurthy, P. J. Haas, and S. Vaithyanathan. Uncertainty management in rule-based information extraction systems. In SIGMOD, 2009.
[22]
M. A. Paskin. Thin junction tree filters for simultaneous localization and mapping. In IJCAI, 2003.
[23]
J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[24]
C. Re, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. In SIGMOD, 2008.
[25]
C. Ré and D. Suciu. Approximate lineage for probabilistic databases. PVLDB, 2008.
[26]
N. Robertson and P. D. Seymour. Graph minors. iii. planar tree--width. Journal of Combinatorial Theory, Series B, 36(1), 1984.
[27]
A. D. Sarma, M. Theobald, and J. Widom. Exploiting lineage for confidence computation in uncertain and probabilistic databases. In ICDE, 2008.
[28]
P. Sen and A. Deshpande. Representing and querying correlated tuples in probabilistic databases. In ICDE, 2007.
[29]
P. Sen, A. Deshpande, and L. Getoor. Exploiting shared correlations in probabilistic databases. In VLDB, 2008.
[30]
The RFID Ecosystem, University of Washington. http://rfid.cs.washington.edu/.

Cited By

View all
  • (2024)The Generalized Causal-Effect Score in Data Management (short paper)Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI10.1145/3665601.3669843(32-35)Online publication date: 9-Jun-2024
  • (2018)Lifted Dynamic Junction Tree AlgorithmGraph-Based Representation and Reasoning10.1007/978-3-319-91379-7_5(55-69)Online publication date: 20-May-2018
  • (2018)Graphical Models for Uncertain Data ManagementEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_80741(1661-1668)Online publication date: 7-Dec-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN:9781450300322
DOI:10.1145/1807167
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2010

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. conjunctive queries
  2. indexing
  3. junction trees
  4. lineage
  5. probabilistic databases

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '10
Sponsor:
SIGMOD/PODS '10: International Conference on Management of Data
June 6 - 10, 2010
Indiana, Indianapolis, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)1
Reflects downloads up to 17 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)The Generalized Causal-Effect Score in Data Management (short paper)Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI10.1145/3665601.3669843(32-35)Online publication date: 9-Jun-2024
  • (2018)Lifted Dynamic Junction Tree AlgorithmGraph-Based Representation and Reasoning10.1007/978-3-319-91379-7_5(55-69)Online publication date: 20-May-2018
  • (2018)Graphical Models for Uncertain Data ManagementEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_80741(1661-1668)Online publication date: 7-Dec-2018
  • (2017)An Indexing Framework for Queries on Probabilistic GraphsACM Transactions on Database Systems10.1145/304471342:2(1-34)Online publication date: 10-May-2017
  • (2017)Causality and Responsibility Problem on Probabilistic Reverse Skyline QueriesPreference Query Analysis and Optimization10.1007/978-981-10-6635-1_2(9-29)Online publication date: 3-Nov-2017
  • (2017)Graphical Models for Uncertain Data ManagementEncyclopedia of Database Systems10.1007/978-1-4899-7993-3_80741-1(1-8)Online publication date: 19-Sep-2017
  • (2016)Finding Causality and Responsibility for Probabilistic Reverse Skyline Query Non-AnswersIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2016.259986928:11(2974-2987)Online publication date: 1-Nov-2016
  • (2015)Representing and processing lineages over uncertain data based on the Bayesian networkApplied Soft Computing10.1016/j.asoc.2015.07.04737:C(345-362)Online publication date: 1-Dec-2015
  • (2015)Threshold-Based Shortest Path Query over Large Correlated Uncertain GraphsJournal of Computer Science and Technology10.1007/s11390-015-1559-530:4(762-780)Online publication date: 8-Jul-2015
  • (2015)A Working Model for Uncertain Data with LineageConceptual Modeling10.1007/978-3-319-25264-3_27(369-383)Online publication date: 8-Dec-2015
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media