Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Conditioning probabilistic databases

Published: 01 August 2008 Publication History

Abstract

Past research on probabilistic databases has studied the problem of answering queries on a static database. Application scenarios of probabilistic databases however often involve the conditioning of a database using additional information in the form of new evidence. The conditioning problem is thus to transform a probabilistic database of priors into a posterior probabilistic database which is materialized for subsequent query processing or further refinement. It turns out that the conditioning problem is closely related to the problem of computing exact tuple confidence values.
It is known that exact confidence computation is an NP-hard problem. This has led researchers to consider approximation techniques for confidence computation. However, neither conditioning nor exact confidence computation can be solved using such techniques. In this paper we present efficient techniques for both problems. We study several problem decomposition methods and heuristics that are based on the most successful search techniques from constraint satisfaction, such as the Davis-Putnam algorithm. We complement this with a thorough experimental evaluation of the algorithms proposed. Our experiments show that our exact algorithms scale well to realistic database sizes and can in some scenarios compete with the most efficient previous approximation algorithms.

References

[1]
S. Abiteboul, P. Kanellakis, and G. Grahne. "On the Representation and Querying of Sets of Possible Worlds". Theor. Comput. Sci., 78(1): 158--187, 1991.
[2]
P. Andritsos, A. Fuxman, and R. J. Miller. "Clean Answers over Dirty Databases: A Probabilistic Approach". In Proc. ICDE, 2006.
[3]
L. Antova, T. Jansen, C. Koch, and D. Olteanu. "Fast and Simple Relational Processing of Uncertain Data". In Proc. ICDE, 2008.
[4]
L. Antova, C. Koch, and D. Olteanu. "10106 Worlds and Beyond: Efficient Representation and Processing of Incomplete Information". In Proc. ICDE, 2007.
[5]
O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. "ULDBs: Databases with Uncertainty and Lineage". In Proc. VLDB, 2006.
[6]
E. Birnbaum and E. Lozinskii. "The Good Old Davis-Putnam Procedure Helps Counting Models". Journal of AI Research, 10(6):457--477, 1999.
[7]
R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. Computers, 35(8):677--691, 1986.
[8]
P. Dagum, R. M. Karp, M. Luby, and S. M. Ross. "An Optimal Algorithm for Monte Carlo Estimation". SIAM J. Comput., 29(5):1484--1496, 2000.
[9]
N. Dalvi and D. Suciu. "Efficient query evaluation on probabilistic databases". VLDB Journal, 16(4):523--544, 2007.
[10]
N. Dalvi and D. Suciu. "Management of Probabilistic Data: Foundations and Challenges". In Proc. PODS, 2007.
[11]
A. Darwiche and P. Marquis. "A knowlege compilation map". Journal of AI Research, 17:229--264, 2002.
[12]
M. Davis and H. Putnam. "A Computing Procedure for Quantification Theory". Journal of ACM, 7(3):201--215, 1960.
[13]
N. Fuhr and T. Rölleke. "A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems". ACM Trans. Inf. Syst., 15(1):32--66, 1997.
[14]
E. Grädel, Y. Gurevich, and C. Hirsch. "The Complexity of Query Reliability". In Proc. PODS, pages 227--234, 1998.
[15]
J. Huang and D. Olteanu. Conjunctive queries with inequalities on probabilistic databases. Technical report, University of Oxford, 2008.
[16]
T. Imielinski and W. Lipski. "Incomplete information in relational databases". Journal of ACM, 31(4):761--791, 1984.
[17]
R. M. Karp and M. Luby. "Monte-Carlo Algorithms for Enumeration and Reliability Problems". In Proc. FOCS, pages 56--64, 1983.
[18]
R. M. Karp, M. Luby, and N. Madras. "Monte-Carlo Approximation Algorithms for Enumeration Problems". J. Algorithms, 10(3):429--448, 1989.
[19]
C. Koch. "Approximating Predicates and Expressive Queries on Probabilistic Databases". In Proc. PODS, 2008.
[20]
C. Meinel and T. Theobald. Algorithms and Data Structures in VLSI Design. Springer-Verlag, 1998.
[21]
C. Re, N. Dalvi, and D. Suciu. Efficient top-k query evaluation on probabilistic data. In Proc. ICDE, pages 886--895, 2007.
[22]
A. D. Sarma, M. Theobald, and J. Widom. "Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases". In Proc. ICDE, 2008.
[23]
P. Sen and A. Deshpande. "Representing and Querying Correlated Tuples in Probabilistic Databases". In Proc. ICDE, pages 596--605, 2007.
[24]
V. V. Vazirani. Approximation Algorithms. Springer, 2001.
[25]
M. Wachter and R. Haenni. "Multi-state Directed Acyclic Graphs". In Proc. Canadian AI, pages 464--475, 2007.

Cited By

View all
  • (2023)Query-Guided Resolution in Uncertain DatabasesProceedings of the ACM on Management of Data10.1145/35893251:2(1-27)Online publication date: 20-Jun-2023
  • (2021)Tuple-Independent Representations of Infinite Probabilistic DatabasesProceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3452021.3458315(388-401)Online publication date: 20-Jun-2021
  • (2018)On optimizing operator fusion plans for large-scale machine learning in systemMLProceedings of the VLDB Endowment10.14778/3229863.322986511:12(1755-1768)Online publication date: 1-Aug-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 1, Issue 1
August 2008
1216 pages

Publisher

VLDB Endowment

Publication History

Published: 01 August 2008
Published in PVLDB Volume 1, Issue 1

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Query-Guided Resolution in Uncertain DatabasesProceedings of the ACM on Management of Data10.1145/35893251:2(1-27)Online publication date: 20-Jun-2023
  • (2021)Tuple-Independent Representations of Infinite Probabilistic DatabasesProceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3452021.3458315(388-401)Online publication date: 20-Jun-2021
  • (2018)On optimizing operator fusion plans for large-scale machine learning in systemMLProceedings of the VLDB Endowment10.14778/3229863.322986511:12(1755-1768)Online publication date: 1-Aug-2018
  • (2018)Discovery of genuine functional dependencies from relational data with missing valuesProceedings of the VLDB Endowment10.14778/3204028.320403211:8(880-892)Online publication date: 1-Apr-2018
  • (2018)Learning From Query-AnswersACM Transactions on Database Systems10.1145/327750343:4(1-41)Online publication date: 8-Dec-2018
  • (2017)Using integrity constraints to guide the interpretation of RFID-trajectory dataSIGSPATIAL Special10.1145/3151123.31511289:2(28-35)Online publication date: 10-Oct-2017
  • (2017)Scaling Probabilistic Temporal Query EvaluationProceedings of the 2017 ACM on Conference on Information and Knowledge Management10.1145/3132847.3133038(697-706)Online publication date: 6-Nov-2017
  • (2017)Beta Probabilistic DatabasesProceedings of the 2017 ACM International Conference on Management of Data10.1145/3035918.3064026(573-586)Online publication date: 9-May-2017
  • (2016)Computing extensions' probabilities in probabilistic abstract argumentationProceedings of the Twenty-second European Conference on Artificial Intelligence10.3233/978-1-61499-672-9-1588(1588-1589)Online publication date: 29-Aug-2016
  • (2016)Exploiting Integrity Constraints for Cleaning Trajectories of RFID-Monitored ObjectsACM Transactions on Database Systems10.1145/293936841:4(1-52)Online publication date: 2-Nov-2016
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media