Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3452021.3458315acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Tuple-Independent Representations of Infinite Probabilistic Databases

Published: 20 June 2021 Publication History

Abstract

Probabilistic databases (PDBs) are probability spaces over database instances. They provide a framework for handling uncertainty in databases, as occurs due to data integration, noisy data, data from unreliable sources or randomized processes. Most of the existing theory literature investigated finite, tuple-independent PDBs (TI-PDBs) where the occurrences of tuples are independent events. Only recently, Grohe and Lindner (PODS '19) introduced independence assumptions for PDBs beyond the finite domain assumption. In the finite, a major argument for discussing the theoretical properties of TI-PDBs is that they can be used to represent any finite PDB via views. This is no longer the case once the number of tuples is countably infinite. In this paper, we systematically study the representability of infinite PDBs in terms of TI-PDBs and the related block-independent disjoint PDBs.
The central question is which infinite PDBs are representable as first-order views over tuple-independent PDBs. We give a necessary condition for the representability of PDBs and provide a sufficient criterion for representability in terms of the probability distribution of a PDB. With various examples, we explore the limits of our criteria. We show that conditioning on first order properties yields no additional power in terms of expressivity. Finally, we discuss the relation between purely logical and arithmetic reasons for (non-)representability.

Supplementary Material

MP4 File (teaser_tiroipdb_pods2021.mp4)
This is a short teaser on the paper "Tuple-Independent Representations of Infinite Probabilistic Databases". We only introduce the basic setting, provide a glimpse at our results, and hint at some of the intricacies of our work.

References

[1]
Serge Abiteboul, T.-H. Hubert Chan, Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Capturing Continuous Data and Answering Aggregate Queries in Probabilistic XML. ACM Transactions on Database Systems, 36(4), 2011. href https://doi.org/10.1145/2043652.2043658 pathdoi:10.1145/2043652.2043658 .
[2]
Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases . Addison-Wesley, Boston, MA, USA, 1st edition, 1995.
[3]
Parag Agrawal and Jennifer Widom. Continuous Uncertainty in Trio. In Proceedings of the Third VLDB workshop on Management of Uncertain Data (MUD2009) in conjunction with VLDB 2009, Lyon, France, August 28th, 2009, volume WP09--14 of CTIT Workshop Proceedings Series, pages 17--32. Centre for Telematics and Information Technology (CTIT), University of Twente, The Netherlands, 2009.
[4]
Antoine Amarilli, Pierre Bourhis, and Pierre Senellart. Provenance Circuits for Trees and Treelike Instances. In Automata, Languages, and Programming (ICALP 2015), volume 9135 of Lecture Notes in Computer Science, pages 56--68, Berlin, Heidelberg, 2015. Springer. href https://doi.org/10.1007/978--3--662--47666--6_5 pathdoi:10.1007/978--3--662--47666--6_5 .
[5]
Antoine Amarilli and .Ismail .Ilkan Ceylan. A Dichotomy for Homomorphism-Closed Queries on Probabilistic Graphs. In 23rd International Conference on Database Theory (ICDT 2020), volume 155 of Leibniz International Proceedings in Informatics (LIPIcs), pages 5:1--5:20, Dagstuhl, Germany, 2020. Schloss Dagstuhl--Leibniz-Zentrum für Informatik. href https://doi.org/10.4230/LIPIcs.ICDT.2020.5 pathdoi:10.4230/LIPIcs.ICDT.2020.5 .
[6]
Lyublena Antova, Christoph Koch, and Dan Olteanu. $smash10^(10^6) $ Worlds and Beyond: Efficient Representation and Processing of Incomplete Information. The VLDB Journal, 18(5):1021, 2009. href https://doi.org/10.1007/s00778-009-0149-y pathdoi:10.1007/s00778-009-0149-y .
[7]
Vaishak Belle. Open-Universe Weighted Model Counting. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017), pages 3701--3708. AAAI Press, 2017.
[8]
Michael Benedikt, Evgeny Kharlamov, Dan Olteanu, and Pierre Senellart. Probabilistic XML via Markov Chains. Proceedings of the VLDB Endowment, 3(1--2):770--781, 2010. href https://doi.org/10.14778/1920841.1920939 pathdoi:10.14778/1920841.1920939 .
[9]
Omar Benjelloun, Anish Das Sarma, Alon Halevy, and Jennifer Widom. ULDBs: Databases with Uncertainty and Lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), pages 953--964. VLDB Endowment, 2006.
[10]
Stefan Borgwardt, .Ismail .Ilkan Ceylan, and Thomas Lukasiewicz. Ontology-Mediated Queries for Probabilistic Databases. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (AAAI 2017), pages 1063--1069. AAAI Press, 2017.
[11]
Zhuhua Cai, Zografoula Vagena, Luis Perez, Subramanian Arumugam, Peter J. Haas, and Christopher Jermaine. Simulation of Database-Valued Markov Chains Using SimSQL. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD 2013), pages 637--648, New York, NY, USA, 2013. Association for Computing Machinery. href https://doi.org/10.1145/2463676.2465283 pathdoi:10.1145/2463676.2465283 .
[12]
Nofar Carmeli, Martin Grohe, Peter Lindner, and Christoph Standke. Tuple-Independent Representations of Infinite Probabilistic Databases. arXiv: 2008.09511 [cs.DB], 2020. URL: https://arxiv.org/abs/2008.09511.
[13]
.Ismail .Ilkan Ceylan, Adnan Darwiche, and Guy Van den Broeck. Open-World Probabilistic Databases. In Proceedings of the Fifteenth International Conference on Principles of Knowledge Representation and Reasoning (KR 2016), pages 339--348, Palo Alto, CA, USA, 2016. AAAI Press.
[14]
Marco Console, Matthias Hofer, and Leonid Libkin. Queries with Arithmetic on Incomplete Databases. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2020), pages 179--189, New York, NY, USA, 2020. Association for Computing Machinery. href https://doi.org/10.1145/3375395.3387666 pathdoi:10.1145/3375395.3387666 .
[15]
Nilesh Dalvi, Christopher Ré, and Dan Suciu. Probabilistic Databases: Diamonds in the Dirt. Communications of the ACM, 52(7):86--94, 2009. An unpublished, extended version is available at https://homes.cs.washington.edu/ suciu/cacm-paper.pdf. href https://doi.org/10.1145/1538788.1538810 pathdoi:10.1145/1538788.1538810 .
[16]
Nilesh Dalvi, Christopher Re, and Dan Suciu. Queries and Materialized Views on Probabilistic Databases. Journal of Computer and System Sciences, 77(3):473--490, 2011. href https://doi.org/10.1016/j.jcss.2010.04.006 pathdoi:10.1016/j.jcss.2010.04.006 .
[17]
Nilesh Dalvi and Dan Suciu. Management of Probabilistic Data: Foundations and Challenges. In Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2007), pages 1--12, New York, NY, USA, 2007. Association for Computing Machinery. href https://doi.org/10.1145/1265530.1265531 pathdoi:10.1145/1265530.1265531 .
[18]
Nilesh Dalvi and Dan Suciu. The Dichotomy of Probabilistic Inference for Unions of Conjunctive Queries. Journal of the ACM, 59(6), 2012. href https://doi.org/10.1145/2395116.2395119 pathdoi:10.1145/2395116.2395119 .
[19]
Anish Das Sarma, Omar Benjelloun, Alon Halevy, and Jennifer Widom. Working Models for Uncertain Data. In 22nd International Conference on Data Engineering (ICDE 2006), volume 1, pages 7:1--7:12, Los Alamitos, CA, USA, 2006. IEEE Computer Society. href https://doi.org/10.1109/ICDE.2006.174 pathdoi:10.1109/ICDE.2006.174 .
[20]
Amol Deshpande, Carlos Guestrin, Samuel R. Madden, Joseph M. Hellerstein, and Wei Hong. Model-Driven Data Acquisition in Sensor Networks. In Proceedings of the 30th VLDB Conference (VLDB 2004), pages 588--599. Morgan Kaufmann, 2004. href https://doi.org/10.1016/B978-012088469--8.50053-X pathdoi:10.1016/B978-012088469--8.50053-X .
[21]
Anton Faradjian, Johannes Gehrke, and Philippe Bonnet. GADT: A Probability Space ADT for Representing and Querying the Physical World. In Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), pages 201--211. IEEE Computer Society, 2002. href https://doi.org/10.1109/ICDE.2002.994710 pathdoi:10.1109/ICDE.2002.994710 .
[22]
Robert Fink and Dan Olteanu. Dichotomies for Queries with Negation in Probabilistic Databases. ACM Transactions on Database Systems, 41(1), 2016. href https://doi.org/10.1145/2877203 pathdoi:10.1145/2877203 .
[23]
Tal Friedman and Guy Van den Broeck. On Constrained Open-World Probabilistic Databases. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, pages 5722--5729. International Joint Conferences on Artificial Intelligence Organization, 2019. href https://doi.org/10.24963/ijcai.2019/793 pathdoi:10.24963/ijcai.2019/793 .
[24]
Todd J. Green and Val Tannen. Models for Incomplete and Probabilistic Information. In Current Trends in Database Technology -- EDBT 2006, Lecture Notes in Computer Science, pages 278--296. Springer, 2006. href https://doi.org/10.1007/11896548_24 pathdoi:10.1007/11896548_24 .
[25]
Eric Gribkoff, Dan Suciu, and Guy Van den Broeck. Lifted Probabilistic Inference: A Guide for the Database Researcher. Bulletin of the Technical Committee on Data Engineering, 37(3):6--17, 2014.
[26]
Eric Gribkoff, Guy Van den Broeck, and Dan Suciu. The Most Probable Database Problem. In Proceedings of the First international workshop on Big Uncertain Data (BUDA 2014), pages 1--7, 2014.
[27]
Martin Grohe and Peter Lindner. Probabilistic Databases with an Infinite Open-World Assumption. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2019), pages 17--31, New York, NY, USA, 2019. ACM. Extended version available on arXiv e-prints: hrefhttps://arxiv.org/abs/1807.00607arXiv:1807.00607 [cs.DB]. href https://doi.org/10.1145/3294052.3319681 pathdoi:10.1145/3294052.3319681 .
[28]
Martin Grohe and Peter Lindner. Generalized Independence Assumptions in Probabilistic Databases, 2020. To appear.
[29]
Martin Grohe and Peter Lindner. Infinite Probabilistic Databases. In 23rd International Conference on Database Theory (ICDT 2020), volume 155 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1--16:20, Dagstuhl, Germany, 2020. Schloss Dagstuhl--Leibniz-Zentrum für Informatik. href https://doi.org/10.4230/LIPIcs.ICDT.2020.16 pathdoi:10.4230/LIPIcs.ICDT.2020.16 .
[30]
Bernd Gutmann, Manfred Jaeger, and Luc De Raedt. Extending ProbLog with Continuous Distributions. In Inductive Logic Programming (ILP 2010), Lecture Notes in Computer Science, pages 76--91, Berlin, Heidelberg, Germany, 2011. Springer. href https://doi.org/10.1007/978--3--642--21295--6_12 pathdoi:10.1007/978--3--642--21295--6_12 .
[31]
Tomasz Imieli'nski and Witold Lipski. Incomplete Information in Relational Databases. Journal of the ACM, 31(4):761--791, 1984. href https://doi.org/10.1145/1634.1886 pathdoi:10.1145/1634.1886 .
[32]
Ravi Jampani, Fei Xu, Mingxi Wu, Luis Perez, Chris Jermaine, and Peter J. Haas. The Monte Carlo Database System: Stochastic Analysis Close to the Data. ACM Transactions on Database Systems, 36(3), 2011. href https://doi.org/10.1145/2000824.2000828 pathdoi:10.1145/2000824.2000828 .
[33]
Abhay Jha and Dan Suciu. Probabilistic Databases with MarkoViews. Proceedings of the VLDB Endowment, 5(11):1160--1171, 2012. href https://doi.org/10.14778/2350229.2350236 pathdoi:10.14778/2350229.2350236 .
[34]
Jean Christoph Jung and Carsten Lutz. Ontology-Based Access to Probabilistic Data with OWL QL. In The Semantic Web -- ISWC 2012, Lecture Notes in Computer Science, pages 182--197, Berlin, Heidelberg, Germany, 2012. Springer. href https://doi.org/10.1007/978--3--642--35176--1_12 pathdoi:10.1007/978--3--642--35176--1_12 .
[35]
Oliver Kennedy and Christoph Koch. PIP: A Database System for Great and Small Expectations. In 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pages 157--168. IEEE, 2010. href https://doi.org/10.1109/ICDE.2010.5447879 pathdoi:10.1109/ICDE.2010.5447879 .
[36]
Benny Kimelfeld and Pierre Senellart. Probabilistic XML: Models and Complexity. In Advances in Probabilistic Databases for Uncertain Information Management, volume 304 of Studies in Fuzziness and Soft Computing, pages 39--66. Springer, 2013. href https://doi.org/10.1007/978--3--642--37509--5_3 pathdoi:10.1007/978--3--642--37509--5_3 .
[37]
Christoph Koch and Dan Olteanu. Conditioning Probabilistic Databases. Proceedings of the VLDB Endowment, 1(1):313--325, 2008. href https://doi.org/10.14778/1453856.1453894 pathdoi:10.14778/1453856.1453894 .
[38]
Daphne Koller and Nir Friedman. Probabilistic Graphical Models--Principles and Techniques. MIT Press, 2009.
[39]
Leonid Libkin. Elements of Finite Model Theory . Texts in Theoretical Computer Science (TTCS). Springer-Verlag Berlin Heidelberg, Berlin and Heidelberg, Germany, 2004. href https://doi.org/https://doi.org/10.1007/978--3--662-07003--1 pathdoi:https://doi.org/10.1007/978--3--662-07003--1 .
[40]
Leonid Libkin. Certain Answers Meet Zero-One Laws. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2018), pages 195--207, New York, NY, USA, 2018. Association for Computing Machinery. href https://doi.org/10.1145/3196959.3196983 pathdoi:10.1145/3196959.3196983 .
[41]
Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), pages 1352--1359, San Francisco, CA, USA, 2005. Morgan Kaufmann Publishers Inc.
[42]
Dan Olteanu, Jiewen Huang, and Christoph Koch. SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases. In Proceedings of the 25th International Conference on Data Engineering, (ICDE 2009), pages 640--651. IEEE Computer Society, 2009. href https://doi.org/10.1109/ICDE.2009.123 pathdoi:10.1109/ICDE.2009.123 .
[43]
David Poole. Probabilistic Horn Abduction and Bayesian Networks. Artificial Intelligence, 64(1):81--129, 1993. href https://doi.org/https://doi.org/10.1016/0004--3702(93)90061-F pathdoi:https://doi.org/10.1016/0004--3702(93)90061-F .
[44]
Christopher Ré. Applications of Probabilistic Constraints. Technical Report TR2007-03-03, University of Washington, Seattle, WA, USA, 2007.
[45]
Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. A Formal Framework for Probabilistic Unclean Databases. In 22nd International Conference on Database Theory (ICDT 2019), volume 127 of Leibniz International Proceedings in Informatics (LIPIcs), pages 6:1--6:18, Dagstuhl, Germany, 2019. Schloss Dagstuhl--Leibniz-Zentrum für Informatik. href https://doi.org/10.4230/LIPIcs.ICDT.2019.6 pathdoi:10.4230/LIPIcs.ICDT.2019.6 .
[46]
Sumit Sarkar and Debabrata Dey. Relational Models and Algebra for Uncertain Data. In Managing and Mining Uncertain Data, chapter 3. Springer, Boston, MA, USA, 2009. href https://doi.org/10.1007/978-0--387-09690--2_3 pathdoi:10.1007/978-0--387-09690--2_3 .
[47]
Prithviraj Sen, Amol Deshpande, and Lise Getoor. PrDB: Managing and Exploiting Rich Correlations in Probabilistic Databases. The VLDB Journal, 18(5):1065--1090, 2009. href https://doi.org/10.1007/s00778-009-0153--2 pathdoi:10.1007/s00778-009-0153--2 .
[48]
Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne Hambrusch, and Rahul Shah. Orion 2.0: Native Support for Uncertain Data. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD 2008), pages 1239--1242, New York, NY, USA, 2008. Association for Computing Machinery. href https://doi.org/10.1145/1376616.1376744 pathdoi:10.1145/1376616.1376744 .
[49]
Sarvjeet Singh, Chris Mayfield, Rahul Shah, Sunil Prabhakar, Susanne E. Hambrusch, Jennifer Neville, and Reynold Cheng. Database Support for Probabilistic Attributes and Tuples. In Proceedings of the 24th International Conference on Data Engineering (ICDE 2008), pages 1053--1061, 2008. href https://doi.org/10.1109/ICDE.2008.4497514 pathdoi:10.1109/ICDE.2008.4497514 .
[50]
Parag Singla and Pedro Domingos. Markov Logic in Infinite Domains. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI 2007), pages 368--375, Arlington, Virginia, USA, 2007. AUAI Press.
[51]
Dan Suciu. Probabilistic Databases for All. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2020), pages 19--31, New York, NY, USA, 2020. Association for Computing Machinery. href https://doi.org/10.1145/3375395.3389129 pathdoi:10.1145/3375395.3389129 .
[52]
Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases . Synthesis Lectures on Data Management. Morgan & Claypool, San Rafael, CA, USA, 1st edition, 2011. href https://doi.org/10.2200/S00362ED1V01Y201105DTM016 pathdoi:10.2200/S00362ED1V01Y201105DTM016 .
[53]
Guy Van den Broeck and Dan Suciu. Query Processing on Probabilistic Data: A Survey. Foundations and Trends® in Databases, 7(3--4):197--341, 2017. href https://doi.org/10.1561/1900000052 pathdoi:10.1561/1900000052 .
[54]
Ron van der Meyden. Logical Approaches to Incomplete Information: A Survey. In Logics for Databases and Information Systems, The Springer International Series in Engineering and Computer Science, pages 307--356. Springer US, Boston, MA, USA, 1998. href https://doi.org/10.1007/978--1--4615--5643--5_10 pathdoi:10.1007/978--1--4615--5643--5_10 .
[55]
Wenjie Zhang, Xuemin Lin, Jian Pei, and Ying Zhang. Managing Uncertain Data: Probabilistic Approaches. In The Ninth International Conference on Web-Age Information Management, (WAIM 2008), pages 405--412. IEEE Computer Society, 2008. href https://doi.org/10.1109/WAIM.2008.42 pathdoi:10.1109/WAIM.2008.42 .

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
June 2021
440 pages
ISBN:9781450383813
DOI:10.1145/3452021
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. block-independent disjoint probabilistic databases
  2. first-order logic
  3. infinite probabilistic databases
  4. probabilistic databases
  5. tuple-independence
  6. views

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)1
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media