Query-Guided Resolution in Uncertain Databases

Authors: Osnat Drien, Matanya Freiman, Antoine Amarilli, Yael Amsterdamer

Proceedings of the ACM on Management of Data, Volume 1, Issue 2

Article No.: 180, Pages 1 - 27

https://doi.org/10.1145/3589325

Published: 20 June 2023

Abstract

We present a novel framework for uncertain data management. We start with a database whose tuple correctness is uncertain and an oracle that can resolve the uncertainty, i.e., decide if a tuple is correct or not. Such an oracle may correspond, e.g., to a data expert or to a crowdsourcing platform. We wish to use the oracle to clean the database with the goal of ensuring the correct answer for specific mission-critical queries. To avoid the prohibitive cost of cleaning the entire database and to minimize the expected number of calls to the oracle, we must carefully select tuples whose resolution would suffice to resolve the uncertainty in query results. In other words, we need a query-guided process for the resolution of uncertain data.

We develop an end-to-end solution to this problem, based on the derivation of query answers and on correctness probabilities for the uncertain data. At a high level, we first track Boolean provenance to identify which input tuples contribute to the derivation of each output tuple, and in what ways. We then design an active learning solution for iteratively choosing tuples to resolve, based on the provenance structure and on an evolving estimation of tuple correctness probabilities. We conduct an extensive experimental study to validate our framework in different use cases.
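As a rough illustration of the query-guided idea, the sketch below resolves a single output tuple whose Boolean provenance is given as a monotone DNF formula (an OR over alternative derivations, each an AND over input tuples). It is not the algorithm of the paper: the selection rule is a simple greedy heuristic chosen for illustration, the probabilities are fixed rather than re-estimated between probes, and all names (`evaluate`, `resolve`, `prov`, `prob`, the tuple identifiers) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Assumed representation: monotone DNF provenance, i.e. a list of
# derivations, each a set of input-tuple identifiers that must all be correct.
Provenance = List[FrozenSet[str]]


def evaluate(prov: Provenance, known: Dict[str, bool]) -> Optional[bool]:
    """Return True/False once the output tuple is decided, else None."""
    undecided = False
    for clause in prov:
        if any(known.get(t) is False for t in clause):
            continue  # this derivation is refuted
        if all(known.get(t) is True for t in clause):
            return True  # one fully confirmed derivation suffices
        undecided = True
    return None if undecided else False


def resolve(
    prov: Provenance,
    prob: Dict[str, float],
    oracle: Callable[[str], bool],
) -> Tuple[bool, List[str]]:
    """Probe tuples until the output tuple's correctness is determined."""
    known: Dict[str, bool] = {}
    probes: List[str] = []

    def clause_prob(clause: FrozenSet[str]) -> float:
        # Probability that all still-unknown tuples of the clause are correct,
        # assuming independence (an assumption of this sketch).
        p = 1.0
        for t in clause:
            if t not in known:
                p *= prob[t]
        return p

    while (answer := evaluate(prov, known)) is None:
        # Keep only derivations that have not been refuted yet.
        live = [c for c in prov if not any(known.get(t) is False for t in c)]
        # Heuristic: focus on the most promising derivation, and within it
        # probe the least likely tuple first so a refutation is found early.
        best = max(live, key=clause_prob)
        candidate = min((t for t in best if t not in known), key=lambda t: prob[t])
        known[candidate] = oracle(candidate)
        probes.append(candidate)
    return answer, probes


# Hypothetical example: the output tuple is derived from (t1 AND t2) OR t3.
prov = [frozenset({"t1", "t2"}), frozenset({"t3"})]
prob = {"t1": 0.9, "t2": 0.8, "t3": 0.5}
truth = {"t1": True, "t2": True, "t3": False}

answer, probes = resolve(prov, prob, lambda t: truth[t])
print(answer, probes)  # prints: True ['t2', 't1']
```

In this example the output tuple is confirmed after two oracle calls and t3 is never probed, which is the kind of saving a query-guided process aims for compared with cleaning every tuple.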

Supplemental Material

MP4 File (167.41 MB)

We present a novel framework that provides an end-to-end solution for uncertain mission-critical data management. We start with a database whose tuple correctness is uncertain and an oracle (e.g., a data expert or a crowdsourcing platform) that can resolve the uncertainty, i.e., decide if a tuple is correct or not. We wish to use the oracle to clean the database with the goal of ensuring the correct answer for specific mission-critical queries. Our goal is to compute the precise ground-truth query result while performing a minimal number of sequential oracle probes.

PDF File (29.07 KB): Read me

ZIP File (156.24 KB): Source Code
