research-article

Debugging Missing Answers for Spark Queries over Nested Data with Breadcrumb

Published: 01 July 2021

Abstract

We present Breadcrumb, a system that aids developers in debugging queries through query-based explanations for missing answers. Given as input a query and an expected, but missing, query result, Breadcrumb identifies operators in the input query that are responsible for the failure to derive the missing answer. These operators form explanations that guide developers, who can then focus their debugging efforts on fixing these parts of the query. Breadcrumb is implemented on top of Apache Spark. Our approach is the first that scales to big data dimensions and is capable of finding explanations for common errors in queries over nested and de-normalized data, e.g., errors based on misinterpreting schema semantics.
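
To make the idea of a query-based explanation concrete, here is a minimal Python sketch in the spirit of the "picky manipulation" notion from Chapman and Jagadish's Why-Not work (SIGMOD 2009). It is not Breadcrumb's algorithm, which runs on Spark and, per the abstract, also targets schema misinterpretations. The sketch assumes a query is a linear pipeline of operators over in-memory nested records, that every intermediate row carries the index of the input record it derives from, and that the missing answer is described by a `compatible` predicate on input records. All names (`Operator`, `flatten_op`, `picky_operator`, the sample data) are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Tagged = Tuple[int, Row]  # (index of the originating input record, current row)


@dataclass
class Operator:
    name: str
    apply: Callable[[List[Tagged]], List[Tagged]]


def filter_op(name: str, pred: Callable[[Row], bool]) -> Operator:
    # Keep rows satisfying pred; lineage tags pass through unchanged.
    return Operator(name, lambda rows: [(i, r) for i, r in rows if pred(r)])


def flatten_op(name: str, attr: str, out_attr: str) -> Operator:
    # Similar in spirit to Spark's explode: one output row per element of the nested list `attr`.
    def run(rows: List[Tagged]) -> List[Tagged]:
        out: List[Tagged] = []
        for i, r in rows:
            for elem in r.get(attr) or []:
                flat = {k: v for k, v in r.items() if k != attr}
                flat[out_attr] = elem
                out.append((i, flat))
        return out
    return Operator(name, run)


def project_op(name: str, attrs: List[str]) -> Operator:
    # Keep only the listed attributes; lineage tags pass through unchanged.
    return Operator(name, lambda rows: [(i, {a: r[a] for a in attrs}) for i, r in rows])


def picky_operator(data: List[Row], pipeline: List[Operator],
                   compatible: Callable[[Row], bool]) -> Optional[str]:
    """Name of the first operator after which no row derived from a compatible
    input record survives; None if no operator loses all compatible lineage
    (or if no input record is compatible at all)."""
    compatible_ids = {i for i, r in enumerate(data) if compatible(r)}
    if not compatible_ids:
        return None  # the data, not the query, lacks the answer
    rows: List[Tagged] = list(enumerate(data))
    for op in pipeline:
        rows = op.apply(rows)
        if not compatible_ids & {i for i, _ in rows}:
            return op.name
    return None


people = [
    {"name": "Ana", "addresses": [{"city": "Berlin", "year": 2018},
                                  {"city": "Paris", "year": 2020}]},
    {"name": "Bo", "addresses": [{"city": "Berlin", "year": 2020}]},
]
query = [
    flatten_op("flatten(addresses)", "addresses", "address"),
    filter_op("filter(city = Berlin)", lambda r: r["address"]["city"] == "Berlin"),
    filter_op("filter(year >= 2019)", lambda r: r["address"]["year"] >= 2019),
    project_op("project(name)", ["name"]),
]
# Why is "Ana" missing from the result?
print(picky_operator(people, query, lambda r: r["name"] == "Ana"))  # filter(year >= 2019)
```

Here the sketch reports `filter(year >= 2019)`: Ana's Berlin address survives the flatten and the city filter but is dropped by the year predicate, so a developer would inspect that operator first. Restricting the sketch to a linear pipeline keeps lineage bookkeeping trivial; joins, aggregation, and nested re-grouping would require richer provenance tracking.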

Cited By

• (2023) Why Not Yet: Fixing a Top-k Ranking that is Not Fair to Individuals. Proceedings of the VLDB Endowment 16, 9 (2023), 2377-2390. DOI: 10.14778/3598581.3598606. Online publication date: 10 Jul 2023.

Index Terms

  1. Debugging missing answers for spark queries over nested data with breadcrumb
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 12
July 2021
587 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Article Metrics

• Downloads (last 12 months): 6
• Downloads (last 6 weeks): 1
Reflects downloads up to 13 Nov 2024.