research-article

On generating near-optimal tableaux for conditional functional dependencies

Authors:

Howard Karloff,

Divesh Srivastava,

Bei YuAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 1, Issue 1

Pages 376 - 390

https://doi.org/10.14778/1453856.1453900

Published: 01 August 2008 Publication History

Abstract

Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify data inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has not been prior work on generating pattern tableaux, which is critical to realize the full potential of CFDs.

This paper is the first to formally characterize a "good" pattern tableau, based on naturally desirable properties of support, confidence and parsimony. We show that the problem of generating an optimal tableau for a given FD is NP-complete but can be approximated in polynomial time via a greedy algorithm. For large data sets, we propose an "on-demand" algorithm providing the same approximation bound, that outperforms the basic greedy algorithm in running time by an order of magnitude. For ordered attributes, we propose the range tableau as a generalization of a pattern tableau, which can achieve even more parsimony. The effectiveness and efficiency of our techniques are experimentally demonstrated on real data.

References

[1]

D. Agarwal, D. Barman, D. Gunopulos, N. Young, F. Korn, and D. Srivastava. Efficient and effective explanation of change in hierarchical summaries. In KDD, pp 6--15, 2007.

Digital Library

[2]

R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pp 207--216, 1993.

Digital Library

[3]

P. Berman and M. Karpinski. On Some Tighter Inapproximability Results (Extended Abstract). In ICALP, pp. 200--209, 1999.

Digital Library

[4]

P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, 2007.

[5]

L. Bravo, W. Fan, and S. Ma. Extending dependencies with conditions. In VLDB, 2007.

Digital Library

[6]

P. Brown, P. Haas. BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. In VLDB, 2003.

Digital Library

[7]

T. Calders, R. Ng, and J. Wijsen. Searching for dependencies at multiple abstraction levels. TODS, 27(3):229--260, 2002.

Digital Library

[8]

S. Ceri, F. D. Guinta, and P. Lanzi. Mining constraint violations. TODS, 32(1), 2007.

Digital Library

[9]

G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. Improving data quality: consistency and accuracy. In VLDB, 2007.

Digital Library

[10]

M. Dalkilic and E. Robertson. Information dependencies. In PODS, pp 245--253, 2000.

Digital Library

[11]

R. Gandhi, S. Khuller, and A. Srinivasan. Approximation algorithms for partial covering problems. J. Algorithms, 53(1):55--84, 2004.

Digital Library

[12]

M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP---Completeness. Freeman, San Francisco, 1979.

Digital Library

[13]

C. Giannella and E. Robertson. On approximation measures for functional dependencies. Information Systems, 29(6):483--507, 2004.

Digital Library

[14]

Y. Huhtala, J. Karkkainen, P. Porkka, and H. Toivonen. Tane: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2):100--111, 1999.

[15]

I. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. Cords: Automatic discovery of correlations and soft functional dependencies. In SIGMOD, pp 647--658, 2004.

Digital Library

[16]

M. J. Kearns and U. V. Vazirani. An introduction to computational learning theory. MIT Press, 1994.

Digital Library

[17]

R. King and J. Legendre. Discovery of functional and approximate functional dependencies in relational databases. Journal of Applied Mathematics and Decision Sciences, 7(1):49--59, 2003.

[18]

J. Kivinen and H. Mannila. Approximate inference of functional dependencies from relations. Theor. Comput. Sci., 149(1):129--149, 1995.

Digital Library

[19]

U. Nambiar and S. Kambhampati. Mining approximate functional dependencies and concept similarities to answer imprecise queries. In WebDB, 2004.

Digital Library

[20]

S. Sarawagi. Explaining differences in multidimensional aggregates. In VLDB, pp 42--53, 1999.

Digital Library

[21]

R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD Rec., 25(2):1--12, 1996.

Digital Library

[22]

Y. Xu and R. Motwani. Random sampling based algorithms for efficient semi-key discovery. Stanford University Technical Report.

Cited By

Han ZChen WHan Y(2025)Data Quality-Driven Top-k Rule DiscoveryProceedings of the 3rd International Conference on Machine Learning, Cloud Computing and Intelligent Mining (MLCCIM2024)10.1007/978-981-96-1698-5_3(24-38)Online publication date: 6-Feb-2025
https://doi.org/10.1007/978-981-96-1698-5_3
Fan WPang KLu PTian C(2024)Making It Tractable to Detect and Correct Errors in GraphsACM Transactions on Database Systems10.1145/370231549:4(1-75)Online publication date: 16-Dec-2024
https://dl.acm.org/doi/10.1145/3702315
Fan WHan ZXie MZhang G(2024)Discovering Top-k Relevant and Diversified RulesProceedings of the ACM on Management of Data10.1145/36771312:4(1-28)Online publication date: 30-Sep-2024
https://dl.acm.org/doi/10.1145/3677131
Show More Cited By

Index Terms

On generating near-optimal tableaux for conditional functional dependencies

Recommendations

HyperS tableaux - heuristic hyper tableaux

Several syntactic methods have been constructed to automate theorem proving in first-order logic. The positive (negative) hyper-resolution and the clause tableaux were combined in a single calculus called hyper tableaux in [1]. In this paper we propose ...
Optimal Tableaux for Conditional Logics with Cautious Monotonicity
Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence

Conditional logics capture default entailment in a modal framework in which non-monotonic implication is a first-class citizen, and in particular can be negated and nested. There is a wide range of axiomatizations of conditionals in the literature, from ...
Sectional and Conditional Functional Dependencies
WASA 2014: Proceedings of the 9th International Conference on Wireless Algorithms, Systems, and Applications - Volume 8491

This paper proposes the concept and applications of the sectional and conditional functional dependency (SCFD), which is an important extension of the conditional functional dependency (CFD) and the functional dependency (FD). SCFDs describe ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 1, Issue 1

August 2008

1216 pages

ISSN:2150-8097

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 August 2008

Published in PVLDB Volume 1, Issue 1

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

116
Total Citations
View Citations
500
Total Downloads

Downloads (Last 12 months)14
Downloads (Last 6 weeks)1

Reflects downloads up to 13 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Han ZChen WHan Y(2025)Data Quality-Driven Top-k Rule DiscoveryProceedings of the 3rd International Conference on Machine Learning, Cloud Computing and Intelligent Mining (MLCCIM2024)10.1007/978-981-96-1698-5_3(24-38)Online publication date: 6-Feb-2025
https://doi.org/10.1007/978-981-96-1698-5_3
Fan WPang KLu PTian C(2024)Making It Tractable to Detect and Correct Errors in GraphsACM Transactions on Database Systems10.1145/370231549:4(1-75)Online publication date: 16-Dec-2024
https://dl.acm.org/doi/10.1145/3702315
Fan WHan ZXie MZhang G(2024)Discovering Top-k Relevant and Diversified RulesProceedings of the ACM on Management of Data10.1145/36771312:4(1-28)Online publication date: 30-Sep-2024
https://dl.acm.org/doi/10.1145/3677131
Bao XBao ZBinbin BDuan QFan WLei HLi DLin WLiu PLv ZOuyang MTang SWang YWei QXie MZhang JZhang XZhao RZhou SBarcelo PSanchez-Pi NMeliou ASudarshan S(2024)Rock: Cleaning Data by Embedding ML in Logic RulesCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653372(106-119)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3653372
Breve BCaruccio LCirillo SDeufemia VPolese G(2024)Decentralized and Incremental Discovery of Relaxed Functional Dependencies Using Bitwise SimilarityIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.340392836:12(7380-7398)Online publication date: 1-Dec-2024
https://dl.acm.org/doi/10.1109/TKDE.2024.3403928
Caicedo AMejia ÉLuna E(2024)Revolutionizing protection dynamics in microgrids: Local validation environment and a novel global management control through multi-agent systemsComputers and Electrical Engineering10.1016/j.compeleceng.2024.109748120(109748)Online publication date: Dec-2024
https://doi.org/10.1016/j.compeleceng.2024.109748
Yi JHan DChen CShen XZong L(2024)ARDN: Attention Re-distribution Network for Visual Question AnsweringArabian Journal for Science and Engineering10.1007/s13369-024-09067-6Online publication date: 1-May-2024
https://doi.org/10.1007/s13369-024-09067-6
Fan WFu WJin RLiu MLu PTian C(2023)Making It Tractable to Catch Duplicates and Conflicts in GraphsProceedings of the ACM on Management of Data10.1145/35889401:1(1-28)Online publication date: 30-May-2023
https://dl.acm.org/doi/10.1145/3588940
Wang RLin YChen AWang YXu MChen PLiu BNi YZheng SYao K(2023)SD-INT: Towards Lightweight Network-Wide Passive INT in the Self-Driving Way2023 IEEE 31st International Conference on Network Protocols (ICNP)10.1109/ICNP59255.2023.10355635(1-12)Online publication date: 10-Oct-2023
https://doi.org/10.1109/ICNP59255.2023.10355635
Li JMoskovitch YJagadish H(2023)Detection of Groups with Biased Representation in Ranking2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00168(2167-2179)Online publication date: Apr-2023
https://doi.org/10.1109/ICDE55515.2023.00168
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents