Article

Discovering approximate keys in XML data

Authors:

Jianfei Zhu

CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management

Pages 453 - 460

https://doi.org/10.1145/584792.584867

Published: 04 November 2002

Abstract

Keys are very important in many aspects of data management, such as guiding query formulation, query optimization, and indexing. We consider the situation where an XML document does not come with key definitions, and we are interested in using data mining techniques to obtain a representation of the keys holding in a document. In order to have a compact representation of the set of keys holding in a document, we define a partial order on the set of all key expressions. This order is based on an analysis of the properties of absolute and relative keys for XML. Given the existence of the partial order, only a reduced set of key expressions needs to be discovered. Due to the semistructured nature of XML documents, it turns out to be useful to consider keys that hold in "almost" the whole document, that is, keys that are violated only in a small part of the document. To this end, the support and confidence of a key expression are also defined, and the concept of approximate key expression is introduced. We give an efficient algorithm to mine a reduced set of approximate keys from an XML document.
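The abstract does not spell out how support and confidence are computed, so the following is only a sketch under stated assumptions, not the paper's algorithm (which additionally prunes candidates using the partial order). A key expression is encoded as (context, target, key paths) in the style of the relative keys of Buneman et al., with ElementTree paths standing in for full XPath. Support is assumed to be the fraction of target nodes that have exactly one value for every key path; confidence is assumed to be one minus the fraction of those nodes that would have to be removed to make key values unique within each context, an analogue of the g3 error measure of Kivinen and Mannila. The names `KeyExpression`, `key_value`, `measure` and `is_approximate_key` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyExpression:
    # Hypothetical encoding of a relative key (context, (target, {key paths})).
    # context:   ElementTree path from the root to each context node; "" means the
    #            root itself, which makes the key absolute.
    # target:    ElementTree path from a context node to its target nodes.
    # key_paths: paths from a target node; "@name" selects an attribute.
    context: str
    target: str
    key_paths: tuple


def key_value(node, path):
    """Value of one key path, or None if it is missing or multi-valued (assumption)."""
    if path.startswith("@"):
        return node.get(path[1:])
    matches = node.findall(path)
    if len(matches) != 1:
        return None
    return (matches[0].text or "").strip()


def measure(root, key):
    """(support, confidence) under the hypothetical definitions in the lead-in."""
    contexts = [root] if key.context == "" else root.findall(key.context)
    total = keyed = violators = 0
    for ctx in contexts:
        seen = Counter()  # key-value tuple -> number of targets carrying it
        for target in ctx.findall(key.target):
            total += 1
            values = tuple(key_value(target, p) for p in key.key_paths)
            if None in values:
                continue  # this target cannot be identified by the key
            keyed += 1
            seen[values] += 1
        # Removing all but one target per duplicated tuple is the minimal repair
        # that makes the key hold within this context.
        violators += sum(n - 1 for n in seen.values())
    support = keyed / total if total else 0.0
    confidence = 1 - violators / keyed if keyed else 0.0
    return support, confidence


def is_approximate_key(root, key, min_support, min_confidence):
    """True if the key clears both (user-chosen) thresholds."""
    support, confidence = measure(root, key)
    return support >= min_support and confidence >= min_confidence


doc = ET.fromstring(
    "<db>"
    '<book isbn="1"><title>A</title>'
    "<chapter><num>1</num></chapter><chapter><num>2</num></chapter></book>"
    '<book isbn="2"><title>B</title>'
    "<chapter><num>1</num></chapter><chapter><num>1</num></chapter></book>"
    '<book isbn="2"><title>C</title></book>'
    "</db>"
)

# Absolute key: books identified by @isbn across the whole document.
isbn_key = KeyExpression("", "book", ("@isbn",))
# Relative key: chapters identified by num within each book.
chapter_key = KeyExpression("book", "chapter", ("num",))

print(measure(doc, isbn_key))     # support 1.0, confidence about 0.67
print(measure(doc, chapter_key))  # support 1.0, confidence 0.75
```

In this toy document neither key holds exactly, but the relative chapter key is violated by only one of four chapters, so it passes thresholds such as 0.9 support and 0.7 confidence while the isbn key does not. Keeping support and confidence separate means a key that few targets can even be evaluated against is not mistaken for a reliable one.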



Index Terms

Discovering approximate keys in XML data
1. Information systems
  1. Information systems applications
    1. Data mining



Information & Contributors

Information

Published In


CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management

November 2002

704 pages

ISBN:1581134924

DOI:10.1145/584792

General Chair:
Charles Nicholas, University of Maryland Baltimore County

Program Chairs:
David Grossman, Illinois Institute of Technology
Konstantinos Kalpakis, University of Maryland Baltimore County
Sajda Qureshi, Erasmus University, Rotterdam
Han van Dissel, Erasmus University, Rotterdam
Len Seligman, The MITRE Corporation

Copyright © 2002 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 November 2002



Conference

CIKM02: Eleventh ACM International Conference on Information and Knowledge Management

November 4 - 9, 2002

McLean, Virginia, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%



Bibliometrics & Citations

Article Metrics

Total Citations: 37
Total Downloads: 636
Downloads (last 12 months): 2
Downloads (last 6 weeks): 1

Reflects downloads up to 20 Nov 2024


Citations

Cited By

Ding J, Nathan V, Alizadeh M, Kraska T (2020). Tsunami. Proceedings of the VLDB Endowment 14(2):74-86. DOI 10.14778/3425879.3425880. Online publication date: 16-Nov-2020.
https://dl.acm.org/doi/10.14778/3425879.3425880
Huang X, Lakshmanan L (2017). Attribute-driven community search. Proceedings of the VLDB Endowment 10(9):949-960. DOI 10.14778/3099622.3099626. Online publication date: 1-May-2017.
https://dl.acm.org/doi/10.14778/3099622.3099626
Jiang M, Fu A, Wong R (2017). READS. Proceedings of the VLDB Endowment 10(9):937-948. DOI 10.14778/3099622.3099625. Online publication date: 1-May-2017.
https://dl.acm.org/doi/10.14778/3099622.3099625
Wang X, Qin L, Lin X, Zhang Y, Chang L (2017). Leveraging set relations in exact set similarity join. Proceedings of the VLDB Endowment 10(9):925-936. DOI 10.14778/3099622.3099624. Online publication date: 1-May-2017.
https://dl.acm.org/doi/10.14778/3099622.3099624
Huang K, Wang S, Bevilacqua G, Xiao X, Lakshmanan L (2017). Revisiting the stop-and-stare algorithms for influence maximization. Proceedings of the VLDB Endowment 10(9):913-924. DOI 10.14778/3099622.3099623. Online publication date: 1-May-2017.
https://dl.acm.org/doi/10.14778/3099622.3099623
Muñoz E (2016). On Learnability of Constraints from RDF Data. Proceedings of the 13th International Conference on The Semantic Web. Latest Advances and New Domains, Volume 9678, pages 834-844. DOI 10.1007/978-3-319-34129-3_52. Online publication date: 29-May-2016.
https://dl.acm.org/doi/10.1007/978-3-319-34129-3_52
Sharov A, Shraer A, Merchant A, Stokely M (2015). Take me to your leader! Proceedings of the VLDB Endowment 8(12):1490-1501. DOI 10.14778/2824032.2824047. Online publication date: 1-Aug-2015.
https://dl.acm.org/doi/10.14778/2824032.2824047
Margo D, Seltzer M (2015). A scalable distributed graph partitioner. Proceedings of the VLDB Endowment 8(12):1478-1489. DOI 10.14778/2824032.2824046. Online publication date: 1-Aug-2015.
https://dl.acm.org/doi/10.14778/2824032.2824046
Abedjan Z, Golab L, Naumann F (2015). Profiling relational data. The VLDB Journal: The International Journal on Very Large Data Bases 24(4):557-581. DOI 10.1007/s00778-015-0389-y. Online publication date: 1-Aug-2015.
https://dl.acm.org/doi/10.1007/s00778-015-0389-y
Abiteboul S, Amsterdamer Y, Deutch D, Milo T, Senellart P (2015). Optimal Probabilistic Generation of XML Documents. Theory of Computing Systems 57(4):806-842. DOI 10.1007/s00224-014-9581-5. Online publication date: 1-Nov-2015.
https://dl.acm.org/doi/10.1007/s00224-014-9581-5
