DOI: 10.1145/502585.502612
Article

Induction of integrated view for XML data with heterogeneous DTDs

Published: 05 October 2001

Abstract

This paper proposes a novel approach to integrating heterogeneous XML DTDs. With this approach, an information agent can be easily extended to integrate heterogeneous XML-based content and perform federated search. Based on a tree grammar inference technique, the approach derives an integrated view of XML DTDs within an information integration framework. The derivation takes advantage of naming and structural similarities among DTDs in similar domains. The complete approach consists of three main steps. (1) DTD clustering groups DTDs from similar domains into classes. (2) Schema learning applies a tree grammar inference technique to generate a set of tree grammar rules from the DTDs in each class produced by the previous step. (3) Minimization optimizes the rules generated in the previous step and transforms them into an integrated view. We have implemented the proposed approach in a system called DEEP and tested it on artificial and real domains. The experimental results show that the system can effectively and efficiently integrate radically different DTDs.
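
As a rough illustration of step (1) only, the sketch below groups DTDs into classes by the overlap of their declared element names. It is not the DEEP system's algorithm: the Jaccard measure, the single-link grouping, the 0.5 threshold, the sample DTDs, and all function names are assumptions made for this example, and the structural similarity mentioned in the abstract is not modeled.

```python
import re
from itertools import combinations

# Matches the element name in declarations such as <!ELEMENT book (title, author)>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.\-:]+)")

# Hypothetical sample DTDs, invented for this illustration.
SAMPLE_DTDS = {
    "book_a": "<!ELEMENT book (title, author, price)> <!ELEMENT title (#PCDATA)> "
              "<!ELEMENT author (#PCDATA)> <!ELEMENT price (#PCDATA)>",
    "book_b": "<!ELEMENT book (title, author+, publisher)> <!ELEMENT title (#PCDATA)> "
              "<!ELEMENT author (#PCDATA)> <!ELEMENT publisher (#PCDATA)>",
    "flight": "<!ELEMENT flight (origin, destination, price)> <!ELEMENT origin (#PCDATA)> "
              "<!ELEMENT destination (#PCDATA)> <!ELEMENT price (#PCDATA)>",
}


def element_names(dtd_text):
    """Return the set of lower-cased element names declared in a DTD string."""
    return {name.lower() for name in ELEMENT_DECL.findall(dtd_text)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_dtds(dtds, threshold=0.5):
    """Single-link clustering: DTDs with name similarity >= threshold share a class."""
    names = {key: element_names(text) for key, text in dtds.items()}
    parent = {key: key for key in dtds}  # union-find forest over DTD keys

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    for a, b in combinations(dtds, 2):
        if jaccard(names[a], names[b]) >= threshold:
            parent[find(a)] = find(b)

    classes = {}
    for key in dtds:
        classes.setdefault(find(key), []).append(key)
    return sorted(sorted(members) for members in classes.values())
```

Calling `cluster_dtds(SAMPLE_DTDS)` places the two book DTDs in one class and the flight DTD in another, because the book DTDs share three of five distinct element names while the flight DTD shares at most one name with either. Each resulting class would then be the input to schema learning in step (2).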


Published In

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN:1581134363
DOI:10.1145/502585

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. XML DTD
  2. distributed databases
  3. federated search
  4. intelligent agent
  5. mark-up schemes
  6. semistructured data

Qualifiers

  • Article

Conference

CIKM01
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2007) XML schema clustering with semantic and hierarchical similarity measures. Knowledge-Based Systems 20:4 (336-349). DOI: 10.1016/j.knosys.2006.08.006. Online publication date: 1-May-2007
  • (2006) A framework for integrating XML transformations. Proceedings of the 25th international conference on Conceptual Modeling (182-195). DOI: 10.1007/11901181_15. Online publication date: 6-Nov-2006
  • (2005) Data Integration in a Three-Layer Mediation Framework. Proceedings. IEEE SoutheastCon, 2005 (477-482). DOI: 10.1109/SECON.2005.1423290. Online publication date: 2005
  • (2004) An Efficient Algorithm for Clustering XML Schemas. Web Information Systems – WISE 2004 (372-377). DOI: 10.1007/978-3-540-30480-7_38. Online publication date: 2004
  • (2003) Combining DAML+OIL, XSLT and Probabilistic Logics for Uncertain Schema Mappings in MIND. Research and Advanced Technology for Digital Libraries (194-206). DOI: 10.1007/978-3-540-45175-4_19. Online publication date: 2003
  • (2003) Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach. Conceptual Modeling - ER 2003 (520-533). DOI: 10.1007/978-3-540-39648-2_40. Online publication date: 2003
  • (2003) Relevance Ranking Tuning for Similarity Queries on XML Data. Efficiency and Effectiveness of XML Tools and Techniques and Data Integration over the Web (22-34). DOI: 10.1007/3-540-36556-7_2. Online publication date: 28-Feb-2003
