DOI: 10.1145/2638404.2638517
research-article

Approximate matching of XML documents with schemata using tree alignment

Published: 28 March 2014

Abstract

Studying the structural similarity between XML documents is important for many tasks, such as XML data classification and XML document management. In this paper, we formally introduce the edit distance between trees and tree grammars using tree alignments. A sketch of the correctness proof for the algorithm is presented, followed by an analysis of its efficiency. Experiments are conducted to show the time efficiency of the algorithm and the validity of the distance in the context of XML document clustering.
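
To make the notion of a tree alignment concrete, the following is a minimal sketch of an alignment distance between two ordered labeled trees. It is not the tree-versus-grammar algorithm that the paper introduces, which the abstract does not spell out. It assumes unit costs for pairing a node with a blank and for relabeling, and the names `tree`, `align_forests`, and `alignment_distance` are hypothetical helpers chosen for illustration.

```python
from functools import lru_cache

# A tree is a (label, children) pair; children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for caching.
def tree(label, *children):
    return (label, tuple(children))

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

def relabel_cost(a, b):
    # Assumed unit cost model: 0 for equal labels, 1 otherwise.
    return 0 if a == b else 1

@lru_cache(maxsize=None)
def align_forests(f, g):
    """Minimum cost of aligning two ordered forests under unit costs."""
    if not f or not g:
        # Every remaining node is paired with a blank.
        return size(f) + size(g)
    (a, a_kids), (b, b_kids) = f[-1], g[-1]
    # Case 1: the roots of the last trees are paired with each other.
    best = (align_forests(f[:-1], g[:-1])
            + relabel_cost(a, b)
            + align_forests(a_kids, b_kids))
    # Case 2: a is paired with a blank; the blank node inserted on the
    # g side adopts the suffix g[k:] as its children.
    for k in range(len(g) + 1):
        best = min(best, 1 + align_forests(f[:-1], g[:k])
                           + align_forests(a_kids, g[k:]))
    # Case 3: symmetric, b is paired with a blank that adopts f[k:].
    for k in range(len(f) + 1):
        best = min(best, 1 + align_forests(f[:k], g[:-1])
                           + align_forests(f[k:], b_kids))
    return best

def alignment_distance(t1, t2):
    return align_forests((t1,), (t2,))

# Example: the two trees differ only in where the inner node sits.
t1 = tree("a", tree("e", tree("b"), tree("c")), tree("d"))
t2 = tree("a", tree("b"), tree("f", tree("c"), tree("d")))
print(alignment_distance(t1, t2))  # 4 under unit costs
```

In this example the unrestricted tree edit distance would be 2 (delete `e`, insert `f`), while the alignment distance is 4, because an alignment must embed both trees in a single common shape. Matching a document against a schema, as the paper does, would replace the second tree with the set of trees that a tree grammar generates.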

    Published In

    ACMSE '14: Proceedings of the 2014 ACM Southeast Conference
    March 2014
    265 pages
    ISBN:9781450329231
    DOI:10.1145/2638404
    Conference Chair: Ken Hoganson
    Program Chair: Selena He
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. XMLSchema
    2. approximate matching
    3. classification/clustering
    4. tree alignment

    Qualifiers

    • Research-article

    Conference

    ACM SE '14: ACM Southeast Regional Conference 2014
    March 28 - 29, 2014
    Kennesaw, Georgia

    Acceptance Rates

    Overall Acceptance Rate 502 of 1,023 submissions, 49%
