Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1141277.1141467acmconferencesArticle/Chapter ViewAbstractPublication PagessacConference Proceedingsconference-collections
Article

Computing edit distances between an XML document and a schema and its application in document classification

Published: 23 April 2006 Publication History

Abstract

In this paper, we present an algorithm to find a sequence of top-down edit operations with minimum cost that transforms an XML document such that it conforms to a schema. It is shown that the algorithm runs in O(p x log p x n), where p is the size of the schema(grammar) and n is the size of the XML document (tree). We have also shown that edit distance with restricted top-down edit operations can be computed the same way.We will also show how to use the edit distances in document classification. Experimental studies have shown that our methods are effective in structure-oriented classification for both real and synthesized data sets.

References

[1]
Nobutaka Suzuki, Finding an Optimum Edit Script between an XML Document and a DTD, Proceedings of ACM Symposium on Applied Computing, pp. 647 - 653, March, 2005, Santa Fe, NM.
[2]
Rodney Canfield, Guangming Xing, Approximate XML Document Matching (Poster), Proceedings of ACM Symposium on Applied Computing, March, 2005, Santa Fe, NM.
[3]
T. Bray, J. Paoli, M. Sperberg-McQueen, and etl., Extensible Markup Language (XML) 1.0 (Third Edition), W3C, http://www.w3.org/TR/2004/REC-xml-20040204/.
[4]
D. Shasha, K. Zhang, Approximate Tree Pattern Matching, Chapter 14 Pattern Matching Algorithms (eds. Apostolico, A. and Galil, Z.), Oxford University Press, June 1997.
[5]
E. Tanaka, K. Tanaka, The Tree-to-tree Editing Problem, International Journal of Pattern Recognition and Artificial Intelligence, 2, (2), pp.221--240, 1988.
[6]
M. Murata Hedge Automata: A Formal Model for XML Schemata http://www.xml.gr.jp/relax/hedge_nice.html
[7]
G. Myers Approximately Matching Context Free Languages, Information Processing Letters, 54, 2, pp. 85--92, 1995.
[8]
E. Bertino, G. Guerrini, M. Mesiti, A Matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications, Info. Systems, V29, pp 23--46, 2004.
[9]
A. Boukottaya, C. Vanoirbeek, F. Paganelli, O. Abou Khaled, Automating XML Documents Transformations: A Conceptual Modelling Based Approach, Proceedings of 1st Asian-Pacific conference on Conceptual modelling, Dunedin, NZ, pp81 - 90 2004.
[10]
D. Reis, P. Golgher, A. Silva, A. Laender, Automatic web news extraction using tree edit distance, WWW'04, 2004, pp 502--511, Manhattan, NY, 2004.
[11]
W. Chen, New Algorithm for Ordered Tree-to-Tree Correction Problem, J. of Algorithms, 40:135--158, 2001.
[12]
A. Nierman, H. V. Jagadish, Evaluating structural similarity in XML documents, WebDB 2002, Madison, Wisconsin, June 2002.
[13]
XML Document Mining Challenge, http://xmlmining.lip6.fr/
[14]
B. Chidlovskii, Schema Extraction from XML Data: A Grammatical Inference Approach, KRDB'01 Workshop, Rome, Italy, September 15, 2001.
[15]
WEKA Project, http://www.cs.waikato.ac.nz/ml/weka/

Cited By

View all
  • (2019)Alignment Distance of Regular Tree LanguagesTheoretical Computer Science10.1016/j.tcs.2019.06.022Online publication date: Jul-2019
  • (2015)Approximate XML structure validation based on document-grammar tree similarityInformation Sciences: an International Journal10.1016/j.ins.2014.09.044295:C(258-302)Online publication date: 20-Feb-2015
  • (2013)Extracting differences between regular tree grammarsProceedings of the 28th Annual ACM Symposium on Applied Computing10.1145/2480362.2480527(859-864)Online publication date: 18-Mar-2013
  • Show More Cited By

Index Terms

  1. Computing edit distances between an XML document and a schema and its application in document classification

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image ACM Conferences
        SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
        April 2006
        1967 pages
        ISBN:1595931082
        DOI:10.1145/1141277
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Sponsors

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 23 April 2006

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. XML
        2. classification
        3. document classification
        4. edit distance

        Qualifiers

        • Article

        Conference

        SAC06
        Sponsor:

        Acceptance Rates

        Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)1
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 18 Nov 2024

        Other Metrics

        Citations

        Cited By

        View all
        • (2019)Alignment Distance of Regular Tree LanguagesTheoretical Computer Science10.1016/j.tcs.2019.06.022Online publication date: Jul-2019
        • (2015)Approximate XML structure validation based on document-grammar tree similarityInformation Sciences: an International Journal10.1016/j.ins.2014.09.044295:C(258-302)Online publication date: 20-Feb-2015
        • (2013)Extracting differences between regular tree grammarsProceedings of the 28th Annual ACM Symposium on Applied Computing10.1145/2480362.2480527(859-864)Online publication date: 18-Mar-2013
        • (2011)Automatic Policy Rule Extraction for Configuration ManagementProceedings of the 2011 IEEE International Symposium on Policies for Distributed Systems and Networks10.1109/POLICY.2011.13(125-128)Online publication date: 6-Jun-2011

        View Options

        Login options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media