Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1166160.1166183acmconferencesArticle/Chapter ViewAbstractPublication PagesdocengConference Proceedingsconference-collections
Article

Fast and simple XML tree differencing by sequence alignment

Published: 10 October 2006 Publication History

Abstract

With the advent of XML we have seen a renewed interest in methods for computing the difference between trees. Methods that include heuristic elements play an important role in practical applications due to the inherent complexity of the problem. We present a method for differencing XML as ordered trees based on mapping the problem to the domain of sequence alignment, applying simple and efficient heuristics in this domain, and transforming back to the tree domain. Our approach provides a method to quickly compute changes that are meaningful transformations on the XML tree level, and includes subtree move as a primitive operation. We evaluate the feasibility of our approach and benchmark it against a selection of existing differencing tools. The results show our approach to be feasible and to have the potential to perform on par with tools of a more complex design in terms of both output size and execution time.

References

[1]
L. Bergroth, H. Hakonen, and T. Raita. A survey of longest common subsequence algorithms. In Seventh International Symposium on String Processing Information Retrieval (SPIRE'00), pages 39--48, Los Alamitos, CA, USA, Sept. 2000. IEEE Computer Society.
[2]
S.S. Chawathe. Comparing hierarchical data in external memory. In VLDB '99: Proceedings of the 25th International Conference on Very Large Data Bases, pages 90--101, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
[3]
S.S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, pages 26--37. ACM Press, May 1997.
[4]
S.S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change detection in hierarchically structured information. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 493--504, June 1996.
[5]
G. Cobena, S. Abiteboul, and A. Marian. Detecting changes in XML documents. In 18th International Conference on Data Engineering, pages 41--52, Feb. 2002.
[6]
J.-l. Gailly and M. Adler. zlib 1.1.4 Manual, Mar. 2002.
[7]
J. Kangasharju and T. Lindholm. A sequence-based type-aware interface for XML processing. In M.H. Hamza, editor, Ninth IASTED International Conference on Internet and Multimedia Systems and Applications, pages 83--88. ACTA Press, Feb. 2005.
[8]
A. Laux and L. Martin. XUpdate - XML Update Language. XML:DB Initiative for XML Databases, Sept. 2000. Working draft.
[9]
T. Lindholm. A 3-way merging algorithm for synchronizing ordered trees - the 3DM merging and differencing tool for XML. Master's thesis, Helsinki University of Technology, Department of Computer Science and Engineering, Espoo, Finland, Sept. 2001.
[10]
T. Lindholm, J. Kangasharju, and S. Tarkoma. A hybrid approach to optimistic file system directory tree synchronization. In Fourth International ACM Workshop on Data Engineering for Wireless and Mobile Access, June 2005.
[11]
S.Y. Lu. A tree-to-tree distance and its application to cluster analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2):219--224, 1979.
[12]
J. MacDonald. File system support for delta compression. Master's thesis, UC Berkeley, May 2000.
[13]
E.W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1(2):251--266, 1986.
[14]
M.O. Rabin. Fingerprinting by random polynomials. Technical Report TR-15-81, Department of Computer Science, Harvard University, 1981.
[15]
S. Rönnau, J. Scheffczyk, and U.M. Borghoff. Towards XML version control of office documents. In ACM Symposium on Document Engineering, pages 10--19, New York, NY, USA, Nov. 2005. ACM Press.
[16]
S.M. Selkow. The tree-to-tree editing problem. Information Processing Letters, 6(6):184--186, Dec. 1977.
[17]
D. Shapira and J.A. Storer. Edit distance with move operations. In A. Apostolico and M. Takeda, editors, 13th Annual Symposium on Combinatorial Pattern Matching, volume 2373 of Lecture Notes in Computer Science, pages 85--98, Fukuoka, Japan, July 2002. Springer-Verlag.
[18]
Sun Microsystems. OpenOffice.org XML File Format 1.0, Dec. 2002.
[19]
K.-C. Tai. The tree-to-tree correction problem. Journal of the ACM, 26(3):422--433, July 1979.
[20]
W.F. Tichy. The string-to-string correction problem with block moves. ACM Transactions on Computer Systems, 2(4):309--321, 1984.
[21]
A. Tridgell. Efficient Algorithms for Sorting and Synchronization. PhD thesis, Australian National University, Canberra, Australia, Feb. 1999.
[22]
Y. Wang, D.J. DeWitt, and J.-Y. Cai. X-Diff: an effective change detection algorithm for XML documents. In 19th International Conference on Data Engineering, pages 519--530, Mar. 2003.
[23]
World Wide Web Consortium, Cambridge, Massachusetts, USA. XML Path Language (XPath) 1.0, Nov. 1999. W3C Recommendation.
[24]
World Wide Web Consortium, Cambridge, Massachusetts, USA. XHTML 1.0 The Extensible HyperText Markup Language (Second Edition), Aug. 2002. W3C Recommendation.
[25]
World Wide Web Consortium, Cambridge, Massachusetts, USA. Scalable Vector Graphics (SVG) 1.1 Specification, Jan. 2003. W3C Recommendation.
[26]
World Wide Web Consortium, Cambridge, Massachusetts, USA. Document Object Model (DOM) Level 3 Core Specification, Apr. 2004. W3C Recommendation.
[27]
World Wide Web Consortium, Cambridge, Massachusetts, USA. Extensible Markup Language (XML) 1.0, 3rd edition, Feb. 2004. W3C Recommendation.
[28]
World Wide Web Consortium, Cambridge, Massachusetts, USA. XML Information Set, 2nd edition, Feb. 2004. W3C Recommendation.
[29]
World Wide Web Consortium, Cambridge, Massachusetts, USA. XQuery 1.0 and XPath 2.0 Data Model (XDM), Nov. 2005. W3C Candidate Recommendation.
[30]
World Wide Web Consortium, Cambridge, Massachusetts, USA. Remote Events for XML (REX) 1.0, Feb. 2006. W3C Working Draft.
[31]
K. Zhang and D. Shasha. Fast algorithm for the unit cost editing distance between trees. Journal of Algorithms, 11(4):581--621, Dec. 1990.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
DocEng '06: Proceedings of the 2006 ACM symposium on Document engineering
October 2006
232 pages
ISBN:1595935150
DOI:10.1145/1166160
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 October 2006

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. XML
  2. differencing
  3. move
  4. ordered tree
  5. sequence alignment

Qualifiers

  • Article

Conference

DocEng06
Sponsor:
DocEng06: ACM Symposium on Document Engineering
October 10 - 13, 2006
Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2020)Change Detection on JATS Academic ArticlesProceedings of the ACM Symposium on Document Engineering 202010.1145/3395027.3419581(1-10)Online publication date: 29-Sep-2020
  • (2018)diffiProceedings of the ACM Symposium on Document Engineering 201810.1145/3209280.3229084(1-4)Online publication date: 28-Aug-2018
  • (2017)A stream-based method to detect differences between XML documentsJournal of Information Science10.1177/016555151560280543:1(39-53)Online publication date: 1-Feb-2017
  • (2016)JSON Patch for Turning a Pull REST API into a PushService-Oriented Computing10.1007/978-3-319-46295-0_27(435-449)Online publication date: 20-Sep-2016
  • (2016)Bridging the gap between tracking and detecting changes in XMLSoftware—Practice & Experience10.1002/spe.230546:2(227-250)Online publication date: 1-Feb-2016
  • (2014)Fine-grained change detection in structured text documentsProceedings of the 2014 ACM symposium on Document engineering10.1145/2644866.2644880(87-96)Online publication date: 16-Sep-2014
  • (2014)Using versioned trees, change detection and node identity for three-way XML mergingComputer Science - Research and Development10.1007/s00450-013-0253-5Online publication date: 29-Nov-2014
  • (2014)Integration of Web Sources Under Uncertainty and Dependencies Using Probabilistic XMLDatabase Systems for Advanced Applications10.1007/978-3-662-43984-5_28(360-375)Online publication date: 11-Jul-2014
  • (2013)Document changesProceedings of the 2013 ACM symposium on Document engineering10.1145/2494266.2494322(281-282)Online publication date: 10-Sep-2013
  • (2013)Introduction to the universal delta modelProceedings of the 2013 ACM symposium on Document engineering10.1145/2494266.2494284(47-56)Online publication date: 10-Sep-2013
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media