DOI: 10.1145/1135777.1135965

Using proportional transportation similarity with learned element semantics for XML document clustering

Published: 23 May 2006

Abstract

This paper proposes a novel approach to measuring XML document similarity that takes into account the semantics between XML elements. The approach is motivated by the problems of "under-contribution" and "over-contribution" found in previous work. The element semantics are learned in an unsupervised way, and the Proportional Transportation Similarity is proposed to evaluate XML document similarity by modeling the similarity calculation as a transportation problem. Clustering experiments on three ACM SIGMOD data sets show the favorable performance of the proposed approach.
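The abstract does not give the formula for the Proportional Transportation Similarity, so the following is only a minimal sketch of the general idea of scoring two documents by solving a transportation problem. It assumes each document is a bag of weighted elements, normalizes each bag's weights to sum to 1 (the "proportional" step, an assumption here), and finds the flow between the bags that maximizes total element similarity. The function name `transport_similarity`, the weights, and the similarity values are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def transport_similarity(weights_a, weights_b, sim):
    """Illustrative transportation-based similarity (not the paper's exact PTS).

    weights_a, weights_b: positive element weights of the two documents.
    sim[i][j]: assumed similarity in [0, 1] between element i of A and j of B.
    Returns the maximum total similarity carried by a flow that moves all
    normalized weight of A onto B.
    """
    total_a = sum(Fraction(w) for w in weights_a)
    total_b = sum(Fraction(w) for w in weights_b)
    supply = [Fraction(w) / total_a for w in weights_a]  # sums to 1
    demand = [Fraction(w) / total_b for w in weights_b]  # sums to 1
    n, m = len(supply), len(demand)
    src, dst = 0, n + m + 1
    # Each edge is [target, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(n + m + 2)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, Fraction(0), -cost, len(graph[u]) - 1])

    for i in range(n):
        add_edge(src, 1 + i, supply[i], Fraction(0))
        for j in range(m):
            # Cost is dissimilarity, so the cheapest flow is the most similar.
            add_edge(1 + i, 1 + n + j, Fraction(1), 1 - Fraction(sim[i][j]))
    for j in range(m):
        add_edge(1 + n + j, dst, demand[j], Fraction(0))

    total_cost = Fraction(0)
    while True:
        # Bellman-Ford shortest path in the residual graph.
        dist = [None] * len(graph)
        prev = [None] * len(graph)
        dist[src] = Fraction(0)
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] is None:
                    continue
                for k, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and (dist[v] is None or dist[u] + cost < dist[v]):
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        if dist[dst] is None:
            break  # no augmenting path left: all weight has been shipped
        # Find the bottleneck capacity along the path, then push that flow.
        push, v = None, dst
        while v != src:
            u, k = prev[v]
            cap = graph[u][k][1]
            push = cap if push is None else min(push, cap)
            v = u
        v = dst
        while v != src:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        total_cost += push * dist[dst]
    # Total flow is 1, so similarity = 1 - total dissimilarity cost.
    return float(1 - total_cost)


# Hypothetical example: elements (author, title) vs. (writer, title),
# with an assumed learned similarity of 0.9 between "author" and "writer".
score = transport_similarity([1, 1], [1, 1], [[0.9, 0.0], [0.0, 1.0]])
print(score)
```

Solving for the flow with successive shortest paths keeps the sketch dependency-free; a linear programming solver would serve equally well for larger inputs.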


Cited By

  • (2019) Temporal and multi-versioned XML documents. Information Processing and Management: an International Journal, 50:1 (113-131). DOI: 10.1016/j.ipm.2013.08.003. Online publication date: 25-Nov-2019
  • (2017) Structure based XML document clustering: A review. 2017 International Conference on Infocom Technologies and Unmanned Systems (Trends and Future Directions) (ICTUS) (543-547). DOI: 10.1109/ICTUS.2017.8286068. Online publication date: Dec-2017
  • (2007) XML version detection. Proceedings of the 2007 ACM symposium on Document engineering (79-88). DOI: 10.1145/1284420.1284441. Online publication date: 28-Aug-2007


      Published In

      WWW '06: Proceedings of the 15th international conference on World Wide Web
      May 2006
      1102 pages
      ISBN: 1595933239
      DOI: 10.1145/1135777
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. XML document clustering
      2. proportional transportation similarity

      Qualifiers

      • Article

      Conference

      WWW06

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
