DOI: 10.5555/951953.952411
Article

Syntactic Similarity of Web Documents

Published: 10 November 2003

Abstract

This paper presents and compares two methods for evaluating the syntactic similarity between documents. The first method builds a Patricia tree from the original document and computes similarity by searching for the text of each candidate document in the tree. The second method uses the concept of shingles to obtain a similarity measure for each document pair: every shingle from the original document is inserted into a hash table, in which the shingles of each candidate document are then searched. Given an original document and a set of candidates, both methods find the documents that bear some similarity to the original. Experimental results were obtained with a system that generates plagiarized documents from 900 documents collected from the Web. Measured as the arithmetic mean of the absolute differences between expected and obtained similarity, the shingle-based algorithm scored 4.13% and the Patricia tree algorithm 7.50%.
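The shingle-based method and the evaluation measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the shingle size, the tokenization, or the exact similarity formula, so the sketch assumes word-level shingles of `w = 3` words, uses a Python `set` as the hash table, and scores a candidate by the fraction of its shingles found in the original. All function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, w: int = 3) -> List[Shingle]:
    """Return every contiguous run of w words (assumed word-level shingles)."""
    words = text.lower().split()
    if len(words) < w:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return [tuple(words)] if words else []
    return [tuple(words[i:i + w]) for i in range(len(words) - w + 1)]


def build_table(original: str, w: int = 3) -> Set[Shingle]:
    """Insert each shingle of the original document into a hash table (a set)."""
    return set(shingles(original, w))


def similarity(table: Set[Shingle], candidate: str, w: int = 3) -> float:
    """Fraction of the candidate's shingles found in the original's table.

    This containment-style score is an assumption; the paper may define
    the similarity measure differently.
    """
    cand = shingles(candidate, w)
    if not cand:
        return 0.0
    hits = sum(1 for s in cand if s in table)
    return hits / len(cand)


def mean_absolute_difference(expected: Sequence[float],
                             obtained: Sequence[float]) -> float:
    """Arithmetic mean of |expected - obtained| over paired measurements."""
    if len(expected) != len(obtained) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - o) for e, o in zip(expected, obtained)) / len(expected)


if __name__ == "__main__":
    table = build_table("the quick brown fox jumps over the lazy dog")
    print(similarity(table, "the quick brown fox sleeps"))
```

Building the table once per original document means each candidate needs only one pass over its own shingles, with an expected constant-time lookup per shingle.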

Cited By

  • (2018) Measuring verb similarity using binary coefficients with application to isiXhosa and isiZulu. Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists, pp. 65-71. DOI: 10.1145/3278681.3278690. Online publication date: 26-Sep-2018
  • (2017) Minmax Circular Sector Arc for External Plagiarisms Heuristic Retrieval stage. Knowledge-Based Systems 137:C, pp. 1-18. DOI: 10.1016/j.knosys.2017.08.013. Online publication date: 1-Dec-2017
  • (2007) Navigating among search results. Proceedings of the 8th international conference on Web information systems engineering, pp. 663-672. DOI: 10.5555/1781374.1781450. Online publication date: 3-Dec-2007

Published In

Guide Proceedings
LA-WEB '03: Proceedings of the First Conference on Latin American Web Congress
November 2003
ISBN: 0769520588

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article
