Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2537734.2537742acmotherconferencesArticle/Chapter ViewAbstractPublication PagesadcsConference Proceedingsconference-collections
research-article

Efficient top-k retrieval with signatures

Published: 05 December 2013 Publication History

Abstract

This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This quickly becomes excessive when dealing with large collections, presenting issues of scalability that limit their applicability.
Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.

References

[1]
Andrei Z Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences 1997. Proceedings, pages 21--29. IEEE, 1997.
[2]
Trevor Cohen, Roger Schvaneveldt, and Dominic Widdows. Reflective random indexing and indirect inference: A scalable method for discovery of implicit connections. Journal of Biomedical Informatics, 43(2): 240--256, 2010.
[3]
Shlomo Geva and Christopher M De Vries. Topsig: topology preserving document signatures. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pages 333--338. ACM, 2011.
[4]
Richard W Hamming. Error detecting and error correcting codes. Bell System technical journal, 29(2): 147--160, 1950.
[5]
Kalervo Järvelin and Jaana Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 41--48. ACM, 2000.
[6]
Gurmeet Singh Manku, Arvind Jain, and Anish Das Sarma. Detecting near-duplicates for web crawling. In Proceedings of the 16th International Conference on World Wide Web, pages 141--150. ACM, 2007.
[7]
Caitlin Sadowski and Greg Levin. Simhash: Hash-based similarity detection. Technical report, Technical report, Google, 2007.
[8]
Malcolm Slaney and Michael Casey. Locality-sensitive hashing for finding nearest neighbors {lecture notes}. Signal Processing Magazine, IEEE, 25(2): 128--131, 2008.
[9]
Yisong Yue, Thomas Finley, Filip Radlinski, and Thorsten Joachims. A support vector method for optimizing average precision. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 271--278. ACM, 2007.

Cited By

View all
  • (2021)SAUCEProceedings of the 30th ACM International Conference on Information & Knowledge Management10.1145/3459637.3481950(4173-4183)Online publication date: 26-Oct-2021
  • (2020)An indexing scheme and descriptor for 3D object retrieval based on local shape queryingComputers & Graphics10.1016/j.cag.2020.09.00192(55-66)Online publication date: Nov-2020
  • (2019)On Tradeoffs Between Document Signature Methods for a Legal Due Diligence CorpusProceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3331184.3331311(1001-1004)Online publication date: 18-Jul-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ADCS '13: Proceedings of the 18th Australasian Document Computing Symposium
December 2013
126 pages
ISBN:9781450325240
DOI:10.1145/2537734
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 December 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. document signatures
  2. hamming distance
  3. locality-sensitive hashing
  4. near-duplicate detection
  5. nearest neighbour
  6. top-k

Qualifiers

  • Research-article

Conference

ADCS '13
ADCS '13: The Australasian Document Computing Symposium
December 5 - 6, 2013
Queensland, Brisbane, Australia

Acceptance Rates

ADCS '13 Paper Acceptance Rate 12 of 23 submissions, 52%;
Overall Acceptance Rate 30 of 57 submissions, 53%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 27 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2021)SAUCEProceedings of the 30th ACM International Conference on Information & Knowledge Management10.1145/3459637.3481950(4173-4183)Online publication date: 26-Oct-2021
  • (2020)An indexing scheme and descriptor for 3D object retrieval based on local shape queryingComputers & Graphics10.1016/j.cag.2020.09.00192(55-66)Online publication date: Nov-2020
  • (2019)On Tradeoffs Between Document Signature Methods for a Legal Due Diligence CorpusProceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3331184.3331311(1001-1004)Online publication date: 18-Jul-2019
  • (2018)Content-based image retrieval using a signature graph and a self-organizing mapInternational Journal of Applied Mathematics and Computer Science10.5555/3060518.306052226:2(423-438)Online publication date: 15-Dec-2018
  • (2017)Content‐based image retrieval based on binary signatures cluster graphExpert Systems10.1111/exsy.1222035:1Online publication date: 2-Jul-2017
  • (2017)Similarity Projection: A Geometric Measure for Comparison of Biological Sequences2017 IEEE 13th International Conference on e-Science (e-Science)10.1109/eScience.2017.46(325-334)Online publication date: Oct-2017
  • (2016)Clustering Binary Signature Applied in Content-Based Image RetrievalNew Advances in Information Systems and Technologies10.1007/978-3-319-31232-3_22(233-242)Online publication date: 2-Mar-2016

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media