research-article

An Evaluation of Large-scale Methods for Image Instance and Class Discovery

Authors:

Matthijs Douze,

Jeff Johnson

Thematic Workshops '17: Proceedings of the on Thematic Workshops of ACM Multimedia 2017

Pages 1 - 9

https://doi.org/10.1145/3126686.3126711

Published: 23 October 2017

Abstract

This paper aims to discover meaningful subsets of related images in large, unannotated image collections. We search for groups of images related at different semantic levels, i.e., either instances or visual classes. While k-means is usually considered the gold standard for this task, we evaluate diffusion methods that have been neglected by the state of the art, such as the Markov Clustering algorithm, and show their benefits.
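Markov Clustering (MCL) alternates two operations on the column-stochastic transition matrix of a similarity graph: expansion (squaring the matrix, i.e., a two-step random walk) and inflation (raising each entry to a power, then renormalizing each column), until the matrix stops changing. The minimal pure-Python sketch below shows that textbook loop on a toy graph. It is not the authors' GPU implementation; the dense list-of-lists representation, the added self-loops, the inflation exponent of 2, the convergence tolerance and the 1e-6 threshold used to read off clusters are all assumptions chosen for illustration.

```python
# Textbook Markov Clustering (MCL) on a tiny dense graph, in pure Python.
# Assumptions for illustration: list-of-lists matrices, self-loops added,
# inflation exponent 2.0, no pruning, 1e-6 threshold to read off clusters.


def normalize_columns(m):
    """Return a copy of m whose columns each sum to 1 (column-stochastic)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def matmul(a, b):
    """Dense square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    n = len(adjacency)
    # Self-loops keep some probability mass on each node during the walk.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        # Inflation: element-wise power, then renormalize each column.
        inflated = normalize_columns([[x ** inflation for x in row]
                                      for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # At convergence, each attractor row holds mass on the members of its cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


# Usage: two disconnected triangles form two clusters.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))  # → [[0, 1, 2], [3, 4, 5]]
```

Practical MCL implementations keep the matrix sparse and prune small entries after each inflation step; dense squaring as written here costs O(n^3) per iteration and only suits toy inputs.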

We report results on ImageNet and on the Paris500k instance dataset, both enlarged with images from YFCC100M. We evaluate our methods with a labelling cost that reflects how much effort a human would need to correct the generated clusters.
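The abstract does not spell out how the labelling cost is computed. As a purely hypothetical illustration of the idea, the sketch below scores a clustering with a simple annotator model: one action to name each cluster with its majority ground-truth label, plus one action to fix every image whose label disagrees with that majority. The function name `labelling_cost_proxy` and this cost model are assumptions, not the paper's definition.

```python
from collections import Counter


def labelling_cost_proxy(clusters, labels):
    """Hypothetical annotation-effort proxy (NOT the paper's exact metric).

    clusters: list of lists of image ids produced by a clustering method.
    labels:   dict mapping image id -> ground-truth class or instance label.
    Assumed model: 1 action to name each cluster with its majority label,
    plus 1 action per member whose true label differs from that majority.
    """
    cost = 0
    for members in clusters:
        if not members:
            continue
        counts = Counter(labels[i] for i in members)
        majority_count = counts.most_common(1)[0][1]
        cost += 1 + (len(members) - majority_count)
    return cost


# Usage: five images from two classes.
labels = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b"}
print(labelling_cost_proxy([[0, 1, 2], [3, 4]], labels))  # → 3
```

This toy model rewards pure clusters and charges fragmentation only through the per-cluster naming action (five singletons cost 5, a perfect two-cluster split costs 2), so a real metric would need to weigh purity against fragmentation more carefully.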

Our analysis highlights several properties. First, with an efficient GPU implementation, the cost of the discovery process is small compared to that of computing the image descriptors, even for collections as large as 100 million images. Second, we show that descriptors selected for instance search improve the discovery of object classes. Third, the Markov Clustering technique consistently outperforms the other methods; to our knowledge, it has never been considered in such a large-scale scenario.
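Diffusion methods such as Markov Clustering operate on a similarity graph between images. The abstract does not describe how that graph is built, so the following is an assumed, minimal example: a brute-force k-nearest-neighbour graph over L2-normalized descriptors, using the dot product (cosine similarity) as the score and symmetrizing the links. The function names `l2_normalize` and `knn_graph` are hypothetical. At the 100-million-image scale mentioned above, this O(n^2 d) loop would have to be replaced by a GPU similarity-search library such as Faiss.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def knn_graph(descriptors, k):
    """Brute-force k-NN graph over L2-normalized descriptors (illustrative).

    Returns a symmetric 0/1 adjacency matrix in which i and j are linked if
    either one is among the other's k most similar vectors, where similarity
    is the dot product of normalized vectors, i.e. cosine similarity.
    """
    xs = [l2_normalize(d) for d in descriptors]
    n = len(xs)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        sims = [(sum(a * b for a, b in zip(xs[i], xs[j])), j)
                for j in range(n) if j != i]
        sims.sort(reverse=True)  # most similar first
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1  # symmetrize the link
    return adj


# Usage: two pairs of nearly parallel 2-D descriptors.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(knn_graph(points, k=1))  # → [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
```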


Cited By

Cai W, Hou C, Cui M, Wang B, Xiong G, Gou G. (2024). Incremental encrypted traffic classification via contrastive prototype networks. Computer Networks 250 (110591). Online publication date: Aug-2024.
https://doi.org/10.1016/j.comnet.2024.110591

Johnson J, Douze M, Jegou H. (2021). Billion-Scale Similarity Search with GPUs. IEEE Transactions on Big Data 7:3 (535-547). Online publication date: 1-Jul-2021.
https://doi.org/10.1109/TBDATA.2019.2921572

He J, Zhu F. (2021). Online Continual Learning For Visual Food Classification. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2337-2346. Online publication date: Oct-2021.
https://doi.org/10.1109/ICCVW54120.2021.00265


Recommendations

Large-Scale Discovery of Spatially Related Images

We propose a randomized data mining method that finds clusters of spatially overlapping images. The core of the method relies on the min-Hash algorithm for fast detection of pairs of images with spatial overlap, the so-called cluster seeds. The seeds ...

Large scale K-means clustering using GPUs

The k-means algorithm is widely used for clustering, compressing, and summarizing vector data. We present a fast and memory-efficient GPU-based algorithm for exact k-means, Asynchronous Selective Batched K-means (ASB K-means). Unlike most GPU-...

An experimental comparison of clustering methods for content-based indexing of large image databases

In recent years, the expansion of acquisition devices such as digital cameras, the development of storage and transmission techniques of multimedia documents and the development of tablet computers facilitate the development of many large image ...


Published In


Thematic Workshops '17: Proceedings of the on Thematic Workshops of ACM Multimedia 2017

October 2017

558 pages

ISBN:9781450354165

DOI:10.1145/3126686

Program Chairs: Wanmin Wu (Google, USA), Jianchao Yang (Snap Inc., USA), Qi Tian (The University of Texas at San Antonio, USA), Roger Zimmermann (National University of Singapore, Singapore)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

MM '17: ACM Multimedia Conference

October 23-27, 2017

Mountain View, California, USA

Sponsor: SIGMM


Bibliometrics

Total Citations: 3

Total Downloads: 141

Downloads (last 12 months): 3

Downloads (last 6 weeks): 0

Reflects downloads up to 12 Nov 2024
