DOI: 10.1145/3126686.3126711
research-article

An Evaluation of Large-scale Methods for Image Instance and Class Discovery

Published: 23 October 2017

Abstract

This paper aims at discovering meaningful subsets of related images in large image collections without annotations. We search for groups of images related at different semantic levels, i.e., either instances or visual classes. While k-means is usually considered the gold standard for this task, we evaluate and demonstrate the value of diffusion methods that have been neglected by the state of the art, such as the Markov Clustering algorithm.
We report results on ImageNet and the Paris500k instance dataset, both enlarged with images from YFCC100M. We evaluate our methods with a labelling cost that reflects how much effort a human would require to correct the generated clusters.
Our analysis highlights several properties. First, when powered by an efficient GPU implementation, the cost of the discovery process is small compared to that of computing the image descriptors, even for collections as large as 100 million images. Second, we show that descriptions selected for instance search improve the discovery of object classes. Third, the Markov Clustering technique consistently outperforms other methods; to our knowledge, it has never been considered in this large-scale scenario.
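
For readers unfamiliar with Markov Clustering, the following is a minimal sketch of the textbook algorithm, which alternates an expansion step (squaring a column-stochastic matrix) with an inflation step (element-wise power followed by column renormalization). It is not the paper's GPU implementation: it assumes a small dense adjacency matrix rather than a large sparse k-NN graph, and the function names, the inflation value of 2.0, the added self-loops, and the convergence threshold are illustrative assumptions.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Dense square matrix product; fine for toy sizes only."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_clustering(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster a small graph given as a dense adjacency matrix (hypothetical helper)."""
    n = len(adjacency)
    # Self-loops are a common convention (an assumption here) to stabilize the walk.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)  # expansion: two steps of the random walk
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strengthen strong transitions, weaken weak ones
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that retains mass is an attractor; its non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph: two disconnected triangles, nodes 0-2 and 3-5.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_clustering(two_triangles))
```

On this toy graph the two triangles come out as separate clusters. At the scale discussed in the abstract, the dense matrices above would have to be replaced by sparse operations on a k-nearest-neighbor graph.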

Cited By

  • (2024) Incremental encrypted traffic classification via contrastive prototype networks. Computer Networks 250 (110591). DOI: 10.1016/j.comnet.2024.110591. Online publication date: Aug-2024
  • (2021) Billion-Scale Similarity Search with GPUs. IEEE Transactions on Big Data 7:3 (535-547). DOI: 10.1109/TBDATA.2019.2921572. Online publication date: 1-Jul-2021
  • (2021) Online Continual Learning For Visual Food Classification. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) (2337-2346). DOI: 10.1109/ICCVW54120.2021.00265. Online publication date: Oct-2021

Published In

Thematic Workshops '17: Proceedings of the on Thematic Workshops of ACM Multimedia 2017
October 2017
558 pages
ISBN:9781450354165
DOI:10.1145/3126686
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. computer vision
  3. knn-graphs

Conference

MM '17: ACM Multimedia Conference
October 23 - 27, 2017
Mountain View, California, USA
