Abstract
Determining the similarity between images is a fundamental step in many applications, such as image categorization, image labeling and image retrieval. Automatic methods for similarity estimation often fall short when the task requires semantic context, raising the need for human judgment. Such judgments can be collected via crowdsourcing, by posing tasks to web users. However, to estimate image similarities in reasonable time and at reasonable cost, the tasks posed to the crowd must be generated carefully. We observe that distances within local neighborhoods provide valuable information that allows a quick and accurate construction of the global similarity metric. This key observation leads to a solution based on clustering tasks that compare relatively similar images. In each query, crowd members cluster a small set of images into bins. The results yield many relative similarities between images, which are used to construct a global image similarity metric. This metric is progressively refined and serves to generate finer, more local queries in subsequent iterations. We demonstrate the effectiveness of our method on datasets where ground truth is available, and on a collection of images where semantic similarities cannot be quantified. In particular, we show that our method outperforms alternative baseline approaches, and we confirm the usefulness of clustering queries and of our progressive refinement process.
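The abstract does not spell out how a clustering answer turns into relative similarities or how the metric drives the next query. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes that two images placed in the same bin are more similar to each other than either is to any image in a different bin, so one answer expands into triplets (a, b, c) meaning d(a, b) < d(a, c). The function names `triplets_from_clustering`, `fraction_satisfied` and `local_query` are hypothetical, the Euclidean check is only one way to test a candidate embedding against the crowd's answers, and picking a seed's nearest neighbours is an assumed stand-in for generating "more local" queries.

```python
import math
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Image = Hashable
Triplet = Tuple[Image, Image, Image]
Embedding = Dict[Image, Sequence[float]]


def triplets_from_clustering(bins: List[List[Image]]) -> List[Triplet]:
    """Expand one crowd clustering answer into relative-similarity triplets.

    Assumption: a triplet (a, b, c) encodes d(a, b) < d(a, c), where
    a and b share a bin and c was placed in a different bin.
    """
    triplets: List[Triplet] = []
    for i, same_bin in enumerate(bins):
        # Every image outside the current bin acts as a "farther" image.
        others = [img for j, other in enumerate(bins) if j != i for img in other]
        for a, b in combinations(same_bin, 2):
            for c in others:
                # Either image of the pair can serve as the anchor.
                triplets.append((a, b, c))
                triplets.append((b, a, c))
    return triplets


def fraction_satisfied(triplets: List[Triplet], embedding: Embedding) -> float:
    """Share of triplets that hold under Euclidean distances in `embedding`."""
    if not triplets:
        return 1.0
    hits = sum(
        math.dist(embedding[a], embedding[b]) < math.dist(embedding[a], embedding[c])
        for a, b, c in triplets
    )
    return hits / len(triplets)


def local_query(seed: Image, embedding: Embedding, size: int) -> List[Image]:
    """Hypothetical query generator: the seed plus its nearest neighbours so far."""
    if size < 1:
        raise ValueError("a query must contain at least one image")
    neighbours = sorted(
        (img for img in embedding if img != seed),
        key=lambda img: math.dist(embedding[seed], embedding[img]),
    )
    return [seed] + neighbours[: size - 1]
```

For example, `triplets_from_clustering([["cat1", "cat2"], ["dog1"]])` returns `[("cat1", "cat2", "dog1"), ("cat2", "cat1", "dog1")]`. Under this interpretation, a single query of nine images sorted into three bins of three already yields 108 triplets, which suggests why one clustering query can be a rich source of relative similarities compared with asking about one triplet at a time.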
Notes
Crowdsourcing is a general term for processes that pose many small-scale tasks to the crowd of web users and piece together the crowd's answers to achieve a larger-scale goal, such as constructing a large knowledge base.
Acknowledgments
This research was supported by a Google Focused Research Award, by the Israeli Science Foundation (ISF, Grant No. 1636/13), by the Blavatnik Interdisciplinary Cyber Research Center (ICRC), and by the European Research Council under FP7 (ERC Grant MoDaS, Agreement 291071).
About this article
Cite this article
Kleiman, Y., Goldberg, G., Amsterdamer, Y. et al. Toward semantic image similarity from crowdsourced clustering. Vis Comput 32, 1045–1055 (2016). https://doi.org/10.1007/s00371-016-1266-4