Abstract
Real-time object matching and recognition is a challenging task in computer vision probably due to the extensively computational overload posed by large and high dimensional data space. Indexing approaches can help achieving thousands of times in speedups when comparing to sequential search. In this work, we propose a novel usage of product quantization and hierarchical clustering so that search speedups can be even improved further. To validate the proposed indexing algorithm, a number of experiments have been carried out on different datasets. Experimental results demonstrate that the proposed method works very efficiently and far outperforms many other state-of-the-art techniques.
Similar content being viewed by others
Notes
Alternatively, we use the notation \(d = D/m\) as the dimensionality of each sub-space.
References
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pp. 459–468 (2006)
Babenko, A., Lempitsky, V.: Additive quantization for extreme vector compression. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 931–938 (2014)
Babenko, A., Lempitsky, V.: The inverted multi-index. IEEE Trans. Pattern Anal. Mach. Intell. 37(6), 1247–1260 (2015)
Babenko, A., Lempitsky, V.: Tree quantization for large-scale similarity search and classification. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4240–4248 (2015)
Babenko, A., Lempitsky, V.: Product split trees. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6316–6324 (2017)
Dong, X., Shen, J., Shao, L.: Hierarchical superpixel-to-pixel dense matching. IEEE Trans. Circuits Syst. Video Technol. 27(12), 2518–2526 (2017)
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, pp. 226–231. AAAI Press (1996)
Friedman, J.H., Bentley, J.L., Finkel, R.A.: An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3(3), 209–226 (1977)
Ge, T., He, K., Ke, Q., Sun, J.: Optimized product quantization. IEEE Trans. Pattern Anal. Mach. Intell. 36(4), 744–755 (2014)
Gordo, A., Almazan, J., Revaud, J., Larlus, D.: Deep image retrieval: learning global representations for image search. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) European Conference on Computer Vision. ECCV 2016. Lecture Notes in Computer Science, vol. 9910, pp. 241–257 (2016)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pp. 369–376 (2006)
Han, J., Zhang, D., Cheng, G., Liu, N., Xu, D.: Advanced deep-learning techniques for salient and category-specific object detection: a survey. IEEE Signal Process. Mag. 35(1), 84–100 (2018)
He, J., Liu, W., Chang, S.F.: Scalable similarity search with optimized kernel hashing. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, pp. 1129–1138 (2010)
Hu, P., Ramanan, D.: Finding tiny faces. CoRR arXiv:1612.04402 (2016)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, STOC’98, pp. 604–613 (1998)
Jegou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2011)
Kalantidis, Y., Avrithis, Y.: Locally optimized product quantization for approximate nearest neighbor search. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 2329–2336. Columbus, Ohio (2014)
Kaufman, L., Rousseeuw, P.J.: Clustering by means of medoids (1987)
Klein, B., Wolf, L.: In defense of product quantization. CoRR arXiv:1711.08589 (2017)
Krizhevsky, A., Hinton, G.E.: Using very deep autoencoders for content-based image retrieval. In: Proceedings of the European Symposium on Artificial Neural Networks (ESANN) (2011)
Kulis, B., Grauman, K.: Kernelized locality-sensitive hashing for scalable image search. In: IEEE International Conference on Computer Vision, ICCV’09, pp. 2130–2137 (2009)
Li, L., Hu, Q., Han, Y., Li, X.: Distribution sensitive product quantization. IEEE Trans. Circuits Syst. Video Technol. https://doi.org/10.1109/TCSVT.2017.2759277 (2017)
Liu, H., Wang, R., Shan, S., Chen, X.: Deep supervised hashing for fast image retrieval. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 2064–2072 (2016)
Liu, L., Ouyang, W., Wang, X., Fieguth, P.W., Chen, J., Liu, X., Pietikäinen, M.: Deep learning for generic object detection: a survey. CoRR arXiv:1809.02165 (2018)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Macqueen, J.B.: Some methods for classification and analysis of multivariate observations. In: 5-th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)
McNames, J.: A fast nearest-neighbor algorithm based on a principal axis search tree. IEEE Trans. Pattern Anal. Mach. Intell. 23(9), 964–976 (2001)
Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: Proceedings of International Conference on Computer Vision Theory and Applications, VISAPP’09, pp. 331–340 (2009)
Muja, M., Lowe, D.G.: Fast matching of binary features. In: Proceedings of the Ninth Conference on Computer and Robot Vision, CRV’12, pp. 404–410 (2012)
Muja, M., Lowe, D.G.: Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36, 2227–2240 (2014)
Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Volume 2, CVPR’06, pp. 2161–2168 (2006)
Norouzi, M., Fleet, D.J.: Cartesian k-means. In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’13, pp. 3017–3024 (2013)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
Panigrahy, R.: Entropy based nearest neighbor search in high dimensions. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA’06, pp. 1186–1195 (2006)
Pham, T.A.: Pair-wisely optimized clustering tree for feature indexing. Comput. Vis. Image Underst. 154(1), 35–47 (2017)
Pham, T.A., Barrat, S., Delalandre, M., Ramel, J.Y.: An efficient indexing scheme based on linked-node m-ary tree structure. In: 17th International Conference on Image Analysis and Processing (ICIAP 2013), LNCS, vol. 8156, pp. 752–762 (2013)
Pham, T.A., Do, N.T.: Embedding hierarchical clustering in product quantization for feature indexing. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-018-6626-9 (2018)
Qian, X., Zhao, Y., Han, J.: Image location estimation by salient region matching. IEEE Trans. Image Process. 24(11), 4348–4358 (2015)
Qin, X., Shen, J., Mao, X., Li, X., Jia, Y.: Robust match fusion using optimization. IEEE Trans. Cybern. 45(8), 1549–1560 (2015)
Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. CoRR arXiv:1503.03832 (2015)
Shen, J., Hao, X., Liang, Z., Liu, Y., Wang, W., Shao, L.: Real-time superpixel segmentation by DBSCAN clustering algorithm. IEEE Trans. Image Process. 25(12), 5933–5942 (2016)
Silpa-Anan, C., Hartley, R.: Optimised KD-trees for fast image descriptor matching. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR’08, pp. 1–8 (2008)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR arXiv:1409.1556 (2014)
Sivic, J., Zisserman, A.: Video google: a text retrieval approach to object matching in videos. In: Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pp. 1470–1477 (2003)
Wan, J., Wang, D., Hoi, S.C.H., Wu, P., Zhu, J., Zhang, Y., Li, J.: Deep learning for content-based image retrieval: a comprehensive study. In: ACM Multimedia (2014)
Wang, H., Cai, Y., Zhang, Y., Pan, H., Lv, W., Han, H.: Deep learning for image retrieval: what works and what doesn’t. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW) pp. 1576–1583 (2015)
Wang, W., Shen, J.: Deep visual attention prediction. IEEE Trans. Image Process. 27(5), 2368–2378 (2018)
Wang, W., Shen, J., Shao, L.: Video salient object detection via fully convolutional networks. IEEE Trans. Image Process. 27(1), 38–49 (2018)
Wieschollek, P., Wang, O., Sorkine-Hornung, A., Lensch, H.P.: Efficient large-scale approximate nearest neighbor search on the gpu. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2027–2035 (2016)
Wu, C., Karanasou, P., Gales, M.J.F., Sim, K.C.: Stimulated deep neural network for speech recognition. In: Proceedings of InterSpeech, pp. 400–404 (2016)
Wu, X., He, R., Sun, Z.: A lightened CNN for deep face representation. CoRR arXiv:1511.02683 (2015)
Zhang, S., Zhu, X., Lei, Z., Shi, H., Wang, X., Li, S.Z.: Faceboxes: a CPU real-time face detector with high accuracy. CoRR arXiv:1708.05234 (2017)
Zhang, T., Du, C., Wang, J.: Composite quantization for approximate nearest neighbor search. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 838–846 (2014)
Zhang, T., Qi, G.J., Tang, J., Wang, J.: Sparse composite quantization. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR’15), pp. 4548–4556 (2015)
Acknowledgements
This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.01-2016.01
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Pham, TA. Improved embedding product quantization. Machine Vision and Applications 30, 447–459 (2019). https://doi.org/10.1007/s00138-018-00999-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00138-018-00999-2