Abstract
While RGB-D classification has been actively researched in recent years, most existing methods focus on source-to-target transfer between RGB-D domains. Such methods cannot address the real-world scenario in which paired depth images are unavailable. This paper focuses on a more flexible task: recognizing RGB test images by transferring them into the depth domain. This setting retains high performance, since auxiliary depth information is exploited, while removing the cost of pairing RGB cameras with depth sensors at test time. Existing methods face two challenges: how to exploit the additional depth features, and how to handle the domain shift caused by the different sensing mechanisms of conventional RGB cameras and depth sensors. As a step towards bridging this gap, we propose a novel method, adaptive Visual-Depth Fusion Transfer (aVDFT), which exploits the depth information and handles the domain distribution mismatch simultaneously. Our key novelties are: (1) a global visual-depth metric construction algorithm that effectively aligns the structures of the RGB and depth data; (2) adaptive transformed component extraction for the target domain, conditioned on transfer that is invariant to location, scale, and depth measurement. To demonstrate the effectiveness of aVDFT, we conduct comprehensive experiments on six pairs of RGB-D datasets for object recognition, scene classification, and gender recognition, and achieve state-of-the-art performance.
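The aVDFT objective itself is not spelled out in this abstract. Purely as an illustrative sketch, not the authors' algorithm, the snippet below shows two standard building blocks that visual-depth transfer methods of this kind rest on: a linear-kernel maximum mean discrepancy (MMD) to quantify the RGB-to-depth distribution mismatch, and a CORAL-style second-order alignment that re-colours one domain's features to match the other's statistics. All data, dimensions, and function names here are hypothetical.

import numpy as np

def mmd_linear(X_s, X_t):
    # Linear-kernel maximum mean discrepancy: squared distance between
    # the empirical feature means of the two domains.
    delta = X_s.mean(axis=0) - X_t.mean(axis=0)
    return float(delta @ delta)

def mat_pow(C, p, eps=1e-6):
    # Symmetric matrix power via eigendecomposition (used for C^{-1/2} and C^{1/2}).
    w, V = np.linalg.eigh(C)
    return (V * np.clip(w, eps, None) ** p) @ V.T

def coral_align(X_s, X_t, eps=1e-6):
    # CORAL-style alignment (illustrative stand-in, not aVDFT): whiten the
    # source covariance, re-colour with the target covariance, shift to the
    # target mean.
    d = X_s.shape[1]
    C_s = np.cov(X_s, rowvar=False) + eps * np.eye(d)
    C_t = np.cov(X_t, rowvar=False) + eps * np.eye(d)
    return (X_s - X_s.mean(axis=0)) @ mat_pow(C_s, -0.5) @ mat_pow(C_t, 0.5) + X_t.mean(axis=0)

# Toy demo with synthetic "RGB" and "depth" features (200 samples, 16 dims each).
rng = np.random.default_rng(0)
rgb = rng.normal(0.0, 1.0, size=(200, 16))
depth = rng.normal(0.5, 2.0, size=(200, 16))
print("MMD before alignment:", mmd_linear(rgb, depth))
print("MMD after alignment: ", mmd_linear(coral_align(rgb, depth), depth))

In this toy run the MMD drops to near zero once the simulated RGB features are aligned to the depth statistics, which is the qualitative effect any visual-depth alignment step aims for.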
Acknowledgements
This work was sponsored by NUPTSF (Grant No. NY218120) and an MRC Innovation Fellowship (Ref. MR/S003916/1).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, Z., Long, Y., Jing, X.Y., Shao, L.: Adaptive Visual-Depth Fusion Transfer. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds.) Computer Vision – ACCV 2018. LNCS, vol. 11364. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20870-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20869-1
Online ISBN: 978-3-030-20870-7