DOI: 10.1145/3394171.3413574
research-article

Deep Heterogeneous Multi-Task Metric Learning for Visual Recognition and Retrieval

Published: 12 October 2020

Abstract

Estimating the distance between data instances is a fundamental problem in many artificial intelligence algorithms and is critical in diverse multimedia applications. A major challenge is finding an appropriate distance function when labeled data are insufficient for a given task. Multi-task metric learning (MTML) alleviates this data deficiency by learning distance metrics for multiple tasks together and sharing information between the tasks. Recently, heterogeneous MTML (HMTML) has attracted much attention because it can handle multiple tasks with varied data representations. A major drawback of current HMTML approaches is that they learn only linear transformations to connect different domains. This is suboptimal because the correlations between domains may be very complex and highly nonlinear. To overcome this drawback, we propose a deep heterogeneous MTML (DHMTML) method, in which a nonlinear mapping is learned for each task using a deep neural network. The correlations between domains are exploited by sharing some parameters at the top layers of the different networks. More importantly, an auto-encoder scheme and an adversarial learning mechanism are incorporated to help exploit the feature correlations within and between tasks, and task-specific properties are preserved by learning additional task-specific layers together with the common layers. Experiments demonstrate that the proposed method consistently outperforms single-task deep metric learning algorithms and other HMTML approaches on several benchmark datasets.
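
The sharing pattern described in the abstract can be illustrated with a small forward-pass sketch. This is a minimal illustration under stated assumptions, not the authors' architecture: the layer sizes, the `tanh` activation, the random weights, and the names `bottom`, `shared_top`, and `embed` are all hypothetical. It only shows how inputs of different dimensionality can pass through task-specific layers and then through one shared top layer, so that distances are measured in a common subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Random weights and zero bias for one fully connected layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

def dense(x, layer):
    """Fully connected layer followed by a tanh nonlinearity (assumed)."""
    W, b = layer
    return np.tanh(x @ W + b)

# Two tasks whose inputs live in different feature spaces (assumed sizes).
input_dims = {"task_a": 20, "task_b": 35}
hidden_dim, common_dim = 16, 8

# Task-specific bottom layers: one per task, sized to that task's input.
bottom = {t: init_layer(d, hidden_dim) for t, d in input_dims.items()}
# Top layer: a single parameter set reused by every task.
shared_top = init_layer(hidden_dim, common_dim)

def embed(x, task):
    """Map a batch from its own feature space into the common subspace."""
    return dense(dense(x, bottom[task]), shared_top)

def pairwise_distances(z):
    """Euclidean distance between all pairs of embedded rows."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

xa = rng.normal(size=(4, input_dims["task_a"]))
xb = rng.normal(size=(5, input_dims["task_b"]))
za, zb = embed(xa, "task_a"), embed(xb, "task_b")
dist_a = pairwise_distances(za)  # learned-metric distances within task A
```

Because both tasks end in the same `shared_top` parameters, a gradient step taken for one task also changes the embedding of the other, which is the mechanism by which information can transfer between tasks in this kind of design.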

Supplementary Material

MP4 File (3394171.3413574.mp4)
In this video, we present a novel deep heterogeneous multi-task metric learning framework that learns multiple nonlinear distance metrics simultaneously and enables effective information transfer between the different tasks/domains. Specifically, a nonlinear metric is learned for each task using a neural network, and the different networks are forced to share some top layers to enable information transfer. Task-specific representations are learned together with the common representation to preserve the properties particular to each task. We also introduce an auto-encoder scheme to exploit structures, such as feature correlations, contained within and between different domains. Another major contribution is the use of adversarial learning to ensure that different domains not only share the same features but also follow the same data distribution in the common subspace. We demonstrate the effectiveness of our method on toy face recognition and on natural image clustering and retrieval.
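
The three ingredients named above (a metric term, an auto-encoder term, and an adversarial term) could be combined along the following lines. This is a hedged sketch only: the page does not give the loss functions, so the contrastive pairwise loss, mean squared reconstruction error, binary cross-entropy discriminator, and the weights `lam_rec` and `lam_adv` are all assumptions chosen because they are common defaults. The paper may use different objectives.

```python
import numpy as np

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pull similar pairs together; push dissimilar pairs beyond a margin."""
    d = np.linalg.norm(z1 - z2, axis=1)
    pos = same * d ** 2
    neg = (1 - same) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(pos + neg))

def reconstruction_loss(x, x_hat):
    """Auto-encoder term: mean squared error between input and its decoding."""
    return float(np.mean((x - x_hat) ** 2))

def discriminator_loss(p_domain_a, is_domain_a):
    """Binary cross-entropy of a domain discriminator on common-space codes."""
    p = np.clip(p_domain_a, 1e-7, 1 - 1e-7)
    bce = is_domain_a * np.log(p) + (1 - is_domain_a) * np.log(1 - p)
    return float(-np.mean(bce))

def encoder_objective(metric, recon, disc, lam_rec=0.1, lam_adv=0.1):
    """The encoder minimizes the metric and reconstruction terms while trying
    to maximize the discriminator's loss (hence the minus sign)."""
    return metric + lam_rec * recon - lam_adv * disc

# Toy pair batch: first pair is labeled similar, second dissimilar.
z1 = np.array([[0.0, 0.0], [0.0, 0.0]])
z2 = np.array([[1.0, 0.0], [0.5, 0.0]])
same = np.array([1.0, 0.0])
metric = contrastive_loss(z1, z2, same)

# A discriminator that outputs 0.5 everywhere cannot tell the domains apart.
confused = discriminator_loss(np.full(4, 0.5), np.array([1.0, 1.0, 0.0, 0.0]))
```

In this formulation the discriminator is trained to lower `discriminator_loss`, while the encoders are trained to raise it, which pushes the embedded distributions of the different domains toward each other in the common subspace.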

Cited By

  • (2022) Semantically Meaningful Class Prototype Learning for One-Shot Image Segmentation. IEEE Transactions on Multimedia 24, 968-980. DOI: 10.1109/TMM.2021.3061816. Online publication date: 2022
  • (2021) Deep Marginal Fisher Analysis based CNN for Image Representation and Classification. Proceedings of the 29th ACM International Conference on Multimedia, 181-189. DOI: 10.1145/3474085.3475560. Online publication date: 17-Oct-2021

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep neural networks
  2. heterogeneous
  3. metric learning
  4. multi-task
  5. visual applications

Qualifiers

  • Research-article

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 23
  • Downloads (last 6 weeks): 2

Reflects downloads up to 18 Dec 2024.
