Multi-task learning based on geometric invariance discriminative features

Yitong Liu¹, Lei Huang¹,² (ORCID: orcid.org/0000-0003-4087-3677), Jie Li¹, Wenfeng Zhang¹, Yanxiu Sheng¹ & Zhiqiang Wei¹,²

Abstract

Multi-task learning (MTL) aims to tackle multiple tasks with a single network while ensuring that every task performs well. The main challenge in MTL is how to extract task-specific features effectively. Existing task-specific feature extraction methods predominantly combine and stack convolutional neural networks (CNNs). However, these methods ignore two points: geometric variations of the target object affect each task differently, and no mechanism ensures that the network focuses on the discriminative features for each task. In this work, we propose a Deformable-Attention Multi-Task Network (DAMTN) to improve the extraction of geometry-invariant discriminative features. In particular, deformable convolution is introduced to learn the geometric variation rules of the target object for different tasks, and an attention mechanism helps the task-specific networks focus on discriminative parts. The proposed DAMTN can be trained end-to-end. We empirically analyze the contribution of the different components of the proposed method and demonstrate state-of-the-art performance on multiple classification tasks as well as on semantic segmentation and depth estimation.
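
To make the two ingredients named in the abstract concrete, the following is a minimal pure-Python sketch of a 3x3 deformable convolution evaluated at one output position, followed by a simple sigmoid channel gate. It is not the DAMTN implementation: the function names, the single-channel feature map, the hand-supplied offsets (a real network would predict them with an extra convolution) and the squeeze-and-excitation-style gate are all assumptions made for illustration.

```python
import math

def bilinear_sample(fmap, y, x):
    """Sample a 2D feature map at fractional (y, x); outside the map counts as zero."""
    h, w = len(fmap), len(fmap[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Bilinear weight shrinks linearly with distance to the grid point.
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * fmap[yi][xi]
    return value

def deformable_conv_at(fmap, kernel, offsets, cy, cx):
    """3x3 deformable convolution response at output position (cy, cx).

    kernel: 9 weights in row-major order.
    offsets: 9 (dy, dx) pairs that shift each regular grid point
             (hypothetical input here; normally learned per position).
    """
    grid = [(gy, gx) for gy in (-1, 0, 1) for gx in (-1, 0, 1)]
    out = 0.0
    for k, (gy, gx) in enumerate(grid):
        oy, ox = offsets[k]
        out += kernel[k] * bilinear_sample(fmap, cy + gy + oy, cx + gx + ox)
    return out

def channel_gates(channel_means, gate_weights):
    """Sigmoid gate per channel (assumed, simplified squeeze-and-excitation style)."""
    return [1 / (1 + math.exp(-w * m)) for m, w in zip(channel_means, gate_weights)]

fmap = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0]]

# Zero offsets reduce to an ordinary 3x3 convolution.
print(deformable_conv_at(fmap, [1.0] * 9, [(0.0, 0.0)] * 9, 1, 1))  # 36.0

# A centre-only kernel with a (0.5, 0.5) offset reads between four pixels.
centre_only = [0.0] * 9
centre_only[4] = 1.0
print(deformable_conv_at(fmap, centre_only, [(0.5, 0.5)] * 9, 1, 1))  # 6.0
```

With zero offsets the sampling grid is the regular one, so the response equals a standard convolution; non-zero offsets let the sampling locations follow the object's geometry, and the gates then rescale each channel's response before it is passed on.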

Acknowledgements

This work was supported by the National Key R&D Program of China (2019YFD0900401), the National Natural Science Foundation of China (No. 61872326), and the Shandong Provincial Natural Science Foundation (ZR2019MF044). GPU computation support was provided by the Center for High Performance Computing and System Simulation, Pilot National Laboratory for Marine Science and Technology (Qingdao).

Author information

Authors and Affiliations

Ocean University of China, Qingdao, 266000, China
Yitong Liu, Lei Huang, Jie Li, Wenfeng Zhang, Yanxiu Sheng & Zhiqiang Wei
Pilot National Laboratory for Marine Science and Technology, Qingdao, 266000, China
Lei Huang & Zhiqiang Wei

Corresponding author

Correspondence to Lei Huang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Y., Huang, L., Li, J. et al. Multi-task learning based on geometric invariance discriminative features. Appl Intell 53, 3505–3518 (2023). https://doi.org/10.1007/s10489-022-03617-x

Accepted: 09 April 2022
Published: 31 May 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10489-022-03617-x