DOI: 10.1145/3511808.3557333

GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning

Published: 17 October 2022

Abstract

Multi-task learning (MTL) aims to solve multiple related tasks simultaneously and has experienced rapid growth in recent years. However, MTL models often suffer from performance degeneration due to negative transfer when learning several tasks at once. Some related work attributes this problem to conflicting gradients, in which case useful gradient updates must be selected carefully for all tasks. To this end, we propose a novel optimization approach for MTL, named GDOD, which manipulates the gradients of each task using an orthogonal basis decomposed from the span of all task gradients. GDOD explicitly decomposes gradients into task-shared and task-conflict components and adopts a general update rule that avoids interference across all task gradients. This allows the update directions to be guided by the task-shared components. Moreover, we prove the convergence of GDOD theoretically under both convex and non-convex assumptions. Experimental results on several multi-task datasets not only demonstrate the significant improvement GDOD brings to existing MTL models but also show that our algorithm outperforms state-of-the-art optimization methods in terms of AUC and Logloss metrics.
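To make the idea above concrete, the following is a minimal NumPy sketch of combining per-task gradients through an orthogonal basis of their span and keeping only the directions on which all tasks agree. The function name, the sign-agreement criterion, and the toy gradients are illustrative assumptions and do not reproduce the paper's exact update rule.

import numpy as np

def gdod_style_update(task_grads, lr=0.1):
    # Stack per-task gradients into a (num_tasks, dim) matrix.
    G = np.stack(task_grads)
    # Orthonormal basis of the span of all task gradients (rows of vt).
    _, _, vt = np.linalg.svd(G, full_matrices=False)
    # Per-task coordinates in that basis, shape (num_tasks, num_basis).
    coeffs = G @ vt.T
    # Treat a basis direction as task-shared when every task's coordinate
    # along it has the same sign; otherwise it is a task-conflict direction
    # (an assumption for illustration, not the paper's exact criterion).
    shared = np.all(coeffs > 0, axis=0) | np.all(coeffs < 0, axis=0)
    # Reconstruct only the task-shared components and sum them over tasks.
    combined = (coeffs * shared) @ vt
    return -lr * combined.sum(axis=0)

# Toy usage: two gradients that agree on one direction and conflict on another.
g1 = np.array([1.0, 1.0, 0.5])
g2 = np.array([1.0, -1.0, 0.4])
print(gdod_style_update([g1, g2]))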


Cited By

  • (2024) Physics-Informed Learning of Characteristic Trajectories for Smoke Reconstruction. ACM SIGGRAPH 2024 Conference Papers. DOI: 10.1145/3641519.3657483, 1-11. Online publication date: 13 July 2024.


    Published In

    CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, October 2022, 5274 pages
    ISBN: 9781450392365
    DOI: 10.1145/3511808
    General Chairs: Mohammad Al Hasan, Li Xiong

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. gradient conflict
    2. multi-task learning
    3. orthogonal decomposition

    Qualifiers

    • Research-article

    Conference

    CIKM '22

    Acceptance Rates

    CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

