
Towards Automatic Construction of Multi-Network Models for Heterogeneous Multi-Task Learning

Published: 05 March 2021

Abstract

Multi-task learning, as it is understood nowadays, consists of using a single model to carry out several similar tasks. From classifying hand-written characters of different alphabets to learning to play several Atari games through reinforcement learning, multi-task models have widened their performance range across different tasks, although these tasks are usually of a similar nature. In this work, we attempt to expand this range even further by including heterogeneous tasks in a single learning procedure. To do so, we first formally define a multi-network model, identifying the components and characteristics needed to allow different adaptations of the model depending on the tasks it is required to fulfill. Second, taking the formal definition as a starting point, we develop an illustrative model example consisting of three different tasks (classification, regression, and data sampling). The performance of this illustrative model is then analyzed, showing its capabilities. Motivated by the results of the analysis, we enumerate a set of open challenges and future research lines over which the full potential of the proposed model definition can be exploited.
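
To make the idea of a single procedure serving classification, regression, and data sampling concrete, here is a minimal, hypothetical sketch. It is not the authors' formal multi-network model: it assumes PyTorch, and the class name, layer sizes, and dummy data are all illustrative assumptions. A shared encoder feeds three heterogeneous heads (a classifier, a regressor, and a VAE-style decoder for sampling), trained jointly by summing the three task losses.

```python
# Hypothetical sketch (NOT the paper's formal model): one shared encoder with
# three heterogeneous task networks, trained jointly. Sizes and data are dummies.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiNetworkModel(nn.Module):
    def __init__(self, in_dim=784, hidden=128, latent=16, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_classes)   # task 1: classification
        self.regressor = nn.Linear(hidden, 1)            # task 2: regression
        self.to_mu = nn.Linear(hidden, latent)           # task 3: data sampling,
        self.to_logvar = nn.Linear(hidden, latent)       # via a VAE-style head
        self.decoder = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                     nn.Linear(hidden, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)                              # shared representation
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.classifier(h), self.regressor(h), self.decoder(z), mu, logvar

model = MultiNetworkModel()
x = torch.rand(32, 784)                                  # dummy inputs in [0, 1)
y_cls = torch.randint(0, 10, (32,))                      # dummy class labels
y_reg = torch.rand(32, 1)                                # dummy regression targets

logits, y_hat, recon, mu, logvar = model(x)
kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
loss = (F.cross_entropy(logits, y_cls)                   # classification loss
        + F.mse_loss(y_hat, y_reg)                       # regression loss
        + F.binary_cross_entropy(recon, x) + kl)         # sampling (VAE) loss
loss.backward()                                          # one joint update step
```

Summing heterogeneous losses into one backward pass is the simplest way to realize a single learning procedure over dissimilar tasks; how the losses are weighted and which components are shared versus task-specific are the kind of design choices a formal multi-network definition has to expose.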


Cited By

  • (2024) Redefining Neural Architecture Search of Heterogeneous Multinetwork Models by Characterizing Variation Operators and Model Components. IEEE Transactions on Neural Networks and Learning Systems 35, 8 (Aug. 2024), 10561-10575. DOI: 10.1109/TNNLS.2023.3242877
  • (2024) Factorized models in neural architecture search: Impact on computational costs and performance. In 2024 International Joint Conference on Neural Networks (IJCNN), 1-8 (30 June 2024). DOI: 10.1109/IJCNN60899.2024.10651157



Published In

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 2
Survey Paper and Regular Papers
April 2021
524 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3446665
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 March 2021
Accepted: 01 November 2020
Revised: 01 September 2020
Received: 01 December 2019
Published in TKDD Volume 15, Issue 2


Author Tags

  1. Multi-task learning
  2. deep neural networks
  3. neural architecture search

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Basque Government
  • Elkartek
  • Spanish Ministry of Economy, Industry and Competitiveness
  • Spanish Ministry of Science and Innovation

