
Towards Automatic Construction of Multi-Network Models for Heterogeneous Multi-Task Learning

Published: 05 March 2021

Abstract

Multi-task learning, as it is understood nowadays, consists of using a single model to carry out several similar tasks. From classifying hand-written characters of different alphabets to learning to play several Atari games through reinforcement learning, multi-task models have widened their performance range across different tasks, although these tasks are usually of a similar nature. In this work, we attempt to expand this range even further by including heterogeneous tasks in a single learning procedure. To do so, we first formally define a multi-network model, identifying the components and characteristics needed to allow different adaptations of the model depending on the tasks it is required to fulfill. Second, taking the formal definition as a starting point, we develop an illustrative model example consisting of three different tasks (classification, regression, and data sampling). The performance of this illustrative model is then analyzed, showing its capabilities. Motivated by the results of the analysis, we enumerate a set of open challenges and future research lines over which the full potential of the proposed model definition can be exploited.
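
To make the idea of a single procedure serving classification, regression, and data sampling concrete, here is a minimal, hypothetical sketch. It is not the authors' formal multi-network model: it assumes PyTorch, and the class name, layer sizes, and dummy data are all illustrative assumptions. A shared encoder feeds three heterogeneous heads (a classifier, a regressor, and a VAE-style decoder for sampling), trained jointly by summing the three task losses.

```python
# Hypothetical sketch (NOT the paper's formal model): one shared encoder with
# three heterogeneous task networks, trained jointly. Sizes and data are dummies.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiNetworkModel(nn.Module):
    def __init__(self, in_dim=784, hidden=128, latent=16, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_classes)   # task 1: classification
        self.regressor = nn.Linear(hidden, 1)            # task 2: regression
        self.to_mu = nn.Linear(hidden, latent)           # task 3: data sampling,
        self.to_logvar = nn.Linear(hidden, latent)       # via a VAE-style head
        self.decoder = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                     nn.Linear(hidden, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)                              # shared representation
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.classifier(h), self.regressor(h), self.decoder(z), mu, logvar

model = MultiNetworkModel()
x = torch.rand(32, 784)                                  # dummy inputs in [0, 1)
y_cls = torch.randint(0, 10, (32,))                      # dummy class labels
y_reg = torch.rand(32, 1)                                # dummy regression targets

logits, y_hat, recon, mu, logvar = model(x)
kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
loss = (F.cross_entropy(logits, y_cls)                   # classification loss
        + F.mse_loss(y_hat, y_reg)                       # regression loss
        + F.binary_cross_entropy(recon, x) + kl)         # sampling (VAE) loss
loss.backward()                                          # one joint update step
```

Summing heterogeneous losses into one backward pass is the simplest way to realize a single learning procedure over dissimilar tasks; how the losses are weighted and which components are shared versus task-specific are the kind of design choices a formal multi-network definition has to expose.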


Cited By

  • (2024) Redefining Neural Architecture Search of Heterogeneous Multinetwork Models by Characterizing Variation Operators and Model Components. IEEE Transactions on Neural Networks and Learning Systems 35, 8 (Aug. 2024), 10561-10575. DOI: 10.1109/TNNLS.2023.3242877
  • (2024) Factorized models in neural architecture search: Impact on computational costs and performance. In 2024 International Joint Conference on Neural Networks (IJCNN), 1-8 (30 June 2024). DOI: 10.1109/IJCNN60899.2024.10651157



Published In

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 2
Survey Paper and Regular Papers
April 2021
524 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3446665
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 March 2021
Accepted: 01 November 2020
Revised: 01 September 2020
Received: 01 December 2019
Published in TKDD Volume 15, Issue 2


Author Tags

  1. Multi-task learning
  2. deep neural networks
  3. neural architecture search

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Basque Government
  • Elkartek
  • Spanish Ministry of Economy, Industry and Competitiveness
  • Spanish Ministry of Science and Innovation

