Abstract
One major drawback of deep-learning algorithms is the high computational complexity and memory bandwidth required for inference. To mitigate these costs in applications that utilize Convolutional Neural Networks (CNNs), a new approach is the dynamic pruning of kernels, which aims at parsimonious inference by learning to identify and dynamically remove the redundant capacity of a CNN architecture. This conditional-execution approach provides a systematic, data-driven method for developing CNNs that are trained to change size and form in real time during inference, targeting the smallest possible computational footprint. Conditional execution, however, introduces a number of challenges when implementing these algorithms on embedded systems. In this paper we present a systematic way of deploying this dynamic pruning methodology on heterogeneous platforms that incorporate both CPU and GPU subsystems. Real-time measurements of embedded implementations on modern SoCs verify the efficacy of the proposed methodology and demonstrate the ability of the dynamic networks both to adapt their size to the complexity of the task and to deliver significant computational gains during inference.
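The sketch below illustrates the general idea of per-input kernel gating described above; it is not the authors' implementation, and all names (GatedConv2d, threshold, the gating branch layout) are hypothetical. A lightweight gating branch scores each output channel for the current input, and kernels whose score falls below a threshold are switched off, so the effective network size varies with the data.

```python
# Minimal sketch of data-driven, run-time kernel pruning (illustrative only).
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, threshold=0.1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Tiny gating branch: global pooling + linear layer gives one score per kernel.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_ch, out_ch),
            nn.Sigmoid(),
        )
        self.threshold = threshold

    def forward(self, x):
        scores = self.gate(x)                      # (N, out_ch) per-sample kernel scores
        mask = (scores > self.threshold).float()   # hard on/off decision per kernel
        # For clarity we compute all kernels and zero the inactive ones; an actual
        # embedded deployment would skip the masked convolutions entirely.
        y = self.conv(x)
        return y * mask[:, :, None, None]

# Usage: the set of active kernels changes per input, so the compute cost adapts to the sample.
block = GatedConv2d(16, 32)
out = block(torch.randn(1, 16, 64, 64))
```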
Funding
This work has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 780788.
Cite this article
Pothos, V., Vassalos, E., Theodorakopoulos, I. et al. Deep Learning Inference with Dynamic Graphs on Heterogeneous Platforms. Int J Parallel Prog 49, 158–176 (2021). https://doi.org/10.1007/s10766-020-00654-2