Fast Depthwise Separable Convolution for Embedded Systems

Byeongheon Yoo, Yongjun Choi & Heeyoul Choi

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11307)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Convolutional neural networks (CNNs) have achieved outstanding performance in many applications. However, as the number of layers has increased and model structures have become more complex, computational cost has become a concern. Large models cannot run in embedded or mobile environments, where hardware resources are quite limited. To overcome these problems, there have been several attempts, such as reducing network depth, pruning, quantization, and low-rank approximation. Depthwise separable convolution (DSC) was proposed to reduce computation, especially in convolutional layers, by separating one convolution into a spatial convolution and a pointwise convolution. In this paper, we apply DSC to the YOLO network for object detection and propose a faster version of DSC, FastDSC, which replaces the pointwise convolution with general matrix multiplication. Experiments on the NVIDIA Jetson TX2 board show that FastDSC speeds up DSC for object detection.
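As a rough illustration of the decomposition described in the abstract, the NumPy sketch below splits a convolution into a per-channel spatial step and a pointwise (1x1) step, then shows that the pointwise step can be written as a single matrix multiplication. It is not the authors' implementation. The function names, the channels-first layout, stride 1, and the absence of padding are all assumptions made for this example.

```python
import numpy as np

def depthwise_conv(x, dw_kernels):
    """Spatial (depthwise) step: one K x K filter per input channel.
    Assumed layout: x is (C, H, W), dw_kernels is (C, K, K); stride 1, no padding."""
    C, H, W = x.shape
    K = dw_kernels.shape[1]
    out_h, out_w = H - K + 1, W - K + 1
    out = np.zeros((C, out_h, out_w), dtype=np.result_type(x, dw_kernels))
    # Accumulate one kernel tap at a time over shifted views of the input.
    for i in range(K):
        for j in range(K):
            out += dw_kernels[:, i, j][:, None, None] * x[:, i:i + out_h, j:j + out_w]
    return out

def pointwise_conv_loop(x, pw_weights):
    """Pointwise (1x1) step as an explicit per-pixel loop.
    x is (C_in, H, W), pw_weights is (C_out, C_in)."""
    C_in, H, W = x.shape
    C_out = pw_weights.shape[0]
    out = np.zeros((C_out, H, W), dtype=np.result_type(x, pw_weights))
    for h in range(H):
        for w in range(W):
            out[:, h, w] = pw_weights @ x[:, h, w]
    return out

def pointwise_conv_gemm(x, pw_weights):
    """Same pointwise step as one matrix multiplication:
    (C_out, C_in) @ (C_in, H*W) -> (C_out, H*W)."""
    C_in, H, W = x.shape
    return (pw_weights @ x.reshape(C_in, H * W)).reshape(-1, H, W)

def separable_conv(x, dw_kernels, pw_weights):
    """Depthwise step followed by the matrix-multiplication pointwise step."""
    return pointwise_conv_gemm(depthwise_conv(x, dw_kernels), pw_weights)

def multiply_counts(c_in, c_out, k, out_h, out_w):
    """Multiplications for a standard convolution versus the separable form."""
    standard = k * k * c_in * c_out * out_h * out_w
    separable = k * k * c_in * out_h * out_w + c_in * c_out * out_h * out_w
    return standard, separable
```

Because a 1x1 kernel has no spatial extent, the feature map can be flattened to a (C_in, H*W) matrix without any patch extraction, which is why this step maps directly onto a matrix multiplication routine. The sketch only demonstrates numerical equivalence and operation counts; it says nothing about the speedups measured in the paper.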

Acknowledgement

This work was partly supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00749, Development of virtual network management technology based on artificial intelligence) and the National Program for Excellence in Software funded by the Ministry of Science, ICT and Future Planning, Republic of Korea (2017-0-00130).

Author information

Authors and Affiliations

School of Computer Science and Electrical Engineering, Handong Global University, Pohang, 37554, South Korea
Byeongheon Yoo, Yongjun Choi & Heeyoul Choi

Corresponding author

Correspondence to Heeyoul Choi.

Editor information

Editors and Affiliations

The Chinese Academy of Sciences, Beijing, China
Long Cheng
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi Sing Leung
Kobe University, Kobe, Japan
Seiichi Ozawa


Cite this paper

Yoo, B., Choi, Y., Choi, H. (2018). Fast Depthwise Separable Convolution for Embedded Systems. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11307. Springer, Cham. https://doi.org/10.1007/978-3-030-04239-4_59

DOI: https://doi.org/10.1007/978-3-030-04239-4_59
Published: 18 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04238-7
Online ISBN: 978-3-030-04239-4
eBook Packages: Computer Science, Computer Science (R0)
