The Data Flow and Architectural Optimizations for a Highly Efficient CNN Accelerator Based on the Depthwise Separable Convolution


Abstract

This paper presents the design and implementation of a convolutional neural network (CNN) accelerator for embedded and edge computing systems. Specifically, it proposes a novel processing flow that maximizes the reuse of data already stored in the accelerator. This greatly reduces both the on-chip storage requirements and the number of accesses to off-chip memory, which in turn significantly reduces the memory-access delay and the area complexity. Based on the proposed data processing flow, a highly efficient VLSI architecture is designed and implemented. The architecture uses a pipelined structure that maximizes the utilization of its hardware components. The circuit is synthesized, placed, and routed in TSMC 90 nm technology, and its performance and area complexity are evaluated from post-layout estimates. The experimental results show that the proposed CNN accelerator achieves a throughput of 44.06 Giga-MAC/s with a complexity of 5909 KGEs. Furthermore, the design reaches 79.1 frames per second (fps) at a clock frequency of 250 MHz. Compared to state-of-the-art accelerators, the proposed architecture achieves a significant improvement in efficiency.
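As background for the title's "depthwise separable convolution", the short sketch below counts multiply-accumulate (MAC) operations for a standard convolution and for its depthwise separable factorization (a per-channel k x k depthwise stage followed by a 1 x 1 pointwise stage), then derives the per-cycle and per-frame figures implied by the abstract's numbers. It is a back-of-envelope illustration, not the authors' method. The class `ConvLayer`, the functions `standard_conv_macs`, `depthwise_separable_macs`, and `implied_rates`, and the 14 x 14 x 512 example layer are all hypothetical. The rate calculation also assumes that the 44.06 GMAC/s throughput is sustained over an entire frame and that every MAC is useful work, which the abstract does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Shape of one convolution layer (hypothetical helper for illustration)."""
    h_out: int  # output feature-map height
    w_out: int  # output feature-map width
    k: int      # square kernel size (k x k)
    c_in: int   # input channels
    c_out: int  # output channels


def standard_conv_macs(layer: ConvLayer) -> int:
    # Every output pixel of every output channel needs k*k*c_in MACs.
    return layer.h_out * layer.w_out * layer.k * layer.k * layer.c_in * layer.c_out


def depthwise_separable_macs(layer: ConvLayer) -> tuple[int, int]:
    # Depthwise stage: one k x k filter applied to each input channel separately.
    depthwise = layer.h_out * layer.w_out * layer.k * layer.k * layer.c_in
    # Pointwise stage: a 1 x 1 convolution that mixes c_in channels into c_out.
    pointwise = layer.h_out * layer.w_out * layer.c_in * layer.c_out
    return depthwise, pointwise


def implied_rates(gmacs_per_s: float, clock_hz: float, fps: float) -> tuple[float, float]:
    """Assumes the peak MAC rate is sustained for a whole frame (an assumption)."""
    macs_per_second = gmacs_per_s * 1e9
    macs_per_cycle = macs_per_second / clock_hz
    macs_per_frame = macs_per_second / fps
    return macs_per_cycle, macs_per_frame


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only as an example.
    layer = ConvLayer(h_out=14, w_out=14, k=3, c_in=512, c_out=512)
    std = standard_conv_macs(layer)
    dw, pw = depthwise_separable_macs(layer)
    print(f"standard: {std:,} MACs")
    print(f"depthwise + pointwise: {dw:,} + {pw:,} = {dw + pw:,} MACs")
    print(f"ratio: {(dw + pw) / std:.4f} "
          f"(1/c_out + 1/k^2 = {1 / layer.c_out + 1 / layer.k ** 2:.4f})")

    # Figures taken from the abstract: 44.06 GMAC/s, 250 MHz, 79.1 fps.
    per_cycle, per_frame = implied_rates(44.06, 250e6, 79.1)
    print(f"{per_cycle:.2f} MACs/cycle, {per_frame / 1e6:.1f} M MACs/frame")
```

For this example layer, the separable form needs about 11.3% of the MACs of the standard convolution, in line with the general ratio 1/c_out + 1/k². Under the stated assumptions, the abstract's figures correspond to roughly 176 MACs per clock cycle and about 557 million MACs per frame.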


Data Availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.



Acknowledgements

This work is supported in part by the Ministry of Science and Technology, Taiwan, under grants MOST 109-2221-E-011-142 and 110-2221-E-011-155. The authors would like to thank Prof. Gerd Ascheid and Dr. Andreas Bytyn of RWTH Aachen University for their valuable input on the design of the CNN accelerator.

Author information

Authors and Affiliations

Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
Hung-Ju Lin & Chung-An Shen


Corresponding author

Correspondence to Chung-An Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Lin, HJ., Shen, CA. The Data Flow and Architectural Optimizations for a Highly Efficient CNN Accelerator Based on the Depthwise Separable Convolution. Circuits Syst Signal Process 41, 3547–3569 (2022). https://doi.org/10.1007/s00034-022-01952-5


Received: 20 February 2021
Revised: 19 December 2021
Accepted: 23 December 2021
Published: 17 January 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s00034-022-01952-5
