

FD-CNN: A Frequency-Domain FPGA Acceleration Scheme for CNN-Based Image-Processing Applications

Published: 09 November 2023 Publication History

Abstract

In emerging edge-computing scenarios, FPGAs have been widely adopted to accelerate convolutional neural network (CNN)–based image-processing applications such as image classification, object detection, and image segmentation. A standard image-processing pipeline first decodes compressed images collected from Internet of Things (IoT) devices into RGB data and then feeds them into a CNN engine to compute the results. Previous work has mainly focused on optimizing the CNN inference stage. However, we observe that on popular ZYNQ FPGA platforms, image decoding can also become the bottleneck due to the poor performance of the embedded ARM CPUs. Even with a hardware accelerator, the decoding operations still incur considerable latency. Moreover, conventional RGB-based CNNs have too few input channels at the first layer, which can hardly utilize the high parallelism of CNN engines and greatly slows down network inference. To overcome these problems, in this article we propose FD-CNN, a novel CNN accelerator that leverages a partial-decoding technique to accelerate CNNs directly in the frequency domain. Specifically, we omit the most time-consuming step of image decoding, the inverse discrete cosine transform (IDCT), and directly feed the DCT coefficients (i.e., the frequency data) into the CNN. The image decoder can thereby be greatly simplified. Moreover, compared to RGB data, frequency data has a lower spatial resolution but 64× more channels. Such an input shape is more hardware friendly and can substantially reduce CNN inference time. We then systematically discuss the algorithm, architecture, and command-set design of FD-CNN. To deal with the irregularity of different CNN applications, we propose an image-decoding-aware design-space exploration (DSE) workflow to optimize the pipeline. We further propose an early-stopping strategy to tackle time-consuming progressive JPEG decoding.
Comprehensive experiments demonstrate that, compared to the baseline image-processing pipelines, FD-CNN achieves on average 3.24× and 4.29× throughput improvement, 2.55× and 2.54× energy reduction, and 2.38× and 2.58× lower latency on the ZC-706 and ZCU-102 platforms, respectively.
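The shape transformation behind the 64× channel claim can be illustrated as follows: JPEG applies an 8×8 block DCT per channel, so skipping the IDCT yields an input whose spatial resolution shrinks by 8× in each dimension while the 64 coefficients of each block become channels. The NumPy sketch below is illustrative only; the paper's exact coefficient layout, channel ordering, and chroma handling are not specified here.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix (JPEG operates on 8x8 blocks).
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2.0 / n)

def to_frequency_input(img):
    # img: (H, W, C) spatial data, H and W multiples of 8.
    H, W, C = img.shape
    D = dct_matrix(8)
    # Split into 8x8 blocks: (H//8, 8, W//8, 8, C) -> (H//8, W//8, C, 8, 8).
    blocks = img.reshape(H // 8, 8, W // 8, 8, C).transpose(0, 2, 4, 1, 3)
    # Per-block 2-D DCT: D @ block @ D.T for every block.
    coef = np.einsum('ij,bwcjk,lk->bwcil', D, blocks, D)
    # Flatten the 64 coefficients of each block into channels.
    return coef.reshape(H // 8, W // 8, C * 64)

x = np.random.rand(224, 224, 3)       # RGB-sized input: 3 channels
y = to_frequency_input(x)
print(y.shape)                        # (28, 28, 192): 64x more channels
```

A 224×224×3 image thus becomes a 28×28×192 tensor, whose wide channel dimension maps naturally onto the input-channel parallelism of a CNN engine's first layer.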


      Published In

      ACM Transactions on Embedded Computing Systems  Volume 22, Issue 6
      November 2023
      428 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/3632298
Editor: Tulika Mitra

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Publication History

      Published: 09 November 2023
      Online AM: 13 September 2022
      Accepted: 10 August 2022
      Revised: 11 July 2022
      Received: 31 December 2021
      Published in TECS Volume 22, Issue 6

      Author Tags

      1. Edge computing
      2. image decoding
      3. Convolutional Neural Network
      4. FPGA

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Program of China
      • National Natural Science Foundation of China
