CAP-YOLO: Channel Attention Based Pruning YOLO for Coal Mine Real-Time Intelligent Monitoring
Figure 1. Structure of YOLOv3, where CBL denotes the combination of a convolutional layer, batch normalization, and the LeakyReLU activation function, and Resblock denotes the residual structure. The outputs yolo1, yolo2, and yolo3 correspond to the three output scales. (A minimal PyTorch sketch of these blocks follows this figure list.)

Figure 2. Structure of DCAM.

Figure 3. Distribution of importance values evaluated by BN scale factors and by DCAM. Most of the convolution layers in the Neck fall inside the red dotted box, and their values are significantly smaller than those of the Backbone; the importance values evaluated by DCAM are clearly more evenly distributed.

Figure 4. (a) Structure of Resblock in the Backbone. (b) Structure of Res-attention.

Figure 5. Pruning process of the $l$-th layer, where $n_l$ denotes the number of filters in the $l$-th layer and $p_l$ the number of filters removed.

Figure 6. Effect of CLAHE with different parameters, where TG denotes tileGridSize, the number of tiles the image is divided into, and CL denotes clipLimit, the contrast-clipping threshold of CLAHE.

Figure 7. Structure of AEPSM. The output dimension of AEPSM is 6: the first three outputs correspond to clipLimit = 2, 4, and 8, respectively, while the last three correspond to tileGridSize = 2, 4, and 8.

Figure 8. Training and inference processes of AEPSM: (a) training, (b) parameter selection, and (c) inference.

Figure 9. Effect of image modification: (a) source, (b) cropping, (c) flipping, and (d) adding Gaussian noise.

Figure 10. Pruning ratio-mAP curve of each model.

Figure 11. Pruning ratio-mAP curve on the coal mine pedestrian dataset.

Figure 12. Effect of CLAHE with AEPSM versus fixed parameters under different fields.
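For readers who want the Figure 1 building blocks in code, below is a minimal PyTorch sketch of the CBL and Resblock units named in the caption. The channel widths and the 0.1 LeakyReLU slope follow the standard YOLOv3 design; they are assumptions here, not values quoted from this paper.

```python
import torch.nn as nn

class CBL(nn.Module):
    """CBL = Convolutional + Batch Normalize + LeakyReLU (Figure 1 caption)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)  # 0.1 is the usual YOLOv3 slope

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Resblock(nn.Module):
    """Residual structure from Figure 1: a 1x1 and a 3x3 CBL plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(CBL(channels, channels // 2, kernel_size=1),
                                  CBL(channels // 2, channels, kernel_size=3))

    def forward(self, x):
        return x + self.body(x)
```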
Abstract
1. Introduction
- (1) DCAM is designed to evaluate the importance level of channels in feature maps.
- (2) A coal mine pedestrian dataset was established for transfer learning of YOLOv3; YOLOv3 was then pruned under the guidance of DCAM to form CAP-YOLO.
- (3) For the complex lighting environments in coal mines, AEPSM is proposed and combined with the Backbone of CAP-YOLO to perceive the lighting conditions and set the CLAHE parameters accordingly, improving object detection accuracy (see the sketch after this list).
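To make contribution (3) concrete, here is a hypothetical sketch of an AEPSM-style selection head consistent with the Figure 7 caption: a 6-way output whose first three logits score clipLimit ∈ {2, 4, 8} and whose last three score tileGridSize ∈ {2, 4, 8}. The pooling, hidden width, and argmax selection are our assumptions; the paper's exact layers and training signal are not reproduced in this extract.

```python
import torch
import torch.nn as nn

CLIP_LIMITS = (2, 4, 8)   # candidate clipLimit values (Figure 7 caption)
TILE_GRIDS = (2, 4, 8)    # candidate tileGridSize values (Figure 7 caption)

class AEPSMHead(nn.Module):
    """Hypothetical AEPSM head on top of CAP-YOLO Backbone features."""
    def __init__(self, feat_ch, hidden=128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # summarize the feature map
        self.fc = nn.Sequential(nn.Linear(feat_ch, hidden),
                                nn.ReLU(inplace=True),
                                nn.Linear(hidden, 6))  # 3 CL logits + 3 TG logits

    def forward(self, feat):
        logits = self.fc(self.pool(feat).flatten(1))   # (N, 6)
        cl_idx = logits[:, :3].argmax(dim=1)           # choose a clipLimit per image
        tg_idx = logits[:, 3:].argmax(dim=1)           # choose a tileGridSize per image
        return ([CLIP_LIMITS[int(i)] for i in cl_idx],
                [TILE_GRIDS[int(i)] for i in tg_idx])
```

In the scheme of Figure 8, stage (a) would supervise these logits with whichever parameter pair yields the best detection result, and stage (c) would apply the chosen pair through CLAHE before inference.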
2. Related Methods
2.1. Model Pruning
2.2. Attention Mechanism
3. Methods
3.1. Review of the YOLOv3 Object Detection Model
3.2. Deep Channel Attention Module (DCAM)
3.3. CAP-YOLO (Channel Attention Based Pruning YOLO)
Algorithm 1: Pruning Process | |
1 | Initialize YOLO-DCAM |
2 | Load the parameters of YOLOv3 into YOLO-DCAM |
3 | Freeze the YOLOv3 parameters within YOLO-DCAM |
4 | Train YOLO-DCAM |
5 | for each img in D: |
6 | for l = 1 to L: // L is the number of prunable layers in YOLOv3 |
7 | Get the maximum prune value |
8 | Set the prune threshold |
9 | Obtain CAP-YOLO by pruning |
10 | Fine-tune CAP-YOLO |
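As a hedged illustration of steps 6–9 and of Figure 5 (removing $p_l$ of the $n_l$ filters in the $l$-th layer), the sketch below drops the filters with the lowest importance scores. The `importance` vector stands in for per-channel DCAM scores aggregated over the dataset; the aggregation rule and the exact threshold computation in steps 7–8 are not reproduced in this extract.

```python
import torch

def prune_layer(conv_weight, importance, prune_ratio):
    """Keep the n_l - p_l most important filters of one conv layer (Figure 5).

    conv_weight: (n_l, c_in, k, k) weight tensor of the l-th layer.
    importance:  (n_l,) per-filter importance scores (here: assumed DCAM scores).
    prune_ratio: fraction p_l / n_l of filters to remove.
    """
    n_l = conv_weight.shape[0]
    p_l = int(n_l * prune_ratio)
    keep = torch.argsort(importance, descending=True)[: n_l - p_l]
    keep, _ = torch.sort(keep)              # preserve the original filter order
    return conv_weight[keep], keep          # pruned weights + surviving indices
```

The surviving indices must also be applied to the matching BatchNorm parameters and to the input channels of the following layer, which is why structured pruning is normally followed by the fine-tuning in step 10.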
3.4. Adaptive Image Enhancement Parameter Selection Module
4. Results
4.1. Experimental Environments
4.1.1. Software and Hardware Environments
4.1.2. Dataset
4.1.3. Details
4.2. Performance on COCO
4.3. Performance on the Coal Mine Pedestrian Dataset
4.3.1. Performance of CAP-YOLO on the Coal Mine Pedestrian Dataset
4.3.2. Performance of AEPSM
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Training and Deployment Platform | Embedded Platform
---|---
Intel i7-11700K @ 4.9 GHz | NVIDIA Jetson TX2
NVIDIA RTX 3090 | Ubuntu 18.04
64 GB RAM | Python 3.6.8
Ubuntu 18.04 | PyTorch 1.10
Python 3.6.8 | CUDA 11.3
PyTorch 1.10 |
CUDA 11.3 |
Model | mAP (%) | FPS (RTX 3090)
---|---|---
YOLO-DCAM | 62.3 | 48 |
YOLO-SENet | 58.7 | 52 |
YOLO-SGE | 59.4 | 51 |
YOLO-CBAM | 66.8 | 55 |
YOLO-BAM | 64.1 | 57 |
YOLOv3 | 55.2 | 61 |
SSD | 50.1 | 63 |
Model | mAP (%) | FPS (GPU) | Size (MB) | FLOPs (Bn)
---|---|---|---|---
CAP-YOLO (40%) | 52.1 | 87 | 127 | 35.04 |
CAP-YOLO (60%) | 48.7 | 109 | 86.4 | 25.32 |
CAP-YOLO (88%) | 39.8 | 182 | 28.3 | 7.38 |
YOLOv3-tiny | 33.1 | 173 | 33.1 | 5.56 |
YOLOv3 | 55.2 | 61 | 236 | 65.86 |
Model | mAP (%) | FPS (GPU) | FPS (TX2)
---|---|---|---
CAP-YOLO (40%) | 92.1 | 87 | 12 |
CAP-YOLO (60%) | 91.7 | 109 | 16 |
CAP-YOLO (93%) | 86.7 | 171 | 31 |
YOLO-SENet-prune (84%) | 76.3 | 154 | 23 |
YOLO-CBAM-prune (90%) | 83.2 | 161 | 28 |
YOLO-BAM-prune (91%) | 79.8 | 166 | 29 |
Slim-YOLOv3 (36%) | 78.3 | 78 | 9 |
YOLOv3 | 93.7 | 61 | 6 |
YOLOv3-tiny | 56.4 | 173 | 34 |
Field | mAP (CAP-YOLO) | mAP (CAP-YOLO + CLAHE)
---|---|---
1 | 90.1 | 88.0 |
2 | 85.9 | 89.6 |
3 | 88.4 | 83.9 |
4 | 89.6 | 88.1 |
5 | 89.3 | 87.3 |
6 | 82.8 | 80.8 |
7 | 81.5 | 84.2 |
8 | 85.9 | 88.7 |
9 | 87.1 | 90.2 |
10 | 86.4 | 88.5 |
Field | mAP (AEPSM + CLAHE) | CL (clipLimit) | TG (tileGridSize)
---|---|---|---
1 | 91.6 | 4 | 8 |
2 | 91.4 | 2 | 8 |
3 | 92.1 | 8 | 4 |
4 | 90.7 | 4 | 4 |
5 | 91.6 | 2 | 8 |
6 | 90.1 | 2 | 4 |
7 | 92.3 | 8 | 8 |
8 | 92.9 | 8 | 4 |
9 | 91.8 | 2 | 8 |
10 | 92.6 | 4 | 8 |
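For reference, applying CLAHE with the CL/TG pairs reported above is straightforward in OpenCV; the wrapper name `enhance` below is ours, not the paper's. Note that `apply` expects an 8-bit single-channel image, so color frames are typically enhanced on a luminance channel.

```python
import cv2

def enhance(gray_img, clip_limit, tile_grid):
    """Apply CLAHE with a (CL, TG) pair from the table, e.g. enhance(img, 4, 8)."""
    clahe = cv2.createCLAHE(clipLimit=float(clip_limit),
                            tileGridSize=(tile_grid, tile_grid))
    return clahe.apply(gray_img)
```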
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Xu, Z.; Li, J.; Meng, Y.; Zhang, X. CAP-YOLO: Channel Attention Based Pruning YOLO for Coal Mine Real-Time Intelligent Monitoring. Sensors 2022, 22, 4331. https://doi.org/10.3390/s22124331