YOLOX-Ray: An Efficient Attention-Based Single-Staged Object Detector Tailored for Industrial Inspections
Figure captions:
- Figure 1: Coupled head vs. decoupled head (source: [21]).
- Figure 2: CAM and SAM attention mechanisms vs. SimAM (source: [15]).
- Figure 3: YOLOX-Ray architecture design.
- Figure 4: Image samples of each dataset: (a) aerial thermal image of a solar farm; (b) crack on a concrete infrastructure; (c) corrosion on a bridge infrastructure.
- Figure 5: Image predictions for Case Study A: (a) YOLOX-Ray-s; (b) YOLOX-Ray-m; (c) YOLOX-Ray-l; (d) YOLOX-Ray-x.
- Figure 6: Image predictions for Case Study B: (a) YOLOX-Ray-s; (b) YOLOX-Ray-m; (c) YOLOX-Ray-l; (d) YOLOX-Ray-x.
- Figure 7: Image predictions for Case Study C: (a) YOLOX-Ray-s; (b) YOLOX-Ray-m; (c) YOLOX-Ray-l; (d) YOLOX-Ray-x.
Abstract
1. Introduction
- We introduce the SimAM attention mechanism into the YOLOX backbone, enabling better feature extraction and improved feature fusion in the architecture's neck;
- The proposed architecture adopts the Alpha-IoU loss function, which improves bounding-box regression for small-object detection (a minimal sketch of this loss follows the list).
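To make the second contribution concrete, the snippet below is a minimal sketch of the basic power form of the Alpha-IoU loss proposed by He et al. [36], i.e., 1 − IoU^α, with the α = 3 setting reported in the hyperparameter table; the function name, the (x1, y1, x2, y2) box format and the numerical-stability constant are illustrative assumptions rather than the exact YOLOX-Ray training code.

```python
import torch

def alpha_iou_loss(pred: torch.Tensor, target: torch.Tensor,
                   alpha: float = 3.0, eps: float = 1e-7) -> torch.Tensor:
    """Basic Alpha-IoU term: 1 - IoU**alpha, for boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle between each predicted box and its matched ground truth.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # Union = sum of the two areas minus the intersection (eps avoids division by zero).
    area_pred = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_gt = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_pred + area_gt - inter + eps)

    # The power alpha > 1 re-weights the loss and its gradient toward high-IoU
    # (typically small or hard) objects, which is why it helps small-object regression.
    return 1.0 - iou.pow(alpha)
```

The same power transform also extends to the GIoU, DIoU and CIoU variants discussed in Section 2.4.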
2. Related Work
2.1. YOLO
2.2. YOLOX
2.3. Attention Mechanisms
2.3.1. SENet
2.3.2. CBAM
2.3.3. Coordinate-Attention
2.3.4. SimAM
2.4. Loss Functions
2.4.1. IoU Loss
2.4.2. Generalized-IoU
2.4.3. Distance-IoU
2.4.4. Complete-IoU
2.4.5. Alpha-IoU Loss
3. Proposed Method
Network Architecture
- Backbone: The backbone of the YOLOX-Ray network follows the YOLOX base architecture and uses CSPDarknet-53, first introduced in YOLOv4. It is a modified version of the popular Darknet-53 architecture with the addition of Cross Stage Partial (CSP) connections. Darknet-53 is a 53-layer DNN that has shown strong performance on a variety of object detection tasks [39]. By combining these two structures, the CSPDarknet-53 backbone in YOLOX-Ray provides a high-level feature representation for object detection.
- Attention Mechanism: The SimAM attention mechanism, which improves CNN performance by computing attention weights over feature maps, is added to the backbone without introducing any additional parameters [15]. SimAM is applied after the third ('Dark-3'), fourth ('Dark-4') and fifth ('Dark-5') stages of the original CSPDarknet-53 backbone, improving the representational quality of the extracted features and the subsequent feature fusion in the neck (a minimal sketch of the module is given after this list).
- Neck: The YOLOX-Ray neck is the same as in the YOLOX base architecture, consisting of Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) structures. It takes the feature maps extracted by the backbone and builds a pyramid of features at different scales, allowing the network to detect objects of different sizes; upsampling is performed in the FPN and downsampling in the PAN [21,40].
- Head: The YOLOX-Ray head, the YOLOX decoupled head, is also inherited from the YOLOX base architecture. It performs bounding-box regression and multi-class classification in parallel, allowing the network to predict the location and class of objects efficiently and effectively [21].
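As referenced in the attention-mechanism item above, the following is a minimal PyTorch sketch of a parameter-free SimAM block based on the energy formulation in [15]; the class name, the e_lambda default and the way it is attached after the 'Dark-3', 'Dark-4' and 'Dark-5' outputs are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: re-weights each activation by an energy-based saliency score."""

    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularizer from the SimAM energy function

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map from a backbone stage.
        _, _, h, w = x.shape
        n = h * w - 1
        # Squared deviation of each activation from its per-channel spatial mean.
        d = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)
        # Per-channel spatial variance.
        v = d.sum(dim=[2, 3], keepdim=True) / n
        # Inverse energy: more distinctive neurons receive larger attention weights.
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

# Usage sketch: wrap each backbone output before it enters the FPN/PAN neck.
# dark3, dark4, dark5 = backbone(img)   # hypothetical CSPDarknet-53 stage outputs
# attn = SimAM()
# dark3, dark4, dark5 = attn(dark3), attn(dark4), attn(dark5)
```

Because the attention weights are computed analytically from the feature statistics, the module adds no learnable parameters, which is consistent with the parameter counts reported for the YOLOX-Ray variants.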
4. Experimental Tests and Results
- CPU: AMD Ryzen 7 3700X 3.6 GHz;
- GPU: 2 x NVIDIA GeForce RTX 2060TI SUPER OC 8 GB VRAM;
- RAM: 32 GB DDR4.
4.1. Datasets Structure
4.2. Network Hyperparameters
4.3. Model Size
4.4. Performance Metrics
4.5. Experimental Results
4.6. Case Study A: Experimental Results and Predictions
4.7. Case Study B: Experimental Results and Predictions
4.8. Case Study C: Experimental Results and Predictions
4.9. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| AP | Average Precision |
| CA | Coordinate-Attention |
| CAM | Channel Attention Mechanism |
| CBAM | Convolutional Block Attention Module |
| CCA | Coordinate-Channel Attention |
| CE | Cross-Entropy |
| CIoU | Complete Intersection-over-Union |
| CNN | Convolutional Neural Network |
| COCO | Common Objects in Context |
| CPU | Central Processing Unit |
| CSA | Coordinate-Spatial Attention |
| CSP | Cross Stage Partial |
| CV | Computer Vision |
| DIoU | Distance Intersection-over-Union |
| DL | Deep Learning |
| DNN | Deep Neural Network |
| DOTA | Dataset for Object Detection in Aerial images |
| FPN | Feature Pyramid Network |
| FPS | Frames Per Second |
| GIoU | Generalized Intersection-over-Union |
| GPU | Graphics Processing Unit |
| GT | Ground-Truth |
| HSV | Hue, Saturation and Value |
| IoU | Intersection-over-Union |
| mAP | Mean Average Precision |
| ML | Machine Learning |
| OTA | Optimal Transport Assignment |
| PAN | Path Aggregation Network |
| RPN | Region Proposal Network |
| SAM | Spatial Attention Mechanism |
| SE | Squeeze-and-Excitation |
| SENet | Squeeze-and-Excitation Network |
| SimAM | Simple Parameter-free Attention Module |
| SimOTA | Simplified Optimal Transport Assignment |
| SoTA | State-of-the-Art |
| VOC | Visual Object Classes |
| YOLO | You Only Look Once |
References
- Kumar, A. Computer-Vision-Based Fabric Defect Detection: A Survey. IEEE Trans. Ind. Electron. 2008, 55, 348–363.
- Weimer, D.; Scholz-Reiter, B.; Shpitalni, M. Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection. CIRP Ann. 2016, 65, 417–420.
- Bedi, P.; Goyal, S.B.; Rajawat, A.S.; Bhaladhare, P.; Aggarwal, A.; Prasad, A. Feature Correlated Auto Encoder Method for Industrial 4.0 Process Inspection Using Computer Vision and Machine Learning. Procedia Comput. Sci. 2023, 218, 788–798.
- Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep Learning for Computer Vision: A Brief Review. Comput. Intell. Neurosci. 2018, 2018, 7068349.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. CoRR 2016, 21–37.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Du, J. Understanding of Object Detection Based on CNN Family and YOLO. J. Phys. Conf. Ser. 2018, 1004, 012029.
- Li, Y.; Zeng, J.; Shan, S.; Chen, X. Occlusion Aware Facial Expression Recognition Using CNN with Attention Mechanism. IEEE Trans. Image Process. 2019, 28, 2439–2450.
- Brauwers, G.; Frasincar, F. A General Survey on Attention Mechanisms in Deep Learning. IEEE Trans. Knowl. Data Eng. 2023, 35, 3279–3298.
- Mustafaev, B.; Tursunov, A.; Kim, S.; Kim, E. A Novel Method to Inspect 3D Ball Joint Socket Products Using 2D Convolutional Neural Network with Spatial and Channel Attention. Sensors 2022, 22, 4192.
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 11863–11874.
- Cina, M.; Binny, S.; Roji, T.; Cini, J.; Shincy, K.K. Comparison of YOLO Versions for Object Detection from Aerial Images. Int. J. Eng. Technol. Manag. Sci. 2022, 9, 315–322.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Ultralytics LLC. YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 24 January 2023).
- Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. You Only Learn One Representation: Unified Network for Multiple Tasks. arXiv 2021, arXiv:2105.04206.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
- Ren, K.; Chen, X.; Wang, Z.; Yan, X.; Zhang, D. Fruit Recognition Based on YOLOX*. Proc. Int. Conf. Artif. Life Robot. 2022, 27, 470–473.
- LearnOpenCV. YOLOX Object Detector Paper Explanation and Custom Training. 2022. Available online: https://learnopencv.com/yolox-object-detector-paper-explanation-and-custom-training/ (accessed on 24 January 2023).
- Zhang, J.; Huang, B.; Ye, Z.; Kuang, L.D.; Ning, X. Siamese anchor-free object tracking with multiscale spatial attentions. Sci. Rep. 2021, 11, 22908.
- Ge, Z.; Liu, S.; Li, Z.; Yoshie, O.; Sun, J. OTA: Optimal Transport Assignment for Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 303–312.
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. arXiv 2017, arXiv:1710.09412.
- Wei, Z.; Duan, C.; Song, X.; Tian, Y.; Wang, H. AMRNet: Chips Augmentation in Aerial Images Object Detection. arXiv 2020, arXiv:2009.07168.
- Zhang, C.; Yang, T.; Yang, J. Image Recognition of Wind Turbine Blade Defects Using Attention-Based MobileNetv1-YOLOv4 and Transfer Learning. Sensors 2022, 22, 6009.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2018; Volume 11211, pp. 3–19.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
- Zhai, H.; Cheng, J.; Wang, M. Rethink the IoU-based loss functions for bounding box regression. In Proceedings of the 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 11–13 December 2020; pp. 1522–1528.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000.
- Li, H.; Zhou, Q.; Mao, Y.; Zhang, B.; Liu, C. Alpha-SGANet: A multi-attention-scale feature pyramid network combined with lightweight network based on Alpha-IoU loss. PLoS ONE 2022, 17, e0276581.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- He, J.; Erfani, S.; Ma, X.; Bailey, J.; Chi, Y.; Hua, X.S. Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression. In Proceedings of the Thirty-Fifth Conference on Neural Information Processing Systems, Online, 6–14 December 2021; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P.S., Vaughan, J.W., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 20230–20242.
- Xiong, C.; Hu, S.; Fang, Z. Application of improved YOLOV5 in plate defect detection. Int. J. Adv. Manuf. Technol. 2022, 1–13.
- Liu, L.; Liu, Y.; Yan, J.; Liu, H.; Li, M.; Wang, J.; Zhou, K. Object Detection in Large-Scale Remote Sensing Images With a Distributed Deep Learning Framework. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8142–8154.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Li, S.; Liu, S.; Cai, Z.; Liu, Y.; Chen, G.; Tu, G. TC-YOLOv5: Rapid detection of floating debris on raspberry Pi 4B. J. Real-Time Image Process. 2023, 20, 17.
- Roboflow. Solar Panels Thermal Dataset. 2022. Available online: https://universe.roboflow.com/neelakshtayal-gmail-com/thermal-dataset-tfoku (accessed on 10 January 2023).
- Roboflow. Crack Detection Dataset. 2022. Available online: https://universe.roboflow.com/crack-7rsjb/crack-detection-ol3yi (accessed on 14 January 2023).
- Roboflow. Corrosion Detection Dataset. 2022. Available online: https://universe.roboflow.com/roboflow-100/corrosion-bi3q3 (accessed on 14 January 2023).
- Ciaglia, F.; Zuppichini, F.S.; Guerrie, P.; McQuade, M.; Solawetz, J. Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark. arXiv 2022, arXiv:2211.13523.
- Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
- Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516.
- Padilla, R.; Passos, W.L.; Dias, T.L.B.; Netto, S.L.; da Silva, E.A.B. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics 2021, 10, 279.
- Sheikholeslami, S.; Meister, M.; Wang, T.; Payberah, A.H.; Vlassov, V.; Dowling, J. AutoAblation: Automated Parallel Ablation Studies for Deep Learning. In Proceedings of the 1st Workshop on Machine Learning and Systems, Online, 26 April 2021; ACM: New York, NY, USA, 2021; pp. 55–61.
| Loss Function | Object Scale | Dataset | mAP |
|---|---|---|---|
| | Single-scale | DOTA-v1.0 | 0.7709 |
| | | DOTA-v1.5 | 0.7287 |
| | Multi-scale | DOTA-v1.0 | 0.78 |
| | | DOTA-v1.5 | 0.7502 |
| Alpha-IoU | Single-scale | DOTA-v1.0 | 0.7761 |
| | | DOTA-v1.5 | 0.7333 |
| | Multi-scale | DOTA-v1.0 | 0.7877 |
| | | DOTA-v1.5 | 0.7506 |
| | Images | Train Set | Validation Set | Test Set | Classes |
|---|---|---|---|---|---|
| Case Study A | 1200 | 840 | 240 | 120 | Fault |
| Case Study B | 2144 | 1500 | 433 | 211 | crack |
| Case Study C | 880 | 616 | 176 | 88 | slippage, corrosion, crack |
| Hyperparameter | Value |
|---|---|
| Epochs | 300 |
| Activation Function | SiLU |
| Optimizer | RAdam |
| Initial LR | |
| LR Scheduler | Cosine Annealing |
| Data Augmentations | Mosaic, MixUp, HSV, Flip H/V |
| Alpha-IoU (α) | 3 |
| | Small (s) | Medium (m) | Large (l) | Extra-Large (x) |
|---|---|---|---|---|
| Depth | 0.33 | 0.67 | 1.0 | 1.33 |
| Width | 0.50 | 0.75 | 1.0 | 1.25 |
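For illustration only, the hypothetical helper below shows how depth and width multipliers like those in the table above are typically used to derive the s/m/l/x variants from a single model definition: the width factor scales channel counts and the depth factor scales the number of CSP blocks per stage. The names and rounding rules are assumptions, not code from the paper.

```python
# Hypothetical mapping of YOLOX-Ray variants to the depth/width multipliers above.
YOLOX_RAY_VARIANTS = {
    "s": {"depth": 0.33, "width": 0.50},
    "m": {"depth": 0.67, "width": 0.75},
    "l": {"depth": 1.00, "width": 1.00},
    "x": {"depth": 1.33, "width": 1.25},
}

def scaled_channels(base_channels: int, variant: str) -> int:
    """Width multiplier: scales the number of channels in every convolution."""
    return int(round(base_channels * YOLOX_RAY_VARIANTS[variant]["width"]))

def scaled_blocks(base_blocks: int, variant: str) -> int:
    """Depth multiplier: scales the number of CSP bottleneck blocks per stage (at least one)."""
    return max(1, round(base_blocks * YOLOX_RAY_VARIANTS[variant]["depth"]))

# Example: a stage with 256 channels and 9 blocks in the 'l' model becomes
# 128 channels and 3 blocks in the 's' model.
print(scaled_channels(256, "s"), scaled_blocks(9, "s"))  # -> 128 3
```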
Case Study A results:

| Model | P | R | mAP@0.5 | mAP@0.5:0.95 | Inf. (ms) | Params (M) |
|---|---|---|---|---|---|---|
| YOLOX-Ray-s | 0.73 | 0.917 | 0.877 | 0.422 | 11.95 | 8.94 |
| YOLOX-Ray-m | 0.829 | 0.915 | 0.872 | 0.426 | 19.55 | 25.28 |
| YOLOX-Ray-l | 0.806 | 0.916 | 0.89 | 0.427 | 29.22 | 54.15 |
| YOLOX-Ray-x | 0.733 | 0.879 | 0.845 | 0.376 | 46.56 | 99.0 |
Case Study B results:

| Model | P | R | mAP@0.5 | mAP@0.5:0.95 | Inf. (ms) | Params (M) |
|---|---|---|---|---|---|---|
| YOLOX-Ray-s | 0.984 | 0.987 | 0.996 | 0.66 | 9.62 | 8.94 |
| YOLOX-Ray-m | 0.972 | 0.975 | 0.994 | 0.661 | 17.09 | 25.28 |
| YOLOX-Ray-l | 0.962 | 0.979 | 0.994 | 0.658 | 25.96 | 54.15 |
| YOLOX-Ray-x | 0.972 | 0.971 | 0.977 | 0.625 | 42.53 | 99.0 |
Case Study C results:

| Model | P | R | mAP@0.5 | mAP@0.5:0.95 | Inf. (ms) | Params (M) |
|---|---|---|---|---|---|---|
| YOLOX-Ray-s | 0.762 | 0.866 | 0.859 | 0.484 | 18.04 | 8.94 |
| YOLOX-Ray-m | 0.829 | 0.878 | 0.871 | 0.499 | 26.60 | 25.28 |
| YOLOX-Ray-l | 0.792 | 0.883 | 0.873 | 0.505 | 37.83 | 54.15 |
| YOLOX-Ray-x | 0.832 | 0.876 | 0.877 | 0.518 | 58.12 | 99.0 |
Ablation study, Case Study A:

| Configuration | P | R | mAP@0.5 | mAP@0.5:0.95 | Inf. (ms) | FPS |
|---|---|---|---|---|---|---|
| YOLOX | 0.77 | 0.91 | 0.857 | 0.40 | 11.78 | 84.89 |
| Attention Mechanisms | | | | | | |
| YOLOX + SENet | 0.397 | 0.891 | 0.827 | 0.332 | 12.14 | 82.37 |
| YOLOX + CBAM | 0.431 | 0.872 | 0.797 | 0.315 | 12.46 | 80.26 |
| YOLOX + CA | 0.468 | 0.888 | 0.828 | 0.324 | 12.05 | 82.99 |
| YOLOX + SimAM | 0.359 | 0.916 | 0.861 | 0.371 | 12.32 | 81.17 |
| Loss Functions | | | | | | |
| YOLOX + CIoU | 0.601 | 0.913 | 0.871 | 0.378 | 11.89 | 84.10 |
| YOLOX + DIoU | 0.551 | 0.9 | 0.84 | 0.34 | 12.44 | 80.38 |
| YOLOX + GIoU | 0.466 | 0.885 | 0.823 | 0.315 | 11.97 | 83.54 |
| YOLOX + Alpha-IoU | 0.464 | 0.915 | 0.866 | 0.378 | 12.18 | 82.10 |
| Proposed Method | | | | | | |
| YOLOX-Ray | 0.73 | 0.917 | 0.877 | 0.422 | 11.95 | 83.68 |
Ablation study, Case Study B:

| Configuration | P | R | mAP@0.5 | mAP@0.5:0.95 | Inf. (ms) | FPS |
|---|---|---|---|---|---|---|
| YOLOX | 0.67 | 0.931 | 0.897 | 0.33 | 9.59 | 104.28 |
| Attention Mechanisms | | | | | | |
| YOLOX + SENet | 0.699 | 0.76 | 0.821 | 0.357 | 10.45 | 95.69 |
| YOLOX + CBAM | 0.719 | 0.88 | 0.845 | 0.361 | 9.98 | 100.2 |
| YOLOX + CA | 0.722 | 0.961 | 0.963 | 0.555 | 9.7 | 103.09 |
| YOLOX + SimAM | 0.97 | 0.986 | 0.99 | 0.634 | 9.71 | 102.99 |
| Loss Functions | | | | | | |
| YOLOX + CIoU | 0.933 | 0.98 | 0.989 | 0.61 | 9.72 | 102.88 |
| YOLOX + DIoU | 0.913 | 0.966 | 0.951 | 0.581 | 9.81 | 101.94 |
| YOLOX + GIoU | 0.912 | 0.964 | 0.911 | 0.563 | 9.86 | 101.41 |
| YOLOX + Alpha-IoU | 0.957 | 0.972 | 0.976 | 0.569 | 9.91 | 100.91 |
| Proposed Method | | | | | | |
| YOLOX-Ray | 0.984 | 0.987 | 0.996 | 0.66 | 9.62 | 103.95 |
Ablation study, Case Study C:

| Configuration | P | R | mAP@0.5 | mAP@0.5:0.95 | Inf. (ms) | FPS |
|---|---|---|---|---|---|---|
| YOLOX | 0.29 | 0.821 | 0.768 | 0.389 | 17.47 | 57.24 |
| Attention Mechanisms | | | | | | |
| YOLOX + SENet | 0.585 | 0.87 | 0.856 | 0.445 | 17.90 | 55.87 |
| YOLOX + CBAM | 0.577 | 0.861 | 0.849 | 0.422 | 18.50 | 54.05 |
| YOLOX + CA | 0.521 | 0.866 | 0.858 | 0.439 | 18.58 | 53.81 |
| YOLOX + SimAM | 0.277 | 0.871 | 0.84 | 0.45 | 17.47 | 57.24 |
| Loss Functions | | | | | | |
| YOLOX + CIoU | 0.657 | 0.857 | 0.846 | 0.435 | 18.33 | 54.56 |
| YOLOX + DIoU | 0.649 | 0.857 | 0.846 | 0.437 | 18.16 | 55.07 |
| YOLOX + GIoU | 0.604 | 0.846 | 0.825 | 0.427 | 17.64 | 56.69 |
| YOLOX + Alpha-IoU | 0.304 | 0.841 | 0.806 | 0.402 | 18.20 | 54.95 |
| Proposed Method | | | | | | |
| YOLOX-Ray | 0.762 | 0.866 | 0.859 | 0.484 | 18.04 | 55.43 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cite as: Raimundo, A.; Pavia, J.P.; Sebastião, P.; Postolache, O. YOLOX-Ray: An Efficient Attention-Based Single-Staged Object Detector Tailored for Industrial Inspections. Sensors 2023, 23, 4681. https://doi.org/10.3390/s23104681