A New Approach for Detecting Fundus Lesions Using Image Processing and Deep Neural Network Architecture Based on YOLO Model
<p>Block diagram of the proposed approach. First, the images are passed to the Pre-processing block for noise filtering, contrast improvement, partial elimination of the black background of the images, and creation of <span class="html-italic">tiles</span>. The pre-processed images are then transferred to the Data Augmentation block, where sub-images are artificially created and fed to the neural network input layer to train the proposed approach. Training is preceded by a pre-training stage in which the network is initialized with weights fitted to the Common Objects in Context (COCO) dataset.</p>
Figure 2
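As a rough illustration of the tile-creation step described in the caption above, the sketch below splits an image into fixed-size square regions. The tile size and the clipping behavior at image borders are assumptions for illustration; the paper's exact tiling parameters are not given here.

```python
def make_tiles(width, height, tile=640):
    """Split an image of size width x height into tile x tile regions,
    returned as (x0, y0, x1, y1) boxes; edge tiles are clipped to the
    image boundary (assumed behavior)."""
    tiles = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            tiles.append((x, y, min(x + tile, width), min(y + tile, height)))
    return tiles

# A hypothetical 1280x960 fundus image yields four tiles,
# two of which are clipped vertically.
boxes = make_tiles(1280, 960)
```

The default tile size of 640 is chosen here only because it matches the network input resolution mentioned later in the figure captions.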
<p>Representation of a fundus image with the lesions annotated: Microaneurysms, Hemorrhages, Soft Exudates, and Hard Exudates.</p>
Figure 3
<p>Block diagram of the neural network architecture that composes the proposed approach for detecting fundus lesions. The structure is divided into three main blocks: <span class="html-italic">Backbone</span>, <span class="html-italic">Neck</span>, and <span class="html-italic">Head</span>. The <span class="html-italic">Backbone</span> block consists of a Focus module, four Conv modules, four CSP modules (C3), and an SPP module. The <span class="html-italic">Neck</span> block consists of four Conv modules and four CSP modules (C3). The network input receives images of size <math display="inline"><semantics> <mrow> <mn>640</mn> <mo>×</mo> <mn>640</mn> <mo>×</mo> <mn>3</mn> </mrow> </semantics></math>, and the output is composed of three detection heads: the P3 layer, responsible for detecting small objects; the P4 layer, responsible for detecting medium objects; and the P5 layer, responsible for detecting large objects. CSP (C3), Cross Stage Partial Network C3; SPP, Spatial Pyramid Pooling; Conv, Convolution module; Concat, concatenation; Conv2d, 2D convolution layer.</p>
Figure 4
<p>FPN+PAN structure used in the <span class="html-italic">Neck</span> of the neural network architecture of the proposed approach. FPN has a <span class="html-italic">Top-Down</span> structure with lateral connections that enable it to build feature maps with high-level semantic meaning, which are used to detect objects at different scales. The PAN architecture conveys strong localization features from the lower feature maps to the upper feature maps (<span class="html-italic">Bottom-up</span>). Combined, the two structures reinforce the feature-fusion capability of the <span class="html-italic">Neck</span>. The detection of lesions is performed in layers P3, P4 and P5 of the FPN+PAN structure, with outputs of sizes <math display="inline"><semantics> <mrow> <mn>80</mn> <mo>×</mo> <mn>80</mn> <mo>×</mo> <mn>255</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mn>40</mn> <mo>×</mo> <mn>40</mn> <mo>×</mo> <mn>255</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mn>20</mn> <mo>×</mo> <mn>20</mn> <mo>×</mo> <mn>255</mn> </mrow> </semantics></math>, respectively. FPN, Feature Pyramid Network; PAN, Path Aggregation Network.</p>
Figure 5
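The head output sizes quoted above follow directly from the input resolution and the per-cell prediction layout: in YOLO-style detectors, each grid cell predicts 3 anchor boxes, each with 4 box coordinates, 1 objectness score, and one score per class, so the 255 channels correspond to 3 × (80 COCO classes + 5). A minimal sketch (the strides 8/16/32 are the standard YOLO values, assumed here to match the architecture in the figure):

```python
def head_shapes(img_size=640, num_classes=80, num_anchors=3):
    """Output tensor shapes of the P3/P4/P5 detection heads.
    Each cell predicts num_anchors boxes, each carrying
    (x, y, w, h, objectness) + num_classes class scores."""
    channels = num_anchors * (num_classes + 5)
    return {f"P{i}": (img_size // s, img_size // s, channels)
            for i, s in zip((3, 4, 5), (8, 16, 32))}

print(head_shapes())               # P3: (80, 80, 255), P4: (40, 40, 255), P5: (20, 20, 255)
print(head_shapes(num_classes=4))  # with the four lesion classes: 3*(4+5) = 27 channels
```

Note that with the four lesion classes studied here, the same formula would give 27-channel heads; the 255 in the caption reflects the COCO-pre-trained configuration.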
<p>Precision × Recall curve at an Intersection over Union threshold of <math display="inline"><semantics> <mrow> <mn>0.5</mn> </mrow> </semantics></math> obtained during the validation step of the proposed approach with the Stochastic Gradient Descent (SGD) optimizer and <span class="html-italic">Tiling</span> on the Dataset for Diabetic Retinopathy (DDR). EX, hard exudates; HE, hemorrhages; SE, soft exudates; MA, microaneurysms; mAP, mean Average Precision.</p>
Figure 6
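The mAP values reported alongside curves like the one above are computed as the mean of the per-class Average Precision (AP), which is the area under each precision–recall curve. A minimal sketch using the all-points interpolation (the precision envelope); whether the paper uses this or the 11-point variant is an assumption:

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve, all-points interpolation.
    Inputs are parallel lists sorted by increasing recall."""
    # Add sentinel endpoints, then take the running maximum of precision
    # from right to left (the precision envelope).
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Integrate precision over recall.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

def mean_average_precision(per_class_ap):
    """mAP is simply the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, a detector with precision 1.0 up to recall 0.5 that then degrades to precision 0.5 at recall 1.0 scores AP = 0.75.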
<p>Confusion matrix obtained by the proposed approach with the Stochastic Gradient Descent (SGD) optimizer and <span class="html-italic">Tiling</span> during the validation step on the Dataset for Diabetic Retinopathy (DDR). EX, hard exudates; HE, hemorrhages; SE, soft exudates; MA, microaneurysms; FN, False Negative; FP, False Positive.</p>
Figure 7
<p>Precision × Recall curve at an Intersection over Union threshold of 0.5 obtained during the validation step of the proposed approach with the Adam optimizer and <span class="html-italic">Tiling</span> on the Dataset for Diabetic Retinopathy (DDR). EX, hard exudates; HE, hemorrhages; SE, soft exudates; MA, microaneurysms; mAP, mean Average Precision.</p>
Figure 8
<p>Confusion matrix obtained by the proposed approach with the Adam optimizer and <span class="html-italic">Tiling</span> during the validation step on the Dataset for Diabetic Retinopathy (DDR). EX, hard exudates; HE, hemorrhages; SE, soft exudates; MA, microaneurysms; FN, False Negative; FP, False Positive.</p>
Figure 9
<p>Batch example with fundus images of the Dataset for Diabetic Retinopathy (DDR) along with annotations (<span class="html-italic">Ground Truth</span>) of the fundus lesions after the pre-processing and data augmentation steps that were used to validate the proposed approach. MA, microaneurysms; HE, hemorrhages; SE, soft exudates.</p>
Figure 10
<p>Batch with fundus images from the Dataset for Diabetic Retinopathy (DDR) with fundus lesions detected by the proposed approach during the validation step. MA, microaneurysms; HE, hemorrhages; SE, soft exudates; EX, hard exudates.</p>
Figure 11
<p>Example of a fundus image from the dataset accompanied by the segmentation masks of the lesions present in the image. In (<b>a</b>), the fundus image “007-3711-200.jpg” of the test set from the Dataset for Diabetic Retinopathy (DDR), along with annotations (<span class="html-italic">Ground Truth</span>) of the fundus lesions; (<b>b</b>) segmentation mask of hard exudates; (<b>c</b>) hemorrhage segmentation mask; and (<b>d</b>) microaneurysm segmentation masks.</p>
Figure 12
<p>Detection of fundus lesions performed by the proposed approach and the percentage of confidence obtained for each object located in the fundus image “007-3711-200.jpg” of the test set of the Dataset for Diabetic Retinopathy (DDR). EX, hard exudates; HE, hemorrhages; MA, microaneurysms.</p>
Figure 13
<p>Detection of fundus lesions in the “007-3892-200.jpg” image of the test set of the Dataset for Diabetic Retinopathy (DDR). Different morphological aspects of the identified lesions can be observed, as in the case of the hard exudates in the image’s central region and distributed in other regions of the retina. Like the hard exudates, the detected hemorrhages also assume different shapes and sizes and can manifest in different regions of the retina. EX, hard exudates; HE, hemorrhages.</p>
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Dataset
3.2. Pre-Processing and Image Preparation
3.3. Data Augmentation
3.4. Deep Neural Network Architecture
3.5. Pre-Training
- The initial layers of the architecture of the proposed approach, focused on detecting the most fundamental characteristics of objects, were pre-trained with the weights of the COCO dataset, composed of 80 categories.
- The last three layers (out of a total of 283) that make up the Head of the architecture of the proposed approach are cut and replaced by new layers.
- The new layers added are adjusted by training the neural network on the DR dataset, while the weights of the initial layers are frozen.
- After fine-tuning the Head layers of the architecture, the entire neural network is unfrozen and retrained so that minor adjustments to the weights are performed across the entire network.
- For each adjustment performed, a hyperparameter value is varied, and the proposed approach is retrained, keeping the other hyperparameter values constant.
- The effect of this change is analyzed by evaluating the performance of the proposed approach with the Average Precision (AP) and mean Average Precision (mAP) metrics, which will be presented and discussed in the next section of this article.
- If there is an improvement in the metric values, the hyperparameter value is further adjusted (increased or decreased) until the local maximum is reached.
- The same process is carried out for the other hyperparameters until an optimal set of values is obtained that produces the maximum AP and mAP for the detection of the investigated fundus lesions.
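The one-hyperparameter-at-a-time search described in the steps above amounts to a coordinate-ascent loop: perturb a single value, retrain and re-evaluate, keep the change only if mAP improves, and stop at a local maximum. A minimal sketch, where `evaluate` stands in for a full retrain-plus-validation run and the hyperparameter names, step sizes, and toy objective are all hypothetical:

```python
def tune(hyperparams, evaluate, steps):
    """Coordinate-ascent hyperparameter search: vary one value at a time,
    keeping the others constant, and accept a change only if the evaluated
    mAP improves; stop each coordinate at its local maximum."""
    best = dict(hyperparams)
    best_map = evaluate(best)
    for name, delta in steps.items():
        improved = True
        while improved:
            improved = False
            for d in (+delta, -delta):          # try increasing, then decreasing
                trial = dict(best)
                trial[name] += d
                score = evaluate(trial)         # stands in for retrain + validate
                if score > best_map:
                    best, best_map, improved = trial, score, True
                    break
    return best, best_map

# Toy objective with a maximum at lr=0.01, momentum=0.9 (hypothetical values).
f = lambda h: 1 - (h["lr"] - 0.01) ** 2 - (h["momentum"] - 0.9) ** 2
params, score = tune({"lr": 0.05, "momentum": 0.8}, f,
                     {"lr": 0.01, "momentum": 0.05})
```

Because each coordinate is tuned greedily, the result is a local optimum; in practice the order in which hyperparameters are visited can matter.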
3.6. Performance Metrics
4. Experiments and Results
5. Discussion
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Delgado-Bonal, A.; Martín-Torres, J. Human vision is determined based on information theory. Sci. Rep. 2016, 6, 36038. [Google Scholar] [CrossRef] [PubMed]
- Riordan-Eva, P.; Augsburger, J.J. General Ophthalmology, 19th ed.; Mc Graw Hill Education: New York, NY, USA, 2018. [Google Scholar]
- IORJ. O que é Retina. 2021. Available online: https://iorj.med.br/o-que-e-retina/ (accessed on 15 June 2021).
- Mookiah, M.R.K.; Acharya, U.R.; Chua, C.K.; Lim, C.M.; Ng, E.Y.; Laude, A. Computer-aided diagnosis of diabetic retinopathy: A review. Comput. Biol. Med. 2013, 43, 2136–2155. [Google Scholar] [CrossRef] [PubMed]
- Yen, G.G.; Leong, W.F. A sorting system for hierarchical grading of diabetic fundus images: A preliminary study. IEEE Trans. Inf. Technol. Biomed. 2008, 12, 118–130. [Google Scholar] [CrossRef] [PubMed]
- Alghadyan, A.A. Diabetic retinopathy—An update. Saudi J. Ophthalmol. 2011, 25, 99–111. [Google Scholar] [CrossRef] [PubMed]
- ETDRSR. Grading Diabetic Retinopathy from Stereoscopic Color Fundus Photographs—An Extension of the Modified Airlie House Classification. Ophthalmology 1991, 98, 786–806. [Google Scholar] [CrossRef]
- Philip, S.; Fleming, A.D.; Goatman, K.A.; Fonseca, S.; Mcnamee, P.; Scotland, G.S.; Prescott, G.J.; Sharp, P.F.; Olson, J.A. The efficacy of automated “disease/no disease” grading for diabetic retinopathy in a systematic screening programme. Br. J. Ophthalmol. 2007, 91, 1512–1517. [Google Scholar] [CrossRef]
- ETDRSR. Classification of Diabetic Retinopathy from Fluorescein Angiograms. Ophthalmology 1991, 98, 807–822. [Google Scholar] [CrossRef]
- Hendrick, A.M.; Gibson, M.V.; Kulshreshtha, A. Diabetic Retinopathy. Prim. Care-Clin. Off. Pract. 2015, 42, 451–464. [Google Scholar] [CrossRef]
- Williams, R.; Airey, M.; Baxter, H.; Forrester, J.; Kennedy-Martin, T.; Girach, A. Epidemiology of diabetic retinopathy and macular oedema: A systematic review. Eye 2004, 18, 963–983. [Google Scholar] [CrossRef]
- International Council of Ophthalmology. Updated 2017 ICO Guidelines for Diabetic Eye Care. In ICO Guidelines for Diabetic Eye Care; International Council of Ophthalmology: Brussels, Belgium, 2017; pp. 1–33. [Google Scholar]
- Cardoso, C.d.F.d.S. Segmentação Automática do Disco óptico e de vasos Sanguíneos em Imagens de Fundo de Olho. Ph.D. Thesis, Universidade Federal de Uberlândia, Uberlândia, Brazil, 2019. [Google Scholar]
- Lecaire, T.J.; Palta, M.; Klein, R.; Klein, B.E.; Cruickshanks, K.J. Assessing progress in retinopathy outcomes in type 1 diabetes. Diabetes Care 2013, 36, 631–637. [Google Scholar] [CrossRef] [Green Version]
- Chakrabarti, R.; Harper, C.A.; Keeffe, J.E. Diabetic retinopathy management guidelines. Expert Rev. Ophthalmol. 2012, 7, 417–439. [Google Scholar] [CrossRef]
- Vocaturo, E.; Zumpano, E. The contribution of AI in the detection of the Diabetic Retinopathy. In Proceedings of the—2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020, Seoul, Korea, 16–19 December 2020; pp. 1516–1519. [Google Scholar] [CrossRef]
- Li, T.; Gao, Y.; Wang, K.; Guo, S.; Liu, H.; Kang, H. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Inf. Sci. 2019, 501, 511–522. [Google Scholar] [CrossRef]
- Porwal, P.; Pachade, S.; Kokare, M.; Deshmukh, G.; Son, J.; Bae, W.; Liu, L.; Wang, J.; Liu, X.; Gao, L.; et al. IDRiD: Diabetic Retinopathy—Segmentation and Grading Challenge. Med. Image Anal. 2020, 59, 101561. [Google Scholar] [CrossRef]
- Mateen, M.; Wen, J.; Nasrullah, N.; Sun, S.; Hayat, S. Exudate Detection for Diabetic Retinopathy Using Pretrained Convolutional Neural Networks. Complexity 2020, 2020, 5801870. [Google Scholar] [CrossRef]
- Alyoubi, W.L.; Abulkhair, M.F.; Shalash, W.M. Diabetic Retinopathy Fundus Image Classification and Lesions Localization System Using Deep Learning. Sensors 2021, 21, 3704. [Google Scholar] [CrossRef]
- Dai, L.; Wu, L.; Li, H.; Cai, C.; Wu, Q.; Kong, H.; Liu, R.; Wang, X.; Hou, X.; Liu, Y.; et al. A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nat. Commun. 2021, 12, 3242. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; Volume 2016, pp. 770–778. [Google Scholar] [CrossRef]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the—30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef] [Green Version]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar] [CrossRef]
- Xie, S.; Tu, Z. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1395–1403. [Google Scholar] [CrossRef]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. Lect. Notes Comput. Sci. 2018, 11211 LNCS, 833–851. [Google Scholar] [CrossRef]
- Konishi, Y.; Hanzawa, Y.; Kawade, M.; Hashimoto, M. SSD: Single Shot MultiBox Detector. Eccv 2016, 1, 398–413. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
- Melo, R.; Lima, G.; Corrêa, G.; Zatt, B.; Aguiar, M.; Nachtigall, G.; Araújo, R. Diagnosis of Apple Fruit Diseases in the Wild with Mask R-CNN. In Intelligent Systems; Cerri, R., Prati, R.C., Eds.; BRACIS 2020. Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12319, pp. 256–270. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. Lect. Notes Comput. Sci. 2015, 9351, 234–241. [Google Scholar] [CrossRef]
- Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar] [CrossRef]
- Yu, F.; Wang, D.; Shelhamer, E.; Darrell, T. Deep Layer Aggregation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2403–2412. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar] [CrossRef] [Green Version]
- Tsiknakis, N.; Theodoropoulos, D.; Manikis, G.; Ktistakis, E.; Boutsora, O.; Berto, A.; Scarpa, F.; Scarpa, A.; Fotiadis, D.I.; Marias, K. Deep learning for diabetic retinopathy detection and classification based on fundus images: A review. Comput. Biol. Med. 2021, 135, 104599. [Google Scholar] [CrossRef] [PubMed]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397. [Google Scholar] [CrossRef] [PubMed]
- Ramcharan, A.; McCloskey, P.; Baranowski, K.; Mbilinyi, N.; Mrisho, L.; Ndalahwa, M.; Legg, J.; Hughes, D. Assessing a mobile-based deep learning model for plant disease surveillance. arXiv 2018, arXiv:1805.08692. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Ojha, A.; Sahu, S.P.; Dewangan, D.K. Vehicle Detection through Instance Segmentation using Mask R-CNN for Intelligent Vehicle System. In Proceedings of the 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 6–8 May 2021; pp. 954–959. [Google Scholar] [CrossRef]
- Iacovacci, J.; Wu, Z.; Bianconi, G. Mesoscopic structures reveal the network between the layers of multiplex data sets. Phys. Rev.-Stat. Nonlinear Soft Matter Phys. 2015, 92, 42806. [Google Scholar] [CrossRef]
- Bertels, J.; Eelbode, T.; Berman, M.; Vandermeulen, D.; Maes, F.; Bisschops, R.; Blaschko, M.B. Optimizing the Dice Score and Jaccard Index for Medical Image Segmentation: Theory and Practice. Lect. Notes Comput. Sci. 2019, 11765 LNCS, 92–100. [Google Scholar] [CrossRef]
- Kaggle. Diabetic Retinopathy Detection. 2015. Available online: https://www.kaggle.com/c/diabetic-retinopathy-detection (accessed on 11 June 2021).
- Zhu, L.; Geng, X.; Li, Z.; Liu, C. Improving YOLOv5 with Attention Mechanism for Detecting Boulders from Planetary Images. Remote Sens. 2021, 13, 3776. [Google Scholar] [CrossRef]
- Xu, R.; Lin, H.; Lu, K.; Cao, L.; Liu, Y. A forest fire detection system based on ensemble learning. Forests 2021, 12, 217. [Google Scholar] [CrossRef]
- Qi, D.; Tan, W.; Yao, Q.; Liu, J. YOLO5Face: Why Reinventing a Face Detector. arXiv 2021, arXiv:2105.12931. [Google Scholar]
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
- Rahman, R.; Azad, Z.B.; Hasan, M.B. Densely-Populated Traffic Detection using YOLOv5 and Non-Maximum Suppression Ensembling. In Proceedings of the International Conference on Big Data, IoT, and Machine Learning, Cox’s Bazar, Bangladesh, 23–25 September 2021. [Google Scholar]
- Zheng, Z.; Zhao, J.; Li, Y. Research on Detecting Bearing-Cover Defects Based on Improved YOLOv3. IEEE Access 2021, 9, 10304–10315. [Google Scholar] [CrossRef]
- Xie, J.; Zheng, S. ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language KnowledgeDistillation. arXiv 2021, arXiv:2109.12066. [Google Scholar]
- Solawetz, J. YOLOv5: The Latest Model for Object Detection. YOLOv5 New Version—Improvements and Evaluation. 2020. Available online: https://blog.roboflow.com/yolov5-improvements-and-evaluation/ (accessed on 31 May 2021).
- Couturier, R.; Noura, H.N.; Salman, O.; Sider, A. A Deep Learning Object Detection Method for an Efficient Clusters Initialization. arXiv 2021, arXiv:2104.13634. [Google Scholar]
- Li, J.; Guo, S.; Kong, L.; Tan, S.; Yuan, Y. An improved YOLOv3-tiny method for fire detection in the construction industry. E3S Web Conf. 2021, 253, 03069. [Google Scholar] [CrossRef]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Walter, T.; Klein, J.C.; Massin, P.; Erginay, A. A contribution of image processing to the diagnosis of diabetic retinopathy—Detection of exudates in color fundus images of the human retina. IEEE Trans. Med. Imaging 2002, 21, 1236–1243. [Google Scholar] [CrossRef]
- Jasim, M.K.; Najm, R.; Kanan, E.H.; Alfaar, H.E.; Otair, M. Image Noise Removal Techniques: A Comparative Analysis. 2019. Available online: http://www.warse.org/IJSAIT/static/pdf/file/ijsait01862019.pdf (accessed on 22 August 2022).
- Gonzalez, R.; Woods, R. Processamento Digital de Imagens, 3rd ed.; Pearson Prentice Hall: São Paulo, Brazil, 2010. [Google Scholar]
- Santos, C.; De Aguiar, M.S.; Welfer, D.; Belloni, B. Deep Neural Network Model based on One-Stage Detector for Identifying Fundus Lesions. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8. [Google Scholar] [CrossRef]
- Rai, R.; Gour, P.; Singh, B. Underwater Image Segmentation using CLAHE Enhancement and Thresholding. Int. J. Emerg. Technol. Adv. Eng. 2012, 2, 118–123. [Google Scholar]
- Horry, M.J.; Chakraborty, S.; Paul, M.; Ulhaq, A.; Pradhan, B.; Saha, M.; Shukla, N. COVID-19 Detection Through Transfer Learning Using Multimodal Imaging Data. IEEE Access 2020, 8, 149808–149824. [Google Scholar] [CrossRef]
- El abbadi, N.; Hammod, E. Automatic Early Diagnosis of Diabetic Retinopathy Using Retina Fundus Images Enas Hamood Al-Saadi-Automatic Early Diagnosis of Diabetic Retinopathy Using Retina Fundus Images. Eur. Acad. Res. 2014, 2, 1–22. [Google Scholar]
- Nguyen, T.S.; Stueker, S.; Niehues, J.; Waibel, A. Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019. [Google Scholar] [CrossRef]
- Lam, T.K.; Ohta, M.; Schamoni, S.; Riezler, S. On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR. arXiv 2021, arXiv:2104.01393. [Google Scholar] [CrossRef]
- Liu, C.; Jin, S.; Wang, D.; Luo, Z.; Yu, J.; Zhou, B.; Yang, C. Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets with Class Overlapping. IEEE Access 2020, 1–13. [Google Scholar] [CrossRef]
- Japkowicz, N. Learning from imbalanced data sets: A comparison of various strategies. In Proceedings of the AAAI Workshop on Learning from Imbalanced Data Sets, Austin, TX, USA, 31 July 2000; Volume 68, pp. 10–15. [Google Scholar]
- Provost, F. Machine learning from imbalanced data sets 101. In Proceedings of the AAAI’2000 Workshop on Imbalanced Data Sets, Austin, TX, USA, 31 July 2000; p. 3. [Google Scholar]
- Zhou, Z.H.; Liu, X.Y. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 2006, 18, 63–77. [Google Scholar] [CrossRef]
- Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259. [Google Scholar] [CrossRef]
- Zhang, X.; Gweon, H.; Provost, S. Threshold Moving Approaches for Addressing the Class Imbalance Problem and their Application to Multi-label Classification. Pervasivehealth Pervasive Comput. Technol. Healthc. 2020, 169255, 72–77. [Google Scholar] [CrossRef]
- He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar] [CrossRef]
- Fernández, A.; García, S.; Galar, M.; Prati, R.C. Learning from Imbalanced Data Sets; Springer: Berlin, Germany, 2019; pp. 1–377. [Google Scholar]
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
- Iyer, R.; Shashikant Ringe, P.; Varadharajan Iyer, R.; Prabhulal Bhensdadiya, K. Comparison of YOLOv3, YOLOv5s and MobileNet-SSD V2 for Real-Time Mask Detection Comparison of YOLOv3, YOLOv5s and MobileNet-SSD V2 for Real-Time Mask Detection View project Comparison of YOLOv3, YOLOv5s and MobileNet-SSD V2 for Real-Time Mask Detection. Artic. Int. J. Res. Eng. Technol. 2021, 8, 1156–1160. [Google Scholar]
- Yu, Y.; Zhao, J.; Gong, Q.; Huang, C.; Zheng, G.; Ma, J. Real-Time Underwater Maritime Object Detection in Side-Scan Sonar Images Based on Transformer-YOLOv5. Remote Sens. 2021, 13, 3355. [Google Scholar] [CrossRef]
- Wang, C.Y.; Mark Liao, H.Y.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 1571–1580. [Google Scholar] [CrossRef]
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Salt Lake City, UT, USA, 19–21 June 2018; pp. 5987–5995. [Google Scholar] [CrossRef]
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015. [Google Scholar]
- Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. Neural Netw. 2017, 107, 3–11. [Google Scholar] [CrossRef]
- Agarap, A.F. Deep Learning using Rectified Linear Units (ReLU). arXiv 2019, arXiv:1803.08375. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. Lect. Notes Comput. Sci. 2014, 8691, 346–361. [Google Scholar] [CrossRef]
- Zeiler, M.D.; Taylor, G.W.; Fergus, R. Adaptive deconvolutional networks for mid and high level feature learning. In Proceedings of the 2011 International Conference on Computer Vision, Washington, DC, USA, 6–13 November 2011; pp. 2018–2025. [Google Scholar] [CrossRef]
- Li, X.; Lai, T.; Wang, S.; Chen, Q.; Yang, C.; Chen, R. Feature Pyramid Networks for Object Detection. In Proceedings of the 2019 IEEE International Conference on Parallel and Distributed Processing with Applications, Big Data and Cloud Computing, Sustainable Computing and Communications, Social Computing and Networking, ISPA/BDCloud/SustainCom/SocialCom 2019, Xiamen, China, 16–18 December 2019; pp. 1500–1504. [Google Scholar] [CrossRef]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar] [CrossRef]
- Jadon, S. A survey of loss functions for semantic segmentation. In Proceedings of the 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2020, Virtual, 27–29 October 2020. [Google Scholar] [CrossRef]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar] [CrossRef] [Green Version]
- Lin, K.; Zhao, H.; Lv, J.; Zhan, J.; Liu, X.; Chen, R.; Li, C.; Huang, Z. Face Detection and Segmentation with Generalized Intersection over Union Based on Mask R-CNN. In Advances in Brain Inspired Cognitive Systems, Proceedings of the 10th International Conference, BICS 2019, Guangzhou, China, 13–14 July 2019; Springer: Berlin/Heidelberg, Germeny, 2019; pp. 106–116. [Google Scholar] [CrossRef]
- Oksuz, K.; Cam, B.C.; Kahraman, F.; Baltaci, Z.S.; Kalkan, S.; Akbas, E. Mask-aware IoU for Anchor Assignment in Real-time Instance Segmentation. arXiv 2021, arXiv:2110.09734. [Google Scholar]
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
- Blitzer, J.; Dredze, M.; Pereira, F. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, 23–30 June 2007; Association for Computational Linguistics: Prague, Czech Republic, 2007; pp. 440–447. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. Lect. Notes Comput. Sci. 2014, 8693 LNCS, 740–755. [Google Scholar] [CrossRef]
- Franke, M.; Gopinath, V.; Reddy, C.; Ristić-Durrant, D.; Michels, K. Bounding Box Dataset Augmentation for Long-Range Object Distance Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Montreal, QC, Canada, 11–17 October 2021; pp. 1669–1677. [Google Scholar]
- Mamdouh, N.; Khattab, A. YOLO-Based Deep Learning Framework for Olive Fruit Fly Detection and Counting. IEEE Access 2021, 9, 84252–84262. [Google Scholar] [CrossRef]
- Dewi, C.; Chen, R.C.; Liu, Y.T.; Jiang, X.; Hartomo, K.D. Yolo V4 for Advanced Traffic Sign Recognition with Synthetic Training Data Generated by Various GAN. IEEE Access 2021, 9, 97228–97242. [Google Scholar] [CrossRef]
- Freitas, G.A.d.L. Aprendizagem Profunda Aplicada ao Futebol de Robôs: Uso de Redes Neurais Convolucionais para Detecção de Objetos Universidade Estadual de Londrina Centro de Tecnologia e Urbanismo Departamento de Engenharia Elétrica Aprendizagem Profunda Aplicada ao Fute; Trabalho de conclusão (curso de engenharia elétrica); Universidade Estadual de Londrina: Londrina, Brazil, 2019. [Google Scholar]
- COCO. Detection Evaluation Metrics Used by COCO. 2021. Available online: https://cocodataset.org/#detection-eval (accessed on 22 August 2022).
- Prechelt, L. Early Stopping—But When? Springer: Berlin/Heidelberg, Germany, 1998; pp. 55–69. [Google Scholar] [CrossRef]
- Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning requires rethinking generalization. arXiv 2017, arXiv:cs.LG/1611.03530. [Google Scholar] [CrossRef]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Liang, X.; Wu, L.; Li, J.; Wang, Y.; Meng, Q.; Qin, T.; Chen, W.; Zhang, M.; Liu, T.Y. R-Drop: Regularized Dropout for Neural Networks. Adv. Neural Inf. Process. Syst. 2021, 34, 10890–10905. [Google Scholar]
- Labach, A.; Salehinejad, H.; Valaee, S. Survey of Dropout Methods for Deep Neural Networks. arXiv 2019, arXiv:1904.13310. [Google Scholar]
- Davis, J.; Goadrich, M. The relationship between Precision-Recall and ROC curves. In Proceedings of the ICML 2006—Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PL, USA, 25–29 June 2006; pp. 233–240. [Google Scholar]
- Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, MA, USA, 2008. [Google Scholar]
- Flach, P.A.; Kull, M. Precision-Recall-Gain curves: PR analysis done right. Adv. Neural Inf. Process. Syst. 2015, 28, 838–846. [Google Scholar]
- Asamoah, D.; Ofori, E.; Opoku, S.; Danso, J. Measuring the Performance of Image Contrast Enhancement Technique. Int. J. Comput. Appl. 2018, 181, 6–13. [Google Scholar] [CrossRef]
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS - Improving Object Detection with One Line of Code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5562–5570. [Google Scholar] [CrossRef]
- Carratino, L.; Cissé, M.; Jenatton, R.; Vert, J.P. On Mixup Regularization. arXiv 2020, arXiv:2006.06049. [Google Scholar]
- Castro, D.J.L. Garra Servo-Controlada com Integração de Informação Táctil e de Proximidade. Master’s Thesis, Universidade de Coimbra, Coimbra, Portugal, 1996. [Google Scholar]
- Chandrasekar, L.; Durga, G. Implementation of Hough Transform for image processing applications. In Proceedings of the 2014 International Conference on Communication and Signal Processing, Bangkok, Thailand, 10–12 October 2014; pp. 843–847. [Google Scholar] [CrossRef]
- Claro, M.; Vogado, L.; Santos, J.; Veras, R. Utilização de Técnicas de Data Augmentation em Imagens: Teoria e Prática. 2020. Available online: https://sol.sbc.org.br/livros/index.php/sbc/catalog/view/48/224/445-1 (accessed on 1 November 2021).
- Li, F.-F.; Krishna, R.; Xu, D. cs231n, Lecture 15—Slide 4, Detection and Segmentation. 2021. Available online: http://cs231n.stanford.edu/slides/2021/lecture_15.pdf (accessed on 26 December 2021).
- Li, F.-F.; Deng, J.; Li, K. ImageNet: Constructing a large-scale image database. J. Vis. 2010, 9, 1037. [Google Scholar] [CrossRef]
- Dai, F.; Fan, B.; Peng, Y. An image haze removal algorithm based on blockwise processing using LAB color space and bilateral filtering. In Proceedings of the 2018 Chinese Control And Decision Conference (CCDC), Shenyang, China, 9–11 June 2018; pp. 5945–5948. [Google Scholar]
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-Based Fully Convolutional Networks. arXiv 2016, arXiv:1605.06409. [Google Scholar]
- dos Santos, J.R.V. Avaliação de Técnicas de Realce de Imagens Digitais Utilizando Métricas Subjetivas e Objetivas. Master’s Thesis, Universidade Federal do Ceará, Fortaleza, Brazil, 2016. [Google Scholar]
- Dvornik, N.; Mairal, J.; Schmid, C. Modeling Visual Context is Key to Augmenting Object Detection Datasets. arXiv 2018, arXiv:1807.07428. [Google Scholar]
- Dwibedi, D.; Misra, I.; Hebert, M. Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection. arXiv 2017, arXiv:1708.01642. [Google Scholar]
- Erfurt, J.; Helmrich, C.R.; Bosse, S.; Schwarz, H.; Marpe, D.; Wiegand, T. A Study of the Perceptually Weighted Peak Signal-To-Noise Ratio (WPSNR) for Image Compression. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2339–2343. [Google Scholar] [CrossRef]
- Fardo, F.A.; Conforto, V.H.; de Oliveira, F.C.; Rodrigues, P.S. A Formal Evaluation of PSNR as Quality Measurement Parameter for Image Segmentation Algorithms. arXiv 2016, arXiv:1605.07116. [Google Scholar]
- Faria, D. Trabalhos Práticos Análise e Processamento de Imagem; Faculdade de Engenharia da Universidade do Porto: Porto, Portugal, 2010. [Google Scholar]
- Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.Y.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation. arXiv 2021, arXiv:2012.07177. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
- Gonzalez, R.C.; Woods, R.E.; Eddins, S.L. Digital Image Processing Using MATLAB; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 2003. [Google Scholar]
- Guo, H.; Mao, Y.; Zhang, R. Augmenting Data with Mixup for Sentence Classification: An Empirical Study. arXiv 2019, arXiv:1905.08941. [Google Scholar]
- Guo, H.; Mao, Y.; Zhang, R. MixUp as Locally Linear Out-Of-Manifold Regularization. arXiv 2018, arXiv:1809.02499. [Google Scholar] [CrossRef]
- Hao, R.; Namdar, K.; Liu, L.; Haider, M.A.; Khalvati, F. A Comprehensive Study of Data Augmentation Strategies for Prostate Cancer Detection in Diffusion-weighted MRI using Convolutional Neural Networks. arXiv 2020, arXiv:2006.01693. [Google Scholar] [CrossRef]
- Hawas, A.R.; Ashour, A.S.; Guo, Y. 8—Neutrosophic set in medical image clustering. In Neutrosophic Set in Medical Image Analysis; Guo, Y., Ashour, A.S., Eds.; Academic Press: Cambridge, MA, USA, 2019; pp. 167–187. [Google Scholar] [CrossRef]
- Huynh-The, T.; Le, B.V.; Lee, S.; Le-Tien, T.; Yoon, Y. Using weighted dynamic range for histogram equalization to improve the image contrast. EURASIP J. Image Video Process. 2014, 2014, 44. [Google Scholar] [CrossRef]
- Illingworth, J.; Kittler, J. The Adaptive Hough Transform. IEEE Trans. Pattern Anal. Mach. Intell. 1987, PAMI-9, 690–698. [Google Scholar] [CrossRef]
- Kim, J.H.; Choo, W.; Jeong, H.; Song, H.O. Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity. arXiv 2021, arXiv:2102.03065. [Google Scholar]
- Liu, Z.; Chen, W.; Zou, Y.; Hu, C. Regions of interest extraction based on HSV color space. In Proceedings of the IEEE 10th International Conference on Industrial Informatics, Beijing, China, 25–27 July 2012; pp. 481–485. [Google Scholar] [CrossRef]
- Ma, J.; Fan, X.; Yang, S.X.; Zhang, X.; Zhu, X. Contrast Limited Adaptive Histogram Equalization-Based Fusion in YIQ and HSI Color Spaces for Underwater Image Enhancement. Int. J. Pattern Recognit. Artif. Intell. 2018, 32, 1–26. [Google Scholar] [CrossRef]
- Marroni, L.S. Aplicação da Transformada de Hough Para Localização dos Olhos em Faces Humanas. Master’s Thesis, Universidade de São Paulo, São Carlos, Brazil, 2002. [Google Scholar]
- McReynolds, T.; Blythe, D. Chapter 12—Image Processing Techniques. In Advanced Graphics Programming Using OpenGL; McReynolds, T., Blythe, D., Eds.; The Morgan Kaufmann Series in Computer Graphics; Morgan Kaufmann: San Francisco, CA, USA, 2005; pp. 211–245. [Google Scholar] [CrossRef]
- Mukhopadhyay, S.; Mandal, S.; Pratiher, S.; Changdar, S.; Burman, R.; Ghosh, N.; Panigrahi, P.K. A comparative study between proposed Hyper Kurtosis based Modified Duo-Histogram Equalization (HKMDHE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) for Contrast Enhancement Purpose of Low Contrast Human Brain CT scan images. arXiv 2015, arXiv:1505.06219. [Google Scholar]
- Nixon, M.S.; Aguado, A.S. 5—High-level feature extraction: Fixed shape matching. In Feature Extraction and Image Processing for Computer Vision, 4th ed.; Nixon, M.S., Aguado, A.S., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 223–290. [Google Scholar] [CrossRef]
- Paris, S.; Durand, F. A Fast Approximation of the Bilateral Filter Using a Signal Processing Approach. In Proceedings of the Computer Vision—ECCV 2006, Graz, Austria, 7–13 May 2006; Leonardis, A., Bischof, H., Pinz, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 568–580. [Google Scholar]
- Park, G.H.; Cho, H.H.; Choi, M.R. A contrast enhancement method using dynamic range separate histogram equalization. IEEE Trans. Consum. Electron. 2008, 54, 1981–1987. [Google Scholar] [CrossRef]
- Peixoto, C.S.B. Estudo de Métodos de Agrupamento e Transformada de Hough para Processamento de Imagens Digitais. Master’s Thesis, Universidade Federal da Bahia, Salvador, Brazil, 2003. [Google Scholar]
- Pujari, J.; Pushpalatha, S.; Padmashree, D. Content-Based Image Retrieval using color and shape descriptors. In Proceedings of the 2010 International Conference on Signal and Image Processing, Chennai, India, 15–17 December 2010; pp. 239–242. [Google Scholar] [CrossRef]
- Rong, F.; Du-wu, C.; Bo, H. A Novel Hough Transform Algorithm for Multi-objective Detection. In Proceedings of the 2009 Third International Symposium on Intelligent Information Technology Application, NanChang, China, 21–22 November 2009; Volume 3, pp. 705–708. [Google Scholar] [CrossRef]
- Schettini, R.; Gasparini, F.; Corchs, S.; Marini, F.; Capra, A.; Castorina, A. Contrast image correction method. J. Electron. Imaging 2010, 19, 023005. [Google Scholar] [CrossRef]
- Setiawan, A.W.; Mengko, T.R.; Santoso, O.S.; Suksmono, A.B. Color retinal image enhancement using CLAHE. In Proceedings of the International Conference on ICT for Smart Society 2013: “Think Ecosystem Act Convergence”, ICISS 2013, Jakarta, Indonesia, 13–14 June 2013; pp. 215–217. [Google Scholar] [CrossRef]
- Shene, C.K. Geometric Transformations. 2018. Available online: https://pages.mtu.edu/~shene/COURSES/cs3621/NOTES/geometry/geo-tran.html (accessed on 1 November 2021).
- Shiao, Y.H.; Chen, T.J.; Chuang, K.S.; Lin, C.H.; Chuang, C.C. Quality of compressed medical images. J. Digit. Imaging 2007, 20, 149–159. [Google Scholar] [CrossRef]
- Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
- Singh, P.K.; Tiwari, V. Normalized Log Twicing Function for DC Coefficients Scaling in LAB Color Space. In Proceedings of the International Conference on Inventive Research in Computing Applications, ICIRCA 2018, Coimbatore, India, 11–12 July 2018; pp. 333–338. [Google Scholar] [CrossRef]
- Sun, K.; Wang, B.; Zhou, Z.Q.; Zheng, Z.H. Real time image haze removal using bilateral filter. Trans. Beijing Inst. Technol. 2011, 31, 810–814. [Google Scholar]
- Unel, F.O.; Ozkalayci, B.O.; Cigla, C. The Power of Tiling for Small Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 582–591. [Google Scholar] [CrossRef]
- Wang, K.; Fang, B.; Qian, J.; Yang, S.; Zhou, X.; Zhou, J. Perspective Transformation Data Augmentation for Object Detection. IEEE Access 2020, 8, 4935–4943. [Google Scholar] [CrossRef]
- Wang, S.; Zheng, J.; Hu, H.M.; Li, B. Naturalness Preserved Enhancement Algorithm for Non-Uniform Illumination Images. IEEE Trans. Image Process. 2013, 22, 3538–3548. [Google Scholar] [CrossRef]
- Warner, R. Measurement of Meat Quality | Measurements of Water-holding Capacity and Color: Objective and Subjective. In Encyclopedia of Meat Sciences, 2nd ed.; Dikeman, M., Devine, C., Eds.; Academic Press: Oxford, UK, 2014; pp. 164–171. [Google Scholar] [CrossRef]
- Yadav, G.; Maheshwari, S.; Agarwal, A. Contrast limited adaptive histogram equalization based enhancement for real time video system. In Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Delhi, India, 24–27 September 2014; pp. 2392–2397. [Google Scholar] [CrossRef]
- Yang, Q.; Tan, K.H.; Ahuja, N. Real-time O(1) bilateral filtering. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 557–564. [Google Scholar] [CrossRef]
- Ye, H.; Shang, G.; Wang, L.; Zheng, M. A new method based on hough transform for quick line and circle detection. In Proceedings of the 2015 8th International Conference on Biomedical Engineering and Informatics (BMEI), Shenyang, China, 14–16 October 2015; pp. 52–56. [Google Scholar] [CrossRef]
- Ye, Z.; Mohamadian, H.; Ye, Y. Discrete Entropy and Relative Entropy Study on Nonlinear Clustering of Underwater and Arial Images. In Proceedings of the 2007 IEEE International Conference on Control Applications, Singapore, 1–3 October 2007; pp. 313–318. [Google Scholar] [CrossRef]
- Yuen, H.; Princen, J.; Illingworth, J.; Kittler, J. Comparative study of Hough Transform methods for circle finding. Image Vis. Comput. 1990, 8, 71–77. [Google Scholar] [CrossRef] [Green Version]
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. MixUp: Beyond empirical risk minimization. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018—Conference Track Proceedings, Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–13. [Google Scholar]
- Zhao, H.; Li, Q.; Feng, H. Multi-Focus Color Image Fusion in the HSI Space Using the Sum-Modified-Laplacian and a Coarse Edge Map. Image Vis. Comput. 2008, 26, 1285–1295. [Google Scholar] [CrossRef]
- Silva, A.D.D.; Carneiro, M.B.P.; Cardoso, C.F.S. Realce de Microaneurismas em Imagens de Fundo de Olho Utilizando CLAHE. In Anais do V Congresso Brasileiro de Eletromiografia e Cinesiologia e X Simpósio de Engenharia Biomédica; Even3: Uberlândia, Brazil, 2018; pp. 772–775. [Google Scholar] [CrossRef] [Green Version]
Ref. | Dataset | # Images | # of Images with Lesions Annotated | Data Augmentation | Unbalanced Data | Model | Performance Measure | Limitations | |||
---|---|---|---|---|---|---|---|---|---|---|---|
MA | HE | EX | SE | ||||||||
[18] | IDRiD | 516 | 81 | 80 | 81 | 40 | Applied | Yes | Mask R-CNN | IoU = 0.9338 | EX, MA, SE e HE not detected |
[17] | DDR | 12,522 | 570 | 601 | 486 | 239 | Not applied | Yes | SSD, YOLO | mAP = 0.0059 mAP = 0.0035 | Low performance in detecting MA and SE |
[20] | Applied | Yes | YOLOv3 | mAP = 0.216 | Imbalance of data used in training | ||||||
[19] | DIARETDB1 | 89 | 80 | 54 | 48 | 36 | Applied | Yes | CNN | Accuracy = 98.91% | MA e HE not detected |
[21] | Private dataset | 666,383 | - | - | - | - | Not applied | Yes | Mask R-CNN | AUC = 0.954 AUC = 0.901 AUC = 0.941 AUC = 0.967 | The dataset with fundus images used for training is private Validation of detection of lesions performed only in images from the private dataset |
# Images | Resolution | Annotation Counts | MA | HE | EX | SE | Pixel-Level Lesion Annotations | Multiple Experts
---|---|---|---|---|---|---|---|---
12,522 | Variable | # of images with lesion annotations | 570 | 601 | 486 | 239 | Yes | Yes
 | | # of lesion annotations | 10,388 | 13,093 | 23,713 | 1558 | |
Parameters | Value |
---|---|
Batch Size | 32 |
Number of Epochs | 8000 |
Learning Rate | 0.01 |
Momentum | 0.937 |
Activation Function | SiLU |
Optimizer | SGD and Adam |
Weight Decay | 0.0005 |
Dropout | 10% |
Threshold IoU NMS | 0.45 |
Confidence Limit | 0.25 |
Size of initial anchors (COCO) | (10, 13), (16, 30), (33, 23)—P3 (30, 61), (62, 45), (59, 119)—P4 (116, 90), (156, 198), (373, 326)—P5 |
Adjusted anchor size | (3, 3), (4, 4), (7, 7)—P3 (10, 10), (15, 15), (23, 28)—P4 (33, 24), (44, 49), (185, 124)—P5
Early Stopping | Patience value |
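The last two threshold rows of the table describe the detection post-processing: predictions scoring below the 0.25 confidence limit are discarded, and any box overlapping a higher-scoring kept box by more than 0.45 IoU is suppressed. The sketch below is a minimal, class-agnostic pure-Python illustration of that filtering, not the authors' implementation (which follows the YOLOv5 codebase); all names here are illustrative.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, conf_limit=0.25, iou_thresh=0.45):
    """detections: list of (box, score) pairs; returns the kept detections.

    Greedy NMS: drop low-confidence predictions, then keep boxes in
    descending score order unless they overlap an already-kept box."""
    dets = sorted((d for d in detections if d[1] >= conf_limit),
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

For example, two heavily overlapping boxes with scores 0.9 and 0.8 collapse to the single higher-scoring one, while a box scoring below 0.25 never enters the NMS step at all.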
Models | AP (EX) | AP (SE) | AP (MA) | AP (HE) | mAP
---|---|---|---|---|---
SSD [17] | 0 | 0.0227 | 0 | 0.0007 | 0.0059
YOLO [17] | 0.0039 | 0 | 0 | 0.0101 | 0.0035
YOLOv3+SGD [20] | - | - | - | - | 0.1100
YOLOv3+SGD+Dropout [20] | - | - | - | - | 0.1710
YOLOv4 [59] | 0.0370 | 0.1493 | 0.0193 | 0.0849 | 0.0716
YOLOv5 (unmodified) | 0.0306 | 0.2500 | 0.0047 | 0.1300 | 0.1040
Proposed Approach+SGD without Tiling | 0.1490 | 0.4060 | 0.0454 | 0.2780 | 0.2200
Proposed Approach+SGD with Tiling | 0.2290 | 0.3280 | 0.1050 | 0.3330 | 0.2490
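The overall mAP in the final column of each row is the unweighted mean of the four per-class AP values (EX, SE, MA, HE); small discrepancies in some rows are consistent with rounding of the per-class values. A quick check against two rows of the table above:

```python
def mean_ap(per_class_ap):
    """Overall mAP as the unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)

# SGD with tiling:    (0.2290 + 0.3280 + 0.1050 + 0.3330) / 4 ≈ 0.2490
# SGD without tiling: (0.1490 + 0.4060 + 0.0454 + 0.2780) / 4 ≈ 0.2200
with_tiling = mean_ap([0.2290, 0.3280, 0.1050, 0.3330])
without_tiling = mean_ap([0.1490, 0.4060, 0.0454, 0.2780])
```

The same relation holds for the other per-class results tables in this section.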
Models | AP (EX) | AP (SE) | AP (MA) | AP (HE) | mAP
---|---|---|---|---|---
SSD [17] | 0.0002 | 0 | 0.0001 | 0.0056 | 0.0015
YOLO [17] | 0.0012 | 0 | 0 | 0.0109 | 0.0030
YOLOv5 (unmodified) | 0.0342 | 0.1000 | 0.0028 | 0.0590 | 0.0511
Proposed Approach+SGD without Tiling | 0.1430 | 0.2040 | 0.0280 | 0.1480 | 0.1310
Proposed Approach+SGD with Tiling | 0.2100 | 0.1380 | 0.0530 | 0.1710 | 0.1430
Models | AP (EX) | AP (SE) | AP (MA) | AP (HE) | mAP
---|---|---|---|---|---
SSD [17] | 0 | 0.0227 | 0 | 0.0007 | 0.0059
YOLO [17] | 0.0039 | 0 | 0 | 0.0101 | 0.0035
YOLOv3+Adam+Dropout [20] | - | - | - | - | 0.2160
Proposed Approach+Adam without Tiling | 0.1640 | 0.4020 | 0.0610 | 0.3290 | 0.2390
Proposed Approach+Adam with Tiling | 0.2240 | 0.3650 | 0.1110 | 0.3520 | 0.2630
Models | AP (EX) | AP (SE) | AP (MA) | AP (HE) | mAP
---|---|---|---|---|---
SSD [17] | 0.0002 | 0 | 0.0001 | 0.0056 | 0.0015
YOLO [17] | 0.0012 | 0 | 0 | 0.0109 | 0.0030
Proposed Approach+Adam without Tiling | 0.1540 | 0.2110 | 0.0296 | 0.1590 | 0.1380
Proposed Approach+Adam with Tiling | 0.2210 | 0.1570 | 0.0553 | 0.1840 | 0.1540
Models | Precision (Validation) | Recall (Validation) | F1-Score (Validation) | Precision (Test) | Recall (Test) | F1-Score (Test)
---|---|---|---|---|---|---
Proposed Approach+SGD without Tiling | 0.4533 | 0.2233 | 0.2992 | 0.3270 | 0.1540 | 0.2094
Proposed Approach+Adam without Tiling | 0.4618 | 0.2484 | 0.3231 | 0.3060 | 0.1710 | 0.2194
Proposed Approach+SGD with Tiling | 0.4775 | 0.2653 | 0.3411 | 0.3390 | 0.1820 | 0.2368
Proposed Approach+Adam with Tiling | 0.4462 | 0.2859 | 0.3485 | 0.3410 | 0.2000 | 0.2521
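Each F1-score above is the harmonic mean of the corresponding precision and recall values, which makes the rows easy to verify:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# SGD-without-tiling validation row: P = 0.4533, R = 0.2233 -> F1 ≈ 0.2992
f1_validation = f1_score(0.4533, 0.2233)
# Adam-with-tiling test row: P = 0.3410, R = 0.2000 -> F1 ≈ 0.2521
f1_test = f1_score(0.3410, 0.2000)
```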
Models | Inference Time (ms) Validation | Inference Time (ms) Test |
---|---|---|
Proposed Approach+SGD without Tiling | 15.7 | 13.0
Proposed Approach+Adam without Tiling | 14.1 | 21.1
Proposed Approach+SGD with Tiling | 4.6 | 5.9
Proposed Approach+Adam with Tiling | 5.5 | 7.5
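Tiling splits each fundus image into fixed-size sub-images that are fed to the network one at a time, which is why the per-input inference times above are lower for the tiled variants. A minimal sketch of the tile-coordinate computation, assuming non-overlapping 640×640 tiles (matching the network input size; the paper's exact tile size and any overlap may differ):

```python
def tile_coords(width, height, tile=640):
    """Return (x1, y1, x2, y2) for non-overlapping tiles covering the image.

    Tiles at the right and bottom edges are clipped to the image border."""
    coords = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            coords.append((x, y, min(x + tile, width), min(y + tile, height)))
    return coords
```

For instance, a 1280×1280 fundus image yields 4 tiles, while a 1900×1900 image yields 9, the last ones clipped at the border.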
Models | AP (EX) | AP (SE) | AP (MA) | AP (HE) | mAP
---|---|---|---|---|---
Proposed Approach+SGD without Tiling | 0.1030 | 0.2940 | 0.0601 | 0.2460 | 0.1760
Proposed Approach+Adam without Tiling | 0.1040 | 0.1810 | 0.0723 | 0.1350 | 0.1230
Proposed Approach+SGD with Tiling | 0.2630 | 0.5340 | 0.2170 | 0.2980 | 0.3280
Proposed Approach+Adam with Tiling | 0.2670 | 0.2740 | 0.2100 | 0.3200 | 0.2680
Models | AP (EX) | AP (SE) | AP (MA) | AP (HE) | mAP
---|---|---|---|---|---
Proposed Approach+SGD without Tiling | 0.1260 | 0.3000 | 0.0787 | 0.2630 | 0.1920
Proposed Approach+Adam without Tiling | 0.0993 | 0.2640 | 0.0661 | 0.1380 | 0.1420
Proposed Approach+SGD with Tiling | 0.2390 | 0.3940 | 0.2010 | 0.2890 | 0.2810
Proposed Approach+Adam with Tiling | 0.2530 | 0.4090 | 0.2210 | 0.2970 | 0.2950
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Santos, C.; Aguiar, M.; Welfer, D.; Belloni, B. A New Approach for Detecting Fundus Lesions Using Image Processing and Deep Neural Network Architecture Based on YOLO Model. Sensors 2022, 22, 6441. https://doi.org/10.3390/s22176441