Search Results (3,357)

Search Parameters:
Keywords = Yolov3

17 pages, 9635 KiB  
Article
Damage Detection and Segmentation in Disaster Environments Using Combined YOLO and Deeplab
by So-Hyeon Jo, Joo Woo, Chang Ho Kang and Sun Young Kim
Remote Sens. 2024, 16(22), 4267; https://doi.org/10.3390/rs16224267 (registering DOI) - 15 Nov 2024
Abstract
Building damage due to various causes occurs frequently and carries risk factors that can lead to additional collapses. However, it is difficult to accurately identify objects in complex structural sites because of inaccessible conditions and image noise. In conventional approaches, close-up images have been used to detect and segment damage such as cracks. In this study, a deep learning method is proposed for the rapid determination and analysis of multiple damage types, such as cracks and concrete rubble, at disaster sites. With the proposed method, analysis can be performed on image information received from a robot explorer instead of a human, and damage information can be detected and segmented even when the damaged point is photographed at a distance. To accomplish this goal, damage information is detected and segmented using YOLOv7 and Deeplabv2: damage is quickly detected by YOLOv7, and semantic segmentation is performed by Deeplabv2 based on the bounding boxes obtained from YOLOv7. By training on images with various resolutions and shooting distances, damage information can be detected effectively not only at short distances but also at long distances. In the comparison of results, the combination of YOLOv7 and Deeplabv2 returned better scores than the comparison model, with a Recall of 0.731, Precision of 0.843, F1 of 0.770, and mIoU of 0.638, and had the lowest standard deviation. Full article
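A minimal sketch of the detect-then-segment idea this abstract describes, assuming two hypothetical callables: `detect(image)` returning (x1, y1, x2, y2) boxes from a YOLO-style detector, and `segment(patch)` returning a per-pixel class mask from a Deeplab-style model. Names and shapes are illustrative, not the authors' code.

```python
import numpy as np

def detect_then_segment(image: np.ndarray, detect, segment) -> np.ndarray:
    """Run semantic segmentation only inside detected damage boxes."""
    full_mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for (x1, y1, x2, y2) in detect(image):           # fast pass: coarse damage boxes
        patch = image[y1:y2, x1:x2]                   # crop the region of interest
        patch_mask = segment(patch)                   # slow pass: pixel-level labels for the crop
        # keep the maximum label where boxes overlap
        full_mask[y1:y2, x1:x2] = np.maximum(full_mask[y1:y2, x1:x2], patch_mask)
    return full_mask
```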
Show Figures
Figure 1. Damage detection and segmentation methods.
Figure 2. Crack and pile detection using a segmentation model. (a) Crack taken at a short distance, (b) crack taken at a long distance, (c) piles taken at a short distance, (d) piles taken at a long distance.
Figure 3. The ratio of objects within one pixel.
Figure 4. Difference in detection according to the resolution. (a) Case ① in Figure 3, (b) Case ② in Figure 3.
Figure 5. Experimental environment.
Figure 6. Datasets of different resolutions and sizes.
Figure 7. Crack and pile detection.
Figure 8. Histograms. (a) Recall, (b) Precision, (c) F1 Score, (d) IoU.
17 pages, 2791 KiB  
Article
Object Detection for Yellow Maturing Citrus Fruits from Constrained or Biased UAV Images: Performance Comparison of Various Versions of YOLO Models
by Yuu Tanimoto, Zhen Zhang and Shinichi Yoshida
AgriEngineering 2024, 6(4), 4308-4324; https://doi.org/10.3390/agriengineering6040243 (registering DOI) - 15 Nov 2024
Abstract
Citrus yield estimation using deep learning and unmanned aerial vehicles (UAVs) is an effective method that can potentially achieve high accuracy and labor savings. However, many citrus varieties with different fruit shapes and colors require varietal-specific fruit detection models, making it challenging to acquire a substantial number of images for each variety. Understanding the performance of models on constrained or biased image datasets is crucial for determining methods for improving model performance. In this study, we evaluated the accuracy of the You Only Look Once (YOLO) v8m, YOLOv9c, and YOLOv5mu models using constrained or biased image datasets to obtain fundamental knowledge for estimating the yield from UAV images of yellow maturing citrus (Citrus junos) trees. Our results demonstrate that the YOLOv5mu model performed better than the others based on the constrained 25-image datasets, achieving a higher average precision at an intersection over union of 0.50 (AP@50) (85.1%) than the YOLOv8m (80.3%) and YOLOv9c (81.6%) models in the training dataset. On the other hand, it was revealed that the performance improvement due to data augmentation was high for the YOLOv8m and YOLOv9c models. Moreover, the impact of the bias in the training dataset, such as the light condition and the coloring of the fruit, on the performance of the fruit detection model is demonstrated. These findings provide critical insights for selecting models based on the quantity and quality of the image data collected under actual field conditions. Full article
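The AP@50 figures quoted above hinge on the box IoU test that decides whether a detection counts as a true positive. A minimal sketch of that test; the (x1, y1, x2, y2) box format and the 0.50 threshold are stated here only for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, thr=0.50):
    return box_iou(pred_box, gt_box) >= thr   # the "@50" in AP@50
```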
27 pages, 8254 KiB  
Article
Small Object Detection in UAV Remote Sensing Images Based on Intra-Group Multi-Scale Fusion Attention and Adaptive Weighted Feature Fusion Mechanism
by Zhe Yuan, Jianglei Gong, Baolong Guo, Chao Wang, Nannan Liao, Jiawei Song and Qiming Wu
Remote Sens. 2024, 16(22), 4265; https://doi.org/10.3390/rs16224265 (registering DOI) - 15 Nov 2024
Abstract
In view of the issues of missed and false detections encountered in small object detection for UAV remote sensing images, and the inadequacy of existing algorithms in terms of complexity and generalization ability, we propose a small object detection model named IA-YOLOv8 in this paper. This model integrates the intra-group multi-scale fusion attention mechanism and the adaptive weighted feature fusion approach. In the feature extraction phase, the model employs a hybrid pooling strategy that combines Avg and Max pooling to replace the single Max pooling operation used in the original SPPF framework. Such modifications enhance the model’s ability to capture the minute features of small objects. In addition, an adaptive feature fusion module is introduced, which is capable of automatically adjusting the weights based on the significance and contribution of features at different scales to improve the detection sensitivity for small objects. Simultaneously, a lightweight intra-group multi-scale fusion attention module is implemented, which aims to effectively mitigate background interference and enhance the saliency of small objects. Experimental results indicate that the proposed IA-YOLOv8 model has a parameter quantity of 10.9 MB, attaining an average precision (mAP) value of 42.1% on the Visdrone2019 test set, an mAP value of 82.3% on the DIOR test set, and an mAP value of 39.8% on the AI-TOD test set. All these results outperform the existing detection algorithms, demonstrating the superior performance of the IA-YOLOv8 model in the task of small object detection for UAV remote sensing. Full article
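The abstract describes replacing the single max pooling in the SPPF block with a hybrid of average and max pooling. A minimal PyTorch sketch of that idea, not the authors' IA-YOLOv8 code; the kernel size, the 0.5/0.5 mix, and the 1x1 fusion convolution are assumptions.

```python
import torch
import torch.nn as nn

class HybridPoolSPPF(nn.Module):
    """SPPF-style block that mixes average and max pooling at each stage."""
    def __init__(self, channels: int, k: int = 5):
        super().__init__()
        self.maxpool = nn.MaxPool2d(k, stride=1, padding=k // 2)
        self.avgpool = nn.AvgPool2d(k, stride=1, padding=k // 2)
        self.fuse = nn.Conv2d(channels * 4, channels, kernel_size=1)

    def forward(self, x):
        feats = [x]
        for _ in range(3):                       # three successive pooling stages, as in SPPF
            p = 0.5 * self.maxpool(feats[-1]) + 0.5 * self.avgpool(feats[-1])
            feats.append(p)
        return self.fuse(torch.cat(feats, dim=1))

# y = HybridPoolSPPF(64)(torch.randn(1, 64, 40, 40))   # y keeps the (1, 64, 40, 40) shape
```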
11 pages, 11841 KiB  
Article
Deep Learning Model Size Performance Evaluation for Lightning Whistler Detection on Arase Satellite Dataset
by I Made Agus Dwi Suarjaya, Desy Purnami Singgih Putri, Yuji Tanaka, Fajar Purnama, I Putu Agung Bayupati, Linawati, Yoshiya Kasahara, Shoya Matsuda, Yoshizumi Miyoshi and Iku Shinohara
Remote Sens. 2024, 16(22), 4264; https://doi.org/10.3390/rs16224264 (registering DOI) - 15 Nov 2024
Abstract
The plasmasphere within Earth’s magnetosphere plays a crucial role in space physics, with its electron density distribution being pivotal and strongly influenced by solar activity. Very Low Frequency (VLF) waves, including whistlers, provide valuable insights into this distribution, making the study of their propagation through the plasmasphere essential for predicting space weather impacts on various technologies. In this study, we evaluate the performance of different deep learning model sizes for lightning whistler detection using the YOLO (You Only Look Once) architecture. To achieve this, we transformed the entirety of raw data from the Arase (ERG) Satellite for August 2017 into 2736 images, which were then used to train the models. Our approach involves exposing the models to spectrogram diagrams—visual representations of the frequency content of signals—derived from the Arase Satellite’s WFC (WaveForm Capture) subsystem, with a focus on analyzing whistler-mode plasma waves. We experimented with various model sizes, adjusting epochs, and conducted performance analysis using a partial set of labeled data. The testing phase confirmed the effectiveness of the models, with YOLOv5n emerging as the optimal choice due to its compact size (3.7 MB) and impressive detection speed, making it suitable for resource-constrained applications. Despite challenges such as image quality and the detection of smaller whistlers, YOLOv5n demonstrated commendable accuracy in identifying scenarios with simple shapes, thereby contributing to a deeper understanding of whistlers’ impact on Earth’s magnetosphere and fulfilling the core objectives of this study. Full article
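The models above are trained on spectrogram images rendered from raw waveform data. A minimal sketch of turning a 1-D waveform into a spectrogram image with SciPy and Matplotlib; the sampling rate, FFT length, placeholder signal, and file name are illustrative assumptions, not the Arase/WFC processing chain.

```python
import numpy as np
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

fs = 65_536                                    # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1.0 / fs)
waveform = np.sin(2 * np.pi * 5_000 * t)       # placeholder signal, not satellite data

f, seg_t, Sxx = spectrogram(waveform, fs=fs, nperseg=1024, noverlap=512)
plt.pcolormesh(seg_t, f / 1000.0, 10 * np.log10(Sxx + 1e-12), shading="auto")
plt.xlabel("Time (s)"); plt.ylabel("Frequency (kHz)")
plt.savefig("spectrogram_example.png", dpi=150)   # image later fed to the detector
```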
Show Figures
Figure 1. Spectrogram of Arase (ERG) for 15 August 2017, 02:02 UT.
Figure 2. Schematic overview of lightning whistler detection on the Arase satellite dataset.
Figure 3. The YOLOv5 nano model performance results; the x-axis corresponds to the epoch and the y-axis to the respective title of each subfigure.
Figure 4. The detection results of four YOLOv5 models for the event of 15 August 2017, at 02:02 UT.
Figure 5. Annotated and predicted spectrogram of YOLOv5 nano.
14 pages, 1108 KiB  
Article
DualPFL: A Dual Sparse Pruning Method with Efficient Federated Learning for Edge-Based Object Detection
by Shijin Song, Sen Du, Yuefeng Song and Yongxin Zhu
Appl. Sci. 2024, 14(22), 10547; https://doi.org/10.3390/app142210547 - 15 Nov 2024
Abstract
With the increasing complexity of neural network models, the huge communication overhead in federated learning (FL) has become a significant issue. To mitigate resource consumption, incorporating pruning algorithms into federated learning has emerged as a promising approach. However, existing pruning algorithms exhibit high sensitivity to network architectures and typically require multiple sessions of retraining to identify optimal structures. The direct application of such strategies to FL would inevitably introduce an additional communication cost. To this end, we propose a novel communication-efficient federated learning framework, DualPFL (Dual Sparse Pruning Federated Learning), designed to address these issues by implementing dynamic sparse pruning and adaptive model aggregation strategies. The experimental results demonstrate that, compared to similar works, our framework can improve convergence speed by more than two times under non-IID data, achieving up to 84% accuracy on the CIFAR-10 dataset, 95% mean average precision (mAP) on the COCO dataset using YOLOv8, and 96% accuracy on the TT100K traffic sign datasets. These findings indicate that DualPFL facilitates secure and efficient collaborative computing in smart city applications. Full article
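A minimal sketch of the two ingredients named above, magnitude-based sparse pruning of client weights and weighted server-side aggregation, using plain PyTorch tensors. The sparsity level and the client weighting are illustrative assumptions, not the DualPFL algorithm itself.

```python
import torch

def prune_by_magnitude(state_dict, sparsity=0.5):
    """Zero out the smallest-magnitude weights in every floating-point tensor."""
    pruned = {}
    for name, w in state_dict.items():
        k = int(w.numel() * sparsity)
        if k == 0 or not w.is_floating_point():
            pruned[name] = w.clone()
            continue
        thresh = w.abs().flatten().kthvalue(k).values     # magnitude cut-off
        pruned[name] = torch.where(w.abs() > thresh, w, torch.zeros_like(w))
    return pruned

def aggregate(client_states, client_weights):
    """Weighted average of client state dicts (FedAvg-style aggregation)."""
    total = sum(client_weights)
    keys = client_states[0].keys()
    return {k: sum(w * s[k] for s, w in zip(client_states, client_weights)) / total
            for k in keys}
```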
Show Figures
Figure 1. Our proposed DualPFL framework.
Figure 2. Pipeline of the scale-based model aggregation. The red and yellow lines represent the weights that the three clients aggregate together.
Figure 3. Test accuracy on CIFAR-10 under different Dirichlet (α). (a) α = 0.5; (b) α = 10.
19 pages, 13994 KiB  
Article
An Enhanced Feature-Fusion Network for Small-Scale Pedestrian Detection on Edge Devices
by Min Hu, Yaorong Zhang, Teng Jiao, Huijun Xue, Xue Wu, Jianguo Luo, Shipeng Han and Hao Lv
Sensors 2024, 24(22), 7308; https://doi.org/10.3390/s24227308 - 15 Nov 2024
Abstract
Small-scale pedestrian detection is one of the challenges in general object detection. Factors such as complex backgrounds, long distances, and low-light conditions make the image features of small-scale pedestrians less distinct, further increasing the difficulty of detection. To address these challenges, an Enhanced Feature-Fusion YOLO network (EFF-YOLO) for small-scale pedestrian detection is proposed. Specifically, this method employs a backbone based on the FasterNet block within YOLOv8n, which is designed to enhance the extraction of spatial features while reducing redundant operation. Furthermore, the gather-and-distribute (GD) mechanism is integrated into the neck of the network to realize the aggregation and distribution of global information and multi-level features. This not only strengthens the faint features of small-scale pedestrians but also effectively suppresses complex background information, thereby improving the accuracy of small-scale pedestrians. Experimental results indicate that EFF-YOLO achieves detection accuracies of 72.5%, 72.3%, and 91% on the three public datasets COCO-person, CityPersons, and LLVIP, respectively. Moreover, the proposed method reaches a detection speed of 50.7 fps for 1920 × 1080-pixel video streams on the edge device Jetson Orin NX, marking a 15.2% improvement over the baseline network. Thus, the proposed EFF-YOLO method not only boasts high detection accuracy but also demonstrates excellent real-time performance on edge devices. Full article
(This article belongs to the Special Issue Image Processing and Analysis for Object Detection: 2nd Edition)
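FasterNet's efficiency, mentioned in the abstract above, comes largely from partial convolution (PConv), which convolves only a fraction of the channels and passes the rest through untouched. A minimal PyTorch sketch of that idea, assuming a 1/4 channel ratio; it illustrates the concept rather than the EFF-YOLO backbone.

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Convolve a subset of channels; keep the remaining channels as identity."""
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.conv_ch = max(1, int(channels * ratio))      # channels actually convolved
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, padding=1, bias=False)

    def forward(self, x):
        active, passive = torch.split(x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return torch.cat([self.conv(active), passive], dim=1)

# y = PartialConv(64)(torch.randn(1, 64, 80, 80))   # same shape, a fraction of the FLOPs of a full 3x3 conv
```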
Show Figures
Figure 1. The architecture of EFF-YOLO.
Figure 2. The structure of FasterNet.
Figure 3. The neck network based on the GD mechanism.
Figure 4. The structure of low-stage GD branch.
Figure 5. The details of IIM with LAF.
Figure 6. The structure of high-stage GD branch.
Figure 7. Comparison of visualized heatmaps from the EFF-YOLO and baseline on three datasets. (a) COCO-person; (b) Citypersons; (c) LLVIP.
Figure 8. PR curves and mAP0.5:0.95 iteration curves of different models. (a) PR curve on COCO-person; (b) PR curve on Citypersons; (c) PR curve on LLVIP; (d) iterative curve of mAP0.5:0.95 on COCO-person; (e) iterative curve of mAP0.5:0.95 on Citypersons; (f) iterative curve of mAP0.5:0.95 on LLVIP.
Figure 9. Visualization results of different models on three datasets. (a) COCO-person; (b) Citypersons; (c) LLVIP.
Figure 10. Comparison of algorithm speed and accuracy on three datasets. (a) COCO-person; (b) Citypersons; (c) LLVIP.
Figure 11. The experimental platform based on edge devices.
Figure 12. Visualization of small-scale pedestrian-detection results under different illumination on the Jetson Orin NX platform. (a) Day. (b) Night.
17 pages, 12206 KiB  
Article
Smart Monitoring Method for Land-Based Sources of Marine Outfalls Based on an Improved YOLOv8 Model
by Shicheng Zhao, Haolan Zhou and Haiyan Yang
Water 2024, 16(22), 3285; https://doi.org/10.3390/w16223285 - 15 Nov 2024
Abstract
Land-based sources of marine outfalls are a major source of marine pollution. The monitoring of land-based sources of marine outfalls is an important means for marine environmental protection and governance. Traditional on-site manual monitoring methods are inefficient, expensive, and constrained by geographic conditions. Satellite remote sensing spectral analysis methods can only identify pollutant plumes and are affected by discharge timing and cloud/fog interference. Therefore, we propose a smart monitoring method for land-based sources of marine outfalls based on an improved YOLOv8 model, using unmanned aerial vehicles (UAVs). This method can accurately identify and classify marine outfalls, offering high practical application value. Inspired by the sparse sampling method in compressed sensing, we incorporated a multi-scale dilated attention mechanism into the model and integrated dynamic snake convolutions into the C2f module. This approach enhanced the model’s detection capability for occluded and complex-feature targets while constraining the increase in computational load. Additionally, we proposed a new loss calculation method by combining Inner-IoU (Intersection over Union) and MPDIoU (IoU with Minimum Points Distance), which further improved the model’s regression speed and its ability to predict multi-scale targets. The final experimental results show that the improved model achieved an mAP50 (mean Average Precision at 50) of 87.0%, representing a 3.4% increase from the original model, effectively enabling the smart monitoring of land-based marine discharge outlets. Full article
(This article belongs to the Section Oceans and Coastal Zones)
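The loss described above combines IoU with corner-point distances (MPDIoU). A simplified NumPy-free sketch of an MPDIoU-style score for axis-aligned (x1, y1, x2, y2) boxes; normalizing the corner distances by the squared image diagonal is an assumption of one common formulation, and the Inner-IoU auxiliary-box scaling is omitted.

```python
def mpd_iou(pred, gt, img_w, img_h):
    """IoU penalized by normalized top-left and bottom-right corner distances."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    iou = inter / union if union > 0 else 0.0
    norm = img_w ** 2 + img_h ** 2                          # squared image diagonal
    d_tl = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corner distance
    d_br = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corner distance
    return iou - d_tl / norm - d_br / norm                  # higher is better; loss = 1 - score
```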
Show Figures
Figure 1. Zhanjiang city outlets point map. (a) “gully”, (b) “weir”, (c) “pipe”, (d) “culvert”, (e) “gully”, (f) “weir”, (g) “pipe”, (h) “culvert”.
Figure 2. YOLOv8 model structure.
Figure 3. MSDA mechanism structure. The red points represent the key positions of the convolutional kernel; the yellow area shows the dilation of the kernel at r = 1, the blue area the dilation at r = 2, and the green area the dilation at r = 3.
Figure 4. C2f module structure.
Figure 5. DSConv selectable receptive fields. The blue line represents the continuous shift of the convolutional kernel in the horizontal direction, while the red line represents the continuous shift of the convolutional kernel in the vertical direction.
Figure 6. Inner-MPDIoU diagram.
Figure 7. (a) Anchor box category number statistics, (b) anchor box position statistics. The color of the anchor boxes in (b) belongs to the same category as that in (a).
Figure 8. (a) Normalized confusion matrix for the YOLOv8 model, (b) normalized confusion matrix for the YOLOv8+MSDA model.
Figure 9. (a) YOLOv8 model's predicted results, (b) our model's predicted results.
Figure 10. (a) P–R curve of the improved model, (b) P–R curve of the improved model after transfer learning.
Figure 11. Model training process.
18 pages, 2990 KiB  
Article
A GGCM-E Based Semantic Filter and Its Application in VSLAM Systems
by Yuanjie Li, Chunyan Shao and Jiaming Wang
Electronics 2024, 13(22), 4487; https://doi.org/10.3390/electronics13224487 - 15 Nov 2024
Abstract
Image matching-based visual simultaneous localization and mapping (vSLAM) extracts low-level pixel features to reconstruct camera trajectories and maps through the epipolar geometry method. However, it fails to achieve correct trajectories and mapping when there are low-quality feature correspondences in several challenging environments. Although the RANSAC-based framework can enable better results, it is computationally inefficient and unstable in the presence of a large number of outliers. A Faster R-CNN learning-based semantic filter is proposed to explore the semantic information of inliers to remove low-quality correspondences, helping vSLAM localize accurately in our previous work. However, the semantic filter learning method generalizes low precision for low-level and dense texture-rich scenes, leading the semantic filter-based vSLAM to be unstable and have poor geometry estimation. In this paper, a GGCM-E-based semantic filter using YOLOv8 is proposed to address these problems. Firstly, the semantic patches of images are collected from the KITTI dataset, the TUM dataset provided by the Technical University of Munich, and real outdoor scenes. Secondly, the semantic patches are classified by our proposed GGCM-E descriptors to obtain the YOLOv8 neural network training dataset. Finally, several semantic filters for filtering low-level and dense texture-rich scenes are generated and combined into the ORB-SLAM3 system. Extensive experiments show that the semantic filter can detect and classify semantic levels of different scenes effectively, filtering low-level semantic scenes to improve the quality of correspondences, thus achieving accurate and robust trajectory reconstruction and mapping. For the challenging autonomous driving benchmark and real environments, the vSLAM system with respect to the GGCM-E-based semantic filter demonstrates its superiority regarding reducing the 3D position error, such that the absolute trajectory error is reduced by up to approximately 17.44%, showing its promise and good generalization. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Robotics)
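The 17.44% figure above refers to absolute trajectory error (ATE). A minimal sketch of an RMSE-style ATE over 3-D camera positions; the rigid alignment step (e.g., Umeyama) that full SLAM evaluation tools perform first is assumed to have been done already.

```python
import numpy as np

def ate_rmse(estimated: np.ndarray, ground_truth: np.ndarray) -> float:
    """estimated, ground_truth: (N, 3) arrays of aligned camera positions."""
    err = np.linalg.norm(estimated - ground_truth, axis=1)   # per-frame position error
    return float(np.sqrt(np.mean(err ** 2)))
```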
Show Figures
Figure 1. ORB-SLAM3 framework with the proposed semantic filter module.
Figure 2. Framework of the proposed semantic filter approach.
Figure 3. Computation of GGCM-E features.
Figure 4. Semantic filtering on the KITTI frame.
Figure 5. Semantic filtering on our captured outdoor frame.
Figure 6. The trajectory of KITTI07 with respect to the ground truth using the GGCM-E semantic filter.
Figure 7. Comparison of trajectories between the proposed method and ground truth in the KITTI dataset.
Figure 8. Comparison on APEs with respect to ground truth of the ORB-SLAM3 and the semantic filter.
Figure 9. Dense texture-rich sequences in TUM dataset (DTR sequences).
Figure 10. Comparison of camera trajectories in DTR sequences.
Figure 11. Comparison of the trajectory with respect to the ground truth of DynaSLAM and GGCM-E+DynaSLAM on KITTI00 sequences.
Figure 12. Comparison of the APEs of semantic filter-based Structure-SLAM, LDSO and DynaSLAM on KITTI07 sequences.
31 pages, 7153 KiB  
Article
You Only Look Once Version 5 and Deep Simple Online and Real-Time Tracking Algorithms for Real-Time Customer Behavior Tracking and Retail Optimization
by Mohamed Shili, Osama Sohaib and Salah Hammedi
Algorithms 2024, 17(11), 525; https://doi.org/10.3390/a17110525 - 15 Nov 2024
Abstract
The rapid progress of computer vision and machine learning has opened up new means of improving the shopping experience in brick-and-mortar stores. This paper examines the use of the YOLOv5 (You Only Look Once) and DeepSORT (Deep Simple Online and Real-Time Tracking) algorithms for the real-time detection and analysis of purchasing behavior in brick-and-mortar retail environments. By leveraging these algorithms, stores can track customer behavior, identify popular products, and monitor high-traffic areas, enabling businesses to adapt quickly to customer preferences and optimize store layout and inventory management. The methodology integrates YOLOv5 for accurate and rapid object detection with DeepSORT for the effective tracking of customer movements and interactions with products. Information collected from in-store cameras and sensors is processed to detect tendencies in customer behavior, such as repeatedly inspected products, time spent in specific areas, and product handling. The results indicate a modest improvement in customer engagement, with conversion rates increasing by approximately 3 percentage points, and a decline in inventory waste levels, from 88% to 75%, after system implementation. This study provides essential insights into the further integration of algorithmic technology in physical retail locations and demonstrates the transformative potential of real-time behavior tracking in the retail industry. By offering a solid framework for creating intelligent retail systems, this research lays the foundation for future developments in operational strategy and customer experience optimization. Full article
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
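A minimal sketch of the detect-then-track loop the paper builds on, with two hypothetical callables standing in for the real models: `yolo(frame)` returning (x1, y1, x2, y2, confidence) detections and `tracker.update(detections, frame)` returning (track_id, box) pairs, DeepSORT-style. The dwell-time bookkeeping is illustrative, not the authors' analytics code.

```python
from collections import defaultdict

def track_dwell_times(frames, yolo, tracker, fps=25.0):
    """Accumulate the seconds each tracked customer ID spends in view."""
    dwell = defaultdict(float)
    for frame in frames:
        detections = yolo(frame)                  # per-frame person detections
        for track_id, _box in tracker.update(detections, frame):
            dwell[track_id] += 1.0 / fps          # one frame's worth of time
    return dict(dwell)
```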
Show Figures
Figure 1. Architecture of YOLOv5.
Figure 2. The architecture of DeepSORT.
Figure 3. The proposed architecture for this system.
Figure 4. The data flow diagram for this system.
Figure 5. Flowchart of the real-time retail tendency detection system.
Figure 6. Recommendations generated by the proposed system.
Figure 7. Product detection.
Figure 8. Confusion matrix for evaluating YOLOv5 detections.
Figure 9. Using the DeepSORT algorithm in a store.
Figure 10. Graph of the model accuracy.
Figure 11. Graph of the precision through the datasets.
Figure 12. Graph of the recall.
Figure 13. Graph of the F1-score calculation.
Figure 14. Overview of latency and computing cost.
Figure 15. Graph of accuracy and standard deviation over multiple executions.
Figure 16. YOLOv5 object detection performance.
Figure 17. DeepSORT tracking performance.
Figure 18. Conversion rates before and after implementation of YOLOv5 + DeepSORT.
Figure 19. Inventory waste levels before and after system integration.
Figure 20. Comparison of YOLOv5 + DeepSORT vs. traditional methods.
Figure 21. Confusion metrics for different models.
Figure 22. Performance comparison of YOLOv5 + DeepSORT vs. other methods.
19 pages, 5488 KiB  
Article
Insulator-YOLO: Transmission Line Insulator Risk Identification Based on Improved YOLOv5
by Nan Zhang, Jingyi Su, Yang Zhao and Hua Chen
Processes 2024, 12(11), 2552; https://doi.org/10.3390/pr12112552 - 15 Nov 2024
Abstract
This study introduces an innovative method for detecting risks in transmission line insulators by developing an optimized variant of YOLOv5, named Insulator-YOLO. The model addresses key challenges in small-defect detection, complex backgrounds, and computational efficiency. By incorporating GhostNetV2 in the backbone to streamline feature extraction and introducing SE and CBAM attention mechanisms, the model enhances its focus on critical features. The bi-directional feature pyramid network (BiFPN) is applied to enhance multi-scale feature fusion, and the integration of CIoU and NWD loss functions optimizes bounding box regression, achieving higher accuracy. Additionally, focal loss mitigates the imbalance between positive and negative samples, leading to more accurate and robust defect detection. Extensive evaluations demonstrate that Insulator-YOLO significantly improves detection accuracy and efficiency on real-world power line insulator defects, providing a reliable solution for maintaining the integrity of transmission systems. Full article
(This article belongs to the Special Issue AI-Based Modelling and Control of Power Systems)
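Focal loss, mentioned above for the positive/negative imbalance, down-weights easy examples through a (1 - p_t)^gamma factor. A minimal binary sketch in PyTorch; the alpha and gamma values are the commonly used defaults, not necessarily Insulator-YOLO's settings.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha: float = 0.25, gamma: float = 2.0):
    """logits, targets: float tensors of the same shape; targets hold 0/1 labels."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```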
Show Figures
Figure 1. FPN + PAN network structure.
Figure 2. YOLOv5 network structure.
Figure 3. Network structure diagram of our Insulator-YOLO algorithm.
Figure 4. GhostNetV2 bottleneck.
Figure 5. Schematic diagram of the SE attention mechanism.
Figure 6. The modular structure of CBAM.
Figure 7. Structure comparison of PANet (left) and BiFPN (right).
Figure 8. Part of the insulator self-explosion defect data.
Figure 9. Model mAP curve comparison diagram.
Figure 10. Model loss curve comparison diagram.
Figure 11. Partial visualization results.
23 pages, 5517 KiB  
Article
Research on an Eye Control Method Based on the Fusion of Facial Expression and Gaze Intention Recognition
by Xiangyang Sun and Zihan Cai
Appl. Sci. 2024, 14(22), 10520; https://doi.org/10.3390/app142210520 - 15 Nov 2024
Abstract
With the deep integration of psychology and artificial intelligence technology and other related technologies, eye control technology has achieved certain results at the practical application level. However, it is found that the accuracy of the current single-modal eye control technology is still not high, which is mainly caused by the inaccurate eye movement detection caused by the high randomness of eye movements in the process of human–computer interaction. Therefore, this study will propose an intent recognition method that fuses facial expressions and eye movement information and expects to complete an eye control method based on the fusion of facial expression and eye movement information based on the multimodal intent recognition dataset, including facial expressions and eye movement information constructed in this study. Based on the self-attention fusion strategy, the fused features are calculated, and the multi-layer perceptron is used to classify the fused features, so as to realize the mutual attention between different features, and improve the accuracy of intention recognition by enhancing the weight of effective features in a targeted manner. In order to solve the problem of inaccurate eye movement detection, an improved YOLOv5 model was proposed, and the accuracy of the model detection was improved by adding two strategies: a small target layer and a CA attention mechanism. At the same time, the corresponding eye movement behavior discrimination algorithm was combined for each eye movement action to realize the output of eye behavior instructions. Finally, the experimental verification of the eye–computer interaction scheme combining the intention recognition model and the eye movement detection model showed that the accuracy of the eye-controlled manipulator to perform various tasks could reach more than 95 percent based on this scheme. Full article
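A minimal PyTorch sketch of the fusion idea described above: treat the facial-expression feature and the eye-movement feature as two tokens, let self-attention weigh them against each other, then classify the fused vector with a multi-layer perceptron. The dimensions, head count, and number of intention classes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttentionFusionClassifier(nn.Module):
    def __init__(self, dim: int = 128, num_classes: int = 5):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes))

    def forward(self, face_feat, eye_feat):
        tokens = torch.stack([face_feat, eye_feat], dim=1)   # (B, 2, dim): one token per modality
        fused, _ = self.attn(tokens, tokens, tokens)         # mutual attention between features
        return self.mlp(fused.flatten(start_dim=1))          # (B, num_classes)

# logits = AttentionFusionClassifier()(torch.randn(8, 128), torch.randn(8, 128))
```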
Show Figures
Figure 1. The technical route of this paper's research.
Figure 2. Face image dataset example.
Figure 3. Eye movement intent detection flow chart, describing the conversion of eye movement data to intent classification.
Figure 4. Integration framework based on attention mechanism.
Figure 5. Comparison of performance in single-mode and multimodal prediction.
Figure 6. Line charts of five indicators of different models.
Figure 7. Loss function curve of the Anchor method before and after improvement.
Figure 8. Structure diagram of the CA attention mechanism [9].
Figure 9. Improved YOLOv5 model structure.
Figure 10. Improved loss variation diagram for the YOLOv5 model.
Figure 11. The average accuracy (AP) curve of the improved model.
Figure 12. The F1 score curve of the improved model.
Figure 13. Test results before and after improvement.
Figure 14. Human–computer interaction experiment platform.
Figure 15. The overall flow chart of the experiment.
Figure 16. Comparison of calculation efficiency indicators.
Figure 17. Complete human–computer interaction process.
Figure 18. Test results.
Figure 19. Test results for different tasks.
25 pages, 17437 KiB  
Article
ACD-Net: An Abnormal Crew Detection Network for Complex Ship Scenarios
by Zhengbao Li, Heng Zhang, Ding Gao, Zewei Wu, Zheng Zhang and Libin Du
Sensors 2024, 24(22), 7288; https://doi.org/10.3390/s24227288 - 14 Nov 2024
Abstract
Abnormal behavior of crew members is an important cause of frequent ship safety accidents. The existing abnormal crew recognition algorithms are affected by complex ship environments and have low performance in real and open shipborne environments. This paper proposes an abnormal crew detection network for complex ship scenarios (ACD-Net), which uses a two-stage algorithm to detect and identify abnormal crew members in real-time. An improved YOLOv5s model based on a transformer and CBAM mechanism (YOLO-TRCA) is proposed with a C3-TransformerBlock module to enhance the feature extraction ability of crew members in complex scenes. The CBAM attention mechanism is introduced to reduce the interference of background features and improve the accuracy of real-time detection of crew abnormal behavior. The crew identification algorithm (CFA) tracks and detects abnormal crew members’ faces in real-time in an open environment (CenterFace), continuously conducts face quality assessment (Filter), and selects high-quality facial images for identity recognition (ArcFace). The CFA effectively reduces system computational overhead and improves the success rate of identity recognition. Experimental results indicate that ACD-Net achieves 92.3% accuracy in detecting abnormal behavior and a 69.6% matching rate for identity recognition, with a processing time of under 39.5 ms per frame at a 1080P resolution. Full article
(This article belongs to the Special Issue Human-Centric Sensing Technology and Systems: 2nd Edition)
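CBAM, used above to suppress background features, chains a channel-attention step with a spatial-attention step. A compact PyTorch sketch of the standard formulation; the reduction ratio and the 7x7 spatial kernel are the usual defaults, not ACD-Net specifics.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention from global average- and max-pooled descriptors
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention from channel-wise mean and max maps
        spatial = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(spatial))
```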
Show Figures
Figure 1. ACD-Net: abnormal crew detection network.
Figure 2. Four types of images with distinct features: (a) image with uneven lighting and significant brightness variations; (b) image with local overexposure, underexposure, or blurring; (c) image with a cluttered background and a small proportion of crew images; (d) image with severe occlusions and overlaps between crew and equipment.
Figures 3–6. YOLOv5s feature visualization and recognition effect diagrams: (a) original image; (b) the C3 model before SPPF; (c) SPPF; (d) input 1 of the neck (PAN); (e) input 2 of the neck (PAN); (f) input 3 of the neck (PAN); (g) YOLOv5s detection diagram.
Figure 7. The structures of the C3 module, TransformerBlock, and the C3-TransformerBlock module.
Figure 8. Added CBAM schematic diagram in the feature fusion network.
Figure 9. Comparison of loss function effects: (a) original image; (b) IoU; (c) CIoU.
Figure 10. The architecture of YOLO-TRCA.
Figure 11. CFA: crew identity recognition process.
Figure 12. Facial coordinate diagram.
Figure 13. Yaw rotation.
Figure 14. Diagram of the crew identity recognition process.
Figure 15. Partial images of the dataset: (a) not wearing a life jacket (nolifevast); (b) smoke; (c) not wearing work clothes (notrainlifevast); (d) not wearing a shirt (nocoat); (e) normal (lifevast).
Figure 16. Comparison of detection results: (a)–(e) original images; (f)–(j) YOLOv5s; (k)–(o) proposed method.
Figure 17. Feature visualization of the network: (a), (f), (k), (p) original images; (b), (l) C3; (g), (q) C3-TransformerBlock; (c)–(e), (m)–(o) before adding CBAM1–CBAM3; (h)–(j), (r)–(t) after adding CBAM1–CBAM3.
Figure 18. Marine equipment layout diagram: (a) overall picture; (b) partial view.
Figure 19. Algorithm effect and software design: (a), (b) abnormal behavior detection of crew members in monitoring; (c) capturing and identifying abnormal crew; (d) abnormal crew identification record; (e), (f) abnormal crew identity recognition results.
12 pages, 3195 KiB  
Article
Detection of Aortic Dissection and Intramural Hematoma in Non-Contrast Chest Computed Tomography Using a You Only Look Once-Based Deep Learning Model
by Yu-Seop Kim, Jae Guk Kim, Hyun Young Choi, Dain Lee, Jin-Woo Kong, Gu Hyun Kang, Yong Soo Jang, Wonhee Kim, Yoonje Lee, Jihoon Kim, Dong Geum Shin, Jae Keun Park, Gayoung Lee and Bitnarae Kim
J. Clin. Med. 2024, 13(22), 6868; https://doi.org/10.3390/jcm13226868 - 14 Nov 2024
Abstract
Background/Objectives: Aortic dissection (AD) and aortic intramural hematoma (IMH) are fatal diseases with similar clinical characteristics. Immediate computed tomography (CT) with a contrast medium is required to confirm the presence of AD or IMH. This retrospective study aimed to use CT images to differentiate AD and IMH from normal aorta (NA) using a deep learning algorithm. Methods: A 6-year retrospective study of non-contrast chest CT images was conducted at a university hospital in Seoul, Republic of Korea, from January 2016 to July 2021. The position of the aorta was analyzed in each CT image and categorized as NA, AD, or IMH. The images were divided into training, validation, and test sets in an 8:1:1 ratio. A deep learning model that can differentiate between AD and IMH from NA using non-contrast CT images alone, called YOLO (You Only Look Once) v4, was developed. The YOLOv4 model was used to analyze 8881 non-contrast CT images from 121 patients. Results: The YOLOv4 model can distinguish AD, IMH, and NA from each other simultaneously with a probability of over 92% using non-contrast CT images. Conclusions: This model can help distinguish AD and IMH from NA when applying a contrast agent is challenging. Full article
(This article belongs to the Section Nuclear Medicine & Radiology)
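The 8:1:1 split mentioned above can be reproduced with a simple shuffle-and-slice. A minimal sketch; splitting at the image level and the fixed seed are assumptions for illustration, since the listing does not specify how images were grouped by patient.

```python
import random

def split_811(items, seed: int = 42):
    """Shuffle `items` deterministically and split 80% / 10% / 10%."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```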
Show Figures
Figure 1. YOLOv4 framework with CSP-Darknet53. AD—aortic dissection; CT—computed tomography; CSP—cross-stage partial; IMH—aortic intramural hematoma; NA—normal aorta; PAN—path aggregation network; SPP—spatial pyramid pooling; YOLOv3—You Only Look Once version 3.
Figure 2. Flow diagram of the study.
Figure 3. Visualization of the prediction result (a) and test results with the confidence score of the YOLOv4 model (b): (b-1) AD; (b-2) NA; (b-3) IMH.
Figure 4. Distribution of each class and detailed detection results.
Figure 5. Average precision graph (P–R curve) for each class.
Figure 6. Results of the YOLOv4 training (CIOU loss curve graph). CIOU—Complete Intersection over Union.
16 pages, 8101 KiB  
Article
Visual Prompt Selection Framework for Real-Time Object Detection and Interactive Segmentation in Augmented Reality Applications
by Eungyeol Song, Doeun Oh and Beom-Seok Oh
Appl. Sci. 2024, 14(22), 10502; https://doi.org/10.3390/app142210502 - 14 Nov 2024
Abstract
This study presents a novel visual prompt selection framework for augmented reality (AR) applications that integrates advanced object detection and image segmentation techniques. The framework is designed to enhance user interactions and improve the accuracy of foreground–background separation in AR environments, making AR experiences more immersive and precise. We evaluated six state-of-the-art object detectors (DETR, DINO, CoDETR, YOLOv5, YOLOv8, and YOLO-NAS) in combination with a prompt segmentation model using the DAVIS 2017 validation dataset. The results show that the combination of YOLO-NAS-L and SAM achieved the best performance with a J&F score of 70%, while DINO-scale4-swin had the lowest score of 57.5%. This 12.5% performance gap highlights the significant contribution of user-provided regions of interest (ROIs) to segmentation outcomes, emphasizing the importance of interactive user input in enhancing accuracy. Our framework supports fast prompt processing and accurate mask generation, allowing users to refine digital overlays interactively, thereby improving both the quality of AR experiences and overall user satisfaction. Additionally, the framework enables the automatic detection of moving objects, providing a more efficient alternative to traditional manual selection interfaces in AR devices. This capability is particularly valuable in dynamic AR scenarios, where seamless user interaction is crucial. Full article
(This article belongs to the Section Robotics and Automation)
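The J part of the J&F score quoted above is the region similarity, i.e. the Jaccard index (mask IoU) between a predicted mask and the ground truth; the F part adds a boundary measure, omitted here. A minimal NumPy sketch for binary masks.

```python
import numpy as np

def jaccard(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """pred_mask, gt_mask: boolean arrays of the same shape."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter) / float(union) if union > 0 else 1.0
```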
Show Figures
Figure 1. An overview of interactive segmentation using input points to display the generated mask.
Figure 2. An overview of interactive segmentation using input points. Positive points are denoted by green dots and negative points by red dots; a pretrained backbone interprets the image, and a decoder generates a mask based on the input points. The output displays the generated mask.
Figure 3. The 360-degree view and mask results of a camel as a non-rigid object.
Figure 4. A simple illustration of our pipeline with an example frame: the object detector generates bounding boxes from the image, and segmentation is then applied to each box, generating an individual mask.
Figure 5. The VPS (visual prompt selection) framework pipeline: (a) an AR camera captures the real-world scene and transmits it to the system as a frame; (b) the object detector performs localization and categorization, outputting bounding box–label pairs; (c) the bounding boxes are processed in the prompt encoder and, together with image embeddings from the image encoder, the decoder generates masks; (d) a mask is generated for each object; (e) using the final masks and label information, the AR system distinguishes non-rigid objects from the background, enabling digital icons to be placed in the background while keeping non-rigid objects displayed.
Figure 6. The VPS framework's workflow with user interaction: (a) real-world images are captured; (b) object detection generates bounding boxes; (c) user input (hand tapping or flicking) is treated as a point whose coordinates are passed to the prompt encoder; (d) SAM's image encoder, prompt encoder, and mask decoder process boxes and points together to generate an output mask; (e) the final output is a user-intent mask.
Figure 7. Results on the DAVIS 2017 dataset: detector model size versus J- and F-scores. The three YOLO-NAS models outperform the other models, while detectors using the Swin Transformer or Deformable-DETR have substantially larger model sizes. Further details are given in Section 4.2.
Figure 8. Exemplary results on the DAVIS 2017 "horse-jump" and "judo" sequences for DINO-5scale, CoDETR-R50, YOLOv8-S, and YOLO-NAS-L.
Figure 9. An example of the camel (left) and goldfish (right) sequences in the DAVIS dataset, where the object and the background share similar features.
Figure 10. A segment of the "camel" results for DINO-4scale, DINO-4scale-swin, and CoDETR-swin, with the annotations originally provided by DAVIS in the final row; frames without boxes or masks indicate that nothing was detected.
16 pages, 15828 KiB  
Article
Artificial Intelligence Vision Methods for Robotic Harvesting of Edible Flowers
by Fabio Taddei Dalla Torre, Farid Melgani, Ilaria Pertot and Cesare Furlanello
Plants 2024, 13(22), 3197; https://doi.org/10.3390/plants13223197 - 14 Nov 2024
Abstract
Edible flowers, with their increasing demand in the market, face a challenge in labor-intensive hand-picking practices, hindering their attractiveness for growers. This study explores the application of artificial intelligence vision for robotic harvesting, focusing on the fundamental elements: detection, pose estimation, and plucking point estimation. The objective was to assess the adaptability of this technology across various species and varieties of edible flowers. The developed computer vision framework utilizes YOLOv5 for 2D flower detection and leverages the zero-shot capabilities of the Segment Anything Model for extracting points of interest from a 3D point cloud, facilitating 3D space flower localization. Additionally, we provide a pose estimation method, a key factor in plucking point identification. The plucking point is determined through a linear regression correlating flower diameter with the height of the plucking point. The results showed effective 2D detection. Further, the zero-shot and standard machine learning techniques employed achieved promising 3D localization, pose estimation, and plucking point estimation. Full article
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)
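The pose-estimation step above derives a flower's orientation from its isolated point cloud via PCA. A minimal NumPy sketch: the principal axes of the centered points give candidate orientation vectors, with the least-varying axis taken here as the flower's facing direction, which is an illustrative convention rather than necessarily the authors' choice.

```python
import numpy as np

def estimate_pose(points: np.ndarray):
    """points: (N, 3) array of a single flower's 3-D points."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # principal axes from the SVD of the centered cloud (rows of vt)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]                 # axis of least variance ~ assumed facing direction
    return centroid, vt, normal / np.linalg.norm(normal)
```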
Show Figures
Figure 1. Sample images of the four edible flowers considered in the experiments.
Figure 2. Acquisition cart: (a) 3D blueprint of the cart; (b) example of acquisition in the greenhouse.
Figure 3. Jetson TX2 module (a) and Zed2i (b) integrated in the data acquisition setup.
Figure 4. Comparison of marigold flower conditions: (a) July vs. (b) November.
Figure 5. Examples of the different flowers in the D0 dataset from ImageNet and Kaggle.
Figure 6. Main workflow for the AI-based vision pipeline. The dashed green bounding boxes represent the three main modules, indicating inputs (black rectangles) and the main implemented methods (black rhomboids).
Figure 7. Flower pose estimation steps, from left to right: input image, single flower cutout through SAM, derived isolated point cloud in the 3D space, and PCA-based point cloud analysis.
Figure 8. Marigold flowers: side view with pose vectors, top view with circumference and diameter, and height perspective.
Figure 9. FLOLO outputs for different flowers: (a) snapdragon, (b) marigold, (c) viola (1st view), (d) viola (2nd view).
Figure 10. Cropped close-up output from the preceding images (Mari = marigold, Snap = snapdragon); not-ready-to-be-picked flowers are correctly not detected.
Figure 11. SAM zero-shot segmentation for different flowers: (a) snapdragon, (b) marigold, (c) pansy (1st view), (d) pansy (2nd view).
Figure 12. Isolated point clouds of (a) marigold and (b) snapdragon flowers and the perspective vector components; see Figure 15 for a closer view, which also shows the estimated plucking points.
Figure 13. Distribution plots for total flower diameter (left) and height (right). Dotted vertical lines indicate medians; the horizontal solid line represents a smooth approximation of the distribution.
Figure 14. Scatter plot of flower diameter against total flower height. Light solid lines depict the linear regression line, while darker lines represent the 85% upper boundaries.
Figure 15. Isolated point clouds of (a) marigold and (b) snapdragon flowers with the respective vector components (red, green, and blue vectors) and estimated plucking points (red squares).