Search Results (17)

Search Parameters:
Keywords = multiscale key frames

20 pages, 5142 KiB  
Article
Adaptive Real-Time Tracking of Molten Metal Using Multi-Scale Features and Weighted Histograms
by Yifan Lei and Degang Xu
Electronics 2024, 13(15), 2905; https://doi.org/10.3390/electronics13152905 - 23 Jul 2024
Viewed by 550
Abstract
In this study, we investigate the tracking of the molten metal region in the dross removal process during metal ingot casting and propose a real-time tracking method based on adaptive feature selection and a weighted histogram. This research is highly significant in metal smelting, as efficient molten metal tracking is crucial for effective dross removal and for ensuring the quality of metal ingots. Due to the influence of illumination and temperature in the tracking environment, it is difficult to extract suitable features for tracking molten metal during the pouring process using industrial cameras. We transform the images captured by the camera into a multi-scale feature space and select the features with the maximum distinction between the molten metal region and its surrounding background for tracking. Furthermore, we introduce a weighted histogram based on the pixel values of the target region into the mean-shift tracking algorithm to improve tracking accuracy. During tracking, the target model is updated based on changes in the molten metal region across frames. Experimental tests confirm that this tracking method meets practical requirements, effectively addressing key challenges in molten metal tracking and providing reliable support for the dross removal process.
(This article belongs to the Section Industrial Electronics)
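Since the abstract only sketches the feature-selection step, the snippet below illustrates one common way to implement it: score candidate colour features by a variance-ratio measure of how well they separate target and background histograms, consistent with the paper's figures (49 candidate features such as −2R+2G). The candidate enumeration, bin count, and variance-ratio formula here are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def variance_ratio(target_vals, background_vals, n_bins=32):
    """Score a 1-D candidate feature by target/background separability.

    Builds normalized histograms over target and background samples, forms a
    per-bin log-likelihood ratio, and returns total variance divided by the sum
    of within-class variances (higher = more separable).
    """
    lo = min(target_vals.min(), background_vals.min())
    hi = max(target_vals.max(), background_vals.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(target_vals, bins=bins, density=True)
    q, _ = np.histogram(background_vals, bins=bins, density=True)
    eps = 1e-6
    L = np.log((p + eps) / (q + eps))  # per-bin log-likelihood ratio

    def var_under(weights):
        w = weights / (weights.sum() + eps)
        mean = np.sum(w * L)
        return np.sum(w * (L - mean) ** 2)

    return var_under(p + q) / (var_under(p) + var_under(q) + eps)

def rank_rgb_features(frame, target_mask, background_mask):
    """Score linear combinations w1*R + w2*G + w3*B with w in {-2..2}.

    Assumes `frame` is an RGB array. The paper evaluates 49 candidate features;
    this loop enumerates a simple superset of RGB combinations for illustration.
    """
    r, g, b = (frame[..., c].astype(float) for c in range(3))
    scores = {}
    for w1 in range(-2, 3):
        for w2 in range(-2, 3):
            for w3 in range(-2, 3):
                if (w1, w2, w3) == (0, 0, 0):
                    continue
                feat = w1 * r + w2 * g + w3 * b
                scores[(w1, w2, w3)] = variance_ratio(feat[target_mask],
                                                      feat[background_mask])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The top-ranked feature (e.g., −2R+2G in the paper's Figure 5) would then feed the weighted-histogram mean-shift tracker.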
Figures:
Figure 1: The installation position of the dross removal robot on the casting production line and a schematic diagram of its dross skimming operation.
Figure 2: The dross removal process and the challenges in tracking the molten metal area.
Figure 3: Evaluating the separability between target and background classes.
Figure 4: (a) A sample image with rectangular frames delineating molten metal and background samples. (b) Images produced by all 49 candidate features, rank-ordered by the variance ratio measure.
Figure 5: (a) The tracking frame in the selected feature space −2R+2G. Frames 13 (b), 17 (c), 21 (d), 25 (e), and 29 (f) are shown.
Figure 6: (a) The tracking frame in the selected feature space 2R−G−2B. Frames 155 (b), 159 (c), 163 (d), 167 (e), and 171 (f) are shown.
Figure 7: In frame 527, when the tracking target's upper edge meets the set y-value, a downward search along the y-axis locates the nearest new target region.
Figure 8: Qualitative comparison of molten metal region tracking. Our method (red) is compared with current state-of-the-art (SOTA) deep learning tracking methods: DiMP (blue), KYS (yellow), and ToMP (green). The results show that our method achieves better accuracy and robustness in tracking the molten metal region.
Figure 9: Variation in Intersection over Union (IoU) values for four object tracking methods over a series of frames. The methods compared are our proposed method, DiMP, KYS, and ToMP.
21 pages, 5041 KiB  
Article
DDEYOLOv9: Network for Detecting and Counting Abnormal Fish Behaviors in Complex Water Environments
by Yinjia Li, Zeyuan Hu, Yixi Zhang, Jihang Liu, Wan Tu and Hong Yu
Fishes 2024, 9(6), 242; https://doi.org/10.3390/fishes9060242 - 20 Jun 2024
Cited by 1 | Viewed by 1878
Abstract
Accurately detecting and counting abnormal fish behaviors in aquaculture is essential. Timely detection allows farmers to take swift action to protect fish health and prevent economic losses. This paper proposes an enhanced high-precision detection algorithm based on YOLOv9, named DDEYOLOv9, to facilitate the detection and counting of abnormal fish behavior in industrial aquaculture environments. To address the lack of publicly available datasets on abnormal behavior in fish, we created the “Abnormal Behavior Dataset of Takifugu rubripes”, which includes five categories of fish behaviors. The detection algorithm was further enhanced in several key aspects. Firstly, the DRNELAN4 feature extraction module was introduced to replace the original RepNCSPELAN4 module. This change improves the model’s detection accuracy for high-density and occluded fish in complex water environments while reducing the computational cost. Secondly, the proposed DCNv4-Dyhead detection head enhances the model’s multi-scale feature learning capability, effectively recognizes various abnormal fish behaviors, and improves the computational speed. Lastly, to address the issue of sample imbalance in the abnormal fish behavior dataset, we propose EMA-SlideLoss, which enhances the model’s focus on hard samples, thereby improving the model’s robustness. The experimental results demonstrate that the DDEYOLOv9 model achieves high Precision, Recall, and mean Average Precision (mAP) on the “Abnormal Behavior Dataset of Takifugu rubripes”, with values of 91.7%, 90.4%, and 94.1%, respectively. Compared to the YOLOv9 model, these metrics are improved by 5.4%, 5.5%, and 5.4%, respectively. The model also achieves a running speed of 119 frames per second (FPS), which is 45 FPS faster than YOLOv9. Experimental results show that the DDEYOLOv9 algorithm can accurately and efficiently identify and quantify abnormal fish behaviors in specific complex environments. Full article
(This article belongs to the Special Issue AI and Fisheries)
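The abstract describes EMA-SlideLoss only as a way to make the loss focus on hard samples. The sketch below shows one plausible reading: a slide-style per-sample weight driven by IoU, with the easy/hard boundary tracked by an exponential moving average. The weighting function and decay are assumptions; the paper's exact formulation may differ.

```python
import math
import torch

class EMAThreshold:
    """Track a running mean IoU with an exponential moving average so the
    'easy vs. hard' boundary adapts during training (assumed behaviour)."""
    def __init__(self, decay=0.99, init=0.5):
        self.decay, self.value = decay, init

    def update(self, batch_iou: torch.Tensor) -> float:
        self.value = self.decay * self.value + (1 - self.decay) * batch_iou.mean().item()
        return self.value

def slide_weight(iou: torch.Tensor, mu: float) -> torch.Tensor:
    """Slide-style per-sample weight: keep clearly easy samples at 1, boost
    samples just below the threshold, and decay the weight smoothly above it."""
    w = torch.ones_like(iou)
    near = (iou >= mu - 0.1) & (iou < mu)
    high = iou >= mu
    w[near] = math.exp(1.0 - mu)
    w[high] = torch.exp(1.0 - iou[high])
    return w

# usage sketch: the weights multiply the per-sample classification loss
# mu = ema.update(batch_iou)
# loss = (slide_weight(batch_iou, mu) * per_sample_bce).mean()
```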
Figures:
Figure 1: Image acquisition.
Figure 2: Abnormal behavior of Takifugu rubripes (framed fish show abnormal behavior).
Figure 3: Sample distribution of the abnormal behavior dataset of Takifugu rubripes.
Figure 4: Structure diagram of the DDEYOLOv9 model. SPPELAN stands for Spatial Pyramid Pooling with Enhanced Local Attention Network; this block enhances feature extraction and improves the accuracy of abnormal behavior detection. Through the cooperative work of multiple sub-modules, the DRNELAN4 module more effectively extracts fish characteristics from the input image in complex water environments. ADown is the convolutional down-sampling block, which reduces the spatial dimension of the feature map and helps the model capture higher-level image features while reducing computation.
Figure 5: Dilated Reparam Block. A dilated small-kernel conv layer augments the non-dilated large-kernel conv layer. From a parametric point of view, the dilated layer is equivalent to a non-dilated conv layer with a larger sparse kernel, so the whole block can be equivalently transformed into a single large-kernel conv.
Figure 6: Comparison of the improved DRNELAN4 and RepNCSPELAN4 modules.
Figure 7: The core operation of spatial aggregation of query pixels at different locations in the same channel in DCNv4. DCNv4 combines DCNv3's dynamic weights for aggregating spatial features with convolution's flexible, unbounded aggregation weights.
Figure 8: Structure of DCNv4-Dyhead.
Figure 9: An illustration of the DCNv4-Dyhead approach.
Figure 10: Comparison of the learning curves on the training dataset before and after improvement. (a) Epochs–Precision curves of the YOLOv9 and DDEYOLOv9 models; (b) Epochs–Recall curves; (c) Epochs–mAP curves.
Figure 11: Comparison of accuracy before and after improvement. (a) Precision comparison bar graph for the six behavioral categories of the shoal; (b) Recall comparison bar charts for the six behaviors; (c) mAP comparison bar charts for the five behaviors.
Figure 12: Detection of abnormal fish behaviors in different abnormal environments ((a) YOLOv9 false detection; (b) YOLOv9 missed detection).
Figure 13: Performance comparisons. (a–c) show the Epochs–Precision, Epochs–Recall, and Epochs–mAP curves of the six models, respectively.
19 pages, 4123 KiB  
Article
Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models
by Mohammad D. Alahmadi and Moayad Alshangiti
Mathematics 2024, 12(7), 1036; https://doi.org/10.3390/math12071036 - 30 Mar 2024
Cited by 1 | Viewed by 1541
Abstract
The rapid evolution of video programming tutorials as a key educational resource has highlighted the need for effective code extraction methods. These tutorials, varying widely in video quality, present a challenge for accurately transcribing the embedded source code, crucial for learning and software development. This study investigates the impact of video quality on the performance of optical character recognition (OCR) engines and the potential of large language models (LLMs) to enhance code extraction accuracy. Our comprehensive empirical analysis utilizes a rich dataset of programming screencasts, involving manual transcription of source code and the application of both traditional OCR engines, like Tesseract and Google Vision, and advanced LLMs, including GPT-4V and Gemini. We investigate the efficacy of image super-resolution (SR) techniques, namely, enhanced deep super-resolution (EDSR) and multi-scale deep super-resolution (MDSR), in improving the quality of low-resolution video frames. The findings reveal significant improvements in OCR accuracy with the use of SR, particularly at lower resolutions such as 360p. LLMs demonstrate superior performance across all video qualities, indicating their robustness and advanced capabilities in diverse scenarios. This research contributes to the field of software engineering by offering a benchmark for code extraction from video tutorials and demonstrating the substantial impact of SR techniques and LLMs in enhancing the readability and reusability of code from these educational resources. Full article
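As a concrete illustration of the pipeline the study evaluates (super-resolve a low-resolution frame, run OCR, compare against a manual transcription), the sketch below uses OpenCV's contrib dnn_superres EDSR model and Tesseract. File names are placeholders, and NLD is written here as a distance in [0, 1]; the paper's NLD/NLD-Token metrics may be normalised differently.

```python
import cv2            # requires opencv-contrib-python for dnn_superres
import pytesseract

def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def normalized_levenshtein(pred: str, truth: str) -> float:
    """0 means identical transcriptions, 1 means completely different."""
    if not pred and not truth:
        return 0.0
    return levenshtein(pred, truth) / max(len(pred), len(truth))

# Upscale a 360p frame with EDSR x2, then OCR the original and upscaled versions.
frame = cv2.imread("frame_360p.png")                 # placeholder path
sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("EDSR_x2.pb")                           # pre-trained EDSR weights (downloaded separately)
sr.setModel("edsr", 2)
upscaled = sr.upsample(frame)

text_raw = pytesseract.image_to_string(frame)
text_sr = pytesseract.image_to_string(upscaled)
ground_truth = open("frame_360p_code.txt").read()    # manual transcription (assumed file)

print("NLD without SR:", normalized_levenshtein(text_raw, ground_truth))
print("NLD with EDSR x2:", normalized_levenshtein(text_sr, ground_truth))
```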
Figures:
Figure 1: An overview of our empirical study on OCR and LLM accuracy across different video programming qualities using super-resolution techniques.
Figure 2: Visual representation of images with varying resolutions within our Python dataset, spanning from 360p to 1080p. These images showcase the diverse quality levels found in our dataset, reflecting the range of available resolutions.
Figure 3: Boxplots showing how well OCRs and LLMs performed on different image qualities, measured by NLD scores.
Figure 4: Boxplots showing how well OCRs and LLMs performed on different image qualities, measured by NLD-Token scores across programming languages.
Figure 5: Boxplots showing how well OCRs and LLMs performed on different image qualities, measured by NLD-Token scores on images pre-processed using super-resolution.
Figure 6: A sample of Python code images at 360p resolution processed using EDSR-×2 and EDSR-×4 as part of our super-resolution techniques.
24 pages, 8939 KiB  
Article
YOLOv7-GCA: A Lightweight and High-Performance Model for Pepper Disease Detection
by Xuejun Yue, Haifeng Li, Qingkui Song, Fanguo Zeng, Jianyu Zheng, Ziyu Ding, Gaobi Kang, Yulin Cai, Yongda Lin, Xiaowan Xu and Chaoran Yu
Agronomy 2024, 14(3), 618; https://doi.org/10.3390/agronomy14030618 - 19 Mar 2024
Viewed by 1468
Abstract
Existing disease detection models for deep learning-based monitoring and prevention of pepper diseases face challenges in accurately identifying and preventing diseases due to inter-crop occlusion and various complex backgrounds. To address this issue, we propose a modified YOLOv7-GCA model based on YOLOv7 for pepper disease detection, which can effectively overcome these challenges. The model introduces three key enhancements: Firstly, lightweight GhostNetV2 is used as the feature extraction network of the model to improve the detection speed. Secondly, the Cascading fusion network (CFNet) replaces the original feature fusion network, which improves the expression ability of the model in complex backgrounds and realizes multi-scale feature extraction and fusion. Finally, the Convolutional Block Attention Module (CBAM) is introduced to focus on the important features in the images and improve the accuracy and robustness of the model. This study uses the collected dataset, which was processed to construct a dataset of 1259 images with four types of pepper diseases: anthracnose, bacterial diseases, umbilical rot, and viral diseases. We applied data augmentation to the collected dataset, and then experimental verification was carried out on this dataset. The experimental results demonstrate that the YOLOv7-GCA model reduces the parameter count by 34.3% compared to the YOLOv7 original model while improving 13.4% in mAP and 124 frames/s in detection speed. Additionally, the model size was reduced from 74.8 MB to 46.9 MB, which facilitates the deployment of the model on mobile devices. When compared to the other seven mainstream detection models, it was indicated that the YOLOv7-GCA model achieved a balance between speed, model size, and accuracy. This model proves to be a high-performance and lightweight pepper disease detection solution that can provide accurate and timely diagnosis results for farmers and researchers. Full article
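The CBAM block mentioned in the abstract follows a standard formulation (channel attention, then spatial attention). A minimal PyTorch sketch is given below; where the block is inserted inside YOLOv7-GCA is not specified here.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention, in the commonly used formulation (Woo et al., 2018)."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention: shared MLP over global average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: conv over channel-wise mean and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```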
Figures:
Figure 1: Example diagram of data augmentation: (a) original image; (b) contrast data augmentation; (c) cutout data augmentation; (d) rotation; (e) kernel filters; (f) added salt-and-pepper noise; (g) scaling; (h) random cropping; (i) mosaic data augmentation.
Figure 2: The network structure of the original YOLOv7.
Figure 3: DFC mechanism and GhostNetV2 module. Mul is feature-map multiplication; Add is feature-map addition.
Figure 4: GhostNetV2 information aggregation process diagram.
Figure 5: CBAM algorithm implementation flowchart.
Figure 6: The network structure of CFNet.
Figure 7: Illustration of the CIoU loss formula.
Figure 8: YOLOv7-GCA network architecture.
Figure 9: PR plots for the YOLOv7 (a) and YOLOv7-GCA (b) models.
Figure 10: Recognition effect analysis: (a,d,g) are the labeled results; (b,e,h) are the YOLOv7 detection results; (c,f,i) are the YOLOv7-GCA detection results.
Figure 11: The mAP (a) and training loss (b) of the ablation experiments.
Figure 12: Confusion matrices of the YOLOv7 (a) and YOLOv7-GCA (b) identification results.
Figure 13: Flowchart of the deployment process on an Android terminal.
Figure 14: Examples of pepper disease detection: (a) anthracnose; (b) umbilical rot disease; (c) viral disease.
15 pages, 4905 KiB  
Article
Transformer-Based Cascading Reconstruction Network for Video Snapshot Compressive Imaging
by Jiaxuan Wen, Junru Huang, Xunhao Chen, Kaixuan Huang and Yubao Sun
Appl. Sci. 2023, 13(10), 5922; https://doi.org/10.3390/app13105922 - 11 May 2023
Cited by 2 | Viewed by 1415
Abstract
Video Snapshot Compressive Imaging (SCI) is a new imaging method based on compressive sensing. It encodes image sequences into a single snapshot measurement and then recovers the original high-speed video through reconstruction algorithms, which has the advantages of a low hardware cost and high imaging efficiency. How to construct an efficient algorithm is the key problem of video SCI. Although the current mainstream deep convolution network reconstruction methods can directly learn the inverse reconstruction mapping, they still have shortcomings in the representation of the complex spatiotemporal content of video scenes and the modeling of long-range contextual correlation. The quality of reconstruction still needs to be improved. To solve this problem, we propose a Transformer-based Cascading Reconstruction Network for Video Snapshot Compressive Imaging. In terms of the long-range correlation matching in the Transformer, the proposed network can effectively capture the spatiotemporal correlation of video frames for reconstruction. Specifically, according to the residual measurement mechanism, the reconstruction network is configured as a cascade of two stages: overall structure reconstruction and incremental details reconstruction. In the first stage, a multi-scale Transformer module is designed to extract the long-range multi-scale spatiotemporal features and reconstruct the overall structure. The second stage takes the measurement of the first stage as the input and employs a dynamic fusion module to adaptively fuse the output features of the two stages so that the cascading network can effectively represent the content of complex video scenes and reconstruct more incremental details. Experiments on simulation and real datasets show that the proposed method can effectively improve the reconstruction accuracy, and ablation experiments also verify the validity of the constructed network modules. Full article
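For readers unfamiliar with video SCI, the forward measurement model that the reconstruction network inverts can be written in a few lines: each high-speed frame is modulated by a coding mask and the modulated frames are summed into a single snapshot. The sketch below uses random data purely for illustration.

```python
import numpy as np

def sci_forward(video, masks):
    """Video SCI measurement model: y = sum_t C_t * x_t.

    video: (B, H, W) high-speed frames x_t; masks: (B, H, W) coding masks C_t.
    Returns a single (H, W) snapshot measurement y.
    """
    return np.sum(masks * video, axis=0)

# toy example with random frames and random binary masks
B, H, W = 8, 256, 256
video = np.random.rand(B, H, W).astype(np.float32)
masks = (np.random.rand(B, H, W) > 0.5).astype(np.float32)
snapshot = sci_forward(video, masks)
print(snapshot.shape)   # (256, 256)
```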
Figures:
Figure 1: Schematic diagram of video snapshot compressive imaging.
Figure 2: Diagram of the Transformer-based Cascading Reconstruction Network for Video Snapshot Compressive Imaging.
Figure 3: Diagram of the multi-scale Transformer network for overall structure reconstruction.
Figure 4: Diagram of the dynamic fusion Transformer network for incremental details reconstruction.
Figure 5: The reconstruction results of the two stages of our network: (a) overall structure reconstruction; (b) incremental details reconstruction; (c) final reconstruction.
Figure 6: Reconstructed frames of six simulation datasets by different methods (the left side is the ground truth; the right side is the reconstruction result of each method). The selected frames are Aerial #5, Crash #24, Drop #4, Kobe #6, Runner #1, and Traffic #18.
Figure 7: Reconstruction results of different methods on the real dataset Wheel. (The red boxes show enlarged details.)
15 pages, 4694 KiB  
Article
CenterPNets: A Multi-Task Shared Network for Traffic Perception
by Guangqiu Chen, Tao Wu, Jin Duan, Qi Hu, Dandan Huang and Hao Li
Sensors 2023, 23(5), 2467; https://doi.org/10.3390/s23052467 - 23 Feb 2023
Cited by 2 | Viewed by 1844
Abstract
The importance of panoramic traffic perception tasks in autonomous driving is increasing, so shared networks with high accuracy are becoming increasingly important. In this paper, we propose a multi-task shared sensing network, called CenterPNets, that can perform the three major detection tasks of target detection, driving area segmentation, and lane detection in traffic sensing in one pass, and we propose several key optimizations to improve the overall detection performance. First, this paper proposes an efficient detection head and segmentation head based on a shared path aggregation network to improve the overall reuse rate of CenterPNets, along with an efficient multi-task joint training loss function to optimize the model. Secondly, the detection head branch uses an anchor-free mechanism to automatically regress target location information, improving the inference speed of the model. Finally, the segmentation-head branch fuses deep multi-scale features with shallow fine-grained features, ensuring that the extracted features are rich in detail. CenterPNets achieves an average detection accuracy of 75.8% on the publicly available large-scale Berkeley DeepDrive dataset, with intersection-over-union scores of 92.8% and 32.1% for drivable areas and lane areas, respectively. Therefore, CenterPNets is a precise and effective solution to the multi-task detection issue.
(This article belongs to the Section Vehicular Sensing)
Figures:
Figure 1: The HybridNets architecture has one encoder (backbone network and neck network) and two decoders (detection head and segmentation head).
Figure 2: Illustration of the detection head branching process.
Figure 3: Illustration of the branching process of the segmentation head.
Figure 4: Target detection visualization comparison results. (a) YOLOP, (b) HybridNets, (c) CenterPNets.
Figure 5: Comparative results of the visualization of the drivable area segmentation. (a) YOLOP, (b) HybridNets, (c) CenterPNets.
Figure 6: Comparison of lane segmentation visualization results. (a) YOLOP, (b) HybridNets, (c) CenterPNets.
Figure 7: CenterPNets multi-tasking results.
Full article ">
13 pages, 2627 KiB  
Article
Method for Segmentation of Litchi Branches Based on the Improved DeepLabv3+
by Jiaxing Xie, Tingwei Jing, Binhan Chen, Jiajun Peng, Xiaowei Zhang, Peihua He, Huili Yin, Daozong Sun, Weixing Wang, Ao Xiao, Shilei Lyu and Jun Li
Agronomy 2022, 12(11), 2812; https://doi.org/10.3390/agronomy12112812 - 11 Nov 2022
Cited by 8 | Viewed by 2046
Abstract
It is necessary to develop automatic picking technology to improve the efficiency of litchi picking, and the accurate segmentation of litchi branches is the key that allows robots to complete the picking task. To solve the problem of inaccurate segmentation of litchi branches under natural conditions, this paper proposes a segmentation method for litchi branches based on the improved DeepLabv3+, which replaced the backbone network of DeepLabv3+ and used the Dilated Residual Networks as the backbone network to enhance the model’s feature extraction capability. During the training process, a combination of Cross-Entropy loss and the dice coefficient loss was used as the loss function to cause the model to pay more attention to the litchi branch area, which could alleviate the negative impact of the imbalance between the litchi branches and the background. In addition, the Coordinate Attention module is added to the atrous spatial pyramid pooling, and the channel and location information of the multi-scale semantic features acquired by the network are simultaneously considered. The experimental results show that the model’s mean intersection over union and mean pixel accuracy are 90.28% and 94.95%, respectively, and the frames per second (FPS) is 19.83. Compared with the classical DeepLabv3+ network, the model’s mean intersection over union and mean pixel accuracy are improved by 13.57% and 15.78%, respectively. This method can accurately segment litchi branches, which provides powerful technical support to help litchi-picking robots find branches. Full article
(This article belongs to the Special Issue Precision Operation Technology and Intelligent Equipment in Farmland)
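The combined Cross-Entropy and Dice loss described in the abstract is a standard construction; a minimal PyTorch sketch follows. The 0.5/0.5 weighting between the two terms is an assumption, not necessarily the paper's setting.

```python
import torch
import torch.nn.functional as F

def ce_dice_loss(logits, target, ce_weight=0.5, eps=1e-6):
    """Cross-entropy plus soft Dice loss for semantic segmentation.

    logits: (N, C, H, W) raw scores; target: (N, H, W) integer class labels.
    """
    ce = F.cross_entropy(logits, target)

    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = torch.sum(probs * onehot, dims)
    cardinality = torch.sum(probs + onehot, dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)   # per-class Dice score
    dice_loss = 1.0 - dice.mean()

    return ce_weight * ce + (1.0 - ce_weight) * dice_loss
```

Adding the Dice term makes the optimiser pay more attention to thin, under-represented classes such as branches, which is the imbalance issue the abstract describes.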
Figures:
Figure 1: A comparison of ResNet18 and DRN-C-26. Each rectangle represents a Conv-BN-ReLU combination. The number in the rectangle indicates the size of the convolution kernel and the number of output channels. H × W indicates the height and width of the feature map.
Figure 2: A comparison of DRN-D-22 and DRN-C-26. The DRN is divided into eight stages; each stage outputs identically sized feature maps and uses the same expansion coefficient. Each rectangle represents a Conv-BN-ReLU combination, and the number in the rectangle indicates the size of the convolution kernel and the number of output channels. H × W is the height and width of the feature map, and the green lines represent downsampling by a stride of two.
Figure 3: The coordinate attention mechanism.
Figure 4: The improved DeepLabv3+ network structure.
Figure 5: A comparison of the mIoU curves for transfer learning.
Figure 6: mIoU curves of the ablation experiment.
Figure 7: A comparison of the network prediction effects.
17 pages, 4355 KiB  
Article
Research on the Symbolic 3D Route Scene Expression Method Based on the Importance of Objects
by Fulin Han, Liang Huo, Tao Shen, Xiaoyong Zhang, Tianjia Zhang and Na Ma
Appl. Sci. 2022, 12(20), 10532; https://doi.org/10.3390/app122010532 - 19 Oct 2022
Viewed by 1438
Abstract
In the study of 3D route scene construction, the expression of key targets needs to be highlighted. This is because compared with the 3D model, the abstract 3D symbols can reflect the number and spatial distribution characteristics of entities more intuitively. Therefore, this research proposes a symbolic 3D route scene representation method based on the importance of the object. The method takes the object importance evaluation model as the theoretical basis, calculates the spatial importance of the same type of objects according to the spatial characteristics of the geographical objects in the 3D route scene, and constructs the object importance evaluation model by combining semantic factors. The 3D symbols are then designed in a hierarchical manner on the basis of the results of the object importance evaluation and the CityGML standard. Finally, the LOD0-LOD4 symbolic 3D railway scene was constructed on the basis of a railroad data to realise the multi-scale expression of symbolic 3D route scene. Compared with the conventional loading method, the real-time frame rate of the scene was improved by 20 fps and was more stable. The scene loading speed was also improved by 5–10 s. The results show that the method can effectively improve the efficiency of the 3D route scene construction and the prominent expression effect of the key objects in the 3D route scene. Full article
(This article belongs to the Special Issue State-of-the-Art Earth Sciences and Geography in China)
Figures:
Figure 1: A technological roadmap for constructing symbolic 3D route scenes according to the importance of objects.
Figure 2: Road junction and corner importance targets. Assuming the importance CI of each road is 1, the CIT values are calculated as follows. (a) Four roads intersect at a point, and each road is spatially connected to the junction, so the junction CIT = 2 + 2 + 2 + 2 = 8. (b) Two roads intersect at a single point, and each road crosses the junction, so the junction CIT = 3.5 + 3.5 = 7. (c) Two roads intersect at a point, one road crosses the junction and the other joins it, so the junction CIT = 3.5 + 2 = 5.5. (d) A corner of a road itself, so the corner CIT = 2.
Figure 3: Flowchart of 3D symbol classification and hierarchy.
Figure 4: Natural classification of geographic entities for 3D route scenes, combining real-world route scenes according to relevant specifications; it classifies the geographical entities that may appear when constructing 3D route scenes.
Figure 5: The five LODs of CityGML: (a) LOD0: 2D symbol; (b) LOD1: simple geometry; (c) LOD2: simple geometry combinations; (d) LOD3: geometry with realistic textures; (e) LOD4: real internal structure.
Figure 6: Range of the experimental data, containing one junction and three corners on two roads.
Figure 7: Three-dimensional symbol modelling: (a) bridge model; (b) tunnel model; (c) signal machine model.
Figure 8: Multi-scale representation of symbolic 3D railway scenes. (a) LOD0: only geographic entities with object importance level 5 are loaded (stations on the route and tunnels at some junctions); symbolic accuracy is at the lowest level of detail. (b) LOD1: 3D symbols of tunnels and some bridges appear, with improved symbol accuracy. (c) LOD2: in addition to the stations, tunnels, and bridges, some roadbeds at junctions and turning points are loaded; the roadbeds generally connect to the bridges, and the symbols have clearer outlines and colours. (d) LOD3: all roadbeds are shown and the whole road is fully loaded, consisting of tunnel, bridge, and roadbed stitched together; the symbol structure is complete and well textured, especially the station model. (e) LOD4: at the highest level of detail, all 3D symbols are at their highest accuracy, the signalling machines on the route are loaded, and the symbols reproduce the real scene as closely as possible.
Figure 9: Quantitative comparative analysis: (a) the symbolic 3D railway scene constructed with this method has a higher and more stable frame rate after loading, improving scene fluency and the user experience; (b) under the multi-scale constraint of this method, geographic entities with focused features load faster, and the advantage grows as the LOD level increases, improving the perceived depth for the user.
18 pages, 4630 KiB  
Article
A Method for Obtaining the Number of Maize Seedlings Based on the Improved YOLOv4 Lightweight Neural Network
by Jiaxin Gao, Feng Tan, Jiapeng Cui and Bo Ma
Agriculture 2022, 12(10), 1679; https://doi.org/10.3390/agriculture12101679 - 12 Oct 2022
Cited by 8 | Viewed by 1717
Abstract
Obtaining the number of plants is the key to evaluating the effect of maize mechanical sowing, and is also a reference for subsequent statistics on the number of missing seedlings. When the existing model is used for plant number detection, the recognition accuracy is low, the model parameters are large, and the single recognition area is small. This study proposes a method for detecting the number of maize seedlings based on an improved You Only Look Once version 4 (YOLOv4) lightweight neural network. First, the method uses the improved Ghostnet as the model feature extraction network, and successively introduces the attention mechanism and k-means clustering algorithm into the model, thereby improving the detection accuracy of the number of maize seedlings. Second, using depthwise separable convolutions instead of ordinary convolutions makes the network more lightweight. Finally, the multi-scale feature fusion network structure is improved to further reduce the total number of model parameters, pre-training with transfer learning to obtain the optimal model for prediction on the test set. The experimental results show that the harmonic mean, recall rate, average precision and accuracy rate of the model on all test sets are 0.95%, 94.02%, 97.03% and 96.25%, respectively, the model network parameters are 18.793 M, the model size is 71.690 MB, and frames per second (FPS) is 22.92. The research results show that the model has high recognition accuracy, fast recognition speed, and low model complexity, which can provide technical support for corn management at the seedling stage. Full article
(This article belongs to the Section Digital Agriculture)
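Replacing ordinary convolutions with depthwise separable ones, as the abstract describes, is a standard substitution; the sketch below shows the generic block and the parameter saving it buys. Layer sizes are illustrative.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3 conv
    followed by a 1x1 pointwise conv, used in place of a standard conv to cut
    parameters and FLOPs. This is a generic sketch of the substitution."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# A standard 3x3 conv from 256 to 256 channels has 256*256*9 ≈ 590k weights;
# the separable version has 256*9 + 256*256 ≈ 68k, roughly an 8.7x reduction.
```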
Figures:
Figure 1: Study area: (a) experimental-area location; (b) splicing diagram of the test field.
Figure 2: Structure diagram of ghost bottlenecks.
Figure 3: Improved ghost module.
Figure 4: Depthwise convolution.
Figure 5: Pointwise convolution.
Figure 6: Structure of CBAM.
Figure 7: Improved multi-scale feature fusion network structure.
Figure 8: Flow chart of seedling number detection.
Figure 9: The curve of the loss value changing with the number of iterations.
Figure 10: Detection results of different models: (a) YOLOv4; (b) improved YOLOv4 lightweight network; (c) MobileNetv1-YOLOv4; (d) MobileNetv3-YOLOv4; (e) DenseNet121-YOLOv4; (f) VGG-YOLOv4.
16 pages, 2701 KiB  
Article
Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection
by Zhenyu Zhai, Qiantong Wang, Zongxu Pan, Zhentong Gao and Wenlong Hu
Sensors 2022, 22(19), 7473; https://doi.org/10.3390/s22197473 - 2 Oct 2022
Cited by 7 | Viewed by 3113
Abstract
Continuous frames of point-cloud-based object detection is a new research direction. Currently, most research studies fuse multi-frame point clouds using concatenation-based methods. The method aligns different frames by using information on GPS, IMU, etc. However, this fusion method can only align static objects and not moving objects. In this paper, we proposed a non-local-based multi-scale feature fusion method, which can handle both moving and static objects without GPS- and IMU-based registrations. Considering that non-local methods are resource-consuming, we proposed a novel simplified non-local block based on the sparsity of the point cloud. By filtering out empty units, memory consumption decreased by 99.93%. In addition, triple attention is adopted to enhance the key information on the object and suppresses background noise, further benefiting non-local-based feature fusion methods. Finally, we verify the method based on PointPillars and CenterPoint. Experimental results show that the mAP of the proposed method improved by 3.9% and 4.1% in mAP compared with concatenation-based fusion modules, PointPillars-2 and CenterPoint-2, respectively. In addition, the proposed network outperforms powerful 3D-VID by 1.2% in mAP. Full article
(This article belongs to the Special Issue Artificial Intelligence and Smart Sensors for Autonomous Driving)
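The abstract's key memory-saving idea, computing non-local attention only over non-empty units of the sparse BEV grid, can be sketched as below. This is an illustrative simplification (single sample, single head), not the paper's exact module.

```python
import torch
import torch.nn as nn

class SparseNonLocal(nn.Module):
    """Non-local (self-attention) block restricted to non-empty BEV cells.

    Attention is computed only among occupied pillars instead of all H*W
    positions, which is what makes the memory footprint shrink drastically."""
    def __init__(self, channels: int, inner: int = 64):
        super().__init__()
        self.q = nn.Linear(channels, inner)
        self.k = nn.Linear(channels, inner)
        self.v = nn.Linear(channels, channels)

    def forward(self, feat, occupancy):
        # feat: (C, H, W) BEV pseudo-image; occupancy: (H, W) bool mask of non-empty cells
        C, H, W = feat.shape
        flat = feat.reshape(C, H * W).t()                      # (H*W, C)
        idx = occupancy.reshape(-1).nonzero(as_tuple=True)[0]  # indices of occupied cells
        x = flat[idx]                                          # (N, C), N << H*W
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.t() / q.shape[-1] ** 0.5, dim=-1)   # (N, N)
        out = flat.clone()
        out[idx] = x + attn @ v                                # residual update of occupied cells
        return out.t().reshape(C, H, W)
```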
Figures:
Figure 1: Multiple frames are concatenated into one frame by registration. The black dashed box marks the area where motion blur occurs.
Figure 2: The overall framework of our proposed multi-frame fusion method.
Figure 3: The overall process of the grid-based point cloud encoder.
Figure 4: Feature extraction and fusion network. The 0th layer is the pseudo-image generated by the point cloud encoder.
Figure 5: Non-local module. The blue symbols represent 1 × 1 convolutions, the orange symbols represent matrix multiplication, and the green symbols represent element-wise addition.
Figure 6: The correlation matrix calculation of the index-nonlocal module. In the feature-map and similarity-calculation stage, coloured grids represent non-empty units, and the colour classes represent the similarity of feature points. In the correlation matrix, the gray level represents the relevance among feature points.
Figure 7: The position at which triple attention is applied.
Figure 8: The relationship between keyframes and intermediate frames. The red line represents the point cloud frames used in this study.
Figure 9: Comparison between MFFFNet and CenterPoint-2. Rows (a,b) show two different scenes. The first column is the ground truth; the second and third columns are the detection results of CenterPoint-2 and MFFFNet, respectively. Green boxes represent the ground truth, red boxes the detection results, and black dashed boxes the areas to focus on. Blue and orange circles indicate false positive and false negative results.
Figure 10: Comparison between MFFFNet and PointPillars-2. Rows (a,b) show two different scenes, laid out as in Figure 9, with PointPillars-2 and MFFFNet in the second and third columns.
Full article ">
23 pages, 7799 KiB  
Article
Multi-Scale Hybrid Network for Polyp Detection in Wireless Capsule Endoscopy and Colonoscopy Images
by Meryem Souaidi and Mohamed El Ansari
Diagnostics 2022, 12(8), 2030; https://doi.org/10.3390/diagnostics12082030 - 22 Aug 2022
Cited by 15 | Viewed by 2536
Abstract
The trade-off between speed and precision is a key step in the detection of small polyps in wireless capsule endoscopy (WCE) images. In this paper, we propose a hybrid network of an inception v4 architecture-based single-shot multibox detector (Hyb-SSDNet) to detect small polyp regions in both WCE and colonoscopy frames. Medical privacy concerns are considered the main barriers to WCE image acquisition. To satisfy the object detection requirements, we enlarged the training datasets and investigated deep transfer learning techniques. The Hyb-SSDNet framework adopts inception blocks to alleviate the inherent limitations of the convolution operation to incorporate contextual features and semantic information into deep networks. It consists of four main components: (a) multi-scale encoding of small polyp regions, (b) using the inception v4 backbone to enhance more contextual features in shallow and middle layers, and (c) concatenating weighted features of mid-level feature maps, giving them more importance to highly extract semantic information. Then, the feature map fusion is delivered to the next layer, followed by some downsampling blocks to generate new pyramidal layers. Finally, the feature maps are fed to multibox detectors, consistent with the SSD process-based VGG16 network. The Hyb-SSDNet achieved a 93.29% mean average precision (mAP) and a testing speed of 44.5 FPS on the WCE dataset. This work proves that deep learning has the potential to develop future research in polyp detection and classification tasks. Full article
Figures:
Figure 1: Framework of the traditional SSD.
Figure 2: Flowchart of the proposed Hyb-SSDNet method for small polyp detection in WCE images.
Figure 3: Modified inception structure.
Figure 4: The mid-fusion framework. Input feature maps are taken from two successive inception modules of the inception v4 network to encode contextual information; all scale sizes are 35 × 35 × 384. Extracted features are passed to the mSE-Network to generate score maps reflecting the importance of features at different positions and scales. Weighted features are then concatenated and normalized to complete the network process.
Figure 5: Overview of the Hyb-SSDNet architecture with a 299 × 299 × 3 input size and inception v4 as the backbone. Features from two successive modified inception-A layers (S_L1, S_L2) are fused by a mid-fusion block producing an intermediate feature representation (S'_HL).
Figure 6: Detailed structure of the Hyb-SSDNet network.
Figure 7: Examples of WCE polyp images (a–c).
Figure 8: Examples of CVC-ClinicDB polyp images (a–c).
Figure 9: Examples of ETIS-Larib polyp images (a–c).
Figure 10: Precision vs. recall for (a) the WCE test set, (b) the CVC-ClinicDB test set, and (c) the ETIS-Larib test set using the Hyb-SSDNet framework.
Figure 11: Qualitative comparison between FSSD300 (a,c,e,g) and the proposed Hyb-SSDNet (b,d,f,h) on the WCE polyp test set. Ground-truth boxes and predicted boxes with IoU of 0.5 or higher are drawn in green and red, respectively.
Figure 12: Qualitative comparison between FSSD300 (a,c,e,g) and the proposed Hyb-SSDNet (b,d,f,h) on the CVC-ClinicDB polyp test set, with the same colour convention as Figure 11.
Figure 13: Qualitative comparison between FSSD300 (a,c,e,g) and the proposed Hyb-SSDNet (b,d,f,h) on the ETIS-Larib polyp test set, with the same colour convention as Figure 11.
15 pages, 1882 KiB  
Article
Unsupervised Anomaly Detection in Printed Circuit Boards through Student–Teacher Feature Pyramid Matching
by Venkat Anil Adibhatla, Yu-Chieh Huang, Ming-Chung Chang, Hsu-Chi Kuo, Abhijeet Utekar, Huan-Chuang Chih, Maysam F. Abbod and Jiann-Shing Shieh
Electronics 2021, 10(24), 3177; https://doi.org/10.3390/electronics10243177 - 20 Dec 2021
Cited by 10 | Viewed by 4397
Abstract
Deep learning methods are currently used in industries to improve the efficiency and quality of the product. Detecting defects on printed circuit boards (PCBs) is a challenging task and is usually solved by automated visual inspection, automated optical inspection, manual inspection, and supervised learning methods, such as you only look once (YOLO) of tiny YOLO, YOLOv2, YOLOv3, YOLOv4, and YOLOv5. Previously described methods for defect detection in PCBs require large numbers of labeled images, which is computationally expensive in training and requires a great deal of human effort to label the data. This paper introduces a new unsupervised learning method for the detection of defects in PCB using student–teacher feature pyramid matching as a pre-trained image classification model used to learn the distribution of images without anomalies. Hence, we extracted the knowledge into a student network which had same architecture as the teacher network. This one-step transfer retains key clues as much as possible. In addition, we incorporated a multi-scale feature matching strategy into the framework. A mixture of multi-level knowledge from the features pyramid passes through a better supervision, known as hierarchical feature alignment, which allows the student network to receive it, thereby allowing for the detection of various sizes of anomalies. A scoring function reflects the probability of the occurrence of anomalies. This framework helped us to achieve accurate anomaly detection. Apart from accuracy, its inference speed also reached around 100 frames per second. Full article
(This article belongs to the Section Artificial Intelligence)
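Student–teacher feature pyramid matching scores anomalies by the discrepancy between teacher and student features at several scales. The sketch below shows a typical way to turn per-level discrepancies into a pixel-wise anomaly map; the paper's exact normalisation and image-level scoring may differ.

```python
import torch
import torch.nn.functional as F

def anomaly_map(teacher_feats, student_feats, out_size):
    """Combine per-level teacher/student feature discrepancies into a
    pixel-wise anomaly map.

    teacher_feats / student_feats: lists of (N, C_l, H_l, W_l) tensors taken
    from matching layers of the two networks; out_size: (H, W) of the input image.
    """
    score = torch.zeros(teacher_feats[0].shape[0], 1, *out_size,
                        device=teacher_feats[0].device)
    for t, s in zip(teacher_feats, student_feats):
        t = F.normalize(t, dim=1)                 # unit-length channel vectors
        s = F.normalize(s, dim=1)
        level = 0.5 * torch.sum((t - s) ** 2, dim=1, keepdim=True)   # (N, 1, H_l, W_l)
        score += F.interpolate(level, size=out_size,
                               mode="bilinear", align_corners=False)
    return score                                   # high values = likely anomaly

# image-level score: e.g. the maximum over the map
# img_score = anomaly_map(t_feats, s_feats, (256, 256)).amax(dim=(1, 2, 3))
```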
Figures:
Figure 1: Schematic overview of the student–teacher feature pyramid framework.
Figure 2: True positive PCB examples. (a) Example 1; (b) example 2; (c) example 3.
Figure 3: False negative PCB examples. (a) Example 1; (b) example 2; (c) example 3.
Figure 4: True negative PCB examples. (a) Example 1; (b) example 2; (c) example 3.
Figure 5: False positive PCB examples. (a) Example 1; (b) example 2; (c) example 3.
16 pages, 5749 KiB  
Article
Optimization of Action Recognition Model Based on Multi-Task Learning and Boundary Gradient
by Yiming Xu, Fangjie Zhou, Li Wang, Wei Peng and Kai Zhang
Electronics 2021, 10(19), 2380; https://doi.org/10.3390/electronics10192380 - 29 Sep 2021
Cited by 5 | Viewed by 1841
Abstract
Recently, people’s demand for action recognition has extended from the initial high classification accuracy to the high accuracy of the temporal action detection. It is challenging to meet the two requirements simultaneously. The key to behavior recognition lies in the quantity and quality of the extracted features. In this paper, a two-stream convolutional network is used. A three-dimensional convolutional neural network (3D-CNN) is used to extract spatiotemporal features from the consecutive frames. A two-dimensional convolutional neural network (2D-CNN) is used to extract spatial features from the key-frames. The integration of the two networks is excellent for improving the model’s accuracy and can complete the task of distinguishing the start–stop frame. In this paper, a multi-scale feature extraction method is presented to extract more abundant feature information. At the same time, a multi-task learning model is introduced. It can further improve the accuracy of classification via sharing the data between multiple tasks. The experimental result shows that the accuracy of the modified model is improved by 10%. Meanwhile, we propose the confidence gradient, which can optimize the distinguishing method of the start–stop frame to improve the temporal action detection accuracy. The experimental result shows that the accuracy has been enhanced by 11%. Full article
(This article belongs to the Section Computer Science & Engineering)
Figures:
Figure 1: Overall framework of action recognition.
Figure 2: The data processing, schematic diagram of channel fusion, and attention mechanism.
Figure 3: Spatial pyramid model.
Figure 4: Temporal pyramid model.
Figure 5: The multi-task model with sparse sharing of features.
Figure 6: Pipeline linking of actions.
Figure 7: The judgment of the time boundary.
Figure 8: The self-made dataset.
Figure 9: The relationship between the Batch and clip_duration settings and the F1_score.
Figure 10: Original pictures of multi-player actions.
Figure 11: The accurate locations of multi-player actions.
Figure 12: The predicted locations of multi-player actions.
Figure 13: The accuracy of temporal action detection.
19 pages, 7552 KiB  
Article
Crater Detection and Recognition Method for Pose Estimation
by Zihao Chen and Jie Jiang
Remote Sens. 2021, 13(17), 3467; https://doi.org/10.3390/rs13173467 - 1 Sep 2021
Cited by 7 | Viewed by 3695
Abstract
A crater detection and recognition algorithm is the key to pose estimation based on craters. Due to the changing viewing angle and varying height, the crater is imaged as an ellipse and the scale changes in the landing camera. In this paper, a robust and efficient crater detection and recognition algorithm for fusing the information of sequence images for pose estimation is designed, which can be used in both flying in orbit around and landing phases. Our method consists of two stages: stage 1 for crater detection and stage 2 for crater recognition. In stage 1, a single-stage network with dense anchor points (dense point crater detection network, DPCDN) is conducive to dealing with multi-scale craters, especially small and dense crater scenes. The fast feature-extraction layer (FEL) of the network improves detection speed and reduces network parameters without losing accuracy. We comprehensively evaluate this method and present state-of-art detection performance on a Mars crater dataset. In stage 2, taking the encoded features and intersection over union (IOU) of craters as weights, we solve the weighted bipartite graph matching problem, which is matching craters in the image with the previously identified craters and the pre-established craters database. The former is called “frame-frame match”, or FFM, and the latter is called “frame-database match”, or FDM. Combining the FFM with FDM, the recognition speed is enabled to achieve real-time on the CPU (25 FPS) and the average recognition precision is 98.5%. Finally, the recognition result is used to estimate the pose using the perspective-n-point (PnP) algorithm and results show that the root mean square error (RMSE) of trajectories is less than 10 m and the angle error is less than 1.5 degrees. Full article
(This article belongs to the Special Issue Cartography of the Solar System: Remote Sensing beyond Earth)
Show Figures

Graphical abstract

Graphical abstract
Full article ">Figure 1
<p>Crater detection and recognition system workflow. The whole workflow consists of two stages. In stage 1, a dense point crater detection network detects craters in frame k and frame k − 1. Then, in stage 2, the KM algorithm matches frame k’s craters with frame k − 1’s craters or with the pre-established database.</p>
Full article ">Figure 2
<p>Matching (<b>a</b>) with bounding box and (<b>b</b>) with the center of craters.</p>
Full article ">Figure 3
<p>(<b>a</b>) Architecture of DPCDN; (<b>b</b>) one point in feature map mapping to an anchor point in original image applied in P3, P4, and P5; (<b>c</b>) one point in feature map mapping to multiple anchor points in the original image, applied in P2.</p>
Full article ">Figure 4
<p>(<b>a</b>) Structure of FEL and Conv-CReLU. (<b>b</b>) Relationship between point (<span class="html-italic">x</span>,<span class="html-italic">y</span>) of feature map and point of original map. (<b>c</b>) Changes in centrality before and after dense anchor points.</p>
Full article ">Figure 5
<p>(<b>a</b>) Recognition workflow. (<b>b</b>) KM matching algorithm, where d_ij is the distance between craters, including the feature distance and IOU distance, Θ is for the craters, and C_i^k is the ith crater in the kth frame.</p>
Full article ">Figure 6
<p>Feature-encoding method.</p>
Full article ">Figure 7
<p>(<b>a</b>) State-transition diagram. (<b>b</b>) Frame k − 1 matching with frame k, using the Kalman filter to predict the crater’s position of frame k − 1. C^(k−1) is the craters’ state in the frame k − 1 and Ĉ^(k−1) is the prediction of C^(k−1) in frame k by the Kalman filter.</p>
Full article ">Figure 8
<p>(<b>a</b>) Bandeira Mars Crater Database. (<b>b</b>) Left, gazebo simulation environment; right, image captured by camera.</p>
Full article ">Figure 9
<p>(<b>a</b>–<b>d</b>) Ground-truth trajectories of Seq1–4.</p>
Full article ">Figure 10
<p>PR curve with or without dense anchor point.</p>
Full article ">Figure 11
<p>Craters on the Moon and Mars detected by DPCDN.</p>
Full article ">Figure 12
<p>Matching rate vs. pose noise and angle noise.</p>
Full article ">Figure 13
<p>Visualization of FFM results. The images of (<b>1-a</b>)–(<b>2-c</b>) are the experimental landing sequences simulated by Gazebo and (<b>3-a</b>)–(<b>3-c</b>) are the landing sequence of Chang’e 3. The yellow rectangles are the DPCDN detection result, and the rectangles with the same color in different images indicate the same crater.</p>
Full article ">Figure 14
<p>Result of projecting craters onto images by estimated pose.</p>
Full article ">Figure 15
<p>Ground-truth vs. estimate trajectories. The subfigures from the upper left to the lower right represent the estimated and ground truth trajectories of <span class="html-italic">x</span>, <span class="html-italic">y</span>, <span class="html-italic">z</span> direction, and rolling, pitch, and yaw, respectively. Dashed lines represent ground truth trajectories, and solid lines represent estimated trajectories.</p>
Full article ">Figure A1
<p>Bandeira Mars Crater Database. Yellow points in image are centers of craters.</p>
Full article ">Figure A2
<p>Lunar image from Chang’E-1 CCD stereo camera.</p>
Full article ">
4763 KiB  
Article
A Virtual Geographic Environment for Debris Flow Risk Analysis in Residential Areas
by Lingzhi Yin, Jun Zhu, Yi Li, Chao Zeng, Qing Zhu, Hua Qi, Mingwei Liu, Weilian Li, Zhenyu Cao, Weijun Yang and Pengcheng Zhang
ISPRS Int. J. Geo-Inf. 2017, 6(11), 377; https://doi.org/10.3390/ijgi6110377 - 22 Nov 2017
Cited by 26 | Viewed by 5211
Abstract
Emergency risk assessment of debris flows in residential areas is of great significance for disaster prevention and reduction, but the assessment has disadvantages, such as a low numerical simulation efficiency and poor capabilities of risk assessment and geographic knowledge sharing. Thus, this paper [...] Read more.
Emergency risk assessment of debris flows in residential areas is of great significance for disaster prevention and reduction, but the assessment has disadvantages, such as a low numerical simulation efficiency and poor capabilities of risk assessment and geographic knowledge sharing. Thus, this paper focuses on the construction of a VGE (virtual geographic environment) system that provides an efficient tool to support the rapid risk analysis of debris flow disasters. The numerical simulation, risk analysis, and 3D (three-dimensional) dynamic visualization of debris flow disasters were tightly integrated into the VGE system. Key technologies, including quantitative risk assessment, multiscale parallel optimization, and the visual representation of disaster information, are discussed in detail. The Qipan gully in Wenchuan County, Sichuan Province, China, was selected as the case area, and a prototype system was developed. Based on the multiscale parallel optimization experiments, a suitable scale was chosen for the numerical simulation of debris flow disasters. One simulation step took 5 ms (milliseconds), and the rendering rate was approximately 40 fps (frames per second). Information about the risk area, risk population, and risk roads under different conditions can be obtained quickly. The experimental results show that our approach can support real-time interactive analyses and can be used to share and publish geographic knowledge. Full article
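For a feel of how simulated flow fields can be turned into a discrete risk map, the sketch below grades each grid cell from flow depth and velocity; the depth-times-velocity intensity measure and the thresholds are illustrative assumptions, not the quantitative risk model used in the paper.

```python
# Illustrative sketch: map simulated flow depth and velocity grids onto
# discrete risk grades for a risk map. The intensity proxy and thresholds
# are assumptions for demonstration only.
import numpy as np

def risk_grades(depth, velocity, thresholds=(0.5, 2.0, 5.0)):
    """depth, velocity: 2D grids (m, m/s); returns grades 0-3 per cell."""
    intensity = depth * velocity                     # simple flow-intensity proxy
    grades = np.digitize(intensity, thresholds)      # 0 = low ... 3 = high
    grades[depth <= 0.0] = 0                         # dry cells carry no risk
    return grades

depth = np.random.default_rng(1).random((4, 4)) * 3.0     # stand-in simulation output
velocity = np.random.default_rng(2).random((4, 4)) * 4.0
print(risk_grades(depth, velocity))
```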
Show Figures

Figure 1

Figure 1
<p>Overall framework.</p>
Full article ">Figure 2
<p>Multiscale parallel optimization based on the OpenMP framework.</p>
Full article ">Figure 3
<p>Quantitative risk assessment of debris flow disasters.</p>
Full article ">Figure 4
<p>Visualization color design: (<b>a</b>) real-time flow depth and color mapping and (<b>b</b>) color grade designed in the risk map.</p>
Full article ">Figure 5
<p>Case area.</p>
Full article ">Figure 6
<p>Spatial distribution of the maximum flow velocities under different grid cell sizes.</p>
Full article ">Figure 7
<p>Spatial distribution of the maximum flow depths under different grid cell sizes.</p>
Full article ">Figure 8
<p>Risk map of the debris flow disaster in Qipan gully.</p>
Full article ">Figure 9
<p>3D visualization of disaster information.</p>
Full article ">