Search Results (128)

Search Parameters:
Keywords = spatial baseline optimization

24 pages, 9667 KiB  
Article
Coarse-to-Fine Structure and Semantic Learning for Single-Sample SAR Image Generation
by Xilin Wang, Bingwei Hui, Pengcheng Guo, Rubo Jin and Lei Ding
Remote Sens. 2024, 16(17), 3326; https://doi.org/10.3390/rs16173326 - 8 Sep 2024
Viewed by 560
Abstract
Synthetic Aperture Radar (SAR) enables the acquisition of high-resolution imagery even under severe meteorological and illumination conditions. Its utility is evident across a spectrum of applications, particularly in automatic target recognition (ATR). Since SAR samples are often scarce in practical ATR applications, there is an urgent need to develop sample-efficient augmentation techniques to augment the SAR images. However, most of the existing generative approaches require an excessive amount of training samples for effective modeling of the SAR imaging characteristics. Additionally, they show limitations in augmenting the interesting target samples while maintaining image recognizability. In this study, we introduce an innovative single-sample image generation approach tailored to SAR data augmentation. To closely approximate the target distribution across both the spatial layout and local texture, a multi-level Generative Adversarial Network (GAN) architecture is constructed. It comprises three distinct GANs that independently model the structural, semantic, and texture patterns. Furthermore, we introduce multiple constraints including prior-regularized noise sampling and perceptual loss optimization to enhance the fidelity and stability of the generation process. Comparative evaluations against the state-of-the-art generative methods demonstrate the superior performance of the proposed method in terms of generation diversity, recognizability, and stability. In particular, its advantages over the baseline method are up to 0.2 and 0.22 in the SIFID and SSIM, respectively. It also exhibits stronger robustness in the generation of images across varying spatial sizes. Full article
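The multi-level, coarse-to-fine scheme summarized above follows the general single-image GAN recipe: the coarsest generator produces the global layout from noise, and each finer generator refines an upsampled copy of the previous output. The sketch below illustrates only that sampling loop under assumed scales and placeholder generators; the structural/semantic/texture split across the three GANs, the prior-regularized noise sampling, and the perceptual loss are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineGenerator(nn.Module):
    """Placeholder per-scale generator: refines (upsampled image + noise)."""
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels * 2, width, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(width, width, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, upsampled, noise):
        # Residual refinement: predict a correction to the upsampled image.
        return upsampled + self.net(torch.cat([upsampled, noise], dim=1))

def coarse_to_fine_sample(generators, scale_sizes, noise_std=1.0, channels=3):
    """Run the generator pyramid from the coarsest scale to the finest one."""
    image = None
    for gen, (h, w) in zip(generators, scale_sizes):
        noise = noise_std * torch.randn(1, channels, h, w)
        if image is None:
            # Coarsest level: synthesize the global structure from noise alone.
            image = gen(torch.zeros(1, channels, h, w), noise)
        else:
            # Finer levels: upsample the previous result and refine it.
            upsampled = F.interpolate(image, size=(h, w), mode="bilinear",
                                      align_corners=False)
            image = gen(upsampled, noise)
    return image

if __name__ == "__main__":
    sizes = [(32, 32), (64, 64), (128, 128)]       # hypothetical pyramid
    gens = [RefineGenerator() for _ in sizes]      # untrained placeholders
    sample = coarse_to_fine_sample(gens, sizes)
    print(sample.shape)                            # torch.Size([1, 3, 128, 128])
```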
Show Figures

Figure 1. Network structure of the different generative models.
Figure 2. GAN Inversion and noise sampling with prior constraints.
Figure 3. An overview of the ‘coarse-to-fine’ image generation process. The leftmost figures represent the input data fed into the generator at each hierarchical level, while the central figures represent the synthesized outputs produced by the generator, i.e., the ‘fake’ images. The rightmost figures present the actual reference images, i.e., the ‘real’ images. The red dashed lines represent the upscaling operations.
Figure 4. Calculation of the LPIPS loss function.
Figure 5. Structure of the MSDA module.
Figure 6. A diversity vs. recognizability evaluation framework. Among the synthesized samples, those that exhibit intra-class variance within the decision boundaries demonstrate strong generalization (Box 1). In contrast, exact replicas (Box 2) and excessively diversified samples that surpass the boundaries (Box 3) exhibit poor generalization.
Figure 7. Comparison of the synthesized results in the ablation study. (a) Real images, (b) results of the StGAN, (c) results of StGAN and TeGAN, (d) results of the TeGAN and SeGAN, (e) results of the StGAN, TeGAN, and SeGAN.
Figure 8. Visualization of diversity vs. recognizability in the synthesized image samples. Each point represents one sample, while each method yields 70 image samples. The ‘st’, ‘te’ and ‘se’ refer to the StGAN, TeGAN, and SeGAN, respectively.
Figure 9. Comparison of synthesized results obtained with and without the use of the MSDA. (a) Real images, (b) the baseline method, (c) the proposed method without MSDA, (d) the proposed method with MSDA.
Figure 10. Visualization of diversity vs. recognizability in the synthetic results obtained with and without the use of the MSDA. Each point represents one sample, while each method yields 70 image samples.
Figure 11. Qualitative comparisons of small image samples with 256 × 256 pixels. (a) Real images, (b) results of the proposed method, (c) results of the ExSinGAN, (d) results of the SinGAN, (e) results of the InGAN, (f) results of the HP-VAEGAN.
Figure 12. Qualitative comparisons of small image samples with 400 × 400 pixels. (a) Real images, (b) results of the proposed method, (c) results of the ExSinGAN, (d) results of the SinGAN, (e) results of the InGAN, (f) results of the HP-VAEGAN.
Figure 13. Qualitative comparisons of small image samples with 800 × 800 pixels. (a) Real images, (b) results of the proposed method, (c) results of the ExSinGAN, (d) results of the SinGAN, (e) results of the InGAN, (f) results of the HP-VAEGAN.
Figure 14. The visualization of diversity vs. recognizability of the compared methods. The different colors indicate the samples generated by the different methods. The results are analyzed in the sample images with the spatial sizes of (a) 200 × 200, (b) 400 × 400, and (c) 800 × 800.
24 pages, 4199 KiB  
Article
Multi-Source Data-Driven Local-Global Dynamic Multi-Graph Convolutional Network for Bike-Sharing Demands Prediction
by Juan Chen and Rui Huang
Algorithms 2024, 17(9), 384; https://doi.org/10.3390/a17090384 - 1 Sep 2024
Viewed by 290
Abstract
The prediction of bike-sharing demand plays a pivotal role in the optimization of intelligent transportation systems, particularly amidst the COVID-19 pandemic, which has significantly altered travel behaviors and demand dynamics. In this study, we examine various spatiotemporal influencing factors associated with bike-sharing and propose the Local-Global Dynamic Multi-Graph Convolutional Network (LGDMGCN) model, driven by multi-source data, for multi-step prediction of station-level bike-sharing demand. In the temporal dimension, we dynamically model temporal dependencies by incorporating multiple sources of time semantic features such as confirmed COVID-19 cases, weather conditions, and holidays. Additionally, we integrate a time attention mechanism to better capture variations over time. In the spatial dimension, we consider factors related to the addition or removal of stations and utilize spatial semantic features, such as urban points of interest and station locations, to construct dynamic multi-graphs. The model utilizes a local-global structure to capture spatial dependencies among individual bike-sharing stations and all stations collectively. Experimental results, obtained through comparisons with baseline models on the same dataset and conducting ablation studies, demonstrate the feasibility and effectiveness of the proposed model in predicting bike-sharing demand. Full article
(This article belongs to the Special Issue AI Algorithms for Positive Change in Digital Futures)
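The graph-convolutional core of the model propagates station features over a normalized adjacency matrix before the temporal components take over. The layer below is a minimal, generic graph convolution for station-level features, not the LGDMGCN itself; the adjacency matrix, the feature dimensions, and the fusion of the multiple graphs (distance, POI, ride records) are placeholder assumptions.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolution step: H' = relu(D^-1/2 (A + I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, features, adjacency):
        n = adjacency.size(0)
        a_hat = adjacency + torch.eye(n)                 # add self-loops
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt         # symmetric normalization
        return torch.relu(self.linear(a_norm @ features))

if __name__ == "__main__":
    num_stations, in_dim = 50, 8
    x = torch.randn(num_stations, in_dim)                # e.g., recent demand features
    adj = (torch.rand(num_stations, num_stations) > 0.9).float()
    adj = torch.maximum(adj, adj.t())                    # make the toy graph symmetric
    layer = GraphConvLayer(in_dim, 16)
    print(layer(x, adj).shape)                           # torch.Size([50, 16])
```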
Show Figures

Figure 1. The mutual impact of demand between stations.
Figure 2. The dynamic changes in station locations within specific regions of Chicago.
Figure 3. The relationship between daily demand for shared bicycles and daily confirmed COVID-19 cases.
Figure 4. The impact of rainy or snowy weather on the demand for shared bicycles.
Figure 5. The architecture of LGDMGCN. {x_{t0−h+1}, x_{t0−h+2}, …, x_{t0}}: the input time series. Records: historical ride records between stations. POI: POIs of stations. Dis: distance between stations. {s_{t0−h+1}, s_{t0−h+2}, …, s_{t0}}: the feature sequence produced by the encoder. Attn: the output of Temporal Attention. {ŷ_{t0−h+1}, ŷ_{t0−h+2}, …, ŷ_{t0}}: the predictive sequence.
Figure 6. Local-Global Dynamic Spatiotemporal Graph Convolution Module.
Figure 7. Gated Temporal Convolution Module.
Figure 8. The metrics of different time granularities.
Figure 9. Dynamic adjacency matrix weighted heatmaps at different times.
Figure 10. Relationship between daily confirmed COVID-19 cases and predicted demand for bike-sharing.
Figure 11. Comparison of MAE for models across different periods.
22 pages, 8995 KiB  
Article
Chili Pepper Object Detection Method Based on Improved YOLOv8n
by Na Ma, Yulong Wu, Yifan Bo and Hongwen Yan
Plants 2024, 13(17), 2402; https://doi.org/10.3390/plants13172402 - 28 Aug 2024
Viewed by 440
Abstract
In response to the low accuracy and slow detection speed of chili recognition in natural environments, this study proposes a chili pepper object detection method based on the improved YOLOv8n. Evaluations were conducted among YOLOv5n, YOLOv6n, YOLOv7-tiny, YOLOv8n, YOLOv9, and YOLOv10 to select the optimal model. YOLOv8n was chosen as the baseline and improved as follows: (1) Replacing the YOLOv8 backbone with the improved HGNetV2 model to reduce floating-point operations and computational load during convolution. (2) Integrating the SEAM (spatially enhanced attention module) into the YOLOv8 detection head to enhance feature extraction capability under chili fruit occlusion. (3) Optimizing feature fusion using the dilated reparam block module in certain C2f (CSP bottleneck with two convolutions). (4) Substituting the traditional upsample operator with the CARAFE(content-aware reassembly of features) upsampling operator to further enhance network feature fusion capability and improve detection performance. On a custom-built chili dataset, the F0.5-score, mAP0.5, and mAP0.5:0.95 metrics improved by 1.98, 2, and 5.2 percentage points, respectively, over the original model, achieving 96.47%, 96.3%, and 79.4%. The improved model reduced parameter count and GFLOPs by 29.5% and 28.4% respectively, with a final model size of 4.6 MB. Thus, this method effectively enhances chili target detection, providing a technical foundation for intelligent chili harvesting processes. Full article
(This article belongs to the Section Plant Modeling)
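Of the reported metrics, the F0.5-score is the F-beta score with beta = 0.5, which weights precision more heavily than recall. A minimal computation from detection counts is sketched below; the counts are illustrative and not taken from the paper.

```python
def f_beta_score(true_positives: int, false_positives: int,
                 false_negatives: int, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 emphasizes precision, beta > 1 emphasizes recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

if __name__ == "__main__":
    # Hypothetical counts for a validation split of chili-fruit detections.
    print(round(f_beta_score(930, 25, 55, beta=0.5), 4))
```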
Show Figures

Figure 1. Comparison of loss curves for improved YOLOv8 and YOLOv8n.
Figure 2. Detection accuracy comparison for improved YOLOv8 and YOLOv8.
Figure 3. Comparison of weight for improved YOLOv8 and YOLOv8.
Figure 4. Model heat map visualization.
Figure 5. Comparison of detection effects of improved YOLOv8 and YOLOv8n. (a) Detection results of unobstructed chili pepper fruits; (b) Detection results of overlapping chili pepper fruits; (c) Detection results of chili pepper fruits obscured by leaves; (d) Detection results of mixed interference.
Figure 6. Sampling area map.
Figure 7. Different scenarios of chili pepper images: (a) unobstructed chili pepper fruits; (b) chili pepper fruits slightly obscured by leaves; (c) the chili pepper fruits are slightly obscured by branches; (d) the chili pepper fruits are heavily obscured by leaves; (e) the chili pepper fruits overlap; (f) mixed interference.
Figure 8. Labeling process.
Figure 9. Data augmentation of chili pepper images: (a) original image; (b) flipped image; (c) brightened image; (d) cropped image; (e) enhanced color image; (f) noisy image; (g) saturated image; (h) blurred image; (i) sharpened image.
Figure 10. C2f structure.
Figure 11. YOLOv8 head structure.
Figure 12. Improved YOLOv8 network structure. We replaced the YOLOv8 backbone network with the improved HGNetV2 and introduced the SEAM attention mechanism into the detection head. Some C2f layers were optimized to C2f-DRB. CARAFE upsampling operators replaced traditional upsampling operators. ConvModule stands for convolution + normalization + activation function; SPPF is the spatial pyramid pooling module; Maxpool2d is max pooling; Conv is convolution; Contact is the feature concatenation module; Bbox Loss and Cls Loss refer to bounding box loss and classification loss, respectively.
Figure 13. HGNetV2 structure.
Figure 14. HG block structure.
Figure 15. Improved HGNetV2 structure.
Figure 16. SEAM structure.
Figure 17. Improved YOLOv8 head.
Figure 18. Dilated reparam block network architecture.
Figure 19. C2f-DRB network architecture.
Figure 20. Comparison of loss functions.
24 pages, 9717 KiB  
Article
Automated Measurement of Cattle Dimensions Using Improved Keypoint Detection Combined with Unilateral Depth Imaging
by Cheng Peng, Shanshan Cao, Shujing Li, Tao Bai, Zengyuan Zhao and Wei Sun
Animals 2024, 14(17), 2453; https://doi.org/10.3390/ani14172453 - 23 Aug 2024
Viewed by 414
Abstract
Traditional measurement methods often rely on manual operations, which are not only inefficient but also cause stress to cattle, affecting animal welfare. Currently, non-contact cattle dimension measurement usually involves the use of multi-view images combined with point cloud or 3D reconstruction technologies, which are costly and less flexible in actual farming environments. To address this, this study proposes an automated cattle dimension measurement method based on an improved keypoint detection model combined with unilateral depth imaging. Firstly, YOLOv8-Pose is selected as the keypoint detection model and SimSPPF replaces the original SPPF to optimize spatial pyramid pooling, reducing computational complexity. The CARAFE architecture, which enhances upsampling content-aware capabilities, is introduced at the neck. The improved YOLOv8-pose achieves a mAP of 94.4%, a 2% increase over the baseline model. Then, cattle keypoints are captured on RGB images and mapped to depth images, where keypoints are optimized using conditional filtering on the depth image. Finally, cattle dimension parameters are calculated using the cattle keypoints combined with Euclidean distance, the Moving Least Squares (MLS) method, Radial Basis Functions (RBFs), and Cubic B-Spline Interpolation (CB-SI). The average relative errors for the body height, lumbar height, body length, and chest girth of the 23 measured beef cattle were 1.28%, 3.02%, 6.47%, and 4.43%, respectively. The results show that the method proposed in this study has high accuracy and can provide a new approach to non-contact beef cattle dimension measurement. Full article
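Once keypoints detected on the RGB frame are mapped onto the aligned depth image, each keypoint can be back-projected to 3D camera coordinates with the pinhole model, and a measure such as body height then reduces to a Euclidean distance between two 3D points. The sketch below shows only that back-projection and distance step; the camera intrinsics and keypoint values are made up, and the MLS, RBF, and B-spline steps used for chest girth are omitted.

```python
import numpy as np

def pixel_to_camera(u: float, v: float, depth_m: float,
                    fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project a pixel with known depth to 3D camera coordinates (meters)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

if __name__ == "__main__":
    # Hypothetical intrinsics for a depth camera and two detected keypoints:
    # the withers (top of the shoulder) and the hoof directly below it.
    fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
    withers = pixel_to_camera(300, 120, 2.45, fx, fy, cx, cy)
    hoof = pixel_to_camera(305, 430, 2.50, fx, fy, cx, cy)
    body_height_m = np.linalg.norm(withers - hoof)
    print(f"estimated body height: {body_height_m:.3f} m")
```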
Show Figures

Figure 1. Data collection and body measurement scenes. (a) Image acquisition environment. (b) Image acquisition equipment. (c) Measurement equipment. (d) Body scale measurement environment.
Figure 2. Dataset example graph.
Figure 3. Data Annotation.
Figure 4. Technology Roadmap.
Figure 5. Overall architecture of CARAFE.
Figure 6. SimSPPF network structure diagram.
Figure 7. Improved YOLOv8-pose network structure diagram.
Figure 8. Schematic diagram of pixel coordinates to world coordinates conversion.
Figure 9. Schematic diagram of body height and body length calculation.
Figure 10. Schematic diagram of lumbar height calculation.
Figure 11. Chest circumference calculation curve.
Figure 12. Keypoint prediction results. (a) In excellent lighting conditions; (b) with the cow standing; (c) with the cow bowing its head; (d) with the cow walking.
Figure 13. Normality test results: (a) indicates the results of body height; (b) indicates the results of lumbar height; (c) indicates the results of body length; (d) indicates the results of chest girth.
Figure 14. Body size measurement box plot results.
Figure 15. Results of keypoint detection and body size measurement under different noise conditions.
Figure 16. Keypoint detection and body measurement results at different distances.
Figure 17. Keypoint detection and body measurement results for different postures.
Figure 18. Non-contact body measurement system.
21 pages, 2501 KiB  
Article
RetinaViT: Efficient Visual Backbone for Online Video Streams
by Tomoyuki Suzuki and Yoshimitsu Aoki
Sensors 2024, 24(17), 5457; https://doi.org/10.3390/s24175457 - 23 Aug 2024
Viewed by 517
Abstract
In online video understanding, which has a wide range of real-world applications, inference speed is crucial. Many approaches involve frame-level visual feature extraction, which often represents the biggest bottleneck. We propose RetinaViT, an efficient method for extracting frame-level visual features in an online video stream, aiming to fundamentally enhance the efficiency of online video understanding tasks. RetinaViT is composed of efficiently approximated Transformer blocks that only take changed tokens (event tokens) as queries and reuse the already processed tokens from the previous timestep for the others. Furthermore, we restrict keys and values to the spatial neighborhoods of event tokens to further improve efficiency. RetinaViT involves tuning multiple parameters, which we determine through a multi-step process. During model training, we randomly vary these parameters and then perform black-box optimization to maximize accuracy and efficiency on the pre-trained model. We conducted extensive experiments on various online video recognition tasks, including action recognition, pose estimation, and object segmentation, validating the effectiveness of each component in RetinaViT and demonstrating improvements in the speed/accuracy trade-off compared to baselines. In particular, for action recognition, RetinaViT built on ViT-B16 reduces inference time by approximately 61.9% on the CPU and 50.8% on the GPU, while achieving slight accuracy improvements rather than degradation. Full article
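The efficiency gain comes from detecting which tokens changed since the previous frame and running attention with only those event tokens as queries, while cached outputs are reused for the rest. The sketch below shows that select-and-reuse step with a plain multi-head attention layer; the neighborhood restriction on keys and values, the per-block thresholds, and the black-box parameter search are omitted, and the threshold here is an arbitrary placeholder.

```python
import torch
import torch.nn as nn

def event_token_update(tokens, prev_tokens, prev_output, attn, threshold=0.5):
    """Recompute attention only for tokens that changed since the last frame.

    tokens, prev_tokens, prev_output: (1, num_tokens, dim) tensors.
    attn: nn.MultiheadAttention built with batch_first=True.
    """
    change = (tokens - prev_tokens).norm(dim=-1)[0]      # per-token change score
    event_mask = change > threshold                      # which tokens to refresh
    output = prev_output.clone()                         # reuse cached results
    if event_mask.any():
        queries = tokens[:, event_mask, :]               # event tokens as queries
        refreshed, _ = attn(queries, tokens, tokens)     # keys/values: all tokens
        output[:, event_mask, :] = refreshed
    return output, event_mask

if __name__ == "__main__":
    dim, num_tokens = 64, 196
    attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
    with torch.no_grad():                                # inference-time use case
        prev = torch.randn(1, num_tokens, dim)
        prev_out, _ = attn(prev, prev, prev)
        curr = prev.clone()
        curr[:, :10, :] += 2.0                           # simulate a local change
        out, mask = event_token_update(curr, prev, prev_out, attn)
    print(int(mask.sum()), "tokens refreshed")
```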
Show Figures

Figure 1. Overview of RetinaViT. Based on Vision Transformer (ViT) [13], RetinaViT converts an input frame image into tokens and processes them with a stack of Transformer blocks. The key difference is that RetinaViT detects tokens that have changed compared to those at the previous timestep in the same stage (block), referred to as event tokens. It then inputs only these event tokens as queries to the Transformer blocks for feature extraction, while reusing the previous tokens for the rest (represented by rectangles with red edges). For simplicity, this figure does not show the restriction of keys and values to the neighborhood of event tokens (see Figure 2 and Section 3.1 for details). This simple framework is task-agnostic, and RetinaViT can be used as the backbone for a wide range of online video recognition tasks.
Figure 2. Original Transformer block (top) and Retina block (bottom). Each rectangle represents a token. In the Retina block, only event tokens, i.e., tokens that have changed over time, are input as queries, and the previous information is reused for the rest. In addition, by restricting the context tokens to the spatial neighborhood of the event tokens, the computational cost is further reduced.
Figure 3. Sanity check on 50Salads [70]. We present trade-off curves between the accuracy and inference time on the CPU (left) and GPU (right). In the legends, the start and end points of each arrow represent training and inference strategies, respectively. “Origin” in each graph represents the result where no token selection is used (original ViT). We draw dashed arrows representing the improvements in the trade-off between our method (event-based -> event-based) and the corresponding “origin”.
Figure 4. Ablation results for the locations of event token detection on 50Salads [70]. We show trade-off curves between the accuracy and inference time on the CPU (left) and GPU (right). Note that we do not show “origin” (original ViT) in this figure to clarify the differences, but we have successfully improved the trade-off significantly compared to the “origin” as shown in Figure 3.
Figure 5. Ablation results for local context tokens and post-fine-tuning on 50Salads [70]. We show trade-off curves between the accuracy and inference time on the CPU (left) and GPU (right). Note that we do not show “origin” (original ViT) in this figure to clarify the differences, but we have successfully improved the trade-off significantly compared to the “origin” as shown in Figure 3.
Figure 6. Comparisons of the learning curves between RetinaViT-S16 and the original ViT-S16. The vertical axis represents the loss for each task, and the horizontal axis represents the number of epochs. In all datasets, the validation learning curves of RetinaViT are relatively stable.
Figure 7. Comparisons of the trade-off between the accuracy and inference time on 50Salads [70] (val). Inference time was measured on the CPU (left) and GPU (right). “Sw” and “Res” represent Swin Transformer [18] and ResNet [82], respectively. “DC” represents DeltaCNN [11], which we only use on the GPU as it does not support CPU inference. Dashed arrows represent the improvements in the trade-off between RetinaViT and the corresponding original ViT.
Figure 8. Comparisons of the trade-off between accuracy (PCK@0.2 [75]) and inference time on Sub-JHMDB [74] (val). Inference time was measured on the CPU (left) and GPU (right). “HR” represents HR-Net [83]. “DC” represents DeltaCNN [11], which we only use on the GPU as it does not support CPU inference.
Figure 9. Trade-off comparisons between the accuracy (G score [77]) and inference time on DAVIS17 [77] (test-dev). Inference time was measured on the CPU (left) and GPU (right). “DC” represents DeltaCNN [11], which we only use on the GPU as it does not support CPU inference.
Figure 10. Visualization of event scores, event tokens, and predictions on 50Salads [70]. The frames are arranged from left to right in time order, and each column represents the same timestep. “no drop” represents the prediction without dropping tokens (i.e., δ_l = 0 for all l) overlaid on the input frames. “drop” represents the event tokens in the fourth block, on which the corresponding prediction is overlaid. For visibility, we overlaid the locations of event tokens on the corresponding input RGB frames, where the non-event tokens are blacked out. The predictions are drawn in green if correct and in red if incorrect. “event score” represents the event scores as a heatmap at the input for the fourth block. Note that all tokens are processed in the first frame of each video clip, which is not depicted in the figure.
Figure 11. Visualization of event scores, event tokens, and predictions on Sub-JHMDB [84]. The frames are arranged from left to right in time order, and each column represents the same timestep. “no drop” represents the prediction (the locations of the key points) without dropping tokens (i.e., δ_l = 0 for all l), overlaid on the input frames. “drop” represents the event tokens in the fourth block, on which the corresponding prediction is overlaid. For visibility, we overlaid the locations of event tokens on the corresponding input RGB frames, where the non-event tokens are blacked out. “event score” represents the event scores as a heatmap at the input for the fourth block. Note that all tokens are processed in the first frame of each video clip, which is not depicted in the figure.
Figure 12. Visualization of event scores, event tokens, and predictions on DAVIS2017 [77]. The frames are arranged from left to right in time order, and each column represents the same timestep. “no drop” represents the prediction masks without dropping tokens (i.e., δ_l = 0 for all l), overlaid on the input frames. “drop” represents the event tokens in the fourth block, on which the corresponding prediction is overlaid. For visibility, we overlaid the locations of event tokens on the corresponding input RGB frames, where the non-event tokens are blacked out. The prediction masks are drawn in different colors for each instance. “event score” represents the event scores as a heatmap at the input for the fourth block. For the last two videos, we show two “drops” with different thresholds δ_l as examples where it is difficult to reduce computational costs while maintaining accuracy due to large camera motion. Note that all tokens are processed in the first frame of each video clip, which is not depicted in the figure.
13 pages, 2979 KiB  
Article
SMR–YOLO: Multi-Scale Detection of Concealed Suspicious Objects in Terahertz Images
by Yuan Zhang, Hao Chen, Zihao Ge, Yuying Jiang, Hongyi Ge, Yang Zhao and Haotian Xiong
Photonics 2024, 11(8), 778; https://doi.org/10.3390/photonics11080778 - 22 Aug 2024
Viewed by 456
Abstract
The detection of concealed suspicious objects in public places is a critical issue and a popular research topic. Terahertz (THz) imaging technology, as an emerging detection method, can penetrate materials without emitting ionizing radiation, providing a new approach to detecting concealed suspicious objects. This study focuses on the detection of concealed suspicious objects wrapped in different materials such as polyethylene and kraft paper, including items like scissors, pistols, and blades, using THz imaging technology. To address issues such as the lack of texture details in THz images and the contour similarity of different objects, which can lead to missed detections and false alarms, we propose a THz concealed suspicious object detection model based on SMR–YOLO (SPD_Mobile + RFB + YOLO). This model, based on the MobileNext network, introduces the spatial-to-depth convolution (SPD-Conv) module to replace the backbone network, reducing computational and parameter load. The inclusion of the receptive field block (RFB) module, which uses a multi-branch structure of dilated convolutions, enhances the network’s depth features. Using the EIOU loss function to assess the accuracy of predicted box localization further optimizes convergence speed and localization accuracy. Experimental results show that the improved model achieved mAP@0.5 and mAP@0.5:0.95 scores of 98.9% and 89.4%, respectively, representing improvements of 0.2% and 1.8% over the baseline model. Additionally, the detection speed reached 108.7 FPS, an improvement of 23.2 FPS over the baseline model. The model effectively identifies concealed suspicious objects within packages, offering a novel approach for detection in public places. Full article
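The EIOU loss mentioned above augments the IoU term with penalties on the center distance and on the width and height gaps between predicted and ground-truth boxes, each normalized by the smallest enclosing box. The function below follows the commonly published EIoU formulation for boxes in (x1, y1, x2, y2) format; it is a generic sketch, not the paper's implementation.

```python
import numpy as np

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss for two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Intersection and union for the IoU term.
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box: its diagonal, width, and height normalize the penalties.
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    cw, ch = ex2 - ex1, ey2 - ey1
    c2 = cw ** 2 + ch ** 2 + eps

    # Center-distance penalty plus separate width and height penalties.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    center_term = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / c2
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    w_term = (pw - tw) ** 2 / (cw ** 2 + eps)
    h_term = (ph - th) ** 2 / (ch ** 2 + eps)

    return 1.0 - iou + center_term + w_term + h_term

if __name__ == "__main__":
    predicted = np.array([12.0, 15.0, 58.0, 70.0])
    ground_truth = np.array([10.0, 10.0, 60.0, 65.0])
    print(round(float(eiou_loss(predicted, ground_truth)), 4))
```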
Show Figures

Figure 1. TeraFAST-256-300 system.
Figure 2. Real image of concealed object samples.
Figure 3. THz images of concealed objects. (a) Kraft paper box packaging and (b) polyethylene bag packaging.
Figure 4. NLM processing results. (a) Original THz image. (b) Image after NLM processing.
Figure 5. SMR–YOLO network structure.
Figure 6. Comparison of dilated convolutions with different dilation rates. (a) Dilated convolution with a dilation rate of 1; (b) dilated convolution with a dilation rate of 2; and (c) dilated convolution with a dilation rate of 4.
Figure 7. Structure of the RFB module.
Figure 8. Comparison of object detection results between SMR–YOLO and YOLOv7 models. (a) Object detection results of YOLOv7. (b) Object detection results of SMR–YOLO.
20 pages, 4393 KiB  
Article
Tool State Recognition Based on POGNN-GRU under Unbalanced Data
by Weiming Tong, Jiaqi Shen, Zhongwei Li, Xu Chu, Wenqi Jiang and Liguo Tan
Sensors 2024, 24(16), 5433; https://doi.org/10.3390/s24165433 - 22 Aug 2024
Viewed by 326
Abstract
Accurate recognition of tool state is important for maximizing tool life. However, the tool sensor data collected in real-life scenarios has unbalanced characteristics. Additionally, although graph neural networks (GNNs) show excellent performance in feature extraction in the spatial dimension of data, it is difficult to extract features in the temporal dimension efficiently. Therefore, we propose a tool state recognition method based on the Pruned Optimized Graph Neural Network-Gated Recurrent Unit (POGNN-GRU) under unbalanced data. Firstly, design the Improved-Majority Weighted Minority Oversampling Technique (IMWMOTE) by introducing an adaptive noise removal strategy and improving the MWMOTE to alleviate the unbalanced problem of data. Subsequently, propose a POG graph data construction method based on a multi-scale multi-metric basis and a Gaussian kernel weight function to solve the problem of one-sided description of graph data under a single metric basis. Then, construct the POGNN-GRU model to deeply mine the spatial and temporal features of the data to better identify the state of the tool. Finally, validation and ablation experiments on the PHM 2010 and HMoTP datasets show that the proposed method outperforms the other models in terms of identification, and the highest accuracy improves by 1.62% and 1.86% compared with the corresponding optimal baseline model. Full article
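The POG graph construction weights edges with a Gaussian kernel over pairwise distances between signal segments, so that similar segments are strongly connected and weak edges can be pruned. The snippet below shows that generic kernel-weighting step on a toy feature matrix; the multi-scale, multi-metric bases and the pruning and optimization specific to POGNN are not reproduced.

```python
import numpy as np

def gaussian_kernel_adjacency(features: np.ndarray, sigma: float,
                              threshold: float = 0.1) -> np.ndarray:
    """Weighted adjacency W[i, j] = exp(-d_ij^2 / (2 sigma^2)).

    Edges with weight below `threshold` are removed to keep the graph sparse.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist_sq = (diff ** 2).sum(axis=-1)                   # pairwise squared distances
    weights = np.exp(-dist_sq / (2.0 * sigma ** 2))
    weights[weights < threshold] = 0.0
    np.fill_diagonal(weights, 0.0)                       # no self-loops
    return weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segments = rng.normal(size=(12, 6))     # 12 signal segments, 6 features each
    adjacency = gaussian_kernel_adjacency(segments, sigma=2.0)
    print(adjacency.shape, f"{(adjacency > 0).mean():.2f} edge density")
```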
Show Figures

Figure 1. A POGNN-GRU-based model framework for tool state recognition under unbalanced data.
Figure 2. POGNN-GRU model framework.
Figure 3. (a) Wear variation curves of tools C1; (b) wear variation curves of tools C4; (c) wear variation curves of tools C6.
Figure 4. (a) Partial force data for single-cycle experiments during tool engagement; (b) Partial force data for single-cycle experiments during tool disengagement.
Figure 5. (a) Wear variation curves of T01; (b) wear variation curves of T02; (c) wear variation curves of T03.
Figure 6. Model training and testing results; (a) the result of PHM2010 dataset; (b) the result of HMoTP dataset.
Figure 7. Confusion Matrix: (a) Medium_Wear/Slight_Wear classification (no sampling) under PHM2010 dataset; (b) Medium_Wear/Severe_Wear classification (no sampling) under PHM2010 dataset; (c) Medium_Wear/Slight_Wear classification (with IMWMOTE) under PHM2010 dataset; (d) Medium_Wear/Severe_Wear classification (with IMWMOTE) under PHM2010 dataset; (e) Medium_Wear/Slight_Wear classification (no sampling) under HMoTP dataset; (f) Medium_Wear/Severe_Wear classification (no sampling) under HMoTP dataset; (g) Medium_Wear/Slight_Wear classification (with IMWMOTE) under HMoTP dataset; (h) Medium_Wear/Severe_Wear classification (with IMWMOTE) under HMoTP dataset.
Figure 8. The computational cost of different models.
Figure 9. The result of ablation experiments 1.
23 pages, 7110 KiB  
Article
Ship Detection in Synthetic Aperture Radar Images Based on BiLevel Spatial Attention and Deep Poly Kernel Network
by Siyuan Tian, Guodong Jin, Jing Gao, Lining Tan, Yuanliang Xue, Yang Li and Yantong Liu
J. Mar. Sci. Eng. 2024, 12(8), 1379; https://doi.org/10.3390/jmse12081379 - 12 Aug 2024
Viewed by 634
Abstract
Synthetic aperture radar (SAR) is a technique widely used in the field of ship detection. However, due to the high ship density, fore-ground-background imbalance, and varying target sizes, achieving lightweight and high-precision multiscale ship object detection remains a significant challenge. In response to these challenges, this research presents YOLO-MSD, a multiscale SAR ship detection method. Firstly, we propose a Deep Poly Kernel Backbone Network (DPK-Net) that utilizes the Optimized Convolution (OC) Module to reduce data redundancy and the Poly Kernel (PK) Module to improve the feature extraction capability and scale adaptability. Secondly, we design a BiLevel Spatial Attention Module (BSAM), which consists of the BiLevel Routing Attention (BRA) and the Spatial Attention Module. The BRA is first utilized to capture global information. Then, the Spatial Attention Module is used to improve the network’s ability to localize the target and capture high-quality detailed information. Finally, we adopt a Powerful-IoU (P-IoU) loss function, which can adjust to the ship size adaptively, effectively guiding the anchor box to achieve faster and more accurate detection. Using HRSID and SSDD as experimental datasets, mAP of 90.2% and 98.8% are achieved, respectively, outperforming the baseline by 5.9% and 6.2% with a model size of 12.3 M. Furthermore, the network exhibits excellent performance across various ship scales. Full article
(This article belongs to the Section Ocean Engineering)
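The Spatial Attention Module used inside the BSAM is commonly realized by pooling the feature map along the channel axis, convolving the pooled maps, and re-weighting the input with the resulting sigmoid mask. The block below is that generic, CBAM-style spatial attention step, shown for orientation only; it is not the paper's BSAM, and the BiLevel Routing Attention half is omitted.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Re-weight a feature map with a spatial mask built from channel pooling."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_pool = x.mean(dim=1, keepdim=True)            # (B, 1, H, W)
        max_pool = x.amax(dim=1, keepdim=True)            # (B, 1, H, W)
        mask = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * mask                                   # emphasize target regions

if __name__ == "__main__":
    features = torch.randn(2, 64, 40, 40)                 # toy neck feature map
    print(SpatialAttention()(features).shape)             # torch.Size([2, 64, 40, 40])
```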
Show Figures

Figure 1. The overall network structure of YOLO-MSD. The DPK-Net, including the OC module and PK module, is firstly constructed as the backbone network, then the BSAM is introduced at the neck, and finally, the P-IoU is introduced at the regression stage.
Figure 2. The structure of the DPK-Net. The backbone network extracts the critical characteristics of the input image, followed by the output of three different scale feature maps.
Figure 3. Detailed design of the OC module. (a) shows the structure of the OC Module. (b,c) show a detailed comparison between regular convolution and PConv.
Figure 4. Depthwise separable convolution.
Figure 5. The PK Module starts with a small kernel convolution for local data and then uses parallel DWConv for the multiscale context.
Figure 6. The structure of the BSAM attention mechanism.
Figure 7. The overall structure of BRA.
Figure 8. IoU-based losses. The loss functions described in (a) incorporate dimensional information, specifically using the diagonal length of the smallest enclosing bounding box (represented by the gray dashed box) for both the anchor and target boxes as the denominator in the loss calculation. Conversely, the P-IoU loss function outlined in (b) simplifies this approach by utilizing only the edge length of the target box as the denominator of its loss factor.
Figure 9. PR curves for ablation experiments: (a) is based on HRSID, and (b) is based on SSDD.
Figure 10. The loss curves of the proposed YOLO-MSD and the original YOLOv7-tiny model: (a) is based on HRSID, and (b) is based on SSDD.
Figure 11. Ship detection results for the HRSID.
Figure 12. Ship detection results for the SSDD.
Figure 13. Heat map of images at different scales and in different scenes. (a) small ship. (b) medium ship. (c) large ship. (d) offshore ships. (e) inshore ships. (f) dense inshore ships.
24 pages, 4485 KiB  
Article
Vessel Trajectory Prediction for Enhanced Maritime Navigation Safety: A Novel Hybrid Methodology
by Yuhao Li, Qing Yu and Zhisen Yang
J. Mar. Sci. Eng. 2024, 12(8), 1351; https://doi.org/10.3390/jmse12081351 - 8 Aug 2024
Viewed by 623
Abstract
The accurate prediction of vessel trajectory is of crucial importance in order to improve navigational efficiency, optimize routes, enhance the effectiveness of search and rescue operations at sea, and ensure maritime safety. However, the spatial interaction among vessels can have a certain impact on the prediction accuracy of the models. To overcome such a problem in predicting the vessel trajectory, this research proposes a novel hybrid methodology incorporating the graph attention network (GAT) and long short-term memory network (LSTM). The proposed GAT-LSTM model can comprehensively consider spatio-temporal features in the prediction process, which is expected to significantly improve the accuracy and robustness of the trajectory prediction. The Automatic Identification System (AIS) data from the surrounding waters of Xiamen Port is collected and utilized as the empirical case for model validation. The experimental results demonstrate that the GAT-LSTM model outperforms the best baseline model in terms of the reduction on the average displacement error and final displacement error, which are 44.52% and 56.20%, respectively. These improvements will translate into more accurate vessel trajectories, helping to minimize route deviations and improve the accuracy of collision avoidance systems, so that this research can effectively provide support for warning about potential collisions and reducing the risk of maritime accidents. Full article
(This article belongs to the Special Issue Risk Assessment in Maritime Transportation)
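The reported gains are expressed as reductions in average displacement error (ADE) and final displacement error (FDE): the mean Euclidean distance between predicted and observed positions over the prediction horizon, and the distance at the final predicted step. A small NumPy computation of both metrics on made-up trajectories is given below.

```python
import numpy as np

def displacement_errors(pred: np.ndarray, truth: np.ndarray):
    """ADE/FDE for trajectories shaped (timesteps, 2), e.g., projected (x, y)."""
    step_errors = np.linalg.norm(pred - truth, axis=-1)
    return step_errors.mean(), step_errors[-1]            # (ADE, FDE)

if __name__ == "__main__":
    # Hypothetical 6-step observed and predicted positions in a local x/y frame (km).
    truth = np.array([[0.0, 0.0], [0.4, 0.1], [0.8, 0.2],
                      [1.2, 0.3], [1.6, 0.4], [2.0, 0.5]])
    pred = truth + np.array([[0.00, 0.00], [0.03, -0.02], [0.05, 0.04],
                             [0.08, 0.05], [0.10, 0.07], [0.15, 0.10]])
    ade, fde = displacement_errors(pred, truth)
    print(f"ADE = {ade:.3f} km, FDE = {fde:.3f} km")
```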
Show Figures

Figure 1. Vessel interaction networks.
Figure 2. The schematic diagram of the adjacency matrix.
Figure 3. The overall structural framework of the GAT-LSTM model.
Figure 4. The structure of graph attention network. (a) Calculation of attention coefficient between node i and node j. (b) A diagram of a multi-head (K = 3) attention layer, with arrows of different colors representing independent attention calculations. The features of each head are concatenated to obtain h_1′.
Figure 5. The unit structure of long short-term memory.
Figure 6. The visualization of vessel trajectories in research waters.
Figure 7. The overall process of AIS data preprocessing.
Figure 8. Training procedure of the proposed model.
Figure 9. The loss curves of the baseline model and the proposed model. (a) LSTM model loss curve. (b) CNN-LSTM model loss curve. (c) GRU model loss curve. (d) Seq2seq model loss curve. (e) GAT-LSTM model loss curve.
Figure 10. Visualization results of vessel trajectory prediction models under different navigation scenarios. (a,b) The trajectory prediction results of vessel sailing in the same direction. (c) The trajectory prediction results when the vessel turns after sailing straight for a period and then quickly returns to a straight course. (d) The trajectory prediction results of vessel after turning in open waters. (e) The trajectory prediction results as the vessel turns towards an anchorage. (f) The trajectory prediction results in a narrow channel after a turn.
Figure 11. Comparison of predictive performance between baseline models and proposed model.
Figure 12. (a) The vessel trajectory prediction results of the head-on situation. (b) The risk result of the head-on situation.
Figure 13. (a) The vessel trajectory prediction results of the overtaking situation. (b) The risk result of the overtaking situation.
16 pages, 2033 KiB  
Article
Deciphering Optimal Radar Ensemble for Advancing Sleep Posture Prediction through Multiview Convolutional Neural Network (MVCNN) Approach Using Spatial Radio Echo Map (SREM)
by Derek Ka-Hei Lai, Andy Yiu-Chau Tam, Bryan Pak-Hei So, Andy Chi-Ho Chan, Li-Wen Zha, Duo Wai-Chi Wong and James Chung-Wai Cheung
Sensors 2024, 24(15), 5016; https://doi.org/10.3390/s24155016 - 2 Aug 2024
Cited by 1 | Viewed by 586
Abstract
Assessing sleep posture, a critical component in sleep tests, is crucial for understanding an individual’s sleep quality and identifying potential sleep disorders. However, monitoring sleep posture has traditionally posed significant challenges due to factors such as low light conditions and obstructions like blankets. The use of radar technology could be a potential solution. The objective of this study is to identify the optimal quantity and placement of radar sensors to achieve accurate sleep posture estimation. We invited 70 participants to assume nine different sleep postures under blankets of varying thicknesses. This was conducted in a setting equipped with a baseline of eight radars—three positioned at the headboard and five along the side. We proposed a novel technique for generating radar maps, Spatial Radio Echo Map (SREM), designed specifically for data fusion across multiple radars. Sleep posture estimation was conducted using a Multiview Convolutional Neural Network (MVCNN), which serves as the overarching framework for the comparative evaluation of various deep feature extractors, including ResNet-50, EfficientNet-50, DenseNet-121, PHResNet-50, Attention-50, and Swin Transformer. Among these, DenseNet-121 achieved the highest accuracy, scoring 0.534 and 0.804 for nine-class coarse- and four-class fine-grained classification, respectively. This led to further analysis on the optimal ensemble of radars. For the radars positioned at the head, a single left-located radar proved both essential and sufficient, achieving an accuracy of 0.809. When only one central head radar was used, omitting the central side radar and retaining only the three upper-body radars resulted in accuracies of 0.779 and 0.753, respectively. This study established the foundation for determining the optimal sensor configuration in this application, while also exploring the trade-offs between accuracy and the use of fewer sensors. Full article
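A multiview CNN of the kind described above typically runs each radar's SREM (one 'view') through a shared feature extractor, fuses the per-view features with an element-wise pooling step, and classifies the fused descriptor. The sketch below shows that shared-extractor and view-pooling pattern with a deliberately small CNN; the backbones actually compared in the paper (ResNet-50, DenseNet-121, Swin Transformer, and others) and the SREM generation itself are not shown.

```python
import torch
import torch.nn as nn

class MultiViewClassifier(nn.Module):
    """Shared per-view CNN + max-pooling over views + posture classifier."""
    def __init__(self, num_classes: int = 9):
        super().__init__()
        self.view_cnn = nn.Sequential(          # shared across all radar views
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, views):                   # views: (batch, num_views, 1, H, W)
        b, v = views.shape[:2]
        feats = self.view_cnn(views.flatten(0, 1))        # (b * v, 32)
        feats = feats.view(b, v, -1).amax(dim=1)          # max-pool across views
        return self.classifier(feats)

if __name__ == "__main__":
    maps = torch.randn(4, 8, 1, 64, 64)   # 4 samples x 8 radar SREMs (toy sizes)
    logits = MultiViewClassifier()(maps)
    print(logits.shape)                   # torch.Size([4, 9])
```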
Show Figures

Figure 1. Radar placement around the bed. S1–S5 denote radar sensors arranged from cranial (S1) to caudal direction. HL, HC, and HR denote radar positions at the left, center, and right of the headboard.
Figure 2. Illustration of the nine sleep postures with three blanket conditions (thick, medium, and thin). The postures are: supine (S); left lateral side lying with both legs extended (L. Log); left lateral side lying at a half-stomach position (L. Sto); left lateral side lying at a fetal position (L. Fet); right lateral side lying (R. Log); right lateral side lying at a half-stomach position (R. Sto); right lateral side lying at a fetal position (R. Fet.); prone position with head turned left (L. Pr.); and prone position with head turned right (R. Pr.). The no-blanket condition is displayed for illustration and not included in the dataset.
Figure 3. An illustration of Spatial Radar Echo Maps (SREMs) in all radars.
Figure 4. Model architecture of MVCNN with different feature extractors.
13 pages, 2883 KiB  
Article
Hybrid Integrated Wearable Patch for Brain EEG-fNIRS Monitoring
by Boyu Li, Mingjie Li, Jie Xia, Hao Jin, Shurong Dong and Jikui Luo
Sensors 2024, 24(15), 4847; https://doi.org/10.3390/s24154847 - 25 Jul 2024
Viewed by 593
Abstract
Synchronous monitoring electroencephalogram (EEG) and functional near-infrared spectroscopy (fNIRS) have received significant attention in brain science research for their provision of more information on neuro-loop interactions. There is a need for an integrated hybrid EEG-fNIRS patch to synchronously monitor surface EEG and deep brain fNIRS signals. Here, we developed a hybrid EEG-fNIRS patch capable of acquiring high-quality, co-located EEG and fNIRS signals. This patch is wearable and provides easy cognition and emotion detection, while reducing the spatial interference and signal crosstalk by integration, which leads to high spatial–temporal correspondence and signal quality. The modular design of the EEG-fNIRS acquisition unit and optimized mechanical design enables the patch to obtain EEG and fNIRS signals at the same location and eliminates spatial interference. The EEG pre-amplifier on the electrode side effectively improves the acquisition of weak EEG signals and significantly reduces input noise to 0.9 μVrms, amplitude distortion to less than 2%, and frequency distortion to less than 1%. Detrending, motion correction algorithms, and band-pass filtering were used to remove physiological noise, baseline drift, and motion artifacts from the fNIRS signal. A high fNIRS source switching frequency configuration above 100 Hz improves crosstalk suppression between fNIRS and EEG signals. The Stroop task was carried out to verify its performance; the patch can acquire event-related potentials and hemodynamic information associated with cognition in the prefrontal area. Full article
(This article belongs to the Special Issue Sensors for Physiological Monitoring and Digital Health)
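The ΔHbO2/ΔHbR trends referred to in this work are conventionally obtained from the raw light intensities via the modified Beer-Lambert law: optical-density changes at two wavelengths are mapped to concentration changes by inverting a 2x2 extinction-coefficient matrix scaled by the source-detector distance and a differential pathlength factor. The snippet below illustrates that standard calculation; the extinction coefficients, DPF values, and intensities are placeholder numbers, not the device's calibration.

```python
import numpy as np

def mbll_concentration_change(intensity, baseline, extinction, distance_cm, dpf):
    """Modified Beer-Lambert law for two wavelengths.

    intensity, baseline: measured and reference intensities per wavelength.
    extinction: 2x2 matrix [[eps_HbO(l1), eps_HbR(l1)], [eps_HbO(l2), eps_HbR(l2)]].
    Returns (delta_HbO, delta_HbR) in the units implied by the extinction matrix.
    """
    delta_od = np.log10(np.asarray(baseline) / np.asarray(intensity))
    path = distance_cm * np.asarray(dpf)                  # effective path per wavelength
    delta_c = np.linalg.solve(extinction, delta_od / path)
    return delta_c[0], delta_c[1]

if __name__ == "__main__":
    # Placeholder extinction coefficients for two NIR wavelengths (illustrative only).
    eps = np.array([[0.4, 1.1],
                    [1.0, 0.7]])
    d_hbo, d_hbr = mbll_concentration_change(
        intensity=[0.98, 0.95], baseline=[1.0, 1.0],
        extinction=eps, distance_cm=3.0, dpf=[6.0, 5.5])
    print(f"dHbO = {d_hbo:.4f}, dHbR = {d_hbr:.4f}")
```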
Show Figures

Figure 1. (a) Overall system architecture. (b) Layout of EEG and fNIRS sensors. (c) Positioning structure of EEG electrodes, LEDs, and PDs.
Figure 2. (a) The system circuit block diagram. (b) The proposed acquisition board. (c) The barrel, cap, and optical filter to fix the LED and PD. (d) The EEG-fNIRS acquisition module of the proposed patch. (e) Front and back view of the entire patch.
Figure 3. (a) EEG input-referred noise in no-LED flashing condition and LED flashing condition; (b) EEG input-referred noise spectrum in LED flashing condition; (c) EEG amplitude distortion measurement; (d) EEG frequency distortion measurement; and (e) ΔHbO2, ΔHbR trend in forearm block experiment.
Figure 4. (a) Schematic diagram of incongruent and congruent trial. (b) Experimental paradigm for Stroop task.
Figure 5. (a) Raw EEG data of Fp1 and Fp2; (b) ERP results in a trial; (c) average amplitude of three ERP components in Fp1 and Fp2; (d) comparison of original fNIRS signal and preprocessed fNIRS signal; and (e) the ΔHbO2, ΔHbR, and ΔHbT values of the brain Fp1 point in the Stroop task.
0 pages, 3739 KiB  
Article
A Lightweight and Efficient Multi-Type Defect Detection Method for Transmission Lines Based on DCP-YOLOv8
by Yong Wang, Linghao Zhang, Xingzhong Xiong, Junwei Kuang and Siyu Xiang
Sensors 2024, 24(14), 4491; https://doi.org/10.3390/s24144491 - 11 Jul 2024
Viewed by 730
Abstract
Currently, the intelligent defect detection of massive grid transmission line inspection pictures using AI image recognition technology is an efficient and popular method. Usually, there are two technical routes for the construction of defect detection algorithm models: one is to use a lightweight network, which improves the efficiency, but it can generally only target a few types of defects and may reduce the detection accuracy; the other is to use a complex network model, which improves the accuracy, and can identify multiple types of defects at the same time, but it has a large computational volume and low efficiency. To maintain the model’s high detection accuracy as well as its lightweight structure, this paper proposes a lightweight and efficient multi-type defect detection method for transmission lines based on DCP-YOLOv8. The method employs deformable convolution (C2f_DCNv3) to enhance the defect feature extraction capability, and designs a re-parameterized cross phase feature fusion structure (RCSP) to optimize and fuse high-level semantic features with low-level spatial features, thus improving the capability of the model to recognize defects at different scales while significantly reducing the model parameters; additionally, it combines the dynamic detection head and deformable convolutional v3’s detection head (DCNv3-Dyhead) to enhance the feature expression capability and the utilization of contextual information to further improve the detection accuracy. Experimental results show that on a dataset containing 20 real transmission line defects, the method increases the average accuracy (mAP@0.5) to 72.2%, an increase of 4.3%, compared with the lightest baseline YOLOv8n model; the number of model parameters is only 2.8 M, a reduction of 9.15%, and the number of processed frames per second (FPS) reaches 103, which meets the real-time detection demand. In the scenario of multi-type defect detection, it effectively balances detection accuracy and performance with quantitative generalizability. Full article
(This article belongs to the Section Electronic Sensors)
Show Figures

Figure 1

Figure 1
<p>Detailed architecture of the proposed DCP-YOLOv8.</p>
Full article ">Figure 2
<p>Structure diagram of the Deformable Convolutions v3.</p>
Full article ">Figure 3
<p>Structure of the C2f_DCNv3.</p>
Full article ">Figure 4
<p>Structure of the RCSP.</p>
Full article ">Figure 5
<p>Structure of the DCNv3-Dyhead.</p>
Full article ">Figure 6
<p>Distribution of the number of defective samples in the training set.</p>
Full article ">Figure 7
<p>Analysis during training; (<b>a</b>) represents the box loss curve, (<b>b</b>) represents the classification loss curve, and (<b>c</b>) represents the map curve changes of YOLOv8n and DCP-YOLOv8.</p>
Full article ">Figure 8
<p>Some examples of defect detection effects.</p>
Full article ">Figure 8 Cont.
<p>Some examples of defect detection effects.</p>
Full article ">Figure 9
<p>Robustness analysis of DCP-YOLOv8 in complex environments.</p>
Full article ">Figure 9 Cont.
<p>Robustness analysis of DCP-YOLOv8 in complex environments.</p>
Full article ">Figure 10
<p>Some examples of defective target heat maps.</p>
Full article ">Figure 10 Cont.
<p>Some examples of defective target heat maps.</p>
Full article ">
30 pages, 11600 KiB  
Article
A Performance and Data-Driven Method for Optimization of Traditional Courtyards
by Zhixin Xu, Xia Huang, Xin Zheng, Ji-Yu Deng and Bo Sun
Sustainability 2024, 16(13), 5779; https://doi.org/10.3390/su16135779 - 6 Jul 2024
Viewed by 875
Abstract
As urbanization and rapid industrialization accelerate, rural areas face increasing pressure on resources and the environment, leading to challenges such as energy waste and reduced comfort. Traditional village planning and design methods are based on economic benefits and often lack consideration of climate [...] Read more.
As urbanization and rapid industrialization accelerate, rural areas face increasing pressure on resources and the environment, leading to challenges such as energy waste and reduced comfort. Traditional village planning and design methods are based on economic benefits and often lack consideration of climate adaptability. To address these issues, a comprehensive assessment of building and courtyard performance should be introduced early in the planning of traditional villages, so that buildings adapt better to their climatic conditions. Introducing relevant performance indicators, such as outdoor comfort, indoor lighting, and building energy consumption, at the initial design stage is therefore crucial. This article employs performance-based multi-objective optimization algorithms and machine learning techniques to investigate the design workflow of courtyards and their combinations, with the goal of improving planners’ design efficiency by integrating data-driven and performance-driven methods. The results show that during the performance-driven phase, adjusting the spatial morphology and architectural parameters significantly improved courtyard performance compared with the baseline model: energy efficiency increased by 32.3%, the physiological equivalent temperature (PET) comfort time ratio improved by 8.3% in winter and by 3.8% in summer. During the data-driven phase, the classification prediction accuracy for courtyard performance reaches 83%, with an F1 score of 0.81. The project validation phase further showed that the performance of different plans can be verified quickly: compared with the site’s original condition, the design solutions raised the performance score from 59.12 to 85.62. In summary, this workflow improves the efficiency of the interaction between design decisions and performance evaluation in the conceptual stage of village planning, providing a solid foundation for developing subsequent solutions. Full article
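A minimal sketch of the data-driven phase described above (classifying courtyard performance and evaluating accuracy and F1 under 5-fold cross-validation, cf. the XGBoost and cross-validation figures below) might look as follows, assuming scikit-learn and xgboost are available. The feature names, synthetic data, and model settings are placeholders, not the study’s dataset or tuned configuration.

```python
# Sketch: predict a courtyard performance class from design parameters with
# XGBoost and 5-fold cross-validation. Data and settings are hypothetical.
import numpy as np
from sklearn.model_selection import cross_validate
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
# Hypothetical design parameters per candidate courtyard, e.g., width/depth
# ratio, building height, orientation, window-to-wall ratio, ...
X = rng.random((300, 6))
y = rng.integers(0, 3, size=300)  # hypothetical performance class labels

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, random_state=0)
scores = cross_validate(model, X, y, cv=5, scoring=["accuracy", "f1_macro"])
print("mean accuracy:", scores["test_accuracy"].mean())
print("mean macro F1:", scores["test_f1_macro"].mean())
```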
Show Figures

Figure 1

Figure 1
<p>Overall workflow.</p>
Full article ">Figure 2
<p>Location of the research object (marked by the five-pointed star in the figure).</p>
Full article ">Figure 3
<p>Distribution of testing points (the numbers next to the circles indicate the order of the measurement points).</p>
Full article ">Figure 4
<p>On-site photo of the testing instruments.</p>
Full article ">Figure 5
<p>Comparison of simulated and measured outdoor air temperatures.</p>
Full article ">Figure 6
<p>Comparison of simulated and measured globe temperatures.</p>
Full article ">Figure 7
<p>Comparison of simulated and measured relative humidity.</p>
Full article ">Figure 8
<p>Comparison of simulated and measured wind speeds.</p>
Full article ">Figure 9
<p>Aerial View of the Measured Courtyard.</p>
Full article ">Figure 10
<p>Courtyard Classification Diagram.</p>
Full article ">Figure 11
<p>PET comfort time ratio for typical summer and winter weeks.</p>
Full article ">Figure 12
<p>Baseline Model and Site Environment settings.</p>
Full article ">Figure 13
<p>Layout corresponding to the 59th generation Pareto front solution set.</p>
Full article ">Figure 14
<p>Courtyard layout optimization based on different optimization goals.</p>
Full article ">Figure 15
<p>Performance simulation results of the optimal courtyard unit selected by the average of fitness ranks. Red: floor-normalized electric equipment energy for the building (monthly); orange: floor-normalized electric lighting energy (monthly); light blue: floor-normalized heating load (monthly); dark blue: floor-normalized cooling load (monthly).</p>
Full article ">Figure 16
<p>PET comfort time and wind speed distribution during typical summer and winter weeks. (three-courtyard houses).</p>
Full article ">Figure 17
<p>PET comfort time and wind speed distribution during typical summer and winter weeks. (four-courtyard houses).</p>
Full article ">Figure 18
<p>PET comfort time and wind speed distribution during typical summer and winter weeks. (five-courtyard houses).</p>
Full article ">Figure 19
<p>Statistical results of the PET comfort time ratio for courtyard combinations (left: typical winter week; right: typical summer week; see below). Red indicates the maximum value in each row.</p>
Full article ">Figure 20
<p>Statistical results of the area ratio of the quiet wind zone for courtyard combinations. Red on the left marks the maximum value in each row; red on the right marks the minimum.</p>
Full article ">Figure 21
<p>Correlation analysis between the PET comfort time ratio and the area ratio of the quiet wind zone.</p>
Full article ">Figure 22
<p>Schematic diagram of 5-fold cross-validation.</p>
Full article ">Figure 23
<p>Schematic diagram of the XGBoost regression tree model.</p>
Full article ">Figure 24
<p>The general plan of different schemes.</p>
Full article ">
16 pages, 3904 KiB  
Article
Research on Zoning and Carbon Sink Enhancement Strategies for Ecological Spaces in Counties with Different Landform Types
by Jianfeng Li, Yang Zhang, Longfei Xia, Jing Wang, Huping Ye, Siqi Liu and Zhuoying Zhang
Sustainability 2024, 16(13), 5700; https://doi.org/10.3390/su16135700 - 3 Jul 2024
Viewed by 752
Abstract
Ecological carbon sinks, pivotal in mitigating carbon emissions, are indispensable for climate change mitigation. Counties, as the fundamental units of ecological space management, directly impact the achievement of regional dual carbon targets through their levels of carbon sink. However, existing research has overlooked [...] Read more.
Ecological carbon sinks are pivotal in offsetting carbon emissions and indispensable for climate change mitigation. Counties, as the fundamental units of ecological space management, directly affect the achievement of regional dual carbon targets through their carbon sink levels. However, existing research has overlooked the intricate relationship between terrain features and ecological spaces, leaving a lack of specific guidance on enhancing the carbon sink of counties with diverse landform characteristics. This study focused on Jingbian County (Loess Plateau), Fuping County (Guanzhong Plain), and Chenggu County (Qinba Mountains), each with distinct landform characteristics, and proposes a comprehensive identification model for ecological space within the context of dual carbon targets. Using this model, the land use structure, carbon sink potential, and ecological spatial patterns of the different counties were systematically analyzed. The results indicated substantial disparities in land use structure, carbon sink capabilities, and ecological space distributions among counties with different landform types. Specifically, Jingbian County was predominantly covered by grassland, exhibiting a moderate overall carbon sink capacity, with baseline ecological spaces playing a significant role. Conversely, Fuping County, dominated by cultivated land and construction land, exhibited the lowest carbon sink capacity, with non-ecological spaces accounting for 85.93%. Chenggu County, on the other hand, was dominated by forestland, which contributed nearly all of its carbon sink, and core ecological spaces occupied a leading position. Tailored optimization strategies are recommended based on the varying terrain features: Jingbian County should prioritize ecosystem restoration and conservation, while Fuping County should concentrate on optimizing land use structure and promoting urban greening. Reinforcing the carbon sink capacity of existing ecosystems is crucial for Chenggu County. This study broadens the perspective on ecological space optimization and provides scientific guidance and pragmatic insights tailored to regional disparities, which are instrumental in assisting various regions to achieve their dual carbon targets. Full article
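County-level carbon sink comparisons of the kind summarized above are typically obtained with a coefficient method: each land-use area is multiplied by a per-area carbon sink coefficient and the products are summed. The sketch below shows only that arithmetic; all areas and coefficients are placeholder values, not figures from the study.

```python
# Coefficient-based carbon sink accounting (illustrative values only).
land_use_area_ha = {          # hypothetical county land-use areas (ha)
    "forestland": 12000.0,
    "grassland": 45000.0,
    "cultivated land": 30000.0,
    "construction land": 8000.0,
}
sink_coeff_tC_per_ha = {      # hypothetical carbon sink coefficients (t C/ha/yr)
    "forestland": 0.64,
    "grassland": 0.02,
    "cultivated land": 0.007,
    "construction land": 0.0,
}

total_sink = sum(area * sink_coeff_tC_per_ha[lu] for lu, area in land_use_area_ha.items())
print(f"Estimated annual carbon sink: {total_sink:.1f} t C")
```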
(This article belongs to the Topic Energy Economics and Sustainable Development)
Show Figures

Figure 1

Figure 1
<p>The locations of the study areas.</p>
Full article ">Figure 2
<p>Multivariate comprehensive identification model of ecological space under the dual carbon targets.</p>
Full article ">Figure 3
<p>The distribution pattern of land use. (<b>a</b>) Jingbian County. (<b>b</b>) Fuping County. (<b>c</b>) Chenggu County.</p>
Full article ">Figure 4
<p>Distribution proportions of ecological spaces in counties with different landform types.</p>
Full article ">Figure 5
<p>Distribution patterns of ecological spaces in counties with different landform types. (<b>a</b>) Jingbian County. (<b>b</b>) Fuping County. (<b>c</b>) Chenggu County.</p>
Full article ">
21 pages, 4061 KiB  
Article
A Lightweight Crop Pest Classification Method Based on Improved MobileNet-V2 Model
by Hongxing Peng, Huiming Xu, Guanjia Shen, Huanai Liu, Xianlu Guan and Minhui Li
Agronomy 2024, 14(6), 1334; https://doi.org/10.3390/agronomy14061334 - 20 Jun 2024
Viewed by 627
Abstract
This paper proposes PestNet, a lightweight method for classifying crop pests, which improves upon MobileNet-V2 to address the high model complexity and low classification accuracy commonly found in pest classification research. Firstly, the training phase employs the AdamW optimizer and mixup data augmentation [...] Read more.
This paper proposes PestNet, a lightweight method for classifying crop pests, which improves upon MobileNet-V2 to address the high model complexity and low classification accuracy commonly found in pest classification research. Firstly, the training phase employs the AdamW optimizer and mixup data augmentation to enhance the model’s convergence and generalization capabilities. Secondly, the Adaptive Spatial Group-Wise Enhanced (ASGE) attention mechanism is introduced and integrated into the inverted residual blocks of the MobileNet-V2 model, boosting the model’s ability to extract both local and global pest information. Additionally, a dual-branch feature fusion module is developed using convolutional kernels of varying sizes to enhance classification performance for pests of different scales under real-world conditions. Lastly, the model’s activation function and overall architecture are optimized to reduce complexity. Experimental results on a proprietary pest dataset show that PestNet achieves a classification accuracy of 87.62% and an F1 score of 86.90%, improvements of 4.20 and 5.86 percentage points over the baseline model. Moreover, PestNet’s parameter count and floating-point operations are reduced by 14.10% and 37.50%, respectively, compared to the baseline. Compared with ResNet-50, MobileNet V3-Large, and EfficientNet-B1, PestNet requires fewer parameters and floating-point operations while achieving higher pest classification accuracy. Full article
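Among the training techniques listed above, mixup is straightforward to sketch: each image/label pair in a batch is blended with a randomly permuted pair using a Beta-distributed weight. The batch shapes, alpha value, and 37-class label space below are illustrative assumptions, not PestNet’s exact training configuration.

```python
# Minimal mixup sketch in PyTorch (illustrative settings, not the paper's).
import torch
import torch.nn.functional as F

def mixup_batch(images: torch.Tensor, labels: torch.Tensor,
                num_classes: int, alpha: float = 0.2):
    """Blend each image/label pair with a randomly permuted pair."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    one_hot = F.one_hot(labels, num_classes).float()
    mixed_labels = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed_images, mixed_labels

if __name__ == "__main__":
    imgs = torch.randn(8, 3, 224, 224)   # dummy batch of pest images
    lbls = torch.randint(0, 37, (8,))    # 37 classes, as in the Pest37 dataset
    mixed_imgs, mixed_lbls = mixup_batch(imgs, lbls, num_classes=37)
    print(mixed_imgs.shape, mixed_lbls.shape)
```

The resulting soft labels would then be used with a cross-entropy-style loss that accepts label distributions.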
(This article belongs to the Section Precision and Digital Agriculture)
Show Figures

Figure 1

Figure 1
<p>Some examples of images of rice gall midges from the IP102 dataset.</p>
Full article ">Figure 2
<p>Some examples of images of pests from the Pest37 dataset.</p>
Full article ">Figure 3
<p>The core structure of the MobileNet-V2.</p>
Full article ">Figure 4
<p>The structure of SGE and ASGE attention mechanisms.</p>
Full article ">Figure 5
<p>SGE attention mechanism insertion points in the inverted residual blocks.</p>
Full article ">Figure 6
<p>The four structures of the dual-branch feature fusion module (DFFM).</p>
Full article ">Figure 7
<p>Bottleneck_ASGE_DFFM.</p>
Full article ">Figure 8
<p>The function graph of GELU.</p>
Full article ">Figure 9
<p>The structure of PestNet.</p>
Full article ">Figure 10
<p>The variation of model loss.</p>
Full article ">Figure 11
<p>The class activation maps of models with introduced attention mechanisms.</p>
Full article ">Figure 12
<p>The prediction results and corresponding class activation maps of PestNet and MobileNet-V2.</p>
Full article ">Figure 13
<p>The accuracy of each model during iterations on the Pest37 validation set.</p>
Full article ">