Search Results (869)

Search Parameters:
Keywords = point cloud feature extraction

23 pages, 8895 KiB  
Article
Automated 3D Image Processing System for Inspection of Residential Wall Spalls
by Junjie Wang, Yunfang Pang and Xinyu Teng
Appl. Sci. 2025, 15(4), 2140; https://doi.org/10.3390/app15042140 - 18 Feb 2025
Viewed by 2
Abstract
Continuous spalling exposure can weaken the performance of structures. Therefore, the development of methods for detecting wall spall damage remains essential in the field of Structural Health Monitoring (SHM). Currently, researchers mainly rely on 2D information for spall detection and predominantly use manual data collection methods in the complex environment of residential buildings, which are usually inefficient. To address this challenge, an automated 3D image processing system for wall spalls is proposed in this study. First, UGV path planning was performed to collect information about defects in the surrounding environment. Second, to address the shortcomings of RandLA-Net, a dynamically enhanced dual-branch structure is established, on which consistency constraints are introduced, a lightweight attention module is added, and the loss function is optimized to strengthen the model's ability to extract point cloud features. Finally, spalls are quantitatively evaluated to determine the damage to buildings. The results show that RandLA-Spall achieves 94.71% Recall and 84.20% mIoU on the test set, improvements of 4.25% and 5.37%, respectively. This study achieves an integrated process using a lightweight device that efficiently extracts and quantifies spalling defects and provides a valuable reference for SHM.
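The figure list below suggests the lightweight attention module is a CBAM-style block (channel attention followed by spatial attention). Purely as an illustration of that kind of block, and not the RandLA-Spall implementation, a minimal PyTorch sketch operating on per-point feature tensors of assumed shape (batch, channels, points) with a guessed reduction ratio might look like this:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: squeeze the point dimension, re-weight channels."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                      # x: (B, C, N) per-point features
        avg = self.mlp(x.mean(dim=2))          # global average pooling over points
        mx = self.mlp(x.max(dim=2).values)     # global max pooling over points
        return x * torch.sigmoid(avg + mx).unsqueeze(-1)

class SpatialAttention(nn.Module):
    """SAM: re-weight individual points from channel statistics."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(stats))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.cam, self.sam = ChannelAttention(channels), SpatialAttention()

    def forward(self, x):
        return self.sam(self.cam(x))

feats = torch.randn(2, 64, 4096)   # a batch of per-point feature maps
print(CBAM(64)(feats).shape)       # torch.Size([2, 64, 4096])
```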
(This article belongs to the Section Civil Engineering)
Figures:
Figure 1: Residential wall spall inspection system.
Figure 2: Process of residential wall spall inspections.
Figure 3: Example of test scenario selection.
Figure 4: Data acquisition UGV.
Figure 5: Example of path planning and dataset creation.
Figure 6: Comparison of dense reconstruction.
Figure 7: Dataset structure.
Figure 8: RandLA-Spall network architecture.
Figure 9: Example of data enhancement.
Figure 10: RandLA-Spall residual module.
Figure 11: CBAM module structure: (a) CAM, (b) SAM, and (c) CBAM.
Figure 12: Comparison of experimental indicators.
Figure 13: Semantic segmentation results.
Figure 14: Larger-scale structures semantic segmentation results.
Figure 15: Comparison of segmentation indicators.
Figure 16: Comparison of ablation experiment indicators.
Figure 17: Sample example.
17 pages, 7393 KiB  
Article
Laser Stripe Centerline Extraction Method for Deep-Hole Inner Surfaces Based on Line-Structured Light Vision Sensing
by Huifu Du, Daguo Yu, Xiaowei Zhao and Ziyang Zhou
Sensors 2025, 25(4), 1113; https://doi.org/10.3390/s25041113 - 12 Feb 2025
Viewed by 297
Abstract
This paper proposes a point cloud post-processing method based on the minimum spanning tree (MST) and depth-first search (DFS) to extract laser stripe centerlines from the complex inner surfaces of deep holes. Addressing the limitations of traditional image processing methods, which are affected by burrs and low-frequency random noise, this method utilizes 360° structured light to illuminate the inner wall of the deep hole. A sensor captures laser stripe images, and the Steger algorithm is employed to extract sub-pixel point clouds. Subsequently, an MST is used to construct the point cloud connectivity structure, while DFS is applied for path search and noise removal to enhance extraction accuracy. Experimental results demonstrate that this method significantly improves extraction accuracy, with a dice similarity coefficient (DSC) approaching 1 and a maximum Hausdorff distance (HD) of 3.3821 pixels, outperforming previous methods. This study provides an efficient and reliable solution for the precise extraction of complex laser stripes and lays a solid data foundation for subsequent feature parameter calculations and 3D reconstruction.
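As a rough, non-authoritative sketch of the MST-plus-DFS idea described above (SciPy, the neighbour count k, and the synthetic stripe standing in for Steger sub-pixel points are all assumptions): connect the centre points with a minimum spanning tree over a k-nearest-neighbour graph, then keep the longest path found by depth-first search and treat everything off that path as noise.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def longest_path_in_mst(points, k=8):
    """Build an MST over a k-NN graph of the centre points and return the points
    on the longest simple path (the stripe centreline); off-path points = noise."""
    n = len(points)
    dist, idx = cKDTree(points).query(points, k=k + 1)   # neighbour 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    graph = coo_matrix((dist[:, 1:].ravel(), (rows, idx[:, 1:].ravel())), shape=(n, n))
    mst = minimum_spanning_tree(graph).tocoo()

    adj = [[] for _ in range(n)]
    for i, j in zip(mst.row, mst.col):
        adj[i].append(j)
        adj[j].append(i)

    def dfs_farthest(start):
        """Iterative DFS returning the farthest node and the path leading to it."""
        best_path, stack, seen = [start], [(start, [start])], {start}
        while stack:
            node, path = stack.pop()
            if len(path) > len(best_path):
                best_path = path
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append((nb, path + [nb]))
        return best_path[-1], best_path

    end_a, _ = dfs_farthest(0)        # two DFS passes approximate the tree diameter
    _, path = dfs_farthest(end_a)
    return points[path]

rng = np.random.default_rng(0)
stripe = np.column_stack([np.linspace(0, 100, 200),
                          10 * np.sin(np.linspace(0, 3, 200))])
cloud = np.vstack([stripe, rng.uniform(0, 100, (10, 2))])   # add spurious noise points
print(len(longest_path_in_mst(cloud)), "points kept on the main path")
```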
Figures:
Figure 1: Laser stripe image features and geometric centerline extraction in a smooth deep hole: stripe contour image, 3D grayscale distribution, Steger-extracted centerline, and locally magnified view.
Figure 2: The same analysis for a petal-shaped structure.
Figure 3: The same analysis for an internal gear.
Figure 4: The same analysis for a rectangular spline.
Figure 5: The same analysis for an internal octagon.
Figure 6: Minimum spanning tree generation example.
Figure 7: DFS path search example in MST.
Figure 8: Extraction results of the petal-shaped laser stripe centerline.
Figure 9: Extraction results of the internal gear laser stripe centerline.
Figure 10: Extraction results of the rectangular spline laser stripe centerline.
Figure 11: Extraction results of the internal octagonal laser stripe centerline.
Figure 12: Laser stripe centerline extraction results based on MST and DFS compared with manual extraction results.
21 pages, 7839 KiB  
Article
High-Throughput 3D Rice Chalkiness Detection Based on Micro-CT and VSE-UNet
by Zhiqi Cai, Yangjun Deng, Xinghui Zhu, Bo Li, Chenglin Xu and Donghui Li
Agronomy 2025, 15(2), 450; https://doi.org/10.3390/agronomy15020450 - 12 Feb 2025
Viewed by 264
Abstract
Rice is a staple food for nearly half the global population and, with rising living standards, the demand for high-quality grain is increasing. Chalkiness, a key determinant of appearance quality, requires accurate detection for effective quality evaluation. While traditional 2D imaging has been used for chalkiness detection, its inherent inability to capture complete 3D morphology limits its suitability for precision agriculture and breeding. Although micro-CT has shown promise in 3D chalk phenotype analysis, high-throughput automated 3D detection for multiple grains remains a challenge, hindering practical applications. To address this, we propose a high-throughput 3D chalkiness detection method using micro-CT and VSE-UNet. Our method begins with non-destructive 3D imaging of grains using micro-CT. For the accurate segmentation of kernels and chalky regions, we propose VSE-UNet, an improved VGG-UNet with an SE attention mechanism for enhanced feature learning. Through comprehensive training optimization strategies, including the Dice focal loss function and dropout technique, the model achieves robust and accurate segmentation of both kernels and chalky regions in continuous CT slices. To enable high-throughput 3D analysis, we developed a unified 3D detection framework integrating isosurface extraction, point cloud conversion, DBSCAN clustering, and Poisson reconstruction. This framework overcomes the limitations of single-grain analysis, enabling simultaneous multi-grain detection. Finally, 3D morphological indicators of chalkiness are calculated using triangular mesh techniques. Experimental results demonstrate significant improvements in both 2D segmentation (7.31% improvement in chalkiness IoU, 2.54% in mIoU, 2.80% in mPA) and 3D phenotypic measurements, with VSE-UNet achieving more accurate volume and dimensional measurements compared with the baseline. These improvements provide a reliable foundation for studying chalkiness formation and enable high-throughput phenotyping.
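The multi-grain separation step names DBSCAN clustering of the kernel point cloud; a minimal sketch of just that step, assuming scikit-learn and toy Gaussian blobs standing in for the segmented kernels (eps and min_samples are guesses, not the paper's settings):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy stand-in for the kernel point cloud extracted from segmented CT slices:
# five well-separated blobs, one per grain (units are arbitrary voxels).
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [30, 0, 0], [0, 30, 0], [30, 30, 0], [15, 15, 25]])
points = np.vstack([c + rng.normal(scale=3.0, size=(800, 3)) for c in centers])

# DBSCAN groups densely connected points; each cluster is one kernel,
# and label -1 marks sparse outliers.
labels = DBSCAN(eps=4.0, min_samples=20).fit_predict(points)

for k in sorted(set(labels) - {-1}):
    grain = points[labels == k]
    extent = grain.max(axis=0) - grain.min(axis=0)   # crude bounding-box size proxy
    print(f"grain {k}: {len(grain)} points, bbox extent {np.round(extent, 1)}")
```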
(This article belongs to the Section Precision and Digital Agriculture)
Figures:
Figure 1: Rice grain X-ray micro-CT imaging and annotation process: imaging principle, raw micro-CT slices, and manually annotated slices.
Figure 2: Image annotation: original micro-CT image and annotation results (black = background, red = rice kernel, green = chalky areas).
Figure 3: Architecture of VSE-UNet (SE = squeeze-and-excitation attention mechanism).
Figure 4: Structure of the SE attention mechanism.
Figure 5: Unified 3D chalkiness detection framework: original micro-CT scan, VSE-UNet segmentation, mask image of kernel and chalkiness regions, 3D surface reconstruction of five kernels, point cloud of five kernels, single-kernel point cloud, and single-kernel surface reconstruction.
Figure 6: Comparison of 2D segmentation results: original micro-CT image, ground truth, VGG-UNet, and VSE-UNet.
Figure 7: Comparison of 3D surface models reconstructed from ground truth, VGG-UNet, and VSE-UNet segmentations.
Figure 8: Comparison of segmented 3D point cloud models (ground truth, VGG-UNet, VSE-UNet).
Figure 9: Point cloud models of a single kernel and its chalkiness region (ground truth, VGG-UNet, VSE-UNet).
Figure 10: Surface models of a single kernel reconstructed from the corresponding point cloud models (ground truth, VGG-UNet, VSE-UNet).
18 pages, 39910 KiB  
Article
DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM
by Fan Zhu, Yifan Zhao, Ziyu Chen, Chunmao Jiang, Hui Zhu and Xiaoxi Hu
Remote Sens. 2025, 17(4), 625; https://doi.org/10.3390/rs17040625 - 12 Feb 2025
Viewed by 426
Abstract
Visual SLAM is widely applied in robotics and remote sensing. The fusion of Gaussian radiance fields and Visual SLAM has demonstrated astonishing efficacy in constructing high-quality dense maps. While existing methods perform well in static scenes, they are prone to the influence of dynamic objects in real-world dynamic environments, thus making robust tracking and mapping challenging. We introduce DyGS-SLAM, a Visual SLAM system that employs dual constraints to achieve high-fidelity static map reconstruction in dynamic environments. We extract ORB features within the scene, and use open-world semantic segmentation models and multi-view geometry to construct dual constraints, forming a zero-shot dynamic information elimination module while recovering backgrounds occluded by dynamic objects. Furthermore, we select high-quality keyframes and use them for loop closure detection and global optimization, constructing a foundational Gaussian map through a set of determined point clouds and poses and integrating repaired frames for rendering new viewpoints and optimizing 3D scenes. Experimental results on the TUM RGB-D, Bonn, and Replica datasets, as well as real scenes, demonstrate that our method has excellent localization accuracy and mapping quality in dynamic scenes.
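Judging from the Figure 4 caption below, the geometric half of the dual constraint flags a map point as dynamic when the depth measured in the current frame (d') is much smaller than the depth predicted by reprojection (d_proj). A minimal NumPy sketch of that single check, with assumed interfaces (a 4x4 relative pose, a depth image, pinhole intrinsics) and an arbitrary ratio threshold:

```python
import numpy as np

def dynamic_point_mask(pts_kf, depth_cur, T_cur_kf, K, ratio=0.7):
    """Flag keyframe points as dynamic when the measured depth d' in the current
    frame is much smaller than the reprojected depth d_proj (something moved in
    front of the expected surface). pts_kf: (N, 3) points in keyframe camera
    coordinates; T_cur_kf: 4x4 relative pose; depth_cur: (H, W) depth image; K: 3x3."""
    pts_h = np.hstack([pts_kf, np.ones((len(pts_kf), 1))])
    pts_cur = (T_cur_kf @ pts_h.T).T[:, :3]           # points in the current camera frame
    d_proj = pts_cur[:, 2]                            # depth predicted by the geometry
    uv = (K @ pts_cur.T).T
    uv = np.round(uv[:, :2] / uv[:, 2:3]).astype(int)

    h, w = depth_cur.shape
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h) & (d_proj > 0)
    d_meas = np.full(len(pts_kf), np.inf)
    d_meas[valid] = depth_cur[uv[valid, 1], uv[valid, 0]]
    return valid & (d_meas < ratio * d_proj)          # dynamic if d' << d_proj

K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1]])
depth = np.full((480, 640), 3.0)          # flat wall 3 m away
depth[200:280, 300:380] = 1.0             # a person walked into view
pts = np.array([[0.0, 0.0, 3.0], [0.5, 0.3, 3.0]])   # keyframe points on the wall
print(dynamic_point_mask(pts, depth, np.eye(4), K))   # [ True False]
```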
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
Figures:
Figure 1: System framework of DyGS-SLAM. The tracking thread performs dynamic object removal and background inpainting; the mapping thread reconstructs the Gaussian map and performs differentiable rendering from determined poses and point clouds; the 3D scene is then optimized from the repaired and rendered frames.
Figure 2: Open-world semantic segmentation model.
Figure 3: RGB images from the TUM RGB-D dataset (frames 690 and 765); red boxes mark a chair being moved, which is semantically static but actually moving.
Figure 4: A keyframe feature point p projected onto the current frame as p', with O and O' the camera optical centers of the two frames: the point is static when d' = d_proj and dynamic when d' << d_proj.
Figure 5: Frame comparison between Dyna-SLAM and DyGS-SLAM (ours) after repair of the walking_halfsphere sequence (TUM RGB-D); red boxes compare how each method repairs the same frame.
Figure 6: Camera trajectories estimated by ORB-SLAM3 and DyGS-SLAM (ours) on the TUM dataset and their differences from ground truth.
Figure 7: Mapping comparison of NICE-SLAM, SplaTAM, and DyGS-SLAM (ours) on the walking_xyz sequence.
Figure 8: Detailed comparison of the original reconstructed scene provided by Bonn and the scene reconstructed by our method; red boxes mark reconstruction details.
Figure 9: Reconstruction comparison between SplaTAM and DyGS-SLAM (ours) on the Bonn dataset; our method shows better reconstruction quality.
Figure 10: Mapping comparison of NICE-SLAM, SplaTAM, DyGS-SLAM, and ground truth on the Replica dataset; red boxes mark details, and our method also reconstructs static scenes well.
Figure 11: Experimental results in real scenarios: input image, segmentation, background repair, and novel view synthesis; red boxes mark the recovered static background.
Figure 12: Effect of background inpainting on DyGS-SLAM scene reconstruction, with and without inpainting; red boxes mark the differences.
23 pages, 5392 KiB  
Article
A Sliding Window-Based CNN-BiGRU Approach for Human Skeletal Pose Estimation Using mmWave Radar
by Yuquan Luo, Yuqiang He, Yaxin Li, Huaiqiang Liu, Jun Wang and Fei Gao
Sensors 2025, 25(4), 1070; https://doi.org/10.3390/s25041070 - 11 Feb 2025
Viewed by 308
Abstract
In this paper, we present a low-cost, low-power millimeter-wave (mmWave) skeletal joint localization system. High-quality point cloud data are generated using the self-developed BHYY_MMW6044 59–64 GHz mmWave radar device. A sliding window mechanism is introduced to extend the single-frame point cloud into multi-frame time-series data, enabling the full utilization of temporal information. This is combined with convolutional neural networks (CNNs) for spatial feature extraction and a bidirectional gated recurrent unit (BiGRU) for temporal modeling. The proposed spatio-temporal information fusion framework for multi-frame point cloud data fully exploits spatio-temporal features, effectively alleviates the sparsity issue of radar point clouds, and significantly enhances the accuracy and robustness of pose estimation. Experimental results demonstrate that the proposed system accurately detects 25 skeletal joints, particularly improving the positioning accuracy of fine joints, such as the wrist, thumb, and fingertip, highlighting its potential for widespread application in human–computer interaction, intelligent monitoring, and motion analysis.
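A minimal sketch of the sliding-window step only: pad each sparse radar frame to a fixed point count and stack consecutive frames into overlapping clips, which would then feed the CNN (spatial) and BiGRU (temporal) stages. The window length, point count, and zero-padding scheme are assumptions, not the paper's settings.

```python
import numpy as np

def pad_frame(frame, n_points=64):
    """Zero-pad (or truncate) one sparse radar frame to a fixed number of points."""
    out = np.zeros((n_points, frame.shape[1]), dtype=frame.dtype)
    out[:min(len(frame), n_points)] = frame[:n_points]
    return out

def sliding_windows(frames, window=8, step=1):
    """Stack consecutive frames into overlapping clips of shape
    (window, n_points, n_features) so temporal context is preserved."""
    clips = [np.stack(frames[s:s + window])
             for s in range(0, len(frames) - window + 1, step)]
    return np.stack(clips)

rng = np.random.default_rng(1)
# Fake stream: 30 frames with a varying number of (x, y, z, doppler, intensity) points.
stream = [pad_frame(rng.normal(size=(rng.integers(10, 50), 5))) for _ in range(30)]
print(sliding_windows(stream).shape)   # (23, 8, 64, 5)
```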
(This article belongs to the Section Radar Sensors)
Figures:
Figure 1: FMCW radar transmit and receive waveforms.
Figure 2: FMCW radar RX antenna array and phase relationship.
Figure 3: List and locations of 25 skeletal points.
Figure 4: Overall flowchart of the human skeletal pose estimation system based on mmWave radar and CNN-BiGRU.
Figure 5: Multi-frame point cloud temporal modeling based on sliding windows.
Figure 6: Spatio-temporal information fusion network architecture based on CNN-BiGRU.
Figure 7: mmWave radar structure.
Figure 8: Experimental setup with one radar and one Kinect.
Figure 9: Experimental environment.
Figure 10: Average MAE for 25 human skeletal joints (MARS dataset).
Figure 11: Average RMSE for 25 human skeletal joints (MARS dataset).
Figure 12: Average MAE for 25 human skeletal joints (self-built dataset).
Figure 13: Average RMSE for 25 human skeletal joints (self-built dataset).
Figure 14: CNN-BiGRU reconstruction of human skeletal joints from point clouds: radar point cloud, CNN-BiGRU prediction, and ground truth (left to right) for five movements, top to bottom: left upper limb stretch, double upper limb stretch, left front lunge, right front lunge, and left lunge (self-built dataset).
Figure 15: Average localization error for 25 human skeletal joints under different step values.
20 pages, 3024 KiB  
Article
Building Lightweight 3D Indoor Models from Point Clouds with Enhanced Scene Understanding
by Minglei Li, Mingfan Li, Min Li and Leheng Xu
Remote Sens. 2025, 17(4), 596; https://doi.org/10.3390/rs17040596 - 10 Feb 2025
Viewed by 349
Abstract
Indoor scenes often contain complex layouts and interactions between objects, making 3D modeling of point clouds inherently difficult. In this paper, we design a divide-and-conquer modeling method that accounts for the structural differences between indoor walls and internal objects. To achieve semantic understanding, we propose an effective 3D instance segmentation module using a deep network, Indoor3DNet, combined with super-point clustering, which provides a larger receptive field and maintains the continuity of individual objects. Indoor3DNet includes an efficient point feature extraction backbone that handles different object granularities well. In addition, we use a geometric primitives-based modeling approach to generate lightweight polygonal facets for walls, and a cross-modal registration technique to fit the corresponding instance models for internal objects based on their semantic labels. This modeling method restores correct geometric shapes and topological relationships while maintaining a very lightweight structure. We have tested the method on diverse datasets, and the experimental results demonstrate that it outperforms the state-of-the-art in terms of performance and robustness.
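One plausible way to obtain planar primitives for the wall models is iterative RANSAC plane fitting; the sketch below uses Open3D's segment_plane purely as an illustration of that generic step (the paper's own primitive estimation is not reproduced here), with synthetic wall points and guessed thresholds.

```python
import numpy as np
import open3d as o3d

# Toy wall-like cloud: two perpendicular planes with a little measurement noise.
rng = np.random.default_rng(0)
wall_a = np.column_stack([rng.uniform(0, 5, 4000), np.zeros(4000), rng.uniform(0, 3, 4000)])
wall_b = np.column_stack([np.zeros(4000), rng.uniform(0, 5, 4000), rng.uniform(0, 3, 4000)])
pts = np.vstack([wall_a, wall_b]) + rng.normal(scale=0.01, size=(8000, 3))

pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))

# Peel off planar primitives one at a time with RANSAC; in a full pipeline each
# plane would then be trimmed to a lightweight polygonal facet.
remaining = pcd
for i in range(2):
    model, inliers = remaining.segment_plane(distance_threshold=0.03,
                                             ransac_n=3, num_iterations=1000)
    a, b, c, d = model
    print(f"plane {i}: {a:.2f}x + {b:.2f}y + {c:.2f}z + {d:.2f} = 0, {len(inliers)} inliers")
    remaining = remaining.select_by_index(inliers, invert=True)
```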
Figures:
Figure 1: Overview of the modeling method: the indoor parsing module uses a deeply supervised semantic labeling network and super-point information to predict per-point semantic labels and segment them into instances; the hybrid modeling module then estimates geometric primitives from the instances and reconstructs exterior walls and interior objects separately.
Figure 2: Workflow of the proposed Indoor3DNet.
Figure 3: The deeply supervised encoder-decoder network for multi-granular feature extraction.
Figure 4: Schematic representation of the marking and classification rules.
Figure 5: Visual result of the proposed indoor instance segmentation algorithm; different colors denote different instances.
Figure 6: Illustration of the hybrid modeling module.
Figure 7: Point labeling results on the S3DIS and UZH 3D datasets.
Figure 8: Reconstruction results on the S3DIS and UZH 3D datasets.
Figure 9: Confusion matrices on the S3DIS and UZH 3D datasets (NULL indicates instances that could not be reconstructed).
Figure 10: Point labeling results on the NUAA3D dataset.
Figure 11: Reconstruction results on the NUAA3D dataset.
Figure 12: Confusion matrix on the NUAA3D dataset (NULL indicates instances that could not be reconstructed).
Figure 13: Feature distribution before and after position encoding; arrows in (a) indicate inter-class overlap before position encoding.
21 pages, 16141 KiB  
Article
The Development of a Sorting System Based on Point Cloud Weight Estimation for Fattening Pigs
by Luo Liu, Yangsen Ou, Zhenan Zhao, Mingxia Shen, Ruqian Zhao and Longshen Liu
Agriculture 2025, 15(4), 365; https://doi.org/10.3390/agriculture15040365 - 8 Feb 2025
Viewed by 405
Abstract
As large-scale, intensive fattening pig farming has become mainstream, the increase in farm size has led to more severe issues related to the hierarchy within pig groups. Due to genetic differences among individual fattening pigs, those that grow faster enjoy a higher social rank. Larger pigs with greater aggression continuously acquire more resources, further restricting the survival space of weaker pigs. Therefore, fattening pigs must be grouped rationally, and the management of weaker pigs must be enhanced. Considering current fattening pig farming needs and actual production environments, this study designed and implemented an intelligent sorting system based on weight estimation. The main hardware of the partitioning equipment includes a collection channel, a partitioning channel, and gantry-style collection equipment. Experimental data were collected, and the original scene point cloud was preprocessed to extract the back point cloud of the fattening pigs. Based on the morphological characteristics of the pigs, a back point cloud segmentation method was used to automatically extract key features such as hip width, hip height, shoulder width, shoulder height, and body length. The segmentation algorithm first calculates the centroid of the point cloud and the eigenvectors of the covariance matrix to reconstruct the point cloud coordinate system; then, based on the variation characteristics and geometric shape of consecutive horizontal slices of the point cloud, hip-width and shoulder-width slices are extracted and the related features are calculated. Weight estimation was performed using Random Forest, multilayer perceptron (MLP), least-squares linear regression, and ridge regression models, with parameter tuning via Bayesian optimization. The mean squared error, mean absolute error, and mean relative error were used as evaluation metrics to assess model performance. Finally, the classification capability was evaluated using the median and average weights of the fattening pigs as partitioning standards. The experimental results show that the system's average relative error in weight estimation is approximately 2.90%, and the total time for the partitioning process is less than 15 s, which meets the needs of practical production.
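The coordinate-system reconstruction step (translate to the centroid, rotate onto the eigenvectors of the covariance matrix) and a simple slice-based width measurement could be sketched as below; the ellipsoidal toy cloud and the slice thickness are assumptions, not the paper's data or parameters.

```python
import numpy as np

def reorient_back_cloud(points):
    """Express the back point cloud in its principal axes: subtract the centroid,
    then rotate so the covariance eigenvectors (largest variance first) become
    the new x (body length), y (body width) and z axes."""
    centered = points - points.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))   # eigenvalues in ascending order
    return centered @ eigvecs[:, ::-1]                # largest-variance axis first

def slice_width(points, x_pos, thickness=0.02):
    """Width of a thin slice: the y-span of points whose x lies in the band."""
    band = points[np.abs(points[:, 0] - x_pos) < thickness / 2]
    return np.ptp(band[:, 1]) if len(band) else 0.0

rng = np.random.default_rng(0)
u, v = rng.uniform(0, np.pi, 5000), rng.uniform(0, 2 * np.pi, 5000)
back = np.column_stack([0.6 * np.cos(v) * np.sin(u),     # crude 1.2 m x 0.4 m "back"
                        0.2 * np.sin(v) * np.sin(u),
                        0.1 * np.cos(u)])
aligned = reorient_back_cloud(back)
print("body length ~", round(np.ptp(aligned[:, 0]), 2), "m")
print("width at mid-body ~", round(slice_width(aligned, 0.0), 2), "m")
```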
(This article belongs to the Special Issue Modeling of Livestock Breeding Environment and Animal Behavior)
Figures:
Figure 1: Experimental data collection device diagram.
Figure 2: Pass-through filtering.
Figure 3: Pig back point cloud division.
Figure 4: Coordinate system reconstruction result of the point cloud.
Figure 5: x-axis span of a slice.
Figure 6: Schematic diagram of the operation of the column equipment.
Figure 7: A 3D diagram of the column device.
Figure 8: Hardware connection diagram.
Figure 9: Relationship between 'eps' and 'min_points' and the number of running hours and categories.
Figure 10: DBSCAN clustering for different 'eps' and 'min_points' values.
Figure 11: DBSCAN clustering and voxel downsampling effect.
Figure 12: Scatter plot of redundant and normal samples.
Figure 13: Model test results and error comparison.
Figure 14: Operation display of the sorting equipment and system platform.
19 pages, 3685 KiB  
Article
Semantic Segmentation of Key Categories in Transmission Line Corridor Point Clouds Based on EMAFL-PTv3
by Li Lu, Linong Wang, Shaocheng Wu, Shengxuan Zu, Yuhao Ai and Bin Song
Electronics 2025, 14(4), 650; https://doi.org/10.3390/electronics14040650 - 8 Feb 2025
Viewed by 373
Abstract
Accurate and efficient segmentation of key categories of transmission line corridor point clouds is one of the prerequisite technologies for drone-based transmission line inspection. However, current semantic segmentation methods are limited to a few categories, involve cumbersome processes, and exhibit low accuracy. To address these issues, this paper proposes EMAFL-PTv3, a deep learning model for semantic segmentation of transmission line corridor point clouds. Built upon Point Transformer v3 (PTv3), EMAFL-PTv3 integrates Efficient Multi-Scale Attention (EMA) to enhance feature extraction at different scales, incorporates Focal Loss to mitigate class imbalance, and achieves accurate segmentation into five categories: ground, ground wire, insulator string, pylon, and transmission line. EMAFL-PTv3 is evaluated on a dataset of 40 spans of transmission line corridor point clouds collected by a drone in Wuhan and Xiangyang, Hubei Province. Experimental results demonstrate that EMAFL-PTv3 outperforms PTv3 in all categories, with notable improvements in the more challenging categories, insulator string (IoU 67.25%) and pylon (IoU 91.77%), increases of 7.06% and 11.39%, respectively. The mIoU, mA, and OA scores reach 90.46%, 92.86%, and 98.07%, increases of 5.49%, 2.75%, and 2.44% over PTv3, respectively, confirming its superior performance.
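Focal Loss is the class-imbalance remedy named above; a compact multi-class version is sketched below in PyTorch, with the focusing parameter gamma and the per-class weights chosen arbitrarily rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Multi-class focal loss: down-weight well-classified points so rare classes
    (e.g. insulator strings) contribute more to the gradient.
    logits: (N, C) per-point class scores; target: (N,) integer labels."""
    log_p = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_p, target, weight=alpha, reduction="none")
    p_t = log_p.gather(1, target.unsqueeze(1)).squeeze(1).exp()   # prob. of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

logits = torch.randn(1000, 5)                        # 1000 points, 5 corridor classes
labels = torch.randint(0, 5, (1000,))
weights = torch.tensor([1.0, 1.0, 4.0, 1.0, 1.0])    # extra weight for a rare class
print(focal_loss(logits, labels, gamma=2.0, alpha=weights))
```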
Figures:
Figure 1: The overall framework of EMAFL-PTv3.
Figure 2: The structure of the EMA module.
Figure 3: Comparison of point cloud segmentation results for Span I across models: original point cloud, ground truth, and segmentation results of EMAFL-PTv3 (ours), PTv3 (baseline), PTv2, PTv1, PointNet++, and PointNet, with zoomed-in comparisons of the insulator string, pylon, and transmission line intersections for ground truth, EMAFL-PTv3, and PTv3; red circles highlight typical segmentation imperfections.
23 pages, 4583 KiB  
Article
Research on Fine-Scale Terrain Construction in High Vegetation Coverage Areas Based on Implicit Neural Representations
by Yi Zhang, Peipei He, Haihang Jing, Bin He, Weibo Yin, Junzhen Meng, Yuntian Ma, Haifeng Zhang, Bo Zhang and Haoxiang Shen
Sustainability 2025, 17(3), 1320; https://doi.org/10.3390/su17031320 - 6 Feb 2025
Viewed by 429
Abstract
Due to high-density vegetation coverage, complex terrain, and occlusion, ground point extraction faces significant challenges, and Airborne Light Detection and Ranging (LiDAR) technology plays a crucial role in complex mountainous areas. This article proposes a method for constructing fine-scale terrain in areas of high vegetation coverage based on implicit neural representation. The method consists of data preprocessing, multi-scale and multi-feature high-difference point cloud initial filtering, and an upsampling module based on implicit neural representation. First, the regional point cloud data are preprocessed; then, K-dimensional trees (K-d trees) are used to construct spatial indexes, and spherical neighborhood methods are applied to capture the geometric and physical information of the point cloud for multi-feature fusion, enhancing the distinction between terrain and non-terrain elements. Subsequently, a differential model is constructed from DSMs (Digital Surface Models) at different scales, and the elevation variation coefficient is calculated to determine the threshold for extracting the initial set of ground points. Finally, the upsampling module based on implicit neural representation refines the initial ground point set, providing a complete and uniformly dense ground point set for the subsequent construction of fine-scale terrain. To validate the performance of the proposed method, three sets of point cloud data from mountainous terrain with different features are selected as the experimental areas. The experimental results indicate that, qualitatively, the proposed method significantly improves the classification of vegetation, buildings, and roads, with clear boundaries between different types of terrain. Quantitatively, the Type I errors of the three selected regions are 4.3445%, 5.0623%, and 5.9436%; the Type II errors are 5.7827%, 6.8516%, and 7.3478%; and the overall errors are 5.3361%, 6.4882%, and 6.7168%, respectively. The Kappa coefficients of all measurement areas exceed 80%, indicating that the proposed method performs well in complex mountainous environments. The method provides point cloud data support for the construction of wind and photovoltaic bases in China, reduces potential damage to the ecological environment caused by construction activities, and contributes to the sustainable development of ecology and energy.
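A rough sketch of the k-d tree / spherical-neighbourhood step only: for every point, the spread of elevations inside a fixed-radius neighbourhood serves as a crude stand-in for the paper's elevation variation coefficient, and a threshold keeps the flatter points as the initial ground set. The radius, threshold, and the synthetic slope-plus-canopy cloud are all assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def neighbourhood_roughness(points, radius=2.0):
    """Std. dev. of elevation inside each point's neighbourhood (cylindrical in x-y
    here), found via a k-d tree; low values indicate likely ground."""
    tree = cKDTree(points[:, :2])                 # index the planimetric coordinates
    rough = np.empty(len(points))
    for i, nbrs in enumerate(tree.query_ball_point(points[:, :2], r=radius)):
        rough[i] = points[nbrs, 2].std()
    return rough

rng = np.random.default_rng(0)
xy = rng.uniform(0, 50, (4000, 2))
z = 0.02 * xy[:, 0] + rng.normal(scale=0.05, size=4000)      # gently sloping ground
canopy = xy[:, 0] > 30                                       # a vegetated patch
z[canopy] += rng.uniform(2, 8, canopy.sum())                 # returns from the canopy
cloud = np.column_stack([xy, z])

rough = neighbourhood_roughness(cloud)
ground_mask = rough < 0.3          # illustrative threshold
print("points kept in the initial ground set:", int(ground_mask.sum()))
```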
Figures:
Figure 1: Overview of the location of the wind and photovoltaic project in the experimental area.
Figure 2: Flowchart of the fine point cloud filtering method for dense vegetation coverage in complex mountainous areas.
Figure 3: Multi-feature neighborhood construction model: the K-d tree recursively splits the cube into two, four, and eight subspaces (the final eight subspaces are leaf nodes); in the spherical neighborhood, the black dot is the current point, blue dots are points within its neighborhood, and the remaining points are terrain points in the previous point's neighborhood.
Figure 4: Application of the implicit neural representation upsampling module to point clouds of complex mountainous terrain.
Figure 5: Results obtained with different upsampling scales for the same input.
Figure 6: 4x upsampling point cloud data results.
Figure 7: Results of processing the point cloud data of Area c.
Figure 8: DEM of the complex mountainous terrain generated by the proposed method.
Figure 9: Point cloud image and DEM for Area b.
Figure 10: Maps of Areas c, d, and e, along with their corresponding DEMs.
21 pages, 6413 KiB  
Article
Targetless Radar–Camera Extrinsic Parameter Calibration Using Track-to-Track Association
by Xinyu Liu, Zhenmiao Deng and Gui Zhang
Sensors 2025, 25(3), 949; https://doi.org/10.3390/s25030949 - 5 Feb 2025
Viewed by 436
Abstract
One of the challenges in calibrating millimeter-wave radar and camera lies in the sparse semantic information of the radar point cloud, making it hard to extract environment features corresponding to the images. To overcome this problem, we propose a track association algorithm for heterogeneous sensors, to achieve targetless calibration between the radar and camera. Our algorithm extracts corresponding points from millimeter-wave radar and image coordinate systems by considering the association of tracks from different sensors, without any explicit target or prior for the extrinsic parameter. Then, perspective-n-point (PnP) and nonlinear optimization algorithms are applied to obtain the extrinsic parameter. In an outdoor experiment, our algorithm achieved a track association accuracy of 96.43% and an average reprojection error of 2.6649 pixels. On the CARRADA dataset, our calibration method yielded a reprojection error of 3.1613 pixels, an average rotation error of 0.8141°, and an average translation error of 0.0754 m. Furthermore, robustness tests demonstrated the effectiveness of our calibration algorithm in the presence of noise.
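After track association yields matched radar-image point pairs, the extrinsic estimation reduces to a PnP problem followed by refinement; the OpenCV sketch below shows only that final step, with correspondences simulated from an assumed ground-truth pose (the intrinsics, pose, and noise level are illustrative, not from the paper).

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)
# Hypothetical radar target positions (metres, radar frame) recovered from tracks.
radar_pts = np.array([[ 5.0, -2.0, 0.0], [ 8.0, 1.5, 0.2], [12.0, -0.5, 0.1],
                      [15.0,  3.0, 0.0], [20.0, -4.0, 0.3], [25.0, 2.0, 0.0]])
K = np.array([[900.0, 0.0, 640.0], [0.0, 900.0, 360.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)                       # assume an undistorted camera

# Simulate the matched pixel tracks from a known extrinsic (roughly "radar x-forward
# to camera z-forward") plus pixel noise, just to make the example self-consistent.
rvec_gt, tvec_gt = np.array([[1.2], [-1.2], [1.2]]), np.array([[0.1], [0.3], [0.5]])
pixel_pts, _ = cv2.projectPoints(radar_pts, rvec_gt, tvec_gt, K, dist)
pixel_pts = pixel_pts.reshape(-1, 2) + rng.normal(scale=0.5, size=(6, 2))

ok, rvec, tvec = cv2.solvePnP(radar_pts, pixel_pts, K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
rvec, tvec = cv2.solvePnPRefineLM(radar_pts, pixel_pts, K, dist, rvec, tvec)

proj, _ = cv2.projectPoints(radar_pts, rvec, tvec, K, dist)
err = np.linalg.norm(proj.reshape(-1, 2) - pixel_pts, axis=1).mean()
print("estimated translation (m):", tvec.ravel())
print("mean reprojection error (px):", round(float(err), 3))
```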
(This article belongs to the Section Remote Sensors)
Figures:
Figure 1: Application scenarios for the algorithm.
Figure 2: Flowchart of the proposed algorithm: the radar detector includes moving target indication (MTI), constant false alarm rate (CFAR), and fast Fourier transform (FFT) stages; the video detector uses YOLOv5 and keeps only vehicle and person targets; multiple object tracking (MOT) yields target tracks from the raw data; after time synchronization and track association, track pairs in the radar and pixel coordinate systems are obtained, and the extrinsic parameters are estimated with PnP and nonlinear optimization.
Figure 3: Radar signal processing pipeline.
Figure 4: Radar data after MTI and 2D-FFT, collected in the real world; three moving individuals correspond to the three marked peaks.
Figure 5: Flowchart of radar target tracking.
Figure 6: Result of radar target tracking.
Figure 7: Tracking scenario with multiple pedestrian targets being tracked.
Figure 8: Schematic diagram of temporal synchronization; solid lines represent radar data frames and dashed lines represent video data frames.
Figure 9: Illustration of the cost matrix M; red cells mark the minimum of each row, i.e., the radar track with the minimum association cost for each video track. The correct track pairs are (Tc1, Tr1), (Tc2, Tr3), (Tc3, Tr2), and (Tc4, Tr5).
Figure 10: Experimental setup.
Figure 11: Projection of radar points.
Figure 12: Rotation and translation error on the CARRADA dataset; box ends are the upper and lower quartiles, the red line is the median, plus signs are outliers, and the dashed whiskers span the normal range.
Figure 13: Reprojection error of the proposed algorithm and single-target calibration on the CARRADA dataset.
Figure 14: Reprojection error of approaches A and B on the CARRADA dataset.
Figure 15: The scenario of the CARRADA dataset.
Figure 16: Rotation and translation error as a function of noise.
Figure 17: Reprojection error as a function of noise.
38 pages, 14791 KiB  
Article
Online High-Definition Map Construction for Autonomous Vehicles: A Comprehensive Survey
by Hongyu Lyu, Julie Stephany Berrio Perez, Yaoqi Huang, Kunming Li, Mao Shan and Stewart Worrall
J. Sens. Actuator Netw. 2025, 14(1), 15; https://doi.org/10.3390/jsan14010015 - 2 Feb 2025
Viewed by 558
Abstract
High-definition (HD) maps aim to provide detailed road information with centimeter-level accuracy, essential for enabling precise navigation and safe operation of autonomous vehicles (AVs). Traditional offline construction methods involve several complex steps, such as data collection, point cloud generation, and feature extraction, but these methods are resource-intensive and struggle to keep pace with the rapidly changing road environments. In contrast, online HD map construction leverages onboard sensor data to dynamically generate local HD maps, offering a bird’s-eye view (BEV) representation of the surrounding road environment. This approach has the potential to improve adaptability to spatial and temporal changes in road conditions while enhancing cost-efficiency by reducing the dependency on frequent map updates and expensive survey fleets. This survey provides a comprehensive analysis of online HD map construction, including the task background, high-level motivations, research methodology, key advancements, existing challenges, and future trends. We systematically review the latest advancements in three key sub-tasks: map segmentation, map element detection, and lane graph construction, aiming to bridge gaps in the current literature. We also discuss existing challenges and future trends, covering standardized map representation design, multitask learning, and multi-modality fusion, while offering suggestions for potential improvements.
(This article belongs to the Special Issue Advances in Intelligent Transportation Systems (ITS))
Figures:
Figure 1: Structure of this survey.
Figure 2: Pipeline of the research methodology.
Figure 3: Comparison of the VT module in two projection-based map segmentation methods: (a) Simple-BEV [53] projects voxel grid points onto feature maps and uses bilinear sampling to extract features for constructing 3D voxel features; (b) Ego3RT [46] projects polarized grid queries onto feature maps and uses attention to extract features for constructing 3D voxel features.
Figure 4: Comparison of the VT module in two lift-based map segmentation methods: (a) PON [38] uses an MLP to expand bottleneck features along the depth axis; (b) LSS [39] uses a CNN to predict pixel-wise depth probability distributions.
Figure 5: Comparison of the VT module in two network-based map segmentation methods: (a) PYVA [42] uses two MLPs for bidirectional projection of feature maps between pixel space and BEV space; (b) BEVSegFormer [51] uses deformable cross-attention [102] to predict 2D reference points for sampling feature maps to refine BEV queries.
Figure 6: Comparison of the MD module in two CNN-based map element detection methods: (a) HDMapNet [21] uses an FCN [109] to decode semantic, instance, and direction masks that are post-processed into vectorized representations; (b) InstaGraM [59] uses two CNNs to detect vertices and edges, then an attentional GNN to associate the vertices, generating vectorized representations end-to-end.
Figure 7: Comparison of the pipelines of two Transformer-based map element detection methods: (a) MapTR [56] uses a single-stage DETR-like Transformer [110] for parallel decoding of ordered point sequences for map elements; (b) MGMap [67] uses instance masks to enhance element queries for precise localization and mask patches to refine point position predictions.
Figure 8: Comparison of temporal fusion (short-term and long-term) in two Transformer-based map element detection methods: (a) StreamMapNet [63] aligns and fuses BEV features from consecutive frames and propagates high-confidence element queries to the next frame; (b) HRMapNet [70] fuses BEV features with rasterized map features to enrich information and rasterizes vectorized map predictions to maintain a global historical map.
Figure 9: Comparison of the pipelines for two single-step-based lane graph construction methods: (a) TopoMLP [87] uses two Transformers for lane and traffic element queries, followed by MLPs to predict the topological relationships between paired queries; (b) TPLR [20] uses a Transformer to process lane and minimal cycle queries simultaneously, followed by joint decoding of the lane graph and the cover of minimal cycles.
Figure 10: Comparison of the TR module in two iteration-based lane graph construction methods: (a) TopoNet [85] uses two Transformers for lane and traffic element queries, followed by a GCN for iterative message passing and feature updating; (b) RoadNetTransformer [84] (semi-autoregressive) first predicts lane key points in parallel and then autoregressively generates local sequences for lane graphs.
Figure 11: Comparison of the lane segment representation [86] with two alternative map representations.
Figure 12: Comparison of uncertainty-based map representations [116] integrated into various online HD map construction methods: ground truth, MapTR [56], MapTRv2 [76], MapTRv2-CL [76], and StreamMapNet [63].
Figure 13: Comparison of the MTL pipeline in two online HD map construction methods: (a) BEVerse [50] presents a unified framework for map segmentation, 3D object detection, and motion prediction; (b) BeMapNet [57] presents a unified framework for map segmentation, map element detection, and instance segmentation.
Figure 14: Comparison of the MMF pipeline in two online HD map construction methods: (a) BEVFusion [52] fuses camera and LiDAR features in a unified BEV space; (b) NMP [58] fuses BEV features with neural map priors from previous traversals.
19 pages, 11928 KiB  
Article
Point Cloud Vibration Compensation Algorithm Based on an Improved Gaussian–Laplacian Filter
by Wanhe Du, Xianfeng Yang and Jinghui Yang
Electronics 2025, 14(3), 573; https://doi.org/10.3390/electronics14030573 - 31 Jan 2025
Viewed by 491
Abstract
In industrial environments, steel plate surface inspection plays a crucial role in quality control. However, vibrations during laser scanning can significantly impact measurement accuracy. While traditional vibration compensation methods rely on complex dynamic modeling, they often face challenges in practical implementation and generalization. This paper introduces a novel point cloud vibration compensation algorithm that combines an improved Gaussian–Laplacian filter with adaptive local feature analysis. The key innovations include (1) an FFT-based vibration factor extraction method that effectively identifies vibration trends, (2) an adaptive windowing strategy that automatically adjusts based on local geometric features, and (3) a weighted compensation mechanism that preserves surface details while reducing vibration noise. The algorithm demonstrated significant improvements in signal-to-noise ratio: 15.78% for simulated data, 6.81% for precision standard parts, and 12.24% for actual industrial measurements. Experimental validation confirms the algorithm's effectiveness across different conditions, yielding a practical, implementable solution for steel plate surface inspection.
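A minimal sketch of the FFT-based vibration-trend idea only: keep the lowest-frequency bins of a single laser line, invert the transform to get a trend, and subtract it. The bin count, the synthetic surface and vibration frequencies, and the plain subtraction (standing in for the paper's adaptive, weighted compensation) are all assumptions.

```python
import numpy as np

def vibration_trend(z, keep_bins=6):
    """Low-frequency trend of one laser line: keep only the DC and lowest FFT bins,
    then transform back; the high-frequency surface texture is left untouched."""
    spectrum = np.fft.rfft(z)
    filtered = np.zeros_like(spectrum)
    filtered[:keep_bins] = spectrum[:keep_bins]
    return np.fft.irfft(filtered, n=len(z))

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 2000, endpoint=False)
surface = 0.02 * np.sin(2 * np.pi * 40 * x)           # genuine high-frequency texture
vibration = 0.5 * np.sin(2 * np.pi * 2 * x + 0.3)     # low-frequency machine vibration
z = surface + vibration + rng.normal(scale=0.005, size=x.size)

trend = vibration_trend(z)
compensated = z - (trend - trend.mean())              # remove the trend, keep the mean level

rms_before = np.sqrt(np.mean((z - z.mean() - surface) ** 2))
rms_after = np.sqrt(np.mean((compensated - compensated.mean() - surface) ** 2))
print(f"RMS deviation from the true surface: {rms_before:.4f} -> {rms_after:.4f}")
```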
Figures:
Figure 1: Algorithm flowchart.
Figure 2: Overall approach diagram (standard bearing diameter 129.991 ± 0.001 mm; the solid arrow indicates processing with the proposed algorithm).
Figure 3: Original data, trend plot, and comparison before and after vibration compensation for a random laser line of simulated data.
Figure 4: Differentials, variance, curvature, and features before and after vibration compensation for a random laser line of simulated data.
Figure 5: Frequency-power spectral density before and after vibration compensation for a random laser line of simulated data.
Figure 6: On-site experimental setup.
Figure 7: Experimental workflow diagram.
Figure 8: Point cloud registration results and local magnification.
Figure 9: Original data, trend plot, and comparison before and after vibration compensation for a random laser line of the standard bearing data.
Figure 10: Differentials, variance, curvature, and features before and after vibration compensation for a random laser line of standard bearing data.
Figure 11: Frequency-power spectral density before and after vibration compensation for a random laser line of standard bearing data.
Figure 12: Original data, trend plot, and comparison before and after vibration compensation for a random laser line of actual plane data.
Figure 13: Differentials, variance, curvature, and features before and after vibration compensation for a random laser line of actual plane data.
Figure 14: Frequency-power spectral density before and after vibration compensation for a random laser line of actual plane data.
Figure 15: Comparison before and after Gaussian smoothing only, for a random laser line of actual plane data.
Figure 16: Comparison before and after the Laplacian operator only, for a random laser line of actual plane data.
Figure 17: Comparison before and after the improved Gaussian-Laplacian filter for a random laser line of actual plane data.
Figure 18: Point cloud comparison and local magnification before and after vibration compensation of actual plane data.
22 pages, 807 KiB  
Article
Fusing Skeleton-Based Scene Flow for Gesture Recognition on Point Clouds
by Yahui Liu and Jiajia Jiao
Electronics 2025, 14(3), 567; https://doi.org/10.3390/electronics14030567 - 31 Jan 2025
Viewed by 441
Abstract
Dynamic gesture recognition has recently aimed to learn static and motion features by exploiting point clouds from depth images. However, the weak correlation between some pixels and hand gestures makes the extracted dynamic features redundant, and when search points and adjacent points in a larger feature space maintain movement consistency, more detailed movements are ignored. To better capture fine-grained dynamic features and strengthen the relevance of point clouds to gestures, we propose a novel method that fuses skeleton-based scene flow for gesture recognition (FSS-GR) to achieve higher recognition accuracy. Firstly, skeletons are automatically converted into pairs of point clouds. Based on the time interval between the source and target point clouds and on scene flow measurement indicators, four scene flow estimators are obtained. To minimize the additional cost of capturing fine-grained information, the scene flow is prepared as datasets before fusion. Then, the coarse-grained dynamic features from depth images are fused with the obtained scene flow using different strategies, so that flexible tradeoffs between model complexity and recognition performance are available for various scenarios. Comprehensive experiments and an ablation study on SHREC’17 and DHG demonstrate that FSS-GR achieves higher accuracy than state-of-the-art works. Full article
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)
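As a rough illustration of the pairing and fusion steps described in the abstract, the following Python sketch builds (source, target) skeleton point-cloud pairs separated by a frame interval Δt, estimates a crude per-point flow, and late-fuses class scores from two branches. The nearest-neighbour flow, the mixing ratio `alpha`, and all function names are hypothetical stand-ins, not the FSS-GR estimators or fusion strategies themselves.

```python
import numpy as np

def make_flow_pairs(skeleton_seq, delta_t=1):
    """Turn a skeleton sequence (T frames x J joints x 3 coords) into
    (source, target) point-cloud pairs separated by delta_t frames."""
    pairs = []
    for t in range(len(skeleton_seq) - delta_t):
        source = skeleton_seq[t]            # (J, 3) joints as a sparse point cloud
        target = skeleton_seq[t + delta_t]  # (J, 3) joints delta_t frames later
        pairs.append((source, target))
    return pairs

def nearest_neighbor_flow(source, target):
    """Crude per-point scene flow: displacement to the nearest target point.
    A placeholder for a learned scene-flow estimator, purely for illustration."""
    dists = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)  # (J, J)
    nearest = dists.argmin(axis=1)
    return target[nearest] - source          # (J, 3) flow vectors

def late_fuse(coarse_logits, flow_logits, alpha=0.7):
    """Weighted late fusion of class scores from the depth-image branch and
    the scene-flow branch; alpha is an assumed mixing ratio."""
    return alpha * coarse_logits + (1.0 - alpha) * flow_logits

# usage on random stand-in data: 32 frames, 22 joints, 14 gesture classes
seq = np.random.randn(32, 22, 3)
flows = [nearest_neighbor_flow(s, t) for s, t in make_flow_pairs(seq, delta_t=2)]
fused = late_fuse(np.random.randn(14), np.random.randn(14))
predicted_class = int(fused.argmax())
```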
Figures:
Figure 1: The basic idea of FSS-GR. (a) Multi-stream FSS-GR; (b) two-stream FSS-GR. Compared with the general point-cloud-based framework, FSS-GR not only takes advantage of both the depth image and the skeleton, but also integrates 3D motion features at different levels of granularity. Most works learn the dynamic features of a search point in Neighborhood 1; our work fuses the fine-grained features represented by scene flow in the smaller Neighborhood 2. Δt_c is the frame time interval between search points and adjacent points during the learning of coarse-grained features; Δt is the frame time interval between search (source) points and adjacent (target) points during the extraction of fine-grained features.
Figure 2: A pair of point clouds consists of a source point cloud and a target point cloud. There are three ways to choose the target point cloud based on Δt, the frame time interval between the source and target point clouds; k is the average number of frames in every grouped gesture. When Δt is set to Δt_1, the scene flow of the search point (red) is learned from points (blue) in S_(t). When Δt is set to Δt_2, the scene flow of the search point (red) is learned from points (green) in S_(t+1) and points in X^t. When Δt is set to Δt_3, the scene flow of the search point (red) is learned from points (purple) in S_(t+2) and points in X^t.
Figure 3: Multi-stream FSS-GR framework. In terms of input modality, the inputs to the static and dynamic branches are depth images, while the initial input modality for scene flow is the skeleton; the input of this branch is the scene flow dataset described in Section 3.1. CNN is a 1 × 1 convolution; FC is the fully connected layer.
Figure 4: Two-stream FSS-GR framework. Two-stream FSS-GR consists of a static branch that extracts spatial features and a dynamic branch that captures dynamic features; scene flow supplements the learning of normal vectors in the dynamic branch. In T-FSS-GR, the inputs of the SA module are 3D static features and multi-dimensional normal vectors that fuse fine-grained scene flow. f′_sst (brown) is the static feature that learns scene flow; n′_st (pink) is the dynamic feature that learns scene flow.
Figure 5: Performance comparison (%) of multi-stream FSS-GR under different scene flow branch ratios on SHREC’17: (a) M-FSS-GR; (b) M-FSS-GR’. The x axis is the ratio of the scene flow branch and the y axis is the total accuracy (%).
39 pages, 4315 KiB  
Review
A Review of Embodied Grasping
by Jianghao Sun, Pengjun Mao, Lingju Kong and Jun Wang
Sensors 2025, 25(3), 852; https://doi.org/10.3390/s25030852 - 30 Jan 2025
Viewed by 560
Abstract
Pre-trained models trained on internet-scale data have achieved significant improvements in perception, interaction, and reasoning, and using them as the basis of embodied grasping methods has greatly promoted the development of robotics applications. In this paper, we provide a comprehensive review of the latest developments in this field. First, we summarize the embodied foundations, including cutting-edge embodied robots, simulation platforms, publicly available datasets, and data acquisition methods, to fully understand the research focus. Then, the embodied algorithms are introduced, starting from pre-trained models, with three main research goals: (1) embodied perception, where data captured by visual sensors are used for point cloud extraction or 3D reconstruction and combined with pre-trained models to understand the target object and external environment and directly predict the actions to execute; (2) embodied strategy, where in imitation learning the pre-trained model is used to augment data or as a feature extractor to enhance the generalization ability of the model, and in reinforcement learning it is used to obtain the optimal reward function, improving learning efficiency and capability; (3) embodied agent, where the pre-trained model adopts hierarchical or holistic execution to achieve end-to-end robot control. Finally, the challenges of the current research are summarized, and a perspective on feasible technical routes is provided. Full article
(This article belongs to the Section Sensors and Robotics)
Figures:
Figure 1: Main organizational framework of this article.
Figure 2: Embodied-foundation content.
Figure 3: Three-dimensional feature framework.
Figure 4: Three-dimensional scene reconstruction framework.
Figure 5: Data augmentation framework and application.
Figure 6: Feature extractor framework.
Figure 7: Reward function calculation framework.
Figure 8: Low-level control strategy and skills library framework.
Figure 9: Classic framework for the comprehensive implementation of the three methods.
17 pages, 3362 KiB  
Article
Truck Lifting Accident Detection Method Based on Improved PointNet++ for Container Terminals
by Yang Shen, Xintai Man, Jiaqi Wang, Yujie Zhang and Chao Mi
J. Mar. Sci. Eng. 2025, 13(2), 256; https://doi.org/10.3390/jmse13020256 - 30 Jan 2025
Viewed by 460
Abstract
In container terminal operations, truck lifting accidents pose a serious threat to the safety and efficiency of automated equipment. Traditional detection methods using visual cameras and single-line Light Detection and Ranging (LiDAR) are insufficient for capturing three-dimensional spatial features, leading to reduced detection accuracy. Moreover, the boundary features of key accident objects, such as containers, truck chassis, and wheels, are often blurred, resulting in frequent false and missed detections. To tackle these challenges, this paper proposes an accident detection method based on multi-line LiDAR and an improved PointNet++ model. This method uses multi-line LiDAR to collect point cloud data from operational lanes in real time and enhances the PointNet++ model by integrating a multi-layer perceptron (MLP) and a mixed attention mechanism (MAM), optimizing the model’s ability to extract local and global features. This results in high-precision semantic segmentation and accident detection of critical structural point clouds, such as containers, truck chassis, and wheels. Experiments confirm that the proposed method achieves superior performance compared to the current mainstream algorithms regarding point cloud segmentation accuracy and stability. In engineering tests across various real-world conditions, the model exhibits strong generalization capability. Full article
(This article belongs to the Special Issue Sustainable Maritime Transport and Port Intelligence)
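The improved PointNet++ above augments set-abstraction features with an MLP and a mixed attention mechanism combining channel attention and self-attention (see Figures 6–8 below). The following PyTorch sketch shows one plausible shape for such a mixed attention block over per-point features; the module name, layer sizes, and the residual combination are assumptions for illustration rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    """Channel attention (squeeze-and-excitation style) followed by point-wise
    self-attention over a set of per-point features (B, N, C). Illustrative only."""
    def __init__(self, channels: int, reduction: int = 4, heads: int = 4):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.self_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_points, channels) per-point features from set abstraction
        gate = self.channel_gate(feats.mean(dim=1))          # (B, C) channel weights
        feats = feats * gate.unsqueeze(1)                    # re-weight feature channels
        attn_out, _ = self.self_attn(feats, feats, feats)    # global point-to-point context
        return self.norm(feats + attn_out)                   # residual connection

# usage: 1024 points with 128-dimensional features
x = torch.randn(2, 1024, 128)
out = MixedAttention(128)(x)
print(out.shape)  # torch.Size([2, 1024, 128])
```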
Figures:
Figure 1: Nine different types of truck lifting accidents.
Figure 2: LiDAR installation and data collection. (a) LiDAR installation; (b) a point cloud of a container being lifted, as captured by the LiDAR.
Figure 3: Truck lifting accident detection process.
Figure 4: Improved PointNet++ network structure.
Figure 5: Multi-layer perceptron.
Figure 6: Feature extraction module based on the mixed attention mechanism.
Figure 7: Channel attention mechanism.
Figure 8: Self-attention mechanism.
Figure 9: Point cloud dataset collection process. (a) Collection for a truck lifting accident involving a 20-foot container; (b) collection for a truck lifting accident involving a 40-foot container.
Figure 10: Visualization of point cloud segmentation results. (a) A container remaining stationary without being lifted; (b) a container being lifted successfully under normal conditions; (c) a lifting accident with the front lock engaged; (d) a lifting accident with the rear lock engaged.