Search Results (869)

Search Parameters:
Keywords = point cloud feature extraction

23 pages, 8895 KiB  
Article
Automated 3D Image Processing System for Inspection of Residential Wall Spalls
by Junjie Wang, Yunfang Pang and Xinyu Teng
Appl. Sci. 2025, 15(4), 2140; https://doi.org/10.3390/app15042140 - 18 Feb 2025
Viewed by 2
Abstract
Continuous spalling exposure can weaken the performance of structures. Therefore, the development of methods for detecting wall spall damage remains essential in the field of Structural Health Monitoring (SHM). Currently, researchers mainly rely on 2D information for spall detection and predominantly use manual data collection methods in the complex environment of residential buildings, which are usually inefficient. To address this challenge, an automated 3D image processing system for wall spalls is proposed in this study. First, UGV path planning was performed to collect information about defects in the surrounding environment. Second, to address the shortcomings of RandLA-Net, a dynamically enhanced dual-branch structure is established, on which consistency constraints are introduced, a lightweight attention module is added, and the loss function is optimized to strengthen the model's ability to extract point cloud features. Finally, spalls are quantitatively evaluated to determine the damage to buildings. The results show that RandLA-Spall achieves 94.71% Recall and 84.20% mIoU on the test set, improvements of 4.25% and 5.37%, respectively. This study achieves an integrated process using a lightweight device that efficiently extracts and quantifies spalling defects and provides a valuable reference for SHM.
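The figure list below suggests the lightweight attention module is a CBAM-style block (channel attention followed by spatial attention). Purely as an illustration of that kind of block, and not the RandLA-Spall implementation, a minimal PyTorch sketch operating on per-point feature tensors of assumed shape (batch, channels, points) with a guessed reduction ratio might look like this:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: squeeze the point dimension, re-weight channels."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                      # x: (B, C, N) per-point features
        avg = self.mlp(x.mean(dim=2))          # global average pooling over points
        mx = self.mlp(x.max(dim=2).values)     # global max pooling over points
        return x * torch.sigmoid(avg + mx).unsqueeze(-1)

class SpatialAttention(nn.Module):
    """SAM: re-weight individual points from channel statistics."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(stats))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.cam, self.sam = ChannelAttention(channels), SpatialAttention()

    def forward(self, x):
        return self.sam(self.cam(x))

feats = torch.randn(2, 64, 4096)   # a batch of per-point feature maps
print(CBAM(64)(feats).shape)       # torch.Size([2, 64, 4096])
```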
(This article belongs to the Section Civil Engineering)
Figures:
Figure 1: Residential wall spall inspection system.
Figure 2: Process of residential wall spall inspections.
Figure 3: Example of test scenario selection.
Figure 4: Data acquisition UGV.
Figure 5: Example of path planning and dataset creation.
Figure 6: Comparison of dense reconstruction.
Figure 7: Dataset structure.
Figure 8: RandLA-Spall network architecture.
Figure 9: Example of data enhancement.
Figure 10: RandLA-Spall residual module.
Figure 11: CBAM module structure: (a) CAM, (b) SAM, and (c) CBAM.
Figure 12: Comparison of experimental indicators.
Figure 13: Semantic segmentation results.
Figure 14: Larger-scale structures semantic segmentation results.
Figure 15: Comparison of segmentation indicators.
Figure 16: Comparison of ablation experiment indicators.
Figure 17: Sample example.
17 pages, 7393 KiB  
Article
Laser Stripe Centerline Extraction Method for Deep-Hole Inner Surfaces Based on Line-Structured Light Vision Sensing
by Huifu Du, Daguo Yu, Xiaowei Zhao and Ziyang Zhou
Sensors 2025, 25(4), 1113; https://doi.org/10.3390/s25041113 - 12 Feb 2025
Viewed by 297
Abstract
This paper proposes a point cloud post-processing method based on the minimum spanning tree (MST) and depth-first search (DFS) to extract laser stripe centerlines from the complex inner surfaces of deep holes. Addressing the limitations of traditional image processing methods, which are affected by burrs and low-frequency random noise, this method utilizes 360° structured light to illuminate the inner wall of the deep hole. A sensor captures laser stripe images, and the Steger algorithm is employed to extract sub-pixel point clouds. Subsequently, an MST is used to construct the point cloud connectivity structure, while DFS is applied for path search and noise removal to enhance extraction accuracy. Experimental results demonstrate that this method significantly improves extraction accuracy, with a dice similarity coefficient (DSC) approaching 1 and a maximum Hausdorff distance (HD) of 3.3821 pixels, outperforming previous methods. This study provides an efficient and reliable solution for the precise extraction of complex laser stripes and lays a solid data foundation for subsequent feature parameter calculations and 3D reconstruction.
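As a rough, non-authoritative sketch of the MST-plus-DFS idea described above (SciPy, the neighbour count k, and the synthetic stripe standing in for Steger sub-pixel points are all assumptions): connect the centre points with a minimum spanning tree over a k-nearest-neighbour graph, then keep the longest path found by depth-first search and treat everything off that path as noise.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def longest_path_in_mst(points, k=8):
    """Build an MST over a k-NN graph of the centre points and return the points
    on the longest simple path (the stripe centreline); off-path points = noise."""
    n = len(points)
    dist, idx = cKDTree(points).query(points, k=k + 1)   # neighbour 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    graph = coo_matrix((dist[:, 1:].ravel(), (rows, idx[:, 1:].ravel())), shape=(n, n))
    mst = minimum_spanning_tree(graph).tocoo()

    adj = [[] for _ in range(n)]
    for i, j in zip(mst.row, mst.col):
        adj[i].append(j)
        adj[j].append(i)

    def dfs_farthest(start):
        """Iterative DFS returning the farthest node and the path leading to it."""
        best_path, stack, seen = [start], [(start, [start])], {start}
        while stack:
            node, path = stack.pop()
            if len(path) > len(best_path):
                best_path = path
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append((nb, path + [nb]))
        return best_path[-1], best_path

    end_a, _ = dfs_farthest(0)        # two DFS passes approximate the tree diameter
    _, path = dfs_farthest(end_a)
    return points[path]

rng = np.random.default_rng(0)
stripe = np.column_stack([np.linspace(0, 100, 200),
                          10 * np.sin(np.linspace(0, 3, 200))])
cloud = np.vstack([stripe, rng.uniform(0, 100, (10, 2))])   # add spurious noise points
print(len(longest_path_in_mst(cloud)), "points kept on the main path")
```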
Figures:
Figure 1: Laser stripe image features and geometric centerline extraction in a smooth deep hole: stripe contour image, 3D grayscale distribution, Steger-extracted centerline, and locally magnified view.
Figure 2: The same analysis for a petal-shaped structure.
Figure 3: The same analysis for an internal gear.
Figure 4: The same analysis for a rectangular spline.
Figure 5: The same analysis for an internal octagon.
Figure 6: Minimum spanning tree generation example.
Figure 7: DFS path search example in MST.
Figure 8: Extraction results of the petal-shaped laser stripe centerline.
Figure 9: Extraction results of the internal gear laser stripe centerline.
Figure 10: Extraction results of the rectangular spline laser stripe centerline.
Figure 11: Extraction results of the internal octagonal laser stripe centerline.
Figure 12: Laser stripe centerline extraction results based on MST and DFS compared with manual extraction results.
21 pages, 7839 KiB  
Article
High-Throughput 3D Rice Chalkiness Detection Based on Micro-CT and VSE-UNet
by Zhiqi Cai, Yangjun Deng, Xinghui Zhu, Bo Li, Chenglin Xu and Donghui Li
Agronomy 2025, 15(2), 450; https://doi.org/10.3390/agronomy15020450 - 12 Feb 2025
Viewed by 264
Abstract
Rice is a staple food for nearly half the global population and, with rising living standards, the demand for high-quality grain is increasing. Chalkiness, a key determinant of appearance quality, requires accurate detection for effective quality evaluation. While traditional 2D imaging has been used for chalkiness detection, its inherent inability to capture complete 3D morphology limits its suitability for precision agriculture and breeding. Although micro-CT has shown promise in 3D chalk phenotype analysis, high-throughput automated 3D detection for multiple grains remains a challenge, hindering practical applications. To address this, we propose a high-throughput 3D chalkiness detection method using micro-CT and VSE-UNet. Our method begins with non-destructive 3D imaging of grains using micro-CT. For the accurate segmentation of kernels and chalky regions, we propose VSE-UNet, an improved VGG-UNet with an SE attention mechanism for enhanced feature learning. Through comprehensive training optimization strategies, including the Dice focal loss function and dropout technique, the model achieves robust and accurate segmentation of both kernels and chalky regions in continuous CT slices. To enable high-throughput 3D analysis, we developed a unified 3D detection framework integrating isosurface extraction, point cloud conversion, DBSCAN clustering, and Poisson reconstruction. This framework overcomes the limitations of single-grain analysis, enabling simultaneous multi-grain detection. Finally, 3D morphological indicators of chalkiness are calculated using triangular mesh techniques. Experimental results demonstrate significant improvements in both 2D segmentation (7.31% improvement in chalkiness IoU, 2.54% in mIoU, 2.80% in mPA) and 3D phenotypic measurements, with VSE-UNet achieving more accurate volume and dimensional measurements compared with the baseline. These improvements provide a reliable foundation for studying chalkiness formation and enable high-throughput phenotyping.
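The multi-grain separation step names DBSCAN clustering of the kernel point cloud; a minimal sketch of just that step, assuming scikit-learn and toy Gaussian blobs standing in for the segmented kernels (eps and min_samples are guesses, not the paper's settings):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy stand-in for the kernel point cloud extracted from segmented CT slices:
# five well-separated blobs, one per grain (units are arbitrary voxels).
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [30, 0, 0], [0, 30, 0], [30, 30, 0], [15, 15, 25]])
points = np.vstack([c + rng.normal(scale=3.0, size=(800, 3)) for c in centers])

# DBSCAN groups densely connected points; each cluster is one kernel,
# and label -1 marks sparse outliers.
labels = DBSCAN(eps=4.0, min_samples=20).fit_predict(points)

for k in sorted(set(labels) - {-1}):
    grain = points[labels == k]
    extent = grain.max(axis=0) - grain.min(axis=0)   # crude bounding-box size proxy
    print(f"grain {k}: {len(grain)} points, bbox extent {np.round(extent, 1)}")
```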
(This article belongs to the Section Precision and Digital Agriculture)
Figures:
Figure 1: Rice grain X-ray micro-CT imaging and annotation process: imaging principle, raw micro-CT slices, and manually annotated slices.
Figure 2: Image annotation: original micro-CT image and annotation results (black = background, red = rice kernel, green = chalky areas).
Figure 3: Architecture of VSE-UNet (SE = squeeze-and-excitation attention mechanism).
Figure 4: Structure of the SE attention mechanism.
Figure 5: Unified 3D chalkiness detection framework: original micro-CT scan, VSE-UNet segmentation, mask image of kernel and chalkiness regions, 3D surface reconstruction of five kernels, point cloud of five kernels, single-kernel point cloud, and single-kernel surface reconstruction.
Figure 6: Comparison of 2D segmentation results: original micro-CT image, ground truth, VGG-UNet, and VSE-UNet.
Figure 7: Comparison of 3D surface models reconstructed from ground truth, VGG-UNet, and VSE-UNet segmentations.
Figure 8: Comparison of segmented 3D point cloud models (ground truth, VGG-UNet, VSE-UNet).
Figure 9: Point cloud models of a single kernel and its chalkiness region (ground truth, VGG-UNet, VSE-UNet).
Figure 10: Surface models of a single kernel reconstructed from the corresponding point cloud models (ground truth, VGG-UNet, VSE-UNet).
18 pages, 39910 KiB  
Article
DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM
by Fan Zhu, Yifan Zhao, Ziyu Chen, Chunmao Jiang, Hui Zhu and Xiaoxi Hu
Remote Sens. 2025, 17(4), 625; https://doi.org/10.3390/rs17040625 - 12 Feb 2025
Viewed by 426
Abstract
Visual SLAM is widely applied in robotics and remote sensing. The fusion of Gaussian radiance fields and Visual SLAM has demonstrated astonishing efficacy in constructing high-quality dense maps. While existing methods perform well in static scenes, they are prone to the influence of dynamic objects in real-world dynamic environments, thus making robust tracking and mapping challenging. We introduce DyGS-SLAM, a Visual SLAM system that employs dual constraints to achieve high-fidelity static map reconstruction in dynamic environments. We extract ORB features within the scene, and use open-world semantic segmentation models and multi-view geometry to construct dual constraints, forming a zero-shot dynamic information elimination module while recovering backgrounds occluded by dynamic objects. Furthermore, we select high-quality keyframes and use them for loop closure detection and global optimization, constructing a foundational Gaussian map through a set of determined point clouds and poses and integrating repaired frames for rendering new viewpoints and optimizing 3D scenes. Experimental results on the TUM RGB-D, Bonn, and Replica datasets, as well as real scenes, demonstrate that our method has excellent localization accuracy and mapping quality in dynamic scenes.
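Judging from the Figure 4 caption below, the geometric half of the dual constraint flags a map point as dynamic when the depth measured in the current frame (d') is much smaller than the depth predicted by reprojection (d_proj). A minimal NumPy sketch of that single check, with assumed interfaces (a 4x4 relative pose, a depth image, pinhole intrinsics) and an arbitrary ratio threshold:

```python
import numpy as np

def dynamic_point_mask(pts_kf, depth_cur, T_cur_kf, K, ratio=0.7):
    """Flag keyframe points as dynamic when the measured depth d' in the current
    frame is much smaller than the reprojected depth d_proj (something moved in
    front of the expected surface). pts_kf: (N, 3) points in keyframe camera
    coordinates; T_cur_kf: 4x4 relative pose; depth_cur: (H, W) depth image; K: 3x3."""
    pts_h = np.hstack([pts_kf, np.ones((len(pts_kf), 1))])
    pts_cur = (T_cur_kf @ pts_h.T).T[:, :3]           # points in the current camera frame
    d_proj = pts_cur[:, 2]                            # depth predicted by the geometry
    uv = (K @ pts_cur.T).T
    uv = np.round(uv[:, :2] / uv[:, 2:3]).astype(int)

    h, w = depth_cur.shape
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h) & (d_proj > 0)
    d_meas = np.full(len(pts_kf), np.inf)
    d_meas[valid] = depth_cur[uv[valid, 1], uv[valid, 0]]
    return valid & (d_meas < ratio * d_proj)          # dynamic if d' << d_proj

K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1]])
depth = np.full((480, 640), 3.0)          # flat wall 3 m away
depth[200:280, 300:380] = 1.0             # a person walked into view
pts = np.array([[0.0, 0.0, 3.0], [0.5, 0.3, 3.0]])   # keyframe points on the wall
print(dynamic_point_mask(pts, depth, np.eye(4), K))   # [ True False]
```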
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
Figures:
Figure 1: System framework of DyGS-SLAM. The tracking thread performs dynamic object removal and background inpainting; the mapping thread reconstructs the Gaussian map and performs differentiable rendering from determined poses and point clouds; the 3D scene is then optimized from the repaired and rendered frames.
Figure 2: Open-world semantic segmentation model.
Figure 3: RGB images from the TUM RGB-D dataset (frames 690 and 765); red boxes mark a chair being moved, which is semantically static but actually moving.
Figure 4: A keyframe feature point p projected onto the current frame as p', with O and O' the camera optical centers of the two frames: the point is static when d' = d_proj and dynamic when d' << d_proj.
Figure 5: Frame comparison between Dyna-SLAM and DyGS-SLAM (ours) after repair of the walking_halfsphere sequence (TUM RGB-D); red boxes compare how each method repairs the same frame.
Figure 6: Camera trajectories estimated by ORB-SLAM3 and DyGS-SLAM (ours) on the TUM dataset and their differences from ground truth.
Figure 7: Mapping comparison of NICE-SLAM, SplaTAM, and DyGS-SLAM (ours) on the walking_xyz sequence.
Figure 8: Detailed comparison of the original reconstructed scene provided by Bonn and the scene reconstructed by our method; red boxes mark reconstruction details.
Figure 9: Reconstruction comparison between SplaTAM and DyGS-SLAM (ours) on the Bonn dataset; our method shows better reconstruction quality.
Figure 10: Mapping comparison of NICE-SLAM, SplaTAM, DyGS-SLAM, and ground truth on the Replica dataset; red boxes mark details, and our method also reconstructs static scenes well.
Figure 11: Experimental results in real scenarios: input image, segmentation, background repair, and novel view synthesis; red boxes mark the recovered static background.
Figure 12: Effect of background inpainting on DyGS-SLAM scene reconstruction, with and without inpainting; red boxes mark the differences.
23 pages, 5392 KiB  
Article
A Sliding Window-Based CNN-BiGRU Approach for Human Skeletal Pose Estimation Using mmWave Radar
by Yuquan Luo, Yuqiang He, Yaxin Li, Huaiqiang Liu, Jun Wang and Fei Gao
Sensors 2025, 25(4), 1070; https://doi.org/10.3390/s25041070 - 11 Feb 2025
Viewed by 308
Abstract
In this paper, we present a low-cost, low-power millimeter-wave (mmWave) skeletal joint localization system. High-quality point cloud data are generated using the self-developed BHYY_MMW6044 59–64 GHz mmWave radar device. A sliding window mechanism is introduced to extend the single-frame point cloud into multi-frame time-series data, enabling the full utilization of temporal information. This is combined with convolutional neural networks (CNNs) for spatial feature extraction and a bidirectional gated recurrent unit (BiGRU) for temporal modeling. The proposed spatio-temporal information fusion framework for multi-frame point cloud data fully exploits spatio-temporal features, effectively alleviates the sparsity issue of radar point clouds, and significantly enhances the accuracy and robustness of pose estimation. Experimental results demonstrate that the proposed system accurately detects 25 skeletal joints, particularly improving the positioning accuracy of fine joints, such as the wrist, thumb, and fingertip, highlighting its potential for widespread application in human–computer interaction, intelligent monitoring, and motion analysis.
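A minimal sketch of the sliding-window step only: pad each sparse radar frame to a fixed point count and stack consecutive frames into overlapping clips, which would then feed the CNN (spatial) and BiGRU (temporal) stages. The window length, point count, and zero-padding scheme are assumptions, not the paper's settings.

```python
import numpy as np

def pad_frame(frame, n_points=64):
    """Zero-pad (or truncate) one sparse radar frame to a fixed number of points."""
    out = np.zeros((n_points, frame.shape[1]), dtype=frame.dtype)
    out[:min(len(frame), n_points)] = frame[:n_points]
    return out

def sliding_windows(frames, window=8, step=1):
    """Stack consecutive frames into overlapping clips of shape
    (window, n_points, n_features) so temporal context is preserved."""
    clips = [np.stack(frames[s:s + window])
             for s in range(0, len(frames) - window + 1, step)]
    return np.stack(clips)

rng = np.random.default_rng(1)
# Fake stream: 30 frames with a varying number of (x, y, z, doppler, intensity) points.
stream = [pad_frame(rng.normal(size=(rng.integers(10, 50), 5))) for _ in range(30)]
print(sliding_windows(stream).shape)   # (23, 8, 64, 5)
```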
(This article belongs to the Section Radar Sensors)
Figures:
Figure 1: FMCW radar transmit and receive waveforms.
Figure 2: FMCW radar RX antenna array and phase relationship.
Figure 3: List and locations of 25 skeletal points.
Figure 4: Overall flowchart of the human skeletal pose estimation system based on mmWave radar and CNN-BiGRU.
Figure 5: Multi-frame point cloud temporal modeling based on sliding windows.
Figure 6: Spatio-temporal information fusion network architecture based on CNN-BiGRU.
Figure 7: mmWave radar structure.
Figure 8: Experimental setup with one radar and one Kinect.
Figure 9: Experimental environment.
Figure 10: Average MAE for 25 human skeletal joints (MARS dataset).
Figure 11: Average RMSE for 25 human skeletal joints (MARS dataset).
Figure 12: Average MAE for 25 human skeletal joints (self-built dataset).
Figure 13: Average RMSE for 25 human skeletal joints (self-built dataset).
Figure 14: CNN-BiGRU reconstruction of human skeletal joints from point clouds: radar point cloud, CNN-BiGRU prediction, and ground truth (left to right) for five movements, top to bottom: left upper limb stretch, double upper limb stretch, left front lunge, right front lunge, and left lunge (self-built dataset).
Figure 15: Average localization error for 25 human skeletal joints under different step values.
20 pages, 3024 KiB  
Article
Building Lightweight 3D Indoor Models from Point Clouds with Enhanced Scene Understanding
by Minglei Li, Mingfan Li, Min Li and Leheng Xu
Remote Sens. 2025, 17(4), 596; https://doi.org/10.3390/rs17040596 - 10 Feb 2025
Viewed by 349
Abstract
Indoor scenes often contain complex layouts and interactions between objects, making 3D modeling of point clouds inherently difficult. In this paper, we design a divide-and-conquer modeling method that accounts for the structural differences between indoor walls and internal objects. To achieve semantic understanding, we propose an effective 3D instance segmentation module using a deep network, Indoor3DNet, combined with super-point clustering, which provides a larger receptive field and maintains the continuity of individual objects. Indoor3DNet includes an efficient point feature extraction backbone that handles different object granularities well. In addition, we use a geometric primitives-based modeling approach to generate lightweight polygonal facets for walls, and a cross-modal registration technique to fit the corresponding instance models for internal objects based on their semantic labels. This modeling method restores correct geometric shapes and topological relationships while maintaining a very lightweight structure. We have tested the method on diverse datasets, and the experimental results demonstrate that it outperforms the state-of-the-art in terms of performance and robustness.
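One plausible way to obtain planar primitives for the wall models is iterative RANSAC plane fitting; the sketch below uses Open3D's segment_plane purely as an illustration of that generic step (the paper's own primitive estimation is not reproduced here), with synthetic wall points and guessed thresholds.

```python
import numpy as np
import open3d as o3d

# Toy wall-like cloud: two perpendicular planes with a little measurement noise.
rng = np.random.default_rng(0)
wall_a = np.column_stack([rng.uniform(0, 5, 4000), np.zeros(4000), rng.uniform(0, 3, 4000)])
wall_b = np.column_stack([np.zeros(4000), rng.uniform(0, 5, 4000), rng.uniform(0, 3, 4000)])
pts = np.vstack([wall_a, wall_b]) + rng.normal(scale=0.01, size=(8000, 3))

pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))

# Peel off planar primitives one at a time with RANSAC; in a full pipeline each
# plane would then be trimmed to a lightweight polygonal facet.
remaining = pcd
for i in range(2):
    model, inliers = remaining.segment_plane(distance_threshold=0.03,
                                             ransac_n=3, num_iterations=1000)
    a, b, c, d = model
    print(f"plane {i}: {a:.2f}x + {b:.2f}y + {c:.2f}z + {d:.2f} = 0, {len(inliers)} inliers")
    remaining = remaining.select_by_index(inliers, invert=True)
```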
Figures:
Figure 1: Overview of the modeling method: the indoor parsing module uses a deeply supervised semantic labeling network and super-point information to predict per-point semantic labels and segment them into instances; the hybrid modeling module then estimates geometric primitives from the instances and reconstructs exterior walls and interior objects separately.
Figure 2: Workflow of the proposed Indoor3DNet.
Figure 3: The deeply supervised encoder-decoder network for multi-granular feature extraction.
Figure 4: Schematic representation of the marking and classification rules.
Figure 5: Visual result of the proposed indoor instance segmentation algorithm; different colors denote different instances.
Figure 6: Illustration of the hybrid modeling module.
Figure 7: Point labeling results on the S3DIS and UZH 3D datasets.
Figure 8: Reconstruction results on the S3DIS and UZH 3D datasets.
Figure 9: Confusion matrices on the S3DIS and UZH 3D datasets (NULL indicates instances that could not be reconstructed).
Figure 10: Point labeling results on the NUAA3D dataset.
Figure 11: Reconstruction results on the NUAA3D dataset.
Figure 12: Confusion matrix on the NUAA3D dataset (NULL indicates instances that could not be reconstructed).
Figure 13: Feature distribution before and after position encoding; arrows in (a) indicate inter-class overlap before position encoding.
21 pages, 16141 KiB  
Article
The Development of a Sorting System Based on Point Cloud Weight Estimation for Fattening Pigs
by Luo Liu, Yangsen Ou, Zhenan Zhao, Mingxia Shen, Ruqian Zhao and Longshen Liu
Agriculture 2025, 15(4), 365; https://doi.org/10.3390/agriculture15040365 - 8 Feb 2025
Viewed by 405
Abstract
As large-scale, intensive fattening pig farming has become mainstream, the increase in farm size has led to more severe issues related to the hierarchy within pig groups. Due to genetic differences among individual fattening pigs, those that grow faster enjoy a higher social rank. Larger pigs with greater aggression continuously acquire more resources, further restricting the survival space of weaker pigs. Therefore, fattening pigs must be grouped rationally, and the management of weaker pigs must be enhanced. Considering current fattening pig farming needs and actual production environments, this study designed and implemented an intelligent sorting system based on weight estimation. The main hardware of the partitioning equipment includes a collection channel, a partitioning channel, and gantry-style collection equipment. Experimental data were collected, and the original scene point cloud was preprocessed to extract the back point cloud of the fattening pigs. Based on the morphological characteristics of the pigs, a back point cloud segmentation method was used to automatically extract key features such as hip width, hip height, shoulder width, shoulder height, and body length. The segmentation algorithm first calculates the centroid of the point cloud and the eigenvectors of the covariance matrix to reconstruct the point cloud coordinate system; then, based on the variation characteristics and geometric shape of consecutive horizontal slices of the point cloud, hip-width and shoulder-width slices are extracted and the related features are calculated. Weight estimation was performed using Random Forest, multilayer perceptron (MLP), least-squares linear regression, and ridge regression models, with parameter tuning via Bayesian optimization. The mean squared error, mean absolute error, and mean relative error were used as evaluation metrics to assess model performance. Finally, the classification capability was evaluated using the median and average weights of the fattening pigs as partitioning standards. The experimental results show that the system's average relative error in weight estimation is approximately 2.90%, and the total time for the partitioning process is less than 15 s, which meets the needs of practical production.
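The coordinate-system reconstruction step (translate to the centroid, rotate onto the eigenvectors of the covariance matrix) and a simple slice-based width measurement could be sketched as below; the ellipsoidal toy cloud and the slice thickness are assumptions, not the paper's data or parameters.

```python
import numpy as np

def reorient_back_cloud(points):
    """Express the back point cloud in its principal axes: subtract the centroid,
    then rotate so the covariance eigenvectors (largest variance first) become
    the new x (body length), y (body width) and z axes."""
    centered = points - points.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))   # eigenvalues in ascending order
    return centered @ eigvecs[:, ::-1]                # largest-variance axis first

def slice_width(points, x_pos, thickness=0.02):
    """Width of a thin slice: the y-span of points whose x lies in the band."""
    band = points[np.abs(points[:, 0] - x_pos) < thickness / 2]
    return np.ptp(band[:, 1]) if len(band) else 0.0

rng = np.random.default_rng(0)
u, v = rng.uniform(0, np.pi, 5000), rng.uniform(0, 2 * np.pi, 5000)
back = np.column_stack([0.6 * np.cos(v) * np.sin(u),     # crude 1.2 m x 0.4 m "back"
                        0.2 * np.sin(v) * np.sin(u),
                        0.1 * np.cos(u)])
aligned = reorient_back_cloud(back)
print("body length ~", round(np.ptp(aligned[:, 0]), 2), "m")
print("width at mid-body ~", round(slice_width(aligned, 0.0), 2), "m")
```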
(This article belongs to the Special Issue Modeling of Livestock Breeding Environment and Animal Behavior)
Figures:
Figure 1: Experimental data collection device diagram.
Figure 2: Pass-through filtering.
Figure 3: Pig back point cloud division.
Figure 4: Coordinate system reconstruction result of the point cloud.
Figure 5: x-axis span of a slice.
Figure 6: Schematic diagram of the operation of the column equipment.
Figure 7: A 3D diagram of the column device.
Figure 8: Hardware connection diagram.
Figure 9: Relationship between 'eps' and 'min_points' and the number of running hours and categories.
Figure 10: DBSCAN clustering for different 'eps' and 'min_points' values.
Figure 11: DBSCAN clustering and voxel downsampling effect.
Figure 12: Scatter plot of redundant and normal samples.
Figure 13: Model test results and error comparison.
Figure 14: Operation display of the sorting equipment and system platform.
19 pages, 3685 KiB  
Article
Semantic Segmentation of Key Categories in Transmission Line Corridor Point Clouds Based on EMAFL-PTv3
by Li Lu, Linong Wang, Shaocheng Wu, Shengxuan Zu, Yuhao Ai and Bin Song
Electronics 2025, 14(4), 650; https://doi.org/10.3390/electronics14040650 - 8 Feb 2025
Viewed by 373
Abstract
Accurate and efficient segmentation of key categories of transmission line corridor point clouds is one of the prerequisite technologies for drone-based transmission line inspection. However, current semantic segmentation methods are limited to a few categories, involve cumbersome processes, and exhibit low accuracy. To address these issues, this paper proposes EMAFL-PTv3, a deep learning model for semantic segmentation of transmission line corridor point clouds. Built upon Point Transformer v3 (PTv3), EMAFL-PTv3 integrates Efficient Multi-Scale Attention (EMA) to enhance feature extraction at different scales, incorporates Focal Loss to mitigate class imbalance, and achieves accurate segmentation into five categories: ground, ground wire, insulator string, pylon, and transmission line. EMAFL-PTv3 is evaluated on a dataset of 40 spans of transmission line corridor point clouds collected by a drone in Wuhan and Xiangyang, Hubei Province. Experimental results demonstrate that EMAFL-PTv3 outperforms PTv3 in all categories, with notable improvements in the more challenging categories, insulator string (IoU 67.25%) and pylon (IoU 91.77%), increases of 7.06% and 11.39%, respectively. The mIoU, mA, and OA scores reach 90.46%, 92.86%, and 98.07%, increases of 5.49%, 2.75%, and 2.44% over PTv3, respectively, confirming its superior performance.
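Focal Loss is the class-imbalance remedy named above; a compact multi-class version is sketched below in PyTorch, with the focusing parameter gamma and the per-class weights chosen arbitrarily rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Multi-class focal loss: down-weight well-classified points so rare classes
    (e.g. insulator strings) contribute more to the gradient.
    logits: (N, C) per-point class scores; target: (N,) integer labels."""
    log_p = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_p, target, weight=alpha, reduction="none")
    p_t = log_p.gather(1, target.unsqueeze(1)).squeeze(1).exp()   # prob. of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

logits = torch.randn(1000, 5)                        # 1000 points, 5 corridor classes
labels = torch.randint(0, 5, (1000,))
weights = torch.tensor([1.0, 1.0, 4.0, 1.0, 1.0])    # extra weight for a rare class
print(focal_loss(logits, labels, gamma=2.0, alpha=weights))
```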
Figures:
Figure 1: The overall framework of EMAFL-PTv3.
Figure 2: The structure of the EMA module.
Figure 3: Comparison of point cloud segmentation results for Span I across models: original point cloud, ground truth, and segmentation results of EMAFL-PTv3 (ours), PTv3 (baseline), PTv2, PTv1, PointNet++, and PointNet, with zoomed-in comparisons of the insulator string, pylon, and transmission line intersections for ground truth, EMAFL-PTv3, and PTv3; red circles highlight typical segmentation imperfections.
23 pages, 4583 KiB  
Article
Research on Fine-Scale Terrain Construction in High Vegetation Coverage Areas Based on Implicit Neural Representations
by Yi Zhang, Peipei He, Haihang Jing, Bin He, Weibo Yin, Junzhen Meng, Yuntian Ma, Haifeng Zhang, Bo Zhang and Haoxiang Shen
Sustainability 2025, 17(3), 1320; https://doi.org/10.3390/su17031320 - 6 Feb 2025
Viewed by 429
Abstract
Due to high-density vegetation coverage, complex terrain, and occlusion, ground point extraction faces significant challenges, and Airborne Light Detection and Ranging (LiDAR) technology plays a crucial role in complex mountainous areas. This article proposes a method for constructing fine-scale terrain in areas of high vegetation coverage based on implicit neural representation. The method consists of data preprocessing, multi-scale and multi-feature high-difference point cloud initial filtering, and an upsampling module based on implicit neural representation. First, the regional point cloud data are preprocessed; then, K-dimensional trees (K-d trees) are used to construct spatial indexes, and spherical neighborhood methods are applied to capture the geometric and physical information of the point cloud for multi-feature fusion, enhancing the distinction between terrain and non-terrain elements. Subsequently, a differential model is constructed from DSMs (Digital Surface Models) at different scales, and the elevation variation coefficient is calculated to determine the threshold for extracting the initial set of ground points. Finally, the upsampling module based on implicit neural representation refines the initial ground point set, providing a complete and uniformly dense ground point set for the subsequent construction of fine-scale terrain. To validate the performance of the proposed method, three sets of point cloud data from mountainous terrain with different features are selected as the experimental areas. The experimental results indicate that, qualitatively, the proposed method significantly improves the classification of vegetation, buildings, and roads, with clear boundaries between different types of terrain. Quantitatively, the Type I errors of the three selected regions are 4.3445%, 5.0623%, and 5.9436%; the Type II errors are 5.7827%, 6.8516%, and 7.3478%; and the overall errors are 5.3361%, 6.4882%, and 6.7168%, respectively. The Kappa coefficients of all measurement areas exceed 80%, indicating that the proposed method performs well in complex mountainous environments. The method provides point cloud data support for the construction of wind and photovoltaic bases in China, reduces potential damage to the ecological environment caused by construction activities, and contributes to the sustainable development of ecology and energy.
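A rough sketch of the k-d tree / spherical-neighbourhood step only: for every point, the spread of elevations inside a fixed-radius neighbourhood serves as a crude stand-in for the paper's elevation variation coefficient, and a threshold keeps the flatter points as the initial ground set. The radius, threshold, and the synthetic slope-plus-canopy cloud are all assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def neighbourhood_roughness(points, radius=2.0):
    """Std. dev. of elevation inside each point's neighbourhood (cylindrical in x-y
    here), found via a k-d tree; low values indicate likely ground."""
    tree = cKDTree(points[:, :2])                 # index the planimetric coordinates
    rough = np.empty(len(points))
    for i, nbrs in enumerate(tree.query_ball_point(points[:, :2], r=radius)):
        rough[i] = points[nbrs, 2].std()
    return rough

rng = np.random.default_rng(0)
xy = rng.uniform(0, 50, (4000, 2))
z = 0.02 * xy[:, 0] + rng.normal(scale=0.05, size=4000)      # gently sloping ground
canopy = xy[:, 0] > 30                                       # a vegetated patch
z[canopy] += rng.uniform(2, 8, canopy.sum())                 # returns from the canopy
cloud = np.column_stack([xy, z])

rough = neighbourhood_roughness(cloud)
ground_mask = rough < 0.3          # illustrative threshold
print("points kept in the initial ground set:", int(ground_mask.sum()))
```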
Figures:
Figure 1: Overview of the location of the wind and photovoltaic project in the experimental area.
Figure 2: Flowchart of the fine point cloud filtering method for dense vegetation coverage in complex mountainous areas.
Figure 3: Multi-feature neighborhood construction model: the K-d tree recursively splits the cube into two, four, and eight subspaces (the final eight subspaces are leaf nodes); in the spherical neighborhood, the black dot is the current point, blue dots are points within its neighborhood, and the remaining points are terrain points in the previous point's neighborhood.
Figure 4: Application of the implicit neural representation upsampling module to point clouds of complex mountainous terrain.
Figure 5: Results obtained with different upsampling scales for the same input.
Figure 6: 4x upsampling point cloud data results.
Figure 7: Results of processing the point cloud data of Area c.
Figure 8: DEM of the complex mountainous terrain generated by the proposed method.
Figure 9: Point cloud image and DEM for Area b.
Figure 10: Maps of Areas c, d, and e, along with their corresponding DEMs.
21 pages, 6413 KiB  
Article
Targetless Radar–Camera Extrinsic Parameter Calibration Using Track-to-Track Association
by Xinyu Liu, Zhenmiao Deng and Gui Zhang
Sensors 2025, 25(3), 949; https://doi.org/10.3390/s25030949 - 5 Feb 2025
Viewed by 436
Abstract
One of the challenges in calibrating millimeter-wave radar and camera lies in the sparse semantic information of the radar point cloud, making it hard to extract environment features corresponding to the images. To overcome this problem, we propose a track association algorithm for heterogeneous sensors, to achieve targetless calibration between the radar and camera. Our algorithm extracts corresponding points from millimeter-wave radar and image coordinate systems by considering the association of tracks from different sensors, without any explicit target or prior for the extrinsic parameter. Then, perspective-n-point (PnP) and nonlinear optimization algorithms are applied to obtain the extrinsic parameter. In an outdoor experiment, our algorithm achieved a track association accuracy of 96.43% and an average reprojection error of 2.6649 pixels. On the CARRADA dataset, our calibration method yielded a reprojection error of 3.1613 pixels, an average rotation error of 0.8141°, and an average translation error of 0.0754 m. Furthermore, robustness tests demonstrated the effectiveness of our calibration algorithm in the presence of noise.
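After track association yields matched radar-image point pairs, the extrinsic estimation reduces to a PnP problem followed by refinement; the OpenCV sketch below shows only that final step, with correspondences simulated from an assumed ground-truth pose (the intrinsics, pose, and noise level are illustrative, not from the paper).

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)
# Hypothetical radar target positions (metres, radar frame) recovered from tracks.
radar_pts = np.array([[ 5.0, -2.0, 0.0], [ 8.0, 1.5, 0.2], [12.0, -0.5, 0.1],
                      [15.0,  3.0, 0.0], [20.0, -4.0, 0.3], [25.0, 2.0, 0.0]])
K = np.array([[900.0, 0.0, 640.0], [0.0, 900.0, 360.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)                       # assume an undistorted camera

# Simulate the matched pixel tracks from a known extrinsic (roughly "radar x-forward
# to camera z-forward") plus pixel noise, just to make the example self-consistent.
rvec_gt, tvec_gt = np.array([[1.2], [-1.2], [1.2]]), np.array([[0.1], [0.3], [0.5]])
pixel_pts, _ = cv2.projectPoints(radar_pts, rvec_gt, tvec_gt, K, dist)
pixel_pts = pixel_pts.reshape(-1, 2) + rng.normal(scale=0.5, size=(6, 2))

ok, rvec, tvec = cv2.solvePnP(radar_pts, pixel_pts, K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
rvec, tvec = cv2.solvePnPRefineLM(radar_pts, pixel_pts, K, dist, rvec, tvec)

proj, _ = cv2.projectPoints(radar_pts, rvec, tvec, K, dist)
err = np.linalg.norm(proj.reshape(-1, 2) - pixel_pts, axis=1).mean()
print("estimated translation (m):", tvec.ravel())
print("mean reprojection error (px):", round(float(err), 3))
```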
(This article belongs to the Section Remote Sensors)
Figures:
Figure 1: Application scenarios for the algorithm.
Figure 2: Flowchart of the proposed algorithm: the radar detector includes moving target indication (MTI), constant false alarm rate (CFAR), and fast Fourier transform (FFT) stages; the video detector uses YOLOv5 and keeps only vehicle and person targets; multiple object tracking (MOT) yields target tracks from the raw data; after time synchronization and track association, track pairs in the radar and pixel coordinate systems are obtained, and the extrinsic parameters are estimated with PnP and nonlinear optimization.
Figure 3: Radar signal processing pipeline.
Figure 4: Radar data after MTI and 2D-FFT, collected in the real world; three moving individuals correspond to the three marked peaks.
Figure 5: Flowchart of radar target tracking.
Figure 6: Result of radar target tracking.
Figure 7: Tracking scenario with multiple pedestrian targets being tracked.
Figure 8: Schematic diagram of temporal synchronization; solid lines represent radar data frames and dashed lines represent video data frames.
Figure 9: Illustration of the cost matrix M; red cells mark the minimum of each row, i.e., the radar track with the minimum association cost for each video track. The correct track pairs are (Tc1, Tr1), (Tc2, Tr3), (Tc3, Tr2), and (Tc4, Tr5).
Figure 10: Experimental setup.
Figure 11: Projection of radar points.
Figure 12: Rotation and translation error on the CARRADA dataset; box ends are the upper and lower quartiles, the red line is the median, plus signs are outliers, and the dashed whiskers span the normal range.
Figure 13: Reprojection error of the proposed algorithm and single-target calibration on the CARRADA dataset.
Figure 14: Reprojection error of approaches A and B on the CARRADA dataset.
Figure 15: The scenario of the CARRADA dataset.
Figure 16: Rotation and translation error as a function of noise.
Figure 17: Reprojection error as a function of noise.
38 pages, 14791 KiB  
Article
Online High-Definition Map Construction for Autonomous Vehicles: A Comprehensive Survey
by Hongyu Lyu, Julie Stephany Berrio Perez, Yaoqi Huang, Kunming Li, Mao Shan and Stewart Worrall
J. Sens. Actuator Netw. 2025, 14(1), 15; https://doi.org/10.3390/jsan14010015 - 2 Feb 2025
Viewed by 558
Abstract
High-definition (HD) maps aim to provide detailed road information with centimeter-level accuracy, essential for enabling precise navigation and safe operation of autonomous vehicles (AVs). Traditional offline construction methods involve several complex steps, such as data collection, point cloud generation, and feature extraction, but these methods are resource-intensive and struggle to keep pace with the rapidly changing road environments. In contrast, online HD map construction leverages onboard sensor data to dynamically generate local HD maps, offering a bird’s-eye view (BEV) representation of the surrounding road environment. This approach has the potential to improve adaptability to spatial and temporal changes in road conditions while enhancing cost-efficiency by reducing the dependency on frequent map updates and expensive survey fleets. This survey provides a comprehensive analysis of online HD map construction, including the task background, high-level motivations, research methodology, key advancements, existing challenges, and future trends. We systematically review the latest advancements in three key sub-tasks: map segmentation, map element detection, and lane graph construction, aiming to bridge gaps in the current literature. We also discuss existing challenges and future trends, covering standardized map representation design, multitask learning, and multi-modality fusion, while offering suggestions for potential improvements.
(This article belongs to the Special Issue Advances in Intelligent Transportation Systems (ITS))
Figures:
Figure 1: Structure of this survey.
Figure 2: Pipeline of the research methodology.
Figure 3: Comparison of the VT module in two projection-based map segmentation methods: (a) Simple-BEV [53] projects voxel grid points onto feature maps and uses bilinear sampling to extract features for constructing 3D voxel features; (b) Ego3RT [46] projects polarized grid queries onto feature maps and uses attention to extract features for constructing 3D voxel features.
Figure 4: Comparison of the VT module in two lift-based map segmentation methods: (a) PON [38] uses an MLP to expand bottleneck features along the depth axis; (b) LSS [39] uses a CNN to predict pixel-wise depth probability distributions.
Figure 5: Comparison of the VT module in two network-based map segmentation methods: (a) PYVA [42] uses two MLPs for bidirectional projection of feature maps between pixel space and BEV space; (b) BEVSegFormer [51] uses deformable cross-attention [102] to predict 2D reference points for sampling feature maps to refine BEV queries.
Figure 6: Comparison of the MD module in two CNN-based map element detection methods: (a) HDMapNet [21] uses an FCN [109] to decode semantic, instance, and direction masks that are post-processed into vectorized representations; (b) InstaGraM [59] uses two CNNs to detect vertices and edges, then an attentional GNN to associate the vertices, generating vectorized representations end-to-end.
Figure 7: Comparison of the pipelines of two Transformer-based map element detection methods: (a) MapTR [56] uses a single-stage DETR-like Transformer [110] for parallel decoding of ordered point sequences for map elements; (b) MGMap [67] uses instance masks to enhance element queries for precise localization and mask patches to refine point position predictions.
Figure 8: Comparison of temporal fusion (short-term and long-term) in two Transformer-based map element detection methods: (a) StreamMapNet [63] aligns and fuses BEV features from consecutive frames and propagates high-confidence element queries to the next frame; (b) HRMapNet [70] fuses BEV features with rasterized map features to enrich information and rasterizes vectorized map predictions to maintain a global historical map.
Figure 9: Comparison of the pipelines for two single-step-based lane graph construction methods: (a) TopoMLP [87] uses two Transformers for lane and traffic element queries, followed by MLPs to predict the topological relationships between paired queries; (b) TPLR [20] uses a Transformer to process lane and minimal cycle queries simultaneously, followed by joint decoding of the lane graph and the cover of minimal cycles.
Figure 10: Comparison of the TR module in two iteration-based lane graph construction methods: (a) TopoNet [85] uses two Transformers for lane and traffic element queries, followed by a GCN for iterative message passing and feature updating; (b) RoadNetTransformer [84] (semi-autoregressive) first predicts lane key points in parallel and then autoregressively generates local sequences for lane graphs.
Figure 11: Comparison of the lane segment representation [86] with two alternative map representations.
Figure 12: Comparison of uncertainty-based map representations [116] integrated into various online HD map construction methods: ground truth, MapTR [56], MapTRv2 [76], MapTRv2-CL [76], and StreamMapNet [63].
Figure 13: Comparison of the MTL pipeline in two online HD map construction methods: (a) BEVerse [50] presents a unified framework for map segmentation, 3D object detection, and motion prediction; (b) BeMapNet [57] presents a unified framework for map segmentation, map element detection, and instance segmentation.
Figure 14: Comparison of the MMF pipeline in two online HD map construction methods: (a) BEVFusion [52] fuses camera and LiDAR features in a unified BEV space; (b) NMP [58] fuses BEV features with neural map priors from previous traversals.
19 pages, 11928 KiB  
Article
Point Cloud Vibration Compensation Algorithm Based on an Improved Gaussian–Laplacian Filter
by Wanhe Du, Xianfeng Yang and Jinghui Yang
Electronics 2025, 14(3), 573; https://doi.org/10.3390/electronics14030573 - 31 Jan 2025
Viewed by 491
Abstract
In industrial environments, steel plate surface inspection plays a crucial role in quality control. However, vibrations during laser scanning can significantly impact measurement accuracy. While traditional vibration compensation methods rely on complex dynamic modeling, they often face challenges in practical implementation and generalization. This paper introduces a novel point cloud vibration compensation algorithm that combines an improved Gaussian–Laplacian filter with adaptive local feature analysis. The key innovations include (1) an FFT-based vibration factor extraction method that effectively identifies vibration trends, (2) an adaptive windowing strategy that automatically adjusts based on local geometric features, and (3) a weighted compensation mechanism that preserves surface details while reducing vibration noise. The algorithm demonstrated significant improvements in signal-to-noise ratio: 15.78% for simulated data, 6.81% for precision standard parts, and 12.24% for actual industrial measurements. Experimental validation confirms the algorithm's effectiveness across different conditions, yielding a practical, implementable solution for steel plate surface inspection.
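A minimal sketch of the FFT-based vibration-trend idea only: keep the lowest-frequency bins of a single laser line, invert the transform to get a trend, and subtract it. The bin count, the synthetic surface and vibration frequencies, and the plain subtraction (standing in for the paper's adaptive, weighted compensation) are all assumptions.

```python
import numpy as np

def vibration_trend(z, keep_bins=6):
    """Low-frequency trend of one laser line: keep only the DC and lowest FFT bins,
    then transform back; the high-frequency surface texture is left untouched."""
    spectrum = np.fft.rfft(z)
    filtered = np.zeros_like(spectrum)
    filtered[:keep_bins] = spectrum[:keep_bins]
    return np.fft.irfft(filtered, n=len(z))

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 2000, endpoint=False)
surface = 0.02 * np.sin(2 * np.pi * 40 * x)           # genuine high-frequency texture
vibration = 0.5 * np.sin(2 * np.pi * 2 * x + 0.3)     # low-frequency machine vibration
z = surface + vibration + rng.normal(scale=0.005, size=x.size)

trend = vibration_trend(z)
compensated = z - (trend - trend.mean())              # remove the trend, keep the mean level

rms_before = np.sqrt(np.mean((z - z.mean() - surface) ** 2))
rms_after = np.sqrt(np.mean((compensated - compensated.mean() - surface) ** 2))
print(f"RMS deviation from the true surface: {rms_before:.4f} -> {rms_after:.4f}")
```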
Figures:
Figure 1: Algorithm flowchart.
Figure 2: Overall approach diagram (standard bearing diameter 129.991 ± 0.001 mm; the solid arrow indicates processing with the proposed algorithm).
Figure 3: Original data, trend plot, and comparison before and after vibration compensation for a random laser line of simulated data.
Figure 4: Differentials, variance, curvature, and features before and after vibration compensation for a random laser line of simulated data.
Figure 5: Frequency-power spectral density before and after vibration compensation for a random laser line of simulated data.
Figure 6: On-site experimental setup.
Figure 7: Experimental workflow diagram.
Figure 8: Point cloud registration results and local magnification.
Figure 9: Original data, trend plot, and comparison before and after vibration compensation for a random laser line of the standard bearing data.
Figure 10: Differentials, variance, curvature, and features before and after vibration compensation for a random laser line of standard bearing data.
Figure 11: Frequency-power spectral density before and after vibration compensation for a random laser line of standard bearing data.
Figure 12: Original data, trend plot, and comparison before and after vibration compensation for a random laser line of actual plane data.
Figure 13: Differentials, variance, curvature, and features before and after vibration compensation for a random laser line of actual plane data.
Figure 14: Frequency-power spectral density before and after vibration compensation for a random laser line of actual plane data.
Figure 15: Comparison before and after Gaussian smoothing only, for a random laser line of actual plane data.
Figure 16: Comparison before and after the Laplacian operator only, for a random laser line of actual plane data.
Figure 17: Comparison before and after the improved Gaussian-Laplacian filter for a random laser line of actual plane data.
Figure 18: Point cloud comparison and local magnification before and after vibration compensation of actual plane data.
22 pages, 807 KiB  
Article
Fusing Skeleton-Based Scene Flow for Gesture Recognition on Point Clouds
by Yahui Liu and Jiajia Jiao
Electronics 2025, 14(3), 567; https://doi.org/10.3390/electronics14030567 - 31 Jan 2025
Viewed by 441
Abstract
Dynamic gesture recognition has recently aimed to learn static and motion features by exploiting point clouds from depth images. However, the weak correlation between some pixels and hand gestures makes the extracted dynamic features redundant, and when search points and adjacent points in a larger feature space maintain movement consistency, more detailed movements are ignored. To better capture fine-grained dynamic features and strengthen the relevance of point clouds to gestures, we propose a novel method that fuses skeleton-based scene flow for gesture recognition (FSS-GR) to achieve higher recognition accuracy. Firstly, skeletons are automatically converted into pairs of point clouds. Based on the time interval between the source and target point clouds and on scene flow measurement indicators, four scene flow estimators are obtained. To minimize the additional cost of capturing fine-grained information, the scene flow is prepared as datasets before fusion. Then, the coarse-grained dynamic features from depth images are fused with the obtained scene flow using different strategies, so that flexible tradeoffs between model complexity and recognition performance are available for various scenarios. Comprehensive experiments and an ablation study on SHREC’17 and DHG demonstrate that FSS-GR achieves higher accuracy than state-of-the-art works. Full article
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)
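As a rough illustration of the pairing and fusion steps described in the abstract, the following Python sketch builds (source, target) skeleton point-cloud pairs separated by a frame interval Δt, estimates a crude per-point flow, and late-fuses class scores from two branches. The nearest-neighbour flow, the mixing ratio `alpha`, and all function names are hypothetical stand-ins, not the FSS-GR estimators or fusion strategies themselves.

```python
import numpy as np

def make_flow_pairs(skeleton_seq, delta_t=1):
    """Turn a skeleton sequence (T frames x J joints x 3 coords) into
    (source, target) point-cloud pairs separated by delta_t frames."""
    pairs = []
    for t in range(len(skeleton_seq) - delta_t):
        source = skeleton_seq[t]            # (J, 3) joints as a sparse point cloud
        target = skeleton_seq[t + delta_t]  # (J, 3) joints delta_t frames later
        pairs.append((source, target))
    return pairs

def nearest_neighbor_flow(source, target):
    """Crude per-point scene flow: displacement to the nearest target point.
    A placeholder for a learned scene-flow estimator, purely for illustration."""
    dists = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)  # (J, J)
    nearest = dists.argmin(axis=1)
    return target[nearest] - source          # (J, 3) flow vectors

def late_fuse(coarse_logits, flow_logits, alpha=0.7):
    """Weighted late fusion of class scores from the depth-image branch and
    the scene-flow branch; alpha is an assumed mixing ratio."""
    return alpha * coarse_logits + (1.0 - alpha) * flow_logits

# usage on random stand-in data: 32 frames, 22 joints, 14 gesture classes
seq = np.random.randn(32, 22, 3)
flows = [nearest_neighbor_flow(s, t) for s, t in make_flow_pairs(seq, delta_t=2)]
fused = late_fuse(np.random.randn(14), np.random.randn(14))
predicted_class = int(fused.argmax())
```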
Figures:
Figure 1: The basic idea of FSS-GR. (a) Multi-stream FSS-GR; (b) two-stream FSS-GR. Compared with the general point-cloud-based framework, FSS-GR not only takes advantage of both the depth image and the skeleton, but also integrates 3D motion features at different levels of granularity. Most works learn the dynamic features of a search point in Neighborhood 1; our work fuses the fine-grained features represented by scene flow in the smaller Neighborhood 2. Δt_c is the frame time interval between search points and adjacent points during the learning of coarse-grained features; Δt is the frame time interval between search (source) points and adjacent (target) points during the extraction of fine-grained features.
Figure 2: A pair of point clouds consists of a source point cloud and a target point cloud. There are three ways to choose the target point cloud based on Δt, the frame time interval between the source and target point clouds; k is the average number of frames in every grouped gesture. When Δt is set to Δt_1, the scene flow of the search point (red) is learned from points (blue) in S_(t). When Δt is set to Δt_2, the scene flow of the search point (red) is learned from points (green) in S_(t+1) and points in X^t. When Δt is set to Δt_3, the scene flow of the search point (red) is learned from points (purple) in S_(t+2) and points in X^t.
Figure 3: Multi-stream FSS-GR framework. In terms of input modality, the inputs to the static and dynamic branches are depth images, while the initial input modality for scene flow is the skeleton; the input of this branch is the scene flow dataset described in Section 3.1. CNN is a 1 × 1 convolution; FC is the fully connected layer.
Figure 4: Two-stream FSS-GR framework. Two-stream FSS-GR consists of a static branch that extracts spatial features and a dynamic branch that captures dynamic features; scene flow supplements the learning of normal vectors in the dynamic branch. In T-FSS-GR, the inputs of the SA module are 3D static features and multi-dimensional normal vectors that fuse fine-grained scene flow. f′_sst (brown) is the static feature that learns scene flow; n′_st (pink) is the dynamic feature that learns scene flow.
Figure 5: Performance comparison (%) of multi-stream FSS-GR under different scene flow branch ratios on SHREC’17: (a) M-FSS-GR; (b) M-FSS-GR’. The x axis is the ratio of the scene flow branch and the y axis is the total accuracy (%).
39 pages, 4315 KiB  
Review
A Review of Embodied Grasping
by Jianghao Sun, Pengjun Mao, Lingju Kong and Jun Wang
Sensors 2025, 25(3), 852; https://doi.org/10.3390/s25030852 - 30 Jan 2025
Viewed by 560
Abstract
Pre-trained models trained on internet-scale data have achieved significant improvements in perception, interaction, and reasoning, and using them as the basis of embodied grasping methods has greatly promoted the development of robotics applications. In this paper, we provide a comprehensive review of the latest developments in this field. First, we summarize the embodied foundations, including cutting-edge embodied robots, simulation platforms, publicly available datasets, and data acquisition methods, to fully understand the research focus. Then, the embodied algorithms are introduced, starting from pre-trained models, with three main research goals: (1) embodied perception, where data captured by visual sensors are used for point cloud extraction or 3D reconstruction and combined with pre-trained models to understand the target object and external environment and directly predict the actions to execute; (2) embodied strategy, where in imitation learning the pre-trained model is used to augment data or as a feature extractor to enhance the generalization ability of the model, and in reinforcement learning it is used to obtain the optimal reward function, improving learning efficiency and capability; (3) embodied agent, where the pre-trained model adopts hierarchical or holistic execution to achieve end-to-end robot control. Finally, the challenges of the current research are summarized, and a perspective on feasible technical routes is provided. Full article
(This article belongs to the Section Sensors and Robotics)
Figures:
Figure 1: Main organizational framework of this article.
Figure 2: Embodied-foundation content.
Figure 3: Three-dimensional feature framework.
Figure 4: Three-dimensional scene reconstruction framework.
Figure 5: Data augmentation framework and application.
Figure 6: Feature extractor framework.
Figure 7: Reward function calculation framework.
Figure 8: Low-level control strategy and skills library framework.
Figure 9: Classic framework for the comprehensive implementation of the three methods.
17 pages, 3362 KiB  
Article
Truck Lifting Accident Detection Method Based on Improved PointNet++ for Container Terminals
by Yang Shen, Xintai Man, Jiaqi Wang, Yujie Zhang and Chao Mi
J. Mar. Sci. Eng. 2025, 13(2), 256; https://doi.org/10.3390/jmse13020256 - 30 Jan 2025
Viewed by 460
Abstract
In container terminal operations, truck lifting accidents pose a serious threat to the safety and efficiency of automated equipment. Traditional detection methods using visual cameras and single-line Light Detection and Ranging (LiDAR) are insufficient for capturing three-dimensional spatial features, leading to reduced detection accuracy. Moreover, the boundary features of key accident objects, such as containers, truck chassis, and wheels, are often blurred, resulting in frequent false and missed detections. To tackle these challenges, this paper proposes an accident detection method based on multi-line LiDAR and an improved PointNet++ model. This method uses multi-line LiDAR to collect point cloud data from operational lanes in real time and enhances the PointNet++ model by integrating a multi-layer perceptron (MLP) and a mixed attention mechanism (MAM), optimizing the model’s ability to extract local and global features. This results in high-precision semantic segmentation and accident detection of critical structural point clouds, such as containers, truck chassis, and wheels. Experiments confirm that the proposed method achieves superior performance compared to the current mainstream algorithms regarding point cloud segmentation accuracy and stability. In engineering tests across various real-world conditions, the model exhibits strong generalization capability. Full article
(This article belongs to the Special Issue Sustainable Maritime Transport and Port Intelligence)
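The improved PointNet++ above augments set-abstraction features with an MLP and a mixed attention mechanism combining channel attention and self-attention (see Figures 6–8 below). The following PyTorch sketch shows one plausible shape for such a mixed attention block over per-point features; the module name, layer sizes, and the residual combination are assumptions for illustration rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    """Channel attention (squeeze-and-excitation style) followed by point-wise
    self-attention over a set of per-point features (B, N, C). Illustrative only."""
    def __init__(self, channels: int, reduction: int = 4, heads: int = 4):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.self_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_points, channels) per-point features from set abstraction
        gate = self.channel_gate(feats.mean(dim=1))          # (B, C) channel weights
        feats = feats * gate.unsqueeze(1)                    # re-weight feature channels
        attn_out, _ = self.self_attn(feats, feats, feats)    # global point-to-point context
        return self.norm(feats + attn_out)                   # residual connection

# usage: 1024 points with 128-dimensional features
x = torch.randn(2, 1024, 128)
out = MixedAttention(128)(x)
print(out.shape)  # torch.Size([2, 1024, 128])
```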
Figures:
Figure 1: Nine different types of truck lifting accidents.
Figure 2: LiDAR installation and data collection. (a) LiDAR installation; (b) a point cloud of a container being lifted, as captured by the LiDAR.
Figure 3: Truck lifting accident detection process.
Figure 4: Improved PointNet++ network structure.
Figure 5: Multi-layer perceptron.
Figure 6: Feature extraction module based on the mixed attention mechanism.
Figure 7: Channel attention mechanism.
Figure 8: Self-attention mechanism.
Figure 9: Point cloud dataset collection process. (a) Collection for a truck lifting accident involving a 20-foot container; (b) collection for a truck lifting accident involving a 40-foot container.
Figure 10: Visualization of point cloud segmentation results. (a) A container remaining stationary without being lifted; (b) a container being lifted successfully under normal conditions; (c) a lifting accident with the front lock engaged; (d) a lifting accident with the rear lock engaged.