Search Results (804)

Search Parameters:
Keywords = point cloud feature extraction

13 pages, 2762 KiB  
Article
Advanced Point Cloud Techniques for Improved 3D Object Detection: A Study on DBSCAN, Attention, and Downsampling
by Wenqiang Zhang, Xiang Dong, Jingjing Cheng and Shuo Wang
World Electr. Veh. J. 2024, 15(11), 527; https://doi.org/10.3390/wevj15110527 - 15 Nov 2024
Viewed by 172
Abstract
To address the challenges of limited detection precision and insufficient segmentation of small to medium-sized objects in dynamic and complex scenarios, such as the dense intermingling of pedestrians, vehicles, and various obstacles in urban environments, we propose an enhanced methodology. Firstly, we integrated a point cloud processing module utilizing the DBSCAN clustering algorithm to effectively segment and extract critical features from the point cloud data. Secondly, we introduced a fusion attention mechanism that significantly improves the network’s capability to capture both global and local features, thereby enhancing object detection performance in complex environments. Finally, we incorporated a CSPNet downsampling module, which substantially boosts the network’s overall performance and processing speed while reducing computational costs through advanced feature map segmentation and fusion techniques. The proposed method was evaluated using the KITTI dataset. Under moderate difficulty, the BEV mAP for detecting cars, pedestrians, and cyclists achieved 87.74%, 55.07%, and 67.78%, reflecting improvements of 1.64%, 5.84%, and 5.53% over PointPillars. For 3D mAP, the detection accuracy for cars, pedestrians, and cyclists reached 77.90%, 49.22%, and 62.10%, with improvements of 2.91%, 5.69%, and 3.03% compared to PointPillars.
(This article belongs to the Special Issue Recent Advances in Intelligent Vehicle)
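The abstract above describes a DBSCAN-based point cloud processing module used to segment objects before feature extraction. As a rough illustration of that step only (not the authors' implementation), the sketch below clusters a raw point cloud with scikit-learn's DBSCAN; the eps and min_samples values are placeholders.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_point_cloud(points, eps=0.5, min_samples=10):
    """Cluster raw LiDAR points (N, 3) with DBSCAN.

    Returns one label per point; -1 marks noise points that can be
    discarded before feature extraction. eps and min_samples are
    illustrative values, not the paper's settings.
    """
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)

# Toy usage: two well-separated blobs plus a few stray points.
rng = np.random.default_rng(0)
cloud = np.vstack([
    rng.normal([0.0, 0.0, 0.0], 0.2, size=(100, 3)),  # object 1
    rng.normal([5.0, 5.0, 0.0], 0.2, size=(100, 3)),  # object 2
    rng.uniform(-10, 10, size=(5, 3)),                # sparse noise
])
labels = cluster_point_cloud(cloud)
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
```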
Figures
Figure 1: PointPillars network architecture.
Figure 2: Comparison of point cloud before and after processing.
Figure 3: Feature extraction incorporating the attention mechanism.
Figure 4: Flowchart of CSPNet network.
Figure 5: Comparison of the results of PointPillars with the algorithm of this paper. The left part of each scene is the result of the baseline, and the right part is the result of the proposed approach. (a,d) show improvements for false detections caused by under-segmentation of small objects, while (b,c) show improvements for missed detections caused by occlusion.
17 pages, 7758 KiB  
Article
An Autotuning Hybrid Method with Bayesian Optimization for Road Edge Extraction in Highway Systems from Point Clouds
by Jingxu Chen, Qiru Cao, Mingzhuang Hua, Jinyang Liu, Jie Ma, Di Wang and Aoxiang Liu
Systems 2024, 12(11), 480; https://doi.org/10.3390/systems12110480 - 11 Nov 2024
Viewed by 499
Abstract
In transportation infrastructure systems, feature images and spatial characteristics are generally utilized as complementary elements derived from point clouds for road edge extraction, but the involvement of one or more hyperparameters in each makes the extraction complicated. This study proposes an autotuning hybrid method with Bayesian optimization for road edge extraction in highway systems. The hybrid method combines the strengths of 2D feature images and 3D spatial characteristics while also automatically tuning the hyperparameter combination using Bayesian optimization. The hyperparameters encompass high and low pixel gradient thresholds, neighborhood radius, and normal vector threshold. Then, the point cloud dataset of national highways in Henan Province, China, is taken as the case study to evaluate the performance of the proposed method against three benchmark methods in two typical road scenarios: straight and curved edges. Experimental results show that the proposed method outperforms the benchmarks in detection quality and accuracy. It can serve as a decision-making tool to complement traditional manual road surveying, enabling efficient and automated road edge extraction in highway systems.
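The abstract names four hyperparameters (high and low pixel gradient thresholds, neighborhood radius, normal vector threshold) tuned by Bayesian optimization. A minimal sketch of such a tuning loop using scikit-optimize's gp_minimize follows; the search bounds and the objective function are hypothetical stand-ins, not the paper's actual pipeline.

```python
from skopt import gp_minimize
from skopt.space import Real

# Hypothetical search space for the four hyperparameters named in the
# abstract; the bounds are illustrative, not the paper's values.
space = [
    Real(50.0, 200.0, name="high_gradient_threshold"),
    Real(10.0, 100.0, name="low_gradient_threshold"),
    Real(0.1, 2.0, name="neighborhood_radius"),
    Real(5.0, 45.0, name="normal_vector_threshold_deg"),
]

def objective(params):
    """Placeholder objective: in practice this would run the edge-extraction
    pipeline with the given hyperparameters and return a loss such as
    1 - F1 against a labelled reference edge. A dummy quadratic is used here."""
    high, low, radius, normal_deg = params
    return (high - 120.0) ** 2 / 1e4 + (radius - 0.8) ** 2

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best hyperparameters:", result.x, "best loss:", result.fun)
```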
Figures
Figure 1: Methodological flowchart.
Figure 2: Study area of point cloud dataset.
Figure 3: Two typical road scenarios.
Figure 4: Results of data preprocessing steps.
Figure 5: Process of a 2D feature image-based module.
Figure 6: Process of 3D spatial characteristics-based module.
Figure 7: Bayesian hyperparameter optimization.
Figure 8: Graphical visualization of the hybrid method with BO.
29 pages, 61165 KiB  
Article
LiDAR-360 RGB Camera-360 Thermal Camera Targetless Calibration for Dynamic Situations
by Khanh Bao Tran, Alexander Carballo and Kazuya Takeda
Sensors 2024, 24(22), 7199; https://doi.org/10.3390/s24227199 - 10 Nov 2024
Viewed by 596
Abstract
Integrating multiple types of sensors into autonomous systems, such as cars and robots, has become a widely adopted approach in modern technology. Among these sensors, RGB cameras, thermal cameras, and LiDAR are particularly valued for their ability to provide comprehensive environmental data. However, despite their advantages, current research primarily focuses on one sensor, or a combination of two, at a time. The full potential of utilizing all three sensors is often neglected. One key challenge is the ego-motion compensation of data in dynamic situations, which results from the rotational nature of the LiDAR sensor; another is the blind spots of standard cameras due to their limited field of view. To resolve this problem, this paper proposes a novel method for the simultaneous registration of LiDAR, panoramic RGB cameras, and panoramic thermal cameras in dynamic environments without the need for calibration targets. Initially, essential features from RGB images, thermal data, and LiDAR point clouds are extracted through a novel method designed to capture significant raw data characteristics. These extracted features then serve as a foundation for ego-motion compensation, optimizing the initial dataset. Subsequently, the raw features can be further refined to enhance calibration accuracy, achieving more precise alignment results. The results of the paper demonstrate the effectiveness of this approach in enhancing multi-sensor calibration compared to existing methods. At a high speed of around 9 m/s, the accuracy of LiDAR–camera calibration improves by about 30 percent in some situations. The proposed method has the potential to significantly improve the reliability and accuracy of autonomous systems in real-world scenarios, particularly under challenging environmental conditions.
(This article belongs to the Section Radar Sensors)
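The figure captions below indicate that the LiDAR point clouds are converted into 2D range/intensity images before key-point extraction and ego-motion compensation. A generic spherical-projection sketch in NumPy is given here; the image size and vertical field of view are assumed values, not the sensor settings used in the paper.

```python
import numpy as np

def spherical_projection(points, h=128, w=1024,
                         fov_up_deg=15.0, fov_down_deg=-25.0):
    """Project (N, 3) LiDAR points into an h x w range image.

    The vertical field of view and image size are illustrative; a real
    sensor's specifications should be substituted.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-9

    yaw = np.arctan2(y, x)            # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)          # elevation

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    u = 0.5 * (1.0 - yaw / np.pi) * w             # column from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * h      # row from elevation
    u = np.clip(np.floor(u), 0, w - 1).astype(int)
    v = np.clip(np.floor(v), 0, h - 1).astype(int)

    range_image = np.zeros((h, w), dtype=np.float32)
    range_image[v, u] = r             # keep the last point per pixel
    return range_image

cloud = np.random.uniform(-20, 20, size=(10000, 3))
print(spherical_projection(cloud).shape)  # (128, 1024)
```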
Figures
Figure 1: Visualization of the system including RGB cameras, thermal cameras, and LiDAR. The 360 RGB camera and 360 thermal camera are made from independent cameras to remove blind spots. Images and point clouds are compensated to decrease the negative impacts of motion. Then, point clouds and images are used for sensor calibration based on extracted features.
Figure 2: Visualization of the target detected by two types of cameras. The left image is the target detected by the RGB camera and the right image is the target detected by the thermal camera.
Figure 3: Our system includes sensors: LiDAR Velodyne Alpha Prime, LadyBug-5 camera, 6 FLIR ADK cameras, LiDAR Ouster-128, LiDAR Ouster-64 and LiDAR Hesai Pandar.
Figure 4: Visualization of stitching 360 thermal images and 360 RGB images.
Figure 5: Pipeline of the registration process. The approach is divided into two parts: one part focuses on detecting key points from RGB images and thermal images, while the other part detects key points from images converted from LiDAR point clouds. For images generated from LiDAR point clouds, a velocity estimation step is required to perform distortion correction, ensuring the accurate positioning of the scanned points. After getting results from distortion correction, the external parameters of the LiDAR, 360 RGB camera and 360 thermal camera can be calibrated.
Figure 6: Visualization of features extracted from RGB images.
Figure 7: Pipeline of our approach. The first step is enhancing images by Retinex Decomposition. The second step is to extract key features from n + 1 consecutive images. The third step is using MobileNetV3 to remove noise features on moving objects.
Figure 8: The above image shows results before being enhanced by Retinex Decomposition. The below image shows results after being enhanced by Retinex Decomposition.
Figure 9: The above image, including the red rectangles, shows reliable features extracted from n + 1 consecutive RGB images. The below image, including the green rectangles, shows reliable features after filtering by MobileNetV3.
Figure 10: Visualization of features extracted from thermal images.
Figure 11: Visualization of image projection. (a) shows the 3D point cloud data from the LiDAR. (b) presents the 2D image data with the intensity channel. (c) presents the 2D image data with the range channel. The height of the image is 128, corresponding to the number of channels in the LiDAR.
Figure 12: Visualization of key points extracted from LiDAR images. (a) simulates key points across two frames, while (b) simulates selecting key points with similarity across the two frames.
Figure 13: Pipeline of our approach. Key features of projected images are extracted by Superpoint enhanced by LSTM. These features are matched to find pair points in two consecutive frames.
Figure 14: Pipeline of the ego-motion compensation process. First, the point clouds are converted into two-dimensional images using spherical projection. Key features are then identified within these range images, and corresponding point pairs are matched. By matching key feature pairs, the distance between frames can be determined, allowing for velocity estimation. Finally, velocity and timestamp are used to resolve ego-motion compensation and point cloud accumulation.
Figure 15: Visualization of distortion correction. The motion of the vehicle is represented by the circles, and the LiDAR is also rotating while the vehicle is in motion. (a) shows the actual shape of the obstacle. (b) depicts the shape of the obstacle scanned by LiDAR. (c) illustrates the shape of the obstacle after distortion correction.
Figure 16: Visualization of the differences in distortion correction on 3D point clouds within a frame with a speed of 54 km/h and a frequency of 10 Hz. The red part shows the original points of the point clouds, while the green part shows the corrected points. The left image shows points on the xy-plane. The right image shows points on the yz-plane.
Figure 17: Visualization of distortion correction of cameras. The blue rectangle is the actual shape, and the red rectangle is the shape distorted by ego-motion.
Figure 18: Visualization of 360 RGB–LiDAR image calibration. The above image, including the red rectangles, indicates the calibration results before applying correction. The below image, including the green rectangles, indicates the calibration results after applying correction.
Figure 19: Visualization of 360 RGB–thermal image calibration. The above image includes red rectangles that indicate the calibration results before applying correction. The below image includes green rectangles that indicate the calibration results after applying correction.
Figure 20: Visualization of point clouds extracted from Ouster OS1-128 and Velodyne Alpha Prime.
Figure 21: Visualization of image projection. (a) presents the 2D image data with the intensity channel from Ouster OS1-128. (b) presents the 2D image data with the range channel from Ouster OS1-128. (c) presents the 2D image data with the intensity channel from Velodyne Alpha Prime. (d) presents the 2D image data with the range channel from Velodyne Alpha Prime.
Figure 22: Visualization of velocity comparison between estimated velocity and ground truth over a continuous duration of 720 s. The intervals between approximately 100 to 200 s and 400 to 500 s corresponded to periods when the vehicle was turning. Conversely, the intervals from 0 to approximately 100 s, 200 to 400 s, and 500 to 600 s represented phases when the vehicle was moving straight. The vehicle decelerated and came to a halt between 600 and 720 s. The maximum observed velocity difference was 0.36 m/s, while the average velocity difference over the 720 s period was 0.03 m/s, as in Table 1.
Figure 23: Comparison with CNN, RIFT, RI-MFM by MAE.
Figure 24: Comparison with CNN, RIFT, RI-MFM by accuracy.
Figure 25: Comparison with CNN, RIFT, RI-MFM by RMSE.
Figure 26: Red points represent the results of calibration without distortion correction, while blue points represent the results with distortion correction in static situations. The dashed line shows the results from the target-based method.
Figure 27: Red points and blue points represent the results of calibration without and with distortion correction in dynamic situations. The dashed lines present the results using the actual data.
Figure 28: Comparison of error in rotation and translation of three methods.
18 pages, 7569 KiB  
Article
An Efficient and Stable Registration Framework for Large Point Clouds at Two Different Moments
by Guangxin Zhao, Jinlong Li, Jingyi Xi and Lin Luo
Sensors 2024, 24(22), 7174; https://doi.org/10.3390/s24227174 - 8 Nov 2024
Viewed by 528
Abstract
Point cloud registration plays an important role in many application scenarios; however, the registration of large-scale point clouds captured at different moments suffers from low efficiency, low accuracy, and a lack of stability. In this paper, we propose a registration framework for large-scale point clouds at different moments, which first downsamples the large-scale point clouds using random sampling, then applies a random expansion strategy to compensate for the information loss caused by the random sampling, then completes a first registration with a deep learning network based on the extraction of keypoints and feature descriptors combined with RANSAC, and finally refines the result using the point-to-point ICP method. We conducted validation experiments and application experiments on large-scale point clouds of key train components, and the experimental results show much higher accuracy and efficiency than other methods, which demonstrates the effectiveness of our framework and its applicability to actual large-scale point clouds.
(This article belongs to the Section Sensing and Imaging)
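The framework described above combines random downsampling, a learned coarse registration with RANSAC, and point-to-point ICP refinement. The sketch below illustrates only the downsampling and ICP refinement steps using Open3D (assuming a version of Open3D whose registration API lives under o3d.pipelines.registration); the coarse transform is replaced by the identity purely for illustration, whereas the paper obtains it from a keypoint/descriptor network plus RANSAC.

```python
import numpy as np
import open3d as o3d  # assumes a recent Open3D (>= 0.13)

def random_downsample(points, n_keep):
    """Random downsampling, as in the framework's pre-processing step."""
    idx = np.random.choice(len(points), size=n_keep, replace=False)
    return points[idx]

def to_o3d(points):
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    return pcd

def fine_registration(src_pts, tgt_pts, init_transform, max_dist=0.05):
    """Point-to-point ICP refinement starting from a coarse transform."""
    result = o3d.pipelines.registration.registration_icp(
        to_o3d(src_pts), to_o3d(tgt_pts), max_dist, init_transform,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation

# Example with a synthetic rigid motion and an identity initial guess.
src = np.random.rand(50000, 3)
src_small = random_downsample(src, 8192)
R = o3d.geometry.get_rotation_matrix_from_xyz((0.0, 0.0, 0.05))
tgt_small = src_small @ R.T + np.array([0.02, 0.0, 0.0])
T = fine_registration(src_small, tgt_small, np.eye(4))
print(np.round(T, 3))
```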
Figures
Figure 1: The point cloud obtained from scanning at different moments. Even without rotation and translation, the points at moments T1 and T2 cannot coincide exactly, which means there is an error in it.
Figure 2: Registration framework. This registration framework consists of the pre-processing process, coarse registration process, and fine registration process. The coarse registration consists of random sampling, grouping, a keypoints detector, a feature descriptor, and RANSAC, while the fine registration is performed by the point-to-point ICP algorithm.
Figure 3: Registration network based on keypoints and descriptors. The network contains random sampling, grouping, keypoints processing, and feature descriptors, and gives the size of the data obtained at each step as well as the composition of the loss function.
Figure 4: Standard kNN-based cluster (left) and random dilation cluster (right). The random dilation cluster selects α × k proximity points near the candidate point and then randomly selects k proximity points to complete the clustering.
Figure 5: Example of constructing a train set. The original image was randomly sampled to 32,768 points and then randomly translated and rotated to obtain the registration point cloud pairs.
Figure 6: Example of constructing a partially overlapping dataset. A partially overlapping dataset was obtained by randomly cropping 80% of the points in the point cloud pairs based on the complete dataset.
Figure 7: Example of constructing an application test set for the same scenario. Point clouds at different moments in the same scenario were obtained by laser scanning.
Figure 8: Example of constructing an application test set for different scenarios. Point clouds at different moments in different scenarios were obtained by laser scanning.
Figure 9: Point cloud registration results of our method on key train components.
Figure 10: Application test visualization results in the same scenario.
Figure 11: Application test visualization results in different scenarios.
Figure 12: Example of registration of anomalous point clouds with normal point clouds. (a,b) are normal components after registration; (c,d) are abnormal components after registration.
18 pages, 982 KiB  
Review
Remote Sensing and GIS in Natural Resource Management: Comparing Tools and Emphasizing the Importance of In-Situ Data
by Sanjeev Sharma, Justin O. Beslity, Lindsey Rustad, Lacy J. Shelby, Peter T. Manos, Puskar Khanal, Andrew B. Reinmann and Churamani Khanal
Remote Sens. 2024, 16(22), 4161; https://doi.org/10.3390/rs16224161 - 8 Nov 2024
Viewed by 815
Abstract
Remote sensing (RS) and Geographic Information Systems (GISs) provide significant opportunities for monitoring and managing natural resources across various temporal, spectral, and spatial resolutions. There is a critical need for natural resource managers to understand the expanding capabilities of image sources, analysis techniques, and in situ validation methods. This article reviews key image analysis tools in natural resource management, highlighting their unique strengths across diverse applications such as agriculture, forestry, water resources, soil management, and natural hazard monitoring. Google Earth Engine (GEE), a cloud-based platform introduced in 2010, stands out for its vast geospatial data catalog and scalability, making it ideal for global-scale analysis and algorithm development. ENVI, known for advanced multi- and hyperspectral image processing, excels in vegetation monitoring, environmental analysis, and feature extraction. ERDAS IMAGINE specializes in radar data analysis and LiDAR processing, offering robust classification and terrain analysis capabilities. Global Mapper is recognized for its versatility, supporting over 300 data formats and excelling in 3D visualization and point cloud processing, especially in UAV applications. eCognition leverages object-based image analysis (OBIA) to enhance classification accuracy by grouping pixels into meaningful objects, making it effective in environmental monitoring and urban planning. Lastly, QGIS integrates these remote sensing tools with powerful spatial analysis functions, supporting decision-making in sustainable resource management. Together, these tools, when paired with in situ data, provide comprehensive solutions for managing and analyzing natural resources across scales.
Figures
Figure 1: Articles published using different image analysis tools in different time intervals.
Figure 2: Map of sites identified and included in database.
18 pages, 5160 KiB  
Article
DPFANet: Deep Point Feature Aggregation Network for Classification of Irregular Objects in LIDAR Point Clouds
by Shuming Zhang and Dali Xu
Electronics 2024, 13(22), 4355; https://doi.org/10.3390/electronics13224355 - 6 Nov 2024
Viewed by 383
Abstract
Point cloud data acquired by scanning with Light Detection and Ranging (LiDAR) devices typically contain irregular objects, such as trees, which lead to low classification accuracy in existing point cloud classification methods. Consequently, this paper proposes a deep point feature aggregation network (DPFANet) that integrates adaptive graph convolution and space-filling curve sampling modules to effectively address the feature extraction problem for irregular object point clouds. To refine the feature representation, we utilize the affinity matrix to quantify inter-channel relationships and adjust the input feature matrix accordingly, thereby improving the classification accuracy of the object point cloud. To validate the effectiveness of the proposed approach, a TreeNet dataset was created, comprising four categories of tree point clouds derived from publicly available UAV point cloud data. The experimental findings illustrate that the model attains a mean accuracy of 91.4% on the ModelNet40 dataset, comparable to prevailing state-of-the-art techniques. When applied to the more challenging TreeNet dataset, the model achieves a mean accuracy of 88.0%, surpassing existing state-of-the-art methods in all classification metrics. These results underscore the high potential of the model for point cloud classification of irregular objects.
(This article belongs to the Special Issue Point Cloud Data Processing and Applications)
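The abstract mentions using an affinity matrix to quantify inter-channel relationships and adjust the input feature matrix. The PyTorch sketch below shows one generic way to do such channel-affinity reweighting on point features; it is a plausible illustration of the idea, not the exact CAA module of DPFANet.

```python
import torch
import torch.nn.functional as F

def channel_affinity_reweight(features):
    """Reweight a point-feature matrix by an inter-channel affinity.

    features: (B, C, N) tensor of per-point features. This is a generic
    channel-attention sketch in the spirit of the abstract's description,
    not the paper's module.
    """
    f = F.normalize(features, dim=2)               # unit-norm per channel
    affinity = torch.bmm(f, f.transpose(1, 2))     # (B, C, C) channel affinities
    weights = torch.softmax(affinity, dim=-1)      # row-normalized mixing weights
    adjusted = torch.bmm(weights, features)        # mix channels by affinity
    return features + adjusted                     # residual adjustment

x = torch.randn(2, 64, 1024)
print(channel_affinity_reweight(x).shape)  # torch.Size([2, 64, 1024])
```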
Figures
Figure 1: Network architecture of DPFANet. For the AGConv module (see Section 3.2), different feature correspondences are learned by generating an adaptive convolutional kernel. The FEAF module (see Section 3.3) is the point feature extraction and fusion module. The CAA module (see Section 3.4) is a channel-based attention mechanism specifically designed for fine-grained representation of features. SA and SA(MSG) represent the set abstraction module proposed by PointNet. The LBRD layer comprises the linear layer, BatchNorm layer, ReLU layer, and dropout layer.
Figure 2: The figure illustrates a neighborhood target point x_i processed in AGConv. Based on the feature inputs on the edge e_ij, the adaptive kernel ê_ijm is generated, and a convolution operation is performed with the spatial input Δx_ij. Subsequently, the edge features h_ij are constructed by merging all dimensions of h_ijm. Eventually, these edge features are integrated using the aggregation function to obtain the center point’s output feature x_i′.
Figure 3: The serialized point cloud neighborhood mapping sampling strategy for Z-order curve ordering begins by sampling equally spaced points, where the sampling spacing is fixed to 3. The correlation tensor is then designed to evaluate the relationship between local features and structural features. The CRF layer represents the combination of the 2D Conv layer, ReLU layer, and Pooling layer.
Figure 4: The channel affinity attention module comprises two main components. The CCS component (light green part of the figure) computes the similarity matrix between channels, while the CAE component (light blue part of the figure) utilizes this matrix to evaluate the weight matrix.
Figure 5: Point cloud feature maps for the TreeNet dataset.
Figure 6: The confusion matrix of detailed classification results of each algorithm on the test set.
Figure 7: The precision bars for each category of the eight algorithms computed from the confusion matrix.
Figure 8: The recall bar charts for each category of the eight algorithms computed from the confusion matrix.
19 pages, 16743 KiB  
Article
Low-Cost and Contactless Survey Technique for Rapid Pavement Texture Assessment Using Mobile Phone Imagery
by Zhenlong Gong, Marco Bruno, Margherita Pazzini, Anna Forte, Valentina Alena Girelli, Valeria Vignali and Claudio Lantieri
Sustainability 2024, 16(22), 9630; https://doi.org/10.3390/su16229630 - 5 Nov 2024
Viewed by 487
Abstract
Collecting pavement texture information is crucial to understand the characteristics of a road surface and to have essential data to support road maintenance. Traditional texture assessment techniques often require expensive equipment and complex operations. To ensure cost sustainability and reduce traffic closure times, this study proposes a rapid, cost-effective, and non-invasive surface texture assessment technique. This technique consists of capturing a set of images of a road surface with a mobile phone; the images are then used to reconstruct the 3D surface with photogrammetric processing and derive the roughness parameters to assess the pavement texture. The results indicate that pavement images taken by a mobile phone can be used to reconstruct the 3D surface and extract texture features accurately, meeting the requirements of time-effective documentation. To validate the effectiveness of this technique, the surface structure of the pavement was analyzed in situ using a 3D structured light projection scanner and rigorous photogrammetry with a high-end reflex camera. The results demonstrated that increasing the point cloud density can enhance the detail level of the real surface 3D representation, but it leads to variations in road surface roughness parameters. Therefore, an appropriate density should be chosen when performing three-dimensional reconstruction using mobile phone images. Mobile phone photogrammetry performs well in detecting shallow road surface textures but has certain limitations in capturing deeper textures. The texture parameters and the Abbott curve obtained using all three methods are comparable and fall within the same range of acceptability. This finding demonstrates the feasibility of using a mobile phone for pavement texture assessments with appropriate settings.
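The study derives roughness parameters and Abbott (bearing-area) curves from the reconstructed surface. A small NumPy sketch of that kind of computation on a height grid is shown below; the parameter set (Sa, Sq) and the synthetic surface are illustrative choices, not the paper's exact protocol.

```python
import numpy as np

def roughness_and_abbott(heights, n_levels=100):
    """Compute simple areal roughness parameters and an Abbott curve
    from a grid of surface heights (in mm), such as one interpolated
    from a photogrammetric point cloud."""
    z = heights - np.mean(heights)
    Sa = np.mean(np.abs(z))               # arithmetic mean deviation
    Sq = np.sqrt(np.mean(z ** 2))         # root mean square deviation

    # Abbott (bearing area) curve: material ratio versus cut depth.
    levels = np.linspace(z.max(), z.min(), n_levels)
    material_ratio = np.array([(z >= lvl).mean() * 100.0 for lvl in levels])
    return Sa, Sq, levels, material_ratio

surface = np.random.normal(0.0, 0.4, size=(500, 500))  # synthetic texture
Sa, Sq, levels, mr = roughness_and_abbott(surface)
print(f"Sa = {Sa:.3f} mm, Sq = {Sq:.3f} mm, "
      f"material ratio at mid-depth = {mr[len(mr) // 2]:.1f}%")
```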
Figures
Figure 1: Asphalt mixture grading curves.
Figure 2: The overall workflow of CRP.
Figure 3: (a) Parallel axis capture; (b) schematic diagram of the shooting platform.
Figure 4: Example of an image acquired by the reflex camera and containing coded targets.
Figure 5: (a) The structured-light scanner employed; (b) a 3D point cloud obtained.
Figure 6: Image of pavement sweeping site.
Figure 7: Results of dense point cloud from CRP technique based on mobile phone.
Figure 8: Cloud maps with different point cloud sizes and cloud maps from scanner.
Figure 9: Abbott curves from different point cloud sizes and from the scanner.
Figure 10: Abbott curves for results of different methods in five locations.
Figure 11: Roughness parameters for different locations.
Figure 12: The results of cloud maps at different locations by CRP based on mobile phone.
Figure 13: Mobile phone point cloud of Location 4, represented with a color gradient showing the Z value (in mm) differences concerning the cloud scanned with SLS.
Figure 14: The results of cloud maps in case of contamination.
Figure 15: Abbott curves for results of different methods of four samples.
Figure 16: Roughness parameters for different locations in case of contamination.
22 pages, 4067 KiB  
Article
AIFormer: Adaptive Interaction Transformer for 3D Point Cloud Understanding
by Xutao Chu, Shengjie Zhao and Hongwei Dai
Remote Sens. 2024, 16(21), 4103; https://doi.org/10.3390/rs16214103 - 2 Nov 2024
Viewed by 493
Abstract
Recently, significant advancements have been made in 3D point cloud analysis by leveraging transformer architectures in 3D space. However, it remains challenging to effectively implement local and global learning within the irregular and sparse structures of 3D point clouds. This paper presents the Adaptive Interaction Transformer (AIFormer), a novel hierarchical transformer architecture designed to enhance 3D point cloud analysis by fusing local and global features through the adaptive interaction of features. Specifically, AIFormer mainly consists of several stacked AIFormer Blocks. Each AIFormer Block employs the Local Relation Aggregation Module and the Global Context Aggregation Module to extract, respectively, local relational details around each reference point and long-range dependencies between reference points. The local and global features are then fused using the Adaptive Interaction Module for adaptive interaction to optimize the point representation. Additionally, the AIFormer Block further designs geometric relation functions and contextual relative semantic encoding to enhance local and global feature extraction capabilities, respectively. Extensive experiments on three popular 3D point cloud datasets verify that AIFormer achieves state-of-the-art or comparable performance. Our comprehensive ablation study further validates the effectiveness and soundness of the AIFormer design.
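The core idea above is the adaptive interaction (fusion) of local and global point features. The PyTorch sketch below shows a generic gated fusion of two feature branches to convey that idea; it is a simplified stand-in, not the actual Adaptive Interaction Module of AIFormer.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Gated fusion of local and global point features.

    A learned gate decides, per point and channel, how much of the
    local versus the global branch to keep. This is a minimal sketch
    of the idea, not the AIFormer design itself.
    """
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, local_feat, global_feat):
        # local_feat, global_feat: (N, C) features for N points.
        g = self.gate(torch.cat([local_feat, global_feat], dim=-1))
        return g * local_feat + (1.0 - g) * global_feat

fusion = AdaptiveFusion(channels=96)
out = fusion(torch.randn(4096, 96), torch.randn(4096, 96))
print(out.shape)  # torch.Size([4096, 96])
```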
Figures
Figure 1: (a) Framework overview. (b) Structure of the Adaptive Interaction Transformer Block (AIFormer Block). Best viewed in color.
Figure 2: Illustration of the Local Relation Aggregation Module, the Global Context Aggregation Module, and the Adaptive Interaction Module, respectively. Best viewed in color.
Figure 3: Illustration of the Downsample and Upsample modules, best viewed in color.
Figure 4: Visual comparison of our model with other methods [15,16] on S3DIS. Differences in semantic segmentation results are highlighted using red boxes for clarity.
Figure 5: Visual comparison of our model with other methods [20,59] on ScanNetv2. Note that black indicates ignored labels, and differences in semantic segmentation results are highlighted using red boxes for clarity.
Figure 6: Visual comparison of our model with other methods [10,27] on SemanticKITTI. Note that black indicates ignored labels.
Figure 7: Validation mIoU and training loss curves of different point embedding methods on ScanNetv2.
26 pages, 284813 KiB  
Article
Automatic Method for Detecting Deformation Cracks in Landslides Based on Multidimensional Information Fusion
by Bo Deng, Qiang Xu, Xiujun Dong, Weile Li, Mingtang Wu, Yuanzhen Ju and Qiulin He
Remote Sens. 2024, 16(21), 4075; https://doi.org/10.3390/rs16214075 - 31 Oct 2024
Viewed by 622
Abstract
As cracks are a precursor landslide deformation feature, they can provide forecasting information that is useful for the early identification of landslides and determining motion instability characteristics. However, it is difficult to solve the size effect and noise-filtering problems associated with the currently available automatic crack detection methods under complex conditions using single remote sensing data sources. This article uses multidimensional target scene images obtained by UAV photogrammetry as the data source. Firstly, under the premise of fully considering the multidimensional image characteristics of different crack types, this article accomplishes the initial identification of landslide cracks by using six algorithm models with indicators including the roughness, slope, and eigenvalue rate of the point cloud and the pixel gradient, gray value, and RGB value of the images. Secondly, the initial extraction results are morphologically repaired and then processed with three filtering algorithms (based on the crack orientation, length, and frequency) to address background noise. Finally, this article proposes a multidimensional information fusion method, based on Bayesian minimum-risk probability, to fuse the identification results derived from different models at the decision level. The results show that the six tested algorithm models can be used to effectively extract landslide cracks, providing Area Under the Curve (AUC) values between 0.6 and 0.85. After the repairing and filtering steps, the proposed method removes complex noise and minimizes the loss of real cracks, thus increasing the accuracy of each model by 7.5–55.3%. Multidimensional data fusion methods solve issues associated with the spatial scale effect during crack identification, and the F-score of the fusion model is 0.901.
(This article belongs to the Topic Landslides and Natural Resources)
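The decision-level fusion step described above uses a Bayesian minimum-risk rule with asymmetric costs (the λ ratios that appear in Figures 22 and 23 below). The NumPy sketch here applies such a rule to a per-pixel crack posterior; the cost values and the posterior array are placeholders, and the cost-labelling convention is an assumption rather than the paper's notation.

```python
import numpy as np

def minimum_risk_fusion(posterior_crack, lam_miss=1.0, lam_false_alarm=2.0):
    """Decision-level fusion by a Bayesian minimum-risk rule.

    posterior_crack: per-pixel posterior probability of 'crack' after
    fusing the individual models. lam_miss is the cost of labelling a
    true crack as 'non-crack', lam_false_alarm the cost of a false
    alarm; correct decisions are assumed to cost zero. Values are
    illustrative, mirroring the style of the paper's lambda ratios.
    """
    p = posterior_crack
    risk_call_crack = lam_false_alarm * (1.0 - p)  # expected cost of deciding 'crack'
    risk_call_clean = lam_miss * p                 # expected cost of deciding 'non-crack'
    return risk_call_crack < risk_call_clean       # True where 'crack' is chosen

posterior = np.random.rand(256, 256)               # stand-in for fused posteriors
crack_mask = minimum_risk_fusion(posterior, lam_miss=1.0, lam_false_alarm=2.0)
print("crack pixels:", int(crack_mask.sum()))
```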
Figures
Figure 1: Location of study area: (a) the location and traffic conditions of the study area on satellite images; (b) optical image of the landslide (photographed by UAV in May 2021); (c) main deformation area and DSM at the landslide site.
Figure 2: Flight route and terrain products of the UAV operation in WuLiPo: (a) planned flight plane route and checkpoint positions; (b) FeiMa D200 drone; (c) terrain-following flight route; (d) DOM; (e) 3D point cloud.
Figure 3: Results of the field investigation at Wulipo: (a) section A–A’ and material composition characteristics; (b) manual survey results of cracks; (c) on-site photos of the main cracks (numbers correspond to the shooting range of the black rectangular frame in (b)).
Figure 4: Flow chart showing the automatic landslide crack detection process utilizing multidimensional data fusion.
Figure 5: Schematic diagram representing the image pre-processing method.
Figure 6: 2D and 3D characteristics of landslide cracks with different scales: (a,b) the texture and morphology of the same landslide crack in the image and point cloud (the same numbered frames represent crack comparisons at the same location); (c) schematic diagram of the tensile crack formation; (d) schematic diagram of the shear crack formation. The base map of (c) and (d) is digitized from [3].
Figure 7: Schematic diagram of the principle by which a K-D tree is used to search the local neighborhood in the point cloud and generate various crack-extraction indicators.
Figure 8: Schematic diagram showing the crack edge threshold segmentation principle in which grayscale images and the Sobel gradient map are used: (a) grayscale image of a crack; (b) local grayscale feature of the crack and the background surface; (c) gradient feature of the crack image processed by the Sobel operator; and (d) edge binarization effect of the crack image.
Figure 9: Object classification results derived based on maximum likelihood supervision: (a) image map and sampling points; (b) distribution of objects after the classification process; (c,d) spatial distribution and categories of sample pixels before and after the classification process, respectively.
Figure 10: Process by which the crack binary image is repaired using the morphological closure operation: (a,b) cracks and background surfaces after binarization, respectively; (c,d) expanding and corroding effects of local crack pixels, respectively; and (e) repaired crack.
Figure 11: Schematic diagram showing the principles of the crack-filtering method: (a) local image of crack binary image orientational filtering convolution (from the rectangular box in (d)); (b) eigenvalues after orientational filter convolution; (c) original image with cracks; (d) preliminary automatically extracted crack image; (e) crack identification image after orientation, frequency, and length filtering; (f) principle by which a single crack is clustered using DBSCAN; (g) local characteristics of cracks after clustering (from the rectangular box in (d)); and (h) local characteristics of cracks after orientation and frequency filtering (from the rectangular box in (e)).
Figure 12: WuLiPo orthophoto image and 3D point cloud pre-processing results.
Figure 13: Recognition results of the WuLiPo cracks obtained by each model: (a–c) calculation results of the point cloud roughness, eigenvalue ratios, and slope, respectively; (d–f) grid conversion results corresponding to panels (a–c); (g) binary image transformed by Sobel; (h) preprocessed grayscale image.
Figure 14: WuLiPo image classification results derived based on maximum likelihood supervised learning: (a) orthophoto and manually selected sampling locations; (b) distribution of sample categories after prediction.
Figure 15: Crack pixel binary classification confusion matrix.
Figure 16: ROC curve test results of each crack identification and classification model.
Figure 17: Semantic segmentation results of cracks in WuLiPo derived using various models.
Figure 18: Effects of the repairing and filtering processes on the initial extraction crack results of each model (the red areas in the image are the identified crack pixels).
Figure 19: Statistical chart of TPR, FPR, and precision metrics of the crack extraction models before and after repair filtering.
Figure 20: Crack identification results of Wulipo: (a) automatic detection results of the gradient value segmentation and slope segmentation models; (b) manual investigation results.
Figure 21: Distribution of the image fusion features and the posterior probability comparison results derived for WuLiPo cracks: (a) distribution of 64 fusion feature samples; (b) posterior probability values of 64 fusion feature samples; the reference line with red font indicates that the fusion result has equal probability of cracks and non-cracks.
Figure 22: Bayesian probability fusion results derived under different risk factors: (a–f) are the results of crack fusion recognition when λ_{1,1}:λ_{1,2}:λ_{2,1}:λ_{2,2} = (0:1.5:1:0), (0:1.2:1:0), (0:1:1:0), (0:1:2:0), (0:1:4:0), and (0:1:7:0), respectively. The red areas in the image are the identified crack pixels.
Figure 23: Changes in model evaluation indicators under different risk ratios based on Bayesian probability fusion: (a–d) represent the changes of TPR, FPR, precision, and F-score under different λ_{1,2}:λ_{2,1}, respectively.
18 pages, 13017 KiB  
Article
DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices
by Fei Huang, Shengshu Liu, Guangqian Zhang, Bingsen Hao, Yangkai Xiang and Kun Yuan
Sensors 2024, 24(21), 7007; https://doi.org/10.3390/s24217007 - 31 Oct 2024
Viewed by 414
Abstract
To address the challenges of suboptimal remote detection and significant computational burden in existing multi-sensor information fusion 3D object detection methods, a novel approach based on Bird’s-Eye View (BEV) is proposed. This method utilizes an enhanced lightweight EdgeNeXt feature extraction network, incorporating residual branches to address network degradation caused by the excessive depth of STDA encoding blocks. Meanwhile, deformable convolution is used to expand the receptive field and reduce computational complexity. The feature fusion module constructs a two-stage fusion network to optimize the fusion and alignment of multi-sensor features. This network aligns image features to supplement environmental information with point cloud features, thereby obtaining the final BEV features. Additionally, a Transformer decoder that emphasizes global spatial cues is employed to process the BEV feature sequence, enabling precise detection of distant small objects. Experimental results demonstrate that this method surpasses the baseline network, with improvements of 4.5% in the NuScenes detection score and 5.5% in average precision for detected objects. Finally, the model is converted and accelerated using TensorRT tools for deployment on mobile devices, achieving an inference time of 138 ms per frame on the Jetson Orin NX embedded platform, thus enabling real-time 3D object detection.
(This article belongs to the Special Issue AI-Driving for Autonomous Vehicles)
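The abstract reports converting the model with TensorRT tools for deployment on a Jetson Orin NX. A common first step for this kind of deployment is exporting a PyTorch model to ONNX, sketched below with a tiny stand-in network; the actual DeployFusion model, opset, and TensorRT build settings are not specified in the abstract, so everything here is illustrative. The resulting ONNX file could then be passed to a TensorRT builder (for example the trtexec command-line tool) to produce an optimized engine.

```python
import torch
import torch.nn as nn

# A tiny stand-in network; the real model would be the DeployFusion
# detector, which is not reproduced here.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 8, kernel_size=1),
).eval()

dummy_input = torch.randn(1, 3, 256, 256)

# Export to ONNX as an intermediate format for TensorRT; the opset and
# tensor names are illustrative choices.
torch.onnx.export(
    model, dummy_input, "detector.onnx",
    input_names=["image"], output_names=["features"],
    opset_version=13,
)
print("exported detector.onnx")
```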
Figures
Figure 1: Overall framework of the network. DeployFusion introduces an improved EdgeNeXt feature extraction network, using residual branches to address degradation and deformable convolutions to increase the receptive field and reduce complexity. The feature fusion module aligns image and point cloud features to generate optimized BEV features. A Transformer decoder is used to process the sequence of BEV features, enabling accurate identification of small distant objects.
Figure 2: Comparison of convolutional encoding blocks. (a) DW Encode. (b) DDW Encode.
Figure 3: Feature channel separation attention.
Figure 4: Feature channel separation attention.
Figure 5: Transposed attention.
Figure 6: Comparison of standard and deformable convolution kernels in receptive field regions. (a) Receptive field area of the standard convolutional kernel. (b) Receptive field area of the deformable convolutional kernel.
Figure 7: Experimental results of dynamic loss and NDS. (a) Dynamic loss graph. (b) Dynamic NDS score graph.
Figure 8: Comparison of inference results of EdgeNeXt_DCN with other fusion networks.
Figure 9: Comparisons of detection accuracy in different feature fusion networks. (a) Primitive feature extraction network. (b) EdgeNeXt_DCN feature extraction network.
Figure 10: Results of object detection for each category.
Figure 11: Comparison of detection results from the multi-sensor fusion detection method in BEV.
Figure 12: Performance of object detection in BEV of this method. (a) Scene 1. (b) Scene 2.
Figure 13: Jetson Orin NX mobile device.
Figure 14: Workflow of TensorRT.
Figure 15: Comparison of computation time before and after operator fusion.
Figure 16: Comparison of detection methods at various quantization and accuracy levels.
Figure 17: Comparison of inference time before and after model quantization in detection.
Figure 18: Detection results of the method on mobile devices.
20 pages, 3007 KiB  
Article
Efficient Semantic Segmentation for Large-Scale Agricultural Nursery Managements via Point Cloud-Based Neural Network
by Hui Liu, Jie Xu, Wen-Hua Chen, Yue Shen and Jinru Kai
Remote Sens. 2024, 16(21), 4011; https://doi.org/10.3390/rs16214011 - 29 Oct 2024
Viewed by 576
Abstract
Remote sensing technology has found extensive application in agriculture, providing critical data for analysis. The advancement of semantic segmentation models significantly enhances the utilization of point cloud data, offering innovative technical support for modern horticulture in nursery environments, particularly in the area of plant cultivation. Semantic segmentation results aid in obtaining tree components, like canopies and trunks, and detailed data on tree growth environments. However, obtaining precise semantic segmentation results from large-scale areas can be challenging due to the vast number of points involved. Therefore, this paper introduces an improved model aimed at achieving superior performance for large-scale point clouds. The model incorporates direction angles between points to improve local feature extraction and ensure rotational invariance. It also uses geometric and relative distance information for better adjustment of different neighboring point features. An external attention module extracts global spatial features, and an upsampling feature adjustment strategy integrates features from the encoder and decoder. A specialized dataset was created from real nursery environments for experiments. Results show that the improved model surpasses several point-based models, achieving a Mean Intersection over Union (mIoU) of 87.18%. This enhances the precision of nursery environment analysis and supports the advancement of autonomous nursery management.
(This article belongs to the Special Issue Point Cloud Processing with Machine Learning)
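The improved model incorporates direction angles between points for rotation-aware local feature extraction (see Figure 7 in the figure list below). One plausible reading of that quantity, the angles between each center-to-neighbor vector and the XYZ axes, is sketched in NumPy; the paper's exact formulation may differ.

```python
import numpy as np

def relative_direction_angles(center, neighbors, eps=1e-9):
    """Angles between each center-to-neighbor vector and the XYZ axes.

    center: (3,) point, neighbors: (K, 3) neighboring points.
    Returns a (K, 3) array of angles in radians. This is one plausible
    interpretation of 'direction angles between points', not necessarily
    the paper's definition.
    """
    d = neighbors - center                          # (K, 3) offset vectors
    norm = np.linalg.norm(d, axis=1, keepdims=True) + eps
    cosines = d / norm                              # direction cosines w.r.t. x, y, z
    return np.arccos(np.clip(cosines, -1.0, 1.0))

center = np.array([0.0, 0.0, 0.0])
neighbors = np.random.uniform(-1.0, 1.0, size=(16, 3))
print(relative_direction_angles(center, neighbors).shape)  # (16, 3)
```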
Figures
Graphical abstract
Figure 1: Some photos of the experimental data collection site.
Figure 2: The machine used to collect point clouds of seedlings in the nursery.
Figure 3: Some images of the collection process. The red line represents the motion trajectory of the robot during the data collection process.
Figure 4: (a) The point cloud data generated from four scenes within the nursery, with each point containing coordinate and reflectivity information. (b) The annotated ground truth.
Figure 5: An example of labeled data. Different colors are used to represent different categories.
Figure 6: The main structure of the proposed model for implementing semantic segmentation tasks. Three improved modules are emphasized using dashed rectangular boxes. The local feature extraction module aims to effectively extract features from local areas, while the global feature extraction module is designed to capture the global features. The upsampling feature adjustment module replaces the traditional skip connections to facilitate the efficient fusion of encoder and decoder features.
Figure 7: The relative directional angles between the neighbor points and the center point around the XYZ axes.
Figure 8: The proposed local feature extraction module, where N_L represents the number of generated local areas, and C and C_out are the dimensionalities of the input and output features of the proposed local feature extraction module.
Figure 9: (a–c) are the structures of the self-attention module and its variants. (d,e) represent the structures of the external attention and multi-head external attention modules.
Figure 10: The structure of the proposed upsampling feature adjustment module.
Figure 11: The accuracy and loss curves of different models with the self-made dataset during the training processes.
Figure 12: Visualization of segmentation results on the testing set. (a) represents the ground truth and the segmentation results of RandLA-Net and the proposed method. (b–e) show some details of the segmentation results. White boxes indicate where RandLA-Net predicts error points.
Figure 13: The accuracy and loss curves of different models in the ablation experiments.
17 pages, 13097 KiB  
Article
Airborne LiDAR Point Cloud Classification Using Ensemble Learning for DEM Generation
by Ting-Shu Ciou, Chao-Hung Lin and Chi-Kuei Wang
Sensors 2024, 24(21), 6858; https://doi.org/10.3390/s24216858 - 25 Oct 2024
Viewed by 513
Abstract
Airborne laser scanning (ALS) point clouds have emerged as a predominant data source for the generation of digital elevation models (DEM) in recent years. Traditionally, the generation of DEM using ALS point clouds involves the steps of point cloud classification or ground point [...] Read more.
Airborne laser scanning (ALS) point clouds have emerged as a predominant data source for the generation of digital elevation models (DEM) in recent years. Traditionally, the generation of DEM using ALS point clouds involves the steps of point cloud classification or ground point filtering to extract ground points and labor-intensive post-processing to correct the misclassified ground points. Current deep learning techniques leverage the ability of geometric recognition for ground point classification. However, the deep learning classifiers are generally trained using 3D point clouds with simple geometric terrains, which decreases the performance of model inference. In this study, a point-based deep learning model with boosting ensemble learning and a set of geometric features as the model inputs is proposed. With the ensemble learning strategy, this study integrates specialized ground point classifiers designed for different terrains to boost classification robustness and accuracy. In experiments, ALS point clouds containing various terrains were used to evaluate the feasibility of the proposed method. The results demonstrated that the proposed method can improve the point cloud classification and the quality of the generated DEMs. The classification accuracy and F1 score are improved from 80.9% to 92.2%, and from 82.2% to 94.2%, respectively, by using the proposed method. In addition, the DEM generation error, in terms of root mean square error (RMSE), is reduced from 0.318–1.362 m to 0.273–1.032 m by using the proposed ensemble learning. Full article
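The terrain-specialized ensemble can be pictured as a weighted vote over per-point ground probabilities. The sketch below is a simplified, hypothetical illustration of such a fusion step; the classifier outputs, weights, and threshold are invented for the example, and the paper's boosting strategy is more elaborate than this plain averaging.

```python
import numpy as np

def ensemble_ground_labels(prob_maps, weights=None, threshold=0.5):
    """Fuse per-point ground probabilities from terrain-specialized classifiers
    (e.g., urban, mountain, mixed) by weighted voting.

    prob_maps: (n_classifiers, n_points) array of ground probabilities.
    weights:   optional per-classifier weights (e.g., from validation scores).
    Returns a boolean ground mask of shape (n_points,)."""
    prob_maps = np.asarray(prob_maps, dtype=float)
    if weights is None:
        weights = np.ones(prob_maps.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = np.tensordot(weights, prob_maps, axes=1)  # weighted average per point
    return fused >= threshold

# Hypothetical outputs of three classifiers on five points
p_urban    = np.array([0.9, 0.2, 0.6, 0.1, 0.8])
p_mountain = np.array([0.7, 0.4, 0.5, 0.2, 0.9])
p_mixed    = np.array([0.8, 0.3, 0.7, 0.3, 0.7])
mask = ensemble_ground_labels([p_urban, p_mountain, p_mixed], weights=[0.4, 0.3, 0.3])
print(mask)  # [ True False  True False  True]
```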
(This article belongs to the Section Radar Sensors)
Show Figures

Figure 1
<p>The network structure of the DGCNN segmentation model.</p>
Full article ">Figure 2
<p>The edge convolution operation.</p>
Full article ">Figure 3
<p>The inconsistent intensity value in point cloud data.</p>
Full article ">Figure 4
<p>Workflow of ground point determination by using ensemble learning.</p>
Full article ">Figure 5
<p>Spatial distribution of training datasets. (<b>Left</b>) the locations of mountain, urban, and mixed datasets are marked gray, orange, and pink; (<b>right</b>) examples of datasets.</p>
Full article ">Figure 6
<p>Results of the urban classifier M_urban applied to an urban dataset. (<b>Top</b>) ground truth; (<b>bottom</b>) prediction result. The point cloud profile of the red line in the left subfigure is displayed in the right subfigure, and the ground points are marked in orange.</p>
Full article ">Figure 7
<p>Results of the mountain classifier M_mountain applied to a mountain dataset. (<b>Top</b>) ground truth; (<b>bottom</b>) prediction result. The point cloud profile of the red line in the left subfigure is displayed in the right subfigure, and the ground points are marked in orange.</p>
Full article ">Figure 8
<p>Comparison of prediction results of three ground point extraction processes on the mixed dataset. S_i and E_i represent the start and end of each profile. The locations of the profiles are marked in red, and the ground points are marked in orange.</p>
Full article ">Figure 9
<p>Classification result of AHN dataset using the proposed method. The locations of the profiles are marked with red, and the ground points are marked in orange.</p>
Full article ">Figure 10
<p>Comparison of ground points in ground truth and prediction. The locations of the profiles are marked with red.</p>
Full article ">Figure 11
<p>Error maps of the generated DEM.</p>
Full article ">
23 pages, 5405 KiB  
Article
CPH-Fmnet: An Optimized Deep Learning Model for Multi-View Stereo and Parameter Extraction in Complex Forest Scenes
by Lingnan Dai, Zhao Chen, Xiaoli Zhang, Dianchang Wang and Lishuo Huo
Forests 2024, 15(11), 1860; https://doi.org/10.3390/f15111860 - 23 Oct 2024
Viewed by 552
Abstract
The three-dimensional reconstruction of forests is crucial in remote sensing technology, ecological monitoring, and forestry management, as it yields precise forest structure and tree parameters, providing essential data support for forest resource management, evaluation, and sustainable development. Nevertheless, forest 3D reconstruction now encounters [...] Read more.
The three-dimensional reconstruction of forests is crucial in remote sensing technology, ecological monitoring, and forestry management, as it yields precise forest structure and tree parameters, providing essential data support for forest resource management, evaluation, and sustainable development. Nevertheless, forest 3D reconstruction currently encounters obstacles including high equipment costs, low data collection efficiency, and complex data processing. This work introduces a deep learning model, CPH-Fmnet, designed to enhance the accuracy and efficiency of 3D reconstruction in intricate forest environments. CPH-Fmnet enhances the FPN Encoder-Decoder Architecture by incorporating the Channel Attention Mechanism (CA), Path Aggregation Module (PA), and High-Level Feature Selection Module (HFS), alongside the integration of a pre-trained Vision Transformer (ViT), thereby significantly improving the model’s global feature extraction and local detail reconstruction abilities. We selected three representative sample plots in Haidian District, Beijing, China, as the study area and took forest stand sequence photos with an iPhone for the research. Comparative experiments with the conventional SfM + MVS and MVSFormer models, along with comprehensive parameter extraction and ablation studies, substantiated the enhanced efficacy of the proposed CPH-Fmnet model in addressing difficult circumstances such as intricate occlusions, poorly textured areas, and variations in lighting. The test results show that the model outperforms current methods on several evaluation criteria, with an RMSE of 1.353, an MAE of only 5.1%, an r value of 1.190, and a forest reconstruction rate of 100%. Furthermore, the model produced a more compact and precise 3D point cloud while accurately determining the properties of the forest trees. The findings indicate that CPH-Fmnet offers an innovative approach for forest resource management and ecological monitoring, characterized by low cost, high accuracy, and high efficiency. Full article
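For readers unfamiliar with the Channel Attention Mechanism (CA) named above, the PyTorch sketch below shows a common squeeze-and-excitation-style channel attention block. The layer sizes, the reduction ratio, and the block structure are assumptions for illustration; the exact CA block used in CPH-Fmnet may differ.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention: global average pooling
    followed by a small bottleneck MLP that re-weights feature channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)      # squeeze: per-channel statistics
        w = self.fc(w).view(b, c, 1, 1)  # excitation: per-channel weights in (0, 1)
        return x * w                     # re-weight the input feature map

# Example: re-weight a (batch, 64, 32, 32) feature map
feat = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```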
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
Show Figures

Figure 1
<p>Overview of the study area.</p>
Full article ">Figure 2
<p>The architecture of CPH-Fmnet, with three sub-modules: the pretrained ViT, CPHFPN, and 3D UNet.</p>
Full article ">Figure 3
<p>Channel attention mechanism framework [37].</p>
Full article ">Figure 4
<p>Comparison of 3D reconstruction results of different methods in three representative forest scenes. (<b>a</b>–<b>c</b>) correspond to plot 1, plot 2, and plot 3, respectively.</p>
Full article ">Figure 5
<p>Comparison of tree trunk detail reconstruction results using different methods.</p>
Full article ">Figure 6
<p>Comparison of tree crown detail reconstruction results using different methods.</p>
Full article ">Figure 7
<p>Comparison of extraction results of diameter at breast height of a single tree using different methods.</p>
Full article ">
25 pages, 7196 KiB  
Article
Position Normalization of Propellant Grain Point Clouds
by Junchao Wang, Fengnian Tian, Renfu Li, Zhihui Li, Bin Zhang and Xuelong Si
Aerospace 2024, 11(10), 859; https://doi.org/10.3390/aerospace11100859 - 18 Oct 2024
Viewed by 396
Abstract
Point cloud data obtained from scanning propellant grains with 3D scanning equipment exhibit positional uncertainty in space, posing significant challenges for calculating the relevant parameters of the propellant grains. Therefore, it is essential to normalize the position of each propellant grain’s point cloud. [...] Read more.
Point cloud data obtained from scanning propellant grains with 3D scanning equipment exhibit positional uncertainty in space, posing significant challenges for calculating the relevant parameters of the propellant grains. Therefore, it is essential to normalize the position of each propellant grain’s point cloud. This paper proposes a normalization algorithm for propellant grain point clouds, consisting of two stages, coarse normalization and fine normalization, to achieve high-precision transformations of the point clouds. In the coarse normalization stage, a layer-by-layer feature point detection scheme based on k-dimensional trees (KD-tree) and k-means clustering (k-means) is designed to extract feature points from the propellant grain point cloud. In the fine normalization stage, a rotation angle compensation scheme is proposed to align the fitted symmetry axis of the propellant grain point cloud with the coordinate axes. Finally, comparative experiments with iterative closest point (ICP) and random sample consensus (RANSAC) validate the efficiency of the proposed normalization algorithm. Full article
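The fine-normalization idea of aligning a fitted symmetry axis with a coordinate axis can be sketched with Rodrigues' rotation formula. The snippet below is an illustrative example under that assumption; the fitted axis, the target Z axis, and the function name are made up for the demo and do not reproduce the paper's rotation angle compensation scheme.

```python
import numpy as np

def rotation_to_axis(axis, target=(0.0, 0.0, 1.0)):
    """Rodrigues' formula: rotation matrix mapping a fitted symmetry-axis
    direction onto a target coordinate axis (the Z axis by default)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    b = np.asarray(target, dtype=float)
    b = b / np.linalg.norm(b)
    v, c = np.cross(a, b), float(np.dot(a, b))
    if np.isclose(c, -1.0):                       # anti-parallel: 180-degree turn about X
        return np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

fitted_axis = np.array([0.10, 0.05, 0.99])        # hypothetical fitted symmetry axis
cloud = np.random.rand(2000, 3) - 0.5             # stand-in for a grain point cloud
R = rotation_to_axis(fitted_axis)
normalized = (cloud - cloud.mean(axis=0)) @ R.T   # centre, then rotate axis onto Z
print(np.round(R @ (fitted_axis / np.linalg.norm(fitted_axis)), 6))  # approx. [0, 0, 1]
```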
(This article belongs to the Section Astronautics & Space Science)
Show Figures

Figure 1
<p>The free assembly method and the wall pouring method.</p>
Full article ">Figure 2
<p>Virtual free assembly process.</p>
Full article ">Figure 3
<p>Propellant grain point cloud.</p>
Full article ">Figure 4
<p>Overall procedure of the method (the images have been specially processed for de-identification).</p>
Full article ">Figure 5
<p>The problems faced by layer-by-layer projection (the images have been specially processed for de-identification). The height of each layer must be selected carefully, given the slight tilt present in the original point cloud and the abundance of convex points on the arc surface.</p>
Full article ">Figure 6
<p>KD-tree nearest neighbor search. After performing a KD-tree search for the point p'_ij in P'_i, the nearest-neighbor subset of this point is obtained. Using k-means, this subset is divided into two subsets, denoted C_1 and C_2. Line fitting is performed on both subsets, and based on the angle, it is determined whether to include this point in the candidate corner point set.</p>
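A rough sketch of the corner-candidate test described in this caption is given below: each boundary point's nearest neighbours are split into two groups with k-means, a dominant direction is fitted to each group, and the point is flagged when the two directions bend sharply. The neighbourhood size, the use of SciPy/scikit-learn, and the 30-degree turn threshold are assumptions for illustration, not the paper's parameters.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import KMeans

def corner_candidates(boundary, m=12, min_turn_deg=30.0):
    """Flag boundary points where the local boundary direction bends sharply:
    split each point's m nearest neighbours into two groups (k-means), fit a
    dominant direction to each group (SVD), and keep the point when the two
    directions differ by more than min_turn_deg."""
    tree = cKDTree(boundary)
    candidates = []
    for i, p in enumerate(boundary):
        _, idx = tree.query(p, k=m + 1)
        neigh = boundary[idx[1:]]                            # drop the point itself
        labels = KMeans(n_clusters=2, n_init=5, random_state=0).fit_predict(neigh)
        dirs = []
        for c in (0, 1):
            grp = neigh[labels == c]
            if len(grp) >= 2:
                grp = grp - grp.mean(axis=0)
                dirs.append(np.linalg.svd(grp, full_matrices=False)[2][0])
        if len(dirs) == 2:
            cos_angle = abs(float(np.dot(dirs[0], dirs[1])))  # 1 when the groups are collinear
            turn = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))
            if turn > min_turn_deg:
                candidates.append(i)
    return candidates

# Two perpendicular segments meeting at (1, 0): points near the corner should be flagged
t = np.linspace(0.0, 1.0, 50)
boundary = np.vstack([np.c_[t, np.zeros_like(t)], np.c_[np.ones_like(t), t]])
print(boundary[corner_candidates(boundary)])
```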
Full article ">Figure 7
<p>Boundary of propellant grain point cloud. The boundary of the cross-sectional point cloud exhibits a periodic density distribution. The appropriate interval division is shown in figures (<b>a</b>,<b>b</b>). An excessively large interval is shown in figure (<b>c</b>), while an overly small interval is shown in figure (<b>d</b>).</p>
Full article ">Figure 8
<p>Symmetric axis fitting.</p>
Full article ">Figure 9
<p>The entire process for multiple symmetry axis fitting.</p>
Full article ">Figure 10
<p>FP-1, FP-2, FP-3, and FP-4.</p>
Full article ">Figure 11
<p>RMSE principle diagram.</p>
Full article ">Figure 12
<p>Transformation matrix error. When only θ_y is changed, the δ_θ of our method is smaller than that of RANSAC + ICP. When only Δx or Δz is changed, RANSAC + ICP shows higher accuracy. However, when only Δy is changed, the performance of RANSAC + ICP is unstable, while our method remains stable with a consistently low error.</p>
Full article ">Figure 13
<p>RMSE between P_T^i′ and P_T^i. When only θ_y is changed, the two-stage RMSE values of our method are lower. When only Δx or Δz is changed, the RMSE values of RANSAC and ICP tend to be consistent and are lower than those of our method. However, when only Δy is changed, the RMSE values of RANSAC and ICP show instability, while our method remains relatively stable and maintains a lower RMSE value.</p>
Full article ">Figure 14
<p>Random position point cloud position normalization experiment.</p>
Full article ">Figure 15
<p>Position normalization results. It is evident that after position normalization using our method, most of the points align well with the target point cloud, with only a few points that are not completely aligned. In contrast, the results from RANSAC + ICP show that most points are not well aligned.</p>
Full article ">
19 pages, 5199 KiB  
Article
Geometry-Aware Enhanced Mutual-Supervised Point Elimination with Overlapping Mask Contrastive Learning for Partitial Point Cloud Registration
by Yue Dai, Shuilin Wang, Chunfeng Shao, Heng Zhang and Fucang Jia
Electronics 2024, 13(20), 4074; https://doi.org/10.3390/electronics13204074 - 16 Oct 2024
Viewed by 573
Abstract
Point cloud registration is one of the fundamental tasks in computer vision, but faces challenges under low overlap conditions. Recent approaches use transformers and overlapping masks to improve perception, but mask learning only considers Euclidean distances between features, ignores mismatches caused by fuzzy [...] Read more.
Point cloud registration is one of the fundamental tasks in computer vision, but it faces challenges under low-overlap conditions. Recent approaches use transformers and overlapping masks to improve perception, but mask learning only considers Euclidean distances between features, ignores mismatches caused by fuzzy geometric structures, and is often computationally inefficient. To address these issues, we introduce a novel matching framework. Firstly, we fuse adaptive graph convolution with PPF features to obtain rich feature perception. Subsequently, we construct a PGT framework that uses GeoTransformer combined with location information encoding to enhance the geometry perception between the source and target clouds. In addition, we improve the visibility of overlapping regions through information exchange and the AIS module; for subsequent keypoint extraction, points with distinct geometric structures are preserved while the influence of non-overlapping regions is suppressed, improving computational efficiency. Finally, the mask is refined through contrastive learning to preserve geometric and distance similarity, which helps to compute the transformation parameters more accurately. We have conducted comprehensive experiments on synthetic and real-world scene datasets, demonstrating superior registration performance compared to recent deep learning methods. Our approach shows remarkable improvements of 68.21% in RRMSE and 76.31% in tRMSE on synthetic data, while also excelling in real-world scenarios with enhancements of 76.46% in RRMSE and 45.16% in tRMSE. Full article
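The keypoint selection idea used by this framework (score points with an MLP, keep the top-K) can be illustrated with a short PyTorch sketch. The layer sizes, the class name, and the gather logic below are illustrative assumptions rather than the authors' exact module.

```python
import torch
import torch.nn as nn

class KeypointSelector(nn.Module):
    """Score each point with a small MLP and keep the top-k most significant
    points together with their features (a simplified selection step)."""
    def __init__(self, feat_dim: int, k: int):
        super().__init__()
        self.k = k
        self.score_mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2), nn.ReLU(),
            nn.Linear(feat_dim // 2, 1),
        )

    def forward(self, points, feats):
        # points: (B, N, 3), feats: (B, N, C)
        scores = self.score_mlp(feats).squeeze(-1)          # (B, N) significance scores
        idx = scores.topk(self.k, dim=1).indices            # indices of the top-k points
        batch = torch.arange(points.size(0)).unsqueeze(-1)  # (B, 1) batch index for gathering
        return points[batch, idx], feats[batch, idx], scores

# Example: keep 128 of 1024 points per cloud
pts, f = torch.randn(2, 1024, 3), torch.randn(2, 1024, 256)
sel_pts, sel_f, _ = KeypointSelector(256, k=128)(pts, f)
print(sel_pts.shape, sel_f.shape)  # torch.Size([2, 128, 3]) torch.Size([2, 128, 256])
```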
Show Figures

Figure 1
<p>The overall architecture of our network. Our network mainly consists of feature extraction, the PGT module, feature interaction with the AIS module, the keypoint selection module, the overlapping mask prediction module, and the correspondence search module. The inputs are the source point cloud X and the target point cloud Y of dimension M × 3, and the network loops over the obtained features N times to refine the registration results after extracting the keypoints. The source point cloud X and target point cloud Y undergo feature extraction to encode features, followed by the PGT module to enhance feature perception and encode relative position information, yielding the features F_x and F_y. Similar feature information is then captured through feature interaction with the AIS module, and significant features and points lying in the overlap region are selected by their scores. Finally, the transformation matrix T is obtained by the correspondence search module, and the overlap masks M_X and M_Y are optimized by contrastive learning. N indicates the number of iterations.</p>
Full article ">Figure 2
<p>Our feature extraction module. Adaptive graph convolution extracts point cloud features; the resulting multilevel features are fused to obtain a global feature, and the extracted PPF geometric features are then fused with the global feature and the multilevel graph features to obtain 512-dimensional features.</p>
Full article ">Figure 3
<p>Our PGT module. Fully connected (FC) layers with Sigmoid and ReLU activations build the position encoding module; a concat operation stitches the features with the position information, and position enhancement is performed by GeoTransformer. Finally, the original features are superimposed to highlight saliency.</p>
Full article ">Figure 4
<p>Our keypoint selection module. An MLP extracts significance scores, and the corresponding features and keypoints are selected according to the top-K scores.</p>
Full article ">Figure 5
<p>Correspondence search module.</p>
Full article ">Figure 6
<p>Visualization of the registration of the same category in ModelNet40. Red represents the source point cloud, blue represents the target point cloud, and green represents the source point cloud after registration. The settings for subsequent visualization results remain consistent. (<b>a</b>) plant, (<b>b</b>) vase, (<b>c</b>) night-stand, (<b>d</b>) plant.</p>
Full article ">Figure 7
<p>Visualization of the registration of the unseen category in ModelNet40. (<b>a</b>) monitor, (<b>b</b>) range-hood, (<b>c</b>) glass-box, (<b>d</b>) night-stand.</p>
Full article ">Figure 8
<p>Visualization results for Gaussian noise in ModelNet40. (<b>a</b>) door, (<b>b</b>) table, (<b>c</b>) mantel, (<b>d</b>) bookshelf.</p>
Full article ">Figure 9
<p>Visualization results for Gaussian noise with low overlap in ModelNet40. (<b>a</b>) night-stand, (<b>b</b>) laptop, (<b>c</b>) vase, (<b>d</b>) table.</p>
Full article ">Figure 10
<p>Our method achieves the best registration accuracy; although DCP slightly outperforms us in terms of speed, our registration accuracy far exceeds it.</p>
Full article ">Figure 11
<p>Registration results of our algorithm at different degrees of overlap. It can be seen that our method still maintains a good registration accuracy when the overlap degree decreases sharply.</p>
Full article ">Figure 12
<p>Comparison of our method with the recent CMIGNet method at different overlap ratios. The solid line corresponds to the rotation error and the dashed line to the translation error; both are much lower than the CMIGNet errors.</p>
Full article ">Figure 13
<p>The results of our registration under different noise levels; our method still maintains good accuracy as the noise is added incrementally.</p>
Full article ">Figure 14
<p>Our results on real-scene registration. The color representation is consistent with previous experiments. Subfigures (<b>a</b>–<b>d</b>) represent the alignment results of point clouds acquired from different viewing angles.</p>
Full article ">Figure 15
<p>Further ablation of our approach on the combined AIS module and the feature interaction (FI) module; the histograms show their effects.</p>
Full article ">