Search Results (138)

Search Parameters:
Keywords = ORB SLAM

18 pages, 39910 KiB  
Article
DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM
by Fan Zhu, Yifan Zhao, Ziyu Chen, Chunmao Jiang, Hui Zhu and Xiaoxi Hu
Remote Sens. 2025, 17(4), 625; https://doi.org/10.3390/rs17040625 - 12 Feb 2025
Viewed by 446
Abstract
Visual SLAM is widely applied in robotics and remote sensing. The fusion of Gaussian radiance fields and Visual SLAM has demonstrated astonishing efficacy in constructing high-quality dense maps. While existing methods perform well in static scenes, they are prone to the influence of dynamic objects in real-world dynamic environments, thus making robust tracking and mapping challenging. We introduce DyGS-SLAM, a Visual SLAM system that employs dual constraints to achieve high-fidelity static map reconstruction in dynamic environments. We extract ORB features within the scene, and use open-world semantic segmentation models and multi-view geometry to construct dual constraints, forming a zero-shot dynamic information elimination module while recovering backgrounds occluded by dynamic objects. Furthermore, we select high-quality keyframes and use them for loop closure detection and global optimization, constructing a foundational Gaussian map through a set of determined point clouds and poses and integrating repaired frames for rendering new viewpoints and optimizing 3D scenes. Experimental results on the TUM RGB-D, Bonn, and Replica datasets, as well as real scenes, demonstrate that our method has excellent localization accuracy and mapping quality in dynamic scenes.
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
Show Figures

Figure 1. System framework of DyGS-SLAM. The tracking thread conducts dynamic object removal and background inpainting. The mapping thread reconstructs the Gaussian map and performs differentiable rendering using a set of determined poses and point clouds. Lastly, the 3D scene is optimized based on the repaired and rendered frames.
Figure 2. Open-world semantic segmentation model.
Figure 3. RGB images from the TUM RGB-D dataset. (a) Frame 690. (b) Frame 765. The red boxes indicate the chair being moved, which is semantically static but actually moving.
Figure 4. The feature point p on the keyframe projected onto the current frame is p′; O and O′ are the camera optical centers of the two frames, respectively. (a) Feature point p′ is static (d′ = d_proj). (b) Feature point p′ is dynamic (d′ ≪ d_proj).
Figure 5. Image frame comparison between DynaSLAM and DyGS-SLAM (ours) after repairing the walking_halfsphere sequence of the TUM RGB-D dataset. The red boxes show how the different methods repair the same frame.
Figure 6. Camera trajectories estimated by ORB-SLAM3 and DyGS-SLAM (ours) on the TUM dataset and their differences from the ground truth.
Figure 7. Comparison of mapping quality between NICE-SLAM, SplaTAM, and DyGS-SLAM (ours) on the walking_xyz sequence.
Figure 8. Detailed comparison of the original reconstructed scene provided by Bonn and the scene reconstructed by our method. The red boxes indicate reconstruction details. (a) Original reconstructed scene provided by Bonn. (b–d) Details of the scene reconstructed by our method.
Figure 9. Comparison of reconstruction quality between SplaTAM and DyGS-SLAM (ours) on the Bonn dataset. Our method demonstrates better reconstruction quality. (a) SplaTAM. (b) DyGS-SLAM.
Figure 10. Comparison of mapping quality between NICE-SLAM, SplaTAM, and DyGS-SLAM on the Replica dataset. The red boxes indicate reconstruction details. Our method also achieves excellent reconstruction quality in static scenes. (a) NICE-SLAM. (b) SplaTAM. (c) DyGS-SLAM. (d) Ground truth.
Figure 11. Experimental results in real scenes. The red boxes indicate the recovery of the static background during reconstruction. (a) Input image. (b) Segmentation. (c) Background repair. (d) Novel view synthesis.
Figure 12. Effect of background inpainting on DyGS-SLAM scene reconstruction. The red boxes indicate the reconstruction results of each variant. (a) Reconstruction without background inpainting. (b) Reconstruction with background inpainting.
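The geometric half of the dual constraint (Figure 4) flags a match as dynamic when the depth d′ measured at the projected pixel is much smaller than the depth d_proj predicted from the keyframe, i.e., something has moved in front of the expected static surface. The snippet below is a minimal NumPy sketch of that reprojection-depth test; the intrinsics K, the relative pose (R, t), and the 0.7 ratio threshold are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def is_dynamic(p_kf, depth_kf, depth_cur, K, R, t, ratio_thresh=0.7):
    """Flag a keyframe feature as dynamic via the d' vs d_proj depth check.

    p_kf      : (u, v) pixel of the feature in the keyframe
    depth_kf  : depth of the feature in the keyframe (metres)
    depth_cur : dense depth image of the current frame
    K         : 3x3 camera intrinsics
    R, t      : pose of the current frame w.r.t. the keyframe
    """
    # Back-project the keyframe pixel to a 3D point in keyframe coordinates.
    u, v = p_kf
    X_kf = depth_kf * np.linalg.inv(K) @ np.array([u, v, 1.0])

    # Transform into the current frame and project to pixel p'.
    X_cur = R @ X_kf + t
    d_proj = X_cur[2]                      # depth predicted by the static-world model
    uv_cur = K @ (X_cur / d_proj)
    u_c, v_c = int(round(uv_cur[0])), int(round(uv_cur[1]))

    h, w = depth_cur.shape
    if d_proj <= 0 or not (0 <= u_c < w and 0 <= v_c < h):
        return False                       # projection left the image: undecided

    d_obs = depth_cur[v_c, u_c]            # d' actually measured at p'
    # If the measured depth is much smaller than predicted, something moved
    # in front of the expected static surface -> treat the match as dynamic.
    return d_obs > 0 and d_obs < ratio_thresh * d_proj
```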
15 pages, 3120 KiB  
Article
Implementation of Visual Odometry on Jetson Nano
by Jakub Krško, Dušan Nemec, Vojtech Šimák and Mário Michálik
Sensors 2025, 25(4), 1025; https://doi.org/10.3390/s25041025 - 9 Feb 2025
Viewed by 461
Abstract
This paper presents the implementation of ORB-SLAM3 for visual odometry on a low-power ARM-based system, specifically the Jetson Nano, to track a robot’s movement using RGB-D cameras. Key challenges addressed include the selection of compatible software libraries, camera calibration, and system optimization. The ORB-SLAM3 algorithm was adapted for the ARM architecture and tested using both the EuRoC dataset and real-world scenarios involving a mobile robot. The testing demonstrated that ORB-SLAM3 provides accurate localization, with errors in path estimation ranging from 3 to 11 cm when using the EuRoC dataset. Real-world tests on a mobile robot revealed discrepancies primarily due to encoder drift and environmental factors such as lighting and texture. The paper discusses strategies for mitigating these errors, including enhanced calibration and the potential use of encoder data for tracking when camera performance falters. Future improvements focus on refining the calibration process, adding trajectory correction mechanisms, and integrating visual odometry data more effectively into broader systems.
(This article belongs to the Section Sensors and Robotics)
Show Figures

Figure 1. Chessboard pattern with highlighted corners [1].
Figure 2. EuRoC dataset sample [20].
Figure 3. Detail of the difference between the estimated and ground truth trajectories [1].
Figure 4. Comparison between robot and camera trajectories [1].
Figure 5. ORB-SLAM3 GUI [1].
Figure 6. State chart of the implemented algorithm [1].
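Figure 1 refers to the chessboard-based camera calibration step. As a rough illustration of how such intrinsics can be obtained with OpenCV before being written into the ORB-SLAM3 configuration file (this is generic calibration code, not the authors' procedure), the sketch below assumes a 9x6 inner-corner board, a 25 mm square size, and images under `calib/`.

```python
import glob
import cv2
import numpy as np

BOARD = (9, 6)        # inner corners per row/column (assumed board)
SQUARE = 0.025        # square edge length in metres (assumed)

# 3D reference corners of the board in its own plane (z = 0).
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_points, img_points = [], []
gray = None
for path in glob.glob("calib/*.png"):             # assumed image location
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Estimate the intrinsic matrix and distortion coefficients from all views.
rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
print("fx, fy, cx, cy:", K[0, 0], K[1, 1], K[0, 2], K[1, 2])
print("distortion:", dist.ravel())   # values of this kind go into the ORB-SLAM3 YAML
```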
68 pages, 11118 KiB  
Review
A Review of Simultaneous Localization and Mapping for the Robotic-Based Nondestructive Evaluation of Infrastructures
by Ali Ghadimzadeh Alamdari, Farzad Azizi Zade and Arvin Ebrahimkhanlou
Sensors 2025, 25(3), 712; https://doi.org/10.3390/s25030712 - 24 Jan 2025
Viewed by 829
Abstract
The maturity of simultaneous localization and mapping (SLAM) methods has now reached a significant level that motivates in-depth and problem-specific reviews. The focus of this study is to investigate the evolution of vision-based, LiDAR-based, and combined methods and evaluate their performance in enclosed and GPS-denied (EGD) conditions for infrastructure inspection. This paper categorizes and analyzes the SLAM methods in detail, considering the sensor fusion type and chronological order. The paper analyzes the performance of eleven open-source SLAM solutions, comprising two visual methods (VINS-Mono, ORB-SLAM 2), eight LiDAR-based methods (LIO-SAM, Fast-LIO 2, SC-Fast-LIO 2, LeGO-LOAM, SC-LeGO-LOAM, A-LOAM, LINS, F-LOAM), and one combined LiDAR-visual method (LVI-SAM). The benchmarking section analyzes accuracy and computational resource consumption using our collected dataset and a test dataset. According to the results, LiDAR-based methods performed well under EGD conditions. Contrary to common presumptions, some vision-based methods demonstrate acceptable performance in EGD environments. Additionally, combining vision-based techniques with LiDAR-based methods demonstrates superior performance compared to either vision-based or LiDAR-based methods individually.
(This article belongs to the Special Issue Feature Review Papers in Intelligent Sensors)
Show Figures

Figure 1. Classification of SLAM methods and previous surveys. Underlined methods are considered for benchmarking.
Figure 2. Triangulation and minimization of reprojection error in SfM.
Figure 3. Visual feature types, LiDAR point cloud feature types, and tracking loss. (a) SURF. (b) SIFT. (c) ORB. (d) Harris. (e) FAST. (f) Point cloud and detected features. (g) SURF, (h) SIFT, and (i) ORB feature matching. (j–l) Tracking loss under severe rotation for SURF, SIFT, and ORB feature matching, respectively. The colors of the points and lines differentiate independent points and lines in (a–e,g–l); in (f), green and yellow points represent edge and planar features, respectively, while the other points are red.
Figure 4. Drift and trajectory error during path traversal. (a) Drift error at the end of the traversed path. (b) Trajectory error during the path traversal.
Figure 5. Workflow of feature-based visual SLAM. (a) Feature tracking using the camera. (b) 3D mapping using the detected features.
Figure 6. Evolution of visual feature-based methods.
Figure 7. Tracking in a four-level pyramid in the PTAM method. The images include at most the 50 closest features in consecutive frames. (a) Original image. (b–e) First- to fourth-level pyramids.
Figure 8. Evolution of direct methods.
Figure 9. Evolution of RGB-D methods.
Figure 10. Visual-inertial SLAM workflow.
Figure 11. Evolution of loosely coupled methods.
Figure 12. Evolution of tightly coupled methods.
Figure 13. Evolution of (a) feature-based and (b) direct LiDAR SLAM methods.
Figure 14. Progress of (a) loosely and (b) tightly coupled LiDAR-inertial methods.
Figure 15. Evolution of combined SLAM methods: (a) loosely and (b) tightly coupled.
Figure 16. Distinct locations with EGD conditions in the Luleå tunnel dataset. The images demonstrate the poorly illuminated, feature-poor character of the underground tunnel environment, highlighting the challenging conditions for SLAM algorithms.
Figure 17. Top view of the map generated from the 3D LiDAR scans of the field test environment.
Figure 18. Traversed path and available ground truth data.
Figure 19. Trajectory results for the evaluated methods. (a) Trajectories of vision-based methods compared to ground truth and LVI-SAM. (b) Trajectories of LiDAR-based methods compared to ground truth.
Figure 20. Trajectory RMSE of the evaluated methods over 181 s (about 3 min) of traversing the mine environment (* ORB-SLAM fails at the corridor turn containing a quick rotation, resulting in feature tracking failure).
Figure 21. Translation error of the evaluated methods against ground truth (* ORB-SLAM failed to follow the turn in the corridor containing a quick rotation; the failure was due to feature tracking loss caused by blurry images).
Figure 22. Mean CPU usage over the 181 s (about 3 min) traversal time.
Figure 23. Mean memory usage over the 181 s (about 3 min) traversal time.
Figure 24. Data collection device.
Figure 25. Path traversed with the data collection device for the SLAM methods.
Figure 26. MSE values of the different SLAM methods compared to LVI-SAM as ground truth.
Figure 27. (a) LiDAR-based mapping and odometry results. (b) Feature tracking with the VINS-Mono SLAM method: red dots indicate newly detected features in the current frame, blue dots represent successfully tracked features across frames, and purple dots denote features that failed to track or were rejected as outliers.
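The benchmark reports trajectory RMSE against ground truth (Figures 20 and 21). A common way to compute this metric, shown here as an illustrative sketch rather than the authors' evaluation code, is to rigidly align the estimated trajectory to the ground truth with a least-squares (Umeyama/Kabsch) fit and then take the RMSE of the pointwise residuals.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rigid alignment (rotation + translation) of est onto gt.
    est, gt: (N, 3) arrays of time-associated trajectory positions."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(G.T @ E / len(est))
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:        # enforce a proper rotation (det = +1)
        S[2, 2] = -1
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE) after rigid alignment."""
    R, t = align_rigid(est, gt)
    residuals = gt - (est @ R.T + t)
    return np.sqrt((residuals ** 2).sum(axis=1).mean())

# Toy example: a rotated, shifted, noisy copy of a short ground-truth path.
gt = np.cumsum(np.random.rand(100, 3), axis=0)
theta = np.deg2rad(10.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
est = gt @ Rz.T + np.array([5.0, -2.0, 0.3]) + 0.05 * np.random.randn(100, 3)
print("ATE RMSE [m]:", ate_rmse(est, gt))
```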
21 pages, 9794 KiB  
Article
Research on a Density-Based Clustering Method for Eliminating Inter-Frame Feature Mismatches in Visual SLAM Under Dynamic Scenes
by Zhiyong Yang, Kun Zhao, Shengze Yang, Yuhong Xiong, Changjin Zhang, Lielei Deng and Daode Zhang
Sensors 2025, 25(3), 622; https://doi.org/10.3390/s25030622 - 22 Jan 2025
Viewed by 582
Abstract
Visual SLAM relies on the motion information of static feature points in keyframes for both localization and map construction. Dynamic feature points interfere with inter-frame motion pose estimation, thereby affecting the accuracy of map construction and the overall robustness of the visual SLAM system. To address this issue, this paper proposes a method for eliminating feature mismatches between frames in visual SLAM under dynamic scenes. First, a spatial clustering-based RANSAC method is introduced. This method eliminates mismatches by leveraging the distribution of dynamic and static feature points, clustering the points, and separating dynamic from static clusters, retaining only the static clusters to generate a high-quality dataset. Next, the RANSAC method is introduced to fit the geometric model of feature matches, eliminating local mismatches in the high-quality dataset with fewer iterations. The accuracy of the DSSAC-RANSAC method in eliminating feature mismatches between frames is then tested on both indoor and outdoor dynamic datasets, and the robustness of the proposed algorithm is further verified on self-collected outdoor datasets. Experimental results demonstrate that the proposed algorithm reduces the average reprojection error by 58.5% and 49.2%, respectively, when compared to traditional RANSAC and GMS-RANSAC methods. The reprojection error variance is reduced by 65.2% and 63.0%, while the processing time is reduced by 69.4% and 31.5%, respectively. Finally, the proposed algorithm is integrated into the initialization thread of ORB-SLAM2 and the tracking thread of ORB-SLAM3 to validate its effectiveness in eliminating feature mismatches between frames in visual SLAM.
Show Figures

Figure 1. DSSAC-RANSAC algorithm flowchart; the images are derived from the KITTI dataset.
Figure 2. Flowchart of the DSSAC method.
Figure 3. Methodology flowchart of this study.
Figure 4. ORB feature matching in the TUM dataset using DSSAC-RANSAC with 1500 feature points. (a) Brute-force matching. (b,c) Clustering and dynamic cluster elimination process of DSSAC. (d) Results after RANSAC removes the few remaining mismatches and low-quality points.
Figure 5. Comparison of feature matching performance between RANSAC, GMS-RANSAC (G-R), GMS-ATRANSAC (G-ATR), and DSSAC-RANSAC (D-R) in TUM indoor dynamic scenes.
Figure 6. Comparison of feature matching performance between RANSAC, GMS-RANSAC (G-R), GMS-ATRANSAC (G-ATR), and DSSAC-RANSAC (D-R) in KITTI outdoor dynamic scenes.
Figure 7. Comparison of feature matching performance between RANSAC, GMS-RANSAC (G-R), GMS-ATRANSAC (G-ATR), and DSSAC-RANSAC (D-R) in self-collected outdoor scenes.
Figure 8. Comparison of the mean matching error (a) and error variance (b) of RANSAC, GMS-RANSAC, GMS-ATRANSAC, and DSSAC-RANSAC for different numbers of feature points.
Figure 9. Runtime of the four algorithms for different numbers of feature points.
Figure 10. ORB-SLAM2 default initialization method (a) and the proposed initialization method (b) on the TUM indoor dynamic dataset.
Figure 11. DSSAC-RANSAC applied to feature tracking between adjacent keyframes in ORB-SLAM3, as shown in (a,b). (c) Depth point cloud map of the indoor scene constructed using the default ORB-SLAM3 method. (d) Depth point cloud map of the indoor scene constructed with the proposed method integrated.
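The abstract describes a two-stage filter: density-based spatial clustering separates static from dynamic clusters, and RANSAC then removes the remaining local mismatches. The sketch below loosely approximates that idea with off-the-shelf components (scikit-learn's DBSCAN and OpenCV's fundamental-matrix RANSAC); the clustering parameters and the displacement-based test for "dynamic" clusters are assumptions for illustration, not the paper's DSSAC criterion.

```python
import numpy as np
import cv2
from sklearn.cluster import DBSCAN

def filter_matches(pts_prev, pts_cur, eps=40.0, min_samples=8, disp_factor=2.0):
    """Two-stage mismatch removal on matched keypoints (N x 2 float arrays).

    Stage 1: density-cluster the current-frame points and drop clusters whose
             median displacement is far from the global median (a crude proxy
             for 'dynamic' clusters).
    Stage 2: RANSAC on the remaining matches via the fundamental matrix.
    Returns a boolean mask over the original matches.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts_cur)
    disp = np.linalg.norm(pts_cur - pts_prev, axis=1)
    global_med = np.median(disp)

    keep = np.zeros(len(pts_cur), dtype=bool)
    for lbl in set(labels) - {-1}:                 # -1 marks DBSCAN noise points
        idx = labels == lbl
        if np.median(disp[idx]) < disp_factor * global_med:
            keep[idx] = True                       # treat this cluster as static

    p1, p2 = pts_prev[keep], pts_cur[keep]
    if len(p1) < 8:
        return keep                                # too few points for RANSAC
    _, inlier_mask = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, 1.0, 0.999)
    final = keep.copy()
    final[np.where(keep)[0]] = inlier_mask.ravel().astype(bool)
    return final
```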
24 pages, 5261 KiB  
Article
Extended Study of a Multi-Modal Loop Closure Detection Framework for SLAM Applications
by Mohammed Chghaf, Sergio Rodríguez Flórez and Abdelhafid El Ouardi
Electronics 2025, 14(3), 421; https://doi.org/10.3390/electronics14030421 - 21 Jan 2025
Viewed by 701
Abstract
Loop Closure (LC) is a crucial task in Simultaneous Localization and Mapping (SLAM) for Autonomous Ground Vehicles (AGV). It is an active research area because it improves global localization efficiency. The consistency of the global map and the accuracy of the AGV’s location in an unknown environment are highly correlated with the efficiency and robustness of Loop Closure Detection (LCD), especially when facing environmental changes or data unavailability. We propose to introduce multimodal complementary data to increase the algorithms’ resilience. Various methods using different data sources have been proposed to achieve precise place recognition. However, integrating a multimodal loop-closure fusion process that combines multiple information sources within a SLAM system has been explored less. Additionally, existing multimodal place recognition techniques are often difficult to integrate into existing frameworks. In this paper, we propose a fusion scheme of multiple place recognition methods based on camera and LiDAR data for a robust multimodal LCD. The presented approach uses Similarity-Guided Particle Filtering (SGPF) to identify and verify candidates for loop closure. Based on the ORB-SLAM2 framework, the proposed method uses two perception sensors (camera and LiDAR) under two data representation models for each. Our experiments on both KITTI and a self-collected dataset show that our approach outperforms the state-of-the-art methods in terms of place recognition metrics or localization accuracy metrics. The proposed Multi-Modal Loop Closure (MMLC) framework enhances the robustness and accuracy of AGV’s localization by fusing multiple sensor modalities, ensuring consistent performance across diverse environments. Its real-time operation and early loop closure detection enable timely trajectory corrections, reducing navigation errors and supporting cost-effective deployment with adaptable sensor configurations.
(This article belongs to the Special Issue Image Analysis Using LiDAR Data)
Show Figures

Figure 1. Integration of N different perception modalities into the loop closure module, independent of the underlying pose-graph optimization-based SLAM system.
Figure 2. Step-by-step diagram of the SGPF.
Figure 3. SATIE Laboratory instrumented car. The vehicle is equipped with a stereo camera, a LiDAR, and a data logger. The ground truth trajectory is recorded using an RTK GNSS receiver.
Figure 4. GNSS traces of our self-collected data in the cities near the SATIE laboratory (Saclay and Saint-Aubin).
Figure 5. Normalized modality distance of each modality. Using all frames of sequence 00 of the KITTI dataset, each plotted point represents the normalized modality distance as a function of the ground truth distance for every possible pair of frames in the sequence.
Figure 6. ATE (m) on Seq. 00 of the KITTI dataset. On the x-axis, the first number is the total number of LCCs suggested by the perception modalities and the second is the number of sampled loop candidates; the total is the sum of all considered loop candidates.
Figure 7. Processing time (ms) on Seq. 00 of the KITTI dataset. On the x-axis, the first number is the total number of LCCs proposed by the perception modalities and the second is the number of sampled loop candidates; the total is the sum of all considered loop candidates.
Figure 8. Distributions of particles representing potential LCCs, identified by the four modalities over Seq. 00 of the KITTI dataset. (a) Gray particles indicate ground truth loop closures; yellow particles are sampled particles dropped during the update step; blue (magenta, orange, green) particles are based on DBOW (NetVLAD (NV), Scan Context (SC), PointNetVLAD (PNV)) similarity. (b) Distribution of particles across the sequence frames. (c) Distribution of particles across the sequence map; the black dots are the vehicle’s poses.
Figure 9. Evolution of cumulative error during ORB-SLAM2 [2] runtime, using different modality combinations for loop closure.
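SGPF maintains particles over loop-closure candidates and guides them with the similarity scores of the individual modalities. The sketch below is a heavily simplified, generic illustration of that idea (not the authors' formulation): particles over past frame indices are reweighted by a fused similarity score and resampled systematically; the averaging fusion, the frame counts, and all numbers are assumptions.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Standard systematic resampling: returns indices of surviving particles."""
    n = len(weights)
    cum = np.cumsum(weights)
    cum[-1] = 1.0                                   # guard against round-off
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(cum, positions)

def sgpf_step(particles, similarities, rng):
    """One similarity-guided update over loop-closure candidate frame ids.

    particles    : (N,) int array of candidate frame indices
    similarities : dict  modality -> (num_frames,) similarity scores
    """
    # Fuse modalities by simple averaging (an illustrative choice).
    fused = np.mean([s[particles] for s in similarities.values()], axis=0)
    weights = fused + 1e-12
    weights /= weights.sum()
    # Particles concentrate on frames that several modalities agree on.
    return particles[systematic_resample(weights, rng)]

# Toy run: 4 modalities voting over 500 past frames, true loop near frame 320.
rng = np.random.default_rng(1)
sims = {m: np.clip(rng.normal(0.2, 0.05, 500), 0, 1) for m in "ABCD"}
for m in sims:
    sims[m][315:325] += 0.7
particles = rng.integers(0, 500, size=200)
for _ in range(5):
    particles = sgpf_step(particles, sims, rng)
print("most supported candidate frame:", np.bincount(particles).argmax())
```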
20 pages, 3776 KiB  
Article
Parallelized SLAM: Enhancing Mapping and Localization Through Concurrent Processing
by Francisco J. Romero-Ramirez, Miguel Cazorla, Manuel J. Marín-Jiménez, Rafael Medina-Carnicer and Rafael Muñoz-Salinas
Sensors 2025, 25(2), 365; https://doi.org/10.3390/s25020365 - 9 Jan 2025
Viewed by 494
Abstract
Simultaneous Localization and Mapping (SLAM) systems face high computational demands, hindering their real-time implementation on low-end computers. An approach to addressing this challenge involves offline processing, i.e., a map of the environment is created offline on a powerful computer and then passed to a low-end computer, which uses it for navigation with fewer resources. However, even creating the map on a powerful computer is slow since SLAM is designed as a sequential process. This work proposes pSLAM, a parallel mapping method for speeding up the offline creation of maps. In pSLAM, a video sequence is partitioned into multiple subsequences, with each processed independently, creating individual submaps. These submaps are subsequently merged to create a unified global map of the environment. Our experiments across a diverse range of scenarios demonstrate an increase in the processing speed of up to 6 times compared to that of the sequential approach while maintaining the same level of robustness. Furthermore, we conducted comparative analyses against state-of-the-art SLAM methods, namely UcoSLAM, OpenVSLAM, and ORB-SLAM3, with our method outperforming these across all of the scenarios evaluated.
Show Figures

Figure 1. Main modules of our system. Initially, the video sequence is divided into multiple subsequences, with each processed by a dedicated Map Builder module. Finally, each module's resulting maps are merged to create a unified representation.
Figure 2. Methodology followed for merging the submaps. The merge modules are executed in parallel within the system, ensuring that each module executes only once the maps it will unify have been created.
Figure 3. (a,c) Maps generated by our method from two video sequences of the KITTI dataset, using m = 2 and m = 4, respectively. (b,d) Trajectories of the evaluated methods, depicted in various colors, along with the ground truth.
Figure 4. (a,c) Maps generated by our method for two video sequences of the 4Seasons dataset, using m = 4. (b,d) Trajectories of the different methods on the same sequences.
Figure 5. (a,c) Maps generated from two video sequences of the CUR dataset with different parameter configurations: (a) m = 16, (c) m = 8. (b,d) Trajectories followed by the evaluated methods.
Figure 6. Computing time comparison of UcoSLAM and the proposed pSLAM method for different values of m on the analyzed datasets.
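pSLAM's orchestration is naturally parallel: the sequence is split into m subsequences, each Map Builder runs independently, and the resulting submaps are merged. The following sketch only mimics that orchestration pattern with Python's multiprocessing and a sequential pairwise merge; `build_submap` and `merge` are placeholder stand-ins, not pSLAM code.

```python
from multiprocessing import Pool
from functools import reduce

def build_submap(frame_range):
    """Placeholder for a Map Builder module: processes one subsequence."""
    start, end = frame_range
    # ... run visual SLAM on frames[start:end] and return its submap ...
    return {"frames": (start, end), "keyframes": []}

def merge(map_a, map_b):
    """Placeholder for a merge module: unifies two adjacent submaps."""
    return {"frames": (map_a["frames"][0], map_b["frames"][1]),
            "keyframes": map_a["keyframes"] + map_b["keyframes"]}

def parallel_slam(num_frames, m=4):
    """Split the sequence into m chunks, map them in parallel, then reduce."""
    step = num_frames // m
    chunks = [(i * step, num_frames if i == m - 1 else (i + 1) * step)
              for i in range(m)]
    with Pool(processes=m) as pool:
        submaps = pool.map(build_submap, chunks)   # independent Map Builders
    return reduce(merge, submaps)                  # pairwise merge of submaps

if __name__ == "__main__":
    global_map = parallel_slam(num_frames=2000, m=4)
    print(global_map["frames"])                    # -> (0, 2000)
```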
21 pages, 4833 KiB  
Article
An Effective 3D Instance Map Reconstruction Method Based on RGBD Images for Indoor Scene
by Heng Wu, Yanjie Liu, Chao Wang and Yanlong Wei
Remote Sens. 2025, 17(1), 139; https://doi.org/10.3390/rs17010139 - 3 Jan 2025
Viewed by 544
Abstract
To enhance the intelligence of robots, constructing accurate object-level instance maps is essential. However, the diversity and clutter of objects in indoor scenes present significant challenges for instance map construction. To tackle this issue, we propose a method for constructing object-level instance maps based on RGBD images. First, we utilize the advanced visual odometry system ORB-SLAM3 to estimate the poses of image frames and extract keyframes. Next, we perform semantic and geometric segmentation on the color and depth images of these keyframes, respectively, using semantic segmentation to optimize the geometric segmentation results and address inaccuracies in the target segmentation caused by small depth variations. The segmented depth images are then projected into point cloud segments, which are assigned corresponding semantic information. We integrate these point cloud segments into a global voxel map, updating each voxel’s class using color, distance constraints, and Bayesian methods to create an object-level instance map. Finally, we construct an ellipsoid scene from this map to test the robot’s localization capabilities in indoor environments using semantic information. Our experiments demonstrate that this method accurately and robustly constructs the environment, facilitating precise object-level scene segmentation. Furthermore, compared to manually labeled ellipsoidal maps, generating ellipsoidal maps from extracted objects enables accurate global localization.
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
Show Figures

Figure 1. Overview of our pipeline. In the object-level voxel construction, different colors represent different instance objects.
Figure 2. Fusion of semantic and geometric segmentation. YOLOv11 performs semantic segmentation of the RGB image (a) to obtain the target mask (b). The depth image (c) is projected into 3D space for geometric segmentation (d). The mask extracted by semantic segmentation then compensates the geometric segmentation (e), and the segmented regions (f) are finally obtained.
Figure 3. Fusion of point cloud segments between different frames in the 061 sequence of the SceneNN dataset. The point cloud segment labels ➀, ➁, … of Frame_0 are global labels, while the labels "1", "2", … of Frame_191 are temporary labels. Global and temporary labels are not in one-to-one correspondence; they only indicate mutual relationships. For example, segment "2" is scored separately against the overlapping regions ➀ and ➁; when the score between "2" and ➁ is higher, label ➁ is assigned to segment "2" and integrated into the global map.
Figure 4. Erroneously reconstructed areas under different thresholds. The three methods are evaluated with error thresholds of 2 cm, 4 cm, and 8 cm on the 11_01 sequence of the ScanNet dataset. The red areas represent reconstruction errors exceeding the threshold.
Figure 5. Comparison of map accuracy on the 11_01 sequence of ScanNet. Top: ground truth maps and RGB images from different viewpoints. Bottom: images observed from three different viewpoints on the different maps.
Figure 6. Qualitative results on a selected scene from SceneNN.
Figure 7. Object-level instance map and ellipsoid scene constructed from the Chess sequences of the 7-Scenes dataset. (a) Mesh map of the scene. (b) Object-level instance map. (c) Ellipsoid scene. (d) Extracted objects in the environment.
Figure 8. Absolute translation errors for different sequences of the 7-Scenes dataset. Zins et al. [35] used manually labeled ellipsoid scenes, while Voxblox++ and our method use ellipsoid scenes constructed from instance objects.
Figure 9. Absolute rotation errors for different sequences of the 7-Scenes dataset. Zins et al. [35] used manually labeled ellipsoid scenes, while Voxblox++ and our method use ellipsoid scenes constructed from instance objects.
Figure 10. Object-level instance map and ellipsoid scene constructed in the real world. (a) Mesh map of the scene. (b) Object-level instance map. (c) Ellipsoid scene. (d) Extracted objects in the scene.
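The abstract mentions updating each voxel's class with Bayesian methods as new keyframe segmentations arrive. Below is a minimal sketch of such a recursive per-voxel Bayes update; the uniform prior, the fixed hit/miss confusion model, and the class indices are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

class VoxelLabel:
    """Recursive Bayesian class estimate for a single voxel."""

    def __init__(self, num_classes, hit_prob=0.8):
        self.p = np.full(num_classes, 1.0 / num_classes)   # uniform prior
        # Simple confusion model: the observed label is correct with
        # probability hit_prob, otherwise uniformly wrong.
        self.hit = hit_prob
        self.miss = (1.0 - hit_prob) / (num_classes - 1)

    def update(self, observed_class):
        likelihood = np.full_like(self.p, self.miss)
        likelihood[observed_class] = self.hit
        self.p *= likelihood          # Bayes rule: posterior ∝ prior * likelihood
        self.p /= self.p.sum()

    def best(self):
        return int(np.argmax(self.p)), float(np.max(self.p))

# Three keyframes vote class 2 ("chair"), one noisy frame votes class 1.
voxel = VoxelLabel(num_classes=5)
for obs in (2, 2, 1, 2):
    voxel.update(obs)
print(voxel.best())                   # -> class 2 with high posterior probability
```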
22 pages, 12893 KiB  
Article
Research on Visual–Inertial Measurement Unit Fusion Simultaneous Localization and Mapping Algorithm for Complex Terrain in Open-Pit Mines
by Yuanbin Xiao, Wubin Xu, Bing Li, Hanwen Zhang, Bo Xu and Weixin Zhou
Sensors 2024, 24(22), 7360; https://doi.org/10.3390/s24227360 - 18 Nov 2024
Viewed by 951
Abstract
As mining technology advances, intelligent robots in open-pit mining require precise localization and digital maps. Nonetheless, significant pitch variations, uneven roads, and rocky surfaces with minimal texture present substantial challenges to the precision of feature extraction and positioning in traditional visual SLAM systems, owing to the intricate terrain features of open-pit mines. This study proposes an improved SLAM technique that integrates visual and Inertial Measurement Unit (IMU) data to address these challenges. The method incorporates a point–line feature fusion matching strategy to enhance the quality and stability of line feature extraction. It integrates an enhanced Line Segment Detection (LSD) algorithm with short segment culling and approximate line merging techniques. The combination of IMU pre-integration and visual feature restrictions is executed inside a tightly coupled visual–inertial framework utilizing a sliding window approach for back-end optimization, enhancing system robustness and precision. Experimental results demonstrate that the suggested method improves RMSE accuracy by 36.62% and 26.88% on the MH and VR sequences of the EuRoC dataset, respectively, compared to ORB-SLAM3. The improved SLAM system significantly reduces trajectory drift in the simulated open-pit mining tests, improving localization accuracy by 40.62% and 61.32%. These results demonstrate the effectiveness of the proposed method.
(This article belongs to the Section Sensors and Robotics)
Show Figures

Figure 1. System framework diagram. The procedure encompasses data input, front-end visual–inertial odometry, loop-closure detection, back-end optimization, and mapping. The red box in the data input section marks the sparsely textured slope.
Figure 2. Example of the line feature extraction optimization method. Line segment detection is improved by short-line elimination and approximate line segment merging.
Figure 3. (a) Flowchart of the improved LSD line feature detection algorithm; (b) schematic diagram of similar line feature merging.
Figure 4. Visual observation model and IMU schematic diagram. Because the IMU acquisition frequency is significantly higher than that of the camera, the IMU data must be integrated in discrete time, and a unified data format is required for tight coupling. Hollow circles, hollow triangles, green stars, and black squares represent image frames, keyframes, IMU data, and the pre-integration process, respectively.
Figure 5. Marginalization model: the relationship between the camera and landmark locations during the marginalization process.
Figure 6. Performance comparison of the line feature extraction algorithms: (a) average time required to extract line features; (b) average number of extracted line features.
Figure 7. Line feature extraction comparison: (a) line features extracted by the LSD algorithm; (b) the enhanced LSD method. Short segment elimination and approximate line merging markedly remove redundant short line features while preserving the longer segments essential for localization precision. The red boxes highlight the compared regions; the green dots and lines represent the extracted point and line features, respectively.
Figure 8. Histogram of absolute trajectory error. In the MH sequences, the absolute trajectory error of the enhanced algorithm is lower than that of the other algorithms, whereas in the VR sequences the enhanced algorithm performs comparably to or better than the best of the other algorithms.
Figure 9. Trajectory error comparison: (a) trajectory comparison for Sequence MH_04_difficult; (b) trajectory comparison for Sequence V2_03_difficult. Black boxes and red arrows enlarge key areas and mark trajectory deviations, highlighting accuracy differences among the algorithms in these regions.
Figure 10. Absolute pose error results for each sequence: (a) MH_04_difficult; (b) MH_05_difficult; (c) V1_02_medium; (d) V1_03_difficult. The color-coded line represents varying levels of absolute pose error (APE) along the trajectory, with red indicating higher error and blue indicating lower error, highlighting accuracy differences across segments.
Figure 11. Three-dimensional point cloud maps: (a) Sequence MH_05_difficult; (b) Sequence V1_03_difficult. Green lines represent the estimated trajectory, red points the mapped features, and black points additional environmental points.
Figure 12. Experimental intelligent mining robot platform: (a) left view; (b) front view.
Figure 13. Scene 1: circular open-pit excavation. (a) Real-world scene; (b) movement trajectory. In (a), the red arrows represent the motion trajectory of the mapping robot; in (b), points A, B, and C are key checkpoints along the closed-loop path.
Figure 14. Comparison of trajectory errors in Scenario 1.
Figure 15. Experimental results in Scenario 1: (a) comparison of 2D planar trajectories; (b) absolute trajectory error of the sequences.
Figure 16. Scene 2: uneven road conditions in an open-pit mine. (a) Real-world scene; (b) movement trajectory. In (a), the red arrows represent the motion trajectory of the mapping robot; in (b), points A, B, and C are key checkpoints along the path.
Figure 17. Comparison of trajectory errors in Scenario 2.
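The enhanced LSD front end culls short segments and merges approximately collinear ones (Figures 2 and 3). The sketch below shows one plausible version of those two steps on segments given as (x1, y1, x2, y2); the length, angle, and endpoint-gap thresholds are arbitrary assumptions, not the paper's parameters.

```python
import numpy as np

def seg_length(s):
    return np.hypot(s[2] - s[0], s[3] - s[1])

def seg_angle(s):
    return np.arctan2(s[3] - s[1], s[2] - s[0]) % np.pi   # undirected angle

def cull_and_merge(segments, min_len=20.0, ang_tol=np.deg2rad(3), gap_tol=10.0):
    """Drop short segments, then greedily merge near-collinear neighbours."""
    segs = [s for s in segments if seg_length(s) >= min_len]   # short-segment culling
    merged, used = [], [False] * len(segs)
    for i, a in enumerate(segs):
        if used[i]:
            continue
        cur = list(a)
        for j in range(i + 1, len(segs)):
            b = segs[j]
            if used[j]:
                continue
            # Endpoint gap and angular difference between the two segments.
            close = min(np.hypot(cur[2] - b[0], cur[3] - b[1]),
                        np.hypot(cur[0] - b[2], cur[1] - b[3])) < gap_tol
            ang_diff = abs(seg_angle(cur) - seg_angle(b))
            ang_diff = min(ang_diff, np.pi - ang_diff)
            if close and ang_diff < ang_tol:
                # Merge: keep the two endpoints that are farthest apart.
                pts = [(cur[0], cur[1]), (cur[2], cur[3]),
                       (b[0], b[1]), (b[2], b[3])]
                i0, i1 = max(((p, q) for p in range(4) for q in range(p + 1, 4)),
                             key=lambda pq: np.hypot(pts[pq[0]][0] - pts[pq[1]][0],
                                                     pts[pq[0]][1] - pts[pq[1]][1]))
                cur = [*pts[i0], *pts[i1]]
                used[j] = True
        merged.append(tuple(cur))
    return merged

# Two collinear pieces get merged; the 4-pixel stub is culled.
print(cull_and_merge([(0, 0, 30, 0), (35, 0.5, 70, 1.0), (5, 5, 9, 5)]))
```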
18 pages, 2990 KiB  
Article
A GGCM-E Based Semantic Filter and Its Application in VSLAM Systems
by Yuanjie Li, Chunyan Shao and Jiaming Wang
Electronics 2024, 13(22), 4487; https://doi.org/10.3390/electronics13224487 - 15 Nov 2024
Viewed by 532
Abstract
Image matching-based visual simultaneous localization and mapping (vSLAM) extracts low-level pixel features to reconstruct camera trajectories and maps through the epipolar geometry method. However, it fails to achieve correct trajectories and mapping when there are low-quality feature correspondences in several challenging environments. Although the RANSAC-based framework can yield better results, it is computationally inefficient and unstable in the presence of a large number of outliers. In our previous work, a Faster R-CNN learning-based semantic filter was proposed to exploit the semantic information of inliers and remove low-quality correspondences, helping vSLAM localize accurately. However, that semantic filter learning method generalizes poorly to low-level and dense texture-rich scenes, making the semantic filter-based vSLAM unstable and degrading its geometry estimation. In this paper, a GGCM-E-based semantic filter using YOLOv8 is proposed to address these problems. Firstly, semantic patches of images are collected from the KITTI dataset, the TUM dataset provided by the Technical University of Munich, and real outdoor scenes. Secondly, the semantic patches are classified by our proposed GGCM-E descriptors to obtain the YOLOv8 neural network training dataset. Finally, several semantic filters for filtering low-level and dense texture-rich scenes are generated and combined into the ORB-SLAM3 system. Extensive experiments show that the semantic filter can detect and classify the semantic levels of different scenes effectively, filtering low-level semantic scenes to improve the quality of correspondences, thus achieving accurate and robust trajectory reconstruction and mapping. For the challenging autonomous driving benchmark and real environments, the vSLAM system with the GGCM-E-based semantic filter demonstrates its superiority in reducing the 3D position error, with the absolute trajectory error reduced by up to approximately 17.44%, showing its promise and good generalization.
(This article belongs to the Special Issue Application of Artificial Intelligence in Robotics)
Show Figures

Figure 1. ORB-SLAM3 framework with the proposed semantic filter module.
Figure 2. Framework of the proposed semantic filter approach.
Figure 3. Computation of GGCM-E features.
Figure 4. Semantic filtering on the KITTI frame.
Figure 5. Semantic filtering on our captured outdoor frame.
Figure 6. The trajectory of KITTI07 with respect to the ground truth using the GGCM-E semantic filter.
Figure 7. Comparison of trajectories between the proposed method and ground truth in the KITTI dataset.
Figure 8. Comparison of APEs with respect to ground truth of ORB-SLAM3 and the semantic filter.
Figure 9. Dense texture-rich sequences in the TUM dataset (DTR sequences).
Figure 10. Comparison of camera trajectories in DTR sequences.
Figure 11. Comparison of the trajectory with respect to the ground truth of DynaSLAM and GGCM-E+DynaSLAM on KITTI00 sequences.
Figure 12. Comparison of the APEs of semantic filter-based Structure-SLAM, LDSO, and DynaSLAM on KITTI07 sequences.
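GGCM-E classifies image patches by a gray-gradient co-occurrence statistic before the YOLOv8 training stage. The exact descriptor is defined in the paper; the NumPy sketch below only illustrates the general shape of such a feature, a joint histogram of quantized intensity and gradient magnitude followed by its entropy, and should be read as an assumption-laden stand-in rather than the authors' GGCM-E.

```python
import numpy as np

def gray_gradient_entropy(patch, levels=16):
    """Entropy of a gray/gradient co-occurrence histogram for one patch.

    patch: 2D uint8 array. Returns a scalar texture score; higher values
    loosely indicate richer, denser texture (an illustrative proxy only).
    """
    g = patch.astype(np.float64)
    gx, gy = np.gradient(g)
    grad = np.hypot(gx, gy)

    # Quantize intensity and gradient magnitude to a small number of levels.
    q_gray = np.clip((g / 256.0 * levels).astype(int), 0, levels - 1)
    q_grad = np.clip((grad / (grad.max() + 1e-9) * levels).astype(int),
                     0, levels - 1)

    # Joint (gray, gradient) co-occurrence counts over the patch.
    hist = np.zeros((levels, levels))
    np.add.at(hist, (q_gray.ravel(), q_grad.ravel()), 1)
    p = hist / hist.sum()

    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())   # Shannon entropy of the matrix

rng = np.random.default_rng(0)
flat = np.full((64, 64), 128, np.uint8)                   # low-texture patch
busy = rng.integers(0, 256, (64, 64), dtype=np.uint8)     # texture-rich patch
print(gray_gradient_entropy(flat), gray_gradient_entropy(busy))
```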
20 pages, 1837 KiB  
Article
A Monocular Ranging Method for Ship Targets Based on Unmanned Surface Vessels in a Shaking Environment
by Zimu Wang, Xiunan Li, Peng Chen, Dan Luo, Gang Zheng and Xin Chen
Remote Sens. 2024, 16(22), 4220; https://doi.org/10.3390/rs16224220 - 12 Nov 2024
Viewed by 1011
Abstract
This paper addresses errors in the estimation of the position and attitude of an unmanned vessel, especially during vibration, when the rapid loss of feature point information hinders continuous attitude estimation and global trajectory mapping; to this end, the monocular ORB-SLAM framework is improved based on the characteristics of the marine environment. In general, we extract the location area of the artificial sea target in the video, build a virtual feature set for it, and filter the background features. When shaking occurs, GNSS information is combined and the target feature set is used to complete the map reconstruction task. Specifically, the sea target area of interest is first detected by YOLOv5, and the feature extraction and matching method is optimized in the front-end tracking stage to adapt to the sea environment. In the keyframe selection and local map optimization stage, the characteristics of the feature set are improved to further increase the positioning accuracy and to provide more accurate position and attitude information for the unmanned platform. We use GNSS information to provide the scale and world coordinates for the map. Finally, the target distance is measured by the beam ranging method. Marine unmanned platform data, GNSS data, and AIS position data are autonomously collected, and experiments are carried out using the proposed marine ranging system. Experimental results show that the maximum measurement error of this method is 9.2%, and the average error is 4.7%.
Show Figures

Figure 1. Overview of our method. The maize-yellow parallelograms represent input and output, the blue rectangles represent methods, and the red diamonds represent special cases.
Figure 2. YOLOv5 network structure.
Figure 3. Ship target extraction by the YOLOv5 network.
Figure 4. The three processes of video, mapping, and set update, and the strategy adopted when tracking loss occurs due to shaking. The red stars represent feature points that are not on the ship.
Figure 5. Triangulation ranging.
Figure 6. Video sequence acquisition route. The gray ship is the main positioning vessel and the red line is the camera route; the red arrow represents the camera path, and the black ship is the camera's main observation target, "Yihai 157".
Figure 7. The collected video sequence dataset contains many water scenes, and the ships are in a static anchored state. The route ensures that the video captures complete ship information, from the bow to the stern. The bottom-right image is the video frame at the moment when camera shaking occurs due to the natural undulation of the sea.
Figure 8. Ship detection.
Figure 9. Object-based feature set establishment. The green points are the extracted feature points.
Figure 10. Overview of our method and the three compared methods. The red lines represent the path of the ship and the blue points are the target feature points. (a) Ours. (b) Yolo_ORBSLAM. (c) Water segmentation SLAM mapping. (d) U-net dynamic SLAM.
Figure 11. Real distance and estimated distance of the sampled frames.
Figure 12. Top-view analysis of the USV movement path. Our method (a) effectively overcomes the shaking segment; Yolo_ORBSLAM (b) shows a large shift at the end; the water segmentation SLAM method (c) performs poorly; and U-net dynamic SLAM (d) is stable in the first part but ultimately stops at a shaking moment.
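The final range to the ship is obtained by triangulating the target from two camera poses whose scale comes from GNSS (Figure 5). The OpenCV sketch below triangulates one matched target point from two views and reports its distance from the second camera; the intrinsics, the 5 m baseline, and the pixel coordinates are made-up illustration values, not data from the paper.

```python
import numpy as np
import cv2

# Assumed intrinsics of the USV camera (fx, fy, cx, cy).
K = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])

# Two camera poses in a GNSS-scaled world frame: identity, then 5 m to starboard.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-5.0], [0.0], [0.0]])])

# Pixel observations of the same point on the target ship in both frames (made up).
x1 = np.array([[1105.0], [538.0]])
x2 = np.array([[985.0], [538.0]])

X_h = cv2.triangulatePoints(P1, P2, x1, x2)   # homogeneous 4x1 result
X = (X_h[:3] / X_h[3]).ravel()                # Euclidean 3D point, metres

cam2_center = np.array([5.0, 0.0, 0.0])       # from P2 = K [I | -C]
print("triangulated point:", X)
print("range from second camera [m]:", np.linalg.norm(X - cam2_center))
```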
7 pages, 1148 KiB  
Proceeding Paper
A Novel Method to Improve the Efficiency and Performance of Cloud-Based Visual Simultaneous Localization and Mapping
by Omar M. Salih, Hussam Rostum and József Vásárhelyi
Eng. Proc. 2024, 79(1), 78; https://doi.org/10.3390/engproc2024079078 - 11 Nov 2024
Viewed by 481
Abstract
Since Visual Simultaneous Localization and Mapping (VSLAM) inherently requires intensive computational operations and consumes many hardware resources, these limitations pose challenges to implementing the entire VSLAM architecture within limited processing power and battery capacity. This paper proposes a novel solution to improve the efficiency and performance of exchanging data between the unmanned aerial vehicle (UAV) and the cloud server. First, an adaptive ORB (oriented FAST and rotated BRIEF) method is proposed for precise tracking, mapping, and re-localization. Second, efficient visual data encoding and decoding methods are proposed for exchanging the data between the edge device and the UAV. The results show an improvement in the trajectory RMSE and accurate tracking using the adaptive ORB-SLAM. Furthermore, the proposed visual data encoding and decoding showed outstanding performance compared with the widely used standard JPEG-based system at high quantization ratios.
(This article belongs to the Proceedings of The Sustainable Mobility and Transportation Symposium 2024)
Show Figures

Figure 1. Adaptive ORB-SLAM architecture.
Figure 2. The proposed visual data algorithms for (a) encoding and (b) decoding.
Figure 3. Comparison between the estimated trajectory and the actual camera trajectory.
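The reported baseline is the standard JPEG-based system evaluated at high quantization ratios. Purely as a reference for that baseline (not the proposed encoder/decoder), the snippet below compresses and restores a frame with OpenCV at several quality settings and measures the size/PSNR trade-off before frames would be sent from the UAV to the cloud server; the input file name is a placeholder.

```python
import cv2
import numpy as np

def jpeg_roundtrip(frame, quality):
    """Encode/decode a frame with JPEG at the given quality (0-100)."""
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    assert ok
    return len(buf), cv2.imdecode(buf, cv2.IMREAD_COLOR)

def psnr(a, b):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

frame = cv2.imread("frame_0001.png")          # placeholder UAV frame
for q in (90, 50, 20, 5):                     # lower quality = coarser quantization
    size, restored = jpeg_roundtrip(frame, q)
    print(f"quality={q:3d}  bytes={size:7d}  PSNR={psnr(frame, restored):5.2f} dB")
```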
17 pages, 3301 KiB  
Article
Stereo and LiDAR Loosely Coupled SLAM Constrained Ground Detection
by Tian Sun, Lei Cheng, Ting Zhang, Xiaoping Yuan, Yanzheng Zhao and Yong Liu
Sensors 2024, 24(21), 6828; https://doi.org/10.3390/s24216828 - 24 Oct 2024
Viewed by 999
Abstract
In many robotic applications, creating a map is crucial, and 3D maps provide a method for estimating the positions of other objects or obstacles. Most of the previous research processes 3D point clouds through projection-based or voxel-based models, but both approaches have certain limitations. This paper proposes a hybrid localization and mapping method using stereo vision and LiDAR. Unlike the traditional single-sensor systems, we construct a pose optimization model by matching ground information between LiDAR maps and visual images. We use stereo vision to extract ground information and fuse it with LiDAR tensor voting data to establish coplanarity constraints. Pose optimization is achieved through a graph-based optimization algorithm and a local window optimization method. The proposed method is evaluated using the KITTI dataset and compared against the ORB-SLAM3, F-LOAM, LOAM, and LeGO-LOAM methods. Additionally, we generate 3D point cloud maps for the corresponding sequences and high-definition point cloud maps of the streets in sequence 00. The experimental results demonstrate significant improvements in trajectory accuracy and robustness, enabling the construction of clear, dense 3D maps.
(This article belongs to the Section Navigation and Positioning)
Show Figures

Figure 1. Pose optimization based on ground information. T, p, and q represent the transformation matrix, points on the plane, and points off the plane, respectively.
Figure 2. The stereo sensor model and the coordinate systems used [34].
Figure 3. Region of interest extraction. (a) Left image. (b) Right image. (c) Disparity image. (d) v-disparity and (e) u-disparity, both derived from (c). (f) Large obstacles removed by suppressing peak values in (e). (g) v-disparity based on (f); the red line is the disparity profile of the ground plane. (h) Detected ground plane and region of interest (RoI), shown in the red box. (i) City 3D reconstruction; green represents the ground.
Figure 4. Graph-structure optimization. P represents the visual point nodes and X the frame poses; "Ground" denotes the ground information extracted from the 3D reconstruction.
Figure 5. Trajectory estimates on the KITTI dataset. (a) 00. (b) 01. (c) 05. (d) 07. (e) 08. (f) 09.
Figure 6. 3D point cloud maps for the corresponding KITTI sequences. (a) 00. (b) 01. (c) 05. (d) 07. (e) 08. (f) 09.
Figure 7. 3D reconstruction based on road constraints, where green represents the road. (a) 00. (b) 01. (c) 05. (d) 07. (e) 08. (f) 09.
Figure 8. High-definition display of point clouds for some streets in the 00 sequence. The image in the top-left corner is a 3D reconstruction of the entire city, and the other images (a–e) depict details of its streets.
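Figure 3 extracts the ground through the v-disparity representation: stacking each image row's disparity histogram produces a map in which the ground plane appears as a dominant slanted line. The sketch below builds such a v-disparity map from a disparity image and fits the ground profile by least squares on the per-row dominant bins; real pipelines typically use a Hough transform or RANSAC for this fit, so treat the fitting choice and thresholds as assumptions.

```python
import numpy as np

def v_disparity(disp, max_disp=128):
    """Histogram of disparities per image row: shape (rows, max_disp)."""
    rows = disp.shape[0]
    vmap = np.zeros((rows, max_disp), dtype=np.int32)
    for v in range(rows):
        d = disp[v]
        d = d[(d > 0) & (d < max_disp)].astype(int)   # ignore invalid pixels
        np.add.at(vmap[v], d, 1)
    return vmap

def fit_ground_profile(vmap, min_votes=20):
    """Fit disparity = a*v + b through the dominant bin of each lower row."""
    rows = vmap.shape[0]
    vs, ds = [], []
    for v in range(rows // 2, rows):                  # ground lies in the lower half
        d_best = int(np.argmax(vmap[v]))
        if vmap[v, d_best] >= min_votes:
            vs.append(v)
            ds.append(d_best)
    a, b = np.polyfit(vs, ds, deg=1)                  # least-squares line fit
    return a, b                                       # expected ground disparity per row

# Synthetic disparity image: a ground plane whose disparity grows with the row index.
H, W = 240, 320
disp = np.tile(np.linspace(1, 60, H)[:, None], (1, W)) + np.random.randn(H, W)
a, b = fit_ground_profile(v_disparity(disp))
print(f"ground profile: d(v) ≈ {a:.3f}*v + {b:.1f}")
```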
19 pages, 12975 KiB  
Article
Enhancing Real-Time Visual SLAM with Distant Landmarks in Large-Scale Environments
by Hexuan Dou, Xinyang Zhao, Bo Liu, Yinghao Jia, Guoqing Wang and Changhong Wang
Drones 2024, 8(10), 586; https://doi.org/10.3390/drones8100586 - 16 Oct 2024
Viewed by 1578
Abstract
The efficacy of visual Simultaneous Localization and Mapping (SLAM) diminishes in large-scale environments due to challenges in identifying distant landmarks, leading to a limited perception range and trajectory drift. This paper presents a practical method to enhance the accuracy of feature-based real-time visual SLAM for compact unmanned vehicles by constructing distant map points. By tracking consecutive image features across multiple frames, remote map points are generated with sufficient parallax angles, extending the mapping scope to the theoretical maximum range. Observations of these landmarks from preceding keyframes are supplemented accordingly, improving back-end optimization and, consequently, localization accuracy. The effectiveness of this approach is ensured by the introduction of the virtual map point, a proposed data structure that links relational features to an imaginary map point, thereby maintaining the constrained size of local optimization during triangulation. Based on the ORB-SLAM3 code, a SLAM system incorporating the proposed method is implemented and tested. Experimental results on drone and vehicle datasets demonstrate that the proposed method outperforms ORB-SLAM3 in both accuracy and perception range with negligible additional processing time, thus preserving real-time performance. Field tests using a UGV further validate the efficacy of the proposed method.
(This article belongs to the Section Drone Design and Development)
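The method's core move is to keep tracking a distant feature and defer triangulation until the parallax angle between its back-projected rays is large enough, which is the bookkeeping the virtual map points perform. The sketch below illustrates that test for two observations under assumed intrinsics, poses and threshold; it is not ORB-SLAM3 code, and midpoint triangulation is just one possible choice.

```python
# Hedged sketch: defer triangulation until the parallax angle between two
# observing rays exceeds a threshold. Intrinsics, poses and the threshold
# are illustrative placeholders.
import numpy as np

def back_projected_ray(K, T_wc, pixel):
    """Unit ray (world frame) through 'pixel' for camera pose T_wc (camera-to-world)."""
    x = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray_w = T_wc[:3, :3] @ x
    return ray_w / np.linalg.norm(ray_w)

def parallax_deg(K, T1, T2, px1, px2):
    r1, r2 = back_projected_ray(K, T1, px1), back_projected_ray(K, T2, px2)
    return np.degrees(np.arccos(np.clip(r1 @ r2, -1.0, 1.0)))

def maybe_triangulate(K, T1, T2, px1, px2, theta_th_deg=1.0):
    """Return a 3D point (midpoint method) once the parallax is sufficient, else None."""
    if parallax_deg(K, T1, T2, px1, px2) < theta_th_deg:
        return None                                   # too little parallax: keep tracking
    o1, o2 = T1[:3, 3], T2[:3, 3]
    d1, d2 = back_projected_ray(K, T1, px1), back_projected_ray(K, T2, px2)
    # Closest points on the two rays o1 + s*d1 and o2 + t*d2 (midpoint triangulation).
    A = np.array([[d1 @ d1, -d1 @ d2], [d1 @ d2, -d2 @ d2]])
    b = np.array([(o2 - o1) @ d1, (o2 - o1) @ d2])
    s, t = np.linalg.solve(A, b)
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))

if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    T1, T2 = np.eye(4), np.eye(4); T2[0, 3] = 1.0     # 1 m baseline between keyframes
    P = np.array([2.0, 0.0, 40.0])                    # a distant landmark
    px1 = (K @ (P / P[2]))[:2]
    px2 = (K @ ((P - T2[:3, 3]) / P[2]))[:2]
    print(maybe_triangulate(K, T1, T2, px1, px2))     # ≈ [2, 0, 40]
```

The paper's contribution is to keep this deferred state compact (the virtual map point) so that local optimization stays bounded while the observation history is preserved for later use.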
Figure 1
Triangulation of a spatial point. The uncertainty of triangulation is represented by the gray area. The case illustrated on the right has a shorter baseline b and/or a smaller parallax angle θ, resulting in greater uncertainty in the localization of P.
Figure 2
Extending the perception range to the maximum by utilizing features from keyframes beyond the local mapping range.
Figure 3
The relationship between the perception ranges of the camera and the structure of the matrix H_CP. The case in (a) has a shorter perception range, with H_CP represented as (b), while the case in (c) has a longer perception range, with H_CP augmented by the commonly observed landmarks P_abc, represented as (d). Note that many of the block matrices in dark blue remain zero due to the absence of observations between the corresponding camera pose C_i and landmark P_j.
Figure 4
A diagram of a virtual map point. (i) The virtual map point V is constructed from features matched from two adjacent frames. (ii) As the camera moves, features and back-projected rays in subsequent frames are associated with V, allowing for the calculation of the maximum parallax angle. (iii) The spatial coordinates of V are triangulated when the parallax angle exceeds the threshold θ_th. (iv) Subsequently, a map point P is constructed from V, inheriting the observation relationships with frames ranging from F_0 to F_i.
Figure 5
An overview of the proposed SLAM system. The management of virtual map points is integrated into the “Local Mapping” thread of ORB-SLAM3.
Figure 6
Keyframe trajectory comparison between ORB-SLAM3 (ORB3) and the proposed method (MOD) alongside groundtruth (GT) from the top view in sequences of (a) EuRoC Machine Hall 05, (b) KITTI 00, (c) KAIST 32, (d) KAIST 27, (e) KAIST 33, (f) KAIST 39 and (g) UZH MAV during the dataset test.
Figure 7
The distribution of triangulation depth of map points in dataset tests: (a) comparison between ORB-SLAM3 (ORB3) and the proposed method (MOD); (b) comparison between normal map points and map points converted from virtual map points within the proposed method.
Figure 8
The observation of map points in the 2776th frame of the sequence KAIST 26 by (a) ORB-SLAM3 and (b) the proposed method, and in the 2979th frame by (c) ORB-SLAM3 and (d) the proposed method, with normal map points denoted in green and converted virtual map points denoted in red.
Figure 9
Comparison of statistics on mean tracking time across the dataset and field tests between ORB-SLAM3 (ORB) and the proposed method (MOD).
Figure 10
(a) The experimental platform on a wheeled robot. An Intel D435i collects the monocular image sequence and a ComNav M100 provides groundtruth. (b) A sample of the collected image sequences, which contains some distant landmarks. The sequences are collected in campus scenes of a (c) yard, (d) road and (e) park. Location: Science Park, Harbin Institute of Technology.
Figure 11
Keyframe trajectory comparison between ORB-SLAM3 (ORB3) and the proposed method (MOD) alongside groundtruth (GT) from the top view in scenes of a (a) yard, (b) road and (c) park during the field test.
Figure 12
The distribution of triangulation depth of map points in field tests: (a) comparison between ORB-SLAM3 (ORB3) and the proposed method (MOD); (b) comparison between normal map points and map points converted from virtual map points within the proposed method.
20 pages, 6262 KiB  
Article
YPR-SLAM: A SLAM System Combining Object Detection and Geometric Constraints for Dynamic Scenes
by Xukang Kan, Gefei Shi, Xuerong Yang and Xinwei Hu
Sensors 2024, 24(20), 6576; https://doi.org/10.3390/s24206576 - 12 Oct 2024
Viewed by 948
Abstract
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly degrade the precision of image matching and camera pose estimation. To address these problems, the YPR-SLAM system is proposed. First, the system includes a lightweight YOLOv5 detection network that detects both dynamic and static objects, providing prior dynamic-object information to the SLAM system. Second, using this prior information together with the depth image, a geometric-constraint method for removing moving feature points is proposed: the Depth-PROSAC algorithm distinguishes dynamic from static feature points so that the dynamic ones can be removed. Finally, a dense point cloud map is constructed from the static feature points. YPR-SLAM tightly couples object detection with geometric constraints, eliminating moving feature points and minimizing their adverse effects on the SLAM system. Its performance was assessed on the public TUM RGB-D dataset, and the results show that YPR-SLAM is well suited to dynamic scenes. Full article
(This article belongs to the Section Sensing and Imaging)
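The pipeline combines YOLOv5 detection boxes with a depth-based check (the Depth-PROSAC step) so that only feature points actually lying on a moving object are discarded, while background points inside the box survive, which is the comparison Figure 4 of the paper makes. The sketch below shows one simple depth-gated variant of that idea; the box format, the median-depth heuristic and the thresholds are assumptions, not the authors' Depth-PROSAC algorithm.

```python
# Hedged sketch of depth-gated filtering of ORB keypoints inside detection
# boxes of potentially dynamic objects. This is a simplified stand-in for the
# paper's Depth-PROSAC step; thresholds and heuristics are assumptions.
import numpy as np

def filter_dynamic_keypoints(keypoints, depth, boxes, rel_gap=0.2):
    """Return the indices of keypoints treated as static.

    keypoints : (N, 2) array of (u, v) pixel coordinates
    depth     : (H, W) depth image in meters (0 = invalid)
    boxes     : list of (u_min, v_min, u_max, v_max) dynamic-object boxes
    rel_gap   : keypoints at least this much (relative) behind the object's
                median depth are treated as background and kept
    """
    keep = []
    for i, (u, v) in enumerate(np.asarray(keypoints, dtype=int)):
        z = depth[v, u]
        inside_dynamic = False
        for (u0, v0, u1, v1) in boxes:
            if u0 <= u <= u1 and v0 <= v <= v1:
                patch = depth[v0:v1, u0:u1]
                obj_depth = np.median(patch[patch > 0]) if np.any(patch > 0) else 0.0
                # Points clearly behind the detected object are background.
                if z > 0 and obj_depth > 0 and z > obj_depth * (1.0 + rel_gap):
                    continue
                inside_dynamic = True
                break
        if not inside_dynamic:
            keep.append(i)
    return keep

if __name__ == "__main__":
    depth = np.full((480, 640), 8.0)            # background wall at 8 m
    depth[120:280, 220:380] = 2.0               # a moving person at 2 m
    box = (200, 100, 400, 300)                  # YOLO-style box around the person
    kps = [(100, 50), (250, 150), (390, 120)]   # outside box / on person / background inside box
    print(filter_dynamic_keypoints(kps, depth, [box]))   # -> [0, 2]
```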
Figure 1
Framework of the YPR-SLAM system. The blue section is ORB-SLAM2, and the orange section is the addition of this paper.
Figure 2
The YOLOv5 network architecture.
Figure 3
Dynamic target detection and filtering thread. First, ORB feature points are extracted from the RGB image by the tracking thread. Next, the dynamic target detection thread identifies potential dynamic target areas, and then the Depth-PROSAC algorithm is applied to filter out dynamic feature points. Finally, the static feature points are retained for subsequent pose estimation.
Figure 4
Comparison between the target detection algorithm and the Depth-PROSAC algorithm in filtering out dynamic feature points. (a) The object detection method directly filters out dynamic feature points; (b) the Depth-PROSAC algorithm filters out dynamic feature points.
Figure 5
Dense point cloud construction workflow.
Figure 6
In the fr3_walking_halfsphere sequence, the YPR-SLAM and ORB-SLAM2 systems were used to estimate the 3D motion of the camera. (a) Camera trajectory estimated by ORB-SLAM2; (b) camera trajectory estimated by YPR-SLAM.
Figure 7
ATE and RPE of the ORB-SLAM2 system and the YPR-SLAM system on different datasets. (a1,a2,c1,c2,e1,e2,g1,g2) show the ATE and RPE obtained by the ORB-SLAM2 system on fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (b1,b2,d1,d2,f1,f2,h1,h2) show the ATE and RPE of the YPR-SLAM system on the same sequences. (a1,b1,c1,d1,e1,f1,g1,h1) are ATE plots; (a2,b2,c2,d2,e2,f2,g2,h2) are RPE plots.
Figure 8
Dense 3D point cloud maps constructed by ORB-SLAM2 and YPR-SLAM in the dynamic scene sequence fre3_walking_xyz. (a) Dense 3D point cloud map constructed by the ORB-SLAM2 system; (b) dense 3D point cloud map constructed by the YPR-SLAM system.
24 pages, 4712 KiB  
Article
Balancing Efficiency and Accuracy: Enhanced Visual Simultaneous Localization and Mapping Incorporating Principal Direction Features
by Yuelin Yuan, Fei Li, Xiaohui Liu and Jialiang Chen
Appl. Sci. 2024, 14(19), 9124; https://doi.org/10.3390/app14199124 - 9 Oct 2024
Viewed by 1121
Abstract
In visual Simultaneous Localization and Mapping (SLAM), operational efficiency and localization accuracy are equally crucial evaluation metrics. We propose an enhanced visual SLAM method that ensures stable localization accuracy while improving system efficiency, maintaining accuracy even after the number of feature pyramid levels is reduced by 50%. First, we incorporate the principal direction error, which represents the global geometric features of feature points, into the error function for pose estimation, using Pareto optimal solutions to improve localization accuracy. Second, for loop-closure detection, we construct a feature matrix by integrating the grayscale and gradient direction of an image; this matrix is dimensionally reduced through aggregation, and a multi-layer detection approach is employed to ensure both efficiency and accuracy. Finally, we optimize the feature extraction levels and integrate our method into the visual system to speed up the extraction process and mitigate the impact of the reduced levels. We comprehensively evaluate the proposed method on local and public datasets. Experiments show that our SLAM method maintained high localization accuracy while reducing the tracking time by 24% compared with ORB-SLAM3. Additionally, the proposed loop-closure-detection method demonstrated superior computational efficiency and detection accuracy compared with existing methods. Full article
(This article belongs to the Special Issue Mobile Robotics and Autonomous Intelligent Systems)
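The abstract folds a principal direction term, derived from the orientations of matched feature points, into the pose-estimation error function alongside the usual reprojection error, and balances the two objectives with Pareto optimal solutions. The exact definition of that term is not given in this listing, so the sketch below only shows the general shape of such a two-term cost; the direction residual and the fixed scalar weight standing in for the Pareto treatment are assumptions.

```python
# Illustrative two-term cost mixing reprojection error with a keypoint
# principal-direction consistency error. The direction residual and the
# weight w_dir are assumptions, not the paper's formulation.
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of world points X (N, 3) into pixel coordinates (N, 2)."""
    Xc = (R @ X.T).T + t
    uv = (K @ (Xc / Xc[:, 2:3]).T).T
    return uv[:, :2]

def direction_residuals(theta_ref, theta_cur, yaw):
    """Angle differences, wrapped to [-pi, pi], after removing the in-plane rotation."""
    d = theta_cur - (theta_ref + yaw)
    return np.arctan2(np.sin(d), np.cos(d))

def combined_cost(K, R, t, X, uv_obs, theta_ref, theta_obs, w_dir=0.1):
    """Reprojection term plus a weighted principal-direction term."""
    r_reproj = project(K, R, t, X) - uv_obs
    yaw = np.arctan2(R[1, 0], R[0, 0])                 # crude in-plane rotation proxy
    r_dir = direction_residuals(theta_ref, theta_obs, yaw)
    return float(np.sum(r_reproj**2) + w_dir * np.sum(r_dir**2))

if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    X = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 6.0]])
    uv = project(K, np.eye(3), np.zeros(3), X)         # perfect observations
    print(combined_cost(K, np.eye(3), np.zeros(3), X, uv,
                        np.array([0.3, 1.2]), np.array([0.3, 1.2])))   # -> 0.0
```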
Figure 1
The visual SLAM framework. The main contributions of our work are highlighted within the yellow dashed box. (a) Pose estimation incorporating principal direction information; (b) descriptor extraction; (c) similarity calculation and loop-closure detection.
Figure 2
Pose estimation incorporating principal direction information. The small rectangles represent feature points, and the straight lines with arrows represent the principal directions of the feature points.
Figure 3
Loop-closure detection based on aggregation descriptors. The left side shows the aggregation feature descriptor extraction process, while the right side shows the loop-closure-detection process.
Figure 4
The illustration of principal direction feature projection. The purple rectangles represent feature points, while the pink lines indicate the principal directions of the feature points.
Figure 5
Aggregation feature descriptor generation process. Each image generates two feature vectors, aggregated from the feature matrix along the row and column directions.
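The Figure 5 caption above can be read as follows: a per-pixel feature matrix mixing grayscale and gradient direction is collapsed into one row-aggregated and one column-aggregated vector, and two images are compared through those vectors. The sketch below is a guess at that pipeline based on this listing alone; the mixing weight, the normalization and the cosine similarity are assumptions.

```python
# Hedged sketch of an aggregation descriptor for loop-closure detection:
# a grayscale + gradient-direction feature matrix reduced to a row vector
# and a column vector. Mixing weight and similarity measure are assumptions.
import numpy as np

def aggregation_descriptor(gray, alpha=0.5):
    """Return (row_vector, col_vector) aggregated from the per-pixel feature matrix."""
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)
    grad_dir = np.arctan2(gy, gx)                              # gradient direction, [-pi, pi]
    feat = alpha * (gray / 255.0) + (1 - alpha) * (grad_dir + np.pi) / (2 * np.pi)
    return feat.mean(axis=1), feat.mean(axis=0)                # aggregate along columns / rows

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def image_similarity(img_a, img_b):
    ra, ca = aggregation_descriptor(img_a)
    rb, cb = aggregation_descriptor(img_b)
    return 0.5 * (cosine(ra, rb) + cosine(ca, cb))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, (120, 160)).astype(np.float64)
    revisit = np.clip(img + rng.normal(0, 5, img.shape), 0, 255)   # near-duplicate view
    other = rng.integers(0, 256, (120, 160)).astype(np.float64)
    # The near-duplicate pair should score higher than the unrelated pair.
    print(image_similarity(img, revisit), image_similarity(img, other))
```

A multi-layer check as described in the abstract would use these cheap vectors to shortlist candidates before any finer (and slower) verification stage.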
Figure 6
Examples of scenes in the dataset. (a) Around-view image; (b) fast motion and uneven feature distribution; (c) low luminance; (d) high luminance; (e) image rotation.
Figure 7
Experimental trajectory and loop-closure ground truth. From left to right: sequences 00, 05, 06, 07 from KITTI.
Figure 8
Experimental trajectory and loop-closure ground truth in local datasets.
Figure 9
Precision–recall curves based on the 00 sequence (left) and the local dataset (right).
Figure 10
Maximum recall at 100% precision.
Figure 11
Schematic of image processing. The left image is the original, and the right image has luminance reduced to 15%.
Figure 12
Maximum recall at 100% precision in low-luminance scenarios.
Figure 13
Comparison of trajectory results. The red box indicates a locally magnified trajectory. The left and right images correspond to sequences 05 and 06, both from the KITTI dataset.