Search Results (126)

Search Parameters:
Keywords = RGB-D SLAM

18 pages, 39910 KiB  
Article
DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM
by Fan Zhu, Yifan Zhao, Ziyu Chen, Chunmao Jiang, Hui Zhu and Xiaoxi Hu
Remote Sens. 2025, 17(4), 625; https://doi.org/10.3390/rs17040625 - 12 Feb 2025
Viewed by 446
Abstract
Visual SLAM is widely applied in robotics and remote sensing. The fusion of Gaussian radiance fields and Visual SLAM has demonstrated astonishing efficacy in constructing high-quality dense maps. While existing methods perform well in static scenes, they are prone to the influence of dynamic objects in real-world dynamic environments, thus making robust tracking and mapping challenging. We introduce DyGS-SLAM, a Visual SLAM system that employs dual constraints to achieve high-fidelity static map reconstruction in dynamic environments. We extract ORB features within the scene, and use open-world semantic segmentation models and multi-view geometry to construct dual constraints, forming a zero-shot dynamic information elimination module while recovering backgrounds occluded by dynamic objects. Furthermore, we select high-quality keyframes and use them for loop closure detection and global optimization, constructing a foundational Gaussian map through a set of determined point clouds and poses and integrating repaired frames for rendering new viewpoints and optimizing 3D scenes. Experimental results on the TUM RGB-D, Bonn, and Replica datasets, as well as real scenes, demonstrate that our method has excellent localization accuracy and mapping quality in dynamic scenes.
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
Figure 1: System framework of DyGS-SLAM. The tracking thread conducts dynamic object removal and background inpainting. The mapping thread reconstructs the Gaussian map and performs differentiable rendering using a set of determined poses and point clouds. Lastly, the 3D scene is optimized based on the repaired and rendered frames.
Figure 2: Open-world semantic segmentation model.
Figure 3: RGB images from the TUM RGB-D dataset. (a) Frame 690. (b) Frame 765. The red boxes indicate the chair being moved, which is semantically static but actually moving.
Figure 4: The feature point p on the keyframe projects onto the current frame as p′; O and O′ are the camera optical centers of the two frames, respectively. (a) Feature point p′ is static (d′ = d_proj). (b) Feature point p′ is dynamic (d′ ≪ d_proj).
Figure 5: Comparison of image frames repaired by Dyna-SLAM and DyGS-SLAM (ours) on the walking_halfsphere sequence of the TUM RGB-D dataset. The red boxes compare how the different methods repair the same frame.
Figure 6: Camera trajectories estimated by ORB-SLAM3 and DyGS-SLAM (ours) on the TUM dataset, and their differences from the ground truth.
Figure 7: Comparison of mapping results of NICE-SLAM, SplaTAM, and DyGS-SLAM (ours) on the walking_xyz sequence.
Figure 8: Detailed comparison of the original reconstructed scene provided by Bonn and the scene reconstructed by our method. The red boxes indicate details of the reconstructed scenes. (a) Original reconstructed scene provided by Bonn. (b–d) Details of the scene reconstructed by our method.
Figure 9: Comparison of reconstruction performance between SplaTAM and DyGS-SLAM (ours) on the Bonn dataset. Our method demonstrates better reconstruction quality. (a) SplaTAM. (b) DyGS-SLAM.
Figure 10: Comparison of mapping results of NICE-SLAM, SplaTAM, and DyGS-SLAM on the Replica dataset. The red boxes indicate details of the reconstructed scenes. Our method also achieves excellent reconstruction quality in static scenes. (a) NICE-SLAM. (b) SplaTAM. (c) DyGS-SLAM. (d) GT.
Figure 11: Experimental results in real scenes. The red boxes indicate the recovery of the static background during reconstruction. (a) Input image. (b) Segmentation. (c) Background repair. (d) Novel view synthesis.
Figure 12: Effect of background inpainting on DyGS-SLAM scene reconstruction. The red boxes indicate the reconstruction results of each variant. (a) Reconstruction without background inpainting. (b) Reconstruction with background inpainting.
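The dual-constraint idea above hinges on the multi-view geometry check illustrated in Figure 4: a keyframe feature reprojected into the current frame should keep its predicted depth if the surface is static, and a much smaller measured depth (d′ ≪ d_proj) suggests a moving occluder. Below is a minimal Python sketch of that check; the function name, the 0.8 ratio threshold, and the example intrinsics are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def is_dynamic_point(p_kf, d_kf, K, T_cur_kf, d_measured, ratio_thresh=0.8):
    """Flag a feature as dynamic when the measured depth in the current frame is
    much smaller than the depth predicted by reprojecting the keyframe point
    (d' << d_proj in the paper's notation). Names and threshold are illustrative."""
    u, v = p_kf
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Back-project the keyframe pixel (u, v) with its depth to a homogeneous 3D point.
    X_kf = np.array([(u - cx) * d_kf / fx, (v - cy) * d_kf / fy, d_kf, 1.0])
    # Transform into the current camera frame with the relative pose T_cur_kf (4x4).
    X_cur = T_cur_kf @ X_kf
    d_proj = X_cur[2]                      # depth the point *should* have if static
    # If the sensor sees something much closer along this ray, a moving object is
    # likely occluding the original surface, so mark the feature dynamic.
    return d_measured < ratio_thresh * d_proj

# Example: a point expected at 2.4 m but measured at 1.1 m is flagged dynamic.
K = np.array([[525.0, 0.0, 319.5], [0.0, 525.0, 239.5], [0.0, 0.0, 1.0]])
T = np.eye(4)
print(is_dynamic_point((320, 240), 2.4, K, T, d_measured=1.1))  # True
```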
15 pages, 3120 KiB  
Article
Implementation of Visual Odometry on Jetson Nano
by Jakub Krško, Dušan Nemec, Vojtech Šimák and Mário Michálik
Sensors 2025, 25(4), 1025; https://doi.org/10.3390/s25041025 - 9 Feb 2025
Viewed by 461
Abstract
This paper presents the implementation of ORB-SLAM3 for visual odometry on a low-power ARM-based system, specifically the Jetson Nano, to track a robot’s movement using RGB-D cameras. Key challenges addressed include the selection of compatible software libraries, camera calibration, and system optimization. The ORB-SLAM3 algorithm was adapted for the ARM architecture and tested using both the EuRoC dataset and real-world scenarios involving a mobile robot. The testing demonstrated that ORB-SLAM3 provides accurate localization, with errors in path estimation ranging from 3 to 11 cm when using the EuRoC dataset. Real-world tests on a mobile robot revealed discrepancies primarily due to encoder drift and environmental factors such as lighting and texture. The paper discusses strategies for mitigating these errors, including enhanced calibration and the potential use of encoder data for tracking when camera performance falters. Future improvements focus on refining the calibration process, adding trajectory correction mechanisms, and integrating visual odometry data more effectively into broader systems.
(This article belongs to the Section Sensors and Robotics)
Figure 1: Chessboard pattern with highlighted corners [1].
Figure 2: EuRoC dataset sample [20].
Figure 3: Detail of difference between estimated and ground truth trajectories [1].
Figure 4: Comparison between robot and camera trajectories [1].
Figure 5: ORB-SLAM3 GUI [1].
Figure 6: State-chart for implemented algorithm [1].
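Camera calibration is listed as one of the key challenges in the Jetson Nano implementation above, and Figure 1 shows the chessboard target used. The following OpenCV sketch illustrates a generic chessboard calibration of this kind; the board size, image paths, and refinement parameters are assumptions, and the authors' actual calibration pipeline may differ.

```python
import glob
import cv2
import numpy as np

BOARD = (9, 6)                                    # inner corners per row and column (assumed)
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)  # planar grid in "square" units

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):             # hypothetical calibration image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        # Refine corner locations to sub-pixel accuracy before storing them.
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        obj_points.append(objp)
        img_points.append(corners)

assert obj_points, "no chessboard detections found"
# The intrinsics K and distortion coefficients would feed an ORB-SLAM3 settings file.
rms, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
print("camera matrix:\n", K)
```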
17 pages, 15387 KiB  
Article
Improving 3D Reconstruction Through RGB-D Sensor Noise Modeling
by Fahira Afzal Maken, Sundaram Muthu, Chuong Nguyen, Changming Sun, Jinguang Tong, Shan Wang, Russell Tsuchida, David Howard, Simon Dunstall and Lars Petersson
Sensors 2025, 25(3), 950; https://doi.org/10.3390/s25030950 - 5 Feb 2025
Viewed by 459
Abstract
High-resolution RGB-D sensors are widely used in computer vision, manufacturing, and robotics. The depth maps from these sensors have inherently high measurement uncertainty that includes both systematic and non-systematic noise. These noisy depth estimates degrade the quality of scans, resulting in less accurate 3D reconstruction, making them unsuitable for some high-precision applications. In this paper, we focus on quantifying the uncertainty in the depth maps of high-resolution RGB-D sensors for the purpose of improving 3D reconstruction accuracy. To this end, we estimate the noise model for a recent high-precision RGB-D structured light sensor called Zivid when mounted on a robot arm. Our proposed noise model takes into account the measurement distance and angle between the sensor and the measured surface. We additionally analyze the effect of background light, exposure time, and the number of captures on the quality of the depth maps obtained. Our noise model seamlessly integrates with well-known classical and modern neural rendering-based algorithms, from KinectFusion to Point-SLAM methods using bilinear interpolation as well as 3D analytical functions. We collect a high-resolution RGB-D dataset and apply our noise model to improve tracking and produce higher-resolution 3D models.
Figure 1: Our noise model, when integrated with KinectFusion [10], improves the quality of the surface estimation, highlighted in colored boxes (middle), compared with the quality without noise filtering (left). On the right, noisy depth filtering using our noise model (top right) effectively captures high-resolution details, such as the ridges in the ear (green box) and the background texture (red box), which are absent in the unfiltered version (bottom right).
Figure 2: Illustration of the Zivid camera noise components at an arbitrary point P(x, y, z) measured by the camera. Axial noise σ_Z and lateral noise σ_L represent the uncertainty of the measured location of point P along the z-axis and the x- and y-axes, respectively.
Figure 3: Our experimental setup for modeling sensor noise includes a robot arm (b), a planar target (c), and a Zivid 2 RGB-D structured light sensor (d), as shown in (a).
Figure 4: Axial noise modeling. (b,c) Fitted surface models for axial noise with (b) a 7th-order bivariate polynomial and (c) a bivariate exponential function plus a 2nd-order bivariate polynomial, and (d) bilinearly interpolated image encoding for the axial noise values corresponding to (a). The axis labels and titles are provided for illustrative purposes.
Figure 5: Pre-processing steps for lateral noise modeling. (a) Depth map with the segmented edge of the planar target (marked in red); lines fitted to pixels corresponding to the edge at far (b), medium (c), and near (d) distances. Rows indicate the rotation angle of the target, varying from 0° (top) to 60° (bottom).
Figure 6: Lateral noise against different distances and angles.
Figure 7: Variations in axial (top) and lateral (bottom) noise due to lighting conditions (left), number of captures (middle), and exposure time (right) as a function of distance at a fixed 0°.
Figure 8: The experimental setup for dataset collection. From top to bottom: (a) complete setup with a robot arm, camera, and object placed on a rotating table for 0° and 45° rotations of the object, (b) corresponding RGB and scaled depth images captured, and (c) other object sequences captured: Dragon, Ganesh, Rock, and Dino.
Figure 9: Poisson 3D surface reconstructions with point cloud data merged based on pairwise ICP registration followed by pose graph optimization for (a) Shiva and (b) Gripper.
Figure 10: Qualitative results from the Shiva dataset, in top-front and bottom-back views, highlighting improved reconstruction quality (b), in the colored boxes, when noisy depth measurements are filtered using our noise model in the KinectFusion algorithm. The improvement is particularly noticeable compared with the reconstruction without depth filtering (a) and against the traditional reconstruction pipeline (c) discussed in Section 4.1.
Figure 11: Comparison of trajectories for different objects with and without noise filtering. The results demonstrate superior trajectories with noise filtering (green and blue) compared with those without (red). The circle fitted to the trajectory uses the axial and lateral noise models (blue). The axes are not of the same scale.
Figure 12: Qualitative results of integrating our noise model into Point-SLAM on the Shiva dataset. Highlighted boxes show improved reconstruction: Point-SLAM baseline (a), noisy depth filtering using our noise model (b), and depth loss term using our noise model (c). Please zoom in on the red box for better visualization.
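The abstract above models axial noise as a function of measurement distance and incidence angle, and Figure 4 mentions bivariate polynomial surfaces fitted to that data. The sketch below fits a low-order bivariate polynomial with ordinary least squares to made-up samples; the polynomial degree, variable ranges, and synthetic data are assumptions, not the paper's calibration measurements.

```python
import numpy as np

def fit_bivariate_poly(d, theta, sigma, deg=2):
    """Least-squares fit of sigma_z ~ sum_{i+j<=deg} c_ij * d^i * theta^j.
    A low-order stand-in for the paper's bivariate polynomial noise surface."""
    terms = [(i, j) for i in range(deg + 1) for j in range(deg + 1) if i + j <= deg]
    A = np.column_stack([d**i * theta**j for i, j in terms])   # design matrix
    coeffs, *_ = np.linalg.lstsq(A, sigma, rcond=None)
    return terms, coeffs

def predict(terms, coeffs, d, theta):
    """Evaluate the fitted noise surface at a single (distance, angle) pair."""
    return sum(c * d**i * theta**j for (i, j), c in zip(terms, coeffs))

# Synthetic calibration samples: axial noise grows with distance and incidence angle.
rng = np.random.default_rng(0)
d = rng.uniform(0.5, 2.0, 200)            # metres (assumed working range)
theta = rng.uniform(0.0, 60.0, 200)       # degrees
sigma = 0.0005 + 0.002 * d**2 + 1e-5 * theta * d + rng.normal(0, 1e-5, 200)

terms, coeffs = fit_bivariate_poly(d, theta, sigma)
print(predict(terms, coeffs, 1.0, 30.0))  # predicted axial std-dev at 1 m, 30 degrees
```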
17 pages, 1609 KiB  
Article
Related Keyframe Optimization Gaussian–Simultaneous Localization and Mapping: A 3D Gaussian Splatting-Based Simultaneous Localization and Mapping with Related Keyframe Optimization
by Xiasheng Ma, Ci Song, Yimin Ji and Shanlin Zhong
Appl. Sci. 2025, 15(3), 1320; https://doi.org/10.3390/app15031320 - 27 Jan 2025
Viewed by 690
Abstract
Simultaneous localization and mapping (SLAM) is the basis for intelligent robots to explore the world. As a promising method for 3D reconstruction, 3D Gaussian splatting (3DGS) integrated with SLAM systems has shown significant potential. However, due to environmental uncertainties, errors in the tracking process with 3D Gaussians can negatively impact SLAM systems. This paper introduces a novel dense RGB-D SLAM system based on 3DGS that refines Gaussians through sub-Gaussians in the camera coordinate system. Additionally, we propose an algorithm to select keyframes closely related to the current frame, optimizing the scene map and pose of the current keyframe. This approach effectively enhances both the tracking and mapping performance. Experiments on high-quality synthetic scenes (Replica dataset) and low-quality real-world scenes (TUM-RGBD and ScanNet datasets) demonstrate that our system achieves competitive performance in tracking and mapping.
Figure 1: Overview of RK-SLAM.
Figure 2: Sub-Gaussians.
Figure 3: Example of a related window.
Figure 4: Comparison of mapping results in the ablation study. The first row shows the full images, while the second row provides zoomed-in views of the regions marked with red boxes in the first row.
Figure 5: The extent of fine-tuning of the keyframe pose in the sliding window W_s as a function of the number of iterations.
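The abstract above selects "keyframes closely related to the current frame" for optimization but does not state the selection criterion in the listing. A common proxy in RGB-D SLAM is covisibility, i.e., the number of map points shared with the current frame; the sketch below uses that heuristic, so both the criterion and the data layout are assumptions rather than the authors' algorithm.

```python
def select_related_keyframes(current_obs, keyframes, k=5):
    """Rank keyframes by how many map-point IDs they share with the current frame
    (a simple covisibility heuristic; the paper's actual criterion may differ).

    current_obs : set of map-point IDs observed in the current frame
    keyframes   : dict mapping keyframe ID -> set of observed map-point IDs
    """
    scored = [(len(current_obs & obs), kf_id) for kf_id, obs in keyframes.items()]
    scored.sort(reverse=True)                       # most shared observations first
    return [kf_id for score, kf_id in scored[:k] if score > 0]

# Toy example with three keyframes; keyframe 1 shares the most observations.
keyframes = {0: {1, 2, 3, 4}, 1: {3, 4, 5, 6}, 2: {7, 8}}
print(select_related_keyframes({3, 4, 5}, keyframes, k=2))  # -> [1, 0]
```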
22 pages, 4507 KiB  
Article
Visual Target-Driven Robot Crowd Navigation with Limited FOV Using Self-Attention Enhanced Deep Reinforcement Learning
by Yinbei Li, Qingyang Lyu, Jiaqiang Yang, Yasir Salam and Baixiang Wang
Sensors 2025, 25(3), 639; https://doi.org/10.3390/s25030639 - 22 Jan 2025
Viewed by 623
Abstract
Navigating crowded environments poses significant challenges for mobile robots, particularly as traditional Simultaneous Localization and Mapping (SLAM)-based methods often struggle with dynamic and unpredictable settings. This paper proposes a visual target-driven navigation method using self-attention enhanced deep reinforcement learning (DRL) to overcome these limitations. The navigation policy is developed based on the Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling efficient obstacle avoidance and target pursuit. We utilize a single RGB-D camera with a limited field of view (FOV) for target detection and surrounding sensing, where environmental features are extracted from depth data via a convolutional neural network (CNN). A self-attention network (SAN) is employed to compensate for the limited FOV, enhancing the robot’s capability of searching for the target when it is temporarily lost. Experimental results show that our method achieves a higher success rate and shorter average target-reaching time in dynamic environments, while offering hardware simplicity, cost-effectiveness, and ease of deployment in real-world applications.
(This article belongs to the Section Remote Sensors)
Figure 1: A scenario of a mobile robot navigating in a crowded environment toward a visual target.
Figure 2: System architecture of the proposed visual target-driven robot navigation.
Figure 3: Convolutional neural network (CNN) used to extract features from depth data.
Figure 4: Self-attention network used to process sequences of past target positions and actions.
Figure 5: The collision risk penalty based on the ratio of the number of “dangerous-level” depth pixels to the number of “warning-level” depth pixels.
Figure 6: Enhanced TD3 framework for visual target-driven robot crowd navigation with simultaneous CNN and SAN updates using the critic loss.
Figure 7: Simulation environment for training and testing. The arrows indicate obstacle movements.
Figure 8: Trajectories of the robot navigating a 5 m × 5 m simulated space with eight moving cylindrical obstacles. (a–e) illustrate the robot’s trajectory in an environment where the obstacles move in a Brownian pattern, while (a’–e’) illustrate the trajectory in an environment where the obstacles move in a crossing pattern.
Figure 9: Learning curves in the ablation study.
Figure 10: Real robot experiment. (a) Real robot with an RGB-D camera. (b) Experiment environment.
Figure 11: Real robot operating in three scenarios. The arrows in (b) and (c) indicate human movements.
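Figure 5 of the entry above defines a collision risk penalty from the ratio of "dangerous-level" to "warning-level" depth pixels. A minimal sketch of such a reward term follows; the distance thresholds, scaling factor, and handling of invalid pixels are assumptions, not the authors' values.

```python
import numpy as np

def collision_risk_penalty(depth, danger_dist=0.4, warn_dist=1.0, scale=1.0):
    """Penalty proportional to the ratio of 'dangerous-level' to 'warning-level'
    depth pixels, mirroring the idea in Figure 5 (thresholds are illustrative)."""
    valid = depth > 0                       # ignore invalid / zero depth readings
    dangerous = np.count_nonzero(valid & (depth < danger_dist))
    warning = np.count_nonzero(valid & (depth < warn_dist))
    if warning == 0:
        return 0.0                          # nothing close enough to matter
    return -scale * dangerous / warning     # negative reward term

# Toy depth image: half of the near pixels are already dangerously close.
depth = np.full((4, 4), 2.0)
depth[0, :2] = 0.3                          # dangerous
depth[1, :2] = 0.8                          # warning only
print(collision_risk_penalty(depth))        # -0.5
```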
21 pages, 4833 KiB  
Article
An Effective 3D Instance Map Reconstruction Method Based on RGBD Images for Indoor Scene
by Heng Wu, Yanjie Liu, Chao Wang and Yanlong Wei
Remote Sens. 2025, 17(1), 139; https://doi.org/10.3390/rs17010139 - 3 Jan 2025
Viewed by 544
Abstract
To enhance the intelligence of robots, constructing accurate object-level instance maps is essential. However, the diversity and clutter of objects in indoor scenes present significant challenges for instance map construction. To tackle this issue, we propose a method for constructing object-level instance maps based on RGBD images. First, we utilize the advanced visual odometry system ORB-SLAM3 to estimate the poses of image frames and extract keyframes. Next, we perform semantic and geometric segmentation on the color and depth images of these keyframes, respectively, using semantic segmentation to optimize the geometric segmentation results and address inaccuracies in the target segmentation caused by small depth variations. The segmented depth images are then projected into point cloud segments, which are assigned corresponding semantic information. We integrate these point cloud segments into a global voxel map, updating each voxel’s class using color, distance constraints, and Bayesian methods to create an object-level instance map. Finally, we construct an ellipsoid scene from this map to test the robot’s localization capabilities in indoor environments using semantic information. Our experiments demonstrate that this method accurately and robustly constructs the environment, facilitating precise object-level scene segmentation. Furthermore, compared to manually labeled ellipsoidal maps, generating ellipsoidal maps from extracted objects enables accurate global localization.
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
Figure 1: Overview of our pipeline. In the object-level voxel construction, different colors represent different instance objects.
Figure 2: The fusion of semantic segmentation and geometric segmentation. YOLOv11 is used for semantic segmentation of the RGB image (a), and the mask of the target is obtained (b). The depth image (c) is projected into 3D space for geometric segmentation (d). The mask extracted from semantic segmentation then compensates the geometric segmentation (e), and finally the segmented regions (f) are obtained.
Figure 3: The fusion of point cloud segments between different frames in the 061 sequence of the SceneNN dataset. The point cloud segment labels ➀, ➁, … of Frame_0 are global labels, while the labels “1”, “2”, … of Frame_191 are temporary labels. Global labels and temporary labels are not in one-to-one correspondence; they are only used to represent mutual relationships. For example, segment “2” is scored separately against the overlapping regions ➀ and ➁. When the score of “2” with ➁ is higher, label ➁ is assigned to segment “2” and integrated into the global map.
Figure 4: Erroneously reconstructed areas under different thresholds. The three methods are evaluated with error thresholds of 2 cm, 4 cm, and 8 cm on the 11_01 sequence of the ScanNet dataset. The red areas represent reconstruction errors that exceed the threshold.
Figure 5: Comparison of map accuracy on the 11_01 sequence of ScanNet. Top: ground-truth maps and RGB images from different viewpoints. Bottom: images observed from three different viewpoints on the different maps.
Figure 6: Qualitative results on a selected scene from SceneNN.
Figure 7: Object-level instance map and ellipsoid scene constructed using the Chess sequence of the 7-Scenes dataset. (a) Mesh map of the scene, (b) object-level instance map, (c) ellipsoid scene, and (d) extracted objects in the environment.
Figure 8: Absolute translation errors for different sequences of the 7-Scenes dataset. Zins et al. [35] used manually labeled ellipsoid scenes, while Voxblox++ and our method use ellipsoid scenes constructed from instance objects.
Figure 9: Absolute rotation errors for different sequences of the 7-Scenes dataset. Zins et al. [35] used manually labeled ellipsoid scenes, while Voxblox++ and our method use ellipsoid scenes constructed from instance objects.
Figure 10: Object-level instance map and ellipsoid scene constructed in the real world. (a) Mesh map of the scene, (b) object-level instance map, (c) ellipsoid scene, and (d) extracted objects in the scene.
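The abstract above updates each voxel's class using Bayesian methods. The sketch below shows a generic recursive Bayesian label fusion per voxel, kept in log-probability form for stability; the uniform prior, the 0.7 hit probability, and the per-observation independence assumption are illustrative and not taken from the paper.

```python
import numpy as np

def update_voxel_class(log_probs, observed_class, num_classes, hit_prob=0.7):
    """Recursive Bayesian update of a per-voxel class distribution.
    A generic sketch of per-voxel label fusion; the observation model and prior
    are assumptions, not the authors' exact update rule."""
    if log_probs is None:
        log_probs = np.full(num_classes, np.log(1.0 / num_classes))  # uniform prior
    likelihood = np.full(num_classes, (1.0 - hit_prob) / (num_classes - 1))
    likelihood[observed_class] = hit_prob          # simple symmetric confusion model
    log_probs = log_probs + np.log(likelihood)
    log_probs -= np.max(log_probs)                 # renormalize in log space
    probs = np.exp(log_probs)
    return np.log(probs / probs.sum())

# Three noisy observations: two votes for class 2 outweigh one vote for class 1.
state = None
for obs in (2, 2, 1):
    state = update_voxel_class(state, obs, num_classes=4)
print(int(np.argmax(state)), np.exp(state).round(3))
```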
24 pages, 31029 KiB  
Article
InCrowd-VI: A Realistic Visual–Inertial Dataset for Evaluating Simultaneous Localization and Mapping in Indoor Pedestrian-Rich Spaces for Human Navigation
by Marziyeh Bamdad, Hans-Peter Hutter and Alireza Darvishy
Sensors 2024, 24(24), 8164; https://doi.org/10.3390/s24248164 - 21 Dec 2024
Viewed by 709
Abstract
Simultaneous localization and mapping (SLAM) techniques can be used to navigate the visually impaired, but the development of robust SLAM solutions for crowded spaces is limited by the lack of realistic datasets. To address this, we introduce InCrowd-VI, a novel visual–inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. Recorded using Meta Aria Project glasses, it captures realistic scenarios without environmental control. InCrowd-VI features 58 sequences totaling a 5 km trajectory length and 1.5 h of recording time, including RGB, stereo images, and IMU measurements. The dataset captures important challenges such as pedestrian occlusions, varying crowd densities, complex layouts, and lighting changes. Ground-truth trajectories, accurate to approximately 2 cm, are provided in the dataset, originating from the Meta Aria project machine perception SLAM service. In addition, a semi-dense 3D point cloud of scenes is provided for each sequence. The evaluation of state-of-the-art visual odometry (VO) and SLAM algorithms on InCrowd-VI revealed severe performance limitations in these realistic scenarios. Under challenging conditions, systems exceeded the required localization accuracy of 0.5 m and the 1% drift threshold, with classical methods showing drift up to 5–10%. While deep learning-based approaches maintained high pose estimation coverage (>90%), they failed to achieve real-time processing speeds necessary for walking pace navigation. These results demonstrate the need and value of a new dataset to advance SLAM research for visually impaired navigation in complex indoor environments.
(This article belongs to the Section Sensors and Robotics)
Figure 1: Sample of manual measurement process for ground-truth validation. Left: real-world scene with a landmark floor tile highlighted by a pink rectangle. Middle: full 3D point cloud map of the scene with four adjacent floor tiles marked in blue. Right: zoomed view of the marked corner of the tiles in the point cloud used for measurement.
Figure 2: Correlation between real-world measurements and point-cloud-derived distances in challenging sequences, where state-of-the-art SLAM systems exhibited failure or suboptimal performance. The scatter plot demonstrates a strong linear relationship between real-world and measured distances (in centimeters), with an average error of 2.14 cm, standard deviation of 1.46 cm, and median error of 2.0 cm.
Figure 3: Refined 3D reconstruction demonstrating the removal of dynamic pedestrians that initially appeared static relative to the camera on the escalator.
Figure 4: Example of image data and corresponding 3D map from a dataset sequence: the top-left image shows the RGB frame, the top-middle and top-right images represent the left and right images of a stereo pair, and the bottom image shows the 3D map of the scene.
Figure 5: Distribution of challenges across sequences in the InCrowd-VI dataset, categorized by crowd density levels (High: >10 pedestrians per frame; Medium: 4–10 pedestrians; Low: 1–3 pedestrians; None: no pedestrians). The x-axis represents the different types of challenges, and the y-axis indicates the total number of sequences. Note that the sequences may contain multiple challenges simultaneously.
Figure 6: Histogram of trajectory length.
Figure 7: Example scenes from the InCrowd-VI dataset demonstrating various challenges: (a) high pedestrian density, (b) varying lighting conditions, (c) texture-poor surfaces, (d) reflective surfaces, (e) narrow aisles, and (f) stairs.
Figure 8: ATE comparison of evaluated SLAM systems under challenging conditions, with the x-axis depicting sequences categorized by crowd density: high, medium, low, and none.
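The evaluation above is framed around a 0.5 m localization accuracy requirement, a 1% drift threshold, and observed drift of 5–10% for classical methods. The sketch below computes an ATE RMSE and a final-drift percentage for trajectories that are already associated and aligned; the exact metric definitions used in the benchmark may differ, and the toy trajectory is synthetic.

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE) between aligned estimated and ground-truth
    positions; assumes both are Nx3 arrays already associated and aligned."""
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))

def drift_percent(est, gt):
    """Final-position drift as a percentage of trajectory length, the kind of
    1% / 5-10% figure quoted above (this particular formula is an assumption)."""
    length = np.sum(np.linalg.norm(np.diff(gt, axis=0), axis=1))
    end_error = np.linalg.norm(est[-1] - gt[-1])
    return 100.0 * end_error / length

# Toy straight-line trajectory with a slowly growing lateral estimation error.
t = np.linspace(0, 10, 101)
gt = np.column_stack([t, np.zeros_like(t), np.zeros_like(t)])          # 10 m walk
est = gt + np.column_stack([np.zeros_like(t), 0.02 * t, np.zeros_like(t)])
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m, drift: {drift_percent(est, gt):.1f}%")
```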
19 pages, 21309 KiB  
Article
Real-Time Drivable Region Mapping Using an RGB-D Sensor with Loop Closure Refinement and 3D Semantic Map-Merging
by ChangWan Ha, DongHyun Yang, Gicheol Wang, Sung Chang Kim and HyungGi Jo
Appl. Sci. 2024, 14(24), 11613; https://doi.org/10.3390/app142411613 - 12 Dec 2024
Viewed by 767
Abstract
Drivable region maps, created using a visual sensor, are essential for autonomous navigation because off-the-shelf maps do not reflect contemporary real-world conditions. This study presents a large-scale drivable region mapping system that is capable of capturing large-scale environments in real-time, using a single RGB-D sensor. Whereas existing semantic simultaneous localization and mapping (SLAM) methods consider only accurate pose estimation and the registration of semantic information, our system generates contemporaneous large-scale spatial semantic maps by refining the 3D point clouds and semantic information whenever loop closure is detected. When loop closure occurs, our method finds the corresponding keyframe for each semantically labeled point cloud and transforms the point cloud into adjusted positions. Additionally, a map-merging algorithm for semantic maps is proposed to address large-scale environments. Experiments were conducted on the publicly available Complex Urban dataset and our custom dataset, as well as real-world datasets collected with a vehicle-mounted sensor. Our method alleviates the drift errors that frequently occur when agents navigate large areas. Compared with satellite images, the resulting semantic maps are well aligned and have proven validity in terms of timeliness and accuracy.
Figure 1: Schematic overview of the proposed method. It consists of precise drivable region mapping (left) and map-merging for large-scale mapping (right).
Figure 2: (a) RGB image. (b) The resulting image of semantic segmentation. (c) 3D point clouds created from the RGB image and depth image. (d) 3D point clouds generated from the result of semantic segmentation and the depth image.
Figure 3: A schematic example of loop closure refinement and a comparison of the drivable region map before and after applying refinement. (a) Schematic diagram before loop closure: despite being at the same location, misalignment occurs owing to drift. (b) Schematic diagram after loop closure: loop closure resolves the issues caused by drift error and realigns the frame poses and point clouds. (c) Drivable region mapping result before loop closure. (d) Drivable region mapping result after loop closure. In (c) and (d), white point clouds represent feature points extracted from images, whereas sky-blue point clouds denote the drivable region.
Figure 4: Map merging: (a) Model 1; (b) Model 2; (c) a red square denotes the overlapping region of Model 1; (d) a red square denotes the overlapping region of Model 2; (e) voxel downsampling; (f) FPFH and correspondence search; (g) point cloud registration via TEASER++.
Figure 5: Satellite image of Jeonbuk National University. Red, blue, green, yellow, pink, sky blue, light pink, and brown represent the paths of map1 to map8, respectively.
Figure 6: Urban 28 dataset [55] satellite image and divided course. The course was manually divided into three sections and displayed on a satellite image captured via Google Maps; the lines on the satellite image were drawn manually. The trajectories for maps 1, 2, and 3 are shown in blue, red, and green.
Figure 7: (a) Drivable region mapping result without loop closure. (b) Drivable region mapping result with loop closure.
Figure 8: Drivable region mapping results at Jeonbuk National University: (a) map1, (b) map2, (c) map3, (d) map4, (e) map5, (f) map6, (g) map7, (h) map8.
Figure 9: Comparison of the proposed method against the provided ground-truth trajectory on the Urban 28 dataset [55]. The trajectory of each map is manually divided as shown in Figure 6, and the comparison was conducted using EVO [61]. The dashed line represents the ground-truth trajectory of Urban 28. (a) Map 1 drivable region mapping results. (b) Map 1 trajectory comparison. (c) Map 3 drivable region mapping results. (d) Map 3 trajectory comparison. (e) Map 2 drivable region mapping results. (f) Map 2 trajectory comparison.
Figure 10: Map-merging results at Jeonbuk National University, comparing three different methods. Zooming in on a portion of the map, we compare the degree of overlap of the drivable regions: the more accurately the central circles overlap, the better the result. (a) Map merging by manually transforming the positions of the eight maps. (b) Map merging using image similarity and the poses of the eight maps. (c) Map merging via the proposed method.
Figure 11: A large-scale drivable region map created using two methods (a single run without splitting the data; the proposed method), manually overlaid on a satellite image. To ensure a fair comparison, only the drivable regions were retained. (a,b) Large-scale drivable maps created from data captured in a single run; despite using the same dataset in the same environment, the results show significant inconsistencies. (c) Large-scale drivable map created via the proposed method.
Figure 12: Urban 28 dataset [55] drivable region mapping results for two methods (a single run without splitting the data; the proposed method). (a) The provided ground-truth point cloud map of Urban 28. (b) The provided ground-truth trajectory of Urban 28; blue, red, and green represent the trajectories of maps 1, 2, and 3, respectively. (c) Result of the first run of drivable region mapping without splitting the data; the resulting map shows misalignment. (d) Result of the second run without splitting the data; the resulting map also shows misalignment. (e) A drivable region map produced by dividing the map and merging the parts with the proposed method. (f) Comparison between the trajectory generated via the proposed method and the ground-truth trajectory using EVO [61]; the dashed line represents the ground-truth trajectory of Urban 28.
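The loop-closure refinement described above re-anchors each semantically labeled point cloud to its keyframe's optimized pose. The sketch below applies the corresponding rigid correction to each segment; the 4x4 world-from-camera pose convention and the dictionary data layout are assumptions, not the authors' data structures.

```python
import numpy as np

def reanchor_segments(segments, old_poses, new_poses):
    """After loop closure, move each labeled point-cloud segment from its old
    keyframe pose to the optimized one: p_new = T_new @ inv(T_old) @ p_old."""
    corrected = {}
    for kf_id, points in segments.items():
        T_old, T_new = old_poses[kf_id], new_poses[kf_id]   # 4x4 world-from-camera
        correction = T_new @ np.linalg.inv(T_old)           # rigid correction in world frame
        homog = np.hstack([points, np.ones((points.shape[0], 1))])
        corrected[kf_id] = (correction @ homog.T).T[:, :3]
    return corrected

# One segment attached to keyframe 0, whose pose shifts 0.5 m in x after optimization.
T_old = np.eye(4)
T_new = np.eye(4); T_new[0, 3] = 0.5
segments = {0: np.array([[1.0, 0.0, 2.0], [1.2, 0.1, 2.1]])}
print(reanchor_segments(segments, {0: T_old}, {0: T_new})[0])
```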
15 pages, 6086 KiB  
Article
Improved Visual SLAM Algorithm Based on Dynamic Scenes
by Jinxing Niu, Ziqi Chen, Tao Zhang and Shiyu Zheng
Appl. Sci. 2024, 14(22), 10727; https://doi.org/10.3390/app142210727 - 20 Nov 2024
Viewed by 892
Abstract
This work presents a novel RGB-D dynamic simultaneous localization and mapping (SLAM) method that improves accuracy, stability, and efficiency of localization while relying on deep learning in a dynamic environment, in contrast to traditional static scene-based visual SLAM methods. Based on the classic framework of traditional visual SLAM, we propose a method that replaces the traditional feature extraction method with a convolutional neural network approach, aiming to enhance the accuracy of feature extraction and localization, as well as to improve the algorithm’s ability to capture and represent the characteristics of the entire scene. Subsequently, the semantic segmentation thread was utilized in a target detection network combined with geometric methods to identify potential dynamic areas in the image and generate masks for dynamic objects. Finally, the standard deviation of the depth information of potential dynamic points was calculated to identify true dynamic feature points, to guarantee that static feature points were used for position estimation. We performed experiments based on the public datasets to validate the feasibility of the proposed algorithm. The experimental results indicate that the improved SLAM algorithm reduces the absolute trajectory error (ATE) by approximately 97% compared with traditional static visual SLAM and by about 20% compared with traditional dynamic visual SLAM, while also cutting computation time by 68% relative to well-known dynamic visual SLAM systems, giving it clear advantages in both positioning accuracy and operational efficiency.
(This article belongs to the Section Computing and Artificial Intelligence)
Figure 1: Overview of the enhanced SLAM system. The framework comprises four threads: semantic segmentation, tracking, local mapping, and loop closing.
Figure 2: GCNv2 feature extraction network structure, with channel numbers listed below each convolutional layer.
Figure 3: YOLOv5 network architecture diagram.
Figure 4: (a,b) and (c,d) are semantic segmentation results of the modified SLAM. Red indicates the detection boxes from YOLOv5x for object detection, while green represents the extracted feature points.
Figure 5: Comparison of feature point distributions between ORB and GCNv2. The scenes in (a,b) are cluttered with various objects, including computer screens, which makes it difficult to obtain features. The images in (c,d) were taken from the corner of a table while the camera was moving, resulting in significant changes in viewpoint.
Figure 6: Comparison of the ATE of the improved SLAM and ORB-SLAM2 across five dynamic scene sequences from the fr3 dataset. (a–e) show the trajectory maps of ORB-SLAM2, while (f–j) show the trajectory maps of the improved SLAM.
Figure 7: Results for the fr3_walking_xyz sequence. Panels (a,b) illustrate the estimated trajectories compared with the ground truth, as well as the errors along the x, y, and z axes for ORB-SLAM2 and the improved SLAM. Panel (c) displays the time consumption of each method.
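The abstract above identifies truly dynamic feature points from the standard deviation of their depth information. One plausible reading, sketched below, is to track each candidate point's depth over a few frames and threshold the standard deviation; the window length, the 0.05 m threshold, and the data layout are assumptions rather than the authors' parameters.

```python
import numpy as np

def split_dynamic_static(candidate_depths, std_thresh=0.05):
    """Classify candidate feature points by the standard deviation of their depth
    samples: large variation suggests real motion, small variation suggests a
    static point that merely fell inside a detected dynamic region.

    candidate_depths : dict mapping feature ID -> sequence of depth samples (m)
    """
    dynamic, static = [], []
    for fid, depths in candidate_depths.items():
        depths = np.asarray(depths, dtype=float)
        depths = depths[depths > 0]                     # drop invalid readings
        (dynamic if depths.size and depths.std() > std_thresh else static).append(fid)
    return dynamic, static

# Point 7 sits on a walking person (depth changes); point 8 is on the wall behind.
samples = {7: [2.10, 1.95, 1.70, 1.52], 8: [3.01, 3.00, 3.02, 3.01]}
print(split_dynamic_static(samples))    # -> ([7], [8])
```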
32 pages, 11087 KiB  
Article
Path Planning and Motion Control of Robot Dog Through Rough Terrain Based on Vision Navigation
by Tianxiang Chen, Yipeng Huangfu, Sutthiphong Srigrarom and Boo Cheong Khoo
Sensors 2024, 24(22), 7306; https://doi.org/10.3390/s24227306 - 15 Nov 2024
Viewed by 2090
Abstract
This article delineates the enhancement of an autonomous navigation and obstacle avoidance system for a quadruped robot dog. Part one of this paper presents the integration of a sophisticated multi-level dynamic control framework, utilizing Model Predictive Control (MPC) and Whole-Body Control (WBC) from MIT Cheetah. The system employs an Intel RealSense D435i depth camera for depth vision-based navigation, which enables high-fidelity 3D environmental mapping and real-time path planning. A significant innovation is the customization of the EGO-Planner to optimize trajectory planning in dynamically changing terrains, coupled with the implementation of a multi-body dynamics model that significantly improves the robot’s stability and maneuverability across various surfaces. The experimental results show that the RGB-D system exhibits superior velocity stability and trajectory accuracy to the SLAM system, with a 20% reduction in the cumulative velocity error and a 10% improvement in path tracking precision. The experimental results also show that the RGB-D system achieves smoother navigation, requiring 15% fewer iterations for path planning, and a 30% faster success rate recovery in challenging environments. The successful application of these technologies in simulated urban disaster scenarios suggests promising future applications in emergency response and complex urban environments. Part two of this paper presents the development of a robust path planning algorithm for a robot dog on a rough terrain based on attached binocular vision navigation. We use a commercial off-the-shelf (COTS) robot dog. An optical CCD binocular vision dynamic tracking system is used to provide environment information. Likewise, the pose and posture of the robot dog are obtained from the robot’s own sensors, and a kinematics model is established. Then, a binocular vision tracking method is developed to determine the optimal path, provide a proposal (commands to actuators) of the position and posture of the bionic robot, and achieve stable motion on tough terrains. The terrain is assumed to be a gentle uneven terrain to begin with and subsequently proceeds to a more rough surface. This work consists of four steps: (1) pose and position data are acquired from the robot dog’s own inertial sensors, (2) terrain and environment information is input from onboard cameras, (3) information is fused (integrated), and (4) path planning and motion control proposals are made. Ultimately, this work provides a robust framework for future developments in the vision-based navigation and control of quadruped robots, offering potential solutions for navigating complex and dynamic terrains.
Figure 1: Simplified box model of the Lite3P quadruped robotic dog.
Figure 2: Internal sensor arrangement of the quadruped robotic dog.
Figure 3: Dynamic control flowchart.
Figure 4: MPC flowchart.
Figure 5: WBC flowchart [30].
Figure 6: Robot coordinates and joint point settings [30].
Figure 7: Intel D435i and Velodyne LiDAR.
Figure 8: ICP diagram.
Figure 9: Comparison before and after modifying the perception region.
Figure 10: Point cloud processing flowchart.
Figure 11: {p, v} generation: (a) the creation of {p, v} pairs for collision points; (b) the process of generating anchor points and repulsive vectors for dynamic obstacle avoidance [41].
Figure 12: Overall framework of the 2D EGO-Planner.
Figure 13: Robot initialization and control process in the Gazebo simulation: (a) Gazebo environment creation, (b) robot model import, (c) torque balance mode activation, and (d) robot stepping and rotation in simulation.
Figure 14: Joint rotational angles of the FL and RL legs.
Figure 15: Joint angular velocities of the FL and RL legs.
Figure 16: Torque applied to the FL and RL joints during the gait cycle.
Figure 17: The robot navigating in a simple environment using a camera.
Figure 18: The robot navigating in a complex environment using a camera.
Figure 19: A 2D trajectory showing start and goal positions, obstacles, and the rough path.
Figure 20: Initial environment setup.
Figure 21: The robot starts navigating in a simple environment with a static obstacle (brown box).
Figure 22: Dynamic Obstacle 1 introduced: the robot detects a new obstacle and recalculates its path.
Figure 23: Dynamic Obstacle 2 introduced: after avoiding the first obstacle, a second obstacle is introduced and detected by the planner.
Figure 24: Approaching the target: the robot adjusts its path to approach the target point as the distance shortens.
Figure 25: Reaching the target: the robot completes its path and reaches the designated target point.
Figure 26: Real-time B-spline trajectory updates in response to dynamic obstacles. Set 1 (orange) shows the initial path avoiding static obstacles. When the first dynamic obstacle is detected, the EGO-Planner updates the path (Set 2, blue) using local optimization. A second obstacle prompts another adjustment (Set 3, green), guiding the robot smoothly toward the target as trajectory updates become more frequent.
Figure 27: The robot navigating a simple environment using SLAM.
Figure 28: The robot navigating a complex environment using SLAM.
Figure 29: A 2D trajectory showing start and goal positions, obstacles, and the planned path in a complex environment using SLAM.
Figure 30: Navigation based on the RGB-D camera.
Figure 31: Navigation based on SLAM.
Figure 32: Velocity deviation based on the RGB-D camera.
Figure 33: Velocity deviation based on SLAM.
Figure 34: Cumulative average iterations.
Figure 35: Cumulative success rate.
26 pages, 3132 KiB  
Article
A Novel Fuzzy Image-Based UAV Landing Using RGBD Data and Visual SLAM
by Shayan Sepahvand, Niloufar Amiri, Houman Masnavi, Iraj Mantegh and Farrokh Janabi-Sharifi
Drones 2024, 8(10), 594; https://doi.org/10.3390/drones8100594 - 18 Oct 2024
Viewed by 1227
Abstract
In this work, an innovative perception-guided approach is proposed for landing zone detection and realization of Unmanned Aerial Vehicles (UAVs) operating in unstructured environments ridden with obstacles. To accommodate secure landing, two well-established tools, namely fuzzy systems and visual Simultaneous Localization and Mapping (vSLAM), are implemented into the landing pipeline. Firstly, colored images and point clouds acquired by a visual sensory device are processed to serve as characterizing maps that acquire information about flatness, steepness, inclination, and depth variation. By leveraging these images, a novel fuzzy map infers the areas for risk-free landing on which the UAV can safely land. Subsequently, the vSLAM system is employed to estimate the platform’s pose and an additional set of point clouds. The vSLAM point clouds presented in the corresponding keyframe are projected back onto the image plane on which a threshold fuzzy landing score map is applied. In other words, this binary image serves as a mask for the re-projected vSLAM world points to identify the best subset for landing. Once these image points are identified, their corresponding world points are located, and among them, the center of the cluster with the largest area is chosen as the point to land. Depending on the UAV’s size, four synthesis points are added to the vSLAM point cloud to execute the image-based visual servoing landing using image moment features. The effectiveness of the landing package is assessed through the ROS Gazebo simulation environment, where comparisons are made with a state-of-the-art landing site detection method.
Figure 1: The proposed landing system structure. The major subsystems are highlighted. The blocks highlighted in green represent the inputs. The yellow area corresponds to the vSLAM system. The red section illustrates the fuzzy map construction and postprocessing of vSLAM keypoints. The area highlighted in purple shows the IBVS landing subsystem. The black arrows show the data direction.
Figure 2: The coordinate frames and image planes. The blue coordinate frame indicates the body-fixed frame, the red represents the virtual camera frame, the green shows the real camera frame, and the black represents the inertial frame. The black dots are world points, while the green and red dots illustrate the corresponding projected points on the real and virtual camera image planes, respectively.
Figure 3: The proposed control system block diagram includes two major subsystems: IBVS Control and Landing Zone Detection, highlighted in light green and blue, respectively.
Figure 4: The created Gazebo worlds along with the Parrot Bebop 2 drone, encircled with a red circle. (a) The parking lot simulation environment; (b) the parking lot environment from a secondary viewpoint; (c) the playground environment; (d) the office simulation environment.
Figure 5: The first scenario: terrain characterizing maps.
Figure 6: The first scenario: the landing score maps and SLAM-based images. (A) The proposed method; (B) state-of-the-art method. In the heat maps, the black areas represent potential safe landing zones. The red, blue, and pink stars indicate the SLAM keypoints, the selected keypoints, and the flattest subset, respectively.
Figure 7: The first scenario: image-plane feature motion, where circles show the initial feature position and the stars indicate the desired position. The trace of each of the features is distinguished by using four different colors. (a) The fuzzy map; (b) the Bayesian method.
Figure 8: The first scenario: the feature error development.
Figure 9: The second scenario: terrain characterizing maps with the initial absolute position (−3, 12, 9).
Figure 10: The second scenario: the landing score maps and SLAM-based images. (A) The proposed method; (B) state-of-the-art method. In the heat maps, the black areas represent potential safe landing zones. The red, blue, and pink stars indicate the SLAM keypoints, the selected keypoints, and the flattest subset, respectively.
Figure 11: The second scenario: image-plane feature motion, where circles show the initial feature position and the stars indicate the desired position. The trace of each of the features is distinguished by using four different colors. (a) The fuzzy map; (b) the Bayesian method.
Figure 12: The second scenario: the feature error development.
Figure 13: The third scenario: terrain characterizing maps with the initial absolute position (0, 0, 7).
Figure 14: The third scenario: the landing score maps and SLAM-based images. (A) The proposed method; (B) state-of-the-art method. In the heat maps, the black areas represent potential safe landing zones. The red, blue, and pink stars indicate the SLAM keypoints, the selected keypoints, and the flattest subset, respectively.
Figure 15: The third scenario: image-plane feature motion, where circles show the initial feature position and the stars indicate the desired position. The trace of each of the features is distinguished by using four different colors. (a) The fuzzy map; (b) the Bayesian method.
Figure 16: The third scenario: the feature error development.
Figure 17: The fourth scenario: terrain characterizing maps with the initial absolute position (0, 0, 10).
Figure 18: The fourth scenario: the landing score maps and SLAM-based images. (A) The proposed method; (B) state-of-the-art method. In the heat maps, the black areas represent potential safe landing zones. The red, blue, and pink stars indicate the SLAM keypoints, the selected keypoints, and the flattest subset, respectively.
Figure 19: The fourth scenario: image-plane feature motion, where circles show the initial feature position and the stars indicate the desired position. The trace of each of the features is distinguished by using four different colors. (a) The fuzzy map; (b) the Bayesian method.
Figure 20: The fourth scenario: the feature error development.
Figure 21: The grassy ground. The color bar applies to sub-figures (b–f).
Figure 22: The hard ground. The color bar applies to sub-figures (b–f).
Figure 23: The sandy ground. The color bar applies to sub-figures (b–f).
20 pages, 6262 KiB  
Article
YPR-SLAM: A SLAM System Combining Object Detection and Geometric Constraints for Dynamic Scenes
by Xukang Kan, Gefei Shi, Xuerong Yang and Xinwei Hu
Sensors 2024, 24(20), 6576; https://doi.org/10.3390/s24206576 - 12 Oct 2024
Viewed by 895
Abstract
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly influence the precision of image matching and camera pose estimation. In order to solve these problems, the YPR-SLAM system is proposed. [...] Read more.
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly influence the precision of image matching and camera pose estimation. In order to solve these problems, the YPR-SLAM system is proposed. First, the system includes a lightweight YOLOv5 detection network that detects both dynamic and static objects and provides prior dynamic-object information to the SLAM system. Second, using this prior information together with the depth image, a geometric-constraint method is proposed for removing motion feature points: the Depth-PROSAC algorithm differentiates dynamic from static feature points so that the dynamic ones can be discarded. Finally, the dense point cloud map is constructed from the static feature points. YPR-SLAM tightly couples object detection with geometric constraints, eliminating motion feature points and minimizing their adverse effects on the SLAM system. The performance of YPR-SLAM was assessed on the public TUM RGB-D dataset, and the results show that it is well suited to dynamic scenes. Full article
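The abstract names a Depth-PROSAC step that separates dynamic from static feature points inside detected object regions using the depth image, but the listing does not give its formulation. The sketch below is therefore only an illustrative depth-gated filter under assumed thresholds, not the paper's algorithm: feature points inside a detection box are labelled dynamic only when their depth matches the foreground object, so background points visible inside the box are kept.

```python
import numpy as np

def split_static_dynamic(features, depth, boxes, depth_tol=0.4):
    """Illustrative depth-gated filtering of ORB keypoints inside detection boxes.

    features  : (N, 2) array of pixel coordinates (u, v)
    depth     : HxW metric depth image aligned with the RGB frame
    boxes     : list of (u_min, v_min, u_max, v_max) boxes of potentially dynamic objects
    depth_tol : tolerance (m) around the detected object's median depth

    A keypoint inside a box is labelled dynamic only if its depth is close to the
    foreground object's median depth; keypoints falling on the background seen
    inside the box are kept as static.
    """
    features = np.asarray(features, dtype=float)
    dynamic = np.zeros(len(features), dtype=bool)
    for (u0, v0, u1, v1) in boxes:
        roi = depth[v0:v1, u0:u1]
        roi = roi[np.isfinite(roi) & (roi > 0)]
        if roi.size == 0:
            continue
        obj_depth = np.median(roi)
        for i, (u, v) in enumerate(features):
            if u0 <= u < u1 and v0 <= v < v1:
                d = depth[int(v), int(u)]
                if np.isfinite(d) and d > 0 and abs(d - obj_depth) < depth_tol:
                    dynamic[i] = True
    return features[~dynamic], features[dynamic]
```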
(This article belongs to the Section Sensing and Imaging)
Show Figures

Figure 1

Figure 1
<p>Framework of the YPR-SLAM system. The blue section is ORB-SLAM2, and the orange section shows the additions introduced in this paper.</p>
Full article ">Figure 2
<p>The YOLOv5 network architecture.</p>
Full article ">Figure 3
<p>Dynamic target detection and filtering thread. First, ORB feature points are extracted from the RGB image by the tracking thread. Next, the dynamic target detection thread identifies potential dynamic target areas, and then the Depth-PROSAC algorithm is applied to filter out dynamic feature points. Finally, the static feature points are retained for subsequent pose estimation.</p>
Full article ">Figure 4
<p>Comparison between the target detection method and the Depth-PROSAC algorithm in filtering out dynamic feature points. (<b>a</b>) Dynamic feature points filtered out directly by the object detection method; (<b>b</b>) dynamic feature points filtered out by the Depth-PROSAC algorithm.</p>
Full article ">Figure 5
<p>Dense point cloud construction workflow.</p>
Full article ">Figure 6
<p>In the fr3_walking_halfsphere sequence, the YPR-SLAM and ORB-SLAM2 systems were used to estimate the 3D motion of the camera. (<b>a</b>) Camera trajectory estimated by ORB-SLAM2; (<b>b</b>) camera trajectory estimated by YPR-SLAM.</p>
Full article ">Figure 7
<p><span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> of the ORB-SLAM2 system and the YPR-SLAM system under different datasets. (<b>a1</b>,<b>a2</b>,<b>c1</b>,<b>c2</b>,<b>e1</b>,<b>e2</b>,<b>g1</b>,<b>g2</b>) represent ATE and RPE obtained by the ORB-SLAM2 system by running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>b1</b>,<b>b2</b>,<b>d1</b>,<b>d2</b>,<b>f1</b>,<b>f2</b>,<b>h1</b>,<b>h2</b>) represent <span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> plots of the YPR-SLAM system running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>a1</b>,<b>b1</b>,<b>c1</b>,<b>d1</b>,<b>e1</b>,<b>f1</b>,<b>g1</b>,<b>h1</b>) represent ATE plots. (<b>a2</b>,<b>b2</b>,<b>c2</b>,<b>d2</b>,<b>e2</b>,<b>f2</b>,<b>g2</b>,<b>h2</b>) represent <span class="html-italic">RPE</span> plots.</p>
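The ATE and RPE plots referenced in this caption follow the standard TUM evaluation protocol. As a minimal sketch of how such numbers are typically computed, the snippet below aligns an estimated trajectory to the ground truth with the closed-form Horn/Umeyama solution and reports translation-only ATE and RPE RMSE; the official tools additionally handle timestamp association and full SE(3) poses.

```python
import numpy as np

def align_umeyama(est, gt):
    """Rigidly align est (N,3) to gt (N,3) with the closed-form Horn/Umeyama solution."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(G.T @ E / len(est))      # cross-covariance SVD
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:                     # guard against a reflection
        S[2, 2] = -1.0
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    return est @ R.T + t

def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of translational residuals after alignment."""
    err = align_umeyama(est, gt) - gt
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))

def rpe_rmse(est, gt, delta=1):
    """Relative pose error over a fixed frame interval (translation part only)."""
    d_est = est[delta:] - est[:-delta]
    d_gt = gt[delta:] - gt[:-delta]
    err = np.linalg.norm(d_est - d_gt, axis=1)
    return float(np.sqrt((err ** 2).mean()))
```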
Figure 7 Cont.">
Full article ">Figure 8
<p>Using ORB-SLAM2 and YPR-SLAM to construct dense 3D point cloud map in dynamic scene sequence fre3_walking_xyz. (<b>a</b>) represents a dense 3D point cloud map constructed by the ORB-SLAM2 system; (<b>b</b>) represents a dense 3D point cloud map constructed by the YPR-SLAM system.</p>
Full article ">
23 pages, 9746 KiB  
Article
Research on SLAM Localization Algorithm for Orchard Dynamic Vision Based on YOLOD-SLAM2
by Zhen Ma, Siyuan Yang, Jingbin Li and Jiangtao Qi
Agriculture 2024, 14(9), 1622; https://doi.org/10.3390/agriculture14091622 - 16 Sep 2024
Cited by 1 | Viewed by 1108
Abstract
With the development of agriculture, the complexity and dynamism of orchard environments pose challenges to the perception and positioning of inter-row environments for agricultural vehicles. This paper proposes a method for extracting navigation lines and measuring pedestrian obstacles. The improved YOLOv5 algorithm is [...] Read more.
With the development of agriculture, the complexity and dynamism of orchard environments pose challenges to the perception and positioning of inter-row environments for agricultural vehicles. This paper proposes a method for extracting navigation lines and measuring pedestrian obstacles. The improved YOLOv5 algorithm is used to detect tree trunks between left and right rows in orchards. The experimental results show that the average angle deviation of the extracted navigation lines was less than 5 degrees, verifying the accuracy of the extraction. Because pedestrian posture varies and camera depth readings can be unreliable, a distance measurement algorithm based on a four-zone depth comparison is proposed for ranging pedestrian obstacles. Experimental results showed that within a range of 6 m, the average relative error of distance measurement did not exceed 1%, and within a range of 9 m, the maximum relative error was 2.03%. The average distance measurement time was 30 ms, so pedestrian distance measurement in orchard environments can be performed accurately and quickly. On the publicly available TUM RGB-D dynamic dataset, YOLOD-SLAM2 significantly reduced the RMSE of the absolute trajectory error compared with the ORB-SLAM2 algorithm, keeping it below 0.05 m/s. In actual orchard environments, the trajectories estimated by YOLOD-SLAM2 agreed closely with the true trajectory for both straight-line and circular driving. The RMSE of the absolute trajectory error was less than 0.03 m/s, and the average tracking time was 47 ms, indicating that the YOLOD-SLAM2 algorithm proposed in this paper could meet the accuracy and real-time requirements of agricultural vehicle positioning in orchard environments. Full article
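The four-zone depth comparison used for pedestrian ranging is only named in the abstract, so the following is a rough illustrative sketch under assumed details: the detection box is split into a 2x2 grid, each zone contributes a robust median depth, and mutually consistent zones are averaged to suppress background or invalid depth. The zone layout, tolerances, and function name are assumptions, not the paper's definition.

```python
import numpy as np

def pedestrian_distance(depth, box, agree_tol=0.5):
    """Illustrative four-zone depth comparison inside a pedestrian bounding box.

    depth     : HxW metric depth image
    box       : (u_min, v_min, u_max, v_max) pedestrian detection box
    agree_tol : zones whose median depth is within this margin (m) of the nearest
                zone are considered consistent and averaged
    """
    u0, v0, u1, v1 = box
    um, vm = (u0 + u1) // 2, (v0 + v1) // 2
    zones = [depth[v0:vm, u0:um], depth[v0:vm, um:u1],
             depth[vm:v1, u0:um], depth[vm:v1, um:u1]]
    medians = []
    for zone in zones:
        valid = zone[np.isfinite(zone) & (zone > 0)]
        if valid.size:
            medians.append(float(np.median(valid)))
    if not medians:
        return None                                   # no usable depth in the box
    nearest = min(medians)
    consistent = [m for m in medians if m - nearest < agree_tol]
    return float(np.mean(consistent))
```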
(This article belongs to the Section Agricultural Technology)
Show Figures

Figure 1

Figure 1
<p>Pinhole Camera Model.</p>
Full article ">Figure 2
<p>Pixel Coordinate System.</p>
Full article ">Figure 3
<p>YOLOv5s algorithm network structure.</p>
Full article ">Figure 4
<p>Overall principle block diagram of ORB-SLAM2.</p>
Full article ">Figure 5
<p>Algorithm flowchart.</p>
Full article ">Figure 6
<p>Reference point of tree trunk.</p>
Full article ">Figure 7
<p>Problem diagram of extracting navigation lines in Scenario 1.</p>
Full article ">Figure 8
<p>Problem diagram of extracting navigation lines in Scenario 2.</p>
Full article ">Figure 9
<p>Navigation line extraction diagram.</p>
Full article ">Figure 10
<p>Navigation line extraction in different orchard scenes.</p>
Full article ">Figure 11
<p>Schematic diagram of evaluation indicators.</p>
Full article ">Figure 12
<p>Angle deviation.</p>
Full article ">Figure 13
<p>Schematic diagram of algorithm flow.</p>
Full article ">Figure 14
<p>Pedestrian Distance Measurement Experiment.</p>
Figure 14 Cont.">
Full article ">Figure 15
<p>YOLOD-SLAM2 Algorithm Framework.</p>
Full article ">Figure 16
<p>Left image and depth map of binocular camera.</p>
Full article ">Figure 17
<p>Geometric Constraints of Polar Lines.</p>
Full article ">Figure 18
<p>Comparison of Algorithm Trajectory.</p>
Full article ">Figure 19
<p>Actual Orchard Environment Trajectory Map.</p>
Full article ">Figure 20
<p>Trajectory diagram of the straight-line driving in x, y, and z directions.</p>
Full article ">Figure 21
<p>Trajectory diagram of circular driving in the x, y, and z directions.</p>
Full article ">
18 pages, 5473 KiB  
Article
Visual-Inertial RGB-D SLAM with Encoder Integration of ORB Triangulation and Depth Measurement Uncertainties
by Zhan-Wu Ma and Wan-Sheng Cheng
Sensors 2024, 24(18), 5964; https://doi.org/10.3390/s24185964 - 14 Sep 2024
Cited by 2 | Viewed by 1351
Abstract
In recent years, the accuracy of visual SLAM (Simultaneous Localization and Mapping) technology has seen significant improvements, making it a prominent area of research. However, within the current RGB-D SLAM systems, the estimation of 3D positions of feature points primarily relies on direct [...] Read more.
In recent years, the accuracy of visual SLAM (Simultaneous Localization and Mapping) technology has seen significant improvements, making it a prominent area of research. However, within the current RGB-D SLAM systems, the estimation of 3D positions of feature points primarily relies on direct measurements from RGB-D depth cameras, which inherently contain measurement errors. Moreover, the potential of triangulation-based estimation for ORB (Oriented FAST and Rotated BRIEF) feature points remains underutilized. To address this reliance on a single source of measurement data, this paper proposes integrating, for the 3D position of each ORB feature point, a triangulation-based estimate and a depth-measurement-based estimate, each with its own uncertainty. This integration is achieved using a CI (Covariance Intersection) filter and is referred to as the CI-TEDM (Triangulation Estimates and Depth Measurements) method. Vision-based SLAM systems face significant challenges, particularly in long straight corridors, weakly textured scenes, or during rapid motion, where tracking failures are common. To enhance the stability of visual SLAM, this paper introduces an improved CI-TEDM method that incorporates wheel encoder data. The mathematical model of the encoder is presented, and detailed derivations of the encoder pre-integration model and error model are provided. Building on these improvements, we propose a novel tightly coupled visual-inertial RGB-D SLAM with encoder integration of ORB triangulation and depth measurement uncertainties. Validation on open-source datasets and in real-world environments demonstrates that the proposed improvements significantly enhance the robustness of real-time state estimation and the localization accuracy of intelligent vehicles in challenging environments. Full article
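Covariance Intersection itself is a standard fusion rule, so a minimal sketch of fusing a triangulated position with a depth-measurement position is given below; the weight is chosen here by a coarse search that minimizes the trace of the fused covariance, which may differ from the weighting used in CI-TEDM. The toy covariances at the end are purely illustrative.

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2):
    """Fuse two estimates of the same 3D point with the standard CI rule.

    x1, P1 : e.g. triangulation-based position estimate and its covariance
    x2, P2 : e.g. depth-measurement-based position estimate and its covariance
    """
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    ws = np.linspace(1e-3, 1.0 - 1e-3, 999)
    traces = [np.trace(np.linalg.inv(w * I1 + (1 - w) * I2)) for w in ws]
    w = ws[int(np.argmin(traces))]                    # weight minimizing fused trace
    P = np.linalg.inv(w * I1 + (1 - w) * I2)
    x = P @ (w * I1 @ x1 + (1 - w) * I2 @ x2)
    return x, P

# toy usage: triangulation is poor along the viewing ray, depth is poor laterally
x_tri, P_tri = np.array([1.00, 0.20, 3.10]), np.diag([0.02, 0.02, 0.30])
x_dep, P_dep = np.array([1.02, 0.21, 2.95]), np.diag([0.05, 0.05, 0.04])
x_fused, P_fused = covariance_intersection(x_tri, P_tri, x_dep, P_dep)
```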
Show Figures

Figure 1

Figure 1
<p>The system framework diagram. The system framework diagram consists of three main modules: input, function, and output.</p>
Full article ">Figure 2
<p>An example diagram of reprojection error. Feature matching indicates that points <math display="inline"><semantics> <mrow> <mi>p</mi> <mn>1</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>p</mi> <mn>2</mn> </mrow> </semantics></math> are projections of the same spatial point <math display="inline"><semantics> <mi>p</mi> </semantics></math>, but the camera pose is initially unknown. Initially, there is a certain distance between the projected point, <math display="inline"><semantics> <mrow> <mover> <mi>p</mi> <mo>∧</mo> </mover> <mn>2</mn> </mrow> </semantics></math>, of <math display="inline"><semantics> <mi>P</mi> </semantics></math> and the actual point, <math display="inline"><semantics> <mrow> <mi>p</mi> <mn>2</mn> </mrow> </semantics></math>. The camera pose is then adjusted to minimize this distance.</p>
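As a worked complement to this caption, the snippet below computes a single reprojection residual with a pinhole model: a 3D point is projected with a candidate pose and compared against the matched pixel, and pose optimization then adjusts the pose to shrink this distance. The intrinsics, pose, and coordinates are illustrative values, not taken from the paper.

```python
import numpy as np

def project(K, R, t, P_world):
    """Pinhole projection of a 3D point (world frame) into pixel coordinates."""
    P_cam = R @ P_world + t                 # world -> camera frame
    return (K @ (P_cam / P_cam[2]))[:2]     # perspective division, drop homogeneous 1

# illustrative intrinsics, candidate pose, point, and matched observation
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.array([0.05, 0.0, 0.0])    # current guess of the camera pose
P = np.array([0.30, -0.10, 2.00])               # current estimate of the 3D point
p2_observed = np.array([400.2, 210.7])          # matched feature position in the image

p2_projected = project(K, R, t, P)
reprojection_error = np.linalg.norm(p2_observed - p2_projected)
# pose/structure optimization adjusts (R, t) and P to minimize this residual
```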
Full article ">Figure 3
<p>The motion model of the wheeled robot using wheel encoders. The figure illustrates the motion model of a mobile robot using wheel encoders in a 2D plane. The model describes the robot’s trajectory between its position at time <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mi>k</mi> </msub> </mrow> </semantics></math>, denoted as <math display="inline"><semantics> <mrow> <mfenced> <mrow> <msub> <mi>x</mi> <mi>k</mi> </msub> <mo>,</mo> <msub> <mi>y</mi> <mi>k</mi> </msub> </mrow> </mfenced> </mrow> </semantics></math>, and its position at time <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mrow> <mi>k</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, denoted as <math display="inline"><semantics> <mrow> <mfenced> <mrow> <msub> <mi>x</mi> <mrow> <mi>k</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> <mo>,</mo> <msub> <mi>y</mi> <mrow> <mi>k</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> </mrow> </mfenced> </mrow> </semantics></math>.</p>
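The caption describes the planar wheel-encoder motion model. A minimal differential-drive odometry step consistent with that description is sketched below; the wheel radius, wheel base, and encoder resolution are placeholder values, not the parameters of the authors' platform, and the pre-integration and error models derived in the paper are not reproduced here.

```python
import numpy as np

def encoder_odometry_step(x, y, theta, ticks_left, ticks_right,
                          ticks_per_rev=2048, wheel_radius=0.05, wheel_base=0.35):
    """One planar odometry update of a differential-drive robot from encoder ticks.

    (x, y, theta)           : pose at time t_k in the odometry frame
    ticks_left, ticks_right : encoder increments accumulated between t_k and t_{k+1}
    """
    d_left = 2.0 * np.pi * wheel_radius * ticks_left / ticks_per_rev
    d_right = 2.0 * np.pi * wheel_radius * ticks_right / ticks_per_rev
    d_center = 0.5 * (d_left + d_right)             # translation of the robot centre
    d_theta = (d_right - d_left) / wheel_base       # change in heading

    # midpoint integration of the unicycle model
    x_next = x + d_center * np.cos(theta + 0.5 * d_theta)
    y_next = y + d_center * np.sin(theta + 0.5 * d_theta)
    theta_next = theta + d_theta
    return x_next, y_next, theta_next
```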
Full article ">Figure 4
<p>The process of running datasets in the VEOS3-TEDM algorithm: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene. The blue frames represent keyframes, red frames represent initial keyframes, and green frames represent current frames.</p>
Full article ">Figure 5
<p>The process of tracking the datasets in the VEOS3-TEDM algorithm: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene. The green boxes in the figure represent key feature points detected by the VEOS3-TEDM algorithm.</p>
Full article ">Figure 6
<p>The comparison between estimated and true trajectory in the VEOS3-TEDM algorithm: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene.</p>
Full article ">Figure 7
<p>The comparison between true and estimated trajectories in the <span class="html-italic">x</span>, <span class="html-italic">y</span>, and <span class="html-italic">z</span> directions, using the VEOS3-TEDM algorithm: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene.</p>
Full article ">Figure 8
<p>3D point cloud maps: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene.</p>
Full article ">Figure 9
<p>Images of the experimental platform: (<b>a</b>) front view and (<b>b</b>) left view.</p>
Full article ">Figure 10
<p>The location of various components on the mobile robot: (<b>a</b>) bottom level and (<b>b</b>) upper level.</p>
Full article ">Figure 11
<p>The process of tracking real-world environments in the VEOS3-TEDM algorithm: (<b>a1</b>,<b>a2</b>) laboratory, (<b>b1</b>,<b>b2</b>) hall, (<b>c1</b>,<b>c2</b>) weak texture scene, (<b>d1</b>,<b>d2</b>) long straight corridor. The green boxes in the figure represent key feature points detected by the VEOS3-TEDM algorithm.</p>
Full article ">Figure 12
<p>A comparison of estimated and true trajectories in real-world environments using the VEOS3-TEDM algorithm.</p>
Full article ">
29 pages, 9403 KiB  
Article
DIO-SLAM: A Dynamic RGB-D SLAM Method Combining Instance Segmentation and Optical Flow
by Lang He, Shiyun Li, Junting Qiu and Chenhaomin Zhang
Sensors 2024, 24(18), 5929; https://doi.org/10.3390/s24185929 - 12 Sep 2024
Viewed by 1378
Abstract
Feature points from moving objects can negatively impact the accuracy of Visual Simultaneous Localization and Mapping (VSLAM) algorithms, while detection or semantic segmentation-based VSLAM approaches often fail to accurately determine the true motion state of objects. To address this challenge, this paper introduces [...] Read more.
Feature points from moving objects can negatively impact the accuracy of Visual Simultaneous Localization and Mapping (VSLAM) algorithms, while detection or semantic segmentation-based VSLAM approaches often fail to accurately determine the true motion state of objects. To address this challenge, this paper introduces DIO-SLAM: Dynamic Instance Optical Flow SLAM, a VSLAM system specifically designed for dynamic environments. Initially, the detection thread employs YOLACT (You Only Look At CoefficienTs) to distinguish between rigid and non-rigid objects within the scene. Subsequently, the optical flow thread estimates optical flow and introduces a novel approach to capture the optical flow of moving objects by leveraging optical flow residuals. Following this, an optical flow consistency method is implemented to assess the dynamic nature of rigid object mask regions, classifying them as either moving or stationary rigid objects. To mitigate errors caused by missed detections or motion blur, a motion frame propagation method is employed. Lastly, a dense mapping thread is incorporated to filter out non-rigid objects using semantic information, track the point clouds of rigid objects, reconstruct the static background, and store the resulting map in an octree format. Experimental results demonstrate that the proposed method surpasses current mainstream dynamic VSLAM techniques in both localization accuracy and real-time performance. Full article
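The optical-flow-residual idea in the abstract amounts to removing the flow induced by camera self-motion and thresholding what remains. The sketch below approximates the self-motion flow with a single least-squares affine field rather than the paper's iterative residual scheme, so it should be read as an assumption-laden illustration of the principle; the flow itself would come from any dense flow estimator.

```python
import numpy as np

def moving_object_mask(flow, thresh=1.5):
    """Flag pixels whose flow deviates from a best-fit global motion field.

    flow   : HxWx2 dense optical flow between two frames
    thresh : residual magnitude (pixels) above which a pixel is called moving

    The camera's self-motion flow is approximated by one affine model fitted to all
    pixels by least squares; subtracting it leaves a residual field that highlights
    independently moving objects.
    """
    h, w, _ = flow.shape
    vs, us = np.mgrid[0:h, 0:w]
    A = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)], axis=1)   # (N, 3) design matrix
    F = flow.reshape(-1, 2)                                          # (N, 2) flow vectors
    coeffs, *_ = np.linalg.lstsq(A, F, rcond=None)                   # affine ego-motion fit
    residual = np.linalg.norm(F - A @ coeffs, axis=1).reshape(h, w)
    return residual > thresh
```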
(This article belongs to the Special Issue Sensors and Algorithms for 3D Visual Analysis and SLAM)
Show Figures

Figure 1

Figure 1
<p>Performance of the traditional ORB-SLAM3 algorithm in highly dynamic environments. (<b>a</b>) Image of the highly dynamic scene. (<b>b</b>) Feature point extraction in the highly dynamic scene, where yellow boxes indicate moving objects and dynamic feature points are marked in red. (<b>c</b>) Comparison between the estimated camera pose and the ground truth camera pose. (<b>d</b>) Reconstruction results of dense mapping.</p>
Full article ">Figure 2
<p>The overall system framework of DIO-SLAM. Key innovations are highlighted in red font, while the original ORB-SLAM3 framework is represented by unfilled boxes. (<b>a</b>) Detection thread, represented by green boxes. (<b>b</b>) Optical flow thread, represented by blue boxes. (<b>c</b>) Dynamic feature point filtering module, which is composed of both the detection and optical flow threads. (<b>d</b>) Independent dense mapping thread.</p>
Full article ">Figure 3
<p>Instance segmentation results and non-rigid object mask extraction. (<b>a</b>) RGB frame used for segmentation. (<b>b</b>) Instance segmentation output.</p>
Full article ">Figure 4
<p>Separation of non-rigid and rigid object masks based on semantic information.</p>
Full article ">Figure 5
<p>Optical flow network inputs and output. (<b>a</b>) Frame <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>−</mo> <mn>1</mn> </mrow> </semantics></math>. (<b>b</b>) Frame <math display="inline"><semantics> <mi>n</mi> </semantics></math>. (<b>c</b>) Dense optical flow.</p>
Full article ">Figure 6
<p>Optical flow changes between adjacent frames.</p>
Full article ">Figure 7
<p>Iterative removal of camera self-motion flow using optical flow residuals. (<b>a</b>) Original dense optical flow. (<b>b</b>) Number of iterations = 5. (<b>c</b>) Number of iterations = 7. (<b>d</b>) Number of iterations = 9.</p>
Full article ">Figure 8
<p>Optical flow consistency for determining the moving rigid object region.</p>
Full article ">Figure 9
<p>Motion frame propagation.</p>
Full article ">Figure 10
<p>Effect of dynamic feature point removal. The colored areas in the figure depict the optical flow of moving rigid objects, while the green areas indicate the final extracted feature points. The feature points of non-rigid objects, such as the human body, are removed in all scenes. (<b>a</b>,<b>b</b>) A chair is being dragged, with its feature points being removed. (<b>c</b>,<b>d</b>) Hitting a balloon, where the feature points on the balloon are removed. (<b>e</b>) The box is stationary, and the feature points are normally extracted. (<b>f</b>) The box is being moved, with its feature points removed. (<b>g</b>,<b>h</b>) The box is put down, and its feature points are restored.</p>
Full article ">Figure 11
<p>Absolute trajectory error and relative pose error of fr3_walking_xyz.</p>
Full article ">Figure 12
<p>Absolute trajectory error and relative pose error of fr3_walking_static.</p>
Full article ">Figure 13
<p>Absolute trajectory error and relative pose error of fr3_walking_rpy.</p>
Full article ">Figure 14
<p>Absolute trajectory error and relative pose error of fr3_walking_halfsphere.</p>
Full article ">Figure 15
<p>Absolute trajectory error and relative pose error of fr3_sitting_static.</p>
Full article ">Figure 16
<p>Dense point cloud reconstruction. (<b>a</b>) RGB frame, dense point cloud, and octree map of the fr3_walking_xyz sequence. (<b>b</b>) RGB frame, dense point cloud, and octree map of the moving_nonobstructing_box sequence.</p>
Figure 16 Cont.">
Full article ">Figure 17
<p>Point cloud error heatmaps. (<b>a</b>) kt0 sequence. (<b>b</b>) kt1 sequence. (<b>c</b>) kt2 sequence. (<b>d</b>) kt3 sequence.</p>
Full article ">Figure 18
<p>Real-world scenario test results. (<b>a</b>) Color images. (<b>b</b>) Depth images. (<b>c</b>) Optical flow of moving objects. (<b>d</b>) Moving rigid object masks. (<b>e</b>) Feature points in traditional ORB-SLAM3. (<b>f</b>) Feature points in DIO-SLAM.</p>
Full article ">