Search Results (254)

Search Parameters:
Keywords = dense image matching

25 pages, 13698 KiB  
Article
Self-Supervised Foundation Model for Template Matching
by Anton Hristov, Dimo Dimov and Maria Nisheva-Pavlova
Big Data Cogn. Comput. 2025, 9(2), 38; https://doi.org/10.3390/bdcc9020038 - 11 Feb 2025
Abstract
Finding a template's location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when insufficient training data are available or when the images exhibit large texture variations, different modalities, or weak visual features, which limits their use in real-world tasks. We introduce the Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning of template matching. The idea behind Self-TM is to learn hierarchical features incorporating localization properties from images without any annotations. As one goes deeper into the convolutional neural network (CNN) layers, the filters react to increasingly complex structures and their receptive fields grow, leading to a loss of localization information relative to the early layers. Hierarchically propagating the last layers back to the first layer results in precise template localization. Owing to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model.
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
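As a point of reference for the template-localization problem the abstract describes, the sketch below runs a classical normalized cross-correlation search with OpenCV. It is not the authors' Self-TM model, which instead learns hierarchical CNN features without annotations; the file names and the choice of matching score are illustrative assumptions.

```python
import cv2

# Classical baseline for template localization (NOT Self-TM): slide the
# template over the query image and keep the best correlation response.
query = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)       # placeholder file names
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

response = cv2.matchTemplate(query, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(response)

h, w = template.shape
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
print(f"best match at {top_left}-{bottom_right}, score {max_val:.3f}")
```

Unlike a learned hierarchical approach, this baseline degrades quickly under texture variation or modality changes, which is exactly the failure mode the paper targets.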
Figure 1. Illustration of Self-TM.
Figure 2. Illustration of a receptive field, $RF_{pred\_p_N}$, in layer $N-1$ (in orange) of a detected maximum value, $pred\_p_N$, in layer $N$ (in red).
Figure 3. Visual representation of results on Hpatches (values, excluding those for Self-TM, are taken from Twin-Net [61]): (a) patch verification task; (b) image matching task; (c) patch retrieval task. The methods are grouped as follows: "handcrafted", which were manually created by their authors; "supervised", which used annotated data for their training; and "self-supervised", which did not use any annotations. A plus (+) denotes Self-TM models that are fine-tuned on the Hpatches dataset, and an asterisk (*) denotes variations of Tfear models.
Figure 4. Comparison of OmniGlue [34] (a) and OmniGlue + Self-TM Base (b) in finding keypoint matches in an image with an out-of-training-domain modality. For visualization purposes, matches with high confidence are not drawn, so that the errors remain visible. Correct matches are shown in green and incorrect matches in red.
24 pages, 8492 KiB  
Article
Conditional Generative Adversarial Networks and Deep Learning Data Augmentation: A Multi-Perspective Data-Driven Survey Across Multiple Application Fields and Classification Architectures
by Lucas C. Ribas, Wallace Casaca and Ricardo T. Fares
AI 2025, 6(2), 32; https://doi.org/10.3390/ai6020032 - 7 Feb 2025
Abstract
Effectively training deep learning models relies heavily on large datasets, as insufficient instances can hinder model generalization. A simple yet effective way to address this is to apply modern deep learning augmentation methods, as they synthesize new data matching the input distribution while preserving the semantic content. While these methods produce realistic samples, important issues persist concerning how well they generalize across different classification architectures and their overall impact on accuracy improvement. Furthermore, the relationship between dataset size and model accuracy, as well as the determination of an optimal augmentation level, remains an open question in the field. To address these challenges, in this paper we investigate the effectiveness of eight data augmentation methods—StyleGAN3, DCGAN, SAGAN, RandAugment, Random Erasing, AutoAugment, TrivialAugment and AugMix—across several classification networks of varying depth: ResNet18, ConvNeXt-Nano, DenseNet121 and InceptionResNetV2. By comparing their performance on diverse datasets from leaf textures, medical imaging and remote sensing, we assess which methods offer superior accuracy and generalization capability when training models with no pre-trained weights. Our findings indicate that deep learning data augmentation is an effective tool for dealing with small datasets, achieving accuracy gains of up to 17%.
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
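Several of the non-generative policies named in the abstract (RandAugment, TrivialAugment, AugMix, AutoAugment, Random Erasing) ship with torchvision. The snippet below is a minimal training-transform sketch, not the paper's experimental protocol; the image size, the particular policies combined, and the erasing probability are illustrative assumptions.

```python
from torchvision import transforms

# Minimal augmentation pipeline sketch using torchvision's built-in policies.
# Swap TrivialAugmentWide for RandAugment(), AugMix(), or
# AutoAugment(transforms.AutoAugmentPolicy.IMAGENET) to compare policies.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.TrivialAugmentWide(),      # parameter-free augmentation policy
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),     # tensor-level occlusion augmentation
])
```

A dataset would apply `train_transform` to each PIL image before batching; GAN-based augmentation (StyleGAN3, DCGAN, SAGAN) instead requires training a separate generator and is not covered by this sketch.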
Figure 1. Example of rotation geometric transformation. (a) Rotation does preserve label. (b) Rotation does not preserve label.
Figure 2. Six samples from the 1200Tex, Kather, and Brazilian Coffee Scenes (BCSs) data collections, with each column representing a sample and each row corresponding to a dataset.
Figure 3. Schematization of proposed data augmentation training and testing pipeline.
Figure 4. Samples generated by all evaluated data augmentation approaches.
Figure 5. The t-SNE projection of the original training sets for the 1200Tex, Kather and Brazilian Coffee Scenes datasets.
Figure 6. The t-SNE projections of all compared augmentation approaches for the 1200Tex dataset.
Figure 7. The t-SNE projections of all compared augmentation approaches for the Kather dataset.
Figure 8. The t-SNE projections of all compared augmentation approaches for the Brazilian Coffee Scenes dataset.
14 pages, 3344 KiB  
Article
Robot-Based Procedure for 3D Reconstruction of Abdominal Organs Using the Iterative Closest Point and Pose Graph Algorithms
by Birthe Göbel, Jonas Huurdeman, Alexander Reiterer and Knut Möller
J. Imaging 2025, 11(2), 44; https://doi.org/10.3390/jimaging11020044 - 5 Feb 2025
Abstract
Image-based 3D reconstruction enables robot-assisted interventions and image-guided navigation, which are emerging technologies in laparoscopy. When a robotic arm guides a laparoscope for image acquisition, hand–eye calibration is required to know the transformation between the camera and the robot flange. The calibration procedure is complex and must be conducted after each intervention (when the laparoscope is dismounted for cleaning). In the field, the surgeons and their assistants cannot be expected to do so. Thus, our approach is a procedure for a robot-based multi-view 3D reconstruction without hand–eye calibration, but with pose optimization algorithms instead. In this work, a robotic arm and a stereo laparoscope build the experimental setup. The procedure includes the stereo matching algorithm Semi Global Matching from OpenCV for depth measurement and the multiscale color iterative closest point algorithm from Open3D (v0.19), along with the multiway registration algorithm using a pose graph from Open3D (v0.19) for pose optimization. The procedure is evaluated quantitatively and qualitatively on ex vivo organs. The results are a low root mean squared error (1.1–3.37 mm) and dense point clouds. The proposed procedure leads to a plausible 3D model, and there is no need for complex hand–eye calibration, as this step can be compensated for by pose optimization algorithms.
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
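The two pose-optimization building blocks the abstract names, colored ICP and pose-graph-based multiway registration, are exposed through Open3D's public registration pipeline. The sketch below strings them together for a sequence of colored point clouds; the voxel size, convergence settings, and the assumption that consecutive clouds overlap and carry RGB colors are illustrative, not the authors' exact configuration.

```python
import numpy as np
import open3d as o3d

def pairwise_colored_icp(source, target, voxel):
    # Downsample, estimate normals, then run colored ICP (clouds must have RGB).
    src, tgt = source.voxel_down_sample(voxel), target.voxel_down_sample(voxel)
    for pcd in (src, tgt):
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
    result = o3d.pipelines.registration.registration_colored_icp(
        src, tgt, voxel, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationForColoredICP(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=50))
    return result.transformation

def build_pose_graph(clouds, voxel=0.005):
    # Chain consecutive pairwise registrations, then globally optimize the poses.
    reg = o3d.pipelines.registration
    graph, odometry = reg.PoseGraph(), np.eye(4)
    graph.nodes.append(reg.PoseGraphNode(odometry))
    for i in range(len(clouds) - 1):
        T = pairwise_colored_icp(clouds[i], clouds[i + 1], voxel)
        odometry = T @ odometry
        graph.nodes.append(reg.PoseGraphNode(np.linalg.inv(odometry)))
        info = reg.get_information_matrix_from_point_clouds(
            clouds[i], clouds[i + 1], voxel, T)
        graph.edges.append(reg.PoseGraphEdge(i, i + 1, T, info, uncertain=False))
    reg.global_optimization(
        graph,
        reg.GlobalOptimizationLevenbergMarquardt(),
        reg.GlobalOptimizationConvergenceCriteria(),
        reg.GlobalOptimizationOption(max_correspondence_distance=voxel))
    return graph
```

The optimized node poses can then replace the hand–eye-calibrated camera poses when fusing the per-frame depth maps into one cloud.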
Figure 1. Schematic overview of the 3D reconstruction method stereoscopy. It shows the surface to be reconstructed, with point P in blue, the laparoscope tip with two image sensors generating the estimated depth (black arrow), and the estimated camera positions (x marks and dotted line).
Figure 2. Schematic overview of the experimental setup. The TipCam Rubina 1S 30° is held by the UR5 CB3 robotic arm (Universal Robots A/S, Odense, Denmark). The video laparoscope is equipped with a stereo camera system with chip-on-the-tip technology, which is angled at 30° and has an 80° FOV.
Figure 3. Visualization of the architecture of the robot-based 3D reconstruction procedure.
Figure 4. Photograph of pig organs (left) and screenshot of the reconstructed point cloud created by our approach (right).
Figure 5. Screenshot of a section of the reconstructed point cloud in front view (left) and side view (right).
Figure 6. Six example images from the examined dataset (always the left images) (left) and the corresponding depth maps with NaN values in white, closer objects in more yellowish tones, and objects further away in more blueish colors (right).
Figure 7. Screenshots of the pig organ point clouds (left) and an excerpt with a focus on the gallbladder (right) created by three different approaches: camera position estimation only by robot kinematics (Rob only) (top), by robot kinematics + ICP (Rob + ICP) (middle), and by robot kinematics + ICP + pose graphs (Rob + ICP + pose graph) (bottom).
Figure 8. Screenshot of the ground truth point cloud (left) and the point cloud reconstructed by our approach (middle) with markers A0–A7 and R0–R7. The markers are used for point cloud alignment to compute the reconstruction error as RMSE. Screenshot of the overlaid ground truth in black and the reconstructed point cloud in colors to compute the reconstruction error (right).
24 pages, 6629 KiB  
Article
UnDER: Unsupervised Dense Point Cloud Extraction Routine for UAV Imagery Using Deep Learning
by John Ray Bergado and Francesco Nex
Remote Sens. 2025, 17(1), 24; https://doi.org/10.3390/rs17010024 - 25 Dec 2024
Viewed by 479
Abstract
Extraction of dense 3D geographic information from ultra-high-resolution unmanned aerial vehicle (UAV) imagery unlocks a great number of mapping and monitoring applications. This is facilitated by a step called dense image matching, which tries to find pixels corresponding to the same object within overlapping images captured by the UAV from different locations. Recent developments in deep learning utilize deep convolutional networks to perform this dense pixel correspondence task. A common theme in these developments is to train the network in a supervised setting using available dense 3D reference datasets. However, in this work we propose a novel unsupervised dense point cloud extraction routine for UAV imagery, called UnDER. We propose a novel disparity-shifting procedure to enable the use of a stereo matching network pretrained on an entirely different typology of image data in the disparity-estimation step of UnDER. Unlike previously proposed disparity-shifting techniques for forming cost volumes, the goal of our procedure was to address the domain shift between the images that the network was pretrained on and the UAV images, by using prior information from the UAV image acquisition. We also developed a procedure for occlusion masking based on disparity consistency checking that uses the disparity image space rather than the object space proposed in a standard 3D reconstruction routine for UAV data. Our benchmarking results demonstrated significant improvements in quantitative performance, reducing the mean cloud-to-cloud distance by approximately 1.8 times the ground sampling distance (GSD) compared to other methods.
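A disparity-space consistency check of the kind the abstract refers to compares the disparity a pixel receives when the left image is the base with the disparity found at the corresponding location when the right image is the base; pixels where the two disagree are flagged as occluded. The sketch below is a generic left-right check in NumPy, with the threshold and sign convention as illustrative assumptions rather than the UnDER settings.

```python
import numpy as np

def occlusion_mask(disp_left: np.ndarray, disp_right: np.ndarray,
                   eps: float = 1.0) -> np.ndarray:
    """Return True where the left disparity is consistent with the right map."""
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    # Where does each left pixel land in the right image? (x_right = x_left - d)
    x_in_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    disp_right_sampled = np.take_along_axis(disp_right, x_in_right, axis=1)
    # Consistent if the two disparity estimates agree within eps pixels.
    return np.abs(disp_left - disp_right_sampled) <= eps
```

Inconsistent pixels are simply excluded from triangulation, so no depth is hallucinated behind occluders.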
Figure 1. An overview of the proposed UnDER framework consisting of three main steps: image rectification, disparity estimation, and triangulation. UnDER accepts the following as input: undistorted UAV image pairs, camera interior and exterior orientation parameters, and a disparity estimation network. UnDER produces, as a final output, a dense point cloud corresponding to the overlapping area of the image pairs.
Figure 2. An overview of the parallax attention stereo matching network used in the disparity estimation step of UnDER.
Figure 3. Comparison of self-attention and parallax attention. The similarity of the selected pixel (green) to other pixels (in different colors) is measured in the same feature map (self-attention), or in a feature map extracted from a paired right image (parallax attention).
Figure 4. Reference figure for defining disparity shifting. It shows the image planes of a stereo pair, a basis depth for deriving the disparity shift, the projection centers of the two cameras, the image points of the left principal point in both image planes, the corresponding object point lying on the basis depth, and the disparity of the left principal point.
Figure 5. Reference figure for the disparity consistency check. It shows how the occlusion mask is calculated by comparing output disparity maps obtained by switching the base image in the image pairs. Images I′ and I″ are correspondingly captured at two different locations of the camera projection center, Z′ and Z″, and M is the output mask.
Figure 6. Dataset-1 of the UseGeo dataset: full extent of the dataset, a sample undistorted image, and a corresponding subset of the reference LiDAR point cloud (left to right). The area of the sample image is located in the yellow box annotated on the extent of Dataset-1.
Figure 7. The UAV-Nunspeet dataset: full extent of the dataset, a sample undistorted image, and a corresponding subset of the point cloud derived from Pix4D. The area of the sample image is located in the yellow box annotated on the extent of the dataset.
Figure 8. Subset of the UAV-Zeche-Zollern dataset: the extent of the subset and the corresponding reference Pix4D point cloud.
Figure 9. Plot showing the effect of varying the disparity shift ratio (δ) values used in the disparity-estimation step of the point cloud extraction routine. Each solid curve corresponds to a different δ value. The horizontal axis shows the base images used in each multi-stereo pair. The left vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. The dashed curve shows the mean baseline length of the image pairs used in the multi-stereo. The right vertical axis provides the range of values of the mean baseline length.
Figure 10. Plot showing the effect of varying the disparity difference threshold (ε) values used in the occlusion-masking step of the point cloud extraction routine. Each curve corresponds to a different ε value. The horizontal axis shows the base images used in each multi-stereo pair. The vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. A zoomed-in portion of the graph is included to further highlight the differences in the setups with increasing ε.
Figure 11. Plot showing the effect of using a multi-stereo setup compared to a single-stereo setup in the triangulation step of the point cloud extraction routine. The first solid curve corresponds to the single-stereo setup, while the second solid curve corresponds to the multi-stereo setup. The horizontal axis shows the base images used in each single-stereo or multi-stereo pair. The left vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. The dashed curve shows the mean absolute difference in κ values of the images used in each single-stereo and multi-stereo pair. The right vertical axis displays the range of the mean differences in κ angles.
Figure 12. A subset of the UseGeo Dataset-1 showing the UseGeo DIM point cloud and the mean cloud-to-cloud (C2C) distances of UnDER-P and UnDER-FN+FPCfilter (left to right) with respect to the reference LiDAR point cloud. The bottom row shows a zoomed-in portion of the subset from the top row, indicated by the yellow box. All C2C distances greater than 0.1 m are displayed in red, all C2C distances less than 0.02 m are displayed in blue, and everything in between is displayed in a gradient of green to yellow.
Figure 13. Histogram of mean C2C distance values of UseGeo DIM, UnDER-P, and UnDER-FN+FPCfilter. Values beyond 0.5 m were truncated for better visualization.
22 pages, 6639 KiB  
Article
Reliable Disparity Estimation Using Multiocular Vision with Adjustable Baseline
by Victor H. Diaz-Ramirez, Martin Gonzalez-Ruiz, Rigoberto Juarez-Salazar and Miguel Cazorla
Sensors 2025, 25(1), 21; https://doi.org/10.3390/s25010021 - 24 Dec 2024
Viewed by 516
Abstract
Accurate estimation of three-dimensional (3D) information from captured images is essential in numerous computer vision applications. Although binocular stereo vision has been extensively investigated for this task, its reliability is conditioned by the baseline between cameras. A larger baseline improves the resolution of disparity estimation but increases the probability of matching errors. This research presents a reliable method for disparity estimation through progressive baseline increases in multiocular vision. First, a robust rectification method for multiocular images is introduced, satisfying epipolar constraints and minimizing induced distortion. This method can improve rectification error by 25% for binocular images and 80% for multiocular images compared to well-known existing methods. Next, a dense disparity map is estimated by stereo matching from the rectified images with the shortest baseline. Afterwards, the disparity map for the subsequent images with an extended baseline is estimated within a short optimized interval, minimizing the probability of matching errors and further error propagation. This process is iterated until the disparity map for the images with the longest baseline is obtained. The proposed method increases disparity estimation accuracy by 20% for multiocular images compared to a similar existing method. The proposed approach enables accurate scene characterization and spatial point computation from disparity maps with improved resolution. The effectiveness of the proposed method is verified through exhaustive evaluations using well-known multiocular image datasets and physical scenes, achieving superior performance over similar existing methods in terms of objective measures.
(This article belongs to the Collection Robotics and 3D Computer Vision)
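The core idea of progressively extending the baseline can be illustrated with off-the-shelf semi-global matching: the disparity from the shortest-baseline pair predicts, up to the baseline ratio, the disparity expected at a longer baseline, so the search for the longer pair can be confined to a narrow interval. The sketch below shows only this prediction step; the file names, baseline ratio, and search margin are assumptions, and the paper's rectification and interval optimization are not reproduced here.

```python
import cv2
import numpy as np

def sgbm(left, right, num_disp=128):
    matcher = cv2.StereoSGBM_create(minDisparity=0,
                                    numDisparities=num_disp,
                                    blockSize=5)
    # OpenCV returns fixed-point disparities scaled by 16.
    return matcher.compute(left, right).astype(np.float32) / 16.0

left, mid, _right = (cv2.imread(p, cv2.IMREAD_GRAYSCALE)
                     for p in ("cam1.png", "cam2.png", "cam3.png"))  # placeholders

d_short = sgbm(left, mid)                 # shortest baseline: safe to search widely
valid = d_short > 0                       # unmatched pixels come back negative
baseline_ratio = 2.0                      # long baseline / short baseline (assumed known)
d_pred = baseline_ratio * np.where(valid, d_short, np.nan)
margin = 8                                # pixels of slack around the prediction
# A full implementation would restrict long-baseline matching to
# [d_pred - margin, d_pred + margin] per pixel; here we only report its extent.
print("predicted long-baseline disparity range:",
      np.nanmin(d_pred) - margin, "to", np.nanmax(d_pred) + margin)
```

Bounding the search this way is what keeps the matching-error probability low even as the baseline, and hence the disparity resolution, grows.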
Figure 1. Optical setup of a multiocular vision system.
Figure 2. Block diagram of the proposed PSO-based method for multiocular image rectification.
Figure 3. Diagram of the proposed method for disparity estimation with an adjustable baseline.
Figure 4. Stereo image rectification results. (a) Unrectified test images. Rectified images obtained using: (b) Fusiello et al. [42], (c) Juarez-Salazar et al. [27], (d) DSR [41], and (e) the proposed method.
Figure 5. Constructed laboratory platform for experiments. (a) Frontal view of the multiocular camera. (b) Side view of the multiocular camera. (c) Test scene captured by the experimental multiocular platform.
Figure 6. Multiocular image rectification results from a real scene captured with the experimental platform shown in Figure 5. (a) Unrectified input images. Rectified images obtained using: (b) the method of Yang et al. [44] and (c) the proposed method.
Figure 7. Disparity estimation results for multiocular images obtained with the proposed approach and the method by Li et al. [15]. (a) Reference image of the input multiocular image set. (b) Ground truth disparity map of the reference image with the largest baseline. Estimated disparity maps obtained with the proposed method for: (c) cameras 1 and 5; (d) cameras 1, 3, and 5; (e) cameras 1, 2, 3, and 5; (f) all images. (g) Estimated disparity obtained with the method by Li et al. [15].
Figure 8. Three-dimensional reconstruction results obtained with the proposed approach in real scenes captured with the experimental platform shown in Figure 5. (a) Reference images of the captured scenes. (b) Estimated disparity map obtained with the proposed approach between cameras 1 and 4. (c–e) Different perspective views of the reconstructed three-dimensional scenes.
Figure 9. Reprojection errors obtained with the intrinsic parameters estimated using the calibration methods: (a) DLT, (b) distorted pinhole, and (c) Zhang's method.
26 pages, 13651 KiB  
Article
Dense In Situ Underwater 3D Reconstruction by Aggregation of Successive Partial Local Clouds
by Loïca Avanthey and Laurent Beaudoin
Remote Sens. 2024, 16(24), 4737; https://doi.org/10.3390/rs16244737 - 19 Dec 2024
Viewed by 557
Abstract
Assessing the completeness of an underwater 3D reconstruction on-site is crucial, as it allows acquisitions to be rescheduled to capture missing data during the mission, avoiding the additional costs of a subsequent mission. This assessment needs to rely on a dense point cloud, since a sparse cloud lacks detail and a triangulated model can hide gaps. The challenge is to generate a dense cloud with field-deployable tools. Traditional dense reconstruction methods can take several dozen hours on low-capacity systems like laptops or embedded units. To speed up this process, we propose building the dense cloud incrementally within an SfM framework while incorporating data redundancy management to eliminate recalculations and filter already-processed data. The method evaluates overlap area limits and computes depths by propagating the matching around SeaPoints—the keypoints we designed for identifying reliable areas regardless of the quality of the processed underwater images. This produces partial local dense clouds, which are aggregated into a common frame via the SfM pipeline to produce the global dense cloud. Compared to the production of complete dense local clouds, this approach reduces the computation time by about 70% while maintaining a comparable final density. The underlying prospect of this work is to enable real-time completeness estimation directly on board, allowing for the dynamic re-planning of the acquisition trajectory.
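The SeaPoint detector described in the figures below builds on a Harris response map, radius-based non-maximum suppression, and a threshold tuned so that the retained point count lands inside a target interval. The sketch that follows reproduces that outline with stock OpenCV/NumPy calls, but replaces the authors' recursive cumulative-histogram refinement with a direct sort of the responses; the radius, target interval, and Harris parameters are assumptions.

```python
import cv2
import numpy as np

def detect_reliable_points(gray, radius=5, target=(2500, 3000)):
    # Harris response map over the whole image.
    harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    # Non-maximum suppression: a pixel survives only if it is the maximum of
    # its (2*radius+1)^2 neighborhood.
    kernel = np.ones((2 * radius + 1, 2 * radius + 1), np.uint8)
    maxima = (harris == cv2.dilate(harris, kernel)) & (harris > 0)
    responses = np.sort(harris[maxima])[::-1]          # strongest first
    lo, hi = target
    keep = int(np.clip(len(responses), lo, hi))        # aim inside [lo, hi]
    threshold = responses[min(keep, len(responses)) - 1]
    ys, xs = np.nonzero(maxima & (harris >= threshold))
    return list(zip(xs.tolist(), ys.tolist()))

# Example usage on a single frame (file name is a placeholder).
points = detect_reliable_points(cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE))
print(len(points), "reliable points")
```

Keeping the point count inside a fixed interval is what makes the downstream propagation and overlap estimation behave consistently across images of very different quality.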
Figure 1. Overview of the global dense point clouds constructed from images of the four datasets: the Mermaid dataset (top left), the Lost Freediver Rock dataset (top right), the Flying Fortress dataset (bottom left), and the Landingship Wreck dataset (bottom right).
Figure 2. Workflow diagram of a standard incremental SfM framework, with options for incremental dense cloud generation and flexible application of loop closure detection and bundle adjustment based on criteria such as sparsity or exhaustiveness.
Figure 3. Diagram of the algorithm for generating a partial local dense point cloud from an image pair selected in the incremental flow to optimize spatial sampling and depth resolution. Its main steps include detecting reliable areas using SeaPoints, assessing the overlap rate and identifying the overlap area based on prior information, and performing dense matching by propagating matches in the vicinity of SeaPoints outside the overlap area to obtain a partial disparity map. The resulting dense points can then be reprojected into the 3D frame, as with sparse points within the SfM framework, to form a partial local dense cloud that is subsequently aligned with previous local clouds.
Figure 4. Diagram of the SeaPoint detector algorithm. To begin, we construct a map containing the Harris measurements for each pixel. Next, non-maximum suppression (NMS) is applied to the map to retain only the local maxima within a specified radius. The threshold to select the SeaPoints among these values is then determined through an analysis of the cumulative histogram of the map, aiming to achieve a given range of points. If the currently analyzed histogram bin lacks sufficient granularity (too many values in one bin), the range of values is expanded, generating a new histogram, and the analysis continues recursively until convergence is achieved. The target interval, which indicates the desired minimum and maximum number of points, must be sufficiently wide to ensure convergence. A target interval with a range of a few hundred points typically guarantees convergence across a wide range of image types. Usually we look for several thousand points on 10 MP images.
Figure 5. Example of the histogram during the SeaPoint detector process for the image of the Lost Freediver Rock dataset in Figure 6. (Left) the first histogram and (center) a zoom on this histogram. There are not enough points accumulated when arriving at bin 98 (2202 points) of the first histogram to be consistent with the minimum of the target interval (2500 points minimum), but taking bin 97 (3793 points) would exceed the maximum of the target interval (3000 points maximum). We therefore re-explode the contents of bin 97 into a new histogram by a recursive call (right). The algorithm finally converges on 2502 points at bin 186 of the second histogram. Here, the histograms were calculated on 256 bins.
Figure 6. SeaPoint detector examples with a target interval of [2500, 3000]. (Top left): 2502 SeaPoints found on an image of the Lost Freediver Rock dataset in two recursive rounds (threshold adjusted to 38.32% of the max value). (Top right): 2930 SeaPoints found on an image of the Flying Fortress dataset in one round (threshold adjusted to 90.20% of the max value). (Bottom left): 2500 SeaPoints found on an image of the Mermaid statue in two recursive rounds (threshold adjusted to 62.31% of the max value). (Bottom right): 2504 SeaPoints found on an image of the LandingShip Wreck dataset in one round (threshold adjusted to 47.45% of the max value).
Figure 7. In green and yellow: visualization of the intrapair matchings used to form the seeds for densifying the matching through propagation and generating the partial local dense clouds (4). In blue: visualization of the interpair matching used to evaluate the overlap rate for selecting the next pair (1), to estimate the relative pose for registering the new local cloud (2), and to automatically exclude the overlap area from the 3D reconstruction of the new local cloud (3).
Figure 8. Flowchart illustrating the local statistical filtering applied to SeaPoint directional vectors for distinguishing inliers from outliers: the consistency score of each vector is incremented for each neighboring vector with a similar norm and direction. Vectors with low consistency scores are classified as outliers and are removed, resulting in a refined, robust list of matched SeaPoints.
Figure 9. Directional vector flow is a representation of the matching within a single view. A local statistical filtering process, based on neighborhood coherence, is applied: neighboring vectors exhibiting similar norms and directions contribute to the assessment of the studied vector. The greater the number of votes, the more coherent the vector is deemed. The most locally coherent vectors are kept as inliers. In this image, the resulting inliers are represented in blue, while those identified as outliers are marked in red. The latter have a different direction and/or norm from their neighbors (or not enough neighbors to ensure this).
Figure 10. Identification of the overlapping area thanks to the establishment of an area of influence around the interpair SeaPoints. The blue circles indicate the influence areas around the interpair matches on a view V (left) and on its subsequent view V + 2 (right), delimiting the overlap area between the two views.
Figure 11. Mask on V + 2 given the areas of influence (see Figure 10) calculated between V and V + 2 using the interpair SeaPoints. The sum of all white pixels in the mask is used to estimate the overlap rate between V and V + 2 with regard to the total number of pixels in V + 2.
Figure 12. Diagram of the algorithm used to densify the matching by propagation around the seeds. In the first iteration, we analyze the neighborhood of a list of seeds (the initial seeds are all SeaPoints). After all the best possible matches have been selected, and if they are not too far from their initial seed (this distance can be approximated by the number of iterations, for example), they are added to a new list of seeds. This new list is studied in the next iteration, and so on, until no more seeds are added to the next list (the points did not match, are all already matched with the best score, or are too far from the initial SeaPoint).
Figure 13. Partial reconstruction: on the left, the disparity map of the first pair with the SeaPoints in blue, and on the right, the disparity map obtained for a normal propagation of the second pair (red + green) as well as the partial disparity map (green only) taking into account the exclusion of the overlap area.
Figure 14. Diagram of the algorithm that reconstructs a partial local cloud by propagating the matching around the seeds while automatically excluding the overlapping area (and areas without reliable information).
Figure 15. Illustration of two types of occlusion: on the left, intrapair occlusion areas for which local seeds (circled in red in the black areas) have not spread; on the right, an interpair occlusion area (circled in red) for which there is an absence of SeaPoints matched during the interpair matching (no blue circles).
Figure 16. Modified diagram of the algorithm that reconstructs a partial local cloud by propagating the matching around the seeds while automatically excluding the overlapping area, taking occlusion problems into account (compared to Figure 14, the changes are framed in red).
Figure 17. On the left, the disparity map obtained after a partial propagation excluding the overlap area entirely, and on the right, the disparity map obtained after a partial propagation taking into account intrapair and interpair occlusions.
Figure 18. From left to right: example results of intermatching ORB points, SIFT points, and SeaPoints, each using approximately 3000 keypoints in both images of the interpair (top row), along with their corresponding masks showing influence areas applied around the matches to segment the reliable regions of the overlap area (bottom row).
Figure 19. At the top, the two successive local clouds reconstructed classically (total reconstruction); in the center, the two successive local clouds, the second of which is partially reconstructed following our method. Below, the fusion of the two classic local clouds on the left and the fusion of the two partial local clouds on the right.
20 pages, 15268 KiB  
Article
Automatic Reading and Reporting Weather Information from Surface Fax Charts for Ships Sailing in Actual Northern Pacific and Atlantic Oceans
by Jun Jian, Yingxiang Zhang, Ke Xu and Peter J. Webster
J. Mar. Sci. Eng. 2024, 12(11), 2096; https://doi.org/10.3390/jmse12112096 - 19 Nov 2024
Viewed by 720
Abstract
This study aims to improve the intelligence, efficiency, and accuracy of ship safety and security systems by contributing to the development of marine weather forecasting. Accurate and prompt recognition of weather fax charts is very important for navigation safety. This study employed several artificial intelligence (AI) methods, including a vectorization approach and a target recognition algorithm, to automatically detect severe weather information from Japanese and US weather charts. This enabled the expansion of an existing auto-response marine forecasting system's applications toward the North Pacific and Atlantic Oceans, thus enhancing decision-making capabilities and response measures for ships sailing in actual seas. OpenCV image processing and the YOLOv5s/YOLOv8n algorithms were utilized to perform template matching and to locate warning symbols and weather reports on surface weather charts. After these improvements, the average accuracy of the model increased from 0.920 to 0.928, and the detection time for a single image was as low as 1.2 ms. Additionally, OCR technology was applied to extract text from the weather reports and to highlight the marine areas where dense fog and strong wind conditions are likely to occur. Finally, field tests confirmed that this automatic, intelligent system can assist the navigator within 2–3 min and thus greatly enhance navigation safety in specific areas along sailing routes with minor text-based communication costs.
(This article belongs to the Special Issue Ship Performance in Actual Seas)
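One step the abstract mentions, pulling the written weather report off the chart with OCR, can be sketched with OpenCV and pytesseract as below. The crop coordinates and the Otsu binarization are illustrative assumptions; the paper's symbol detection itself relies on template matching and the YOLO models rather than OCR.

```python
import cv2
import pytesseract

# Crop the region of the fax chart assumed to hold the written weather report,
# binarize it, and run OCR on the crop.
chart = cv2.imread("surface_chart.png", cv2.IMREAD_GRAYSCALE)   # placeholder file
report_box = chart[40:220, 900:1400]                             # assumed report location
_, binary = cv2.threshold(report_box, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
text = pytesseract.image_to_string(binary)
print(text)
```

In the full system, the recognized text and detected warning symbols feed the auto-response messages sent to ships in the affected areas.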
Figure 1. (a) Weather report and warning symbols in a JMA surface weather fax chart retrieved from imocwx.com (accessed on 14 February 2022). (b) Warning symbols and wind barbs in a 48 h surface forecast chart issued by the US National Weather Service at 1758 UTC on 13 February 2022.
Figure 2. JMA (a) original surface fax weather chart, (b) averaged base chart, (c) after binarization, (d) difference between the original (a) and the base chart (c), resulting in a pure weather map.
Figure 3. Flow chart of the auto-warning system for JMA charts.
Figure 4. YOLOv5s-CBAM(SE) network structure diagram; the parts related to CBAM(SE) are marked in gray.
Figure 5. YOLOv8n model structure diagram.
Figure 6. Comparison of weather briefing text recognition results.
Figure 7. Recognition of the warning symbols "hPa", "GW", "SW", and "FOG[W]" from the chart in Figure 2d.
Figure 8. Comparison of detection results of wind barbs (interception).
Figure 9. Training process visualization: (a) training loss and (b) mAP values for the original and improved YOLOv5s.
Figure 10. (a) JMA charts with warning symbols detected; (b) JMA charts with warning areas colored, red and yellow for wind speeds greater than 50 kts and 35–49 kts, green for visibility < 0.3 nm; (c) US charts with wind levels colored.
Figure 11. Field tests of the auto-warning system: US case (upper and middle) and JMA case (bottom).
24 pages, 14942 KiB  
Article
The Ground-Penetrating Radar Image Matching Method Based on Central Dense Structure Context Features
by Jie Xu, Qifeng Lai, Dongyan Wei, Xinchun Ji, Ge Shen and Hong Yuan
Remote Sens. 2024, 16(22), 4291; https://doi.org/10.3390/rs16224291 - 18 Nov 2024
Cited by 1 | Viewed by 763
Abstract
Subsurface structural distribution can be detected using Ground-Penetrating Radar (GPR). The distribution can be considered as road fingerprints for vehicle positioning. Similar to the principle of visual image matching for localization, the position coordinates of the vehicle can be calculated by matching real-time GPR images with pre-constructed reference GPR images. However, GPR images, due to their low resolution, cannot extract well-defined geometric features such as corners and lines. Thus, traditional visual image processing algorithms perform inadequately when applied to GPR image matching. To address this issue, this paper innovatively proposes a GPR image matching and localization method based on a novel feature descriptor, termed as central dense structure context (CDSC) features. The algorithm utilizes the strip-like elements in GPR images to improve the accuracy of GPR image matching. First, a CDSC feature descriptor is designed. By applying threshold segmentation and extremum point extraction to the GPR image, stratified strip-like elements and pseudo-corner points are obtained. The pseudo-corner points are treated as the centers, and the surrounding strip-like elements are described in context to form the GPR feature descriptors. Then, based on the feature description method, feature descriptors for both the real-time image and the reference image are calculated separately. By searching for the nearest matching point pairs and removing erroneous pairs, GPR image matching and localization are achieved. The proposed algorithm was evaluated on datasets collected from urban roads and railway tracks, achieving localization errors of 0.06 m (RMSE) and 1.22 m (RMSE), respectively. Compared to the traditional Speeded Up Robust Features (SURF) visual image matching algorithm, localization errors were reduced by 86.6% and 95.7% in urban road and railway track scenarios, respectively.
(This article belongs to the Special Issue Advanced Ground-Penetrating Radar (GPR) Technologies and Applications)
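The CDSC descriptor builds a context of the strip-like elements around a pseudo-corner point, which is conceptually close to the classical shape-context idea of a log-polar histogram around a center. The sketch below implements that generic log-polar histogram; the bin counts and radius are assumptions, and it is not the paper's exact CDSC construction.

```python
import numpy as np

def log_polar_descriptor(center, points, n_r=5, n_theta=12, r_max=100.0):
    """center: (x, y); points: (N, 2) coordinates of surrounding elements."""
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])            # in [-pi, pi]
    keep = (r > 1e-6) & (r < r_max)
    # Log-radial bins and uniform angular bins, as in classic shape context.
    r_bin = np.clip((np.log(r[keep]) / np.log(r_max) * n_r).astype(int), 0, n_r - 1)
    t_bin = ((theta[keep] + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin, t_bin), 1)
    return (hist / max(hist.sum(), 1)).ravel()      # normalized descriptor
```

Descriptors computed this way for the real-time and reference images can be compared with a nearest-neighbour search, after which erroneous pairs are rejected before estimating the vehicle position.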
Graphical abstract
Figure 1. Vehicle-borne GPR detects underground information.
Figure 2. Three data types of ground-penetrating radar. (a) A-scan: the single-channel data signal acquired by GPR. (b) B-scan: data acquired by the GPR antenna through continuous scanning in the direction of movement. (c) C-scan: data composed of multiple B-scans.
Figure 3. Stripe and spot features in GPR images.
Figure 4. Data collected from the same road segment at different times and under different weather conditions using GPR. Subfigures (a,b,e,f) were collected on sunny days, and subfigures (c,d) were collected on rainy days.
Figure 5. Algorithm flow.
Figure 6. Comparison of feature point extraction results. (a) The SIFT algorithm used to extract feature points in GPR images. (b) The ORB algorithm used to extract feature points in GPR images. (c) The SURF algorithm used to extract feature points in GPR images.
Figure 7. Shape context algorithm feature extraction.
Figure 8. The steps of CDSC feature extraction. Descriptors are obtained by leveraging the stripes in the binarized image surrounding the pseudo-corner points. Subfigure (a) is the filtered image, subfigure (b) is the binarized image, and subfigure (c) is the central dense structure context feature.
Figure 9. CDSC features extracted from GPR images collected twice at the same location and compared. The nine feature points on each image on the left correspond to the nine features on the right.
Figure 10. Image matching schematic diagram.
Figure 11. Comparison of GPR images before and after preprocessing: (a) the original output image, (b) the filtered image.
Figure 12. Urban test trajectory and equipment setup. (a) Urban test trajectory; the yellow line represents the test trajectory. (b) The equipment setup for the road experiment.
Figure 13. Train test trajectory and equipment setup. (a) Train test trajectory, with a total length of approximately 6.7 km; the yellow line represents the test trajectory. (b) The equipment setup for the railway experiment.
Figure 14. Railway test trajectory positioning error.
Figure 15. Comparison of matching results of different methods in the railway test trajectory. (a) Matching GPR images with strong interference; (b) matching GPR images without interference.
Figure 16. Railway test trajectory positioning error CDF.
Figure 17. Urban test trajectory positioning error.
Figure 18. Comparison of matching results of different methods in the urban test trajectory. (a) Matching GPR images with strong interference; (b) matching GPR images without interference.
Figure 19. Urban test trajectory positioning error CDF.
18 pages, 2990 KiB  
Article
A GGCM-E Based Semantic Filter and Its Application in VSLAM Systems
by Yuanjie Li, Chunyan Shao and Jiaming Wang
Electronics 2024, 13(22), 4487; https://doi.org/10.3390/electronics13224487 - 15 Nov 2024
Viewed by 481
Abstract
Image matching-based visual simultaneous localization and mapping (vSLAM) extracts low-level pixel features to reconstruct camera trajectories and maps through the epipolar geometry method. However, it fails to achieve correct trajectories and mapping when there are low-quality feature correspondences in several challenging environments. Although the RANSAC-based framework can produce better results, it is computationally inefficient and unstable in the presence of a large number of outliers. In our previous work, a Faster R-CNN learning-based semantic filter was proposed to explore the semantic information of inliers and remove low-quality correspondences, helping vSLAM localize accurately. However, that semantic filter learning method generalizes with low precision to low-level and dense texture-rich scenes, leaving the semantic filter-based vSLAM unstable and with poor geometry estimation. In this paper, a GGCM-E-based semantic filter using YOLOv8 is proposed to address these problems. Firstly, semantic patches of images are collected from the KITTI dataset, the TUM dataset provided by the Technical University of Munich, and real outdoor scenes. Secondly, the semantic patches are classified by our proposed GGCM-E descriptors to obtain the YOLOv8 neural network training dataset. Finally, several semantic filters for filtering low-level and dense texture-rich scenes are generated and combined into the ORB-SLAM3 system. Extensive experiments show that the semantic filter can detect and classify semantic levels of different scenes effectively, filtering low-level semantic scenes to improve the quality of correspondences and thus achieving accurate and robust trajectory reconstruction and mapping. For the challenging autonomous driving benchmark and real environments, the vSLAM system with the GGCM-E-based semantic filter demonstrates its superiority in reducing the 3D position error, with the absolute trajectory error reduced by up to approximately 17.44%, showing its promise and good generalization.
(This article belongs to the Special Issue Application of Artificial Intelligence in Robotics)
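In spirit, the semantic filter removes feature correspondences that fall in image regions a detector flags as unreliable. The sketch below applies that idea with stock components: ORB keypoints and an ultralytics YOLOv8 detector whose weight file, class semantics, and rejection rule are all hypothetical placeholders (the paper trains its own GGCM-E-labelled detector and integrates the filter into ORB-SLAM3).

```python
import cv2
from ultralytics import YOLO

# Hypothetical detector trained to flag low-level / repetitively textured regions.
model = YOLO("semantic_filter.pt")
orb = cv2.ORB_create(nfeatures=2000)

frame = cv2.imread("frame.png")                     # placeholder image
keypoints, descriptors = orb.detectAndCompute(frame, None)

# Detected boxes marking regions whose keypoints should be discarded.
boxes = model(frame)[0].boxes.xyxy.cpu().numpy()    # (N, 4) as x1, y1, x2, y2

def inside_any(pt, boxes):
    x, y = pt
    return any(x1 <= x <= x2 and y1 <= y <= y2 for x1, y1, x2, y2 in boxes)

kept = [(kp, desc) for kp, desc in zip(keypoints, descriptors)
        if not inside_any(kp.pt, boxes)]
print(f"kept {len(kept)} of {len(keypoints)} keypoints")
```

Only the surviving keypoints are handed to the matcher and pose estimator, which is what improves correspondence quality in texture-poor or repetitively textured scenes.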
Figure 1. ORB-SLAM3 framework with the proposed semantic filter module.
Figure 2. Framework of the proposed semantic filter approach.
Figure 3. Computation of GGCM-E features.
Figure 4. Semantic filtering on the KITTI frame.
Figure 5. Semantic filtering on our captured outdoor frame.
Figure 6. The trajectory of KITTI07 with respect to the ground truth using the GGCM-E semantic filter.
Figure 7. Comparison of trajectories between the proposed method and ground truth in the KITTI dataset.
Figure 8. Comparison of APEs with respect to ground truth of ORB-SLAM3 and the semantic filter.
Figure 9. Dense texture-rich sequences in the TUM dataset (DTR sequences).
Figure 10. Comparison of camera trajectories in DTR sequences.
Figure 11. Comparison of the trajectory with respect to the ground truth of DynaSLAM and GGCM-E+DynaSLAM on KITTI00 sequences.
Figure 12. Comparison of the APEs of semantic filter-based Structure-SLAM, LDSO and DynaSLAM on KITTI07 sequences.
21 pages, 12827 KiB  
Article
Research on the Registration of Aerial Images of Cyclobalanopsis Natural Forest Based on Optimized Fast Sample Consensus Point Matching with SIFT Features
by Peng Wu, Hailong Liu, Xiaomei Yi, Lufeng Mo, Guoying Wang and Shuai Ma
Forests 2024, 15(11), 1908; https://doi.org/10.3390/f15111908 - 29 Oct 2024
Viewed by 866
Abstract
The effective management and conservation of forest resources hinge on accurate monitoring. Nonetheless, individual remote-sensing images captured by low-altitude unmanned aerial vehicles (UAVs) fail to encapsulate the entirety of a forest's characteristics. The application of image-stitching technology to high-resolution drone imagery facilitates a prompt evaluation of forest resources, encompassing quantity, quality, and spatial distribution. This study introduces an improved SIFT algorithm designed to tackle the challenges of low matching rates and prolonged registration times encountered with forest images characterized by dense textures. By implementing the SIFT-OCT (SIFT omitting the initial scale space) approach, the algorithm bypasses the initial scale space, thereby reducing the number of ineffective feature points and augmenting processing efficiency. To bolster the SIFT algorithm's resilience against rotation and illumination variations, and to furnish supplementary information for registration even when fewer valid feature points are available, a gradient location and orientation histogram (GLOH) descriptor is integrated. For feature matching, the more computationally efficient Manhattan distance is utilized to filter feature points, which further optimizes efficiency. The fast sample consensus (FSC) algorithm is then applied to remove mismatched point pairs, thus refining registration accuracy. This research also investigates the influence of vegetation coverage and image overlap rates on the algorithm's efficacy, using five sets of Cyclobalanopsis natural forest images. Experimental outcomes reveal that the proposed method significantly reduces registration time by an average of 3.66 times compared to that of SIFT, 1.71 times compared to that of SIFT-OCT, 5.67 times compared to that of PSO-SIFT, and 3.42 times compared to that of KAZE, demonstrating its superior performance.
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
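For orientation, the snippet below strings together off-the-shelf counterparts of the matching chain the abstract describes: SIFT keypoints, brute-force matching under the Manhattan (L1) distance, Lowe's ratio test, and RANSAC-based homography fitting standing in for the fast sample consensus (FSC) step. The thresholds are assumptions, and the paper's own refinements (skipping the first scale-space octave and using GLOH descriptors) are not reproduced.

```python
import cv2
import numpy as np

ref = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)      # placeholder files
mov = cv2.imread("to_register.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(ref, None)
kp2, des2 = sift.detectAndCompute(mov, None)

matcher = cv2.BFMatcher(cv2.NORM_L1)                # Manhattan distance
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]          # Lowe ratio test

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)   # stand-in for FSC
print(f"{int(inliers.sum())} inlier matches of {len(good)}")
```

The resulting homography `H` is what warps the image to be registered onto the reference frame before stitching.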
Figure 1. Study area map.
Figure 2. (a1) Dense forest reference image; (a2) dense forest image to be registered; (b1) non-dense forest reference image; and (b2) non-dense forest image to be registered.
Figure 3. (a1) Forest reference image with an overlap rate of 60% to 90%; (a2) forest image to be registered with an overlap rate of 60% to 90%; (b1) forest reference image with an overlap rate of 30% to 60%; (b2) forest image to be registered with an overlap rate of less than 30%; (c1) dense forest reference image; and (c2) dense forest image to be registered.
Figure 4. Structure diagram of the optimized SIFT algorithm based on FSC feature screening.
Figure 5. Scale-space extreme value detection.
Figure 6. GLOH descriptor feature vectors.
Figure 7. Sampling example.
Figure 8. (a1) Original dense forest reference image; (a2) original dense forest image to be registered; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 9. (a1) Original non-dense forest reference image; (a2) original non-dense forest image to be registered; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 10. (a1) Original forest reference image with an overlap rate of 60% to 90%; (a2) original forest image to be registered with an overlap rate of 60% to 90%; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 11. (a1) Original forest reference image with an overlap rate of 30% to 60%; (a2) original forest image to be registered with an overlap rate of 30% to 60%; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 12. (a1) Original forest reference image with an overlap rate below 30%; (a2) original forest image to be registered with an overlap rate below 30%; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 13. Comparison of matching accuracy of the five algorithms on the five dataset images.
Figure 14. Registration comparison of the five algorithms on the five dataset images.
Figure 15. (a) Reference image; (b) image to be registered; (c) initially screened matched point pairs; and (d) final matched point pairs.
Figure 16. (a1) Reference image feature analysis; and (a2) feature analysis of the image to be registered.
22 pages, 10007 KiB  
Article
Deep Learning-Based Emergency Rescue Positioning Technology Using Matching-Map Images
by Juil Jeon, Myungin Ji, Jungho Lee, Kyeong-Soo Han and Youngsu Cho
Remote Sens. 2024, 16(21), 4014; https://doi.org/10.3390/rs16214014 - 29 Oct 2024
Cited by 1 | Viewed by 725
Abstract
Smartphone-based location estimation technology is becoming increasingly important across various fields. Accurate location estimation plays a critical role in life-saving efforts during emergency rescue situations, where rapid response is essential. Traditional methods such as GPS often face limitations indoors or in densely built environments, where signals may be obstructed or reflected, leading to inaccuracies. Similarly, fingerprinting-based methods rely heavily on existing infrastructure and exhibit signal variability, making them less reliable in dynamic, real-world conditions. In this study, we analyzed the strengths and weaknesses of different types of wireless signal data and proposed a new deep learning-based method for location estimation that comprehensively integrates these data sources. The core of our research is the introduction of a ‘matching-map image’ conversion technique that efficiently integrates LTE, WiFi, and BLE signals. The generated matching-map images were fed into a deep learning model, enabling highly accurate and stable location estimates even in challenging emergency rescue situations. In real-world experiments, our method, utilizing multi-source data, achieved a positioning success rate of 85.27%, which meets the US FCC’s E911 standards for location accuracy and reliability across various conditions and environments. This makes the proposed approach particularly well suited for emergency applications, where both accuracy and speed are critical. Full article
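The matching-map conversion can be pictured with a small sketch: an observed LTE/WiFi/BLE scan is compared against a pre-built signal database and the per-cell similarities are rasterized into a three-channel grid image that a CNN classifies into a grid-cell label. The grid size, similarity measure, and names such as `build_matching_map` are assumptions for illustration and do not reproduce the authors' exact procedure.

```python
# Illustrative sketch (not the authors' exact procedure): rasterize the similarity
# between an observed radio scan and a pre-built signal database into a
# "matching-map" image with one channel per signal type (LTE, WiFi, BLE).
import numpy as np

def cell_similarity(observed, reference):
    """Mean absolute RSSI difference over shared transmitters, mapped to [0, 1]."""
    shared = set(observed) & set(reference)
    if not shared:
        return 0.0
    mad = np.mean([abs(observed[t] - reference[t]) for t in shared])
    return float(max(0.0, 1.0 - mad / 40.0))  # 40 dB spread assumed as normalizer

def build_matching_map(scan, base_db, grid_shape=(32, 32)):
    """scan: {'lte': {...}, 'wifi': {...}, 'ble': {...}};
    base_db: {(row, col): {'lte': {...}, 'wifi': {...}, 'ble': {...}}}."""
    img = np.zeros((*grid_shape, 3), dtype=np.float32)
    for (r, c), ref in base_db.items():
        for ch, kind in enumerate(('lte', 'wifi', 'ble')):
            img[r, c, ch] = cell_similarity(scan.get(kind, {}), ref.get(kind, {}))
    return img  # fed to a CNN classifier over grid-cell labels such as '8-4'
```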
Show Figures

Figure 1

Figure 1
<p>Collecting device for Vehicle: (<b>a</b>) 3D model; (<b>b</b>) Attached to the dashboard; (<b>c</b>) Attached to the bicycle.</p>
Full article ">Figure 2
<p>Data collection area and routes: (<b>a</b>) Seocho1-dong (urban); (<b>b</b>) Seocho2-dong (urban); (<b>c</b>) Naegok-dong (suburban); (<b>d</b>) Yeomgok-dong (suburban).</p>
Full article ">Figure 3
<p>Data collection route: Yeomgok-dong.</p>
Full article ">Figure 4
<p>Base data Grid Sample.</p>
Full article ">Figure 5
<p>LTE Matching-map Generation Process.</p>
Full article ">Figure 6
<p>Matching-map Image Generation Process.</p>
Full article ">Figure 7
<p>Matching-map image sample (Yeomgok-dong): (<b>a</b>) label ‘8-4’; (<b>b</b>) label ‘13-15’.</p>
Full article ">Figure 8
<p>Block Diagram of Deep Learning Base Positioning Process.</p>
Full article ">Figure 9
<p>Positioning Test Location: (<b>a</b>) Seocho1-dong; (<b>b</b>) Seocho2-dong; (<b>c</b>) Naegok-dong; (<b>d</b>) Yeomgok-dong.</p>
Full article ">Figure 10
<p>Positioning Test Location Photograph Samples: (<b>a</b>) Test Point 2; (<b>b</b>) Indoor Test Position for Test Point 2; (<b>c</b>) Test Point 20; (<b>d</b>) Indoor Test Position for Test Point 20.</p>
Full article ">Figure 11
<p>CDF graph of results according to positioning method.</p>
Full article ">Figure 12
<p>CDF graph of results according to regional characteristics (Fingerprint).</p>
Full article ">Figure 13
<p>CDF graph of results according to regional characteristics (Matching-map image).</p>
Full article ">Figure 14
<p>CDF graph of results according to data type.</p>
Full article ">Figure 15
<p>CDF graph of results according to the number of data.</p>
Full article ">
17 pages, 3301 KiB  
Article
Stereo and LiDAR Loosely Coupled SLAM Constrained Ground Detection
by Tian Sun, Lei Cheng, Ting Zhang, Xiaoping Yuan, Yanzheng Zhao and Yong Liu
Sensors 2024, 24(21), 6828; https://doi.org/10.3390/s24216828 - 24 Oct 2024
Viewed by 928
Abstract
In many robotic applications, creating a map is crucial, and 3D maps provide a means of estimating the positions of other objects or obstacles. Most previous research processes 3D point clouds through projection-based or voxel-based models, but both approaches have certain limitations. This paper proposes a hybrid localization and mapping method using stereo vision and LiDAR. Unlike traditional single-sensor systems, we construct a pose optimization model by matching ground information between LiDAR maps and visual images. We use stereo vision to extract ground information and fuse it with LiDAR tensor voting data to establish coplanarity constraints. Pose optimization is achieved through a graph-based optimization algorithm and a local window optimization method. The proposed method is evaluated on the KITTI dataset and compared against the ORB-SLAM3, F-LOAM, LOAM, and LeGO-LOAM methods. Additionally, we generate 3D point cloud maps for the corresponding sequences and high-definition point cloud maps of the streets in sequence 00. The experimental results demonstrate significant improvements in trajectory accuracy and robustness, enabling the construction of clear, dense 3D maps. Full article
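The ground-coplanarity idea can be sketched as a point-to-plane residual minimized over the camera pose. The paper itself uses a graph-based optimizer with local-window optimization; the SciPy least-squares formulation below, and names such as `refine_pose`, are simplified illustrative assumptions.

```python
# Minimal sketch of a ground-coplanarity constraint for pose refinement, assuming
# SciPy; the paper uses a graph-based optimizer with local-window optimization,
# so this dense least-squares form is only illustrative.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def coplanarity_residuals(pose, ground_pts, plane_n, plane_d):
    """pose = [rx, ry, rz, tx, ty, tz]; ground_pts: (N, 3) LiDAR ground points;
    (plane_n, plane_d): ground plane estimated from the stereo reconstruction."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    t = pose[3:]
    transformed = ground_pts @ R.T + t
    # Signed point-to-plane distances; zero when the LiDAR ground points lie
    # exactly on the visually estimated ground plane.
    return transformed @ plane_n + plane_d

def refine_pose(init_pose, ground_pts, plane_n, plane_d):
    sol = least_squares(coplanarity_residuals, init_pose,
                        args=(ground_pts, plane_n, plane_d), loss='huber')
    return sol.x
```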
(This article belongs to the Section Navigation and Positioning)
Show Figures

Figure 1

Figure 1
<p>Pose optimization based on ground information. <span class="html-italic">T</span>, <span class="html-italic">p</span>, and <span class="html-italic">q</span> represent the transformation matrix, points on the plane, and points off the plane, respectively.</p>
Full article ">Figure 2
<p>The stereo sensor model and the coordinate systems used [<a href="#B34-sensors-24-06828" class="html-bibr">34</a>].</p>
Full article ">Figure 3
<p>Region of interest extraction. (<b>a</b>) Left image. (<b>b</b>) Right image. (<b>c</b>) Disparity image. (<b>d</b>) v-disparity. (<b>e</b>) u-disparity. (<b>d</b>,<b>e</b>) are derived from (<b>c</b>). (<b>f</b>) Large obstacles eliminated by removing the peak values from (<b>e</b>). (<b>g</b>) v-disparity based on (<b>f</b>); the red line is the disparity profile of the ground plane. (<b>h</b>) Detected ground plane and region of interest (RoI); RoI in red box. (<b>i</b>) City 3D reconstruction; green represents the ground.</p>
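The v-disparity construction referenced in this figure can be sketched as follows; the vote threshold and the least-squares line fit stand in for the paper's profile-extraction step and are assumptions for illustration.

```python
# Sketch of the v-disparity idea: accumulate each image row's disparity values
# into a histogram; the ground plane then appears as a sloped line, fit here
# with a simple least-squares stand-in for the paper's profile extraction.
import numpy as np

def v_disparity(disparity, max_disp=128):
    """disparity: (H, W) integer disparity map; returns (H, max_disp) histogram."""
    h, _ = disparity.shape
    vmap = np.zeros((h, max_disp), dtype=np.int32)
    for v in range(h):
        row = disparity[v]
        valid = (row > 0) & (row < max_disp)
        np.add.at(vmap[v], row[valid].astype(int), 1)
    return vmap

def fit_ground_profile(vmap, min_votes=20):
    """Fit d = a*v + b through the strongest disparity bins of the rows."""
    rows, bins = np.nonzero(vmap >= min_votes)
    # Keep the largest disparity per row (nearest structure, typically the road).
    best = {}
    for v, d in zip(rows, bins):
        best[v] = max(best.get(v, 0), d)
    v = np.array(list(best.keys()), dtype=float)
    d = np.array(list(best.values()), dtype=float)
    a, b = np.polyfit(v, d, 1)
    return a, b
```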
Full article ">Figure 4
<p>Graph-structure optimization. <span class="html-italic">P</span> represents the nodes of visual points, and <span class="html-italic">X</span> represents the pose of the frame. “Ground” denotes the ground information extracted from the 3D reconstruction.</p>
Full article ">Figure 5
<p>Trajectory estimates in the KITTI dataset. (<b>a</b>) 00. (<b>b</b>) 01. (<b>c</b>) 05. (<b>d</b>) 07. (<b>e</b>) 08. (<b>f</b>) 09.</p>
Full article ">Figure 6
<p>High-definition display of point clouds for some streets in the 00 sequence. (<b>a</b>) 00. (<b>b</b>) 01. (<b>c</b>) 05. (<b>d</b>) 07. (<b>e</b>) 08. (<b>f</b>) 09.</p>
Full article ">Figure 7
<p>3D reconstruction based on road constraints, where green represents the road. (<b>a</b>) 00. (<b>b</b>) 01. (<b>c</b>) 05. (<b>d</b>) 07. (<b>e</b>) 08. (<b>f</b>) 09.</p>
Figure 7">
Full article ">Figure 8
<p>High-definition display of point clouds for some streets in the 00 sequence. The image in the top left corner is a 3D reconstruction of the entire city, and the other images depict details of its streets (<b>a</b>–<b>e</b>).</p>
Full article ">
24 pages, 14015 KiB  
Article
CDP-MVS: Forest Multi-View Reconstruction with Enhanced Confidence-Guided Dynamic Domain Propagation
by Zitian Liu, Zhao Chen, Xiaoli Zhang and Shihan Cheng
Remote Sens. 2024, 16(20), 3845; https://doi.org/10.3390/rs16203845 - 16 Oct 2024
Viewed by 981
Abstract
Using multi-view images of forest plots to reconstruct dense point clouds and extract individual tree parameters enables rapid, high-precision, and cost-effective forest plot surveys. However, images captured at close range pose challenges for forest reconstruction, such as unclear canopy reconstruction, prolonged reconstruction times, insufficient accuracy, and tree duplication. To address these challenges, this paper introduces a new image dataset creation process that enhances both the efficiency and quality of image acquisition. Additionally, a block-matching-based multi-view reconstruction algorithm, Forest Multi-View Reconstruction with Enhanced Confidence-Guided Dynamic Domain Propagation (CDP-MVS), is proposed. The CDP-MVS algorithm addresses the mixing of canopy and sky in reconstructed point clouds by segmenting the sky in the depth maps and setting its depth value to zero. Furthermore, the algorithm introduces a confidence calculation method that comprehensively evaluates multiple aspects. Moreover, CDP-MVS employs a decentralized dynamic domain propagation sampling strategy, guiding the propagation of the dynamic domain through newly defined confidence measures. Finally, this paper compares the reconstruction results and individual tree parameters of the CDP-MVS, ACMMP, and PatchMatchNet algorithms using self-collected data. Visualization results show that, compared to the other two algorithms, CDP-MVS produces the least sky noise in tree reconstructions, with the clearest and most detailed canopy branches and trunk sections. In terms of parameter metrics, CDP-MVS achieved 100% accuracy in reconstructing tree counts across the four plots, effectively avoiding tree duplication. The accuracy of the diameter at breast height values extracted from the point clouds reconstructed by CDP-MVS reached 96.27%, 90%, 90.64%, and 93.62% in the four sample plots, respectively. Compared with ACMMP, the positional deviation of the reconstructed trees was reduced by 0.37 m, 0.07 m, 0.18 m, and 0.33 m, with the average distance deviation across the four plots remaining within 0.25 m. In terms of reconstruction efficiency, CDP-MVS completed the reconstruction of the four plots in 1.8 to 3.1 h, reducing the average reconstruction time per plot by six minutes compared with ACMMP, and it was two to three times faster than PatchMatchNet. Finally, the differences in tree height accuracy among the point clouds reconstructed by the different algorithms were minimal. The experimental results demonstrate that CDP-MVS, as a multi-view reconstruction algorithm tailored for forest reconstruction, shows promising application potential and can provide valuable support for forestry surveys. Full article
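The sky-handling step can be illustrated with a short sketch: pixels covered by a binary sky mask have their depth set to zero so they are excluded from the dense point cloud. The brightness-threshold segmentation used here is only a placeholder for the segmentation actually employed by CDP-MVS, and the function name is hypothetical.

```python
# Illustrative sketch of the sky-handling step: zero the depth of pixels that a
# binary sky mask marks as sky so they are excluded from the dense point cloud.
# The thresholding used here is a placeholder for the paper's segmentation step.
import numpy as np

def mask_sky_depth(depth, gray, brightness_thresh=0.85):
    """depth: (H, W) depth map; gray: (H, W) image normalized to [0, 1]."""
    sky = gray > brightness_thresh          # crude stand-in for sky segmentation
    cleaned = depth.copy()
    cleaned[sky] = 0.0                      # zero depth => ignored during fusion
    return cleaned
```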
Show Figures

Figure 1

Figure 1
<p>Overview of the study area. (<b>A</b>) Dongsheng Bajia Country Park—poplar; (<b>B</b>) Jiufeng—pine; (<b>C</b>) Olympic Forest Park—elm; (<b>D</b>) Olympic Forest Park—ginkgo.</p>
Full article ">Figure 2
<p>Comparison of camera position trajectories generated by Colmap. (<b>A</b>) Filming method with two circular paths. (<b>B</b>) Filming method with a single circular path around the forest plot.</p>
Full article ">Figure 3
<p>Technical framework.</p>
Full article ">Figure 4
<p>Comparison of sparse reconstruction point clouds under two filming methods. (<b>A</b>) Single circular path around the forest plot. (<b>B</b>) Two circular paths inside and outside the forest plot.</p>
Full article ">Figure 5
<p>Adaptive checkerboard propagation scheme of ACMMP. (Each V-shaped region contains 7 sampling pixels, and each strip region contains 11 sampling pixels. In the figure, circles represent pixels. The black solid circle indicates the pixel to be estimated. The yellow circle represents the sampling point. During each propagation, the depth value of the red pixel is updated by the black pixel, and vice versa.).</p>
Full article ">Figure 6
<p>CDP-MVS dynamic domain propagation scheme (removing the central sample points and independently sampling in eight directions. Circles represent pixels. The black solid circle indicates the pixel to be estimated. During each propagation, the depth value of the red pixel is updated by the black pixel, and vice versa.).</p>
Full article ">Figure 7
<p>Reprojection flowchart. (The yellow line represents the process of projecting the pixel point p of the reference image to the point q in the adjacent image. The green line represents the process of re-projecting the point q back to the reference image.).</p>
Full article ">Figure 8
<p>Reference image (<b>A</b>) and its binarized grayscale image with sky segmentation (<b>B</b>).</p>
Full article ">Figure 9
<p>CDP-MVS algorithm flowchart.</p>
Full article ">Figure 10
<p>PatchMatchNet propagation sampling strategy. (Circles represent pixels. The black solid circle indicates the pixel to be estimated. During each propagation, the depth value of the red pixel is updated by the black pixel, and vice versa.).</p>
Full article ">Figure 11
<p>Dense point clouds reconstructed for four plots using different algorithms. (<b>a</b>) Poplar, (<b>b</b>) pine, (<b>c</b>) elm, (<b>d</b>) ginkgo. 1: CDP-MVS, 2: ACMMP, 3: PatchMatchNet.</p>
Figure 11">
Full article ">Figure 12
<p>Comparison of canopy details reconstructed by different algorithms. (<b>a</b>) Poplar, (<b>b</b>) pine, (<b>c</b>) elm, (<b>d</b>) ginkgo. 1: CDP-MVS, 2: ACMMP, 3: PatchMatchNet.</p>
Figure 12">
Full article ">Figure 13
<p>Comparison of trunk details reconstructed by different algorithms. (<b>a</b>) Poplar, (<b>b</b>) pine, (<b>c</b>) elm, (<b>d</b>) ginkgo. 1: CDP-MVS, 2: ACMMP, 3: PatchMatchNet.</p>
Figure 13">
Full article ">Figure 14
<p>Scatter plot comparing reconstructed tree positions with actual positions (unit: meters). (<b>a</b>) Poplar, (<b>b</b>) pine, (<b>c</b>) elm, (<b>d</b>) ginkgo. 1: CDP-MVS, 2: ACMMP, 3: PatchMatchNet.</p>
Figure 14">
Full article ">
20 pages, 6262 KiB  
Article
YPR-SLAM: A SLAM System Combining Object Detection and Geometric Constraints for Dynamic Scenes
by Xukang Kan, Gefei Shi, Xuerong Yang and Xinwei Hu
Sensors 2024, 24(20), 6576; https://doi.org/10.3390/s24206576 - 12 Oct 2024
Viewed by 895
Abstract
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly degrade the precision of image matching and camera pose estimation. To solve these problems, the YPR-SLAM system is proposed. First, the system includes a lightweight YOLOv5 detection network for detecting both dynamic and static objects, which provides prior information about dynamic objects to the SLAM system. Second, utilizing this prior information about dynamic targets together with the depth image, a geometric-constraint method for removing moving feature points from the depth image is proposed. The Depth-PROSAC algorithm is used to differentiate dynamic from static feature points so that the dynamic feature points can be removed. Finally, a dense point cloud map is constructed from the static feature points. The YPR-SLAM system tightly couples object detection with geometric constraints, eliminating moving feature points and minimizing their adverse effects on the SLAM system. The performance of YPR-SLAM was assessed on the public TUM RGB-D dataset, and it was found to be well suited to dynamic scenes. Full article
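The interplay of detection boxes and the depth image can be sketched as a simple depth-consistency test: only keypoints whose depth matches the detected foreground object inside a box are discarded, while box pixels belonging to the background are kept. This stands in for the Depth-PROSAC criterion, which is not reproduced here; the tolerance and function name are illustrative assumptions.

```python
# Sketch of filtering candidate dynamic feature points using detection boxes and
# the depth image. The paper's Depth-PROSAC criterion is not reproduced exactly;
# a simple foreground depth-consistency test stands in for it.
import numpy as np

def filter_dynamic_keypoints(keypoints, depth, boxes, depth_tol=0.3):
    """keypoints: list of (u, v) pixel coords; depth: (H, W) array in meters;
    boxes: list of (x1, y1, x2, y2) dynamic-object detections (e.g. 'person')."""
    static = []
    for (u, v) in keypoints:
        dynamic = False
        for (x1, y1, x2, y2) in boxes:
            if x1 <= u <= x2 and y1 <= v <= y2:
                patch = depth[int(y1):int(y2), int(x1):int(x2)]
                fg = np.median(patch[patch > 0]) if np.any(patch > 0) else 0.0
                # Points whose depth matches the detected foreground object are
                # treated as dynamic; box pixels on the background are kept.
                if fg > 0 and abs(depth[int(v), int(u)] - fg) < depth_tol * fg:
                    dynamic = True
                    break
        if not dynamic:
            static.append((u, v))
    return static
```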
(This article belongs to the Section Sensing and Imaging)
Show Figures

Figure 1

Figure 1
<p>Framework of the YPR-SLAM system. The blue section is ORB-SLAM2, and the orange section shows the additions made in this paper.</p>
Full article ">Figure 2
<p>The YOLOv5 network architecture.</p>
Full article ">Figure 3
<p>Dynamic target detection and filtering thread. First, ORB feature points are extracted from the RGB image by the tracking thread. Next, the dynamic target detection thread identifies potential dynamic target areas, and then the Depth-PROSAC algorithm is applied to filter out dynamic feature points. Finally, the static feature points are retained for subsequent pose estimation.</p>
Full article ">Figure 4
<p>The comparison between target detection algorithms and the Depth-PROSAC algorithm in filtering out dynamic feature points. (<b>a</b>) shows that the object detection method directly filters out dynamic feature points, and (<b>b</b>) shows that the Depth-PROSAC algorithm filters out dynamic feature points.</p>
Full article ">Figure 5
<p>Dense point cloud construction workflow.</p>
Full article ">Figure 6
<p>In the fr3_walking_halfsphere sequence, the YPR-SLAM and ORB-SLAM2 systems were used to estimate the 3D motion of the camera. (<b>a</b>) Camera path estimated by ORB-SLAM2; (<b>b</b>) YPR-SLAM estimation of camera trajectory.</p>
Full article ">Figure 7
<p><span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> of the ORB-SLAM2 system and the YPR-SLAM system under different datasets. (<b>a1</b>,<b>a2</b>,<b>c1</b>,<b>c2</b>,<b>e1</b>,<b>e2</b>,<b>g1</b>,<b>g2</b>) represent ATE and RPE obtained by the ORB-SLAM2 system by running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>b1</b>,<b>b2</b>,<b>d1</b>,<b>d2</b>,<b>f1</b>,<b>f2</b>,<b>h1</b>,<b>h2</b>) represent <span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> plots of the YPR-SLAM system running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>a1</b>,<b>b1</b>,<b>c1</b>,<b>d1</b>,<b>e1</b>,<b>f1</b>,<b>g1</b>,<b>h1</b>) represent ATE plots. (<b>a2</b>,<b>b2</b>,<b>c2</b>,<b>d2</b>,<b>e2</b>,<b>f2</b>,<b>g2</b>,<b>h2</b>) represent <span class="html-italic">RPE</span> plots.</p>
Figure 7">
Full article ">Figure 8
<p>Using ORB-SLAM2 and YPR-SLAM to construct dense 3D point cloud map in dynamic scene sequence fre3_walking_xyz. (<b>a</b>) represents a dense 3D point cloud map constructed by the ORB-SLAM2 system; (<b>b</b>) represents a dense 3D point cloud map constructed by the YPR-SLAM system.</p>
Full article ">
38 pages, 98377 KiB  
Article
FaSS-MVS: Fast Multi-View Stereo with Surface-Aware Semi-Global Matching from UAV-Borne Monocular Imagery
by Boitumelo Ruf, Martin Weinmann and Stefan Hinz
Sensors 2024, 24(19), 6397; https://doi.org/10.3390/s24196397 - 2 Oct 2024
Viewed by 831
Abstract
With FaSS-MVS, we present a fast, surface-aware semi-global optimization approach for multi-view stereo that allows for rapid depth and normal map estimation from monocular aerial video data captured by unmanned aerial vehicles (UAVs). The data estimated by FaSS-MVS, in turn, facilitate online 3D mapping, meaning that a 3D map of the scene is generated immediately and incrementally as the image data are acquired or received. FaSS-MVS is composed of a hierarchical processing scheme in which depth and normal data, as well as corresponding confidence scores, are estimated in a coarse-to-fine manner, allowing efficient processing of the large scene depths inherent in oblique images acquired by UAVs flying at low altitudes. The actual depth estimation uses a plane-sweep algorithm for dense multi-image matching to produce depth hypotheses, from which the final depth map is extracted by means of a surface-aware semi-global optimization that reduces the fronto-parallel bias of Semi-Global Matching (SGM). Given the estimated depth map, the pixel-wise surface normal information is then computed by reprojecting the depth map into a point cloud and computing the normal vectors within a confined local neighborhood. In a thorough quantitative and ablative study, we show that the accuracy of the 3D information computed by FaSS-MVS is close to that of state-of-the-art offline multi-view stereo approaches, with errors less than an order of magnitude higher than those of COLMAP. At the same time, the average runtime of FaSS-MVS for estimating a single depth and normal map is less than 14% of that of COLMAP, allowing online and incremental processing of full HD images at 1–2 Hz. Full article
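The normal-map step can be sketched by back-projecting the depth map with the camera intrinsics and taking cross products of local tangent vectors; the single-pixel neighborhood below is a simplification of the confined local neighborhood described in the paper.

```python
# Sketch of deriving a normal map from a depth map: back-project pixels with the
# camera intrinsics and take the cross product of local tangent vectors.
import numpy as np

def depth_to_normals(depth, K):
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel into a 3D point cloud (camera coordinates).
    X = (u - cx) / fx * depth
    Y = (v - cy) / fy * depth
    pts = np.dstack([X, Y, depth])

    # Tangent vectors from neighboring points, normal as their cross product.
    dx = pts[1:-1, 2:] - pts[1:-1, :-2]
    dy = pts[2:, 1:-1] - pts[:-2, 1:-1]
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=2, keepdims=True) + 1e-12

    normals = np.zeros_like(pts)
    normals[1:-1, 1:-1] = n
    return normals
```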
(This article belongs to the Special Issue Advances on UAV-Based Sensing and Imaging)
Show Figures

Figure 1

Figure 1
<p>Overview of the processing pipeline for FaSS-MVS. Given a bundle of images and corresponding camera poses <math display="inline"><semantics> <msub> <mfenced separators="" open="(" close=")"> <mi mathvariant="script">I</mi> <mo>,</mo> <mi mathvariant="normal">P</mi> </mfenced> <mi>k</mi> </msub> </semantics></math> of an input sequence, a hierarchical MVS estimation is performed to recover a depth, normal and confidence map <math display="inline"><semantics> <mfenced separators="" open="(" close=")"> <mi mathvariant="script">D</mi> <mo>,</mo> <mi mathvariant="script">N</mi> <mo>,</mo> <mi mathvariant="script">C</mi> </mfenced> </semantics></math>. Adapted from [<a href="#B12-sensors-24-06397" class="html-bibr">12</a>,<a href="#B45-sensors-24-06397" class="html-bibr">45</a>].</p>
Full article ">Figure 2
<p>Illustration of the plane-sweep algorithm for multi-image matching. A scene is sampled by a plane <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Π</mi> <mo>=</mo> <mo>(</mo> <mi mathvariant="normal">n</mi> <mo>,</mo> <mi>δ</mi> <mo>)</mo> </mrow> </semantics></math>, where <math display="inline"><semantics> <mi mathvariant="normal">n</mi> </semantics></math> is the normal vector of the plane and <math display="inline"><semantics> <mi>δ</mi> </semantics></math> is the orthogonal distance of the plane from <math display="inline"><semantics> <msub> <mi mathvariant="normal">c</mi> <mi>ref</mi> </msub> </semantics></math>. The plane is swept through space along its normal vector between two bounding planes <math display="inline"><semantics> <msub> <mi mathvariant="sans-serif">Π</mi> <mi>max</mi> </msub> </semantics></math> and <math display="inline"><semantics> <msub> <mi mathvariant="sans-serif">Π</mi> <mi>min</mi> </msub> </semantics></math>. For each distance <math display="inline"><semantics> <mi>δ</mi> </semantics></math> of <math display="inline"><semantics> <mi mathvariant="sans-serif">Π</mi> </semantics></math>, the reference pixel <math display="inline"><semantics> <msup> <mi mathvariant="normal">p</mi> <mi>ref</mi> </msup> </semantics></math> is projected by the plane-induced homography <math display="inline"><semantics> <msub> <mi mathvariant="normal">H</mi> <mrow> <mi>ref</mi> <mo>→</mo> <mi>k</mi> </mrow> </msub> </semantics></math> into an arbitrary number of viewpoints where it is matched with the corresponding pixel in <math display="inline"><semantics> <msub> <mi mathvariant="script">I</mi> <mi>k</mi> </msub> </semantics></math>.</p>
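For reference, the plane-induced homography used to warp reference pixels into a neighboring view can be written down directly. Sign conventions differ between formulations, so the form below, H = K2 (R - t n^T / delta) K1^{-1}, should be read as a common textbook variant rather than the paper's exact parameterization.

```python
# Sketch of the plane-induced homography used in plane-sweep matching: for a
# sampling plane with normal n and distance delta in the reference frame, pixels
# of the reference image map to a second view with relative pose (R, t).
import numpy as np

def plane_induced_homography(K1, K2, R, t, n, delta):
    """K1, K2: 3x3 intrinsics; (R, t): pose of view 2 w.r.t. the reference;
    n: 3-vector plane normal in the reference frame; delta: plane distance."""
    H = K2 @ (R - np.outer(t, n) / delta) @ np.linalg.inv(K1)
    return H / H[2, 2]   # normalize; the homography is defined only up to scale

def warp_pixel(H, u, v):
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]
```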
Full article ">Figure 3
<p>Illustration of determining the orthogonal distance parameter of the sampling planes of the plane-sweep multi-image matching by using the cross-ratio and epipolar geometry. Here, <math display="inline"><semantics> <msub> <mi mathvariant="normal">c</mi> <mi>ref</mi> </msub> </semantics></math> and <math display="inline"><semantics> <msub> <mi mathvariant="normal">c</mi> <mi>k</mi> </msub> </semantics></math> represent the positions of the optical centers of the two cameras. Adapted from [<a href="#B11-sensors-24-06397" class="html-bibr">11</a>].</p>
Full article ">Figure 4
<p>Illustration of the different path aggregation strategies along one path direction <math display="inline"><semantics> <mi mathvariant="normal">r</mi> </semantics></math> within the three presented SGM<sup>x</sup> optimization schemes. Column 1: Reference image and normal map of a building. Illustrated area is marked with yellow line. Column 2: SGM<sup>Π</sup> path aggregation. The blue and pink lines represent the blue and pink surface orientations on the building facade. When aggregating the path costs for pixel <math display="inline"><semantics> <mi mathvariant="normal">p</mi> </semantics></math> at plane <math display="inline"><semantics> <mi mathvariant="sans-serif">Π</mi> </semantics></math>, SGM<sup>Π</sup> will include the previous costs at the same plane position (green) without additional penalty. The previous path costs at <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Π</mi> <mspace width="0.166667em"/> <mo>±</mo> <mn>1</mn> </mrow> </semantics></math> (yellow) will be penalized with <math display="inline"><semantics> <msub> <mi>φ</mi> <mn>1</mn> </msub> </semantics></math>. The previous path costs located at <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Π</mi> <mspace width="3.33333pt"/> <mo>+</mo> <mspace width="3.33333pt"/> <mn>2</mn> </mrow> </semantics></math> (red), which is actually located on the corresponding surface, will be penalized with the highest penalty <math display="inline"><semantics> <msub> <mi>φ</mi> <mn>2</mn> </msub> </semantics></math>. Column 3: SGM<sup>Π-sn</sup> uses the normal vector <math display="inline"><semantics> <msub> <mi mathvariant="normal">n</mi> <mi mathvariant="normal">p</mi> </msub> </semantics></math>, which encodes the surface orientation at pixel <math display="inline"><semantics> <mi mathvariant="normal">p</mi> </semantics></math>, and computes a discrete index jump <math display="inline"><semantics> <mrow> <mo>Δ</mo> <msub> <mi>i</mi> <mi>sn</mi> </msub> </mrow> </semantics></math>, which ideally adjusts the zero-cost transition so that the previous path costs at <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Π</mi> <mo>+</mo> <mn>2</mn> </mrow> </semantics></math> are not penalized. Column 4: Similar to SGM<sup>Π-sn</sup>, SGM<sup>Π-pg</sup> adjusts the zero-cost transition. However, the discrete index jump <math display="inline"><semantics> <mrow> <mo>Δ</mo> <msub> <mi>i</mi> <mi>pg</mi> </msub> </mrow> </semantics></math> is derived from the running gradient <math display="inline"><semantics> <mrow> <mo>∇</mo> <mi mathvariant="normal">r</mi> </mrow> </semantics></math> of the minimum-cost path. Adapted from [<a href="#B12-sensors-24-06397" class="html-bibr">12</a>].</p>
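The path aggregation contrasted in this figure builds on the classic SGM recurrence; the sketch below aggregates costs along one scanline and models the surface-aware zero-cost shift as an optional per-pixel index jump. Border handling is simplified (np.roll wraps around), so this is an illustration, not the FaSS-MVS implementation.

```python
# Simplified 1D sketch of semi-global path-cost aggregation. `shift` models the
# per-pixel zero-cost transition jump used by the surface-aware variants
# (shift = None or all zeros recovers classic SGM).
import numpy as np

def aggregate_path(cost, p1, p2, shift=None):
    """cost: (W, D) matching costs along one path direction; shift: optional (W,) ints."""
    w, d = cost.shape
    agg = np.zeros_like(cost, dtype=np.float64)
    agg[0] = cost[0]
    for x in range(1, w):
        prev = agg[x - 1]
        if shift is not None and shift[x] != 0:
            prev = np.roll(prev, shift[x])       # move the zero-cost transition
        best_prev = prev.min()
        candidates = np.stack([
            prev,                                 # same index: no penalty
            np.roll(prev, 1) + p1,                # +/- one index: small penalty
            np.roll(prev, -1) + p1,
            np.full(d, best_prev + p2),           # larger jumps: large penalty
        ])
        agg[x] = cost[x] + candidates.min(axis=0) - best_prev
    return agg
```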
Full article ">Figure 5
<p>Overview of the datasets used for performance evaluation of FaSS-MVS. Column 1: Two building models from the DTU Robot MVS dataset. Column 2: Example images in oblique and nadir view from the 3DOMcity Benchmark dataset. Column 3: Excerpt of the privately acquired TMB dataset. Column 4: Use-case-specific dataset acquired during an exercise of the local fire brigade.</p>
Full article ">Figure 6
<p>Qualitative comparison of the results achieved by the three different SGM implementations on the DTU dataset. Row 1: Reference data from the dataset, i.e., the ground truth depth and normal map, as well as the reference image for which the data are computed. Rows 2–4: Data, i.e., depth, normal and confidence maps, computed by SGM<sup>Π</sup>, SGM<sup>Π-sn</sup> and SGM<sup>Π-pg</sup>, respectively. Furthermore, difference maps are provided which hold the pixel-wise absolute difference between the estimated depth map and the ground truth. The color encoding reaches from dark blue (low error) via green to yellow (high error). The depth range within the depth maps reaches from 580 mm (blue) to 830 mm (red). The estimated maps are masked according to the ground truth.</p>
Full article ">Figure 7
<p>Qualitative comparison of the results achieved by the three different SGM implementations on the 3DOMcity dataset. Row 1: Reference data from the dataset, i.e., the ground truth depth and normal map, as well as the reference image for which the data are computed. Rows 2–4: Data, i.e., depth, normal and confidence maps, computed by SGM<sup>Π</sup>, SGM<sup>Π-sn</sup> and SGM<sup>Π-pg</sup>, respectively. Furthermore, difference maps are provided which hold the pixel-wise absolute difference between the estimated depth map and the ground truth. The depth range within the depth maps reaches from 1 m (blue) to <math display="inline"><semantics> <mrow> <mn>1.8</mn> </mrow> </semantics></math> m (red). The estimated maps are masked according to the ground truth. For visualization in this figure, the resulting images have been rotated counterclockwise by <math display="inline"><semantics> <msup> <mn>90</mn> <mo>∘</mo> </msup> </semantics></math>. Thus, the color encoding of the normal maps differs from that used in the other figures. Here, red represents an upwards orientation, while green represents an orientation to the left.</p>
Full article ">Figure 8
<p>ROC curves illustrating the error rate achieved by the three different SGM implementations as a function of increasing density of the estimated depth map.</p>
Full article ">Figure 9
<p>Accuracy–completeness curves of different post-filtering strategies, i.e., DoG filtering, GCC as well as a combination of both, executed in combination with the three different SGM extensions and a fronto-parallel sampling. In this, the threshold <math display="inline"><semantics> <mi>θ</mi> </semantics></math> is varied within the list of {<math display="inline"><semantics> <mrow> <mn>1.25</mn> <mo>,</mo> <mn>1.20</mn> <mo>,</mo> <mn>1.15</mn> <mo>,</mo> <mn>1.10</mn> <mo>,</mo> <mn>1.05</mn> <mo>,</mo> <mn>1.01</mn> </mrow> </semantics></math>}. By decreasing <math display="inline"><semantics> <mi>θ</mi> </semantics></math>, the accuracy and completeness rates drop.</p>
Full article ">Figure 10
<p>Qualitative comparison of FaSS-MVS with its three SGM extensions and GCC, the PSL with differently sized support regions for the NCC, as well as with GCC.</p>
Full article ">Figure 11
<p>Qualitative results of SGM<sup>Π-pg</sup> with 4 aggregation paths achieved on the two real-world and use-case-specific datasets, namely the TMB dataset and the FB dataset. As comparison, the corresponding depth maps estimated by COLMAP are also visualized. Rows 1 and 2: TMB Building scene captured from an altitude of 15 m and 8 m, respectively. Rows 3 and 4: TMB Container scene. Rows 5 and 6: Two excerpts from the FB dataset.</p>
Full article ">Figure A1
<p>Qualitative comparison between the use of a fronto-parallel and non-fronto-parallel sampling direction in combination with SGM<sup>Π</sup>. Columns 2 and 4: Corresponding estimated depth map. Columns 3 and 5: Difference map holding the pixel-wise absolute difference between the estimated depth map and the ground truth. The color encoding reaches from dark blue (low error) via green to yellow (high error). The estimated depth maps and the difference maps are masked according to the ground truth.</p>
Full article ">