Search Results (81)

Search Parameters:
Keywords = NeRF

13 pages, 8320 KiB  
Technical Note
Unmanned Aerial Vehicle-Neural Radiance Field (UAV-NeRF): Learning Multiview Drone Three-Dimensional Reconstruction with Neural Radiance Field
by Li Li, Yongsheng Zhang, Zhipeng Jiang, Ziquan Wang, Lei Zhang and Han Gao
Remote Sens. 2024, 16(22), 4168; https://doi.org/10.3390/rs16224168 - 8 Nov 2024
Viewed by 393
Abstract
In traditional 3D reconstruction using UAV images, only radiance information, which is treated as a geometric constraint, is used in feature matching, allowing for the restoration of the scene’s structure. After introducing radiance supervision, NeRF can adjust the geometry in the fixed-ray direction, resulting in a smaller search space and higher robustness. Considering the lack of NeRF construction methods for aerial scenarios, we propose a new NeRF point sampling method, which is generated using a UAV imaging model, compatible with a global geographic coordinate system, and suitable for a UAV view. We found that NeRF is optimized entirely based on the radiance while ignoring the direct geometry constraint. Therefore, we designed a radiance correction strategy that considers the incidence angle. Our method can complete point sampling in a UAV imaging scene, as well as simultaneously perform digital surface model construction and ground radiance information recovery. When tested on self-acquired datasets, the NeRF variant proposed in this paper achieved better reconstruction accuracy than the original NeRF-based methods. It also reached a level of precision comparable to that of traditional photogrammetry methods, and it is capable of outputting a surface albedo that includes shadow information.
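The abstract describes two ingredients: sampling NeRF query points from a UAV imaging model in a geographic frame, and correcting radiance for the incidence angle. The sketch below is only an illustrative reading of those two ideas, not the authors' implementation; every function and variable name (e.g., `sample_points_along_ray`, `ground_height`) is hypothetical.

```python
import numpy as np

def sample_points_along_ray(camera_center_enu, ray_dir_enu, ground_height,
                            n_samples=64, margin=50.0):
    """Sample NeRF query points along a UAV ray expressed in a local ENU frame.

    The sampling interval is centred on the expected terrain height rather than a
    fixed near/far range, which is one plausible reading of a UAV-specific sampling
    scheme (illustrative only; assumes a downward-looking ray, ray_dir_enu[2] != 0).
    """
    d = ray_dir_enu / np.linalg.norm(ray_dir_enu)
    # Distance along the ray at which it reaches the nominal ground height.
    t_ground = (ground_height - camera_center_enu[2]) / d[2]
    t_vals = np.linspace(max(t_ground - margin, 0.0), t_ground + margin, n_samples)
    points = camera_center_enu[None, :] + t_vals[:, None] * d[None, :]
    return points, t_vals

def incidence_angle_correction(radiance, surface_normal, sun_dir):
    """Simple Lambertian-style correction: divide out the cosine of the incidence
    angle (both vectors assumed unit length); a generic stand-in for the paper's
    incidence-angle radiance model."""
    cos_theta = np.clip(np.dot(surface_normal, sun_dir), 1e-3, 1.0)
    return radiance / cos_theta
```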
Figure 1
The main motivation for our proposed method. We analyzed NeRF’s 3D reconstruction workflow (c) from a photogrammetric perspective (b) and found that the latter uses reprojection errors to geometrically adjust the ray direction, while the former can adjust the transmittance with the help of radiance, thereby narrowing the search space to a single ray. However, NeRF methods designed specifically for drone imaging scenarios are rare. In addition, NeRF does not consider the influence of geometric structure changes on the radiance (a); as such, we designed a new geographical NeRF point sampling method for UAVs and introduced the photogrammetric incident-angle model to optimize NeRF radiance features, thus completing end-to-end 3D reconstruction and radiance acquisition (d).

Figure 2
Main workflow of the multitask UAV-NeRF. We used traditional photogrammetry methods to sample the NeRF points, and we then used a geometric imaging model to perform radiance correction and decoding. “MLP” denotes the multilayer perceptrons used to decode the different types of radiance information.
Figure 3
The study location, along with the general collection pattern and flight lines for the drone imagery.

Figure 4
Intuitive performance comparison of different methods. These experiments were conducted on the DengFeng and XinMi areas, and they demonstrate the improvement in 3D surface construction achieved using our proposed method. From (a) to (d), the images are as follows: (a) the original imagery captured by the drone, (b) the corresponding ground-truth DSM obtained from LiDAR, (c) the DSM predicted by the method proposed in this paper, and (d) the DSM obtained using the CC method. Additionally, another representation of the results, showcasing additional details, is provided in Table 1.

Figure 5
The UAV image, albedo, shadow scalar s, and transient scalar β.
16 pages, 11298 KiB  
Article
Scene Measurement Method Based on Fusion of Image Sequence and Improved LiDAR SLAM
by Dongtai Liang, Donghui Li, Kui Yang, Wenxue Hu, Xuwen Chen and Zhangwei Chen
Electronics 2024, 13(21), 4250; https://doi.org/10.3390/electronics13214250 - 30 Oct 2024
Viewed by 557
Abstract
To address the issue that sparse point cloud maps constructed by SLAM cannot provide detailed information about measured objects, and that image sequence-based measurement methods suffer from large data volumes and cumulative errors, this paper proposes a scene measurement method that integrates image sequences with an improved LiDAR SLAM. By introducing plane features, the positioning accuracy of LiDAR SLAM is enhanced, and real-time odometry poses are generated. Simultaneously, the system captures image sequences of the measured object using synchronized cameras, and NeRF is used for 3D reconstruction. Time synchronization and data registration between the LiDAR and camera data frames with identical timestamps are achieved. Finally, the least squares method and the ICP algorithm are employed to compute the scale factor s and the transformation matrices R and t between the point clouds from LiDAR and from the NeRF reconstruction, after which precise measurement of the objects can be performed. Experimental results demonstrate that this method significantly improves measurement accuracy, with an average error within 10 mm and 1°, providing a robust and reliable solution for scene measurement.
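Once correspondences between the metric LiDAR point cloud and the NeRF-reconstructed point cloud are fixed (e.g., by ICP), the scale factor s and the transform R, t described above can be solved in closed form by a least-squares similarity alignment. The sketch below uses the standard Umeyama estimator as one such solver; it is a generic stand-in, not the paper's exact pipeline, and the variable names are hypothetical.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Estimate scale s, rotation R, translation t such that s * R @ src_i + t ≈ dst_i
    in the least-squares sense.

    src, dst: (N, 3) arrays of corresponding points, e.g. NeRF-reconstructed points
    matched to LiDAR points by ICP (hypothetical naming).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)            # cross-covariance of centred clouds
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # avoid reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```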
Figure 1
Algorithm framework diagram. Improved LiDAR SLAM: the improvements in plane expansion and constraint optimization. Three-dimensional reconstruction based on image sequences: the process of NeRF 3D reconstruction using images, along with obtaining the camera’s coordinate system information. Dimension and position recovery matrix calculation: used for unifying dimensions and recovering the scale and pose of the point cloud.

Figure 2
Epipolar geometry constraints. Baseline O1O2: the line connecting the optical centers of the cameras. Epipoles e1 and e2: the intersection points of the baseline with the image planes of the cameras. Epipolar plane PO1O2: the plane passing through the baseline and the feature point in the three-dimensional world. Epipolar lines l1 and l2: the intersection lines of the image planes with the epipolar plane.
Figure 3
Four solutions to the decomposition of the essential matrix. (a) Rotation matrix R1 and translation vector t. (b) Rotation matrix R1 and translation vector −t. (c) Rotation matrix R2 and translation vector t. (d) Rotation matrix R2 and translation vector −t.

Figure 4
Flowchart of the NeRF algorithm [12].

Figure 5
LiDAR and camera data frame positional schematic. (a) LiDAR data points. (b) Camera data points.

Figure 6
Sensor timestamp hardware synchronization. (a) Before timestamp alignment and (b) after timestamp alignment.

Figure 7
Experimental hardware platform.

Figure 8
Improved LiDAR SLAM and NeRF target point cloud reconstruction. (a–d) The point cloud construction effects of the improved LiDAR SLAM. (e–h) The corresponding NeRF 3D reconstruction effects.

Figure 9
Poster geometric measurement. (a–c) The measurement results of the R3LIVE algorithm. (d,e) The measurement results of the method in this paper.
Figure 10
Geometric relationship measurement of the poster, emblem, and lampposts. (a) The result for the entire scene with two lampposts. (b) The result for only the two lampposts. (c) The result for the entire academy. (d) The result for the poster and academy emblem.
12 pages, 1842 KiB  
Article
Neural Radiance Fields for Fisheye Driving Scenes Using Edge-Aware Integrated Depth Supervision
by Jiho Choi and Sang Jun Lee
Sensors 2024, 24(21), 6790; https://doi.org/10.3390/s24216790 - 22 Oct 2024
Viewed by 570
Abstract
Neural radiance fields (NeRF) have become an effective method for encoding scenes into neural representations, allowing for the synthesis of photorealistic images of unseen views from given input images. However, the applicability of traditional NeRF is significantly limited by its assumption that images are captured for object-centric scenes with a pinhole camera. Expanding these boundaries, we focus on driving scenarios using a fisheye camera, which offers the advantage of capturing visual information from a wide field of view. To address the challenges due to the unbounded and distorted characteristics of fisheye images, we propose an edge-aware integration loss function. This approach leverages sparse LiDAR projections and dense depth maps estimated from a learning-based depth model. The proposed algorithm assigns larger weights to neighboring points that have depth values similar to the sensor data. Experiments were conducted on the KITTI-360 and JBNU-Depth360 datasets, which are public, real-world datasets of driving scenarios captured with fisheye cameras. Experimental results demonstrated that the proposed method is effective in synthesizing novel view images, outperforming existing approaches.
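The abstract's key idea, weighting neighboring points by how close their depth is to the sensor value, can be pictured as a depth-similarity kernel applied around each sparse LiDAR sample. The numpy sketch below is one plausible reading of such an edge-aware integration step, not the paper's actual loss or its Equation (4); names like `window` and `sigma_d` are assumptions.

```python
import numpy as np

def edge_aware_integrated_depth(sparse_lidar, dense_pred, window=5, sigma_d=0.5):
    """Blend a sparse LiDAR depth map with a dense predicted depth map.

    For each valid LiDAR pixel, nearby pixels receive a weight that decays with the
    difference between their predicted depth and the sensor depth, so depth is
    propagated along surfaces but not across depth edges (illustrative only).
    """
    H, W = dense_pred.shape
    blended = np.zeros_like(dense_pred)
    weight_sum = np.zeros_like(dense_pred)
    r = window // 2
    ys, xs = np.nonzero(sparse_lidar > 0)          # valid LiDAR projections
    for y, x in zip(ys, xs):
        d_sensor = sparse_lidar[y, x]
        y0, y1 = max(0, y - r), min(H, y + r + 1)
        x0, x1 = max(0, x - r), min(W, x + r + 1)
        patch_pred = dense_pred[y0:y1, x0:x1]
        w = np.exp(-((patch_pred - d_sensor) ** 2) / (2.0 * sigma_d ** 2))
        blended[y0:y1, x0:x1] += w * d_sensor
        weight_sum[y0:y1, x0:x1] += w
    valid = weight_sum > 0
    blended[valid] /= weight_sum[valid]
    blended[~valid] = dense_pred[~valid]           # fall back to the prediction far from LiDAR points
    return blended
```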
Figure 1
Overview of the proposed pipeline. Given a set of RGB images captured by a fisheye camera in driving scenarios, we trained a monocular depth estimation network that outputs the densely predicted depth map D̂. Moreover, the LiDAR points are projected to generate a sparse depth map D. These two depth priors are employed in the edge-aware integration module. The training of the radiance field is guided by the RGB images and the integrated depth maps D̃, which inform the model regarding ray termination. NeRF takes 5D inputs and is trained using the L_color and the proposed L_edge loss functions.

Figure 2
Process of integrating the sparse LiDAR projection and the dense estimated depth map. We propose the edge-aware integration loss function to optimize the NeRF model with depth supervision, as detailed in Equation (4). The proposed method guides the NeRF model using depth priors from the scene by minimizing the difference between the distributions of ray termination from the model and the given depth information. This depth is determined using the edge-aware smoothing kernel, which takes advantage of both depth priors. Moreover, it assigns a larger weight to adjacent points that are consistent with depth values.
Figure 3
View synthesis on the KITTI-360 dataset. The proposed method demonstrates improved photorealistic results, as highlighted in the red boxes.

Figure 4
View synthesis on JBNU-Depth360. We have highlighted the details of the synthesized image with a red box.

Figure 5
Qualitative results of an ablation study on the edge-aware integrated function. Compared to the spatial Gaussian function, the proposed approach better preserves the object’s edge information, resulting in a more realistic representation in the synthetic image.
30 pages, 18530 KiB  
Article
Dimensionality Reduction for the Real-Time Light-Field View Synthesis of Kernel-Based Models
by Martijn Courteaux, Hannes Mareen, Bert Ramlot, Peter Lambert and Glenn Van Wallendael
Electronics 2024, 13(20), 4062; https://doi.org/10.3390/electronics13204062 - 15 Oct 2024
Viewed by 692
Abstract
Several frameworks have been proposed for delivering interactive, panoramic, camera-captured, six-degrees-of-freedom video content. However, it remains unclear which framework will meet all requirements the best. In this work, we focus on a Steered Mixture of Experts (SMoE) for 4D planar light fields, which is a kernel-based representation. For SMoE to be viable in interactive light-field experiences, real-time view synthesis is crucial yet unsolved. This paper presents two key contributions: a mathematical derivation of a view-specific, intrinsically 2D model from the original 4D light field model and a GPU graphics pipeline that synthesizes these viewpoints in real time. Configuring the proposed GPU implementation for high accuracy, a frequency of 180 to 290 Hz at a resolution of 2048×2048 pixels on an NVIDIA RTX 2080Ti is achieved. Compared to NVIDIA’s instant-ngp Neural Radiance Fields (NeRFs) with the default configuration, our light field rendering technique is 42 to 597 times faster. Additionally, allowing near-imperceptible artifacts in the reconstruction process can further increase speed by 40%. A first-order Taylor approximation causes imperfect views with peak signal-to-noise ratio (PSNR) scores between 45 dB and 63 dB compared to the reference implementation. In conclusion, we present an efficient algorithm for synthesizing 2D views at arbitrary viewpoints from 4D planar light-field SMoE models, enabling real-time, interactive, and high-quality light-field rendering within the SMoE framework.
(This article belongs to the Special Issue Recent Advances in Signal Processing and Applications)
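To make the kernel-based representation discussed above concrete, the sketch below evaluates a simplified 2D Gaussian mixture at a single pixel with an alpha cutoff, loosely mirroring the ᾱ threshold analyzed in Figures 2 and 3. Real SMoE components are 4D, steered, and rendered in a GPU pipeline; this toy version and all of its parameter names are assumptions for illustration only.

```python
import numpy as np

def smoe_render_pixel(x, mus, covs_inv, colors, alpha_thresh=1.0 / 256):
    """Evaluate a kernel-based (SMoE-style) mixture at a 2D pixel coordinate x.

    Each component has a 2D center mu_i, an inverse covariance, and a constant color;
    gating weights are unnormalized Gaussians, zeroed below a small threshold and then
    renormalized over the surviving components (simplified sketch).
    """
    diffs = x[None, :] - mus                                  # (K, 2)
    mahal = np.einsum('ki,kij,kj->k', diffs, covs_inv, diffs)  # squared Mahalanobis distances
    gates = np.exp(-0.5 * mahal)
    gates = np.where(gates > alpha_thresh, gates, 0.0)         # cut off negligible components
    total = gates.sum()
    if total == 0.0:
        return np.zeros(3)
    return (gates[:, None] * colors).sum(axis=0) / total
```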
Figure 1
Example viewpoints from the input views for each scene.

Figure 2
Speed and quality analysis of varying the ᾱ threshold value at a resolution of 2048×2048. This threshold directly influences the size of the area that the component is evaluated on. Two views from the barbershop model were used for this test: straight, where the optical axis is aligned with the original input views (like in the push–pull tasks), and tilted, where the virtual camera is rotated instead (like in the spin tasks). The horizontal axis is rescaled by a factor of 256 such that one quantization level of an 8-bit display corresponds to one unit.

Figure 3
The highest-quality render with ᾱ = 0.125/256 (top), compared to the lowest-quality render with ᾱ = 15/256 (bottom), of the tilted view in the test of Figure 2. These results were produced in 4.1 ms and 2.2 ms, respectively. The quality of the fastest render (i.e., highest ᾱ, bottom) is still relatively high (i.e., 39 dB instead of 59 dB). The most noticeable artifacts have been highlighted. Notice how the regions of influence of components being cut off too early cause sharp edges for smooth components (i.e., low s_i). This effect is less noticeable for components that already have a sharp edge.
Figure 4
Analysis of the rendering time and quality of the GPU rendering approach, compared to the reference quality from the CPU-block implementation, along with rendering time per frame. Note that rendering was performed with high precision using a threshold ᾱ of 0.125/256. Figure (d) shows the highest absolute error on an individual pixel (MPE) in each frame. The MPE being less than 2 for the push–pull task indicates near-perfect reconstruction over the whole image. This analysis is performed with the barbershop model at a resolution of 2048×2048.

Figure 5
Visual comparison for the scene painter between SMoE, instant-ngp NeRFs, and the ground truth. The NeRF base model exhibits the highest visual quality and NeRF small the lowest, with SMoE falling in between.
Figure 6
Speed and quality measurements for all original views of all datasets rendered with the three configurations (• SMoE ᾱ = 1/256, ■ NeRF base, | NeRF small). The data are split across two charts to reduce overlapping data points (the axes of both charts are identical). Each color represents a scene (top chart: barbershop, painter; bottom chart: lone monk, zen garden, kitchen). The dashed vertical line marks the frame time of 16.6 ms, which is required for 60 Hz playback. Most notably, it can be observed that the SMoE configurations are on the left side of the vertical line (i.e., fast rendering), whereas the NeRF-based configurations are on the right side (i.e., slow rendering).

Figure A1
The intersection of a ray r with the camera plane. The ray starts at the virtual viewpoint v, travels in direction d, and intersects the camera plane at ρ.

Figure A2
Schematic simulation of obtaining the query points by intersecting the rays coming out of each pixel of the virtual camera at v with the plane of the original cameras. The top diagram shows a top-down view in real-world, physical dimensions. The bottom diagram visualizes two of the four model dimensions, of which the horizontal one is the same real-world physical x-axis, and the vertical one is the pixel-x coordinate within each camera. The hollow circles represent the positions of the nine original cameras (by means of their xz-coordinates in the top diagram and their x-coordinates in the bottom diagram). In the top diagram, the rays intersect the plane of the original cameras at the marked orange dots, which yield the camera-xy coordinates of the query points. Additionally, for four of those intersections, we have drawn the hypothetical cameras to show how a ray can be linked to xy-pixel coordinates of the query points. Those intersections additionally each have a red dot drawn on their pixel-x axis in the top diagram, connected with a blue dotted line to the corresponding orange dot. This connection between the orange and the red dot represents the query point. In the bottom diagram, the blue line is the mapping surface through the model, and the blue dots on it represent the query points. The four intersections for which the hypothetical cameras are drawn have query points that are indicated with bigger blue dots. The dots on the black dotted lines represent the pixels of the pictures the original cameras have taken. The two red ellipses represent example components from the model (by means of their region of influence, which corresponds to a cutoff Mahalanobis distance from their center point). The two fainter ellipses are drawn such that they touch the mapping surface (i.e., they represent the locus of points at the minimal Mahalanobis distance from any point on the mapping surface to the center of the component).
Figure A3
Auxiliary lower-dimensional visualization of the approach taken in this work to calculate the Mahalanobis distance from a certain point to the center of a given component. Note that subfigure (a) represents the intersection of subfigure (b) along the mapping surface (indicated by the corresponding shared blue color), where the blue dots correspond on both figures. (a) The alpha-function γ_{s,i} of a single component in the virtual view. Pixel coordinates are marked with blue dots. The point s on the screen represents a point for which we want to reconstruct the color of the SMoE model. Note that this diagram is a 1D simplification, as, in reality, the screen is 2D. The point s_opt on the screen corresponds to the point where the Mahalanobis distance is minimal (and thus where the alpha function yields its highest value). The vector Δs is the difference vector between the s and s_opt vectors. (b) A visualization of how a single component, with center μ_i and covariance matrix R_i, yields its corresponding first-order Taylor approximation of q_s, denoted q_s′. The mapping surface visualizes the query points corresponding to the screen-space pixel coordinates of the virtual view. The gradient matrix G ∈ R^(4×2) is the linear factor between s ∈ R^2 and q_s(s) ∈ R^4 around s_opt. To evaluate the alpha-function of the SMoE component for every pixel of the virtual view, the Mahalanobis length of the vector Δx := q_s(s) − μ_i is required for every s. However, the proposed method actually computes q_s′(s) instead of q_s(s). Per component, the proposed method essentially breaks down the vector Δx into the constant vector Δq and a variable vector GΔs aligned with the mapping surface, where Δs := s − s_opt. The spacing of the query points is actually non-uniform, and yet the linear Taylor approximation q_s′ models the spacing of the query points as uniform. As such, a slight discrepancy can be observed between q_s′(s) and q_s(s). Similarly, the computed Mahalanobis cut-off distance of the region of influence will be slightly off. As such, the actual region of influence will be underestimated towards the center of the view and overestimated towards the edges of the view. This diagram also helps to illustrate why Δq and GΔs are orthogonal with respect to the covariance R_i. By mentally transforming the whole diagram linearly such that the drawn ellipses end up being perfect circles and the covariance becomes the identity matrix, one can imagine these two vectors are indeed orthogonal in the classical Euclidean way. This is precisely because the inner ellipse touches the mapping surface. This was achieved by finding the point q_opt on the mapping surface that minimized the Mahalanobis distance to the component.

Figure A4
Depth map rendered for the lone monk scene, along with the corresponding color rendition. Depth values are mapped onto a rotating color hue. As a side note, observe that the sky has incorrect depth, but this is a modeling artifact.
19 pages, 5207 KiB  
Article
Enhancing the Precision of Forest Growing Stock Volume in the Estonian National Forest Inventory with Different Predictive Techniques and Remote Sensing Data
by Temitope Olaoluwa Omoniyi and Allan Sims
Remote Sens. 2024, 16(20), 3794; https://doi.org/10.3390/rs16203794 - 12 Oct 2024
Viewed by 678
Abstract
Estimating forest growing stock volume (GSV) is crucial for forest growth and resource management, as it reflects forest productivity. National measurements are laborious and costly; however, integrating satellite data such as optical, Synthetic Aperture Radar (SAR), and airborne laser scanning (ALS) with National Forest Inventory (NFI) data and machine learning (ML) methods has transformed forest management. In this study, random forest (RF), support vector regression (SVR), and Extreme Gradient Boosting (XGBoost) were used to predict GSV using Estonian NFI data, Sentinel-2 imagery, and ALS point cloud data. Four variable combinations were tested: CO1 (vegetation indices and LiDAR), CO2 (vegetation indices and individual band reflectance), CO3 (LiDAR and individual band reflectance), and CO4 (a combination of vegetation indices, individual band reflectance, and LiDAR). Across Estonia’s geographical regions, RF consistently delivered the best performance. In the northwest (NW), the RF model achieved the best performance with the CO3 combination, having an R2 of 0.63 and an RMSE of 125.39 m3/plot. In the southwest (SW), the RF model also performed exceptionally well, achieving an R2 of 0.73 and an RMSE of 128.86 m3/plot with the CO4 variable combination. In the northeast (NE), the RF model outperformed other ML models, achieving an R2 of 0.64 and an RMSE of 133.77 m3/plot under the CO4 combination. Finally, in the southeast (SE) region, the best performance was achieved with the CO4 combination, yielding an R2 of 0.70 and an RMSE of 21,120.72 m3/plot. These results underscore RF’s precision in predicting GSV across diverse environments, though refining variable selection and improving tree species data could further enhance accuracy.
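As a rough illustration of the modelling setup described above (predicting plot-level GSV from combined vegetation-index, band-reflectance, and ALS features with random forest), the snippet below fits a scikit-learn RandomForestRegressor on synthetic placeholder features and reports R2 and RMSE. The feature construction is entirely hypothetical; it only shows the shape of the workflow, not the study's data, variables, or tuning.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
# Placeholder design matrix standing in for a "CO4"-style combination:
# vegetation indices + individual band reflectance + ALS height metrics per NFI plot.
X = rng.normal(size=(1000, 12))
y = X @ rng.normal(size=12) + rng.normal(scale=0.5, size=1000)  # synthetic GSV proxy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
rf = RandomForestRegressor(n_estimators=500, random_state=42).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print("R2:", r2_score(y_te, pred), "RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
```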
Figure 1
Cluster network (a) of the Estonian NFI permanent and temporary plots (2018–2022); cartogram of the elevation model of the land cover (b).

Figure 2
Methodology flowchart for this study.
Figure 3
Scatter plot of observed vs. predicted GSV values for the validation plots using the best predictive model. The symbols * and ** represent the CO3 and CO4 combinations, respectively. (a), (b), (c), and (d) denote the random forest-based models for the northwest, southwest, northeast, and southeast regions, respectively.

Figure A1
Variable importance plot using the best predictive model, where (a), (b), (c), and (d) denote the random forest-based models for the northwest, southwest, northeast, and southeast regions, respectively.
20 pages, 4626 KiB  
Article
Three-Dimensional Reconstruction of Indoor Scenes Based on Implicit Neural Representation
by Zhaoji Lin, Yutao Huang and Li Yao
J. Imaging 2024, 10(9), 231; https://doi.org/10.3390/jimaging10090231 - 16 Sep 2024
Viewed by 811
Abstract
Reconstructing 3D indoor scenes from 2D images has always been an important task in computer vision and graphics applications. For indoor scenes, traditional 3D reconstruction methods have problems such as missing surface details, poor reconstruction of large plane textures and uneven illumination areas, and many wrongly reconstructed floating debris noises in the reconstructed models. This paper proposes a 3D reconstruction method for indoor scenes that combines neural radiance fields (NeRFs) and signed distance function (SDF) implicit expressions. The volume density of the NeRF is used to provide geometric information for the SDF field, and the learning of geometric shapes and surfaces is strengthened by adding an adaptive normal prior optimization learning process. It not only preserves the high-quality geometric information of the NeRF, but also uses the SDF to generate an explicit mesh with a smooth surface, significantly improving the reconstruction quality of large plane textures and uneven illumination areas in indoor scenes. At the same time, a new regularization term is designed to constrain the weight distribution, making it an ideal unimodal compact distribution, thereby alleviating the problem of uneven density distribution and achieving the effect of floating debris removal in the final model. Experiments show that the 3D reconstruction effect of this paper on the ScanNet, Hypersim, and Replica datasets outperforms the state-of-the-art methods.
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
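Two pieces of the method above can be sketched generically: converting a signed distance value into a volume density so that NeRF-style volume rendering can supervise the SDF, and a regularizer that pushes the per-ray weight distribution toward a compact, unimodal shape to suppress floating debris. The code below uses a VolSDF-style Laplace mapping and a Mip-NeRF-360-style distortion term as stand-ins; the paper's actual formulations may differ, and all names are assumptions.

```python
import numpy as np

def sdf_to_density(sdf, beta=0.1):
    """Map a signed distance to volume density via a Laplace CDF (VolSDF-style).

    One common choice for coupling SDF geometry with NeRF volume rendering;
    illustrative only, not necessarily the mapping used in the paper.
    """
    alpha = 1.0 / beta
    return alpha * np.where(sdf <= 0,
                            1.0 - 0.5 * np.exp(sdf / beta),
                            0.5 * np.exp(-sdf / beta))

def distortion_loss(weights, t_vals):
    """Distortion-style regularizer that encourages a compact, unimodal weight
    distribution along each ray (sketch of the kind of constraint used to remove
    floating debris; weights has one entry per sampling interval, len(t_vals) - 1)."""
    mid = 0.5 * (t_vals[:-1] + t_vals[1:])          # interval midpoints
    pairwise = np.abs(mid[:, None] - mid[None, :])
    loss = np.sum(weights[:, None] * weights[None, :] * pairwise)
    loss += np.sum(weights ** 2 * (t_vals[1:] - t_vals[:-1])) / 3.0
    return loss
```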
Figure 1
(a) Distortion of reconstructed 3D models under uneven lighting conditions, enclosed by the red dashed box; (b) distortion of the 3D reconstruction of smooth planar texture areas, enclosed by the red dashed box; (c) floating debris noise in the red box in the 3D reconstruction.

Figure 2
Overall framework of the method.

Figure 3
(a) The normal estimation is inaccurate in some fine structures enclosed by the red dashed box, such as chair legs, based on the TiltedSN normal estimation module; (b) we use an adaptive normal prior method to derive accurate normals based on the consistency of adjacent images. In the red dashed box, the fine structures are accurately reconstructed.
Figure 4
Neural implicit reconstruction process.

Figure 5
Distribution diagram of distance and weight values between sampling points.

Figure 6
Three-dimensional model reconstructed from scenes in the ScanNet dataset. (a) Comparison of 3D models; (b) comparison of the specific details in the red dashed box.
Figure 7
Qualitative comparison for thin structure areas using the ScanNet dataset: (a) reference image; (b) model reconstructed without using the normal prior; (c) model reconstructed with the normal prior and without the adaptive scheme; (d) model reconstructed with the normal prior and adaptive scheme.

Figure 8
Qualitative comparison for reflective areas using the Hypersim dataset: (a) reference image; (b) model reconstructed without using the normal prior; (c) model reconstructed with the normal prior and without the adaptive scheme; (d) model reconstructed with the normal prior and adaptive scheme.

Figure 9
Visual comparison for a scene with a large amount of floating debris using the ScanNet dataset: (a) reconstruction result without adding the distortion loss function; (b) reconstruction result with the distortion loss function.

Figure 10
Visual comparison for a scene with single floating debris areas (enclosed by the red dashed box) using the ScanNet dataset: (a) reconstruction result without adding the distortion loss function; (b) reconstruction result with the distortion loss function.

Figure 11
The limitations of this method in the 3D reconstruction of scenes with clutter, occlusion, soft non-solid objects, and blurred images, using the ScanNet dataset.
13 pages, 11191 KiB  
Article
The Adaption of Recent New Concepts in Neural Radiance Fields and Their Role for High-Fidelity Volume Reconstruction in Medical Images
by Haill An, Jawad Khan, Suhyeon Kim, Junseo Choi and Younhyun Jung
Sensors 2024, 24(18), 5923; https://doi.org/10.3390/s24185923 - 12 Sep 2024
Viewed by 965
Abstract
Volume reconstruction techniques are gaining increasing interest in medical domains due to their potential to learn complex 3D structural information from sparse 2D images. Recently, neural radiance fields (NeRF), which implicitly model continuous radiance fields based on multi-layer perceptrons to enable volume reconstruction of objects at arbitrary resolution, have gained traction in natural image volume reconstruction. However, the direct application of NeRF to medical volume reconstruction presents unique challenges due to differences in imaging principles, internal structure requirements, and boundary delineation. In this paper, we evaluate different NeRF techniques developed for natural images, including sampling strategies, feature encoding, and the use of complementary features, by applying them to medical images. We evaluate three state-of-the-art NeRF techniques on four datasets of medical images of different complexity. Our goal is to identify the strengths, limitations, and future directions for integrating NeRF into the medical domain.
(This article belongs to the Special Issue Biomedical Sensing System Based on Image Analysis)
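One of the NeRF design axes the paper evaluates is feature encoding. As a reference point, the snippet below implements the vanilla-NeRF frequency (positional) encoding that hash-grid encodings such as Instant-NGP replace; it is included only to make the compared techniques concrete, not as the paper's code.

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Vanilla-NeRF frequency encoding: gamma(x) = (sin(2^k * pi * x), cos(2^k * pi * x))
    for k = 0..num_freqs-1, applied per input dimension.

    x: array of shape (..., D); returns an array of shape (..., D * 2 * num_freqs).
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    scaled = x[..., None] * freqs                       # (..., D, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)
```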
Figure 1
The architecture overview of the vanilla NeRF (a) and the three variations (b–d).

Figure 2
Visual comparison of four NeRF variations—vanilla NeRF, MipNeRF, Instant-NGP, and PixelNeRF—to the traditional FDK using four medical image slices (each row).
26 pages, 22671 KiB  
Review
A Brief Review on Differentiable Rendering: Recent Advances and Challenges
by Ruicheng Gao and Yue Qi
Electronics 2024, 13(17), 3546; https://doi.org/10.3390/electronics13173546 - 6 Sep 2024
Cited by 1 | Viewed by 1529
Abstract
Differentiable rendering techniques have received significant attention from both industry and academia for novel view synthesis or for reconstructing shapes and materials from one or multiple input photographs. These techniques are used to propagate gradients from image pixel colors back to scene parameters. The obtained gradients can then be used in various optimization algorithms to reconstruct the scene representation or can be further propagated into a neural network to learn the scene’s neural representations. In this work, we provide a brief taxonomy of existing popular differentiable rendering methods, categorizing them based on the primary rendering algorithms employed: physics-based differentiable rendering (PBDR), methods based on neural radiance fields (NeRFs), and methods based on 3D Gaussian splatting (3DGS). Since there are already several reviews for NeRF-based or 3DGS-based differentiable rendering methods but almost none for physics-based differentiable rendering, we place our main focus on PBDR and, for completeness, only review several improvements made for NeRF and 3DGS in this survey. Specifically, we provide introductions to the theories behind all three categories of methods, a benchmark comparison of the performance of influential works across different aspects, and a summary of the current state and open research problems. With this survey, we seek to welcome new researchers to the field of differentiable rendering, offer a useful reference for key influential works, and inspire future research through our concluding section.
(This article belongs to the Section Computer Science & Engineering)
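The core mechanism described above, propagating gradients from pixel colors back to scene parameters, can be shown with a toy differentiable shading model and automatic differentiation. The PyTorch snippet below optimizes a hypothetical albedo and surface normal from a single target pixel; it is a minimal illustration of the gradient flow, not any of the surveyed PBDR, NeRF, or 3DGS systems.

```python
import torch

# Scene parameters we want gradients for: an unknown albedo and surface normal.
albedo = torch.tensor([0.2, 0.2, 0.2], requires_grad=True)
normal = torch.tensor([0.0, 0.3, 1.0], requires_grad=True)
light_dir = torch.tensor([0.0, 0.0, 1.0])   # fixed directional light (toy setup)
target = torch.tensor([0.7, 0.5, 0.3])      # observed pixel color

opt = torch.optim.Adam([albedo, normal], lr=0.05)
for _ in range(200):
    n = normal / normal.norm()
    # Simple Lambertian shading: pixel color = albedo * max(n · l, 0)
    pixel = albedo * torch.clamp(torch.dot(n, light_dir), min=0.0)
    loss = ((pixel - target) ** 2).sum()    # pixel-color loss
    opt.zero_grad()
    loss.backward()                         # gradients flow back to albedo and normal
    opt.step()
```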
Figure 1
A brief taxonomy of existing differentiable rendering methods. Figures come from [5,6,10,11].

Figure 2
Selected representative works in the area of differentiable rendering [9,10,11,14,15,16,18,19,20,27,28,29,30,31,32,33,34,35,36,37,38,39,40].
Full article ">Figure 3
<p>Reconstruction of geometry from one photograph by [<a href="#B14-electronics-13-03546" class="html-bibr">14</a>].</p>
Full article ">Figure 4
<p>Three sources of triangle edges that cause discontinuities in the scene function by Zhang et al. [<a href="#B12-electronics-13-03546" class="html-bibr">12</a>].</p>
Full article ">Figure 5
<p>Comparison of the effectiveness between algorithms from Zhang et al. [<a href="#B11-electronics-13-03546" class="html-bibr">11</a>] and Li et al. [<a href="#B9-electronics-13-03546" class="html-bibr">9</a>] under an equal−sample configuration. Images in the left column visualize the overall scene configurations. The method of Li et al. achieves lower variance with less computation time. The figures come from [<a href="#B11-electronics-13-03546" class="html-bibr">11</a>].</p>
Full article ">Figure 6
<p>Comparison of the effectiveness between algorithms from Yan et al. [<a href="#B41-electronics-13-03546" class="html-bibr">41</a>] and Zhang et al. [<a href="#B11-electronics-13-03546" class="html-bibr">11</a>] under an equal−time configuration. The adaptive data structures employed by Yan et al. [<a href="#B41-electronics-13-03546" class="html-bibr">41</a>] provide better exploration of the boundary sample space, resulting in lower variance for the gradient estimates. Figures come from [<a href="#B41-electronics-13-03546" class="html-bibr">41</a>].</p>
Full article ">Figure 7
<p>Comparison of the effectiveness between algorithms from Zhang et al. [<a href="#B14-electronics-13-03546" class="html-bibr">14</a>] and Yan et al. [<a href="#B41-electronics-13-03546" class="html-bibr">41</a>] under an equal−time configuration. The results show that the calculation of the boundary integral can greatly benefit from the information gathered during the simulation of the interior term. Figures come from [<a href="#B14-electronics-13-03546" class="html-bibr">14</a>].</p>
Full article ">Figure 8
<p>(<b>a</b>) Use of spherical rotation to approximate the movement of geometric discontinuities within a small spherical domain. (<b>b</b>) Use of spherical convolution to transform the large function support case into a small support case by Loubet et al. [<a href="#B15-electronics-13-03546" class="html-bibr">15</a>].</p>
Full article ">Figure 9
<p>Comparison of the effectiveness between algorithms from Bangaru et al. [<a href="#B10-electronics-13-03546" class="html-bibr">10</a>] and Loubet et al. [<a href="#B15-electronics-13-03546" class="html-bibr">15</a>]. Note that the results from Loubet et al. [<a href="#B15-electronics-13-03546" class="html-bibr">15</a>] exhibit high bias, while those from Bangaru et al. [<a href="#B10-electronics-13-03546" class="html-bibr">10</a>] closely match the reference. Figures come from [<a href="#B10-electronics-13-03546" class="html-bibr">10</a>].</p>
Full article ">Figure 10
<p>Comparison of the effectiveness between algorithms from Xu et al. [<a href="#B16-electronics-13-03546" class="html-bibr">16</a>] and Bangaru et al. [<a href="#B10-electronics-13-03546" class="html-bibr">10</a>] under an equal-sample configuration. The results from Xu et al. [<a href="#B16-electronics-13-03546" class="html-bibr">16</a>] exhibit lower variance due to the new formulation for reparameterized differential path integrals and the new distance function proposed. Figures come from [<a href="#B16-electronics-13-03546" class="html-bibr">16</a>].</p>
Full article ">Figure 11
<p>Overview of the NeRF rendering process: (<b>a</b>) select a series of sampling points along camera rays, (<b>b</b>) output the color and volume density of the sampling points using the underlying neural network, (<b>c</b>) calculate individual pixel colors via Equation (25), and (<b>d</b>) compare the predicted image with the reference image and optimize the parameters of the underlying neural network. Figures come from [<a href="#B5-electronics-13-03546" class="html-bibr">5</a>].</p>
Full article ">Figure 12
<p>Comparison results of four methods—NeRF [<a href="#B5-electronics-13-03546" class="html-bibr">5</a>], MetaNeRF [<a href="#B64-electronics-13-03546" class="html-bibr">64</a>], PixelNeRF [<a href="#B28-electronics-13-03546" class="html-bibr">28</a>], and DS-NeRF [<a href="#B27-electronics-13-03546" class="html-bibr">27</a>]—on the DTU [<a href="#B55-electronics-13-03546" class="html-bibr">55</a>] and Redwood [<a href="#B57-electronics-13-03546" class="html-bibr">57</a>] datasets with sparse input views. Note that DS-NeRF performs the best, except on DTU when given 3 views. Figures come from [<a href="#B27-electronics-13-03546" class="html-bibr">27</a>].</p>
Full article ">Figure 13
<p>Comparison results of four methods—NeRF [<a href="#B5-electronics-13-03546" class="html-bibr">5</a>], MetaNeRF [<a href="#B64-electronics-13-03546" class="html-bibr">64</a>], PixelNeRF [<a href="#B28-electronics-13-03546" class="html-bibr">28</a>], and DS-NeRF [<a href="#B27-electronics-13-03546" class="html-bibr">27</a>]—on the NeRF Synthetic dataset [<a href="#B5-electronics-13-03546" class="html-bibr">5</a>] with sparse input views. Note that DS-NeRF performs the best in all cases. Figures come from [<a href="#B27-electronics-13-03546" class="html-bibr">27</a>].</p>
Full article ">Figure 14
<p>Comparison results between FreeNeRF [<a href="#B29-electronics-13-03546" class="html-bibr">29</a>] and RegNeRF [<a href="#B65-electronics-13-03546" class="html-bibr">65</a>] on the DTU [<a href="#B55-electronics-13-03546" class="html-bibr">55</a>] dataset. Note that FreeNeRF performs better than RegNeRF on fine-grained details. Figures come from [<a href="#B29-electronics-13-03546" class="html-bibr">29</a>].</p>
Full article ">Figure 15
<p>Comparison results between FreeNeRF [<a href="#B29-electronics-13-03546" class="html-bibr">29</a>] and RegNeRF [<a href="#B65-electronics-13-03546" class="html-bibr">65</a>] on the LLFF [<a href="#B56-electronics-13-03546" class="html-bibr">56</a>] dataset. Note that FreeNeRF reconstructs less-noisy occupancy fields with fewer floaters. Figures come from [<a href="#B29-electronics-13-03546" class="html-bibr">29</a>].</p>
Full article ">Figure 16
<p>Comparison results for per-scene optimization between Point-NeRF [<a href="#B34-electronics-13-03546" class="html-bibr">34</a>] and previous methods on the DTU [<a href="#B55-electronics-13-03546" class="html-bibr">55</a>] dataset. Note that Point-NeRF recovers texture details and geometrical structures more accurately than other methods. Figures come from [<a href="#B34-electronics-13-03546" class="html-bibr">34</a>].</p>
Full article ">Figure 17
<p>Comparison results between SA-GS [<a href="#B75-electronics-13-03546" class="html-bibr">75</a>] and previous methods under zoom-in and zoom-out settings. Note that the rendering results of SA-GS exhibit better anti-aliasing performance and scale consistency compared to other methods. Figures come from [<a href="#B75-electronics-13-03546" class="html-bibr">75</a>].</p>
Full article ">Figure 18
<p>Comparison results between VDGS [<a href="#B36-electronics-13-03546" class="html-bibr">36</a>] and original 3DGS [<a href="#B6-electronics-13-03546" class="html-bibr">6</a>] on Tanks and Temples [<a href="#B59-electronics-13-03546" class="html-bibr">59</a>] and Mip-NeRF 360 [<a href="#B58-electronics-13-03546" class="html-bibr">58</a>] datasets. Note that VDGS renders fewer artifacts in both datasets compared to original 3DGS. Figures come from [<a href="#B36-electronics-13-03546" class="html-bibr">36</a>].</p>
Full article ">Figure 19
<p>Pipeline from GeoGaussian [<a href="#B79-electronics-13-03546" class="html-bibr">79</a>]. A geometry-aware 3D Gaussian initialization strategy is proposed.</p>
Full article ">Figure 20
<p>Comparison results for novel view synthesis between GeoGaussian [<a href="#B79-electronics-13-03546" class="html-bibr">79</a>] and previous methods on the ICL-NUIM [<a href="#B60-electronics-13-03546" class="html-bibr">60</a>] dataset. Note that the artifacts present in the results of 3DGS and LightGS disappear in GeoGaussian. Figures come from [<a href="#B79-electronics-13-03546" class="html-bibr">79</a>].</p>
Full article ">Figure 21
<p>Reconstruction and editing result of SuGaR [<a href="#B20-electronics-13-03546" class="html-bibr">20</a>].</p>
Full article ">
15 pages, 3059 KiB  
Article
Preliminary Exploration of Low Frequency Low-Pressure Capacitively Coupled Ar-O2 Plasma
by Niaz Wali, Weiwen Xiao, Qayam Ud Din, Najeeb Ur Rehman, Chiyu Wang, Jiatong Ma, Wenjie Zhong and Qiwei Yang
Processes 2024, 12(9), 1858; https://doi.org/10.3390/pr12091858 - 31 Aug 2024
Viewed by 1016
Abstract
Non-thermal plasma, as an emergent technology, has received considerable attention for its wide range of applications in agriculture, material synthesis, and the biomedical field due to its low cost and portability. It has promising antimicrobial properties, making it a powerful tool for bacterial decontamination. However, traditional techniques for producing non-thermal plasma frequently rely on radiofrequency (RF) devices, which, despite their effectiveness, are intricate and expensive. This study focuses on generating Ar-O2 capacitively coupled plasma under vacuum conditions, utilizing a low-frequency alternating current (AC) power supply, and on evaluating the system’s antimicrobial efficacy. A single Langmuir probe diagnostic was used to assess the key plasma parameters, such as the electron density (ne), electron temperature (Te), and electron energy distribution function (EEDF). Experimental results showed that ne increases (from 7 × 10^15 m^−3 to 1.5 × 10^16 m^−3) with a rise in pressure and AC power. Similarly, the EEDF evolved into a bi-Maxwellian distribution with an increase in AC power, showing a higher population of low-energy electrons at higher power. Finally, the generated plasma was tested for antimicrobial treatment of Xanthomonas campestris pv. vesicatoria. It is noted that the plasma generated by the AC power supply, at a pressure of 0.5 mbar and a power of 400 W for 180 s, achieves a killing efficiency of 75%. This promising result highlights the capability of the suggested approach, which may be a budget-friendly and effective technique for eliminating microbes, with promising applications in agriculture, biomedicine, and food processing. Full article
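As a rough illustration of how a single Langmuir probe yields the electron temperature, the sketch below fits the exponential electron-retardation region of an I-V trace. The voltage window and the synthetic trace are placeholders rather than data from this study, and ne and the EEDF would require further standard processing (for example, the Druyvesteyn second-derivative method).

```python
# Schematic estimate of electron temperature from a Langmuir probe I-V trace.
# In the electron retardation region, I_e ~ exp(V / T_e[eV]), so T_e is the
# inverse slope of ln(I_e) versus probe voltage.
import numpy as np

def electron_temperature_eV(voltage, electron_current, v_min, v_max):
    """Fit ln(I_e) vs V over the retardation window [v_min, v_max]."""
    mask = (voltage >= v_min) & (voltage <= v_max) & (electron_current > 0)
    slope, _ = np.polyfit(voltage[mask], np.log(electron_current[mask]), 1)
    return 1.0 / slope  # T_e in eV

# Placeholder trace (not measured data): synthetic exponential region with T_e = 3 eV.
v = np.linspace(-20.0, 10.0, 300)
i_e = 1e-3 * np.exp(np.clip(v, None, 0.0) / 3.0)
print(electron_temperature_eV(v, i_e, v_min=-15.0, v_max=-2.0))  # prints ~3 eV
```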
Figure 1: Schematic diagram of the AC capacitively coupled experimental setup with an installed single Langmuir probe to study the low-frequency Ar-O2 plasma discharge.
Figure 2: (a) Current–voltage waveform; (b) the instantaneous power as a function of time. The experiment was conducted at 4% oxygen content, 100 W applied AC power, 6 kHz fixed frequency, and 0.5 mbar pressure. (c) The I-V characteristics of argon and Ar-O2 (O2, 4%) plasma generated by the AC power supply at ~400 W and 0.5 mbar.
Figure 3: (a) The evolution of n_e with different AC power for argon plasma; (b) the evolution of n_e with different AC power for Ar-O2 plasma at 0.5 mbar and 0.7 mbar, respectively. The grey bars show a plasma density peak when the AC power is at ~400 W and 0.5 mbar.
Figure 4: The evolution of n_e with different O2 contents at fixed AC power (~400 W) for two gas pressures (~0.3 mbar and ~0.5 mbar), and the evolution of n_e with different O2 contents at fixed RF power (~130 W) and fixed gas pressure (~0.3 mbar). Similar trends in density change are observed for both types of discharge. The RF curve is deduced from Figure 4c of Ref. [13].
Figure 5: (a) The evolution of T_e with different AC power at 0.5 mbar and 0.7 mbar, respectively; (b) a comparison of T_e in the discharges driven by the AC power supply and by the RF power supply. The RF curve is deduced from Figure 5c of Ref. [13].
Figure 6: (a) The EEDF at fixed AC input power; (b) the EEDF at fixed pressure. The blue arrow shows that the EEDF is non-Maxwellian.
Figure 7: Photographs of (a) an untreated X. c. pv. vesicatoria bacteria sample, (b) Ar-O2 plasma treatment, (c) treatment for 90 s, and (d) treatment for 180 s; (e) the number of surviving colonies after Ar-O2 (4% oxygen content) plasma treatment. The treatment was carried out at a fixed frequency of 6 kHz, 0.5 mbar pressure, and power of 400 W.
17 pages, 29032 KiB  
Article
Real-Time Dense Visual SLAM with Neural Factor Representation
by Weifeng Wei, Jie Wang, Xiaolong Xie, Jie Liu and Pengxiang Su
Electronics 2024, 13(16), 3332; https://doi.org/10.3390/electronics13163332 - 22 Aug 2024
Viewed by 929
Abstract
Developing a high-quality, real-time, dense visual SLAM system poses a significant challenge in the field of computer vision. NeRF introduces neural implicit representation, marking a notable advancement in visual SLAM research. However, existing neural implicit SLAM methods suffer from long runtimes and face challenges when modeling complex structures in scenes. In this paper, we propose a neural implicit dense visual SLAM method that enables high-quality real-time reconstruction even on a desktop PC. Firstly, we propose a novel neural scene representation, encoding the geometry and appearance information of the scene as a combination of the basis and coefficient factors. This representation allows for efficient memory usage and the accurate modeling of high-frequency detail regions. Secondly, we introduce feature integration rendering to significantly improve rendering speed while maintaining the quality of color rendering. Extensive experiments on synthetic and real-world datasets demonstrate that our method achieves an average improvement of more than 60% for Depth L1 and ATE RMSE compared to existing state-of-the-art methods when running at 9.8 Hz on a desktop PC with a 3.20 GHz Intel Core i9-12900K CPU and a single NVIDIA RTX 3090 GPU. This remarkable advancement highlights the crucial importance of our approach in the field of dense visual SLAM. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)
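To make the basis-and-coefficient idea concrete, the sketch below queries a single dense coefficient grid by trilinear interpolation and projects the result through a shared basis matrix to obtain a per-point feature. It is an illustrative simplification (one grid, arbitrary sizes), not the paper's implementation, which maintains separate geometry and appearance factor grids.

```python
# Minimal sketch of a factored scene feature query (coefficient x basis).
import torch
import torch.nn.functional as F

n_coeff, n_basis, res = 8, 16, 64
coeff_grid = torch.randn(1, n_coeff, res, res, res) * 0.01   # per-voxel coefficient factors
basis = torch.randn(n_coeff, n_basis) * 0.1                  # shared basis factors

def query_features(points):                                  # points: (P, 3) in [-1, 1]
    grid = points.view(1, -1, 1, 1, 3)                       # grid_sample expects (N, D, H, W, 3)
    coeffs = F.grid_sample(coeff_grid, grid, align_corners=True)  # (1, C, P, 1, 1)
    coeffs = coeffs.reshape(n_coeff, -1).t()                 # (P, C) interpolated coefficients
    return coeffs @ basis                                    # (P, n_basis) per-point features

feats = query_features(torch.rand(1024, 3) * 2 - 1)          # features for 1024 sample points
```

In a full system, the resulting features would be decoded by small MLPs into density or color before volume rendering, as in the pipeline the paper describes.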
Figure 1: Overview. (1) Scene representation: we use two different sets of factor grids to represent the scene geometry and appearance, respectively. To simplify the overview, the symbol * denotes both geometry, g, and appearance, a; e.g., b_*(x_i) can be either b_g(x_i) or b_a(x_i). For sample points along the ray, we query the basis and coefficient factors for depth and feature integration rendering. (2) Mapping process: we uniformly sample pixels from a selected set of keyframes and jointly optimize the scene representation and the camera poses of these keyframes. (3) Tracking process: the factor grids and MLPs remain fixed, and only the camera pose of each input frame in the RGB-D stream is estimated.
Figure 2: Comparison of qualitative reconstruction results on the Replica dataset [28].
Figure 3: Comparison of qualitative reconstruction results on the Replica dataset [28]; the untextured meshes are visualized.
Figure 4: Comparison of qualitative reconstruction results on ScanNet [29].
Figure 5: Qualitative reconstruction on the ScanNet dataset [29]; the untextured meshes are visualized.
Figure 6: Camera trajectories of (a) NICE-SLAM [6], (b) Co-SLAM [8], (c) ESLAM [7], and (d) our method for scene0207 from the ScanNet dataset [29].
Figure 7: Comparison of the relationship between map size and memory usage. Scenes from the Replica [28] and ScanNet [29] datasets are selected for evaluation.
Figure 8: Ablation results for feature integration rendering on Replica [28] room0.
24 pages, 1413 KiB  
Article
Loop Detection Method Based on Neural Radiance Field BoW Model for Visual Inertial Navigation of UAVs
by Xiaoyue Zhang, Yue Cui, Yanchao Ren, Guodong Duan and Huanrui Zhang
Remote Sens. 2024, 16(16), 3038; https://doi.org/10.3390/rs16163038 - 19 Aug 2024
Viewed by 681
Abstract
The loop closure detection (LCD) methods in Unmanned Aerial Vehicle (UAV) Visual Inertial Navigation Systems (VINS) are often affected by issues such as insufficient image texture information and limited observational perspectives, resulting in constrained UAV positioning accuracy and a reduced capability to perform complex tasks. This study proposes a Bag-of-Words (BoW) LCD method based on Neural Radiance Fields (NeRF), which estimates camera poses from existing images and achieves rapid scene reconstruction through NeRF. A method is designed to select virtual viewpoints and render images along the flight trajectory using a specific sampling approach to expand the limited observational angles, mitigating the impact of image blur and insufficient texture information at specific viewpoints while enlarging the set of loop closure candidate frames to improve the accuracy and success rate of LCD. Additionally, a BoW vector construction method that incorporates the importance of similar visual words, together with an adapted virtual image filtering and comprehensive scoring method, is designed to determine loop closures. Applied to VINS-Mono and ORB-SLAM3, and compared with the advanced BoW model LCDs of the two systems, the results indicate that the NeRF-based BoW LCD method can detect more than 48% additional accurate loop closures, while the mean navigation positioning error of the system is reduced by over 46%, validating the effectiveness and superiority of the proposed method and demonstrating its significance for improving the navigation accuracy of VINS. Full article
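The scoring component of a BoW LCD pipeline can be sketched as TF-IDF-weighted word histograms compared by cosine similarity, as below. The visual word IDs and the IDF table are placeholders (in practice they come from quantized local descriptors and a trained vocabulary), and the paper's additional weighting of similar visual words and its virtual-image filtering are not reproduced here.

```python
# Hedged sketch of BoW loop-closure scoring with TF-IDF weighting and cosine similarity.
import numpy as np

def bow_vector(word_ids, idf, vocab_size):
    tf = np.bincount(word_ids, minlength=vocab_size).astype(float)
    tf /= max(tf.sum(), 1.0)                       # term frequency
    v = tf * idf                                   # weight words by distinctiveness
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def loop_score(query_words, candidate_words, idf, vocab_size):
    q = bow_vector(query_words, idf, vocab_size)
    c = bow_vector(candidate_words, idf, vocab_size)
    return float(q @ c)                            # cosine similarity in [0, 1]

vocab_size = 1000
idf = np.log(500 / (1 + np.random.randint(1, 100, vocab_size)))  # placeholder IDF table
score = loop_score(np.random.randint(0, vocab_size, 300),
                   np.random.randint(0, vocab_size, 300), idf, vocab_size)
```

A candidate keyframe (real or rendered from a virtual viewpoint) whose score exceeds a chosen threshold would then be passed on to geometric verification.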
Figure 1: Framework of the BoW LCD method based on NeRF and its position in the VINS.
Figure 2: Positions of the central pixel and surrounding pixels.
Figure 3: Example of an Instant-NGP virtual-view camera pose.
Figure 4: Quadtree uniform feature point extraction.
Figure 5: Comparison of the reconstructed data of the three pose estimation schemes.
Figure 6: Feature matching between a real image (left) and a synthetic image (right).
Figure 7: The loop closure frame detection results for the two approaches used in VINS-Mono.
Figure 8: The loop closure frame detection results for the two approaches used in ORB-SLAM3.
Figure 9: Example of additional loop closure matching results.
Figure 10: The ground truth and the trajectories of the two methods in VINS-Mono.
Figure 11: APE statistics against image frame index in VINS-Mono.
Figure 12: The APE statistics of the BoW LCD method.
Figure 13: The APE statistics of the NeRF-based BoW model LCD method.
Figure 14: The distribution of the APE in the VINS-Mono system, shown as the color of the trajectory.
Figure 15: The ground truth and the trajectories of the two methods in ORB-SLAM3.
Figure 16: APE statistics against image frame index in ORB-SLAM3.
Figure 17: The APE statistics of the BoW LCD method in ORB-SLAM3.
Figure 18: The APE statistics of the NeRF-based BoW model LCD method in ORB-SLAM3.
Figure 19: The distribution of the APE in the ORB-SLAM3 system, shown as the color of the trajectory.
Figure 20: Distribution of detected loop closures as a function of the threshold r.
18 pages, 1937 KiB  
Article
Advancing Crayfish Disease Detection: A Comparative Study of Deep Learning and Canonical Machine Learning Techniques
by Yasin Atilkan, Berk Kirik, Koray Acici, Recep Benzer, Fatih Ekinci, Mehmet Serdar Guzel, Semra Benzer and Tunc Asuroglu
Appl. Sci. 2024, 14(14), 6211; https://doi.org/10.3390/app14146211 - 17 Jul 2024
Viewed by 925
Abstract
This study evaluates the effectiveness of deep learning and canonical machine learning models for detecting diseases in crayfish from an imbalanced dataset. In this study, measurements such as weight, size, and gender of healthy and diseased crayfish individuals were taken, and at least five photographs of each individual were used. Deep learning models outperformed canonical models, but combining both approaches proved the most effective. Utilizing the ResNet50 model for automatic feature extraction and subsequent training of the RF algorithm with these extracted features led to a hybrid model, RF-ResNet50, which achieved the highest performance in diseased sample detection. This result underscores the value of integrating canonical machine learning algorithms with deep learning models. Additionally, the ConvNeXt-T model, optimized with AdamW, performed better than those using SGD, although its disease detection sensitivity was 1.3% lower than the hybrid model. McNemar’s test confirmed the statistical significance of the performance differences between the hybrid and the ConvNeXt-T model with AdamW. The ResNet50 model’s performance was improved by 3.2% when combined with the RF algorithm, demonstrating the potential of hybrid approaches in enhancing disease detection accuracy. Overall, this study highlights the advantages of leveraging both deep learning and canonical machine learning techniques for early and accurate detection of diseases in crayfish populations, which is crucial for maintaining ecosystem balance and preventing population declines. Full article
(This article belongs to the Section Environmental Sciences)
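The hybrid RF-ResNet50 idea, a pretrained ResNet50 used as a fixed feature extractor whose outputs train a Random Forest, can be sketched as follows. The file names, labels, and hyperparameters are placeholders and do not reflect the study's dataset or tuning.

```python
# Illustrative sketch: ResNet50 features feeding a Random Forest classifier.
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.ensemble import RandomForestClassifier
from PIL import Image

resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()          # drop the classification head -> 2048-d features
resnet.eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

@torch.no_grad()
def extract_features(image_paths):
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in image_paths])
    return resnet(batch).numpy()         # (N, 2048) feature matrix

# Hypothetical file list and labels (0 = healthy, 1 = diseased); a toy two-image example.
X_train = extract_features(["crayfish_001.jpg", "crayfish_002.jpg"])
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced")
clf.fit(X_train, [0, 1])
```

Balanced class weights are one common way to handle the imbalance between healthy and diseased samples; the study's own balancing strategy may differ.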
Figure 1: General framework for the canonical machine learning and deep learning algorithms.
Figure 2: General framework for the hybrid algorithm.
Figure 3: The RF-ResNet50 hybrid model. The ResNet50 architecture was taken from [51].
20 pages, 3580 KiB  
Article
IW-NeRF: Using Implicit Watermarks to Protect the Copyright of Neural Radiation Fields
by Lifeng Chen, Chaoyue Song, Jia Liu, Wenquan Sun, Weina Dong and Fuqiang Di
Appl. Sci. 2024, 14(14), 6184; https://doi.org/10.3390/app14146184 - 16 Jul 2024
Viewed by 664
Abstract
The neural radiance field (NeRF) has demonstrated significant advancements in computer vision. However, the training process for NeRF models necessitates extensive computational resources and ample training data. In the event of unauthorized usage or theft of the model, substantial losses can be incurred by the copyright holder. To address this concern, we present a novel algorithm that leverages the implicit neural representation (INR) watermarking technique to safeguard NeRF model copyrights. By encoding the watermark information implicitly, we integrate its parameters into the NeRF model’s network using a unique key. Through this key, the copyright owner can extract the embedded watermarks from the NeRF model for ownership verification. To the best of our knowledge, this is the pioneering implementation of INR watermarking for the protection of NeRF model copyrights. Our experimental results substantiate that our approach not only offers robustness and preserves high-quality 3D reconstructions but also ensures the flawless (100%) extraction of watermark content, thereby effectively securing the copyright of the NeRF model. Full article
(This article belongs to the Special Issue Recent Advances in Multimedia Steganography and Watermarking)
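As a conceptual illustration of key-controlled embedding and extraction, the sketch below hides a parameter vector inside a carrier network's weights at positions derived from a secret key and recovers it with the same key. This simplified overwrite scheme is only meant to convey the key mechanism; the paper instead represents the watermark as an implicit neural representation and integrates its parameters via network expansion strategies.

```python
# Conceptual sketch (not the paper's exact scheme): key-controlled hide/recover of a
# small watermark parameter vector inside a carrier network's flattened weights.
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def embed(carrier: torch.nn.Module, wm_params: torch.Tensor, key: int) -> None:
    flat = parameters_to_vector(carrier.parameters()).detach()
    gen = torch.Generator().manual_seed(key)
    idx = torch.randperm(flat.numel(), generator=gen)[: wm_params.numel()]
    flat[idx] = wm_params                          # overwrite the key-selected weights
    vector_to_parameters(flat, carrier.parameters())

def extract(carrier: torch.nn.Module, wm_size: int, key: int) -> torch.Tensor:
    flat = parameters_to_vector(carrier.parameters()).detach()
    gen = torch.Generator().manual_seed(key)
    idx = torch.randperm(flat.numel(), generator=gen)[:wm_size]
    return flat[idx]

carrier_net = torch.nn.Sequential(torch.nn.Linear(63, 256), torch.nn.ReLU(),
                                  torch.nn.Linear(256, 4))   # stand-in for a NeRF MLP
watermark = torch.randn(128)                                  # e.g., weights of a tiny watermark INR
embed(carrier_net, watermark, key=42)
assert torch.allclose(extract(carrier_net, 128, key=42), watermark)
```

Without the correct key, the selected indices differ, so the recovered vector is effectively noise, which mirrors the key-dependence the paper relies on for ownership verification.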
Figure 1: The network structure of the NeRF model.
Figure 2: The reconstruction quality of the NeRF model under different pruning rates.
Figure 3: The proposed implicit representation watermarking algorithm tailored for NeRFs. The approach encapsulates the watermark information implicitly within the NeRF model through a specified key, facilitating subsequent watermark extraction using the same key.
Figure 4: The overall framework of our algorithm, IW-NeRF.
Figure 5: Three expansion strategies for constructing carrier networks.
Figure 6: Comparison of the reconstruction quality of NeRF models against different baselines.
Figure 7: The NeRF reconstruction and watermark extraction results of our scheme.
Figure 8: The reconstruction and watermark extraction performance of NeRF models with different numbers of watermark network layers and a fixed number of carrier network layers.
Figure 9: The SSIM between the images rendered by the carrier NeRF model and the images in the original NeRF dataset for different numbers of watermark network layers.
Figure 10: The 3D reconstruction quality of the NeRF model with different numbers of carrier network layers and a fixed number of watermark network layers.
Figure 11: SSIM values between the images rendered by the carrier NeRF model and the carrier dataset images for different numbers of carrier network layers.
Figure 12: The effect of different pruning methods on watermark extraction.
Figure 13: SSIM values between the original watermark image and the extracted watermark image under different pruning methods.
Figure 14: Comparison of the weight visualizations of the watermark network and the carrier network.
Figure 15: The average PSNR between the original watermark and the extracted watermark for keys with different numbers of incorrect bits.
Figure 16: Visual results of extracting watermarks under different numbers of key error bits.
34 pages, 14681 KiB  
Article
Performance Evaluation and Optimization of 3D Models from Low-Cost 3D Scanning Technologies for Virtual Reality and Metaverse E-Commerce
by Rubén Grande, Javier Albusac, David Vallejo, Carlos Glez-Morcillo and José Jesús Castro-Schez
Appl. Sci. 2024, 14(14), 6037; https://doi.org/10.3390/app14146037 - 10 Jul 2024
Cited by 2 | Viewed by 1559
Abstract
Virtual Reality (VR) is and will be a key driver in the evolution of e-commerce, providing an immersive and gamified shopping experience. However, for VR shopping spaces to become a reality, retailers’ product catalogues must first be digitised into 3D models. While this may be a simple task for retail giants, it can be a major obstacle for small retailers, whose human and financial resources are often more limited, making them less competitive. Therefore, this paper presents an analysis of low-cost scanning technologies for small business owners to digitise their products and make them available on VR shopping platforms, with the aim of helping improve the competitiveness of small businesses through VR and Artificial Intelligence (AI). The technologies considered are photogrammetry, LiDAR sensors and NeRF. In addition to investigating which technology provides the best visual quality of 3D models based on metrics and quantitative results, these models must also offer good performance in commercial VR headsets. In this way, we also analyse the performance of such models when running on Meta Quest 2, Quest Pro and Quest 3 headsets (Reality Labs, CA, USA) to determine their feasibility and provide use cases for each type of model from a scalability point of view. Finally, our work describes a model optimisation process that reduces the polygon count and texture size of high-poly models, converting them into more performance-friendly versions without significantly compromising visual quality. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
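One way to perform the kind of polygon-count reduction the paper describes is quadric-edge-collapse decimation; the sketch below uses Open3D for this. The file names and the 50,000-triangle budget are placeholder assumptions, and texture downscaling would be handled as a separate step.

```python
# Hedged sketch: reduce the polygon count of a scanned model with Open3D's quadric decimation.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("scanned_product_highpoly.obj")  # hypothetical input file
mesh.remove_duplicated_vertices()
mesh.remove_degenerate_triangles()

target_triangles = 50_000                      # assumed budget for standalone VR headsets
simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=target_triangles)
simplified.compute_vertex_normals()            # recompute shading normals after decimation

o3d.io.write_triangle_mesh("scanned_product_optimised.obj", simplified)
print(len(mesh.triangles), "->", len(simplified.triangles), "triangles")
```

The triangle budget would normally be chosen per device tier, since the paper's performance results differ between Quest 2, Quest Pro and Quest 3.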
Figure 1: Set of 6 expanded polystyrene objects used.
Figure 2: Object based on a mirror material. (a) Decorative candle with a mirror; (b) 3D model of the decorative candle with a mirror in Blender.
Figure 3: Collage of the 5 objects scanned in real life.
Figure 4: Collage of the low-poly 3D models from Luma AI.
Figure 5: Comparison of 3D models performed using CloudCompare 2.9.3. (a) High-poly shoes as reference and low-poly shoes as compared; (b) high-poly teddy as reference and Polycam (Android version) teddy as compared.
Figure 6: High-poly and low-poly meshes generated by Luma AI from one of the objects.
Figure 7: Comparison of details between the Luma AI low-poly model, Polycam on the Xiaomi phone, Polycam on the iPad, and the real object. (a) Shoe model generated with the Xiaomi phone (Polycam); (b) low-poly shoe model generated with Luma AI; (c) shoe model generated with the iPad (Polycam); (d) shoes in real life.
Figure 8: Two of the scenes developed for analysing the performance of the models. (a) Simple scene (1 object of each model) of low-poly Luma AI models; (b) 12x scene (12 objects of each model) of low-poly Luma AI models.
Figure 9: Average FPS obtained for each device and type of object. The number of polygons is the total for the objects loaded in each scene. LP = low-poly objects, MP = medium-poly objects, HP = high-poly objects.
Figure 10: Meta Quest 3 bottleneck identification based on frame rate and App T.
Figure 11: CPU utilization (%) obtained for each scene run on each VR headset. This metric represents the worst-performing core; therefore, for multi-threaded applications, the main thread of the app may not be represented in this metric.
Figure 12: CPU and GPU levels for each scene run on each VR headset.
Figure 13: High-poly (left) and optimised (right) models of the octopus teddy, showing the differences in both meshes and rendering.
Figure 14: Visual quality comparison between high-poly models (red bars) and models optimised from high-poly (blue bars). In the fifth subgraph, the red bar represents data obtained from low-poly models. The yellow text box contains the percentage difference between high-poly and optimised models.
17 pages, 9818 KiB  
Article
Constraining the Geometry of NeRFs for Accurate DSM Generation from Multi-View Satellite Images
by Qifeng Wan, Yuzheng Guan, Qiang Zhao, Xiang Wen and Jiangfeng She
ISPRS Int. J. Geo-Inf. 2024, 13(7), 243; https://doi.org/10.3390/ijgi13070243 - 8 Jul 2024
Viewed by 1298
Abstract
Neural Radiance Fields (NeRFs) are an emerging approach to 3D reconstruction that uses neural networks to reconstruct scenes. However, their application to multi-view satellite photogrammetry, which aims to reconstruct the Earth’s surface, struggles to acquire accurate digital surface models (DSMs). To address this issue, a novel framework, the Geometric Constrained Neural Radiance Field (GC-NeRF), tailored for multi-view satellite photogrammetry, is proposed. GC-NeRF achieves higher DSM accuracy from multi-view satellite images. The key point of this approach is a geometric loss term, which constrains the scene geometry by making the scene surface thinner. The geometric loss term, alongside z-axis scene stretching and multi-view DSM fusion strategies, greatly improves the accuracy of the generated DSMs. During training, bundle-adjustment-refined satellite camera models are used to cast rays through the scene. To avoid the additional input of altitude bounds described in previous works, the sparse point cloud resulting from the bundle adjustment is converted into an occupancy grid to guide the ray sampling. Experiments on WorldView-3 images indicate GC-NeRF’s superiority in accurate DSM generation from multi-view satellite images. Full article
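A plausible form of a "thin surface" geometric constraint is to penalize how widely the volume-rendering weights spread around each ray's rendered depth; the sketch below implements that weight-variance penalty. It is an illustration of the idea under that assumption, not necessarily the exact loss used by GC-NeRF.

```python
# Hedged sketch of a geometric regularizer that concentrates ray weights near the surface.
import torch

def geometric_loss(weights: torch.Tensor, t_vals: torch.Tensor) -> torch.Tensor:
    """weights, t_vals: (n_rays, n_samples); weights are volume-rendering weights."""
    w = weights / (weights.sum(dim=-1, keepdim=True) + 1e-8)   # normalize per ray
    depth = (w * t_vals).sum(dim=-1, keepdim=True)             # expected depth along the ray
    spread = (w * (t_vals - depth) ** 2).sum(dim=-1)           # variance of samples about that depth
    return spread.mean()

weights = torch.rand(4, 64)                                     # placeholder weights for 4 rays
t_vals = torch.linspace(0.0, 1.0, 64).expand(4, 64)             # sample depths along each ray
loss_g = geometric_loss(weights, t_vals)   # would be added to the photometric loss with some weight
```

Minimizing such a term pushes each ray's weight mass toward a single depth, which is one way to realize the "thinner surface" behaviour the abstract describes.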
Figure 1: Overview of GC-NeRF. GC-NeRF uses satellite images and corresponding satellite camera models to reconstruct scenes and generate accurate DSMs. The key contributions include z-axis scene stretching, an occupancy grid converted from sparse point clouds, DSM fusion, and a geometric loss term for network training.
Figure 2: Z-axis scene stretching. (a) The original satellite scene is flat. (b) A suitably stretched scene makes full use of the multi-resolution hash encodings. (c) An over-stretched scene presents excessive hash conflicts.
Figure 3: The architecture of the GC-NeRF network. The model receives the 3D spatial coordinate X, the sun direction d_sun, and the viewing direction d_view as inputs to predict the volume density sigma_X and color c_X at X.
Figure 4: Converting and updating the occupancy grid. (a) The point cloud is obtained for free from the bundle adjustment. (a,b) The point cloud is converted into a bit occupancy grid. (b,c) The bit grid cells are classified by the float grid cells. (c,d) The float grid cells are updated with the alpha value predicted by the network.
Figure 5: (a) At the same camera parameter error angle theta, the geometric error is greater in the satellite scene. (b) Without Loss_g, the sample weight distribution is scattered around the true depth, resulting in significant errors in depth estimation; by contrast, with Loss_g the weight distribution is compact around the true depth.
Figure 6: The relationship between positive DSM errors in the merged point clouds and the root mean square error (RMSE) of the multi-view DSMs. The predicted elevation is greater than the actual elevation in most areas with a large elevation standard deviation.
Figure 7: Visualization of 3D models derived by superimposing DSMs onto images. The DSMs and images are generated by GC-NeRF.
Figure 8: Images rendered by S-NeRF, Sat-NeRF, and GC-NeRF. (a,b) Sat-NeRF is robust to transient phenomena such as cars. (c,d) GC-NeRF renders clearer images.
Figure 9: Visualization of the lidar, S-NeRF, Sat-NeRF, and GC-NeRF DSMs. Areas affected by water and building changes are masked. (a,b) The DSM rendered by GC-NeRF shows that the road is uneven compared to the lidar DSM. (c,d) The GC-NeRF DSM displays sharper building edges than the Sat-NeRF DSM. (e,f) The DSM quality of GC-NeRF is superior to that of S-NeRF.
Figure 10: The MAE of the DSMs under different s_z. A low MAE indicates high DSM accuracy; the best s_z value is around 0.8.