Search Results (100)

Search Parameters:
Keywords = NeRF

22 pages, 1773 KiB  
Article
Impact of Data Capture Methods on 3D Reconstruction with Gaussian Splatting
by Dimitar Rangelov, Sierd Waanders, Kars Waanders, Maurice van Keulen and Radoslav Miltchev
J. Imaging 2025, 11(2), 65; https://doi.org/10.3390/jimaging11020065 - 18 Feb 2025
Abstract
This study examines how different filming techniques can enhance the quality of 3D reconstructions, with a particular focus on their use in indoor crime scene investigations. Using Neural Radiance Fields (NeRF) and Gaussian Splatting, we explored how factors like camera orientation, filming speed, data layering, and scanning path affect the detail and clarity of 3D reconstructions. Through experiments in a mock crime scene apartment, we identified optimal filming methods that reduce noise and artifacts, delivering clearer and more accurate reconstructions. Filming in landscape mode, at a slower speed, with at least three layers and focused on key objects produced the most effective results. These insights provide valuable guidelines for professionals in forensics, architecture, and cultural heritage preservation, helping them capture realistic, high-quality 3D representations. This study also highlights the potential for future research to expand on these findings by exploring other algorithms, camera parameters, and real-time adjustment techniques.
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
25 pages, 8016 KiB  
Article
High-Fold 3D Gaussian Splatting Model Pruning Method Assisted by Opacity
by Shiyu Qiu, Chunlei Wu, Zhenghao Wan and Siyuan Tong
Appl. Sci. 2025, 15(3), 1535; https://doi.org/10.3390/app15031535 - 3 Feb 2025
Viewed by 681
Abstract
Recent advancements in 3D scene representation have underscored the potential of Neural Radiance Fields (NeRFs) for producing high-fidelity renderings of complex scenes. However, NeRFs are hindered by the significant computational burden of volumetric rendering. To address this, 3D Gaussian Splatting (3DGS) has emerged as an efficient alternative, utilizing Gaussian-based representations and rasterization techniques to achieve faster rendering speeds without sacrificing image quality. Despite these advantages, the large number of Gaussian points and associated internal parameters result in high storage demands. To address this challenge, we propose a pruning strategy applied during the Gaussian densification and pruning phases. Our approach integrates learnable Gaussian masks with a contribution-based pruning mechanism, further enhanced by an opacity update strategy to facilitate the pruning process. This method effectively eliminates redundant Gaussian points and those with minimal contributions to scene construction. Additionally, during the Gaussian parameter compression phase, we employ a combination of teacher–student models and vector quantization to compress the spherical harmonic coefficients. Extensive experimental results demonstrate that our approach reduces the storage requirements of original 3D Gaussian models by over 30 times, with only a minor degradation in rendering quality.
(This article belongs to the Special Issue Technical Advances in 3D Reconstruction)
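For readers unfamiliar with the codebook step described in this abstract (and in Figure 3 below), here is a minimal sketch of K-means vector quantization of spherical-harmonic (SH) coefficients. The array sizes, codebook size, and library choice are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch: build an SH codebook with K-means and store one index per Gaussian.
import numpy as np
from sklearn.cluster import KMeans

def quantize_sh(sh_coeffs: np.ndarray, k: int = 256):
    """Cluster per-Gaussian SH vectors and replace each with a codebook index."""
    kmeans = KMeans(n_clusters=k, n_init=4, random_state=0).fit(sh_coeffs)
    codebook = kmeans.cluster_centers_           # (k, d) representative SH vectors
    indices = kmeans.labels_.astype(np.uint16)   # one codebook index per Gaussian
    return codebook, indices

# Illustrative numbers: 10k Gaussians, 45 higher-order SH coefficients each.
sh = np.random.randn(10_000, 45).astype(np.float32)
codebook, idx = quantize_sh(sh, k=256)
print(f"compression ratio ~ {sh.nbytes / (codebook.nbytes + idx.nbytes):.1f}x")
```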
Show Figures

Figure 1: Illustration of the proposed Gaussian compression framework. First, the 3D Gaussian points are initialized using Structure from Motion (SfM). During densification, an opacity-based strategy guides the cloning of Gaussian points. In the pruning stage, a learnable Gaussian mask identifies and masks redundant Gaussian points, which are subsequently pruned at set intervals based on their contribution to image rendering. To address the high memory usage of spherical harmonic coefficients, knowledge distillation and vector quantization are applied, yielding the final compressed model.
Figure 2: (left) Comparison of the original update policy with the opacity update strategy. (right) A flowchart of the Gaussian mask integration, illustrating the relationship between opacity, density, and pruning choices.
Figure 3: The process of vector quantization of SH coefficients in the 3DGS model. K-means iteratively assigns 3DGS points to clusters so as to minimize the distance between the SH vectors and their corresponding centroids; compression is then achieved by substituting each original vector with the index of the nearest representative vector in the SH codebook.
Figure 4: Visualization of the reconstructed results. Our method significantly reduces the size of 3D models without noticeably affecting the perceived rendering quality.
Figure 5: Visual comparison of the original image, the LightGaussian rendering, and our model's rendering.
Figure 6: Visualization of the reduced number of Gaussian points. The Gaussians are reduced to their center points for display, showing that our method effectively lowers the point count.
Figure 7: Compression ratio and visualization at different stages on the Horse dataset.
Figure 8: Influence curve of the opacity update strategy on the number of Gaussian points.
Figure 9: Pseudocode of the overall pipeline.
Figure 10: Pseudocode of Gaussian quantity compression.
Figure 11: Relationship between PSNR, SSIM, LPIPS, and model size during training of our method and 3DGS.
Figure 12: ANOVA comparison of different Gaussian compression methods.
Figure 13: Line chart of memory usage during training on selected datasets.
15 pages, 2549 KiB  
Article
SRNeRF: Super-Resolution Neural Radiance Fields for Autonomous Driving Scenario Reconstruction from Sparse Views
by Jun Wang, Xiaojun Zhu, Ziyu Chen, Peng Li, Chunmao Jiang, Hui Zhang, Chennian Yu and Biao Yu
World Electr. Veh. J. 2025, 16(2), 66; https://doi.org/10.3390/wevj16020066 - 23 Jan 2025
Viewed by 572
Abstract
High-fidelity driving scenario reconstruction can generate many realistic virtual simulation environment samples, which can support effective training and testing for autonomous vehicles. Neural radiance fields (NeRFs) have demonstrated their excellence in high-fidelity scenario reconstruction; however, they still rely on dense-view data and precise camera poses, which are difficult to obtain in autonomous vehicles. To address the above issues, we propose a novel approach called SRNeRF, which can eliminate pose-based operations and perform scenario reconstruction from sparse views. To extract more scene knowledge from limited views, we incorporate an image super-resolution module based on a fully convolutional neural network and introduce a new texture loss to capture scene details for higher-quality scene reconstruction. On both object-centric and scene-level datasets, SRNeRF performs comparably to previous methods with ground truth poses and significantly outperforms methods with predicted poses, with a PSNR improvement of about 30%. Finally, we evaluate SRNeRF on our custom autonomous driving dataset, and the results show that SRNeRF can still generate stable images and novel views in the face of sparse views, demonstrating its scalability in autonomous driving scenario synthesis.
(This article belongs to the Special Issue Recent Advances in Intelligent Vehicle)
Show Figures

Figure 1: The network architecture, consisting of three components: the image rendering module, the upsampling module, and the loss function.
Figure 2: Architecture of the image rendering module. Starting from the input image features, feature extraction and processing are performed by the multi-view encoder; the neural volume data are then updated, and the neural rendering module generates an image of the new view.
Figure 3: Network architecture of the upsampling module. It consists of convolutional layers, several generator networks, and two upsampling layers. The image passes through a series of 3 × 3 convolutions and ReLU operations, followed by an upsampling step, and finally bicubic interpolation is applied to increase the resolution.
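As a rough illustration of the kind of upsampling head this Figure 3 caption describes (3 × 3 convolutions with ReLU, a learned upsampling step, then bicubic interpolation), here is a minimal PyTorch sketch. The layer count, channel widths, and overall ×4 scale are assumptions, not the paper's configuration.

```python
# Hedged sketch of a super-resolution head: convs + ReLU, PixelShuffle, then bicubic.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleHead(nn.Module):
    def __init__(self, channels: int = 64, scale: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)  # learned x2 upsampling

    def forward(self, x):
        up = self.shuffle(self.body(x))
        # final bicubic step to reach the target resolution, as in the caption
        return F.interpolate(up, scale_factor=2, mode="bicubic", align_corners=False)

img = torch.rand(1, 3, 64, 64)
print(UpsampleHead()(img).shape)  # torch.Size([1, 3, 256, 256])
```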
Figure 4">
Figure 4: Comparison with prior methods. PixelNeRF quickly deteriorates under predicted poses, and SRT can only produce blurry images. LEAP can synthesize high-quality images, but its handling of lighting and shadows lacks realism. In contrast, SRNeRF reliably recovers details, with the novel views closely matching the ground-truth target views.
Figure 5: Qualitative results on the KITTI (first column) and Waymo datasets (second to fourth columns). PixelNeRF quickly deteriorates under predicted poses, and SRT can only produce blurry images. LEAP can synthesize high-quality images, but its handling of lighting and shadows lacks realism. In contrast, SRNeRF reliably recovers details, with the novel views closely matching the ground-truth target views.
Figure 6: Qualitative results on the Science Island dataset. On our own dataset, SRNeRF demonstrates superior metrics and better synthesis results.
Figure 7: Visualization of NVS results, showcasing the novel view synthesis performance of LEAP and SRNeRF on the KITTI and Waymo datasets.
25 pages, 477 KiB  
Article
Topology of Locally and Non-Locally Generalized Derivatives
by Dimiter Prodanov
Fractal Fract. 2025, 9(1), 53; https://doi.org/10.3390/fractalfract9010053 - 20 Jan 2025
Viewed by 531
Abstract
This article investigates the continuity of derivatives of real-valued functions from a topological perspective. This is achieved by the characterization of their sets of discontinuity. The same principle is applied to Gateaux derivatives and gradients in Euclidean spaces. This article also introduces a generalization of the derivatives from the perspective of the modulus of continuity and characterizes their sets of discontinuities. There is a need for such generalizations when dealing with physical phenomena, such as fractures, shock waves, turbulence, Brownian motion, etc.
Show Figures

Figure 1: Neidinger–Bernoulli function and its fractional variation. (A) Original Neidinger construction $N(x, 1/\sqrt{2}) = M_{1/3}(x, 1/\sqrt{2})$; (B) modified construction $M_{2/3}(x, 1/\sqrt{2})$.
21 pages, 49659 KiB  
Article
Neural Radiance Fields for High-Fidelity Soft Tissue Reconstruction in Endoscopy
by Jinhua Liu, Yongsheng Shi, Dongjin Huang and Jiantao Qu
Sensors 2025, 25(2), 565; https://doi.org/10.3390/s25020565 - 19 Jan 2025
Viewed by 865
Abstract
The advancement of neural radiance fields (NeRFs) has facilitated the high-quality 3D reconstruction of complex scenes. However, for most NeRFs, reconstructing 3D tissues from endoscopy images poses significant challenges due to the occlusion of soft tissue regions by invalid pixels, deformations in soft tissue, and poor image quality, which severely limits their application in endoscopic scenarios. To address the above issues, we propose a novel framework to reconstruct high-fidelity soft tissue scenes from low-quality endoscopic images. We first construct an EndoTissue dataset of soft tissue regions in endoscopic images and fine-tune the Segment Anything Model (SAM) based on EndoTissue to obtain a potent segmentation network. Given a sequence of monocular endoscopic images, this segmentation network can quickly obtain the tissue mask images. Additionally, we incorporate tissue masks into a dynamic scene reconstruction method called Tensor4D to effectively guide the reconstruction of 3D deformable soft tissues. Finally, we propose adopting the image enhancement model EDAU-Net to improve the quality of the rendered views. The experimental results show that our method can effectively focus on the soft tissue regions in the image, achieving higher fidelity in detail and geometric structural integrity in reconstruction compared to state-of-the-art algorithms. Feedback from the user study indicates high participant scores for our method.
(This article belongs to the Collection Artificial Intelligence (AI) in Biomedical Imaging)
Show Figures

Graphical abstract
Figure 1: The challenge of monocular endoscopic image reconstruction. The regions marked by red lines are invalid pixels.
Figure 2: The framework of the proposed method.
Figure 3: Endoscopic images and their mask images. (a) Source of endoscopic images for the EndoTissue dataset. (b) The training process of fine-tuning SAM using the constructed EndoTissue dataset.
Figure 4: The overall flow of deformable soft tissue reconstruction.
Figure 5: Comparison before and after applying the image enhancement model EDAU-Net.
Figure 6: Camera positions of all scenes in 3D space. The blue boxes mark the cameras corresponding to the test images.
Figure 7: Visual comparison with state-of-the-art 3D reconstruction methods for the endoscopic image of Scene A.
Figure 8: Visual comparison with state-of-the-art 3D reconstruction methods for the endoscopic image of Scene B.
Figure 9: Visual comparison with state-of-the-art image enhancement methods for the endoscopic images of Scenes C and D.
Figure 10: Ablation results on tissue mask-guided ray sampling. (a) Without tissue mask-guided ray sampling; (b) our method; (c) ground truth.
Figure 11: Ablation results on EDAU-Net. (a) Without EDAU-Net; (b) our method; (c) ground truth.
Figure 12: Results of the user study.
13 pages, 3746 KiB  
Article
NeRF-Accelerated Ecological Monitoring in Mixed-Evergreen Redwood Forest
by Adam Korycki, Cory Yeaton, Gregory S. Gilbert, Colleen Josephson and Steve McGuire
Forests 2025, 16(1), 173; https://doi.org/10.3390/f16010173 - 17 Jan 2025
Viewed by 625
Abstract
Forest mapping provides critical observational data needed to understand the dynamics of forest environments. Notably, tree diameter at breast height (DBH) is a metric used to estimate forest biomass and carbon dioxide (CO2) sequestration. Manual methods of forest mapping are labor intensive and time consuming, a bottleneck for large-scale mapping efforts. Automated mapping relies on acquiring dense forest reconstructions, typically in the form of point clouds. Terrestrial laser scanning (TLS) and mobile laser scanning (MLS) generate point clouds using expensive LiDAR sensing and have been used successfully to estimate tree diameter. Neural radiance fields (NeRFs) are an emergent technology enabling photorealistic, vision-based reconstruction by training a neural network on a sparse set of input views. In this paper, we present a comparison of MLS and NeRF forest reconstructions for the purpose of trunk diameter estimation in a mixed-evergreen Redwood forest. In addition, we propose an improved DBH-estimation method using convex-hull modeling. Using this approach, we achieved 1.68 cm RMSE (2.81%), which consistently outperformed standard cylinder modeling approaches.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Forestry: 2nd Edition)
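The abstract attributes the accuracy gain to convex-hull modeling of trunk cross-sections. One plausible reading, sketched below, fits a convex hull to a thin slice of trunk points at breast height and derives a diameter from the hull's perimeter or area; the exact formula used in the paper is not given in the listing, so both estimates here are assumptions.

```python
# Hedged sketch of convex-hull-based DBH estimation from a trunk slice.
import numpy as np
from scipy.spatial import ConvexHull

def dbh_from_slice(xy: np.ndarray) -> dict:
    """xy: (N, 2) trunk points from a thin slice around 1.3 m above ground."""
    hull = ConvexHull(xy)
    perimeter, area = hull.area, hull.volume      # for 2D hulls, 'area' is the perimeter
    return {
        "dbh_from_perimeter": perimeter / np.pi,        # circle of equal circumference
        "dbh_from_area": 2.0 * np.sqrt(area / np.pi),   # circle of equal area
    }

# Noisy circular cross-section with true diameter 0.40 m.
theta = np.random.uniform(0, 2 * np.pi, 500)
pts = 0.20 * np.c_[np.cos(theta), np.sin(theta)] + np.random.normal(0, 0.002, (500, 2))
print(dbh_from_slice(pts))
```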
Show Figures

Figure 1: NeRF scene representation flow. Sparse images with corresponding poses are sampled using ray tracing to generate a 5D input vector comprising location $(x, y, z)$ and viewing direction $(\theta, \phi)$. A cascaded MLP learns the weights that map this 5D vector to an output color $(r, g, b)$ and volume density $\sigma$. Volume rendering composites the learned rays into novel views.
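For reference, the volume rendering step mentioned at the end of this caption is usually written as in the original NeRF formulation, compositing color $\mathbf{c}$ and density $\sigma$ along each camera ray $\mathbf{r}(t)$:

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)$$

In practice this integral is approximated by quadrature over discrete samples along the ray.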
Figure 2">
Figure 2: Quadruped robot creating a dense LiDAR-inertial reconstruction in a forest environment (left). LIO-SAM visualization of the estimated trajectory (turquoise), loop-closure events (yellow), and tightly aligned LiDAR scans (magenta) (right).
Figure 3: TreeTool process applied to a forest NeRF reconstruction. (A) Ground segmentation, (B) trunk segmentation, and (C) trunk modeling. Our tree modeling approach treats trees as stacks of convex-hull slices, which outperformed other approaches by 3–4× in DBH estimation accuracy.
Figure 4: Forest reconstructions produced by SLAM (bottom row) and NeRF (top row) methods on both datasets. Adjacent plots show the data collection trajectories for each reconstruction. In dataset A, we illustrate the effectiveness of segmentation between ground points (orange) and trees (violet). We use a z-axis color gradient to enhance the visualization of the dataset B reconstructions, as this region included more complex ground-level vegetation. The figure also compares a zoomed-in section of a tree trunk: the NeRF reconstruction is approximately 4× denser than SLAM and of higher surface quality.
Figure 5: Four comparisons of the RANSAC and convex-hull modeling approaches. Deltas between manual DBH and each modeling approach are given on the top line. RANSAC cylinder modeling consistently under-fits well-represented trunk projections; convex-hull DBH estimation outperformed RANSAC by 3–4×.
15 pages, 1946 KiB  
Article
Neuromuscular Excitation Pattern in Expert Indoor Skydivers
by Susanna Rampichini, Eloisa Limonta, Irene Di Giulio, Marta Borrelli, Emiliano Cè, Giuseppe Coratella, Johnny Padulo, Stefano Longo and Fabio Esposito
Appl. Sci. 2025, 15(2), 852; https://doi.org/10.3390/app15020852 - 16 Jan 2025
Viewed by 476
Abstract
The evolution of indoor skydiving from military origins to a globally recreational pursuit has attracted individuals without prior specific training, exposing them to the heightened risk of muscle–joint injuries associated with indoor skydiving. This study aims to assess the muscular excitation patterns in highly skilled instructors to determine the optimal activation strategies for preventing musculoskeletal injuries. Nine expert indoor skydiving instructors (M/F: 8/1; age 31 ± 7 years; body mass: 70.5 ± 11.1 kg; stature: 1.74 ± 0.09 m) were enrolled. The surface electromyographic signal was recorded on the anterior deltoid, AD; posterior deltoid, PD; pectoralis major, PM; latissimus dorsi, LD; rectus abdominis, Rabd; erector spinae, ES; rectus femoris, RF; and biceps femoris, BF for each position: PRONE, SUPINE, SIT, and head-down (HD). A synchronous heart rate was recorded to assess the cardiac engagement. The neuromuscular load (NeLo), representative of the excitation amplitude of a muscle set, was determined for each position across different body regions. PRONE and SUPINE were the least demanding in terms of cardiac engagement (67 ± 6 and 85 ± 8 bpm, respectively) and exhibited the lowest neuromuscular excitation (24 ± 10 and 24 ± 8% Max, respectively). By contrast, HD exhibited the highest cardiac (127 ± 18 bpm) and neuromuscular (71 ± 11% Max) engagement, particularly in the lower-trunk and lower-limb muscles. SIT predominantly engaged upper-trunk and shoulder muscles. The findings indicate the significant engagement of the musculature surrounding the shoulder joint and that responsible for lower-trunk stabilization in maintaining the investigated positions. A targeted training regimen on strengthening these muscles is advised before practicing indoor skydiving to prevent shoulder joint injuries or lower back muscle overloads.
(This article belongs to the Special Issue Exercise, Fitness, Human Performance and Health: 2nd Edition)
Show Figures

Figure 1: Pictures showing the fundamental postures. (a) PRONE; (b) SUPINE; (c) SIT; (d) head down (HD).
Figure 2: HR values determined in the resting condition outside the tunnel and, for each posture, inside the tunnel. Dashed, white, light grey, dark grey, and black bars refer to the REST OUT, PRONE, SUPINE, SIT, and HD conditions, respectively. + p < 0.05 vs. REST OUT, * p < 0.05 vs. PRONE, and # p < 0.05 vs. SUPINE. HR values are expressed as the mean + SD.
Figure 3: Mean values of NeLo (a) and regional NeLo (NeLoR) (b), along with NeLo_FRONT and NeLo_BACK (c), for each posture. White, light grey, dark grey, and black bars refer to the PRONE, SUPINE, SIT, and HD conditions, respectively; white and dashed bars refer to NeLo_FRONT and NeLo_BACK, respectively. * p < 0.05 vs. PRONE, # p < 0.05 vs. SUPINE, and § p < 0.05 vs. SIT. NeLo values are expressed as the mean + SD.
Figure 4: Average RMS values, expressed as a percentage of the maximal excitation of the investigated muscles for each posture. PRONE, SUPINE, SIT, and HD are shown in (a), (b), (c), and (d), respectively.
21 pages, 11620 KiB  
Article
Performance Evaluation and Optimization of 3D Gaussian Splatting in Indoor Scene Generation and Rendering
by Xinjian Fang, Yingdan Zhang, Hao Tan, Chao Liu and Xu Yang
ISPRS Int. J. Geo-Inf. 2025, 14(1), 21; https://doi.org/10.3390/ijgi14010021 - 7 Jan 2025
Viewed by 1289
Abstract
This study addresses the prevalent challenges of inefficiency and suboptimal quality in indoor 3D scene generation and rendering by proposing a parameter-tuning strategy for 3D Gaussian Splatting (3DGS). Through a systematic quantitative analysis of various performance indicators under differing resolution conditions, threshold settings for the average magnitude of spatial position gradients, and adjustments to the scaling learning rate, the optimal parameter configuration for the 3DGS model, specifically tailored for indoor modeling scenarios, is determined. Firstly, utilizing a self-collected dataset, a comprehensive comparison was conducted among COLLISION-MAPping (abbreviated as COLMAP (V3.7), an open-source software based on Structure from Motion and Multi-View Stereo (SFM-MVS)), Context Capture (V10.2) (abbreviated as CC, a software utilizing oblique photography algorithms), Neural Radiance Fields (NeRF), and the currently renowned 3DGS algorithm. The key dimensions of focus included the number of images, rendering time, and overall rendering effectiveness. Subsequently, based on this comparison, rigorous qualitative and quantitative evaluations are further conducted on the overall performance and detail processing capabilities of the 3DGS algorithm. Finally, to meet the specific requirements of indoor scene modeling and rendering, targeted parameter tuning is performed on the algorithm. The results demonstrate significant performance improvements in the optimized 3DGS algorithm: the PSNR metric increases by 4.3%, and the SSIM metric improves by 0.2%. The experimental results prove that the improved 3DGS algorithm exhibits superior expressive power and persuasiveness in indoor scene rendering.
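For context, the PSNR and SSIM gains quoted here refer to the standard image-quality metrics:

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right), \qquad \mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

where $\mathrm{MAX}_I$ is the maximum pixel value, $\mathrm{MSE}$ is the mean squared error between the rendered and reference images, and the SSIM statistics $\mu$, $\sigma$ are computed over local windows with small stabilizing constants $c_1$, $c_2$.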
Show Figures

Figure 1: Diagram of the multi-resolution hash encoding method. The blue, brown, orange, and green boxes represent the index calculation of hash tables at different levels; each level has a different grid resolution, and T denotes the size of the hash table.
Figure 2: Overview of 3DGS technology.
Figure 3: Study area. (a) Floor; (b) Ceiling; (c) Table; (d) Full view.
Figure 4: Schematic diagram of data collection. The arrows indicate the direction and trajectory of the shooting.
Figure 5: Model construction and rendered image generation. (a) CC modeling diagram; the red border highlights the damaged area of the model. (b) OSketch Up individualized rendering; (c,d) OSketch Up interactive operation diagrams.
Figure 6: Comparison of rendering effects among different algorithms: (a) COLMAP; (b) NeRF (Instant-NGP); (c) 3DGS; (d) CC.
Figure 7: Comparison of reconstruction results between the COLMAP, NeRF, and 3DGS methods under varying numbers of remote sensing images. (a–c) Modeling results of COLMAP with 170, 110, and 66 images, respectively; (d–f) modeling results of NeRF (Instant-NGP) with 170, 110, and 66 images; (g–i) modeling results of 3DGS for the same sets of 170, 110, and 66 images.
Figure 8: Comparative analysis of rendering outcomes for tables and chairs. (a,b) Modeling accuracy of chairs using NeRF (Instant-NGP) with 110 and 66 images, respectively. (c,d) Reconstruction fidelity of chairs by 3DGS for 110 and 66 images, respectively.
Figure 9: Comparative analysis of rendering outcomes for windows. (a,b) Modeling performance of windows achieved by NeRF (Instant-NGP) with 110 and 66 images, respectively. (c,d) Reconstruction precision of windows via 3DGS for the corresponding sets of 110 and 66 images, respectively.
Figure 10: Comparison of scene images under different numbers of best pictures and iterations. (a) Original image; (b) rendered image.
Figure 11: Comparison of dimensions for rendered images at various resolutions. (a–g) represent rendered images with resolutions of 0.3 k, 0.5 k, 0.8 k, 1.2 k, 1.5 k, 1.6 k, and 2 k, respectively.
Figure 12: Comparison of rendering effects for different threshold values of the average magnitude of spatial position gradients. (a–d) represent the rendering effects with thresholds of 0.0001, 0.0002, 0.0003, and 0.0004, respectively.
Figure 13: Comparison of rendering effects for different scaling learning rates. (a–d) represent the rendering effects with scaling learning rates of 0.004, 0.005, 0.006, and 0.008, respectively.
Figure 14: Comparison of rendering effects for different hyperparameter settings. (a–e) represent the rendering effects with hyperparameter settings of 0.0005, 0.001, 0.002, 0.01, and 0.1, respectively.
Figure 15: Comparison of the ceiling area before and after algorithm optimization. (a) Before optimization; (b) after optimization.
Figure 16: Iterative 3DGS training results. (a) Original image; (b) training result after 7 k iterations; (c) training result after 30 k iterations.
17 pages, 3610 KiB  
Article
Multi-Level Feature Dynamic Fusion Neural Radiance Fields for Audio-Driven Talking Head Generation
by Wenchao Song, Qiong Liu, Yanchao Liu, Pengzhou Zhang and Juan Cao
Appl. Sci. 2025, 15(1), 479; https://doi.org/10.3390/app15010479 - 6 Jan 2025
Viewed by 650
Abstract
Audio-driven cross-modal talking head generation has experienced significant advancement in the last several years, and it aims to generate a talking head video that corresponds to a given audio sequence. Out of these approaches, the NeRF-based method can generate videos featuring a specific person with more natural motion compared to the one-shot methods. However, previous approaches failed to distinguish the importance of different regions, resulting in the loss of information-rich region features. To alleviate the problem and improve video quality, we propose MLDF-NeRF, an end-to-end method for talking head generation, which can achieve better vector representation through multi-level feature dynamic fusion. Specifically, we designed two modules in MLDF-NeRF to enhance the cross-modal mapping ability between audio and different facial regions. We initially developed a multi-level tri-plane hash representation that uses three sets of tri-plane hash networks with different resolution limits to capture the dynamic information of the face more accurately. Then, we introduce the idea of multi-head attention and design an efficient audio-visual fusion module that explicitly fuses audio features with image features from different planes, thereby improving the mapping between audio features and spatial information. Meanwhile, the design helps to minimize interference from facial areas unrelated to audio, thereby improving the overall quality of the representation. The quantitative and qualitative results indicate that our proposed method can effectively generate talking heads with natural actions and realistic details. Compared with previous methods, it performs better in terms of image quality, lip sync, and other aspects.
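The abstract describes the audio-visual fusion module only at a high level; the minimal cross-attention sketch below is in the same spirit, with the query/key roles, feature sizes, and head count all assumed for illustration rather than taken from the paper.

```python
# Hedged sketch: fuse audio tokens into spatial (plane) features via cross-attention.
import torch
import torch.nn as nn

dim, heads = 64, 4
fusion = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)

plane_feats = torch.rand(1, 1024, dim)   # e.g. flattened tri-plane features (queries)
audio_feats = torch.rand(1, 16, dim)     # e.g. smoothed audio tokens (keys/values)

fused, attn_weights = fusion(plane_feats, audio_feats, audio_feats)
print(fused.shape)  # torch.Size([1, 1024, 64])
```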
Show Figures

Figure 1: Overview of the proposed MLDF-NeRF.
Figure 2: Audio feature extraction. Using a sliding-window approach, the provided audio stream is split into several parts. Following ASR classification of each audio clip, 1D convolution and self-attention modules temporally smooth the audio clips to produce the final audio feature $\alpha_s$.
Figure 3: Visualization of the audio-visual fusion module, displaying the attention map visualized through the norm of the attention vector in the generated head-space image.
Figure 4: Comparison of the generated portraits with the Obama video.
Figure 5: Comparison of details. The generated details of the baselines (AD-NeRF, RAD-NeRF, and ER-NeRF) are shown for comparison with our method.
Figure 6: User study. The rating is on a scale of 1–5 (higher is better).
17 pages, 2126 KiB  
Article
Novel View Synthesis with Depth Priors Using Neural Radiance Fields and CycleGAN with Attention Transformer
by Yuxin Qin, Xinlin Li, Linan Zu and Ming Liang Jin
Symmetry 2025, 17(1), 59; https://doi.org/10.3390/sym17010059 - 1 Jan 2025
Viewed by 681
Abstract
Novel view synthesis aims to generate new perspectives from a limited number of input views. Neural Radiance Field (NeRF) is a key method for this task, and it produces high-fidelity images from a comprehensive set of inputs. However, a NeRF’s performance drops significantly with sparse views. To mitigate this, depth information can be used to guide training, with coarse depth maps often readily available in practical settings. We propose an improved sparse view NeRF model, ATGANNeRF, which integrates an enhanced U-Net generator with a dual-discriminator framework, CBAM, and Multi-Head Self-Attention mechanisms. The symmetric design enhances the model’s ability to capture and preserve spatial relationships, ensuring a more consistent generation of novel views. Additionally, local depth ranking is employed to ensure depth consistency with coarse maps, and spatial continuity constraints are introduced to synthesize novel views from sparse samples. SSIM loss is also added to preserve local structural details like edges and textures. Evaluation on LLFF, DTU, and our own datasets shows that ATGANNeRF significantly outperforms state-of-the-art methods, both quantitatively and qualitatively.
(This article belongs to the Section Engineering and Materials)
Show Figures

Figure 1: Framework overview. The four components of ATGANNeRF are: (a) image color reconstruction; (b) a dual-discriminator structure employed to re-estimate the coarse depth map; (c) a ranking regularization applied to the obtained fine depth map; (d) a spatial continuity regularization.
Figure 2: Coarse depth prior module. The dual-discriminator structure evaluates fake depth maps and RGB images from the enhanced U-Net and CBAM-equipped generator, while the inverse generator ensures cycle consistency, yielding better depth maps via recurrent adversarial training.
Figure 3: Comparison of visual outcomes on the LLFF dataset using three views, with regions of interest marked in red boxes.
Figure 4: Comparison of visual outcomes on the DTU dataset using three views, with regions of interest marked in red boxes.
Figure 5: Comparison of visual outcomes on our own dataset using three views, with regions of interest marked in red boxes.
Figure 6: Ablation study on the visual effect of CBAM, with regions of interest highlighted in red boxes.
Figure 7: Ablation study on the visual effect of MHSA, with regions of interest highlighted in red boxes.
Figure 8: Ablation study on the visual effect of the dual discriminator, with regions of interest highlighted in red boxes.
Figure 9: Ablation study on the visual effect of the SSIM loss, with regions of interest highlighted in red boxes.
12 pages, 2105 KiB  
Article
An Automated Marker-Less Registration Approach Using Neural Radiance Fields for Potential Use in Mixed Reality-Based Computer-Aided Surgical Navigation of Paranasal Sinus
by Suhyeon Kim, Hyeonji Kim and Younhyun Jung
Computers 2025, 14(1), 5; https://doi.org/10.3390/computers14010005 - 27 Dec 2024
Viewed by 445
Abstract
Paranasal sinus surgery, a common treatment for chronic rhinosinusitis, requires exceptional precision due to the proximity of critical anatomical structures. To ensure accurate instrument control and clear visualization of the surgical site, surgeons utilize computer-aided surgical navigation (CSN). A key component of CSN is the registration process, which is traditionally reliant on manual or marker-based techniques. However, there is a growing shift toward marker-less registration methods. In previous work, we investigated a mesh-based registration approach using a Mixed Reality Head-Mounted Display (MR-HMD), specifically the Microsoft HoloLens 2. However, this method faced limitations, including depth holes and invalid values. These issues stemmed from the device’s low-resolution camera specifications and the 3D projection steps required to upscale RGB camera spaces. In this study, we propose a novel automated marker-less registration method leveraging Neural Radiance Field (NeRF) technology with an MR-HMD. To address insufficient depth information in the previous approach, we utilize rendered-depth images generated by the trained NeRF model. We evaluated our method against two other techniques, including prior mesh-based registration, using a facial phantom and three participants. The results demonstrate that our proposed method achieves at least a 0.873 mm (12%) improvement in registration accuracy compared to the others.
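The listing does not include the registration math, but aligning matched 3D facial landmarks to CT landmarks is commonly done with a rigid (Kabsch/Umeyama-style) fit; the sketch below shows that generic step, not the authors' exact implementation.

```python
# Hedged sketch of rigid point-set registration between matched 3D landmarks.
import numpy as np

def rigid_register(src: np.ndarray, dst: np.ndarray):
    """Return R, t minimizing ||src @ R.T + t - dst|| for matched (N, 3) landmarks."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)        # cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

src = np.random.rand(6, 3)
dst = src + np.array([0.1, -0.2, 0.3])               # pure translation for the demo
R, t = rigid_register(src, dst)
print(np.allclose(src @ R.T + t, dst, atol=1e-8))    # True
```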
Show Figures

Figure 1: Comprehensive overview: (a) multi-view image data acquisition, (b) 2D facial landmark extraction, (c) reprojected depth images and NeRF-rendered depth image acquisition, (d) 3D facial landmark extraction using depth images, (e) CT facial landmark extraction and registration.
Figure 2: Reprojected depth image and NeRF-rendered depth image: (a) reprojected depth image process, (b) NeRF-rendered depth image process.
Figure 3: Flowchart for obtaining depth values corresponding to 2D facial landmarks.
Figure 4: (a) The captured RGB image of the facial phantom; (b) the captured depth image from the built-in depth camera; (c) the 3D geometry mesh from (b) in the RGB space; (d) the reprojected depth image; (e) its interpolated depth image from (d); (f) the NeRF-rendered depth image.
Figure 5: The augmentation results of the CT rendering onto the physical phantom using the transformation matrix from our approach: (a) front view, (b) lateral view, and (c) bottom view.
19 pages, 5781 KiB  
Article
UAV-Multispectral Based Maize Lodging Stress Assessment with Machine and Deep Learning Methods
by Minghu Zhao, Dashuai Wang, Qing Yan, Zhuolin Li and Xiaoguang Liu
Agriculture 2025, 15(1), 36; https://doi.org/10.3390/agriculture15010036 - 26 Dec 2024
Viewed by 685
Abstract
Maize lodging is a prevalent stress that can significantly diminish corn yield and quality. Unmanned aerial vehicles (UAVs) remote sensing is a practical means to quickly obtain lodging information at field scale, such as area, severity, and distribution. However, existing studies primarily use machine learning (ML) methods to qualitatively analyze maize lodging (lodging and non-lodging) or estimate the maize lodging percentage, while there is less research using deep learning (DL) to quantitatively estimate maize lodging parameters (type, severity, and direction). This study aims to introduce advanced DL algorithms into the maize lodging classification task using UAV-multispectral images and investigate the advantages of DL compared with traditional ML methods. This study collected a UAV-multispectral dataset containing non-lodging maize and lodging maize with different lodging types, severities, and directions. Additionally, 22 vegetation indices (VIs) were extracted from multispectral data, followed by spatial aggregation and image cropping. Five ML classifiers and three DL models were trained to classify the maize lodging parameters. Finally, we compared the performance of ML and DL models in evaluating maize lodging parameters. The results indicate that the Random Forest (RF) model outperforms the other four ML algorithms, achieving an overall accuracy (OA) of 89.29% and a Kappa coefficient of 0.8852. However, the maize lodging classification performance of DL models is significantly better than that of ML methods. Specifically, Swin-T performs better than ResNet-50 and ConvNeXt-T, with an OA reaching 96.02% and a Kappa coefficient of 0.9574. This can be attributed to the fact that Swin-T can more effectively extract detailed information that accurately characterizes maize lodging traits from UAV-multispectral data. This study demonstrates that combining DL with UAV-multispectral data enables a more comprehensive understanding of maize lodging type, severity, and direction, which is essential for post-disaster rescue operations and agricultural insurance claims.
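For reference, the Kappa coefficient reported alongside overall accuracy is Cohen's kappa, which discounts the agreement expected by chance:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed agreement (the overall accuracy) and $p_e$ is the chance agreement implied by the marginal class frequencies of the confusion matrix.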
Show Figures

Figure 1: The workflow of this study.
Figure 2: Overview of the study area. (a) Location of the study area; (b) UAV and sensors used; (c) UAV flight path and lodging stages.
Figure 3: Differences in spectral reflectance of maize under 17 lodging parameters across 22 VIs. (a) Spectral reflectance for the Blue to NIR bands; (b) spectral reflectance for BNDVI to GDVI; (c) spectral reflectance for GNDVI to RDVI; (d) spectral reflectance for SAVI to VARI; (e) overall spectral reflectance trend across the 22 VIs.
Figure 4: Correlation coefficient between each VI and the maize lodging parameters.
Figure 5: Correlation coefficients and significance analysis among the 22 VIs.
Figure 6: Feature selection using the Boruta algorithm. Features marked in blue represent "shadow features", while green features are retained after the process.
Figure 7: Training and validation loss and accuracy curves of the three deep learning models. (a) Training loss curves; (b) training accuracy curves; (c) validation loss curves; (d) validation accuracy curves.
Figure 8: t-SNE visualization of RF and Swin-T. (a) RF across different lodging types; (b) RF across lodging types and severities; (c) RF across lodging types, severities, and directions; (d) Swin-T across lodging types, severities, and directions.
18 pages, 8385 KiB  
Article
Accurate Fruit Phenotype Reconstruction via Geometry-Smooth Neural Implicit Surface
by Wei Ying, Kewei Hu, Ayham Ahmed, Zhenfeng Yi, Junhong Zhao and Hanwen Kang
Agriculture 2024, 14(12), 2325; https://doi.org/10.3390/agriculture14122325 - 19 Dec 2024
Viewed by 663
Abstract
Accurate collection of plant phenotyping is critical to optimising sustainable farming practices in precision agriculture. Traditional phenotyping in controlled laboratory environments, while valuable, falls short in understanding plant growth under real-world conditions. Emerging sensor and digital technologies offer a promising approach for direct phenotyping of plants in farm environments. This study investigates a learning-based phenotyping method using neural implicit surfaces reconstruction to achieve accurate in situ phenotyping of pepper plants in greenhouse environments. To quantitatively evaluate the performance of this method, traditional point cloud registration on 3D scanning data is implemented for comparison. Experimental results show that NIR (neural implicit surfaces reconstruction) achieves competitive accuracy compared to the 3D scanning method. The mean distance error between the scanner-based method and the NeRF (neural radiance fields)-based method is 0.811 mm. This study shows that the learning-based NeRF method has similar accuracy to the 3D scanning-based method but with greater scalability and faster deployment capabilities.
(This article belongs to the Special Issue Research Advances in Perception for Agricultural Robots)
Show Figures

Figure 1: Phenotyping robot operating in a greenhouse and its reconstructed local geometry model.
Figure 2: Pipeline of the GS-NIR model. (a) The sampling module, which generates camera poses and coordinates according to the random dilation sampling (RDS) strategy and calculates the scene size via the adaptive scene module (ASM). (b) The coding module, which encodes poses with Spherical Harmonics Encoding (SHE) and point coordinates with Hash Table Encoding (HTE). (c) Network training. (d) Surface rendering and reconstruction, driven by the loss function and backpropagation.
Figure 3: Illustration of the proposed geometric depth smoothing constraints. (a) The sampling point. (b) The process of obtaining depth and colour values from the sampling points and applying the geometric depth smoothing constraints.
Figure 4: Illustration of the proposed random dilation sampling. (a) The sampling method of the original approach. (b) A demonstration of the random dilation sampling used here.
Figure 5: Illustration of the proposed 3D semantic segmentation method. The input is point cloud data extracted from the GS-NIR reconstruction; the point cloud is segmented by the segmentation module to obtain the final segmentation result.
Figure 6: Illustrations of the standard scenarios for the agriculture and collection facilities.
Figure 7: Reconstructions of the method on the Pepper dataset: (a) the ground truth, (b) the mesh with colours removed, and (c) the mesh with normal directions, demonstrating a high-quality mesh and accurate normals.
Figure 8: Qualitative studies of geometric depth smoothing constraints, random dilation sampling, and the adaptive scene module on the Pepper dataset. (a) Without geometric depth smoothing, (b) without random dilation sampling, and (c) without the adaptive scene module. The mesh is shown on top and the normal mapping on the bottom.
Figure 9: High-quality reconstruction results of GS-NIR on the orange, pitahaya, pitahaya flower, and litchi datasets, including colour mesh, mesh, and normals.
Figure 10: Qualitative analysis of the results of the four reconstruction methods: GS-NIR, Reality Capture, Instant-NGP, and Instant-NSR.
Figure 11: (a–c) Matching results on three different samples using the scanner method and the GS-NIR model.
Figure 12: The point cloud acquired by the 3D scanner and several problems with point cloud acquisition. (a) Bleeding points with cut edges on smooth surfaces. (b) Sampling chromatic aberration. (c) Artefacts produced by insufficient light. (d) Reflections from highly reflective surfaces.
Figure 13: Phenotypic data collected from a point cloud after 3D semantic segmentation and scale restoration. (a) The GS-NIR model's reconstructed 3D mesh with colours. (b) Results after 3D point cloud segmentation using Fuse-PointNet.
18 pages, 5411 KiB  
Article
Leveraging Neural Radiance Fields for Large-Scale 3D Reconstruction from Aerial Imagery
by Max Hermann, Hyovin Kwak, Boitumelo Ruf and Martin Weinmann
Remote Sens. 2024, 16(24), 4655; https://doi.org/10.3390/rs16244655 - 12 Dec 2024
Viewed by 983
Abstract
Since conventional photogrammetric approaches struggle with low-texture, reflective, and transparent regions, this study explores the application of Neural Radiance Fields (NeRFs) for large-scale 3D reconstruction of outdoor scenes, since NeRF-based methods have recently shown very impressive results in these areas. We evaluate three approaches: Mega-NeRF, Block-NeRF, and Direct Voxel Grid Optimization, focusing on their accuracy and completeness compared to ground truth point clouds. In addition, we analyze the effects of using multiple sub-modules, estimating the visibility by an additional neural network and varying the density threshold for the extraction of the point cloud. For performance evaluation, we use benchmark datasets that correspond to the setting of standard flight campaigns and therefore typically have nadir camera perspective and relatively little image overlap, which can be challenging for NeRF-based approaches that are typically trained with significantly more images and varying camera angles. We show that despite lower quality compared to classic photogrammetric approaches, NeRF-based reconstructions provide visually convincing results in challenging areas. Furthermore, our study shows that in particular increasing the number of sub-modules and predicting the visibility using an additional neural network improves the quality of the resulting reconstructions significantly.
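Because the study varies the density threshold used to extract point clouds, a minimal sketch of that extraction step may help; the grid resolution, threshold value, and voxel-center convention below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: keep voxel centers whose NeRF density exceeds a threshold.
import numpy as np

def extract_points(density: np.ndarray, origin: np.ndarray, voxel_size: float, threshold: float = 10.0):
    idx = np.argwhere(density > threshold)        # indices of occupied voxels
    return origin + (idx + 0.5) * voxel_size      # voxel centers in world coordinates

grid = np.random.gamma(shape=1.0, scale=5.0, size=(64, 64, 64))   # stand-in density grid
pts = extract_points(grid, origin=np.zeros(3), voxel_size=0.1, threshold=15.0)
print(pts.shape)   # higher thresholds yield sparser, hole-prone point clouds
```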
Show Figures

Figure 1: Flowchart of our methodology. The input data consist of images and their camera calibration. Based on this, the images are clustered using the camera centers and assigned to individual sub-modules, which are then reconstructed as NeRF volumes. The point clouds are then extracted from the camera viewpoints using random sampling.
Figure 2: Visualization of the cluster pre-processing based on the positions of the camera centers. The colors symbolize the four sub-modules.
Figure 3: Point clouds derived for the TMB dataset. Each row shows examples of one approach, with COLMAP as a reference in the first row.
Figure 4: Point clouds derived for the UseGeo3 dataset. Each row shows examples of one approach, with COLMAP as a reference in the first row.
Figure 5: Point clouds derived for the Hessigheim 3D dataset. Each row shows examples of one approach, with COLMAP as a reference in the first row.
Figure 6: Qualitative evaluation of different density thresholds using Mega-NeRF. As the density threshold increases, roads and canals in particular show significantly more holes.
Figure 7: Relationship between accuracy, completeness, and the density threshold. The letters represent the three methods analyzed: Mega-NeRF (M), Block-NeRF (B), and DVGO (D). The numbers in front of them refer to their position in Table 2, whereas the number after the letter indicates the number of sub-modules. The black arrows indicate the approximate direction of the values over the increasing threshold.
Figure 8: Reconstructions with and without VisibilityNet derived for the UseGeo and Hessigheim 3D datasets. B in the first column denotes Block-NeRF. Point clouds extracted using VisibilityNet generally show a better level of detail.
Figure 9: Mip-NeRF-based point cloud reconstruction without and with an identity function for the positional encoding, i.e., additionally concatenating the original input coordinates. Positionally encoded features without an identity function lead to points scattered over several periodic layers, whereas with an identity function these artifacts disappear.
20 pages, 5960 KiB  
Article
SMCNet: State-Space Model for Enhanced Corruption Robustness in 3D Classification
by Junhui Li, Bangju Huang and Lei Pan
Sensors 2024, 24(23), 7861; https://doi.org/10.3390/s24237861 - 9 Dec 2024
Viewed by 668
Abstract
Accurate classification of three-dimensional (3D) point clouds in real-world environments is often impeded by sensor noise, occlusions, and incomplete data. To overcome these challenges, we propose SMCNet, a robust multimodal framework for 3D point cloud classification. SMCNet combines multi-view projection and neural radiance fields (NeRFs) to generate high-fidelity 2D representations with enhanced texture realism, addressing occlusions and lighting inconsistencies effectively. The Mamba model is further refined within this framework by integrating a depth perception module to capture long-range point interactions and adopting a dual-channel structure to enhance point-wise feature extraction. Fine-tuning adapters for the CLIP and Mamba models are also introduced, significantly improving cross-domain adaptability. Additionally, an intelligent voting mechanism aggregates predictions from multiple viewpoints, ensuring enhanced classification robustness. Comprehensive experiments demonstrate that SMCNet achieves state-of-the-art performance, outperforming the PointNet++ baseline with a 0.5% improvement in mean overall accuracy (mOA) on ModelNet40 and a 7.9% improvement on ScanObjectNN. In corruption resistance, SMCNet reduces the mean corruption error (mCE) by 0.8% on ModelNet40-C and 3.6% on ScanObjectNN-C. These results highlight the effectiveness of SMCNet in tackling real-world classification scenarios with noisy and corrupted data.
(This article belongs to the Special Issue Object Detection via Point Cloud Data)
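The "intelligent voting mechanism" is only named in the abstract; as one plausible illustration, the sketch below aggregates per-view class probabilities with confidence-weighted soft voting. The weighting scheme is an assumption, not the paper's method.

```python
# Hedged sketch: confidence-weighted soft voting over per-view predictions.
import numpy as np

def vote(view_probs: np.ndarray) -> int:
    """view_probs: (num_views, num_classes) softmax outputs, one row per rendered view."""
    confidence = view_probs.max(axis=1, keepdims=True)   # trust confident views more
    weighted = (view_probs * confidence).sum(axis=0)
    return int(weighted.argmax())

probs = np.array([[0.7, 0.2, 0.1],
                  [0.4, 0.5, 0.1],
                  [0.8, 0.1, 0.1]])
print(vote(probs))  # 0
```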
Show Figures

Figure 1: The architecture of SMCNet: a multimodal 3D point cloud classification framework utilizing CLIP and Mamba.
Figure 2: Examples generated using our method, showing that it produces samples with realistic textures and intricate lighting details.
Figure 3: Structural diagram of the improved Mamba model and its depth perception module. This structure improves the accuracy and effectiveness of feature extraction from 3D point cloud data by combining global receptive fields with dynamic weighting mechanisms.
Figure 4: The architecture of the improved adapter. It integrates information from different dimensions through linear projection and convolutional layers. In addition, the adapter injects processed 3D features after the output of each Swin Transformer layer, enhancing the model's ability to capture details and abstract representations of point cloud data.
Figure 5: Samples from ModelNet-C and ScanObjectNN-C.
Figure 6: Ablation study on the impact of the number of projections on 3D point cloud classification performance. Blue: ScanObjectNN; orange: ModelNet40.