Search Results (422)

Search Parameters:
Keywords = shape and image texture

15 pages, 13201 KiB  
Article
Quantifying Shape and Texture Biases for Enhancing Transfer Learning in Convolutional Neural Networks
by Akinori Iwata and Masahiro Okuda
Signals 2024, 5(4), 721-735; https://doi.org/10.3390/signals5040040 - 4 Nov 2024
Viewed by 308
Abstract
Neural networks have inductive biases owing to the assumptions associated with the selected learning algorithm, datasets, and network structure. Specifically, convolutional neural networks (CNNs) are known for their tendency to exhibit textural biases. This bias is closely related to image classification accuracy. Aligning the model’s bias with the dataset’s bias can significantly enhance performance in transfer learning, leading to more efficient learning. This study aims to quantitatively demonstrate that increasing shape bias within the network by varying kernel sizes and dilation rates improves accuracy on shape-dominant data and enables efficient learning with less data. Furthermore, we propose a novel method for quantitatively evaluating the balance between texture bias and shape bias. This method enables efficient learning by aligning the biases of the transfer learning dataset with those of the model. Systematically adjusting these biases allows CNNs to better fit data with specific biases. Compared to the original model, an accuracy improvement of up to 9.9% was observed. Our findings underscore the critical role of bias adjustment in CNN design, contributing to developing more efficient and effective image classification models.
Figure 1. The combined shape/texture images used to calculate the bias metric (on the left side) included a shape-dominant image in the upper part and a texture-dominant image in the lower part. This combined image is used for transfer learning in a two-class classification. Subsequently, these test-combined images are input into the model, and the shape/texture bias is calculated through gradient-weighted class activation mapping (Grad-CAM).

Figure 2. Results of Grad-CAM visualization used with the proposed shape/texture bias metric. In the case of the original ResNeXt (a texture-biased model), the heat map of the lower image (which is texture-dominant) turns red, indicating that the model is focusing on it. On the other hand, for the ResNeXt with a dilation rate of three (a shape-biased model), the heat map of the upper image (which is shape-dominant) turns red, indicating its focus. This demonstrates that simply increasing the dilation rate results in a stronger bias towards shapes in the model. (a) The visualization images obtained by applying Grad-CAM to the original ResNeXt (texture-biased model). (b) The visualization images obtained by applying Grad-CAM to ResNeXt with a dilation of three (shape-biased model).

Figure 3. Results of limiting the training data for each dataset. The accuracy rate is defined as the accuracy achieved with limited training data divided by the accuracy achieved with the entire dataset. The data ratio represents the proportion of the data used for training. (a) Results of reducing the amount of training data in the Logo dataset. (b) Results of reducing the amount of training data in the Cartoon dataset. (c) Results of reducing the amount of training data in the Sketch dataset.
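The dilation-rate adjustment this abstract describes is easy to picture in code. Below is a minimal, hypothetical PyTorch sketch (not the authors' implementation) showing how raising the dilation of a 3x3 convolution widens its receptive field without adding parameters, the property the study links to increased shape bias:

```python
# Minimal PyTorch sketch of the idea described above: increasing the
# dilation rate of a 3x3 convolution enlarges its receptive field without
# adding parameters. The module and rates here are illustrative only.
import torch
import torch.nn as nn

def dilated_conv3x3(in_ch: int, out_ch: int, dilation: int) -> nn.Conv2d:
    # padding = dilation keeps the spatial size of the feature map unchanged
    return nn.Conv2d(in_ch, out_ch, kernel_size=3,
                     padding=dilation, dilation=dilation, bias=False)

x = torch.randn(1, 64, 56, 56)
for rate in (1, 2, 3):  # rate 3 mirrors the "dilation of three" variant above
    y = dilated_conv3x3(64, 64, rate)(x)
    # effective receptive field of a dilated 3x3 kernel: 2 * dilation + 1
    print(rate, tuple(y.shape), 2 * rate + 1)
```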
14 pages, 4843 KiB  
Article
Enhanced Multi-Scale Attention-Driven 3D Human Reconstruction from Single Image
by Yong Ren, Mingquan Zhou, Pengbo Zhou, Shibo Wang, Yangyang Liu, Guohua Geng, Kang Li and Xin Cao
Electronics 2024, 13(21), 4264; https://doi.org/10.3390/electronics13214264 - 30 Oct 2024
Viewed by 362
Abstract
Due to the inherent limitations of a single viewpoint, reconstructing 3D human meshes from a single image has long been a challenging task. While deep learning networks enable us to approximate the shape of unseen sides, capturing the texture details of the non-visible side remains difficult with just one image. Traditional methods utilize Generative Adversarial Networks (GANs) to predict the normal maps of the non-visible side, thereby inferring detailed textures and wrinkles on the model’s surface. However, we have identified challenges with existing normal prediction networks when dealing with complex scenes, such as a lack of focus on local features and insufficient modeling of spatial relationships. To address these challenges, we introduce EMAR—Enhanced Multi-scale Attention-Driven Single-Image 3D Human Reconstruction. This approach incorporates a novel Enhanced Multi-Scale Attention (EMSA) mechanism, which excels at capturing intricate features and global relationships in complex scenes. EMSA surpasses traditional single-scale attention mechanisms by adaptively adjusting the weights between features, enabling the network to more effectively leverage information across various scales. Furthermore, we have improved the feature fusion method to better integrate representations from different scales. This enhanced feature fusion allows the network to more comprehensively understand both fine details and global structures within the image. Finally, we have designed a hybrid loss function tailored to the introduced attention mechanism and feature fusion method, optimizing the network’s training process and enhancing the quality of reconstruction results. Our network demonstrates significant improvements in performance for 3D human model reconstruction. Experimental results show that our method exhibits greater robustness to challenging poses compared to traditional single-scale approaches.
Figure 1. Overview of our proposed EMAR. Given a single-view image I_in and the corresponding SMPL-X mesh M, we first prepare the normal maps N_(V/IN) and SDF features for M. Our enhanced multi-scale attention module aids the network in learning more discriminative feature representations across different scales. The proposed feature fusion module further enhances the feature representation, producing smoother and more continuous normal maps N_CB(V/N). Finally, an Implicit Function (IF) is utilized to infer the occupancy field Ô of the clothed human body.

Figure 2. Enhanced Multi-Scale Attention Module.

Figure 3. Feature Fusion Module.

Figure 4. Qualitative experiments on in-the-wild photos, where green is the front of the model and blue is the back of the model: column (a) shows the input images, column (b) presents the results of EMAR, column (c) shows the results of PaMIR, column (d) presents the results of PIFu, column (e) shows the results of ICON, and column (f) shows the results of 2K2K.

Figure 5. Comparison of details with ICON.

Figure 6. Failure cases.
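As a rough illustration of the adaptive cross-scale weighting attributed to EMSA above, the following hypothetical PyTorch sketch scores the same feature map at several scales and fuses them with softmax weights; the module name, scales, and scoring scheme are assumptions, not the paper's design:

```python
# Hypothetical multi-scale attention fusion: features pooled at several
# scales are scored, and a softmax across scales adaptively re-weights
# them before fusion, in the spirit of the mechanism described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentionFusion(nn.Module):
    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # one shared scoring conv; softmax over scales gives adaptive weights
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats, scores = [], []
        for s in self.scales:
            f = F.avg_pool2d(x, s) if s > 1 else x
            f = F.interpolate(f, size=x.shape[-2:], mode="bilinear",
                              align_corners=False)
            feats.append(f)
            scores.append(self.score(f))
        w = torch.softmax(torch.stack(scores, dim=0), dim=0)
        return sum(wi * fi for wi, fi in zip(w, feats))

out = MultiScaleAttentionFusion(32)(torch.randn(1, 32, 64, 64))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```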
16 pages, 8285 KiB  
Technical Note
A Feature-Driven Inception Dilated Network for Infrared Image Super-Resolution Reconstruction
by Jiaxin Huang, Huicong Wang, Yuhan Li and Shijian Liu
Remote Sens. 2024, 16(21), 4033; https://doi.org/10.3390/rs16214033 - 30 Oct 2024
Viewed by 272
Abstract
Image super-resolution (SR) algorithms based on deep learning yield good visual performances on visible images. Due to the blurred edges and low contrast of infrared (IR) images, methods transferred directly from visible images to IR images have a poor performance and ignore the demands of downstream detection tasks. Therefore, an Inception Dilated Super-Resolution (IDSR) network with multiple branches is proposed. A dilated convolutional branch captures high-frequency information to reconstruct edge details, while a non-local operation branch captures long-range dependencies between any two positions to maintain the global structure. Furthermore, deformable convolution is utilized to fuse features extracted from different branches, enabling adaptation to targets of various shapes. To enhance the detection performance of low-resolution (LR) images, we crop the images into patches based on target labels before feeding them to the network. This allows the network to focus on learning the reconstruction of the target areas only, reducing the interference of background areas in the target areas’ reconstruction. Additionally, a feature-driven module is cascaded at the end of the IDSR network to guide the high-resolution (HR) image reconstruction with feature prior information from a detection backbone. This method has been tested on the FLIR Thermal Dataset and the M3FD Dataset and compared with five mainstream SR algorithms. The final results demonstrate that our method effectively maintains image texture details. More importantly, our method achieves 80.55% mAP on the FLIR Dataset and 74.7% mAP on the M3FD Dataset, outperforming the other methods in detection accuracy on both.
Figure 1. The overall structure of the proposed method, which mainly consists of three parts: a data preprocessing method to crop the images into patches, a SR reconstruction network to generate SR images, and a feature-driven module to improve the detection accuracy.

Figure 2. The architecture of the proposed IDSR for image super-resolution.

Figure 3. The details of the Inception Dilated Mixer (IDM).

Figure 4. Frequency magnitude from 8 output channels of the high-frequency extractor and low-frequency extractor.

Figure 5. Super-resolution reconstruction results for LR images from the FLIR dataset (200 k iterations). Every two rows represent a scene; from top to bottom are FLIR-08989 and FLIR-08951.

Figure 6. The analysis of loss weight (λ, μ) selection in our method.

Figure 7. Super-resolution reconstruction results for LR images from the FLIR dataset by feature-driven IDSR (our method, 300 k iterations). Every two rows represent a scene; from top to bottom are FLIR-08989 and FLIR-08951.

Figure 8. Object detection (YOLOv7) results for SR images from the FLIR dataset by feature-driven IDSR (our method, 300 k iterations). Every two rows represent a scene; from top to bottom are FLIR-09401 and FLIR-09572.
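The multi-branch idea in this abstract, parallel dilated convolutions for edge detail plus a global branch for long-range structure, can be sketched as follows; the block is an illustrative stand-in, since the exact IDM and non-local layouts are not given here:

```python
# Illustrative multi-branch block: parallel dilated convolutions capture
# high-frequency detail at several receptive-field sizes, and a cheap
# global-pooling branch stands in for the long-range (non-local) branch.
import torch
import torch.nn as nn

class InceptionDilatedBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 3)
        ])
        self.context = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid()
        )
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        local = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return x + local * self.context(x)  # residual, context-gated output

y = InceptionDilatedBlock(16)(torch.randn(1, 16, 48, 48))
print(y.shape)
```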
15 pages, 3173 KiB  
Article
Joint Optimization-Based Texture and Geometry Enhancement Method for Single-Image-Based 3D Content Creation
by Jisun Park, Moonhyeon Kim, Jaesung Kim, Wongyeom Kim and Kyungeun Cho
Mathematics 2024, 12(21), 3369; https://doi.org/10.3390/math12213369 - 28 Oct 2024
Viewed by 521
Abstract
Recent studies have explored the generation of three-dimensional (3D) meshes from single images. A key challenge in this area is the difficulty of improving both the generalization and detail simultaneously in 3D mesh generation. To address this issue, existing methods utilize fixed-resolution mesh features to train networks for generalization. This approach is capable of generating the overall 3D shape without limitations on object categories. However, the generated shape often exhibits a blurred surface and suffers from suboptimal texture resolution due to the fixed-resolution mesh features. In this study, we propose a joint optimization method that enhances geometry and texture by integrating generalized 3D mesh generation with adjustable mesh resolution. Specifically, we apply an inverse-rendering-based remeshing technique that enables the estimation of complex-shaped mesh estimations without relying on fixed-resolution structures. After remeshing, we enhance the texture to improve the detailed quality of the remeshed mesh via a texture enhancement diffusion model. By separating the tasks of generalization, detailed geometry estimation, and texture enhancement and adapting different target features for each specific network, the proposed joint optimization method effectively addresses the characteristics of individual objects, resulting in increased surface detail and the generation of high-quality textures. Experimental results on the Google Scanned Objects and ShapeNet datasets demonstrate that the proposed method significantly improves the accuracy of 3D geometry and texture estimation, as evaluated by the PSNR, SSIM, LPIPS, and CD metrics.
Figure 1. Overview of the proposed joint optimization-based texture and geometry enhancement method for single-image-based 3D mesh generation (stage 1: coarse 3D mesh reconstruction using 3D LRMs; stage 2: geometry enhancement; stage 3: texture enhancement).

Figure 2. Proposed geometry enhancement for the joint optimization process.

Figure 3. Proposed texture enhancement for the joint optimization process.

Figure 4. Qualitative results of the proposed joint optimization method.

Figure 5. Qualitative comparison results on the ShapeNet dataset: (a) TripoSR [5], (b) CRM [6], (c) InstantMesh [7], (d) Real3D [59], and (e) Ours.

Figure 6. Qualitative comparison results on the GSO dataset: (a) TripoSR [5], (b) CRM [6], (c) InstantMesh [7], (d) Real3D [59], and (e) Ours.
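The abstract reports PSNR and SSIM among its evaluation metrics; a typical way to compute these for a rendered view against ground truth is shown below with scikit-image (random arrays stand in for real images):

```python
# Sketch of the PSNR/SSIM evaluation named in the abstract, using
# scikit-image; the arrays below are placeholders for a rendered view
# and its ground-truth reference.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rendered = np.random.rand(256, 256, 3)   # stand-in for a rendered view
reference = np.random.rand(256, 256, 3)  # stand-in for the ground truth

psnr = peak_signal_noise_ratio(reference, rendered, data_range=1.0)
ssim = structural_similarity(reference, rendered, channel_axis=-1,
                             data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```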
17 pages, 2991 KiB  
Article
Feature Extraction and Identification of Rheumatoid Nodules Using Advanced Image Processing Techniques
by Azmath Mubeen and Uma N. Dulhare
Rheumato 2024, 4(4), 176-192; https://doi.org/10.3390/rheumato4040014 - 24 Oct 2024
Viewed by 357
Abstract
Background/Objectives: Accurate detection and classification of nodules in medical images, particularly rheumatoid nodules, are critical due to the varying nature of these nodules, where their specific type is often unknown before analysis. This study addresses the challenges of multi-class prediction in nodule detection, with a specific focus on rheumatoid nodules, by employing a comprehensive approach to feature extraction and classification. We utilized a diverse dataset of nodules, including rheumatoid nodules sourced from the DermNet dataset and local rheumatologists. Method: This study integrates 62 features, combining traditional image characteristics with advanced graph-based features derived from a superpixel graph constructed through Delaunay triangulation. The key steps include image preprocessing with anisotropic diffusion and Retinex enhancement, superpixel segmentation using SLIC, and graph-based feature extraction. Texture analysis was performed using Gray-Level Co-occurrence Matrix (GLCM) metrics, while shape analysis was conducted with Fourier descriptors. Vascular pattern recognition, crucial for identifying rheumatoid nodules, was enhanced using the Frangi filter. A Hybrid CNN–Transformer model was employed for feature fusion, and feature selection and hyperparameter tuning were optimized using Gray Wolf Optimization (GWO) and Particle Swarm Optimization (PSO). Feature importance was assessed using SHAP values. Results: The proposed methodology achieved an accuracy of 85%, with a precision of 0.85, a recall of 0.89, and an F1 measure of 0.87, demonstrating the effectiveness of the approach in detecting and classifying rheumatoid nodules in both binary and multi-class classification scenarios. Conclusions: This study presents a robust tool for the detection and classification of nodules, particularly rheumatoid nodules, in medical imaging, offering significant potential for improving diagnostic accuracy and aiding in the early identification of rheumatoid conditions.
Figure 1. (a) Gray-scale lesion image; (b) resized image.

Figure 2. (a) Resized image; (b) segmented image.

Figure 3. Graph density output.

Figure 4. Rheumatoid nodule identified.

Figure 5. Confusion matrix.
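Several of the classical features named in this abstract are available directly in scikit-image. The sketch below computes GLCM texture metrics and a Frangi vesselness map on a placeholder image; parameters are illustrative defaults, not the study's settings:

```python
# Sketch of two feature families from the abstract: GLCM texture metrics
# and the Frangi vesselness filter, both via scikit-image.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.filters import frangi

img = (np.random.rand(128, 128) * 255).astype(np.uint8)  # placeholder lesion

# Gray-Level Co-occurrence Matrix texture metrics
glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
contrast = graycoprops(glcm, "contrast").mean()
homogeneity = graycoprops(glcm, "homogeneity").mean()

# Frangi filter highlights tubular (vessel-like) structures
vesselness = frangi(img / 255.0)

print(contrast, homogeneity, vesselness.shape)
```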
20 pages, 4473 KiB  
Article
Mamba- and ResNet-Based Dual-Branch Network for Ultrasound Thyroid Nodule Segmentation
by Min Hu, Yaorong Zhang, Huijun Xue, Hao Lv and Shipeng Han
Bioengineering 2024, 11(10), 1047; https://doi.org/10.3390/bioengineering11101047 - 20 Oct 2024
Viewed by 781
Abstract
Accurate segmentation of thyroid nodules in ultrasound images is crucial for the diagnosis of thyroid cancer and preoperative planning. However, the segmentation of thyroid nodules is challenging due to their irregular shape, blurred boundary, and uneven echo texture. To address these challenges, a novel Mamba- and ResNet-based dual-branch network (MRDB) is proposed. Specifically, the visual state space block (VSSB) from Mamba and ResNet-34 are utilized to construct a dual encoder for extracting global semantics and local details, and establishing multi-dimensional feature connections. Meanwhile, an upsampling–convolution strategy is employed in the left decoder focusing on image size and detail reconstruction. A convolution–upsampling strategy is used in the right decoder to emphasize gradual feature refinement and recovery. To facilitate the interaction between local details and global context within the encoder and decoder, cross-skip connection is introduced. Additionally, a novel hybrid loss function is proposed to improve the boundary segmentation performance of thyroid nodules. Experimental results show that MRDB outperforms the state-of-the-art approaches with DSC of 90.02% and 80.6% on two public thyroid nodule datasets, TN3K and TNUI-2021, respectively. Furthermore, experiments on a third external dataset, DDTI, demonstrate that our method improves the DSC by 10.8% compared to baseline and exhibits good generalization to clinical small-scale thyroid nodule datasets. The proposed MRDB can effectively improve thyroid nodule segmentation accuracy and has great potential for clinical applications.
(This article belongs to the Section Biosignal Processing)
Figure 1. The overall framework of MRDB.

Figure 2. Main structure of the dual-branch encoder. (a) VSSB; (b) ResNet-34.

Figure 3. The 2D-selective-scan on an image.

Figure 4. Main structure of the dual-branch decoder. (a) Left decoder; (b) right decoder.

Figure 5. Schematic of concatenate and addition. (a) Addition; (b) concatenate.

Figure 6. PR curves of different advanced models on three datasets. (a) TN3K; (b) TNUI-2021; (c) DDTI.

Figure 7. Segmentation results achieved by eight distinct methods on the TN3K dataset. The green and red lines represent ground truth and segmentation results, respectively.

Figure 8. Segmentation results achieved by eight distinct methods on the TNUI-2021 dataset. The green and red lines represent ground truth and segmentation results, respectively.

Figure 9. Segmentation results achieved by eight distinct methods on the DDTI dataset. The green and red lines represent ground truth and segmentation results, respectively.
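The hybrid loss and DSC metric mentioned above commonly build on the Dice coefficient. A minimal sketch of the widely used Dice + binary cross-entropy combination follows; the authors' exact hybrid formulation is not reproduced here:

```python
# Sketch of a Dice + BCE hybrid loss of the kind commonly used for
# boundary-sensitive segmentation; illustrative, not the paper's loss.
import torch
import torch.nn.functional as F

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    # pred: probabilities in [0, 1]; target: binary mask of the same shape
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def hybrid_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    probs = torch.sigmoid(logits)
    return F.binary_cross_entropy_with_logits(logits, target) \
        + (1 - dice_coefficient(probs, target))

logits = torch.randn(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(hybrid_loss(logits, mask).item())
```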
15 pages, 3814 KiB  
Article
Implementing Antimony Supply and Sustainability Measures via Extraction as a By-Product in Skarn Deposits: The Case of the Chalkidiki Pb-Zn-Au Mines
by Micol Bussolesi, Alessandro Cavallo, Vithleem Gazea, Evangelos Tzamos and Giovanni Grieco
Sustainability 2024, 16(20), 8991; https://doi.org/10.3390/su16208991 - 17 Oct 2024
Viewed by 742
Abstract
Antimony is one of the world’s scarcest metals and is listed as a Critical Raw Material (CRM) for the European Union. To meet the increasing demand for metals in a sustainable way, one of the strategies that could be implemented would be the recovery of metals as by-products. This would decrease the amount of hazardous materials filling mining dumps. The present study investigates the potential for producing antimony as a by-product at the Olympias separation plant in Northern Greece. This plant works a skarn mineralization that shows interesting amounts of Sb. Boulangerite (Pb5Sb4S11) reports to the Pb concentrate, where it reached levels of 8% in the analyzed product. This pre-enrichment is favorable in terms of boulangerite recovery since it can be separated from galena through froth flotation. Boulangerite distribution in the primary ore is quite heterogeneous in terms of the inclusion relationships and grain size. However, a qualitative assessment shows that the current Pb concentrate grain size is too coarse to successfully liberate a good amount of boulangerite. The use of image analysis and textural assessments is pivotal in determining shape factors and crystal size, which is essential for the targeting of flotation parameters during separation. The extraction of antimony as a by-product is possible through a two-step process; namely, (i) the preliminary concentration of boulangerite, followed by (ii) the hydrometallurgical extraction of the antimony from the boulangerite concentrate. The Olympias enrichment plant could therefore set a positive example by promoting the benefits of targeted Sb extraction as a by-product within similar sulfide deposits within the European territory.
(This article belongs to the Special Issue Sustainable Mining and Circular Economy)
Figure 1. (a) Simplified geological map of the Kassandra mining district, modified after Högdahl et al. [14]; (b) simplified flotation plant; sample tags are reported as numbers.

Figure 2. Grain size distribution of Pb (a), Zn (c) and Au (e) concentrates and of the different final tailings (b,d,f) produced during the flotation process.

Figure 3. Texture and optical features of Stratoni ore minerals: (a) pyrite, galena and sphalerite with chalcopyrite disease (optical microscope, reflected light); (b) boulangerite replacing galena and arsenopyrite (BSE image); (c) acicular boulangerite crystals within carbonate gangue and arsenopyrite (BSE image); (d) pyrrhotite, pyrite, chalcopyrite and galena crystals (optical microscope, reflected light).

Figure 4. XRD patterns and mineral modal contents (relative abundance %) of the initial feed (a), Pb concentrate (b), Pb tailing acting as Zn feed (c), Zn concentrate (d), Zn tailing acting as Au feed (e), Au concentrate (f), and final tailing (g).

Figure 5. Examples of boulangerite distribution in selected samples: (a) rare boulangerite included within pyrite, sample A; (b) widespread boulangerite within pyrite, sample C1; (c) boulangerite included within galena and pyrite, sample C1; (d) boulangerite crystals crosscutting gangue and ore minerals, sample C1.

Figure 6. Particle size distribution of boulangerite in eight representative BSE images; cumulative areas of boulangerite particles and frequency-of-distribution histograms are also reported: ore sample A-5, area a (a); ore sample C1, BSE image C1-30, area a (b); ore sample C1, BSE image C1-30, area b (c); ore sample C1, BSE image C1-30, area c (d); ore sample C1, BSE image C1-3, area a (e); ore sample C1, BSE image C1-6, area a (f); ore sample C1, BSE image C1-12, area a (g); ore sample C1, BSE image C1-32 (h).
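The image-analysis step described here, measuring grain size and shape factors from BSE images, can be sketched with scikit-image's region properties; the binary mask below is synthetic, and real BSE segmentation thresholds would differ:

```python
# Sketch of per-particle size and shape-factor measurement: label a
# binary mineral mask and report area and circularity for each grain.
import numpy as np
from skimage.measure import label, regionprops

mask = np.zeros((200, 200), dtype=bool)
mask[20:40, 20:60] = True          # synthetic "particles"
mask[100:130, 90:110] = True

for region in regionprops(label(mask)):
    area = region.area
    perimeter = region.perimeter
    # circularity = 4*pi*A / P^2 is a common shape factor (1 = circle)
    circularity = 4 * np.pi * area / perimeter ** 2 if perimeter else 0.0
    print(area, round(circularity, 3))
```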
21 pages, 8602 KiB  
Review
From Outside to Inside: The Subtle Probing of Globular Fruits and Solanaceous Vegetables Using Machine Vision and Near-Infrared Methods
by Junhua Lu, Mei Zhang, Yongsong Hu, Wei Ma, Zhiwei Tian, Hongsen Liao, Jiawei Chen and Yuxin Yang
Agronomy 2024, 14(10), 2395; https://doi.org/10.3390/agronomy14102395 - 16 Oct 2024
Viewed by 608
Abstract
Machine vision and near-infrared light technology are widely used in fruit and vegetable grading as important means of agricultural non-destructive testing. Characteristics of fruits and vegetables, such as appearance, shape, color, and texture, can easily be distinguished automatically by these two technologies. Non-destructive testing is widely applied in image processing and pattern recognition and can support the identification and grading of single features and fused features in production. A summary and analysis of fruit and vegetable grading technology over the past five years shows that the accuracy of machine vision for size grading is 70–99.8%, the accuracy of external defect grading is 88–95%, and the accuracy of NIR and hyperspectral internal detection grading is 80.56–100%. Comprehensive research on multi-feature fusion technology can provide guidance for the construction of automatic integrated grading of fruits and vegetables, which is the main research direction of fruit and vegetable grading in the future.
Figure 1. Machine-vision technology features.

Figure 2. Schematic diagram of the whole machine structure [19]. 1. Hoister. 2. Sorting mechanism. 3. Chain delivery mechanism. 4. Image acquisition system. 5. Screening unit. 6. Electric cabinet. 7. Cup. 8. Belt flipping module. 9. PT-A. 10. Orange. 11. PT-B. 12. LED. 13. Ken. 14. Industrial camera.

Figure 3. System hardware [21].

Figure 4. Core methods [21].

Figure 5. Overall dimension detection system of spheroid fruit.

Figure 6. Schematic diagram of picture acquisition. 1. PC. 2. Blueberry. 3. Light source. 4. Industrial camera. 5. Camera support.

Figure 7. Technical roadmap of potato surface recognition [19].

Figure 8. Overall design scheme of the system.

Figure 9. Image acquisition device. 1. Objective table. 2. Mango. 3. Light source. 4. Light source. 5. Industrial camera. 6. Data line. 7. Computer.

Figure 10. Comparison of mobile-citrus network results and development trace results. The tracking series of five identified defective mandarins are shown in the figure. A green box indicates that the mandarin is identified as normal, while a red box indicates that it is identified as defective [43].

Figure 11. Recognition and grading before and after optimization [44].

Figure 12. Six kinds of spectral preprocessing methods [49].

Figure 13. The diagram illustrates the structure of a robotic hand system.

Figure 14. Schematic diagram of seed-vigor detection and grading equipment. 1. Computer. 2. Near-infrared spectrometer. 3. Optical fiber probe. 4. Light source. 5. Seed box (energetic). 6. Seed box (not energetic). 7. Sorting pipeline. 8. Running pipeline. 9. Single granulation device. 10. Fiber optic sensor.

Figure 15. Schematic diagram of hyperspectral imaging system.

Figure 16. Hierarchical flow chart of hyperspectral ensemble learning of Lycium ruthenicum [55].

Figure 17. Hyperspectral imaging system. 1. CCD camera. 2. Spectrograph. 3. Lens. 4. Lamps. 5. Black goji berry. 6. Translation platform. 7. Dark box. 8. Motor controller. 9. Computer.

Figure 18. Equipment imaging using hyperspectral technology. (a) Schematic diagram; (b) picture [56].

Figure 19. Acquisition system of image information.
21 pages, 2380 KiB  
Article
Crack Detection, Classification, and Segmentation on Road Pavement Material Using Multi-Scale Feature Aggregation and Transformer-Based Attention Mechanisms
by Arselan Ashraf, Ali Sophian and Ali Aryo Bawono
Constr. Mater. 2024, 4(4), 655-675; https://doi.org/10.3390/constrmater4040036 - 16 Oct 2024
Viewed by 741
Abstract
This paper introduces a novel approach to pavement material crack detection, classification, and segmentation using advanced deep learning techniques, including multi-scale feature aggregation and transformer-based attention mechanisms. The proposed methodology significantly enhances the model’s ability to handle varying crack sizes, shapes, and complex pavement textures. Trained on a dataset of 10,000 images, the model achieved substantial performance improvements across all tasks after integrating transformer-based attention. Detection precision increased from 88.7% to 94.3%, and IoU improved from 78.8% to 93.2%. In classification, precision rose from 88.3% to 94.8%, and recall improved from 86.8% to 94.2%. For segmentation, the Dice Coefficient increased from 80.3% to 94.7%, and IoU for segmentation advanced from 74.2% to 92.3%. These results underscore the model’s robustness and accuracy in identifying pavement cracks in challenging real-world scenarios. This framework not only advances automated pavement maintenance but also provides a foundation for future research focused on optimizing real-time processing and extending the model’s applicability to more diverse pavement conditions.
Figure 1. Trends in pavement crack detection, classification, and segmentation.

Figure 2. Crack image segmentation.

Figure 3. Research methodology.

Figure 4. Confusion matrix analysis for crack classification before transformer-based attention.

Figure 5. Confusion matrix analysis for crack classification after transformer-based attention.

Figure 6. Visual representation of the model’s output, showing the detection, classification, and segmentation of pavement cracks.
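The transformer-based attention credited for the gains above is, at its core, self-attention applied to flattened CNN feature maps. A minimal hypothetical PyTorch sketch follows (layer sizes are not the paper's configuration):

```python
# Minimal self-attention over CNN feature maps: spatial positions are
# flattened into tokens, attended, and reshaped back, with a residual.
import torch
import torch.nn as nn

class FeatureMapSelfAttention(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads,
                                          batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        return attended.transpose(1, 2).reshape(b, c, h, w) + x

y = FeatureMapSelfAttention(64)(torch.randn(2, 64, 16, 16))
print(y.shape)
```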
28 pages, 4011 KiB  
Article
Advanced Deep Learning Fusion Model for Early Multi-Classification of Lung and Colon Cancer Using Histopathological Images
by A. A. Abd El-Aziz, Mahmood A. Mahmood and Sameh Abd El-Ghany
Diagnostics 2024, 14(20), 2274; https://doi.org/10.3390/diagnostics14202274 - 12 Oct 2024
Viewed by 914
Abstract
Background: In recent years, the healthcare field has experienced significant advancements. New diagnostic techniques, treatments, and insights into the causes of various diseases have emerged. Despite these progressions, cancer remains a major concern. It is a widespread illness affecting individuals of all ages and leads to one out of every six deaths. Lung and colon cancer alone account for nearly two million fatalities. Though it is rare for lung and colon cancers to co-occur, the spread of cancer cells between these two areas—known as metastasis—is notably high. Early detection of cancer greatly increases survival rates. Currently, histopathological image (HI) diagnosis and appropriate treatment are key methods for reducing cancer mortality and enhancing survival rates. Digital image processing (DIP) and deep learning (DL) algorithms can be employed to analyze the HIs of five different types of lung and colon tissues. Methods: Therefore, this paper proposes a refined DL model that integrates feature fusion for the multi-classification of lung and colon cancers. The proposed model incorporates three DL architectures: ResNet-101V2, NASNetMobile, and EfficientNet-B0. Each model has limitations concerning variations in the shape and texture of input images. To address this, the proposed model utilizes a concatenate layer to merge the pre-trained individual feature vectors from ResNet-101V2, NASNetMobile, and EfficientNet-B0 into a single feature vector, which is then fine-tuned. As a result, the proposed DL model achieves high success in multi-classification by leveraging the strengths of all three models to enhance overall accuracy. This model aims to assist pathologists in the early detection of lung and colon cancer with reduced effort, time, and cost. The proposed DL model was evaluated using the LC25000 dataset, which contains colon and lung HIs. The dataset was pre-processed using resizing and normalization techniques. Results: The model was tested and compared with recent DL models, achieving impressive results: 99.8% for precision, 99.8% for recall, 99.8% for F1-score, 99.96% for specificity, and 99.94% for accuracy. Conclusions: Thus, the proposed DL model demonstrates exceptional performance across all classification categories.
(This article belongs to the Special Issue Machine-Learning-Based Disease Diagnosis and Prediction)
Figure 1. 40× tissue samples of the LC25000 dataset: (a) NSCLC, (b) SCLC, (c) benign lung tissue, (d) colon cancer tissue, and (e) benign colon tissue.

Figure 2. The steps of the proposed DL model.

Figure 3. The overall model architecture.

Figure 4. The ResNet-101V2 architecture.

Figure 5. The architecture of the reduction cell and the NASNet normal cell.

Figure 6. The architecture of EfficientNet-B0.

Figure 7. Training and validation loss of the three CNN models and the proposed fusion model.

Figure 8. Training and validation accuracy of the three CNN models and the proposed fusion model.

Figure 9. The confusion matrix for the three CNN models and the proposed fusion model on the test set.
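The concatenation-based fusion this abstract describes maps naturally onto the Keras applications API, since all three backbones are available there. A hedged sketch (input size, frozen backbones, and head size are assumptions; per-backbone preprocessing is omitted for brevity):

```python
# Sketch of the concatenation fusion described above: pooled feature
# vectors from three pretrained backbones are merged into one vector
# and fed to a 5-class head (the LC25000 tissue classes).
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import (ResNet101V2, NASNetMobile,
                                           EfficientNetB0)

inputs = layers.Input(shape=(224, 224, 3))
vectors = []
for Backbone in (ResNet101V2, NASNetMobile, EfficientNetB0):
    base = Backbone(include_top=False, weights="imagenet",
                    input_shape=(224, 224, 3))
    base.trainable = False                      # fine-tuning comes later
    vectors.append(layers.GlobalAveragePooling2D()(base(inputs)))

merged = layers.Concatenate()(vectors)          # single fused feature vector
outputs = layers.Dense(5, activation="softmax")(merged)
model = Model(inputs, outputs)
model.summary()
```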
14 pages, 3677 KiB  
Article
MRI-Based Machine Learning for Prediction of Clinical Outcomes in Primary Central Nervous System Lymphoma
by Ching-Chung Ko, Yan-Lin Liu, Kuo-Chuan Hung, Cheng-Chun Yang, Sher-Wei Lim, Lee-Ren Yeh, Jeon-Hor Chen and Min-Ying Su
Life 2024, 14(10), 1290; https://doi.org/10.3390/life14101290 - 11 Oct 2024
Viewed by 685
Abstract
A portion of individuals diagnosed with primary central nervous system lymphomas (PCNSL) may experience early relapse or refractory (R/R) disease following treatment. This research explored the potential of MRI-based radiomics in forecasting R/R cases in PCNSL. Forty-six patients with pathologically confirmed PCNSL diagnosed between January 2008 and December 2020 were included in this study. Only patients who underwent pretreatment brain MRIs and complete postoperative follow-up MRIs were included. Pretreatment contrast-enhanced T1WI, T2WI, and T2 FLAIR imaging were analyzed. A total of 107 radiomic features, including 14 shape-based, 18 first-order statistical, and 75 texture features, were extracted from each sequence. Predictive models were then built using five different machine learning algorithms to predict R/R in PCNSL. Of the included 46 PCNSL patients, 20 (20/46, 43.5%) patients were found to have R/R. In the R/R group, the median scores in predictive models such as support vector machine, k-nearest neighbors, linear discriminant analysis, naïve Bayes, and decision trees were significantly higher, while the apparent diffusion coefficient values were notably lower compared to those without R/R (p < 0.05). The support vector machine model exhibited the highest performance, achieving an overall prediction accuracy of 83%, a precision rate of 80%, and an AUC of 0.78. Additionally, when analyzing tumor progression, patients with elevated support vector machine and naïve Bayes scores demonstrated a significantly reduced progression-free survival (p < 0.05). These findings suggest that preoperative MRI-based radiomics may provide critical insights for treatment strategies in PCNSL.
(This article belongs to the Special Issue Advances in Artificial Intelligence for Medical Image Analysis)
Figure 1. Flowchart for building the radiomics-based predictive model. The PCNSL is segmented by a fuzzy c-means clustering algorithm on contrast-enhanced T1WI, and the segmented ROI is mapped to T2WI and T2 FLAIR. In feature extraction, a total of 107 radiomic features, including 14 shape-based features, 18 first-order statistics features, and 75 texture features in each imaging sequence, were extracted. Further, the most important 5 features were selected by SVM, and each feature was normalized by the Z-score method. Subsequently, predictive models were built using five different ML algorithms to predict R/R PCNSL.

Figure 2. A 74-year-old woman was diagnosed with PCNSL pathologically. Imaging studies included (A) axial T2WI and (B) axial contrast-enhanced T1WI, which identified an enhancing tumor (white arrow) in the right basal ganglia, along with peritumoral edema (open arrowhead) and intratumoral necrosis (black arrow). (C) DWI showed hyperintensity in the tumor (white arrow), suggesting restricted random motion of water molecules. (D) The ADC value, measured within a defined circular region, was 0.56 × 10⁻³ mm²/s. In ML algorithms, the computed scores were as follows: 1.12 for SVM, 0.78 for KNN, 0.69 for LDA, 0.89 for NB, and 0.77 for DT. (E–G) Following first-line chemotherapy, a reduction in tumor size (open arrow) was noted, leading to a complete response (G). (H) However, 51 months later, recurrent tumors (arrowheads) were detected.

Figure 3. The box plots illustrate the values for (A) SVM, (B) KNN, (C) LDA, (D) NB, (E) DT, and (F) ADC in PCNSL patients with and without R/R disease. The R/R group exhibited higher scores for SVM, KNN, LDA, NB, and DT, alongside lower ADC values when compared to the non-relapsed group. * Statistical difference (p < 0.05). The boxes in the plots represent the interquartile range, while the whiskers extend to indicate the full range of the data. The median value for each category is marked by a horizontal line within the box. Outliers are depicted as circles, which are defined as data points falling more than 1.5 times the interquartile range below the first quartile or above the third quartile. Additionally, extreme values are indicated by stars, representing those that exceed three times the interquartile range above the third quartile.

Figure 4. The ROC curves were analyzed for two categories: (A) MRI-based radiomic ML algorithms and (B) ADC values in predicting R/R PCNSL. The AUC values for the various models were as follows: SVM achieved an AUC of 0.78, followed by KNN at 0.73, LDA at 0.68, NB at 0.72, DT at 0.74, and the ADC model at 0.71.

Figure 5. Kaplan–Meier curves illustrating overall progression-free survival trends based on cut-off points for (A) SVM, (B) KNN, (C) LDA, (D) NB, (E) DT, and (F) ADC values. * Statistical difference (p < 0.05).
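The modeling step above, Z-score normalization followed by an SVM with AUC evaluation, is standard scikit-learn territory. A sketch with synthetic data standing in for the 107-feature radiomic matrix:

```python
# Sketch of the radiomics modeling pipeline: Z-score scaling, an SVM
# classifier, and AUC scoring; the data here are synthetic placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(46, 107))        # 46 patients x 107 radiomic features
y = rng.integers(0, 2, size=46)       # 1 = relapsed/refractory (R/R)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
model = make_pipeline(StandardScaler(), SVC(probability=True))
model.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```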
20 pages, 2673 KiB  
Article
Research on a Metal Surface Defect Detection Algorithm Based on DSL-YOLO
by Zhiwen Wang, Lei Zhao, Heng Li, Xiaojun Xue and Hui Liu
Sensors 2024, 24(19), 6268; https://doi.org/10.3390/s24196268 - 27 Sep 2024
Viewed by 998
Abstract
In industrial manufacturing, metal surface defect detection often suffers from low detection accuracy, high leakage rates, and false detection rates. To address these issues, this paper proposes a novel model named DSL-YOLO for metal surface defect detection. First, we introduce the C2f_DWRB structure by integrating the DWRB module with C2f, enhancing the model’s ability to detect small and occluded targets and effectively extract sparse spatial features. Second, we design the SADown module to improve feature extraction in challenging tasks involving blurred images or very small objects. Finally, to further enhance the model’s capacity to extract multi-scale features and capture critical image information (such as edges, textures, and shapes) without significantly increasing memory usage and computational cost, we propose the LASPPF structure. Experimental results demonstrate that the improved model achieves significant performance gains on both the GC10-DET and NEU-DET datasets, with a mAP@0.5 increase of 4.2% and 2.6%, respectively. The improvements in detection accuracy highlight the model’s ability to address common challenges while maintaining efficiency and feasibility in metal surface defect detection, providing a valuable solution for industrial applications.
(This article belongs to the Section Fault Diagnosis & Sensors)
Figure 1. The structure of the DSL-YOLO network.

Figure 2. The structure of the DWRB network.

Figure 3. The structure of the C2f_DWRB network.

Figure 4. The structure of the SADown network.

Figure 5. The structure of the LASPPF network.

Figure 6. An example of the GC10-DET steel strip surface defect dataset.

Figure 7. An example of the NEU-DET steel strip surface defect dataset.

Figure 8. Comparison of heatmap visualization results between the GC10-DET and NEU-DET datasets.

Figure 9. Comparison of visualization results on the GC10-DET dataset.

Figure 10. Comparison of visualization results on the NEU-DET dataset.
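LASPPF is described as a multi-scale pooling structure; while its specifics are not given here, it presumably extends the standard YOLO SPPF block, which is sketched below for reference:

```python
# Standard SPPF (spatial pyramid pooling - fast) as used in YOLOv5/v8:
# successive max-pools approximate pooling at several kernel sizes
# cheaply. Shown only as the baseline structure LASPPF builds on.
import torch
import torch.nn as nn

class SPPF(nn.Module):
    def __init__(self, ch: int, pool: int = 5):
        super().__init__()
        hidden = ch // 2
        self.reduce = nn.Conv2d(ch, hidden, 1)
        self.pool = nn.MaxPool2d(pool, stride=1, padding=pool // 2)
        self.expand = nn.Conv2d(hidden * 4, ch, 1)

    def forward(self, x):
        x = self.reduce(x)
        y1 = self.pool(x)          # each pooling pass widens the
        y2 = self.pool(y1)         # effective receptive field
        y3 = self.pool(y2)
        return self.expand(torch.cat((x, y1, y2, y3), dim=1))

print(SPPF(64)(torch.randn(1, 64, 20, 20)).shape)
```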
27 pages, 12821 KiB  
Article
FishDet-YOLO: Enhanced Underwater Fish Detection with Richer Gradient Flow and Long-Range Dependency Capture through Mamba-C2f
by Chen Yang, Jian Xiang, Xiaoyong Li and Yunjie Xie
Electronics 2024, 13(18), 3780; https://doi.org/10.3390/electronics13183780 - 23 Sep 2024
Viewed by 865
Abstract
The fish detection task is an essential component of marine exploration, which helps scientists monitor fish population numbers and diversity and understand changes in fish behavior and habitat. It also plays a significant role in assessing the health of marine ecosystems, formulating conservation measures, and maintaining biodiversity. However, there are two main issues with current fish detection algorithms. First, the lighting conditions underwater are significantly different from those on land. In addition, light scattering and absorption in water trigger uneven illumination, color distortion, and reduced contrast in images. The accuracy of detection algorithms can be affected by these lighting variations. Second, the wide variation of fish species in shape, color, and size brings about some challenges. As some fish have complex textures or camouflage features, it is difficult to differentiate them using current detection algorithms. To address these issues, we propose a fish detection algorithm—FishDet-YOLO—through improvement in the YOLOv8 algorithm. To tackle the complexities of underwater environments, we design an Underwater Enhancement Module network (UEM) that can be jointly trained with YOLO. The UEM enhances the details of underwater images via end-to-end training with YOLO. To address the diversity of fish species, we leverage the Mamba model’s capability for long-distance dependencies without increasing computational complexity and integrate it with the C2f from YOLOv8 to create the Mamba-C2f. Through this design, the adaptability in handling complex fish detection tasks is improved. In addition, the RUOD and DUO public datasets are used to train and evaluate FishDet-YOLO. FishDet-YOLO achieves mAP scores of 89.5% and 88.8% on the test sets of RUOD and DUO, respectively, marking an improvement of 8% and 8.2% over YOLOv8. It also surpasses recent state-of-the-art general object detection and underwater fish detection algorithms.
Figure 1. FishDet-YOLO network architecture. The red square represents each individual component.

Figure 2. Structure of the Detail-Aware Branch.

Figure 3. Structure of the Low-Frequency-Aware Branch.

Figure 4. Network structure diagram of Mamba-C2f.

Figure 5. RUOD dataset distribution: (a) number of samples per category in the training set; (b) number of samples per category in the test set.

Figure 6. DUO dataset distribution: (a) number of samples per category in the training set; (b) number of samples per category in the test set.

Figure 7. P-R curves on the RUOD dataset for YOLOv8 and FishDet-YOLO. (a) P-R curve for YOLOv8; (b) P-R curve for FishDet-YOLO.

Figure 8. P-R curves on the DUO dataset for YOLOv8 and FishDet-YOLO. (a) P-R curve for YOLOv8; (b) P-R curve for FishDet-YOLO.

Figure 9. Detection visualization results on the RUOD dataset. (a) YOLOv8 detection results; (b) FishDet-YOLO detection results; (c) ground truth.

Figure 10. Detection visualization results on the DUO dataset. (a) YOLOv8 detection results; (b) FishDet-YOLO detection results; (c) ground truth.

Figure 11. Visualization comparison of UEM output: (a) original image; (b) UEM output.
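The joint training of the UEM with YOLO amounts to placing an enhancement network in front of the detector and backpropagating the detection loss through both. A toy PyTorch sketch of that arrangement (both networks are stand-ins, not UEM or YOLO):

```python
# Toy sketch of joint end-to-end training: the detection loss is
# backpropagated through the detector AND the upstream enhancer, so the
# enhancement is shaped by what helps detection.
import torch
import torch.nn as nn

enhancer = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))   # "UEM" stand-in
detector = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(8, 4))                   # toy "detector"

params = list(enhancer.parameters()) + list(detector.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

images = torch.randn(2, 3, 64, 64)
targets = torch.randn(2, 4)             # placeholder regression targets

optimizer.zero_grad()
pred = detector(enhancer(images))       # one forward pass through both
loss = nn.functional.mse_loss(pred, targets)
loss.backward()                         # gradients reach the enhancer too
optimizer.step()
print(loss.item())
```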
19 pages, 5824 KiB  
Article
Feature-Selection-Based Unsupervised Transfer Learning for Change Detection from VHR Optical Images
by Qiang Chen, Peng Yue, Yingjun Xu, Shisong Cao, Lei Zhou, Yang Liu and Jianhui Luo
Remote Sens. 2024, 16(18), 3507; https://doi.org/10.3390/rs16183507 - 21 Sep 2024
Viewed by 549
Abstract
Accurate understanding of urban land use change information is of great significance for urban planning, urban monitoring, and disaster assessment. The use of Very-High-Resolution (VHR) remote sensing images for change detection on urban land features has gradually become mainstream. However, most existing transfer learning-based change detection models compute multiple deep image features, leading to feature redundancy. Therefore, we propose a Transfer Learning Change Detection Model Based on Change Feature Selection (TL-FS). The proposed method involves using a pretrained transfer learning model framework to compute deep features from multitemporal remote sensing images. A change feature selection algorithm is then designed to filter relevant change information. Subsequently, these change features are combined into a vector. The Change Vector Analysis (CVA) is employed to calculate the magnitude of change in the vector. Finally, the Fuzzy C-Means (FCM) classification is utilized to obtain binary change detection results. In this study, we selected four VHR optical image datasets from Beijing-2 for the experiment. Compared with the Change Vector Analysis and Spectral Gradient Difference, the TL-FS method had maximum increases of 26.41% in the F1-score, 38.04% in precision, 29.88% in recall, and 26.15% in the overall accuracy. The results of the ablation experiments also indicate that TL-FS could provide clearer texture and shape detections for dual-temporal VHR image changes. It can effectively detect complex features in urban scenes.
Figure 1. Flowchart of the TL-FS method.

Figure 2. The process of feature selection variation.

Figure 3. The four datasets used in the experiments, named A, B, C, and D; (a–c) the prechange images, postchange images, and the standard reference change maps generated through visual interpretation, respectively.

Figure 4. The change detection results of the different methods on dataset A: (a–g) CVA, IRMAD, PCA-Kmeans, DSFA, SARAS-Net, TL-FS, and the standard reference change map, respectively.

Figure 5. The change detection results of the different methods on dataset B: (a–g) CVA, IRMAD, PCA-Kmeans, DSFA, SARAS-Net, TL-FS, and the standard reference change map, respectively.

Figure 6. The change detection results of the different methods on dataset C: (a–g) CVA, IRMAD, PCA-Kmeans, DSFA, SARAS-Net, TL-FS, and the standard reference change map, respectively.

Figure 7. The change detection results of the different methods on dataset D: (a–g) CVA, IRMAD, PCA-Kmeans, DSFA, SARAS-Net, TL-FS, and the standard reference change map, respectively.

Figure 8. Comparison of the accuracy verification results of different change detection methods.

Figure 9. Comparison of TL-FS and TL-NFS change detection results: (a,b), (c,d), (e,f), and (g,h), respectively, represent the results of datasets A, B, C, and D with and without the FS module.

Figure 10. Comparison of F1-scores between TL-FS and TL-NFS.

Figure 11. Analysis results of features from different layers (taking dataset A as an example): (a,b) urban feature extraction maps of conv2, conv4, and conv7 in the prechange and postchange phases, respectively; (c) feature difference maps of the three convolutional layers; (d) change maps of the three convolutional layers.

Figure 12. Comparison of the results between TL-FS and TL-NFS: (a) parts of the four datasets in 2018; (b) parts of the four datasets in 2021; (c) change detection result graphs for TL-NFS; (d) change detection result graphs for TL-FS. The red circles indicate the difference in detail handling between TL-FS and TL-NFS.
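The CVA-plus-clustering pipeline described above reduces to a per-pixel change-vector magnitude followed by two-class grouping. The sketch below uses KMeans in place of Fuzzy C-Means for brevity, on synthetic feature maps:

```python
# Sketch of CVA-based change detection: Euclidean magnitude of the
# per-pixel difference between two feature stacks, binarized by
# two-cluster grouping (KMeans stands in for Fuzzy C-Means here).
import numpy as np
from sklearn.cluster import KMeans

feat_t1 = np.random.rand(8, 64, 64)     # deep features, image at time 1
feat_t2 = np.random.rand(8, 64, 64)     # deep features, image at time 2

# CVA: magnitude of the change vector at every pixel
magnitude = np.sqrt(((feat_t2 - feat_t1) ** 2).sum(axis=0))

labels = KMeans(n_clusters=2, n_init=10).fit_predict(
    magnitude.reshape(-1, 1))
# take the cluster with the larger mean magnitude as the "changed" class
changed_cluster = np.argmax(
    [magnitude.reshape(-1)[labels == k].mean() for k in (0, 1)])
change_map = labels.reshape(64, 64) == changed_cluster
print(change_map.sum(), "changed pixels")
```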
20 pages, 4626 KiB  
Article
Three-Dimensional Reconstruction of Indoor Scenes Based on Implicit Neural Representation
by Zhaoji Lin, Yutao Huang and Li Yao
J. Imaging 2024, 10(9), 231; https://doi.org/10.3390/jimaging10090231 - 16 Sep 2024
Viewed by 706
Abstract
Reconstructing 3D indoor scenes from 2D images has always been an important task in computer vision and graphics applications. For indoor scenes, traditional 3D reconstruction methods have problems such as missing surface details, poor reconstruction of large plane textures and uneven illumination areas, and many wrongly reconstructed floating debris noises in the reconstructed models. This paper proposes a 3D reconstruction method for indoor scenes that combines neural radiance field (NeRF) and signed distance function (SDF) implicit expressions. The volume density of the NeRF is used to provide geometric information for the SDF field, and the learning of geometric shapes and surfaces is strengthened by adding an adaptive normal prior optimization learning process. It not only preserves the high-quality geometric information of the NeRF, but also uses the SDF to generate an explicit mesh with a smooth surface, significantly improving the reconstruction quality of large plane textures and uneven illumination areas in indoor scenes. At the same time, a new regularization term is designed to constrain the weight distribution, making it an ideal unimodal compact distribution, thereby alleviating the problem of uneven density distribution and achieving the effect of floating debris removal in the final model. Experiments show that the 3D reconstruction effect of this paper on ScanNet, Hypersim, and Replica datasets outperforms the state-of-the-art methods.
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
Figure 1. (a) Distortion of reconstructed 3D models under uneven lighting conditions, enclosed by the red dashed box; (b) distortion of 3D reconstruction of smooth planar texture areas, enclosed by the red dashed box; (c) floating debris noise (red box) in 3D reconstruction.

Figure 2. Overall framework of the method.

Figure 3. (a) The normal estimation is inaccurate for some fine structures (red dashed box), such as chair legs, based on the TiltedSN normal estimation module; (b) we use an adaptive normal prior method to derive accurate normals based on the consistency of adjacent images. In the red dashed box, the fine structures are accurately reconstructed.

Figure 4. Neural implicit reconstruction process.

Figure 5. Distribution diagram of distance and weight values between sampling points.

Figure 6. Three-dimensional model reconstructed from scenes in the ScanNet dataset. (a) Comparison of 3D models; (b) comparison of the specific details in the red dashed box.

Figure 7. Qualitative comparison for thin structure areas using the ScanNet dataset: (a) reference image; (b) model reconstructed without using the normal prior; (c) model reconstructed with the normal prior and without the adaptive scheme; (d) model reconstructed with the normal prior and adaptive scheme.

Figure 8. Qualitative comparison for reflective areas using the Hypersim dataset: (a) reference image; (b) model reconstructed without using the normal prior; (c) model reconstructed with the normal prior and without the adaptive scheme; (d) model reconstructed with the normal prior and adaptive scheme.

Figure 9. Visual comparison for a scene with a large amount of floating debris using the ScanNet dataset: (a) reconstruction result without adding the distortion loss function; (b) reconstruction result with the distortion loss function.

Figure 10. Visual comparison for a scene with single floating debris areas (red dashed box) using the ScanNet dataset: (a) reconstruction result without adding the distortion loss function; (b) reconstruction result with the distortion loss function.

Figure 11. The limitations of this method in the 3D reconstruction of scenes with clutter, occlusion, soft non-solid objects, and blurred images, using the ScanNet dataset.
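The weight-distribution regularizer described in this abstract is in the spirit of the published mip-NeRF 360 distortion loss, which pushes ray sample weights toward a compact unimodal distribution; the sketch below implements that published form as an illustration, since the paper's exact term may differ:

```python
# Distortion-style regularizer on ray sample weights: penalizes weight
# mass that is spread out along the ray, encouraging a compact unimodal
# distribution (as in mip-NeRF 360; illustrative, not this paper's term).
import torch

def distortion_loss(weights: torch.Tensor, midpoints: torch.Tensor,
                    intervals: torch.Tensor) -> torch.Tensor:
    # weights:   (num_samples,) rendering weights along one ray
    # midpoints: (num_samples,) midpoint distance of each sample interval
    # intervals: (num_samples,) length of each sample interval
    pairwise = weights[:, None] * weights[None, :] * \
        (midpoints[:, None] - midpoints[None, :]).abs()
    return pairwise.sum() + (weights ** 2 * intervals).sum() / 3.0

t = torch.linspace(0.05, 1.0, 32)
w = torch.softmax(-((t - 0.4) ** 2) / 0.01, dim=0)  # a compact weight bump
print(distortion_loss(w, t, torch.full_like(t, 1 / 32)).item())
```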