Search Results (16,883)

Search Parameters:
Keywords = computational imaging

22 pages, 2388 KiB  
Article
DeFFace: Deep Face Recognition Unlocked by Illumination Attributes
by Xiangling Zhou, Zhongmin Gao, Huanji Gong and Shenglin Li
Electronics 2024, 13(22), 4566; https://doi.org/10.3390/electronics13224566 - 20 Nov 2024
Abstract
General face recognition is currently one of the key technologies in the field of computer vision, and it has achieved tremendous success with the support of deep-learning technology. General face recognition models currently exhibit extremely high accuracy on some high-quality face datasets. However, their performance decreases in challenging environments, such as low-light scenes. To enhance the performance of face recognition models in low-light scenarios, we propose a face recognition approach based on feature decoupling and fusion (DeFFace). Our main idea is to extract facial-related features from images that are not influenced by illumination. First, we introduce a feature decoupling network (D-Net) to decouple the image into facial-related features and illumination-related features. By incorporating the illumination triplet loss optimized with unpaired identity IDs, we regulate illumination-related features to minimize the impact of lighting conditions on the face recognition system. However, the decoupled features are relatively coarse. Therefore, we introduce a feature fusion network (F-Net) to further extract the residual facial-related features from the illumination-related features and fuse them with the initial facial-related features. Finally, we introduce a lighting-facial correlation loss to reduce the correlation between the two decoupled features in the specific space. We demonstrate the effectiveness of our method on four real-world low-light datasets and three simulated low-light datasets. We retrain multiple general face recognition methods using our proposed low-light training sets to further validate the advanced performance of our method. Compared to general face recognition methods, our approach achieves an average improvement of more than 2.11 percentage points on low-light face datasets. In comparison with image enhancement-based solutions, our method shows an average improvement of around 16 percentage points on low-light datasets, and it also delivers an average improvement of approximately 5.67 percentage points when compared to illumination normalization-based methods. Full article
Figures:

Figure 1: Diagrams of different approaches for low-light face image recognition. (a) Illumination-enhanced low-light face recognition method. (b) Illumination normalization-based low-light face recognition method using Retinex theory. (c) Face recognition based on a near-infrared camera. (d) Our low-light face recognition method based on the feature decoupling and fusion scheme (DeFFace).
Figure 2: The DeFFace architecture primarily consists of four parts: backbone, D-Net, F-Net, and face recognition. The D-Net is constrained by the illumination triplet loss, the F-Net is constrained by the lighting–facial correlation loss, and the face recognition module is constrained by the Softmax-based loss.
Figure 3: Detailed network configuration of the D-Net and F-Net. Subfigure (a) presents the detailed configuration of the D-Net, and subfigure (b) presents the detailed configuration of the F-Net.
Figure 4: Examples from LowCASIA-Train, where green boxes indicate well-illuminated facial areas, blue boxes denote low-light facial areas, and yellow boxes represent randomly selected lighting triples.
Figure 5: Examples from the validation set. LFW* indicates the low-light version of LFW.
Figure 6: Left: performance of decoupling sub-modules with identical layer configurations; right: performance of decoupling sub-modules with varying numbers of layers.
Figure 7: Visualized results of rank-10 retrieval on the low-light face dataset LFW* using our method and the ArcFace method. Using the person on the far left as an example, the green dashed box indicates a match as the same person, the yellow dashed box indicates a different person, and the blue and orange text boxes represent confidence scores. The top rank-10 visualization results are displayed from high to low.
Full article
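The abstract describes constraining illumination-related features with a triplet loss built from unpaired identities. The sketch below is an illustration only, not the authors' code: the two-branch split, feature sizes, and margin are assumptions, and PyTorch's built-in nn.TripletMarginLoss stands in for the paper's illumination triplet loss.

```python
# Illustrative sketch (assumptions throughout, not the DeFFace implementation):
# a toy decoupling head that splits a backbone embedding into facial- and
# illumination-related features, with a triplet loss applied to the latter.
import torch
import torch.nn as nn

class ToyDecouplingNet(nn.Module):
    """Hypothetical D-Net-style split of a 512-d backbone embedding."""
    def __init__(self, in_dim=512, feat_dim=256):
        super().__init__()
        self.facial_head = nn.Linear(in_dim, feat_dim)   # identity-related features
        self.illum_head = nn.Linear(in_dim, feat_dim)    # lighting-related features

    def forward(self, x):
        return self.facial_head(x), self.illum_head(x)

net = ToyDecouplingNet()
triplet = nn.TripletMarginLoss(margin=0.3)   # margin value is a placeholder

# Anchor/positive share similar lighting, negative has different lighting;
# identities are unpaired, so this loss only shapes the illumination branch.
emb_a, emb_p, emb_n = (torch.randn(8, 512) for _ in range(3))
_, illum_a = net(emb_a)
_, illum_p = net(emb_p)
_, illum_n = net(emb_n)
loss_illum = triplet(illum_a, illum_p, illum_n)
loss_illum.backward()
print(float(loss_illum))
```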
15 pages, 8053 KiB  
Article
In Situ Monitoring of Anodic Acidification Process Using 3D μ-XCT Method
by Chaoqun Zeng, Shanshan Qin, Zhijun Deng and Miaochang Zhu
Materials 2024, 17(22), 5662; https://doi.org/10.3390/ma17225662 - 20 Nov 2024
Abstract
Debonding of the primary anode caused by anodic acidification is one of the major failure modes of the impressed current cathodic protection (ICCP) system in reinforced concrete structures. This study used 3D micro X-ray computed tomography (μ-XCT) to monitor the in situ evolution of the anodic acidification-affected zone. Samples were scanned after 0 to 40 days of the accelerated anodic acidification test. The anodic acidification-affected zone was identified in μ-XCT images using the gray level segmentation method. The total volume of this zone was measured using the 3D reconstruction method. It was found that detailed 3D information can be extracted using the 3D reconstruction method. The spatial heterogeneity was analyzed using this reconstructed volume information. The Faraday efficiency was calculated and found to increase after 20 days of operation. It was also found that the affected zone was proportional to the input electrical energy. The proposed model is useful for estimating the durability of an ICCP system. Full article
(This article belongs to the Section Construction and Building Materials)
Figures:

Figure 1: Specimen preparation for the accelerated acidification test: (a) side view; (b) real specimen photo; (c) front view.
Figure 2: Experimental setup of the accelerated acidification test.
Figure 3: Standard linear attenuation coefficient of the components in our samples according to the National Institute of Standards and Technology [19].
Figure 4: Inspection of the anodic acidification-affected zone using the 3D μ-XCT technique after the accelerated acidification test: (a) principle of X-ray computed tomography; (b) photo of the μ-XCT device.
Figure 5: Evolution of the driven voltage during the accelerated acidification test.
Figure 6: Phase identification process of the anodic acidification-affected zone: (a) 3D construction of the samples; (b) 2D section view of the scanned image; (c) histogram of the gray level clearly showing the gray level of the affected zone.
Figure 7: Results of the identification process. The identified anodic acidification-affected zone is shown in dark red. The duration of the accelerated ICCP test increased from 0 to 40 days for (a–e) in increasing order.
Figure 8: A 3D reconstruction of the anodic acidification-affected zone. The duration of the test from (a–e) is 0 to 40 days in increasing order.
Figure 9: Comparison of the μ-XCT image and optical analysis of the sample at the end of the accelerated ICCP test: (a) μ-XCT image; (b) optical image with a phenolphthalein pH indicator sprayed on the surface of the sample.
Figure 10: SEM and EDS results for the sample after 40 days of the accelerated anode acidification test: (a) SEM view at low magnification; (b–d) increased magnification of EDS spots 1, 3, 5, respectively; (e–g) EDS results for three points selected from the sample surface.
Figure 11: Evolution of the Faraday efficiency of reaction (1) [1].
Figure 12: Experimental setup of the accelerated acidification test.
Figure 13: Correlation between the input electrical energy and the affected volume.
Full article
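The study identifies the affected zone by gray-level segmentation of μ-XCT images and then measures its volume in 3D. A minimal sketch of that idea follows; it is not the paper's workflow, and the gray-level window, voxel size, and synthetic volume are placeholder assumptions.

```python
# Minimal sketch (assumed parameters, not the study's code): gray-level
# windowing of a CT volume to isolate an "affected" phase and convert the
# voxel count into a physical volume.
import numpy as np

def affected_volume(ct_volume, lo=-120.0, hi=-40.0, voxel_mm=0.05):
    """Count voxels whose gray value lies in [lo, hi]; return mask and volume in mm^3."""
    mask = (ct_volume >= lo) & (ct_volume <= hi)
    return mask, mask.sum() * voxel_mm ** 3

ct_volume = np.random.normal(-80.0, 60.0, size=(64, 64, 64))   # stand-in for a scan
mask, vol_mm3 = affected_volume(ct_volume)
print(f"affected volume: {vol_mm3:.1f} mm^3 ({mask.mean():.1%} of voxels)")
```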
16 pages, 1799 KiB  
Article
Optimizing Fire Scene Analysis: Hybrid Convolutional Neural Network Model Leveraging Multiscale Feature and Attention Mechanisms
by Shakhnoza Muksimova, Sabina Umirzakova, Mirjamol Abdullaev and Young-Im Cho
Fire 2024, 7(11), 422; https://doi.org/10.3390/fire7110422 - 20 Nov 2024
Viewed by 54
Abstract
The rapid and accurate detection of fire scenes in various environments is crucial for effective disaster management and mitigation. Fire scene classification is a critical aspect of modern fire detection systems that directly affects public safety and property preservation. This research introduced a novel hybrid deep learning model designed to enhance the accuracy and efficiency of fire scene classification across diverse environments. The proposed model integrates advanced convolutional neural networks with multiscale feature extraction, attention mechanisms, and ensemble learning to achieve superior performance in real-time fire detection. By leveraging the strengths of pre-trained networks such as ResNet50, VGG16, and EfficientNet-B3, the model captures detailed features at multiple scales, ensuring robust detection capabilities. Including spatial and channel attention mechanisms further refines the focus on critical areas within the input images, reducing false positives and improving detection precision. Extensive experiments on a comprehensive dataset encompassing wildfires, building fires, vehicle fires, and non-fire scenes demonstrate that the proposed framework outperforms existing cutting-edge techniques. The model also exhibited reduced computational complexity and enhanced inference speed, making it suitable for deployment in real-time applications on various hardware platforms. This study sets a new benchmark for fire detection and offers a powerful tool for early warning systems and emergency response initiatives. Full article
Figures:

Figure 1: Hybrid convolutional neural network architecture for fire type classification.
Figure 2: Examples of images from each of the four classes used in the training datasets.
Full article
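The abstract mentions combining ResNet50, VGG16, and EfficientNet-B3 in an ensemble. As one simple illustration of that idea (not the authors' model, which also uses multiscale features and attention), the sketch below soft-votes the class probabilities of the three torchvision backbones; the four-class head and input size are assumptions.

```python
# Illustrative soft-voting ensemble sketch (not the published architecture).
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # wildfire, building fire, vehicle fire, non-fire (assumed ordering)

def build_backbones():
    resnet = models.resnet50(weights=None)
    resnet.fc = nn.Linear(resnet.fc.in_features, NUM_CLASSES)
    vgg = models.vgg16(weights=None)
    vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, NUM_CLASSES)
    eff = models.efficientnet_b3(weights=None)
    eff.classifier[1] = nn.Linear(eff.classifier[1].in_features, NUM_CLASSES)
    return [resnet, vgg, eff]

@torch.no_grad()
def ensemble_predict(backbones, images):
    probs = [torch.softmax(m(images), dim=1) for m in backbones]
    return torch.stack(probs).mean(dim=0)      # average the per-model probabilities

backbones = [m.eval() for m in build_backbones()]
batch = torch.randn(2, 3, 300, 300)            # 300x300 is EfficientNet-B3's native size
print(ensemble_predict(backbones, batch).argmax(dim=1))
```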
16 pages, 2295 KiB  
Article
Machine Learning Models for the Classification of Histopathological Images of Colorectal Cancer
by Nektarios Georgiou, Pavlos Kolias and Ioanna Chouvarda
Appl. Sci. 2024, 14(22), 10731; https://doi.org/10.3390/app142210731 - 20 Nov 2024
Viewed by 108
Abstract
The aim of this study was to explore the application of computational models for the analysis of histopathological images in the context of colon cancer. A comprehensive dataset of colon cancer images annotated into eight distinct categories based on their representation of cancerous cell portions was used. The primary objective was to employ various image classification algorithms to assess their efficacy in the context of cancer classification. Additionally, this study investigated the use of feature extraction techniques to derive meaningful data from the images, contributing to a more nuanced understanding of cancerous tissues, comparing the performance of different image classification algorithms in the context of colon cancer image analysis. The findings of this research suggested that XGboost provides the highest accuracy (89.79%) and could contribute to the growing body of knowledge in computational pathology. Other algorithms, such as the random forest, SVM, and CNN, also provided satisfactory results, offering insights into the effectiveness of image classification algorithms in distinguishing between different categories of cancerous cells. This work holds implications for the development of more accurate and efficient tools, underscoring the potential of computational models in enhancing the analysis of histopathological images and improving diagnostic capabilities in cancer research. Full article
(This article belongs to the Special Issue Advances in Medical Imaging and Radiation Therapy)
Figures:

Figure 1: Representation of the eight tissue types: (1) tumor epithelium, (2) complex stroma, (3) immune cells (including immune cell conglomerates and sub-mucosal lymphoid follicles), (4) simple stroma (homogeneous composition, including tumor stroma, extra-tumoral stroma, and smooth muscle), (5) debris (including necrosis, hemorrhage, and mucus), (6) adipose tissue, (7) background/empty (no tissue), and (8) normal mucosal gland.
Figure 2: Correlation between the extracted features.
Figure 3: Boxplots for RGB color red and energy across the eight tissue types.
Figure 4: Comparison of predictive performance in the test set across the different algorithms.
Figure 5: Comparison of predictive performance ((A) sensitivity, (B) specificity, (C) precision, (D) F1 score) in the test set across the different tissue types and algorithms used.
Full article
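Since the study reports XGBoost as the strongest classifier on features extracted from the images, here is a minimal, hedged sketch of that step; the feature matrix is random placeholder data, and the hyperparameters are not taken from the paper.

```python
# Minimal sketch (placeholder data and settings, not the study's pipeline):
# fit an XGBoost classifier on pre-extracted image features for 8 tissue classes.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 32))        # e.g. color/texture features per image patch
y = rng.integers(0, 8, size=800)      # eight tissue categories

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
clf.fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```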
19 pages, 7893 KiB  
Article
AI-Driven Crack Detection for Remanufacturing Cylinder Heads Using Deep Learning and Engineering-Informed Data Augmentation
by Mohammad Mohammadzadeh, Gül E. Okudan Kremer, Sigurdur Olafsson and Paul A. Kremer
Automation 2024, 5(4), 578-596; https://doi.org/10.3390/automation5040033 (registering DOI) - 20 Nov 2024
Viewed by 89
Abstract
Detecting cracks in cylinder heads traditionally relies on manual inspection, which is time-consuming and susceptible to human error. As an alternative, automated object detection utilizing computer vision and machine learning models has been explored. However, these methods often face challenges due to a lack of sufficiently annotated training data, limited image diversity, and the inherently small size of cracks. Addressing these constraints, this paper introduces a novel automated crack-detection method that enhances data availability through a synthetic data generation technique. Unlike general data augmentation practices, our method involves copying cracks from one location to another, guided by both random and informed engineering decisions about likely crack formations due to cyclic thermomechanical loads. The innovative aspect of our approach lies in the integration of domain-specific engineering knowledge into the synthetic generation process, which substantially improves detection accuracy. We evaluate our method’s effectiveness using two metrics: the F2 score, which emphasizes recall to prioritize detecting all potential cracks, and mean average precision (MAP), a standard measure in object detection. Experimental results demonstrate that, without engineering insights, our method increases the F2 score from 0.40 to 0.65, while maintaining a stable MAP. Incorporating detailed engineering knowledge further enhances the F2 score to 0.70 and improves MAP to 0.57, representing increases of 63% and 43%, respectively. These results confirm that our approach not only mitigates the limitations of traditional data augmentation but also significantly advances the reliability and precision of crack detection in industrial settings. Full article
(This article belongs to the Special Issue Smart Remanufacturing)
Figures:

Figure 1: Overview of the proposed methods and evaluation procedures.
Figure 2: Sample images and cracks (small red squares) obtained in different methods: (a) lab setting and (b) less controlled setting.
Figure 3: Sample cracks (small red squares) in the original image vs. augmented images.
Figure 4: Sample crack patches extracted from the training set.
Figure 5: Potential (known) areas where cracks may occur.
Figure 6: Sample of an image before and after adding synthetic cracks to known areas.
Figure 7: Sample of an image before and after adding synthetic cracks to uncommon areas.
Figure 8: Sample of training images after randomly placing synthetic cracks.
Figure 9: Sample of predicted images using YOLOv8x before (a) and after (b) adding 300 synthetic cracks.
Full article
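The core augmentation idea is copying crack patches into engineering-informed candidate regions. The sketch below illustrates that copy-paste step only; the region coordinates, patch, and box format are invented for demonstration and are not the authors' data.

```python
# Illustrative copy-paste augmentation sketch (hypothetical regions and patch sizes).
import numpy as np

rng = np.random.default_rng(42)

def paste_crack(image, crack_patch, candidate_regions):
    """Paste a crack patch at a random location inside one candidate region;
    return the augmented image and the new bounding-box label."""
    x0, y0, x1, y1 = candidate_regions[rng.integers(len(candidate_regions))]
    ph, pw = crack_patch.shape[:2]
    x = int(rng.integers(x0, max(x0 + 1, x1 - pw)))
    y = int(rng.integers(y0, max(y0 + 1, y1 - ph)))
    out = image.copy()
    out[y:y + ph, x:x + pw] = crack_patch
    return out, (x, y, x + pw, y + ph)

image = np.zeros((480, 640, 3), dtype=np.uint8)               # stand-in for a cylinder-head photo
crack = np.full((12, 40, 3), 255, dtype=np.uint8)             # stand-in crack patch
known_areas = [(100, 50, 300, 200), (350, 250, 600, 420)]     # assumed crack-prone zones
aug_img, bbox = paste_crack(image, crack, known_areas)
print("synthetic crack bbox:", bbox)
```

The paper evaluates with the F2 score (recall-weighted); scikit-learn's fbeta_score with beta=2 computes the same metric for per-box classification decisions.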
27 pages, 28012 KiB  
Article
A Model Development Approach Based on Point Cloud Reconstruction and Mapping Texture Enhancement
by Boyang You and Barmak Honarvar Shakibaei Asli
Big Data Cogn. Comput. 2024, 8(11), 164; https://doi.org/10.3390/bdcc8110164 - 20 Nov 2024
Viewed by 115
Abstract
To address the challenge of rapid geometric model development in the digital twin industry, this paper presents a comprehensive pipeline for constructing 3D models from images using monocular vision imaging principles. Firstly, a structure-from-motion (SFM) algorithm generates a 3D point cloud from photographs. The feature detection methods scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and KAZE are compared across six datasets, with SIFT proving the most effective (matching rate higher than 0.12). Using K-nearest-neighbor matching and random sample consensus (RANSAC), refined feature point matching and 3D spatial representation are achieved via antipodal geometry. Then, the Poisson surface reconstruction algorithm converts the point cloud into a mesh model. Additionally, texture images are enhanced by leveraging a visual geometry group (VGG) network-based deep learning approach. Content images from a dataset provide geometric contours via higher-level VGG layers, while textures from style images are extracted using the lower-level layers. These are fused to create texture-transferred images, where the image quality assessment (IQA) metrics SSIM and PSNR are used to evaluate texture-enhanced images. Finally, texture mapping integrates the enhanced textures with the mesh model, improving the scene representation with enhanced texture. The method presented in this paper surpassed a LiDAR-based reconstruction approach by 20% in terms of point cloud density and number of model facets, while the hardware cost was only 1% of that associated with LiDAR. Full article
Figures:

Figure 1: Samples from Dataset 1 (source: https://github.com/Abhishek-Aditya-bs/MultiView-3D-Reconstruction/tree/main/Datasets, accessed on 18 November 2024) and samples from Dataset 2.
Figure 2: Demonstration of Dataset 3.
Figure 3: Diagram of the SFM algorithm.
Figure 4: Camera imaging model.
Figure 5: Coplanarity condition of photogrammetry.
Figure 6: Process of surface reconstruction.
Figure 7: Demonstration of an isosurface.
Figure 8: Demonstration of the VGG network.
Figure 9: Demonstration of the Gram matrix.
Figure 10: Style transformation architecture.
Figure 11: Texture mapping process.
Figure 12: Demonstration of the three kinds of feature descriptors used on Dataset 1 and Dataset 2.
Figure 13: Matching rate fitting of the three kinds of image descriptors.
Figure 14: SIFT point matching for the CNC1 object under different thresholds.
Figure 15: SIFT point matching for the Fountain object under different thresholds.
Figure 16: Matching result of Dataset 2 using the RANSAC method.
Figure 17: Triangulation presentation of feature points obtained from objects in Dataset 1.
Figure 18: Triangulation presentation of feature points obtained from objects in Dataset 2.
Figure 19: Point cloud data of objects in Dataset 1.
Figure 20: Point cloud data of objects in Dataset 2.
Figure 21: Normal vector presentation of the point set obtained from objects in Dataset 1.
Figure 22: Normal vectors of the point set obtained from objects in Dataset 2.
Figure 23: Poisson surface reconstruction results of objects in Dataset 1.
Figure 24: Poisson surface reconstruction results of objects in Dataset 2.
Figure 25: Style transfer result of the Statue object.
Figure 26: Style transfer result of the Fountain object.
Figure 27: Style transfer result of the Castle object.
Figure 28: Style transfer result of the CNC1 object.
Figure 29: Style transfer result of the CNC2 object.
Figure 30: Style transfer result of the Robot object.
Figure 31: Training loss in style transfer for the CNC1 object.
Figure 32: IQA assessment for CNC1 images after style transfer.
Figure 33: Results of texture mapping for Dataset 1.
Figure 34: Results of texture mapping for Dataset 2.
Figure A1: Results of camera calibration.
Full article
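The pipeline's front end is SIFT feature matching refined by a K-nearest-neighbor ratio test and RANSAC. The OpenCV sketch below illustrates that front end only (not the full SfM pipeline); the image file names and the 0.75 ratio threshold are assumptions.

```python
# Front-end sketch with OpenCV (placeholder image paths, not the authors' code):
# SIFT keypoints, k-NN ratio-test matching, and RANSAC filtering via the
# fundamental matrix.
import cv2
import numpy as np

img1 = cv2.imread("view_a.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("view_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])                             # Lowe-style ratio test

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# RANSAC rejects matches inconsistent with the epipolar constraint.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
print(f"{int(inlier_mask.sum())} inliers out of {len(good)} ratio-test matches")
```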
28 pages, 1152 KiB  
Article
Lung and Colon Cancer Detection Using a Deep AI Model
by Nazmul Shahadat, Ritika Lama and Anna Nguyen
Cancers 2024, 16(22), 3879; https://doi.org/10.3390/cancers16223879 - 20 Nov 2024
Viewed by 89
Abstract
Lung and colon cancers are among the leading causes of cancer-related mortality worldwide. Early and accurate detection of these cancers is crucial for effective treatment and improved patient outcomes. False or incorrect detection is harmful. Accurately detecting cancer in a patient’s tissue is crucial to their effective treatment. While analyzing tissue samples is complicated and time-consuming, deep learning techniques have made it possible to complete this process more efficiently and accurately. As a result, researchers can study more patients in a shorter amount of time and at a lower cost. Much research has been conducted to investigate deep learning models that require great computational ability and resources. However, none of these have had a 100% accurate detection rate for these life-threatening malignancies. Misclassified or falsely detecting cancer can have very harmful consequences. This research proposes a new lightweight, parameter-efficient, and mobile-embedded deep learning model based on a 1D convolutional neural network with squeeze-and-excitation layers for efficient lung and colon cancer detection. This proposed model diagnoses and classifies lung squamous cell carcinomas and adenocarcinoma of the lung and colon from digital pathology images. Extensive experiment demonstrates that our proposed model achieves 100% accuracy for detecting lung, colon, and lung and colon cancers from the histopathological (LC25000) lung and colon datasets, which is considered the best accuracy for around 0.35 million trainable parameters and around 6.4 million flops. Compared with the existing results, our proposed architecture shows state-of-the-art performance in lung, colon, and lung and colon cancer detection. Full article
(This article belongs to the Collection Oncology: State-of-the-Art Research in the USA)
Figures:

Figure 1: Illustration of block diagrams found in (a) the residual network [50], (b) the residual 1D convolutional network [48], and (c) SqueezeNet [49].
Figure 2: SqueezeNet block [49].
Figure 3: Illustration of block diagrams found in (a) SqueezeNext [51] and (b) SEC [52].
Figure 4: SqueezeNext network architecture (23 layers) [51].
Figure 5: Illustration of block diagrams found in (a) the reduced-CNN-layers network [48,53,54] and (b) our proposed network architectures.
Figure 6: Randomly selected lung and colon cancer histopathological images from the LC25000 dataset [55].
Figure 7: Training vs. validation loss diagrams to analyze overfitting in our proposed model across different numbers of epochs.
Full article
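The model is described as a 1D CNN with squeeze-and-excitation (SE) layers. The sketch below shows one plausible building block of that kind; channel counts, the reduction ratio, the five-class head, and feeding flattened image rows as a 1D signal are all assumptions, not the published design.

```python
# Illustrative 1D-conv + squeeze-and-excitation block (assumed sizes, not the paper's model).
import torch
import torch.nn as nn

class SE1d(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)                 # squeeze: global context per channel
        self.fc = nn.Sequential(                            # excitation: channel re-weighting
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                                   # x: (N, C, L)
        w = self.fc(self.pool(x).squeeze(-1)).unsqueeze(-1)
        return x * w

model = nn.Sequential(
    nn.Conv1d(3, 32, kernel_size=3, padding=1), nn.BatchNorm1d(32), nn.ReLU(),
    SE1d(32),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 5))                                       # e.g. the five LC25000 classes

logits = model(torch.randn(4, 3, 1024))                     # image rows treated as a 1D signal (assumption)
print(logits.shape)
```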
18 pages, 12381 KiB  
Article
AQSA—Algorithm for Automatic Quantification of Spheres Derived from Cancer Cells in Microfluidic Devices
by Ana Belén Peñaherrera-Pazmiño, Ramiro Fernando Isa-Jara, Elsa Hincapié-Arias, Silvia Gómez, Denise Belgorosky, Eduardo Imanol Agüero, Matías Tellado, Ana María Eiján, Betiana Lerner and Maximiliano Pérez
J. Imaging 2024, 10(11), 295; https://doi.org/10.3390/jimaging10110295 - 20 Nov 2024
Viewed by 167
Abstract
Sphere formation assay is an accepted cancer stem cell (CSC) enrichment method. CSCs play a crucial role in chemoresistance and cancer recurrence. Therefore, CSC growth is studied in plates and microdevices to develop prediction chemotherapy assays in cancer. As counting spheres cultured in devices is laborious, time-consuming, and operator-dependent, a computational program called the Automatic Quantification of Spheres Algorithm (AQSA) that detects, identifies, counts, and measures spheres automatically was developed. The algorithm and manual counts were compared, and there was no statistically significant difference (p = 0.167). The performance of the AQSA is better when the input image has a uniform background, whereas, with a nonuniform background, artifacts can be interpreted as spheres according to image characteristics. The areas of spheres derived from LN229 cells and CSCs from primary cultures were measured. For images with one sphere, area measurements obtained with the AQSA and SpheroidJ were compared, and there was no statistically significant difference between them (p = 0.173). Notably, the AQSA detects more than one sphere, compared to other approaches available in the literature, and computes the sphere area automatically, which enables the observation of treatment response in the sphere derived from the human glioblastoma LN229 cell line. In addition, the algorithm identifies spheres with numbers to identify each one over time. The AQSA analyzes many images in 0.3 s per image with a low computational cost, enabling laboratories from developing countries to perform sphere counts and area measurements without needing a powerful computer. Consequently, it can be a useful tool for automated CSC quantification from cancer cell lines, and it can be adjusted to quantify CSCs from primary culture cells. CSC-derived sphere detection is highly relevant as it avoids expensive treatments and unnecessary toxicity. Full article
(This article belongs to the Special Issue Advancements in Imaging Techniques for Detection of Cancer)
Figures:

Graphical abstract
Figure 1: Microfluidic device architecture. (a) The microfluidic device design comprises 6 channels. Each channel has an inlet, 5 chambers, and an outlet. (b) Microfluidic device top view with scale in cm. (c) Microfluidic device side view.
Figure 2: Flow diagram of the image analysis process. Schematic representation of AQSA performance in uniform (12-multiwell plate) and nonuniform (microfluidic chip) environments.
Figure 3: CLAHE and Gaussian filtering effect. Image quality improvement applying CLAHE and a Gaussian filter.
Figure 4: Gabor filter set application. Parameters of the Gabor filter set: f = [1/4, 1/6, 1/8], size = 21, 6 rotations: θ = [0°, 30°, 60°, 90°, 120°, 150°], γ = 10, η = 0.5.
Figure 5: Relation between average area and cell number. Representative 40× images of the LN229 cell line. White arrow indicates cell position. Scale bar: 100 μm. The solid line represents the linear regression line, while the dashed lines are the boundaries of all possible straight lines.
Figure 6: Sphere detection, identification, and quantification in a 12-well plate. (a) Original 10× image. (b) Detected spheres are circled in green and numbered in red. U251 human glioblastoma cell line. Scale bar: 200 μm. (c) Sphere number comparison between the AQSA algorithm and a manual count performed with a hundred 10× U251 human glioblastoma cell line images. Student's t-test (p = 0.167); ns means no statistically significant difference (p > 0.05).
Figure 7: Area quantification. (a) Area measurement comparison among ImageJ, manual, and AQSA for images with more than one sphere. (b) Area measurement comparison between AQSA and SpheroidJ for images with one sphere (Student's t-test (p = 0.173); ns means no statistically significant difference (p > 0.05)).
Figure 8: Treatment response. The upper panel shows an image of an LN229 cell-line-derived sphere. (a) Original 40× image and (b) AQSA-analyzed image. The lower panel indicates treatment response according to (c) sphere formation efficiency (SFE), (d) sphere area, and (e) sphere number. Scale bar corresponds to 50 μm (* p ≤ 0.05 and ** p ≤ 0.01).
Figure 9: High-grade compact solid canine thyroid carcinoma. (a) 4× image of a sphere tracked in time inside a microfluidic device chamber at day 4. Scale: 100 μm. (b) 4× image of a sphere tracked in time inside a microfluidic device chamber at day 5. (c) 40× image of sphere tracking at day 4. (d) 40× image of sphere tracking at day 5. Scale bar: 100 μm. (e) Original 10× image of a sphere derived from thyroid cancer cells. (f) Detected spheres reported by AQSA. Scale bar: 100 μm. (g) Sphere manual count compared to AQSA count at days 3–5. (h) Area measurement by AQSA at different cell concentrations at days 3–5. White arrows indicate the tracking of the same sphere over time; ns means no statistically significant difference (p > 0.05).
Figure 10: Pleomorphic neoplastic lesion with carcinomatous pattern in the canine nasal cavity. (a) 4× image of a sphere tracked in time inside a microfluidic device chamber on day 10. Scale: 100 µm. (b) 40× image of sphere tracking on day 10. (c) 40× image of sphere tracking on day 12. Scale bar: 50 µm. (d) Original 10× image of a sphere derived from nasal tumor cells. (e) Detected spheres reported by AQSA. Scale bar: 100 µm. (f) Sphere manual count compared to AQSA count on day 5 and day 10. (g) Area measurement by AQSA. White arrows indicate the tracking of the same sphere over time, ns means no statistically significant difference (p > 0.05), and green circles mark the counted spheres.
Figure 11: Comparison of AQSA sphere detection in two microscopes. The upper panel shows images acquired with a phase-contrast Nikon microscope, and the lower panel presents images obtained with an inverted Zeiss microscope. Upper panel: the 4× 12-well plate image scale bar corresponds to 500 µm, while the 10× microdevice image scale bar corresponds to 200 µm. Lower panel: the 40× 12-well plate image scale bar corresponds to 50 µm, while the 10× microdevice image scale bar corresponds to 100 µm.
Full article
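Figures 3 and 4 describe the pre-processing chain: CLAHE, Gaussian smoothing, and a Gabor filter bank. The OpenCV sketch below illustrates such a chain; it is not the AQSA source code, the input is a synthetic stand-in image, and the σ and aspect-ratio values are assumptions because the paper's γ/η parametrization does not map one-to-one onto OpenCV's getGaborKernel arguments.

```python
# Pre-processing sketch (assumed parameters, synthetic input; not the AQSA code):
# CLAHE, Gaussian blur, then a bank of Gabor filters at 3 frequencies x 6 rotations.
import cv2
import numpy as np

gray = np.random.randint(0, 256, (256, 256), dtype=np.uint8)   # stand-in for a 10x microscopy image

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)
smoothed = cv2.GaussianBlur(enhanced, (5, 5), 0)

responses = []
for f in (1 / 4, 1 / 6, 1 / 8):                                 # spatial frequencies from the caption
    for theta in np.deg2rad([0, 30, 60, 90, 120, 150]):         # six rotations from the caption
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=float(theta),
                                    lambd=1.0 / f, gamma=0.5)   # sigma/gamma are assumptions
        responses.append(cv2.filter2D(smoothed, cv2.CV_32F, kernel))

gabor_energy = np.max(np.stack(responses), axis=0)              # strongest response per pixel
print(gabor_energy.shape)
```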
15 pages, 2189 KiB  
Article
Entropy-Based Ensemble of Convolutional Neural Networks for Clothes Texture Pattern Recognition
by Reham Al-Majed and Muhammad Hussain
Appl. Sci. 2024, 14(22), 10730; https://doi.org/10.3390/app142210730 - 20 Nov 2024
Viewed by 190
Abstract
Automatic clothes pattern recognition is important to assist visually impaired people and for real-world applications such as e-commerce or personal fashion recommendation systems, and it has attracted increased interest from researchers. It is a challenging texture classification problem in that even images of the same texture class expose a high degree of intraclass variations. Moreover, images of clothes patterns may be taken in an unconstrained illumination environment. Machine learning methods proposed for this problem mostly rely on handcrafted features and traditional classification methods. The research works that utilize the deep learning approach result in poor recognition performance. We propose a deep learning method based on an ensemble of convolutional neural networks where feature engineering is not required while extracting robust local and global features of clothes patterns. The ensemble classifier employs a pre-trained ResNet50 with a non-local (NL) block, a squeeze-and-excitation (SE) block, and a coordinate attention (CA) block as base learners. To fuse the individual decisions of the base learners, we introduce a simple and effective fusing technique based on entropy voting, which incorporates the uncertainties in the decisions of base learners. We validate the proposed method on benchmark datasets for clothes patterns that have six categories: solid, striped, checkered, dotted, zigzag, and floral. The proposed method achieves promising results for limited computational and data resources. In terms of accuracy, it achieves 98.18% for the GoogleClothingDataset and 96.03% for the CCYN dataset. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)
Figures:

Figure 1: Examples of six classes of clothes patterns: checkered, floral, dotted, solid, striped, and zigzag.
Figure 2: High-level depiction of the architecture of the proposed ensemble classifier.
Figure 3: Detail of the ResNet50 architecture.
Figure 4: Architecture of the bottleneck residual block.
Figure 5: ResNet50 with SE blocks. ResG_i is the i-th group of ResNet blocks.
Figure 6: ResNet50 with CA block. ResG_i is the i-th group of ResNet blocks.
Figure 7: ResNet50 with NL block. ResG_i is the i-th group of ResNet blocks.
Figure 8: The performance of two ensemble classifiers. (a) The performance in terms of accuracy of ensemble learner 1 and the base learners. (b) The performance in terms of accuracy of ensemble learner 2 and the base learners.
Figure 9: Venn diagram of base learners' errors. (a) Error analysis of base learners of ensemble classifier 1. (b) Error analysis of base learners of ensemble classifier 2.
Figure 10: Confusion matrix showing the decision making of the ensemble classifier.
Figure 11: Performance of the base learners for each class.
Full article
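The fusion rule weights base-learner decisions by their uncertainty. The sketch below shows one way to implement entropy-weighted voting; it is an illustration of the general idea, not necessarily the paper's exact rule, and the probability vectors are made-up examples.

```python
# Entropy-weighted fusion sketch (hypothetical probabilities; not the paper's exact formula).
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector (or batch of vectors)."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def entropy_vote(prob_list):
    """Weight each base learner inversely to the entropy of its softmax output."""
    probs = np.stack(prob_list)                  # (n_learners, n_classes)
    weights = 1.0 / (1.0 + entropy(probs))       # confident (low-entropy) learners weigh more
    weights /= weights.sum()
    fused = (weights[:, None] * probs).sum(axis=0)
    return int(fused.argmax()), fused

p_se = np.array([0.70, 0.10, 0.05, 0.05, 0.05, 0.05])   # SE-based learner (6 pattern classes)
p_ca = np.array([0.30, 0.40, 0.10, 0.10, 0.05, 0.05])   # CA-based learner
p_nl = np.array([0.60, 0.20, 0.05, 0.05, 0.05, 0.05])   # NL-based learner
label, fused = entropy_vote([p_se, p_ca, p_nl])
print("fused class:", label, "probabilities:", np.round(fused, 3))
```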
15 pages, 6086 KiB  
Article
Improved Visual SLAM Algorithm Based on Dynamic Scenes
by Jinxing Niu, Ziqi Chen, Tao Zhang and Shiyu Zheng
Appl. Sci. 2024, 14(22), 10727; https://doi.org/10.3390/app142210727 - 20 Nov 2024
Viewed by 159
Abstract
This work presents a novel RGB-D dynamic simultaneous localization and mapping (SLAM) method that improves accuracy, stability, and efficiency of localization while relying on deep learning in a dynamic environment, in contrast to traditional static scene-based visual SLAM methods. Based on the classic framework of traditional visual SLAM, we propose a method that replaces the traditional feature extraction method with a convolutional neural network approach, aiming to enhance the accuracy of feature extraction and localization, as well as to improve the algorithm’s ability to capture and represent the characteristics of the entire scene. Subsequently, the semantic segmentation thread was utilized in a target detection network combined with geometric methods to identify potential dynamic areas in the image and generate masks for dynamic objects. Finally, the standard deviation of the depth information of potential dynamic points was calculated to identify true dynamic feature points, to guarantee that static feature points were used for position estimation. We performed experiments based on the public datasets to validate the feasibility of the proposed algorithm. The experimental results indicate that the improved SLAM algorithm, which boasts a reduction in absolute trajectory error (ATE) by approximately 97% compared to traditional static visual SLAM and about 20% compared to traditional dynamic visual SLAM, also exhibited a 68% decrease in computation time compared to well-known dynamic visual SLAM, thereby possessing absolute advantages in both positioning accuracy and operational efficiency. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
Figures:

Figure 1: Overview of the enhanced SLAM system. The framework of the algorithm comprises four threads: semantic segmentation, tracking, local mapping, and loop closing.
Figure 2: GCNv2 feature extraction network structure with channel numbers listed below each convolutional layer.
Figure 3: YOLOv5's network architecture diagram.
Figure 4: (a,b) and (c,d) are the semantic segmentation results based on the modified SLAM. Red indicates the detection boxes from YOLOv5x for object detection, while green represents the extracted feature points.
Figure 5: Comparing feature point distribution between ORB and GCNv2: the scenes in (a,b) are cluttered with various objects, including computer screens, which made it difficult to obtain features. The images in (c,d) were taken from the corner of a table where the camera was moving, resulting in significant changes in viewpoint.
Figure 6: Comparing the ATE of the improved SLAM and ORB-SLAM2 across five dynamic scene sequences from the fr3 dataset: (a–e) represent the trajectory maps of ORB-SLAM2, while (f–j) represent the trajectory maps of the improved SLAM.
Figure 7: Results for the fr3_walking_xyz sequence. Panels (a,b) illustrate the estimated trajectories compared to the ground truth, as well as the errors along the x, y, and z axes for ORB-SLAM2 and the improved SLAM. Panel (c) displays the time consumption for each method.
Full article
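The abstract describes confirming true dynamic feature points by the standard deviation of depth inside detected object regions. The sketch below illustrates that filtering step in isolation; the threshold, window size, box format, and synthetic depth map are assumptions, not the authors' parameters.

```python
# Dynamic-point filtering sketch (assumed threshold and data; not the authors' code):
# feature points inside a detected box whose local depth varies strongly are
# flagged as dynamic and excluded from pose estimation.
import numpy as np

def dynamic_mask(depth, keypoints, boxes, std_thresh=0.15, win=5):
    """Return a boolean array: True = likely dynamic point."""
    flags = np.zeros(len(keypoints), dtype=bool)
    h, w = depth.shape
    for i, (u, v) in enumerate(keypoints):                  # u = column, v = row
        if not any(x0 <= u <= x1 and y0 <= v <= y1 for x0, y0, x1, y1 in boxes):
            continue                                        # outside potential dynamic regions
        u0, u1 = max(0, u - win), min(w, u + win + 1)
        v0, v1 = max(0, v - win), min(h, v + win + 1)
        patch = depth[v0:v1, u0:u1]
        patch = patch[patch > 0]                            # ignore invalid depth readings
        if patch.size and patch.std() > std_thresh:
            flags[i] = True
    return flags

depth = np.random.uniform(0.5, 4.0, size=(480, 640)).astype(np.float32)
keypoints = [(100, 120), (320, 240), (600, 400)]
person_boxes = [(300, 200, 400, 300)]                       # e.g. from a YOLOv5-style detector
print(dynamic_mask(depth, keypoints, person_boxes))
```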
15 pages, 5207 KiB  
Article
Threshold Ranges of Multiphase Components from Natural Ice CT Images Based on Watershed Algorithm
by Shengbo Hu, Qingkai Wang, Chunjiang Li and Zhijun Li
Water 2024, 16(22), 3330; https://doi.org/10.3390/w16223330 - 19 Nov 2024
Viewed by 251
Abstract
The multiphase components of natural ice contain gas, ice, unfrozen water, sediment and brine. X-ray computed tomography (CT) analysis of ice multiphase components has the advantage of high precision, non-destructiveness and visualization; however, it is limited by the segmentation thresholds. Due to the proximity of the CT value ranges of gas, ice, unfrozen water, sediment and brine within the samples, there is uncertainty in the artificial determination of the CT image segmentation thresholds, as well as unsuitability of the global threshold segmentation methods. In order to improve the accuracy of multi-threshold segmentation in CT images, a CT system was used to scan the Yellow River ice, the Wuliangsuhai lake ice and the Arctic sea ice. The threshold ranges of multiphase components within the ice were determined by watershed algorithm to construct a high-precision three-dimensional ice model. The results indicated that CT combined with watershed algorithm was an efficient and non-destructive method for obtaining microscopic information within ice, which accurately segmented the ice into multiphase components such as gas, ice, unfrozen water, sediment, and brine. The gas CT values of the Yellow River ice, the Wuliangsuhai lake ice and the Arctic sea ice ranged from −1024 Hu~−107 Hu, −1024 Hu~−103 Hu, and −1024 Hu~−160 Hu, respectively. The ice CT values of the Yellow River ice, the Wuliangsuhai lake ice and the Arctic sea ice ranged from −103 Hu~−50 Hu, −100 Hu~−38 Hu, −153 Hu~−51 Hu. The unfrozen water CT values of the Yellow River ice and the Wuliangsuhai lake ice ranged from −8 Hu~18 Hu, −8 Hu~13 Hu. The sediment CT values of the Yellow River ice and the Wuliangsuhai lake ice ranged from 20 Hu~3071 Hu, 20 Hu~3071 Hu, and the brine CT values of the Arctic sea ice ranged from −6 Hu~3071 Hu. The errors between the three-dimensional ice model divided by threshold ranges and measured sediment content were less than 0.003 g/cm3, which verified the high accuracy of the established microscopic model. It provided a scientific basis for ice engineering, ice remote sensing, and ice disaster prevention. Full article
(This article belongs to the Special Issue Ice and Snow Properties and Their Applications)
Figures:

Figure 1: Flow chart of the experimental processing.
Figure 2: CT original image and research area frame.
Figure 3: Histogram of ice sample CT values. There are no peaks and valleys in the CT value histograms, which proves that the CT value ranges of gas, unfrozen water, ice, and sediment within the samples are similar without significant intervals.
Figure 4: Schematic diagram of the watershed algorithm model.
Figure 5: Schematic of sample collection and two-dimensional image threshold segmentation. (a) The Yellow River ice sample. Original two-dimensional CT images of (b) the top layer, (c) 25 cm from the bottom layer, (d) the bottom layer. Histograms of CT values for (e) the top layer, (f) 25 cm from the bottom layer, (g) the bottom layer. Two-dimensional image multi-threshold segmentation results of (h) the top layer, (i) 25 cm from the bottom layer, (j) the bottom layer.
Figure 6: Three-dimensional reconstructed images of the Yellow River ice samples. Global three-dimensional images of (a) the Yellow River No. 3 ice sample and (c) the Yellow River No. 4 ice sample. Local three-dimensional images of (b) the Yellow River No. 3 ice sample and (d) the Yellow River No. 4 ice sample.
Figure 7: Study area and local three-dimensional reconstructed images of the Wuliangsuhai lake ice. (a) The Wuliangsuhai No. 1 and No. 2 ice sample collection areas. Global three-dimensional images of (b) the Wuliangsuhai No. 1 ice sample and (d) the Wuliangsuhai No. 2 ice sample. Local three-dimensional images of (c) the Wuliangsuhai No. 1 ice sample and (e) the Wuliangsuhai No. 2 ice sample.
Figure 8: Global three-dimensional image of (a) the Arctic No. 1 ice sample. Local three-dimensional image of (b) the Arctic No. 1 ice sample.
Figure 9: Distribution of sediment content in the Yellow River ice along the depth direction.
Full article
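Because the CT value ranges of the phases overlap, the study uses a watershed algorithm rather than a single global threshold. The scikit-image sketch below shows the general marker-controlled watershed idea on a synthetic slice; the seed HU ranges are only loosely inspired by the values quoted in the abstract and are assumptions, as is the random stand-in data.

```python
# Marker-controlled watershed sketch (synthetic data, assumed seed ranges;
# not the study's workflow): seed confident HU ranges, then flood the gradient
# image so every voxel joins the nearest catchment basin.
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

ct_slice = np.random.normal(-60.0, 80.0, size=(256, 256))   # stand-in CT slice in Hu

markers = np.zeros(ct_slice.shape, dtype=np.int32)           # 1 = gas, 2 = ice, 3 = sediment
markers[ct_slice < -200] = 1
markers[(ct_slice > -90) & (ct_slice < -60)] = 2
markers[ct_slice > 100] = 3

labels = watershed(sobel(ct_slice), markers)                  # unlabeled pixels get assigned by flooding
for phase, name in {1: "gas", 2: "ice", 3: "sediment"}.items():
    print(name, f"{(labels == phase).mean():.1%}")
```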
23 pages, 32729 KiB  
Article
PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
by Husnain Mushtaq, Xiaoheng Deng, Fizza Azhar, Mubashir Ali and Hafiz Husnain Raza Sherazi
Information 2024, 15(11), 739; https://doi.org/10.3390/info15110739 - 19 Nov 2024
Viewed by 150
Abstract
Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR models often struggle with sparse point clouds. We propose perspective-aware hierarchical vision transformer-based LiDAR-camera fusion (PLC-Fusion) for 3D object detection to address this. This efficient, multi-modal 3D object detection framework integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting them onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. It incorporates a lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, to extract image features and generate object perspective proposals by predicting and refining top-scored 3D candidates. Second, it leverages two independent transformers—CamViT for 2D image features and LidViT for 3D point cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving performance and computational efficiency. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from both LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% and 90.37% for 3D and BEV detection, respectively. Moreover, PLC-Fusion maintains a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision. Full article
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
Show Figures

Figure 1

Figure 1. The architecture of our PLC-Fusion model for 3D object detection using LiDAR and camera data. The raw point cloud from LiDAR and the raw image data are processed by separate 3D and 2D backbones, respectively. Perspective-based sampling is applied to both modalities before passing through a vision transformer (ViT)-based model (LiDViT for LiDAR data and CamViT for image data) to establish 2D and 3D correspondence. The Cross-Fusion module integrates these features, followed by region of interest (RoI)-based 3D detection for generating 3D bounding box predictions.
Figure 2. Graphical depiction of the object perspective sampling process for LiDAR and camera data within the multimodal fusion model.
Figure 3. Illustration of our object perspective sampling and projection process for LiDAR and camera data within the multimodal fusion model. The sampled points from LiDAR and camera images are projected into their respective 3D and 2D coordinate systems. Sparse feature extraction is applied to both modalities before being passed into the vision transformer (ViT)-based encoders (LiDViT for LiDAR features and CamViT for image features). These extracted features are then fused in the Cross-Fusion module to establish a 2D–3D correspondence for improved multimodal 3D object detection.
Figure 4. The vision transformer (ViT)-based cross-fusion approach for 3D object detection, combining camera and LiDAR data. Object perspective sampling extracts features from both sensors. The camera branch (CamViT) generates 3D and 2D feature maps $\mathbf{H}_c \in \mathbb{R}^{N_c \times D_c}$ using multi-head attention (MH-Attention) and a feedforward neural network (FFN), while the LiDAR branch (LiDViT) processes 3D voxel features $\mathbf{H}_v \in \mathbb{R}^{V \times D_v}$ through a similar transformer architecture. The 2D and 3D feature maps from both modalities are concatenated as $\mathbf{F}_{\mathrm{fusion}} = [\mathbf{A}_{cv}; \mathbf{H}_c; \mathbf{A}_{vc}; \mathbf{H}_v]$ and undergo cross-attention to align visual and geometric data. A final FFN refines the fused representation, $\mathbf{F}_{\mathrm{final}} = \mathrm{MLP}(\mathbf{F}_{\mathrm{fusion}})$, providing deep multimodal features for accurate object detection in 3D space.
Figure 5. Visual results of the proposed method on the KITTI validation dataset. For each of sub-figures (a–d), the top row shows the visualization in the RGB image, and the bottom row displays the visualization in the LiDAR point cloud. Green represents the ground truth, and blue denotes the predicted outcomes.
Figure 6. Visual results of the proposed method on the KITTI test and validation datasets. Row (a) presents the testing results, and row (b) displays the validation outcomes. The detection results demonstrate the effectiveness of our method, with the dotted circles highlighting undetected instances caused by distance and heavy occlusion.
Figure 7. Car class under the Moderate difficulty setting: AP vs. IoU on the KITTI validation set.
Figure 8. Comparative analysis of the runtime of our model against recent methods.
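To make the fusion step in the Figure 4 caption concrete, the sketch below shows the concatenation-plus-cross-attention pattern in PyTorch. It is a minimal sketch under assumed token counts and a shared feature width; the class and variable names are illustrative and this is not the authors' PLC-Fusion implementation (token pooling before concatenation, in particular, is a simplification added here).

```python
# Minimal sketch of concatenation + bidirectional cross-attention fusion
# (illustrative assumption of the pattern in the Figure 4 caption above).
import torch
import torch.nn as nn


class CrossFusionSketch(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Camera queries attend to LiDAR tokens (A_cv) and vice versa (A_vc).
        self.cam_to_lidar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lidar_to_cam = nn.MultiheadAttention(dim, heads, batch_first=True)
        # FFN refining the concatenated representation, in the role of
        # F_final = MLP(F_fusion).
        self.ffn = nn.Sequential(nn.Linear(4 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h_c, h_v):
        # h_c: (B, N_c, D) camera tokens; h_v: (B, V, D) LiDAR voxel tokens.
        a_cv, _ = self.cam_to_lidar(h_c, h_v, h_v)  # camera enriched with geometry
        a_vc, _ = self.lidar_to_cam(h_v, h_c, h_c)  # LiDAR enriched with appearance
        # Pool over tokens so the four parts concatenate even when N_c != V.
        parts = [a_cv.mean(dim=1), h_c.mean(dim=1), a_vc.mean(dim=1), h_v.mean(dim=1)]
        f_fusion = torch.cat(parts, dim=-1)         # [A_cv; H_c; A_vc; H_v]
        return self.ffn(f_fusion)                   # fused multimodal feature


if __name__ == "__main__":
    fusion = CrossFusionSketch()
    cam_tokens = torch.randn(2, 100, 256)
    lidar_tokens = torch.randn(2, 400, 256)
    print(fusion(cam_tokens, lidar_tokens).shape)   # torch.Size([2, 256])
```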
10 pages, 5073 KiB  
Review
Radiological Approach to Assessment of Lower-Limb Alignment—Coronal and Transverse Plane Analysis
by Anna Michalska-Foryszewska, Piotr Modzelewski, Katarzyna Sklinda, Bartosz Mruk and Jerzy Walecki
J. Clin. Med. 2024, 13(22), 6975; https://doi.org/10.3390/jcm13226975 - 19 Nov 2024
Viewed by 230
Abstract
Lower-limb alignment deformities constitute a significant clinical concern, as they can lead to serious complications, including progressive degenerative diseases and disabilities. Rotational deformities may give rise to conditions such as joint arthrosis, patellar instability, and the degeneration of the patellofemoral cartilage. Therefore, a comprehensive evaluation of lower-limb alignment is essential for effective patient management, preoperative planning, and successful correction of these deformities. The primary assessment method employs full-length standing radiographs in the anteroposterior (AP) projection, which facilitate accurate measurements of the anatomical and mechanical axes of the lower limb, including angles and deviations. The outcomes of this analysis are vital for the meticulous planning of osteotomy and total knee arthroplasty (TKA). In addition, computed tomography (CT) provides a specialized approach for the precise evaluation of femoral and tibial rotation. This area also offers potential for the implementation of AI-based automated measurement systems. Full article
(This article belongs to the Section Nuclear Medicine & Radiology)
Figure 1. Evaluation of the mechanical and anatomical axes of the lower limb (based on Luís et al. [8]).
Figure 2. Evaluation of lower-limb angles (based on Luís et al. [8]).
Figure 3. Rotation deformities: femoral torsion angle and tibial torsion angle (based on Luís et al. [8]).
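As a concrete illustration of the kind of coronal-plane measurement such automated systems could compute from a full-length AP radiograph, the short sketch below derives an unsigned hip-knee-ankle style angle from three landmark coordinates. The landmark names, coordinates, and angle convention are assumptions made for illustration and are not taken from the article.

```python
# Minimal sketch of an automated coronal-alignment measurement from landmark
# coordinates on a full-length standing AP radiograph (illustrative only).
import numpy as np


def hka_deviation(hip_center, knee_center, ankle_center):
    """Unsigned angle (degrees) between femoral and tibial mechanical axes.

    0 degrees means the two mechanical axes are collinear; the value reports
    only the magnitude of deviation, without a varus/valgus sign.
    """
    femoral_axis = np.asarray(knee_center, float) - np.asarray(hip_center, float)
    tibial_axis = np.asarray(ankle_center, float) - np.asarray(knee_center, float)
    cos_a = np.dot(femoral_axis, tibial_axis) / (
        np.linalg.norm(femoral_axis) * np.linalg.norm(tibial_axis)
    )
    return float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))


# Example with made-up pixel coordinates (x, y) from an AP radiograph.
print(round(hka_deviation((512, 100), (530, 900), (560, 1700)), 1))
```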
20 pages, 687 KiB  
Review
Deep Learning-Based Atmospheric Visibility Detection
by Yawei Qu, Yuxin Fang, Shengxuan Ji, Cheng Yuan, Hao Wu, Shengbo Zhu, Haoran Qin and Fan Que
Atmosphere 2024, 15(11), 1394; https://doi.org/10.3390/atmos15111394 - 19 Nov 2024
Viewed by 199
Abstract
Atmospheric visibility is a crucial meteorological element impacting urban air pollution monitoring, public transportation, and military security. Traditional visibility detection methods, primarily manual and instrumental, have been costly and imprecise. With advancements in data science and computing, deep learning-based visibility detection technologies have rapidly emerged as a research hotspot in atmospheric science. This paper systematically reviews the applications of various deep learning models—Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and Transformer networks—in visibility estimation, prediction, and enhancement. Each model’s characteristics and application methods are discussed, highlighting the efficiency of CNNs in spatial feature extraction, RNNs in temporal tracking, GANs in image restoration, and Transformers in capturing long-range dependencies. Furthermore, the paper addresses critical challenges in the field, including dataset quality, algorithm optimization, and practical application barriers, proposing future research directions, such as the development of large-scale, accurately labeled datasets, innovative learning strategies, and enhanced model interpretability. These findings highlight the potential of deep learning in enhancing atmospheric visibility detection techniques, providing valuable insights into the literature and contributing to advances in the field of meteorological observation and public safety. Full article
(This article belongs to the Special Issue Air Pollution Modeling and Observations in Asian Megacities)
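To make the CNN-based branch of the review's taxonomy concrete, the following minimal sketch shows a small convolutional regressor that maps a single camera frame to a scalar visibility estimate. The architecture, input size, and output unit are illustrative assumptions, not a model proposed in the review.

```python
# Minimal sketch of CNN-based visibility estimation from a camera frame
# (illustrative architecture, not from the reviewed literature).
import torch
import torch.nn as nn


class VisibilityCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Small convolutional backbone for spatial feature extraction.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Regress a single visibility value (e.g., in km).
        self.regressor = nn.Sequential(nn.Flatten(), nn.Linear(64, 1))

    def forward(self, x):
        # x: (B, 3, H, W) RGB frame -> (B, 1) visibility estimate.
        return self.regressor(self.features(x))


model = VisibilityCNN()
frame = torch.randn(4, 3, 224, 224)
print(model(frame).shape)  # torch.Size([4, 1])
```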
20 pages, 458 KiB  
Article
Neural Architecture Search via Trainless Pruning Algorithm: A Bayesian Evaluation of a Network with Multiple Indicators
by Yiqi Lin, Yuki Endo, Jinho Lee and Shunsuke Kamijo
Electronics 2024, 13(22), 4547; https://doi.org/10.3390/electronics13224547 - 19 Nov 2024
Viewed by 243
Abstract
Neural Architecture Search (NAS) has found applications in various areas of computer vision, including image recognition and object detection. An increasing number of algorithms, such as ENAS (Efficient Neural Architecture Search via Parameter Sharing) and DARTS (Differentiable Architecture Search), have been applied to NAS. Nevertheless, current training-free NAS methods remain unreliable and inefficient. This paper introduces a training-free prune-based algorithm called TTNAS (True-Skill Training-Free Neural Architecture Search), which utilizes a Bayesian method (the true-skill algorithm) to combine multiple indicators for evaluating neural networks across different datasets. The algorithm demonstrates highly competitive accuracy and efficiency compared to state-of-the-art approaches on various datasets. Specifically, it achieves 93.90% accuracy on CIFAR-10, 71.91% accuracy on CIFAR-100, and 44.96% accuracy on ImageNet 16-120, using 1466 GPU seconds in NAS-Bench-201. Additionally, the algorithm exhibits improved adaptation to other datasets and tasks. Full article
(This article belongs to the Special Issue Computational Imaging and Its Application)
Figure 1. Pipeline: the left part is the algorithm used to evaluate the child networks' performance; the right part is the procedure of the main algorithm. $N$ represents a Gaussian distribution, $s$ signifies skill, $p$ denotes actual performance, $d$ is the difference, i.e., the performance gap between the two parties in a competition, and $r$ stands for the competition result. Each iteration evaluates the performance of the current network without a specific operation $o_j$, followed by pruning the least-significant operator on each edge. This process continues until we obtain a single-path network that represents the optimal neural network configuration.
Figure 2. The distribution of the preferences of $\mathcal{K}_{\mathcal{N}}$ and $R_{\mathcal{N}}$ (conducted by TE-NAS [3]). $\mathcal{K}_{\mathcal{N}}$ and $R_{\mathcal{N}}$ favor different operations on NAS-Bench-201.
Figure 3. Training-phase elaboration: the algorithm starts with the super-network $\mathcal{N}_0$, which is then optimized under the vector sum of $\lambda_{\mathcal{K}}\,\overrightarrow{\frac{\partial \mathcal{F}_{o_j}}{\partial \mathcal{K}_{t,o_j}}}$ and $\lambda_{R}\,\overrightarrow{\frac{\partial \mathcal{F}_{o_j}}{\partial \hat{R}_{t,o_j}}}$. $\lambda_{\mathcal{K}}$ and $\lambda_{R}$ are obtained from Algorithm 1. Iteratively, the stopping point $\mathcal{N}_t$ approaches the best-performance point.
Figure 4. $N$ represents a Gaussian distribution, $s$ signifies skill (the ability value of each player), $p$ denotes actual performance, $d$ is the difference, i.e., the performance gap between the two parties in a competition, and $r$ stands for the result of the competition.
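The abstract and Figure 4 describe a true-skill style Bayesian rating of candidate operations. The sketch below implements the standard TrueSkill win/loss update for two Gaussian-rated players; treating each training-free indicator comparison as one such match is an assumption made for illustration and may differ from the exact TTNAS procedure. Repeating the update over many such comparisons yields per-operation skill means that could drive the pruning of the least promising operator on each edge.

```python
# Standard TrueSkill win/loss update for two Gaussian-rated players, used
# here to illustrate (not reproduce) the Bayesian indicator combination.
import math


def _v_and_w(t):
    # v(t) = N(t) / Phi(t), w(t) = v(t) * (v(t) + t) for a win with no draws.
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2)))
    v = pdf / max(cdf, 1e-12)
    return v, v * (v + t)


def trueskill_update(winner, loser, beta=25.0 / 6):
    """winner/loser are (mu, sigma) tuples; returns the updated tuples."""
    mu_w, s_w = winner
    mu_l, s_l = loser
    c = math.sqrt(2 * beta**2 + s_w**2 + s_l**2)
    v, w = _v_and_w((mu_w - mu_l) / c)
    new_w = (mu_w + s_w**2 / c * v, s_w * math.sqrt(max(1 - s_w**2 / c**2 * w, 1e-12)))
    new_l = (mu_l - s_l**2 / c * v, s_l * math.sqrt(max(1 - s_l**2 / c**2 * w, 1e-12)))
    return new_w, new_l


# Two candidate operations start from the same prior; op_a "wins" on one indicator.
op_a, op_b = (25.0, 25.0 / 3), (25.0, 25.0 / 3)
op_a, op_b = trueskill_update(op_a, op_b)
print(op_a, op_b)  # op_a's mean rises, op_b's falls; both sigmas shrink
```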