Search Results (208)

Search Parameters:
Keywords = weakly supervised learning

18 pages, 5312 KiB  
Article
Application of Anomaly Detection to Identifying Aggressive Pig Behaviors Using Reconstruction Loss Inversion
by Hyun-Soo Kim, Yu Sung Edward Kim, Fania Ardelia Devira and Mun Yong Yi
AgriEngineering 2024, 6(4), 4442-4459; https://doi.org/10.3390/agriengineering6040252 (registering DOI) - 25 Nov 2024
Viewed by 42
Abstract
Growing animal welfare concerns in the commercial pig industry include aggression between pigs, as it affects their health and growth. Early detection of aggressive behaviors is essential for optimizing their living environment. A major challenge for detection is that these behaviors are observed only occasionally under normal conditions. Consequently, the limited amount of aggressive behavior data leads to a class imbalance problem, making it difficult to develop an effective classification model for detecting aggressive behaviors. To address this issue, this study develops an anomaly detection model for identifying aggressive behaviors in pigs, enabling better handling of the imbalanced class distribution and effective detection of infrequent aggressive episodes. The model consists of a convolutional neural network (CNN) and a variational long short-term memory (LSTM) autoencoder. Additionally, we adopted a training method similar to weakly supervised anomaly detection and included a few aggressive behavior samples in the training set for prior learning. To utilize the aggressive behavior data effectively, we introduce Reconstruction Loss Inversion, a novel objective function for training the autoencoder-based model, which increases the reconstruction error for aggressive behaviors by inverting the loss function. This approach improves detection accuracy in both AUC-ROC and AUC-PR, demonstrating a significant enhancement in distinguishing aggressive episodes from normal behavior. As a result, it outperforms traditional classification-based methods, effectively identifying aggressive behaviors in a natural pig-farming environment. This method offers a robust solution for detecting aggressive animal behaviors and contributes to improving animal welfare. Full article
(This article belongs to the Section Livestock Farming Technology)
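To make the Reconstruction Loss Inversion idea concrete, below is a minimal PyTorch-style sketch of an objective in that spirit: normal clips keep the usual reconstruction term, while the few labeled aggressive clips have their term inverted (hinged at a margin) so that training pushes their reconstruction error up. The function name, margin, and hinge form are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def rli_loss(recon_error, is_aggressive, margin=1.0):
    """Reconstruction-Loss-Inversion-style objective (illustrative sketch).

    recon_error:   per-sample mean squared reconstruction error, shape (B,)
    is_aggressive: 1.0 for the few labeled aggressive clips, 0.0 for normal clips
    Normal samples minimize reconstruction error as usual; aggressive samples
    have the term inverted so training *raises* their error (hinged at `margin`).
    """
    normal_term = (1 - is_aggressive) * recon_error
    inverted_term = is_aggressive * torch.clamp(margin - recon_error, min=0.0)
    return (normal_term + inverted_term).mean()

# Toy usage: errors for three normal clips and one labeled aggressive clip.
errors = torch.tensor([0.05, 0.03, 0.07, 0.20], requires_grad=True)
labels = torch.tensor([0.0, 0.0, 0.0, 1.0])
loss = rli_loss(errors, labels)
loss.backward()  # gradient lowers normal errors, raises the aggressive one
```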
Figure 1. Frame captures of four types of aggressive behaviors exhibited by pigs: (a) biting, showcasing direct physical contact with teeth visible at the target area; (b) levering, highlighting the use of head or body force to push another pig; (c) threatening, emphasizing tense posture and visual intimidation without physical contact; and (d) retreating, illustrating a pig withdrawing from the aggressor.
Figure 2. Workflow of the autoencoder for feature extraction and reconstruction: encoding (f), latent representation (z), and reconstructed features (f̂).
Figure 3. The workflow of the proposed aggression detection framework. Spatial features are extracted from video frames using a convolutional neural network (CNN) and reconstructed through a variational long short-term memory (LSTM) autoencoder. Red arrows indicate the flow of data between processes, black arrows represent key transformations or connections between stages, and blue curved arrows illustrate the computation and application of the reconstruction loss. Anomalies are detected by calculating the mean squared error between the original and reconstructed features. Reconstruction Loss Inversion (RLI) enhances these error signals to improve detection accuracy.
Figure 4. Distribution of reconstruction errors in unsupervised and Reconstruction Loss Inversion (RLI) settings. (a) Distribution of reconstruction errors when positive labels are not included in the training set; (b,c) distributions when positive labels are included in the training set for RLI.
Figure 5. Distribution of true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs) by threshold. The suffix "_Over" after a model name indicates simple oversampling.
Figure 6. Comparison of precision–recall curves. The suffix "_Over" after a model name indicates simple oversampling.
18 pages, 2733 KiB  
Article
Mastitis Classification in Dairy Cows Using Weakly Supervised Representation Learning
by Soo-Hyun Cho, Mingyung Lee, Wang-Hee Lee, Seongwon Seo and Dae-Hyun Lee
Agriculture 2024, 14(11), 2084; https://doi.org/10.3390/agriculture14112084 - 19 Nov 2024
Viewed by 387
Abstract
Detecting mastitis on time in dairy cows is crucial for maintaining milk production and preventing significant economic losses, and machine learning has recently gained significant attention as a promising solution to this issue. Most studies have detected mastitis in time series data using a supervised learning model, which requires large-scale labeled data; however, annotating the onset of mastitis in milking data from dairy cows is very difficult and costly, while supervised learning relies on accurate labels to ensure performance. Therefore, this study proposes mastitis classification based on weakly supervised representation learning using an autoencoder on time series milking data, which allows concurrent representation learning of milking data and weak supervision with low-cost labels. The proposed method employs a structure in which a classifier branches from the latent space of a 1D-convolutional autoencoder, enabling representation learning of milking data from the perspectives of both reconstructing the original information and detecting mastitis. The branched classifier backpropagates the mastitis symptoms, which are less costly to obtain than a mastitis diagnosis, during the encoder's representation learning. The results show that the proposed method achieves an F1-score of 0.6, demonstrating performance comparable to previous studies despite using low-cost labels. Our method has the advantage of being easily reproducible across various data domains through low-cost annotation for supervised learning and is practical, as it can be implemented with just milking data and weak labels, which can be collected in the field. Full article
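The structure described above, a 1D-convolutional autoencoder whose latent space also feeds a weak-label classifier, can be sketched roughly as follows; layer sizes, the pooling choice, and the loss weighting are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class WeaklySupervisedAE(nn.Module):
    """1D-conv autoencoder with a classifier branching from the latent space."""
    def __init__(self, in_channels=1, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, latent_dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(latent_dim, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, in_channels, kernel_size=3, padding=1),
        )
        self.classifier = nn.Sequential(  # branch trained on weak (symptom) labels
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(latent_dim, 1),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

model = WeaklySupervisedAE()
x = torch.randn(8, 1, 64)                      # 8 milking windows, 64 time steps
weak_y = torch.randint(0, 2, (8, 1)).float()   # low-cost symptom labels
recon, logit = model(x)
loss = nn.functional.mse_loss(recon, x) \
     + 0.5 * nn.functional.binary_cross_entropy_with_logits(logit, weak_y)
loss.backward()
```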
Figure 1. Overview of the proposed pipeline for mastitis detection based on an autoencoder with a classifier.
Figure 2. Extraction of training examples using the window sliding method with label annotation based on the mastitis onset period. The figure shows an example with a 5-day sliding window.
Figure 3. The proposed supervised autoencoder-based mastitis detection for dairy cows. The model comprises an autoencoder structure utilizing a 1D-CNN as the backbone network. In addition, a classifier is included to differentiate between the normal and mastitis feature vectors.
Figure 4. Loss curves for the training (a) and validation (b) sets during model training; each loss value is expressed as the mean and standard deviation within 100-epoch intervals.
Figure 5. Results of representative cow samples for mastitis detection using the deep learning model.
Figure 6. Comparison of reconstruction loss between the AE and our model with the label period.
Figure 7. The ROC curve with AUC by onset period (a) and window size (b).
19 pages, 2815 KiB  
Article
Reinforcement Learning with Multi-Policy Movement Strategy for Weakly Supervised Temporal Sentence Grounding
by Shan Jiang, Yuqiu Kong, Lihe Zhang and Baocai Yin
Appl. Sci. 2024, 14(21), 9696; https://doi.org/10.3390/app14219696 - 23 Oct 2024
Viewed by 624
Abstract
Temporal grounding involves identifying the target moment based on a provided sentence in an untrimmed video. In weakly supervised temporal grounding studies, existing temporal sentence grounding methods face challenges in (1) learning the semantic alignment between the candidate window and the language query and (2) identifying accurate temporal boundaries during the grounding process. In this work, we propose a reinforcement learning (RL)-based multi-policy movement framework (MMF) for weakly supervised temporal sentence grounding. We imitate the behavior of human beings when grounding specified content in a video: starting from a coarse location and then identifying fine-grained temporal boundaries. The RL-based framework initially sets a series of candidate windows and learns to adjust them step by step by maximizing rewards indicating the semantic alignment between the current window and the query. To better learn the alignment, we propose a Gaussian-based Dual-Alignment Module (GDAM), which combines the strengths of both scoring-based and reconstruction-based alignment methods, addressing the issues of negative sample bias and language bias. We also employ a multi-policy movement strategy (MMS), which grounds the temporal position in a coarse-to-fine manner. Extensive experiments demonstrate that our proposed method outperforms existing weakly supervised algorithms, achieving state-of-the-art performance on the Charades-STA and ActivityNet Captions datasets. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
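A rough sketch of the coarse-to-fine window-adjustment loop that such an RL framework implies is shown below; the action set, step sizes, and the simple improvement-based reward are illustrative assumptions (in the paper the reward is defined by the GDAM alignment).

```python
# Illustrative environment step for RL-based temporal grounding (assumed action
# set and step sizes; the real reward would come from GDAM alignment scores).
ACTIONS = ["shift_left", "shift_right", "expand", "shrink"]

def apply_action(window, action, step, video_len):
    start, end = window
    if action == "shift_left":
        start, end = start - step, end - step
    elif action == "shift_right":
        start, end = start + step, end + step
    elif action == "expand":
        start, end = start - step / 2, end + step / 2
    elif action == "shrink":
        start, end = start + step / 2, end - step / 2
    start = max(0.0, start)
    end = min(float(video_len), max(end, start + 1e-3))
    return (start, end)

def reward(prev_score, new_score):
    # Positive reward when the window-query alignment score improves.
    return 1.0 if new_score > prev_score else -1.0

# Coarse-to-fine movement: a coarse policy with large steps, then a fine policy.
window, video_len = (0.0, 10.0), 30.0
for step in (4.0, 1.0):
    window = apply_action(window, "shift_right", step, video_len)
print(window)
```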
Figure 1. Existing weakly supervised approaches for temporal sentence grounding. (a) Scoring-based methods learn to score the semantic alignment using a large number of positive/negative samples; (b) reconstruction-based methods assume that the masked query can be reconstructed from matched video proposals; (c) RL-based methods adjust the candidate windows by rewarding beneficial actions.
Figure 2. The framework of the proposed method. In the RL framework, the state learns the cross-modal representation from the environment (input query and untrimmed video) and feeds it to the agent. The agent adopts the MMS to adjust the candidate window according to the reward, which is defined by the GDAM.
Figure 3. The dynamic Gaussian mask varies during training and allocates dynamic attention values to different positions of the video. Before epoch E, the Gaussian distribution gradually changes from red to blue and its variance continues to increase; after epoch E, the opposite is true.
Figure 4. Influence of hyper-parameters. (a) Influence of the epoch number E in Equation (6) and (b) influence of the maximum number of steps in RL.
Figure 5. Movement process of our method.
Figure 6. Visual comparisons with CPL.
Figure 7. Failure cases during testing.
27 pages, 6293 KiB  
Article
Lightweight Advanced Deep Neural Network (DNN) Model for Early-Stage Lung Cancer Detection
by Isha Bhatia, Aarti, Syed Immamul Ansarullah, Farhan Amin and Amerah Alabrah
Diagnostics 2024, 14(21), 2356; https://doi.org/10.3390/diagnostics14212356 - 22 Oct 2024
Viewed by 676
Abstract
Background: Lung cancer, also known as lung carcinoma, has a high mortality rate; however, early prediction helps to reduce the risk. In the current literature, various approaches have been developed for the early-stage prediction of lung carcinoma, but these still have various issues, such as low accuracy, high noise, low contrast, poor recognition rates, and high false-positive rates. Thus, in this research effort, we propose an advanced algorithm that combines two different types of deep neural networks to make it easier to spot lung carcinoma in its early phases. Methods: We used WDSI (weakly supervised dense instance-level lung segmentation) to reduce the need for laborious pixel-level annotations. In addition, we propose SS-CL (a deep continuous learning-based deep neural network) that can be applied to both labeled and unlabeled data to improve efficiency. This work evaluates potential lightweight, low-memory deep neural network (DNN) designs for image processing. Results: Our experimental results show that, by combining WDSI and LSO segmentation, we can achieve highly sensitive, specific, and accurate early detection of lung cancer. For the experiments, we used the lung nodule (LUNA16) dataset, which consists of patients' 3D CT scan images. We confirmed that our proposed model is lightweight because it uses less memory. We compared it with state-of-the-art models in terms of PSNR and SSIM, obtaining 32.8 and 0.97, respectively. The proposed lightweight deep neural network (DNN) model achieves a high accuracy of 98.2% and also removes noise more effectively. Conclusions: Our proposed approach has considerable potential to support medical image analysis and improve the accuracy of test results, and it may also prove helpful in saving patients' lives. Full article
(This article belongs to the Special Issue Artificial Intelligence in Cancers—2nd Edition)
Figure 1. Block schematic of the proposed model.
Figure 2. Dataset utilization for cancer detection.
Figure 3. Dataset utilization for cancerous images. (a) Large-cell carcinoma; (b) squamous cell carcinoma.
Figure 4. Sample images in the proposed model. (A,B) The top and bottom images show the original CT sample images and the contrast-stretched images. (C,D) The top and bottom images show the edge-enhanced image and the segmentation. (E) The images show the classified output (large-cell cancer, squamous cell cancer, and normal).
Figure 5. Left and right views with cancer, and the highlighted cancerous image.
Figure 6. Nodule detection confusion matrix.
Figure 7. Risk assessment confusion matrix.
Figure 8. Performance measures for risk assessment and nodule detection.
Figure 9. Comparative analysis of the nodule detection accuracy of the proposed model with different techniques.
Figure 10. Comparative analysis of the risk assessment accuracy of the proposed model with different techniques.
Figure 11. Assessment of the proposed and recent models using the SSIM performance measure.
Figure 12. Evaluation of the proposed and recent models using the ET performance metric.
Figure 13. Evaluation of the proposed and recent models using the PSNR performance metric.
21 pages, 29547 KiB  
Article
Detection of Wet Snow by Weakly Supervised Deep Learning Change Detection Algorithm with Sentinel-1 Data
by Hanying Gong, Zehao Yu, Shiqiang Zhang and Gang Zhou
Remote Sens. 2024, 16(19), 3575; https://doi.org/10.3390/rs16193575 - 25 Sep 2024
Viewed by 648
Abstract
The snowmelt process plays a crucial role in hydrological forecasting, climate change, disaster management, and other related fields. Accurate detection of wet snow distribution and its changes is essential for understanding and modeling the snowmelt process. To address the limitations of conventional fixed-threshold methods, which suffer from poor adaptability and significant interference from scattering noise, we propose a weakly supervised deep learning change detection algorithm using Sentinel-1 multi-temporal data. This algorithm incorporates the Multi-Region Convolution Module (MRC) to enhance the central region while effectively suppressing edge noise. Furthermore, it integrates the ResNet residual network to capture deeper image features, facilitating wet snow identification through feature fusion. Various combinations of differential images, polarization data, elevation, and slope information during and after snowmelt were input into the model and tested. The results suggest that the combination of differential images, VV polarization data, and slope information offers the greatest advantage for wet snow extraction. Comparisons of our method, the fixed-threshold method, the OTSU algorithm, and the FCM algorithm against results from Landsat images indicate that the overall accuracy of our method improves significantly when the proportion of wet snow cover is large, and the average overall accuracy of wet snow extraction is 85.2%. This study provides clues for the accurate identification of wet snow during the mid-snowmelt phase. Full article
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)
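For context, the conventional fixed-threshold baseline that the paper improves upon can be sketched in a few lines of NumPy: a differential (melt minus reference) backscatter image in dB is thresholded, with values around -2 to -3 dB commonly cited for wet snow. The exact threshold and single-band handling here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def wet_snow_fixed_threshold(sigma0_melt_db, sigma0_ref_db, threshold_db=-3.0):
    """Fixed-threshold wet-snow baseline on a Sentinel-1 differential image.

    The differential image is the backscatter difference (in dB) between a
    snowmelt-period acquisition and a reference acquisition; pixels whose
    backscatter drops below the threshold are flagged as wet snow. The -3 dB
    value is a commonly cited choice, used here only for illustration.
    """
    diff_db = sigma0_melt_db - sigma0_ref_db
    return diff_db < threshold_db

# Toy 2x2 example: pixels darkened by more than 3 dB relative to the
# reference are flagged as wet snow.
melt = np.array([[-18.0, -12.0], [-15.0, -10.0]])
ref = np.array([[-11.0, -11.0], [-11.0, -11.0]])
print(wet_snow_fixed_threshold(melt, ref))
```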
Figure 1. Study area. (a) Elevation. (b) Land-cover types.
Figure 2. Differential images and corresponding histograms for different snowmelt periods in 2019.
Figure 3. (a) Landsat image from 2 July 2018. (b) HFCM clustering results with an initial cluster number of 5. (c) HFCM clustering results with an initial cluster number of 2.
Figure 4. Architecture of the dual-branch deep learning model. n is the number of residual blocks; different values are chosen depending on the neighbourhood size (when the neighbourhood size is 7, n = 3).
Figure 5. Wet snow extraction results of the four methods.
Figure 6. Wet snow percentage across different elevation zones.
Figure 7. Wet snow cover area, temperature, and precipitation from March to July, 2018–2021.
Figure 8. Wet snow detection results with different input data on 3 June 2019. (a–n) correspond to the identification results of the different band combinations in Table 5.
Figure 9. Comparison of wet snow extraction accuracy with different input data.
Figure 10. Snow detection under cloud cover. (a) Landsat false-color image. (b) Optical imagery detection results. (c) Wet snow detection results.
19 pages, 3821 KiB  
Article
Pedestrian Re-Identification Based on Weakly Supervised Multi-Feature Fusion
by Changming Qin, Zhiwen Wang, Linghui Zhang, Qichang Peng, Guixing Lin and Guanlin Lu
Algorithms 2024, 17(10), 426; https://doi.org/10.3390/a17100426 - 24 Sep 2024
Viewed by 527
Abstract
This article proposes a weakly supervised multi-feature fusion pedestrian re-identification method, which introduces a multi-feature fusion mechanism that maps feature information from different layers into the same feature space and fuses it into joint deep and shallow features. The goal is to fully utilize the rich information in the image and improve the performance and robustness of the pedestrian re-identification model. Secondly, by matching the target person against unprocessed surveillance videos, one only needs to know that a person's identity appears in the video, without annotating that identity in any individual frame during training. This simplifies the annotation of training images by replacing accurate annotations with broad ones; that is, the pedestrian identities that appear in a video are put into one package, and a video-level label is assigned to each package. This greatly reduces the annotation work and transforms this weakly supervised pedestrian re-identification challenge into a multi-instance, multi-label learning problem. The experimental results show that the method proposed in this paper is effective and can significantly improve mAP. Full article
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
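One common way to realize the video-level "package" labels described above is a multi-instance, multi-label loss that max-pools instance scores into bag scores; the sketch below uses that standard aggregation purely for illustration and is not necessarily the aggregation used in the paper.

```python
import torch
import torch.nn.functional as F

def bag_level_loss(instance_logits, video_labels):
    """Multi-instance, multi-label loss (illustrative sketch).

    instance_logits: (num_instances, num_identities) scores for person images
                     detected in one untrimmed video (the "bag" / "package").
    video_labels:    (num_identities,) multi-hot vector recording only *which*
                     identities appear somewhere in the video.
    Max-pooling over instances turns instance scores into bag scores, so no
    frame-level identity annotation is needed.
    """
    bag_logits, _ = instance_logits.max(dim=0)
    return F.binary_cross_entropy_with_logits(bag_logits, video_labels)

logits = torch.randn(20, 100, requires_grad=True)   # 20 detections, 100 identities
labels = torch.zeros(100)
labels[[3, 17, 58]] = 1.0                            # identities known to appear
bag_level_loss(logits, labels).backward()
```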
Figure 1. Weak supervision diagram.
Figure 2. Development history of pedestrian re-identification [19,22].
Figure 3. Network architecture of the weakly supervised algorithm model.
Figure 4. Multi-feature fusion module.
Figure 5. Schematic diagram of the triplet loss.
Figure 6. Model performance results.
Figure 7. CMC curves for the Market-1501 dataset.
Figure 8. DukeMTMC partial retrieval visualization results.
Figure 9. Market-1501 partial retrieval visualization results.
15 pages, 1791 KiB  
Article
NMGrad: Advancing Histopathological Bladder Cancer Grading with Weakly Supervised Deep Learning
by Saul Fuster, Umay Kiraz, Trygve Eftestøl, Emiel A. M. Janssen and Kjersti Engan
Bioengineering 2024, 11(9), 909; https://doi.org/10.3390/bioengineering11090909 - 11 Sep 2024
Viewed by 816
Abstract
The most prevalent form of bladder cancer is urothelial carcinoma, characterized by a high recurrence rate and substantial lifetime treatment costs for patients. Grading is a prime factor for patient risk stratification, although it suffers from inconsistencies and variations among pathologists. Moreover, the absence of annotations in medical imaging makes it difficult to train deep learning models. To address these challenges, we introduce a pipeline designed for bladder cancer grading using histological slides. First, it extracts urothelium tissue tiles at different magnification levels and processes them with a convolutional neural network for feature extraction. Then, it performs slide-level prediction, employing a nested multiple-instance learning approach with attention to predict the grade. To distinguish different levels of malignancy within specific regions of the slide, we include the origins of the tiles in our analysis. The attention scores at the region level are shown to correlate with verified high-grade regions, giving some explainability to the model. Clinical evaluations demonstrate that our model consistently outperforms previous state-of-the-art methods, achieving an F1 score of 0.85. Full article
(This article belongs to the Special Issue Computer-Aided Diagnosis for Biomedical Engineering)
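The attention-based aggregation used in such nested multiple-instance pipelines can be illustrated with a standard gated-attention pooling layer, as sketched below; dimensions and the gating form follow the common MIL-attention formulation and are assumptions rather than the paper's exact module.

```python
import torch
import torch.nn as nn

class GatedAttentionPool(nn.Module):
    """Attention-based MIL pooling (gated attention), illustrating how tile or
    region embeddings can be aggregated into a single slide-level embedding."""
    def __init__(self, dim=256, hidden=128):
        super().__init__()
        self.V = nn.Linear(dim, hidden)
        self.U = nn.Linear(dim, hidden)
        self.w = nn.Linear(hidden, 1)

    def forward(self, h):                                   # h: (num_instances, dim)
        a = self.w(torch.tanh(self.V(h)) * torch.sigmoid(self.U(h)))  # (N, 1)
        a = torch.softmax(a, dim=0)                          # attention over instances
        return (a * h).sum(dim=0), a                         # pooled embedding + scores

pool = GatedAttentionPool()
region_embeddings = torch.randn(12, 256)                     # e.g., 12 urothelium regions
slide_embedding, attention = pool(region_embeddings)
# In a nested setup, pooling is applied twice: tiles -> region, then regions -> slide.
```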
Graphical abstract
Figure 1. NMGrad pipeline. Initially, we apply a tissue segmentation algorithm for ROI extraction. Then, we pinpoint diagnostically significant urothelium areas within WSIs. Subsequently, we split the urothelium mask into regions, based on proximity and size, and extract tile triplets. In a hierarchical fashion, we further transform these triplets within their corresponding regions into region feature embeddings, using an attention-based aggregation method. All the region representations are then consolidated into a comprehensive WSI-level representation through a weight-independent attention module. Finally, this WSI feature embedding is input into the WHO04 grading classifier in order to produce accurate WSI grade predictions.
Figure 2. We obtain sets comprising three tiles at different magnification levels, named triplets T, enabling detailed examination. Tile triplets demonstrate regions associated with low- and high-grade features.
Figure 3. Region definition. Urothelial tissue within a WSI is eligible for tile extraction. Blobs of tiles are formed, and blobs smaller than a threshold T_LOWER are discarded. From the remaining blobs, any smaller than T_UPPER are kept and defined as regions. Blobs bigger than T_UPPER are subdivided into smaller pieces using the locations of the individual tiles and KMeans clustering, and the obtained clusters are designated as regions.
Figure 4. Plot displaying the WSI predictions of the test set, with green shading representing the LG confidence interval, red the HG interval, and gray the uncertainty interval. A blue line depicts the regression line fitting the predictions.
Figure 5. Region-level attention score heatmaps. Example low- and high-grade ROIs annotated by a uropathologist are compared to the output attention of the proposed NMGrad model, left to right, respectively. The annotated ROIs correspond to the highest attention scores; red and blue correspond to low and high attention, respectively. The WSI-level prediction score is included for reference.
Figure 6. Correlation between region attention scores and the output prediction of region embeddings on the test set of WSIs. Regions from accurately predicted WSIs (TN, TP) are denoted by squares, while those from incorrectly predicted WSIs (FN, FP) are marked with crosses. A discernible pattern emerges where low attention scores align with diminished predictions and higher attention scores correlate with elevated predictions. Additionally, wrong predictions tend to manifest on the opposite end of the spectrum, with low-grade instances concentrating high attention and prediction scores, and vice versa. The trend is represented with a polynomial regression line (RL).
18 pages, 1057 KiB  
Review
Advancing in RGB-D Salient Object Detection: A Survey
by Ai Chen, Xin Li, Tianxiang He, Junlin Zhou and Duanbing Chen
Appl. Sci. 2024, 14(17), 8078; https://doi.org/10.3390/app14178078 - 9 Sep 2024
Viewed by 966
Abstract
The human visual system can rapidly focus on prominent objects in complex scenes, significantly enhancing information processing efficiency. Salient object detection (SOD) mimics this biological ability, aiming to identify and segment the most prominent regions or objects in images or videos. This reduces the amount of data that needs to be processed while enhancing the accuracy and efficiency of information extraction. In recent years, SOD has made significant progress in many areas, such as deep learning, multi-modal fusion, and attention mechanisms. Additionally, it has expanded into real-time detection, weakly supervised learning, and cross-domain applications. Depth images can provide three-dimensional structural information about a scene, aiding a more accurate understanding of object shapes and distances. In SOD tasks, depth images enhance detection accuracy and robustness by providing additional geometric information, which is particularly crucial in complex scenes and occlusion situations. This survey reviews the substantial advancements in the field of RGB-Depth SOD, with a focus on the critical roles played by attention mechanisms and cross-modal fusion methods. It summarizes the existing literature, provides a brief overview of mainstream datasets and evaluation metrics, and quantitatively compares the discussed models. Full article
(This article belongs to the Special Issue Artificial Intelligence in Computer Vision and Object Detection)
Figure 1. Typical schemes for RGB-D SOD.
Figure 2. Typical RGB-D saliency detection datasets. The RGB image, depth map, and annotation are shown from left to right for each dataset.
21 pages, 2923 KiB  
Article
Multi-Scale Classification and Contrastive Regularization: Weakly Supervised Large-Scale 3D Point Cloud Semantic Segmentation
by Jingyi Wang, Jingyang He, Yu Liu, Chen Chen, Maojun Zhang and Hanlin Tan
Remote Sens. 2024, 16(17), 3319; https://doi.org/10.3390/rs16173319 - 7 Sep 2024
Viewed by 874
Abstract
With the proliferation of large-scale 3D point cloud datasets, the high cost of per-point annotation has spurred the development of weakly supervised semantic segmentation methods. Current popular research mainly focuses on single-scale classification, which fails to address the significant feature scale differences between background and objects in large scenes. Therefore, we propose MCCR (Multi-scale Classification and Contrastive Regularization), an end-to-end semantic segmentation framework for large-scale 3D scenes under weak supervision. MCCR first aggregates features and applies random downsampling to the input data. Then, it captures the local features of a random point based on multi-layer features and the input coordinates. These features are then fed into the network to obtain the initial and final prediction results, and MCCR iteratively trains the model using strategies such as contrastive learning. Notably, MCCR combines multi-scale classification with contrastive regularization to fully exploit multi-scale features and weakly labeled information. We investigate both point-level and local contrastive regularization to leverage point cloud augmentor and local semantic information and introduce a Decoupling Layer to guide the loss optimization in different spaces. Results on three popular large-scale datasets, S3DIS, SemanticKITTI and SensatUrban, demonstrate that our model achieves state-of-the-art (SOTA) performance on large-scale outdoor datasets with only 0.1% labeled points for supervision, while maintaining strong performance on indoor datasets. Full article
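As a rough illustration of point-level contrastive regularization between the original and augmented point clouds, the following InfoNCE-style sketch treats each point and its augmented copy as a positive pair; the temperature and the exact pairing scheme are illustrative assumptions rather than the paper's formulation.

```python
import torch
import torch.nn.functional as F

def point_level_contrastive_loss(feat_orig, feat_aug, temperature=0.1):
    """Point-level contrastive regularization (illustrative InfoNCE-style sketch).

    feat_orig, feat_aug: (N, D) features of the same N points from the original
    and the augmented (mirrored/rotated/jittered) point cloud. Each point is
    pulled toward its own augmented copy and pushed away from other points.
    """
    z1 = F.normalize(feat_orig, dim=1)
    z2 = F.normalize(feat_aug, dim=1)
    logits = z1 @ z2.t() / temperature            # (N, N) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

loss = point_level_contrastive_loss(torch.randn(256, 32), torch.randn(256, 32))
```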
Figure 1. Multi-scale classification. The point cloud is processed to obtain multi-scale features, which are extracted and analyzed at different scales and then fed into classifiers. The results are fused to produce the final prediction.
Figure 2. The architecture of MCCR. The original point clouds are first processed through random mirroring, rotation, and jittering to generate augmented point clouds. Then, both the original and augmented points are passed through Local Feature Aggregation and Random Sampling modules to obtain multi-scale features. By randomly selecting a point, its local features are captured at different scales and interpolated accordingly. The interpolated local features are, on the one hand, used for multi-scale classification and, on the other hand, fused and fed into a series of MLPs to obtain initial prediction results. These initial predictions are then utilized for local and point-level contrastive regularization and combined with the multi-scale classification outcomes to derive the final predictions. Note that the red dot represents the input point, while the yellow and reddish-brown dots represent the predictions based on the original and augmented data, respectively.
Figure 3. Visualization results on the validation set (Sequence 08) of SemanticKITTI. Red circles highlight where we outperform SQN*.
Figure 4. Visualization results on the validation set of SensatUrban. The raw point cloud, ground truth, our results, and the baseline are presented separately from left to right; black circles highlight where we outperform SQN*.
Figure 5. Visualization results on the test set of S3DIS Area-5. Red circles highlight where we outperform SQN*.
Figure 6. Visualization results on the test set of S3DIS Area-5. Each example includes the raw point cloud with segmentation results displayed in the red rectangle for the ground truth, the baseline, and ours, respectively. With 1% labeled points, the segmentation results of the baseline method deviate from the ground truth, while our proposed MCCR obtains results consistent with the ground truth.
Figure 7. Visualization results on the test set of S3DIS Area-5. Each example includes the raw point cloud with segmentation results displayed in the red rectangle for the ground truth, the baseline, and ours, respectively. With 1% labeled points, the segmentation results of our proposed MCCR deviate from the ground truth, while the baseline method obtains results consistent with the ground truth.
12 pages, 1254 KiB  
Article
Deep Learning Model for Predicting Lung Adenocarcinoma Recurrence from Whole Slide Images
by Ziyu Su, Usman Afzaal, Shuo Niu, Margarita Munoz de Toro, Fei Xing, Jimmy Ruiz, Metin N. Gurcan, Wencheng Li and M. Khalid Khan Niazi
Cancers 2024, 16(17), 3097; https://doi.org/10.3390/cancers16173097 - 6 Sep 2024
Viewed by 842
Abstract
Lung cancer is the leading cause of cancer-related death in the United States. Lung adenocarcinoma (LUAD) is one of the most common subtypes of lung cancer that can be treated with resection. While resection can be curative, there is a significant risk of recurrence, which necessitates close monitoring and additional treatment planning. Traditionally, microscopic evaluation of tumor grading in resected specimens is a standard pathologic practice that informs subsequent therapy and patient management. However, this approach is labor-intensive and subject to inter-observer variability. To address the challenge of accurately predicting recurrence, we propose a deep learning-based model to predict the 5-year recurrence of LUAD in patients following surgical resection. In our model, we introduce an innovative dual-attention architecture that significantly enhances computational efficiency. Our model demonstrates excellent performance in recurrent risk stratification, achieving a hazard ratio of 2.29 (95% CI: 1.69–3.09, p < 0.005), which outperforms several existing deep learning methods. This study contributes to ongoing efforts to use deep learning models for automatically learning histologic patterns from whole slide images (WSIs) and predicting LUAD recurrence risk, thereby improving the accuracy and efficiency of treatment decision making. Full article
(This article belongs to the Section Cancer Informatics and Big Data)
Figure 1. Overview of our tumor bulk segmentation pipeline, including tumor patch prediction, tumor bulk mask refinement, and tumor bulk extraction.
Figure 2. Overview of the DAMIL architecture. The tumor bulk region of a WSI is extracted and cropped into patches during preprocessing. All patches are encoded into vectors by a feature extractor model. The encoded vectors pass through a cross-attention module followed by a gated-attention module. The final representation is used for RC/NRC prediction.
Figure 3. Kaplan–Meier plots of recurrence-free probability for 5-year follow-up.
16 pages, 2006 KiB  
Article
Weakly Supervised Specular Highlight Removal Using Only Highlight Images
by Yuanfeng Zheng, Guangwei Hu, Hao Jiang, Hao Wang and Lihua Wu
Mathematics 2024, 12(16), 2578; https://doi.org/10.3390/math12162578 - 21 Aug 2024
Viewed by 523
Abstract
Specular highlight removal is a challenging task in the field of image enhancement, as it can significantly improve image quality in highlight regions. Recently, deep learning-based methods have been widely adopted for this task, demonstrating excellent performance when trained on either massive paired data, wherein both the highlighted and highlight-free versions of the same image are available, or unpaired datasets where a one-to-one correspondence is inapplicable. However, it is difficult to obtain the corresponding highlight-free version of a highlight image, as the latter has already been produced under specific lighting conditions. In this paper, we propose a method for weakly supervised specular highlight removal that only requires highlight images. The method generates highlight-free images from highlight images with the guidance of masks estimated using non-negative matrix factorization (NMF). These highlight-free images are then fed consecutively into a series of modules derived from a Cycle Generative Adversarial Network (Cycle-GAN)-style network, namely the highlight generation, highlight removal, and reconstruction modules, in sequential order. These modules are trained jointly, yielding a highly effective highlight removal module at verification time. On the specular highlight image quadruples (SHIQ) and LIME datasets, our method achieves an accuracy of 0.90 and a balance error rate (BER) of 8.6 on SHIQ, and an accuracy of 0.89 and a BER of 9.1 on LIME, outperforming existing methods and demonstrating its potential for improving image quality in various applications. Full article
(This article belongs to the Special Issue Advances in Applied Mathematics in Computer Vision)
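A heavily simplified sketch of how an NMF factorization can yield a highlight mask is given below: the pixel-by-channel matrix is split into two non-negative components, and the least saturated (most "white") component is thresholded into a mask. The rank, the component-selection rule, and the threshold are assumptions for illustration only, not the paper's estimation procedure.

```python
import numpy as np
from sklearn.decomposition import NMF

def estimate_highlight_mask(img, threshold=0.5):
    """Rough NMF-based highlight mask (illustrative only).

    img: float RGB image in [0, 1], shape (H, W, 3).
    The pixel-by-channel matrix is factorized into two non-negative components;
    the component whose color is closest to white (smallest channel spread) is
    treated as the specular layer, and its per-pixel activation is thresholded.
    """
    h, w, _ = img.shape
    V = img.reshape(-1, 3)                       # (H*W, 3) pixel-by-channel matrix
    model = NMF(n_components=2, init="nndsvda", max_iter=500)
    W = model.fit_transform(V)                   # per-pixel activations, (H*W, 2)
    H = model.components_                        # component colors, (2, 3)
    saturation = H.max(axis=1) - H.min(axis=1)   # spread across RGB per component
    spec = W[:, saturation.argmin()]             # activation of the "whitest" component
    spec = spec / (spec.max() + 1e-8)
    return (spec > threshold).reshape(h, w)

mask = estimate_highlight_mask(np.random.rand(32, 32, 3))
```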
Figure 1. Training process of our proposed network, including highlight generation G, highlight removal R, and reconstruction C. The NMF method estimates the highlight mask, which is used to generate a highlight-free dataset. Consistency between the original and reconstructed images ensures dataset similarity.
Figure 2. Testing process overview. The highlight removal module transforms the input into a highlight-free image, which is compared with the ground truth. The reconstruction output is also evaluated.
Figure 3. Highlight masks (second column) calculated by NMF from natural images (first column). Highlight-free images (third column) and highlight images (last column) are obtained by cropping with the guidance of random masks and the corresponding mask crops, respectively.
Figure 4. The overall architecture of the modules, including three convolutional layers, nine residual blocks, and three different deconvolutional layers.
Figure 5. The architecture of the slightly adapted network trained on paired data. I and M denote the input image and mask, respectively. I_f, obtained from the ground truth, is the highlight-free image corresponding to Ĩ_f; I_c is the output of the network, which ought to be consistent with the ground truth. The other symbols are depicted in Figure 2.
Figure 6. A visual comparison between our method and state-of-the-art methods on the SHIQ dataset. Our method recovers highlight-free images more clearly [10,11,12,19,22,25,31,35].
Figure 7. Ablation study of loss functions on the SHIQ dataset. (a) Input images; (b–e) results of experiments without L_GAN, L_iden, L_cycle, and L_pixel, respectively; (f) results with the full loss; (g) ground truth.
Figure 8. Failure cases of our method on the SHIQ dataset. Top row: input specular highlight images. Middle row: our removal results. Bottom row: ground truth. (a,b) Cases where the specular highlight covers the entire area. (c,d) Cases where the specular highlight is caused by glass.
17 pages, 31026 KiB  
Article
Addressing Noisy Pixels in Weakly Supervised Semantic Segmentation with Weights Assigned
by Feng Qian, Juan Yang, Sipeng Tang, Gao Chen and Jingwen Yan
Mathematics 2024, 12(16), 2520; https://doi.org/10.3390/math12162520 - 15 Aug 2024
Viewed by 722
Abstract
Weakly supervised semantic segmentation (WSSS) aims to segment objects without a heavy burden of dense annotations. Pseudo-masks serve as supervisory information for training segmentation models, which is crucial to the performance of segmentation models. However, the generated pseudo-masks contain significant noisy labels, which leads to poor performance of the segmentation models trained on these pseudo-masks. Few studies address this issue, as these noisy labels remain inevitable even after the pseudo-masks are improved. In this paper, we propose an uncertainty-weight transform module to mitigate the impact of noisy labels on model performance. It is noteworthy that our approach is not aimed at eliminating noisy labels but rather enhancing the robustness of the model to noisy labels. The proposed method adopts a frequency-based approach to estimate pixel uncertainty. Moreover, the uncertainty of pixels is transformed into loss weights through a set of well-designed functions. After dynamically assigning weights, the model allocates attention to each pixel in a significantly differentiated manner. Meanwhile, the impact of noisy labels on model performance is weakened. Experiments validate the effectiveness of the proposed method, achieving state-of-the-art results of 69.3% on PASCAL VOC 2012 and 39.3% on MS COCO 2014, respectively. Full article
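The frequency-based uncertainty-to-weight idea can be sketched as follows: pixel uncertainty is estimated from how often perturbed inferences disagree with the pseudo-label, and the segmentation loss is down-weighted on uncertain pixels. The exponential transform and the toy shapes are illustrative assumptions, not the paper's well-designed functions.

```python
import torch
import torch.nn.functional as F

def frequency_uncertainty(pred_stack, pseudo_mask):
    """Per-pixel uncertainty from prediction frequency (illustrative sketch).

    pred_stack:  (K, H, W) hard predictions from K rescaled/perturbed inferences.
    pseudo_mask: (H, W) pseudo-label map. Pixels whose pseudo-label rarely agrees
    with the K predictions are treated as uncertain (likely noisy labels).
    """
    agree = (pred_stack == pseudo_mask.unsqueeze(0)).float().mean(dim=0)
    return 1.0 - agree                                   # (H, W) in [0, 1]

def weighted_seg_loss(logits, pseudo_mask, uncertainty, alpha=2.0):
    """Pixel-wise cross-entropy down-weighted on uncertain pixels. The
    exponential uncertainty-to-weight transform is one simple choice."""
    weights = torch.exp(-alpha * uncertainty)            # uncertain pixels -> small weight
    ce = F.cross_entropy(logits, pseudo_mask, reduction="none")   # (B, H, W)
    return (weights * ce).mean()

# Toy usage with a 1x21x8x8 logit map and 3 perturbed inference passes.
logits = torch.randn(1, 21, 8, 8, requires_grad=True)
pseudo = torch.randint(0, 21, (1, 8, 8))
preds = torch.randint(0, 21, (3, 8, 8))
u = frequency_uncertainty(preds, pseudo[0])
weighted_seg_loss(logits, pseudo, u).backward()
```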
Figure 1. Two-stage training pipeline for weakly supervised semantic segmentation.
Figure 2. Overview of the proposed method. The blue arrows denote the multi-scale CRF, which is used for postprocessing of the prediction results. The proposed uncertainty estimation module estimates the uncertainty of these results and converts pixel uncertainty into loss weights.
Figure 3. The RLF curves generated by different α values.
Figure 4. Curve of the function obtained by replacing the RF function with a constant value of 1. The part below α is still represented by the LF function.
Figure 5. Curve of the function using a constant value of 0.05 instead of the LF function. The part greater than α is still represented by the RF function.
Figure 6. Visualization of weights of the proposed method and URN on the VOC12 val set. The first column (a) shows the input images with ground truth, the second column (b) shows the input images with pseudo-masks, the third column (c) shows the segmentation results of URN, and the fourth column (d) shows the segmentation results of the proposed method.
Figure 7. Visualization results of the proposed method and URN on the VOC12 val set. The first column (a) shows the input images with ground truth, the second column (b) shows the segmentation results of URN, and the third column (c) shows the segmentation results of our method.
Figure 8. Visualization results of our method and URN on the COCO 2014 val set. The first column (a) shows the input images with ground truth, the second column (b) shows the segmentation results of URN, and the third column (c) shows the segmentation results of the proposed method on COCO.
21 pages, 4569 KiB  
Article
Pairwise-Constraint-Guided Multi-View Feature Selection by Joint Sparse Regularization and Similarity Learning
by Jinxi Li and Hong Tao
Mathematics 2024, 12(14), 2278; https://doi.org/10.3390/math12142278 - 21 Jul 2024
Viewed by 799
Abstract
Feature selection is a basic and important step in real applications, such as face recognition and image segmentation. In this paper, we propose a new weakly supervised multi-view feature selection method that utilizes pairwise constraints, i.e., the pairwise-constraint-guided multi-view feature selection (PCFS for short) method. In this method, linear projections of all views and a consistent similarity graph with pairwise constraints are jointly optimized to learn discriminative projections. Meanwhile, an l2,0-norm-based row sparsity constraint is imposed on the concatenation of the projections for discriminative feature selection. Then, an iterative algorithm with theoretically guaranteed convergence is developed for the optimization of PCFS. The performance of the proposed PCFS method was evaluated by comprehensive experiments on six benchmark datasets and by applications to cancer clustering. The experimental results demonstrate that PCFS exhibits competitive feature selection performance in comparison with related models. Full article
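The link between the l2,0-norm row-sparsity constraint and feature selection can be illustrated simply: once the per-view projections are learned, rows of the concatenated projection matrix with the largest l2 norms indicate the selected features. The ranking-and-top-k step below is an illustrative simplification of that selection, not the paper's optimization algorithm.

```python
import numpy as np

def select_features_by_row_sparsity(W_views, k):
    """Select k features from the row norms of concatenated projection matrices.

    W_views: list of per-view projection matrices, each of shape
             (num_features_in_view, projection_dim).
    Under an l2,0-style row-sparsity constraint only a few rows of the
    concatenated projection are non-zero; here we simply rank rows by their
    l2 norm and keep the top k as the selected (view-concatenated) features.
    """
    W = np.vstack(W_views)                      # (total_features, projection_dim)
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(-row_norms)[:k]           # indices of selected features

# Toy usage: two views with 5 and 4 features, selecting 3 features overall.
rng = np.random.default_rng(0)
selected = select_features_by_row_sparsity([rng.random((5, 2)), rng.random((4, 2))], k=3)
print(selected)
```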
Figure 1. The workflow of PCFS; the sample images come from Ref. [29].
Figure 2. Average ACC (%) scores of different baseline methods with different percentages of selected features on the real datasets.
Figure 3. Average NMI (%) scores of different baseline methods with different percentages of selected features on the real datasets.
Figure 4. Empirical convergence analysis. The objective function value of PCFS is shown as the number of iterations increases.
Figure 5. Average performance (ACC, %) of PCFS with varying γ and β while fixing r = 0.1.
Figure 6. Average ACC (%) scores of different baseline methods with different percentages of selected features on the cancer datasets.
Figure 7. Average NMI (%) scores of different baseline methods with different percentages of selected features on the cancer datasets.
22 pages, 8686 KiB  
Article
Weakly Supervised Transformer for Radar Jamming Recognition
by Menglu Zhang, Yushi Chen and Ye Zhang
Remote Sens. 2024, 16(14), 2541; https://doi.org/10.3390/rs16142541 - 10 Jul 2024
Viewed by 816
Abstract
Radar jamming recognition is a key step in electronic countermeasures, and accurate and sufficient labeled samples are essential for supervised learning-based recognition methods. However, in real practice, collected radar jamming samples often have weak labels (i.e., noisy-labeled or unlabeled ones), which degrade recognition performance. Additionally, recognition performance is hindered by limitations in capturing the global features of radar jamming. The Transformer (TR) has advantages in modeling long-range relationships. Therefore, a weakly supervised Transformer is proposed to address the issues of performance degradation under weak supervision. Specifically, complementary label (CL) TR, called RadarCL-TR, is proposed to improve radar jamming recognition accuracy with noisy samples. CL learning and a cleansing module are successively utilized to detect and remove potentially noisy samples. Thus, the adverse influence of noisy samples is mitigated. Additionally, semi-supervised learning (SSL) TR, called RadarSSL-PL-TR, is proposed to boost recognition performance under unlabeled samples via pseudo labels (PLs). Network generalization is improved by training with pseudo-labeling unlabeled samples. Moreover, the RadarSSL-PL-S-TR is proposed to further promote recognition performance, where a selection module identifies reliable pseudo-labeling samples. The experimental results show that the proposed RadarCL-TR and RadarSSL-PL-S-TR outperform comparison methods in recognition accuracy by at least 7.07% and 6.17% with noisy and unlabeled samples, respectively. Full article
(This article belongs to the Topic Radar Signal and Data Processing with Applications)
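The pseudo-label selection step used in such semi-supervised pipelines is commonly implemented as a confidence-thresholded pass over the unlabeled set, as sketched below; the toy model, the threshold, and the nine-class setup are illustrative assumptions rather than the paper's selection module.

```python
import torch

@torch.no_grad()
def select_pseudo_labels(model, unlabeled_batch, confidence=0.95):
    """Pseudo-label unlabeled radar-jamming samples and keep only confident ones
    (a common selection rule; the threshold is an illustrative assumption)."""
    probs = torch.softmax(model(unlabeled_batch), dim=1)
    conf, pseudo = probs.max(dim=1)
    keep = conf >= confidence
    return unlabeled_batch[keep], pseudo[keep]

# Toy usage: a stand-in classifier over 9 hypothetical jamming types. The
# threshold is lowered here only so the toy example keeps some samples.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(256, 9))
batch = torch.randn(32, 1, 256)
kept_samples, kept_labels = select_pseudo_labels(model, batch, confidence=0.2)
# The kept (sample, pseudo-label) pairs are then mixed into the labeled set
# for the next training round, as in standard pseudo-label SSL.
```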
Figure 1. Overview of the proposed RadarCL-TR framework for radar jamming recognition with noisy samples.
Figure 2. The Transformer architecture for extracting features of radar jamming.
Figure 3. Overview of the proposed RadarSSL-PL-S-TR framework for radar jamming recognition with labeled and unlabeled samples.
Figure 4. Time-domain waveforms of radar jamming.
Figure 5. (a–c) and (d–f) are the distribution histograms of the output probability under RadarCL-CNN and RadarCL-TR during the early, middle, and later stages of the training process, respectively.
Figure 6. Radar jamming recognition accuracy for each class with 20 training samples at various ratios of symmetric noise.
Figure 7. t-SNE visualization. (a) CNN, (b) GCE-CNN, (c) SCE-CNN, (d) RadarCL-CNN, (e) TR, (f) GCE-TR, (g) SCE-TR, and (h) RadarCL-TR.
Figure 8. Parameter sensitivity analysis of the weight in the proposed RadarSSL-PL-TR and RadarSSL-PL-S-TR methods.
Figure 9. t-SNE visualization. (a) CNN, (b) AROC-CNN, (c) RadarSSL-PL-CNN, (d) RadarSSL-PL-S-CNN, (e) TR, (f) AROC-TR, (g) RadarSSL-PL-TR, and (h) RadarSSL-PL-S-TR.
15 pages, 3271 KiB  
Article
A 2.5D Self-Training Strategy for Carotid Artery Segmentation in T1-Weighted Brain Magnetic Resonance Images
by Adriel Silva de Araújo, Márcio Sarroglia Pinho, Ana Maria Marques da Silva, Luis Felipe Fiorentini and Jefferson Becker
J. Imaging 2024, 10(7), 161; https://doi.org/10.3390/jimaging10070161 - 3 Jul 2024
Viewed by 1122
Abstract
Precise annotations for large medical image datasets can be time-consuming. Additionally, when dealing with volumetric regions of interest, it is typical to apply segmentation techniques on 2D slices, compromising important information for accurately segmenting 3D structures. This study presents a deep learning pipeline that simultaneously tackles both challenges. Firstly, to streamline the annotation process, we employ a semi-automatic segmentation approach using bounding boxes as masks, which is less time-consuming than pixel-level delineation. Subsequently, recursive self-training is utilized to enhance annotation quality. Finally, a 2.5D segmentation technique is adopted, wherein a slice of a volumetric image is segmented using a pseudo-RGB image. The pipeline was applied to segment the carotid artery tree in T1-weighted brain magnetic resonance images. Utilizing 42 volumetric non-contrast T1-weighted brain scans from four datasets, we delineated bounding boxes around the carotid arteries in the axial slices. Pseudo-RGB images were generated from these slices, and recursive segmentation was conducted using a Res-Unet-based neural network architecture. The model’s performance was tested on a separate dataset, with ground truth annotations provided by a radiologist. After recursive training, we achieved an Intersection over Union (IoU) score of (0.68 ± 0.08) on the unseen dataset, demonstrating commendable qualitative results. Full article
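The 2.5D pseudo-RGB construction can be sketched in a few lines: a target axial slice and its neighbours are stacked as the three channels of an image that a 2D network can segment. Using the two immediately adjacent slices, as below, is the usual 2.5D convention and is an assumption about the exact construction used here.

```python
import numpy as np

def make_pseudo_rgb(volume, slice_index):
    """Build a 2.5D pseudo-RGB image for one axial slice (illustrative sketch).

    volume: (num_slices, H, W) T1-weighted MR volume.
    The slice and its two axial neighbours are stacked as the three channels,
    which is the usual 2.5D construction (the paper may differ in detail).
    Edge slices reuse the nearest valid neighbour.
    """
    lo = max(slice_index - 1, 0)
    hi = min(slice_index + 1, volume.shape[0] - 1)
    return np.stack([volume[lo], volume[slice_index], volume[hi]], axis=-1)  # (H, W, 3)

vol = np.random.rand(40, 128, 128).astype(np.float32)
pseudo_rgb = make_pseudo_rgb(vol, 20)   # shape (128, 128, 3), fed to the 2D Res-Unet
```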
Figure 1. Pairs of 2.5D pseudo-RGB MR slices (left) and bounding boxes (right).
Figure 2. T1-weighted MR slices and validated annotations of the carotid arteries.
Figure 3. Overlap of the bounding boxes in different sections of the carotid arteries. The bounding boxes in the C2–C3 portions (right) have a larger area than those in the C1 portion (left).
Figure 4. Histogram showing the frequency of bounding box areas in the training data. Lower areas are prevalent. The dotted line represents the mean + 1 standard deviation.
Figure 5. A variant of the U-Net architecture with residual connections (Res-Unet).
Figure 6. Comparison of the 2.5D approach (blue) with 2D slice segmentation (orange) using the same mask update technique.
Figure 7. Collection of 2.5D pseudo-RGB slices and their predictions using the CNNs from the last round of training. The rightmost column shows the errors in the mask: purple represents false positive pixels, green represents false negatives, and true positives remain white.
Figure 8. Differences in carotid artery contrast in brain T1-weighted MR images of two subjects.