Search Results (282)

Search Parameters:
Keywords = building masks

16 pages, 7038 KiB  
Article
Activation of Polypropylene (PP) Fiber Surface with 1-Vinyl-1,2,4-triazole and Vinyl Acetate: Synthesis, Characterization, and Application in Cementitious Systems
by Yahya Kaya, Petek Balcı, Süleyman Özen, Ali Mardani and Ali Kara
Materials 2025, 18(5), 1071; https://doi.org/10.3390/ma18051071 - 27 Feb 2025
Viewed by 251
Abstract
Recently, the potential of recycled materials to improve the performance of concrete and other building materials has become an important research topic. Various methods are applied to improve the tensile strength and energy absorption capacity of cementitious systems, one of the most common being the addition of fibers to the mixture. In this study, the effects of surface-modified polypropylene (PP) fibers obtained from recycled masks on the mechanical properties of mortar mixtures were investigated. To improve matrix–fiber interface performance, 6 mm and 12 mm long recycled PP fibers were chemically coated with 1-Vinyl-1,2,4-Triazole and Vinyl Acetate as a surface modification. This modification was intended to increase the surface roughness of the fibers and improve their adhesion to the matrix, so that the fibers perform more effectively in the concrete matrix and enhance the mechanical properties of the mortar mixtures. FTIR and SEM-EDS analyses confirmed that the modification succeeded and that 1-Vinyl-1,2,4-Triazole and Vinyl Acetate can be applied to the fiber surface. The fibers modified with Vinyl Acetate exhibited superior workability and strength performance in cementitious systems compared to those modified with 1-Vinyl-1,2,4-Triazole. This study contributes to sustainable construction materials by revealing the potential of using recycled materials in cementitious systems. Full article
(This article belongs to the Special Issue New Advances in Cement and Concrete Research, 2nd Edition)
Figures: graphical abstract; gradation curve of the fine aggregate vs. ASTM C33 limits; 6 mm and 12 mm PP fibers; synthesis steps for activating the PP fiber surface with 1-Vinyl-1,2,4-Triazole and Vinyl Acetate; FTIR and SEM-EDS results of treated and untreated fibers; relative PCE requirement of mixtures; air-entrainment mechanism of roughened-surface fibers; relative compressive and flexural strength results; adherence of unmodified and VAPP fibers.
29 pages, 7485 KiB  
Article
SKVOS: Sketch-Based Video Object Segmentation with a Large-Scale Benchmark
by Ruolin Yang, Da Li, Conghui Hu and Honggang Zhang
Appl. Sci. 2025, 15(4), 1751; https://doi.org/10.3390/app15041751 - 9 Feb 2025
Viewed by 518
Abstract
In this paper, we propose sketch-based video object segmentation (SKVOS), a novel task that segments objects consistently across video frames using human-drawn sketches as queries. Traditional reference-based methods, such as photo masks and language descriptions, are commonly used for segmentation. Photo masks provide high precision but are labor intensive, limiting scalability. While language descriptions are easy to provide, they often lack the specificity needed to distinguish visually similar objects within a frame. Despite their simplicity, sketches capture rich, fine-grained details of target objects and can be rapidly created, even by non-experts, making them an attractive alternative for segmentation tasks. We introduce a new approach that utilizes sketches as efficient and informative references for video object segmentation. To evaluate sketch-guided segmentation, we introduce a new benchmark consisting of three datasets: Sketch-DAVIS16, Sketch-DAVIS17, and Sketch-YouTube-VOS. Building on a memory-based framework for semi-supervised video object segmentation, we explore effective strategies for integrating sketch-based references. To ensure robust spatiotemporal coherence, we introduce two key innovations: the Temporal Relation Module and Sketch-Anchored Contrastive Learning. These modules enhance the model’s ability to maintain consistency both across time and across different object instances. Our method is evaluated on the Sketch-VOS benchmark, demonstrating superior performance with overall improvements of 1.9%, 3.3%, and 2.0% over state-of-the-art methods on the Sketch-YouTube-VOS, Sketch-DAVIS 2016, and Sketch-DAVIS 2017 validation sets, respectively. Additionally, on the YouTube-VOS validation set, our method outperforms the leading language-based VOS approach by 10.1%. Full article
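The Sketch-Anchored Contrastive Learning mentioned in the abstract pulls together features of the same object across frames (with the sketch-derived embedding as the anchor) and pushes apart features of different objects. The NumPy sketch below illustrates an InfoNCE-style loss built around that idea; the embedding dimension, temperature, and function name are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sketch_anchored_infonce(anchor, positives, negatives, temperature=0.07):
    """InfoNCE-style loss: 'anchor' is the sketch-derived embedding of one object,
    'positives' are embeddings of the same object in other frames, and
    'negatives' are embeddings of other objects. Inputs are (D,) or (N, D)."""
    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    a = normalize(anchor)                      # (D,)
    pos = normalize(np.atleast_2d(positives))  # (P, D)
    neg = normalize(np.atleast_2d(negatives))  # (N, D)

    pos_sim = pos @ a / temperature            # (P,)
    neg_sim = neg @ a / temperature            # (N,)

    # Contrast each positive against all negatives.
    losses = []
    for s in pos_sim:
        logits = np.concatenate([[s], neg_sim])
        log_prob = s - np.log(np.exp(logits).sum())
        losses.append(-log_prob)
    return float(np.mean(losses))

# Toy usage with random 128-D embeddings.
rng = np.random.default_rng(0)
loss = sketch_anchored_infonce(rng.normal(size=128),
                               rng.normal(size=(4, 128)),
                               rng.normal(size=(16, 128)))
print(f"contrastive loss: {loss:.3f}")
```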
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation, 2nd Edition)
Figures: comparison of mask, language, and sketch references and of the vanilla vs. proposed SKVOS pipelines; Sketch-VOS dataset examples at different drawing levels and styles; overview of the SKVOS architecture (sketch-frame fusion, Temporal Relation Module, Sketch-Anchored Temporal Contrastive Learning, hierarchical mask decoder); sketch-video fusion designs (concatenation, convolution weight, Cross-KV, Cross-Q); Temporal Relation Module implementation; contrastive training cycles; visualized sketch queries and attention-weighted feature maps; qualitative results on Sketch-YouTube-VOS; visual comparison with ReferFormer and SgMg; ablation examples and results for different sketch styles and painting levels.
18 pages, 12390 KiB  
Article
DeiT and Image Deep Learning-Driven Correction of Particle Size Effect: A Novel Approach to Improving NIRS-XRF Coal Quality Analysis Accuracy
by Jiaxin Yin, Ruonan Liu, Wangbao Yin, Suotang Jia and Lei Zhang
Sensors 2025, 25(3), 928; https://doi.org/10.3390/s25030928 - 4 Feb 2025
Viewed by 552
Abstract
Coal, as a vital global energy resource, directly impacts the efficiency of power generation and environmental protection. Thus, rapid and accurate coal quality analysis is essential to promote its clean and efficient utilization. However, combined near-infrared spectroscopy and X-ray fluorescence (NIRS-XRF) analysis often suffers from the particle size effect of coal samples, resulting in unstable and inaccurate analytical outcomes. This study introduces a novel correction method combining the Segment Anything Model (SAM) for precise particle segmentation and Data-Efficient Image Transformers (DeiTs) to analyze the relationship between particle size and ash measurement errors. Microscopic images of coal samples are processed with SAM to generate binary mask images reflecting particle size characteristics. These masks are analyzed using the DeiT model with transfer learning, building an effective correction model. Experiments show a 22% reduction in standard deviation (SD) and root mean square error (RMSE), significantly enhancing ash prediction accuracy and consistency. This approach integrates cutting-edge image processing and deep learning, effectively reducing submillimeter particle size effects, improving model adaptability, and enhancing measurement reliability. It also holds potential for broader applications in analyzing complex samples, advancing automation and efficiency in online analytical systems, and driving innovation across industries. Full article
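The correction pipeline described above (SAM masks, particle-size statistics, predicted ash bias, corrected NIRS-XRF reading) can be outlined as follows. The bias predictor here is a toy linear stand-in for the fine-tuned DeiT regressor, so its features and coefficients are purely illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def particle_size_features(mask: np.ndarray) -> np.ndarray:
    """Summary statistics of particle areas from a binary mask (1 = particle)."""
    labeled, n = ndimage.label(mask)
    if n == 0:
        return np.zeros(3)
    areas = ndimage.sum(mask, labeled, index=np.arange(1, n + 1))
    return np.array([areas.mean(), areas.std(), n], dtype=float)

def predict_ash_bias(features: np.ndarray) -> float:
    """Placeholder for the DeiT-based regressor: a toy linear model on the
    mask features. Coefficients are made up for illustration only."""
    w = np.array([2e-4, 1e-4, -5e-4])
    return float(features @ w)

def corrected_ash(measured_ash: float, mask: np.ndarray) -> float:
    """Subtract the predicted particle-size bias from the NIRS-XRF ash value."""
    return measured_ash - predict_ash_bias(particle_size_features(mask))

# Toy usage: a random binary 'mask' standing in for a SAM segmentation.
rng = np.random.default_rng(1)
mask = (rng.random((256, 256)) > 0.7).astype(np.uint8)
print(corrected_ash(12.4, mask))
```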
(This article belongs to the Special Issue Deep Learning for Perception and Recognition: Method and Applications)
Figures: NIRS-XRF combined coal quality analysis setup; sample cell schematic; construction process of the particle size effect correction model; SAM model structure; comparison of watershed and SAM segmentation; teacher-student distillation training in DeiT; NIRS and XRF spectra of the same coal sample at different particle sizes; image processing workflow; coal particle image comparison; standard deviation and RMSE before and after correction.
18 pages, 5098 KiB  
Article
Research on Energy Efficiency Evaluation System for Rural Houses Based on Improved Mask R-CNN Network
by Liping He, Kun Gao, Yuan Jin, Zhechen Shen, Yane Li, Fang’ai Chi and Meiyan Wang
Sustainability 2025, 17(3), 1132; https://doi.org/10.3390/su17031132 - 30 Jan 2025
Viewed by 603
Abstract
This study addresses the issue of energy efficiency evaluation for rural residential buildings and proposes a method for facade recognition based on an improved Mask R-CNN network model. By introducing the Coordinate Attention (CA) mechanism module, the quality of feature extraction and detection accuracy is enhanced. Experimental results demonstrate that this method effectively recognizes and segments windows, doors, and other components on building facades, accurately extracting key information, such as their dimensions and positions. For energy consumption simulation, this study utilized the Ladybug Tool in the Grasshopper plugin, combined with actual collected facade data, to assess and simulate the energy consumption of rural residences. By setting building envelope parameters and air conditioning operating parameters, detailed calculations of energy consumption for different orientations, window-to-wall ratios, and sunshade lengths were performed. The results show that the improved Mask R-CNN network model plays a crucial role in quickly and accurately extracting building parameters, providing reliable data support for energy consumption evaluation. Finally, through case studies, specific energy-saving retrofit suggestions were proposed, offering robust technical support and practical guidance for energy optimization in rural residences. Full article
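Once facade components are segmented, quantities such as the window-to-wall ratio used in the Ladybug/Grasshopper energy simulation can be read directly off the predicted masks. A minimal sketch, assuming the model returns one binary mask per detected window plus a facade (wall) mask:

```python
import numpy as np

def window_to_wall_ratio(window_masks, wall_mask):
    """Window-to-wall ratio from binary instance masks.

    window_masks: list of HxW boolean arrays, one per detected window.
    wall_mask:    HxW boolean array covering the whole facade (walls incl. openings).
    """
    wall_area = wall_mask.sum()
    if wall_area == 0:
        return 0.0
    window_area = np.logical_or.reduce(window_masks).sum() if window_masks else 0
    return float(window_area) / float(wall_area)

# Toy usage with synthetic masks.
facade = np.ones((100, 200), dtype=bool)
win1 = np.zeros_like(facade); win1[20:40, 30:60] = True
win2 = np.zeros_like(facade); win2[20:40, 120:150] = True
print(f"WWR = {window_to_wall_ratio([win1, win2], facade):.2f}")
```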
Figures: building facade images from different angles and lighting conditions; laboratory room model; orientation interval diagram; window-to-wall ratio and shading length; CA and Mask R-CNN structure; ResNet + CA; comparison of Mask R-CNN with common networks; facade recognition results under different angles and brightness; annual meteorological data for Hangzhou; contour maps of energy consumption intensity; south, east, north, and west elevations and their identification results.
15 pages, 7834 KiB  
Article
A Feature Map Fusion Self-Distillation Scheme for Image Classification Networks
by Zhenkai Qin, Shuiping Ni, Mingfu Zhu, Yue Jia, Shangxin Liu and Yawei Chen
Electronics 2025, 14(1), 182; https://doi.org/10.3390/electronics14010182 - 4 Jan 2025
Viewed by 602
Abstract
Self-distillation has been widely applied in the field of deep learning. However, the lack of interaction between the multiple shallow branches in the self-distillation framework reduces the effectiveness of self-distillation methods. To address this issue, a feature map fusion self-distillation scheme is proposed. According to the depth of the teacher model, multiple shallow branches are constructed as student models to build a self-distillation framework. Then, the feature map fusion module fuses the intermediate feature maps of each branch to enhance the interaction between the branches. Specifically, this fusion module employs a spatial enhancement module to generate attention masks for multiple feature maps, which are averaged and applied to create intermediate maps. The mean of these intermediate maps results in the final fusion feature map. The experimental findings on the CIFAR10 and CIFAR100 datasets illustrate that our proposed technique has clear advantages in increasing the classification accuracy of deep learning models. On average, accuracy boosts of 0.7% and 2.5% are observed on CIFAR10 and CIFAR100, respectively. Full article
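The fusion rule stated in the abstract (per-branch attention masks are averaged, the averaged mask is applied to each branch's feature map, and the resulting intermediate maps are averaged) can be sketched as below; the sigmoid-of-channel-mean attention is a simplification standing in for the paper's spatial enhancement module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(fmap):
    """Stand-in for the spatial enhancement module: a sigmoid over the
    channel-mean gives one HxW attention mask per feature map."""
    return sigmoid(fmap.mean(axis=0))          # (H, W)

def fuse_feature_maps(fmaps):
    """Fusion as described in the abstract: average the per-branch attention
    masks, apply the averaged mask to every branch, then average the results."""
    masks = np.stack([spatial_attention(f) for f in fmaps])   # (B, H, W)
    avg_mask = masks.mean(axis=0)                             # (H, W)
    intermediates = [f * avg_mask for f in fmaps]             # broadcast over channels
    return np.mean(intermediates, axis=0)                     # (C, H, W)

# Toy usage: three branch feature maps with 64 channels at 8x8 resolution.
rng = np.random.default_rng(2)
branches = [rng.normal(size=(64, 8, 8)) for _ in range(3)]
print(fuse_feature_maps(branches).shape)   # (64, 8, 8)
```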
Figures: graphical abstract; structure of the feature map fusion module; structure of the shallow module; ResNet18 equipped with the proposed feature map fusion self-distillation; visualization of feature maps using two attention modules; visualization of prefusion and fusion feature maps.
19 pages, 6643 KiB  
Article
High-Precision Recognition Algorithm for Equipment Defects Based on Mask R-CNN Algorithm Framework in Power System
by Mingyong Xin, Changbao Xu, Jipu Gao, Yu Wang and Bo Wang
Processes 2024, 12(12), 2940; https://doi.org/10.3390/pr12122940 - 23 Dec 2024
Viewed by 522
Abstract
In current engineering applications, target detection based on power vision neural networks suffers from low accuracy and difficulty in recognizing defects. This paper therefore proposes a high-precision substation equipment defect recognition algorithm based on the Mask R-CNN framework to achieve high-precision defect monitoring. The effectiveness of the Mask R-CNN algorithm in substation equipment defect recognition and its applicability to edge computing are compared and analyzed. According to the defect characteristics of different types of substation equipment, defect recognition guidelines were developed; these guidelines help calibrate the existing training set and build defect recognition models for substation equipment based on different algorithms. Finally, a system based on a power edge vision neural network was built, and the feasibility and accuracy of the algorithm were verified by model training and actual target detection results. Full article
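Detection and defect-recognition results of this kind are normally scored by intersection over union (IoU) against ground-truth boxes before a determination threshold is applied. The helper below is a generic illustration of that criterion, not code from the paper.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as correct when IoU exceeds the chosen threshold.
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~ 0.143
```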
(This article belongs to the Section Process Control and Monitoring)
Figures: defect recognition algorithm flowchart; improved FPN (GFPN) architecture; fusion module; CBAM, channel attention, and spatial attention sub-module structures; CNN weight pruning; infrared images of joint heating defects; transformer oil leakage defects; GAN flowchart; IoU example; substation equipment sample set and labeling method; equipment recognition results; oil leakage defect sample and recognition example; precision under different determination thresholds; comparison of the improved algorithm with manual precision.
23 pages, 18600 KiB  
Article
Cross-Modality Data Augmentation for Aerial Object Detection with Representation Learning
by Chiheng Wei, Lianfa Bai, Xiaoyu Chen and Jing Han
Remote Sens. 2024, 16(24), 4649; https://doi.org/10.3390/rs16244649 - 12 Dec 2024
Viewed by 687
Abstract
Data augmentation methods offer a cost-effective and efficient alternative to the acquisition of additional data, significantly enhancing data diversity and model generalization, making them particularly favored in object detection tasks. However, existing data augmentation techniques primarily focus on the visible spectrum and are directly applied to RGB-T object detection tasks, overlooking the inherent differences in image data between the two tasks. Visible images capture rich color and texture information during the daytime, while infrared images are capable of imaging under low-light complex scenarios during the nighttime. By integrating image information from both modalities, their complementary characteristics can be exploited to improve the overall effectiveness of data augmentation methods. To address this, we propose a cross-modality data augmentation method tailored for RGB-T object detection, leveraging masked image modeling within representation learning. Specifically, we focus on the temporal consistency of infrared images and combine them with visible images under varying lighting conditions for joint data augmentation, thereby enhancing the realism of the augmented images. Utilizing the masked image modeling method, we reconstruct images by integrating multimodal features, achieving cross-modality data augmentation in feature space. Additionally, we investigate the differences and complementarities between data augmentation methods in data space and feature space. Building upon existing theoretical foundations, we propose an integrative framework that combines these methods for improved augmentation effectiveness. Furthermore, we address the slow convergence observed with the existing Mosaic method in aerial imagery by introducing a multi-scale training strategy and proposing a full-scale Mosaic method as a complement. This optimization significantly accelerates network convergence. The experimental results validate the effectiveness of our proposed method and highlight its potential for further advancements in cross-modality object detection tasks. Full article
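Masked image modeling of the kind used here starts from a random patch mask that hides part of the image from the encoder before reconstruction. A minimal MAE-style mask generator is sketched below; the patch size and mask ratio are illustrative defaults, not the paper's settings.

```python
import numpy as np

def random_patch_mask(height, width, patch=16, mask_ratio=0.75, rng=None):
    """MAE-style random patch mask: returns an HxW boolean array where True
    marks pixels whose patch is hidden from the encoder. Assumes height and
    width are divisible by the patch size."""
    rng = rng or np.random.default_rng()
    gh, gw = height // patch, width // patch
    n_patches = gh * gw
    n_masked = int(round(mask_ratio * n_patches))
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)
    # Expand each patch decision to pixel resolution.
    return np.kron(grid.astype(np.uint8),
                   np.ones((patch, patch), dtype=np.uint8)).astype(bool)

mask = random_patch_mask(224, 224, rng=np.random.default_rng(3))
print(mask.shape, mask.mean())   # (224, 224) ~0.75
```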
(This article belongs to the Section Remote Sensing Image Processing)
Figures: graphical abstract; label errors introduced by data-space augmentation; the cross-modality feature-space augmentation framework (object filtering and editing module plus RGBTMAE reconstruction module); object removal, replacement, and copy-paste examples; training and reconstruction phases of RGBTMAE; RGBT oriented object detection network based on RoI Transformer; comparison of augmentation methods (CutOut, MixUp, full-scale Mosaic, CycleGAN, Pixel2PixelHD, DR-AVIT, ours) on the DroneVehicle dataset; feature-space augmentation comparison on DroneVehicle and VEDAI; object filtering and editing comparisons; ablation of the two Mosaic methods under different data conditions.
18 pages, 6401 KiB  
Article
Continuous Satellite Image Generation from Standard Layer Maps Using Conditional Generative Adversarial Networks
by Arminas Šidlauskas, Andrius Kriščiūnas and Dalia Čalnerytė
ISPRS Int. J. Geo-Inf. 2024, 13(12), 448; https://doi.org/10.3390/ijgi13120448 - 11 Dec 2024
Viewed by 1041
Abstract
Satellite image generation has a wide range of applications. For example, parts of images must be restored in areas obscured by clouds or cloud shadows, or in areas that must be anonymized. Covering a large area with generated images poses the challenge that separately generated images must maintain structural and color continuity with the adjacent generated images as well as with the actual ones. This study presents a modified architecture of the generative adversarial network (GAN) pix2pix that ensures the integrity of the generated remote sensing images. The pix2pix model comprises a U-Net generator and a PatchGAN discriminator. The generator was modified by expanding the input set with images representing the known parts of the ground truth and the respective mask. Data used for the generative model consist of Sentinel-2 (S2) RGB satellite imagery as the target data and OpenStreetMap mapping data as the input. Since forested areas and fields dominate the images, a Kneedle clusterization method was applied to create datasets that better represent the other classes, such as buildings and roads. The original and updated models were trained on different datasets and their results were evaluated using gradient magnitude (GM), Fréchet inception distance (FID), structural similarity index measure (SSIM), and multiscale structural similarity index measure (MS-SSIM) metrics. The models with the updated architecture show improvement in gradient magnitude, SSIM, and MS-SSIM values for all datasets. The average GMs of the junction region and the full image are similar (the difference does not exceed 7%) for images generated using the modified architecture, whereas the GM is more than 13% higher in the junction area for images generated using the original architecture. The importance of class balancing is demonstrated by the fact that, for both architectures, models trained on the dataset with a higher ratio of classes representing buildings and roads have more than 10% lower FID (162.673 vs. 190.036 for pix2pix and 173.408 vs. 195.621 for the modified architecture) and more than 5% higher SSIM (0.3532 vs. 0.3284 for pix2pix and 0.3575 vs. 0.3345 for the modified architecture) and MS-SSIM (0.3532 vs. 0.3284 for pix2pix and 0.3575 vs. 0.3345 for the modified architecture) values than the models trained on the dataset without clusterization. Full article
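The generator modification described above amounts to concatenating extra inputs, namely the known part of the ground-truth image and a mask of the region to be generated, with the standard OSM input along the channel axis. A small sketch under the assumption of a 3 + 3 + 1 channel layout (the exact layout is not stated in the abstract):

```python
import numpy as np

def build_generator_input(osm_rgb, known_rgb, gen_mask):
    """Stack the inputs described in the abstract into one array for the
    modified U-Net generator: the OSM layer map, the already known part of
    the ground-truth image, and a mask marking where generation is allowed.

    osm_rgb, known_rgb: HxWx3 float arrays in [0, 1]
    gen_mask:           HxW float array, 1 where the generator must fill in pixels
    """
    return np.concatenate([osm_rgb, known_rgb, gen_mask[..., None]], axis=-1)

h, w = 256, 256
x = build_generator_input(np.zeros((h, w, 3)), np.zeros((h, w, 3)), np.ones((h, w)))
print(x.shape)   # (256, 256, 7)
```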
Figures: Sentinel-2 study area raster after preprocessing (LKS-94 coordinates); land-type raster; composition of the two datasets (standard OSM input vs. OSM input with continuation and generation masks); four-image generation for the evaluation zone; standard and modified U-Net generator architectures; Kneedle-based selection of the number of clusters and cluster class distributions; cluster examples; gradient-magnitude comparison of junction areas for pix2pix and pix2pix I7; example generations of models trained on the different datasets.
12 pages, 1950 KiB  
Article
Distance Estimation with a Stereo Camera and Accuracy Determination
by Arnold Zaremba and Szymon Nitkiewicz
Appl. Sci. 2024, 14(23), 11444; https://doi.org/10.3390/app142311444 - 9 Dec 2024
Viewed by 1534
Abstract
Distance measurement plays a key role in many fields of science and technology, including robotics, civil engineering, and navigation systems. This paper focuses on analyzing the precision of a measurement system using stereo camera distance measurement technology in the context of measuring two objects of different sizes. The first part of the paper presents key information about stereoscopy, followed by a discussion of the process of building a measuring station. The Mask R-CNN algorithm, which is a deep learning model that combines object detection and instance segmentation, was used to identify objects in the images. In the following section, the calibration process of the system and the distance calculation method are presented. The purpose of the study was to determine the precision of the measurement system and to identify the distance ranges where the measurements are most precise. Measurements were made in the range of 20 to 70 cm. The system demonstrated a relative error of 0.95% for larger objects and 1.46% for smaller objects at optimal distances. A detailed analysis showed that for larger objects, the system exhibited higher precision over a wider range of distances, while for smaller objects, the highest accuracy was achieved over a more limited range. These results provide valuable information on the capabilities and limitations of the measurement system used, while pointing out directions for its further optimization. Full article
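Stereo distance estimation of this kind ultimately rests on the rectified-camera relation Z = f * B / d. A small sketch of that relation and of the relative-error figure quoted in the abstract, using toy numbers rather than the paper's calibration values:

```python
def stereo_distance(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Classic pinhole stereo relation Z = f * B / d (rectified cameras).
    focal_px:     focal length in pixels
    baseline_m:   distance between the two camera centres in metres
    disparity_px: horizontal pixel shift of the object between left/right images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def relative_error(measured: float, reference: float) -> float:
    """Relative error in percent, as reported in the abstract."""
    return abs(measured - reference) / reference * 100.0

# Toy numbers (not from the paper): f = 700 px, B = 6 cm, disparity = 85 px.
z = stereo_distance(700, 0.06, 85)
print(f"distance = {z:.3f} m, rel. error vs 0.5 m = {relative_error(z, 0.5):.2f}%")
```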
Figures: schematic of the stereo-camera test bed; calibration flow chart; illustration of the calculation method; distance measurements of a cup and of a LEGO figure; comparison of results for both objects.
20 pages, 12213 KiB  
Article
Towards Supporting Satellite Design Through the Top-Down Approach: A General Model for Assessing the Ability of Future Satellite Missions to Quantify Point Source Emissions
by Lu Yao, Dongxu Yang, Zhe Jiang, Yi Liu, Lixu Chen, Longfei Tian, Janne Hakkarainen, Zhaonan Cai, Jing Wang and Xiaoyu Ren
Remote Sens. 2024, 16(23), 4503; https://doi.org/10.3390/rs16234503 - 30 Nov 2024
Viewed by 923
Abstract
Monitoring and accurately quantifying greenhouse gas (GHG) emissions from point sources via satellite measurements is crucial for validating emission inventories. Numerous studies have applied varied methods to estimate emission intensities from both natural and anthropogenic point sources, highlighting the potential of satellites for point source quantification. To promote the development of the space-based GHG monitoring system, it is pivotal to assess a satellite's capacity to quantify emissions from distinct sources before its design and launch. However, no universal method currently exists for quantitatively assessing the ability of satellites to quantify point source emissions. This paper presents a parametric conceptual model and database for efficiently evaluating the quantification capabilities of satellites and optimizing their technical characteristics for particular detection missions. Using the model and database, we evaluated how well various satellites can detect and quantify GHG emissions. Our findings indicate that accurate estimation of point source emissions requires both high spatial resolution and high measurement precision. The requirement for satellite spatial resolution and measurement precision to achieve unbiased emission estimation gradually decreases with increasing emission intensity. The model and database developed in this study can serve as a reference for a harmonious satellite configuration that balances measurement precision and spatial resolution. Furthermore, to progress the evaluation model of satellites for low-intensity emission point sources, it is imperative to implement a more precise simulation model and estimation method with a refined mask-building approach. Full article
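The example plumes in this work's figures are simulated with a Gaussian dispersion model. The sketch below is the textbook Gaussian plume equation with simple power-law dispersion coefficients as placeholders; the authors' exact parameterization is not reproduced here.

```python
import numpy as np

def gaussian_plume(q_kg_s, u_m_s, x, y, z=0.0, stack_h=0.0):
    """Textbook Gaussian plume concentration (kg/m^3) downwind of a point source.
    q_kg_s: emission rate, u_m_s: wind speed, (x, y, z): receptor coordinates with
    x downwind. The sigma_y/sigma_z coefficients are a simple neutral-stability
    style placeholder, not the paper's own parameterization."""
    x = np.maximum(np.asarray(x, dtype=float), 1.0)    # avoid sigma = 0 at the source
    sigma_y = 0.08 * x / np.sqrt(1.0 + 0.0001 * x)
    sigma_z = 0.06 * x / np.sqrt(1.0 + 0.0015 * x)
    lateral = np.exp(-y**2 / (2.0 * sigma_y**2))
    vertical = (np.exp(-(z - stack_h)**2 / (2.0 * sigma_z**2))
                + np.exp(-(z + stack_h)**2 / (2.0 * sigma_z**2)))
    return q_kg_s / (2.0 * np.pi * u_m_s * sigma_y * sigma_z) * lateral * vertical

# 5 Mt/yr source (as in the example figures), 3 m/s wind, receptor 1 km downwind.
q = 5e9 / (365 * 24 * 3600)          # Mt/yr -> kg/s
print(gaussian_plume(q, 3.0, x=1000.0, y=0.0))
```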
(This article belongs to the Section Atmospheric Remote Sensing)
Figures: CO2 and CH4 column enhancements simulated with the Gaussian dispersion model at several grid resolutions and wind speeds (3, 5, and 8 m/s); spatial resolution and measurement precision requirements for detecting CO2 and CH4 emissions at each wind speed; pseudo plumes and column enhancements for CO2 and CH4 point sources; satellite quantification ability (mean and standard deviation of relative bias) for CO2 sources of 1-60 Mt/yr and CH4 sources of 100-2000 kg/h; test of the number of samples in the Gaussian normal fit.
12 pages, 6649 KiB  
Article
Masked Image Modeling Meets Self-Distillation: A Transformer-Based Prostate Gland Segmentation Framework for Pathology Slides
by Haoyue Zhang, Sushant Patkar, Rosina Lis, Maria J. Merino, Peter A. Pinto, Peter L. Choyke, Baris Turkbey and Stephanie Harmon
Cancers 2024, 16(23), 3897; https://doi.org/10.3390/cancers16233897 - 21 Nov 2024
Cited by 1 | Viewed by 1000
Abstract
Detailed evaluation of prostate cancer glands is an essential yet labor-intensive step in grading prostate cancer. Gland segmentation can serve as a valuable preliminary step for machine-learning-based downstream tasks, such as Gleason grading, patient classification, cancer biomarker building, and survival analysis. Despite its importance, there is currently a lack of a reliable gland segmentation model for prostate cancer. Without accurate gland segmentation, researchers rely on cell-level or human-annotated regions of interest for pathomic and deep feature extraction. This approach is sub-optimal, as the extracted features are not explicitly tailored to gland information. Although foundational segmentation models have gained a lot of interest, we demonstrated the limitations of this approach. This work proposes a prostate gland segmentation framework that utilizes a dual-path Swin Transformer UNet structure and leverages Masked Image Modeling for large-scale self-supervised pretraining. A tumor-guided self-distillation step further fused the binary tumor labels of each patch into the encoder to ensure the encoders are suitable for the gland segmentation step. We united heterogeneous data sources for self-supervised training, including biopsy and surgical specimens, to reflect the diversity of benign and cancerous pathology features. We evaluated the segmentation performance on two publicly available prostate cancer datasets. We achieved state-of-the-art segmentation performance with a test mDice of 0.947 on the PANDA dataset and a test mDice of 0.664 on the SICAPv2 dataset. Full article
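The mDice figures quoted above average the Dice coefficient over cases; the helper below shows the underlying per-mask computation on binary masks.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice score between two binary masks, the metric behind the mDice
    figures quoted above (mDice averages this over classes or cases)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Toy example: two overlapping square "glands".
a = np.zeros((64, 64), dtype=np.uint8); a[10:40, 10:40] = 1
b = np.zeros((64, 64), dtype=np.uint8); b[20:50, 20:50] = 1
print(f"Dice = {dice_coefficient(a, b):.3f}")   # 800 / 1800 ~ 0.444
```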
(This article belongs to the Section Methods and Technologies Development)
Figures: sample slides from the SICAPv2, PANDA, and in-house NCI cohorts; overview of the dual-path segmentation architecture and the self-supervised pretraining and self-distillation schema; sample segmentation results for glands of different Gleason grades across methods.
28 pages, 45529 KiB  
Article
High-Quality Damaged Building Instance Segmentation Based on Improved Mask Transfiner Using Post-Earthquake UAS Imagery: A Case Study of the Luding Ms 6.8 Earthquake in China
by Kangsan Yu, Shumin Wang, Yitong Wang and Ziying Gu
Remote Sens. 2024, 16(22), 4222; https://doi.org/10.3390/rs16224222 - 13 Nov 2024
Viewed by 1034
Abstract
Unmanned aerial systems (UASs) are increasingly playing a crucial role in earthquake emergency response and disaster assessment due to their ease of operation, mobility, and low cost. However, post-earthquake scenes are complex, with many forms of damaged buildings. UAS imagery has a high spatial resolution, but the resolution is inconsistent between different flight missions. These factors make it challenging for existing methods to accurately identify individual damaged buildings in UAS images from different scenes, resulting in coarse segmentation masks that are insufficient for practical application needs. To address these issues, this paper proposed DB-Transfiner, a building damage instance segmentation method for post-earthquake UAS imagery based on the Mask Transfiner network. This method primarily employed deformable convolution in the backbone network to enhance adaptability to collapsed buildings of arbitrary shapes. Additionally, it used an enhanced bidirectional feature pyramid network (BiFPN) to integrate multi-scale features, improving the representation of targets of various sizes. Furthermore, a lightweight Transformer encoder was used to process edge pixels, enhancing the efficiency of global feature extraction and the refinement of target edges. We conducted experiments on post-disaster UAS images collected from the 2022 Luding earthquake with a surface wave magnitude (Ms) of 6.8 in the Sichuan Province of China. The results demonstrated that the average precisions (AP) of DB-Transfiner, AP<sub>box</sub> and AP<sub>seg</sub>, are 56.42% and 54.85%, respectively, outperforming all other comparative methods. Our model improved on the original model by 5.00% and 4.07% in AP<sub>box</sub> and AP<sub>seg</sub>, respectively. Importantly, the AP<sub>seg</sub> of our model was significantly higher than that of the state-of-the-art instance segmentation model Mask R-CNN, with an increase of 9.07%. In addition, we conducted applicability testing, and the model achieved an average correctness rate of 84.28% for identifying images from different scenes of the same earthquake. We also applied the model to the Yangbi earthquake scene and found that the model maintained good performance, demonstrating a certain level of generalization capability. This method has high accuracy in identifying and assessing damaged buildings after earthquakes and can provide critical data support for disaster loss assessment. Full article
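To illustrate the deformable-convolution idea used in the backbone, the sketch below relies on torchvision's DeformConv2d: an ordinary convolution predicts per-position sampling offsets, and the deformable convolution samples input features at those shifted locations, which lets the receptive field follow irregular building outlines. The channel sizes and block layout are assumptions for illustration, not the paper's exact DCNM.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """A plain conv predicts 2*k*k sampling offsets per position; DeformConv2d
    then gathers input features at the shifted sampling points."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.dconv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        offsets = self.offset(x)              # (B, 2*k*k, H, W) x/y shift per kernel tap
        return self.act(self.dconv(x, offsets))

out = DeformableBlock(64, 128)(torch.randn(1, 64, 56, 56))   # -> (1, 128, 56, 56)
```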
Show Figures

Graphical abstract
Full article ">Figure 1
<p>The study area and UAS orthophotos after the earthquake in Luding County, Sichuan Province. (<b>A</b>) study area; (<b>B</b>) UAS orthophotos: (<b>a</b>,<b>c</b>) Moxi town; (<b>b</b>,<b>d</b>,<b>g</b>) Detuo town; (<b>e</b>) Fawang village; (<b>f</b>) Wandong village.</p>
Full article ">Figure 2
<p>The samples of damaged buildings and labels: (<b>a</b>) Field investigation photos; (<b>b</b>) UAS images, the red fan-shaped marker representing the viewing angle of the observation location; (<b>c</b>) Labeled bounding boxes; (<b>d</b>) Labeled instance masks, the color of the polygon masks represents different instance objects.</p>
Full article ">Figure 3
<p>The network architecture of Mask Transfiner.</p>
Full article ">Figure 4
<p>The improved network architecture for DB-Transfiner. Deformable convolution is employed in the backbone. The FPN is replaced by enhanced BiFPN to fuse the multi-scale features, and, in this study, a lightweight sequence encoder is adopted for efficiency.</p>
Full article ">Figure 5
<p>Deformable convolution feature extraction module. Arrows indicate the type of convolution used at each stage. The first two stages use standard convolution, and the last three stages use deformable convolution. (<b>a</b>) Standard convolution; (<b>b</b>) Deformable convolution.</p>
Full article ">Figure 6
<p>Replacing FPN with enhanced BiFPN to improve feature fusion network.</p>
Full article ">Figure 7
<p>Lightweight sequence encoder to improve the efficiency of the network, using a Transformer structure with an eight-headed self-attention mechanism instead of three Transformer structures with four-headed self-attention mechanisms.</p>
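A rough way to see why this is lighter: in standard multi-head attention the head count does not change the parameter count, so the savings come from replacing three encoder layers with a single one. The comparison below assumes a feature width of 256, which may differ from the paper's setting.

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

dim = 256   # assumed feature width for illustration
one_layer_8_heads = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=1)
three_layers_4_heads = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=3)

# Roughly a threefold difference, driven by the layer count rather than the head count.
print(n_params(one_layer_8_heads), n_params(three_layers_4_heads))
```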
Full article ">Figure 8
<p>Loss curve during DB-Transfiner training.</p>
Full article ">Figure 9
<p>Comparison of the performance of all models based on the metrics <span class="html-italic">AP</span> (%).</p>
Full article ">Figure 10
<p>Visualization of the prediction results of different network models. The colored bounding boxes and polygons represent the detection and segmentation results, respectively. (<b>a</b>) Annotated images; (<b>b</b>) Mask R-CNN; (<b>c</b>) Mask Transfiner; (<b>d</b>) DB-Transfiner.</p>
Full article ">Figure 11
<p>Visualization of instance mask results of different network models. The colored polygons represent the recognized instance objects. ① and ② represent two typical damaged buildings with the same level of destruction. (<b>a</b>) Original images; (<b>b</b>) Annotated results; (<b>c</b>) Mask R-CNN; (<b>d</b>) Mask Transfiner; (<b>e</b>) DB-Transfiner.</p>
Full article ">Figure 12
<p>Visualization of heatmaps: (<b>a</b>) The original images; (<b>b</b>) The heatmaps of the Conv2_x layer of the DCNM; (<b>c</b>) The heatmaps of the Conv5_x layer of the DCNM; (<b>d</b>) The heatmaps of the N<sub>5</sub> layer of the MEFM; (<b>e</b>) The final results. The colored borders represent the different instance objects predicted by the model.</p>
Full article ">Figure 13
<p>The visualization of feature maps before and after the LTGM. The colored borders represent the different instance objects.</p>
Full article ">Figure 14
<p>Results of damaged building classification in Fawang village (<a href="#remotesensing-16-04222-f001" class="html-fig">Figure 1</a>B(e)). Red indicates correct detections, green indicates incorrect detections, and yellow indicates missed detections.</p>
Full article ">Figure 15
<p>Results of damaged building classification in Wandong village and Detuo town (<a href="#remotesensing-16-04222-f001" class="html-fig">Figure 1</a>B(f,g)). Red indicates correct detections, green indicates incorrect detections, and yellow indicates missed detections.</p>
Full article ">Figure 16
<p>Example of UAV imagery from the Yangbi earthquake in Yunnan, China: (<b>a</b>) Huaian village; (<b>b</b>) Yangbi town.</p>
Full article ">Figure 17
<p>UAS imagery samples of damaged buildings from the Yangbi earthquake. (<b>a</b>) The red irregular polygons denote the damaged buildings. (<b>b</b>) The bounding boxes and polygon masks are the visualized results of our model. The colors represent different instance objects.</p>
Full article ">Figure 18
<p>Examples of densely built-up areas. The red boxes indicate buildings with blurred contour information caused by shadows and occlusions.</p>
Full article ">
13 pages, 326 KiB  
Article
Factors Associated with Impact of Event Scores Among Ontario Education Workers During the COVID-19 Pandemic
by Iris Gutmanis, Brenda L. Coleman, Robert G. Maunder, Kailey Fischer, Veronica Zhu and Allison McGeer
Int. J. Environ. Res. Public Health 2024, 21(11), 1448; https://doi.org/10.3390/ijerph21111448 - 31 Oct 2024
Viewed by 1064
Abstract
There is limited information regarding factors related to education workers’ responses to traumatic stress during the COVID-19 pandemic. The study goal was to determine whether personal factors, behaviours that mitigate viral spread, and work-related factors were associated with post-traumatic symptoms. This observational study, embedded within a cohort study, recruited Ontario education workers from February 2021 to June 2023. Exposure data were collected at enrollment and updated annually. Participants completed the Impact of Event Scale (IES) at withdrawal/study completion. Modified Poisson regression was used to build hierarchical models of dichotomized IES scores (≥26: moderate/severe post-traumatic symptoms). Of the 1518 education workers who submitted an IES between September 2022 and December 2023, the incidence rate ratio of IES scores ≥26 was significantly higher among participants who usually/always wore a mask at work (1.48; 95% confidence interval 1.23, 1.79), usually/always practiced physical distancing (1.31; 1.06, 1.62), lived in larger households (1.06; 1.01, 1.12), and reported poor/fair/good health (1.27; 1.11, 1.46). However, models accounted for little of the variance in IES scores, suggesting the need for future studies to collect data on other factors associated with the development of PTSD, such as pre-existing mental health challenges. Early identification of those experiencing traumatic stress and the implementation of stress reduction strategies are needed to ensure the ongoing health of education workers. Full article
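For readers unfamiliar with the model named in the abstract: "modified Poisson regression" fits a Poisson GLM to a binary outcome with a robust (sandwich) variance estimator, so exponentiated coefficients read as rate ratios. A toy statsmodels sketch with invented column names and synthetic data, not the study's variables, looks like this.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "high_ies": rng.integers(0, 2, n),          # 1 if IES score >= 26
    "mask_always": rng.integers(0, 2, n),
    "distancing": rng.integers(0, 2, n),
    "household_size": rng.integers(1, 6, n),
    "fair_or_poor_health": rng.integers(0, 2, n),
})

# Poisson GLM on the binary outcome with a robust covariance ("modified Poisson"),
# so exp(coefficient) is interpretable as an incidence/prevalence rate ratio.
fit = smf.glm(
    "high_ies ~ mask_always + distancing + household_size + fair_or_poor_health",
    data=df, family=sm.families.Poisson(),
).fit(cov_type="HC1")

print(np.exp(fit.params))       # rate ratios
print(np.exp(fit.conf_int()))   # 95% confidence intervals
```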
21 pages, 7110 KiB  
Article
Pose Tracking and Object Reconstruction Based on Occlusion Relationships in Complex Environments
by Xi Zhao, Yuekun Zhang and Yaqing Zhou
Appl. Sci. 2024, 14(20), 9355; https://doi.org/10.3390/app14209355 - 14 Oct 2024
Viewed by 1286
Abstract
For the reconstruction of objects during hand–object interactions, accurate pose estimation is indispensable. By improving the precision of pose estimation, the accuracy of the 3D reconstruction results can be enhanced. Pose tracking techniques are no longer limited to individual objects, leading to advancements in the reconstruction of objects interacting with other objects. However, most methods struggle to handle incomplete target information in complex scenes and mutual interference between objects in the environment, leading to a decrease in pose estimation accuracy. We proposed an improved algorithm building upon the existing BundleSDF framework, which enables more robust and accurate tracking by considering the occlusion relationships between objects. First, to detect changes in occlusion relationships, we segment the target and compute dual-layer masks. Second, a rough pose estimate is obtained through feature matching, and a keyframe pool, maintained based on occlusion relationships, is introduced for pose optimization. Finally, the estimated results of historical frames are used to train an object neural field to assist in the subsequent pose-tracking process. Experimental verification shows that, on the HO-3D dataset, our method can significantly improve the accuracy and robustness of object tracking under frequent interactions, providing new ideas for object pose-tracking tasks in complex scenes. Full article
(This article belongs to the Special Issue Technical Advances in 3D Reconstruction)
Show Figures

Figure 1
<p>Overview of our system. First, we compute the image mask of the target object, then feed the mask segmentation result into the feature-matching network to obtain a coarse pose estimation using the Umeyama [<a href="#B19-applsci-14-09355" class="html-bibr">19</a>] algorithm with the feature-matching result of the previous frame. Second, we use a dual-layer mask-based strategy to select frames from the key-frame pool that have a strong co-visibility relationship with the current frame and perform joint optimization with the current frame to obtain the final pose estimation result.</p>
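The Umeyama step referenced in this caption solves, in closed form, for the similarity transform (rotation, translation, scale) that best aligns two matched point sets in the least-squares sense. A self-contained NumPy sketch of that alignment, independent of the authors' pipeline, follows.

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity transform (R, t, s) mapping src onto dst,
    where src and dst are (N, 3) arrays of matched 3D points (Umeyama, 1991)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return R, t, s

# Sanity check: recover a known rigid transform from exact correspondences.
R_true = np.linalg.qr(np.random.randn(3, 3))[0]
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1
src = np.random.randn(100, 3)
dst = (R_true @ src.T).T + np.array([0.1, -0.2, 0.3])
R, t, s = umeyama(src, dst)
assert np.allclose(R, R_true, atol=1e-6) and np.isclose(s, 1.0)
```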
Full article ">Figure 2
<p>Visualization of the dual-layer mask results (yellow: target object mask; pink: detected foreground occlusions).</p>
Full article ">Figure 3
<p>The mask-difference images between adjacent frames and the original RGB image are shown above. (<b>a</b>) The original RGB image of this scene. (<b>b</b>) The superimposed foreground occlusion masks of adjacent frames: white indicates overlapping areas, while gray indicates non-overlapping areas. (<b>c</b>) The change in the foreground occlusion masks between adjacent frames (white areas).</p>
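The change detection sketched in this figure boils down to comparing the foreground-occlusion masks of adjacent frames. A tiny NumPy example with synthetic boolean masks reproduces the overlap region of panel (b) and the changed region of panel (c); the ratio computed at the end is illustrative, not the authors' exact criterion.

```python
import numpy as np

# Synthetic boolean foreground-occlusion masks for two adjacent frames.
prev_occ = np.zeros((4, 6), dtype=bool); prev_occ[1:3, 1:4] = True
curr_occ = np.zeros((4, 6), dtype=bool); curr_occ[1:3, 2:5] = True

overlap = prev_occ & curr_occ        # rendered white in panel (b)
changed = prev_occ ^ curr_occ        # the changed portion highlighted in panel (c)

# A large changed-to-current ratio signals that the occlusion relationship moved.
change_ratio = changed.sum() / max(curr_occ.sum(), 1)
print(overlap.sum(), changed.sum(), round(change_ratio, 2))
```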
Full article ">Figure 4
<p>The partial results of our method on the HO-3D dataset are visualized above. Each row represents the results of a video in the dataset, where the green bounding box indicates the ground truth pose, and the red bounding box represents the predicted results of our method.</p>
Full article ">Figure 5
<p>Visualizations of partial results of our method on the self-made indoor interaction dataset are presented. Each row represents the results of one video in the dataset.</p>
Full article ">Figure 6
<p>Experiment with moving occluders: qualitative comparison with BundleSDF on the custom dataset. On the left are the visualized results of the estimated poses of the target (yellow box) by the two methods, while on the right, the final reconstructed meshes of the two methods are displayed.</p>
Full article ">Figure 7
<p>The partial pose tracking results for the <span class="html-italic"><b>video_switch</b></span> dataset are shown below: yellow wireframes represent the pose estimation results of the method. (<b>a</b>) The results of the comparative method, BundleSDF; (<b>b</b>) the results of our method.</p>
Full article ">Figure 8
<p>The comparison of reconstruction results for the <span class="html-italic"><b>video_switch</b></span> dataset. The first row depicts the reconstruction results by BundleSDF, while the second row shows the reconstruction results by our method. From left to right, the observations are from the frontal, side, and top views of the object, respectively.</p>
Full article ">Figure 9
<p>The partial results of the tracking for <span class="html-italic"><b>video_cup</b></span> are shown above. The left two columns display the original images and local enlargements of the pose estimation results by BundleSDF, while the right two columns show the original images and local enlargements of the pose estimation results by our method.</p>
Full article ">Figure 10
<p>Comparison of tracking and reconstruction results for the <span class="html-italic"><b>video_cup</b></span> dataset. The first row shows the results obtained with BundleSDF, while the second row presents the output from our method.</p>
Full article ">
15 pages, 1641 KiB  
Article
Interactive Segmentation for Medical Images Using Spatial Modeling Mamba
by Yuxin Tang, Yu Li, Hua Zou and Xuedong Zhang
Information 2024, 15(10), 633; https://doi.org/10.3390/info15100633 - 14 Oct 2024
Viewed by 1610
Abstract
Interactive segmentation methods utilize user-provided positive and negative clicks to guide the model in accurately segmenting target objects. Compared to fully automatic medical image segmentation, these methods can achieve higher segmentation accuracy with limited image data, demonstrating significant potential in clinical applications. Typically, for each new click provided by the user, conventional interactive segmentation methods reprocess the entire network by re-inputting the click into the segmentation model, which greatly increases the user’s interaction burden and deviates from the intended goal of interactive segmentation tasks. To address this issue, we propose an efficient segmentation network, ESM-Net, for interactive medical image segmentation. It obtains high-quality segmentation masks based on the user’s initial clicks, reducing the complexity of subsequent refinement steps. Recent studies have demonstrated the strong performance of the Mamba model in various vision tasks; however, its application in interactive segmentation remains unexplored. In our study, we incorporate the Mamba module into our framework for the first time and enhance its spatial representation capabilities by developing a Spatial Augmented Convolution (SAC) module. These components are combined as the fundamental building blocks of our network. Furthermore, we designed a novel and efficient segmentation head to fuse multi-scale features extracted from the encoder, optimizing the generation of the predicted segmentation masks. Through comprehensive experiments, our method achieved state-of-the-art performance on three medical image datasets. Specifically, we achieved 1.43 NoC@90 on the Kvasir-SEG dataset, 1.57 NoC@90 on the CVC-ClinicDB polyp segmentation dataset, and 1.03 NoC@90 on the ADAM retinal disk segmentation dataset. The assessments on these three medical image datasets highlight the effectiveness of our approach in interactive medical image segmentation. Full article
(This article belongs to the Special Issue Applications of Deep Learning in Bioinformatics and Image Processing)
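The NoC@90 figures quoted in the abstract count how many clicks are needed before the predicted mask first reaches 90% IoU, averaged over the test set. The sketch below shows the conventional way this metric is computed; the click budget charged to failure cases is an assumption and may differ from the authors' protocol.

```python
import numpy as np

def noc_at(ious_per_click, thresh=0.90, max_clicks=20):
    """Mean Number of Clicks: for each image, the first click index at which the
    IoU reaches `thresh`; images that never reach it are charged `max_clicks`."""
    counts = []
    for ious in ious_per_click:                       # one IoU-per-click sequence per image
        hits = np.flatnonzero(np.asarray(ious) >= thresh)
        counts.append(hits[0] + 1 if hits.size else max_clicks)
    return float(np.mean(counts))

# Two toy images: one reaches 0.9 IoU on the 2nd click, the other on the 1st.
print(noc_at([[0.74, 0.91, 0.93], [0.95]]))           # -> 1.5
```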
Show Figures

Graphical abstract
Full article ">Figure 1
<p>ESM-Click Overview. Our model comprises two stages: preliminary segmentation and refinement segmentation. The encoded image and click features are fed into our proposed ESM-Net segmentation network to extract target-aware features and generate a coarse segmentation mask guided by the initial click. Starting from the second click, the new user-provided click is fed into the refinement network to optimize the details of the previously generated coarse mask. By iteratively executing the refinement network, a high-quality prediction mask is eventually produced.</p>
Full article ">Figure 2
<p>The Overall Architecture of ESM-Net integrates Spatial Augmented Convolution (SAC), Mamba modules, and MBConv for downsampling within the encoder module. (<b>a</b>) The Spatial Augmented Convolution Module enhances the spatial representation of features before input to the Mamba Module using a gate-like structure. (<b>b</b>) The Mamba Module transforms input features into feature sequences and processes them with SS2D to obtain comprehensive features from the merged sequences. (<b>c</b>) KAN SegHead receives multi-scale features from the encoder and utilizes KANLinear layers to output the final segmentation mask.</p>
Full article ">Figure 3
<p>The mean Intersection over Union (mIoU) and mean Dice coefficient (mDice) scores corresponding to the predictions obtained per click using different methods on the Kvasir-SEG and Clinic datasets.</p>
Full article ">Figure 4
<p>Qualitative results of ESM-Click. The first row illustrates example segmentations from the Kvasir-SEG dataset. The second row presents segmentation examples from the Clinic dataset with varying numbers of clicks. The third row showcases interactive segmentation cases from the ADAM dataset. Segmentation probability maps are depicted in blue; segmentation overlays on the original images are shown in red using the IoU evaluation metric. Green dots indicate positive clicks, while red dots indicate negative clicks.</p>
Full article ">