
Search Results (30)

Search Parameters:
Keywords = automatic mask removal

15 pages, 2289 KiB  
Article
Automatic Watershed Segmentation of Cancerous Lesions in Unsupervised Breast Histology Images
by Vincent Majanga and Ernest Mnkandla
Appl. Sci. 2024, 14(22), 10394; https://doi.org/10.3390/app142210394 - 12 Nov 2024
Viewed by 422
Abstract
Segmentation of nuclei in histology images is key to analyzing and quantifying morphological changes in nuclei features and tissue structures. Conventional diagnosis, segmentation, and detection methods have relied heavily on manual visual inspection of histology images; they are effective only on clearly visible cancerous lesions and are therefore limited by the complexity of tissue structures. Early detection of breast cancer is key for treatment and benefits from computer-aided diagnostic (CAD) systems introduced to efficiently and automatically segment and detect nuclei in pathology images. This paper proposes an automatic watershed segmentation method for cancerous lesions in unsupervised human breast histology images. Firstly, the approach pre-processes the data through various augmentation methods to increase the number of dataset images, and a stain normalization technique is then applied to the augmented images to isolate nuclei features from tissue structures. Secondly, data enhancement techniques, namely erosion, dilation, and the distance transform, are used to highlight foreground and background pixels while removing unwanted regions from the highlighted nuclei objects. The connected components method then groups highlighted pixels with similar intensity values and assigns them to their corresponding labeled binary mask. Once all binary mask groups have been determined, a deep-learning recurrent neural network built with Keras uses this information to automatically segment nuclei objects with cancerous lesions and their edges via watershed filling. The method is evaluated on an unsupervised, augmented human breast cancer histology dataset of 11,151 images and achieves an F1 score of 98%. Full article
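The pre-processing and watershed steps this abstract walks through (erosion/dilation, a distance transform, connected-component markers, watershed filling) map closely onto a classic OpenCV pipeline. The sketch below is a minimal illustration on a synthetic patch, not the authors' Keras-based method; the kernel size, iteration counts, and the 0.5 distance-transform threshold are assumptions.

```python
# Marker-based watershed sketch (synthetic data, assumed parameters); the paper's
# augmentation, stain normalization, and recurrent network are omitted.
import cv2
import numpy as np

# Synthetic "histology" patch: three overlapping bright nuclei on a dark background.
img = np.zeros((200, 200, 3), np.uint8)
for center in [(70, 80), (100, 95), (135, 115)]:
    cv2.circle(img, center, 30, (160, 60, 180), thickness=-1)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

kernel = np.ones((3, 3), np.uint8)
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel, iterations=2)  # erosion + dilation
sure_bg = cv2.dilate(opened, kernel, iterations=3)                       # sure background

dist = cv2.distanceTransform(opened, cv2.DIST_L2, 5)                     # distance transform
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)               # sure foreground cores
sure_fg = sure_fg.astype(np.uint8)
unknown = cv2.subtract(sure_bg, sure_fg)                                 # ambiguous border band

_, markers = cv2.connectedComponents(sure_fg)   # one labelled marker per nucleus
markers = markers + 1                           # keep label 0 free for the unknown band
markers[unknown == 255] = 0

markers = cv2.watershed(img, markers)           # flood-fill the image from the markers
img[markers == -1] = (0, 0, 255)                # object boundaries drawn in red
cv2.imwrite("watershed_overlay.png", img)
```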
Figures
Figure 1: Breast cancer histology images.
Figure 2: Original H&E image, augmented H&E image, normalized H&E image, normalized H image, and normalized E image, respectively.
Figure 3: Images after Otsu thresholding.
Figure 4: Image after noise removal via thresholding.
Figure 5: Image after clearing borders via the opening morphology operation.
Figure 6: Sure background image after dilation morphology.
Figure 7: Distance transform image.
Figure 8: Thresholding after distance transformation.
Figure 9: Connected component images.
Figure 10: Unsupervised BC (normalized H) histology images resulting from the proposed watershed segmentation method.
Figure 11: First row: original histology images of different human glands provided by the Warwick QU Dataset in the Kaggle dataset repository. Second row: resultant images after segmentation using the proposed watershed method.
Figure 12: Training loss and accuracy curves.
Figure 13: Validation loss and accuracy curves.

20 pages, 7605 KiB  
Article
A Novel Adversarial Example Detection Method Based on Frequency Domain Reconstruction for Image Sensors
by Shuaina Huang, Zhiyong Zhang and Bin Song
Sensors 2024, 24(17), 5507; https://doi.org/10.3390/s24175507 - 25 Aug 2024
Viewed by 1184
Abstract
Convolutional neural networks (CNNs) have been extensively used in numerous remote sensing image detection tasks owing to their exceptional performance. Nevertheless, CNNs are often vulnerable to adversarial examples, which limits their use in safety-critical scenarios. Recently, how to efficiently detect adversarial examples and improve the robustness of CNNs has drawn considerable attention. Existing adversarial example detection methods require modifying the CNN, which not only affects model performance but also greatly increases the training cost. To solve these problems, this study proposes a detection algorithm for adversarial examples that does not require modification of the CNN models and simultaneously retains the classification accuracy of normal examples. Specifically, we design a method to detect adversarial examples using frequency domain reconstruction. After converting the input adversarial examples into the frequency domain by Fourier transform, the disturbance introduced by adversarial attacks can be eliminated by modifying the frequencies of the example. The inverse Fourier transform is then used to maximize the recovery of the original example. Firstly, we train a CNN to reconstruct input examples. Then, we insert a Fourier transform, a convolution operation, and an inverse Fourier transform into the feature pipeline of the input examples to automatically filter out adversarial frequencies. We refer to the proposed method as FDR (frequency domain reconstruction); it removes adversarial interference by converting input samples into the frequency domain and reconstructing them back into the spatial domain to restore the image. In addition, we introduce gradient masking into FDR to enhance the detection accuracy for complex adversarial examples. We conduct extensive experiments with five mainstream adversarial attacks on three benchmark datasets, and the results show that FDR outperforms state-of-the-art solutions in detecting adversarial examples. Additionally, FDR does not require any modifications to the detector and can be integrated with other adversarial example detection methods and installed in sensing devices to ensure detection safety. Full article
(This article belongs to the Section Sensing and Imaging)
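As a rough illustration of the frequency-domain reconstruction idea (not the trained FDR network itself), the sketch below low-pass filters an example in the Fourier domain and reconstructs it with the inverse transform; a large change in the classifier's output between an input and its reconstruction would then flag a likely adversarial example. The cutoff radius and the random stand-in input are assumptions.

```python
# Frequency-domain reconstruction sketch (assumed cutoff, synthetic input).
import numpy as np

def frequency_reconstruct(image: np.ndarray, keep_radius: int = 24) -> np.ndarray:
    """Suppress high frequencies in the Fourier domain and return the reconstruction."""
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= keep_radius ** 2
    if spectrum.ndim == 3:                      # broadcast the mask over colour channels
        mask = mask[..., None]
    recon = np.fft.ifft2(np.fft.ifftshift(spectrum * mask, axes=(0, 1)), axes=(0, 1))
    return np.real(recon)

x = np.random.rand(224, 224, 3).astype(np.float32)   # stand-in for an input example
x_rec = frequency_reconstruct(x)
# Comparing a classifier's predictions on x and x_rec would provide the detection signal.
print(x_rec.shape, float(np.abs(x - x_rec).mean()))
```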
Figures
Figure 1: Comparison of frequency domain graphs of original example, adversarial example and reconstructed example. The first column is the spatial domain diagram of the original example, and the rest is the frequency domain diagram.
Figure 2: (a) The FDR structure first extracts the features of the input attack example, then performs Fourier transform to the frequency domain, and finally eliminates the attack in the frequency domain and reconstructs the original image. (b) The FDR with gradient masking structure first resizes the attack examples and extracts the features of the input attack examples, then performs Fourier transform to the frequency domain, and finally eliminates the attack in the frequency domain and reconstructs the original image before resizing to recover the input size.
Figure 3: We select MJSMA attack algorithms to conduct experiments on three datasets and the number of iterations is 10–100.
Figure 4: We select four attack algorithms to conduct experiments on the MNIST dataset and the number of iterations of DeepFool and C&W attack algorithm is 10 or 100, respectively. We compare the proposed algorithm with 3 different detection algorithms.
Figure 5: We select four attack algorithms to conduct experiments on the CIFAR10 dataset, and the disturbance value of the FGSM and BIM attack algorithms is 0.03–0.30. We compare the proposed algorithm with 3 different detection algorithms.
Figure 6: We select four attack algorithms to conduct experiments on the SVHN dataset, and the disturbance value of the FGSM and BIM attack algorithms is 0.03–0.30. We compare the proposed algorithm with 3 different detection algorithms.
Figure 7: We visualize the recovery results of the FDR GM method and other attack example detection methods under different attack algorithms.

20 pages, 63242 KiB  
Article
Crater Detection and Population Statistics in Tianwen-1 Landing Area Based on Segment Anything Model (SAM)
by Yaqi Zhao and Hongxia Ye
Remote Sens. 2024, 16(10), 1743; https://doi.org/10.3390/rs16101743 - 14 May 2024
Viewed by 1132
Abstract
Crater detection is useful for research on dating a planetary surface's age and for geological mapping. The high-resolution imaging camera (HiRIC) carried by the Tianwen-1 rover provides digital image model (DIM) datasets with a resolution of 0.7 m/pixel, which are suitable for detecting meter-scale craters. Existing deep-learning-based automatic crater detection algorithms require large crater annotation datasets for training; however, there is currently a lack of optical-image datasets of small craters. In this study, we propose a model based on the Segment Anything Model (SAM) to detect craters in Tianwen-1's landing area and perform statistical analysis. The SAM network was used to obtain segmentation masks of the craters from the DIM images. Non-circular filtering was then used to remove irregular craters. Finally, deduplication and removal of false positives were performed to obtain accurate circular craters, whose center positions and diameters were obtained through circular fitting analysis. We extracted 841,727 craters in total, with diameters ranging from 1.57 m to 7910.47 m. These data are useful for extending Martian crater catalogs and crater datasets. Additionally, the crater size–frequency distribution (CSFD) was analyzed, indicating that the surface age of the Tianwen-1 landing area is ~3.25 billion years, with subsequent surface resurfacing events occurring ~1.67 billion years ago. Full article
(This article belongs to the Special Issue Planetary Geologic Mapping and Remote Sensing (Second Edition))
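The geometric post-processing described above is straightforward to sketch: each SAM segment's circularity (4πA/P²) decides whether it is kept, and a fitted circle yields the crater centre and diameter. SAM inference itself is omitted here, and the 0.85 circularity threshold is an assumed value, not necessarily the one used in the paper.

```python
# Non-circular filtering and circular fitting of a crater mask (assumed threshold).
import cv2
import numpy as np

def crater_from_mask(mask: np.ndarray, min_circularity: float = 0.85):
    """Return (cx, cy, diameter) for a near-circular binary mask, else None."""
    contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    contour = max(contours, key=cv2.contourArea)
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, closed=True)
    if perimeter == 0:
        return None
    circularity = 4.0 * np.pi * area / perimeter ** 2
    if circularity < min_circularity:                     # non-circular filtering
        return None
    (cx, cy), radius = cv2.minEnclosingCircle(contour)    # circular fitting
    return cx, cy, 2.0 * radius

demo = np.zeros((200, 200), np.uint8)                     # synthetic circular segment
cv2.circle(demo, (100, 100), 40, 1, thickness=-1)
print(crater_from_mask(demo))                             # roughly (100, 100, 80)
```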
Figures
Figure 1: Digital image map (DIM) of Tianwen-1's landing area. (a) The 15 DIMs of the landing area. The red star in the image represents the landing location of the Zhurong rover. (b) DIM of the first region in the landing area (HX1_GRAS_HIRIC_DIM_0.7_0001_254537N1095850E_A).
Figure 2: Network structure of the SAM for the automatic identification of craters (different colors in the output image represent different segmentation targets).
Figure 3: Segmented images from SAM (different colors in the output image represent different segmentation targets). (a) DIM of the landing area. (b) Segmented images outputted by SAM.
Figure 4: Cumulative distribution curve of the circularity of objects segmented using SAM.
Figure 5: Non-circular filtering of the segmentation objects generated by SAM (green represents non-circular segmentation targets, while red represents circles). (a) Segmentation results outputted by SAM. (b) Results after filtering non-circular objects.
Figure 6: Circular fitting of the mask edges extracted by SAM: (a) before circular fitting; (b) after circular fitting.
Figure 7: The distribution of crater sizes and quantities resulting from different cropping sizes.
Figure 8: Results of extracting craters from datasets with different cropping sizes.
Figure 9: (a) Before removing false craters (IOU = 0.8). (b) After removing false craters (IOU = 0.7).
Figure 10: Crater labels for the three subregions of the landing area.
Figure 11: The results of extracting craters in two small regions.
Figure 12: Crater extraction results from different regions of the two landing areas. (a) HX1_GRAS_HIRIC_DIM_0.7_0001_254537N1095850E_A. (b) HX1_GRAS_HIRIC_DIM_0.7_0007_244453N1095850E_A.
Figure 13: (a) Statistics of craters with different diameter ranges. (b) Map of crater density in the main landing region.
Figure 14: Craters' size–frequency distribution and estimation of the age of the landing area. Craters with diameters greater than 1 km, black; diameters ranging from 50 to 800 m, red; estimation of the age using the RH_2012 dataset, green [5,44,46].
Figure 15: Morphological analysis of undetected craters: (a) irregularly shaped craters; (b) overlooked large-sized craters.

16 pages, 1308 KiB  
Article
Classification of Rainfall Intensity and Cloud Type from Dash Cam Images Using Feature Removal by Masking
by Kodai Suemitsu, Satoshi Endo and Shunsuke Sato
Climate 2024, 12(5), 70; https://doi.org/10.3390/cli12050070 - 12 May 2024
Cited by 1 | Viewed by 1752
Abstract
Weather Report is an initiative from Weathernews Inc. to obtain sky images and current weather conditions from the users of its weather app. This approach can provide supplementary weather information to radar observations and can potentially improve the accuracy of forecasts. However, since the time and location of the contributed images are limited, gathering data from other sources is also necessary. This study proposes a system that automatically submits weather reports using a dash cam with communication capabilities and image recognition technology. The system aims to provide detailed weather information by classifying rainfall intensities and cloud formations from images captured by dash cams. In fine-grained image classification tasks, there are very subtle differences between some classes and only a few samples per class, so models tend to pick up irrelevant details, such as the background, during training, leading to bias. One solution is to remove useless features from the images by masking them using semantic segmentation, then train on each masked dataset using EfficientNet and evaluate the resulting accuracy. In the classification of rainfall intensity, the model utilizing the features of the entire image achieved up to 92.61% accuracy, which is 2.84% higher than the model trained specifically on road features. This outcome suggests the significance of considering information from the whole image when determining rainfall intensity. Furthermore, analysis using the Grad-CAM visualization technique revealed that classifiers trained on masked dash cam images focused particularly on car headlights when classifying rainfall intensity. For cloud type classification, the model focusing solely on the sky region attained an accuracy of 68.61%, which is 3.16% higher than that of the model trained on the entire image. This indicates that concentrating on the features of clouds and the sky enables more accurate classification and that eliminating irrelevant areas reduces misclassifications. Full article
(This article belongs to the Special Issue Extreme Weather Detection, Attribution and Adaptation Design)
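The feature-removal-by-masking step can be sketched as follows: a segmentation mask (here a hypothetical, hard-coded sky region) zeroes out everything outside the region of interest, and the masked image is what the classifier would then be trained on. The frame and mask are synthetic stand-ins, and the EfficientNet training loop is omitted.

```python
# Feature removal by masking (synthetic frame and mask).
import numpy as np

def apply_region_mask(image: np.ndarray, region_mask: np.ndarray) -> np.ndarray:
    """Keep pixels where region_mask is True; set everything else to black."""
    return image * region_mask[..., None].astype(image.dtype)

frame = np.random.randint(0, 256, (180, 320, 3), dtype=np.uint8)  # stand-in dash cam frame
sky_mask = np.zeros((180, 320), dtype=bool)
sky_mask[:80, :] = True                                           # hypothetical sky region

masked_frame = apply_region_mask(frame, sky_mask)                 # input to the cloud-type model
print(masked_frame.shape, int(masked_frame[120:].max()))          # lower (non-sky) rows are zero
```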
Figures
Figure 1: Example of a weather report. The user selects the current weather conditions, the perceived temperature, and the five-sense forecast from a list of options.
Figure 2: Flow of automatic reporting. The proposed system consists of six steps.
Figure 3: Example of images for each weather label.
Figure 4: Procedure for creating masked images (upper: rainfall intensity dataset, lower: cloud dataset).
Figure 5: Examples of masked images. (a) Rainfall intensity dataset: road mask, (b) rainfall intensity dataset: failed road mask, (c) cloud type dataset: sky mask, (d) cloud type dataset: failed sky mask.
Figure 6: Example images for each rainfall intensity label.
Figure 7: Visualization of classification basis via Grad-CAM (upper: whole-area model, lower: road-area model). Red boxes: focus area.
Figure 8: Example images for each cloud label. Cb: cumulonimbus, Ns: nimbostratus, Other: other nine clouds.
Figure 9: Example images for Ns: nimbostratus clouds, Sc: stratocumulus clouds, and St: stratus clouds.
Figure 10: Visualization of classification basis via Grad-CAM (upper: whole-area model; lower: sky-area model). Red boxes: focus area.

21 pages, 20756 KiB  
Article
A Novel Method for Cloud and Cloud Shadow Detection Based on the Maximum and Minimum Values of Sentinel-2 Time Series Images
by Kewen Liang, Gang Yang, Yangyan Zuo, Jiahui Chen, Weiwei Sun, Xiangchao Meng and Binjie Chen
Remote Sens. 2024, 16(8), 1392; https://doi.org/10.3390/rs16081392 - 15 Apr 2024
Viewed by 1621
Abstract
Automatic and accurate detection of clouds and cloud shadows is a critical aspect of optical remote sensing image preprocessing. This paper provides a time series maximum and minimum mask method (TSMM) for cloud and cloud shadow detection. Firstly, the Cloud Score+S2_HARMONIZED (CS+S2) is employed as a preliminary mask for clouds and cloud shadows. Secondly, we calculate the ratio of the maximum and sub-maximum values of the blue band in the time series, as well as the ratio of the minimum and sub-minimum values of the near-infrared band in the time series, to eliminate noise from the time series data. Finally, the maximum value of the clear blue band and the minimum value of the near-infrared band after noise removal are employed for cloud and cloud shadow detection, respectively. A national and a global dataset were used to validate the TSMM, and it was quantitatively compared against five other advanced methods or products. When clouds and cloud shadows are detected simultaneously, in the S2ccs dataset, the overall accuracy (OA) reaches 0.93 and the F1 score reaches 0.85. Compared with the most advanced CS+S2, there are increases of 3% and 9%, respectively. In the CloudSEN12 dataset, compared with CS+S2, the producer’s accuracy (PA) and F1 score show increases of 10% and 4%, respectively. Additionally, when applied to Landsat-8 images, TSMM outperforms Fmask, demonstrating its strong generalization capability. Full article
(This article belongs to the Special Issue Satellite-Based Cloud Climatologies)
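A hedged numpy sketch of the TSMM compositing logic, under these assumptions: a preliminary mask (the paper uses CS+S2) flags obviously contaminated observations; per-pixel clear-sky bounds come from the blue-band maximum and near-infrared minimum over the remaining dates, with the sub-maximum/sub-minimum ratio test discarding residual spikes; an observation brighter than the clear blue maximum is flagged as cloud and one darker than the clear NIR minimum as shadow. The 1.5 and 0.5 ratio thresholds are illustrative, not the paper's values.

```python
# TSMM-style maximum/minimum compositing sketch (assumed thresholds, synthetic data).
import numpy as np

def tsmm_detect(blue, nir, prelim_clear, max_ratio=1.5, min_ratio=0.5):
    """blue, nir, prelim_clear: (time, H, W) stacks; prelim_clear is True where a
    preliminary mask (e.g. CS+S2) considers the observation clear."""
    blue_sorted = np.sort(np.where(prelim_clear, blue, -np.inf), axis=0)
    nir_sorted = np.sort(np.where(prelim_clear, nir, np.inf), axis=0)

    blue_max, blue_submax = blue_sorted[-1], blue_sorted[-2]
    nir_min, nir_submin = nir_sorted[0], nir_sorted[1]

    # Noise removal: an isolated spike relative to the runner-up is residual
    # cloud/shadow missed by the preliminary mask, so fall back to the runner-up.
    eps = 1e-6
    clear_blue_max = np.where(blue_max / (blue_submax + eps) > max_ratio,
                              blue_submax, blue_max)
    clear_nir_min = np.where(nir_min / (nir_submin + eps) < min_ratio,
                             nir_submin, nir_min)

    cloud = blue > clear_blue_max[None]     # brighter than any clear observation
    shadow = nir < clear_nir_min[None]      # darker than any clear observation
    return cloud, shadow

rng = np.random.default_rng(0)
t, h, w = 12, 64, 64
cloud, shadow = tsmm_detect(rng.random((t, h, w)), rng.random((t, h, w)),
                            np.ones((t, h, w), dtype=bool))
print(cloud.shape, float(cloud.mean()), float(shadow.mean()))
```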
Figures
Graphical abstract
Figure 1: Seven study areas of the S2ccs dataset.
Figure 2: Seven representative study areas in CloudSEN12.
Figure 3: TSMM technical flow chart. It is mainly composed of (I) pretreatment (in gray), (II) maximum and minimum composite (in yellow), (III) cloud and cloud shadow extraction (in green).
Figure 4: Sequence composite diagram. The upper half of the image represents the sequence image of the cloud and the cloud shadow mask. (a) The maximum composite image with cloud noise identifier, (b) the maximum composite image, (c) the sub-maximum composite image, (d) the sub-minimum composite image, (e) the minimum composite image, and (f) the minimum composite image with shadow noise identifier.
Figure 5: Time series length and max–min magnification sensitivity experiments. The first row is the accuracy evaluation of clouds, the second row is the accuracy evaluation of cloud shadows, the third row is the accuracy evaluation of clouds and cloud shadows, and the fourth row is the clear accuracy evaluation. The accuracy evaluations were OA, UA, PA, and F1, respectively. The red point coordinates are expressed as the parameters with the highest accuracy.
Figure 6: Convolution kernel size and neighborhood mean sensitivity experiments. The first row is the accuracy evaluation of clouds, the second row is the accuracy evaluation of cloud shadows, the third row is the accuracy evaluation of clouds and cloud shadows, and the fourth row is the clear accuracy evaluation. The accuracy evaluations were OA, UA, PA, and F1, respectively. The red point coordinates are expressed as the parameters with the highest accuracy.
Figure 7: From top to bottom, two representative research areas in the S2ccs dataset. Two images and their cloud and cloud shadow detection classification maps, which are divided into cloud (red), cloud shadow (yellow), cloud and cloud shadow (orange), and clear (light blue).
Figure 8: Two representative research areas in CloudSEN12. Two images and their cloud and cloud shadow detection classification maps, which are divided into cloud (red), cloud shadow (yellow), cloud and cloud shadow (orange), and clear (light blue).
Figure 9: Two images of the Ningbo area. The classification map of cloud and cloud shadow detection is divided into cloud and cloud shadow (orange) and clear (light blue). From left to right are the original images, Fmask's classification map, and TSMM's classification map. The red box line indicates that TSMM detects clouds and cloud shadows, while Fmask does not.
Figure A1: From top to bottom, the seven research areas in the S2ccs dataset. Fourteen images and their cloud and cloud shadow detection classification maps, which are divided into cloud (red), cloud shadow (yellow), cloud and cloud shadow (orange), and clear (light blue), are presented. A–G represents the images of the seven regions in Figure 1, A1 and A2 represent the shooting dates, and the other codes likewise represent the time relationship.
Figure A2: Seven representative research areas in CloudSEN12. Fourteen images and their cloud and cloud shadow detection classification maps, which are divided into cloud (red), cloud shadow (yellow), cloud and cloud shadow (orange), and clear (light blue), are presented. H–N represents the images of the seven regions in Figure 2, H1 and H2 represent the shooting dates, and the other codes likewise represent the time relationship.

22 pages, 32270 KiB  
Article
A Cloud Coverage Image Reconstruction Approach for Remote Sensing of Temperature and Vegetation in Amazon Rainforest
by Emili Bezerra, Salomão Mafalda, Ana Beatriz Alvarez, Diego Armando Uman-Flores, William Isaac Perez-Torres and Facundo Palomino-Quispe
Appl. Sci. 2023, 13(23), 12900; https://doi.org/10.3390/app132312900 - 1 Dec 2023
Cited by 2 | Viewed by 1856
Abstract
Remote sensing involves actions to obtain information about an area located on Earth. In the Amazon region, the presence of clouds is a common occurrence, and visualizing important terrestrial information in the image, such as vegetation and temperature, can be difficult. In order to estimate land surface temperature (LST) and the normalized difference vegetation index (NDVI) from satellite images with cloud coverage, an inpainting approach is applied to remove clouds and restore the image in the removed region. This paper proposes the use of the LaMa (large mask inpainting) neural network and the scalable model named Big LaMa for the automatic reconstruction process in satellite images. Experiments are conducted on Landsat-8 satellite images of the Amazon rainforest in the state of Acre, Brazil. To evaluate the architecture's accuracy, the RMSE (root mean squared error), SSIM (structural similarity index) and PSNR (peak signal-to-noise ratio) metrics were used. The LST and NDVI of the reconstructed images were calculated and compared qualitatively and quantitatively, using scatter plots and the chosen metrics, respectively. The experimental results show that the Big LaMa architecture performs more effectively and robustly in restoring images in terms of visual quality, while the LaMa network shows a slight advantage in the measured metrics when addressing medium-sized marked areas. When the NDVI and LST of images reconstructed under real cloud coverage were compared, Big LaMa produced very good visual results. Full article
(This article belongs to the Section Environmental Sciences)
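Two of the data-preparation steps mentioned here are easy to sketch: NDVI from the Landsat-8 red (B4) and near-infrared (B5) bands, and the LaMa-style four-channel input obtained by stacking the masked image with its cloud mask (the x, m, x' stacking shown in Figure 2). The shapes and values below are synthetic, and the LST/SEBAL computation is not included.

```python
# NDVI and four-channel inpainting input (synthetic data).
import numpy as np

def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalized difference vegetation index from red (B4) and NIR (B5) reflectance."""
    return (nir - red) / (nir + red + 1e-6)

def inpainting_input(image: np.ndarray, cloud_mask: np.ndarray) -> np.ndarray:
    """Stack a (H, W, C) image with a (H, W) binary mask into a (H, W, C+1) tensor x'."""
    masked_image = image * (1.0 - cloud_mask[..., None])   # blank out the cloudy pixels
    return np.concatenate([masked_image, cloud_mask[..., None]], axis=-1)

rng = np.random.default_rng(0)
h, w = 256, 256
red, nir = rng.random((h, w)), rng.random((h, w))
image = rng.random((h, w, 3))                               # stand-in for a stacked B4/B5/B10 patch
cloud_mask = (rng.random((h, w)) > 0.9).astype(np.float32)  # synthetic cloud mask
print(ndvi(red, nir).shape, inpainting_input(image, cloud_mask).shape)  # (256, 256) (256, 256, 4)
```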
Figures
Figure 1: Application example of the inpainting approach. (a) Image for restoration; (b) detail of areas (in red) for reconstruction; (c) reconstructed output image.
Figure 2: Image stacking with the image x, the mask m, and the resultant four-channel input tensor x′.
Figure 3: The scheme of the method for large-mask inpainting (LaMa).
Figure 4: Sample of masks generated to represent synthetic clouds.
Figure 5: SEBAL model steps used in the image processing.
Figure 6: Study area.
Figure 7: Study area with Landsat-8 scenes highlighted.
Figure 8: Process of stacking the B4, B5 and B10 bands to generate a single image.
Figure 9: Deskew applied to generate the resulting image.
Figure 10: Cropping applied in the images to generate patches.
Figure 11: The ground truth image and its respective LST and NDVI.
Figure 12: Reconstruction of satellite images contaminated by synthetic clouds using the LaMa network. Each row shows images corresponding to a specific scenario and each column shows images with synthetic clouds, the reconstructed image and its LST and NDVI, respectively.
Figure 13: Reconstruction of satellite images contaminated by synthetic clouds with the Big LaMa model. Each row shows images corresponding to a specific scenario and each column shows images with synthetic clouds, the reconstructed image and its LST and NDVI, respectively.
Figure 14: LaMa network scatter plots. The black straight line represents the original image, while the red line represents the analysis of the reconstructed image versus the original image for three different scenarios. Each row corresponds to a specific scenario and each column corresponds to the reconstructed image and its LST and NDVI, respectively.
Figure 15: Scatter plots of the image reconstruction using the Big LaMa model. The black straight line represents the original image, while the red line represents the analysis of the reconstructed image versus the original image for three different scenarios. Each row corresponds to a specific scenario and each column corresponds to the reconstructed image, its LST and NDVI, respectively.
Figure 16: Reconstruction of satellite images with cloud coverage, using LaMa and Big LaMa.
Figure 17: The original image with clouds and its respective LST and NDVI.

16 pages, 742 KiB  
Article
REKP: Refined External Knowledge into Prompt-Tuning for Few-Shot Text Classification
by Yuzhuo Dang, Weijie Chen, Xin Zhang and Honghui Chen
Mathematics 2023, 11(23), 4780; https://doi.org/10.3390/math11234780 - 27 Nov 2023
Cited by 1 | Viewed by 1137
Abstract
Text classification is a machine learning technique employed to assign a given text to predefined categories, facilitating the automatic analysis and processing of textual data. However, an important problem is that the number of new text categories is growing faster than the amount of human-annotated data, leaving many new categories of text with little annotation data. As a result, conventional deep neural networks are forced to over-fit, which hampers real-world application. As a solution to this problem, academics recommend addressing data scarcity through few-shot learning. One of the efficient methods is prompt-tuning, which transforms the input text into a mask prediction problem featuring [MASK]. By utilizing descriptors, the model maps output words to labels, enabling accurate prediction. Nevertheless, previous prompt-based adaptation approaches often relied on manually produced verbalizers or a single label to represent the entire label vocabulary, which makes the mapping granularity low and results in words not being accurately mapped to their labels. To address these issues, we propose to enhance the verbalizer and construct the refined external knowledge into a prompt-tuning (REKP) model. We employ external knowledge bases to increase the mapping space of tagged terms and design three refinement methods to remove noisy data. We conduct comprehensive experiments on four benchmark datasets, namely AG's News, Yahoo, IMDB, and Amazon. The results demonstrate that REKP can outperform the state-of-the-art baselines in terms of Micro-F1 on knowledge-enhanced text classification. In addition, we conduct an ablation study to ascertain the functionality of each module in our model, revealing that the refinement module significantly contributes to enhancing classification accuracy. Full article
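The verbalizer step can be sketched as follows: given the language model's probability distribution at the [MASK] position, each class receives a weighted aggregate over its (refined) label words, and the highest-scoring class is the prediction. The vocabulary size, token ids, and weights below are toy values, not the paper's.

```python
# Weighted verbalizer aggregation over expanded label words (toy values).
import numpy as np

def verbalizer_scores(mask_probs: np.ndarray, label_words: dict, word_weights: dict) -> dict:
    """mask_probs: (vocab_size,) probabilities predicted for the [MASK] token."""
    scores = {}
    for label, token_ids in label_words.items():
        w = word_weights[label]
        w = w / w.sum()                                    # importance-refined weights
        scores[label] = float(np.dot(w, mask_probs[token_ids]))
    return scores

vocab_size = 30522                                         # e.g. a BERT-sized vocabulary
mask_probs = np.random.dirichlet(np.ones(vocab_size))      # stand-in [MASK] distribution
label_words = {"sports": [2998, 4715, 6001], "business": [3171, 5661]}  # toy token ids
word_weights = {k: np.random.rand(len(v)) for k, v in label_words.items()}

scores = verbalizer_scores(mask_probs, label_words, word_weights)
print(max(scores, key=scores.get), scores)                 # predicted class and class scores
```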
Figures
Figure 1: The framework of REKP. First, the whole label word goes through Label Word Refinement and Correlation Refinement to make its granularity more in line with the target task, and then, through Importance Refinement, the weight value of each label word is calculated. Finally, the verbalizer maps the predictions over label words into labels.
Figure 2: WR process diagram.
Figure 3: CR process diagram.
Figure 4: IR process diagram. (Instance Encoder: a tag word is vectorized through PLMs. Instance-level Attention: pay more attention to tag words related to the label and reduce the influence of noise. The whole process can be described as: tag words and the label are transformed into feature vectors through the Instance Encoder, then the results are input into Instance-level Attention, and the weight of the label words is obtained after the weighted sum.)
Figure 5: The impact of sample quantity on model performance.
Figure 6: The remaining number of label words after WR and CR. (In our code, the number of tag thesaurus of each label will be output after WR and CR, and we add all the tag words to get the number of tag words in WR and CR of this dataset.)

12 pages, 485 KiB  
Article
Habitual Mask Wearing as Part of COVID-19 Control in Japan: An Assessment Using the Self-Report Habit Index
by Tianwen Li, Marie Fujimoto, Katsuma Hayashi, Asami Anzai and Hiroshi Nishiura
Behav. Sci. 2023, 13(11), 951; https://doi.org/10.3390/bs13110951 - 19 Nov 2023
Cited by 4 | Viewed by 3628
Abstract
Although the Japanese government removed mask-wearing requirements in 2023, relatively high rates of mask wearing have continued in Japan. We aimed to assess psychological reasons for and the strength of habitual mask wearing in Japan. An Internet-based cross-sectional survey was conducted with non-random participant recruitment. We explored the frequency of mask usage and investigated psychological reasons for wearing masks. A regression analysis examined the association between psychological reasons and the frequency of mask wearing. The habitual use of masks was assessed in the participant's most frequently visited indoor space and on public transport using the Self-Report Habit Index. A principal component analysis with varimax rotation revealed distinct habitual characteristics. Among the 2640 participants surveyed from 6 to 9 February 2023, only 4.9% reported not wearing masks at all. Conformity to social norms was the most important reason for wearing masks. Participants exhibited a slightly higher degree of habituation towards mask wearing on public transport than in indoor spaces. The mask-wearing rate was higher in females than in males, and no significant difference was identified by age group. Daily mask wearing in indoor spaces was characterized by two traits (automaticity and behavioral frequency). A high mask-wearing frequency has been maintained in Japan during the social reopening transition period. Mask wearing has become part of daily habit, especially on public transport, largely driven by automatic and frequent practice. Full article
(This article belongs to the Special Issue Health Psychology and Behaviors during COVID-19)
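For reference, the Self-Report Habit Index used here can be scored in a few lines, assuming the common 12-item, 7-point form implied by the reported 12 to 84 range: each respondent's habit strength is the sum of the 12 item ratings. The responses below are synthetic.

```python
# SRHI scoring sketch (assumed 12-item, 7-point scale; synthetic responses).
import numpy as np

rng = np.random.default_rng(1)
n_respondents = 2640                                           # sample size reported above
indoor_items = rng.integers(1, 8, size=(n_respondents, 12))    # ratings 1-7 per item
transport_items = rng.integers(1, 8, size=(n_respondents, 12))

srhi_indoor = indoor_items.sum(axis=1)                         # scores fall in [12, 84]
srhi_transport = transport_items.sum(axis=1)
print(float(np.median(srhi_indoor)), float(np.median(srhi_transport)))
```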
Figures
Figure 1: Distribution of Self-Report Habit Index (SRHI) scores. This figure shows the distribution of SRHI scores of different subgroups in indoor space and public transport. The horizontal axis represents the different subgroups, and the vertical axis represents the SRHI score. The SRHI score represents the strength of the habit of mask wearing, ranging from 12 to 84. Higher SRHI scores indicate stronger habits. (A) Distribution of SRHI scores of all participants in indoor space and on public transport; (B,C) distribution of the SRHI scores by sex in indoor space and on public transport; (D,E) distribution of SRHI scores by age group in indoor space and on public transport. The box plot ranges from lower to upper quartiles, and the middle line represents the median value. Whiskers extend to minimum and maximum scores.

19 pages, 8540 KiB  
Article
Bone Metastases Lesion Segmentation on Breast Cancer Bone Scan Images with Negative Sample Training
by Yi-You Chen, Po-Nien Yu, Yung-Chi Lai, Te-Chun Hsieh and Da-Chuan Cheng
Diagnostics 2023, 13(19), 3042; https://doi.org/10.3390/diagnostics13193042 - 25 Sep 2023
Cited by 1 | Viewed by 2511
Abstract
The use of deep learning methods for the automatic detection and quantification of bone metastases in bone scan images holds significant clinical value. A fast and accurate automated system for segmenting bone metastatic lesions can assist clinical physicians in diagnosis. In this study, a small internal dataset comprising 100 breast cancer patients (90 cases of bone metastasis and 10 cases of non-metastasis) and 100 prostate cancer patients (50 cases of bone metastasis and 50 cases of non-metastasis) was used for model training. Initially, all image labels were binary. We used the Otsu thresholding method or negative mining to generate a non-metastasis mask, thereby transforming the image labels into three classes. We adopted Double U-Net as the baseline model and modified its output activation function, changing it to SoftMax to accommodate multi-class segmentation. Several methods were used to enhance model performance, including background pre-processing to remove background information, adding negative samples to improve model precision, and using transfer learning to leverage shared features between the two datasets. The performance was investigated via 10-fold cross-validation and computed at the pixel level. The best model we achieved had a precision of 69.96%, a sensitivity of 63.55%, and an F1-score of 66.60%. Compared to the baseline model, this represents an 8.40% improvement in precision, a 0.56% improvement in sensitivity, and a 4.33% improvement in the F1-score. The developed system has the potential to provide pre-diagnostic reports to support physicians' final decisions and, in combination with bone skeleton segmentation, the calculation of the bone scan index (BSI). Full article
(This article belongs to the Special Issue Artificial Intelligence in Clinical Medical Imaging)
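The label-expansion step can be sketched as follows: Otsu thresholding separates the body region of a bone scan from the background, and subtracting the metastasis ground truth gives a three-class map (background, non-metastasis, metastasis). The synthetic scan and class encoding are illustrative; negative mining and the Double U-Net training are not shown.

```python
# Otsu-based generation of a three-class (BG / NM / M) label map (synthetic scan).
import cv2
import numpy as np

def three_class_mask(scan: np.ndarray, metastasis_gt: np.ndarray) -> np.ndarray:
    """scan: 8-bit grayscale bone scan; metastasis_gt: binary lesion ground truth."""
    _, body = cv2.threshold(scan, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    labels = np.zeros_like(scan, dtype=np.uint8)          # 0 = background (BG)
    labels[(body == 1) & (metastasis_gt == 0)] = 1        # 1 = non-metastasis (NM)
    labels[metastasis_gt == 1] = 2                        # 2 = metastasis (M)
    return labels

scan = np.zeros((128, 128), dtype=np.uint8)
cv2.rectangle(scan, (40, 10), (90, 120), 90, thickness=-1)   # synthetic "body" uptake
cv2.circle(scan, (65, 60), 8, 220, thickness=-1)             # synthetic metastasis hotspot
metastasis_gt = (scan > 200).astype(np.uint8)                # binary lesion label

labels = three_class_mask(scan, metastasis_gt)
print(np.bincount(labels.ravel(), minlength=3))              # pixel counts per class
```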
Figures
Figure 1: Bone scan images of breast cancer patients. (a) With metastasis; (b) without metastasis.
Figure 2: The schematic of the manually annotated results. (a) Bone scan image; (b) overlay of bone scan image with ground truth; (c) ground truth.
Figure 3: Flowchart of brightness normalization.
Figure 4: The modified architecture diagram of Double U-Net, the baseline network.
Figure 5: Illustration of negative sample productions. Notably, the metastasis hotspots are eliminated (the black holes), if the image has metastasis. (a) Otsu thresholding; (b) negative mining.
Figure 6: The qualitative result of the baseline network. (a) Ground truth; (b) segmentation results (precision: 79.14; sensitivity: 78.22; F1-score: 78.68).
Figure 7: Illustration of applying Otsu thresholding to positive samples to generate NM masks. Three classes are included: BG, NM, and M.
Figure 8: Illustration of applying Otsu thresholding to negative samples to generate NM masks. Three classes are included: BG, NM, and M.
Figure 9: Illustration of applying negative mining to positive samples to generate NM masks. Three classes are included: BG, NM, and M.
Figure 10: Illustration of applying negative mining to negative samples to generate NM masks. Three classes are included: BG, NM, and M.
Figure 11: The qualitative results after transfer learning. (a) Ground truth; (b) segmentation results with dice loss (precision: 79.14, sensitivity: 73.41, F1-score: 76.17); (c) segmentation results with focal Tversky loss (precision: 74.02, sensitivity: 86.24, F1-score: 79.67).
Figure 12: Mis-segmentation of non-metastatic lesions. (a) Bone fracture (head region) (precision: 88.46; sensitivity: 60.97; F1-score: 72.19); (b) motion artifact (head region) (precision: 69.32; sensitivity: 47.84; F1-score: 56.61); (c) injection site (wrist) (precision: 43.55; sensitivity: 70.65; F1-score: 53.88); (d) injection site (elbow) (precision: 82.81; sensitivity: 55.52; F1-score: 66.47); (e) kidney (precision: 51.85; sensitivity: 47.89; F1-score: 49.79); (f) bladder (precision: 47.47; sensitivity: 78.28; F1-score: 59.10).
Figure 13: Artifacts in bone scan images of prostate cancer. (a) Catheter; (b) urinary bag; (c) diaper.

26 pages, 31605 KiB  
Article
An Automatic Method for Rice Mapping Based on Phenological Features with Sentinel-1 Time-Series Images
by Guixiang Tian, Heping Li, Qi Jiang, Baojun Qiao, Ning Li, Zhengwei Guo, Jianhui Zhao and Huijin Yang
Remote Sens. 2023, 15(11), 2785; https://doi.org/10.3390/rs15112785 - 26 May 2023
Cited by 4 | Viewed by 2424
Abstract
Rice is one of the most important staple foods in the world, feeding more than 50% of the global population. However, rice is also a significant emitter of greenhouse gases and plays a role in global climate change. As a result, quickly and accurately mapping rice is crucial for ensuring global food security and mitigating global warming. In this study, we proposed an automated rice mapping method called automated rice mapping using V-shaped phenological features of rice (Auto-RMVPF), based on time-series Sentinel-1A images, which is composed of four main steps. First, the dynamic threshold method automatically extracts abundant rice samples from flooding signals. Second, the second-order difference method automatically extracts the phenological period of rice based on the scattering feature of the rice samples. Then, the key "V" feature of the VH backscatter time series, which rises before and after rice transplanting due to flooding, is used for rice mapping. Finally, the farmland mask is extracted to avoid interference from non-farmland features on the rice map, and a median filter is applied to remove noise from the rice map and obtain the final spatial distribution of rice. The results show that the Auto-RMVPF method not only can automatically obtain abundant rice samples but also can extract the accurate phenological period of rice. At the same time, the accuracy of rice mapping is also satisfactory, with an overall accuracy of more than 95% and an F1 score of over 0.91. The overall accuracy of the Auto-RMVPF method is improved by 2.8–12.2% compared with support vector machine (SVM) with an overall accuracy of 89.9% (25 training samples) and 92.2% (124 training samples), random forest (RF) with an overall accuracy of 82.8% (25 training samples) and 88.3% (124 training samples), and automated rice mapping using synthetic aperture radar flooding signals (ARM-SARFS) with an overall accuracy of 89.9%. Altogether, these experimental results suggest that the Auto-RMVPF method has broad prospects for automatic rice mapping, especially for mountainous regions where ground samples are often not easily accessible. Full article
(This article belongs to the Section Biogeosciences Remote Sensing)
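The "V"-feature extraction can be sketched as below, under stated assumptions: a Savitzky-Golay filter smooths a VH backscatter series, local minima are located, and a minimum that sits several dB below its neighbourhood is taken as the flooding dip around transplanting. The synthetic series, smoothing window, and 3 dB depth test are illustrative, not the paper's exact parameters.

```python
# V-shaped phenological feature on a VH backscatter series (synthetic data).
import numpy as np
from scipy.signal import savgol_filter, argrelmin

dates = np.arange(0, 360, 12)                                 # ~12-day revisit, day of year
rng = np.random.default_rng(0)
vh = -16.0 + 2.0 * np.sin(dates / 60.0)                       # slow background variation (dB)
vh[8:11] -= 6.0                                               # flooding dip around transplanting
vh += rng.normal(0.0, 0.3, vh.size)                           # speckle-like noise

smooth = savgol_filter(vh, window_length=7, polyorder=2)      # SG smoothing
minima = argrelmin(smooth, order=2)[0]                        # candidate bottoms of the "V"

v_bottoms = [i for i in minima
             if smooth[max(i - 3, 0):i + 4].max() - smooth[i] > 3.0]   # keep deep dips only
if v_bottoms:
    print("estimated transplanting around day", int(dates[v_bottoms[0]]))
```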
Figures
Graphical abstract
Figure 1: The location of the study area in Taishan County, Guangdong Province, China. Red points are the locations of each field survey. The purple line represents the route of the ground campaigns. The yellow box represents the verification area, and the pink box represents the study area.
Figure 2: Rice cropping calendar and schedule of acquisitions organized as Sentinel images and ground campaigns. Black circles denote acquired Sentinel-1A images, and red crosses denote a lack of images. Black triangles denote the date of ground campaigns. There are no Sentinel-1A images with the acquisition date of 25 May 2019. The backscattering coefficients of rice for 25 May 2019 were calculated based on the mean value of backscattering coefficients for 13 May and 5 June 2019.
Figure 3: VH backscattering coefficient curve of different land-cover types.
Figure 4: The framework of the proposed Auto-RMVPF method. The rice samples extraction based on the dynamic threshold method. The rice phenological-period extraction based on the second-order method. The rice mapping based on the improved ARM-SARFS method.
Figure 5: Filtering process of VH backscattering curve of rice. (a) SG filter; (b) time filter.
Figure 6: Process diagram of extracting local maximum and local minimum points from the rice-growth curve. (a) The original VH band time series of single rice; (b) results of step 2; (c) results of step 3; (d) results of the final step; the red rectangle represents a growth period of rice (i.e., the growing period of rice).
Figure 7: Schematic diagram of the principle of the median filter.
Figure 8: The rice samples extraction of 20 paddy fields.
Figure 9: Spatial distribution of rice samples extracted by the dynamic threshold method.
Figure 10: Comparison before and after the farmland mask and median filter of Auto-RMVPF. (a) Without the farmland mask and median filter. (b) With the farmland mask; some noise points remain after applying the farmland mask in the red cycle. (c) With the farmland mask and median filter.
Figure 11: The result of rice phenological-period extraction. (a) Statistical results of the number of local minimum points; (b) statistical results of the number of local maximum points. a–c are the local maximum points of the number of local minimum points; d–f are the local maximum points of the number of local maximum points.
Figure 12: VH backscattering coefficients of 20 paddy fields change with time. The black line is the original data. The red line is the data that have been processed by the SG filter and time filter. The red solid square is the extracted transplanting date. The blue solid circle is the extracted harvest date. The black dotted line is the transplanting date obtained by ground survey, and the green dotted line is the harvest date obtained by ground survey.
Figure 13: The results of noise removal in the verification area through different filters.
Figure 14: The results of different median filter window sizes. (a) Window size is 3, unfinished noise removal in the red cycle; (b) window size is 5, paddy fields with better boundary information protection in the red rectangle; (c) window size is 7, paddy field boundary information is not well protected.
Figure 15: Experimental results of the methods.
Figure 16: Results of rice classification in the verification area.
Figure 17: Results of rice classification in the single-rice area by different methods. The red box represents the single-rice area.

22 pages, 6060 KiB  
Article
Effects of Image Size on Deep Learning
by Olivier Rukundo
Electronics 2023, 12(4), 985; https://doi.org/10.3390/electronics12040985 - 16 Feb 2023
Cited by 27 | Viewed by 4038
Abstract
In this work, the best size for late gadolinium enhancement (LGE) magnetic resonance imaging (MRI) images in the training dataset was determined to optimize deep learning training outcomes. Non-extra-pixel and extra-pixel interpolation algorithms were used to determine the new size of the LGE-MRI images. A novel strategy was introduced to handle interpolation masks and remove extra class labels in interpolated ground truth (GT) segmentation masks. The expectation maximization, weighted intensity, a priori information (EWA) algorithm was used for the quantification of myocardial infarction (MI) in automatically segmented LGE-MRI images. An arbitrary threshold, comparison of the sums, and sums of differences were used to estimate the relationship between semi-automatic or manual and fully automated quantification of MI results. This relationship was found to be closer for the bigger LGE-MRI images (55.5% closer to manual results) than for the smaller LGE-MRI images (22.2% closer to manual results). Full article
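The extra-class-label problem described above can be shown directly: nearest-neighbour ("non-extra-pixel") interpolation preserves the label set when a ground-truth mask is resized, whereas bicubic ("extra-pixel") interpolation introduces intermediate values, which a simple rule can snap back to the nearest valid label. The four-class toy mask and the snapping rule are illustrative and not necessarily the authors' Equation (1)-based function.

```python
# Resizing a GT segmentation mask and removing extra class labels (toy example).
import cv2
import numpy as np

valid_labels = np.array([0, 1, 2, 3], dtype=np.float32)       # assumed class labels
mask = np.random.default_rng(0).integers(0, 4, (128, 128)).astype(np.float32)

nn_up = cv2.resize(mask, (256, 256), interpolation=cv2.INTER_NEAREST)   # non-extra pixel
bic_up = cv2.resize(mask, (256, 256), interpolation=cv2.INTER_CUBIC)    # extra pixel

def snap_to_labels(interpolated: np.ndarray, valid: np.ndarray) -> np.ndarray:
    """Replace every pixel value with the closest valid class label."""
    diffs = np.abs(interpolated[..., None] - valid[None, None, :])
    return valid[np.argmin(diffs, axis=-1)]

print("labels after NN:          ", np.unique(nn_up))           # still {0, 1, 2, 3}
print("distinct values after BIC:", np.unique(bic_up).size)     # many spurious in-between values
print("labels after snapping:    ", np.unique(snap_to_labels(bic_up, valid_labels)))
```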
Figures
Figure 1: A schematic representation of two examples of the most commonly used networks/architectures—(A) CNN and (B) multi-stream CNN—for automated medical image analysis. Each block contains relevant layer nodes, while relevant layer connections are generalized by a blue arrow symbol.
Figure 2: U-net architecture. Conv means convolution. ReLU is rectified linear unit. DepthConv is depth concatenation. UpConv means up-convolution or transposed convolution. MaxPool is max pooling.
Figure 3: Histograms: (top left) GT segmentation mask of the size 128 × 128; (top right) NN-based GT segmentation mask of the size 256 × 256; (bottom left) BIC-based GT segmentation mask of the size 256 × 256; (bottom right) LCZ-based GT segmentation mask of the size 256 × 256.
Figure 4: Example showing the bicubic (BIC) interpolated GT segmentation mask after removing extra class labels using the Equation (1)-based function.
Figure 5: Five steps to remove extra class labels in BIC interpolated GT segmentation masks.
Figure 6: (a) S1 and (b) S2 output images of size 256 × 256.
Figure 7: (a) S3 and (b) S4 output images of size 256 × 256.
Figure 8: (a) Input mask of size 128 × 128. (b) S5 output mask of size 256 × 256.
Figure 9: Segmentation results: Region 1.
Figure 10: Segmentation results: Region 2.
Figure 11: Segmentation results: Region 3.
Figure 12: U-net vs. Segnet | Segmentation results | Region 1.
Figure 13: U-net vs. Segnet | Segmentation results | Region 2.
Figure 14: U-net vs. Segnet | Segmentation results | Region 3.
Figure 15: C128 segmented output masks | From top to bottom: dice indices are equal to 0.9953, 0.9945, 0.9873, and 0.9929.
Figure 16: N256 segmented output masks | From top to bottom: dice indices are equal to 0.9961, 0.9963, 0.9909, 0.9925.
Figure 17: B256F segmented output masks | From top to bottom: dice indices are equal to 0.9945, 0.9956, 0.9900, 0.9944.
Figure 18: L256F segmented output masks | From top to bottom: dice indices are equal to 0.9953, 0.9957, 0.9902, 0.9942.
Figure 19: B256U segmented output masks | From top to bottom: dice indices are equal to 0.9718, 0.9554, 0.8868, 0.9130.
Figure 20: L256U segmented output masks | From top to bottom: dice indices equal to 0.9694, 0.9558, 0.8854, 0.9150.
Figure 21: MI quantification results—scar (mL).
Figure 22: MI quantification results—scar (%).
Figure 23: MI quantification results—mo (%).

31 pages, 4577 KiB  
Article
Unsupervised Building Extraction from Multimodal Aerial Data Based on Accurate Vegetation Removal and Image Feature Consistency Constraint
by Yan Meng, Shanxiong Chen, Yuxuan Liu, Li Li, Zemin Zhang, Tao Ke and Xiangyun Hu
Remote Sens. 2022, 14(8), 1912; https://doi.org/10.3390/rs14081912 - 15 Apr 2022
Cited by 9 | Viewed by 2672
Abstract
Accurate building extraction from remotely sensed data is difficult to perform automatically because of the complex environments and the complex shapes, colours and textures of buildings. Supervised deep-learning-based methods offer a possible solution to this problem. However, these methods generally require many high-quality, manually labelled samples to obtain satisfactory test results, and their production is time and labour intensive. For multimodal data with sufficient information, it is therefore desirable to extract buildings accurately in as unsupervised a manner as possible. Combining remote sensing images and LiDAR point clouds for unsupervised building extraction is not a new idea, but existing methods often experience two problems: (1) the accuracy of vegetation detection is often not high, which leads to limited building extraction accuracy, and (2) they lack a proper mechanism to further refine the building masks. We propose two methods to address these problems, combining aerial images and aerial LiDAR point clouds. First, we improve two recently developed vegetation detection methods to generate accurate initial building masks. We then refine the building masks based on the image feature consistency constraint, which can replace inaccurate LiDAR-derived boundaries with accurate image-based boundaries, remove the remaining vegetation points and recover some missing building points. Our methods do not require manual parameter tuning or manual data labelling, but still exhibit a competitive performance compared to 29 methods: our methods achieve accuracies higher than or comparable to 19 state-of-the-art methods (including 8 deep-learning-based methods and 11 unsupervised methods, 9 of which combine remote sensing images and 3D data), and outperform the top 10 methods (4 of which combine remote sensing images and LiDAR data) evaluated using all three test areas of the Vaihingen dataset on the official website of the ISPRS Test Project on Urban Classification and 3D Building Reconstruction in average area quality. These comparative results verify that our unsupervised methods combining multisource data are very effective. Full article
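A minimal sketch of the initial-building-mask step, under stated assumptions: pixels that are elevated in the nDSM but not flagged as vegetation form the initial mask, which morphological closing and opening then clean up. The 2.5 m height threshold and 5 × 5 kernel are assumptions, and the image-feature-consistency refinement with GS/SLIC/ERS segmentations is not shown.

```python
# Initial building mask from an nDSM and a vegetation mask (assumed parameters).
import cv2
import numpy as np

def initial_building_mask(ndsm: np.ndarray, vegetation: np.ndarray,
                          min_height: float = 2.5) -> np.ndarray:
    """ndsm: height above ground in metres; vegetation: boolean vegetation mask."""
    mask = ((ndsm > min_height) & ~vegetation).astype(np.uint8)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small gaps
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # drop isolated specks
    return mask.astype(bool)

rng = np.random.default_rng(0)
ndsm = rng.random((256, 256)) * 10.0                         # synthetic heights, 0-10 m
vegetation = rng.random((256, 256)) > 0.7                    # synthetic vegetation mask
print(float(initial_building_mask(ndsm, vegetation).mean())) # fraction flagged as building
```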
Show Figures

Figure 1
Illustration of the generation of the initial building mask. (a) The first image of the Vaihingen dataset; (b) vegetation detection result of the proposed iSH+eSH method; (c) the nDSM processed by the moment-preserving method [73]; (d) initial building mask (the nDSM where the vegetation recognised by the iSH+eSH method has been removed). The detected vegetation regions in (b) and building regions in (d), and the non-ground points of the nDSM in (c) are marked in white.
Figure 2">
Figure 2
Workflow of our building-mask refinement method based on the image feature consistency constraint. Seg denotes the image segmentation obtained by a non-semantic image segmentation algorithm, such as the GS [80], SLIC [81] and ERS [82] algorithms used by us; Res denotes a region matching result; and RM stands for region matching.
Figure 3
Illustration of preprocessing before the region matching of our building-mask refinement method. (a): An enlarged initial building mask, which is part of Figure 1d. (b): (a) after being processed by the first morphological closing operation. (c): (a) after being processed by both the first morphological closing and the first morphological opening operations.
Figure 4
Detailed workflow for the region matching of our method.
Figure 5
Illustration of postprocessing after the region matching of our building mask refinement method. (a): An enlarged building mask after our region matching method, which has the same spatial range as Figure 3a. (b): (a) after being processed by the second morphological opening operation. (c): (a) after being processed by both the second morphological opening and the second closing operation.
Figure 6
Illustration of our building mask refinement method on real data. (a,e): Region-matching result of the GS algorithm. (b,f): Region-matching result of the SLIC algorithm. (c,g): Region-matching result of the ERS algorithm. (d,h): Union of the above three region-matching results that has undergone postprocessing. In (e–h), the region-matching results are overlaid with the ground truth, with yellow regions denoting correct detection, red regions denoting false detection (overdetection), and green regions denoting missing detection (underdetection).
Figure 7
Building extraction results on the first test area. (a) Orthorectified image; (b) ground truth (white regions denote the buildings); (c) nDSM; (d–h): result of DeepLabv3+, U-Net, the HOA method, the beSH+IFCC method, and the biSH+IFCC method.
Figure 8
The building extraction results on the second test area. (a) Orthorectified image; (b) ground truth (white regions denote the buildings); (c) nDSM; (d–h): result of DeepLabv3+, U-Net, the HOA method, the beSH+IFCC method and the biSH+IFCC method.
Figure 9
The building extraction results on the third test area. (a) Orthorectified image; (b) ground truth (white regions denote the buildings); (c) nDSM; (d–h): result of DeepLabv3+, U-Net, the HOA method, the beSH+IFCC method and the biSH+IFCC method.
Figure 10
Rectangle-marked comparison of the biSH+IFCC method with DeepLabv3+ (a,d), U-Net (b,e) and the HOA method (c,f) on the first test area. In (a–c), the rectangles are drawn on the results of the compared methods, while in (d–f), the rectangles are drawn on the result of the biSH+IFCC method.
Figure 11
Rectangle-marked comparison of the biSH+IFCC method with DeepLabv3+ (a,d); U-Net (b,e); and the HOA method (c,f) on the second test area. In (a–c), the rectangles are drawn on the results of the compared methods, while in (d–f), the rectangles are drawn on the result of the biSH+IFCC method.
Figure 12
Rectangle-marked comparison of the biSH+IFCC method with DeepLabv3+ (a,d); U-Net (b,e); and the HOA method (c,f) on the third test area. In (a–c), the rectangles are drawn on the results of the compared methods, while in (d–f), the rectangles are drawn on the result of the biSH+IFCC method.
Figure 13
Illustration of some enlarged unfavourable results of our biSH+IFCC method. (a) Remote sensing image; (b) nDSM; (c) initial building mask (the nDSM where the vegetation recognised by the iSH+eSH method has been removed); (d) the final building mask.
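">
As a rough illustration of the pipeline summarized in this listing (remove detected vegetation from the nDSM-derived above-ground mask, then smooth the mask with morphological closing and opening around region matching), here is a minimal NumPy/scikit-image sketch. The array names and the structuring-element size are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np
from skimage.morphology import binary_closing, binary_opening, disk

def initial_building_mask(above_ground: np.ndarray, vegetation: np.ndarray) -> np.ndarray:
    """Above-ground (nDSM) pixels that were not detected as vegetation."""
    return above_ground.astype(bool) & ~vegetation.astype(bool)

def smooth_mask(mask: np.ndarray, radius: int = 3) -> np.ndarray:
    """Morphological closing followed by opening with a disk-shaped structuring
    element (the radius here is a guess, not the paper's setting)."""
    footprint = disk(radius)
    return binary_opening(binary_closing(mask, footprint), footprint)

# Toy example with random masks standing in for real nDSM/vegetation rasters.
rng = np.random.default_rng(0)
above_ground = rng.random((128, 128)) > 0.5
vegetation = rng.random((128, 128)) > 0.7
refined = smooth_mask(initial_building_mask(above_ground, vegetation))
print(refined.shape, refined.dtype)
```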
21 pages, 47782 KiB  
Article
AFD-StackGAN: Automatic Mask Generation Network for Face De-Occlusion Using StackGAN
by Abdul Jabbar, Xi Li, Muhammad Assam, Javed Ali Khan, Marwa Obayya, Mimouna Abdullah Alkhonaini, Fahd N. Al-Wesabi and Muhammad Assad
Sensors 2022, 22(5), 1747; https://doi.org/10.3390/s22051747 - 23 Feb 2022
Cited by 8 | Viewed by 3167
Abstract
To address the problem of automatically detecting and removing the mask without user interaction, we present a GAN-based automatic approach for face de-occlusion, called Automatic Mask Generation Network for Face De-occlusion Using Stacked Generative Adversarial Networks (AFD-StackGAN). In this approach, we decompose the problem into two primary stages (i.e., Stage-I Network and Stage-II Network) and employ a separate GAN in each stage. Stage-I Network (Binary Mask Generation Network) automatically creates a binary mask for the masked region in the input images (occluded images). Then, Stage-II Network (Face De-occlusion Network) removes the mask object and synthesizes the damaged region with fine details while retaining the restored face’s appearance and structural consistency. Furthermore, we create a paired synthetic face-occluded dataset using the publicly available CelebA face images to train the proposed model. AFD-StackGAN is evaluated using real-world test images gathered from the Internet. Our extensive experimental results confirm the robustness and efficiency of the proposed model in removing complex mask objects from facial images compared to previous image manipulation approaches. Additionally, we provide ablation studies comparing performance between the user-defined and auto-defined masks and demonstrate the benefits of refiner networks in the generation process. Full article
(This article belongs to the Special Issue Big Data Analytics in Internet of Things Environment)
Show Figures

Figure 1
The proposed AFD-StackGAN results on real-world images.
Figure 2
The architecture of the automatic mask removal network for face de-occlusion. It consists of Stage-I Network that generates a binary mask and Stage-II Network that removes the mask object from input facial images.
Figure 3
Some images of our synthetic dataset.
Figure 4
The results of Stage-I Network on real-world images.
Figure 5
The results of AFD-StackGAN (Stage-I Network + Stage-II Network) on real-world images.
Figure 6
Visual assessment of the proposed AFD-StackGAN with the baseline models on real-world images.
Figure 7
AFD-StackGAN performance for real face images with occlusion masks that have very different structures and locations in the face images than the occlusion masks used in the synthetic dataset. The first row shows occluded input facial images, and the second row shows de-occluded output face images.
Figure 8
Visual comparison of the automatic mask removal network (auto-generated mask) with FD-StackGAN (user-defined mask).
Figure 9
Results of the image refiner network on real-world images, which further improves the results by rectifying what is missing or wrong in the mask base network results.
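">
As a hedged sketch of the two-stage decomposition described in this listing (Stage-I predicts a binary mask from the occluded face; Stage-II inpaints the masked region conditioned on that mask), the toy PyTorch modules below show only the data flow. Every layer size is a placeholder; the actual networks, losses and refiner stages in the paper are far larger.

```python
import torch
import torch.nn as nn

class StageI(nn.Module):
    """Toy stand-in for the Stage-I binary mask generation network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, occluded):
        return self.net(occluded)  # predicted mask in [0, 1]

class StageII(nn.Module):
    """Toy stand-in for the Stage-II face de-occlusion network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, occluded, mask):
        # Condition the inpainting on both the occluded image and the mask.
        return self.net(torch.cat([occluded, mask], dim=1))

occluded = torch.randn(1, 3, 128, 128)                 # placeholder occluded face
mask = StageI()(occluded)                              # Stage-I: binary mask
restored = StageII()(occluded, (mask > 0.5).float())   # Stage-II: de-occluded face
print(restored.shape)                                  # torch.Size([1, 3, 128, 128])
```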
11 pages, 9315 KiB  
Article
Throwaway Shadows Using Parallel Encoders Generative Adversarial Network
by Kamran Javed, Nizam Ud Din, Ghulam Hussain and Tahir Farooq
Appl. Sci. 2022, 12(2), 824; https://doi.org/10.3390/app12020824 - 14 Jan 2022
Cited by 3 | Viewed by 2232
Abstract
Face photographs taken on a bright sunny day or in floodlight contain unnecessary shadows of objects on the face. Most previous works deal with removing shadows from scene images and struggle to do so for facial images. Faces have a complex semantic structure, which makes shadow removal challenging. The aim of this research is to remove the shadow of an object in facial images. We propose a novel generative adversarial network (GAN) based image-to-image translation approach for shadow removal in face images. The first stage of our model automatically produces a binary segmentation mask for the shadow region. Then, the second stage, which is a GAN-based network, removes the object shadow and synthesizes the affected region. The generator network of our GAN has two parallel encoders: one is a standard convolution path and the other is a partial convolution path. We find that this combination in the generator results not only in learning an incorporated semantic structure but also in disentangling visual discrepancy problems under the shadow area. In addition to the GAN loss, we exploit a low-level L1 loss, a structural-level SSIM loss and a perceptual loss from a pre-trained loss network for better texture and perceptual quality. Since there is no paired dataset for the shadow removal problem, we created a synthetic shadow dataset for training our network in a supervised manner. The proposed approach effectively removes shadows from real and synthetic test samples, while retaining complex facial semantics. Experimental evaluations consistently show the advantages of the proposed method over several representative state-of-the-art approaches. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
Show Figures

Figure 1
Proposed Network Architecture for Shadow Removal.
Figure 2
Visual comparison of shadow removal. (a) Input image, (b) EdgeConnect [10], (c) Partial Convolution [14], (d) Gated Convolution [15], (e) Ghost-free Shadow removal [6], (f) Ours, (g) Ground truth. Note: There is no ground truth for the first couple of rows since these samples are real world shadow images collected from the Internet. The last two samples are from our synthetic database.
Figure 3
Additional qualitative results of our model for complex and large size shadow samples in our synthetic database.
Figure 4
Shadow removal results of our proposed method on the scene images from ISTD dataset [4].
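">
The loss design summarized in this listing combines several terms additively. The minimal PyTorch sketch below shows that pattern with just the L1 and adversarial terms; the SSIM and perceptual terms would be added in the same way, and the weights shown are assumptions rather than the paper's values.

```python
import torch
import torch.nn.functional as F

def generator_loss(pred, target, disc_fake_logits, w_l1=1.0, w_adv=0.01):
    """Weighted sum of a pixel-level L1 term and an adversarial term.
    The paper also adds SSIM and VGG-based perceptual terms in the same
    additive way; the weights here are illustrative, not the paper's."""
    l1 = F.l1_loss(pred, target)
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    return w_l1 * l1 + w_adv * adv

pred = torch.rand(2, 3, 64, 64)      # generator output
target = torch.rand(2, 3, 64, 64)    # shadow-free ground truth
fake_logits = torch.randn(2, 1)      # discriminator logits for the fakes
print(generator_loss(pred, target, fake_logits).item())
```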
19 pages, 10525 KiB  
Article
Moving Car Recognition and Removal for 3D Urban Modelling Using Oblique Images
by Chong Yang, Fan Zhang, Yunlong Gao, Zhu Mao, Liang Li and Xianfeng Huang
Remote Sens. 2021, 13(17), 3458; https://doi.org/10.3390/rs13173458 - 31 Aug 2021
Cited by 12 | Viewed by 4150
Abstract
With the progress of photogrammetry and computer vision technology, three-dimensional (3D) reconstruction using aerial oblique images has been widely applied in urban modelling and smart city applications. However, state-of-the-art image-based automatic 3D reconstruction methods cannot effectively handle the unavoidable geometric deformation and incorrect texture mapping problems caused by moving cars in a city. This paper proposes a method to address this situation and prevent the influence of moving cars on 3D modelling by recognizing moving cars and combining the recognition results with a photogrammetric 3D modelling procedure. Through car detection using a deep learning method and multiview geometry constraints, we can analyse the state of a car’s movement and apply a proper preprocessing method to the geometric model generation and texture mapping steps of 3D reconstruction pipelines. First, we apply the traditional Mask R-CNN object detection method to detect cars from oblique images. Then, a detected car and its corresponding image patches, computed via geometry constraints in the other view images, are used to identify the moving state of the car. Finally, the geometry and texture information corresponding to the moving car is processed according to its moving state. Experiments on three different urban datasets demonstrate that the proposed method is effective in recognizing and removing moving cars and can repair the geometric deformation and erroneous texture mapping problems caused by moving cars. In addition, the methods proposed in this paper can be applied to eliminate other moving objects in 3D modelling applications. Full article
(This article belongs to the Special Issue Urban Multi-Category Object Detection Using Aerial Images)
Show Figures

Figure 1
The influence of moving cars on the 3D reconstruction results, in which (a) displays the mesh deformation and (b) displays the texture distortion.
Figure 2
A comparison of the results before and after the optimization of the moving car regions ((a) was obtained using the traditional reconstruction method, and (b) was obtained with our method).
Figure 3
Pipelines of the multiview 3D reconstruction method that integrates moving car recognition and removal.
Figure 4
A schematic diagram of car information recognition based on the Mask R-CNN model.
Figure 5
The recognition results of car information.
Figure 6
Classification diagrams of different car states, in which (a) illustrates cars in the moving state, (b) and (c) illustrate cars that are temporarily stationary, and (d) illustrates stationary cars.
Figure 7
CMS identification using multiview images.
Figure 8
Loss values of model training.
Figure 9
Car information detection results under different shooting conditions, in which (a) is the detection result under sunny and ortho-shooting conditions, (b) is the detection result under cloudy and ortho-shooting conditions, and (c) shows the detection results under sunny and oblique-shooting conditions.
Figure 10
The car state recognition diagram of the viaduct scene, where (a) is the original image, (b) is the geometric mesh of the scene, (c) shows the different textures in the multiview images corresponding to the red frame area in (a), (d) shows the detection results of car information in (a), and (e) shows the recognition results of the car states in (d).
Figure 11
The car state recognition diagram of a public parking lot, in which (a) is the original image, (b) is the geometric mesh of the scene, (c) shows the state recognition results of the cars detected in (a), and (d) shows the geometric mesh cleaning results for the car areas where a temporary stay occurred.
Figure 12
The car state recognition diagram of a traffic light intersection area, in which (a) is the original image, (b) shows the detection results of the car information in (a), (c) is the geometric mesh of the scene, and (d) shows the state recognition results of the cars detected in (b).
Figure 13
A comparison of the moving car removal results in urban viaduct scenes, where (a) is the scene model constructed using the traditional method and (b) is the 3D reconstruction result using the method in this article.
Figure 14
A comparison of the removal results of moving cars in the parking lot area, where (a) is the 3D model constructed using the traditional method and (b) is the 3D reconstruction result after the moving cars are removed using the method in this paper.
Figure 15
A comparison of the removal results of moving cars in the intersection area, where (a) is the scene model constructed using the traditional method and (b) is the 3D reconstruction result using the method in this article.
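">
To make the detection step of this pipeline concrete, the sketch below runs an off-the-shelf Mask R-CNN from torchvision and keeps only detections labelled as cars. It is a stand-in for the paper's detector, which is trained on the authors' own data; the label index, score threshold and input tile are assumptions.

```python
import torch
import torchvision

# Off-the-shelf Mask R-CNN as a stand-in for the paper's detector; the authors
# train on their own oblique-image data, which is skipped here.
# Assumes torchvision >= 0.13 for the `weights` argument.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

CAR_LABEL = 3  # index of 'car' in torchvision's COCO label map (assumption)

@torch.no_grad()
def detect_cars(image: torch.Tensor, score_threshold: float = 0.7):
    """Return bounding boxes and soft instance masks for car detections."""
    output = model([image])[0]
    keep = (output["labels"] == CAR_LABEL) & (output["scores"] > score_threshold)
    return output["boxes"][keep], output["masks"][keep]

image = torch.rand(3, 512, 512)  # placeholder oblique-image tile, values in [0, 1]
boxes, masks = detect_cars(image)
print(boxes.shape, masks.shape)
```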