Search Results (4,653)

Search Parameters:
Keywords = network representation

15 pages, 2022 KiB  
Article
Selective Auditory Attention Detection Using Combined Transformer and Convolutional Graph Neural Networks
by Masoud Geravanchizadeh, Amir Shaygan Asl and Sebelan Danishvar
Bioengineering 2024, 11(12), 1216; https://doi.org/10.3390/bioengineering11121216 - 30 Nov 2024
Abstract
Attention is one of many human cognitive functions that are essential in everyday life. Given our limited processing capacity, attention helps us focus only on what matters. Focusing attention on one speaker in an environment with many speakers is a critical ability of the human auditory system. This paper proposes a new end-to-end method based on a combined transformer and graph convolutional neural network (TraGCNN) that can effectively detect auditory attention from electroencephalograms (EEGs). This approach eliminates the need for manual feature extraction, which is often time-consuming and subjective. Here, the EEG signals are first converted to graphs. We then extract attention information from these graphs using spatial and temporal approaches. Finally, our models are trained with these data. Our model can detect auditory attention in both the spatial and temporal domains. The EEG input is first processed by transformer layers to obtain a sequential representation of the EEG based on attention onsets. Then, a family of graph convolutional layers is used to find the most active electrodes using the spatial positions of the electrodes. Finally, the corresponding EEG features of the active electrodes are fed into the graph attention layers to detect auditory attention. The Fuglsang 2020 dataset is used in the experiments to train and test the proposed and baseline systems. The new TraGCNN approach, compared with state-of-the-art attention classification methods from the literature, yields the highest accuracy (80.12%) as a classification metric. Additionally, the proposed model achieves higher performance than our previous graph-based model for different lengths of EEG segments. The new TraGCNN approach is advantageous because attention detection is achieved from the EEG signals of subjects without requiring speech stimuli, as is the case with conventional auditory attention detection methods. Furthermore, examining the proposed model for different lengths of EEG segments shows that the model is faster than our previous graph-based detection method in terms of computational complexity. The findings of this study have important implications for the understanding and assessment of auditory attention, which is crucial for many applications, such as brain–computer interface (BCI) systems, speech separation, and neuro-steered hearing aid development.
(This article belongs to the Section Biosignal Processing)
Figure 1. The block diagram of the proposed end-to-end TraGCNN-SAAD system for the detection of auditory attention. The system consists of modules for preprocessing and TraGCNN. Preprocessing aims to create a graph-based dataset. The new TraGCNN module consists of a transformer and a family of GCNNs for detecting auditory attention from the generated graph dataset.
Figure 2. The proposed TraGCNN model. The model consists of two distinct parts: transformer layers, which extract attentional information in the temporal domain, and a group of GraphSAGE, GCNN, and Graph Attention layers, which consider the positions of important electrodes for detecting attention in the spatial domain.
Figure 3. GraphSAGE operator. The key idea behind GraphSAGE is that the model learns how to aggregate feature information from a node's local neighborhood [44,46].
Figure 4. The plot of average loss with respect to epochs for the proposed model.
Figure 5. The performance of the proposed TraGCNN-SAAD and SAADconnectivity methods for different durations of EEG segments, obtained by averaging the accuracy values over 4 folds of cross-validation. Each fold is composed of 15 epochs.
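The abstract above describes a pipeline in which EEG electrodes become graph nodes before graph convolution. As a rough illustration of that idea (not the authors' code; the electrode coordinates, neighbour count, and feature dimensions below are assumptions), a k-nearest-neighbour electrode graph and one normalized GCN propagation step might look like this:

```python
# Minimal sketch: turning a multi-channel EEG window into a graph over
# electrodes and applying one GCN propagation step. All dimensions,
# coordinates, and weights are illustrative assumptions.
import numpy as np

def eeg_to_graph(positions: np.ndarray, k: int = 4) -> np.ndarray:
    """Connect each electrode to its k spatially nearest neighbours."""
    n = positions.shape[0]
    dists = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    adj = np.zeros((n, n))
    for i in range(n):
        nearest = np.argsort(dists[i])[1:k + 1]   # skip self (distance 0)
        adj[i, nearest] = 1.0
    return np.maximum(adj, adj.T)                  # symmetrize

def gcn_step(adj: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ x @ w, 0.0)

positions = np.random.rand(64, 3)      # 64 electrodes, 3D scalp coordinates
x = np.random.randn(64, 128)           # per-electrode temporal features
w = np.random.randn(128, 32) * 0.1     # learnable weights in a real model
h = gcn_step(eeg_to_graph(positions), x, w)   # (64, 32) node embeddings
```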
13 pages, 1229 KiB  
Article
Image Quality Assessment and Reliability Analysis of Artificial Intelligence-Based Tumor Classification of Stimulated Raman Histology of Tumor Biobank Samples
by Anna-Katharina Meißner, Tobias Blau, David Reinecke, Gina Fürtjes, Lili Leyer, Nina Müller, Niklas von Spreckelsen, Thomas Stehle, Abdulkader Al Shugri, Reinhard Büttner, Roland Goldbrunner, Marco Timmer and Volker Neuschmelting
Diagnostics 2024, 14(23), 2701; https://doi.org/10.3390/diagnostics14232701 - 30 Nov 2024
Abstract
Background: Stimulated Raman histology (SRH) is a label-free optical imaging method for rapid intraoperative analysis of fresh tissue samples. Analysis of SRH images using Convolutional Neural Networks (CNNs) has shown promising results for predicting the main histopathological classes of neurooncological tumors. Due to the relatively low number of rare tumor representations in CNN training datasets, valid prediction of rarer entities remains limited. To develop new reliable analysis tools, larger datasets and greater tumor variety are crucial. One way to accomplish this is through research biobanks storing frozen tumor tissue samples. However, no data are currently available regarding the suitability of previously frozen tissue samples for SRH analysis. The aim of this study was to assess image quality and perform a comparative reliability analysis of artificial intelligence-based tumor classification using SRH in fresh and frozen tissue samples. Methods: In a monocentric prospective study, tissue samples from 25 patients undergoing brain tumor resection were obtained. SRH was acquired in fresh and defrosted samples of the same specimen after varying storage durations at −80 °C. Image quality was rated by an experienced neuropathologist, and prediction of the histopathological diagnosis was performed using two established CNNs. Results: The image quality of SRH in fresh and defrosted tissue samples was high, with a mean image quality score of 1.96 (range 1–5) for both groups. CNN analysis showed high internal consistency for histopathological (Cα 0.95) and molecular pathological (Cα 0.83) tumor classification. The results were confirmed using a dataset with samples from the local tumor biobank (Cα 0.91 and 0.53). Conclusions: Our results show that SRH is comparably reliable in fresh and frozen tissue samples, enabling the integration of tumor biobank specimens to potentially improve the diagnostic range and reliability of CNN prediction tools.
(This article belongs to the Special Issue Artificial Intelligence in Pathological Image Analysis—2nd Edition)
Figure 1. Workflow of the test–retest analysis of the patient dataset. A small (3–4 mm) tissue sample (sample #1) was collected during surgery and immediately processed for SRH imaging. The fresh squash preparation was scanned in the SRH microscope (scan 1, sample #1) and frozen at −80 °C afterwards. After varying time intervals, the sample was defrosted and scanned again in the SRH microscope (re-scan, sample #1). All SRH images from fresh and frozen samples were assessed for image quality and occurrence of freezing artifacts by an experienced neuropathologist and analyzed by the CNNs. CNN: Convolutional Neural Network.
Figure 2. SRH images of fresh and thawed tissue samples. Upper row: SRH images of a meningioma (CNS WHO grade 1) ((A) scan 1, sample #1, fresh; (B) re-scan 1, sample #1, defrosted), showing typical histologic features, such as meningothelial whorls (green arrows). Middle row: SRH images of a pulmonary adenocarcinoma metastasis ((C) scan 1, sample #1, fresh; (D) re-scan 1, sample #1, defrosted), showing sheets of epithelial tumor cells. Lower row: SRH images of a glioblastoma, IDH wild type (CNS WHO grade 4) ((E) scan 1, sample #1, fresh; (F) re-scan 1, sample #1, defrosted), showing infiltration of fibrillary tumor.
Figure 3. Confusion matrix of the CNN-based histological entity differentiation (left) and diffuse adult-type glioma subclassification (right) in fresh and frozen tumor tissue samples from the same patient. Cα = Cronbach's alpha.
Figure 4. Confusion matrix of the CNN-based histological entity differentiation (left) and diffuse adult-type glioma subclassification (right) in tumor biobank samples comparing SRH images of fresh and frozen tumor samples. Cα = Cronbach's alpha.
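Since the reliability comparison above is reported as Cronbach's alpha (Cα), here is a minimal sketch of how that internal-consistency statistic is computed; the toy score matrix is fabricated purely for illustration:

```python
# Minimal sketch of Cronbach's alpha (Cα) for a (subjects x items) matrix,
# e.g. class scores for the same specimens scanned fresh vs. defrosted.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    item_vars = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    k = ratings.shape[1]
    return k / (k - 1) * (1.0 - item_vars / total_var)

scores = np.array([[4, 4], [2, 3], [5, 5], [1, 1], [3, 3]], dtype=float)
print(f"Cα = {cronbach_alpha(scores):.2f}")   # high value = consistent ratings
```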
40 pages, 4759 KiB  
Article
Grid-Coupled Geothermal and Decentralised Heat Supply Systems in a Holistic Open-Source Simulation Model for 5GDHC Networks
by Constantin Völzel and Stefan Lechner
Sustainability 2024, 16(23), 10503; https://doi.org/10.3390/su162310503 - 29 Nov 2024
Abstract
In order to reach climate protection goals at national or international levels, new forms of combined heating and cooling networks with ultra-low network temperatures (5GDHC) are viable alternatives to conventional heating networks. This paper presents a simulation library for 5GDHC networks as sustainable shared energy systems, developed in the object-oriented simulation framework OpenModelica. It comprises sub-models for residential buildings acting as prosumers in the network, with additional roof-mounted thermal systems, dynamic thermo-hydraulic representations of distribution pipes and storage, time-series-based sources for heating and cooling, and weather conditions adjustable to user-specified locations. A detailed insight into an in-house development of a sub-model for horizontal ground heat collectors is given. This sub-model is directly coupled with thermo-hydraulic network simulations. The simulation results of energy balances and energetic efficiencies for an example district are described. Findings from this study show that decentralised roof-mounted solar thermal systems coupled to the network can contribute 21% to the total source heat provided in the network while annual thermal gains from the distribution pipes add up to more than 18% within the described settings. The presented simulation library can support conceptual and advanced planning phases for renewable heating and cooling supply structures based on environmental sources.
(This article belongs to the Section Energy Sustainability)
Figure 1. Depiction of the abstraction level from real pipe routing with bifilar windings to the computational domain used in the ground heat collector (GHC) simulation model in the present work.
Figure 2. Schematic of the computational domain in the GHC model (left) and example visualisation of the 2- and 1-dimensional discretisations of the soil regime in the computational domain (right). Collector pipe diameter and installation depth are not drawn to scale.
Figure 3. Implementation of 2-dimensional heat conduction between collector pipe and surrounding soil cells in the GHC model in Modelica. The icon representing the GHC model in Modelica is shown on the bottom left side. The different zones of the soil regime surrounding the collector pipe are visible on the upper left side.
Figure 4. Modelica implementation of the described dynamic thermal pipe model. Ring elements around the pipe outer wall constitute the thermal capacities and resistances of the surrounding soil.
Figure 5. Depiction of an example prosumer model equipped with a roof-mounted solar thermal (ST) system and photovoltaic (PV) system, storage for heating water, and domestic hot water (DHW).
Figure 6. Depiction of an example prosumer model equipped with a roof-mounted photovoltaic thermal (PVT) system, floor heating system, and storage for DHW.
Figure 7. Simulated district for the presented case studies, consisting of four different prosumer models comprising a number of aggregated buildings, a supermarket providing excess heat from process cooling, and a GHC as central heat source.
Figure 8. Temporal course of the input of water content saturation for the soil types in the GHC model, varying with depth. Data for the reference soil conditions and for the dry soil conditions are displayed.
Figure 9. Hourly course of the multi-day average as reference ambient air temperature for the reference TRY dataset and an alternative dataset with a cold winter period. The time span covers two cold winter periods, up to the end of the third simulated year.
Figure 10. Results for annual performance of decentralised heat pumps, SCOP_HP, and resulting net thermal extraction from the GHC in case study 1. Data for net thermal extraction feature identical line styles and markers as the corresponding SCOP data. Data for settings with and without activated free cooling (FC), for reference and dry soil conditions, and for reference and alternative TRY datasets are displayed.
Figure 11. Hourly course of ambient air temperature, the supply temperature of the GHC to the warm network line and its return temperature from the cold network line, and the warm line temperature of the two prosumer models SFH_001 and MFH_001.
Figure 12. Monthly distribution of thermal energy fed into the network by different heat sources. Depiction of charging, discharging, and net energy transfer to the GHC, and thermal yield from decentralised ST and PVT systems and from cooling operations fed into the network.
Figure 13. Course of monthly demands for room heating and DHW production as well as free cooling demand of prosumer buildings in the case study. Juxtaposition of SCOP values of single-prosumer models for heating operation (HP) and for combined heating and free cooling operation (sys).
Figure 14. Annual district-wide energy balance from simulation results of the case study. Percentages outside brackets refer to total heating demand. Proportionate contributions to source heat of decentralised heat pumps are reported separately.
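The library itself is written in Modelica, but the core of a ground heat collector sub-model, transient heat conduction in the soil around the pipe, can be sketched in a few lines of Python. This is a simplified 1D explicit finite-difference stand-in under assumed soil properties and boundary conditions, not the published sub-model:

```python
# Minimal sketch: explicit 1D finite-difference heat conduction in a soil
# column containing a ground heat collector pipe. Soil properties, grid,
# time step, and heat extraction rate are assumptions.
import numpy as np

lam, rho, cp = 1.5, 1800.0, 1200.0        # W/(m K), kg/m3, J/(kg K), assumed
alpha = lam / (rho * cp)                  # thermal diffusivity, m2/s
dz, dt = 0.1, 60.0                        # grid spacing (m), time step (s)
assert alpha * dt / dz**2 < 0.5           # explicit-scheme stability limit

T = np.full(30, 283.15)                   # 3 m soil column at 10 °C
T_surface, q_pipe = 275.15, -20.0         # cold surface; pipe extracts heat (W/m2)
pipe_idx = 15                             # collector installation depth cell

for _ in range(24 * 60):                  # simulate one day
    lap = np.zeros_like(T)
    lap[1:-1] = (T[2:] - 2 * T[1:-1] + T[:-2]) / dz**2
    T += alpha * dt * lap
    T[0] = T_surface                      # Dirichlet boundary at the surface
    T[pipe_idx] += q_pipe * dt / (rho * cp * dz)  # heat sink at the pipe
    T[-1] = T[-2]                         # adiabatic lower boundary
```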
15 pages, 880 KiB  
Article
Node Classification Method Based on Hierarchical Hypergraph Neural Network
by Feng Xu, Wanyue Xiong, Zizhu Fan and Licheng Sun
Sensors 2024, 24(23), 7655; https://doi.org/10.3390/s24237655 - 29 Nov 2024
Abstract
Hypergraph neural networks have gained widespread attention due to their effectiveness in handling graph-structured data with complex relationships and multi-dimensional interactions. However, existing hypergraph neural network models mainly rely on planar message-passing mechanisms, which have limitations: (i) low efficiency in encoding long-distance information; (ii) underutilization of high-order neighborhood features, aggregating information only on the edges of the original graph. This paper proposes an innovative hierarchical hypergraph neural network (HCHG) to address these issues. The HCHG combines the high-order relationship-capturing capability of hypergraphs, uses the Louvain community detection algorithm to identify community structures within the network, and constructs hypergraphs layer by layer. In the bottom-level hypergraph, the model establishes high-order relationships through direct neighbor nodes, while in the top-level hypergraph, it captures global relationships between aggregated communities. Through three hierarchical message-passing mechanisms, the HCHG effectively integrates local and global information, enhancing the multi-resolution expressiveness of node representations and significantly improving performance in node classification tasks. In addition, the model performs excellently in handling 3D multi-view datasets. Such datasets can be created by capturing 3D shapes and geometric features through sensors or by manual modeling, providing extensive application scenarios for analyzing three-dimensional shapes and complex geometric structures. Theoretical analysis and experimental results show that the HCHG outperforms traditional hypergraph neural networks on complex networks.
(This article belongs to the Special Issue Deep Learning for Perception and Recognition: Method and Applications)
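As a rough illustration of the hierarchical construction described above (not the HCHG implementation), one can detect communities with Louvain and treat node neighbourhoods and communities as bottom- and top-level hyperedges. This sketch assumes networkx 3.x and uses a toy graph:

```python
# Minimal sketch: Louvain communities as top-level hyperedges and direct
# neighbourhoods as bottom-level hyperedges. Requires networkx >= 3.0.
import networkx as nx

G = nx.karate_club_graph()                       # stand-in for a real network
communities = nx.community.louvain_communities(G, seed=42)

# Bottom level: one hyperedge per node neighbourhood (direct neighbours).
bottom_hyperedges = [set(G[n]) | {n} for n in G.nodes]

# Top level: one hyperedge per detected community (aggregated nodes).
top_hyperedges = [set(c) for c in communities]

print(f"{len(bottom_hyperedges)} bottom-level, "
      f"{len(top_hyperedges)} top-level hyperedges")
```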
22 pages, 5240 KiB  
Article
MMPW-Net: Detection of Tiny Objects in Aerial Imagery Using Mixed Minimum Point-Wasserstein Distance
by Nan Su, Zilong Zhao, Yiming Yan, Jinpeng Wang, Wanxuan Lu, Hongbo Cui, Yunfei Qu, Shou Feng and Chunhui Zhao
Remote Sens. 2024, 16(23), 4485; https://doi.org/10.3390/rs16234485 - 29 Nov 2024
Abstract
The detection of distant tiny objects in aerial imagery plays a pivotal role in early warning, localization, and recognition tasks. However, due to the scarcity of appearance information, minimal pixel representation, susceptibility to blending with the background, and the incompatibility of conventional metrics, the rapid and accurate detection of tiny objects poses significant challenges. To address these issues, a single-stage tiny object detector tailored for aerial imagery is proposed, comprising two primary components. Firstly, we introduce a light backbone-heavy neck architecture, named the Global Context Self-Attention and Dense Nested Connection Feature Extraction Network (GC-DN Network), which efficiently extracts and fuses multi-scale features of the target. Secondly, we propose a novel metric, MMPW, to replace the Intersection over Union (IoU) in label assignment strategies, Non-Maximum Suppression (NMS), and regression loss functions. Specifically, MMPW models bounding boxes as 2D Gaussian distributions and utilizes the Mixed Minimum Point-Wasserstein Distance to quantify the similarity between boxes. Experiments conducted on the latest aerial image tiny object datasets, AI-TOD and VisDrone-19, demonstrate that our method improves AP50 performance by 9.4% and 5%, respectively, and AP performance by 4.3% and 3.6%. This validates the efficacy of our approach for detecting tiny objects in aerial imagery.
(This article belongs to the Section Remote Sensing Image Processing)
Figure 1. Limitations of tiny object detection.
Figure 2. The overall structure diagram of MMPW-Net.
Figure 3. Structure diagram of the GC module.
Figure 4. Structure diagram of the global context block.
Figure 5. Illustration of the local structure of DN-Neck.
Figure 6. Illustration of bounding box distance measurement.
Figure 7. Visualization of detection results from the AI-TOD dataset.
Figure 8. Visualization of detection results from the VisDrone-19 dataset.
Figure 9. Aerial imagery covering tiny-scale and general-scale targets.
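The exact MMPW mixing is not spelled out in the abstract, but the underlying idea of comparing boxes as 2D Gaussians has a simple closed form when each box (cx, cy, w, h) is modeled as N([cx, cy], diag(w²/4, h²/4)). A hedged sketch of that Gaussian-Wasserstein similarity follows; the scale constant c is an assumption, not the paper's value:

```python
# Minimal sketch of the Gaussian-Wasserstein idea behind MMPW-style metrics:
# for diagonal 2D Gaussians built from boxes, the squared 2-Wasserstein
# distance reduces to a centre term plus a shape term.
import math

def wasserstein2_sq(box_a, box_b) -> float:
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    center_term = (cxa - cxb) ** 2 + (cya - cyb) ** 2
    shape_term = (wa - wb) ** 2 / 4 + (ha - hb) ** 2 / 4
    return center_term + shape_term

def gaussian_similarity(box_a, box_b, c: float = 12.8) -> float:
    """Map the distance to (0, 1], analogous to replacing IoU;
    c is a dataset-dependent scale (an assumption here)."""
    return math.exp(-math.sqrt(wasserstein2_sq(box_a, box_b)) / c)

print(gaussian_similarity((10, 10, 4, 4), (11, 10, 4, 6)))
```

Unlike IoU, this similarity stays informative even when two tiny boxes do not overlap at all, which is the failure mode the abstract attributes to conventional metrics.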
26 pages, 4034 KiB  
Article
Semi-Supervised Deep Subspace Embedding for Binary Classification of Sella Turcica
by Kaushlesh Singh Shakya, Azadeh Alavi, Julie Porteous, Priti Khatri, Amit Laddi, Manojkumar Jaiswal and Vinay Kumar
Appl. Sci. 2024, 14(23), 11154; https://doi.org/10.3390/app142311154 - 29 Nov 2024
Abstract
In orthodontics, the manual tracing of cephalometric radiographs is a common practice, where the Sella Turcica (ST) serves as a reference point. The radiologist often manually traces the outline of the sella using manual tools (e.g., calipers on radiographs). The inherent complexity and variability in the shapes of the sella and the lack of advanced assessment tools make the classification of the sella challenging, as it requires extensive training, skills, time, and manpower to detect subtle changes that often may not be apparent. Moreover, existing semi-supervised learning (SSL) methods face key limitations such as shift invariance, inadequate feature representation, overfitting on small datasets, and a lack of generalization to unseen variations in ST morphology. Medical imaging data are often unlabeled, limiting the training of automated classification systems for ST morphology. To address these limitations, a novel semi-supervised deep subspace embedding (SSLDSE) framework is proposed. This approach integrates real-time stochastic augmentation to significantly expand the training dataset and introduce natural variability in the ST morphology, overcoming the constraints of small and non-representative datasets. Non-linear features are extracted and mapped to a non-linear subspace using Kullback–Leibler divergence, which ensures that the model remains consistent despite image transformations, thus resolving issues related to shift invariance. Additionally, fine-tuning the Inception-ResNet-v2 network on these enriched features reduces retraining costs when new unlabeled data become available. t-distributed stochastic neighbor embedding (t-SNE) is employed for effective feature representation through manifold learning, capturing complex patterns that previous methods might miss. Finally, a zero-shot classifier is utilized to accurately categorize the ST, addressing the challenge of classifying new or unseen variations. Further, the proposed SSLDSE framework is evaluated through comparative analysis with existing methods (Active SSL, GAN SSL, Contrastive SSL, Modified Inception-ResNet-v2) for ST classification using various evaluation metrics. The SSLDSE and the existing methods are trained on our dataset (sourced from PGI Chandigarh, India), and a blind test is conducted on the benchmark dataset (IEEE ISBI 2015). The proposed method improves classification accuracy by 15% compared to state-of-the-art models and reduces retraining costs.
(This article belongs to the Special Issue Application of Artificial Intelligence in Biomedical Informatics)
Figure 1. Sample images of pre-defined Sella Turcica (ST) shapes: (A) Oval ST, (B) Circular ST, (C) Flat ST, and (D) Bridging ST. This study classified Circular ST as non-bridging, and Bridging ST was used for binary classification.
Figure 2. Schematic representation of a Hybrid Database (L) and Hybrid Case Base (SL) from labeled (SL_i) and unlabeled (SL_î) case data. Feature extraction using KL divergence, mean (μ), and standard deviation (σ) is applied to both databases. Labeled data form a featured database with labels (L_i), while unlabeled data create a featured database without labels (L_î). A dynamic responsive data and label mechanism integrates both, resulting in (1) a Hybrid Database (L) and (2) a Hybrid Case Base (SL) for further analysis.
Figure 3. Process flow diagram of the proposed SSLDSE framework.
Figure 4. Comprehensive framework of the proposed SSLDSE that integrates labeled (SL_i) and unlabeled (SL_î) case databases. Features are extracted using Kullback–Leibler divergence, mean (μ), and standard deviation (σ), forming a Hybrid Database. The data undergo stochastic augmentation and are processed through an Inception-ResNet-v2 model. A deep subspace descriptor with t-SNE refines the feature representations, and the outputs are classified by a zero-shot classifier (ZsC) with KL divergence loss, enabling the model to handle unseen or unlabeled ST structures.
Figure 5. The SSLDSE architectural framework processes labeled (L) and semi-labeled (SL_i) data using Inception-ResNet-v2 as the CNN backbone to extract features (P, Q) and estimate pairwise probability densities (P_{i,j}, Q_{i,j}). KL divergence (D_KL(P‖Q)) minimizes divergence through the optimization of (Y). Manifold learning maps feature matrices (X) to t-SNE representations (Y) while preserving structural relationships. The SSL framework employs deep embedding and clustering (mean: μ_j, covariance: Σ_j) for feature representation. The zero-shot classifier constructs semantic vectors and applies KL divergence loss for output prediction (O_i).
Figure 6. Confusion matrix and ROC curve showcasing the validation results of the binary classifier, highlighting the proposed model's classification performance through true positive/negative rates and the AUC-ROC score.
Figure 7. t-SNE plots visualizing the quantitative assessment of the proposed SSLDSE method. The plots illustrate the effective separation between bridging and non-bridging labels from our proprietary and IEEE ISBI 2015 datasets, demonstrating a clear class distinction.
Figure 8. Boxplots illustrating a detailed comparison of the classification error rates for the proposed SSLDSE method, showing the distribution of error rates across (a) the proprietary dataset and (b) the IEEE ISBI 2015 dataset, highlighting the variability and consistency in classification accuracy.
Figure 9. Boxplots comparing classification error rates across the utilized SSL approaches and the proposed SSLDSE method, illustrating performance differences and the effectiveness of SSLDSE in reducing classification errors.
Figure 10. Visual interpretation of errors in ST binary classification predictions, illustrating misclassified instances and highlighting the areas where the model's predictions diverge from the true labels.
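One ingredient the abstract emphasizes is keeping the model consistent under stochastic augmentation via Kullback–Leibler divergence. A minimal PyTorch sketch of such a KL consistency loss follows; the shapes and the two-class setup are assumptions, not the authors' exact formulation:

```python
# Minimal sketch: a KL-divergence consistency loss that keeps predictive
# distributions stable across stochastic augmentations of the same image.
import torch
import torch.nn.functional as F

def kl_consistency_loss(logits_orig: torch.Tensor,
                        logits_aug: torch.Tensor) -> torch.Tensor:
    """D_KL(P || Q) between predictions on original and augmented views."""
    log_q = F.log_softmax(logits_aug, dim=-1)
    p = F.softmax(logits_orig, dim=-1)
    return F.kl_div(log_q, p, reduction="batchmean")

logits_orig = torch.randn(8, 2)   # binary bridging / non-bridging outputs
logits_aug = logits_orig + 0.1 * torch.randn(8, 2)
print(kl_consistency_loss(logits_orig, logits_aug))
```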
20 pages, 13364 KiB  
Article
A Novel Method for Fault Migration Diagnosis of Rolling Bearings Based on MSCVIT Model
by Xiu-Yan Liu, Dong-Lin He, Dong-Qing Guo and Ting-Ting Guo
Electronics 2024, 13(23), 4726; https://doi.org/10.3390/electronics13234726 - 29 Nov 2024
Abstract
The normal operation of rolling bearings is crucial to the performance and reliability of rotating machinery. However, the collected vibration signals are often mixed with complex noise, and the transformer network cannot fully extract the characteristics of the vibration signals. To solve this problem, we propose a data preprocessing method that utilizes singular value decomposition (SVD) and continuous wavelet transform (CWT) along with an improved vision transformer (ViT) model for fault diagnosis. First, the SVD is applied to identify the noise components to improve the data quality. Then, the CWT is used to convert the denoised signal into a two-dimensional (2D) time–frequency representation (TFR) to display the fault features more intuitively. Finally, an improved multi-scale convolutional block attention module (MSCBAM) is embedded into the ViT network to extract fault features. Experimental results on the classical Case Western Reserve University (CWRU) dataset show that the average diagnostic accuracy of the proposed method is 99.3%. Compared with six other fault diagnosis methods, the method proposed in this paper has also achieved good diagnostic results on three other datasets, so it can be effectively applied to the timely handling of problematic equipment, reducing downtime.
Figure 1. The structure of the CBAM.
Figure 2. The overall framework of the proposed method ("*" denotes the positional embedding).
Figure 3. Energy threshold curve of singular values.
Figure 4. (a) Original data; (b) signal comparison diagram before and after noise reduction.
Figure 5. The TFRs of rolling bearings: (a–c) ball fault; (d–f) inner ring fault; (g–i) outer ring fault; (j) normal bearing.
Figure 6. (a) Loss curve; (b) accuracy curve.
Figure 7. Confusion matrix diagram of ten kinds of faults.
Figure 8. Diagnostic accuracy with different SNRs.
Figure 9. Column diagram of model migration ability.
Figure 10. t-SNE dimensionality reduction visualization: (a) acceleration; (b) acceleration then deceleration; (c) deceleration then acceleration; (d) deceleration.
Figure 11. The diagnostic effects of the seven models on three datasets.
Figure 12. Model migration results on the three datasets: the JNU dataset (a,b); the CU dataset (c,d); the OU dataset (e,f). (a,c,e) belong to the MSCVIT; (b,d,f) belong to the ViT.
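The SVD-plus-CWT preprocessing described above can be sketched as follows. The embedding length, retained rank, wavelet, and sampling rate are assumptions rather than the paper's settings; PyWavelets provides the CWT:

```python
# Minimal sketch: Hankel-matrix SVD denoising of a vibration signal,
# followed by a CWT to obtain the 2D time-frequency representation (TFR).
import numpy as np
import pywt

def svd_denoise(signal: np.ndarray, embed: int = 64, rank: int = 8) -> np.ndarray:
    """Embed the signal in a Hankel matrix, keep the top singular
    components, and reconstruct by anti-diagonal averaging."""
    n = len(signal) - embed + 1
    H = np.lib.stride_tricks.sliding_window_view(signal, embed)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    out, counts = np.zeros(len(signal)), np.zeros(len(signal))
    for i in range(n):                       # average over anti-diagonals
        out[i:i + embed] += H_low[i]
        counts[i:i + embed] += 1
    return out / counts

fs = 12_000                                  # sampling rate (Hz), assumed
t = np.arange(fs) / fs                       # one second of signal
raw = np.sin(2 * np.pi * 157 * t) + 0.5 * np.random.randn(fs)
clean = svd_denoise(raw)
tfr, freqs = pywt.cwt(clean, np.arange(1, 65), "morl", sampling_period=1 / fs)
# tfr is the 2D TFR that would be fed to the ViT-style classifier
```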
17 pages, 2093 KiB  
Article
Investigation of Data Augmentation Techniques in Environmental Sound Recognition
by Anastasios Loukas Sarris, Nikolaos Vryzas, Lazaros Vrysis and Charalampos Dimoulas
Electronics 2024, 13(23), 4719; https://doi.org/10.3390/electronics13234719 - 28 Nov 2024
Abstract
The majority of sound events that occur in everyday life, like those caused by animals or household devices, can be included in the environmental sound family. This audio category has not been researched as much as music or speech recognition. One main bottleneck in the design of environmental data-driven monitoring automation is the lack of sufficient data representing each of a wide range of categories. In the context of audio data, an important method to increase the available data is the augmentation of existing datasets. In this study, some of the most widespread time-domain data augmentation techniques are studied, along with their effects on the recognition of environmental sounds, through the UrbanSound8K dataset, which consists of ten classes. The confusion matrix and the metrics that can be calculated from it were used to examine the effect of the augmentation. Also, to address the difficulty that arises when large datasets are augmented, a web-based data augmentation application was created. To evaluate the performance of the data augmentation techniques, a convolutional neural network architecture trained on the original set was used, and four time-domain augmentation techniques were applied. Although the parameters of the techniques were chosen conservatively, they helped the model to better cluster the data, especially in the four classes in which confusion was high in the initial classification. Furthermore, a web application is presented in which the user can upload their own data and apply these data augmentation techniques to both the audio extract and its time-frequency representation, the spectrogram.
(This article belongs to the Special Issue Recent Advances in Audio, Speech and Music Processing and Analysis)
Figure 1. Process block diagram of the experiment.
Figure 2. Block diagram of the proposed CNN.
Figure 3. Block diagram presenting the pipeline of the data augmentation web application.
Figure 4. Data augmentation routine implementation, referring to random gain addition.
Figure 5. Class distribution of the UrbanSound8K dataset.
Figure 6. Confusion matrix of the original classification on the test set.
Figure 7. Confusion matrices from classification of the augmented test sets. (a) Random gain addition following a uniform distribution. (b) Time stretch by a factor of 2 with steady pitch. (c) Pitch shift by a semitone with steady duration. (d) Dynamic range compression by 5 dB.
Figure A1. Home page of the web application.
Figure A2. Random Gain Augmentation page. The user can upload only .wav sound files.
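The four time-domain augmentations studied above can be sketched with librosa (version ≥ 0.10, whose effects API is keyword-only). The gain range, stretch factor, semitone shift, and compression settings mirror the conservative choices mentioned in the abstract but are otherwise illustrative:

```python
# Minimal sketch of four time-domain audio augmentations.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))   # stand-in for an UrbanSound8K clip

# 1. Random gain following a uniform distribution
gained = y * np.random.uniform(0.5, 1.5)

# 2. Time stretch by a factor of 2 with steady pitch
stretched = librosa.effects.time_stretch(y, rate=2.0)

# 3. Pitch shift by one semitone with steady duration
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=1)

# 4. Simple dynamic range compression: attenuate samples above a threshold
threshold, ratio = 0.1, 4.0
over = np.abs(y) > threshold
compressed = y.copy()
compressed[over] = np.sign(y[over]) * (
    threshold + (np.abs(y[over]) - threshold) / ratio)
```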
17 pages, 1555 KiB  
Article
Revolutionizing Wildfire Detection Through UAV-Driven Fire Monitoring with a Transformer-Based Approach
by Shakhnoza Muksimova, Sabina Umirzakova, Sevara Mardieva, Mirjamol Abdullaev and Young Im Cho
Fire 2024, 7(12), 443; https://doi.org/10.3390/fire7120443 - 28 Nov 2024
Abstract
The rapid detection and accurate localization of wildfires are critical for effective disaster management and response. This study proposes an innovative unmanned aerial vehicle (UAV)-based fire detection system leveraging a modified Miti-DETR model tailored to meet the computational constraints of drones. The enhanced architecture incorporates a redesigned AlexNet backbone with residual depthwise separable convolution blocks, significantly reducing computational load while improving feature extraction and accuracy. Furthermore, a novel residual self-attention mechanism addresses convergence issues in transformer networks, ensuring robust feature representation for complex aerial imagery. The model, which was trained on the FLAME dataset encompassing diverse fire scenarios, demonstrates superior performance in terms of Mean Average Precision (mAP) and Intersection over Union (IoU) metrics compared to existing systems. Its capability to detect and localize fires across varied backgrounds highlights its practical application in real-world scenarios. This advancement represents a pivotal step forward in applying deep learning for real-time wildfire detection, with implications for broader emergency management applications.
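As a rough sketch of the residual depthwise separable convolution blocks attributed to the redesigned backbone (an assumption-level illustration in PyTorch, not the Miti-DETR code):

```python
# Minimal sketch: a residual block built from a depthwise convolution
# (one filter per channel) followed by a pointwise 1x1 convolution,
# which is what makes the block cheap relative to a full 3x3 conv.
import torch
import torch.nn as nn

class ResidualDSConv(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Depthwise then pointwise convolution, with an identity shortcut
        out = self.bn(self.pointwise(self.depthwise(x)))
        return self.act(out + x)

block = ResidualDSConv(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```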
27 pages, 7119 KiB  
Article
MACNet: A Multidimensional Attention-Based Convolutional Neural Network for Lower-Limb Motor Imagery Classification
by Ling-Long Li, Guang-Zhong Cao, Yue-Peng Zhang, Wan-Chen Li and Fang Cui
Sensors 2024, 24(23), 7611; https://doi.org/10.3390/s24237611 - 28 Nov 2024
Abstract
Decoding lower-limb motor imagery (MI) is highly important in brain–computer interfaces (BCIs) and rehabilitation engineering. However, it is challenging to classify lower-limb MI from electroencephalogram (EEG) signals, because lower-limb motions (LLMs), including MI, have closely overlapping physiological representations in the human brain and generate low-quality EEG signals. To address this challenge, this paper proposes a multidimensional attention-based convolutional neural network (CNN), termed MACNet, which is specifically designed for lower-limb MI classification. MACNet integrates a temporal refining module and an attention-enhanced convolutional module by leveraging the local and global feature representation abilities of CNNs and attention mechanisms. The temporal refining module adaptively investigates critical information from each electrode channel to refine EEG signals along the temporal dimension. The attention-enhanced convolutional module extracts temporal and spatial features while refining the feature maps across the channel and spatial dimensions. Owing to the scarcity of public datasets available for lower-limb MI, a dedicated lower-limb MI dataset involving four routine LLMs is built, consisting of 10 subjects over 20 sessions. Comparison experiments and ablation studies are conducted on this dataset and a public BCI Competition IV 2a EEG dataset. The experimental results show that MACNet achieves state-of-the-art performance and outperforms alternative models for the subject-specific mode. Visualization analysis reveals the excellent feature learning capabilities of MACNet and the potential relationship between lower-limb MI and brain activity. The effectiveness and generalizability of MACNet are verified.
Figure 1. Map of functional areas of brain regions for motor tasks.
Figure 2. Overall architecture of the proposed model.
Figure 3. Experimental protocol (including the preparation phase, the stimulus phase of watching the ME video, the MI phase, and the rest phase).
Figure 4. Stimulus video presentation: (a) ME process from standing to sitting down; (b) ME process from sitting to standing; (c) ME process of walking; (d) ME process of standing.
Figure 5. The experimental platform.
Figure 6. Classification accuracy of the proposed model on the basis of two experiments.
Figure 7. (a) Ablation study of MACNet; (b) classification results of MACNet under channel reduction.
Figure 8. Average confusion matrices of the FBCSP, EEGNet, FBCNet, deep ConvNet, ATCNet, Conformer, proposed MACNet, MACNet-Branch 0, and MACNet-Branch I models.
Figure 9. t-SNE visualization of learned EEG features using different methods; different colored dots represent different categories.
Figure 10. Brain average PSD topography maps for MI tasks. Blue indicates desynchronization, while red indicates synchronization.
Figure 11. Classification accuracy of the baseline ConvNet model.
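The temporal refining module is described only at a high level, so the following PyTorch sketch is an assumption: a per-electrode gating mechanism that rescales the raw EEG along the temporal dimension, in the spirit of "adaptively investigates critical information from each electrode channel":

```python
# Minimal sketch (assumed, not the published MACNet code): learn one
# importance weight per electrode and rescale its whole time course.
import torch
import torch.nn as nn

class TemporalRefine(nn.Module):
    def __init__(self, n_channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(n_channels // reduction, n_channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, electrodes, time); summarize each electrode over time
        summary = x.mean(dim=-1)                 # (batch, electrodes)
        weights = self.gate(summary)             # per-electrode importance
        return x * weights.unsqueeze(-1)         # rescale along time

refine = TemporalRefine(n_channels=32)
print(refine(torch.randn(8, 32, 1000)).shape)   # torch.Size([8, 32, 1000])
```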
22 pages, 6504 KiB  
Article
A Business-Model-Driven Approach to Task-Planning Knowledge Graph Construction
by Tianguo Jin, Xiaoqian Liu, Bingxiang Zeng, Xinglong Chen and Dongliang Zhang
Appl. Sci. 2024, 14(23), 11090; https://doi.org/10.3390/app142311090 - 28 Nov 2024
Abstract
As the complexity of mission planning increases, relying on the subjective experience of planners is no longer sufficient to meet the needs of modern mission planning. Knowledge mapping, as a structured knowledge management technique, provides an effective solution for systematically integrating knowledge in the task-planning domain. The mission-planning business model is able to systematically capture and portray domain knowledge in mission planning through a formal representation of mission planning processes, rules, and constraints. Thus, it becomes an important source of knowledge for mission-planning knowledge mapping. This paper proposes a business-model-driven knowledge graph construction method for mission planning. First, the data-based business knowledge is structured and transformed into a business data mapping to complete the acquisition of business knowledge. Then, supported by conceptual business knowledge, the task-planning ontology network is constructed using the multidimensional task-planning ontology network representation method. Subsequently, a domain knowledge categorization algorithm based on Ullman subgraph matching is used to match the ontology network against the business data mapping and thereby categorize the task-planning domain knowledge. Finally, the generated task-planning domain knowledge graph is stored in the Neo4j graph database. In order to ensure the completeness of the knowledge graph, an adaptive adjustment method based on its actual effectiveness is conceived, which is able to detect and adjust the completeness of the knowledge graph. The effectiveness of the proposed methodology is validated by constructing a space-station mission-planning knowledge graph driven by a space-station mission-planning business model.
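For the subgraph-matching step, a hedged stand-in can be built with networkx, whose VF2-based GraphMatcher plays the role of the Ullman algorithm named above (the paper's own implementation is not reproduced); the node types and toy graphs are illustrative:

```python
# Minimal sketch: match an ontology pattern against a typed business data
# graph using networkx's VF2 matcher as a stand-in for Ullman matching.
import networkx as nx
from networkx.algorithms import isomorphism

# Business data mapping: instances typed by ontology concepts
data_graph = nx.Graph()
data_graph.add_nodes_from([
    ("m1", {"type": "Mission"}), ("r1", {"type": "Resource"}),
    ("c1", {"type": "Constraint"}), ("r2", {"type": "Resource"}),
])
data_graph.add_edges_from([("m1", "r1"), ("m1", "c1"), ("m1", "r2")])

# Ontology pattern: a mission that uses a resource under a constraint
pattern = nx.Graph()
pattern.add_nodes_from([
    ("M", {"type": "Mission"}), ("R", {"type": "Resource"}),
    ("C", {"type": "Constraint"}),
])
pattern.add_edges_from([("M", "R"), ("M", "C")])

gm = isomorphism.GraphMatcher(
    data_graph, pattern,
    node_match=isomorphism.categorical_node_match("type", None))
for mapping in gm.subgraph_isomorphisms_iter():
    print(mapping)   # data node -> ontology node; each match is categorized
```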
20 pages, 6279 KiB  
Article
Self-Activated Implicit Neural Representation for Synthetic Aperture Radar Images
by Dongshen Han and Chaoning Zhang
Remote Sens. 2024, 16(23), 4473; https://doi.org/10.3390/rs16234473 - 28 Nov 2024
Abstract
Image Implicit Neural Representations (INRs) adopt a neural network to learn a continuous function that maps pixel coordinates to their corresponding values. This task has gained significant attention for representing images in a continuous manner. Despite substantial progress on natural images, there has been little investigation of INRs for Synthetic Aperture Radar (SAR) images. This work makes a pioneering effort to study INRs for SAR images and finds that fine details are hard to represent. Prior works have shown that fine details are easier to learn when the model weights are well initialized, which motivated us to investigate the benefits of activating the model weights before target training. The challenge of this task lies in the fact that SAR images cannot be used during the model activation stage. To this end, we propose exploiting a cross-pixel relationship of the model output, which relies on no target images. Specifically, we design a novel self-activation method that alternates between two loss functions: one used to smooth the model output, and another used for the opposite purpose. Extensive results on SAR images empirically show that our proposed method improves model performance by a non-trivial margin.
(This article belongs to the Section Environmental Remote Sensing)
Figure 1. Overview of learning an INR for the image signal. The MLP receives pixel coordinates as input and outputs the corresponding pixel values, with weight training guided by the MSE loss function.
Figure 2. Pipeline of the proposed method, comprising two stages: the self-activation stage and the training stage. The self-activation stage is performed before the training stage without using any information related to the target. Instead, it leverages smoothness and anti-smoothness losses to exploit the cross-pixel relationships within the model's own output, thereby pre-activating the model.
Figure 3. Visualization of the INR model SIREN's initial output (left) and the corresponding gradient maps (right). We observe that the initial output image of SIREN, which has not been trained, is not flattened but possesses a very complex texture.
Figure 4. Visualization of the NTK for the INR model SIREN with and without self-activation. After our self-activation stage, the enhanced diagonal characteristics of the model's NTK indicate that the model is better at capturing high-frequency components.
Figure 5. Qualitative evaluation of SAR image reconstruction. The first column displays the ground-truth image. Our self-activation method demonstrates enhanced capability in capturing high-frequency details.
Figure 6. Qualitative evaluation of high-resolution SAR image reconstruction. The first column displays the ground-truth image, and a red box highlights and zooms in on the reconstruction details for better observation. Our self-activation method demonstrates enhanced capability in capturing high-frequency details.
Figure 7. Model performance when trained with different loss functions during the training stage. Each subfigure illustrates the evolution of the PSNR and SSIM metrics throughout the training stage.
Figure 8. Model performance with different numbers of self-activation stage epochs. Each subfigure illustrates the evolution of the PSNR and SSIM metrics throughout the training stage.
Figure 9. The model output image gradient curve over epochs during our self-activation stage. The directions denote whether the loss component is anti-smoothness or smoothness.
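Two pieces of the method can be sketched concretely: a SIREN-style sine-activated MLP, and the alternating smoothness/anti-smoothness losses applied during the target-free self-activation stage. Using total variation (and its negation) for those losses is an assumption, as are the layer sizes and step counts:

```python
# Minimal sketch: SIREN-style MLP plus alternating smooth/anti-smooth
# self-activation that never touches the target SAR image.
import torch
import torch.nn as nn

class Sine(nn.Module):
    def __init__(self, w0: float = 30.0):
        super().__init__()
        self.w0 = w0
    def forward(self, x):
        return torch.sin(self.w0 * x)

siren = nn.Sequential(nn.Linear(2, 256), Sine(),
                      nn.Linear(256, 256), Sine(),
                      nn.Linear(256, 1))

def tv_loss(img: torch.Tensor) -> torch.Tensor:
    """Total variation of a (H, W) output: small = smooth."""
    return ((img[1:, :] - img[:-1, :]).abs().mean()
            + (img[:, 1:] - img[:, :-1]).abs().mean())

H = W = 64
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)

opt = torch.optim.Adam(siren.parameters(), lr=1e-4)
for step in range(200):           # self-activation: no SAR image is used
    out = siren(coords).reshape(H, W)
    # alternate: even steps smooth the output, odd steps anti-smooth it
    loss = tv_loss(out) if step % 2 == 0 else -tv_loss(out)
    opt.zero_grad()
    loss.backward()
    opt.step()
```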
16 pages, 6723 KiB  
Article
Improvement of YOLO v8 Segmentation Algorithm and Its Study in the Identification of Hazards in Plateau Pika
by Yaosheng Han, Yunpeng Jin, Chunmei Li and Xiangjie Huang
Appl. Sci. 2024, 14(23), 11088; https://doi.org/10.3390/app142311088 - 28 Nov 2024
Abstract
Rodent infestation has become one of the important factors in grassland degradation on the Qinghai–Tibet Plateau, one of the hindrances to ecological and environmental protection, and a threat to the balance and development of the ecosystem in the Sanjiangyuan region. Based on the need for scientific planning for ecological protection, this paper designs a method for detecting rodent infestation in plateau scenarios. Firstly, data were collected and annotated, and a dataset of plateau rodent distribution in the Qinghai region was constructed. The collected data include videos captured through drone-based field surveys, which were processed using OpenCV and annotated with LabelMe. The dataset is categorized into four specific types: ungobbled rat holes, gobbled rat holes, rocks, and cow dung. This categorization allows the model to effectively differentiate between rodent-related features and other environmental elements, which is crucial for the segmentation task. Secondly, the latest segmentation algorithm provided by YOLO v8 is improved to design a segmentation algorithm that can accurately detect the distribution of rodent infestation in the plateau scene. The specific improvements are as follows: firstly, the Contextual Transformer module is introduced in YOLO v8 to improve the global modeling capability; secondly, DRConv dynamic region-aware convolution is introduced in YOLO v8 to improve the convolutional representation capability; thirdly, an attention mechanism is incorporated in the backbone of YOLO v8 to enhance the feature extraction capability of the network. A comparison test with the original algorithm on the plateau rodent distribution dataset showed that the new algorithm improved the detection accuracy from 77.9% to 82.74% and the MIoU from 67.65% to 72.69%. The accuracy of the evaluation of plateau rodent damage levels has been greatly improved.
Figure 1. Structure of the improved YOLO v8.
Figure 2. Contextual Transformer (CoT) block.
Figure 3. DRConv.
Figure 4. CBAM model.
Figure 5. Example of data.
Figure 6. Training loss curve.
Figure 7. Validation loss curve.
Figure 8. Precision and recall curves.
Figure 9. Comparison of algorithm predictions.
Figure 10. Low-hazard meadow scene segmentation.
Figure 11. High-hazard meadow scene segmentation.
Figure 12. Plant detection on Thermopsis lanceolata.
Figure 13. Plant detection on jjc.
Figure 14. Plant detection on zmc.
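The attention mechanism added to the backbone is CBAM-style (Figure 4 shows a CBAM model), so a generic CBAM sketch in PyTorch conveys the idea; this is the standard channel-then-spatial formulation, not the authors' exact code:

```python
# Minimal sketch: CBAM-style attention, i.e. channel attention from
# average/max-pooled descriptors followed by a 7x7 spatial attention map.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention: shared MLP over average- and max-pooled features
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: conv over channel-wise average and max maps
        attn = torch.cat([x.mean(dim=1, keepdim=True),
                          x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(attn))

print(CBAM(64)(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])
```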
19 pages, 2948 KiB  
Article
Residual-Based Implicit Neural Representation for Synthetic Aperture Radar Images
by Dongshen Han and Chaoning Zhang
Remote Sens. 2024, 16(23), 4471; https://doi.org/10.3390/rs16234471 - 28 Nov 2024
Abstract
Implicit neural representations (INRs) are a new way to represent all kinds of signals ranging from 1D audio to 3D shape signals, among which 2D images are the most widely explored due to their ubiquitous presence. Image INRs utilize a neural network to learn a continuous function that takes pixel coordinates as input and outputs the corresponding pixel values. The continuous representation of synthetic aperture radar (SAR) images using INRs has not yet been explored. Existing INR frameworks developed on natural images show reasonable performance, but this performance suffers when capturing fine details. This can be attributed to INR's prioritization of learning inter-pixel relationships, which harms intra-pixel mapping in those regions that require fine detail. To address this, we decompose the target image into an artificial uniform noise component (intra-pixel mapping) and a residual image (inter-pixel relationships). Rather than directly learning the INRs for the target image, we propose a noise-first residual learning (NRL) method. The NRL first learns the uniform noise component, then gradually incorporates the residual into the optimization target using a sine-adjusted incrementation scheme as training progresses. Given that some SAR images inherently contain significant noise, which can facilitate learning the intra-pixel independent mapping, we propose a gradient-based dataset separation method. This method distinguishes between clean and noisy images, allowing the model to learn directly from the noisy images. Extensive experimental results show that our method achieves competitive performance, indicating that learning the intra-pixel independent mapping first, followed by the inter-pixel relationship, can enhance model performance in learning INR for SAR images.
Show Figures

Figure 1: Framework for learning an INR for image signals. An MLP-based network maps pixel coordinates to pixel intensity values and is optimized by minimizing the MSE loss between the target pixel intensity values and their predicted ones.
Figure 2: Plot of the periodic activation function in SIREN.
Figure 3: Overview of our proposed method. The gradient-based dataset separation method distinguishes between noisy images and clean images; for clean images, we adopt NRL to learn their INRs, while noisy images are learned directly without additional operations.
Figure 4: Plot of the curve of k^γ versus e/E as e increases during the training process.
Figure 5: Visualization of the representation results on both clean and noisy images. The ground truth image is presented in the first column for reference. For each of the SIREN and NRL-SIREN methods, the top row displays the represented full image and the bottom row shows a detailed portion of the image.
Figure 6: Example of a clean image and a noisy image, along with their respective gradient maps.
Figure 7: Illustration showing the use of a threshold α to distinguish clean and noisy images.
Figure 8: Qualitative comparison of SAR image representation results. Our NRL method demonstrates competitive performance in capturing fine details.
Figure 9: Example results of image representation produced by various methods. The ground truth image is presented in the left-most column for reference. For each method, the top row displays the reconstructed full image, while the bottom row shows a detailed portion of the image.
Figure 10: Plot of the curves of k^γ versus e/E for various γ as e increases during the training process.
Figure 11: Performance analysis of models trained on the target image with varying numbers of epochs. Each subfigure illustrates the progression of the PSNR and SSIM metrics during training.
Figure 12: Qualitative comparison of error maps. SIREN fails to capture fine details such as the white regions, while our proposed method helps to mitigate this problem.
30 pages, 28873 KiB  
Article
MERGE: A Modal Equilibrium Relational Graph Framework for Multi-Modal Knowledge Graph Completion
by Yuying Shang, Kun Fu, Zequn Zhang, Li Jin, Zinan Liu, Shensi Wang and Shuchao Li
Sensors 2024, 24(23), 7605; https://doi.org/10.3390/s24237605 - 28 Nov 2024
Viewed by 239
Abstract
The multi-modal knowledge graph completion (MMKGC) task aims to automatically mine missing factual knowledge from existing multi-modal knowledge graphs (MMKGs), which is crucial for advancing cross-modal learning and reasoning. However, few methods consider the adverse effects caused by different kinds of missing modal information during model learning. To address these challenges, we propose a Modal Equilibrium Relational Graph framEwork, called MERGE. By constructing three modal-specific directed relational graph attention networks, MERGE can implicitly represent missing modal information for entities by aggregating the modal embeddings from neighboring nodes. Subsequently, a fusion approach based on low-rank tensor decomposition is adopted to align multiple modal features at both the explicit structural level and the implicit semantic level, utilizing the structural information inherent in the original knowledge graphs, which enhances the interpretability of the fused features. Furthermore, we introduce a novel interpolation re-ranking strategy to adjust the importance of modalities during inference while preserving the semantic integrity of each modality. The proposed framework has been validated on four publicly available datasets, and the experimental results demonstrate the effectiveness and robustness of our method on the MMKGC task. Full article
(This article belongs to the Section Intelligent Sensors)
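The interpolation re-ranking step can be pictured as a weighted blend of the three per-modality score vectors described in Figure 9 below. The Python sketch that follows assumes a convex combination in which the structural weight is 1 − β − γ; this closure, and all names in the snippet, are assumptions for illustration rather than the authors' exact formulation.

```python
import numpy as np

def interpolation_rerank(p_s, p_t, p_v, beta, gamma):
    """Interpolation re-ranking (illustrative sketch).

    Blends per-entity prediction scores from the structural (p_s),
    textual (p_t), and visual (p_v) modalities into one ranking score.
    Taking the structural weight as 1 - beta - gamma makes the three
    weights a convex combination; this closure is an assumption, not
    necessarily the authors' exact formulation.
    """
    alpha = 1.0 - beta - gamma
    scores = alpha * p_s + beta * p_t + gamma * p_v
    return np.argsort(-scores)  # entity indices, highest score first

# Usage: scores over five candidate tail entities for one (h, r, ?) query.
ranking = interpolation_rerank(
    np.array([0.2, 0.9, 0.1, 0.4, 0.3]),   # structural scores
    np.array([0.3, 0.5, 0.2, 0.8, 0.1]),   # textual scores
    np.array([0.1, 0.6, 0.3, 0.2, 0.7]),   # visual scores
    beta=0.3, gamma=0.2,
)
```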
Show Figures

Figure 1: An example of a multi-modal knowledge graph in the film domain. The visual information of the entity "Titanic" can be represented by the movie posters and stills, while its textual information can be represented by sentence-level descriptions. Each entity is equipped with text descriptions and multiple images.
Figure 2: Illustration of the difference between single-stream and multi-stream frameworks. (a) depicts the process of single-stream frameworks, which integrate diverse modal features into an ensemble embedding for the object entity. (b) depicts the process of multi-stream frameworks, which encode the information of each modality separately and integrate the multi-modal features through a self-defined interaction module.
Figure 3: The modality information distribution of the 14,951 entities in the FB15K dataset.
Figure 4: Graphical abstract of the article. The blue and green squares stand for limitations and causes, respectively, while the orange squares stand for the solutions of MERGE.
Figure 5: Illustration of tail entity prediction in the MMKGC task, which can be formulated as (h, r, ?).
Figure 6: The framework of MERGE for multi-modal knowledge graph completion.
Figure 7: Illustration of the visual filtering gate, which is proposed to select the typical representative image of each entity.
Figure 8: An example of a directed relational multi-modal knowledge subgraph. Each entity is equipped with three modalities and is connected by relational edges to form a directed graph.
Figure 9: Interpolation re-ranking prediction with ensemble. e_1, e_2, ..., e_n denote the indices of entities; p_s, p_t, and p_v denote the prediction scores of the structural, textual, and visual modalities; α, β, and γ control the trade-off among the three objectives.
Figure 10: Impact of hyper-parameters on the FB15k-237-IMG dataset. The vertical coordinate is the MRR metric, and the horizontal labels are embedding dimensions, attention heads, and factorization element numbers for subplots (a), (b), and (c), respectively.
Figure 11: The values of (β, γ) for the FB15k-237-IMG, WN18RR-IMG, DB15K-IMG, and MKG-W datasets. The best result for each dataset is highlighted in yellow.
Figure 12: The 3-dimensional spatial distribution of the three modal features of 1024 randomly selected entities in the FB15K-237-IMG dataset. (a) shows the spatial distribution of input embeddings, while (b) shows that of output embeddings. The blue, red, and green nodes stand for the visual, structural, and textual modalities, respectively.