

Search Results (910)

Search Parameters:
Keywords = neural language models

12 pages, 621 KiB  
Article
Creating and Validating a Ground Truth Dataset of Unified Modeling Language Diagrams Using Deep Learning Techniques
by Javier Torcal, Valentín Moreno, Juan Llorens and Ana Granados
Appl. Sci. 2024, 14(23), 10873; https://doi.org/10.3390/app142310873 - 24 Nov 2024
Viewed by 89
Abstract
UML (Unified Modeling Language) diagrams are graphical representations used in software engineering that play a vital role in the design and development of software systems and various engineering processes. Large, good-quality datasets of UML diagrams are essential for industry, research, and teaching; however, few exist in the literature, and those that do commonly contain duplicate elements, which can distort the evaluation of models trained on them. This paper addresses the challenge of creating a ground truth dataset of UML diagrams, using semi-automated inspection to remove duplicates and to ensure the correct labeling of every diagram in the dataset. In particular, a dataset of six UML diagram classes was assembled, comprising a total of 2626 images (426 activity diagrams, 636 class diagrams, 352 component diagrams, 357 deployment diagrams, 435 sequence diagrams, and 420 use case diagrams). Importantly, unlike other existing datasets, ours contains no duplicate elements, and all diagrams are correctly labeled. Our curated dataset is a valuable and unique resource for the research community, serving as a foundation for training and evaluating diverse artificial intelligence models. We demonstrate this by training and testing several deep learning models on our dataset, achieving highly satisfactory results compared to those reported in other works. Additionally, our experimental results highlight the potential of vision transformers for UML diagram classification, setting our approach apart from others that predominantly use convolutional neural networks for similar tasks.
Show Figures

Figure 1: Examples of duplicate diagrams found in the existing datasets (dataset from Shcherban et al. (I) [16], available at https://doi.org/10.5281/zenodo.4595956; dataset from Shcherban et al. (II) [17], available at https://doi.org/10.5281/zenodo.5141007; dataset from Tavares et al. [18], available at https://doi.org/10.5281/zenodo.5544378; all under the Creative Commons Attribution 4.0 International license). Although the file names differ within each pair ((a) 11421.jpg vs. (b) 14248.jpg in [16]; (c) uml_class_diagram_for_publicma_77qq.jpg vs. (d) uml_class_diagram_for_publicma_84qq.jpg in [17]; (e) Component_Diagram_39.jpg vs. (f) Component_Diagram_110.jpg in [18]), the UML diagrams themselves are identical.
Figure 2: Radar chart of the F1-score per class (STTL).
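The dataset curation above hinges on removing duplicate diagram images. This listing does not spell out the paper's semi-automated inspection pipeline, so the sketch below is only a hypothetical illustration: a perceptual "average hash" is one common way to flag near-duplicate images, and the function names and the bit threshold here are assumptions, not the authors' method.

```python
def average_hash(pixels):
    """Hash a grayscale image (list of rows of 0-255 ints) to a bit tuple:
    each pixel maps to 1 if it is brighter than the image mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(h1, h2):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def is_duplicate(img_a, img_b, max_bits=2):
    """Treat two images as duplicates if their hashes differ in few bits."""
    return hamming(average_hash(img_a), average_hash(img_b)) <= max_bits

# Two "diagrams" that differ only by slight brightness noise hash identically.
a = [[10, 10, 200, 200], [10, 10, 200, 200]]
b = [[12, 11, 198, 205], [9, 13, 201, 199]]
print(is_duplicate(a, b))  # True
```

Real pipelines would first downscale each image to a small fixed grid so that hashes are comparable across resolutions; that resizing step is omitted here for brevity.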
17 pages, 5372 KiB  
Article
Ecological Importance Evaluation and Ecological Function Zoning of Yanshan-Taihang Mountain Area of Hebei Province
by Pengtao Zhang, Qixuan Duan, Jie Dong, Lichao Piao and Zhaoyang Cui
Sustainability 2024, 16(23), 10233; https://doi.org/10.3390/su162310233 - 22 Nov 2024
Viewed by 277
Abstract
Ecological importance evaluation can clearly identify the ecological service functions and ecological values of a region. This paper takes the Yanshan-Taihang Mountain area in Hebei Province as the research area, utilizing 2020 land use data. With the help of various analytical models and GIS spatial analysis methods, we evaluate the importance of ecosystem services through water conservation, soil and water conservation, biodiversity, and carbon sequestration and oxygen release, and evaluate ecological sensitivity through soil and water loss sensitivity and land desertification sensitivity. On this basis, we identify the important areas for ecological protection in the study area, analyze their spatial change characteristics, and divide the leading ecological functions according to the results. The results show that the moderately important and highly important areas in the Yanshan-Taihang region of Hebei Province account for more than 70% of the total study area. Based on the importance evaluation results, three types of dominant ecological function zones were obtained using self-organizing feature map neural network analysis in the R language, and control measures were proposed. The research results can provide strategic support for local ecological protection and regional ecological restoration, as well as serve as a reference for optimizing land spatial development patterns.
(This article belongs to the Section Soil Conservation and Sustainability)
Show Figures

Figure 1: Overview of the study area.
Figure 2: Ecosystem service evaluation results: (a) water source conservation; (b) carbon fixation and oxygen release; (c) soil conservation; (d) biodiversity conservation; (e) overall importance of ecosystem service functions.
Figure 3: Ecological sensitivity evaluation results: (a) soil erosion; (b) land desertification; (c) overall ecological sensitivity.
Figure 4: Results of the ecological importance evaluation.
Figure 5: R language clustering output.
Figure 6: Partition result diagram.
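The zoning step above uses a self-organizing feature map (SOM) neural network in R. As a hedged illustration only, one SOM update step can be sketched in a few lines of Python; the paper's actual grid size, learning-rate schedule, and neighborhood function are not given in this listing, and this toy omits the neighborhood update entirely.

```python
import math

def som_step(weights, sample, lr=0.5):
    """One self-organizing-map update: find the best-matching unit (BMU)
    and pull its weight vector toward the sample (neighborhood omitted)."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    bmu = min(range(len(weights)), key=lambda i: dist(weights[i], sample))
    weights[bmu] = [w + lr * (s - w) for w, s in zip(weights[bmu], sample)]
    return bmu

# Three units; the sample is closest to unit 2, which moves toward it.
units = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(som_step(units, [4.0, 4.0]))  # 2
print(units[2])  # [4.5, 4.5]
```

After many such steps over the evaluation-result vectors, units that end up close together define the cluster (zone) assignments.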
17 pages, 5441 KiB  
Article
Parallel Fusion of Graph and Text with Semantic Enhancement for Commonsense Question Answering
by Jiachuang Zong, Zhao Li, Tong Chen, Liguo Zhang and Yiming Zhan
Electronics 2024, 13(23), 4618; https://doi.org/10.3390/electronics13234618 - 22 Nov 2024
Viewed by 225
Abstract
Commonsense question answering (CSQA) is a challenging task in the field of knowledge graph question answering: it combines the context of the question with the relevant knowledge in the knowledge graph to reason out an answer. Existing CSQA models combine pretrained language models and graph neural networks to process the question context and knowledge graph information, respectively, exchanging information during reasoning to improve accuracy. However, existing models do not fully exploit the textual and graph representations obtained after reasoning, and they give the edges too little semantic representation during knowledge graph reasoning. We therefore propose a novel parallel fusion framework for text and knowledge graphs, using the fused global graph information to enhance the semantic information of reasoning answers. In addition, we enhance the relationship embedding by enriching the initial semantics and adjusting the initial weight distribution, thereby improving the reasoning ability of the graph neural network. Experiments on two public datasets, CommonsenseQA and OpenBookQA, show that our model is competitive with other baseline models. We additionally validated its generalizability on the MedQA-USMLE dataset.
19 pages, 6034 KiB  
Article
GMN+: A Binary Homologous Vulnerability Detection Method Based on Graph Matching Neural Network with Enhanced Attention
by Zheng Zhao, Tianhao Zhang, Xiaoya Fan, Qian Mao, Dafeng Wang and Qi Zhao
Appl. Sci. 2024, 14(22), 10762; https://doi.org/10.3390/app142210762 - 20 Nov 2024
Viewed by 383
Abstract
The widespread reuse of code in the open-source community has led to the proliferation of homologous vulnerabilities: security flaws propagated across diverse software systems through the reuse of vulnerable code. Such vulnerabilities pose serious cybersecurity risks, as attackers can exploit the same weaknesses across multiple platforms. Deep learning has emerged as a promising approach for detecting homologous vulnerabilities in binary code owing to its automated feature extraction and high efficiency. However, existing deep learning methods often struggle to capture deep semantic features in binary code, limiting their effectiveness. To address this limitation, this paper presents GMN+, a novel graph matching neural network with enhanced attention for detecting homologous vulnerabilities. The method comprehensively considers the information contained in instructions and incorporates the input instruction types. Masked Language Modeling and Instruction Type Prediction are developed as pre-training tasks to strengthen the ability of GMN+ to extract semantic information from basic blocks. GMN+ uses an attention mechanism to focus concurrently on the critical semantic information within functions and on the differences between them, generating robust function embeddings. Experimental results indicate that GMN+ outperforms state-of-the-art methods in various tasks and achieves notable performance in real-world vulnerability detection scenarios.
Show Figures

Figure 1: Architecture of the GMN+ model.
Figure 2: An example of instruction normalization and instruction type extraction: (a) original assembly instructions; (b) normalized instructions; (c) instruction types.
Figure 3: BERT input embedding.
Figure 4: Graph Learner of GMN+.
Figure 5: Comparison of ROC curves for different methods across architectures.
Figure 6: Comparison of ROC curves for different methods across optimization levels.
Figure 7: Comparative results of homologous function search using various methods.
Figure 8: Comparison of time overhead for different methods.
Figure 9: Performance of GMN+ variants with different blocks in the Semantic Learner.
Figure 10: Performance of GMN+ variants with different blocks in the Graph Learner.
Figure 11: Comparison of detection results of different methods on real-world vulnerability detection tasks.
20 pages, 713 KiB  
Article
GRMD: A Two-Stage Design Space Exploration Strategy for Customized RNN Accelerators
by Qingpeng Li, Jian Xiao and Jizeng Wei
Symmetry 2024, 16(11), 1546; https://doi.org/10.3390/sym16111546 - 19 Nov 2024
Viewed by 302
Abstract
Recurrent neural networks (RNNs) have produced significant results in many fields, such as natural language processing and speech recognition. Owing to their computational complexity and sequence dependencies, RNNs need to be deployed on customized hardware accelerators to satisfy performance and energy-efficiency constraints. However, designing hardware accelerators for RNNs is challenged by the vast design space and a reliance on ineffective optimization; an efficient automated design space exploration (DSE) strategy that can balance conflicting objectives is needed. To address the low efficiency and insufficient universality of the resource allocation process for hardware accelerators, we propose an automated two-stage DSE strategy for customized RNN accelerators. The strategy combines a genetic algorithm (GA) and a reinforcement learning (RL) algorithm, utilizing symmetrical exploration and exploitation to find optimal solutions. In the first stage, the accelerator area is the optimization objective, and the GA performs partial exploration to narrow the design space while maintaining diversity. In the second stage, latency and power are the optimization objectives, and the RL algorithm finds the corresponding Pareto solutions. To verify its effectiveness, the strategy is compared with other algorithms on three benchmark network models: a vanilla RNN, LSTM, and a GRU. The results demonstrate that the proposed strategy provides better solutions, achieving latency, power, and area reductions of 9.35%, 5.34%, and 11.95%, respectively; the HV of GRMD is reduced by averages of 6.33%, 6.32%, and 0.67%, and the runtime by averages of 18.11%, 14.94%, and 10.28%, respectively. Additionally, given different weights, it can make reasonable trade-offs between multiple objectives.
(This article belongs to the Section Computer)
Show Figures

Figure 1: Microarchitecture of our custom spatial accelerators.
Figure 2: Latency, power, and area values of four GEMM accelerators with different PE arrays.
Figure 3: Overall framework of the DSE method.
Figure 4: An overview of small-granularity and large-granularity crossover.
Figure 5: Solutions found by GRMD, the GA, PSO, and BO for the vanilla RNN, LSTM, and the GRU.
Figure 6: Trade-offs between the latency and power values determined by GRMD and the GA.
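The second stage above searches for Pareto solutions over latency and power. As a minimal sketch of the underlying notion only, independent of the paper's GA/RL machinery and assuming both objectives are minimized, non-dominated filtering looks like this:

```python
def pareto_front(points):
    """Return the non-dominated (latency, power) points, both minimized.

    A point is dominated if some other point is no worse in both
    objectives and (being a distinct point) strictly better in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# (latency, power) for four hypothetical accelerator configurations.
designs = [(5.0, 2.0), (4.0, 3.0), (6.0, 2.5), (4.0, 2.5)]
print(sorted(pareto_front(designs)))  # [(4.0, 2.5), (5.0, 2.0)]
```

Here (4.0, 3.0) and (6.0, 2.5) drop out because (4.0, 2.5) is at least as good in both objectives; the survivors are the trade-off curve a designer would weigh.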
26 pages, 766 KiB  
Article
Still No Evidence for an Effect of the Proportion of Non-Native Speakers on Natural Language Complexity
by Alexander Koplenig
Entropy 2024, 26(11), 993; https://doi.org/10.3390/e26110993 - 18 Nov 2024
Viewed by 345
Abstract
In a recent study, I demonstrated that large numbers of L2 (second language) speakers do not appear to influence the morphological or information-theoretic complexity of natural languages. This paper has three primary aims: First, I address recent criticisms of my analyses, showing that the points raised by my critics were already explicitly considered and analysed in my original work. Furthermore, I show that the proposed alternative analyses fail to withstand detailed examination. Second, I introduce new data on the information-theoretic complexity of natural languages, with the estimates derived from various language models—ranging from simple statistical models to advanced neural networks—based on a database of 40 multilingual text collections that represent a wide range of text types. Third, I re-analyse the information-theoretic and morphological complexity data using novel methods that better account for model uncertainty in parameter estimation, as well as the genealogical relatedness and geographic proximity of languages. In line with my earlier findings, the results show no evidence that large numbers of L2 speakers have an effect on natural language complexity.
(This article belongs to the Special Issue Complexity Characteristics of Natural Language)
Show Figures

Figure 1: Descriptive results of KEW's multiple imputation approach. Per type of complexity (information-theoretic or morphological), computations are based on the 100 completed samples from KEW's multiple imputation analysis. (a) Spearman correlation between the imputed L2 proportion and speaker population size; (b) percentage of languages with an L2 proportion of (i) more than 0, (ii) more than 0.10, (iii) more than 0.25, and (iv) more than 0.50; (c) median L2 proportion for non-vehicular and vehicular languages.
16 pages, 2277 KiB  
Review
Drug Discovery in the Age of Artificial Intelligence: Transformative Target-Based Approaches
by Akshata Yashwant Patne, Sai Madhav Dhulipala, William Lawless, Satya Prakash, Shyam S. Mohapatra and Subhra Mohapatra
Int. J. Mol. Sci. 2024, 25(22), 12233; https://doi.org/10.3390/ijms252212233 - 14 Nov 2024
Viewed by 602
Abstract
The complexities inherent in drug development are multi-faceted and often hamper accuracy, speed and efficiency, thereby limiting success. This review explores how recent developments in machine learning (ML) are significantly impacting target-based drug discovery, particularly in small-molecule approaches. The Simplified Molecular Input Line Entry System (SMILES), which translates a chemical compound’s three-dimensional structure into a string of symbols, is now widely used in drug design, mining, and repurposing. Utilizing ML and natural language processing techniques, SMILES has revolutionized lead identification, high-throughput screening and virtual screening. ML models enhance the accuracy of predicting binding affinity and selectivity, reducing the need for extensive experimental screening. Additionally, deep learning, with its strengths in analyzing spatial and sequential data through convolutional neural networks (CNNs) and recurrent neural networks (RNNs), shows promise for virtual screening, target identification, and de novo drug design. Fragment-based approaches also benefit from ML algorithms and techniques like generative adversarial networks (GANs), which predict fragment properties and binding affinities, aiding in hit selection and design optimization. Structure-based drug design, which relies on high-resolution protein structures, leverages ML models for accurate predictions of binding interactions. While challenges such as interpretability and data quality remain, ML’s transformative impact accelerates target-based drug discovery, increasing efficiency and innovation. Its potential to deliver new and improved treatments for various diseases is significant.
(This article belongs to the Special Issue Techniques and Strategies in Drug Design and Discovery, 2nd Edition)
Show Figures

Figure 1: Example of algorithms and classifiers in ML models [4,5].
Figure 2: Example of algorithms and classifiers in ML models for small-molecule-based drug discovery [9].
Figure 3: Example of algorithms and classifiers in ML models for fragment-based drug discovery [24].
Figure 4: Example of algorithms and classifiers in ML models for structure-based drug discovery [46].
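Since SMILES strings such as those discussed above are plain text, even simple descriptors can be computed directly from them. The toy counter below is only an illustration, not a substitute for a cheminformatics toolkit such as RDKit; it shows one subtlety of the notation, namely that two-letter symbols like Cl must be matched before single letters so that chlorine is not miscounted as carbon.

```python
import re

def atom_counts(smiles):
    """Count organic-subset atom symbols in a SMILES string.

    The alternation tries the two-letter symbols Cl and Br first;
    lowercase letters are aromatic atoms and are merged with their
    aliphatic counterparts via capitalize().
    """
    tokens = re.findall(r"Cl|Br|[BCNOPSFI]|[bcnops]", smiles)
    counts = {}
    for t in tokens:
        key = t.capitalize()  # merge aromatic 'c' with aliphatic 'C'
        counts[key] = counts.get(key, 0) + 1
    return counts

# Aspirin: C9H8O4 (hydrogens are implicit in SMILES and not counted here).
print(atom_counts("CC(=O)OC1=CC=CC=C1C(=O)O"))  # {'C': 9, 'O': 4}
```

Ring-closure digits, bond symbols, and bracketed atoms with charges are all ignored by this sketch; a real parser handles them explicitly.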
27 pages, 12110 KiB  
Article
Exploring the Impact of Additive Shortcuts in Neural Networks via Information Bottleneck-like Dynamics: From ResNet to Transformer
by Zhaoyan Lyu and Miguel R. D. Rodrigues
Entropy 2024, 26(11), 974; https://doi.org/10.3390/e26110974 - 14 Nov 2024
Viewed by 425
Abstract
Deep learning has made significant strides, driving advances in areas like computer vision, natural language processing, and autonomous systems. In this paper, we further investigate the role of additive shortcut connections, focusing on models such as ResNet, Vision Transformers (ViTs), and MLP-Mixers, in which they are essential for enabling efficient information flow and mitigating optimization challenges such as vanishing gradients. In particular, capitalizing on our recent information bottleneck approach, we analyze how additive shortcuts influence the fitting and compression phases of training, which are crucial for generalization. We leverage Z-X and Z-Y measures as practical alternatives to mutual information for observing these dynamics in high-dimensional spaces. Our empirical results demonstrate that models with identity shortcuts (ISs) often skip the initial fitting phase and move directly into the compression phase, while non-identity shortcut (NIS) models follow the conventional two-phase process. Furthermore, we explore how IS models are still able to compress effectively, maintaining their generalization capacity despite bypassing the early fitting stages. These findings offer new insights into the dynamics of shortcut connections in neural networks, contributing to the optimization of modern deep learning architectures.
(This article belongs to the Section Information Theory, Probability and Statistics)
Show Figures

Figure 1: The framework for estimating the Z-X and Z-Y measures (adapted from Figure 1 in [17]). l_SE refers to the squared loss and l_CE to the cross-entropy loss.
Figure 2: An illustration of identity and non-identity shortcuts. The representations at different stages are labeled in pink.
Figure 3: Architecture of the CNN (left), ResCNN (middle), and iResCNN (right). "Conv" refers to convolutional layers, "ReLU" to rectified linear unit activation, and "FC" to fully connected layers. The convolutional kernel and weight matrix shapes are noted in gray, and the tensor/matrix/vector shapes are labeled in blue.
Figure 4: Z-X estimator design for convolutional networks. "TConv" stands for transposed convolution, used to upscale feature maps, and "tanh" for the hyperbolic tangent activation function.
Figure 5: The Z-X dynamics of the CNN (left), ResCNN (middle), and iResCNN (right), estimated at the corresponding modules in Figure 3.
Figure 6: Architecture of the ViT (left) and MLP-Mixer (right). "MHSA" denotes multi-head self-attention modules, "FF" feed-forward modules, and "GAP" global average pooling layers.
Figure 7: Architecture of Z-X estimators for token-based models. For the tokenized representations of the ViT and MLP-Mixer in this paper, n = 64, p = 8, and d = 512.
Figure 8: The Z-X dynamics of the ViT (top) and MLP-Mixer (bottom), estimated at the corresponding modules in Figure 6.
Figure 9: Dynamics of the averaged element-wise correlation coefficient Corr(Z_{l;F}, Z_I) (darker curves, left axis) alongside the Z-X measure m_{Z_l;X} from Section 4.2 (lighter curves, right axis), and of the averaged element-wise variances of Z_l, Z_{l;F}, and Z_I, across the modules l of the iResCNN (Figure 3, right). Panels in the same row or column share the same axes.
Figure 10: Histograms of the element-wise correlation coefficients Corr(Z_{l;F;i}, Z_{I;i}) in the iResCNN, where i indexes the entries of the representation components.
Figure 11: The same correlation and variance dynamics as in Figure 9, for the modules of the MLP-Mixer (Figure 6, right), with the Z-X measure taken from Section 4.4.
Figure 12: The same correlation and variance dynamics as in Figure 9, for the modules of the ViT (Figure 6, left), with the Z-X measure taken from Section 4.4.
40 pages, 40760 KiB  
Article
Dynamic-Max-Value ReLU Functions for Adversarially Robust Machine Learning Models
by Korn Sooksatra and Pablo Rivas
Mathematics 2024, 12(22), 3551; https://doi.org/10.3390/math12223551 - 13 Nov 2024
Viewed by 512
Abstract
The proliferation of deep learning has transformed artificial intelligence, demonstrating prowess in domains such as image recognition, natural language processing, and robotics. Nonetheless, deep learning models are susceptible to adversarial examples, well-crafted inputs that can induce erroneous predictions, particularly in safety-critical contexts. Researchers [...] Read more.
The proliferation of deep learning has transformed artificial intelligence, demonstrating prowess in domains such as image recognition, natural language processing, and robotics. Nonetheless, deep learning models are susceptible to adversarial examples, well-crafted inputs that can induce erroneous predictions, particularly in safety-critical contexts. Researchers actively pursue countermeasures such as adversarial training and robust optimization to fortify model resilience. This vulnerability is notably accentuated by the ubiquitous utilization of ReLU functions in deep learning models. A previous study proposed an innovative solution to mitigate this vulnerability, presenting a capped ReLU function tailored to bolster neural network robustness against adversarial examples. However, the approach had a scalability problem. To address this limitation, we introduce the dynamic-max-value ReLU function and validate it through a series of comprehensive experiments across diverse datasets. Full article
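The capped ReLU family discussed in this abstract clips activations at an upper bound, and the dynamic variant makes that bound a parameter rather than a fixed constant. A minimal sketch of both is below; it illustrates the clipping idea only and is not the authors' exact formulation (in practice the cap would be a learned or per-layer value inside a network).

```python
import numpy as np

def capped_relu(x, max_value=2.0):
    # Static capped ReLU (S-ReLU): clamp activations to [0, max_value]
    return np.clip(x, 0.0, max_value)

def dynamic_max_relu(x, max_value):
    # D-ReLU sketch: identical clamp, but the cap is supplied as a
    # parameter (e.g., learned per layer) instead of a global constant.
    return np.minimum(np.maximum(x, 0.0), max_value)
```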
(This article belongs to the Special Issue Advances in Trustworthy and Robust Artificial Intelligence)
Show Figures

Figure 1
<p>Adversarial example that misleads an image classifier to predict the image as a cat.</p>
Full article ">Figure 2
<p>Denoised autoencoder for preprocessing of an adversarial example to create a clean/denoised sample. The solid line is the process with the autoencoder, and the dashed line is the process without the autoencoder.</p>
Full article ">Figure 3
<p>Randomized smoothing method, where the most common predictions are picked as the output. In this example, four noises are generated by the noise generator.</p>
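The randomized smoothing procedure in the caption above — classify several noisy copies of an input and return the majority prediction — can be sketched in a few lines. The `classify` interface and noise parameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import Counter

def smoothed_predict(classify, x, n_noises=4, sigma=0.25, seed=None):
    """Randomized smoothing sketch: classify n_noises Gaussian-perturbed
    copies of x and return the most common label. `classify` maps an
    input array to a class label (hypothetical interface)."""
    rng = np.random.default_rng(seed)
    preds = [classify(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n_noises)]
    return Counter(preds).most_common(1)[0][0]
```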
Full article ">Figure 4
<p>Adversarial example detection technique where the detected samples are thrown away.</p>
Full article ">Figure 5
<p>An example of S-ReLU with a max value of 2.</p>
Full article ">Figure 6
<p>Examples of the MNIST dataset.</p>
Full article ">Figure 7
<p>Examples of the CIFAR10 dataset.</p>
Full article ">Figure 8
<p>Examples of the CIFAR100 dataset.</p>
Full article ">Figure 9
<p>Examples of the TinyImagenet dataset.</p>
Full article ">Figure 10
<p>Architecture of our approach with an added layer (in red) with D-ReLU before the output layer.</p>
Full article ">Figure 11
<p>Accuracy of two types of networks on clean MNIST and adversarial examples when adding a dense layer with a D-ReLU function before the output layer.</p>
Full article ">Figure 12
<p>Accuracy of several types of networks on clean CIFAR10 and adversarial examples when adding a dense layer with a D-ReLU function before the output layer.</p>
Full article ">Figure 13
<p>Accuracy of several types of CNNs on clean CIFAR10 and adversarial examples when adding a convolutional layer with a D-ReLU function after the input layer.</p>
Full article ">Figure 14
<p>Accuracy of several types of networks on clean CIFAR100 and adversarial examples when adding a dense layer with a D-ReLU function before the output layer.</p>
Full article ">Figure 15
<p>Accuracy of several types of networks on clean TinyImagenet and adversarial examples when adding a dense layer with a D-ReLU function before the output layer.</p>
Full article ">Figure 16
<p>Accuracy of several types of networks on clean CIFAR10 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer.</p>
Full article ">Figure 17
<p>Accuracy of several types of networks on clean CIFAR100 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer.</p>
Full article ">Figure 18
<p>Accuracy of several types of networks on clean TinyImagenet and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer.</p>
Full article ">Figure 19
<p>Accuracy of several types of networks on clean CIFAR10 and adversarial examples when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.</p>
Full article ">Figure 20
<p>Accuracy of several types of networks on clean CIFAR100 and adversarial examples when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.</p>
Full article ">Figure 21
<p>Accuracy of several types of networks on clean TinyImagenet and adversarial examples when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.</p>
Full article ">Figure 22
<p>Accuracy of several types of networks on clean CIFAR10 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.</p>
Full article ">Figure 22 Cont.
<p>Accuracy of several types of networks on clean CIFAR10 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.</p>
Full article ">Figure 23
<p>Accuracy of several types of networks on clean CIFAR100 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.</p>
Full article ">Figure 23 Cont.
<p>Accuracy of several types of networks on clean CIFAR100 and adversarial examples generated by a black-box attack (i.e., square attack) when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.</p>
Full article ">Figure 24
<p>Accuracy of several types of networks on clean TinyImagenet and adversarial examples generated by black-box attacks when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.</p>
Full article ">Figure 24 Cont.
<p>Accuracy of several types of networks on clean TinyImagenet and adversarial examples generated by black-box attacks when adding a dense layer with a D-ReLU function before the output layer and training them with augmented data samples generated from the EDM.</p>
Full article ">Figure 25
<p>Accuracy of several approaches on the CIFAR10 dataset under an APGD_CE attack with various perturbation bounds, where mReLU is D-ReLU.</p>
Full article ">Figure 26
<p>Accuracy of several approaches on the CIFAR100 dataset under an APGD_CE attack with various perturbation bounds, where mReLU is D-ReLU.</p>
Full article ">Figure 27
<p>Accuracy of several approaches on the TinyImagenet dataset under an APGD_CE attack with various perturbation bounds, where mReLU is D-ReLU.</p>
Full article ">
21 pages, 603 KiB  
Article
Diversifying Multi-Head Attention in the Transformer Model
by Nicholas Ampazis and Flora Sakketou
Mach. Learn. Knowl. Extr. 2024, 6(4), 2618-2638; https://doi.org/10.3390/make6040126 - 12 Nov 2024
Viewed by 595
Abstract
Recent studies have shown that, due to redundancy, some heads of the Transformer model can be pruned without diminishing the efficiency of the model. In this paper, we propose a constrained optimization algorithm based on Hebbian learning, which trains specific layers in the [...] Read more.
Recent studies have shown that, due to redundancy, some heads of the Transformer model can be pruned without diminishing the efficiency of the model. In this paper, we propose a constrained optimization algorithm based on Hebbian learning, which trains specific layers in the Transformer architecture in order to enforce diversification between the different heads in the multi-head attention module. The diversification of the heads is achieved through a single-layer feed-forward neural network that is added to the Transformer architecture and is trained with the proposed algorithm. We utilize the algorithm in three different architectural variations of the baseline Transformer model. In addition to the diversification of the heads, the proposed methodology can be used to prune the heads that capture redundant information. Experiments on diverse NLP tasks, including machine translation, text summarization, question answering and large language modeling, show that our proposed approach consistently improves the performance of baseline Transformer models. Full article
(This article belongs to the Section Data)
Show Figures

Figure 1
<p>The reshaping operation. This figure illustrates the reshaping operation of the concatenated multi-head attention output <math display="inline"><semantics> <mrow> <mi mathvariant="bold-italic">M</mi> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mi>h</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> </mrow> </msup> </mrow> </semantics></math> (on the left). Each head <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">Z</mi> <mi>i</mi> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> </mrow> </msup> <mspace width="4pt"/> <mspace width="4pt"/> <mo>∀</mo> <mspace width="4pt"/> <mspace width="4pt"/> <mi>i</mi> <mo>=</mo> <mn>1</mn> <mo>,</mo> <mo>⋯</mo> <mo>,</mo> <mi>h</mi> </mrow> </semantics></math> is represented by a <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>×</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> </mrow> </semantics></math> matrix, where <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>=</mo> <mn>3</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>h</mi> <mo>=</mo> <mn>4</mn> </mrow> </semantics></math> for illustrative purposes. The output of the reshaping operation is the matrix <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">M</mi> <mi>r</mi> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mi>h</mi> </mrow> </msup> </mrow> </semantics></math> (on the right).</p>
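The reshaping in the caption above maps the concatenated multi-head output M of shape (n, h·d_k) to M_r of shape (n·d_k, h), with one flattened head per column. The sketch below reproduces this with the caption's dimensions (n=2, d_k=3, h=4); the exact interleaving convention used in the paper may differ, so treat the column ordering as one consistent choice.

```python
import numpy as np

n, d_k, h = 2, 3, 4  # dimensions from the figure caption

# Concatenated multi-head output M: head blocks side by side, shape (n, h*d_k).
# Head i is filled with the constant i so we can track where it lands.
heads = [np.full((n, d_k), float(i)) for i in range(h)]
M = np.concatenate(heads, axis=1)

# Reshape so each column of M_r holds one flattened head: shape (n*d_k, h)
M_r = M.reshape(n, h, d_k).transpose(1, 0, 2).reshape(h, n * d_k).T
assert M_r.shape == (n * d_k, h)
assert np.all(M_r[:, 1] == 1.0)  # column i contains exactly head i's entries
```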
Full article ">Figure 2
<p>The direct architecture. These figures illustrate the direct architecture. (<b>i</b>) shows the operations involved in the PCA layer (Equation (<a href="#FD23-make-06-00126" class="html-disp-formula">23</a>)), where the normalized matrix <math display="inline"><semantics> <mrow> <msub> <mover accent="true"> <mi mathvariant="bold-italic">M</mi> <mo>˜</mo> </mover> <mi>r</mi> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mi>h</mi> </mrow> </msup> </mrow> </semantics></math> is multiplied by <math display="inline"><semantics> <mrow> <mi>P</mi> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>h</mi> <mo>×</mo> <mi>h</mi> </mrow> </msup> </mrow> </semantics></math> to obtain <math display="inline"><semantics> <mrow> <msubsup> <mi mathvariant="bold-italic">M</mi> <mi>r</mi> <mo>′</mo> </msubsup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mi>h</mi> </mrow> </msup> </mrow> </semantics></math>, with <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>=</mo> <mn>3</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>h</mi> <mo>=</mo> <mn>4</mn> </mrow> </semantics></math> for illustrative purposes. 
<math display="inline"><semantics> <msubsup> <mi mathvariant="bold-italic">M</mi> <mi>r</mi> <mo>′</mo> </msubsup> </semantics></math> is then reshaped into <math display="inline"><semantics> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">M</mi> </mrow> <mo>′</mo> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mi>h</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> </mrow> </msup> </mrow> </semantics></math>, which is multiplied by <math display="inline"><semantics> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">W</mi> </mrow> <mi>O</mi> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>h</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mi>d</mi> </mrow> </msup> </mrow> </semantics></math> in order to be rescaled to <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">Z</mi> <mrow> <mi>o</mi> <mi>u</mi> <mi>t</mi> </mrow> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mi>d</mi> </mrow> </msup> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>d</mi> <mo>=</mo> <mn>5</mn> </mrow> </semantics></math>. (<b>ii</b>) shows the rescaling operation (Equation (<a href="#FD25-make-06-00126" class="html-disp-formula">25</a>)).</p>
Full article ">Figure 3
<p>The average architecture. These figures show the average architecture. (<b>i</b>) shows the operation that calculates the average of each head across the <math display="inline"><semantics> <msub> <mi>d</mi> <mi>k</mi> </msub> </semantics></math> dimension as defined in Equation (<a href="#FD26-make-06-00126" class="html-disp-formula">26</a>) in order to obtain matrix <math display="inline"><semantics> <mrow> <mi mathvariant="bold-italic">S</mi> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mi>h</mi> </mrow> </msup> </mrow> </semantics></math>, where <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>=</mo> <mn>3</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>h</mi> <mo>=</mo> <mn>4</mn> </mrow> </semantics></math> for illustrative purposes. (<b>ii</b>) shows the operations involved in the PCA layer, where <span class="html-italic"><b>S</b></span> is multiplied by <math display="inline"><semantics> <mrow> <mi mathvariant="bold-italic">P</mi> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>h</mi> <mo>×</mo> <mi>h</mi> </mrow> </msup> </mrow> </semantics></math> to obtain <math display="inline"><semantics> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">S</mi> </mrow> <mo>′</mo> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mi>h</mi> </mrow> </msup> </mrow> </semantics></math> (Equation (<a href="#FD28-make-06-00126" class="html-disp-formula">28</a>)). 
(<b>iii</b>) shows the rescaling operation (Equation (<a href="#FD30-make-06-00126" class="html-disp-formula">30</a>)) where <math display="inline"><semantics> <msup> <mrow> <mi mathvariant="bold-italic">S</mi> </mrow> <mo>′</mo> </msup> </semantics></math> is multiplied by <math display="inline"><semantics> <mrow> <msup> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">W</mi> </mrow> <mi>O</mi> </msup> </mrow> <mo>′</mo> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>h</mi> <mo>×</mo> <mi>d</mi> </mrow> </msup> </mrow> </semantics></math> in order to be rescaled to <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">Z</mi> <mrow> <mi>o</mi> <mi>u</mi> <mi>t</mi> </mrow> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mi>d</mi> </mrow> </msup> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>d</mi> <mo>=</mo> <mn>5</mn> </mrow> </semantics></math>.</p>
Full article ">Figure 4
<p>The non-linear architecture. These figures illustrate the non-linear architecture. (<b>i</b>) shows how the augmented matrix <math display="inline"><semantics> <mrow> <mi mathvariant="bold-italic">C</mi> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </semantics></math> is obtained from matrix <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">M</mi> <mi>r</mi> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mi>h</mi> </mrow> </msup> </mrow> </semantics></math>, where <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>=</mo> <mn>3</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>h</mi> <mo>=</mo> <mn>4</mn> </mrow> </semantics></math> for illustrative purposes. 
(<b>ii</b>) shows the operations involved in the PCA layer (Equation (<a href="#FD33-make-06-00126" class="html-disp-formula">33</a>)), where the normalized augmented matrix <math display="inline"><semantics> <mrow> <mover accent="true"> <mi mathvariant="bold-italic">C</mi> <mo>˜</mo> </mover> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </semantics></math> is multiplied by <math display="inline"><semantics> <mrow> <mi mathvariant="bold-italic">P</mi> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> <mo>×</mo> <mo>(</mo> <mi>h</mi> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </msup> </mrow> </semantics></math> to obtain <math display="inline"><semantics> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">C</mi> </mrow> <mo>′</mo> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </semantics></math> (Equation (<a href="#FD33-make-06-00126" class="html-disp-formula">33</a>)). 
Following this, <a href="#make-06-00126-f005" class="html-fig">Figure 5</a>i shows the reshaping operation that takes as input matrix <math display="inline"><semantics> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">C</mi> </mrow> <mo>′</mo> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </semantics></math> and outputs <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">C</mi> <mi>r</mi> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> </mrow> </msup> </mrow> </semantics></math>. 
Finally, <a href="#make-06-00126-f005" class="html-fig">Figure 5</a>ii shows the rescaling operation (Equation (<a href="#FD35-make-06-00126" class="html-disp-formula">35</a>)), where <math display="inline"><semantics> <msub> <mi mathvariant="bold-italic">C</mi> <mi>r</mi> </msub> </semantics></math> is multiplied by <math display="inline"><semantics> <mrow> <msup> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">W</mi> </mrow> <mi>O</mi> </msup> </mrow> <mo>′</mo> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>h</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mi>d</mi> </mrow> </msup> </mrow> </semantics></math> in order to be rescaled to <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">Z</mi> <mrow> <mi>o</mi> <mi>u</mi> <mi>t</mi> </mrow> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mi>d</mi> </mrow> </msup> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>d</mi> <mo>=</mo> <mn>5</mn> </mrow> </semantics></math>.</p>
Full article ">Figure 5
<p>The non-linear architecture (cont’d). These figures illustrate the non-linear architecture. <a href="#make-06-00126-f004" class="html-fig">Figure 4</a>i (see the previous page) shows how the augmented matrix <math display="inline"><semantics> <mrow> <mi mathvariant="bold-italic">C</mi> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </semantics></math> is obtained from matrix <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">M</mi> <mi>r</mi> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mi>h</mi> </mrow> </msup> </mrow> </semantics></math>, where <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>=</mo> <mn>3</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>h</mi> <mo>=</mo> <mn>4</mn> </mrow> </semantics></math> for illustrative purposes. 
<a href="#make-06-00126-f004" class="html-fig">Figure 4</a>ii shows the operations involved in the PCA layer (Equation (<a href="#FD33-make-06-00126" class="html-disp-formula">33</a>)), where the normalized augmented matrix <math display="inline"><semantics> <mrow> <mover accent="true"> <mi mathvariant="bold-italic">C</mi> <mo>˜</mo> </mover> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </semantics></math> is multiplied by <math display="inline"><semantics> <mrow> <mi mathvariant="bold-italic">P</mi> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> <mo>×</mo> <mo>(</mo> <mi>h</mi> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </msup> </mrow> </semantics></math> to obtain <math display="inline"><semantics> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">C</mi> </mrow> <mo>′</mo> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </semantics></math> (Equation (<a href="#FD33-make-06-00126" class="html-disp-formula">33</a>)). 
Following this, (<b>i</b>) shows the reshaping operation that takes as input the matrix <math display="inline"><semantics> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">C</mi> </mrow> <mo>′</mo> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </semantics></math> and outputs <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">C</mi> <mi>r</mi> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mrow> <mo>(</mo> <mi>h</mi> <mrow> <mo>(</mo> <mi>h</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>/</mo> <mn>2</mn> <mo>+</mo> <mi>h</mi> <mo>)</mo> </mrow> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> </mrow> </msup> </mrow> </semantics></math>. 
Finally, (<b>ii</b>) shows the rescaling operation (Equation (<a href="#FD35-make-06-00126" class="html-disp-formula">35</a>)), where <math display="inline"><semantics> <msub> <mi mathvariant="bold-italic">C</mi> <mi>r</mi> </msub> </semantics></math> is multiplied by <math display="inline"><semantics> <mrow> <msup> <mrow> <msup> <mrow> <mi mathvariant="bold-italic">W</mi> </mrow> <mi>O</mi> </msup> </mrow> <mo>′</mo> </msup> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>h</mi> <mo>·</mo> <msub> <mi>d</mi> <mi>k</mi> </msub> <mo>×</mo> <mi>d</mi> </mrow> </msup> </mrow> </semantics></math> in order to be rescaled to <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="bold-italic">Z</mi> <mrow> <mi>o</mi> <mi>u</mi> <mi>t</mi> </mrow> </msub> <mo>∈</mo> <msup> <mi mathvariant="double-struck">R</mi> <mrow> <mi>n</mi> <mo>×</mo> <mi>d</mi> </mrow> </msup> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>d</mi> <mo>=</mo> <mn>5</mn> </mrow> </semantics></math>.</p>
Full article ">Figure 6
<p>Correlation matrix heat map. This figure shows the heat map of the correlation matrix of the PCA layer’s weights. Each plot corresponds to a different architecture.</p>
Full article ">
20 pages, 3548 KiB  
Article
A Comparative Study on Detection and Recognition of Nonuniform License Plates
by Mehak Arshid, Muhammad Raees Azam and Zahid Mahmood
Big Data Cogn. Comput. 2024, 8(11), 155; https://doi.org/10.3390/bdcc8110155 - 11 Nov 2024
Viewed by 884
Abstract
This paper presents a comparative study on license plate detection and recognition algorithms in unconstrained environments, which include varying illuminations, nonstandard plate templates, and different English language fonts. A prime objective of this study is to assess how well these models handle such [...] Read more.
This paper presents a comparative study on license plate detection and recognition algorithms in unconstrained environments, which include varying illuminations, nonstandard plate templates, and different English language fonts. A prime objective of this study is to assess how well these models handle such challenges. These problems are common in developing countries like Pakistan, where diverse license plates, styles, and abrupt changes in illumination make license plate detection and recognition a challenging task. To analyze the license plate detection problem, Faster-RCNN and end-to-end (E2E) methods are implemented. For the license plate recognition task, deep neural network (DNN)-based and CA-CenterNet-based methods are compared. Detailed simulations were performed on the authors’ own collected dataset of Pakistani license plates, which contains substantially different multi-styled license plates. Our study concludes that, for the task of license plate detection, Faster-RCNN yields a detection accuracy of 98.35%, while the E2E method delivers 98.48% accuracy. Both detection algorithms yielded a mean detection accuracy of 98.41%. For the license plate recognition task, the DNN-based method yielded a recognition accuracy of 98.90%, while the CA-CenterNet-based method delivered a high accuracy of 98.96%. In addition, a detailed computational complexity comparison on various image resolutions revealed that E2E and CA-CenterNet are more efficient than their counterparts during detection and recognition tasks, respectively. Full article
Show Figures

Figure 1
<p>Nonuniform LP samples.</p>
Full article ">Figure 2
<p>Sample images from our dataset.</p>
Full article ">Figure 3
<p>LP detection, (<b>a</b>) Faster-RCNN, and (<b>b</b>) the end-to-end (E2E) method.</p>
Full article ">Figure 4
<p>LP Recognition, (<b>a</b>) DNN, and (<b>b</b>) CA-CenterNet.</p>
Full article ">Figure 5
<p>Computational complexity, (<b>a</b>) LP detection and recognition times, and (<b>b</b>) the mean execution time.</p>
Full article ">Figure 6
<p>Mean and standard deviation comparison.</p>
Full article ">Figure 7
<p>Challenging cases of LPs: (<b>a</b>) broken, (<b>b</b>) severe occlusion, (<b>c</b>) embossed characters, and (<b>d</b>) faded characters.</p>
Full article ">
19 pages, 3710 KiB  
Article
RGISQL: Integrating Refined Grammatical Information into Relational Graph Neural Network for Text-to-SQL Task
by Shuiyan Li, Yaozhen He, Longhao Ao and Rongzhi Qi
Appl. Sci. 2024, 14(22), 10359; https://doi.org/10.3390/app142210359 - 11 Nov 2024
Viewed by 445
Abstract
The text-to-SQL task aims to convert natural language questions into corresponding SQL queries based on a given database schema. Previous models that rely on graph neural networks often struggle to accurately capture the complex grammatical relationships present in these questions, leading to poor [...] Read more.
The text-to-SQL task aims to convert natural language questions into corresponding SQL queries based on a given database schema. Previous models that rely on graph neural networks often struggle to accurately capture the complex grammatical relationships present in these questions, leading to poor performance when generating queries for longer requests. To address these challenges, we propose RGISQL, which integrates refined grammatical information extracted from the question and employs segmentation processing to effectively manage long queries. Additionally, RGISQL minimizes the complexity of edge embeddings by reducing the coupling within graph neural networks. By utilizing grammatical dependency trees, RGISQL is better equipped to capture the inherent structure and grammatical rules of questions. This refined grammatical information offers additional contextual and semantic cues for the model, thereby enhancing both its generalizability and interpretability. Furthermore, we dynamically assess the importance of different edges based on the graph structure, which helps reduce the coupling of edge embeddings and further improves the model’s performance. Multiple sets of experiments conducted on the Spider and Spider-Syn datasets demonstrate that RGISQL outperforms other baselines, achieving the best results in both datasets. Full article
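The abstract above describes building a relational graph over a question using grammatical dependency trees. A toy sketch of that step is shown below: a hand-specified dependency parse (assumed here, not produced by RGISQL's actual parser) is turned into typed directed edges, with inverse relations added so information can flow both ways in the graph network.

```python
# Toy question and a hand-specified dependency parse for it.
question = ["Show", "all", "singers", "from", "France"]
# (head_index, dependent_index, relation) triples — an assumed parse.
dependencies = [(0, 2, "dobj"), (2, 1, "det"), (2, 4, "nmod"), (4, 3, "case")]

def dependency_edges(deps):
    """Build a typed directed edge list; each grammatical relation also
    gets an inverse edge so the graph is traversable in both directions."""
    edges = []
    for head, dep, rel in deps:
        edges.append((head, dep, rel))
        edges.append((dep, head, rel + "_inv"))
    return edges

edges = dependency_edges(dependencies)
```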
Show Figures

Figure 1
<p>A simple example of parsing natural language questions and generating SQL queries.</p>
Full article ">Figure 2
<p>GAT principle data flow diagram.</p>
Full article ">Figure 3
<p>The overall architecture of RGISQL. <math display="inline"><semantics> <mrow> <mi>X</mi> </mrow> </semantics></math> represents the node embeddings and <math display="inline"><semantics> <mrow> <mi>R</mi> </mrow> </semantics></math> represents the edge embeddings. RSA stands for relation-aware self-attention mechanism.</p>
Full article ">Figure 4
<p>Input interaction diagram example.</p>
Full article ">Figure 5
<p>The two stages of segmentation processing.</p>
Full article ">Figure 6
<p>Results of syntactic analysis: (<b>a</b>) syntactic constituent analysis; (<b>b</b>) grammatical dependency relations.</p>
Full article ">Figure 7
<p>RGAT principle diagram.</p>
Full article ">Figure 8
<p>Over-coupling phenomenon of edge embedding, using dynamic edge attention pooling to reduce over-coupling.</p>
Full article ">Figure 9
<p>RAT layer embedding concatenation principle diagram.</p>
Full article ">Figure 10
<p>Ablation results on Spider. Among them, (<b>a</b>) represents the results of exact set matching accuracy, and (<b>b</b>) represents the results of execution accuracy. “RGI” stands for refined grammatical information. “RC” stands for reduced coupling. “SEG” refers to segmentation processing. “EM” refers to exact set match accuracy. “EX” refers to execution accuracy.</p>
Full article ">Figure 11
<p>Ablation results on Spider-Syn. “RGI” stands for refined grammatical information. “RC” stands for reduced coupling. “SEG” refers to segmentation processing. “EM” refers to exact set match accuracy.</p>
Full article ">Figure 12
<p>Ablation experiments on the settings of RGAT and the decoder. “ECG” represents the edge-centric graph. “EM” refers to exact set match accuracy.</p>
Full article ">Figure 13
<p>The correlation between different words in natural language questions.</p>
Full article ">
13 pages, 264 KiB  
Article
Modification and Validation of the System Causability Scale Using AI-Based Therapeutic Recommendations for Urological Cancer Patients: A Basis for the Development of a Prospective Comparative Study
by Emily Rinderknecht, Dominik von Winning, Anton Kravchuk, Christof Schäfer, Marco J. Schnabel, Stephan Siepmann, Roman Mayr, Jochen Grassinger, Christopher Goßler, Fabian Pohl, Peter J. Siska, Florian Zeman, Johannes Breyer, Anna Schmelzer, Christian Gilfrich, Sabine D. Brookman-May, Maximilian Burger, Maximilian Haas and Matthias May
Curr. Oncol. 2024, 31(11), 7061-7073; https://doi.org/10.3390/curroncol31110520 - 11 Nov 2024
Viewed by 343
Abstract
The integration of artificial intelligence, particularly Large Language Models (LLMs), has the potential to significantly enhance therapeutic decision-making in clinical oncology. Initial studies across various disciplines have demonstrated that LLM-based treatment recommendations can rival those of multidisciplinary tumor boards (MTBs); however, such data are currently lacking for urological cancers. This preparatory study establishes a robust methodological foundation for the forthcoming CONCORDIA trial, including the validation of the System Causability Scale (SCS) and its modified version (mSCS), as well as the selection of LLMs for urological cancer treatment recommendations based on recommendations from ChatGPT-4 and an MTB for 40 urological cancer scenarios. Both scales demonstrated strong validity, reliability (all aggregated Cohen’s K > 0.74), and internal consistency (all Cronbach’s Alpha > 0.9), with the mSCS showing superior reliability, internal consistency, and clinical applicability (p < 0.01). Two Delphi processes were used to define the LLMs to be tested in the CONCORDIA study (ChatGPT-4 and Claude 3.5 Sonnet) and to establish the acceptable non-inferiority margin for LLM recommendations compared to MTB recommendations. The forthcoming ethics-approved and registered CONCORDIA non-inferiority trial will require 110 urological cancer scenarios, with an mSCS difference threshold of 0.15, a Bonferroni corrected alpha of 0.025, and a beta of 0.1. Blinded mSCS assessments of MTB recommendations will then be compared to those of the LLMs. In summary, this work establishes the necessary prerequisites prior to initiating the CONCORDIA study and validates a modified score with high applicability and reliability for this and future trials. Full article
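The trial parameters quoted above (non-inferiority margin 0.15, Bonferroni-corrected alpha of 0.025, beta of 0.1) feed a standard normal-approximation sample-size formula. The sketch below illustrates that generic calculation only; the hypothetical standard deviation is an assumption for illustration, and the trial's actual figure of 110 scenarios depends on the mSCS variability observed in this preparatory study.

```python
import math
from statistics import NormalDist

def noninferiority_n(delta, sd, alpha=0.025, beta=0.1):
    """Per-group sample size for a non-inferiority comparison of a
    continuous outcome (normal approximation):
        n = 2 * sd^2 * (z_{1-alpha} + z_{1-beta})^2 / delta^2
    delta: non-inferiority margin; sd: assumed standard deviation.
    """
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha), z(1 - beta)
    return math.ceil(2 * (sd * (z_a + z_b) / delta) ** 2)

# With the trial's margin (0.15) and a purely hypothetical sd of 0.25:
# noninferiority_n(0.15, 0.25) → 59 per group under these assumptions.
```

Note that a design rating the same scenarios with both the LLM and the MTB is paired, which typically lowers the required n relative to this two-group formula.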
14 pages, 6618 KiB  
Article
Exploring Cutout and Mixup for Robust Human Activity Recognition on Sensor and Skeleton Data
by Hiskias Dingeto and Juntae Kim
Appl. Sci. 2024, 14(22), 10286; https://doi.org/10.3390/app142210286 - 8 Nov 2024
Viewed by 474
Abstract
Human Activity Recognition (HAR) is an essential area of research in Artificial Intelligence and Machine Learning, with numerous applications in healthcare, sports science, and smart environments. While advances such as attention-based models and Graph Neural Networks have made great strides in the field, this work focuses on data augmentation methods that tackle issues like data scarcity and task variability in HAR. We investigate and expand the use of mixup and cutout data augmentation, methods first popularized in Computer Vision and Natural Language Processing, on sensor-based and skeleton-based HAR datasets. We use both augmentation techniques, customized for time-series and skeletal data, to improve the robustness and performance of HAR models by diversifying the data and overcoming the drawbacks of limited training data. Specifically, we customize mixup data augmentation for sensor-based datasets and cutout data augmentation for skeleton-based datasets with the goal of improving model accuracy without adding more data. Our results show that using mixup and cutout techniques improves the accuracy and generalization of activity recognition models on both sensor-based and skeleton-based human activity datasets. This work showcases the potential of data augmentation techniques on transformers and Graph Neural Networks by offering a novel method for enhancing time-series and skeletal HAR tasks. Full article
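In generic form, mixup convex-combines two training examples and their labels, while cutout masks out part of an input. The sketch below adapts both ideas to a sensor window and a skeleton sequence, as the abstract describes; it is a minimal illustration of the two techniques, not the authors' customized implementation, and all shapes and parameter choices are assumptions.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup for sensor windows: convex-combine two (window, one-hot label)
    pairs with a Beta(alpha, alpha)-sampled mixing coefficient.
    x: (timesteps, channels) arrays; y: one-hot label vectors."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def cutout_skeleton(frames, n_joints=2, rng=None):
    """Cutout for skeleton sequences: zero-mask a few randomly chosen
    joints across all frames. frames: (T, J, 3) joint coordinates."""
    if rng is None:
        rng = np.random.default_rng()
    out = frames.copy()
    masked = rng.choice(frames.shape[1], size=n_joints, replace=False)
    out[:, masked, :] = 0.0
    return out
```

The mixed soft label keeps the loss a valid cross-entropy target (its entries still sum to 1), and masking joints for the whole clip forces the model not to rely on any single key-point.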
Figures:
- Figure 1: HAR data augmentation framework.
- Figure 2: Human activity samples from the RealWorld dataset before mixup.
- Figure 3: Human activity sample from the RealWorld dataset after mixup.
- Figure 4: RealWorld dataset with (left) and without (right) mixup (t-SNE).
- Figure 5: Cutout visualization on a human skeleton sample. The white key-points on the right skeleton indicate where cutout has been applied to mask parts of the skeleton.
- Figure 6: Comparison of model accuracy across different datasets and training methods.
19 pages, 6031 KiB  
Article
GPS-pPLM: A Language Model for Prediction of Prokaryotic Phosphorylation Sites
by Chi Zhang, Dachao Tang, Cheng Han, Yujie Gou, Miaomiao Chen, Xinhe Huang, Dan Liu, Miaoying Zhao, Leming Xiao, Qiang Xiao, Di Peng and Yu Xue
Cells 2024, 13(22), 1854; https://doi.org/10.3390/cells13221854 - 8 Nov 2024
Viewed by 509
Abstract
In the prokaryotic kingdom, protein phosphorylation serves as one of the most important posttranslational modifications (PTMs) and is involved in orchestrating a broad spectrum of biological processes. Here, we report an updated online server named the group-based prediction system for prokaryotic phosphorylation language model (GPS-pPLM), used for predicting phosphorylation sites (p-sites) in prokaryotes. For model training, two deep learning methods, a transformer and a deep neural network, were employed, and a total of 10 sequence features and contextual features were integrated. Using 44,839 nonredundant p-sites in 16,041 proteins from 95 prokaryotes, two general models for the prediction of O-phosphorylation and N-phosphorylation were first pretrained and then fine-tuned to construct 6 predictors specific for each phosphorylatable residue type as well as 134 species-specific predictors. Compared with other existing tools, the GPS-pPLM exhibits higher accuracy in predicting prokaryotic O-phosphorylation p-sites. Protein sequences in FASTA format or UniProt accession numbers can be submitted by users, and the predicted results are displayed in tabular form. In addition, we annotate the predicted p-sites with knowledge from 22 public resources, including experimental evidence, 3D structures, and disorder tendencies. The online service of the GPS-pPLM is freely accessible for academic research. Full article
(This article belongs to the Section Cell Methods)
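One of the sequence features the abstract lists, CKSAAP (composition of k-spaced amino-acid pairs), turns a peptide window around a candidate p-site into a fixed-length frequency vector. The sketch below is a generic illustration of that encoding, not the GPS-pPLM implementation; the window handling and normalization are assumptions.

```python
from itertools import product

AAS = 'ACDEFGHIKLMNPQRSTVWY'  # the 20 standard amino acids

def cksaap(seq, k=1):
    """Composition of k-spaced amino-acid pairs: the frequency of each
    ordered pair (a, b) occurring with exactly k residues between them.
    Returns a 400-dimensional feature vector (20 x 20 pairs)."""
    index = {a + b: i for i, (a, b) in enumerate(product(AAS, repeat=2))}
    counts = [0.0] * 400
    total = max(len(seq) - k - 1, 1)  # number of k-spaced pairs in seq
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in index:             # skip non-standard residues
            counts[index[pair]] += 1.0
    return [c / total for c in counts]
```

A model such as the paper's DNN or transformer would concatenate several such feature vectors (APAAC, CTDC, DDE, etc.) computed over the window centered on the candidate site.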
Figures:
- Figure 1: The procedure for the development of the GPS-pPLM, including data collection, feature encoding, and model construction. (A) Data preparation of p-sites curated from the literature and dbPSP 2.0. (B) Sequence feature encoding of the GPS-pPLM algorithm; 8 types of sequence features were used: APAAC, CKSAAP, CTDC, DDE, distance pair, EAAC, PAAC, and GPS. (C) Model construction: two machine learning approaches, transformer and DNN, were used; two general models (O-phosphorylation and N-phosphorylation) were pretrained and then fine-tuned on data specific to a single residue type or species.
- Figure 2: Performance evaluation and comparison of the GPS-pPLM. (A, B) Evaluation of the pS and pR predictors; accuracies of the 10-feature and integrated models were assessed via 4-fold cross-validation. (C, D) Comparison with existing predictors (MPsite, NetPhosBac) on S and T p-site independent testing data. (E) SHAP-based contribution of the 10 feature types to the O-phosphorylation model. (F) Visualization of normalized attention weights between positions in S/T p-sites.
- Figure 3: Identification of prokaryotic p-site motifs. (A) Eukaryotic-like phosphorylation motifs identified through the attention mechanism, including R-X-X-S/T-L/I/V, R-R-X-S/T, L-X-R-X-X-S/T, and G-H-A. (B) Enrichment of 24 motifs containing two or more amino acids in the training data (p-value < 0.01), identified via the motif-x tool from phosphoproteomic data of 48 eukaryotes. (C) Sequence alignment of the PKA, PknA, and PknB kinase functional sites. (D) Prokaryotic substrates specifically phosphorylated by PknB containing R-X-X-S/T-L/I/V motifs. (E) Sequences near the p-sites of the G-H-A motif in the training dataset.
- Figure 4: Evaluation of species-specific predictors. (A) AUC distributions for 95 species-specific predictors on the O-phosphorylation dataset and 39 on the N-phosphorylation dataset. (B) AUC values for four extensively studied prokaryotes: B. subtilis, E. coli, M. tuberculosis, and S. aureus. (C, D) Prediction-score distributions for E. coli positive data under four species-specific predictors. (E, F) The same for B. subtilis positive data.
- Figure 5: Usage of the GPS-pPLM web server. (A) Sequence submission interface: protein sequences in FASTA format or UniProt accession numbers, with 3 thresholds and 6 residue types selectable. (B) Tabular prediction results: p-site position, residue type, prediction score, cut-off value, experimental or computational identification, and links to dbPSP 2.0. (C) Comprehensive annotations: a line chart of p-site disorder scores and p-site locations on 3D structures exported from the PDB; ASA and disorder scores are also computed in the comprehensive mode.
- Figure 6: Prediction results for the E. coli protein tufA. (A) T residues predicted as p-sites: T9, T72, T229, and T383. (B) Amino-acid frequency of sequences containing VVT motifs at known p-sites and the sequence near T229. (C) The 3D structure of T229 and annotations from the PDB. (D) Disorder-score annotation of tufA p-sites.