1. Introduction
The rapid proliferation of textual data in various domains, from scientific literature to social media, has made the task of summarizing large volumes of text both critical and challenging. Text summarization, which creates a concise and coherent summary from a larger body of text, has emerged as an essential tool for managing and interpreting this deluge of information. Extractive text summarization methods have gained significant attention due to their simplicity and effectiveness [
1,
2].
Traditional extractive summarization techniques, such as graph-based algorithms like TextRank, often rely on shallow linguistic features and statistical measures [
3,
4]. While these methods can effectively capture sentence importance based on surface-level criteria, they typically fail to account for the deeper semantic relationships and discourse structures within a document. As a result, the generated summaries may lack coherence and fail to capture the nuanced meaning of the original text [
5].
Recent progress in natural language processing (NLP) has brought about the emergence of deep learning models, particularly those built on Transformer architectures [
6]. These models, including BERT [
7] and GPT [
8], have set new benchmarks in a variety of NLP tasks by effectively capturing contextual information and semantic relationships within text. However, their application to extractive summarization has been somewhat limited, often requiring additional mechanisms to handle the selection of the most relevant content from a document [
9].
We propose the Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework, an advanced extractive summarization approach that integrates the strengths of Graph Neural Networks (GNNs) and Transformer models. The KETGS framework constructs a detailed graph representation for a document, embedding not only words and sentences but also key entities and their relationships [
10,
11].
Among recent developments in summarization, abstractive techniques are recognized for their ability to generate novel phrases that may not be present in the source text, aligning well with human cognitive approaches to summarization. However, our decision to focus on extractive summarization is based on several critical factors.
Factual Integrity and Domain-Specific Requirements: In many specialized domains, such as biomedical, legal, and scientific literature, the precision and accuracy of the language are of utmost importance. Extractive summarization retains the exact wording from the source, avoiding the potential risk of introducing errors through rephrasing.
Computational Efficiency: Abstractive summarization models, although advanced, often require significant computational resources and training data. Extractive models, such as our Knowledge-Enhanced Transformer Graph Summarization (KETGS), are computationally more efficient and well-suited for environments where resources may be limited.
Structural Relationships in Text: Our KETGS framework is designed to leverage the structural and relational properties of the text by identifying key sentences based on entity and discourse relations. This structured extraction process naturally aligns with extractive summarization, ensuring that the resulting summaries maintain the coherence and structure of the original document.
Graph-based methods have long been employed in extractive summarization due to their ability to represent the structural relationships between sentences, entities, and other components of a document. These methods excel at capturing local structural information, such as sentence co-occurrence or word–sentence relationships, making them useful for tasks that require a detailed understanding of document structure. However, one of the main limitations of traditional graph-based approaches is their difficulty in capturing deeper semantic relationships within texts. While graph models can map explicit structural relationships, they often struggle with the subtle, long-range dependencies and complex semantic nuances that are critical for generating high-quality summaries. This limitation is particularly pronounced in cases where context and meaning span multiple sentences or sections of a document, which simple graph connections cannot easily capture [
4].
Furthermore, graph-based methods can be computationally expensive, especially when applied to large documents or datasets, as the complexity of the graph grows with the size of the text. While these methods could theoretically map complex semantic relationships, the performance overhead becomes a significant bottleneck for real-time applications.
To address these limitations, recent research has focused on combining graph-based approaches with Transformer models, which excel at capturing contextual dependencies over long distances within a text. Transformers, with their self-attention mechanisms, are highly effective at understanding semantic relationships across a document, even when these relationships are not explicitly represented in the structure of the text [
6].
By integrating Transformer-guided attention mechanisms into Graph Neural Networks (GNNs), our Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework enhances the ability of graph models to capture both local structure and global semantic context. This hybrid approach allows us to dynamically update node features using both graph-based structural connections and the semantic insights provided by Transformer models [
10,
11].
The combination of these two methods addresses the primary limitations of graph-based approaches by enabling the model to capture complex semantic relationships while also improving the overall performance in terms of both speed and accuracy. This is particularly valuable in domains where a balance between structural insight and semantic understanding is crucial, such as scientific or technical summarization tasks.
In recent years, large language models (LLMs) such as GPT-3 and BERT have demonstrated remarkable improvements in generating summaries by effectively selecting relevant content from texts. However, these models often come with significant computational costs, requiring large amounts of data and processing power to perform at optimal levels. Furthermore, while LLMs can generate summaries with high fluency, they may lack the transparency and interpretability needed in certain domains, such as legal or biomedical fields, where understanding how the summary was generated is critical. Our Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework offers an alternative that balances performance with resource efficiency. KETGS leverages the structure of the text, focusing on entity and discourse relations, to generate summaries that are not only accurate but also easily interpretable. Unlike many LLMs, which often operate as “black boxes”, KETGS provides clear mechanisms for how sentences are selected based on graph representations, making the summarization process more transparent. Moreover, KETGS is designed to be more accessible to researchers and practitioners who may not have access to the vast computational resources required by state-of-the-art LLMs. This makes our approach a viable option for applications where computational efficiency, transparency, and accuracy are key considerations.
By incorporating Transformer-guided attention mechanisms into the Graph Neural Network, our KETGS framework overcomes these limitations, dynamically enhancing node features through both local connectivity and global context [
12,
13].
Furthermore, the KETGS framework employs a Maximal Marginal Relevance (MMR) strategy for sentence selection, which balances the relevance and diversity of the selected content, thereby reducing redundancy and ensuring that the summary covers a broad range of topics within the document [
14]. This is particularly important in domains where the information density is high and the nuances of the text are critical, such as in scientific research or legal documents [
15,
16].
The paper makes the following contributions:
We introduce a novel framework, KETGS, that integrates entity and discourse relations into a graph-based model for extractive summarization, thereby improving the coherence and contextual richness of the generated summaries.
We propose the use of a Transformer-Guided Graph Neural Network (TG-GNN) that leverages both structural graph connectivity and Transformer-based attention to dynamically enhance node features, leading to more accurate sentence salience estimation.
We validate the effectiveness of the KETGS framework through extensive experiments on multiple benchmark datasets, demonstrating its superiority over state-of-the-art extractive summarization models in terms of relevance, coherence, and conciseness of the summaries.
This paper is structured as follows:
Section 2 reviews the relevant literature on extractive summarization, Graph Neural Networks, and Transformer models.
Section 3 details the methodology behind the KETGS framework, including the processes of document representation, graph construction, and the TG-GNN.
Section 4 presents the experimental setup and results. Finally,
Section 5 concludes the paper and suggests directions for future research.
2. Related Work
Transformers have revolutionized the natural language processing (NLP) landscape, particularly in text summarization. Models like BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa play a crucial role in advancing the field by enabling models to understand rich contextual relationships across various parts of a document. These transformer models employ a self-attention mechanism that effectively encodes long-range dependencies, which is essential for grasping the context in text summarization tasks [
7,
17].
Liu and Lapata (2019) investigated the use of pretrained encoders in text summarization, showing that transformers could generate summaries that are not only coherent but also contextually relevant [
9]. Their work highlighted the benefits of leveraging large-scale pretraining on diverse corpora, enabling transformers to excel in downstream tasks with minimal fine-tuning. The effectiveness of transformers in extractive summarization stems from their ability to model sentence-level representations, which is crucial for identifying and extracting key sentences that encapsulate the document’s main ideas. The scalability of transformers has also made it possible to process longer documents, which was previously a challenge with recurrent neural networks (RNNs) due to their sequential nature. The parallelization capabilities of the transformer architecture have enabled training on large datasets, resulting in models that generalize well across various summarization tasks. Additionally, advancements such as Transformer-XL and Longformer have extended the ability of transformers to handle even longer sequences by incorporating mechanisms that can capture long-term dependencies without compromising efficiency [
18,
19].
Graph Neural Networks (GNNs) have become increasingly important in NLP due to their ability to model relational data, which is particularly useful in text summarization tasks where understanding the relationships between sentences, entities, and concepts is crucial. GNNs work by propagating information through the graph structure, allowing for the aggregation of features from neighboring nodes, which can represent words, sentences, or entities within a document [
20].
The application of GNNs in text summarization is driven by the need to capture the underlying structure of the document. Unlike transformers, which primarily focus on the sequential nature of text, GNNs excel at modeling the non-linear relationships between different textual elements. For example, in a document, certain sentences may be closely related through shared entities or similar topics, which can be effectively captured by representing the document as a graph. Each node in this graph could represent a sentence or an entity, and edges could represent relationships such as co-reference or semantic similarity.
Wang et al. (2019) introduced
HyperSum, a hypergraph-based approach to query-oriented summarization that demonstrated the flexibility of GNNs in handling complex summarization tasks [
21]. Hypergraphs, which generalize traditional graphs by allowing edges (called hyperedges) to connect more than two nodes, are particularly useful in summarization because they can naturally represent multi-way relationships between sentences and entities. This approach has been shown to improve the quality of summaries by better capturing the document’s overall structure and context.
Further research by Xu et al. (2019) explored the use of GNNs for modeling discourse structures within documents, emphasizing that GNNs could enhance the coherence of generated summaries by maintaining the logical flow of information [
22]. The ability of GNNs to integrate information from various parts of the document and to propagate contextual information across the graph makes them particularly effective for summarization tasks that require a deep understanding of document structure.
The combination of transformers with GNNs has emerged as a powerful approach for extractive text summarization, combining the contextual modeling strengths of transformers with the relational modeling capabilities of GNNs. This hybrid approach addresses some of the limitations inherent in using either model alone, particularly in capturing both local and global context in a document [
11].
Zhang et al. (2022) developed the
Hypergraph Transformer, a model that leverages the strengths of both transformers and GNNs for long document summarization [
11]. In this model, transformers are used to generate contextual embeddings for sentences, while GNNs are employed to capture the relationships between these sentences in a hypergraph structure. This integration allows the model to capture both the fine-grained details and the broader context of the document, leading to more accurate and coherent summaries.
The combination of transformers and GNNs is particularly beneficial in scenarios where the document structure is complex, such as scientific articles or legal documents. In these cases, understanding the relationships between different sections of the document is crucial for generating summaries that are both concise and comprehensive. The use of GNNs to model these relationships, combined with the contextual embeddings provided by transformers, results in summaries that are not only accurate but also maintain the logical structure of the original document.
Moreover, recent work by Yadav et al. (2023) has explored the use of hierarchical transformers combined with GNNs to improve the scalability of summarization models for very long documents [
10]. By organizing the document into a hierarchical structure and applying GNNs at each level of the hierarchy, these models can efficiently summarize documents that are several pages long, making them suitable for applications in areas such as legal document summarization or summarization of technical manuals.
Extractive summarization, which involves selecting and concatenating sentences from the source document to create a summary, has seen significant advancements in recent years. Techniques such as Maximal Marginal Relevance (MMR) have been widely adopted to balance relevance and diversity in the selected sentences [
14]. The MMR approach selects sentences that are highly relevant to the document’s main themes while minimizing redundancy, thereby improving the informativeness of the summary.
Recent work has also focused on the development of sophisticated scoring mechanisms that account for both the salience of individual sentences and their contribution to the overall coherence of the summary. For example, the work by Zhong et al. (2020) introduced a text-matching approach to extractive summarization, where sentences are selected based on their similarity to a reference summary, ensuring that the generated summary is both relevant and concise [
23].
Yadav et al. (2023) provided a comprehensive review of state-of-the-art extractive summarization methods, highlighting the importance of incorporating external knowledge and improving the diversity of selected sentences to avoid redundancy [
10]. These advancements have been critical in enhancing the quality of extractive summaries, particularly in applications where the preservation of the original document’s meaning and structure is paramount.
The field of text summarization has seen notable advancements, with recent focus on hybrid approaches that combine extractive and abstractive methods to produce coherent and contextually accurate summaries. Recent summarization approaches leverage dependency parsing and sentence compression to fuse extractive and abstractive techniques, enhancing summary readability while reducing redundancy [
24]. Such hybrid methods have proven effective in retaining the accuracy of extractive models while incorporating the flexibility of abstractive techniques.
In addition, Transformer-based models integrated with Graph Neural Networks (GNNs) have shown substantial improvements in summarization quality, especially for long documents. These models combine the contextual embedding power of Transformers with the relational strengths of graph structures. For instance, CovSumm demonstrates the benefits of such integration in generating summaries for domain-specific datasets, such as COVID-19 research papers, by capturing long-range dependencies and contextual relationships [
25].
Another significant trend is the use of Graph Attention Networks (GATs) in summarization, as GATs capture detailed relationships between sentences and discourse elements. This attention mechanism enables models to prioritize salient text segments, enhancing summarization relevance and coherence. Recent research indicates that GAT-based models outperform traditional GNNs in tasks that require nuanced inter-sentence connections, making them especially useful for document types requiring high retention of structural relationships [
26]. Lastly, the need for domain-specific summarization models is increasingly recognized, particularly in fields such as biomedicine and law. Summarization models designed for these areas prioritize features like entity recognition and discourse structuring, which are essential for summarizing technical documents effectively and accurately. Research on domain-specific applications highlights the role of targeted models in achieving better summarization accuracy and computational efficiency for specialized document sets [
27].
These recent works provide a strong foundation for the development of frameworks like our Knowledge-Enhanced Transformer Graph Summarization (KETGS), which combines the strengths of GNNs and Transformer models to capture complex semantic and discourse relations in extractive summarization.
Incorporating external knowledge into summarization models is another area of active research, particularly in improving the accuracy and relevance of generated summaries. Knowledge-enhanced models leverage external knowledge sources, such as knowledge graphs or domain-specific ontologies, to provide additional context that may not be explicitly present in the text. This approach has been shown to improve the model’s understanding of the text, leading to summaries that are more informative and contextually accurate.
The Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework builds on this concept by integrating entity and discourse relations into a graph-based summarization model. By representing documents as graphs where nodes correspond to entities, sentences, and discourse elements, and edges capture the relationships between them, KETGS can better capture the semantic richness of the document. This approach allows for more accurate identification of the most salient sentences, leading to summaries that are both concise and comprehensive.
The integration of external knowledge also helps in dealing with the challenges of summarizing domain-specific documents, where understanding the nuances of the text is crucial for generating accurate summaries. For instance, in the medical domain, incorporating medical ontologies into the summarization model can significantly improve the relevance of the generated summaries by ensuring that key medical concepts are appropriately represented.
The Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework stands at the intersection of several cutting-edge research areas, including transformers, GNNs, and knowledge-enhanced NLP models. By synthesizing these approaches, KETGS represents a significant advancement in the field of extractive summarization, offering improved relevance, coherence, and conciseness in generated summaries. The related work discussed here underscores the importance of hybrid models and the continuous evolution of techniques that seek to leverage both the local and global context within documents.
One of the unique features of our Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework is the integration of discourse relations into the graph-based model. Traditional graph-based summarization methods typically focus on structural or syntactic connections, such as sentence similarity or dependency links. However, these methods often overlook the nuanced discourse relationships that exist between sentences, which play a crucial role in preserving the coherence and flow of the original text. Discourse relations—such as coherence, entailment, contrast, and cause–effect—help capture deeper semantic connections that extend beyond simple syntactic or lexical similarities. By incorporating these relations into the KETGS framework, we enable the model to understand how different parts of the text relate to one another contextually, leading to more informed and contextually appropriate sentence selection. This approach allows KETGS to outperform traditional models, which lack such discourse-aware mechanisms. Moreover, the integration of discourse relations in KETGS is complemented by Transformer-based attention mechanisms, which further enhance the model’s capacity to focus on contextually relevant information. This combination of discourse relations with attention-based features enables KETGS to generate summaries that are not only concise but also semantically rich and coherent, aligning with human-like summarization. Incorporating discourse information can significantly improve summarization quality, especially in complex documents where maintaining the logical flow is essential. Our approach builds on this understanding by using discourse relations as a core component in the sentence selection process, distinguishing KETGS from other graph-based summarization models.
3. Methodology
This section delineates the methodologies employed in the Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework, designed to optimize extractive text summarization through advanced graph-based techniques and Transformer architectures. Initially, document representation is achieved by embedding words, sentences, and key entities, followed by the construction of a sophisticated graph that encapsulates the relationships among these components. Subsequently, the Transformer-Guided Graph Neural Network (TG-GNN) processes this graph to enhance node features dynamically, integrating local and global contextual information. The enhanced features are then utilized to score and select sentences based on their salience and relevance, culminating in the generation of concise and informative summaries. The overall process is structured to maintain a balance between computational efficiency and the accuracy of the summarization, ensuring the model’s applicability across diverse textual datasets. In
Figure 1, the general structure of KETGS is presented.
In this work, we introduce an enhanced framework for extractive summarization that combines the strengths of Graph Neural Networks (GNNs) with Transformer-based self-attention mechanisms. Our proposed Knowledge-Enhanced Transformer-Guided Graph Neural Network (KETGS) aims to address limitations in traditional extractive models by incorporating both internal and external knowledge to enrich semantic understanding. The initial stage of the KETGS framework constructs a graph representation from input text by treating sentences, words, and named entities as nodes, which are interconnected through syntactic, semantic, and entity-based edges. Unlike simpler graph models, KETGS introduces domain-specific external knowledge into the graph structure. This integration of external knowledge sources, such as knowledge bases or domain-specific lexicons, allows the model to gain a deeper understanding of entity meanings and their contextual roles within the text. Such enrichment is particularly valuable for complex and information-dense datasets, like those from the biomedical domain, where the precise understanding of entities and relationships is crucial. Additionally, the “Transformer-guided” component in our model is not merely an added self-attention layer but an integral element that enhances the GNN’s functionality. Specifically, the Transformer’s self-attention mechanism is embedded within the GNN layers, allowing each node in the graph to aggregate information from its neighbors while gaining a global perspective across the entire graph. This mechanism empowers the model to capture both local node interactions and broader contextual relationships, resulting in refined node embeddings with a higher level of semantic coherence. This fusion of GNN and Transformer components enables KETGS to achieve a comprehensive understanding of the document structure, surpassing the limitations of standard GNNs which are restricted to local neighborhood information. Through these enhancements, KETGS leverages the powerful contextualizing abilities of self-attention while retaining the relational learning advantages of GNNs. The resulting model effectively balances local and global context, making it uniquely suited for domain-specific summarization tasks. We further validate the efficacy of the proposed model by evaluating it across multiple benchmark datasets, demonstrating its superiority in maintaining semantic relevance and coherence compared to baseline methods. This approach underscores the novel contributions of our framework, showing how external knowledge and Transformer-guided GNN processing together advance the state-of-the-art in extractive summarization.
3.1. Document Representation Initialization
In the Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework, the initialization of document representation is pivotal for effective summarization. This section details the process of initializing the word, sentence, and entity nodes within the document graph. The general structure of this stage is summarized in Algorithm 1.
Algorithm 1 Document Representation Initialization
Require: Document D consisting of sentences s_1, …, s_n
Ensure: Graph G = (V, E) with initialized node embeddings and edges representing semantic, syntactic, and entity relationships
1: V ← ∅ ▹ Initialize the set of vertices (nodes)
2: E ← ∅ ▹ Initialize the set of edges
3: Pre-trained Model ← Load BERT or RoBERTa model for embedding computation
4: for each sentence s_i in D do
5:  Compute sentence embedding h_{s_i} as the mean of BERT embeddings of words in s_i:
6:  h_{s_i} ← (1/|W_i|) Σ_{w ∈ W_i} BERT(w)
7:  Add sentence node s_i to V
8: end for
9: Perform Named Entity Recognition (NER) on D to detect entities
10: for each entity e detected in D do
11:  Compute embedding for entity e using BERT:
12:  h_e ← BERT(e)
13:  Add entity node e to V
14: end for
15: for each word w in D do
16:  Compute embedding for word w using BERT:
17:  h_w ← BERT(w)
18:  Add word node w to V
19: end for
20: for each word w in each sentence s_i do
21:  Add edge (w, s_i) to E ▹ Link word nodes to their corresponding sentence node
22: end for
23: for each entity e in each sentence s_i where e appears do
24:  Add edges (e, s_i) and (e, w) for all w in s_i to E ▹ Link entity nodes to sentences and words
25: end for
26: for each pair of sentences (s_i, s_j) do
27:  if semantic similarity between s_i and s_j is above threshold τ then
28:   Add edge (s_i, s_j) to E ▹ Add semantic similarity edges between sentences
29:  end if
30: end for
31: return Graph G = (V, E)
3.1.1. Word Embeddings
Word embeddings are the foundational layer of our model, providing dense vector representations for words that capture their semantic meanings. We utilize embeddings from a pre-trained Transformer model such as BERT (Bidirectional Encoder Representations from Transformers) or RoBERTa (Robustly Optimized BERT Approach), which are adept at encoding contextual information:
$h_w = \mathrm{BERT}(w)$
where $w$ is a word in the document and $h_w$ is its embedding.
3.1.2. Sentence Embeddings
Each sentence in the document is represented as a node in our graph. The initial representation of a sentence node, $h_{s_i}$, is derived by aggregating the embeddings of its constituent words. Specifically, we apply mean pooling to the word embeddings:
$h_{s_i} = \frac{1}{|W_i|} \sum_{w \in W_i} h_w$
where $W_i$ is the set of words in sentence $i$ and $|W_i|$ is the number of words in the sentence.
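To make the mean-pooling step concrete, the following minimal sketch (assuming the HuggingFace transformers package and the bert-base-uncased checkpoint, which are illustrative choices rather than our exact setup) computes a sentence embedding by averaging BERT token embeddings:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative sketch: mean-pool BERT token embeddings to obtain a sentence embedding.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def sentence_embedding(sentence: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)                        # last_hidden_state: (1, seq_len, 768)
    token_embeddings = outputs.last_hidden_state[0]      # (seq_len, 768)
    mask = inputs["attention_mask"][0].unsqueeze(-1)     # ignore padding positions
    return (token_embeddings * mask).sum(0) / mask.sum() # mean pooling -> (768,)

h_s = sentence_embedding("KETGS builds a document graph over words, sentences, and entities.")
print(h_s.shape)  # torch.Size([768])
```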
3.1.3. Entity Recognition and Embeddings
Named Entity Recognition (NER) is employed to identify entities within the text which are subsequently treated as separate nodes in the graph. These entities provide crucial factual context and enhance the connectivity of the graph. Each entity $e$ detected in the document is embedded using the same Transformer model, ensuring that the entity embeddings are contextually aligned with the word embeddings:
$h_e = \mathrm{BERT}(e)$
where $h_e$ is the embedding of entity $e$.
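As an illustration of this step, the sketch below uses spaCy for NER (the en_core_web_sm model is an assumed choice and must be downloaded separately); each detected entity span would then be passed through the same BERT encoder used above, for example by mean-pooling its token embeddings, so that entity embeddings stay in the same vector space as word and sentence embeddings:

```python
import spacy

# Illustrative sketch: detect entity nodes with spaCy NER.
# Requires: python -m spacy download en_core_web_sm (assumed model choice).
nlp = spacy.load("en_core_web_sm")

def extract_entities(document: str):
    """Return (entity text, label, index of containing sentence) triples for graph construction."""
    doc = nlp(document)
    sentences = list(doc.sents)
    entities = []
    for ent in doc.ents:
        sent_idx = next(i for i, s in enumerate(sentences)
                        if ent.start >= s.start and ent.end <= s.end)
        entities.append((ent.text, ent.label_, sent_idx))
    return entities

print(extract_entities("Aspirin reduces fever. The drug was first synthesized by Bayer in 1897."))
```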
3.1.4. Graph Initialization
With the embeddings of words, sentences, and entities prepared, we initialize the document graph $G = (V, E)$, where V is the set of nodes and E is the set of edges. Nodes in V include word nodes, sentence nodes, and entity nodes, each initialized as described above. Edges in E are established based on several criteria:
Word-to-Sentence Edges: Each word node is connected to its corresponding sentence nodes if the word appears in the sentence.
Entity-to-Word/Sentence Edges: Each entity node is connected to word nodes and sentence nodes where the entity is mentioned or is contextually relevant.
Sentence-to-Sentence Coherence Edges: These are based on the semantic similarity between sentences, facilitating the flow of information across the document.
This structured initialization lays the foundation for the subsequent layers of our model where dynamic updates to the graph further refine the representations based on the interactions modeled by the Transformer-guided GNN.
3.2. Graph Construction
Following the initialization of document representations, the Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework constructs a comprehensive graph that embodies the relationships among words, sentences, and entities. This section details the mechanisms of graph construction, focusing on the integration of diverse relational information. The general structure of graph construction is summarized in Algorithm 2.
The graph $G = (V, E)$, where $V$ consists of word nodes $V_w$, sentence nodes $V_s$, and entity nodes $V_e$, is further refined by establishing edges based on linguistic and semantic relationships. These edges are vital for the propagation of information and the summarization process.
Algorithm 2 Graph Construction with Enhanced Connections
Require: Initial graph G = (V, E) with sentence nodes s_i, word nodes w_j, and entity nodes e_k from Algorithm 1
Ensure: Updated graph G with enhanced connections among nodes
1: for each sentence node s_i do
2:  for each word node w_j contained in sentence s_i do
3:   Add or reinforce edge (w_j, s_i) in E based on word co-occurrence ▹ Link words to sentences based on frequency and proximity
4:  end for
5: end for
6: for each entity node e_k do
7:  for each sentence node s_i that contains entity e_k do
8:   Add or reinforce edge (e_k, s_i) in E based on semantic relevance ▹ Enhance semantic relevance between entities and sentences
9:  end for
10: end for
11: for each pair of sentence nodes (s_i, s_j) do
12:  Calculate semantic similarity sim(h_{s_i}, h_{s_j}) between s_i and s_j
13:  if sim(h_{s_i}, h_{s_j}) > τ then ▹ Threshold τ controls the strength of semantic connection
14:   Add or reinforce edge (s_i, s_j) in E ▹ Connect sentences with high semantic similarity
15:  end if
16: end for
17: return Enhanced graph G with additional semantic, syntactic, and entity-based connections
Word-to-Sentence Connections: Each word node is connected to the sentence nodes in which it appears. This direct linkage allows sentence nodes to aggregate lexical features from their constituent words, crucial for capturing detailed semantic content:
$(w, s_i) \in E \iff w \in s_i$
Entity-to-Word and Entity-to-Sentence Connections: Entity nodes are connected to both the word nodes representing the entity and the sentence nodes where the entity is mentioned. These connections enhance the graph’s ability to encapsulate factual content and contextual relevance:
$(e, w) \in E \iff w \text{ realizes } e, \qquad (e, s_i) \in E \iff e \text{ is mentioned in } s_i$
Sentence-to-Sentence Coherence: Edges between sentence nodes are established based on semantic similarity, calculated through cosine similarity of their embedding vectors (a sketch of this step follows below). These edges facilitate the understanding of document structure and narrative flow:
$(s_i, s_j) \in E \iff \mathrm{sim}(h_{s_i}, h_{s_j}) > \tau$
where $\mathrm{sim}(\cdot,\cdot)$ represents the cosine similarity and $\tau$ is a predefined threshold.
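A minimal sketch of the coherence-edge construction is given below, assuming sentence embeddings are already available as a tensor; the threshold value passed in is purely illustrative, since the actual value of $\tau$ is tuned on the validation set (Section 4):

```python
import itertools
import torch
import torch.nn.functional as F

# Illustrative sketch: add sentence-to-sentence coherence edges when cosine similarity exceeds tau.
def coherence_edges(sentence_embeddings: torch.Tensor, tau: float = 0.5):
    """sentence_embeddings: (num_sentences, dim). Returns a list of (i, j) edge index pairs."""
    normalized = F.normalize(sentence_embeddings, dim=-1)
    sim = normalized @ normalized.T                      # pairwise cosine similarities
    edges = []
    for i, j in itertools.combinations(range(sim.size(0)), 2):
        if sim[i, j] > tau:
            edges.append((i, j))
    return edges

edges = coherence_edges(torch.randn(5, 768), tau=0.5)    # tau here is a placeholder value
print(edges)
```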
3.3. Transformer-Guided Graph Neural Network
Once the graph is constructed, it is processed through a Transformer-guided Graph Neural Network (TG-GNN). This network utilizes a novel architecture that combines traditional graph convolution with Transformer-based attention mechanisms, aimed at enhancing node feature learning through contextualized relational embeddings. The general structure of Transformer-guided Graph Neural Network processing is presented in Algorithm 3.
Algorithm 3 Transformer-Guided Graph Neural Network Processing
Require: Graph G = (V, E) with initial node features h_v^(0) for all v ∈ V
Ensure: Updated node features h_v for all v ∈ V
1: for each timestep t = 1 to T do ▹ Iterate for T GNN layers
2:  for each node v ∈ V do
3:   Aggregate features from neighboring nodes using GNN update:
4:   h_v^(t) ← σ(Σ_{u ∈ N(v)} α_{vu} W h_u^(t−1) + b) ▹ Update node features based on neighbors
5:  end for
6: end for
7: for each node v ∈ V do
8:  Refine node feature using Transformer self-attention:
9:  h_v ← MultiHeadAttention(h_v^(T), {h_u^(T) : u ∈ V}) ▹ Apply multi-head attention across all nodes for global context
10: end for
11: return h_v for all v ∈ V
The TG-GNN updates node features by leveraging both the structural connectivity of the graph and the contextual relevance provided by the Transformer model. The update process is iteratively performed, refining node representations to better reflect both local and global document contexts:
$h_v^{(t+1)} = \sigma\Big(\sum_{u \in \mathcal{N}(v)} \alpha_{vu}\, W h_u^{(t)} + b\Big)$
where $h_v^{(t)}$ is the feature vector of node $v$ at iteration $t$, $\mathcal{N}(v)$ denotes the neighborhood of $v$, $\alpha_{vu}$ represents the attention coefficients, and $W$ and $b$ are trainable parameters of the network.
This Transformer-guided approach ensures that the graph not only captures explicit relationships encoded in the initial graph structure but also adapts to implicit contextual cues, leading to a robust and dynamic summarization capability.
The final embeddings produced by the TG-GNN are utilized to determine the salience of sentences for the summarization task. Sentences are ranked based on their embedded feature representations, and the top-ranked sentences are selected to form the summary. This selection process is guided by both the content relevance and diversity, ensuring a concise yet comprehensive summary.
This methodical approach to graph construction and processing establishes a strong foundation for extractive summarization, enabling the KETGS framework to produce summaries that are not only coherent and contextually rich but also factually accurate and informatively dense.
The Transformer-Guided Graph Neural Network (TG-GNN) is a pivotal component of the Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework. It combines the strengths of Graph Neural Networks (GNNs) and Transformer architectures to enrich the node representations with contextual information from the graph structure. This section outlines the operation of the TG-GNN and its integration into the summarization process.
The TG-GNN architecture employs a multi-layer approach where each layer is designed to process the graph’s node features through a combination of graph convolution and self-attention mechanisms inspired by Transformers.
Graph Convolution Layer: The graph convolution layers in the TG-GNN are responsible for aggregating information from the neighbors of a node. This aggregation is crucial for capturing local structure and feature information, which is essential for understanding the relationships and relevance of words, sentences, and entities within the document:
$h_v^{(t+1)} = \sigma\Big(W^{(t)} \sum_{u \in \mathcal{N}(v)} \frac{1}{c_v}\, h_u^{(t)} + b^{(t)}\Big)$
where $h_v^{(t)}$ is the feature vector of node $v$ at layer $t$, $\mathcal{N}(v)$ denotes the neighbors of $v$, $c_v$ is a normalization constant (e.g., the degree of $v$), $W^{(t)}$ and $b^{(t)}$ are the trainable weight and bias at layer $t$, and $\sigma$ is a non-linear activation function.
Transformer Attention Layer: Following the graph convolution, the features of the nodes are refined using a self-attention mechanism derived from Transformers. This layer enables the model to weigh the importance of each node’s features based on the global context of the document, enhancing the ability to identify salient information across the document:
$\alpha_{vu} = \operatorname{softmax}_u\big(a^{\top}[\,W_Q h_v \,\|\, W_K h_u\,]\big), \qquad \tilde{h}_v = W_O \sum_{u \in V} \alpha_{vu}\, h_u$
where $W_Q$, $W_K$, and $W_O$ are the query, key, and output projection matrices of the attention mechanism, respectively, and $a$ is a learnable parameter vector for the attention score.
The outputs from the graph convolution and Transformer attention layers are integrated to form the final node representations. This integration allows the model to leverage both local connectivity and global document context effectively:
$h_v = \mathrm{FFN}(\tilde{h}_v)$
where FFN represents a feed-forward network applied to the output of the attention layer, further refining the node features for summarization.
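The following PyTorch sketch shows one possible layer that follows this recipe: degree-normalized neighbor aggregation, multi-head self-attention over all nodes, and a feed-forward network. It is a minimal illustration under assumed dimensions and a simple additive fusion, not our released implementation:

```python
import torch
import torch.nn as nn

class TGGNNLayer(nn.Module):
    """Minimal sketch of one Transformer-Guided GNN layer (illustrative only)."""
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.W = nn.Linear(dim, dim)                       # graph-convolution weight
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (num_nodes, dim); adj: (num_nodes, num_nodes) binary adjacency matrix.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)   # normalization constant c_v
        h_conv = torch.relu(self.W(adj @ h / deg))         # aggregate neighbor features
        attn_out, _ = self.attn(h_conv.unsqueeze(0), h_conv.unsqueeze(0), h_conv.unsqueeze(0))
        return self.ffn(attn_out.squeeze(0)) + h_conv      # fuse local and global context (assumed fusion)

layer = TGGNNLayer()
h = torch.randn(12, 768)                                    # 12 nodes (words, sentences, entities)
adj = (torch.rand(12, 12) > 0.7).float()
print(layer(h, adj).shape)                                  # torch.Size([12, 768])
```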
The enriched node features from the TG-GNN are crucial for identifying the most relevant sentences for the summary. This architecture ensures that the TG-GNN not only enhances the feature representation of each node in the graph but also optimizes the summarization process by focusing on the most significant parts of the document.
3.4. Sentence Scoring and Selection
After the Transformer-Guided Graph Neural Network (TG-GNN) enriches the sentence node features, the next crucial stage in the KETGS framework is scoring and selecting sentences for the summary. This process is vital for determining which sentences encapsulate the core information of the document and should be included in the summary. The general structure of this stage is briefly summarized in Algorithm 4.
Algorithm 4 Sentence Scoring and Selection
Require: Graph G = (V, E) with node features h_v
Ensure: Summary S consisting of selected sentences
1: Initialize empty summary S ← ∅
2: Compute salience scores for all sentence nodes:
3: for each sentence node s_i do
4:  score(s_i) ← w_s^T h_{s_i} ▹ Compute salience
5: end for
6: Select sentences based on scores and diversity using MMR:
7: while length of S less than desired summary length do
8:  Select s* maximizing MMR criterion:
9:  s* ← argmax_{s_i ∉ S} [ λ · score(s_i) − (1 − λ) · max_{s_j ∈ S} sim(s_i, s_j) ]
10:  S ← S ∪ {s*}
11: end while
12: return Summary S
The scoring function evaluates the salience of each sentence based on its final node representation obtained from the TG-GNN. The salience score for each sentence node $s_i$ is calculated as follows:
$\mathrm{score}(s_i) = w_s^{\top} h_{s_i}$
where $w_s$ is a trainable parameter vector that projects the node features into a scalar salience score and $h_{s_i}$ is the enriched feature vector of the sentence node $s_i$.
To construct the summary, sentences are selected based on their salience scores. However, simply selecting the highest-scoring sentences could result in redundancy and a lack of diversity in the summarized content. To address this, we employ a selection mechanism that promotes diversity and reduces redundancy.
Maximal Marginal Relevance (MMR): This method balances the relevance and diversity by iteratively selecting sentences that offer the most unique information in the context of what has already been selected. The MMR is defined as
$\mathrm{MMR} = \arg\max_{s_i \notin S} \Big[\, \lambda \cdot \mathrm{score}(s_i) - (1 - \lambda) \cdot \max_{s_j \in S} \mathrm{sim}(s_i, s_j) \,\Big]$
where $S$ is the set of already selected sentences, $\mathrm{sim}(s_i, s_j)$ measures the similarity between sentences $s_i$ and $s_j$, and $\lambda$ is a parameter that balances relevance and diversity.
Using the MMR strategy, sentences are selected one at a time until a predefined length or number of sentences is reached, ensuring the summary is concise yet comprehensive. This selection process ensures that the resulting summary covers a broad range of topics discussed in the document without significant overlap, thus maintaining the integrity and brevity of the summarized content.
This methodical approach to scoring and selecting sentences enables the KETGS framework to generate summaries that are not only informative and relevant but also diverse and representative of the entire document.
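A compact sketch of the scoring-and-selection loop is shown below; the salience projection, the number of selected sentences, and the value of $\lambda$ are illustrative placeholders rather than tuned settings:

```python
import torch
import torch.nn.functional as F

def mmr_select(h_sent: torch.Tensor, w_s: torch.Tensor, k: int = 3, lam: float = 0.7):
    """Illustrative MMR selection: h_sent (num_sentences, dim), w_s (dim,) salience projection."""
    scores = h_sent @ w_s                                   # salience score per sentence
    sim = F.normalize(h_sent, dim=-1) @ F.normalize(h_sent, dim=-1).T
    selected = []
    while len(selected) < min(k, h_sent.size(0)):
        best, best_val = None, float("-inf")
        for i in range(h_sent.size(0)):
            if i in selected:
                continue
            redundancy = max((sim[i, j].item() for j in selected), default=0.0)
            mmr = lam * scores[i].item() - (1 - lam) * redundancy
            if mmr > best_val:
                best, best_val = i, mmr
        selected.append(best)                               # greedily add the highest-MMR sentence
    return selected

print(mmr_select(torch.randn(6, 768), torch.randn(768)))
```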
3.5. Training and Optimization
The training and optimization of the Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework are crucial for its performance in extractive text summarization. This section describes the training process, the loss function, and the optimization strategies used to fine-tune the model parameters effectively.
The primary objective during the training of the KETGS framework is to minimize the difference between the predicted salience of sentences and their ground truth labels, which indicate whether a sentence should be included in the summary. The loss function is the binary cross-entropy loss averaged over sentences:
$\mathcal{L} = -\frac{1}{N} \sum_{j=1}^{N} \big[\, y_j \log \hat{y}_j + (1 - y_j) \log(1 - \hat{y}_j) \,\big]$
where $N$ is the number of sentences in the document, $y_j$ is the ground truth label of the $j$th sentence, and $\hat{y}_j$ is the predicted probability that the $j$th sentence should be included in the summary.
The training process involves several steps designed to enhance the capability of the TG-GNN to accurately predict the importance of sentences based on their enriched node features:
Feature Initialization: Load pre-trained embeddings and initialize the node features for words, sentences, and entities.
Graph Construction: Build the graph with nodes and edges as described in the graph construction section.
Feature Propagation: Run the TG-GNN to update the node features across multiple layers.
Score Calculation: Calculate the salience scores for each sentence using the trained model.
Backpropagation: Use the loss function to compute gradients and backpropagate errors through the network.
Hyperparameters such as the learning rate, the number of layers in the TG-GNN, and the balance parameter in the sentence selection algorithm are tuned based on performance on a held-out validation set. This tuning helps to find the best model configuration that maximizes performance on unseen data. To prevent overfitting, we apply dropout regularization in the TG-GNN layers and also consider early stopping based on the loss on a validation set. These techniques ensure that the model generalizes well to new, unseen documents. Through these comprehensive training and optimization strategies, the KETGS framework is fine-tuned to perform effectively on the task of extractive text summarization, yielding summaries that are both informative and concise.
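A minimal sketch of a single training step under these choices is given below; the model's forward signature and helper names are assumptions, and the Adam configuration simply mirrors Section 4.6:

```python
import torch
import torch.nn as nn

# Illustrative training step: binary cross-entropy over per-sentence salience logits.
# `model` is assumed to map (node_features, adjacency) -> salience logits for sentence nodes.
def train_step(model, optimizer, node_features, adjacency, labels):
    model.train()
    logits = model(node_features, adjacency)                # (num_sentences,)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
    optimizer.zero_grad()
    loss.backward()                                         # backpropagate through the TG-GNN layers
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(model.parameters())          # Adam, as in Section 4.6
```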
4. Experimental Settings
4.1. Experimental Setup
The experimental evaluation of the Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework was conducted using a suite of rigorous tests across multiple benchmark datasets. Our objective was to assess the effectiveness of KETGS in generating high-quality extractive summaries compared to state-of-the-art models. The implementation was carried out in PyTorch 2.5.1 [
28], and all experiments were conducted on a cluster equipped with NVIDIA Tesla V100 GPUs.
The preprocessing pipeline included tokenization using the spaCy library [
29], followed by embedding generation using pre-trained BERT-base models [
7]. Each document was transformed into a graph structure where nodes represented sentences, words, and entities, with edges capturing various types of relationships, including semantic similarity, syntactic dependencies, and discourse relations. The model was trained using the Adam optimizer with a learning rate of
, batch size of 32, and early stopping based on validation loss. To ensure optimal performance of our Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework, several important parameters were tuned experimentally: the balance parameter $\lambda$, the similarity threshold $\tau$, and the co-occurrence degree in Algorithm 2. The balance parameter $\lambda$ controls the trade-off between relevance and diversity in sentence selection using the Maximal Marginal Relevance (MMR) criterion. We experimented with values of $\lambda$ ranging from 0.1 to 0.9 in increments of 0.1. The final value of $\lambda$ was selected based on its ability to maximize summary coherence while maintaining diversity in the selected sentences. The threshold $\tau$ is used to determine the similarity between sentences when constructing the sentence-to-sentence edges in the graph. We tested values of $\tau$ ranging from 0.3 to 0.7 in increments of 0.05, using cosine similarity as the similarity metric. The optimal value of $\tau$ was selected based on validation set performance, ensuring a balance between connectivity and noise reduction in the sentence graph. The co-occurrence degree in Algorithm 2 defines how strongly word-to-sentence connections are weighted based on their frequency in the document. We experimented with different scaling factors for co-occurrence, and a co-occurrence degree of 2 was found to yield the best results in terms of both sentence salience and document coherence. These parameters were tuned using a grid search methodology, where the performance on the validation set was used to select the final values. The process was repeated for each dataset to ensure robustness across different types of texts.
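The grid search described above can be sketched as follows; evaluate_on_validation is a hypothetical placeholder standing in for training and scoring KETGS under one ($\lambda$, $\tau$) configuration:

```python
import itertools

def evaluate_on_validation(lam: float, tau: float) -> float:
    # Hypothetical placeholder: in practice this would train KETGS with (lam, tau)
    # and return a validation metric such as mean ROUGE-1/2/L.
    return 0.0

# Parameter grids mirroring the ranges reported above.
lambdas = [round(0.1 * i, 1) for i in range(1, 10)]          # 0.1 .. 0.9, step 0.1
taus = [round(0.3 + 0.05 * i, 2) for i in range(9)]          # 0.30 .. 0.70, step 0.05

best_config, best_score = None, float("-inf")
for lam, tau in itertools.product(lambdas, taus):
    score = evaluate_on_validation(lam=lam, tau=tau)
    if score > best_score:
        best_config, best_score = (lam, tau), score
print(best_config, best_score)
```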
4.2. Datasets
We utilized three well-established benchmark datasets, each representing different challenges in extractive text summarization:
XSum [
30]: This dataset contains over 200,000 news articles from the BBC, each paired with a single-sentence summary. The dataset was split into 203,028 training examples, 11,273 validation examples, and 11,332 test examples. XSum presents a challenge due to the brevity and specificity of the summaries.
CNN/DailyMail [
31]: A widely used dataset for summarization tasks, consisting of 287,084 training examples, 13,367 validation examples, and 11,489 test examples. It includes news articles with multi-point bullet summaries, testing the ability to capture and summarize multi-faceted narratives.
PubMed [
32]: A dataset of biomedical abstracts with long, complex summaries. It consists of 83,233 training examples, 4946 validation examples, and 5025 test examples, representing a significant challenge due to the technical nature of the content.
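For reference, all three benchmarks are available through the HuggingFace datasets hub; the identifiers and configuration names below are commonly used mirrors and are assumptions rather than the exact copies used in our experiments:

```python
from datasets import load_dataset

# Illustrative loading of the three benchmarks (identifiers are assumed public mirrors).
xsum = load_dataset("xsum")                                  # article -> one-sentence summary
cnn_dm = load_dataset("cnn_dailymail", "3.0.0")              # article -> bullet-style highlights
pubmed = load_dataset("scientific_papers", "pubmed")         # full text -> abstract

print(len(xsum["train"]), len(cnn_dm["train"]), len(pubmed["train"]))
```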
4.3. Baseline Models
To comprehensively evaluate the performance of KETGS, we compared it against 10 baseline models, representing a range of methodologies from traditional to state-of-the-art approaches:
LEAD-3 [
33]: A heuristic that selects the first three sentences of a document as the summary.
SummaRuNNer [
33]: An RNN-based sequence model that computes sentence salience scores to generate a summary.
TextRank [
3]: A graph-based ranking algorithm that applies the PageRank algorithm to text, using sentence connectivity as the graph’s edges.
BERTSUMEXT [
9]: A BERT-based model specifically fine-tuned for extractive summarization, which ranks sentences based on their contextual embeddings.
MatchSum [
23]: A contrastive learning approach that selects summary sentences by matching candidate sentences with the overall document content.
NeRoBERTa [
34]: A model that uses RoBERTa embeddings adapted for hierarchical document structures, enhancing summarization in complex documents.
HIBERT [
12]: A hierarchical Transformer model that captures document-level context for summarization.
JECS [
35]: A model that combines extraction and compression techniques to generate concise summaries, focusing on syntactic transformations.
BanditSum [
36]: A reinforcement learning-based model that treats summarization as a contextual bandit problem, optimizing for metrics like ROUGE.
Jia et al. (2020): A model that uses hierarchical attention mechanisms with heterogeneous graph representations to improve summarization across multiple document levels.
GPT-4: GPT-4, developed by OpenAI, is a versatile large language model capable of both extractive and abstractive summarization. In our study, GPT-4 was employed specifically as an extractive summarization tool. To implement this, we designed prompt instructions that directed GPT-4 to identify and select key sentences from the original text rather than generate novel sentences or rephrase content. The extractive process with GPT-4 involves the following steps:
Content Parsing: The full text of the document is fed to GPT-4, with prompts that instruct it to focus on selecting sentences that best represent the document’s main themes and key information.
Sentence Scoring: GPT-4 evaluates and ranks sentences based on their relevance to the overall content, utilizing its pretrained knowledge and comprehension capabilities. This step does not involve new content generation; instead, it relies on GPT-4’s understanding of content importance within the context.
Extraction: The top-ranked sentences are selected as the summary. This approach ensures that the original language of the document is preserved, with minimal risk of introducing information distortion or hallucination, which can occur with purely abstractive methods.
Using GPT-4 in this extractive mode allows us to leverage its language understanding strengths without diverging from the source content, thus maintaining high factual integrity in the summary. This extractive process is well-suited to applications requiring adherence to the original language, such as in technical or specialized domains where exact wording is essential.
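A minimal sketch of such an extractive-mode prompt, using the OpenAI Python client, is shown below; the model name, prompt wording, and decoding settings are illustrative choices, not the exact prompts used in our experiments:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extractive_prompt(document: str, num_sentences: int = 3) -> str:
    """Illustrative extractive-mode prompt: ask the model to copy, not paraphrase, sentences."""
    instructions = (
        f"Select the {num_sentences} sentences from the document below that best represent "
        "its main themes. Copy them verbatim, one per line. Do not rewrite or add anything."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": "You are an extractive summarization assistant."},
            {"role": "user", "content": f"{instructions}\n\nDocument:\n{document}"},
        ],
    )
    return response.choices[0].message.content
```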
MCHES: MCHES, as proposed by Onan et al. [
34], is a model designed with multi-dimensional embeddings and fine-tuned adjustment mechanisms to optimize both coherence and relevance in summarization. It operates as an extractive model in our experiments, using a structured approach to select sentences based on their semantic fit within the document. MCHES is particularly effective in balancing the preservation of key thematic elements while minimizing redundancy, making it a suitable baseline for our comparative analysis with KETGS.
4.4. Model Configurations
The KETGS framework was configured with the following components.
Table 1 provides a summary of the key configurations.
4.5. Parameter Tuning and Determination Curves
To determine the optimal values for key parameters in the KETGS framework, we conducted an extensive grid search and analyzed the resulting performance across multiple metrics. Here, we provide detailed parameter determination curves for key parameters: the balance parameter $\lambda$ in the Maximal Marginal Relevance (MMR) strategy, the semantic similarity threshold $\tau$, and the co-occurrence degree in the graph construction process.
Balance Parameter $\lambda$ in MMR Strategy: Figure 2 illustrates the effect of varying $\lambda$ on key performance metrics. The parameter $\lambda$ controls the trade-off between relevance and diversity in sentence selection. Values of $\lambda$ between 0.5 and 0.8 resulted in higher ROUGE scores, with the selected value of $\lambda$ achieving the best balance. This optimal value reflects a point where the model effectively captures essential content without excessive redundancy.
Semantic Similarity Threshold $\tau$: The threshold $\tau$ determines whether an edge is added between sentence nodes based on their semantic similarity. As shown in Figure 3, values of $\tau$ between 0.4 and 0.6 yielded the highest scores, with the chosen threshold providing optimal connectivity and noise reduction. This threshold ensures that sentences with high semantic alignment are linked without overloading the graph with weaker connections.
Co-occurrence Degree in Graph Construction: The co-occurrence degree defines the weighting of connections between word and sentence nodes based on word frequency.
Figure 4 presents the results across different values. A co-occurrence degree of 2 yielded the best performance, highlighting its role in enhancing sentence salience by linking frequently occurring words more strongly to their respective sentences.
These parameter determination curves provide a detailed view of how each parameter influences the model’s performance. The optimal values chosen align with the highest scores across ROUGE and BERTScore metrics, ensuring that the KETGS framework is fine-tuned for maximum effectiveness.
4.6. Training and Optimization
The KETGS model was trained using the Adam optimizer [
37], with a learning rate of
and a batch size of 32. Training was conducted over 30 epochs, with early stopping applied based on validation loss to avoid overfitting. Dropout was used with a rate of 0.3 to further enhance model generalization. Additionally, we performed a grid search to fine-tune hyperparameters, ensuring optimal performance across all datasets.
4.7. Evaluation Metrics
We employed five key metrics to evaluate the performance of KETGS and the baseline models:
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [
38]: ROUGE scores are calculated as follows:
$\mathrm{ROUGE\text{-}N} = \frac{\sum_{S \in \mathrm{References}} \sum_{\mathrm{gram}_n \in S} \mathrm{Count}_{\mathrm{match}}(\mathrm{gram}_n)}{\sum_{S \in \mathrm{References}} \sum_{\mathrm{gram}_n \in S} \mathrm{Count}(\mathrm{gram}_n)}$
BERTScore [
39,
40]: Computes the cosine similarity between BERT embeddings of the generated and reference summaries, capturing semantic similarity:
$\mathrm{BERTScore} = \frac{1}{|T|} \sum_{t \in T} \max_{s \in S} \cos(s, t)$
where $S$ and $T$ are the sets of tokens in the generated and reference summaries, respectively, and $\cos(s, t)$ is the cosine similarity between tokens $s$ and $t$.
MoverScore [
41]: Uses Earth Mover’s Distance (EMD) to evaluate the similarity between the embeddings of the generated and reference summaries, considering both semantic content and word order.
METEOR [
42]: Considers precision, recall, and alignment based on exact, stem, synonym, and paraphrase matches, offering a nuanced evaluation of semantic content.
BLEU (Bilingual Evaluation Understudy) [
43]: Calculates the precision of n-grams in the generated summary compared to the reference summary. BLEU is computed as
$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Big(\sum_{n=1}^{N} w_n \log p_n\Big)$
where $\mathrm{BP}$ is the brevity penalty, $p_n$ is the precision of n-grams, and $w_n$ is the weight assigned to n-grams of size $n$.
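As a practical note, ROUGE and BERTScore can be computed with the widely used rouge_score and bert_score packages; the snippet below is an illustrative sketch with toy strings, not the exact evaluation pipeline used in our experiments:

```python
from rouge_score import rouge_scorer
from bert_score import score as bert_score

candidate = "The study reports improved extractive summaries using a graph-based model."
reference = "A graph-based model is shown to improve extractive summarization quality."

# ROUGE-1 / ROUGE-2 / ROUGE-L F-measures.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
print({k: round(v.fmeasure, 3) for k, v in rouge.items()})

# BERTScore precision/recall/F1 over contextual token embeddings.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(round(F1.item(), 3))
```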
4.8. Ablation Studies
We conducted an ablation study with the following model variants:
KETGS-NoGNN: A version of the model without the Graph Neural Network to assess the impact of graph-based context integration.
KETGS-NoMMR: A variant excluding the Maximal Marginal Relevance strategy to determine its role in improving summary diversity and relevance.
KETGS-Basic: A simplified model using basic graph structures without advanced entity and discourse relations.
KETGS-NoEntities: A variant that removes entity nodes from the graph, focusing solely on sentences and words.
KETGS-NoDiscourse: A version excluding discourse relation edges to analyze their contribution to the overall summarization quality.
The experimental results are summarized in
Table 2,
Table 3 and
Table 4. These tables present the performance of all models across all three datasets for each set of metrics.
The results across various evaluation metrics provide a clear indication of the effectiveness of the KETGS framework compared to the baseline models.
As shown in
Table 2, KETGS consistently outperforms all baseline models across ROUGE-1, ROUGE-2, and ROUGE-L scores on all datasets. The superior performance in ROUGE-2 and ROUGE-L is particularly notable, as these metrics are crucial for evaluating the capture of bi-gram relations and the overall coherence of the generated summaries. The inclusion of the TG-GNN and the MMR strategy plays a significant role in achieving these results, as demonstrated by the lower scores of the ablation variants (KETGS-NoGNN and KETGS-NoMMR).
Table 3 presents the BERTScore and MoverScore results, further confirming the semantic richness of summaries generated by KETGS. The high BERTScore indicates that the summaries generated by KETGS are semantically closer to the reference summaries than those generated by other models. Similarly, MoverScore, which evaluates the semantic alignment and word order preservation, shows that KETGS maintains both the meaning and the structural integrity of the source documents better than the baselines.
In terms of METEOR and BLEU (
Table 4), KETGS once again demonstrates its superiority. The METEOR score, which accounts for synonymy and paraphrase matching, highlights KETGS’s ability to generate summaries that are not only accurate but also linguistically varied. The BLEU scores, while slightly lower in absolute terms compared to METEOR, still indicate strong n-gram precision, validating the model’s ability to reproduce key phrases from the original text.
The ablation studies reveal the critical importance of the GNN and MMR components within KETGS. The removal of the GNN (KETGS-NoGNN) results in a notable drop in performance across all metrics and datasets, underscoring the role of graph-based context integration. Similarly, the absence of the MMR strategy (KETGS-NoMMR) leads to less diverse and relevant summaries, particularly in complex datasets like PubMed, where capturing a wide range of content is essential.
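To make the role of this component explicit, the following is a minimal sketch of greedy Maximal Marginal Relevance selection over sentence embeddings, the general technique the MMR strategy builds on; the trade-off weight lam and the unit-normalised embedding inputs are illustrative assumptions rather than the exact KETGS configuration.

```python
import numpy as np

def mmr_select(sent_vecs, doc_vec, k, lam=0.7):
    """Greedy Maximal Marginal Relevance over unit-normalised embeddings.

    sent_vecs: (n, d) sentence embeddings; doc_vec: (d,) document embedding.
    lam balances relevance to the document against redundancy with already
    selected sentences (0.7 is an illustrative value, not the paper's setting).
    """
    relevance = sent_vecs @ doc_vec              # cosine similarity to the document
    selected, candidates = [], list(range(len(sent_vecs)))
    while candidates and len(selected) < k:
        if selected:
            # Highest similarity to any already-selected sentence
            redundancy = (sent_vecs[candidates] @ sent_vecs[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected  # indices of extracted sentences, in selection order
```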
The performance of the GPT-4 and MCHES models provides valuable insights when compared to the proposed Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework. While both GPT-4 and MCHES demonstrate strong results across the ROUGE, BERTScore, MoverScore, METEOR, and BLEU metrics, KETGS outperforms them in key areas, particularly ROUGE and BERTScore. GPT-4 performs competitively, indicating strong language modeling capabilities and contextual understanding in summarization tasks. However, the graph-enhanced approach of KETGS enables it to capture discourse and sentence-level relationships more effectively, giving it an edge, especially on complex datasets like PubMed. MCHES also performs well, leveraging multi-dimensional embeddings and fine-tuned model adjustments, though it remains slightly behind KETGS on METEOR and BLEU, reflecting the effectiveness of KETGS’s graph-based integration and Maximal Marginal Relevance (MMR) strategy. KETGS’s architecture, tailored for extractive summarization with a focus on sentence selection and semantic relations, allows it to surpass these general-purpose models where discourse coherence and relevance balancing are critical. This highlights KETGS’s potential as an adaptable summarization framework, particularly for specialized domains requiring high fidelity to the original content structure and nuanced sentence relations.
Across all datasets and metrics, KETGS not only outperforms traditional and state-of-the-art models but also shows robustness in handling various types of documents, from news articles to biomedical abstracts. Its ability to integrate various linguistic and semantic relationships into a coherent graph-based structure enables it to produce summaries that are contextually rich, diverse, and aligned with the original document’s content. This makes KETGS a highly effective framework for extractive text summarization, capable of addressing the nuances and challenges posed by different types of textual data.
The integration of Transformer layers in the KETGS framework introduces both computational complexity and memory overhead due to multi-head self-attention and layer-wise processing of embeddings. This section examines the trade-offs between performance improvements and computational demands.
The self-attention mechanism in Transformers has a time complexity of $O(n^2 \cdot d)$, where n represents the sequence length and d the embedding dimension. This quadratic dependence on n leads to higher computation time for longer documents, particularly in extensive datasets like PubMed. Although self-attention improves the model’s capacity to capture long-range dependencies, this increase in computation time poses a challenge compared to simpler architectures.
The memory usage in Transformer models scales with input sequence length, as it requires $O(n^2)$ space to store the attention matrices. In KETGS, this can be a constraint when processing large documents or batches. To mitigate this, KETGS leverages pre-trained BERT embeddings and a graph structure that reduces the sequence length by focusing on significant nodes, thereby decreasing memory demands without sacrificing essential information.
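As a rough illustration of this quadratic growth, the snippet below estimates the storage needed just for the attention matrices, assuming a BERT-base-like configuration (12 layers, 12 heads, 32-bit floats); these counts are assumptions for illustration, not measurements of KETGS.

```python
def attention_matrix_bytes(n_tokens, n_layers=12, n_heads=12, bytes_per_float=4):
    # One (n_tokens x n_tokens) attention matrix per head per layer.
    return n_layers * n_heads * n_tokens * n_tokens * bytes_per_float

for n in (512, 2048, 8192):
    # Quadrupling the sequence length multiplies attention storage by 16.
    print(f"{n:5d} tokens -> {attention_matrix_bytes(n) / 1e9:6.2f} GB")
```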
While the Transformer enhances KETGS’s ability to capture complex dependencies and semantic relations, it affects scalability, especially for real-time applications. Optimization techniques such as gradient checkpointing and reducing the number of attention heads can improve memory efficiency, albeit with slight reductions in accuracy. In KETGS, multi-head attention is selectively applied only to nodes with high relational importance, reducing computation by focusing on critical components.
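A minimal sketch of one such optimization, gradient checkpointing applied to a Transformer encoder layer in PyTorch, is shown below; the layer dimensions and batch shape are illustrative placeholders and do not reflect the exact KETGS training setup.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Illustrative encoder layer; dimensions are placeholders.
layer = torch.nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
x = torch.randn(4, 512, 768, requires_grad=True)   # (batch, tokens, dim)

# Activations inside the layer are recomputed during the backward pass
# instead of being stored, trading extra compute for lower memory use.
out = checkpoint(layer, x, use_reentrant=False)
out.sum().backward()
```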
The addition of Transformer layers improves model performance in semantic understanding, as demonstrated by higher scores in ROUGE, BERTScore, and METEOR. However, the enhanced processing demands introduce a trade-off in scalability. Experiments indicate that the improvement in summary coherence and relevance justifies this overhead, especially for domain-specific tasks where accurate entity and discourse relation mapping is crucial.
Future work may explore optimization methods such as sparse attention or low-rank approximations to maintain Transformer benefits while reducing complexity. These approaches can enhance KETGS applicability in high-demand scenarios by balancing efficiency and performance.
While the primary focus of this paper is on extractive summarization, the Knowledge-Enhanced Transformer Graph Summarization (KETGS) framework has the potential to be applied in various other NLP tasks due to its flexible graph-based structure and deep semantic understanding capabilities.
1. Document Classification: In document classification, the graph representation used by KETGS can help capture the internal structure of documents, such as semantic relationships and discourse patterns, which are often important in distinguishing between different document types. For example, in legal or financial domains, where the organization and interrelated sections of a document contribute to its classification, KETGS can be adapted to enhance classification accuracy by learning patterns specific to each category.
2. Question Answering (QA): KETGS can be extended for question answering tasks, especially in scenarios where answers are derived from large, complex documents. By leveraging the graph structure, which captures discourse relations and entity connections, KETGS can be modified to retrieve specific, contextually relevant sentences that provide direct answers to questions. This approach would be valuable in customer support systems, where the model can quickly identify relevant information in knowledge bases or product documentation.
3. Knowledge Extraction and Retrieval: The combination of graph-based structures and Transformer models enables KETGS to excel in tasks that require extracting and organizing information from unstructured data. In industry, KETGS could be utilized for building knowledge graphs or improving information retrieval systems, where understanding entity relationships and discourse flow is crucial. Applications in healthcare, for instance, could benefit from this approach in organizing patient records or research papers.
These applications demonstrate the versatility of KETGS and its potential value in various industrial and real-world contexts beyond summarization. Future work could explore these extensions to broaden the framework’s usability across a wide range of NLP tasks.