Introduction

Graphs, as non-Euclidean data structures, are widely used in fields such as social networks 1,2,3,4, chemistry 5, and transportation planning 6,7. However, traditional deep learning techniques face challenges when processing graph data due to its distinctive properties, including nonlinearity, high heterogeneity, irregularity, and dynamic fluctuations 8. These characteristics diverge from the assumptions underlying traditional deep learning models, which are typically designed for Euclidean data. To address these challenges, researchers have proposed a variety of graph neural networks (GNNs) 9,10,11,12 for diverse machine learning applications, including community mining 13,14 and graph embedding 15,16. In contrast to traditional deep learning techniques, GNNs are adept at representing complex structures and relationships in graph data while effectively incorporating node attributes into an end-to-end learning framework.

Most GNNs update node information by considering only first-order neighbor details. For example, various Graph Convolutional Networks (GCNs) developed in recent years use message passing algorithms within convolutional layers to aggregate attribute information from nearby nodes 17,18,19,20. To capture higher-order node information, GCNs typically increase the depth of the model. However, deeper models often suffer from over-smoothing, where node representations become indistinguishable, making it difficult to preserve the unique features of each node 21,22,23,24. GraphSAGE addresses this by sampling higher-order neighbors for each node in each layer and aggregating their information to produce new representations 25. However, this approach requires manual selection of sampling parameters, and both the number of samples and the neighbor depth can significantly affect model performance. Existing studies have shown that higher-order structures contain vital information in graph data, so it is crucial to capture higher-order topological information effectively for graph learning tasks.

Complex networks often employ higher-order topological structures to characterize intricate relationships within the network 26,27,28,29. By analyzing the local connections of each node, these higher-order topological structures reveal hidden organizational patterns within the network, thereby exposing more intricate higher-order functional regions in networked data. Motifs and graphlets efficiently capture complex structural information from graphs, representing complex relationships in the network 30,31. In 2019, John et al. introduced a novel methodology that integrates the analysis of motifs within graph convolutional neural networks 32. This integration significantly improves the model’s ability to capture complex higher-order topological structural information. Although the method can handle directed graphs with self-loops and accurately capture higher-order structural information, it comes with higher computational complexity. Moreover, identifying essential motifs and developing autonomous methods for learning these motifs from the available data can be a complex problem. Defining and using higher-order topological structures to extract complex information from networks therefore remains a significant scientific challenge.

In this article, we design and construct an innovative graph neural network model, named TWC-GNN, with the primary objective of capturing complex relationships and hidden information in the network to achieve more accurate node classification. Compared to related models, TWC-GNN explores higher-order information throughout the network using a self-attention mechanism and a higher-order topological structure. Since first-order neighbors are significant in a graph, we employ the attention mechanism a second time to update node feature information using the adjacency matrix. In summary, the main contributions of this paper are as follows:

  • We propose a higher-order topological structure to capture the structural characteristics of nodes in a directed graph. We use the concept of node centrality to measure the importance of nodes in the network. By considering three fundamental relationships among triples of nodes, this structure assesses the mutual influence between a central node and its neighboring nodes, revealing intricate relationships within the data.

  • We encode the structural information of directed graphs using Laplacian eigenvectors, which enables the application of the self-attention mechanism from the Transformer model to sparsely structured graph data. This extension of the attention mechanism includes all nodes, even those at greater distances, allowing for the acquisition of comprehensive global information about the graph.

  • We employ the Graph Attention Network (GAT) approach, which integrates the adjacency matrix with the self-attention mechanism to emphasize important first-order neighboring node information. This method enables a thorough exploration of the node features, leading to an effective fusion of global and local interactions within the network.

We implement the proposed TWC-GNN method for node classification tasks in directed networks, using the Cora, CiteSeer, Pubmed, and Reddit datasets. To evaluate its performance, we compare it against a range of baseline methods, including DeepWalk, MLP, GCN, FAGCN, FedGCN, S\(\hat{3}\)GC, GCKSVM, and GAT. The results demonstrate that our algorithm achieves relatively outstanding performance across all datasets.

Fig. 1

The architecture of the proposed TWC-GNN model. There are three key components: the Centrality Information Module, the Transformer Attention Mechanism, and the GAT Attention Mechanism. The Transformer is utilized to capture hidden information from distant nodes, while the Centrality Information Module extracts relevant data about the local structure of the graph. This allows for a deeper understanding of the core nodes and their roles within the network, enhancing the interpretation of complex relationships. Additionally, the GAT mechanism prioritizes the importance of neighboring node information, further refining the model’s ability to capture both local and global interactions.

Methods

Node classification aims to predict the labels of unlabeled nodes in a graph \(G=\{V, E\}\), where some vertices are already labeled and others are not. \(V_L \subseteq V\) represents the set of nodes with known labels, denoted \(Y_L\). \(V_U=V \backslash V_L\) represents the set of nodes with unknown labels, denoted \(Y_U\). The objective of supervised learning is to train a model f such that it can accurately predict the label of an unlabeled node \(v \in V_U\). The node features are represented by a matrix \(X \in R^{n \times d}\), where \(n=|V|\) is the number of nodes in G, and d is the dimension of the feature vectors.

In this section, we present a detailed overview of the architecture and key components of the proposed TWC-GNN. As illustrated in Figure 1, TWC-GNN integrates higher-order topological structures (Centrality Information) with attention mechanism modules (the Transformer and GAT methods). To capture the mutual influence between central and neighboring nodes, we construct a higher-order topological structure using Centrality Information, derived from node degrees, to represent the complex interactions within the network. The Transformer method, employing the self-attention mechanism, is used to extract implicit information from higher-order nodes. Finally, the GAT method, also based on self-attention, updates node attributes by focusing specifically on first-order neighboring nodes.

Centrality information

The inherent structure of graph data offers a wealth of valuable information. If a node exhibits a high degree, it can be inferred that the node possesses a significant level of influence. Quantifying node importance is crucial to understanding graph structures, yet it remains insufficiently explored in the existing literature 33,34,35,36. In this analysis, we examine node centrality by focusing on three fundamental types of higher-order relationships. To assess the impact of a central node on its neighboring nodes, we transform node information into edge information using Eq. 1 and then explore the correlation between pairs of edges.

$$\begin{aligned} Z=M^{T} X W_{a} \end{aligned}$$
(1)

where \(M \in R^{|V| \times |E|}\) is the incidence matrix that encodes the connections between nodes and edges, defined as \(M_{i,{i\rightarrow j}}=M_{j,{i\rightarrow j}}=1\) and 0 otherwise. Moreover, \(W_{a}\) is a learnable projection matrix that transforms the node feature matrix X into the edge feature matrix Z.
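
As an illustration, the node-to-edge transformation in Eq. 1 can be sketched in PyTorch as follows; the function and variable names (node_to_edge_features, edge_index) are hypothetical, and the dense incidence matrix is used purely for clarity rather than reflecting the authors’ implementation.

```python
import torch

def node_to_edge_features(edge_index, X, W_a):
    """Sketch of Eq. (1): Z = M^T X W_a. edge_index is a (2, |E|) tensor of
    directed edges (i -> j); X is the (|V|, d) node feature matrix."""
    num_nodes, num_edges = X.size(0), edge_index.size(1)
    # Incidence matrix M: M[i, e] = M[j, e] = 1 for edge e = (i -> j), 0 otherwise.
    M = torch.zeros(num_nodes, num_edges)
    M[edge_index[0], torch.arange(num_edges)] = 1.0
    M[edge_index[1], torch.arange(num_edges)] = 1.0
    # Each row of Z aggregates the projected features of an edge's two endpoints.
    return M.t() @ X @ W_a                     # shape: (|E|, d_out)

# Hypothetical usage on a toy graph with three nodes and three directed edges.
edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]])
X = torch.randn(3, 8)
W_a = torch.nn.Parameter(torch.randn(8, 16))
Z = node_to_edge_features(edge_index, X, W_a)  # (3, 16)
```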

The transformation from node features to edge features is critical in tasks such as node classification because it allows the model to capture both direct and indirect interactions between nodes 37. By converting node information into edge representations, the model captures not only the immediate relationships between nodes but also the flow of information across the graph, so that higher-order dependencies beyond first-order neighbor interactions are preserved. This is particularly important for directed graphs, where the flow and influence of information are directional, and where such indirect interactions often carry important latent information about the network’s structure.

Node liquidity

In a citation network, if a paper j on genetic algorithms is cited by only one paper i and cites only one paper k, then the probability that the three papers belong to the same category is high. If paper j is cited by or cites a greater number of other papers, the likelihood that papers i and k are related to genetic algorithms decreases. As shown in Figure 2(a), when the central node j has few neighbors, the correlation between the citations \((i \rightarrow j)\) and \((j \rightarrow k)\) increases. As shown in Figure 2(b), if the central node j has many neighbors (i.e., the degree of node j is large), then the correlation between \((i \rightarrow j)\) and \((j \rightarrow k)\) weakens due to the influence of other neighboring nodes. Likewise, in social networks, if the central node, user j, is connected to both user i and user k through friendship relations, and both user i and user k are engaged in fraudulent activities, then it is highly probable that user j is also involved. Therefore, when we discover that user j has close social relationships with known fraudsters, we need to be more vigilant about their behavior to prevent potential fraudulent activities. In Eq. 2, the correlation coefficient between \((i \rightarrow j)\) and \((j \rightarrow k)\) is calculated as follows:

$$\begin{aligned} A_{e,(i \rightarrow j),(j \rightarrow k)}=\exp \left( -\frac{\left( \operatorname {deg}^{-}(j)+\operatorname {deg}^{+}(j)-2\right) ^{2}}{\sigma ^{2}}\right) \end{aligned}$$
(2)

where \(\sigma\) is the standard deviation of node degrees and \(\operatorname {deg}^{-}(j)\) and \(\operatorname {deg}^{+}(j)\) indicate the in-degree and out-degree of node j in the graph. \(A_e \in R^{|E| \times |E|}\) denotes the correlation coefficient between edges. When a central node j has only one citing node i and one cited node k, the correlation between these two edges is at its maximum, with a value of 1.
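
For concreteness, Eq. 2 can be evaluated with a few lines of Python; the function name liquidity_correlation and the example degree values are hypothetical.

```python
import math

def liquidity_correlation(in_deg_j, out_deg_j, sigma):
    """Sketch of Eq. (2): correlation between edges (i -> j) and (j -> k), which
    depends only on the in- and out-degree of the shared central node j.
    sigma is the standard deviation of node degrees over the whole graph."""
    return math.exp(-((in_deg_j + out_deg_j - 2) ** 2) / sigma ** 2)

# When j has exactly one citing node and one cited node, the correlation is exp(0) = 1.
print(liquidity_correlation(in_deg_j=1, out_deg_j=1, sigma=2.0))  # 1.0
```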

Fig. 2

Node liquidity structure diagram.

Multi-reference ability

As shown in Figure 3(a), a paper typically cites relevant works within the same research field. A higher out-degree means a paper cites more works and is therefore more likely to reference literature from different categories. As shown in Figure 3(b), a genetic algorithms paper j is cited by a genetic algorithms paper i and a probabilistic methods paper k; however, paper k may also cite literature within the field of neural networks. The correlation between \((i \rightarrow j)\) and \((k \rightarrow j)\) is therefore influenced by the out-degree of the citing nodes i and k.

$$\begin{aligned} A_{e,(i \rightarrow j),(k \rightarrow j)}=\exp \left( -\frac{\left( \operatorname {deg}^{+}(i)+\operatorname {deg}^{+}(k)-2\right) ^{2}}{\sigma ^{2}}\right) \end{aligned}$$
(3)

where \(\operatorname {deg}^{+}(i)\) and \(\operatorname {deg}^{+}(k)\) denote the out-degrees of node i and node k. Papers with smaller out-degrees are more likely to cite works from the same category. In Eq. 3, the correlation between edge \((i \rightarrow j)\) and edge \((k \rightarrow j)\) is maximized when the out-degrees of both node i and node k are 1.

Fig. 3

Node multi-reference ability structure diagram.

Multi-propagation property

The more frequently a paper is cited, the stronger its influence and citation impact: it will be cited not only by literature within the same category but also by works from other domains. For instance, a probabilistic methods paper may be cited by literature on both genetic algorithms and neural networks. As shown in Figure 4 and Eq. 4, the correlation between \((j \rightarrow i)\) and \((j \rightarrow k)\) is not solely determined by the degree of the central node j, but rather by the in-degrees of nodes i and k.

$$\begin{aligned} A_{e,(j \rightarrow i),(j \rightarrow k)}=\exp \left( -\frac{\left( \operatorname {deg}^{-}(i)+\operatorname {deg}^{-}(k)-2\right) ^{2}}{\sigma ^{2}}\right) \end{aligned}$$
(4)

where the in-degrees of nodes i and k are denoted by \(\operatorname {deg}^{-}(i)\) and \(\operatorname {deg}^{-}(k)\), respectively. The higher the importance of a node, the lower its similarity to its neighboring nodes. The correlation between edge \((j \rightarrow i)\) and edge \((j \rightarrow k)\) is determined by nodes i and k, and is maximized when both node i and node k have an in-degree of 1.

Fig. 4

Node multi-propagation structure diagram.

This defines a higher-order topology involving three nodes and two edges, based on the structural characteristics of the data. Its primary objectives are to identify the central node of the graph and to evaluate the relationship between it and its neighboring nodes. This higher-order topology is utilized to extract complex relationships within the graph data, as further elaborated in the comprehensive model introduction below.
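
To make the construction concrete, the following sketch assembles an edge correlation matrix \(A_e\) covering all three relations (Eqs. 2–4) for pairs of edges that share a node; the dense double loop and the function name edge_correlation_matrix are illustrative assumptions rather than the authors’ implementation.

```python
import torch

def edge_correlation_matrix(edge_index, in_deg, out_deg, sigma):
    """Illustrative sketch of the higher-order topology (Eqs. 2-4): fill the
    (|E|, |E|) edge correlation matrix A_e for ordered pairs of distinct edges
    that share a node. edge_index is (2, |E|) with rows (source, target);
    in_deg / out_deg are per-node degree tensors."""
    E = edge_index.size(1)
    A_e = torch.zeros(E, E)
    for a in range(E):
        i, j = edge_index[0, a].item(), edge_index[1, a].item()      # edge a: i -> j
        for b in range(E):
            if a == b:
                continue
            p, q = edge_index[0, b].item(), edge_index[1, b].item()  # edge b: p -> q
            if j == p:                     # node liquidity: (i -> j), (j -> k)
                s = in_deg[j] + out_deg[j]
            elif i == q:                   # node liquidity with the roles of a and b swapped
                s = in_deg[i] + out_deg[i]
            elif j == q:                   # multi-reference: (i -> j), (k -> j)
                s = out_deg[i] + out_deg[p]
            elif i == p:                   # multi-propagation: (j -> i), (j -> k)
                s = in_deg[j] + in_deg[q]
            else:
                continue                   # edges share no node: correlation stays 0
            A_e[a, b] = torch.exp(-((s - 2.0) ** 2) / sigma ** 2)
    return A_e
```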

Self-attention based mechanisms

The self-attention mechanism enables the thorough extraction of node feature information. In this section, we examine in detail the integration of self-attention mechanisms, including the Transformer and Graph Attention Network (GAT). We provide a comprehensive explanation of how these methods work together to capture higher-order node information, ultimately leading to an enriched representation of node features.

Transformer for higher-order node information

The Transformer integrates the attention mechanism with a positional encoding matrix 38. This approach considers all nodes within the network, capturing the interdependence of nodes, even those far apart, instead of focusing solely on neighboring nodes 39,40,41. Networks are generally sparse, meaning that not all nodes are connected. To obtain comprehensive global information while preserving the sparsity of the graph, encoding the graph’s structural information is crucial. Commonly used techniques for encoding graph positions include Node Index Positional Encoding, Laplacian Eigenvector Encoding 42,43, and Learnable Positional Encoding 44. We utilize Laplacian Eigenvector Encoding as follows:

$$\begin{aligned} A_{\text{ lp }}=\textrm{I}-D^{-1 / 2} A D^{-1 / 2}=U^{T} \Lambda U, \end{aligned}$$
(5)

where A is the directed adjacency matrix of G, with \(a_{ij}=1\) if there is a directed edge between nodes i and j, and \(a_{ij} = 0\) otherwise, D is the diagonal degree matrix, and \(\Lambda\) and U contain the eigenvalues and eigenvectors of the normalized Laplacian \(A_{\text{lp}}\).
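
A minimal sketch of this positional encoding is shown below; symmetrizing the directed adjacency matrix before forming the Laplacian, the truncation to the first k non-trivial eigenvectors, and the function name laplacian_positional_encoding are assumptions of this sketch.

```python
import torch

def laplacian_positional_encoding(A, k=8):
    """Sketch of Eq. (5): symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}
    and its eigenvectors, used as a k-dimensional positional encoding.
    A is a dense (n, n) adjacency matrix; it is symmetrized here so the
    eigendecomposition is real-valued (an assumption for directed graphs)."""
    A_sym = ((A + A.t()) > 0).float()
    deg = A_sym.sum(dim=1).clamp(min=1.0)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    L = torch.eye(A.size(0)) - D_inv_sqrt @ A_sym @ D_inv_sqrt
    eigvals, eigvecs = torch.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]                # drop the trivial constant eigenvector
```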

The Transformer’s attention module makes use of the self-attention mechanism, which has the following definition:

$$\begin{aligned} Q=X W_{Q}, \quad K=X W_{K}, \quad V=X W_{V} \end{aligned}$$
(6)
$$\begin{aligned} A_{\text{ soft } }=\operatorname {softmax}\left( \frac{Q K^{\top }}{\sqrt{d_{K}}} \cdot A_{\text{ lp }}\right) \end{aligned}$$
(7)
$$\begin{aligned} \operatorname {Attn}(H)=A_{\text{ soft } } V \end{aligned}$$
(8)

The self-attention mechanism uses new vector representations, denoted Q, K, and V, to represent the Query, Key, and Value of the nodes, respectively. The weight matrices \(W_{Q} \in \mathbb {R}^{d \times d_{K}}\), \(W_{K} \in \mathbb {R}^{d \times d_{K}}\) and \(W_{V} \in \mathbb {R}^{d \times d_{V}}\) transform the input into the query, key, and value representations. \(\operatorname {Attn}(H)\) denotes the output of the single-headed self-attention module, computed from the similarity between the query and the key.
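
A single-head sketch of Eqs. 6–8 follows; interpreting the product with \(A_{\text{lp}}\) inside the softmax as an element-wise modulation of the score matrix is an assumption of this sketch, as is the class name StructureAwareSelfAttention.

```python
import math
import torch
import torch.nn as nn

class StructureAwareSelfAttention(nn.Module):
    """Single-head sketch of Eqs. (6)-(8): Q/K/V attention whose scores are
    modulated by the structural encoding A_lp before the softmax."""
    def __init__(self, d, d_k, d_v):
        super().__init__()
        self.W_Q = nn.Linear(d, d_k, bias=False)
        self.W_K = nn.Linear(d, d_k, bias=False)
        self.W_V = nn.Linear(d, d_v, bias=False)
        self.d_k = d_k

    def forward(self, X, A_lp):
        Q, K, V = self.W_Q(X), self.W_K(X), self.W_V(X)        # Eq. (6)
        scores = (Q @ K.t()) / math.sqrt(self.d_k) * A_lp      # Eq. (7), element-wise modulation
        A_soft = torch.softmax(scores, dim=-1)
        return A_soft @ V                                       # Eq. (8)
```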

The fundamental components of the Transformer architecture are multi-head self-attention (MHA), which enables the model to attend to several segments of the input concurrently; layer normalization (LN), which is applied before each sub-block; and the feed-forward network (FFN), composed of two linear layers followed by a GELU (Gaussian Error Linear Unit) nonlinearity. This configuration allows the model to capture intricate patterns in the data effectively. The equations governing these operations are shown below:

$$\begin{aligned} h^{\prime (l)}=\operatorname {MHA}\left( \operatorname {LN}\left( h^{(l-1)}\right) \right) +h^{(l-1)} \end{aligned}$$
(9)
$$\begin{aligned} h^{(l)}=\textrm{FFN}\left( \operatorname {LN}\left( h^{\prime (l)}\right) \right) +h^{\prime (l)} \end{aligned}$$
(10)

where \(h^{(l-1)}\) denotes the input of layer l, and \(h^{\prime (l)}\) represents the output of the multi-head attention sub-block of layer l, which also serves as the input to the feed-forward sub-block of that layer.
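
The pre-layer-norm block of Eqs. 9–10 can be sketched as follows; this sketch uses PyTorch’s built-in nn.MultiheadAttention rather than the structure-aware attention of Eq. 7, and the hidden width of the feed-forward network is an assumption.

```python
import torch
import torch.nn as nn

class PreLNTransformerBlock(nn.Module):
    """Sketch of Eqs. (9)-(10): pre-layer-norm block with multi-head attention,
    a GELU feed-forward network, and residual connections."""
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, h):
        x = self.ln1(h)
        h = h + self.mha(x, x, x, need_weights=False)[0]   # Eq. (9)
        h = h + self.ffn(self.ln2(h))                      # Eq. (10)
        return h
```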

Distances between nodes in complex networks often vary. More distant nodes frequently lie at the periphery of the network or on connections between different components, and they can provide crucial information about the network as a whole.

GAT for feature update among first-order neighbors

Both GAT and the Transformer incorporate self-attention mechanisms. One distinction lies in the Transformer’s ability to acquire global node information, whereas the GAT method concentrates solely on the first-order neighbors of the given nodes. In a network, however, first-order neighbors carry particularly important information. Therefore, after extracting the latent information from the network, directing attention to the information from first-order neighboring nodes can effectively consolidate the information within the network.

Using the attention mechanism, the GAT module first computes the attention coefficient \(e_{i j}\) between nodes i and j. The LeakyReLU activation and softmax function are then applied to normalize the inter-node attention coefficients and obtain the attention weight \(\alpha _{i j}\) between connected nodes. Finally, node features are updated over the first-order neighborhood N(i) using a learnable weight matrix W, ensuring a refined representation of the graph’s structural information:

$$\begin{aligned} h_i=\sigma \left( \sum _{j \in N(i)} \alpha _{i j} W h_j\right) \end{aligned}$$
(11)
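
A minimal single-head sketch of Eq. 11 is given below; the dense adjacency mask, the added self-loops, and the choice of ELU for \(\sigma\) are assumptions made for brevity, and the class name SimpleGATLayer is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGATLayer(nn.Module):
    """Minimal single-head sketch of Eq. (11): attention-weighted aggregation over
    first-order neighbours, using a dense adjacency matrix for brevity."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)
        self.a = nn.Linear(2 * d_out, 1, bias=False)

    def forward(self, X, adj):
        n = X.size(0)
        H = self.W(X)                                          # (n, d_out)
        # Attention logits e_ij for every ordered node pair (i, j).
        e = self.a(torch.cat([H.unsqueeze(1).expand(n, n, -1),
                              H.unsqueeze(0).expand(n, n, -1)], dim=-1)).squeeze(-1)
        e = F.leaky_relu(e)
        # Keep only first-order neighbours; self-loops avoid empty neighbourhoods.
        mask = adj + torch.eye(n, device=adj.device)
        e = e.masked_fill(mask == 0, float('-inf'))
        alpha = torch.softmax(e, dim=-1)                       # attention weights alpha_ij
        return F.elu(alpha @ H)                                # Eq. (11) with sigma = ELU
```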

TWC-GNN: an implementation of the method

In this section, we present TWC-GNN, a novel model that integrates higher-order topology and attention mechanisms for a thorough exploration of complex relationships and latent information within the network. TWC-GNN consists of higher-order topological structures (Centrality Information) and attention mechanism modules (the Transformer method and the GAT method). The Centrality Information module represents a higher-order topology defined based on the intrinsic features of the data and is used to reveal intricate relationships within the network. The Transformer method leverages the self-attention mechanism and positional-encoding information to discover concealed information in higher-order nodes. The GAT technique focuses on extracting more significant information from first-order neighbors.
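
As a rough illustration of how these modules fit together, the sketch below reuses the StructureAwareSelfAttention and SimpleGATLayer classes defined above; the layer sizes, the residual fusion, and the classifier head are assumptions of this sketch rather than the authors’ exact implementation, which is given in Algorithm 1 and the released code.

```python
import torch.nn as nn

class TWCGNNSketch(nn.Module):
    """Rough composition sketch of the TWC-GNN pipeline described above."""
    def __init__(self, d_in, d_hidden, n_classes):
        super().__init__()
        self.attn = StructureAwareSelfAttention(d_in, d_in, d_in)  # global, higher-order information
        self.gat = SimpleGATLayer(d_in, d_hidden)                  # first-order refinement
        self.classifier = nn.Linear(d_hidden, n_classes)

    def forward(self, X, adj, A_lp):
        # Global higher-order node information; the edge correlation matrix A_e
        # from the Centrality Information module would also be injected here.
        h = X + self.attn(X, A_lp)
        # Local refinement over first-order neighbours via attention (Eq. 11).
        h = self.gat(h, adj)
        return self.classifier(h)   # per-node class logits
```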

Code Availability: The step-by-step procedure is illustrated in Algorithm 1. The implementation of the TWC-GNN model is publicly available on GitHub: https://github.com/xiajiwen/TWC-GNN and https://doi.org/10.5281/zenodo.14264082.

Algorithm 1

The TWC-GNN Algorithm.

Results and discussion

In this section, we conduct experiments on multiple benchmark networks to assess the effectiveness of the proposed solution. We employ visualization and analysis techniques to showcase the efficacy of our suggested framework based on experimental findings. In addition, we perform ablation experiments to emphasize the necessity of collecting complex interactions and hidden information within the network.

Table 1 Statistical characteristics of all the datasets used in our experiments.

Baselines and datasets

In order to assess the significance of higher-order information and complex interactions, we performed a series of experiments on widely used benchmark datasets drawn from two common types of networks in graph representation learning: academic paper citation networks and social networks. To handle the extensive scale of the social network dataset, this study uses a subset constituting 3% of the data for experimental purposes. The statistical characteristics of all datasets are summarized in Table 1.

  • Cora: This dataset is a widely used academic citation network and a common benchmark for graph neural networks. Its 2,708 computer science research articles are classified into seven classes, including machine learning, databases, artificial intelligence, and others. The word frequencies of each paper are represented by a 1433-dimensional vector created using a bag-of-words model.

  • CiteSeer: This dataset comprises over 6000 computer science papers from 1998 to 2002 together with the connections between them. Publications are represented as nodes and citation links between publications as edges. In addition, each node is assigned a label from one of six thematic categories: artificial intelligence, distributed systems, databases, machine learning, parallel computing, and information retrieval.

  • Pubmed: This dataset is a widely used citation network in academic research. It is provided by the United States National Library of Medicine and contains approximately 2.4 million scientific papers on medicine and the biological sciences. Each article is represented as a node in the network, with metadata such as title, abstract, and keywords, and edges are formed by the citation links between articles.

  • Reddit: The dataset is widely used and originates from the social news platform Reddit, containing posts and comments posted by Reddit users. The dataset comprises posts from diverse categories, including politics, sports, and entertainment, and spans a broad spectrum of social interactions such as upvoting, commenting, and sharing.

The fundamental methods for comparison in our experiments are thoroughly explained as follows.

  • DeepWalk 15: DeepWalk is a method that utilizes random walks and neural networks for unsupervised graph embedding, enabling efficient learning of node representations on large-scale graphs.

  • MLP 45: MLP is a feedforward neural network based on multiple fully connected layers, capable of learning non-linear features. It has been widely applied in classification, regression, and other machine learning tasks.

  • GCN 18: GCN is a neural network model designed to handle graph data. It is characterized by its ability to learn feature representations of nodes while considering their interactions within the graph structure.

  • FAGCN 46: This model is designed to process graph data by adaptively integrating both low-frequency and high-frequency signals from node features and their interactions within the graph structure. It allows for more flexible and efficient learning of node representations across a variety of network scenarios.

  • FedGCN 47: This neural network model is designed to handle graph data distributed across multiple clients. It learns node feature representations while accounting for their interactions within the graph structure, achieving fast convergence with minimal communication and enhanced privacy through federated learning.

  • S\(\hat{3}\)GC 48: This neural network model is designed for processing graph data, with a key feature being its ability to learn node feature representations while incorporating their interactions within the graph structure. It accomplishes this in a computationally efficient manner, reducing complexity compared to traditional GCNs.

  • GCKSVM 49: GCKSVM integrates graph convolution with kernel learning to efficiently process graph data. It stands out for its ability to generate expressive node features by capturing both inherent attributes and the relational context of the graph structure.

  • GAT 19: This method incorporates the self-attention mechanism into graph convolutional networks (GCN), establishing dynamic and weighted connections among nodes. This enhancement aims to better capture the relationships between nodes and their importance.

Node classification and visualization

In this work, we applied L2 regularization to prevent overfitting during training and used the Adam optimizer to minimize the cross-entropy loss 50. Furthermore, we implemented early stopping when the negative log-likelihood (NLL) loss did not decrease for 20 consecutive iterations 51, and we set the maximum number of training epochs to 2000. The learning rates were set to 0.03, 0.005, 0.05, and 0.005 for the Cora, Citeseer, Pubmed, and Reddit datasets, respectively. For the respective datasets, the number of attention heads was set to 8, 4, 1, and 4.
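
A training-loop sketch matching this setup follows, assuming a model with the interface of the composition sketch above; the weight-decay value and the use of the validation loss for early stopping are assumptions, since the paper does not state them explicitly.

```python
import torch
import torch.nn.functional as F

def train(model, X, adj, A_lp, y, train_mask, val_mask,
          lr=0.005, weight_decay=5e-4, max_epochs=2000, patience=20):
    """Sketch of the training setup described above: Adam with L2 regularization
    (weight decay), cross-entropy/NLL loss, and early stopping when the
    validation loss has not improved for `patience` consecutive checks."""
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    best_val, wait = float('inf'), 0
    for epoch in range(max_epochs):
        model.train()
        opt.zero_grad()
        logits = model(X, adj, A_lp)
        loss = F.cross_entropy(logits[train_mask], y[train_mask])
        loss.backward()
        opt.step()

        model.eval()
        with torch.no_grad():
            val_loss = F.cross_entropy(model(X, adj, A_lp)[val_mask], y[val_mask]).item()
        if val_loss < best_val:
            best_val, wait = val_loss, 0
        else:
            wait += 1
            if wait >= patience:   # early stopping
                break
    return model
```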

Fig. 5

The loss comparison of node classification under different embedding methods in three real citation networks: (a) Cora citation network; (b) Citeseer citation network; (c) Pubmed citation network. The x axis represents training epochs, and the y axis represents the loss value.

Table 2 An analysis of the classification outcomes for the Citation Networks and Social Network datasets.

Table 2 summarizes the classification accuracy and standard deviation of each approach used in our experiments. Clearly, our TWC-GNN model exhibits the best performance on the Cora network, with a 4.7% improvement over the GCN technique. Similarly, the TWC-GNN approach achieves the best performance on the social network. Figure 5 illustrates the variations in the loss curves of the GCN, GAT, and TWC-GNN techniques on the citation networks. Compared to the other two models, our proposed TWC-GNN technique achieves a lower final loss value. The substantially higher loss value of the GCN model can be attributed to its less effective use of higher-order node information and its simpler handling of local neighborhood information. The left part of Figure 6 displays the ground-truth labels of each dataset, while the middle part shows the predicted labels. To facilitate the comparison of predicted results with ground-truth labels, the right part uses darker hues to signify correct predictions and lighter shades to highlight prediction mistakes.

In general, this paper introduces a novel graph neural network model, TWC-GNN, which utilizes higher-order topological structures and self-attention mechanisms. Compared to other baseline algorithms, TWC-GNN demonstrates superior capability to capture complex relationships and latent information within the network.

Table 3 Ablation experiments on higher-order topological structures and higher-order node information.

Ablation study

We conducted a series of ablation experiments to determine the significance of higher-order node information and higher-order topological structures, as outlined in Table 3. The Transformer-GAT model adds higher-order node information to the GAT model; the comparison with the GAT model indicates that higher-order nodes possess significant latent information. The Centrality-GAT model adds higher-order topological features to the GAT model. The EdgeAdj-Transformer-GAT model differs from the full TWC-GNN model in that its edge matrix weights are restricted to the binary values 0 and 1. The empirical findings suggest that higher-order topological structures are better at capturing intricate relationships within the network, thereby enhancing the accuracy of the model.

Fig. 6

Data visualization for three citation datasets: Cora, Citeseer, and Pubmed. Each dataset is first visualized according to its own node connections and true labels (nodes of the same type are positioned as close together as possible). The classification results of our model are then visualized using the same layout. Finally, dark colors indicate correct predictions and light colors indicate incorrect predictions.

Conclusion

In this paper, we have presented the TWC-GNN model, designed for the node classification task within directed graphs by leveraging higher-order topological structures and self-attention mechanisms. Specifically, we have introduced the concept of Centrality Information to quantify the reciprocal influence between central nodes and their neighboring nodes, allowing for a deeper understanding of intricate network relationships. Additionally, we have incorporated a Transformer-based self-attention mechanism using Laplacian Eigenvector Encoding to extract higher-order node information, which plays a crucial role in understanding the structure of directed graphs. The TWC-GNN model further refines node representations by using the GAT approach to focus on first-order information, balancing global and local node interactions. Through extensive experiments on citation network datasets, we have demonstrated the effectiveness of our model in node classification tasks. A series of ablation studies have shown that the performance of TWC-GNN is significantly influenced by the integration of higher-order topological structures and higher-order node information. In conclusion, our findings highlight the importance of higher-order structural information in directed networks, providing a strong foundation for future research in complex graph-based learning tasks. The proposed TWC-GNN model offers a promising approach for extracting and leveraging global and local graph features, enabling more accurate and efficient node classification in directed networks.