CN113157957A - Attribute graph document clustering method based on graph convolution neural network - Google Patents
Attribute graph document clustering method based on graph convolution neural network
- Publication number
- CN113157957A (application number CN202110244762.6A)
- Authority
- CN
- China
- Prior art keywords
- graph
- clustering
- neural network
- attribute
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an attribute graph document clustering method based on a graph convolutional neural network, belonging to the field of graph data mining. Specifically, a cross-layer linked graph convolutional neural network is used for feature learning on the document attribute graph; a deep cluster-estimation model estimates the optimal number of clusters from the node features; these two steps are executed alternately until training is complete. The trained model then yields the features of all document attribute graph nodes to be clustered and the estimated number of clusters; finally, with these features and the estimated cluster number as input, the k-means clustering method produces the document attribute graph clustering result. When training the cross-layer linked graph convolutional neural network, a self-separation regularization term based on pairwise node similarity encourages the features of nodes in the same cluster to be similar and the features of nodes in different clusters to be far apart, which effectively improves graph clustering performance. The cluster-estimation module realizes data-driven estimation of the cluster number, making the whole system better suited to real, unlabeled data environments.
Description
Technical Field
The invention belongs to the field of graph data mining, and particularly relates to an attribute graph document clustering method based on a graph convolutional neural network.
Background
Attribute graph clustering is a basic task in graph data mining: it aims to divide the nodes of a graph into mutually disjoint clusters according to both node attributes and graph structure information. Compared with traditional graph clustering methods that use only structural information, attribute graph clustering is better suited to scenarios in which nodes carry rich content information. It has wide practical application in community discovery, protein functional module detection, financial network fraud detection, and other fields.
A number of graph clustering methods based on deep models have been proposed. Compared with shallow methods, deep graph clustering better captures the nonlinear, complex relations among nodes and thus helps improve clustering performance. Most existing deep graph clustering methods adopt a two-step framework: a feature learning step uses a deep model to learn low-dimensional node features, and a clustering step runs a conventional clustering method, such as k-means or spectral clustering, to complete the graph clustering task. Whether the feature learning step can learn the true features of the attribute graph is critical for the clustering task. Early deep models usually used Graph Autoencoders (GAE) to capture graph structure information, but GAEs train the neural network from structural features alone and ignore the node attribute information in the attribute graph, which limits their performance on attribute graph clustering.
In recent years, attribute graph clustering methods have generally used Graph Neural Networks (GNNs) for node feature learning. GNNs aggregate the attribute information of adjacent nodes with learned weights and update node features iteratively; this forward propagation fuses the structural features and node attributes of the attribute graph, improves data utilization, and therefore applies naturally to attribute graph clustering tasks. Moreover, the goal of graph clustering is to detect local substructures with dense intra-cluster and sparse inter-cluster connections, and the node features learned by GNNs preserve the local similarity of the graph, which benefits the clustering task. However, current methods have two limitations. First, the feature learning process lacks guidance from the clustering task, so it is difficult to learn cluster-friendly node features; node distributions in the feature space easily overlap, which hinders subsequent clustering. Second, these methods require the number of clusters to be set manually in advance; in real applications, network data is large and complex, and the number of clusters is usually hard to estimate by hand. Moreover, the actual cluster number is highly task-dependent, and the optimal number should be determined by the node features themselves. A parameter-free attribute graph clustering method based on graph convolutional neural networks is therefore important for graph data mining.
Document clustering aims to divide a document collection into groups of documents with similar content. Existing document clustering methods adopt hierarchy-, partition-, or density-based clustering, with the main idea of gathering documents with similar features into the same cluster. However, current methods consider only the similarity between document contents during clustering and ignore the citation relations that exist among documents. Documents that cite each other generally have higher similarity, so citation relations can also provide valuable information for clustering.
Disclosure of Invention
Addressing the problems in the prior art, the invention provides an attribute graph document clustering method based on a graph convolutional neural network. It exploits the citation relations that existing document clustering ignores, copes with the unbalanced cluster structures found in real graph data and learns cluster-friendly node features from them, estimates the number of clusters from the node features, and thus realizes parameter-free attribute graph clustering.
An undirected attribute graph may be represented as $G = (V, E, X)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of nodes and $E$ is the set of edges. The adjacency matrix of the graph is denoted $A$: if nodes $v_i$ and $v_j$ are connected, then $A_{ij} = 1$, otherwise $A_{ij} = 0$. $X \in \mathbb{R}^{n \times m}$ is the node attribute matrix of graph $G$, where $n$ is the number of nodes and $m$ the dimension of the node attributes. The purpose of attribute graph clustering is to divide the nodes of $G$ into $k$ mutually disjoint clusters; in the invention, $k$ is estimated from the node features by the cluster-estimation module.
To this end, the invention adopts the following technical scheme: first, a cross-layer linked graph convolutional neural network is proposed that copes with the unbalanced cluster structures in real graph data and learns valuable node features from them with strong robustness; second, a regularization term encouraging node-feature self-separation is proposed, realizing joint optimization of feature learning and graph clustering; finally, a cluster-estimation module is proposed to realize data-driven estimation of the cluster number.
Concretely: first, the cross-layer linked graph convolutional neural network is trained on the attribute graph data, with node-feature self-separation encouraged during training, and outputs the graph node features; second, the cluster-estimation module is trained on these node features and outputs the optimal cluster number; the two steps alternate until the maximum number of iterations is reached; finally, the graph node features are clustered with the k-means algorithm.
step (1) attribute graph feature learning:
step (1.1) attribute graph data encoding: the method uses a cross-layer linked graph convolutional neural network to encode the attribute graph data. Encoding the graph data significantly reduces the computational complexity of clustering and avoids the overfitting that graph data sparsity may cause. The propagation rule of the graph convolutional network from layer $l-1$ to layer $l$ is:

$$h_i^{(l)} = \mathrm{ReLU}\Big(W^{(l)} \sum_{v_j \in N(v_i|A)} \frac{h_j^{(l-1)}}{\sqrt{\deg(v_i)\,\deg(v_j)}}\Big) \qquad (1)$$

where $N(v_i|A)$ denotes, in the citation network represented by adjacency matrix $A$, the set comprising document $v_i$ and the documents having a citation relation with $v_i$, i.e., its neighbor documents, $i = 1, \ldots, n$; $W^{(l)}$ is the parameter matrix of the $l$-th layer; $\deg(v)$ denotes the degree of node $v$. When $l = 1$, $h_i^{(0)} = x_i$ in equation (1), i.e., the first graph convolution layer aggregates the original features of the neighbor documents. $\mathrm{ReLU}(\cdot)$ is a nonlinear activation function.
The cross-layer linked graph convolutional network concatenates the output vectors of every graph convolution layer: denoting by $h_i^{(l)}$ the output of the $l$-th graph convolution layer for node $v_i$, the encoding result $d_i$ of the cross-layer linked network for node $v_i$ is the concatenation of the per-layer outputs:

$$d_i = \big[\, h_i^{(1)} \,\|\, h_i^{(2)} \,\|\, \cdots \,\|\, h_i^{(L)} \,\big] \qquad (2)$$
coding result diFor the spliced vector output by each layer of graph convolution neural network, the coding result is subjected to linear mapping operation, and the node characteristic z learned by the graph convolution neural network is outputi. In the encoding step, the method uses a 6-layer cross-chain convolutional neural network.
Step (1.2) node feature data decoding:
The method uses a multilayer perceptron to decode the attribute matrix:

$$\hat{x}_i = \mathrm{MLP}_s(z_i; W_D) \qquad (3)$$

where $\hat{x}_i$ denotes the decoded output of node feature $z_i$, $z_i \in \mathbb{R}^{d_e}$ with $d_e$ the dimension of the encoding vector, and $\mathrm{MLP}_s$ denotes an $s$-layer multilayer perceptron; the method uses a 2-layer perceptron. $W_D$ are the decoder parameters. For the node feature $z_i$ to preserve the true original graph information and guarantee clustering accuracy, the decoder output $\hat{x}_i$ should retain the original graph attribute information $x_i$ as much as possible, so the method uses the mean squared error (MSE) loss as the optimization target of the graph convolutional network:

$$L_{\mathrm{res}} = \frac{1}{n}\sum_{i=1}^{n} \| x_i - \hat{x}_i \|^2 \qquad (4)$$

The mean squared error loss measures the difference between $x_i$ and $\hat{x}_i$. The method optimizes this objective during training so that the node feature $z_i$ can be decoded back into the original graph attribute information $x_i$, thereby ensuring that $z_i$ contains the graph attribute information.
Step (1.3) clustering friendly graph feature learning:
the method provides a regularization term to encourage the separation of node natural cluster structures in the feature space, the regularization term effectively relieves the overlapping problem in graph clustering, and the clustering performance is further improved. Specifically, the pairwise similarity of graph node features is first modeled using student t-distribution Q:
wherein z isiRepresenting a node viCharacteristic vector of (q)ijCan be seen as node viAnd vjProbability of having similar features in the feature space. As the student t distribution has a heavy tail characteristic, nodes with lower similarity are farther away in a feature space, and the natural cluster structures in the graph are macroscopically separated and the problem of crowding in the cluster is relieved. To increase the tendency for inter-cluster separation, further separating nodes in the feature space, the model encourages the distribution Q to approach another degree of freedomThe higher degree of student t distribution P realizes cluster-friendly feature learning:
where the parameter $\theta$ controls the degree of freedom of the student-t distribution and is set to $d_e - 1$, $d_e$ being the dimension of the feature $z$. The higher degree of freedom makes the target distribution $P$ more concentrated than $Q$: $P$ assigns higher probability to similar nodes in the feature space, making them more compact, and lower probability to dissimilar nodes, making them more dispersed, thereby achieving self-separating regularization. The self-separation regularization term is defined as:

$$L_{\mathrm{sep}} = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} \qquad (7)$$
where KL is the Kullback-Leibler divergence, which measures the asymmetric distance between distributions $P$ and $Q$, and $p_{ij}$ and $q_{ij}$ are the pairwise similarities of node features. By optimizing the KL divergence between $P$ and $Q$, the encoder increases inter-cluster distance and decreases intra-cluster distance. This helps the model learn cluster-friendly features and thereby improves graph clustering performance. In summary, the feature learning optimization objective of the graph convolutional network is:

$$L = L_{\mathrm{res}} + \alpha L_{\mathrm{sep}} = \frac{1}{n}\sum_{i=1}^{n} \| x_i - \hat{x}_i \|^2 + \alpha\, \mathrm{KL}(P \,\|\, Q) \qquad (8)$$

where $\hat{x}_i$ is the output of the encoding-decoding process for input node attribute $x_i$, and $\alpha = 0.01$ is a hyperparameter. Optimizing this objective lets the graph convolutional network learn the node features of the graph while optimizing the node distribution with clustering as the target.
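A sketch of the self-separation term of equations (5)-(8) under the pairwise student-t formulation above; treating $P$ as a fixed target in each step is our assumption, as the patent does not specify this detail:

```python
def pairwise_t_distribution(z, dof):
    """Pairwise student-t similarities over node features (eqs. 5-6)."""
    dist_sq = torch.cdist(z, z).pow(2)
    sim = (1.0 + dist_sq / dof).pow(-(dof + 1.0) / 2.0)
    mask = 1.0 - torch.eye(z.size(0), device=z.device)  # exclude self-pairs
    sim = sim * mask
    return sim / sim.sum()

def self_separation_loss(z, theta):
    q = pairwise_t_distribution(z, dof=1.0)              # heavy-tailed Q (eq. 5)
    p = pairwise_t_distribution(z, dof=theta).detach()   # higher-dof target P (eq. 6)
    eps = 1e-12
    # KL(P || Q), eq. (7); zero entries of P contribute nothing.
    return (p * (p.add(eps).log() - q.add(eps).log())).sum()

def feature_learning_loss(x, x_hat, z, theta, alpha=0.01):
    # Overall feature-learning objective, eq. (8).
    return reconstruction_loss(x, x_hat) + alpha * self_separation_loss(z, theta)
```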
Step (2): estimating the number of clustering clusters:
most of the traditional clustering methods need a user to specify the number of clustering clusters, and in order to expand the model to a parameter-free condition, the method provides a deep clustering estimation model. The model aims to estimate the optimal cluster number from the node features z.
The model uses a softmax autoencoder for cluster estimation. The softmax autoencoder encodes and decodes the node features z from step (1), with a softmax nonlinear function in the output layer of the encoder; this nonlinearity converts the node features into a soft clustering probability distribution. The number of hidden-layer neurons represents the upper limit of the number of clusters. The output of the softmax encoder is $y_i \in \mathbb{R}^{d_c}$, where $d_c$ is the number of hidden units. Because softmax activation is used, the entries of the cluster assignment $y_i$ sum to 1. The method sets $d_c$ to the total number of nodes, so the cluster estimation is entirely parameter-free.
The softmax autoencoder estimates the number of clusters by counting cluster labels, but by itself it tends to generate uniformly distributed cluster assignments, which is disadvantageous for cluster estimation. The model therefore learns a concentrated soft cluster distribution by introducing an additional Gini-index regularization, which can be expressed as:

$$\mathrm{Gini}(y_i) = 1 - \sum_{j=1}^{d_c} y_{ij}^2 \qquad (9)$$

Ignoring the constant 1, the regularization term can be written as:

$$L_{\mathrm{gini}} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{d_c} y_{ij}^2 \qquad (10)$$
optimizing the kini coefficient loss can facilitate low entropy distribution of cluster assignments. Combining the reconstruction loss with the regularization of the kini coefficients, the overall loss function of the softmax autoencoder can be expressed as:
wherein z isiAndrepresents the input and output of the softmax self-encoder, and β ═ 0.1 is a hyperparameter.
The softmax autoencoder is trained with this objective to obtain the cluster assignment result $Y$, where $y_{ij}$ is the probability of assigning node $v_i$ to the $j$-th cluster. Finally, the model counts the number of distinct labels over all nodes as the estimate of the cluster number $k$:

$$k = \mathrm{Card}\big(\{\arg\max_j Y_{ij} : i = 1, \ldots, n\}\big) \qquad (12)$$

where the Card function counts the number of distinct elements in the set.
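The cluster-estimation module of equations (9)-(12) could be sketched as follows; this continues the illustrative code above and is an assumption, not the inventors' implementation:

```python
class SoftmaxAutoencoder(nn.Module):
    """Softmax autoencoder for data-driven cluster-number estimation."""

    def __init__(self, z_dim, d_c):
        super().__init__()
        self.encoder = nn.Linear(z_dim, d_c)  # d_c = upper bound on cluster count (here n)
        self.decoder = nn.Linear(d_c, z_dim)

    def forward(self, z):
        y = torch.softmax(self.encoder(z), dim=1)  # soft assignments, rows sum to 1
        return y, self.decoder(y)

def cluster_estimation_loss(z, z_hat, y, beta=0.1):
    mse = ((z - z_hat) ** 2).sum(dim=1).mean()  # reconstruction of z
    gini = -(y ** 2).sum(dim=1).mean()          # eq. (10): favors low-entropy assignments
    return mse + beta * gini                    # eq. (11)

def estimate_k(y):
    # eq. (12): number of distinct arg-max labels over all nodes.
    return int(torch.unique(y.argmax(dim=1)).numel())
```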
Step (3): alternately training the graph convolution and cluster-estimation modules: feature learning and cluster estimation are optimized simultaneously by alternating training. In each iteration, the parameters of the softmax autoencoder are first fixed and feature learning is optimized with equation (8); then the parameters of the feature learning model are fixed and the parameters of the softmax autoencoder are optimized with equation (11); finally, the cluster number $k$ is computed from the cluster assignment $Y$ via equation (12).
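Putting the pieces together, the alternating schedule might be implemented as below; the optimizer and learning rate follow the embodiment described later, while the loop structure and function names are assumptions:

```python
def train(x, adj, encoder, decoder, softmax_ae, theta, iters=200):
    opt_feat = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    opt_clus = torch.optim.Adam(softmax_ae.parameters(), lr=1e-3)

    for _ in range(iters):
        # Step A: fix the softmax autoencoder, optimize feature learning (eq. 8).
        z = encoder(x, adj)
        loss_f = feature_learning_loss(x, decoder(z), z, theta)
        opt_feat.zero_grad()
        loss_f.backward()
        opt_feat.step()

        # Step B: fix feature learning, optimize the softmax autoencoder (eq. 11).
        with torch.no_grad():
            z = encoder(x, adj)
        y, z_hat = softmax_ae(z)
        loss_c = cluster_estimation_loss(z, z_hat, y)
        opt_clus.zero_grad()
        loss_c.backward()
        opt_clus.step()

    return z, estimate_k(y)  # node features and estimated cluster number (eq. 12)
```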
Step (4): outputting the clustering result: the maximum number of optimization rounds is set to 200; once reached, the model outputs the node features $z$ and the estimated cluster number $k_e$. Finally, with $z$ and $k_e$ as input, the k-means clustering algorithm is run to obtain the graph clustering result.
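For this final step, with $z$ and $k_e$ as returned by the training sketch above (illustrative names), plain k-means, here via scikit-learn, completes the clustering:

```python
from sklearn.cluster import KMeans

z_np = z.detach().cpu().numpy()
labels = KMeans(n_clusters=k_e, n_init=10).fit_predict(z_np)
```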
Advantageous effects
The invention effectively improves the clustering accuracy.
Drawings
FIG. 1: flow chart of the attribute graph clustering method based on a graph convolutional neural network.
FIG. 2: structure of the attribute graph clustering model based on a graph convolutional neural network.
FIG. 3: visualization of the clustering results on the Cora data set.
Detailed Description
The validity of the method is verified taking the Cora, Citeseer and Pubmed literature databases (together with the Wiki data set) as examples. Document attribute graph data is first constructed from these databases. A document attribute graph may be expressed as $G = (A, X)$, where $A$ is the adjacency matrix: if documents $v_i$ and $v_j$ have a citation relation, then $A_{ij} = 1$, otherwise $A_{ij} = 0$. $X$ is the document attribute matrix; the $i$-th row vector $x_i$ of $X$ contains a description of the contents of document $v_i$. $X$ is constructed as follows: (1) eliminate the function words in the documents, i.e., adverbs, prepositions, conjunctions, auxiliary words, etc.; (2) eliminate words with frequency less than 10; (3) construct the word-vector feature of each document from the remaining words: if the $j$-th word occurs in document $v_i$, then $x_{ij} = 1$, otherwise $x_{ij} = 0$. The parameters of the constructed document attribute graphs are as follows:
TABLE 1

| Data set | Number of nodes | Number of edges | Number of real clusters | Feature dimension |
|---|---|---|---|---|
| Cora | 2708 | 5429 | 7 | 1433 |
| Citeseer | 3327 | 4732 | 6 | 3703 |
| Pubmed | 19717 | 44338 | 3 | 500 |
| Wiki | 2405 | 17981 | 17 | 4973 |
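The construction of X described above can be sketched as follows; the tokenization and stopword list are assumptions, and only the frequency threshold of 10 and the binary word-occurrence encoding come from the text:

```python
from collections import Counter
import numpy as np

def build_attribute_matrix(docs, stopwords, min_freq=10):
    """docs: list of token lists, one per document. Returns binary X (n x m)."""
    counts = Counter(t for doc in docs for t in doc if t not in stopwords)
    vocab = sorted(t for t, c in counts.items() if c >= min_freq)
    index = {t: j for j, t in enumerate(vocab)}
    x = np.zeros((len(docs), len(vocab)), dtype=np.float32)
    for i, doc in enumerate(docs):
        for t in doc:
            if t in index:
                x[i, index[t]] = 1.0  # x_ij = 1 iff word j occurs in document v_i
    return x, vocab
```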
The following describes the specific implementation steps of the invention on the four attribute graph data sets:
step (1) attribute graph feature learning:
step (1.1) attribute graph data encoding: as shown in the encoder portion of FIG. 2, the invention uses a 6-layer graph convolutional neural network to encode the four data sets in Table 1. The output of each graph convolution layer is used as the input of the next layer, and the forward propagation from layer $l-1$ to layer $l$ follows equation (1). Taking the Cora attribute graph as an example, for document $v_i$ in the graph, $N(v_i|A)$ denotes the set comprising document $v_i$ and the documents having a citation relation with $v_i$ (its neighbor documents), $i = 1, 2, \ldots, 2708$, i.e., there are 2708 documents in total. $W^{(l)}$ is the parameter matrix of the $l$-th layer, and $\deg(v_i)$ is the degree of node $v_i$. When $l = 1$, $h_i^{(0)} = x_i$ in equation (1), i.e., the first graph convolution layer aggregates the original features of the neighbor documents of $v_i$. $\mathrm{ReLU}(\cdot)$ is the nonlinear activation function, $\mathrm{ReLU}(x) = \max(0, x)$.
As shown in FIG. 2, besides forward propagation, the output of each graph convolution layer is passed to an inter-layer aggregation module, which concatenates the node features from the different layers as in equation (2). A linear mapping is then applied to the concatenated vector to further reduce its dimension, yielding the node feature vector $z$: for the Cora data set the linear mapping produces a 16-dimensional node feature vector, and for the larger Pubmed data set a 32-dimensional node feature vector.
Step (1.2) node feature data decoding: as shown in the decoder portion of FIG. 2, the invention uses a two-layer multilayer perceptron to decode the encoded features $z_i$ of the four data sets, following equation (3), where $\hat{x}_i$ denotes the decoded output of node feature $z_i$ and $d_e$ is the dimension of the encoding vector ($d_e$ is 16 for Cora, Citeseer and Wiki, and 32 for Pubmed); $\mathrm{MLP}_2$ denotes the two-layer perceptron and $W_D$ the decoder parameters. The decoder attempts to reconstruct the node attribute information from the feature $z_i$, so the method sets the hidden-layer structure of the multilayer perceptron to $d_e$-500-1000-$m$, where $m$ is the dimension of the node attributes; the specific values of $m$ for the four data sets are given in the fifth column of Table 1.
Step (1.3) cluster-friendly graph feature learning: as shown in the middle of FIG. 2, the student-t distribution $Q$ of equation (5) is first used to model the pairwise similarity of the node features $z_i$ of the four data sets. The distribution $Q$ is represented as a matrix $Q \in \mathbb{R}^{n \times n}$ whose element in row $i$, column $j$ is $q_{ij}$, where $n$ is the number of nodes of the data set; the values of $n$ for Cora, Citeseer, Wiki and Pubmed are given in the second column of Table 1. The student-t distribution $P$ with higher degree of freedom is then computed in the same way via equation (6).
In equation (6), the parameter $\theta$ controls the degree of freedom of the student-t distribution and is set to $d_e - 1$, with $d_e$ the dimension of the feature $z$; for the Cora, Citeseer, Wiki and Pubmed data sets, $\theta$ is set to 1432, 3702, 499 and 4972 respectively. The higher degree of freedom makes the target distribution $P$ more concentrated than $Q$, so $P$ assigns higher probability to similar nodes in the feature space, making them more compact, and lower probability to dissimilar nodes, making them more dispersed, thereby achieving self-separating regularization. The method optimizes the graph convolutional network with the self-separation term $L_{\mathrm{sep}} = \mathrm{KL}(P \,\|\, Q)$ of equation (7), where KL is the Kullback-Leibler divergence measuring the asymmetric distance between distributions $P$ and $Q$, and $p_{ij}$, $q_{ij}$ are the pairwise similarities of node features. In summary, the feature learning optimization objective of the graph convolutional network is equation (8), $L = L_{\mathrm{res}} + \alpha L_{\mathrm{sep}}$, where $\hat{x}_i$ is the output of the encoding-decoding process for input node attribute $x_i$ and $\alpha = 0.01$ is a hyperparameter.
Step (2): estimating the number of clusters: as shown in the cluster-estimation module of FIG. 2, the invention uses a softmax autoencoder for cluster estimation. The softmax autoencoder encodes and decodes the node features $z$ from step (1), with a softmax nonlinear function in the output layer of the encoder that converts the node features into a soft clustering probability distribution. The output of the softmax encoder is $y_i \in \mathbb{R}^{d_c}$, where $d_c$ is the number of hidden units. For the Cora, Citeseer, Wiki and Pubmed data sets, the invention sets $d_c$ to the total number of nodes $n$.
The Gini-index regularization of the cluster-estimation module is given by equation (9); ignoring the constant 1, it reduces to equation (10). With equation (10) as the regularization term and the MSE loss as the main optimization objective, the overall loss function of the softmax autoencoder is equation (11), where $z_i$ and $\hat{z}_i$ are the input and output of the softmax autoencoder and $\beta$ is set to 0.1 uniformly for the four data sets. The softmax autoencoder is trained with this objective to obtain the cluster assignment result $Y$, where $y_{ij}$ is the probability of assigning node $v_i$ to the $j$-th cluster. Finally, the model counts the number of distinct labels over all nodes as the estimate of the cluster number $k$, computed via equation (12), where the Card function counts the number of distinct elements in the set.
Step (3): alternately training the graph convolution and cluster-estimation modules: feature learning and cluster estimation are optimized simultaneously by alternating training. In each iteration, the parameters of the softmax autoencoder are first fixed and feature learning is optimized with equation (8); then the parameters of the feature learning model are fixed and the softmax autoencoder is optimized with equation (11); finally, the cluster number $k$ is computed from the cluster assignment $Y$ via equation (12). The neural network in each module is optimized with the Adam optimizer, with a fixed learning rate of $10^{-3}$ and a dropout rate of 0.2.
Step (4): outputting the clustering result: the maximum number of optimization rounds is set to 200; once reached, the model outputs the node features $z$ and the estimated cluster number $k_e$. Finally, with $z$ and $k_e$ as input, the k-means clustering algorithm is run to obtain the graph clustering result.
To illustrate the beneficial effects of the method, comparison experiments were performed against a number of different algorithms. k-means is a traditional distance-based clustering method whose main idea is to assign samples to the nearest cluster centers. DeepWalk performs truncated random walks on the graph to generate node sequences, learns node representations with a Skip-gram model, and finally clusters them with k-means. GraphEncoder learns node representations by training stacked sparse autoencoders and applies spectral clustering to obtain the clustering result. MGAE trains a graph convolutional neural network with an unsupervised mapping loss and performs spectral clustering on the node features.
The experimental results of the above comparison algorithms on the Cora, Citeseer, Wiki and Pubmed data sets are shown in Table 2:
TABLE 2
As shown in Table 2, the method of the invention achieves the best results on the real attribute graph data sets, performing better on the ACC, F1, NMI and ARI clustering indexes; the method is thus reasonable and reliable.
To verify that the proposed self-separation regularization term improves the distribution of node features in high-dimensional space, the feature vectors are visualized: the t-SNE algorithm is run on the feature vectors $z$ of the Cora data set, with the visualization shown in FIG. 3. Comparing the first row (a)-(e) with the second row (f)-(j) of FIG. 3 reveals a strong contrast between not using and using the self-separation regularization term: the cluster structures in the second row are denser and the gaps between different clusters more pronounced, indicating that self-separation regularization substantially increases inter-cluster distance and decreases intra-cluster distance.
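A visualization like FIG. 3 can be reproduced with standard t-SNE, e.g. as follows, where `z_np` and `labels` come from the sketches above:

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

emb = TSNE(n_components=2).fit_transform(z_np)  # 2-D embedding of node features
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=5, cmap='tab10')
plt.show()
```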
Based on a graph convolutional neural network, the invention first encodes and decodes the attribute graph, proposing a cross-layer linked graph convolutional network that learns graph node features without supervision; during this process, a self-separation regularization term optimizes the feature distribution of the nodes so that they exhibit a more pronounced cluster structure. A cluster-estimation module is then proposed to estimate the optimal cluster number from the node features. The experimental results and the visualization experiment on real data sets show that the method is reasonable and reliable and can provide dependable help for attribute graph clustering tasks.
Claims (9)
1. An attribute graph document clustering method based on a graph convolution neural network is characterized in that:
step (1), performing document attribute graph feature learning with a cross-layer linked graph convolutional neural network, comprising two stages, encoding and decoding, to obtain the features z of all graph nodes, the features z completing the separation of the natural cluster structures of the graph nodes in the feature space;
step (2), estimating the optimal cluster number from the node features z with a deep cluster-estimation model;
step (3), alternately executing the above two steps until the maximum number of iterations is reached to finish training;
step (4), obtaining, with the trained cross-layer linked graph convolutional neural network and deep cluster-estimation model, the features of all document attribute graph nodes to be clustered and the estimated number of clusters; and, with these features and the estimated cluster number as input, obtaining the document attribute graph clustering result with the k-means clustering method.
2. The method for clustering the attribute graph documents based on the graph convolution neural network as claimed in claim 1, wherein: the step (1) further comprises the steps of,
step (1.1) attribute graph data encoding: the attribute graph data is encoded; let the input document attribute graph be $G = (A, X)$, where $A$ is the adjacency matrix: if documents $v_i$ and $v_j$ have a citation relation, then $A_{ij} = 1$, otherwise $A_{ij} = 0$; $X$ is the document attribute matrix, each row vector representing a description of the contents of a document, the $i$-th row vector $x_i$ of $X$ representing the description of the contents of document $v_i$; the propagation rule of the graph convolutional neural network from layer $l-1$ to layer $l$ is:

$$h_i^{(l)} = \mathrm{ReLU}\Big(W^{(l)} \sum_{v_j \in N(v_i|A)} \frac{h_j^{(l-1)}}{\sqrt{\deg(v_i)\,\deg(v_j)}}\Big) \qquad (1)$$

where $N(v_i|A)$ denotes, in the citation network represented by adjacency matrix $A$, the set comprising document $v_i$ and the documents having a citation relation with $v_i$, i.e., the neighbor documents, $i = 1, \ldots, n$ for $n$ documents; $W^{(l)}$ is the parameter matrix of the $l$-th layer; $\deg(v)$ denotes the degree of node $v$; when $l = 1$, $h_i^{(0)} = x_i$ in equation (1), i.e., the first graph convolution layer aggregates the original features of the neighbor documents; $\mathrm{ReLU}(\cdot)$ is a nonlinear activation function;
the cross-layer linked graph convolutional neural network concatenates the output vectors of every graph convolution layer: denoting by $h_i^{(l)}$ the output of the $l$-th graph convolution layer for node $v_i$, the encoding result $d_i$ of the cross-layer linked network for node $v_i$ is the concatenation of the per-layer outputs:

$$d_i = \big[\, h_i^{(1)} \,\|\, h_i^{(2)} \,\|\, \cdots \,\|\, h_i^{(L)} \,\big] \qquad (2)$$

a linear mapping is applied to the encoding result, outputting the node feature $z_i$ of node $v_i$ learned by the graph convolutional neural network;
step (1.2) node feature data decoding:
decoding of the attribute matrix is achieved with a multilayer perceptron: $\hat{x}_i = \mathrm{MLP}_s(z_i; W_D)$.
3. The method for clustering the attribute graph documents based on the graph convolution neural network as claimed in claim 2, wherein:
the construction method of X comprises the following steps: (1) eliminating the fictitious words in all the document documents; (2) eliminating all words with frequencies less than 10 in the literature documents; (3) constructing word vector characteristics of each document by using residual words if the jth word is in the document viIn (b) is present, then xij1, otherwise xij=0。
4. The method for clustering attribute graph documents based on a graph convolutional neural network as claimed in claim 1, wherein: the feature learning optimization objective of the cross-layer linked graph convolutional neural network is:

$$L = \frac{1}{n}\sum_{i=1}^{n} \| x_i - \hat{x}_i \|^2 + \alpha\, \mathrm{KL}(P \,\|\, Q)$$

where $\hat{x}_i$ is the output of the encoding-decoding process for input node attribute $x_i$ and $\alpha$ is a hyperparameter; $q_{ij}$ represents the probability that nodes $v_i$ and $v_j$ have similar features in the feature space:

$$q_{ij} = \frac{(1 + \|z_i - z_j\|^2)^{-1}}{\sum_{i' \neq j'} (1 + \|z_{i'} - z_{j'}\|^2)^{-1}}$$

and $p_{ij}$ is used to approach another student-t distribution with higher degree of freedom to realize cluster-friendly feature learning:

$$p_{ij} = \frac{(1 + \|z_i - z_j\|^2/\theta)^{-\frac{\theta+1}{2}}}{\sum_{i' \neq j'} (1 + \|z_{i'} - z_{j'}\|^2/\theta)^{-\frac{\theta+1}{2}}}$$

where the parameter $\theta$ controls the degree of freedom of the student-t distribution and is set to $d_e - 1$;
optimizing this objective function realizes, while the graph convolutional neural network learns the node features of the graph, the optimization of the node distribution with clustering as the target.
5. The method for clustering the attribute graph documents based on the graph convolution neural network as claimed in claim 1, wherein: the step (2) further comprises the following steps:
the deep cluster-estimation model adopts a softmax autoencoder for cluster estimation; the softmax autoencoder encodes and decodes the node features z of step (1), with a softmax nonlinear function in the output layer of the encoder that converts the node features into a soft clustering probability distribution; the number of hidden-layer neurons represents the upper limit of the number of clusters; the output of the softmax encoder is the cluster assignment result $y_i \in \mathbb{R}^{d_c}$, where $d_c$ is the number of hidden units; because softmax activation is used, the entries of the cluster assignment $y_i$ sum to 1; $d_c$ is set to the total number of nodes, realizing completely parameter-free cluster estimation.
6. The method according to claim 5, wherein: the overall loss function of the softmax autoencoder is expressed as:

$$L_c = \frac{1}{n}\sum_{i=1}^{n} \| z_i - \hat{z}_i \|^2 + \beta L_{\mathrm{gini}}, \qquad L_{\mathrm{gini}} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{d_c} y_{ij}^2$$

where $z_i$ and $\hat{z}_i$ are the input and output of the softmax autoencoder and $\beta$ is a hyperparameter.
7. The method for clustering attribute graph documents based on a graph convolutional neural network as claimed in claim 6, wherein the softmax autoencoder counts the number of distinct labels of all nodes as the estimate of the cluster number k, the computation of k being described as:

$$k = \mathrm{Card}\big(\{\arg\max_j Y_{ij} : i = 1, \ldots, n\}\big)$$

where the Card function counts the number of distinct elements in the set, and $Y_{ij}$ denotes the probability that document $v_i$ belongs to the $j$-th cluster.
8. The method for clustering attribute graph documents based on a graph convolutional neural network as claimed in claim 1, wherein step (3) comprises: simultaneously optimizing the cross-layer linked graph convolutional neural network and the deep cluster-estimation model by alternating training; in each iteration, the parameters of the softmax autoencoder are first fixed and feature learning is optimized with equation (8); then the parameters of the feature learning model are fixed and the parameters of the softmax autoencoder are optimized with equation (11); then the cluster number k is computed from the cluster assignment Y via equation (12).
9. The method for clustering attribute graph documents based on a graph convolutional neural network as claimed in claim 1, wherein: the attribute graph data encoding step uses a 6-layer cross-layer linked graph convolutional neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110244762.6A CN113157957A (en) | 2021-03-05 | 2021-03-05 | Attribute graph document clustering method based on graph convolution neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110244762.6A CN113157957A (en) | 2021-03-05 | 2021-03-05 | Attribute graph document clustering method based on graph convolution neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113157957A true CN113157957A (en) | 2021-07-23 |
Family
ID=76884221
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110244762.6A Pending CN113157957A (en) | 2021-03-05 | 2021-03-05 | Attribute graph document clustering method based on graph convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113157957A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190228312A1 (en) * | 2018-01-25 | 2019-07-25 | SparkCognition, Inc. | Unsupervised model building for clustering and anomaly detection |
CN110399493A (en) * | 2019-07-29 | 2019-11-01 | 中南大学 | A kind of author's disambiguation method based on incremental learning |
CN110889015A (en) * | 2019-10-31 | 2020-03-17 | 天津工业大学 | Independent decoupling convolutional neural network characterization algorithm for graph data |
CN111259979A (en) * | 2020-02-10 | 2020-06-09 | 大连理工大学 | Deep semi-supervised image clustering method based on label self-adaptive strategy |
CN111985623A (en) * | 2020-08-28 | 2020-11-24 | 复旦大学 | Attribute graph group discovery method based on maximized mutual information and graph neural network |
CN112380435A (en) * | 2020-11-16 | 2021-02-19 | 北京大学 | Literature recommendation method and recommendation system based on heterogeneous graph neural network |
Non-Patent Citations (1)
Title |
---|
李亚芳 (Li Yafang) et al.: "Deep network embedding method based on community optimization" (基于社区优化的深度网络嵌入方法), 《计算机应用》 (Journal of Computer Applications), 25 January 2021 (2021-01-25), pages 1957-1963 *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113688386A (en) * | 2021-07-26 | 2021-11-23 | 中国人民解放军陆军工程大学 | Graph structure-based intelligent detection method and system for malicious PDF (Portable document Format) document |
CN113505849A (en) * | 2021-07-27 | 2021-10-15 | 电子科技大学 | Multilayer network clustering method based on comparison learning |
CN113505849B (en) * | 2021-07-27 | 2023-09-19 | 电子科技大学 | Multi-layer network clustering method based on contrast learning |
CN113869404A (en) * | 2021-09-27 | 2021-12-31 | 北京工业大学 | Self-adaptive graph volume accumulation method for thesis network data |
CN113869404B (en) * | 2021-09-27 | 2024-05-28 | 北京工业大学 | Self-adaptive graph roll accumulation method for paper network data |
CN114462524A (en) * | 2022-01-19 | 2022-05-10 | 北京工业大学 | Clustering method for data center batch processing operation |
WO2024055677A1 (en) * | 2022-09-15 | 2024-03-21 | 华为技术有限公司 | Deep clustering method, apparatus and system |
CN115545098A (en) * | 2022-09-23 | 2022-12-30 | 青海师范大学 | Node classification method of three-channel graph neural network based on attention mechanism |
CN115545098B (en) * | 2022-09-23 | 2023-09-08 | 青海师范大学 | Node classification method of three-channel graph neural network based on attention mechanism |
CN115905617A (en) * | 2023-03-02 | 2023-04-04 | 南京邮电大学 | Video scoring prediction method based on deep neural network and double regularization |
CN117971354A (en) * | 2024-03-29 | 2024-05-03 | 苏州元脑智能科技有限公司 | Heterogeneous acceleration method, device, equipment and storage medium based on end-to-end learning |
CN117971354B (en) * | 2024-03-29 | 2024-06-14 | 苏州元脑智能科技有限公司 | Heterogeneous acceleration method, device, equipment and storage medium based on end-to-end learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113157957A (en) | Attribute graph document clustering method based on graph convolution neural network | |
CN110929029A (en) | Text classification method and system based on graph convolution neural network | |
CN109389151B (en) | Knowledge graph processing method and device based on semi-supervised embedded representation model | |
CN111785329B (en) | Single-cell RNA sequencing clustering method based on countermeasure automatic encoder | |
CN112417219A (en) | Hyper-graph convolution-based hyper-edge link prediction method | |
CN112966114B (en) | Literature classification method and device based on symmetrical graph convolutional neural network | |
CN108108854A (en) | City road network link prediction method, system and storage medium | |
CN112765352A (en) | Graph convolution neural network text classification method based on self-attention mechanism | |
CN113693563B (en) | Brain function network classification method based on hypergraph attention network | |
CN111861756B (en) | Group partner detection method based on financial transaction network and realization device thereof | |
CN113065649A (en) | Complex network topology graph representation learning method, prediction method and server | |
CN111476261A (en) | Community-enhanced graph convolution neural network method | |
CN114118369B (en) | Image classification convolutional neural network design method based on group intelligent optimization | |
CN113220897A (en) | Knowledge graph embedding model based on entity-relation association graph | |
CN116469561A (en) | Breast cancer survival prediction method based on deep learning | |
CN117349494A (en) | Graph classification method, system, medium and equipment for space graph convolution neural network | |
CN115640842A (en) | Network representation learning method based on graph attention self-encoder | |
CN114329233A (en) | Cross-region cross-scoring collaborative filtering recommendation method and system | |
CN108805280B (en) | Image retrieval method and device | |
CN117036760A (en) | Multi-view clustering model implementation method based on graph comparison learning | |
CN116959588A (en) | Biochemical passage crosstalk identification method | |
CN114037014A (en) | Reference network clustering method based on graph self-encoder | |
CN117408336A (en) | Entity alignment method for structure and attribute attention mechanism | |
CN114880538A (en) | Attribute graph community detection method based on self-supervision | |
CN115691817A (en) | LncRNA-disease association prediction method based on fusion neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |