
CN113887698B - Integral knowledge distillation method and system based on graph neural network - Google Patents


Info

Publication number
CN113887698B
CN113887698B (application CN202110982472.1A)
Authority
CN
China
Prior art keywords
graph
knowledge
model
attribute
teacher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110982472.1A
Other languages
Chinese (zh)
Other versions
CN113887698A (en)
Inventor
周晟
仉鹏
卜佳俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110982472.1A priority Critical patent/CN113887698B/en
Publication of CN113887698A publication Critical patent/CN113887698A/en
Application granted granted Critical
Publication of CN113887698B publication Critical patent/CN113887698B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a holistic knowledge distillation method based on a graph neural network, comprising the following steps: given the feature representations and classification predictions learned by the teacher and student networks, treat each sample as a node, use the network-learned features as node attributes, use the K-nearest-neighbor (KNN) relationships among the classification predictions as edges, and construct an attribute graph for each network; aggregate the node attributes and topology information of neighborhood samples in the attribute graph with a topology-adaptive graph convolutional neural network to extract the holistic knowledge, expressed as a unified graph-based embedding vector; use InfoNCE to estimate and maximize the mutual information between the graph embeddings of the student network and the teacher network, and accelerate training with a feature memory bank. The method can integrate the individual-level knowledge and the relation-level knowledge in the teacher network at the same time, so that the student network learns holistic knowledge and its performance improves.

Description

Integral knowledge distillation method and system based on graph neural network
Technical Field
The invention relates to the field of deep learning and computer vision, and in particular to a holistic knowledge distillation method and system based on a graph neural network.
Background
Deep Neural Networks (DNNs) have enjoyed great success in a variety of applications. However, their success depends to a large extent on computing and storage resources that embedded and mobile devices often lack. To reduce cost while maintaining satisfactory results, model compression techniques have become a hot research topic. Knowledge distillation is one such technique: it transfers knowledge from a large, well-trained teacher network to a small student network, improving the student network's accuracy while preserving its small size and fast inference.
Knowledge extracted from the teacher network plays a central role in knowledge distillation. Existing knowledge distillation methods fall into two categories according to the type of knowledge extracted: individual-level knowledge distillation and relation-level knowledge distillation. Individual-level knowledge distillation extracts knowledge from each data instance independently using the teacher network and provides richer supervision than discrete labels, including probabilistic outputs (logits), feature representations and feature maps. Relation-level knowledge distillation extracts knowledge from the relationships between paired samples, and trains the student network so that these relationships are preserved between teacher and student networks of different architectures.
Although both kinds of knowledge distillation have been successful, existing methods employ the two techniques independently and ignore their inherent relevance, which is especially limiting when the teacher network's capacity is restricted and one type of knowledge extracted in isolation is insufficient for the student network to learn from. Intuitively, individual-level knowledge and relation-level knowledge can be seen as two naturally correlated views of the same teacher network: two similar instances often have similar individual features under similar relational patterns, and exploiting this knowledge is critical for training a more discriminative student network. Integrating individual-level knowledge with the related relation-level knowledge while preserving their inherent consistency is therefore of great importance for knowledge distillation.
Disclosure of Invention
The present invention has been made to overcome the above-mentioned drawbacks of the prior art, and provides a holistic knowledge distillation (Holistic Knowledge Distillation, HKD) method and system based on a graph neural network that can integrate individual-level knowledge and relation-level knowledge at the same time.
The invention aims at realizing the following technical scheme:
an overall knowledge distillation method based on a graph neural network, comprising:
Step 1: construct attribute graphs for the teacher model and the student model respectively. Input the images into the teacher model and the student model to obtain the feature representations f_t and f_s (where d_t and d_s are the dimensions of the feature representations output by the teacher model and the student model respectively) and the classification predictions p_t and p_s, then construct an attribute graph G_t = {A_t, F_t} for the teacher model and G_s = {A_s, F_s} for the student model, where each node represents an instance and the node attributes are the learned feature representations. A_t and A_s are the adjacency matrices of the attribute graphs, constructed from p_t and p_s as A_t = KNN(p_t) and A_s = KNN(p_s), where KNN(·) is a K-nearest-neighbor (K nearest neighbors, KNN) graph construction function. Throughout training, G_t is fixed while the attributes and structure of G_s change dynamically.
The attribute graph defined above has the following characteristics. First, compared with the fully connected inter-instance graph used by prior relation-level knowledge distillation methods, the KNN graph filters out the least relevant sample pairs; this matters because only a few samples in a randomly sampled batch are truly correlated and provide sufficient information for learning node representations. Second, since edges are built from predicted probabilities, the graph can model both inter-class and intra-class information. Finally, a graph neural network can jointly and efficiently extract individual-level knowledge and relation-level knowledge from the attribute graph.
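To make step 1 concrete, here is a minimal PyTorch-style sketch of how such prediction-based KNN attribute graphs could be built for a mini-batch. The function name knn_graph, the choice k = 8 and the use of cosine similarity over predictions are illustrative assumptions, not details taken from the patent.

```python
# Illustrative sketch (not the patent's reference implementation): building the
# teacher/student attribute graphs described in step 1.
import torch
import torch.nn.functional as F

def knn_graph(probs: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Build a KNN adjacency matrix from classification predictions.

    probs: (N, C) softmax predictions for a mini-batch of N samples.
    Returns a dense (N, N) 0/1 adjacency matrix linking each node to the k
    samples whose predictions are most similar to its own.
    """
    p = F.normalize(probs, dim=1)              # cosine similarity over predictions
    sim = p @ p.t()                            # (N, N) pairwise similarity
    sim.fill_diagonal_(float('-inf'))          # keep self-loops out of the top-k
    idx = sim.topk(k, dim=1).indices           # k nearest neighbours per node
    adj = torch.zeros_like(sim)
    adj.scatter_(1, idx, 1.0)
    return ((adj + adj.t()) > 0).float()       # symmetrise the graph

# Attribute graphs G_t = {A_t, F_t} and G_s = {A_s, F_s}:
# node attributes are the backbone features, edges come from prediction KNN.
features_t, probs_t = torch.randn(64, 512), torch.softmax(torch.randn(64, 100), dim=1)
features_s, probs_s = torch.randn(64, 128), torch.softmax(torch.randn(64, 100), dim=1)
A_t, F_t = knn_graph(probs_t), features_t     # fixed during training
A_s, F_s = knn_graph(probs_s), features_s     # changes every step with the student
```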
Step 2: aggregate the node attributes and topology information of neighborhood samples in the attribute graph using a Topology Adaptive Graph Convolutional Network (TAGCN) to extract the holistic knowledge, thereby obtaining a graph-based embedding vector.
As shown in formulas (1) and (2) below, TAGCN learns the attribute information and the topology information of the nodes simultaneously to extract the holistic knowledge, yielding the graph-based representations H_t of the teacher model and H_s of the student model.
where Θ_s^l and Θ_t^l are learnable parameters, g_t and g_s are the dimensions of the graph-based representations, and D_t and D_s are the diagonal degree matrices of the attribute graphs, as shown in formula (3).
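Formulas (1)-(3) are rendered as images in the original text. Assuming the standard topology-adaptive graph convolution, a plausible form of the aggregation they describe is the following; the exact expressions in the patent may differ.

```latex
% Assumed (standard) TAGCN aggregation corresponding to formulas (1)-(3);
% the patent's exact expressions are not reproduced here and may differ.
H_t = \sigma\!\Big(\sum_{k=0}^{K}\big(D_t^{-1/2} A_t D_t^{-1/2}\big)^{k} F_t\,\Theta_t^{(k)}\Big), \quad
H_s = \sigma\!\Big(\sum_{k=0}^{K}\big(D_s^{-1/2} A_s D_s^{-1/2}\big)^{k} F_s\,\Theta_s^{(k)}\Big), \quad
(D_t)_{ii} = \sum_{j}(A_t)_{ij}, \;\; (D_s)_{ii} = \sum_{j}(A_s)_{ij}
```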
Step 3: use InfoNCE to estimate and maximize the mutual information between the graph embeddings of the student model and the teacher model, and accelerate training with a feature memory bank.
To let the student model learn as much of the teacher model's holistic knowledge as possible, the similarity between H_t and H_s should be maximized. Many existing vector-wise similarity metrics (e.g., cosine similarity, Euclidean distance) are not suitable for distilling holistic knowledge: they are limited by the gap in representational capacity caused by the structural differences between the teacher and student models, and directly aligning H_t and H_s may lead to overly fine-grained knowledge being learned. To overcome this limitation, mutual information is used to measure how well the student model distills information from the teacher model, i.e., the mutual information between H_t and H_s is maximized, as shown in formula (4) below.
Here I(·,·) denotes the mutual information between two random variables. Inspired by recent work on mutual information estimation, InfoNCE is adopted to estimate it; the relationship between InfoNCE and the mutual information is shown in formula (5):
where f(·) is a vector-wise similarity function and h_t^i, h_s^i are the graph-based representations of sample i learned by the teacher and student models.
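Formula (5) is likewise not reproduced in this text. The standard InfoNCE lower bound on mutual information, which this passage appears to invoke, is shown below; N denotes the number of samples and f(·,·) the similarity critic.

```latex
% Standard InfoNCE bound assumed to correspond to formula (5).
I(H_t; H_s) \;\ge\; \log N + \frac{1}{N}\sum_{i=1}^{N}
\log \frac{f\big(h_t^{i}, h_s^{i}\big)}{\sum_{j=1}^{N} f\big(h_t^{i}, h_s^{j}\big)}
\;=\; \log N - \mathcal{L}_{\mathrm{InfoNCE}}
```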
Besides learning from the teacher model, the student model also needs to learn from the data itself (e.g., the label information); cross entropy is the usual loss function for classification, and the final loss function is shown in formula (6) below.
Where β is the weight of the linear combination.
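Formula (6) is also omitted from the text. Given the description (cross entropy plus a holistic distillation term weighted by β), a consistent form would be the following assumed combination:

```latex
% Assumed form of the overall objective referenced as formula (6):
% cross entropy on the student's predictions plus the holistic distillation term.
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{CE}}(p_s, y) \;+\; \beta\,\mathcal{L}_{\mathrm{HKD}},
\qquad \mathcal{L}_{\mathrm{HKD}} \approx -\,I(H_t; H_s)\ \text{(estimated with InfoNCE)}
```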
Because InfoNCE requires all samples in the dataset as negatives, computing the holistic distillation loss exactly is prohibitively expensive for larger datasets. To avoid recomputing sample representations during training, feature memory banks are widely used. In this embodiment, G_t and G_s are constructed within each randomly sampled mini-batch; since the holistic knowledge reflected by the graph-based representations H_t and H_s lives in attribute graphs that differ from batch to batch, H_t and H_s themselves cannot be cached, so the feature memory bank is used only to store F_t and F_s respectively. The final approximated holistic distillation loss is defined in formula (7):
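As a rough illustration of the feature memory bank idea, the sketch below caches only the backbone features F_t and F_s across the dataset so that negatives for InfoNCE can be drawn cheaply. The class name, momentum update and bank sizes are assumptions for the example, not the patent's implementation.

```python
# Illustrative feature memory bank: stores per-sample backbone features so that
# InfoNCE negatives can be drawn without recomputing the whole dataset.
import torch
import torch.nn.functional as F

class FeatureMemory:
    def __init__(self, num_samples: int, dim: int, momentum: float = 0.5):
        self.bank = F.normalize(torch.randn(num_samples, dim), dim=1)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, indices: torch.Tensor, feats: torch.Tensor) -> None:
        # Exponential moving average update of the cached features.
        new = F.normalize(feats, dim=1)
        mixed = self.momentum * self.bank[indices] + (1 - self.momentum) * new
        self.bank[indices] = F.normalize(mixed, dim=1)

    def sample_negatives(self, num_neg: int) -> torch.Tensor:
        idx = torch.randint(0, self.bank.size(0), (num_neg,))
        return self.bank[idx]

# One bank per network, mirroring the text's separate storage of F_t and F_s.
memory_t = FeatureMemory(num_samples=50000, dim=512)
memory_s = FeatureMemory(num_samples=50000, dim=128)
```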
In step 3, InfoNCE is used to maximize the mutual information between the graph embeddings of the student model and the teacher model; the relationship between InfoNCE and the mutual information is shown in formula (8):
where f(·) is a vector-wise similarity function and h_t^i, h_s^i are the graph-based representations of sample i learned by the teacher and student models.
The system for implementing the above holistic knowledge distillation method based on a graph neural network comprises, connected in sequence, an attribute graph construction module for the teacher model and the student model, a holistic knowledge extraction module, and a module that maximizes the mutual information between the graph embeddings of the student model and the teacher model.
The working principle of the invention is as follows: on top of the individual-level and relation-level knowledge of the teacher model, the graph neural network further extracts holistic knowledge from the teacher model, and the student model learns this holistic knowledge, so the performance gain is more pronounced than with other knowledge distillation methods.
The invention has the advantage that the student model learns not only individual-level and relation-level knowledge from the teacher model but also more complex holistic knowledge, so its performance improvement is more pronounced than with other knowledge distillation methods.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a classification of a knowledge distillation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an overall knowledge distillation method based on a graph neural network according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of an overall knowledge distillation method based on a graph neural network according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides a holistic knowledge distillation method based on a graph neural network that can distill individual-level knowledge and relation-level knowledge at the same time.
In the knowledge distillation framework provided by the embodiment of the invention, given the feature representations and classification predictions learned by a teacher model and a student model, an attribute graph is first constructed for each model, where each node represents an instance, the node attributes are the learned feature representations, and the edges between instances are constructed from the K-nearest-neighbor (KNN) relationships among the classification predictions.
The node attributes of neighborhood samples in the attribute graph are aggregated using a Topology Adaptive Graph Convolutional Network (TAGCN) to extract the holistic knowledge, represented as a unified graph-based embedding vector. InfoNCE is used to estimate and maximize the mutual information between the graph embeddings of the teacher model and the student model, and a feature memory bank accelerates training. By distilling the complex knowledge in the teacher model's attribute graph, the student model's performance improves more markedly than with current knowledge distillation methods.
Specifically, as shown in Fig. 3, the holistic knowledge distillation method based on a graph neural network provided by the embodiment mainly includes the following steps.
Step 1: construct attribute graphs for the teacher model and the student model respectively. Input the images into the teacher model and the student model to obtain the feature representations f_t and f_s (where d_t and d_s are the dimensions of the feature representations output by the teacher model and the student model respectively) and the classification predictions p_t and p_s, then construct an attribute graph G_t = {A_t, F_t} for the teacher model and G_s = {A_s, F_s} for the student model, where each node represents an instance and the node attributes are the learned feature representations. A_t and A_s are the adjacency matrices of the attribute graphs, constructed from p_t and p_s as A_t = KNN(p_t) and A_s = KNN(p_s), where KNN(·) is a KNN graph construction function. Throughout training, G_t is fixed while the attributes and structure of G_s change dynamically.
The attribute graph defined above has the following characteristics. First, compared with the fully connected inter-instance graph used by prior relation-level knowledge distillation methods, the KNN graph filters out the least relevant sample pairs; this matters because only a few samples in a randomly sampled batch are truly correlated and provide sufficient information for learning node representations. Second, since edges are built from predicted probabilities, the graph can model both inter-class and intra-class information. Finally, a graph neural network can jointly and efficiently extract individual-level knowledge and relation-level knowledge from the attribute graph.
Step 2: aggregate the node attributes and topology information of neighborhood samples in the attribute graph using a Topology Adaptive Graph Convolutional Network (TAGCN) to extract the holistic knowledge, thereby obtaining a graph-based embedding vector.
As shown in formulas (1) and (2) below, TAGCN learns the attribute information and the topology information of the nodes simultaneously to extract the holistic knowledge, yielding the graph-based representations H_t of the teacher model and H_s of the student model.
where Θ_s^l and Θ_t^l are learnable parameters, g_t and g_s are the dimensions of the graph-based representations, and D_t and D_s are the diagonal degree matrices of the attribute graphs, as shown in formula (3).
Step 3: use InfoNCE to estimate and maximize the mutual information between the graph embeddings of the student model and the teacher model, and accelerate training with a feature memory bank.
To let the student model learn as much of the teacher model's holistic knowledge as possible, the similarity between H_t and H_s should be maximized. Many existing vector-wise similarity metrics (e.g., cosine similarity, Euclidean distance) are not suitable for distilling holistic knowledge: they are limited by the gap in representational capacity caused by the structural differences between the teacher and student models, and directly aligning H_t and H_s may lead to overly fine-grained knowledge being learned. To overcome this limitation, mutual information is used to measure how well the student model distills information from the teacher model, i.e., the mutual information between H_t and H_s is maximized, as shown in formula (4) below.
Here I(·,·) denotes the mutual information between two random variables. Inspired by recent work on mutual information estimation, InfoNCE is adopted to estimate it; the relationship between InfoNCE and the mutual information is shown in formula (5):
where f(·) is a vector-wise similarity function and h_t^i, h_s^i are the graph-based representations of sample i learned by the teacher and student models.
Besides learning from the teacher model, the student model also needs to learn from the data itself (e.g., the label information); cross entropy is the usual loss function for classification, and the final loss function is shown in formula (6) below.
Where β is the weight of the linear combination.
Because InfoNCE requires all samples in the dataset as negatives, computing the holistic distillation loss exactly is prohibitively expensive for larger datasets. To avoid recomputing sample representations during training, feature memory banks are widely used. In this embodiment, G_t and G_s are constructed within each randomly sampled mini-batch; since the holistic knowledge reflected by the graph-based representations H_t and H_s lives in attribute graphs that differ from batch to batch, H_t and H_s themselves cannot be cached, so the feature memory bank is used only to store F_t and F_s respectively. The final approximated holistic distillation loss is defined in formula (7):
In step 3, InfoNCE is used to maximize the mutual information between the graph embeddings of the student model and the teacher model; the relationship between InfoNCE and the mutual information is shown in formula (8):
where f(·) is a vector-wise similarity function and h_t^i, h_s^i are the graph-based representations of sample i learned by the teacher and student models.
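To show how the pieces of step 3 fit together, the following sketch computes an InfoNCE-style holistic distillation loss over the per-batch graph embeddings. The temperature-scaled dot-product critic and all variable names are assumptions for illustration, not taken from the patent.

```python
# Illustrative holistic distillation loss: InfoNCE over the graph-based
# embeddings h_t, h_s of the same mini-batch (positives on the diagonal).
import torch
import torch.nn.functional as F

def holistic_distillation_loss(h_t: torch.Tensor,
                               h_s: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """h_t: (N, g) teacher graph embeddings (detached, teacher is frozen).
    h_s: (N, g) student graph embeddings, assumed to share the dimension g."""
    h_t = F.normalize(h_t.detach(), dim=1)
    h_s = F.normalize(h_s, dim=1)
    logits = h_s @ h_t.t() / temperature          # (N, N) similarity critic f(.,.)
    targets = torch.arange(h_s.size(0), device=h_s.device)
    # Cross entropy with the matching teacher embedding as the positive class
    # is the usual InfoNCE estimator of the mutual-information lower bound.
    return F.cross_entropy(logits, targets)
```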
According to the scheme provided by the embodiment of the invention, the knowledge distillation method can enable the student model to learn the integral knowledge of the teacher model more effectively, and compared with other knowledge distillation methods, the performance of the student model is improved more obviously.
The system for implementing the above holistic knowledge distillation method comprises, connected in sequence, an attribute graph construction module for the teacher model and the student model, a holistic knowledge extraction module, and a module that maximizes the mutual information between the graph embeddings of the student model and the teacher model; these three modules correspond to the contents of steps 1, 2 and 3 respectively.
To illustrate the effect of the above scheme, experiments were conducted on classical image classification datasets and are described below.
1. Experimental data set
The experiments involve two benchmark datasets, summarized in the following table:
Data set        Category number   Training set size   Test set size   Image size
Tiny-ImageNet   200               100000              10000           224*224*3
Cifar-100       20                50000               10000           32*32*3
2. Model structure
Four architectures are used for the teacher and student models: ResNet, VGG, ShuffleNet and MobileNet, all of which are network structures familiar to researchers in the field and are not described in detail here.
3. Baseline method
To demonstrate the superiority of this method, the embodiment compares it with recent knowledge distillation methods, which fall into the two categories shown in the knowledge distillation classification diagram (Fig. 1).
Specifically:
The first category comprises individual-level knowledge distillation methods: vanilla KD (learning logits), AT (learning attention maps), PKT (learning feature representations), CRD and SSKD.
The second category comprises relation-level knowledge distillation methods that learn pairwise relational knowledge, including RKD and CCKD.
All comparison methods are reproduced using the authors' open-source code; to keep the training samples consistent across methods, the data augmentation in the SSKD code is removed.
4. Conclusion of the experiment
The embodiment of the invention and the comparison methods are trained with different combinations of teacher models and student models. The performance on the CIFAR-100 dataset is reported in the corresponding results table (the evaluation metric is accuracy).
The performance on the Tiny-ImageNet dataset is reported in the corresponding results table (the evaluation metric is accuracy).
The comparative experiments on both datasets show that the present method (HKD) performs significantly better than the comparison methods across different teacher-student combinations.
From the description of the above embodiments, it will be apparent to those skilled in the art that the above embodiments may be implemented in software, or may be implemented by means of software plus a necessary general hardware platform. With such understanding, the technical solutions of the foregoing embodiments may be embodied in a software product, where the software product may be stored in a nonvolatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and include several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to perform the methods of the embodiments of the present invention.
The invention is applicable to image classification. Deep Neural Networks (DNNs) have enjoyed great success in a variety of applications, but this success largely depends on computing and storage resources that embedded and mobile devices often lack; that is, under limited computational resources, large models with high resource requirements cannot be deployed. To keep small, deployable models satisfactory under such conditions, knowledge distillation methods (learning methods in which a student model learns both the knowledge already learned by a teacher model and the task knowledge itself) are widely studied.
An image classification method to which the method of the present invention is applied is described as follows:
1. First, prepare the dataset required for the image classification model; dataset preparation is well known to those skilled in the art and is not described here.
2. Train a teacher model with larger capacity and strong performance, using cross entropy as the loss function.
After the teacher model is trained, its parameters are not updated during subsequent training. The student model training procedure, the core of the method, is described below.
3. Train the student model:
Step 3.1: construct attribute graphs for the teacher model and the student model respectively. Input the images into the teacher model and the student model to obtain the feature representations f_t and f_s (where d_t and d_s are the dimensions of the feature representations output by the teacher model and the student model respectively) and the classification predictions p_t and p_s, then construct an attribute graph G_t = {A_t, F_t} for the teacher model and G_s = {A_s, F_s} for the student model, where each node represents an instance and the node attributes are the learned feature representations. A_t and A_s are the adjacency matrices of the attribute graphs, constructed from p_t and p_s as A_t = KNN(p_t) and A_s = KNN(p_s), where KNN(·) is a KNN graph construction function. Throughout training, G_t is fixed while the attributes and structure of G_s change dynamically.
Step 3.2: aggregate the node attributes and topology information of neighborhood samples in the attribute graph using a Topology Adaptive Graph Convolutional Network (TAGCN) to extract the holistic knowledge, thereby obtaining a graph-based embedding vector.
As shown in formulas (1) and (2) below, TAGCN learns the attribute information and the topology information of the nodes simultaneously to extract the holistic knowledge, yielding the graph-based representations H_t of the teacher model and H_s of the student model.
where Θ_s^l and Θ_t^l are learnable parameters, g_t and g_s are the dimensions of the graph-based representations, and D_t and D_s are the diagonal degree matrices of the attribute graphs, as shown in formula (3).
Step 3.3: use InfoNCE to estimate and maximize the mutual information between the graph embeddings of the student model and the teacher model, and accelerate training with a feature memory bank.
To let the student model learn as much of the teacher model's holistic knowledge as possible, the similarity between H_t and H_s should be maximized. Many existing vector-wise similarity metrics (e.g., cosine similarity, Euclidean distance) are not suitable for distilling holistic knowledge: they are limited by the gap in representational capacity caused by the structural differences between the teacher and student models, and directly aligning H_t and H_s may lead to overly fine-grained knowledge being learned. To overcome this limitation, mutual information is used to measure how well the student model distills information from the teacher model, i.e., the mutual information between H_t and H_s is maximized, as shown in formula (4) below.
Here I(·,·) denotes the mutual information between two random variables. Inspired by recent work on mutual information estimation, InfoNCE is adopted to estimate it; the relationship between InfoNCE and the mutual information is shown in formula (5):
where f(·) is a vector-wise similarity function and h_t^i, h_s^i are the graph-based representations of sample i learned by the teacher and student models.
Besides learning from the teacher model, the student model also needs to learn from the data itself (e.g., the label information); cross entropy is the usual loss function for classification, and the final loss function is shown in formula (6) below.
Where β is the weight of the linear combination.
Because InfoNCE requires all samples in the dataset as negatives, computing the holistic distillation loss exactly is prohibitively expensive for larger datasets. To avoid recomputing sample representations during training, feature memory banks are widely used. In this embodiment, G_t and G_s are constructed within each randomly sampled mini-batch; since the holistic knowledge reflected by the graph-based representations H_t and H_s lives in attribute graphs that differ from batch to batch, H_t and H_s themselves cannot be cached, so the feature memory bank is used only to store F_t and F_s respectively. The final approximated holistic distillation loss is defined in formula (7). A condensed training-loop sketch combining these steps is given below.
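Putting steps 3.1-3.3 together, the following sketch shows one possible training loop; the model interfaces, the helpers knn_graph and holistic_distillation_loss from the earlier sketches, and all hyperparameters are illustrative assumptions rather than the patent's reference implementation.

```python
# Illustrative end-to-end student training loop (step 3): per-batch attribute
# graphs, TAGCN embeddings, and cross entropy plus the holistic distillation term.
# `knn_graph` and `holistic_distillation_loss` refer to the earlier sketches;
# the teacher and student are assumed to return (features, logits).
import torch
import torch.nn.functional as F

def train_student(student, teacher, tagcn_s, tagcn_t, loader, optimizer,
                  beta: float = 1.0, epochs: int = 240):
    teacher.eval()                                   # teacher stays frozen (see step 2)
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                f_t, logits_t = teacher(images)      # teacher features + logits
            f_s, logits_s = student(images)

            # Step 3.1: prediction-based KNN attribute graphs for this mini-batch.
            A_t = knn_graph(logits_t.softmax(dim=1))
            A_s = knn_graph(logits_s.softmax(dim=1))

            # Step 3.2: TAGCN aggregation -> graph-based embeddings H_t, H_s.
            h_t = tagcn_t(f_t, A_t)
            h_s = tagcn_s(f_s, A_s)

            # Step 3.3: cross entropy on labels + InfoNCE-style holistic loss.
            loss = F.cross_entropy(logits_s, labels) \
                   + beta * holistic_distillation_loss(h_t, h_s)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```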
4. Deploy the model: deploy the trained student model in the target scenario. In this way, a better-performing student model can classify images under limited computational resources.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (2)

1. An overall knowledge distillation method based on a graph neural network applied to image classification comprises the following steps:
Step 1, respectively constructing attribute graphs of a teacher model and a student model: inputting the images into the teacher model and the student model to obtain feature representations f_t, f_s and classification predictions p_t, p_s, wherein d_t and d_s are the dimensions of the feature representations output by the teacher model and the student model respectively; then respectively constructing attribute graphs G_t = {A_t, F_t} and G_s = {A_s, F_s} for the teacher model and the student model, wherein each node represents an instance and the node attributes represent the learned feature representations, and A_t, A_s are adjacency matrices of the attribute graphs constructed from p_t, p_s as A_t = KNN(p_t) and A_s = KNN(p_s), wherein KNN(·) is a K-nearest-neighbor (K nearest neighbors, KNN) graph construction function;
in the whole training process, G_t is fixed while the attributes and structure of G_s change dynamically;
the attribute graph defined above has the following characteristics: first, compared with the fully connected inter-instance graph constructed by prior relation-level knowledge distillation methods, the KNN graph filters out the least relevant sample pairs; second, since edges are constructed from predicted probabilities, the graph can model inter-class and intra-class information; finally, individual-level knowledge and relation-level knowledge can be extracted jointly and efficiently from the attribute graph using the graph neural network;
Step 2, aggregating the node attributes and topology information of neighborhood samples in the attribute graph with a graph convolutional neural network to extract the holistic knowledge, thereby obtaining a graph-based embedding vector;
as shown in formulas (1) and (2) below, the attribute information and topology information of the nodes are learned simultaneously using a Topology Adaptive Graph Convolutional Network (TAGCN) to extract the holistic knowledge, obtaining the graph-based representations H_t of the teacher model and H_s of the student model;
wherein Θ_s^l and Θ_t^l are learnable parameters, g_t and g_s are the dimensions of the graph-based representations, and D_t and D_s are the diagonal degree matrices of the attribute graphs, as shown in formula (3);
Step 3, maximizing the mutual information between the graph embeddings of the student model and the teacher model, and accelerating training with a feature memory bank;
mutual information is used to measure how well the student model distills information from the teacher model, i.e., the mutual information between H_t and H_s is maximized, as shown in formula (4) below:
I(·,·) denotes the mutual information between two random variables; InfoNCE is used to estimate the mutual information, and the relationship between InfoNCE and the mutual information is shown in formula (5):
wherein f(·) is a vector-wise similarity function and h_t^i, h_s^i are the graph-based representations of sample i learned by the teacher and student models;
the student model needs to learn from the data itself in addition to learning the teacher model's knowledge, and the final loss function is shown in formula (6):
Where β is the weight of the linear combination;
G_t and G_s are respectively constructed within a randomly selected mini-batch; the holistic knowledge reflected by the graph-based representations H_t and H_s is presented in different attribute graphs, and the feature memory bank is used only to store F_t and F_s respectively; the final approximated holistic distillation loss is defined as shown in formula (7):
2. The holistic knowledge distillation method based on a graph neural network applied to image classification according to claim 1, wherein in step 3 the mutual information between the graph embeddings of the student model and the teacher model is maximized using InfoNCE, and the relationship between InfoNCE and the mutual information is shown in formula (8):
where f(·) is a vector-wise similarity function and h_t^i, h_s^i are the graph-based representations of sample i learned by the teacher and student models.
CN202110982472.1A 2021-08-25 2021-08-25 Integral knowledge distillation method and system based on graph neural network Active CN113887698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110982472.1A CN113887698B (en) 2021-08-25 2021-08-25 Integral knowledge distillation method and system based on graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110982472.1A CN113887698B (en) 2021-08-25 2021-08-25 Integral knowledge distillation method and system based on graph neural network

Publications (2)

Publication Number Publication Date
CN113887698A CN113887698A (en) 2022-01-04
CN113887698B true CN113887698B (en) 2024-06-14

Family

ID=79011512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110982472.1A Active CN113887698B (en) 2021-08-25 2021-08-25 Integral knowledge distillation method and system based on graph neural network

Country Status (1)

Country Link
CN (1) CN113887698B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115101119B (en) * 2022-06-27 2024-05-17 山东大学 Isochrom function prediction system based on network embedding
CN117058437B (en) * 2023-06-16 2024-03-08 江苏大学 Flower classification method, system, equipment and medium based on knowledge distillation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116030A (en) * 2020-10-13 2020-12-22 浙江大学 Image classification method based on vector standardization and knowledge distillation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3076424A1 (en) * 2019-03-22 2020-09-22 Royal Bank Of Canada System and method for knowledge distillation between neural networks
CN110472730A (en) * 2019-08-07 2019-11-19 交叉信息核心技术研究院(西安)有限公司 A kind of distillation training method and the scalable dynamic prediction method certainly of convolutional neural networks
CN112861936B (en) * 2021-01-26 2023-06-02 北京邮电大学 Graph node classification method and device based on graph neural network knowledge distillation
CN113095480A (en) * 2021-03-24 2021-07-09 重庆邮电大学 Interpretable graph neural network representation method based on knowledge distillation

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116030A (en) * 2020-10-13 2020-12-22 浙江大学 Image classification method based on vector standardization and knowledge distillation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Face recognition based on deep feature distillation; Ge Shiming, Zhao Shengwei, Liu Wenyu, Li Chenyu; Journal of Beijing Jiaotong University; 2017-12-15 (No. 06); full text *

Also Published As

Publication number Publication date
CN113887698A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
CN110942091B (en) Semi-supervised few-sample image classification method for searching reliable abnormal data center
CN110110100A (en) Across the media Hash search methods of discrete supervision decomposed based on Harmonious Matrix
CN109063112B (en) Rapid image retrieval method, model and model construction method based on multitask learning deep semantic hash
CN111950594A (en) Unsupervised graph representation learning method and unsupervised graph representation learning device on large-scale attribute graph based on sub-graph sampling
CN111310074B (en) Method and device for optimizing labels of interest points, electronic equipment and computer readable medium
CN109753589A (en) A kind of figure method for visualizing based on figure convolutional network
CN113255892B (en) Decoupled network structure searching method, device and readable storage medium
CN113887698B (en) Integral knowledge distillation method and system based on graph neural network
CN114299362B (en) A small sample image classification method based on k-means clustering
CN117992805B (en) Zero sample cross-modal retrieval method and system based on tensor product graph fusion diffusion
CN115293919A (en) Graph neural network prediction method and system oriented to social network distribution generalization
CN115761275A (en) Unsupervised community discovery method and system based on graph neural network
CN116304367B (en) Algorithm and device for obtaining communities based on graph self-encoder self-supervision training
CN111241326A (en) Image visual relation referring and positioning method based on attention pyramid network
CN117349494A (en) Graph classification method, system, medium and equipment for space graph convolution neural network
CN112115971B (en) Method and system for carrying out student portrait based on heterogeneous academic network
CN109815335A (en) A kind of paper domain classification method suitable for document network
CN113434815A (en) Community detection method based on similar and dissimilar constraint semi-supervised nonnegative matrix factorization
CN113515519A (en) Method, device and equipment for training graph structure estimation model and storage medium
Saha et al. Novel randomized feature selection algorithms
CN111914108A (en) Discrete supervision cross-modal Hash retrieval method based on semantic preservation
CN115408531A (en) Knowledge graph reasoning method with induction capability
Zhang et al. Color clustering using self-organizing maps
Dennis et al. Autoencoder-enhanced sum-product networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant