CN112749737A - Image classification method and device, electronic equipment and storage medium - Google Patents
Image classification method and device, electronic equipment and storage medium
- Publication number
- CN112749737A (Application No. CN202011614525.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- tested
- graph
- network
- classifiers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 65
- 239000013598 vector Substances 0.000 claims abstract description 74
- 230000000007 visual effect Effects 0.000 claims abstract description 59
- 238000004364 calculation method Methods 0.000 claims abstract description 36
- 238000000605 extraction Methods 0.000 claims abstract description 33
- 238000012549 training Methods 0.000 claims description 30
- 238000013528 artificial neural network Methods 0.000 claims description 24
- 230000015654 memory Effects 0.000 claims description 14
- 239000011159 matrix material Substances 0.000 claims description 12
- 238000004590 computer program Methods 0.000 claims description 6
- 238000010586 diagram Methods 0.000 description 12
- 238000012360 testing method Methods 0.000 description 11
- 230000008569 process Effects 0.000 description 9
- 238000012545 processing Methods 0.000 description 7
- 230000006870 function Effects 0.000 description 6
- 239000010410 layer Substances 0.000 description 6
- 238000013507 mapping Methods 0.000 description 6
- 230000007246 mechanism Effects 0.000 description 6
- 238000004422 calculation algorithm Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 3
- 238000012546 transfer Methods 0.000 description 3
- 230000009466 transformation Effects 0.000 description 3
- 238000011161 development Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000013508 migration Methods 0.000 description 2
- 230000005012 migration Effects 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 239000002356 single layer Substances 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The application provides an image classification method and device, electronic equipment and a storage medium, wherein the method comprises the following steps: inputting an image to be tested into a trained feature extraction model to obtain a visual feature vector output by the feature extraction model; performing dot product calculation on the visual feature vector and the weight vectors of different classifiers to obtain the similarity between the image to be tested and the classes corresponding to the different classifiers, wherein the weight vectors of the classifiers are obtained by calculation through a trained graph convolution network model, and the graph convolution network model is generated based on a relation-enhanced knowledge graph and a graph attention mechanism; and determining the category of the image to be tested according to the similarity between the image to be tested and the classes corresponding to the different classifiers. With this technical scheme, classification of images can be realized.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image classification method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of deep learning technology in recent years, the performance of methods based on supervised learning has improved greatly; in the image recognition direction in particular, their accuracy exceeds human recognition capability. To achieve this classification accuracy, however, a large number of labeled training samples must be provided for each target class in the dataset. Because a classifier learned by a supervised method cannot be migrated well to image sets of other categories, a new labeled dataset, which consumes a large amount of manpower, has to be produced for every new scene, and this requirement for labeled datasets greatly hinders the development of supervised learning.
At present, the GCNZ and GPM models are the main techniques. Both models use word embedding vectors and achieve knowledge transfer from seen categories to unseen categories by constructing a knowledge graph based on the inter-class relations.
However, in the prior art, the way the knowledge graph is constructed is still too limited. Both GCNZ and GPM exploit only the hypernym-hyponym relations between categories in the WordNet relation network: a relation established through the parent-child topology does not directly connect the two most similar categories, but only relates them indirectly through their common parent node. Moreover, when similarity is calculated, the weight is set by the path length between two nodes, the relations between sibling nodes are not considered, and the similarity between nodes is not necessarily negatively correlated with their distance.
Zero-shot learning is intended to solve learning tasks that lack labeled data, but existing zero-shot image classification methods still cannot effectively exploit rich external prior knowledge.
Disclosure of Invention
An embodiment of the present application provides an image classification method and apparatus, an electronic device, and a storage medium, which are used to realize the transfer of semantic knowledge between classes and thereby address the zero-shot learning problem.
A first aspect of an embodiment of the present application provides an image classification method, where the method includes:
inputting an image to be tested into a trained feature extraction model to obtain a visual feature vector output by the feature extraction model;
performing dot product calculation on the visual feature vector and weight vectors of different classifiers to obtain similarity between the image to be tested and corresponding classes of the different classifiers; wherein the weight vectors of the plurality of classifiers are calculated through a trained graph convolution network model, and the graph convolution network model is generated based on a relation-enhanced knowledge graph and a graph attention mechanism;
and determining the category to which the image to be tested belongs according to the similarity between the image to be tested and the corresponding categories of different classifiers.
In an embodiment, before the dot product calculation of the visual feature vector with the weight vectors of different classifiers, the method further comprises:
acquiring an image set to be tested;
extracting a visual feature vector of each image to be tested in the image set to be tested;
and inputting the visual feature vector of the image to be tested into the trained graph convolution network model to obtain classifiers of multiple categories.
In one embodiment, before the inputting the visual feature vector of the image to be tested into the trained graph convolution network model, the method further comprises:
acquiring a first deep neural network and a second deep neural network;
superposing the first deep neural network and the second deep neural network to obtain the graph convolution network;
and training the graph convolution network to obtain the graph convolution network model.
In one embodiment, the first deep neural network is obtained by:
acquiring a relation enhanced knowledge graph;
and constructing the first graph convolution network according to the relation enhanced knowledge graph.
In one embodiment, the second deep neural network is obtained by:
calculating the visual attribute features of each type of sample in the sample training set through a feature extraction network;
calculating an adjacency matrix with attention weights according to the visual attribute features of each type of sample;
and constructing the second graph convolution network according to the adjacency matrix.
In an embodiment, the training the graph convolution network to obtain the graph convolution network model includes:
inputting the visual attribute features of each type of sample into the graph convolution network to obtain prediction classifiers of a plurality of categories;
calculating the mean square error loss values between the weight vectors of the prediction classifiers of the multiple classes and the known weight vector of the real classifier, and enabling the mean square error loss values to meet preset conditions by reversely adjusting network parameters of the graph convolution network.
In one embodiment, before the inputting the image to be tested into the trained feature extraction model, the method further comprises:
and adjusting parameters of the pre-trained ResNet network according to the sample training set to obtain the feature extraction model.
A second aspect of the embodiments of the present application provides an image classification apparatus, including:
the feature extraction module is used for inputting the image to be tested into the trained feature extraction model to obtain a visual feature vector output by the feature extraction model;
the similarity calculation module is used for performing dot product calculation on the visual feature vector and the weight vectors of different classifiers to obtain the similarity between the image to be tested and the classes corresponding to the different classifiers; wherein the weight vectors of the plurality of classifiers are calculated through a trained graph convolution network model, and the graph convolution network model is generated based on a relation-enhanced knowledge graph and a graph attention mechanism;
and the category determining module is used for determining the category to which the image to be tested belongs according to the similarity between the image to be tested and the categories corresponding to different classifiers.
A third aspect of the embodiments of the present application provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the image classification method of the first aspect of the embodiments of the present application and any embodiment thereof.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, which stores a computer program, where the computer program is executable by a processor to perform the image classification method described above.
According to the technical scheme provided by the embodiments of the application, a graph-based attention mechanism and a relation-enhanced knowledge graph are introduced to strengthen the relations between classes, and a graph convolution network model weighted by visual attributes and by the graph attention mechanism is built. The model replaces the original word embedding vectors with visual feature vectors, and the newly constructed relation-enhanced knowledge graph adds relations between the leaf nodes of the knowledge graph, so that the graph convolution network model can learn suitable weight parameters by itself and better realize the transfer of semantic knowledge between classes; the classification of images is then realized using the graph convolution network model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating an image classification method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating an image classification method according to an embodiment of the present disclosure;
FIG. 4 is a detailed flowchart of step S210 in the corresponding embodiment of FIG. 2;
FIG. 5 is a diagram of the ResNet algorithm;
FIG. 6 is a detailed flowchart of step S220 in the corresponding embodiment of FIG. 2;
FIG. 7 is a detailed flowchart of step S223 in the corresponding embodiment of FIG. 6;
fig. 8 is a detailed flowchart of the method for acquiring the first deep neural network in step S2231 in the corresponding embodiment of fig. 7;
FIG. 9 is a schematic diagram of a relationship-enhanced knowledge-graph;
fig. 10 is a detailed flowchart of the method for acquiring the second deep neural network in step S2231 in the corresponding embodiment of fig. 7;
FIG. 11 is a detailed flowchart of step S2233 in the corresponding embodiment of FIG. 7;
fig. 12 is a schematic structural diagram of an image classification processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Please refer to fig. 1, which is a schematic structural diagram of an electronic device 100 according to an embodiment of the present disclosure. The electronic device 100 includes: one or more processors 120, and one or more memories 104 storing instructions executable by the processors 120. Wherein the processor 120 is configured to execute an image classification method provided by the following embodiments of the present application.
The processor 120 may be a gateway, or may be an intelligent terminal, or may be a device including a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capability and/or instruction execution capability, and may process data of other components in the electronic device 100, and may control other components in the electronic device 100 to perform desired functions.
The memory 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 120 to implement the image classification method described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
In one embodiment, the electronic device 100 shown in FIG. 1 may also include an input device 106, an output device 108, and a data acquisition device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device 100 may have other components and structures as desired.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like. The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like. The data acquisition device 110 may acquire an image of a subject and store the acquired image in the memory 104 for use by other components. Illustratively, the data acquisition device 110 may be a camera.
In an embodiment, the devices in the example electronic device 100 for implementing the image classification method of the embodiment of the present application may be integrally disposed or may be separately disposed, such as the processor 120, the memory 104, the input device 106 and the output device 108 being integrally disposed, and the data acquisition device 110 being separately disposed.
In an embodiment, the example electronic device 100 for implementing the image classification method of the embodiment of the present application may be implemented as a smart terminal, such as a smart phone, a tablet computer, a smart watch, an in-vehicle device, and the like.
Referring to fig. 2, a flowchart of an image classification method according to an embodiment of the present application is shown, and the method can be executed by the electronic device 100 shown in fig. 1. With reference to fig. 2 and the schematic diagram of the principle of the image classification method in fig. 3, the image classification method in the embodiment of the present application can implement the migration of semantic knowledge between categories. The method comprises the following steps:
step S210: and inputting the image to be tested into the trained feature extraction model to obtain the visual feature vector output by the feature extraction model.
The image to be tested refers to an image of an unknown class, and the class to which the image belongs needs to be determined. The feature extraction model can be obtained by training in advance, as shown below. The visual feature vector is used for representing the visual features and semantic attributes of the image to be tested.
Step S220: and performing dot product calculation on the visual feature vector and the weight vectors of different classifiers to obtain the similarity between the image to be tested and the corresponding classes of the different classifiers.
The weight vectors of the classifiers are calculated through a trained graph convolution network model, and the graph convolution network model is generated based on a relation-enhanced knowledge graph and a graph attention mechanism, as specifically described below.
A dot product is calculated between the visual feature vector of the image to be tested, extracted by the trained feature extraction model, and the weight vector of each classifier, giving the probability that the image belongs to each class, i.e., the similarity between the image to be tested and the categories corresponding to the different classifiers.
Step S230: and determining the category to which the image to be tested belongs according to the similarity between the image to be tested and the corresponding categories of different classifiers.
After the probability of the image belonging to each class is obtained, the class with the highest probability is taken as the predicted class of the image; that is, the class to which the image to be tested belongs is determined, and the zero-shot classification of the image is completed.
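For illustration, the inference flow of steps S210 to S230 can be sketched as follows. This is a minimal sketch, not the claimed implementation; the function name classify_zero_shot and the array layout are assumptions.

```python
import numpy as np

def classify_zero_shot(visual_feature, classifier_weights, class_names):
    """Sketch of steps S220-S230: score an image against every per-class
    classifier weight vector and return the most similar class.

    visual_feature:     (d,) visual feature vector from the feature extraction model
    classifier_weights: (C, d) matrix whose rows are the classifier weight vectors
                        predicted by the graph convolution network model
    class_names:        list of C category names
    """
    # Dot product with each classifier weight vector gives the similarity
    # between the image to be tested and each category (step S220).
    similarities = classifier_weights @ visual_feature   # shape (C,)
    # The category with the highest similarity is the predicted class (step S230).
    best = int(np.argmax(similarities))
    return class_names[best], similarities
```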
In an embodiment, as shown in fig. 4, before the step S210, the method provided in the embodiment of the present application further includes:
step S211: and adjusting parameters of the pre-trained ResNet network according to the sample training set to obtain the feature extraction model.
First, a sample training set D = {(x1, y1), (x2, y2), …, (xm, ym)} is obtained. The feature extraction model extracts the visual feature vectors using the ResNet algorithm; the network is pre-trained on ImageNet and then fine-tuned on the sample training set, where the fine-tuning process adopts the following steps or methods:
the network accuracy is improved when the number of network layers is increased, but as the network depth is continuously deepened, the network degradation problem is caused, so that the problem needs to be processed, the internal characteristics of the network reach the optimal condition in a certain layer, and the rest network layers should automatically learn the identity mapping mode, so that the output of a shallow network is copied and added to the output of a deep layer by adopting a residual error calculation mode, so that when the network characteristics are mature enough, the task of mapping in the network can be completed by the newly added identity mapping, and the original shallow network only needs to be converted into 0 from the identity mapping, please refer to fig. 5. The calculation formula (1) is:
F(x)=H(x)-x (1)
where x is the input of the residual block, H(x) is the output of the residual block, and F(x) is the residual mapping to be learned.
The residual module transforms the task from learning the mapping from x to H(x) into learning the difference between H(x) and x. The residual module can therefore significantly reduce the magnitude of the parameters in the module, so that the parameters in the network respond more sensitively to the back-propagated loss value; at the same time, because the identity-mapping branch exists in the forward pass, gradient propagation has a more direct path during back-propagation. In conclusion, the most important role of the residual module is to change the way information is transmitted in both the forward and backward directions, thereby greatly facilitating the optimization of the network.
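As a hedged illustration of formula (1), a minimal residual block might be written as the following PyTorch sketch; the two-convolution branch, channel counts and layer choices are assumptions for illustration rather than the exact ResNet configuration used in this application.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the branch learns F(x) = H(x) - x,
    and the block outputs H(x) = F(x) + x through an identity shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.branch(x)        # F(x), the residual mapping to be learned
        return self.relu(residual + x)   # H(x) = F(x) + x
```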
Based on the above optimization process of the ResNet algorithm, the network is continuously fine-tuned on the sample training set to obtain the feature extraction model.
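A sketch of the fine-tuning step S211 is given below, assuming a torchvision ResNet-50 backbone; the choice of ResNet-50 and of replacing only the final fully connected layer are illustrative assumptions, since the description does not fix these details.

```python
import torch.nn as nn
import torchvision.models as models

def build_feature_extractor(num_train_classes: int):
    """Sketch of step S211: start from an ImageNet-pretrained ResNet,
    attach a new head for the sample training set, and fine-tune.
    After fine-tuning, the layer before the head yields the visual feature vector."""
    backbone = models.resnet50(pretrained=True)              # pre-trained on ImageNet
    feature_dim = backbone.fc.in_features                    # 2048 for ResNet-50
    backbone.fc = nn.Linear(feature_dim, num_train_classes)  # head used only for fine-tuning
    return backbone, feature_dim

# After fine-tuning on the sample training set, the head can be replaced by an
# identity so that the network outputs the visual feature vector directly:
#     backbone.fc = nn.Identity()
```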
In an embodiment, as shown in fig. 6, before the step S220, the method provided in the embodiment of the present application further includes: step S221 to step S223 below.
Step S221: and acquiring an image set to be tested. The set of images to be tested includes a number of images to be tested.
Step S222: and extracting the visual characteristic vector of each image to be tested in the image set to be tested.
Specifically, each image to be tested is input into the trained feature extraction model, and a visual feature vector output by the feature extraction model is obtained.
Step S223: and inputting the visual characteristic vector of the image to be tested into the trained graph convolution network model to obtain classifiers of multiple categories.
One category corresponds to one classifier; for example, images containing vehicles may form one category, and images containing plants may form another.
The graph convolution network model can be obtained by training in advance, wherein the specific training process is as follows.
In an embodiment, as shown in fig. 7, before the step S223, the method provided in the embodiment of the present application further includes: step S2231 to step S2233.
Step S2231: a first deep neural network and a second deep neural network are obtained.
Step S2232: and superposing the first deep neural network and the second deep neural network to obtain the graph convolution network.
Step S2233: and training the graph convolution network to obtain the graph convolution network model.
In an embodiment, as shown in fig. 8, the method for acquiring the first deep neural network in step S2231 specifically includes:
step S22311: and acquiring the relation enhanced knowledge graph.
The relation-enhanced knowledge graph may be as shown in fig. 9. The hypernym-hyponym relations between classes in the WordNet relation network are utilized, and the parent-child connection pattern is improved: all child nodes sharing the same parent node are connected pairwise. Since most leaf nodes in WordNet have only a small number of sibling nodes, this does not greatly increase the network complexity. The relation-enhanced knowledge graph constructed in this way makes the similarity relations between classes easier to learn.
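A minimal sketch of this construction is shown below; the dictionary-based representation of the WordNet hierarchy is an assumption made only for illustration.

```python
from itertools import combinations

def relation_enhanced_edges(children_of):
    """Sketch of building the relation-enhanced knowledge graph: keep the
    original parent-child edges and additionally connect every pair of
    child nodes that share the same parent (sibling edges).

    children_of: mapping {parent_node: [child_node, ...]} taken, e.g., from WordNet.
    Returns a set of undirected edges represented as sorted tuples."""
    edges = set()
    for parent, children in children_of.items():
        for child in children:
            edges.add(tuple(sorted((parent, child))))   # original hierarchy edge
        for a, b in combinations(children, 2):          # added sibling-sibling edges
            edges.add(tuple(sorted((a, b))))
    return edges
```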
Step S22312: and constructing the first graph convolution network according to the relation enhanced knowledge graph.
After the relation-enhanced knowledge graph is constructed, different weights need to be assigned to the edges between each node and its different neighboring nodes. A first graph convolution network (GCN1 for short) is constructed based on the relation-enhanced knowledge graph; the input of the network is the visual feature vectors of the various categories in the sample training set and, written in the standard graph-convolution form, the calculation formula (2) is:

x' = σ(Â·x·W) (2)

where x' is the new feature output by the network, fused with the neighboring-node relations; x is the class feature input to the network; W is the network weight to be learned; Â is the normalized adjacency matrix of the knowledge graph; and σ(·) is a nonlinear activation function.
Edge weighting is then performed on the first graph convolution network GCN1 constructed based on the relation-enhanced knowledge graph. In the present application, the arithmetic mean of the visual feature vectors of the categories corresponding to all child nodes is used as the visual feature vector of the category corresponding to the parent node. The calculation formula (3) is:

v_father = (1 / N_sons) · Σ v_son (3)

where v_father is the visual feature vector of the category corresponding to a parent node that has no image data; N_sons is the number of child nodes of the parent node; v_son is the visual feature vector of the category corresponding to a given child node; and the sum runs over all child nodes of the parent node.
The visual feature vectors of all classes in the sample training set can be obtained through the above calculation. The L2 norm of the difference between the visual feature vectors of two classes is then calculated as the reference for their similarity. The calculation formula (4) is:

||v_i - v_j||_2 = ( Σ_{t=1..k} (v_{i,t} - v_{j,t})² )^{1/2} (4)

where v_i is the visual feature vector of the i-th category of images to be tested; v_j is the visual feature vector of the j-th category, the i-th and j-th categories being adjacent in the graph; and k is the dimensionality of the visual feature vectors of the image categories to be tested.
The L2 norm is the square root of the sum of the squares of the elements of the vector. Using the L2 norm can prevent overfitting and improve the generalization ability of the model. Overfitting means that the error of the model during training is small but the error during testing is large; that is, the model is complex enough to fit all training samples, but its ability to predict new samples in practical application is poor.
Using the L2 norms as weights, a new weighted adjacency matrix A'_1 is constructed based on the adjacency matrix A, and the weighted adjacency matrix A'_1 is then normalized. In a standard symmetric degree-normalized form, the calculation formula (5) is:

Â_1 = D^{-1/2} · A'_1 · D^{-1/2} (5)

where D is the degree matrix of A'_1. The new adjacency matrix Â_1 generated by the calculation of formula (5) is used to continue the forward calculation of the first graph convolution network GCN1; that is, the steps from formula (2) to formula (5) are calculated in a loop to construct the first graph convolution network GCN1, thereby obtaining the first deep neural network.
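To make the flow of formulas (2) to (5) concrete, a small numpy sketch follows; the symmetric degree normalization and the ReLU activation are assumptions consistent with common graph-convolution practice, not necessarily the exact choices of this application.

```python
import numpy as np

def l2_weighted_adjacency(A, V):
    """Sketch of formulas (3)-(5): weight each edge of the adjacency matrix A by
    the L2 norm between the visual feature vectors of its endpoints, then apply
    an (assumed) symmetric degree normalization.

    A: (n, n) binary adjacency matrix of the relation-enhanced knowledge graph
    V: (n, d) per-category visual feature vectors (parent nodes already filled
       with the mean of their children as in formula (3))"""
    dist = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)   # formula (4)
    A_weighted = A * dist                                           # L2 norms as edge weights
    degree = np.clip(A_weighted.sum(axis=1), 1e-12, None)
    D_inv_sqrt = np.diag(degree ** -0.5)
    return D_inv_sqrt @ A_weighted @ D_inv_sqrt                     # formula (5), assumed form

def gcn_layer(A_hat, X, W):
    """Sketch of formula (2): one graph-convolution layer x' = sigma(A_hat @ X @ W)."""
    return np.maximum(A_hat @ X @ W, 0.0)                           # ReLU as assumed nonlinearity
```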
In an embodiment, as shown in fig. 10, the method for acquiring the second deep neural network in step S2231 specifically includes:
step S22313: and calculating the visual attribute characteristics of each type of sample in the sample training set through a characteristic extraction network.
The visual feature vectors of the images to be tested are extracted through the trained feature extraction model, and the mean of all visual feature vectors under a category is then taken to represent the visual feature vector of that category and is used in the subsequent calculation. The calculation formula (6) is:

V_c = (1 / N_c) · Σ_{i=1..N_c} V_i (6)

where V_c denotes the category visual feature vector of the category; V_i denotes the visual feature vector of the i-th image in the category; and N_c denotes the total number of images contained in the category.
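A short sketch of formula (6), averaging the per-image visual feature vectors within each category, is given below; the dictionary-based grouping is only an illustrative assumption.

```python
import numpy as np
from collections import defaultdict

def class_visual_vectors(features, labels):
    """Sketch of formula (6): V_c = (1 / N_c) * sum of the visual feature
    vectors of the N_c images belonging to category c.

    features: (N, d) array of per-image visual feature vectors
    labels:   length-N sequence of category labels
    Returns {category: (d,) category visual feature vector}."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    return {c: np.mean(vectors, axis=0) for c, vectors in grouped.items()}
```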
An attribute is a nameable property of a category, such as the color of an object or the presence or absence of a certain body part; attributes that are not directly visible but are related to the visual feature vector, such as the natural habitat of an animal, can also be named. After the attributes of each category of images to be tested are obtained, a semantic attribute vector can be constructed for subsequent calculation according to whether the image to be tested possesses each attribute, or according to the strength with which it exhibits that attribute.
Step S22314: and calculating an adjacency matrix with attention weight according to the visual attribute characteristics of each type of sample.
Step S22315: and constructing the second graph convolution network according to the adjacent matrixes with attention weights.
In this step, the attention mechanism, which originated in methods for sequence tasks, is introduced; it facilitates handling a variable number of inputs and focuses the decision on the most relevant parts. When a sufficient amount of information is not available, a learnable linear transformation mechanism is needed to transform the input features of the images to be tested into high-dimensional features. Written in the standard graph-attention form, the calculation formula (7) of this mechanism is:

e_ij = a(W·h_i, W·h_j) (7)

where e_ij is the importance of the features of the j-th node to the i-th node; h is the set of input features of the attention network layer, h_i and h_j being the features of nodes i and j; W is the shared weight matrix used for the linear transformation; and a is the attention function that yields the attention coefficient for each pair of nodes.
In order to make the coefficients of a node comparable across its different neighboring nodes, they are normalized over that node's neighborhood. In the standard softmax form, the normalized calculation formula (8) is:

α_ij = exp(e_ij) / Σ_{k∈N_i} exp(e_ik) (8)

The coefficients α_ij obtained from the complete single-layer graph attention network layer are used to construct a new adjacency matrix with attention weights. This new attention-weighted adjacency matrix is fused with the first graph convolution network GCN1 to complete the forward calculation of the second graph convolution network GCN2; that is, the steps from formula (2) to formula (8) are calculated in a loop to construct the second graph convolution network GCN2, thereby obtaining the second deep neural network.
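For illustration, a single-layer graph-attention computation in the spirit of formulas (7) and (8) is sketched below in numpy; the concatenation-plus-LeakyReLU scoring follows the common graph attention network formulation and is an assumption where the description leaves the exact form open.

```python
import numpy as np

def attention_adjacency(H, A, W, a, slope=0.2):
    """Sketch of formulas (7)-(8): compute attention coefficients between
    connected nodes and assemble an attention-weighted adjacency matrix.
    Assumes every node has at least one neighbor in A.

    H: (n, f) input node features      W: (f, f2) shared linear transformation
    A: (n, n) binary adjacency matrix  a: (2*f2,) attention parameter vector"""
    Wh = H @ W                                   # shared linear transformation
    n = Wh.shape[0]
    e = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # e_ij = LeakyReLU(a . [Wh_i || Wh_j]) -- assumed scoring (formula (7))
            z = np.concatenate([Wh[i], Wh[j]]) @ a
            e[i, j] = z if z > 0 else slope * z
    e = np.where(A > 0, e, -np.inf)              # attend only to neighboring nodes
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)   # softmax normalization (formula (8))
    return att
```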
Superposing the first deep neural network and the second deep neural network to obtain a graph convolution network; at this time, the graph convolution network can capture the similarity degree between two nodes in the training process. And training the graph convolution network to obtain a graph convolution network model.
How to train the graph-convolution network to obtain the graph-convolution network model is described in detail below. In an embodiment, as shown in fig. 11, the step S2233 specifically includes: step S22331-step S22332.
Step S22331: and inputting the visual attribute characteristics of each type of sample into the graph convolution network to obtain a plurality of types of prediction classifiers.
The prediction classifier is obtained by inputting the visual attribute characteristics of each type of sample in the sample training set into the graph convolution network. Because the graph-convolution network is not trained, the prediction classifier may not be accurate.
Step S22332: calculating the mean square error loss values between the weight vectors of the prediction classifiers of the multiple classes and the known weight vector of the real classifier, and enabling the mean square error loss values to meet preset conditions by reversely adjusting network parameters of the graph convolution network.
The visual attribute features of each type of sample in the sample training set D = {(x1, y1), (x2, y2), …, (xm, ym)} are fed into the graph convolution network, and prediction classifiers for the multiple categories are obtained by calculation. The mean square error loss values between the weight vectors of these prediction classifiers and the weight vectors of the known real classifiers are calculated, the gradients are computed in the backward direction, and one update operation is executed, i.e., the network parameters of the graph convolution network are adjusted in reverse according to the loss values. Whether training is finished is then judged; if so, the result is output directly, otherwise the step of feeding the visual attribute features of each type of sample into the graph convolution network and calculating the prediction classifiers of the categories is executed again, and training continues until the mean square error loss values meet the preset conditions.
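A hedged PyTorch-style sketch of this training loop follows; gcn_model, class_features, true_classifier_weights and seen_mask are hypothetical placeholders for objects the description only refers to in prose, and the optimizer, learning rate and stopping threshold are assumptions.

```python
import torch
import torch.nn as nn

def train_gcn(gcn_model, class_features, true_classifier_weights, seen_mask,
              epochs=300, lr=1e-3, threshold=1e-4):
    """Sketch of steps S22331-S22332: predict per-class classifier weight vectors
    with the graph convolution network and regress them, under an MSE loss, onto
    the known real classifier weights of the seen classes."""
    optimizer = torch.optim.Adam(gcn_model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for epoch in range(epochs):
        optimizer.zero_grad()
        predicted = gcn_model(class_features)       # (C, d) predicted classifier weights
        # The loss is computed only on the classes whose real classifier is known.
        loss = criterion(predicted[seen_mask], true_classifier_weights[seen_mask])
        loss.backward()                             # gradients for the reverse adjustment
        optimizer.step()                            # one update of the network parameters
        if loss.item() < threshold:                 # assumed form of the preset condition
            break
    return gcn_model
```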
In order to evaluate the accuracy of the trained graph convolution network model, a test image set of known classes can be obtained; according to the method of the present application, the graph convolution network model is used to predict the images of the known classes in the test image set, and the accuracy of the class prediction on the test image set is calculated. The results of the experiments are shown in Tables 1, 2 and 3.
The evaluation index of the experiments is the Top-k prediction hit rate, i.e., whether the real class label of each image in the test image set is among the k classes with the highest predicted probability output by the graph convolution network model, where k is 1, 2, 5, 10 or 20.
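The Top-k hit rate described above can be computed with a short sketch like the following; the probability-matrix layout is an assumption made for illustration.

```python
import numpy as np

def top_k_hit_rate(probs, true_labels, k):
    """Top-k hit rate: the fraction of test images whose real class index is
    among the k classes with the highest predicted probability.

    probs:       (N, C) predicted class probabilities (or similarities)
    true_labels: (N,) true class indices"""
    top_k = np.argsort(-probs, axis=1)[:, :k]        # indices of the k best classes
    hits = [true_labels[i] in top_k[i] for i in range(len(true_labels))]
    return float(np.mean(hits))
```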
TABLE 1 2-hops test image set
TABLE 2 3-hops test image set
TABLE 3 all test image set
From the experimental results, the graph convolution network model of the present application shows clear gains on the evaluation indexes of the 3 ImageNet test image sets (see Tables 1, 2 and 3) and the 5 different values of k. For example, on the Top-1 accuracy of the 3-hops and all test image sets, the graph convolution network model achieves a relative improvement of more than 40%, which shows that the graph convolution network model weighted by visual attributes and by the graph attention mechanism is effective and has great potential in knowledge transfer, and at the same time is highly competitive on the ImageNet dataset.
The following are embodiments of the apparatus of the present application that can be used to perform embodiments of the image classification method of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the image classification method of the present application.
Fig. 12 is a schematic structural diagram of an image classification processing apparatus according to an embodiment of the present application. As shown in fig. 12, the apparatus includes: a feature extraction module 300, a similarity calculation module 400, a category determination module 500.
The feature extraction module 300 is configured to input the image to be tested into the trained feature extraction model, and obtain a visual feature vector output by the feature extraction model.
The similarity calculation module 400 is configured to perform dot product calculation on the visual feature vector and the weight vectors of different classifiers to obtain the similarities between the image to be tested and the classes corresponding to the different classifiers; wherein the weight vectors of the plurality of classifiers are calculated through a trained graph convolution network model, and the graph convolution network model is generated based on a relation-enhanced knowledge graph and a graph attention mechanism.
The category determining module 500 is configured to determine a category to which the image to be tested belongs according to similarity between the image to be tested and corresponding categories of different classifiers.
The implementation process of the functions and actions of each module in the device is specifically detailed in the implementation process of the corresponding step in the image classification method, and is not described herein again.
In the embodiments provided in the present application, the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Claims (10)
1. An image classification method, comprising:
inputting an image to be tested into a trained feature extraction model to obtain a visual feature vector output by the feature extraction model;
performing dot product calculation on the visual feature vector and weight vectors of different classifiers to obtain similarity between the image to be tested and corresponding classes of the different classifiers; wherein the weight vectors of the plurality of classifiers are calculated through a trained graph convolution network model, and the graph convolution network model is generated based on a relation-enhanced knowledge graph and a graph attention mechanism;
and determining the category to which the image to be tested belongs according to the similarity between the image to be tested and the corresponding categories of different classifiers.
2. The method of claim 1, wherein prior to the calculating the dot product of the visual feature vector and the weight vectors of the different classifiers, the method further comprises:
acquiring an image set to be tested;
extracting a visual feature vector of each image to be tested in the image set to be tested;
and inputting the visual feature vector of the image to be tested into the trained graph convolution network model to obtain classifiers of multiple categories.
3. The method of claim 2, wherein prior to said inputting the visual feature vector of the image to be tested into the trained graph convolution network model, the method further comprises:
acquiring a first deep neural network and a second deep neural network;
superposing the first deep neural network and the second deep neural network to obtain the graph convolution network;
and training the graph convolution network to obtain the graph convolution network model.
4. The method of claim 3, wherein the first deep neural network is obtained by:
acquiring a relation enhanced knowledge graph;
and constructing the first graph convolution network according to the relation enhanced knowledge graph.
5. The method of claim 3, wherein the second deep neural network is obtained by:
calculating the visual attribute features of each type of sample in the sample training set through a feature extraction network;
calculating an adjacency matrix with attention weights according to the visual attribute features of each type of sample;
and constructing the second graph convolution network according to the adjacency matrix with attention weights.
6. The method of claim 5, wherein the training the graph convolution network to obtain the graph convolution network model comprises:
inputting the visual attribute features of each type of sample into the graph convolution network to obtain prediction classifiers of a plurality of categories;
calculating the mean square error loss values between the weight vectors of the prediction classifiers of the multiple classes and the known weight vector of the real classifier, and enabling the mean square error loss values to meet preset conditions by reversely adjusting network parameters of the graph convolution network.
7. The method of claim 1, wherein prior to said inputting the image to be tested into the trained feature extraction model, the method further comprises:
and adjusting parameters of the pre-trained ResNet network according to the sample training set to obtain the feature extraction model.
8. An image classification apparatus, comprising:
the feature extraction module is used for inputting the image to be tested into the trained feature extraction model to obtain a visual feature vector output by the feature extraction model;
the similarity calculation module is used for performing dot product calculation on the visual feature vector and the weight vectors of different classifiers to obtain the similarity between the image to be tested and the classes corresponding to the different classifiers; wherein the weight vectors of the plurality of classifiers are calculated through a trained graph convolution network model, and the graph convolution network model is generated based on a relation-enhanced knowledge graph and a graph attention mechanism;
and the category determining module is used for determining the category to which the image to be tested belongs according to the similarity between the image to be tested and the categories corresponding to different classifiers.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the image classification method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program executable by a processor to perform the image classification method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011614525.6A CN112749737A (en) | 2020-12-30 | 2020-12-30 | Image classification method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011614525.6A CN112749737A (en) | 2020-12-30 | 2020-12-30 | Image classification method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112749737A true CN112749737A (en) | 2021-05-04 |
Family
ID=75649953
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011614525.6A Pending CN112749737A (en) | 2020-12-30 | 2020-12-30 | Image classification method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112749737A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114882273A (en) * | 2022-04-24 | 2022-08-09 | 电子科技大学 | Visual identification method, device, equipment and storage medium applied to narrow space |
CN116091848A (en) * | 2023-04-03 | 2023-05-09 | 青岛创新奇智科技集团股份有限公司 | Test tube classification method, device, equipment and storage medium |
CN117573809A (en) * | 2024-01-12 | 2024-02-20 | 中电科大数据研究院有限公司 | Event map-based public opinion deduction method and related device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109816009A (en) * | 2019-01-18 | 2019-05-28 | 南京旷云科技有限公司 | Multi-tag image classification method, device and equipment based on graph convolution |
CN111626251A (en) * | 2020-06-02 | 2020-09-04 | Oppo广东移动通信有限公司 | Video classification method, video classification device and electronic equipment |
-
2020
- 2020-12-30 CN CN202011614525.6A patent/CN112749737A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109816009A (en) * | 2019-01-18 | 2019-05-28 | 南京旷云科技有限公司 | Multi-tag image classification method, device and equipment based on graph convolution |
CN111626251A (en) * | 2020-06-02 | 2020-09-04 | Oppo广东移动通信有限公司 | Video classification method, video classification device and electronic equipment |
Non-Patent Citations (3)
Title |
---|
知乎用户 (Zhihu user): "What problem is ResNet actually solving? (Resnet到底在解决一个什么问题呢?)", 《HTTPS://WWW.ZHIHU.COM/QUESTION/64494691/ANSWER/1020040611》, 16 February 2020 (2020-02-16), pages 1 - 3 *
陈海强 (Chen Haiqiang): "Research on zero-shot image classification methods based on semantic knowledge (基于语义知识的零样本图像分类方法研究)", 《万方数据知识服务平台》 (Wanfang Data Knowledge Service Platform), 9 November 2020 (2020-11-09), pages 21 - 31 *
陈海强 (Chen Haiqiang): "Research on zero-shot image classification methods based on semantic knowledge (基于语义知识的零样本图像分类方法研究)", 《万方数据知识服务平台》 (Wanfang Data Knowledge Service Platform), pages 21 - 31 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114882273A (en) * | 2022-04-24 | 2022-08-09 | 电子科技大学 | Visual identification method, device, equipment and storage medium applied to narrow space |
CN116091848A (en) * | 2023-04-03 | 2023-05-09 | 青岛创新奇智科技集团股份有限公司 | Test tube classification method, device, equipment and storage medium |
CN117573809A (en) * | 2024-01-12 | 2024-02-20 | 中电科大数据研究院有限公司 | Event map-based public opinion deduction method and related device |
CN117573809B (en) * | 2024-01-12 | 2024-05-10 | 中电科大数据研究院有限公司 | Event map-based public opinion deduction method and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108491817B (en) | Event detection model training method and device and event detection method | |
CN110069709B (en) | Intention recognition method, device, computer readable medium and electronic equipment | |
JP7250126B2 (en) | Computer architecture for artificial image generation using autoencoders | |
CN111523621A (en) | Image recognition method and device, computer equipment and storage medium | |
US20220215298A1 (en) | Method for training sequence mining model, method for processing sequence data, and device | |
WO2023179429A1 (en) | Video data processing method and apparatus, electronic device, and storage medium | |
CN112966074A (en) | Emotion analysis method and device, electronic equipment and storage medium | |
CN112749737A (en) | Image classification method and device, electronic equipment and storage medium | |
CN114283350B (en) | Visual model training and video processing method, device, equipment and storage medium | |
CN114358188A (en) | Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment | |
CN112507912A (en) | Method and device for identifying illegal picture | |
CN114091594A (en) | Model training method and device, equipment and storage medium | |
CN115204301A (en) | Video text matching model training method and device and video text matching method and device | |
CN118468061B (en) | Automatic algorithm matching and parameter optimizing method and system | |
CN113011532A (en) | Classification model training method and device, computing equipment and storage medium | |
CN113762005B (en) | Feature selection model training and object classification methods, devices, equipment and media | |
CN117932058A (en) | Emotion recognition method, device and equipment based on text analysis | |
CN115345248A (en) | Deep learning-oriented data depolarization method and device | |
CN115689981A (en) | Lung image detection method and device based on information fusion and storage medium | |
CN115129863A (en) | Intention recognition method, device, equipment, storage medium and computer program product | |
CN113610080A (en) | Cross-modal perception-based sensitive image identification method, device, equipment and medium | |
CN114065901A (en) | Method and device for training neural network model | |
CN114898339B (en) | Training method, device, equipment and storage medium of driving behavior prediction model | |
CN117556275B (en) | Correlation model data processing method, device, computer equipment and storage medium | |
CN113033212B (en) | Text data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |