CN110472688A - Image description method and device, and image description model training method and device - Google Patents
Image description method and device, and image description model training method and device
- Publication number
- CN110472688A CN110472688A CN201910760737.6A CN201910760737A CN110472688A CN 110472688 A CN110472688 A CN 110472688A CN 201910760737 A CN201910760737 A CN 201910760737A CN 110472688 A CN110472688 A CN 110472688A
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- tag
- input
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
This application provides an image description method and device, and a training method and device for an image description model. The image description method includes: extracting image features from a target image; performing label extraction on the image features to generate corresponding image labels; inputting the image features and image labels of the target image into the encoder of an image description model to generate a feature matrix corresponding to the target image; and inputting the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image. When generating the image description sentence, the image description model can thus refer to the information of specific, reliable image labels, so that the generated sentence contains more key information, improving the accuracy and reliability of the image description sentence. Moreover, because the reliable image labels serve as guidance during the generation phase of the image description sentence, the generation of redundant data is reduced.
Description
Technical field
This application relates to the technical field of image processing, and in particular to an image description method and device, a training method and device for an image description model, a computing device, and a computer-readable storage medium.
Background art
Image description (image captioning) aims to automatically generate a passage of descriptive text from an image, that is, to "talk about a picture". The process not only has to detect the objects in the image, but must also understand the relationships between the objects and finally express them in natural language.
Currently, in image description tasks, the information of an image mainly takes the form of feature maps extracted by a convolutional neural network model, or of feature representations of specific targets detected by an object detection model. All of this information exists in matrix form, so the representation of the same key information may differ. For example, two images may both show a car, but because the parking position or parking angle differs, the features extracted by the convolutional neural network model and the object detection model differ as well, which increases the redundancy and unreliability of the information.
In summary, current image description tasks generate the description of an image mainly by extracting features from the image itself and generating the description from the extracted features. Because of the redundancy of the image features after extraction, the key descriptors finally generated for the image may deviate, or the generated description may even be wrong.
Summary of the invention
In view of this, embodiments of the present application provide an image description method and device, a training method and device for an image description model, a computing device, and a computer-readable storage medium, so as to solve the technical deficiencies in the prior art.
An embodiment of the present application provides an image description method for an image description model, the method comprising:
extracting image features from a target image;
performing label extraction on the image features to generate corresponding image labels;
inputting the image features and image labels of the target image into the encoder of the image description model to generate a feature matrix corresponding to the target image;
inputting the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image.
Optionally, performing label extraction on the image features to generate corresponding image labels comprises:
inputting the image features into a multi-label classification model for label extraction, generating at least one corresponding image label.
Optionally, the encoder comprises one coding layer;
inputting the image features and image labels of the target image into the encoder of the image description model to generate the feature matrix corresponding to the target image comprises:
preprocessing the image features and image labels of the target image respectively to generate preprocessed image features and label vectors;
inputting the preprocessed image features and label vectors into the coding layer, and taking the output features of the coding layer as the feature matrix corresponding to the target image.
Optionally, the encoder comprises N sequentially connected coding layers;
inputting the image features and image labels of the target image into the encoder of the image description model to generate the feature matrix corresponding to the target image comprises:
S11, preprocessing the image features and image labels of the target image respectively to generate preprocessed image features and label vectors;
S12, inputting the preprocessed image features and label vectors into the first coding layer to obtain the output features of the first coding layer;
S13, inputting the output features of the (i-1)-th coding layer and the label vectors into the i-th coding layer to obtain the output features of the i-th coding layer;
S14, incrementing i by 1 and determining whether the incremented i is less than N; if so, executing step S13; if not, executing step S15;
S15, taking the output features of the N-th coding layer as the feature matrix corresponding to the target image.
Optionally, each coding layer comprises a first self-attention layer, a first multi-head attention layer, and a first feed-forward layer;
inputting the preprocessed image features and label vectors into the i-th coding layer to obtain the output features of the i-th coding layer comprises:
inputting the preprocessed image features into the first self-attention layer of the i-th coding layer for processing, generating first self-attention features;
inputting the first self-attention features and the label vectors into the first multi-head attention layer of the i-th coding layer, generating first fusion features;
passing the first fusion features through the first feed-forward layer to generate the output features of the i-th coding layer.
Optionally, inputting the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image comprises:
inputting a reference decoding vector and the feature matrix into the decoder for decoding to obtain the decoding vector output by the decoder;
performing linearization and normalization on the decoding vector to generate the image description sentence corresponding to the target image.
An embodiment of the present application provides a training method for an image description model, the method comprising:
extracting image features from a sample image;
performing label extraction on the image features to generate corresponding image labels;
inputting the image features and image labels of the sample image, together with the sample image description sentence corresponding to the sample image, into the image description model, and training the image description model until a training stop condition is reached.
Optionally, the training stop condition comprises: comparing the decoding vector generated by the image description model with a preset vector validation set, and the change rate of the error of the decoding vector being less than a stability threshold.
An embodiment of the present application provides an image description device, the device comprising:
a first feature extraction module, configured to extract image features from a target image;
a first label extraction module, configured to perform label extraction on the image features to generate corresponding image labels;
an encoding module, configured to input the image features and image labels of the target image into the encoder of an image description model to generate the feature matrix corresponding to the target image;
a decoding module, configured to input the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image.
An embodiment of the present application provides a training device for an image description model, the device comprising:
a second feature extraction module, configured to extract image features from a sample image;
a second label extraction module, configured to perform label extraction on the image features to generate corresponding image labels;
a training module, configured to input the image features and image labels of the sample image, together with the sample image description sentence corresponding to the sample image, into the image description model, and train the image description model until a training stop condition is reached.
An embodiment of the present application provides a computing device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the image description method or the image description model training method described above.
An embodiment of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the image description method or the image description model training method described above.
In the image description method and device provided by this application, label extraction is performed on the image features of the target image to generate corresponding image labels, and the image features and image labels of the target image are input into the image description model to obtain the image description sentence corresponding to the target image. When generating the image description sentence, the image description model can thus refer to the information of specific, reliable image labels, so that the generated sentence contains more key information, improving the accuracy and reliability of the image description sentence; and because the reliable image labels serve as guidance in the generation phase of the image description sentence, the generation of redundant data is reduced.
In the training method and device for an image description model provided by this application, the image features and image labels of a sample image, together with the sample image description sentence corresponding to the sample image, are input into the image description model, which is trained until the training stop condition is reached, yielding an image description model that can generate a description sentence from a target image.
Brief description of the drawings
Fig. 1 is a schematic architecture diagram of the Transformer model involved in embodiments of the application;
Fig. 2 is a schematic flowchart of the image description method of an embodiment of the application;
Fig. 3 is a schematic flowchart of the image description method of an embodiment of the application;
Fig. 4 is a schematic structural diagram of a coding layer of an embodiment of the application;
Fig. 5 is a schematic flowchart of the image description method of an embodiment of the application;
Fig. 6 is a schematic diagram of the model framework implementing the image description method of an embodiment of the application;
Fig. 7 is a schematic flowchart of the training method of the image description model of an embodiment of the application;
Fig. 8 is a schematic structural diagram of the image description device of another embodiment of the application;
Fig. 9 is a schematic structural diagram of the training device of the image description model of another embodiment of the application;
Fig. 10 is a schematic structural diagram of the computing device of another embodiment of the application.
Specific embodiments
Many specific details are set forth in the following description to facilitate a full understanding of the application. However, the application can be implemented in many ways different from those described herein, and those skilled in the art can make similar generalizations without departing from the essence of the application; therefore, the application is not limited by the specific implementations disclosed below.
The terminology used in one or more embodiments of this specification is only for the purpose of describing particular embodiments and is not intended to limit the one or more embodiments of this specification. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various pieces of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly "second" may be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the terms involved in one or more embodiments of the invention are explained.
Transformer model: a neural network architecture used for machine translation. Its main idea is to encode the features or vectors to be translated through coding layers (the encoder) into encoded features or vectors, then decode the encoded features or vectors with decoding layers (the decoder) to obtain decoding vectors, and finally translate the decoding vectors into the corresponding translated sentence.
Image description: a comprehensive task fusing computer vision, natural language processing, and machine learning, which produces a natural language sentence that can describe the content of a given image. In plain terms, it translates a picture into a passage of descriptive text.
Multi-label classification model: for a given text or image, there may be more than one corresponding label. A multi-label classification model can predict the labels corresponding to a given text or image.
This application provides an image description method and device, a training method and device for an image description model, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
First, the image description model of the embodiments of the present application is schematically illustrated. Many models can implement image description, such as a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, or a Transformer model.
A CNN model generally comprises an input layer, convolutional layers, pooling layers, and a fully connected layer. On the one hand, the connections between the neurons of a CNN model are not fully connected; on the other hand, the weights of the connections between certain neurons in the same layer are shared (i.e., identical). Its non-fully-connected, weight-sharing network structure makes it more similar to a biological neural network, reducing the complexity of the network model and the number of weights.
An RNN model, also called a recurrent neural network, is a neural network with a feedback structure whose output is related not only to the current input and the network weights, but also to the inputs the network has seen before. The RNN model models time by adding self-connected hidden layers that span time steps; in other words, the feedback of the hidden layer not only enters the output, but also enters the hidden layer of the next time step.
The architecture of the Transformer model comprises an encoder and a decoder. The encoder encodes an input target sentence into an encoding vector, or encodes target image features into encoded features; the decoder decodes the encoding vector or encoded features to generate the corresponding image description sentence.
This embodiment schematically explains the image description method of this embodiment taking the Transformer model as an example. It should be noted that other single models or combinations of models implementing an encoder-decoder architecture can also implement the image description method of this application, and also fall within the protection scope of this application.
Fig. 1 shows the architecture of a Transformer model. The model is divided into two parts, an encoder and a decoder. The encoder is a stack of N identical coding layers, each comprising three sublayers: a first self-attention layer, a first multi-head attention layer, and a first feed-forward layer, where N is a positive integer with N >= 1.
The decoder is a stack of M identical decoding layers, each comprising three sublayers: a masked multi-head attention layer, a second multi-head attention layer, and a second feed-forward layer, where M is a positive integer with M >= 1.
In use, in the encoder, the image features and image labels of the target image are each subjected to feature processing to generate preprocessed image features and label vectors. The preprocessed image features and label vectors serve as the input of the first coding layer, yielding the output features of the first coding layer; the output features of each coding layer serve as the input of the next coding layer, and finally the output features of the last coding layer serve as the feature matrix output by the whole encoder, which is input into each decoding layer of the decoder.
On the decoder side, a reference vector and the feature matrix are input into the first decoding layer to obtain the decoding vector output by the first decoding layer; the feature matrix and the decoding vector output by the previous decoding layer are input into the current decoding layer to obtain the decoding vector output by the current decoding layer, and so on, until the decoding vector output by the last decoding layer is obtained as the decoding vector of the decoder.
The decoding vector of the decoder is converted via a linear layer and a normalization (softmax) layer to obtain the final target sentence.
It should be noted that an image description sentence comprises multiple description words; the decoder obtains one description word per decoding step, and the final target sentence is obtained when decoding is complete. For the first description word of the image description sentence, the reference decoding vector is a preset initial decoding vector; for the other description words, the reference decoding vector is the decoding vector corresponding to the previous description word.
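This data flow can be summarized in a minimal sketch (illustrative Python only; `encoder_layers` and `decoder_layers` stand in for the coding and decoding layers of Fig. 1 and are assumptions, not the patent's API):

```python
def run_model(encoder_layers, decoder_layers, img_feats, label_vecs, ref_vec):
    # Encoder: each coding layer consumes the previous layer's output
    # together with the label vectors.
    x = img_feats
    for layer in encoder_layers:
        x = layer(x, label_vecs)      # output features of this coding layer
    feature_matrix = x                # output of the whole encoder

    # Decoder: every decoding layer sees the feature matrix; each layer's
    # output vector feeds the next layer.
    y = ref_vec
    for layer in decoder_layers:
        y = layer(y, feature_matrix)
    return y                          # decoding vector of the decoder
```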
An embodiment of the present application discloses an image description method which, referring to Fig. 2, comprises the following steps 201 to 204:
201. Extract image features from the target image.
The image features can be extracted from the target image using a feature extraction model. The feature extraction model can take many forms, such as a CNN (convolutional neural network) model or an LSTM model.
For example, the feature extraction model generates image features of size P*Q*L1, that is, L1-dimensional image features over a spatial grid of size P*Q, where P*Q is the height * width of the feature map.
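By way of illustration only (the patent does not mandate any particular backbone), a minimal sketch of extracting such a P*Q grid of L1-dimensional features, assuming a torchvision ResNet-50:

```python
import torch
import torchvision.models as models

backbone = models.resnet50(weights=None)
# Drop the pooling and classification head to keep the spatial feature map.
extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

image = torch.randn(1, 3, 224, 224)      # one RGB target image
feats = extractor(image)                 # shape [1, L1, P, Q] = [1, 2048, 7, 7]
# Flatten the P*Q grid into a sequence of n = P*Q vectors (v1, v2, ..., vn),
# each of dimension L1, as used later by the encoder.
v = feats.flatten(2).transpose(1, 2)     # shape [1, 49, 2048]
```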
Redundant data can be generated during image feature extraction. Redundant data refers to repeated data generated in the image description task. For example, two images may express the same category, yet the features extracted by the feature extraction model can differ; the feature extraction model thus extracts the same type of image feature in different forms, producing redundant data.
For different images, the features extracted for data of the same category differ. This firstly increases the difficulty and complexity of model learning, and secondly can bias the actual image description because of the differences in feature representation, especially for features at category edges, adversely affecting the image description task.
202. Perform label extraction on the image features to generate corresponding image labels.
Specifically, step 202 comprises: inputting the image features into a multi-label classification model for label extraction, generating at least one corresponding image label.
For example, for an image of a child flying a kite on a lawn, performing label extraction on the image yields the two labels "child" and "kite".
It should be noted that, compared with an object detection model, a multi-label classification model has the advantages of a simpler model structure, simpler training data annotation, richer data, and higher model accuracy; at the same time, a multi-label classification model consolidates and presents the objects and scene in the image, which better matches the way humans describe images.
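A minimal sketch of such a multi-label classifier (an assumed design: one independent sigmoid score per label with a fixed threshold; the patent does not fix the classifier's internals):

```python
import torch
import torch.nn as nn

class MultiLabelClassifier(nn.Module):
    """Predicts a set of image labels from the extracted image features."""
    def __init__(self, feat_dim=2048, num_labels=1000):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_labels)

    def forward(self, feats):                  # feats: [batch, n, feat_dim]
        pooled = feats.mean(dim=1)             # pool the P*Q feature grid
        return torch.sigmoid(self.fc(pooled))  # one independent score per label

clf = MultiLabelClassifier()
scores = clf(torch.randn(1, 49, 2048))
labels = (scores > 0.5).nonzero()  # e.g. indices for "child" and "kite"
```

Because each label gets its own sigmoid score, the model can output more than one label per image, matching the multi-label setting described above.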
203. Input the image features and image labels of the target image into the encoder of the image description model to generate the feature matrix corresponding to the target image.
The encoder comprises at least one coding layer.
For the case where the encoder comprises one coding layer, step 203 comprises the following steps S2031 to S2032:
S2031. Preprocess the image features and image labels of the target image respectively to generate preprocessed image features and label vectors.
Here, relative position encoding (positional encoding) is applied to the image features of the target image to obtain the preprocessed image features. Specifically, with relative position encoding the encoder adds a feature to each input image feature, so that the position of each image feature, or the distance between different image features, can be determined.
In the case where the input image features are two-dimensional features of size height * width, the generated preprocessed image features are still two-dimensional features of size height * width. For example, for image features of size P*Q*L1, the generated preprocessed image features are (v1, v2, ..., vn), where n = P*Q and each vn is a one-dimensional vector of L1 numbers.
For the image labels, embedding-layer (embedding) processing is applied to obtain the label vectors. For example, the image labels of an image can be "apple" and "football"; the label vectors are then the one-dimensional vectors corresponding to "apple" and "football".
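The two preprocessing branches might be sketched as follows (illustrative only; the patent does not give the positional encoding formula, so the standard sinusoidal form is assumed, and the label vocabulary size is made up):

```python
import math
import torch
import torch.nn as nn

def positional_encoding(n, dim):
    """Standard sinusoidal encoding, added so that each of the n image
    features v1..vn carries its position in the P*Q grid."""
    pos = torch.arange(n).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    pe = torch.zeros(n, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

img_feats = torch.randn(49, 512)                 # n = P*Q features of dimension L1
pre_feats = img_feats + positional_encoding(49, 512)  # preprocessed image features

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=512)
label_ids = torch.tensor([17, 305])              # e.g. ids for "apple", "football"
label_vecs = embedding(label_ids)                # label vectors u1, u2
```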
S2032. Input the preprocessed image features and label vectors into the coding layer, and take the output features of the coding layer as the feature matrix corresponding to the target image.
Step S2032 comprises: inputting the preprocessed image features into the first self-attention layer of the coding layer for processing, generating first self-attention features; inputting the first self-attention features and the label vectors into the first multi-head attention layer of the coding layer, generating first fusion features; and passing the first fusion features through the first feed-forward layer to generate the output features of the coding layer.
For the first self-attention layer, the preprocessed image features can serve as the key-value feature pair and also as the query feature, and self-attention is then computed.
For the first multi-head attention layer, the label vectors serve as the key-value feature pair, and the first self-attention features serve as the query feature.
The first self-attention features or first fusion features can be expressed as:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

wherein d_k is the smoothing factor.
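A sketch of this scaled attention computation in matrix form (single head shown for brevity; the shapes are illustrative):

```python
import math
import torch

def attention(query, key, value):
    """Attention(Q, K, V) = softmax(Q Kᵀ / sqrt(d_k)) V."""
    d_k = query.size(-1)  # the smoothing factor
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ value

pre_feats = torch.randn(49, 512)   # preprocessed image features
label_vecs = torch.randn(2, 512)   # label vectors u1, u2

# First self-attention layer: image features are query, key, and value.
self_attn = attention(pre_feats, pre_feats, pre_feats)
# First multi-head attention layer (one head shown): labels are key/value,
# the self-attention features are the query.
fused = attention(self_attn, label_vecs, label_vecs)  # first fusion features
```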
For the case where the encoder comprises multiple coding layers, referring to Fig. 3, step 203 comprises the following steps 301 to 305:
301. Preprocess the image features and image labels of the target image respectively to generate preprocessed image features and label vectors.
302. Input the preprocessed image features and label vectors into the first coding layer to obtain the output features of the first coding layer.
303. Input the output features of the (i-1)-th coding layer and the label vectors into the i-th coding layer to obtain the output features of the i-th coding layer.
304. Increment i by 1 and determine whether the incremented i is less than N; if so, execute step 303; if not, execute step 305.
305. Take the output features of the N-th coding layer as the feature matrix corresponding to the target image.
More specifically, referring to Fig. 4, each coding layer comprises a first self-attention layer, a first multi-head attention layer, and a first feed-forward layer.
Step 302 comprises: inputting the preprocessed image features into the first self-attention layer of the first coding layer for processing, generating first self-attention features; inputting the first self-attention features and the label vectors into the first multi-head attention layer of the first coding layer, generating first fusion features; and passing the first fusion features through the first feed-forward layer to generate the output features of the first coding layer.
Step 303 comprises: inputting the output features of the (i-1)-th coding layer into the first self-attention layer of the i-th coding layer for processing, generating first self-attention features; inputting the first self-attention features and the label vectors into the first multi-head attention layer of the i-th coding layer, generating first fusion features; and passing the first fusion features through the first feed-forward layer to generate the output features of the i-th coding layer.
The output features of each coding layer form a three-dimensional tensor of shape [batch, seq_length, hidden_dim], where batch is the batch size, seq_length is the number of labels or the size (height * width) of the processed feature map, and hidden_dim carries the label content or image feature information fused by the coding layer.
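Putting steps 301 to 305 together, a coding layer and the encoder stack might be sketched as follows (illustrative; residual connections and layer normalization are omitted for brevity, and N = 6 is an assumption, not a value from the patent):

```python
import torch
import torch.nn as nn

class CodingLayer(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.label_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 2048), nn.ReLU(), nn.Linear(2048, dim))

    def forward(self, x, labels):
        sa, _ = self.self_attn(x, x, x)                  # first self-attention features
        fused, _ = self.label_attn(sa, labels, labels)   # first fusion features
        return self.ffn(fused)                           # output features of this layer

layers = nn.ModuleList([CodingLayer() for _ in range(6)])  # N = 6 coding layers
x = torch.randn(1, 49, 512)   # preprocessed image features [batch, seq, hidden]
u = torch.randn(1, 2, 512)    # label vectors
for layer in layers:
    x = layer(x, u)           # steps 302/303: the labels enter every layer
feature_matrix = x            # step 305
```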
In addition, since the extracted image features contain redundant data, the generated preprocessed image features also contain redundant data. During encoding, the preprocessed image features of the target image can be adjusted using the image labels, reducing redundant data and making the feature representation more accurate. For example, a specific region of an image shows a "flower", but the preprocessed image feature v1 of the target image differs from the label vector u1 corresponding to the label "flower". Since the label vector of the label "flower" is more accurate, the label vector u1 directly substitutes for the preprocessed image feature v1 of the target image, thereby reducing the generation of redundant data.
204. Input the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image.
Specifically, step 204 comprises:
S2041. Input a reference decoding vector and the feature matrix into the decoder for decoding to obtain the decoding vector output by the decoder.
Specifically, for a decoder comprising M sequentially connected decoding layers, referring to Fig. 5, step S2041 comprises:
501. Input the reference decoding vector and the feature matrix into the first decoding layer to obtain the output vector of the first decoding layer.
502. Input the output vector of the (j-1)-th decoding layer and the feature matrix into the j-th decoding layer to obtain the output vector of the j-th decoding layer, where 2 <= j <= M.
503. Increment j by 1 and determine whether the incremented j is less than M; if so, execute step 502; if not, continue to step 504.
504. Take the output vector of the M-th decoding layer as the decoding vector corresponding to the target image.
S2042. Perform linearization and normalization on the decoding vector to generate the image description sentence corresponding to the target image.
Specifically, for each decoding step, linearization and normalization are performed on the decoding vector to generate the word corresponding to the target image, and the current decoding vector serves as the reference decoding vector for the next decoding step. Finally, the image description sentence is generated from the multiple words corresponding to the target image.
Through linear (linear-layer) processing, the decoding vector can be mapped to a linear vector.
The normalization can take many forms; this embodiment preferably uses softmax, so that the statistical probabilities are distributed in [0, 1], and the word corresponding to each decoding vector is determined by these probabilities.
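The word-by-word decoding described above can be sketched as a greedy loop (a sketch only; `decoder`, `linear`, the start vector, and the end-of-sentence id are assumptions introduced for illustration):

```python
import torch

def greedy_decode(decoder, linear, feature_matrix, start_vec, max_len=20, end_id=2):
    """One description word per step (steps S2041 + S2042)."""
    ref = start_vec                        # preset initial reference decoding vector
    words = []
    for _ in range(max_len):
        dec = decoder(ref, feature_matrix)          # decoding vector of this step
        probs = torch.softmax(linear(dec), dim=-1)  # linearization + softmax
        word_id = int(probs.argmax(dim=-1))         # word chosen by probability
        if word_id == end_id:
            break
        words.append(word_id)
        ref = dec   # the current decoding vector guides the next step
    return words
```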
In the image description method provided by this application, label extraction is performed on the image features of the target image to generate corresponding image labels, and the image features and image labels of the target image are input into the image description model to obtain the image description sentence corresponding to the target image. When generating the image description sentence, the image description model can thus refer to the information of specific, reliable image labels, so that the generated sentence contains more key information, improving the accuracy and reliability of the image description sentence; and because the reliable image labels serve as guidance in the generation phase of the image description sentence, the generation of redundant data is reduced.
Furthermore, the reduction of redundant data can have the following positive effects on the image description of this embodiment:
1) It can make the model easier to converge.
2) The image description becomes more controllable (or visualizable); the categories involved can be used to control issues such as the legality of the image description.
3) The image description can be more accurate and reliable, reducing the influence of irrelevant data.
To further illustrate the image description method of the embodiments of this application, Fig. 6 shows a specific schematic diagram of the model framework implementing the image description method of this embodiment.
Fig. 6 includes three models: a feature extraction model (CNN), a multi-label classification model, and a Transformer model. The target image in Fig. 6 shows a diver diving in the sea, with a sea turtle at the lower left.
The method of this embodiment comprises:
1) Extract image features V from the target image.
2) Perform label extraction on the image features to generate corresponding image labels U.
3) Input the image features V and image labels U of the target image into the encoder of the image description model to generate the feature matrix corresponding to the target image.
Specifically, step 3) comprises the following steps S11 to S15:
S11. Preprocess the image features V and image labels U of the target image respectively to generate preprocessed image features {v1, v2, ..., vn} and label vectors {u1, u2}.
S12. Input the preprocessed image features {v1, v2, ..., vn} and label vectors {u1, u2} into the first coding layer to obtain the output features of the first coding layer.
S13. Input the output features of the (i-1)-th coding layer and the label vectors {u1, u2} into the i-th coding layer to obtain the output features of the i-th coding layer.
S14. Increment i by 1 and determine whether the incremented i is less than N; if so, execute step S13; if not, execute step S15.
S15. Take the output features of the N-th coding layer as the feature matrix corresponding to the target image.
4) Input the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image.
Specifically, step 4) comprises the following steps S21 to S24:
S21. Input the reference decoding vector and the feature matrix into the first decoding layer to obtain the output vector of the first decoding layer.
S22. Input the output vector of the (j-1)-th decoding layer and the feature matrix into the j-th decoding layer to obtain the output vector of the j-th decoding layer, where 2 <= j <= M.
S23. Increment j by 1 and determine whether the incremented j is less than M; if so, execute step S22; if not, continue to step S24.
S24. Take the output vector of the M-th decoding layer as the decoding vector corresponding to the target image.
5) Perform linearization and normalization on the decoding vector to generate the image description sentence corresponding to the target image.
Specifically, the first decoding vector is linearized and normalized to generate the first description word of the target image;
taking the first decoding vector as the reference decoding vector, the above steps S21 to S24 are repeated to obtain the second decoding vector; the second decoding vector is linearized and normalized to generate the second description word of the target image;
...
And so on. In the original Chinese, the description words obtained one per step are the characters for "a", "diver", "at", "the seabed", "observes", and "sea turtle", and the image description sentence finally obtained is "A diver observes a sea turtle on the seabed."
An embodiment of the present application also discloses a training method for an image description model, in which sample images and sample image description sentences are input into the image description model as the training set.
Referring to Fig. 7, the training method comprises:
701. Extract image features from a sample image.
702. Perform label extraction on the image features to generate corresponding image labels.
703. Input the image features and image labels of the sample image, together with the sample image description sentence corresponding to the sample image, into the image description model, and train the image description model until the training stop condition is reached.
The training stop condition comprises: comparing the decoding vector generated by the image description model with a preset vector validation set, and the change rate of the error of the decoding vector being less than a stability threshold.
The stability threshold can be set according to actual needs, for example to 1%. When the error stabilizes in this way, the model can be considered trained.
Specifically, inputting the image features and image labels of the sample image, together with the sample image description sentence corresponding to the sample image, into the image description model and training the image description model comprises the following steps S7031 to S7034:
S7031. Input the image features and image labels of the sample image into the encoder of the image description model to generate the output features of the encoder.
S7032. Input a reference decoding vector and the output features into the decoder for decoding to obtain the decoding vector output by the decoder.
S7033. Perform linearization and normalization on the decoding vector to generate the image description sentence corresponding to the sample image.
S7034. Compare the error between the image description sentence corresponding to the sample image and the sample image description sentence, and adjust the parameters of the image description model.
In the image description model training method provided by this embodiment, the image features and image labels of a sample image, together with the sample image description sentence corresponding to the sample image, are input into the image description model, which is trained until the training stop condition is reached, yielding an image description model that can generate a description sentence from a target image.
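A minimal sketch of one training step covering S7031 to S7034 (the per-word cross-entropy loss is an assumed choice; the patent only requires comparing the error and adjusting the parameters):

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, img_feats, label_vecs, target_ids):
    """One pass of S7031-S7034 for a single sample image.
    target_ids holds the word ids of the sample description sentence."""
    criterion = nn.CrossEntropyLoss()
    logits = model(img_feats, label_vecs, target_ids)  # [seq_len, vocab] word scores
    loss = criterion(logits, target_ids)               # compare with sample sentence
    optimizer.zero_grad()
    loss.backward()                                    # adjust the model parameters
    optimizer.step()
    return loss.item()
```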
An embodiment of the present application also discloses an image description device, referring to Fig. 8, comprising:
a first feature extraction module 801, configured to extract image features from a target image;
a first label extraction module 802, configured to perform label extraction on the image features to generate corresponding image labels;
an encoding module 803, configured to input the image features and image labels of the target image into the encoder of an image description model to generate the feature matrix corresponding to the target image;
a decoding module 804, configured to input the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image.
Optionally, the first label extraction module 802 is specifically configured to: input the image features into a multi-label classification model for label extraction, generating at least one corresponding image label.
Optionally, the encoder comprises one coding layer, and the encoding module 803 is specifically configured to:
preprocess the image features and image labels of the target image respectively to generate preprocessed image features and label vectors;
input the preprocessed image features and label vectors into the coding layer, and take the output features of the coding layer as the feature matrix corresponding to the target image.
Optionally, the encoder comprises N sequentially connected coding layers, and the encoding module 803 specifically comprises:
a feature processing unit, configured to preprocess the image features and image labels of the target image respectively to generate preprocessed image features and label vectors;
a first encoding unit, configured to input the preprocessed image features and label vectors into the first coding layer to obtain the output features of the first coding layer;
a second encoding unit, configured to input the output features of the (i-1)-th coding layer and the label vectors into the i-th coding layer to obtain the output features of the i-th coding layer;
a judging unit, configured to increment i by 1 and determine whether the incremented i is less than N; if so, execute the second encoding unit; if not, execute the feature matrix acquiring unit;
a feature matrix acquiring unit, configured to take the output features of the N-th coding layer as the feature matrix corresponding to the target image.
Optionally, each coding layer comprises a first self-attention layer, a first multi-head attention layer, and a first feed-forward layer; the second encoding unit is configured to:
input the preprocessed image features into the first self-attention layer of the i-th coding layer for processing, generating first self-attention features;
input the first self-attention features and the label vectors into the first multi-head attention layer of the i-th coding layer, generating first fusion features;
pass the first fusion features through the first feed-forward layer to generate the output features of the i-th coding layer.
Optionally, the decoding module 804 is specifically configured to:
input a reference decoding vector and the feature matrix into the decoder for decoding to obtain the decoding vector output by the decoder;
perform linearization and normalization on the decoding vector to generate the image description sentence corresponding to the target image.
In the image description device provided by this embodiment, label extraction is performed on the image features of the target image to generate corresponding image labels, and the image features and image labels of the target image are input into the image description model to obtain the image description sentence corresponding to the target image. When generating the image description sentence, the image description model can thus refer to the information of specific, reliable image labels, so that the generated sentence contains more key information, improving the accuracy and reliability of the image description sentence; and because the reliable image labels serve as guidance in the generation phase of the image description sentence, the generation of redundant data is reduced.
The above is an exemplary scheme of the image description device of this embodiment. It should be noted that the technical scheme of the device and the technical scheme of the image description method described above belong to the same concept; for details not described in the technical scheme of the device, refer to the description of the technical scheme of the image description method above.
An embodiment of the present application discloses a training device for an image description model, referring to Fig. 9, comprising:
a second feature extraction module 901, configured to extract image features from a sample image;
a second label extraction module 902, configured to perform label extraction on the image features to generate corresponding image labels;
a training module 903, configured to input the image features and image labels of the sample image, together with the sample image description sentence corresponding to the sample image, into the image description model, and train the image description model until the training stop condition is reached.
Optionally, the training stop condition comprises: comparing the decoding vector generated by the image description model with a preset vector validation set, and the change rate of the error of the decoding vector being less than a stability threshold.
In the image description model training device provided by this embodiment, the image features and image labels of a sample image, together with the corresponding sample image description sentence, are input into the image description model, which is trained until the training stop condition is reached, yielding an image description model that can generate a description sentence from a target image.
The above is an exemplary scheme of the training device for an image description model of this embodiment. It should be noted that the technical scheme of the training device and the technical scheme of the training method described above belong to the same concept; for details not described in the technical scheme of the training device, refer to the description of the technical scheme of the training method above.
An embodiment of the application also provides a computing device storing computer instructions that, when executed by a processor, implement the steps of the image description method or the image description model training method described above.
Fig. 10 shows a structural block diagram of a computing device 100 according to an embodiment of this specification. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. The processor 120 is connected to the memory 110 through a bus 130, and a database 150 is used for saving data.
The computing device 100 also includes an access device 140, which enables the computing device 100 to communicate via one or more networks 160. Examples of these networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 140 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near-field communication (NFC) interface, and so on.
In an embodiment of this specification, the above components of the computing device 100 and other components not shown in Fig. 10 may also be connected to each other, for example through a bus. It should be understood that the structural block diagram of the computing device shown in Fig. 10 is for exemplary purposes only and is not a limitation on the scope of this specification. Those skilled in the art may add or replace other components as needed.
The computing device 100 may be any type of static or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, personal digital assistant, laptop computer, notebook computer, netbook, etc.), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smartwatch, smart glasses, etc.), or another type of mobile device, or a static computing device such as a desktop computer or PC. The computing device 100 may also be a mobile or stationary server.
An embodiment of the application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the image description method or the image description model training method described above.
The above is an exemplary scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical scheme of the storage medium and the technical scheme of the image description method or the image description model training method described above belong to the same concept; for details not described in the technical scheme of the storage medium, refer to the description of the technical scheme of the image description method or the image description model training method above.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the above method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the application is not limited by the described order of actions, because according to the application certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed in a certain embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of the application disclosed above are only intended to help illustrate the application. The alternative embodiments do not describe all the details, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made according to the content of this specification. This specification selects and specifically describes these embodiments in order to better explain the principles and practical applications of the application, so that those skilled in the art can better understand and use the application. The application is limited only by the claims and their full scope and equivalents.
Claims (12)
1. An image description method for an image description model, characterized in that the method comprises:
extracting image features from a target image;
performing label extraction on the image features to generate corresponding image labels;
inputting the image features and image labels of the target image into the encoder of the image description model to generate a feature matrix corresponding to the target image;
inputting the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image.
2. The method of claim 1, characterized in that performing label extraction on the image features to generate corresponding image labels comprises:
inputting the image features into a multi-label classification model for label extraction, generating at least one corresponding image label.
3. The method of claim 1, characterized in that the encoder comprises one coding layer;
inputting the image features and image labels of the target image into the encoder of the image description model to generate the feature matrix corresponding to the target image comprises:
preprocessing the image features and image labels of the target image respectively to generate preprocessed image features and label vectors;
inputting the preprocessed image features and label vectors into the coding layer, and taking the output features of the coding layer as the feature matrix corresponding to the target image.
4. The method of claim 1, characterized in that the encoder comprises N sequentially connected coding layers;
inputting the image features and image labels of the target image into the encoder of the image description model to generate the feature matrix corresponding to the target image comprises:
S11, preprocessing the image features and image labels of the target image respectively to generate preprocessed image features and label vectors;
S12, inputting the preprocessed image features and label vectors into the first coding layer to obtain the output features of the first coding layer;
S13, inputting the output features of the (i-1)-th coding layer and the label vectors into the i-th coding layer to obtain the output features of the i-th coding layer;
S14, incrementing i by 1 and determining whether the incremented i is less than N; if so, executing step S13; if not, executing step S15;
S15, taking the output features of the N-th coding layer as the feature matrix corresponding to the target image.
5. The method of claim 4, characterized in that each coding layer comprises a first self-attention layer, a first multi-head attention layer, and a first feed-forward layer;
inputting the preprocessed image features and label vectors into the i-th coding layer to obtain the output features of the i-th coding layer comprises:
inputting the preprocessed image features into the first self-attention layer of the i-th coding layer for processing, generating first self-attention features;
inputting the first self-attention features and the label vectors into the first multi-head attention layer of the i-th coding layer, generating first fusion features;
passing the first fusion features through the first feed-forward layer to generate the output features of the i-th coding layer.
6. The method of claim 1, characterized in that inputting the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image comprises:
inputting a reference decoding vector and the feature matrix into the decoder for decoding to obtain the decoding vector output by the decoder;
performing linearization and normalization on the decoding vector to generate the image description sentence corresponding to the target image.
7. A training method for an image description model, characterized in that the method comprises:
extracting image features from a sample image;
performing label extraction on the image features to generate corresponding image labels;
inputting the image features and image labels of the sample image, together with the sample image description sentence corresponding to the sample image, into the image description model, and training the image description model until a training stop condition is reached.
8. The method of claim 7, characterized in that the training stop condition comprises:
comparing the decoding vector generated by the image description model with a preset vector validation set, and the change rate of the error of the decoding vector being less than a stability threshold.
9. An image description device, characterized in that the device comprises:
a first feature extraction module, configured to extract image features from a target image;
a first label extraction module, configured to perform label extraction on the image features to generate corresponding image labels;
an encoding module, configured to input the image features and image labels of the target image into the encoder of an image description model to generate the feature matrix corresponding to the target image;
a decoding module, configured to input the feature matrix into the decoder of the image description model for decoding to obtain the image description sentence corresponding to the target image.
10. A training device for an image description model, wherein the device comprises:
a second feature extraction module configured to extract an image feature from a sample image;
a second tag extraction module configured to perform tag extraction on the image feature to generate a corresponding image tag;
a training module configured to input the image feature of the sample image, the image tag and the sample image description sentence corresponding to the sample image into the image description model, and to train the image description model until a training stop condition is reached.
11. A computing device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1-6 or 7-8 when executing the instructions.
12. A computer-readable storage medium storing computer instructions, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-6 or 7-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910760737.6A CN110472688A (en) | 2019-08-16 | 2019-08-16 | The method and device of iamge description, the training method of image description model and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110472688A true CN110472688A (en) | 2019-11-19 |
Family
ID=68511036
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910760737.6A Pending CN110472688A (en) | 2019-08-16 | 2019-08-16 | The method and device of iamge description, the training method of image description model and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110472688A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268629A (en) * | 2018-01-15 | 2018-07-10 | 北京市商汤科技开发有限公司 | Image Description Methods and device, equipment, medium, program based on keyword |
CN109446534A (en) * | 2018-09-21 | 2019-03-08 | 清华大学 | Machine translation method and device |
Non-Patent Citations (3)
Title |
---|
JIANGYUN LI ET AL: "Boosted Transformer for Image Captioning", Applied Sciences * |
陆泉 (LU Quan): 《图像语义信息可视化交互研究》 [Research on the Visual Interaction of Image Semantic Information], 31 July 2015, 国防图书馆出版社 * |
高扬 (GAO Yang): 《智能摘要与深度学习》 [Intelligent Summarization and Deep Learning], 30 April 2019, 北京理工大学出版社 * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111275110B (en) * | 2020-01-20 | 2023-06-09 | 北京百度网讯科技有限公司 | Image description method, device, electronic equipment and storage medium |
CN111275110A (en) * | 2020-01-20 | 2020-06-12 | 北京百度网讯科技有限公司 | Image description method and device, electronic equipment and storage medium |
CN111639594B (en) * | 2020-05-29 | 2023-09-22 | 苏州遐迩信息技术有限公司 | Training method and device for image description model |
CN111639594A (en) * | 2020-05-29 | 2020-09-08 | 苏州遐迩信息技术有限公司 | Training method and device of image description model |
CN113869337A (en) * | 2020-06-30 | 2021-12-31 | 北京金山数字娱乐科技有限公司 | Training method and device of image recognition model, and image recognition method and device |
CN111914842A (en) * | 2020-08-10 | 2020-11-10 | 深圳市视美泰技术股份有限公司 | License plate information identification method and device, computer equipment and storage medium |
CN112699948A (en) * | 2020-12-31 | 2021-04-23 | 无锡祥生医疗科技股份有限公司 | Ultrasonic breast lesion classification method and device and storage medium |
CN112818975A (en) * | 2021-01-27 | 2021-05-18 | 北京金山数字娱乐科技有限公司 | Text detection model training method and device and text detection method and device |
CN112862727A (en) * | 2021-03-16 | 2021-05-28 | 上海壁仞智能科技有限公司 | Cross-mode image conversion method and device |
CN112862727B (en) * | 2021-03-16 | 2023-06-23 | 上海壁仞智能科技有限公司 | Cross-modal image conversion method and device |
CN113095405A (en) * | 2021-04-13 | 2021-07-09 | 沈阳雅译网络技术有限公司 | Construction method of image description generation system based on pre-training and double-layer attention |
CN113095405B (en) * | 2021-04-13 | 2024-04-30 | 沈阳雅译网络技术有限公司 | Method for constructing image description generation system based on pre-training and double-layer attention |
CN113988274A (en) * | 2021-11-11 | 2022-01-28 | 电子科技大学 | Text intelligent generation method based on deep learning |
CN113988274B (en) * | 2021-11-11 | 2023-05-12 | 电子科技大学 | Text intelligent generation method based on deep learning |
WO2023116507A1 (en) * | 2021-12-22 | 2023-06-29 | 北京沃东天骏信息技术有限公司 | Target detection model training method and apparatus, and target detection method and apparatus |
WO2023134082A1 (en) * | 2022-01-11 | 2023-07-20 | 平安科技(深圳)有限公司 | Training method and apparatus for image caption statement generation module, and electronic device |
CN114358203A (en) * | 2022-01-11 | 2022-04-15 | 平安科技(深圳)有限公司 | Training method and device for image description sentence generation module and electronic equipment |
CN114358203B (en) * | 2022-01-11 | 2024-09-27 | 平安科技(深圳)有限公司 | Training method and device for image description sentence generation module and electronic equipment |
CN114627353A (en) * | 2022-03-21 | 2022-06-14 | 北京有竹居网络技术有限公司 | Image description generation method, device, equipment, medium and product |
CN114627353B (en) * | 2022-03-21 | 2023-12-12 | 北京有竹居网络技术有限公司 | Image description generation method, device, equipment, medium and product |
CN114881242A (en) * | 2022-04-21 | 2022-08-09 | 西南石油大学 | Image description method and system based on deep learning, medium and electronic equipment |
CN114743018A (en) * | 2022-04-21 | 2022-07-12 | 平安科技(深圳)有限公司 | Image description generation method, device, equipment and medium |
CN114743018B (en) * | 2022-04-21 | 2024-05-31 | 平安科技(深圳)有限公司 | Image description generation method, device, equipment and medium |
CN114842299A (en) * | 2022-05-10 | 2022-08-02 | 平安科技(深圳)有限公司 | Training method, device, equipment and medium for image description information generation model |
CN114821271A (en) * | 2022-05-19 | 2022-07-29 | 平安科技(深圳)有限公司 | Model training method, image description generation device and storage medium |
CN116778011A (en) * | 2023-05-22 | 2023-09-19 | 阿里巴巴(中国)有限公司 | Image generating method |
CN116778011B (en) * | 2023-05-22 | 2024-05-24 | 阿里巴巴(中国)有限公司 | Image generating method |
Similar Documents
Publication | Title |
---|---|
CN110472688A (en) | The method and device of iamge description, the training method of image description model and device |
CN109977428A (en) | A kind of method and device that answer obtains |
CN110781663A (en) | Training method and device of text analysis model and text analysis method and device |
CN109190131A (en) | A kind of English word and its capital and small letter unified prediction based on neural machine translation |
CN111985239A (en) | Entity identification method and device, electronic equipment and storage medium |
CN113609965B (en) | Training method and device of character recognition model, storage medium and electronic equipment |
CN114549850B (en) | Multi-mode image aesthetic quality evaluation method for solving modal missing problem |
CN113609326B (en) | Image description generation method based on relationship between external knowledge and target |
CN110765791A (en) | Automatic post-editing method and device for machine translation |
CN114580424B (en) | Labeling method and device for named entity identification of legal document |
CN113408287B (en) | Entity identification method and device, electronic equipment and storage medium |
CN109214407A (en) | Event detection model, calculates equipment and storage medium at method, apparatus |
CN115731552A (en) | Stamp character recognition method and device, processor and electronic equipment |
CN117540221A (en) | Image processing method and device, storage medium and electronic equipment |
CN114445832A (en) | Character image recognition method and device based on global semantics and computer equipment |
CN117762499A (en) | Task instruction construction method and task processing method |
CN116975288A (en) | Text processing method and text processing model training method |
CN110570484B (en) | Text-guided image coloring method under image decoupling representation |
CN111008531B (en) | Training method and device for sentence selection model, sentence selection method and device |
CN112084788A (en) | Automatic marking method and system for implicit emotional tendency of image captions |
CN117473359A (en) | Training method and related device of abstract generation model |
CN117093864A (en) | Text generation model training method and device |
CN113792550B (en) | Method and device for determining predicted answers, reading and understanding method and device |
CN109871946A (en) | A kind of application method and device, training method and device of neural network model |
CN115455144A (en) | Data enhancement method of completion type space filling type for small sample intention recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20191119 |