CN110223295A - Saliency prediction method and device based on deep neural network color perception - Google Patents
Saliency prediction method and device based on deep neural network color perception
- Publication number
- CN110223295A CN110223295A CN201910542301.XA CN201910542301A CN110223295A CN 110223295 A CN110223295 A CN 110223295A CN 201910542301 A CN201910542301 A CN 201910542301A CN 110223295 A CN110223295 A CN 110223295A
- Authority
- CN
- China
- Prior art keywords
- image
- network
- sample image
- processing unit
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a saliency prediction method and device based on deep neural network color perception. The method includes: inputting the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image; obtaining a fused image using a feature fusion algorithm; multiplying the feature map of the fused image with the fused image to obtain the predicted saliency map; judging whether the value of the cross-entropy loss function converges; if so, taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on the image to be detected; if not, adjusting the model weights and hyperparameters until convergence. With the embodiments of the present invention, saliency prediction consistent with human visual perception can be achieved.
Description
Technical field
The present invention relates to a saliency prediction method and device, and more particularly to a saliency prediction method and device based on deep neural network color perception.
Background art
In the field of computer vision, enabling a computer to emulate the human eye and quickly and accurately find objects of interest in a visual scene is an important research area. The process by which a computer finds the positions of attended objects in a visual scene is defined as saliency prediction.
At present, saliency prediction methods have been widely applied in fields such as image compression, target recognition, and image segmentation, and have achieved remarkable results. Saliency in a visual scene can arise from a series of stimuli, including low-level image attributes such as color, orientation, and size, as well as semantic information. Color is considered one of the main features for computing top-down saliency. To date, some existing models have taken into account that different colors guide human attention differently and have incorporated these differences into attention prediction models. However, these conclusions about color differences were drawn from subjective experiments involving only a small number of colors and cannot cover the full range of colors found in nature. Existing research models of how color influences attention therefore cannot be extended to saliency prediction in natural visual scenes, and the prior art suffers from the technical problem of inaccurate prediction.
Summary of the invention
The technical problem to be solved by the present invention is to provide a saliency prediction method and device based on deep neural network color perception, so as to solve the technical problem of inaccurate prediction in the prior art.
The present invention solves the above technical problem through the following technical solutions:
An embodiment of the present invention provides a saliency prediction method based on deep neural network color perception. The method includes:
1) for each sample image in an acquired sample set of color images, converting the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
2) inputting the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
3) fusing the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
4) identifying the feature map of the fused image using a pre-established channel weighting sub-network; multiplying the feature map of the fused image with the fused image to obtain a target image; and convolving the target image with a preset convolution kernel to obtain the predicted saliency map;
5) obtaining the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judging whether the value of the cross-entropy loss function converges;
6) if so, taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on an image to be detected;
7) if not, adjusting the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and returning to step 2) until the value of the cross-entropy loss function converges.
Optionally, the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
Optionally, the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Optionally, the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Optionally, before step 3), the method further includes: performing an upsampling process on the second feature map to obtain an upsampled second feature map; and step 3) includes: fusing the first feature map with the upsampled second feature map using a feature fusion algorithm to obtain the fused image.
Optionally, the channel weighting sub-network includes, arranged in sequence: a max pooling layer of size 2×2 with stride 2, a feature flattening layer, and a fully connected layer.
An embodiment of the present invention also provides a saliency prediction device based on deep neural network color perception. The device includes:
a conversion module, configured to convert, for each sample image in an acquired sample set of color images, the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
an input module, configured to input the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
a fusion module, configured to fuse the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
a convolution module, configured to identify the feature map of the fused image using a pre-established channel weighting sub-network; multiply the feature map of the fused image with the fused image to obtain a target image; and convolve the target image with a preset convolution kernel to obtain the predicted saliency map;
a judgment module, configured to obtain the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judge whether the value of the cross-entropy loss function converges;
a detection module, configured to, when the judgment result of the judgment module is yes, take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on an image to be detected;
an adjustment module, configured to, when the judgment result of the judgment module is no, adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and trigger the input module.
Optionally, the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
Optionally, the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Optionally, the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Optionally, the device further includes an upsampling module, configured to perform an upsampling process on the second feature map to obtain an upsampled second feature map; the fusion module is then configured to fuse the first feature map with the upsampled second feature map using a feature fusion algorithm to obtain the fused image.
Optionally, the channel weighting sub-network includes, arranged in sequence: a max pooling layer of size 2×2 with stride 2, a feature flattening layer, and a fully connected layer.
Compared with the prior art, the present invention has the following advantages:
With the embodiments of the present invention, multi-scale information of the color image is captured by the two-stream input network; the channel weighting sub-network then applies a max pooling layer over the concatenated feature map channels to reduce their dimensionality and spatial variance and to encode the relative importance of objects in the image, thereby achieving saliency prediction consistent with human visual perception.
Brief description of the drawings
Fig. 1 is a flow diagram of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the two-stream VGG16 network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of the two-stream VGG16 network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention;
Fig. 4 is a schematic comparison, using the AUC-Judd metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 5 is a schematic comparison, using the AUC-Borji metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 6 is a schematic comparison, using the sAUC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 7 is a schematic comparison, using the NSS metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 8 is a schematic comparison, using the IG metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 9 is a schematic comparison, using the CC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 10 is a structural schematic diagram of the saliency prediction device based on deep neural network color perception provided by an embodiment of the present invention.
Specific embodiments
The embodiments of the present invention are described in detail below. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementation methods and specific operation processes are given, but the protection scope of the present invention is not limited to the following embodiments.
The embodiments of the present invention provide a saliency prediction method and device based on deep neural network color perception; the saliency prediction method based on deep neural network color perception provided by the embodiments of the present invention is introduced first.
Fig. 1 is a flow diagram of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
S101: for each sample image in an acquired sample set of color images, converting the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image.
Specifically, the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
In practical applications, the weights and biases of the VGG-16 model are first initialized on the ImageNet dataset. Then, the training and validation sets of the SALICON dataset are used as training data for the color-adaptive saliency prediction network, based on an attention-mechanism deep neural network, of the embodiment of the present invention. The dataset contains 15,000 color images and their corresponding 15,000 ground-truth label images. Each of the 15,000 color images is cropped to a pixel size of 1000 × 750 as the fine-granularity sample image and to a pixel size of 500 × 375 as the coarse-granularity sample image. The 15,000 ground-truth label images are cropped to a size of 32 × 24 as the labels for training the network.
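A minimal sketch of this data preparation step, assuming PIL is available; the description crops the images to these sizes, while this sketch simply resizes, and the function name and paths are illustrative:

```python
from PIL import Image

def prepare_sample(image_path, label_path):
    """Produce the two input scales and the training label described above."""
    img = Image.open(image_path).convert("RGB")
    fine = img.resize((1000, 750))    # fine-granularity input, 1000 x 750 x 3
    coarse = img.resize((500, 375))   # coarse-granularity input, 500 x 375 x 3
    label = Image.open(label_path).convert("L").resize((32, 24))  # training label
    return fine, coarse, label
```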
S102: inputting the fine-granularity sample image into the preset first VGG network and the coarse-granularity sample image into the preset second VGG network, to obtain the first feature map corresponding to the coarse-granularity sample image and the second feature map corresponding to the fine-granularity sample image.
Specifically, the first VGG network includes five processing units connected in series, wherein, arranged in the direction of data flow, the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer, and the third, fourth and fifth processing units each include three convolution kernels and one max pooling layer. The second VGG network likewise includes five processing units connected in series, wherein, arranged in the direction of data flow, the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer, and the third, fourth and fifth processing units each include three convolution kernels and one max pooling layer.
In practical applications, Fig. 2 shows the structure of the two-stream VGG16 network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention. As shown in Fig. 2, the input of the network is designed as a two-stream network composed of two VGG-16 networks.
The first VGG network performs feature extraction at the fine granularity of the image: the fine image of 1000 × 750 × 3 pixels is fed into the first VGG network to extract relatively high-resolution deep features, while the coarser-scale image of 500 × 375 × 3 pixels is fed into the second VGG network to extract comparatively low-resolution deep features.
Each VGG-16 network used for feature extraction in the two-stream network includes five processing units connected in series: the first and second processing units each include two convolution kernels of size 3 × 3 with stride 1 and one max pooling layer, and the third, fourth and fifth processing units each include three convolution kernels of size 3 × 3 with stride 1 and one max pooling layer. The max pooling layers in the VGG16 network use a 2 × 2 window with stride 2. The activation function of all hidden layers is the ReLU function.
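A sketch of one such feature-extraction stream in PyTorch, following the 2-2-3-3-3 convolution layout described above; the channel widths are those of the standard VGG-16 configuration, which the description does not spell out, so they are an assumption here:

```python
import torch.nn as nn

def vgg16_stream():
    """One of the two VGG-16 feature-extraction streams: five serial processing
    units, each ending in a 2x2 max pooling layer with stride 2."""
    cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]
    layers = []
    for in_ch, out_ch, n_convs in cfg:
        for i in range(n_convs):
            layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                    kernel_size=3, stride=1, padding=1))
            layers.append(nn.ReLU(inplace=True))  # ReLU on all hidden layers
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
```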
S103: fusing the first feature map and the second feature map using a feature fusion algorithm to obtain the fused image.
Specifically, the outputs of the two-stream VGG-16 network, i.e. the first VGG network and the second VGG network, can be adjusted to the same spatial resolution.
For example, the feature map output by the stream corresponding to the second VGG16 network has size 16 × 12 and dimension 512; an upsampling operation is used to obtain a 512-dimensional feature map of size 32 × 24, which is then fused, using a feature fusion algorithm, with the 512-dimensional feature map of size 32 × 24 output by the first stream, forming a 1024-dimensional feature map of size 32 × 24 as the fused image.
In practical applications, the feature fusion algorithm is prior art and is not repeated here.
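A sketch of this fusion step, under the assumption that "feature fusion" here means bilinear upsampling followed by channel-wise concatenation (the description leaves the exact algorithm to the prior art):

```python
import torch
import torch.nn.functional as F

def fuse_features(coarse_feat, fine_feat):
    """Upsample the 512-channel 16x12 coarse-stream output to 32x24 and
    concatenate it with the 512-channel 32x24 fine-stream output."""
    up = F.interpolate(coarse_feat, size=fine_feat.shape[-2:],
                       mode="bilinear", align_corners=False)
    return torch.cat([up, fine_feat], dim=1)  # 1024-channel fused feature map
```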
S104: identifying the feature map of the fused image using the pre-established channel weighting sub-network; multiplying the feature map of the fused image with the fused image to obtain the target image; and convolving the target image with a preset convolution kernel to obtain the predicted saliency map.
Illustratively, Fig. 3 shows the structure of the two-stream VGG16 network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention; the dashed box in Fig. 3 is the channel weighting sub-network. To capture the relative importance of the semantic features of an image, the embodiment of the present invention designs a channel weighting sub-network that computes a set of 1024-dimensional feature weights for each image. The channel weighting sub-network consists of a max pooling layer with a 2 × 2 window and stride 2, a feature flattening layer, and a fully connected layer. The feature flattening layer "squeezes" the multi-dimensional input into one dimension: it flattens the output of the max pooling layer, of dimension 1024 × 16 × 12, into a one-dimensional vector of size 1 × 196608. The output of the fully connected layer is a matrix of dimension 1 × 1024.
The role of the max pooling layer is to apply 2 × 2 max pooling over the 1024 concatenated feature map channels to reduce their dimensionality and spatial variance. The output is then flattened, and the fully connected layer finally computes a 1024-dimensional vector in which each dimension indicates the saliency weight of the corresponding input channel. The fully connected layer learns the relative weights of the different object regions in the scene based on their spatial position information and semantic features, encoding contextual information so that the network can highlight objects whose color stands out from the surrounding environment.
The output of the channel weighting sub-network is multiplied with the 1024-dimensional features produced by feature fusion of the two-stream VGG-16 network outputs; the weighted result is a 1024-dimensional feature image of size 32 × 24, which is the target image.
Then, a convolution kernel of size 1 × 1 is used to convolve the 1024-dimensional 2D image of size 32 × 24, converting it into a single-channel 2D saliency map of size 32 × 24; finally, the network resizes this saliency map back to the dimensions of the original image, giving the predicted saliency map. For example, the single-channel 2D saliency map of size 32 × 24 obtained after the convolution operation is resized to the original image size: if an image of size 800 × 600 is input to the network for prediction, the saliency image is resized to 800 × 600.
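A sketch of the channel weighting sub-network and prediction head as described above; the module name is illustrative, and the bilinear resize at the end is an assumption, since the description does not name the interpolation mode:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelWeightingHead(nn.Module):
    """Channel weighting sub-network plus prediction head as described above."""
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 1024x24x32 -> 1024x12x16
        self.fc = nn.Linear(1024 * 12 * 16, 1024)          # 196608 -> 1024 channel weights
        self.conv1x1 = nn.Conv2d(1024, 1, kernel_size=1)   # 1024 channels -> 1 channel

    def forward(self, fused, out_size):
        w = self.fc(torch.flatten(self.pool(fused), 1))    # one weight per channel
        target = fused * w.view(-1, 1024, 1, 1)            # channel-wise multiplication
        saliency = self.conv1x1(target)                    # 32x24 single-channel map
        return F.interpolate(saliency, size=out_size,      # resize to original image size
                             mode="bilinear", align_corners=False)
```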
S105: obtaining the cross-entropy loss between the predicted saliency map and the human fixation map corresponding to the sample image, and judging whether the value of the cross-entropy loss function converges; if so, executing S106; if not, executing S107.
The human fixation map refers to the image formed by the salient regions of the sample image that are captured by a person's eyes when the sample image is shown to the observer.
It should be noted that the cross-entropy loss function is prior art and is not described again here.
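A sketch of the training criterion, assuming the standard per-pixel binary cross-entropy between the predicted map and the normalized fixation map (the description does not fix the exact variant):

```python
import torch
import torch.nn.functional as F

def saliency_loss(pred, fixation_map):
    """Per-pixel binary cross-entropy between the predicted saliency map and
    the human fixation map, both in [0, 1]."""
    return F.binary_cross_entropy(torch.sigmoid(pred), fixation_map)
```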
S106: taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on the image to be detected.
Illustratively, in the process of performing saliency prediction on the image to be detected with the target network model, prior color weights can first be obtained in advance through the following steps:
A: 200 images of richly colored natural scenes are collected, and these 200 color images are then converted into grayscale images.
B: according to a preset display duration for each image group, e.g. 5 seconds, a SensoMotoric Instruments (SMI) iView X RED eye tracker system with a sampling frequency of 250 Hz is used to acquire eye movement data from 18 observers aged 22-29 while they view the above color images and grayscale images.
C: on the basis of the eye movement data, human fixation maps are obtained and normalized. This step is prior art and is not described again here.
D: the attention score of each pixel in the normalized human fixation map is defined as the gray value of the corresponding pixel of the fixation map; the value range of the attention score is [0, 1]. It should be noted that each normalized image group includes a color image and its corresponding grayscale image.
E: for each image group, the difference between the attention score of a pixel in the color image and the attention score of the corresponding pixel in the grayscale image is computed; when the difference is greater than a set value, e.g. 0.1, the color corresponding to that pixel is taken as a salient color, and the difference is taken as the saliency attention score of that pixel.
F: the other pixels are processed in the same way, yielding all the salient colors and the saliency attention scores of their corresponding pixels.
Step S106 may include the following steps:
First, the saliency map is obtained by prediction with the target network model.
Then, for each pixel of the color image input to the target network model, if the color of the pixel belongs to the salient colors determined in steps A-F, the prior color weight of the pixel is set to the corresponding saliency attention score; if the pixel does not belong to a salient color, the prior color weight of the pixel is set to a preset value, e.g. 0.1.
Finally, the saliency value of each pixel in the saliency map predicted by the target network model is multiplied by its prior color weight, and the saliency map obtained after normalization is taken as the final prediction result.
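A sketch of this prior color weighting at prediction time; storing the salient colors from steps A-F as a dictionary keyed by (quantized) RGB tuples is an assumption, while the default weight of 0.1 follows the description:

```python
import numpy as np

def apply_prior_color_weights(saliency, image, salient_colors, default_w=0.1):
    """Multiply each pixel's saliency by its prior color weight (steps A-F),
    then renormalize to obtain the final prediction."""
    weights = np.full(saliency.shape, default_w, dtype=np.float32)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            color = tuple(image[y, x])       # e.g. a quantized (R, G, B) triple
            if color in salient_colors:      # salient color determined in A-F
                weights[y, x] = salient_colors[color]
    weighted = saliency * weights
    return weighted / (weighted.max() + 1e-8)  # normalized final saliency map
```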
With the above embodiment of the present invention, the accuracy of the prediction result can be improved.
S107: adjusting the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and returning to step S102, until the value of the cross-entropy loss function converges.
Illustratively, during the adjustment of the model weights and/or hyperparameters, the momentum is set to 0.9, the weight decay is set to 0.00005, the learning rate is set to 0.00005, and the batch size is set to 32.
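A sketch of a matching optimizer configuration; SGD is an assumption here, as the description names momentum, weight decay and learning rate but not the optimizer itself:

```python
import torch

def make_optimizer(model):
    """Optimizer with the hyperparameters given above (momentum 0.9,
    weight decay 0.00005, learning rate 0.00005); batch size is 32."""
    return torch.optim.SGD(model.parameters(), lr=0.00005,
                           momentum=0.9, weight_decay=0.00005)
```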
In addition, convergence of the value of the cross-entropy loss function means that the value of the cross-entropy loss function is less than a preset threshold.
To illustrate the technical effect of the embodiments of the present invention, on the public saliency dataset CAT2000, the AUC-Judd (Judd Area Under Curve), AUC-Borji, sAUC (shuffled AUC), NSS (Normalized Scanpath Saliency), IG (Information Gain), and CC (Linear Correlation Coefficient) evaluation methods are used to compare the saliency prediction results of the embodiment of the present invention with those of the following six saliency prediction methods:
[1] S. Fan, Z. Shen, M. Jiang, B. L. Koenig, J. Xu, M. S. Kankanhalli, and Q. Zhao, "Emotional attention: A study of image sentiment and visual attention," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7521–7531.
[2] X. Huang, C. Shen, X. Boix, and Q. Zhao, "Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks," in IEEE International Conference on Computer Vision, 2015, pp. 262–270.
[3] J. Pan, C. C. Ferrer, K. McGuinness, N. E. O'Connor, J. Torres, E. Sayrol, and X. Giro-i-Nieto, "Salgan: Visual saliency prediction with generative adversarial networks," arXiv preprint arXiv:1701.01081, 2017.
[4] M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, "A deep multi-level network for saliency prediction," in Pattern Recognition (ICPR), 2016 23rd International Conference on. IEEE, 2016, pp. 3488–3493.
[5] J. Zhang and S. Sclaroff, "Saliency detection: A boolean map approach," in IEEE International Conference on Computer Vision, 2013.
[6] H. Tang, C. Chen, and X. Pei, "Visual saliency detection via sparse residual and outlier detection," IEEE Signal Processing Letters, vol. 23, no. 12, pp. 1736–1740, 2016.
Fig. 4 is a schematic comparison, using the AUC-Judd metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 4, the AUC-Judd score of the embodiment of the present invention reaches 0.83, close to the AUC-Judd score of the above prior art [1] and higher than the AUC-Judd scores of the other prior-art methods.
Fig. 5 is a schematic comparison, using the AUC-Borji metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 5, the AUC-Borji score of the embodiment of the present invention reaches 0.80, second only to the AUC-Borji score of reference [3] and higher than the AUC-Borji scores of the other prior-art methods.
Fig. 6 is a schematic comparison, using the sAUC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 6, the sAUC score of the embodiment of the present invention reaches 0.79 and is higher than the sAUC score of each prior-art method.
Fig. 7 is a schematic comparison, using the NSS metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 7, the NSS score of the embodiment of the present invention reaches 1.5, on par with the prior art [1] and higher than the other prior-art methods.
Fig. 8 is a schematic comparison, using the IG metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 8, the IG score of the embodiment of the present invention reaches 0.37, on par with the prior art [1] and higher than the other prior-art methods.
Fig. 9 is a schematic comparison, using the CC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 9, the CC score of the embodiment of the present invention is on par with the prior art [1] and higher than the other prior-art methods.
It should be noted that the AUC-Judd metric is the area under the curve in the Judd variant; the AUC-Borji metric is the area under the curve in the Borji variant; NSS is Normalized Scanpath Saliency; IG is Information Gain; CC is the linear Correlation Coefficient.
With the embodiment of the present invention shown in Fig. 1, multi-scale information of the color image is captured by the two-stream input network; the channel weighting sub-network then applies a max pooling layer over the concatenated feature map channels to reduce their dimensionality and spatial variance and to encode the relative importance of objects in the image, thereby achieving saliency prediction consistent with human visual perception.
Corresponding to the embodiment of the present invention shown in Fig. 1, an embodiment of the present invention provides a saliency prediction device based on deep neural network color perception.
Figure 10 is a structural schematic diagram of the saliency prediction device based on deep neural network color perception provided by an embodiment of the present invention. As shown in Figure 10, the device includes:
a conversion module 101, configured to convert, for each sample image in an acquired sample set of color images, the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
an input module 102, configured to input the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
a fusion module 103, configured to fuse the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
a convolution module 104, configured to identify the feature map of the fused image using a pre-established channel weighting sub-network; multiply the feature map of the fused image with the fused image to obtain a target image; and convolve the target image with a preset convolution kernel to obtain the predicted saliency map;
a judgment module 105, configured to obtain the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judge whether the value of the cross-entropy loss function converges;
a detection module 106, configured to, when the judgment result of the judgment module is yes, take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on the image to be detected;
an adjustment module 107, configured to, when the judgment result of the judgment module is no, adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and trigger the input module.
With the embodiment of the present invention shown in Figure 10, multi-scale information of the color image is captured by the two-stream input network; the channel weighting sub-network then applies a max pooling layer over the concatenated feature map channels to reduce their dimensionality and spatial variance and to encode the relative importance of objects in the image, thereby achieving saliency prediction consistent with human visual perception.
In a specific implementation of the embodiment of the present invention, the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
In a specific implementation of the embodiment of the present invention, the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
In a specific implementation of the embodiment of the present invention, the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
In a specific implementation of the embodiment of the present invention, the device further includes an upsampling module, configured to perform an upsampling process on the second feature map to obtain an upsampled second feature map; the fusion module 103 is then configured to fuse the first feature map with the upsampled second feature map using a feature fusion algorithm to obtain the fused image.
In a specific implementation of the embodiment of the present invention, the channel weighting sub-network includes, arranged in sequence: a max pooling layer of size 2×2 with stride 2, a feature flattening layer, and a fully connected layer.
The above are merely preferred embodiments of the present invention and are not intended to limit the invention. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (10)
1. A saliency prediction method based on deep neural network color perception, characterized in that the method includes:
1) for each sample image in an acquired sample set of color images, converting the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
2) inputting the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
3) fusing the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
4) identifying the feature map of the fused image using a pre-established channel weighting sub-network; multiplying the feature map of the fused image with the fused image to obtain a target image; and convolving the target image with a preset convolution kernel to obtain the predicted saliency map;
5) obtaining the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judging whether the value of the cross-entropy loss function converges;
6) if so, taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on an image to be detected;
7) if not, adjusting the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and returning to step 2) until the value of the cross-entropy loss function converges.
2. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
3. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
4. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
5. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that, before step 3), the method further includes: performing an upsampling process on the second feature map to obtain an upsampled second feature map; and step 3) includes: fusing the first feature map with the upsampled second feature map using a feature fusion algorithm to obtain the fused image.
6. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the channel weighting sub-network includes, arranged in sequence: a max pooling layer of size 2×2 with stride 2; a feature flattening layer and a fully connected layer.
7. A saliency prediction device based on deep neural network color perception, characterized in that the device includes:
a conversion module, configured to convert, for each sample image in an acquired sample set of color images, the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
an input module, configured to input the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
a fusion module, configured to fuse the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
a convolution module, configured to identify the feature map of the fused image using a pre-established channel weighting sub-network; multiply the feature map of the fused image with the fused image to obtain a target image; and convolve the target image with a preset convolution kernel to obtain the predicted saliency map;
a judgment module, configured to obtain the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judge whether the value of the cross-entropy loss function converges;
a detection module, configured to, when the judgment result of the judgment module is yes, take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on the image to be detected;
an adjustment module, configured to, when the judgment result of the judgment module is no, adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and trigger the input module.
8. The saliency prediction device based on deep neural network color perception according to claim 7, characterized in that the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
9. The saliency prediction device based on deep neural network color perception according to claim 7, characterized in that the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
10. The saliency prediction device based on deep neural network color perception according to claim 7, characterized in that the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910542301.XA CN110223295B (en) | 2019-06-21 | 2019-06-21 | Significance prediction method and device based on deep neural network color perception |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910542301.XA CN110223295B (en) | 2019-06-21 | 2019-06-21 | Significance prediction method and device based on deep neural network color perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110223295A true CN110223295A (en) | 2019-09-10 |
CN110223295B CN110223295B (en) | 2022-05-03 |
Family
ID=67814236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910542301.XA Active CN110223295B (en) | 2019-06-21 | 2019-06-21 | Significance prediction method and device based on deep neural network color perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110223295B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765882A (en) * | 2019-09-25 | 2020-02-07 | 腾讯科技(深圳)有限公司 | Video tag determination method, device, server and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104392463A (en) * | 2014-12-16 | 2015-03-04 | 西安电子科技大学 | Image salient region detection method based on joint sparse multi-scale fusion |
CN105787930A (en) * | 2016-02-17 | 2016-07-20 | 上海文广科技(集团)有限公司 | Sharpness-based significance detection method and system for virtual images |
CN106462771A (en) * | 2016-08-05 | 2017-02-22 | 深圳大学 | 3D image significance detection method |
CN107346436A (en) * | 2017-06-29 | 2017-11-14 | 北京以萨技术股份有限公司 | A kind of vision significance detection method of fused images classification |
CN107833220A (en) * | 2017-11-28 | 2018-03-23 | 河海大学常州校区 | Fabric defect detection method based on depth convolutional neural networks and vision significance |
CN108345892A (en) * | 2018-01-03 | 2018-07-31 | 深圳大学 | A kind of detection method, device, equipment and the storage medium of stereo-picture conspicuousness |
CN109409435A (en) * | 2018-11-01 | 2019-03-01 | 上海大学 | A kind of depth perception conspicuousness detection method based on convolutional neural networks |
US20190147288A1 (en) * | 2017-11-15 | 2019-05-16 | Adobe Inc. | Saliency prediction for informational documents |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104392463A (en) * | 2014-12-16 | 2015-03-04 | 西安电子科技大学 | Image salient region detection method based on joint sparse multi-scale fusion |
CN105787930A (en) * | 2016-02-17 | 2016-07-20 | 上海文广科技(集团)有限公司 | Sharpness-based significance detection method and system for virtual images |
CN106462771A (en) * | 2016-08-05 | 2017-02-22 | 深圳大学 | 3D image significance detection method |
CN107346436A (en) * | 2017-06-29 | 2017-11-14 | 北京以萨技术股份有限公司 | A kind of vision significance detection method of fused images classification |
US20190147288A1 (en) * | 2017-11-15 | 2019-05-16 | Adobe Inc. | Saliency prediction for informational documents |
CN107833220A (en) * | 2017-11-28 | 2018-03-23 | 河海大学常州校区 | Fabric defect detection method based on depth convolutional neural networks and vision significance |
CN108345892A (en) * | 2018-01-03 | 2018-07-31 | 深圳大学 | A kind of detection method, device, equipment and the storage medium of stereo-picture conspicuousness |
CN109409435A (en) * | 2018-11-01 | 2019-03-01 | 上海大学 | A kind of depth perception conspicuousness detection method based on convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
XUN H. et al.: "SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks", 2015 IEEE International Conference on Computer Vision *
LI Zongmin et al.: "Salient object detection combining domain transform and contour detection", Journal of Computer-Aided Design & Computer Graphics *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765882A (en) * | 2019-09-25 | 2020-02-07 | 腾讯科技(深圳)有限公司 | Video tag determination method, device, server and storage medium |
CN110765882B (en) * | 2019-09-25 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Video tag determination method, device, server and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110223295B (en) | 2022-05-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |