
CN110223295A - Saliency prediction method and device based on deep neural network color perception - Google Patents

Saliency prediction method and device based on deep neural network color perception

Info

Publication number
CN110223295A
Authority
CN (China)
Prior art keywords
image, network, sample image, processing unit, feature
Legal status (assumed; Google has not performed a legal analysis)
Granted
Application number
CN201910542301.XA
Other languages
Chinese (zh)
Other versions
CN110223295B (en)
Inventors
李腾, 程凯, 王妍
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date / filing date
2019-06-21
Publication dates
2019-09-10 (CN110223295A); 2022-05-03 (CN110223295B)
Events
Application filed by Anhui University
Priority to CN201910542301.XA
Publication of CN110223295A
Application granted
Publication of CN110223295B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10024 - Color image
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a saliency prediction method and device based on deep neural network color perception. The method includes: inputting a fine-grained sample image into a preset first VGG network and a coarse-grained sample image into a preset second VGG network, obtaining a first feature map corresponding to the fine-grained sample image and a second feature map corresponding to the coarse-grained sample image; fusing the feature maps with a feature fusion algorithm to obtain a fused image; multiplying the feature map of the fused image with the fused image and convolving the result to obtain the predicted saliency map; judging whether the value of the cross-entropy loss function converges; if so, taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on an image to be detected; if not, adjusting the model weights and hyperparameters and repeating until convergence. With embodiments of the invention, saliency prediction that accords with human visual perception can be achieved.

Description

Saliency prediction method and device based on deep neural network color perception
Technical field
The present invention relates to a saliency prediction method and device, and more particularly to a saliency prediction method and device based on deep neural network color perception.
Background art
In the field of computer vision, enabling a computer to simulate the human eye and quickly and accurately locate objects of interest in a visual scene is a very important research area. The process by which a computer discovers the positions of attended objects in a visual scene is defined as saliency prediction.
At present, saliency prediction methods have been widely applied in fields such as image compression, target recognition and image segmentation, and have achieved remarkable results. Saliency in a visual scene can arise from a series of stimuli, including low-level image attributes such as color, orientation and size, as well as semantic information. Color is considered one of the main features for computing top-down saliency. To date, some existing models account for the differences in how different colors guide human attention and incorporate these differences into attention prediction models. However, because these conclusions about color differences come from subjective experiments covering only a small number of colors, they cannot cover the full range of colors found in nature. Existing research models of color's influence on attention therefore cannot be extended to saliency prediction in natural visual scenes, and the prior art suffers from the technical problem of inaccurate prediction.
Summary of the invention
The technical problem to be solved by the present invention is to provide a saliency prediction method and device based on deep neural network color perception, so as to solve the technical problem of inaccurate prediction in the prior art.
The present invention solves the above technical problem through the following technical solutions:
An embodiment of the invention provides a saliency prediction method based on deep neural network color perception, the method comprising:
1) for each sample image in a sample set of acquired color images, converting the sample image into a coarse-grained sample image and a fine-grained sample image, wherein the resolution of the image in the fine-grained sample image is higher than the resolution of the image in the coarse-grained sample image;
2) inputting the fine-grained sample image into a preset first VGG network and the coarse-grained sample image into a preset second VGG network, obtaining a first feature map corresponding to the fine-grained sample image and a second feature map corresponding to the coarse-grained sample image;
3) fusing the first feature map with the second feature map using a feature fusion algorithm to obtain a fused image;
4) identifying the feature map of the fused image using a pre-established channel weighting sub-network; multiplying the feature map of the fused image with the fused image to obtain a target image; and convolving the target image with a preset convolution kernel to obtain the predicted saliency map;
5) obtaining the cross-entropy loss function between the predicted saliency map and the human-eye fixation map corresponding to the sample image, and judging whether the value of the cross-entropy loss function converges;
6) if so, taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on an image to be detected;
7) if not, adjusting the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and returning to step 2), until the value of the cross-entropy loss function converges.
Optionally, the resolution of the image in the fine-grained sample image is a first preset multiple of the resolution of the image in the coarse-grained sample image.
Optionally, the first VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
Optionally, the second VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
Optionally, before step 3), the method further comprises:
performing up-sampling on the second feature map to obtain an up-sampled second feature map;
step 3) then comprises:
fusing the first feature map with the up-sampled second feature map using the feature fusion algorithm to obtain the fused image.
Optionally, the channel weighting sub-network comprises, arranged in sequence:
a max pooling layer with kernel size 2*2 and stride 2, a feature flattening layer and a fully connected layer.
An embodiment of the invention also provides a saliency prediction device based on deep neural network color perception, the device comprising:
a conversion module, configured to convert, for each sample image in a sample set of acquired color images, the sample image into a coarse-grained sample image and a fine-grained sample image, wherein the resolution of the image in the fine-grained sample image is higher than the resolution of the image in the coarse-grained sample image;
an input module, configured to input the fine-grained sample image into a preset first VGG network and the coarse-grained sample image into a preset second VGG network, obtaining a first feature map corresponding to the fine-grained sample image and a second feature map corresponding to the coarse-grained sample image;
a fusion module, configured to fuse the first feature map with the second feature map using a feature fusion algorithm to obtain a fused image;
a convolution module, configured to identify the feature map of the fused image using a pre-established channel weighting sub-network, multiply the feature map of the fused image with the fused image to obtain a target image, and convolve the target image with a preset convolution kernel to obtain the predicted saliency map;
a judgment module, configured to obtain the cross-entropy loss function between the predicted saliency map and the human-eye fixation map corresponding to the sample image, and to judge whether the value of the cross-entropy loss function converges;
a detection module, configured to, when the judgment result of the judgment module is yes, take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on an image to be detected;
an adjustment module, configured to, when the judgment result of the judgment module is no, adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and trigger the input module.
Optionally, the resolution of the image in the fine-grained sample image is a first preset multiple of the resolution of the image in the coarse-grained sample image.
Optionally, the first VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
Optionally, the second VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
Optionally, the device further comprises an up-sampling module, configured to
perform up-sampling on the second feature map to obtain an up-sampled second feature map;
the fusion module is then configured to:
fuse the first feature map with the up-sampled second feature map using the feature fusion algorithm to obtain the fused image.
Optionally, the channel weighting sub-network comprises, arranged in sequence:
a max pooling layer with kernel size 2*2 and stride 2, a feature flattening layer and a fully connected layer.
Compared with the prior art, the present invention has the following advantages:
With embodiments of the present invention, a two-stream input network captures the multi-scale information of a color image; a channel weighting sub-network then applies a max pooling layer on the concatenated feature map channels to reduce their dimensionality and spatial variance and to encode the relative importance of objects in the image, thereby achieving saliency prediction that accords with human visual perception.
Brief description of the drawings
Fig. 1 is a flow diagram of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the VGG16 two-stream network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of the VGG16 two-stream network (with the channel weighting sub-network marked) in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention;
Fig. 4 is a schematic comparison, under the AUC-Judd metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 5 is a schematic comparison, under the AUC-Borji metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 6 is a schematic comparison, under the sAUC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 7 is a schematic comparison, under the NSS metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 8 is a schematic comparison, under the IG metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 9 is a schematic comparison, under the CC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 10 is a structural schematic diagram of the saliency prediction device based on deep neural network color perception provided by an embodiment of the present invention.
Specific embodiments
The embodiments of the present invention are described in detail below. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementation methods and specific operation processes are given; however, the protection scope of the present invention is not limited to the following embodiments.
Embodiments of the invention provide a saliency prediction method and device based on deep neural network color perception; the saliency prediction method based on deep neural network color perception provided by the embodiments of the invention is introduced first.
Fig. 1 is a flow diagram of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:
S101: for each sample image in a sample set of acquired color images, convert the sample image into a coarse-grained sample image and a fine-grained sample image, wherein the resolution of the image in the fine-grained sample image is higher than the resolution of the image in the coarse-grained sample image.
Specifically, the resolution of the image in the fine-grained sample image is a first preset multiple of the resolution of the image in the coarse-grained sample image.
In practical applications, the weights and biases of the VGG-16 model are first initialized on the ImageNet dataset; then, the training and validation sets of the SALICON dataset are used as the training data for the attention-mechanism-based, color-adaptive deep neural network saliency prediction network of the embodiment of the invention. The dataset contains 15000 color images and their corresponding 15000 ground-truth label images. Each of the 15000 color images is cropped to a 1000 × 750 pixel image as the fine-grained sample image and to a 500 × 375 pixel image as the coarse-grained sample image. The 15000 ground-truth label images are cropped to images of size 32 × 24 as the labels for training the network.
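As a minimal illustration of this preprocessing step (the function name, and the use of resizing in place of the cropping described above, are assumptions made for the sketch, not taken from the patent):

    import torch
    import torch.nn.functional as F

    def make_two_scale_inputs(image: torch.Tensor):
        """Turn one (3, H, W) color image into the two network inputs."""
        batch = image.unsqueeze(0)
        # Fine-grained input: 1000 x 750 pixels (width x height).
        fine = F.interpolate(batch, size=(750, 1000), mode="bilinear",
                             align_corners=False)
        # Coarse-grained input: 500 x 375 pixels, half the fine scale.
        coarse = F.interpolate(batch, size=(375, 500), mode="bilinear",
                               align_corners=False)
        return fine, coarse

The 32 × 24 training labels would be prepared analogously from the ground-truth label images.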
S102: input the fine-grained sample image into the preset first VGG network and the coarse-grained sample image into the preset second VGG network, obtaining a first feature map corresponding to the fine-grained sample image and a second feature map corresponding to the coarse-grained sample image.
Specifically, the first VGG network comprises five processing units connected in series along the data transfer direction, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer, and the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer. The second VGG network has the same structure of five serially connected processing units.
In practical applications, Fig. 2 is a structural schematic diagram of the VGG16 two-stream network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention. As shown in Fig. 2, the input of the network is designed as a two-stream network composed of two VGG-16 networks.
The first VGG network performs feature extraction at the fine granularity scale of the image: the fine-scale image of 1000 × 750 × 3 pixels is fed into the first VGG network to extract relatively high-resolution depth features, while the coarse-scale image of 500 × 375 × 3 pixels is fed into the second VGG network to extract comparatively low-resolution depth features.
Each VGG-16 network used for feature extraction in the two-stream network comprises five processing units connected in series: the first and second processing units each comprise two convolution kernels of size 3 × 3 with convolution stride 1 and one max pooling layer, and the third, fourth and fifth processing units each comprise three convolution kernels of size 3 × 3 with convolution stride 1 and one max pooling layer. The max pooling layers in the VGG16 networks are built from 2 × 2 kernels with stride 2, and the activation function of all hidden layers is the ReLU function.
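For concreteness, the following is a minimal PyTorch sketch of one such VGG-16 feature stream, a sketch under stated assumptions rather than the patent's reference implementation: the five-unit layout, 3 × 3 stride-1 convolutions, ReLU activations and 2 × 2 stride-2 max pooling follow the text above, while the channel widths (64 through 512) follow the standard VGG-16 configuration, which the text does not restate.

    import torch.nn as nn

    def vgg16_stream() -> nn.Sequential:
        """One VGG-16 feature stream: five serial processing units.

        Units 1-2 contain two 3x3 convolutions each, units 3-5 three;
        every unit ends with 2x2 max pooling of stride 2, and all
        hidden activations are ReLU, as described above.
        """
        cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 3),
               (256, 512, 3), (512, 512, 3)]  # (in, out, convs) per unit
        layers = []
        for c_in, c_out, n in cfg:
            for i in range(n):
                layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out,
                                     kernel_size=3, stride=1, padding=1),
                           nn.ReLU(inplace=True)]
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*layers)

Feeding the 1000 × 750 fine image and the 500 × 375 coarse image through two such streams yields feature maps of roughly 32 × 24 and 16 × 12, matching the sizes used in the fusion example below.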
S103: fuse the first feature map with the second feature map using a feature fusion algorithm to obtain a fused image.
Specifically, the outputs of the two-stream VGG-16 network, i.e. of the first VGG network and the second VGG network, can be adjusted to the same spatial resolution:
For example, the feature map output by the tributary corresponding to the second VGG16 network has size 16 × 12 and dimension 512; an up-sampling operation yields a 512-dimensional feature map of size 32 × 24. The feature fusion algorithm then fuses this up-sampled feature map with the 512-dimensional feature map of size 32 × 24 output by the first tributary, forming a 1024-dimensional feature map of size 32 × 24, which is the fused image.
In practical applications, the feature fusion algorithm is prior art and is not repeated here.
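Since the text treats the fusion algorithm as prior art, the sketch below assumes the common choice of bilinear up-sampling followed by channel concatenation, which matches the dimension arithmetic of the example above (512 + 512 = 1024 channels at 32 × 24):

    import torch
    import torch.nn.functional as F

    def fuse_features(fine_feat: torch.Tensor,
                      coarse_feat: torch.Tensor) -> torch.Tensor:
        """Fuse the two stream outputs at a common spatial resolution.

        fine_feat:   (N, 512, 24, 32) map from the first (fine) stream.
        coarse_feat: (N, 512, 12, 16) map from the second (coarse) stream.
        """
        upsampled = F.interpolate(coarse_feat, size=fine_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        # Concatenate along the channel axis: (N, 1024, 24, 32).
        return torch.cat([fine_feat, upsampled], dim=1)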
S104: identify the feature map of the fused image using the pre-established channel weighting sub-network; multiply the feature map of the fused image with the fused image to obtain a target image; and convolve the target image with a preset convolution kernel to obtain the predicted saliency map.
Illustratively, Fig. 3 is a structural schematic diagram of the VGG16 two-stream network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention; in Fig. 3, the channel weighting sub-network is inside the dashed box. To capture the relative importance of the semantic features of an image, the embodiment of the invention designs a channel weighting sub-network that computes one set of 1024-dimensional feature weights for each image. The channel weighting sub-network consists of a max pooling layer built from 2 × 2 kernels with stride 2, a feature flattening layer and a fully connected layer. The feature flattening layer "presses" the input, turning the multi-dimensional input into one dimension: it flattens the output of the max pooling layer, a feature of dimension 1024*16*12, into a one-dimensional vector of dimension 1*196608. The output of the fully connected layer is a matrix of dimension 1*1024.
The role of the max pooling layer is to apply 2 × 2 max pooling over the 1024 concatenated feature map channels to reduce their dimensionality and spatial variance. The output is then flattened, and a 1024-dimensional vector is finally computed by the fully connected layer; each dimension indicates the saliency weight of the corresponding input channel. The fully connected layer learns the relative weights of the different object regions in the scene based on their spatial position information and semantic features, thereby encoding contextual information and enabling the network to highlight objects whose colors stand out from the surrounding environment, so as to obtain the target image.
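A minimal sketch of this sub-network under the sizes given above (1024 channels pooled to 12 × 16, i.e. 196608 flattened values, mapped to 1024 weights); the class name is illustrative:

    import torch.nn as nn

    class ChannelWeightingSubnet(nn.Module):
        """Max pool -> flatten -> fully connected: one weight per channel."""

        def __init__(self, channels: int = 1024, pooled_hw: int = 12 * 16):
            super().__init__()
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
            self.flatten = nn.Flatten()  # (N, 1024, 12, 16) -> (N, 196608)
            self.fc = nn.Linear(channels * pooled_hw, channels)

        def forward(self, fused):  # fused: (N, 1024, 24, 32)
            return self.fc(self.flatten(self.pool(fused)))  # (N, 1024)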
The 1024-dimensional feature weights output by the channel weighting sub-network are multiplied with the 1024-dimensional features obtained by fusing the outputs of the two-stream VGG-16 network; the weighted result is a 1024-dimensional feature image of size 32 × 24.
Then, a convolution kernel of size 1 × 1 performs a convolution operation on this 1024-dimensional 2D image of size 32 × 24, converting it into a single-channel 2D saliency map of size 32 × 24; finally, the network readjusts the saliency map back to the dimensions of the original image, yielding the predicted saliency map. For example, the single-channel 2D saliency map of size 32*24 obtained after the convolution operation is adjusted to the original image size: if an image of size 800*600 is input to the network for prediction, the saliency image is resized to 800*600.
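A sketch of this weighting and read-out step, assuming the pieces defined above; broadcasting the 1024 weights over the spatial dimensions is an implementation assumption:

    import torch.nn as nn
    import torch.nn.functional as F

    def predict_saliency(fused, weights, readout: nn.Conv2d, out_size):
        """Weight the fused features per channel, then collapse to one map.

        fused:    (N, 1024, 24, 32) fused feature map.
        weights:  (N, 1024) output of the channel weighting sub-network.
        readout:  the preset 1x1 kernel, nn.Conv2d(1024, 1, kernel_size=1).
        out_size: (H, W) of the original image, e.g. (600, 800).
        """
        weighted = fused * weights.unsqueeze(-1).unsqueeze(-1)
        saliency = readout(weighted)  # single-channel 2D map, 32 x 24
        # Readjust the saliency map back to the original image size.
        return F.interpolate(saliency, size=out_size, mode="bilinear",
                             align_corners=False)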
S105: obtain the cross-entropy loss function between the predicted saliency map and the human-eye fixation map corresponding to the sample image, and judge whether the value of the cross-entropy loss function converges; if so, execute S106; if not, execute S107.
The human-eye fixation map refers to the image formed by the salient regions of the sample image that a person's eyes capture when the sample image is shown to that person.
It should be noted that the cross-entropy loss function is prior art and is not described again here.
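For completeness, a sketch of the loss between the predicted map and the normalized fixation labels; per-pixel binary cross-entropy over sigmoid outputs is an assumption, since the patent leaves the exact cross-entropy form to the prior art:

    import torch.nn.functional as F

    def saliency_loss(pred_logits, fixation_map):
        """Cross-entropy between prediction and human-eye fixation map.

        pred_logits:  (N, 1, 24, 32) raw network outputs.
        fixation_map: (N, 1, 24, 32) labels in [0, 1], i.e. the 32 x 24
                      ground-truth crops described in S101.
        """
        return F.binary_cross_entropy_with_logits(pred_logits, fixation_map)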
S106: take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on an image to be detected.
Illustratively, before the saliency prediction of an image to be detected using the target network model, prior color weights can be obtained in advance through the following steps:
A: collect images of 200 colorful natural scenes, then convert these 200 color images into grayscale images.
B: with a preset display duration for each group of pictures, e.g. 5 seconds, use a SensoMotoric Instruments (SMI) iView X RED eye tracker system at a sampling frequency of 250 Hz to collect eye movement data from 18 observers aged 22-29 viewing the above color images and grayscale images.
C: on the basis of the eye movement data, obtain the human-eye fixation maps and normalize them. This step is prior art and is not described again here.
D: define the attention score of each pixel in a normalized human-eye fixation map as the gray value of the corresponding pixel in that fixation map; the value range of the attention score is [0, 1]. It should be noted that each normalized group of images includes a color image and its corresponding grayscale image.
E: for each group of images, compute the difference between the attention score of a pixel in the color image and the attention score of the corresponding pixel in the grayscale image; when the difference is greater than a set value, e.g. 0.1, take the color of that pixel as a salient color and take the difference as the saliency attention score of that pixel.
F: apply the same operation to the other pixels to obtain all salient colors and the saliency attention scores of their corresponding pixels.
Step S106 may then comprise the following steps:
First, obtain the saliency map predicted by the target network model.
Then, for each pixel of the color image input to the target network model: if the color of the pixel belongs to the salient colors determined in steps A-F, set the prior color weight of the pixel to its corresponding saliency attention score; if the pixel does not belong to a salient color, set the prior color weight of the pixel to a preset value, e.g. 0.1.
Finally, multiply the saliency value of each pixel in the saliency map predicted by the target network model by its prior color weight, and take the saliency map obtained after normalization as the final prediction result.
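The re-weighting in this step can be sketched as follows; the lookup table mapping colors to saliency attention scores is assumed to come from steps A-F above, and 0.1 is the default weight the text gives for non-salient colors:

    import numpy as np

    def apply_color_prior(saliency, image, color_scores, default=0.1):
        """Re-weight a predicted saliency map with prior color weights.

        saliency:     (H, W) map predicted by the target network model.
        image:        (H, W, 3) uint8 color image fed to the network.
        color_scores: dict mapping an (r, g, b) tuple to the saliency
                      attention score measured in steps A-F.
        """
        weights = np.full(saliency.shape, default, dtype=np.float32)
        for rgb, score in color_scores.items():
            mask = np.all(image == rgb, axis=-1)  # pixels of this color
            weights[mask] = score
        weighted = saliency * weights
        return weighted / max(float(weighted.max()), 1e-8)  # normalize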
With the above embodiment of the invention, the accuracy of the prediction result can be improved.
S107: adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and return to S102, until the value of the cross-entropy loss function converges.
Illustratively, when adjusting the model weights and/or hyperparameters, the momentum is set to 0.9, the weight decay is set to 0.00005, the learning rate is set to 0.00005, and the batch size is set to 32.
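These settings correspond to a standard SGD-with-momentum configuration; the sketch below assumes SGD, which the text does not name explicitly:

    import torch

    def make_optimizer(model: torch.nn.Module) -> torch.optim.SGD:
        """Optimizer with the hyperparameters listed above."""
        return torch.optim.SGD(model.parameters(),
                               lr=0.00005,            # learning rate
                               momentum=0.9,          # momentum
                               weight_decay=0.00005)  # weight decay

    # The batch size of 32 is configured on the DataLoader, not here.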
In addition, convergence of the value of the cross-entropy loss function means that the value of the cross-entropy loss function is less than a preset threshold.
To illustrate the technical effect of the embodiments of the invention, the saliency prediction results of the embodiment of the invention and of the following six saliency prediction methods are compared on the public saliency dataset CAT2000 using the AUC-Judd (Judd Area Under Curve), AUC-Borji, sAUC (shuffled AUC), NSS (Normalized Scanpath Saliency), IG (Information Gain) and CC (Linear Correlation Coefficient) evaluation methods:
[1] S. Fan, Z. Shen, M. Jiang, B. L. Koenig, J. Xu, M. S. Kankanhalli, and Q. Zhao, "Emotional attention: A study of image sentiment and visual attention," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7521-7531.
[2] X. Huang, C. Shen, X. Boix, and Q. Zhao, "Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks," in IEEE International Conference on Computer Vision, 2015, pp. 262-270.
[3] J. Pan, C. C. Ferrer, K. McGuinness, N. E. O'Connor, J. Torres, E. Sayrol, and X. Giro-i Nieto, "Salgan: Visual saliency prediction with generative adversarial networks," arXiv preprint arXiv:1701.01081, 2017.
[4] M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, "A deep multi-level network for saliency prediction," in Pattern Recognition (ICPR), 2016 23rd International Conference on. IEEE, 2016, pp. 3488-3493.
[5] J. Zhang and S. Sclaroff, "Saliency detection: A boolean map approach," in IEEE International Conference on Computer Vision, 2013.
[6] H. Tang, C. Chen, and X. Pei, "Visual saliency detection via sparse residual and outlier detection," IEEE Signal Processing Letters, vol. 23, no. 12, pp. 1736-1740, 2016.
Fig. 4 is a schematic comparison, under the AUC-Judd metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 4, the AUC-Judd score of the embodiment of the invention reaches 0.83, close to the AUC-Judd score of the above prior art [1] and higher than the AUC-Judd scores of the other prior-art methods.
Fig. 5 is the corresponding comparison under the AUC-Borji metric. As shown in Fig. 5, the AUC-Borji score of the embodiment of the invention reaches 0.80, lower only than the AUC-Borji score of reference [3] and higher than the AUC-Borji scores of the other prior-art methods.
Fig. 6 is the corresponding comparison under the sAUC metric. As shown in Fig. 6, the sAUC score of the embodiment of the invention reaches 0.79 and is higher than the sAUC score of every prior-art method.
Fig. 7 is the corresponding comparison under the NSS metric. As shown in Fig. 7, the NSS score of the embodiment of the invention reaches 1.5, roughly on par with the prior art [1] and higher than the other prior-art methods.
Fig. 8 is the corresponding comparison under the IG metric. As shown in Fig. 8, the IG score of the embodiment of the invention reaches 0.37, roughly on par with the prior art [1] and higher than the other prior-art methods.
Fig. 9 is the corresponding comparison under the CC metric. As shown in Fig. 9, the CC score of the embodiment of the invention is roughly on par with the prior art [1] and higher than the other prior-art methods.
It should be noted that the AUC-Judd metric is the area under the Judd curve; the AUC-Borji metric is the area under the Borji curve; sAUC is the shuffled AUC; NSS is Normalized Scanpath Saliency; IG is Information Gain; and CC is the linear Correlation Coefficient.
With the embodiment of the invention shown in Fig. 1, a two-stream input network captures the multi-scale information of a color image; a channel weighting sub-network then applies a max pooling layer on the concatenated feature map channels to reduce their dimensionality and spatial variance and to encode the relative importance of objects in the image, thereby achieving saliency prediction that accords with human visual perception.
Corresponding to the embodiment of the invention shown in Fig. 1, an embodiment of the invention provides a saliency prediction device based on deep neural network color perception.
Figure 10 is a structural schematic diagram of the saliency prediction device based on deep neural network color perception provided by an embodiment of the present invention. As shown in Figure 10, the device comprises:
a conversion module 101, configured to convert, for each sample image in a sample set of acquired color images, the sample image into a coarse-grained sample image and a fine-grained sample image, wherein the resolution of the image in the fine-grained sample image is higher than the resolution of the image in the coarse-grained sample image;
an input module 102, configured to input the fine-grained sample image into a preset first VGG network and the coarse-grained sample image into a preset second VGG network, obtaining a first feature map corresponding to the fine-grained sample image and a second feature map corresponding to the coarse-grained sample image;
a fusion module 103, configured to fuse the first feature map with the second feature map using a feature fusion algorithm to obtain a fused image;
a convolution module 104, configured to identify the feature map of the fused image using a pre-established channel weighting sub-network, multiply the feature map of the fused image with the fused image to obtain a target image, and convolve the target image with a preset convolution kernel to obtain the predicted saliency map;
a judgment module 105, configured to obtain the cross-entropy loss function between the predicted saliency map and the human-eye fixation map corresponding to the sample image, and to judge whether the value of the cross-entropy loss function converges;
a detection module 106, configured to, when the judgment result of the judgment module is yes, take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on an image to be detected;
an adjustment module 107, configured to, when the judgment result of the judgment module is no, adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and trigger the input module.
With the embodiment of the invention shown in Figure 10, a two-stream input network captures the multi-scale information of a color image; a channel weighting sub-network then applies a max pooling layer on the concatenated feature map channels to reduce their dimensionality and spatial variance and to encode the relative importance of objects in the image, thereby achieving saliency prediction that accords with human visual perception.
In a specific embodiment of the invention, the resolution of the image in the fine-grained sample image is a first preset multiple of the resolution of the image in the coarse-grained sample image.
In a specific embodiment of the invention, the first VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
In a specific embodiment of the invention, the second VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
In a specific embodiment of the invention, the device further comprises an up-sampling module, configured to
perform up-sampling on the second feature map to obtain an up-sampled second feature map;
the fusion module 103 is then configured to:
fuse the first feature map with the up-sampled second feature map using the feature fusion algorithm to obtain the fused image.
In a specific embodiment of the invention, the channel weighting sub-network comprises, arranged in sequence:
a max pooling layer with kernel size 2*2 and stride 2, a feature flattening layer and a fully connected layer.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the invention; any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the present invention shall all fall within the protection scope of the present invention.

Claims (10)

1. A saliency prediction method based on deep neural network color perception, characterized in that the method comprises:
1) for each sample image in a sample set of acquired color images, converting the sample image into a coarse-grained sample image and a fine-grained sample image, wherein the resolution of the image in the fine-grained sample image is higher than the resolution of the image in the coarse-grained sample image;
2) inputting the fine-grained sample image into a preset first VGG network and the coarse-grained sample image into a preset second VGG network, obtaining a first feature map corresponding to the fine-grained sample image and a second feature map corresponding to the coarse-grained sample image;
3) fusing the first feature map with the second feature map using a feature fusion algorithm to obtain a fused image;
4) identifying the feature map of the fused image using a pre-established channel weighting sub-network; multiplying the feature map of the fused image with the fused image to obtain a target image; and convolving the target image with a preset convolution kernel to obtain the predicted saliency map;
5) obtaining the cross-entropy loss function between the predicted saliency map and the human-eye fixation map corresponding to the sample image, and judging whether the value of the cross-entropy loss function converges;
6) if so, taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on an image to be detected;
7) if not, adjusting the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and returning to step 2), until the value of the cross-entropy loss function converges.
2. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the resolution of the image in the fine-grained sample image is a first preset multiple of the resolution of the image in the coarse-grained sample image.
3. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the first VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
4. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the second VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
5. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that before step 3), the method further comprises:
performing up-sampling on the second feature map to obtain an up-sampled second feature map;
step 3) then comprises:
fusing the first feature map with the up-sampled second feature map using the feature fusion algorithm to obtain the fused image.
6. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the channel weighting sub-network comprises, arranged in sequence:
a max pooling layer with kernel size 2*2 and stride 2, a feature flattening layer and a fully connected layer.
7. A saliency prediction device based on deep neural network color perception, characterized in that the device comprises:
a conversion module, configured to convert, for each sample image in a sample set of acquired color images, the sample image into a coarse-grained sample image and a fine-grained sample image, wherein the resolution of the image in the fine-grained sample image is higher than the resolution of the image in the coarse-grained sample image;
an input module, configured to input the fine-grained sample image into a preset first VGG network and the coarse-grained sample image into a preset second VGG network, obtaining a first feature map corresponding to the fine-grained sample image and a second feature map corresponding to the coarse-grained sample image;
a fusion module, configured to fuse the first feature map with the second feature map using a feature fusion algorithm to obtain a fused image;
a convolution module, configured to identify the feature map of the fused image using a pre-established channel weighting sub-network, multiply the feature map of the fused image with the fused image to obtain a target image, and convolve the target image with a preset convolution kernel to obtain the predicted saliency map;
a judgment module, configured to obtain the cross-entropy loss function between the predicted saliency map and the human-eye fixation map corresponding to the sample image, and to judge whether the value of the cross-entropy loss function converges;
a detection module, configured to, when the judgment result of the judgment module is yes, take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on an image to be detected;
an adjustment module, configured to, when the judgment result of the judgment module is no, adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and trigger the input module.
8. The saliency prediction device based on deep neural network color perception according to claim 7, characterized in that the resolution of the image in the fine-grained sample image is a first preset multiple of the resolution of the image in the coarse-grained sample image.
9. The saliency prediction device based on deep neural network color perception according to claim 7, characterized in that the first VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
10. The saliency prediction device based on deep neural network color perception according to claim 7, characterized in that the second VGG network comprises: five processing units connected in series, wherein the first and second processing units each comprise two convolution kernels and one max pooling layer;
the third, fourth and fifth processing units each comprise three convolution kernels and one max pooling layer.
CN201910542301.XA (priority date 2019-06-21, filing date 2019-06-21): Significance prediction method and device based on deep neural network color perception. Status: Active. Granted publication: CN110223295B (en).

Priority Applications (1)

Application Number: CN201910542301.XA
Granted Publication: CN110223295B (en)
Title: Significance prediction method and device based on deep neural network color perception

Publications (2)

CN110223295A, published 2019-09-10
CN110223295B, published 2022-05-03 (granted)

Family ID: 67814236

Family Applications (1)

CN201910542301.XA (Active): Significance prediction method and device based on deep neural network color perception

Country Status (1)

CN: CN110223295B (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104392463A (en) * 2014-12-16 2015-03-04 西安电子科技大学 Image salient region detection method based on joint sparse multi-scale fusion
CN105787930A (en) * 2016-02-17 2016-07-20 上海文广科技(集团)有限公司 Sharpness-based significance detection method and system for virtual images
CN106462771A (en) * 2016-08-05 2017-02-22 深圳大学 3D image significance detection method
CN107346436A (en) * 2017-06-29 2017-11-14 北京以萨技术股份有限公司 A kind of vision significance detection method of fused images classification
US20190147288A1 (en) * 2017-11-15 2019-05-16 Adobe Inc. Saliency prediction for informational documents
CN107833220A (en) * 2017-11-28 2018-03-23 河海大学常州校区 Fabric defect detection method based on depth convolutional neural networks and vision significance
CN108345892A (en) * 2018-01-03 2018-07-31 深圳大学 A kind of detection method, device, equipment and the storage medium of stereo-picture conspicuousness
CN109409435A (en) * 2018-11-01 2019-03-01 上海大学 A kind of depth perception conspicuousness detection method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xun H. et al.: "SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks", 2015 IEEE International Conference on Computer Vision *
Li Zongmin et al.: "Salient object detection combining domain transform and contour detection" (结合域变换和轮廓检测的显著性目标检测), Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765882A (en) * 2019-09-25 2020-02-07 腾讯科技(深圳)有限公司 Video tag determination method, device, server and storage medium
CN110765882B (en) * 2019-09-25 2023-04-07 腾讯科技(深圳)有限公司 Video tag determination method, device, server and storage medium

Also Published As

Publication number Publication date
CN110223295B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN110135375A (en) More people's Attitude estimation methods based on global information integration
CN104063719B (en) Pedestrian detection method and device based on depth convolutional network
CN109815867A (en) A kind of crowd density estimation and people flow rate statistical method
CN110298266A (en) Deep neural network object detection method based on multiple dimensioned receptive field Fusion Features
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN107403142B (en) A kind of detection method of micro- expression
CN109902646A (en) A kind of gait recognition method based on long memory network in short-term
CN110060236B (en) Stereoscopic image quality evaluation method based on depth convolution neural network
CN109376637A (en) Passenger number statistical system based on video monitoring image processing
CN103824272A (en) Face super-resolution reconstruction method based on K-neighboring re-recognition
CN113963032A (en) Twin network structure target tracking method fusing target re-identification
He et al. Object-oriented mangrove species classification using hyperspectral data and 3-D Siamese residual network
CN108389189B (en) Three-dimensional image quality evaluation method based on dictionary learning
CN114241422A (en) Student classroom behavior detection method based on ESRGAN and improved YOLOv5s
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN110490252A (en) A kind of occupancy detection method and system based on deep learning
CN113139489A (en) Crowd counting method and system based on background extraction and multi-scale fusion network
CN109242812A (en) Image interfusion method and device based on conspicuousness detection and singular value decomposition
Li et al. A statistical PCA method for face recognition
CN111881716A (en) Pedestrian re-identification method based on multi-view-angle generation countermeasure network
CN109146925A (en) Conspicuousness object detection method under a kind of dynamic scene
Zhang et al. Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention
CN113762009A (en) Crowd counting method based on multi-scale feature fusion and double-attention machine mechanism
CN108734200A (en) Human body target visible detection method and device based on BING features

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant