CN110223295A - Saliency prediction method and device based on deep neural network color perception - Google Patents
Saliency prediction method and device based on deep neural network color perception
- Publication number
- CN110223295A CN110223295A CN201910542301.XA CN201910542301A CN110223295A CN 110223295 A CN110223295 A CN 110223295A CN 201910542301 A CN201910542301 A CN 201910542301A CN 110223295 A CN110223295 A CN 110223295A
- Authority
- CN
- China
- Prior art keywords
- image
- network
- sample image
- processing unit
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a saliency prediction method and device based on deep neural network color perception. The method includes: inputting the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image; obtaining a fused image using a feature fusion algorithm; multiplying the feature map of the fused image with the fused image to obtain the predicted saliency map; judging whether the value of the cross-entropy loss function converges; if so, taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on the image to be detected; if not, adjusting the model weights and hyperparameters until convergence. With the embodiments of the present invention, saliency prediction consistent with human visual perception can be achieved.
Description
Technical field
The present invention relates to a saliency prediction method and device, and more particularly to a saliency prediction method and device based on deep neural network color perception.
Background art
In the field of computer vision, enabling a computer to emulate the human eye and quickly and accurately find objects of interest in a visual scene is an important research area. The process by which a computer finds the positions of attended objects in a visual scene is defined as saliency prediction.
At present, saliency prediction methods have been widely applied in fields such as image compression, target recognition, and image segmentation, and have achieved remarkable results. Saliency in a visual scene can arise from a series of stimuli, including low-level image attributes such as color, orientation, and size, as well as semantic information. Color is considered one of the main features for computing top-down saliency. To date, some existing models have taken into account that different colors guide human attention differently and have incorporated these differences into attention prediction models. However, these conclusions about color differences were drawn from subjective experiments involving only a small number of colors and cannot cover the full range of colors found in nature. Existing research models of how color influences attention therefore cannot be extended to saliency prediction in natural visual scenes, and the prior art suffers from the technical problem of inaccurate prediction.
Summary of the invention
The technical problem to be solved by the present invention is to provide a saliency prediction method and device based on deep neural network color perception, so as to solve the technical problem of inaccurate prediction in the prior art.
The present invention solves the above technical problem through the following technical solutions:
An embodiment of the present invention provides a saliency prediction method based on deep neural network color perception. The method includes:
1) for each sample image in an acquired sample set of color images, converting the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
2) inputting the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
3) fusing the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
4) identifying the feature map of the fused image using a pre-established channel weighting sub-network; multiplying the feature map of the fused image with the fused image to obtain a target image; and convolving the target image with a preset convolution kernel to obtain the predicted saliency map;
5) obtaining the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judging whether the value of the cross-entropy loss function converges;
6) if so, taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on an image to be detected;
7) if not, adjusting the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and returning to step 2) until the value of the cross-entropy loss function converges.
Optionally, the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
Optionally, the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Optionally, the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Optionally, before step 3), the method further includes: performing an upsampling process on the second feature map to obtain an upsampled second feature map; and step 3) includes: fusing the first feature map with the upsampled second feature map using a feature fusion algorithm to obtain the fused image.
Optionally, the channel weighting sub-network includes, arranged in sequence: a max pooling layer of size 2×2 with stride 2, a feature flattening layer, and a fully connected layer.
An embodiment of the present invention also provides a saliency prediction device based on deep neural network color perception. The device includes:
a conversion module, configured to convert, for each sample image in an acquired sample set of color images, the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
an input module, configured to input the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
a fusion module, configured to fuse the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
a convolution module, configured to identify the feature map of the fused image using a pre-established channel weighting sub-network; multiply the feature map of the fused image with the fused image to obtain a target image; and convolve the target image with a preset convolution kernel to obtain the predicted saliency map;
a judgment module, configured to obtain the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judge whether the value of the cross-entropy loss function converges;
a detection module, configured to, when the judgment result of the judgment module is yes, take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on an image to be detected;
an adjustment module, configured to, when the judgment result of the judgment module is no, adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and trigger the input module.
Optionally, the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
Optionally, the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Optionally, the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Optionally, the device further includes an upsampling module, configured to perform an upsampling process on the second feature map to obtain an upsampled second feature map; the fusion module is then configured to fuse the first feature map with the upsampled second feature map using a feature fusion algorithm to obtain the fused image.
Optionally, the channel weighting sub-network includes, arranged in sequence: a max pooling layer of size 2×2 with stride 2, a feature flattening layer, and a fully connected layer.
Compared with the prior art, the present invention has the following advantages:
With the embodiments of the present invention, multi-scale information of the color image is captured by the two-stream input network; the channel weighting sub-network then applies a max pooling layer over the concatenated feature map channels to reduce their dimensionality and spatial variance and to encode the relative importance of objects in the image, thereby achieving saliency prediction consistent with human visual perception.
Brief description of the drawings
Fig. 1 is a flow diagram of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the two-stream VGG16 network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of the two-stream VGG16 network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention;
Fig. 4 is a schematic comparison, using the AUC-Judd metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 5 is a schematic comparison, using the AUC-Borji metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 6 is a schematic comparison, using the sAUC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 7 is a schematic comparison, using the NSS metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 8 is a schematic comparison, using the IG metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 9 is a schematic comparison, using the CC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art;
Fig. 10 is a structural schematic diagram of the saliency prediction device based on deep neural network color perception provided by an embodiment of the present invention.
Specific embodiments
The embodiments of the present invention are described in detail below. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementation methods and specific operation processes are given, but the protection scope of the present invention is not limited to the following embodiments.
The embodiments of the present invention provide a saliency prediction method and device based on deep neural network color perception; the saliency prediction method based on deep neural network color perception provided by the embodiments of the present invention is introduced first.
Fig. 1 is a flow diagram of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
S101: for each sample image in an acquired sample set of color images, converting the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image.
Specifically, the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
In practical applications, the weights and biases of the VGG-16 model are first initialized on the ImageNet dataset. Then, the training and validation sets of the SALICON dataset are used as training data for the color-adaptive saliency prediction network, based on an attention-mechanism deep neural network, of the embodiment of the present invention. The dataset contains 15,000 color images and their corresponding 15,000 ground-truth label images. Each of the 15,000 color images is cropped to a pixel size of 1000 × 750 as the fine-granularity sample image and to a pixel size of 500 × 375 as the coarse-granularity sample image. The 15,000 ground-truth label images are cropped to a size of 32 × 24 as the labels for training the network.
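A minimal sketch of this data preparation step, assuming PIL is available; the description crops the images to these sizes, while this sketch simply resizes, and the function name and paths are illustrative:

```python
from PIL import Image

def prepare_sample(image_path, label_path):
    """Produce the two input scales and the training label described above."""
    img = Image.open(image_path).convert("RGB")
    fine = img.resize((1000, 750))    # fine-granularity input, 1000 x 750 x 3
    coarse = img.resize((500, 375))   # coarse-granularity input, 500 x 375 x 3
    label = Image.open(label_path).convert("L").resize((32, 24))  # training label
    return fine, coarse, label
```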
S102: inputting the fine-granularity sample image into the preset first VGG network and the coarse-granularity sample image into the preset second VGG network, to obtain the first feature map corresponding to the coarse-granularity sample image and the second feature map corresponding to the fine-granularity sample image.
Specifically, the first VGG network includes five processing units connected in series, wherein, arranged in the direction of data flow, the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer, and the third, fourth and fifth processing units each include three convolution kernels and one max pooling layer. The second VGG network likewise includes five processing units connected in series, wherein, arranged in the direction of data flow, the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer, and the third, fourth and fifth processing units each include three convolution kernels and one max pooling layer.
In practical applications, Fig. 2 shows the structure of the two-stream VGG16 network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention. As shown in Fig. 2, the input of the network is designed as a two-stream network composed of two VGG-16 networks.
The first VGG network performs feature extraction at the fine granularity of the image: the fine image of 1000 × 750 × 3 pixels is fed into the first VGG network to extract relatively high-resolution deep features, while the coarser-scale image of 500 × 375 × 3 pixels is fed into the second VGG network to extract comparatively low-resolution deep features.
Each VGG-16 network used for feature extraction in the two-stream network includes five processing units connected in series: the first and second processing units each include two convolution kernels of size 3 × 3 with stride 1 and one max pooling layer, and the third, fourth and fifth processing units each include three convolution kernels of size 3 × 3 with stride 1 and one max pooling layer. The max pooling layers in the VGG16 network use a 2 × 2 window with stride 2. The activation function of all hidden layers is the ReLU function.
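A sketch of one such feature-extraction stream in PyTorch, following the 2-2-3-3-3 convolution layout described above; the channel widths are those of the standard VGG-16 configuration, which the description does not spell out, so they are an assumption here:

```python
import torch.nn as nn

def vgg16_stream():
    """One of the two VGG-16 feature-extraction streams: five serial processing
    units, each ending in a 2x2 max pooling layer with stride 2."""
    cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]
    layers = []
    for in_ch, out_ch, n_convs in cfg:
        for i in range(n_convs):
            layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                    kernel_size=3, stride=1, padding=1))
            layers.append(nn.ReLU(inplace=True))  # ReLU on all hidden layers
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
```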
S103: fusing the first feature map and the second feature map using a feature fusion algorithm to obtain the fused image.
Specifically, the outputs of the two-stream VGG-16 network, i.e. the first VGG network and the second VGG network, can be adjusted to the same spatial resolution.
For example, the feature map output by the stream corresponding to the second VGG16 network has size 16 × 12 and dimension 512; an upsampling operation is used to obtain a 512-dimensional feature map of size 32 × 24, which is then fused, using a feature fusion algorithm, with the 512-dimensional feature map of size 32 × 24 output by the first stream, forming a 1024-dimensional feature map of size 32 × 24 as the fused image.
In practical applications, the feature fusion algorithm is prior art and is not repeated here.
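A sketch of this fusion step, under the assumption that "feature fusion" here means bilinear upsampling followed by channel-wise concatenation (the description leaves the exact algorithm to the prior art):

```python
import torch
import torch.nn.functional as F

def fuse_features(coarse_feat, fine_feat):
    """Upsample the 512-channel 16x12 coarse-stream output to 32x24 and
    concatenate it with the 512-channel 32x24 fine-stream output."""
    up = F.interpolate(coarse_feat, size=fine_feat.shape[-2:],
                       mode="bilinear", align_corners=False)
    return torch.cat([up, fine_feat], dim=1)  # 1024-channel fused feature map
```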
S104: identifying the feature map of the fused image using the pre-established channel weighting sub-network; multiplying the feature map of the fused image with the fused image to obtain the target image; and convolving the target image with a preset convolution kernel to obtain the predicted saliency map.
Illustratively, Fig. 3 shows the structure of the two-stream VGG16 network in the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention; the dashed box in Fig. 3 is the channel weighting sub-network. To capture the relative importance of the semantic features of an image, the embodiment of the present invention designs a channel weighting sub-network that computes a set of 1024-dimensional feature weights for each image. The channel weighting sub-network consists of a max pooling layer with a 2 × 2 window and stride 2, a feature flattening layer, and a fully connected layer. The feature flattening layer "squeezes" the multi-dimensional input into one dimension: it flattens the output of the max pooling layer, of dimension 1024 × 16 × 12, into a one-dimensional vector of size 1 × 196608. The output of the fully connected layer is a matrix of dimension 1 × 1024.
The role of the max pooling layer is to apply 2 × 2 max pooling over the 1024 concatenated feature map channels to reduce their dimensionality and spatial variance. The output is then flattened, and the fully connected layer finally computes a 1024-dimensional vector in which each dimension indicates the saliency weight of the corresponding input channel. The fully connected layer learns the relative weights of the different object regions in the scene based on their spatial position information and semantic features, encoding contextual information so that the network can highlight objects whose color stands out from the surrounding environment.
The output of the channel weighting sub-network is multiplied with the 1024-dimensional features produced by feature fusion of the two-stream VGG-16 network outputs; the weighted result is a 1024-dimensional feature image of size 32 × 24, which is the target image.
Then, a convolution kernel of size 1 × 1 is used to convolve the 1024-dimensional 2D image of size 32 × 24, converting it into a single-channel 2D saliency map of size 32 × 24; finally, the network resizes this saliency map back to the dimensions of the original image, giving the predicted saliency map. For example, the single-channel 2D saliency map of size 32 × 24 obtained after the convolution operation is resized to the original image size: if an image of size 800 × 600 is input to the network for prediction, the saliency image is resized to 800 × 600.
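A sketch of the channel weighting sub-network and prediction head as described above; the module name is illustrative, and the bilinear resize at the end is an assumption, since the description does not name the interpolation mode:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelWeightingHead(nn.Module):
    """Channel weighting sub-network plus prediction head as described above."""
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 1024x24x32 -> 1024x12x16
        self.fc = nn.Linear(1024 * 12 * 16, 1024)          # 196608 -> 1024 channel weights
        self.conv1x1 = nn.Conv2d(1024, 1, kernel_size=1)   # 1024 channels -> 1 channel

    def forward(self, fused, out_size):
        w = self.fc(torch.flatten(self.pool(fused), 1))    # one weight per channel
        target = fused * w.view(-1, 1024, 1, 1)            # channel-wise multiplication
        saliency = self.conv1x1(target)                    # 32x24 single-channel map
        return F.interpolate(saliency, size=out_size,      # resize to original image size
                             mode="bilinear", align_corners=False)
```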
S105: obtaining the cross-entropy loss between the predicted saliency map and the human fixation map corresponding to the sample image, and judging whether the value of the cross-entropy loss function converges; if so, executing S106; if not, executing S107.
The human fixation map refers to the image formed by the salient regions of the sample image that are captured by a person's eyes when the sample image is shown to the observer.
It should be noted that the cross-entropy loss function is prior art and is not described again here.
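A sketch of the training criterion, assuming the standard per-pixel binary cross-entropy between the predicted map and the normalized fixation map (the description does not fix the exact variant):

```python
import torch
import torch.nn.functional as F

def saliency_loss(pred, fixation_map):
    """Per-pixel binary cross-entropy between the predicted saliency map and
    the human fixation map, both in [0, 1]."""
    return F.binary_cross_entropy(torch.sigmoid(pred), fixation_map)
```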
S106: taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on the image to be detected.
Illustratively, in the process of performing saliency prediction on the image to be detected with the target network model, prior color weights can first be obtained in advance through the following steps:
A: 200 images of richly colored natural scenes are collected, and these 200 color images are then converted into grayscale images.
B: according to a preset display duration for each image group, e.g. 5 seconds, a SensoMotoric Instruments (SMI) iView X RED eye tracker system with a sampling frequency of 250 Hz is used to acquire eye movement data from 18 observers aged 22-29 while they view the above color images and grayscale images.
C: on the basis of the eye movement data, human fixation maps are obtained and normalized. This step is prior art and is not described again here.
D: the attention score of each pixel in the normalized human fixation map is defined as the gray value of the corresponding pixel of the fixation map; the value range of the attention score is [0, 1]. It should be noted that each normalized image group includes a color image and its corresponding grayscale image.
E: for each image group, the difference between the attention score of a pixel in the color image and the attention score of the corresponding pixel in the grayscale image is computed; when the difference is greater than a set value, e.g. 0.1, the color corresponding to that pixel is taken as a salient color, and the difference is taken as the saliency attention score of that pixel.
F: the other pixels are processed in the same way, yielding all the salient colors and the saliency attention scores of their corresponding pixels.
Step S106 may include the following steps:
First, the saliency map is obtained by prediction with the target network model.
Then, for each pixel of the color image input to the target network model, if the color of the pixel belongs to the salient colors determined in steps A-F, the prior color weight of the pixel is set to the corresponding saliency attention score; if the pixel does not belong to a salient color, the prior color weight of the pixel is set to a preset value, e.g. 0.1.
Finally, the saliency value of each pixel in the saliency map predicted by the target network model is multiplied by its prior color weight, and the saliency map obtained after normalization is taken as the final prediction result.
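A sketch of this prior color weighting at prediction time; storing the salient colors from steps A-F as a dictionary keyed by (quantized) RGB tuples is an assumption, while the default weight of 0.1 follows the description:

```python
import numpy as np

def apply_prior_color_weights(saliency, image, salient_colors, default_w=0.1):
    """Multiply each pixel's saliency by its prior color weight (steps A-F),
    then renormalize to obtain the final prediction."""
    weights = np.full(saliency.shape, default_w, dtype=np.float32)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            color = tuple(image[y, x])       # e.g. a quantized (R, G, B) triple
            if color in salient_colors:      # salient color determined in A-F
                weights[y, x] = salient_colors[color]
    weighted = saliency * weights
    return weighted / (weighted.max() + 1e-8)  # normalized final saliency map
```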
With the above embodiment of the present invention, the accuracy of the prediction result can be improved.
S107: adjusting the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and returning to step S102, until the value of the cross-entropy loss function converges.
Illustratively, during the adjustment of the model weights and/or hyperparameters, the momentum is set to 0.9, the weight decay is set to 0.00005, the learning rate is set to 0.00005, and the batch size is set to 32.
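A sketch of a matching optimizer configuration; SGD is an assumption here, as the description names momentum, weight decay and learning rate but not the optimizer itself:

```python
import torch

def make_optimizer(model):
    """Optimizer with the hyperparameters given above (momentum 0.9,
    weight decay 0.00005, learning rate 0.00005); batch size is 32."""
    return torch.optim.SGD(model.parameters(), lr=0.00005,
                           momentum=0.9, weight_decay=0.00005)
```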
In addition, convergence of the value of the cross-entropy loss function means that the value of the cross-entropy loss function is less than a preset threshold.
To illustrate the technical effect of the embodiments of the present invention, on the public saliency dataset CAT2000, the AUC-Judd (Judd Area Under Curve), AUC-Borji, sAUC (shuffled AUC), NSS (Normalized Scanpath Saliency), IG (Information Gain), and CC (Linear Correlation Coefficient) evaluation methods are used to compare the saliency prediction results of the embodiment of the present invention with those of the following six saliency prediction methods:
[1] S. Fan, Z. Shen, M. Jiang, B. L. Koenig, J. Xu, M. S. Kankanhalli, and Q. Zhao, "Emotional attention: A study of image sentiment and visual attention," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7521–7531.
[2] X. Huang, C. Shen, X. Boix, and Q. Zhao, "Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks," in IEEE International Conference on Computer Vision, 2015, pp. 262–270.
[3] J. Pan, C. C. Ferrer, K. McGuinness, N. E. O'Connor, J. Torres, E. Sayrol, and X. Giro-i-Nieto, "Salgan: Visual saliency prediction with generative adversarial networks," arXiv preprint arXiv:1701.01081, 2017.
[4] M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, "A deep multi-level network for saliency prediction," in Pattern Recognition (ICPR), 2016 23rd International Conference on. IEEE, 2016, pp. 3488–3493.
[5] J. Zhang and S. Sclaroff, "Saliency detection: A boolean map approach," in IEEE International Conference on Computer Vision, 2013.
[6] H. Tang, C. Chen, and X. Pei, "Visual saliency detection via sparse residual and outlier detection," IEEE Signal Processing Letters, vol. 23, no. 12, pp. 1736–1740, 2016.
Fig. 4 is a schematic comparison, using the AUC-Judd metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 4, the AUC-Judd score of the embodiment of the present invention reaches 0.83, close to the AUC-Judd score of the above prior art [1] and higher than the AUC-Judd scores of the other prior-art methods.
Fig. 5 is a schematic comparison, using the AUC-Borji metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 5, the AUC-Borji score of the embodiment of the present invention reaches 0.80, second only to the AUC-Borji score of reference [3] and higher than the AUC-Borji scores of the other prior-art methods.
Fig. 6 is a schematic comparison, using the sAUC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 6, the sAUC score of the embodiment of the present invention reaches 0.79 and is higher than the sAUC score of each prior-art method.
Fig. 7 is a schematic comparison, using the NSS metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 7, the NSS score of the embodiment of the present invention reaches 1.5, on par with the prior art [1] and higher than the other prior-art methods.
Fig. 8 is a schematic comparison, using the IG metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 8, the IG score of the embodiment of the present invention reaches 0.37, on par with the prior art [1] and higher than the other prior-art methods.
Fig. 9 is a schematic comparison, using the CC metric, of the technical effect of the saliency prediction method based on deep neural network color perception provided by an embodiment of the present invention against the prior art. As shown in Fig. 9, the CC score of the embodiment of the present invention is on par with the prior art [1] and higher than the other prior-art methods.
It should be noted that the AUC-Judd metric is the area under the curve in the Judd variant; the AUC-Borji metric is the area under the curve in the Borji variant; NSS is Normalized Scanpath Saliency; IG is Information Gain; CC is the linear Correlation Coefficient.
With the embodiment of the present invention shown in Fig. 1, multi-scale information of the color image is captured by the two-stream input network; the channel weighting sub-network then applies a max pooling layer over the concatenated feature map channels to reduce their dimensionality and spatial variance and to encode the relative importance of objects in the image, thereby achieving saliency prediction consistent with human visual perception.
Corresponding to the embodiment of the present invention shown in Fig. 1, an embodiment of the present invention provides a saliency prediction device based on deep neural network color perception.
Figure 10 is a structural schematic diagram of the saliency prediction device based on deep neural network color perception provided by an embodiment of the present invention. As shown in Figure 10, the device includes:
a conversion module 101, configured to convert, for each sample image in an acquired sample set of color images, the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
an input module 102, configured to input the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
a fusion module 103, configured to fuse the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
a convolution module 104, configured to identify the feature map of the fused image using a pre-established channel weighting sub-network; multiply the feature map of the fused image with the fused image to obtain a target image; and convolve the target image with a preset convolution kernel to obtain the predicted saliency map;
a judgment module 105, configured to obtain the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judge whether the value of the cross-entropy loss function converges;
a detection module 106, configured to, when the judgment result of the judgment module is yes, take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on the image to be detected;
an adjustment module 107, configured to, when the judgment result of the judgment module is no, adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and trigger the input module.
With the embodiment of the present invention shown in Figure 10, multi-scale information of the color image is captured by the two-stream input network; the channel weighting sub-network then applies a max pooling layer over the concatenated feature map channels to reduce their dimensionality and spatial variance and to encode the relative importance of objects in the image, thereby achieving saliency prediction consistent with human visual perception.
In a specific implementation of the embodiment of the present invention, the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
In a specific implementation of the embodiment of the present invention, the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
In a specific implementation of the embodiment of the present invention, the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
In a specific implementation of the embodiment of the present invention, the device further includes an upsampling module, configured to perform an upsampling process on the second feature map to obtain an upsampled second feature map; the fusion module 103 is then configured to fuse the first feature map with the upsampled second feature map using a feature fusion algorithm to obtain the fused image.
In a specific implementation of the embodiment of the present invention, the channel weighting sub-network includes, arranged in sequence: a max pooling layer of size 2×2 with stride 2, a feature flattening layer, and a fully connected layer.
The above are merely preferred embodiments of the present invention and are not intended to limit the invention. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (10)
1. A saliency prediction method based on deep neural network color perception, characterized in that the method includes:
1) for each sample image in an acquired sample set of color images, converting the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
2) inputting the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
3) fusing the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
4) identifying the feature map of the fused image using a pre-established channel weighting sub-network; multiplying the feature map of the fused image with the fused image to obtain a target image; and convolving the target image with a preset convolution kernel to obtain the predicted saliency map;
5) obtaining the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judging whether the value of the cross-entropy loss function converges;
6) if so, taking the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and using the target network model to perform saliency prediction on an image to be detected;
7) if not, adjusting the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and returning to step 2) until the value of the cross-entropy loss function converges.
2. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
3. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
4. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
5. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that, before step 3), the method further includes: performing an upsampling process on the second feature map to obtain an upsampled second feature map; and step 3) includes: fusing the first feature map with the upsampled second feature map using a feature fusion algorithm to obtain the fused image.
6. The saliency prediction method based on deep neural network color perception according to claim 1, characterized in that the channel weighting sub-network includes, arranged in sequence: a max pooling layer of size 2×2 with stride 2; a feature flattening layer and a fully connected layer.
7. A saliency prediction device based on deep neural network color perception, characterized in that the device includes:
a conversion module, configured to convert, for each sample image in an acquired sample set of color images, the sample image into a coarse-granularity sample image and a fine-granularity sample image, wherein the resolution of the image in the fine-granularity sample image is higher than the resolution of the image in the coarse-granularity sample image;
an input module, configured to input the fine-granularity sample image into a preset first VGG network and the coarse-granularity sample image into a preset second VGG network, to obtain a first feature map corresponding to the coarse-granularity sample image and a second feature map corresponding to the fine-granularity sample image;
a fusion module, configured to fuse the first feature map and the second feature map using a feature fusion algorithm to obtain a fused image;
a convolution module, configured to identify the feature map of the fused image using a pre-established channel weighting sub-network; multiply the feature map of the fused image with the fused image to obtain a target image; and convolve the target image with a preset convolution kernel to obtain the predicted saliency map;
a judgment module, configured to obtain the cross-entropy loss function between the predicted saliency map and the human fixation map corresponding to the sample image, and judge whether the value of the cross-entropy loss function converges;
a detection module, configured to, when the judgment result of the judgment module is yes, take the network composed of the first VGG network, the second VGG network and the channel weighting sub-network as the target network model, and use the target network model to perform saliency prediction on the image to be detected;
an adjustment module, configured to, when the judgment result of the judgment module is no, adjust the model weights and/or hyperparameters in the first VGG network and/or the second VGG network and/or the channel weighting sub-network, and trigger the input module.
8. The saliency prediction device based on deep neural network color perception according to claim 7, characterized in that the resolution of the image in the fine-granularity sample image is a first preset multiple of the resolution of the image in the coarse-granularity sample image.
9. The saliency prediction device based on deep neural network color perception according to claim 7, characterized in that the first VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
10. The saliency prediction device based on deep neural network color perception according to claim 7, characterized in that the second VGG network includes five processing units connected in series, wherein the first processing unit and the second processing unit each include two convolution kernels and one max pooling layer; the third processing unit, the fourth processing unit and the fifth processing unit each include three convolution kernels and one max pooling layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910542301.XA CN110223295B (en) | 2019-06-21 | 2019-06-21 | Significance prediction method and device based on deep neural network color perception |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910542301.XA CN110223295B (en) | 2019-06-21 | 2019-06-21 | Significance prediction method and device based on deep neural network color perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110223295A true CN110223295A (en) | 2019-09-10 |
CN110223295B CN110223295B (en) | 2022-05-03 |
Family
ID=67814236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910542301.XA Active CN110223295B (en) | 2019-06-21 | 2019-06-21 | Significance prediction method and device based on deep neural network color perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110223295B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765882A (en) * | 2019-09-25 | 2020-02-07 | 腾讯科技(深圳)有限公司 | Video tag determination method, device, server and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104392463A (en) * | 2014-12-16 | 2015-03-04 | 西安电子科技大学 | Image salient region detection method based on joint sparse multi-scale fusion |
CN105787930A (en) * | 2016-02-17 | 2016-07-20 | 上海文广科技(集团)有限公司 | Sharpness-based significance detection method and system for virtual images |
CN106462771A (en) * | 2016-08-05 | 2017-02-22 | 深圳大学 | 3D image significance detection method |
CN107346436A (en) * | 2017-06-29 | 2017-11-14 | 北京以萨技术股份有限公司 | A kind of vision significance detection method of fused images classification |
CN107833220A (en) * | 2017-11-28 | 2018-03-23 | 河海大学常州校区 | Fabric defect detection method based on depth convolutional neural networks and vision significance |
CN108345892A (en) * | 2018-01-03 | 2018-07-31 | 深圳大学 | A kind of detection method, device, equipment and the storage medium of stereo-picture conspicuousness |
CN109409435A (en) * | 2018-11-01 | 2019-03-01 | 上海大学 | A kind of depth perception conspicuousness detection method based on convolutional neural networks |
US20190147288A1 (en) * | 2017-11-15 | 2019-05-16 | Adobe Inc. | Saliency prediction for informational documents |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104392463A (en) * | 2014-12-16 | 2015-03-04 | 西安电子科技大学 | Image salient region detection method based on joint sparse multi-scale fusion |
CN105787930A (en) * | 2016-02-17 | 2016-07-20 | 上海文广科技(集团)有限公司 | Sharpness-based significance detection method and system for virtual images |
CN106462771A (en) * | 2016-08-05 | 2017-02-22 | 深圳大学 | 3D image significance detection method |
CN107346436A (en) * | 2017-06-29 | 2017-11-14 | 北京以萨技术股份有限公司 | A kind of vision significance detection method of fused images classification |
US20190147288A1 (en) * | 2017-11-15 | 2019-05-16 | Adobe Inc. | Saliency prediction for informational documents |
CN107833220A (en) * | 2017-11-28 | 2018-03-23 | 河海大学常州校区 | Fabric defect detection method based on depth convolutional neural networks and vision significance |
CN108345892A (en) * | 2018-01-03 | 2018-07-31 | 深圳大学 | A kind of detection method, device, equipment and the storage medium of stereo-picture conspicuousness |
CN109409435A (en) * | 2018-11-01 | 2019-03-01 | 上海大学 | A kind of depth perception conspicuousness detection method based on convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
XUN H. et al.: "SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks", 2015 IEEE International Conference on Computer Vision *
LI Zongmin et al.: "Salient object detection combining domain transform and contour detection", Journal of Computer-Aided Design & Computer Graphics *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765882A (en) * | 2019-09-25 | 2020-02-07 | 腾讯科技(深圳)有限公司 | Video tag determination method, device, server and storage medium |
CN110765882B (en) * | 2019-09-25 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Video tag determination method, device, server and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110223295B (en) | 2022-05-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |