CN112861880A - Weak supervision RGBD image saliency detection method and system based on image classification - Google Patents
Weak supervision RGBD image saliency detection method and system based on image classification
- Publication number
- CN112861880A (application CN202110245920.XA)
- Authority
- CN
- China
- Prior art keywords
- map
- network model
- image
- saliency
- depth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24—Classification techniques
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Abstract
The invention relates to a weakly supervised RGBD image saliency detection method and system based on image classification. The method comprises the following steps. Step S1: for the images in a training data set, generate a class response map and an initial saliency map using a gradient-based class response mechanism and an RGBD salient object detection algorithm, respectively. Step S2: perform depth optimization on the class response map and the initial saliency map, and fuse them to generate an initial saliency pseudo label. Step S3: construct a network model and a hybrid loss function for RGBD image saliency detection; train the network model, learning its optimal parameters by minimizing the hybrid loss, to obtain the trained network model. Step S4: predict the saliency map of an RGBD image with the trained network model. The method and system help improve the accuracy of weakly supervised RGBD image saliency detection.
Description
Technical Field
The invention belongs to the field of image processing and computer vision, and particularly relates to a weakly supervised RGBD image saliency detection method and system based on image classification.
Background
Fully supervised saliency detection algorithms rely on pixel-level annotation, so the cost of manual labeling is very high. Therefore, in recent years, some scholars have studied weakly supervised saliency detection algorithms, which use low-cost labels such as image-level annotations or a single bounding box for supervised training of saliency detection. Parthipan Siva et al. propose a weakly supervised image saliency detection method with bounding box labels, which treats saliency detection as a sampling problem. Wang et al. use image-level labels for saliency detection for the first time; they combine the saliency detection task with the image classification task and use a multi-task architecture to achieve weakly supervised saliency detection. Zeng et al. propose a multi-source weakly supervised saliency detection framework to remedy the deficiencies of classification labels. Zhang et al., in recent work, propose a network structure for weakly supervised saliency detection based on scribble labels, together with a corresponding data set. However, these methods study weakly supervised saliency detection for pure RGB images only and rarely address weakly supervised saliency detection for RGBD images.
Disclosure of Invention
The invention aims to provide a method and a system for weakly supervised RGBD image saliency detection based on image classification, which help improve the saliency detection accuracy for weakly supervised RGBD images.
In order to achieve the above purpose, the invention adopts the following technical scheme: a weakly supervised RGBD image saliency detection method based on image classification, comprising the following steps:
step S1: for the images in the training data set, generate a class response map I_cam and an initial saliency map S_cdcp using a gradient-based class response mechanism and an RGBD salient object detection algorithm, respectively;
step S2: perform depth optimization on the class response map and the initial saliency map, and fuse them to generate an initial saliency pseudo label Y_noisy;
step S3: construct a network model and a hybrid loss function for RGBD image saliency detection; train the network model, learning its optimal parameters by minimizing the hybrid loss, to obtain the trained network model;
step S4: predict the saliency map of the RGBD image with the trained network model.
Further, the step S1 specifically includes the following steps:
step S11: scaling each color image and the corresponding depth image in the training data set together to ensure that the sizes of all RGBD images in the training data set are the same;
step S12: color map I after zoomingrgbInputting a pre-trained classification network model ResNet50 for image classification to obtain a final layer generation characteristic diagram set of ResNet50 convolutional layer, and defining the final layer generation characteristic diagram set as a matrix A belonging to RH×W×NWherein H, W represents the height and width of the feature map and N represents the number of channels; in the gradient-based class response mechanism, a feature map set A is linearly combined into a class response map, and the weight of the linear combination is determined by the partial derivative of the classification probability on the feature map; the method specifically comprises the following steps: first, the classification result y of the last layer is divided intocAnd the kth feature map A in the feature map setkPartial derivatives are calculated and are obtained through global average pooling to act on the feature mapLinear combining weights ofIt is formulated as:
wherein GAP (-) represents a global average pooling operator,represents a partial derivative operation;
secondly, linearly combining the feature graphs and generating a preliminary class response graph through Relu function filteringIt is formulated as:
wherein Relu (-) denotes a Relu activation function, and Σ denotes a summing operation;
finally, normalizing the preliminary class response graph to obtain a final class response graph IcamIt is formulated as:
wherein MaxPool represents maximum pooling;
step S13: color drawing IrgbAnd depth map IdepthMeanwhile, an initial saliency map S is generated through an RGBD image saliency detection algorithm based on central dark channel priorcdcpIt is formulated as:
Scdcp=functioncdcp(Irgb,Idepth)
wherein the functioncdcp(. represents a groupAnd performing a priori RGBD image saliency detection algorithm on the central dark channel.
Further, the step S2 specifically includes the following steps:
step S21: firstly, carrying out depth enhancement on the category response map Icam through a depth map Idepth to obtain a depth-enhanced category response mapThen carrying out deep optimization through a conditional random field to obtain an optimized class response mapIt is formulated as:
wherein,expressing pixel-by-pixel dot multiplication, CRF (-) expressing conditional random field optimization, and alpha expressing a hyperparameter larger than 1;
step S22: by depth map IdepthFor the initial saliency map ScdcpCarrying out depth enhancement to obtain a depth-enhanced saliency mapThen carrying out depth optimization through a conditional random field to obtain an optimized saliency mapIt is formulated as:
wherein,expressing pixel-by-pixel dot multiplication, CRF (-) expressing conditional random field optimization, and beta expressing a hyperparameter larger than 1;
step S23: the optimized class response graphAnd saliency mapFusing to a pseudo tag Y with lower noiseNoisyThe method is used for training the network model and is formulated as follows:
where x denotes a multiplier and δ denotes a parameter greater than 0 and less than 1.
Further, the step S3 specifically includes the following steps:
step S31: constructing a network model for RGBD image significance detection, wherein the network model consists of a feature fusion module and a full convolution neural network (FCN) module;
step S32: and constructing a mixed loss function comprising weighted cross entropy loss, conditional random field inference loss and edge loss, and training the network model by using the mixed loss function to obtain the network model with good robustness.
Further, the step S31 specifically includes the following steps:
step S311: constructing a characteristic fusion module which is formed by two 3 multiplied by 3 convolutions and used for inputting a color image I of the network modelrgbAnd depth map IdepthCarrying out feature fusion; firstly, carrying out channel splicing on an input color image and a depth image to generate a network model input with the size of (b, 4, h, w); this input is then convolved by two layers 3X 3 to obtain a feature X' of size (b, 3, h, w), which is expressed by the formula:
Input=Concat(Irgb,Idepth)
X=Conv3×3(Input)
X′=Conv3×3(X)
wherein, Concat () represents a splicing operator, Input represents the Input of the network model, and X represents the intermediate feature of convolution;
step S312: the FCN module changes the last layer of the classification network into a convolution layer and performs pooling on the 5 th layer of the classification network to obtain the characteristic Feat5Performing upsampling, performing convolution to obtain features with fewer channels, and performing activation function to obtain a final significance prediction graph, wherein the final significance prediction graph is expressed by a formula:
out=FCN(X′)
S=Sigmoid(out)
wherein, FCN (-) represents FCN module, out represents output of network model, Sigmoid (-) represents Sigmoid activating function, and S represents saliency map predicted by network model.
Further, the step S32 specifically includes the following steps:
step S321: reconstructing an original cross entropy loss function to obtain a weighted cross entropy loss function, and reducing the influence of noise in a label during network model training, wherein the formula expression is as follows:
w=|Y[i,j]-0.5|
wherein w denotes acting on a certain pixelThe weight of the loss of (a) is,representing a weighted cross-entropy loss function, YNoisyIndicates the pseudo tag generated in step S23,representing an original cross entropy loss function, Y representing a real label, i and j representing indexes of rows and columns where pixels are located, log (-) representing a logarithmic function, and | represents an absolute value operator;
step S322: constructing a conditional random field inference loss function, so that the network model can infer uncertain regions in the pseudo labels through the determined labels, and the formula expression is as follows:
Scrf=CRF(S,Irgb)
wherein CRF (. cndot.) represents conditional random field optimization, ScrfRepresenting the saliency map after conditional random field optimization, in this step as saliency map S for label supervised prediction,representing a conditional random field inference loss function;
step S323: constructing an edge loss function to optimize the edges of the prediction saliency map;
firstly, a color image IrgbConverted into a grey-scale map IgrayAnd obtaining a global edge map I through an edge detection operatoredgeIt is formulated as:
Iedge=ΔIgray
wherein Δ represents a gradient operation in edge detection;
secondly, the predicted saliency map S is subjected to expansion and erosion operations to generate a mask map ImaskActing on the edge map to filter out redundant edges to obtain edge lossA missing label, which is formulated as:
Sdil=Dilate(S)
Sero=Erode(S)
Imask=Sdil-Sero
wherein, the die (-) represents the dilation operation, the Erode (-) represents the erosion operation,indicating a pixel-by-pixel dot multiplication, YedgeA label representing an effect on edge loss;
wherein Δ S represents an edge map of the predicted saliency map;
step S324: the losses in steps S321-S323 are summed to obtain the final mixed loss function:
Further, the hybrid loss function is optimized with an Adam optimizer to obtain the optimal parameters of the network model, which are used when testing the network model.
The invention also provides a weakly supervised RGBD image saliency detection system based on image classification, comprising a memory, a processor and a computer program stored on the memory and executable on the processor; when the computer program is executed by the processor, the above method steps are implemented.
Compared with the prior art, the invention has the following beneficial effects: the invention provides a weakly supervised RGBD image saliency detection scheme, designs a depth optimization strategy to optimize the pseudo label, takes into account both the noise in the pseudo label and incompletely labeled objects, and constructs a hybrid loss that enables the model to effectively infer the full extent of the object.
Drawings
Fig. 1 is a schematic flow chart of a method implementation of the embodiment of the present invention.
FIG. 2 is a network model architecture diagram of weakly supervised RGBD image saliency detection in an embodiment of the present invention.
FIG. 3 is a schematic diagram of a feature fusion module according to an embodiment of the invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a weakly supervised RGBD image saliency detection method based on image classification, including the following steps:
step S1: for the images in the training data set, respectively utilizing a class response mechanism based on gradient and a traditional RGBD image salient object detection algorithm to generate a class response image IcamAnd an initial saliency map Scdcp。
Step S2: carrying out depth optimization on the class response graph and the initial saliency map, and fusing the class response graph and the initial saliency map to generate an initial saliency map pseudo label Ynoisy。
Step S3: and constructing a network model and a mixing loss function for the RGBD image saliency detection. And training the network model, and learning the optimal parameters of the network model by minimizing the hybrid loss to obtain the trained network model.
Step S4: and predicting a saliency map of the RGBD image by using the trained network model.
In fig. 2, the color map is denoted RGB and the depth map is denoted Depth; the gradient-based class response mechanism corresponds to the upper network branch shown in fig. 2.
In this embodiment, the step S1 specifically includes the following steps:
step S11: scaling each color map in the training data set and its corresponding depth map together to make all the RGBD images in the training data set have the same size, so that the saliency map pseudo label Y generated in step S2noisyHave the same size.
Step S12: color map I after zoomingrgbInputting a pre-trained classification network model ResNet50 for image classification to obtain a final layer generation characteristic diagram set of ResNet50 convolutional layer, and defining the final layer generation characteristic diagram set as a matrix A belonging to RH×W×NWhere H, W denotes the height and width of the feature map and N denotes the number of channels. In the gradient-based class response mechanism, the feature map set a is linearly combined into a class response map, and the weight of the linear combination is determined by the partial derivative of the classification probability on the feature map. The method specifically comprises the following steps: first, the classification result y of the last layer is divided intocAnd the kth feature map A in the feature map setkPartial derivatives are calculated and passed through global averagingPooling results in linear combination weights acting on the profileIt is formulated as:
wherein GAP (-) represents a global average pooling operator,representing partial derivative operations.
Secondly, linearly combining the feature graphs and generating a preliminary class response graph through Relu function filteringIt is formulated as:
where Relu (-) denotes the Relu activation function and Σ denotes the summing operation.
Finally, normalizing the preliminary class response graph to obtain a final class response graph Icam(e.g., the category response graph in FIG. 2), which is formulated as:
where MaxPool represents the maximum pooling.
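For illustration only, the gradient-based class response computation of step S12 might be sketched in PyTorch as follows. This is a minimal sketch, not the patented implementation: the hook-based gradient capture, the torchvision ResNet-50 weights and the use of the predicted class when no class index is supplied are assumptions of the example.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def class_response_map(rgb, class_idx=None):
    """Sketch of the gradient-based class response map (step S12).

    rgb: float tensor of shape (1, 3, H, W), already scaled as in step S11.
    Returns a normalized class response map I_cam of shape (1, 1, h, w).
    """
    net = models.resnet50(pretrained=True).eval()

    feats = {}
    # Capture the feature map set A from the last convolutional stage (layer4).
    handle = net.layer4.register_forward_hook(
        lambda m, i, o: feats.__setitem__("A", o))

    logits = net(rgb)                          # classification result
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))  # assumed: use the predicted class
    y_c = logits[0, class_idx]

    # Partial derivative of y_c with respect to the feature maps A.
    grads = torch.autograd.grad(y_c, feats["A"])[0]      # (1, N, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)       # GAP -> (1, N, 1, 1)

    # Linear combination of feature maps, filtered by Relu.
    cam = F.relu((weights * feats["A"]).sum(dim=1, keepdim=True))

    # Normalize by the global maximum (the "MaxPool" normalization).
    cam = cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)
    handle.remove()
    return cam
```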
Step S13: color drawing IrgbAnd depth map IdepthMeanwhile, an initial saliency map S is generated through an RGBD image saliency detection algorithm based on central dark channel priorcdcpIt is formulated as:
Scdcp=functioncdcp(Irgb,Idepth)
wherein the functioncdcp(. to) shows an RGBD image saliency detection algorithm based on a central dark channel prior.
In this embodiment, the step S2 specifically includes the following steps:
step S21: firstly, carrying out depth enhancement on the category response map Icam through a depth map Idepth to obtain a depth-enhanced category response mapThen carrying out deep optimization through a conditional random field to obtain an optimized class response mapIt is formulated as:
wherein,representing a pixel-by-pixel dot product, CRF (-) represents a conditional random field optimization, and α represents a hyperparameter greater than 1.
Step S22: by depth map IdepthFor the initial saliency map ScdcpCarrying out depth enhancement to obtain a depth-enhanced saliency mapThen carrying out depth optimization through a conditional random field to obtain an optimized saliency mapIt is formulated as:
wherein,representing a pixel-by-pixel dot product, CRF (·) represents conditional random field optimization, and β represents a hyperparameter greater than 1.
Step S23: the optimized class response graphAnd saliency mapFusing to a pseudo tag Y with lower noiseNoisy(noise label in fig. 2) for training of the network model, which is formulated as:
where x denotes a multiplier and δ denotes a parameter greater than 0 and less than 1.
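The exact depth-enhancement and fusion formulas of steps S21–S23 are given as figures in the original disclosure and are not reproduced here, so the sketch below only illustrates one plausible reading: depth enhancement as a pixel-wise product with the depth map raised to the hyperparameter power, CRF refinement delegated to a placeholder dense_crf helper, and fusion as a δ-weighted combination. All of these specific forms are assumptions made for illustration.

```python
import numpy as np

def dense_crf(prob_map, rgb_image):
    """Placeholder for the CRF(.) refinement of steps S21/S22.

    A real implementation would wrap a fully connected CRF library; the
    identity mapping here only keeps the sketch runnable."""
    return prob_map

def make_pseudo_label(i_cam, s_cdcp, depth, rgb, alpha=2.0, beta=2.0, delta=0.5):
    """Sketch of steps S21-S23: depth enhancement, CRF optimization, fusion.

    i_cam, s_cdcp, depth: float arrays in [0, 1] with shape (H, W).
    alpha, beta > 1 and 0 < delta < 1 are the hyperparameters named in the
    text; the exact formulas are assumptions of this example.
    """
    # Step S21: depth-enhance the class response map, then refine with a CRF.
    cam_dep = i_cam * np.power(depth, alpha)        # assumed enhancement form
    cam_opt = dense_crf(cam_dep, rgb)

    # Step S22: depth-enhance the initial saliency map, then refine with a CRF.
    sal_dep = s_cdcp * np.power(depth, beta)        # assumed enhancement form
    sal_opt = dense_crf(sal_dep, rgb)

    # Step S23: fuse the two optimized maps into a lower-noise pseudo label.
    y_noisy = delta * cam_opt + (1.0 - delta) * sal_opt   # assumed fusion form
    return y_noisy
```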
In this embodiment, the step S3 specifically includes the following steps:
step S31: and constructing a network model (as shown in figure 2) for the RGBD image saliency detection, wherein the network model consists of a feature fusion module (as shown in figure 3) and a full convolution neural network (FCN) module. The step S31 specifically includes the following steps:
step S311: constructing a characteristic fusion module which is formed by two 3 multiplied by 3 convolutions and used for inputting a color image I of the network modelrgbAnd depth map IdepthAnd performing feature fusion. Firstly, channel splicing is carried out on the input color image and the input depth image to generate a network model input with the size of (b, 4, h, w). The input is then convolved by two layers 3 x 3 to get largeA characteristic X' as small as (b, 3, h, w), which is expressed by the formula:
Input=Concat(Irgb,Idepth)
X=Conv3×3(Input)
X′=Conv3×3(X)
where Concat (·) represents the concatenation operator, Input represents the Input to the network model, and X represents the intermediate features of the convolution.
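A direct PyTorch rendering of the feature fusion module of step S311 (fig. 3) might look as follows; the number of channels of the intermediate feature X is not stated in the text, so the value used here is an assumption.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Feature fusion module of step S311: concatenate RGB and depth,
    then apply two 3x3 convolutions to obtain a 3-channel feature X'."""

    def __init__(self, mid_channels=16):  # mid_channels is an assumption
        super().__init__()
        self.conv1 = nn.Conv2d(4, mid_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(mid_channels, 3, kernel_size=3, padding=1)

    def forward(self, rgb, depth):
        # Input = Concat(I_rgb, I_depth): shape (b, 4, h, w)
        x = torch.cat([rgb, depth], dim=1)
        x = self.conv1(x)          # X  : (b, mid_channels, h, w)
        x = self.conv2(x)          # X' : (b, 3, h, w)
        return x
```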
Step S312: the FCN module changes the last layer of the classification network into a convolution layer and performs pooling on the 5 th layer of the classification network to obtain the characteristic Feat5Performing upsampling, performing convolution to obtain features with fewer channels, and performing activation function to obtain a final significance prediction graph, wherein the final significance prediction graph is expressed by a formula:
out=FCN(X′)
S=Sigmoid(out)
wherein, FCN (-) represents FCN module, out represents output of network model, Sigmoid (-) represents Sigmoid activating function, and S represents saliency map predicted by network model.
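The FCN module of step S312 can likewise be sketched as below; the backbone choice (ResNet-50, matching the classification network of step S12), the 1×1 prediction convolution and the upsampling factor are assumptions of the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SaliencyFCN(nn.Module):
    """FCN module of step S312: backbone features -> upsample -> conv -> sigmoid."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet50(pretrained=True)
        # Keep everything up to layer4; drop avgpool and fc, i.e. replace the
        # final fully connected layer by convolutional prediction.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.predict = nn.Conv2d(2048, 1, kernel_size=1)   # channel-reducing conv

    def forward(self, x_fused):
        feat5 = self.encoder(x_fused)                 # Feat_5 from the 5th stage
        feat5 = F.interpolate(feat5, scale_factor=4,  # upsampling (factor assumed)
                              mode="bilinear", align_corners=False)
        out = self.predict(feat5)                     # out = FCN(X')
        s = torch.sigmoid(out)                        # S = Sigmoid(out)
        return s, out
```

In use, the output of the FeatureFusion sketch above would feed this module, e.g. s, out = SaliencyFCN()(FeatureFusion()(rgb, depth)).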
Step S32: and constructing a mixed loss function comprising weighted cross entropy loss, conditional random field inference loss and edge loss, and training the network model by using the mixed loss function to obtain the network model with good robustness. The step S32 specifically includes the following steps:
step S321: reconstructing an original cross entropy loss function to obtain a weighted cross entropy loss function, and reducing the influence of noise in a label during network model training, wherein the formula expression is as follows:
w=|Y[i,j]-0.5|
wherein w is represented asWith the loss weight on a certain pixel,represents a weighted cross-entropy loss function, YNoisy represents the pseudo label generated in step S23,representing the original cross-entropy loss function, Y the true label, i and j the indices of the row and column where the pixel is located, log (-) represents the logarithmic function, and | represents the absolute operator.
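Read literally, step S321 scales a per-pixel binary cross entropy by w = |Y[i, j] - 0.5|, so that pixels whose pseudo-label value is close to 0.5 (uncertain) contribute little to the loss. The sketch below assumes the pseudo label Y_noisy serves as both the weighting map and the cross entropy target.

```python
import torch

def weighted_bce_loss(pred, y_noisy, eps=1e-6):
    """Weighted cross entropy loss of step S321 (sketch).

    pred    : predicted saliency map S, values in (0, 1), shape (b, 1, h, w)
    y_noisy : pseudo label Y_noisy, values in [0, 1], same shape
    """
    w = torch.abs(y_noisy - 0.5)                        # w = |Y[i, j] - 0.5|
    bce = -(y_noisy * torch.log(pred + eps)
            + (1.0 - y_noisy) * torch.log(1.0 - pred + eps))
    return (w * bce).mean()
```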
Step S322: constructing a conditional random field inference loss function, so that the network model can infer uncertain regions in the pseudo labels through the determined labels, and the formula expression is as follows:
Scrf=CRF(S,Irgb)
wherein CRF (. cndot.) represents conditional random field optimization, ScrfRepresenting the saliency map after conditional random field optimization, in this step as saliency map S for label supervised prediction,representing a conditional random field inference loss function.
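A minimal sketch of the CRF inference loss of step S322, assuming the CRF-refined map is detached from the gradient graph and used as a soft binary cross entropy target; dense_crf is the same placeholder assumed in the pseudo-label sketch above.

```python
import torch

def crf_inference_loss(pred, rgb, eps=1e-6):
    """Conditional random field inference loss of step S322 (sketch).

    pred: predicted saliency map S, shape (b, 1, h, w), values in (0, 1)
    rgb : color images I_rgb, shape (b, 3, h, w), used by the CRF
    """
    with torch.no_grad():
        refined = []
        for p, im in zip(pred, rgb):
            # S_crf = CRF(S, I_rgb); dense_crf is the placeholder assumed earlier.
            s_crf = dense_crf(p.detach().squeeze(0).cpu().numpy(),
                              im.detach().permute(1, 2, 0).cpu().numpy())
            refined.append(torch.as_tensor(s_crf, dtype=pred.dtype))
        s_crf = torch.stack(refined).unsqueeze(1).to(pred.device)

    # Binary cross entropy with the CRF-refined map as the (fixed) label.
    bce = -(s_crf * torch.log(pred + eps)
            + (1.0 - s_crf) * torch.log(1.0 - pred + eps))
    return bce.mean()
```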
Step S323: constructing an edge loss function optimizes predicting the edges of the saliency map.
Firstly, a color image IrgbConverted into a grey-scale map IgrayAnd obtaining a global edge map I through an edge detection operatoredgeIt is formulated as:
Iedge=ΔIgray
where Δ represents the gradient operation in edge detection.
Secondly, the predicted saliency map S is subjected to expansion and erosion operations to generate a mask map ImaskActing on the edge map to filter out redundant edges to obtainA label for edge loss, formulated as:
Sdil=Dilate(S)
Sero=Erode(S)
Imask=Sdil-Sero
wherein, the die (-) represents the dilation operation, the Erode (-) represents the erosion operation,indicating a pixel-by-pixel dot multiplication, YedgeIndicating the label acting on the edge loss.
here, AS represents an edge graph of the predicted saliency map.
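Step S323 can be sketched with morphological max pooling for dilation and erosion and a Sobel-style gradient for the Δ operator; the structuring-element size and the L1 comparison between ΔS and Y_edge are assumptions, since the exact edge-loss formula is given as a figure in the original disclosure.

```python
import torch
import torch.nn.functional as F

def _gradient_magnitude(x):
    """Simple Sobel gradient magnitude, used as the Δ edge operator (assumed)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_loss(pred, gray, k=5):
    """Edge loss of step S323 (sketch).

    pred: predicted saliency map S, shape (b, 1, h, w)
    gray: grayscale image I_gray, shape (b, 1, h, w)
    k   : size of the structuring element for dilation/erosion (assumed)
    """
    i_edge = _gradient_magnitude(gray)                         # I_edge = ΔI_gray
    s_dil = F.max_pool2d(pred, k, stride=1, padding=k // 2)    # Dilate(S)
    s_ero = -F.max_pool2d(-pred, k, stride=1, padding=k // 2)  # Erode(S)
    i_mask = s_dil - s_ero                                     # I_mask
    y_edge = i_mask * i_edge                                   # label on the edge loss
    s_edge = _gradient_magnitude(pred)                         # ΔS
    return torch.abs(s_edge - y_edge).mean()                   # assumed comparison
```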
Step S324: the losses in steps S321-S323 are summed to obtain the final mixed loss function:
And then, optimizing the mixing loss function through an Adam optimizer to obtain the optimal parameters of the network model for testing the network model.
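Putting the pieces together, a training loop over the pseudo labels might look like the sketch below. The Adam optimizer and the plain sum of the three losses follow the text; the batch composition, learning rate and number of epochs are assumptions.

```python
import torch

def train(model, fusion, loader, epochs=30, lr=1e-4):
    """Sketch of training with the hybrid loss (steps S3 and S324).

    loader yields (rgb, depth, gray, y_noisy) batches; model and fusion are
    the SaliencyFCN and FeatureFusion sketches above.
    """
    params = list(model.parameters()) + list(fusion.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)   # Adam optimizer, as in the text

    for epoch in range(epochs):
        for rgb, depth, gray, y_noisy in loader:
            s, _ = model(fusion(rgb, depth))      # predicted saliency map S

            # Hybrid loss: sum of the three losses of steps S321-S323.
            loss = (weighted_bce_loss(s, y_noisy)
                    + crf_inference_loss(s, rgb)
                    + edge_loss(s, gray))

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model, fusion
```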
The invention also provides a weakly supervised RGBD image saliency detection system based on image classification, comprising a memory, a processor and a computer program stored on the memory and executable on the processor; when the computer program is executed by the processor, the above method steps are implemented.
The depth map expresses the spatial distance between the object and the camera and can provide sufficient position information; a depth map with a small noise amplitude can also provide complete object structure information. The depth map is therefore regarded as additional auxiliary information for weakly supervised image saliency detection. The invention provides a weakly supervised RGBD image saliency detection framework, designs a depth optimization strategy to optimize the pseudo label, takes into account both the noise in the pseudo label and incompletely labeled objects, and designs a hybrid loss that enables the model to effectively infer the full extent of the object, thereby significantly improving the detection accuracy for weakly supervised RGBD salient objects.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. However, any simple modification, equivalent change and modification of the above embodiments according to the technical essence of the present invention are within the protection scope of the technical solution of the present invention.
Claims (8)
1. A weakly supervised RGBD image saliency detection method based on image classification, characterized by comprising the following steps:
step S1: for the images in the training data set, generating a class response map I_cam and an initial saliency map S_cdcp using a gradient-based class response mechanism and an RGBD salient object detection algorithm, respectively;
step S2: performing depth optimization on the class response map and the initial saliency map, and fusing them to generate an initial saliency pseudo label Y_noisy;
step S3: constructing a network model and a hybrid loss function for RGBD image saliency detection; training the network model, and learning the optimal parameters of the network model by minimizing the hybrid loss to obtain the trained network model;
step S4: predicting a saliency map of the RGBD image by using the trained network model.
2. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 1, wherein the step S1 specifically includes the following steps:
step S11: scaling each color image and the corresponding depth image in the training data set together to ensure that the sizes of all RGBD images in the training data set are the same;
step S12: color map I after zoomingrgbInputting a pre-trained classification network model ResNet50 for image classification to obtain a final layer generation characteristic diagram set of ResNet50 convolutional layer, and defining the final layer generation characteristic diagram set as a matrix A belonging to RH×W×NWherein H, W represents the height and width of the feature map and N represents the number of channels; in the gradient-based class response mechanism, a feature map set A is linearly combined into a class response map, and the weight of the linear combination is determined by the partial derivative of the classification probability on the feature map; the method specifically comprises the following steps: first, the classification result y of the last layer is divided intocAnd the kth feature map A in the feature map setkPartial derivatives are calculated and linear combination weights acting on the feature map are obtained through global average poolingIt is formulated as:
wherein GAP (-) represents a global average pooling operator,represents a partial derivative operation;
secondly, the feature maps are linearly groupedCombined and filtered by Relu function to generate preliminary class response graphIt is formulated as:
wherein Relu (-) denotes a Relu activation function, and Σ denotes a summing operation;
finally, normalizing the preliminary class response graph to obtain a final class response graph IcamIt is formulated as:
wherein MaxPool represents maximum pooling;
step S13: color drawing IrgbAnd depth map IdepthMeanwhile, an initial saliency map S is generated through an RGBD image saliency detection algorithm based on central dark channel priorcdcpIt is formulated as:
Scdcp=functioncdcp(Irgb,Idepth)
wherein the functioncdcp(. to) shows an RGBD image saliency detection algorithm based on a central dark channel prior.
3. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 2, wherein the step S2 specifically includes the following steps:
step S21: first by a depth map IdepthResponse to class diagram IcamCarrying out depth enhancement to obtain a class response map with the depth enhancementThen subject to the following conditionsCarrying out deep optimization on the airport to obtain an optimized class response diagramIt is formulated as:
wherein,expressing pixel-by-pixel dot multiplication, CRF (-) expressing conditional random field optimization, and alpha expressing a hyperparameter larger than 1;
step S22: by depth map IdepthFor the initial saliency map ScdcpCarrying out depth enhancement to obtain a depth-enhanced saliency mapThen carrying out depth optimization through a conditional random field to obtain an optimized saliency mapIt is formulated as:
wherein,expressing pixel-by-pixel dot multiplication, CRF (-) expressing conditional random field optimization, and beta expressing a hyperparameter larger than 1;
step S23: the optimized class response graphAnd saliency mapFusing to a pseudo tag Y with lower noiseNoisyThe method is used for training the network model and is formulated as follows:
where x denotes a multiplier and δ denotes a parameter greater than 0 and less than 1.
4. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 3, wherein the step S3 specifically includes the following steps:
step S31: constructing a network model for RGBD image significance detection, wherein the network model consists of a feature fusion module and a full convolution neural network (FCN) module;
step S32: and constructing a mixed loss function comprising weighted cross entropy loss, conditional random field inference loss and edge loss, and training the network model by using the mixed loss function to obtain the network model with good robustness.
5. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 4, wherein the step S31 specifically includes the following steps:
step S311: constructing a characteristic fusion module which is formed by two 3 multiplied by 3 convolutions and used for inputting a color image I of the network modelrgbAnd depth map IdepthCarrying out feature fusion; firstly, carrying out channel splicing on an input color image and a depth image to generate a network model input with the size of (b, 4, h, w); this input is then convolved by two layers 3X 3 to obtain a feature X' of size (b, 3, h, w), which is expressed by the formula:
Input=Concat(Irgb,Idepth)
X=Conv3×3(Input)
X′=Conv3×3(X)
wherein, Concat () represents a splicing operator, Input represents the Input of the network model, and X represents the intermediate feature of convolution;
step S312: the FCN module changes the last layer of the classification network into a convolution layer and performs pooling on the 5 th layer of the classification network to obtain the characteristic Feat5Performing upsampling, performing convolution to obtain features with fewer channels, and performing activation function to obtain a final significance prediction graph, wherein the final significance prediction graph is expressed by a formula:
out=FCN(X′)
S=Sigmoid(out)
wherein, FCN (-) represents FCN module, out represents output of network model, Sigmoid (-) represents Sigmoid activating function, and S represents saliency map predicted by network model.
6. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 5, wherein the step S32 specifically includes the following steps:
step S321: reconstructing an original cross entropy loss function to obtain a weighted cross entropy loss function, and reducing the influence of noise in a label during network model training, wherein the formula expression is as follows:
w=|Y[i,j]-0.5|
where w represents the loss weight applied to a pixel,representing a weighted cross-entropy loss function, YNoisyIndicates the pseudo tag generated in step S23,representing an original cross entropy loss function, Y representing a real label, i and j representing indexes of rows and columns where pixels are located, log (-) representing a logarithmic function, and | represents an absolute value operator;
step S322: constructing a conditional random field inference loss function, so that the network model can infer uncertain regions in the pseudo labels through the determined labels, and the formula expression is as follows:
Scrf=CRF(S,Irgb)
wherein CRF (. cndot.) represents conditional random field optimization, ScrfRepresenting the saliency map after conditional random field optimization, in this step as saliency map S for label supervised prediction,representing a conditional random field inference loss function;
step S323: constructing an edge loss function to optimize the edges of the prediction saliency map;
firstly, a color image IrgbConverted into a grey-scale map IgrayAnd obtaining a global edge map I through an edge detection operatoredgeIt is formulated as:
Iedge=ΔIgray
wherein Δ represents a gradient operation in edge detection;
secondly, the predicted saliency map S is subjected to expansion and erosion operations to generate a mask map ImaskAnd filtering redundant edges on the edge graph to obtain a label of edge loss, wherein the label is expressed by the formula:
Sdil=Dilate(S)
Sero=Erode(S)
Imask=Sdil-Sero
wherein, the die (-) represents the dilation operation, the Erode (-) represents the erosion operation,indicating a pixel-by-pixel dot multiplication, YedgeA label representing an effect on edge loss;
wherein Δ S represents an edge map of the predicted saliency map;
step S324: the losses in steps S321-S323 are summed to obtain the final mixed loss function:
7. The image classification-based weakly supervised RGBD image saliency detection method according to claim 6, characterized in that the hybrid loss function is optimized by an Adam optimizer to obtain the optimal parameters of the network model, which are used for testing the network model.
8. A weakly supervised RGBD image saliency detection system based on image classification, comprising a memory, a processor and a computer program stored on the memory and being executable on the processor, the computer program, when executed by the processor, implementing the method steps of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110245920.XA CN112861880B (en) | 2021-03-05 | 2021-03-05 | Weak supervision RGBD image saliency detection method and system based on image classification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110245920.XA CN112861880B (en) | 2021-03-05 | 2021-03-05 | Weak supervision RGBD image saliency detection method and system based on image classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112861880A true CN112861880A (en) | 2021-05-28 |
CN112861880B CN112861880B (en) | 2021-12-07 |
Family
ID=75994082
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110245920.XA Active CN112861880B (en) | 2021-03-05 | 2021-03-05 | Weak supervision RGBD image saliency detection method and system based on image classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112861880B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113436115A (en) * | 2021-07-30 | 2021-09-24 | 西安热工研究院有限公司 | Image shadow detection method based on depth unsupervised learning |
CN115080748A (en) * | 2022-08-16 | 2022-09-20 | 之江实验室 | Weak supervision text classification method and device based on noisy label learning |
CN116978008A (en) * | 2023-07-12 | 2023-10-31 | 睿尔曼智能科技(北京)有限公司 | RGBD-fused semi-supervised target detection method and system |
- 2021-03-05: CN application CN202110245920.XA filed; patent CN112861880B (en), status Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102364560A (en) * | 2011-10-19 | 2012-02-29 | 华南理工大学 | Traffic sign convenient for electronic identification and method for identifying traffic sign |
CN105791660A (en) * | 2014-12-22 | 2016-07-20 | 中兴通讯股份有限公司 | Method and device for correcting photographing inclination of photographed object and mobile terminal |
CN107292318A (en) * | 2017-07-21 | 2017-10-24 | 北京大学深圳研究生院 | Image significance object detection method based on center dark channel prior information |
CN107452030A (en) * | 2017-08-04 | 2017-12-08 | 南京理工大学 | Method for registering images based on contour detecting and characteristic matching |
CN108399406A (en) * | 2018-01-15 | 2018-08-14 | 中山大学 | The method and system of Weakly supervised conspicuousness object detection based on deep learning |
CN109410171A (en) * | 2018-09-14 | 2019-03-01 | 安徽三联学院 | A kind of target conspicuousness detection method for rainy day image |
CN110598609A (en) * | 2019-09-02 | 2019-12-20 | 北京航空航天大学 | Weak supervision target detection method based on significance guidance |
Non-Patent Citations (2)
Title |
---|
CHUNBIAO ZHU ET AL.: "Exploiting the Value of the Center-dark Channel Prior for Salient Object Detection", 《ARXIV:1805.05132V1》 * |
RAMPRASAATH R. SELVARAJU ET AL.: "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization", 《2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113436115A (en) * | 2021-07-30 | 2021-09-24 | 西安热工研究院有限公司 | Image shadow detection method based on depth unsupervised learning |
CN113436115B (en) * | 2021-07-30 | 2023-09-19 | 西安热工研究院有限公司 | Image shadow detection method based on depth unsupervised learning |
CN115080748A (en) * | 2022-08-16 | 2022-09-20 | 之江实验室 | Weak supervision text classification method and device based on noisy label learning |
CN115080748B (en) * | 2022-08-16 | 2022-11-11 | 之江实验室 | Weak supervision text classification method and device based on learning with noise label |
CN116978008A (en) * | 2023-07-12 | 2023-10-31 | 睿尔曼智能科技(北京)有限公司 | RGBD-fused semi-supervised target detection method and system |
CN116978008B (en) * | 2023-07-12 | 2024-04-26 | 睿尔曼智能科技(北京)有限公司 | RGBD-fused semi-supervised target detection method and system |
Also Published As
Publication number | Publication date |
---|---|
CN112861880B (en) | 2021-12-07 |
Similar Documents
Publication | Title |
---|---|
CN112861880B (en) | Weak supervision RGBD image saliency detection method and system based on image classification | |
CN104268594B (en) | A kind of video accident detection method and device | |
CN110532900B (en) | Facial expression recognition method based on U-Net and LS-CNN | |
CN113657560B (en) | Weak supervision image semantic segmentation method and system based on node classification | |
CN113158862B (en) | Multitasking-based lightweight real-time face detection method | |
CN112906485B (en) | Visual impairment person auxiliary obstacle perception method based on improved YOLO model | |
CN111861925B (en) | Image rain removing method based on attention mechanism and door control circulation unit | |
CN113807355A (en) | Image semantic segmentation method based on coding and decoding structure | |
CN111696110B (en) | Scene segmentation method and system | |
CN110298387A (en) | Incorporate the deep neural network object detection method of Pixel-level attention mechanism | |
CN113065546B (en) | Target pose estimation method and system based on attention mechanism and Hough voting | |
CN111882002A (en) | MSF-AM-based low-illumination target detection method | |
CN108062756A (en) | Image, semantic dividing method based on the full convolutional network of depth and condition random field | |
CN108256562A (en) | Well-marked target detection method and system based on Weakly supervised space-time cascade neural network | |
CN111104903A (en) | Depth perception traffic scene multi-target detection method and system | |
CN111428664B (en) | Computer vision real-time multi-person gesture estimation method based on deep learning technology | |
CN111368637B (en) | Transfer robot target identification method based on multi-mask convolutional neural network | |
CN112801104B (en) | Image pixel level pseudo label determination method and system based on semantic segmentation | |
CN114693930B (en) | Instance segmentation method and system based on multi-scale features and contextual attention | |
CN114821050B (en) | Method for dividing reference image based on transformer | |
CN114724155A (en) | Scene text detection method, system and equipment based on deep convolutional neural network | |
CN110852199A (en) | Foreground extraction method based on double-frame coding and decoding model | |
CN116740362A (en) | Attention-based lightweight asymmetric scene semantic segmentation method and system | |
CN114581789A (en) | Hyperspectral image classification method and system | |
CN117253184B (en) | Foggy day image crowd counting method guided by foggy priori frequency domain attention characterization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||