
CN112861880A - Weak supervision RGBD image saliency detection method and system based on image classification - Google Patents

Weak supervision RGBD image saliency detection method and system based on image classification

Info

Publication number
CN112861880A
Authority
CN
China
Prior art keywords
map
network model
image
saliency
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110245920.XA
Other languages
Chinese (zh)
Other versions
CN112861880B (en)
Inventor
潘昌琴
林涵阳
刘国辉
王力军
俞伟明
蔡桥英
郑骁凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Start Dima Data Processing Co ltd
Original Assignee
Jiangsu Start Dima Data Processing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Start Dima Data Processing Co ltd filed Critical Jiangsu Start Dima Data Processing Co ltd
Priority to CN202110245920.XA priority Critical patent/CN112861880B/en
Publication of CN112861880A publication Critical patent/CN112861880A/en
Application granted granted Critical
Publication of CN112861880B publication Critical patent/CN112861880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a weakly supervised RGBD image saliency detection method and system based on image classification, wherein the method comprises the following steps. Step S1: for the images in a training data set, generate a class response map and an initial saliency map using a gradient-based class response mechanism and an RGBD salient object detection algorithm, respectively. Step S2: perform depth optimization on the class response map and the initial saliency map, and fuse them to generate an initial saliency pseudo label. Step S3: construct a network model and a hybrid loss function for RGBD image saliency detection; train the network model, learning its optimal parameters by minimizing the hybrid loss, to obtain the trained network model. Step S4: predict the saliency map of an RGBD image with the trained network model. The method and system help improve the accuracy of weakly supervised RGBD image saliency detection.

Description

Weak supervision RGBD image saliency detection method and system based on image classification
Technical Field
The invention belongs to the field of image processing and computer vision, and particularly relates to a weakly supervised RGBD image saliency detection method and system based on image classification.
Background
Because fully supervised saliency detection algorithms require pixel-by-pixel annotation, the cost of manual labeling is very high. Therefore, in recent years, some scholars have studied weakly supervised saliency detection algorithms, which use low-cost annotations such as image-level labels or a single bounding box for the supervised training of saliency detection. Parthipan Siva et al. propose a weakly supervised image saliency detection method using bounding box annotations, which treats saliency detection as a sampling problem. Wang et al. are the first to use image-level labels for saliency detection; they combine the saliency detection task with the image classification task and achieve weakly supervised saliency detection with a multi-task architecture. Zeng et al. propose a multi-source weakly supervised saliency detection framework to remedy the deficiencies of classification labels. Zhang et al., in recent work, propose a network structure for weakly supervised saliency detection based on scribble annotations and provide a corresponding data set. However, these methods study weakly supervised saliency detection only for pure RGB images and rarely address weakly supervised saliency detection for RGBD images.
Disclosure of Invention
The invention aims to provide a weakly supervised RGBD image saliency detection method and system based on image classification, which help improve the accuracy of weakly supervised RGBD image saliency detection.
In order to achieve the above purpose, the invention adopts the following technical scheme: a weakly supervised RGBD image saliency detection method based on image classification, comprising the following steps:
Step S1: for the images in the training data set, generate a class response map Icam and an initial saliency map Scdcp using a gradient-based class response mechanism and an RGBD salient object detection algorithm, respectively;
Step S2: perform depth optimization on the class response map and the initial saliency map, and fuse them to generate an initial saliency pseudo label Ynoisy;
Step S3: construct a network model and a hybrid loss function for RGBD image saliency detection; train the network model, learning its optimal parameters by minimizing the hybrid loss, to obtain the trained network model;
Step S4: predict the saliency map of an RGBD image with the trained network model.
Further, the step S1 specifically includes the following steps:
step S11: scaling each color image and the corresponding depth image in the training data set together to ensure that the sizes of all RGBD images in the training data set are the same;
step S12: color map I after zoomingrgbInputting a pre-trained classification network model ResNet50 for image classification to obtain a final layer generation characteristic diagram set of ResNet50 convolutional layer, and defining the final layer generation characteristic diagram set as a matrix A belonging to RH×W×NWherein H, W represents the height and width of the feature map and N represents the number of channels; in the gradient-based class response mechanism, a feature map set A is linearly combined into a class response map, and the weight of the linear combination is determined by the partial derivative of the classification probability on the feature map; the method specifically comprises the following steps: first, the classification result y of the last layer is divided intocAnd the kth feature map A in the feature map setkPartial derivatives are calculated and are obtained through global average pooling to act on the feature mapLinear combining weights of
Figure BDA0002964086550000021
It is formulated as:
Figure BDA0002964086550000022
wherein GAP (-) represents a global average pooling operator,
Figure BDA0002964086550000023
represents a partial derivative operation;
secondly, linearly combining the feature graphs and generating a preliminary class response graph through Relu function filtering
Figure BDA0002964086550000024
It is formulated as:
Figure BDA0002964086550000025
wherein Relu (-) denotes a Relu activation function, and Σ denotes a summing operation;
finally, normalizing the preliminary class response graph to obtain a final class response graph IcamIt is formulated as:
Figure BDA0002964086550000026
wherein MaxPool represents maximum pooling;
step S13: color drawing IrgbAnd depth map IdepthMeanwhile, an initial saliency map S is generated through an RGBD image saliency detection algorithm based on central dark channel priorcdcpIt is formulated as:
Scdcp=functioncdcp(Irgb,Idepth)
wherein the functioncdcp(. represents a groupAnd performing a priori RGBD image saliency detection algorithm on the central dark channel.
Further, the step S2 specifically includes the following steps:
Step S21: first, depth-enhance the class response map Icam with the depth map Idepth to obtain the depth-enhanced class response map Icam_de (a pixel-wise product, ⊙, of Icam and Idepth modulated by a hyperparameter α); then perform depth optimization through a conditional random field to obtain the optimized class response map
Icam_crf = CRF(Icam_de, Irgb)
where ⊙ denotes pixel-wise dot multiplication, CRF(·) denotes conditional random field optimization, and α denotes a hyperparameter greater than 1;
Step S22: depth-enhance the initial saliency map Scdcp with the depth map Idepth to obtain the depth-enhanced saliency map Scdcp_de (a pixel-wise product, ⊙, of Scdcp and Idepth modulated by a hyperparameter β); then perform depth optimization through a conditional random field to obtain the optimized saliency map
Scdcp_crf = CRF(Scdcp_de, Irgb)
where ⊙ denotes pixel-wise dot multiplication, CRF(·) denotes conditional random field optimization, and β denotes a hyperparameter greater than 1;
Step S23: fuse the optimized class response map Icam_crf and the optimized saliency map Scdcp_crf into a pseudo label Ynoisy with lower noise, used for training the network model; the fusion is the weighted combination
Ynoisy = δ × Icam_crf + (1 - δ) × Scdcp_crf
where × denotes multiplication and δ denotes a parameter greater than 0 and less than 1.
Further, the step S3 specifically includes the following steps:
Step S31: construct a network model for RGBD image saliency detection, wherein the network model consists of a feature fusion module and a fully convolutional network (FCN) module;
Step S32: construct a hybrid loss function comprising a weighted cross-entropy loss, a conditional random field inference loss and an edge loss, and train the network model with the hybrid loss function to obtain a network model with good robustness.
Further, the step S31 specifically includes the following steps:
Step S311: construct a feature fusion module, formed by two 3×3 convolutions, to fuse the features of the color map Irgb and the depth map Idepth that are input to the network model; first, channel-concatenate the input color map and depth map to produce a network model input of size (b, 4, h, w); this input is then passed through two 3×3 convolution layers to obtain a feature X′ of size (b, 3, h, w), formulated as:
Input = Concat(Irgb, Idepth)
X = Conv3×3(Input)
X′ = Conv3×3(X)
where Concat(·) denotes the concatenation operator, Input denotes the input of the network model, and X denotes the intermediate feature of the convolution;
Step S312: the FCN module changes the last layer of the classification network into a convolutional layer, upsamples the feature Feat5 obtained from the 5th pooling stage of the classification network, obtains features with fewer channels through convolution, and obtains the final saliency prediction map through an activation function, formulated as:
out = FCN(X′)
S = Sigmoid(out)
where FCN(·) denotes the FCN module, out denotes the output of the network model, Sigmoid(·) denotes the Sigmoid activation function, and S denotes the saliency map predicted by the network model.
Further, the step S32 specifically includes the following steps:
Step S321: reconstruct the original cross-entropy loss function into a weighted cross-entropy loss function to reduce the influence of label noise during network model training, formulated as:
Lwce = Σ(i,j) w · Lce(S[i,j], Ynoisy[i,j])
Lce(S, Y) = -(Y·log(S) + (1 - Y)·log(1 - S))
w = |Y[i,j] - 0.5|
where w denotes the loss weight acting on a pixel, Lwce denotes the weighted cross-entropy loss function, Ynoisy denotes the pseudo label generated in step S23, Lce denotes the original cross-entropy loss function, Y denotes the label, i and j denote the row and column indices of a pixel, log(·) denotes the logarithmic function, and |·| denotes the absolute value operator;
Step S322: construct a conditional random field inference loss function so that the network model can infer the uncertain regions of the pseudo label from the determined labels, formulated as:
Scrf = CRF(S, Irgb)
Lcrf = Lce(S, Scrf)
where CRF(·) denotes conditional random field optimization, Scrf denotes the saliency map after conditional random field optimization, used in this step as the label supervising the predicted saliency map S, and Lcrf denotes the conditional random field inference loss function;
Step S323: construct an edge loss function to optimize the edges of the predicted saliency map;
first, convert the color map Irgb into a grayscale map Igray and obtain a global edge map Iedge through an edge detection operator, formulated as:
Iedge = ΔIgray
where Δ denotes the gradient operation in edge detection;
secondly, apply dilation and erosion operations to the predicted saliency map S to generate a mask map Imask, which acts on the edge map to filter out redundant edges and yields the label for the edge loss, formulated as:
Sdil = Dilate(S)
Sero = Erode(S)
Imask = Sdil - Sero
Yedge = Imask ⊙ Iedge
where Dilate(·) denotes the dilation operation, Erode(·) denotes the erosion operation, ⊙ denotes pixel-wise dot multiplication, and Yedge denotes the label acting on the edge loss;
the edge loss function Ledge is then defined between ΔS and Yedge, where ΔS denotes the edge map of the predicted saliency map;
Step S324: sum the losses of steps S321-S323 to obtain the final hybrid loss function:
Lhybrid = Lwce + Lcrf + Ledge
where Lhybrid denotes the hybrid loss function.
Further, the hybrid loss function is optimized by an Adam optimizer to obtain the optimal parameters of the network model, which are used for testing the network model.
The invention also provides a weakly supervised RGBD image saliency detection system based on image classification, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the method steps are implemented when the computer program is run by the processor.
Compared with the prior art, the invention has the following beneficial effects: the invention provides a weakly supervised RGBD image saliency detection scheme, designs a depth optimization strategy to optimize the pseudo label, takes into account both the noise in the pseudo label and the incompleteness of the labeled objects, and constructs a hybrid loss that enables the model to effectively infer the full extent of the object.
Drawings
Fig. 1 is a schematic flow chart of a method implementation of the embodiment of the present invention.
FIG. 2 is a network model architecture diagram of weakly supervised RGBD image saliency detection in an embodiment of the present invention.
FIG. 3 is a schematic diagram of a feature fusion module according to an embodiment of the invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a weakly supervised RGBD image saliency detection method based on image classification, including the following steps:
Step S1: for the images in the training data set, generate a class response map Icam and an initial saliency map Scdcp using a gradient-based class response mechanism and a traditional RGBD image salient object detection algorithm, respectively.
Step S2: perform depth optimization on the class response map and the initial saliency map, and fuse them to generate an initial saliency pseudo label Ynoisy.
Step S3: construct a network model and a hybrid loss function for RGBD image saliency detection; train the network model, learning its optimal parameters by minimizing the hybrid loss, to obtain the trained network model.
Step S4: predict the saliency map of an RGBD image with the trained network model.
In Fig. 2, the color map is denoted RGB and the depth map is denoted Depth. The gradient-based class response mechanism corresponds roughly to the upper network branch in Fig. 2. An illustrative end-to-end sketch of steps S1-S4 is given below.
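The following sketch is illustrative only and is not part of the patented embodiment; it shows one way the overall flow of steps S1-S4 could be organized in Python/PyTorch. The helper names generate_cam, cdcp_saliency, make_pseudo_label and hybrid_loss are hypothetical placeholders for the operations detailed in the remainder of this embodiment, and sketches of the individual helpers follow the corresponding steps below.

```python
# Illustrative orchestration of steps S1-S4; all helpers are assumed/hypothetical.
import torch

def build_pseudo_labels(dataset, classifier):
    """Steps S1-S2: class response map + CDCP saliency per image, fused into a pseudo label."""
    pseudo = []
    for rgb, depth in dataset:
        cam = generate_cam(classifier, rgb)                        # gradient-based class response map (step S12)
        s_cdcp = cdcp_saliency(rgb, depth)                         # center-dark-channel-prior saliency (step S13)
        pseudo.append(make_pseudo_label(cam, s_cdcp, rgb, depth))  # depth enhancement + CRF + fusion (step S2)
    return pseudo

def train(model, dataset, pseudo, epochs=30, lr=1e-4):
    """Step S3: learn the network parameters by minimizing the hybrid loss with Adam."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for (rgb, depth), y_noisy in zip(dataset, pseudo):
            loss = hybrid_loss(model(rgb, depth), y_noisy, rgb)
            opt.zero_grad(); loss.backward(); opt.step()

@torch.no_grad()
def predict(model, rgb, depth):
    """Step S4: predict the saliency map of a test RGBD image."""
    return model(rgb, depth)
```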
In this embodiment, the step S1 specifically includes the following steps:
step S11: scaling each color map in the training data set and its corresponding depth map together to make all the RGBD images in the training data set have the same size, so that the saliency map pseudo label Y generated in step S2noisyHave the same size.
Step S12: color map I after zoomingrgbInputting a pre-trained classification network model ResNet50 for image classification to obtain a final layer generation characteristic diagram set of ResNet50 convolutional layer, and defining the final layer generation characteristic diagram set as a matrix A belonging to RH×W×NWhere H, W denotes the height and width of the feature map and N denotes the number of channels. In the gradient-based class response mechanism, the feature map set a is linearly combined into a class response map, and the weight of the linear combination is determined by the partial derivative of the classification probability on the feature map. The method specifically comprises the following steps: first, the classification result y of the last layer is divided intocAnd the kth feature map A in the feature map setkPartial derivatives are calculated and passed through global averagingPooling results in linear combination weights acting on the profile
Figure BDA0002964086550000071
It is formulated as:
Figure BDA0002964086550000072
wherein GAP (-) represents a global average pooling operator,
Figure BDA0002964086550000073
representing partial derivative operations.
Secondly, linearly combining the feature graphs and generating a preliminary class response graph through Relu function filtering
Figure BDA0002964086550000074
It is formulated as:
Figure BDA0002964086550000075
where Relu (-) denotes the Relu activation function and Σ denotes the summing operation.
Finally, normalizing the preliminary class response graph to obtain a final class response graph Icam(e.g., the category response graph in FIG. 2), which is formulated as:
Figure BDA0002964086550000076
where MaxPool represents the maximum pooling.
Step S13: color drawing IrgbAnd depth map IdepthMeanwhile, an initial saliency map S is generated through an RGBD image saliency detection algorithm based on central dark channel priorcdcpIt is formulated as:
Scdcp=functioncdcp(Irgb,Idepth)
wherein the functioncdcp(. to) shows an RGBD image saliency detection algorithm based on a central dark channel prior.
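As one possible illustration of step S12 (not the patent's own code), the following PyTorch sketch computes a gradient-based class response map from the last convolutional block of a torchvision ResNet50 using forward/backward hooks; the interpolation back to image size is an added convenience assumption.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

def class_response_map(rgb, target_class=None):
    """Grad-CAM-style class response map from the last ResNet50 conv block (step S12 sketch)."""
    model = resnet50(weights="IMAGENET1K_V1").eval()      # torchvision >= 0.13
    feats, grads = [], []
    h1 = model.layer4.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = model.layer4.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

    logits = model(rgb)                                    # rgb: (1, 3, H, W), already normalized
    if target_class is None:
        target_class = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, target_class].backward()                     # gradients d y^c / d A^k

    h1.remove(); h2.remove()
    A, dA = feats[0], grads[0]                             # feature maps and their gradients, (1, N, h, w)
    w = dA.mean(dim=(2, 3), keepdim=True)                  # GAP of gradients -> weights w_k^c
    cam = F.relu((w * A).sum(dim=1, keepdim=True))         # Relu(sum_k w_k^c * A^k)
    cam = cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)  # normalize by the maximum response
    return F.interpolate(cam, size=rgb.shape[-2:], mode="bilinear", align_corners=False)
```

The initial saliency map Scdcp of step S13 would come from a separate center-dark-channel-prior algorithm and is not sketched here.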
In this embodiment, the step S2 specifically includes the following steps:
Step S21: first, depth-enhance the class response map Icam with the depth map Idepth to obtain the depth-enhanced class response map Icam_de (a pixel-wise product, ⊙, of Icam and Idepth modulated by a hyperparameter α); then perform depth optimization through a conditional random field to obtain the optimized class response map
Icam_crf = CRF(Icam_de, Irgb)
where ⊙ denotes pixel-wise dot multiplication, CRF(·) denotes conditional random field optimization, and α denotes a hyperparameter greater than 1.
Step S22: depth-enhance the initial saliency map Scdcp with the depth map Idepth to obtain the depth-enhanced saliency map Scdcp_de (a pixel-wise product, ⊙, of Scdcp and Idepth modulated by a hyperparameter β); then perform depth optimization through a conditional random field to obtain the optimized saliency map
Scdcp_crf = CRF(Scdcp_de, Irgb)
where ⊙ denotes pixel-wise dot multiplication, CRF(·) denotes conditional random field optimization, and β denotes a hyperparameter greater than 1.
Step S23: fuse the optimized class response map Icam_crf and the optimized saliency map Scdcp_crf into a pseudo label Ynoisy with lower noise (the noise label in Fig. 2), used for training the network model; the fusion is the weighted combination
Ynoisy = δ × Icam_crf + (1 - δ) × Scdcp_crf
where × denotes multiplication and δ denotes a parameter greater than 0 and less than 1.
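The pseudo-label generation of steps S21-S23 can be sketched as below. The exact placement of the hyperparameters α and β in the depth-enhancement equations is given only as images in the source, so the element-wise form used here is an assumption, and dense_crf(image, prob) is a hypothetical helper standing in for dense CRF refinement (for example, a pydensecrf-based routine).

```python
import numpy as np

def make_pseudo_label(cam, s_cdcp, rgb, depth, alpha=2.0, beta=2.0, delta=0.5):
    """Steps S21-S23 sketch: depth enhancement + CRF refinement + weighted fusion.

    cam, s_cdcp and depth are float arrays in [0, 1] with the same HxW size;
    dense_crf(image, prob) is an assumed external CRF refinement helper;
    the placement of alpha/beta below is an illustrative assumption.
    """
    cam_de = cam * np.power(depth, alpha)     # depth-enhanced class response map (assumed form)
    s_de = s_cdcp * np.power(depth, beta)     # depth-enhanced initial saliency map (assumed form)

    cam_crf = dense_crf(rgb, cam_de)          # CRF-optimized class response map
    s_crf = dense_crf(rgb, s_de)              # CRF-optimized saliency map

    # Weighted fusion into a lower-noise pseudo label, with delta in (0, 1).
    return delta * cam_crf + (1.0 - delta) * s_crf
```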
In this embodiment, the step S3 specifically includes the following steps:
step S31: and constructing a network model (as shown in figure 2) for the RGBD image saliency detection, wherein the network model consists of a feature fusion module (as shown in figure 3) and a full convolution neural network (FCN) module. The step S31 specifically includes the following steps:
step S311: constructing a characteristic fusion module which is formed by two 3 multiplied by 3 convolutions and used for inputting a color image I of the network modelrgbAnd depth map IdepthAnd performing feature fusion. Firstly, channel splicing is carried out on the input color image and the input depth image to generate a network model input with the size of (b, 4, h, w). The input is then convolved by two layers 3 x 3 to get largeA characteristic X' as small as (b, 3, h, w), which is expressed by the formula:
Input=Concat(Irgb,Idepth)
X=Conv3×3(Input)
X′=Conv3×3(X)
where Concat (·) represents the concatenation operator, Input represents the Input to the network model, and X represents the intermediate features of the convolution.
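A minimal PyTorch sketch of the feature fusion module of step S311 follows; the intermediate channel width and the activation between the two 3×3 convolutions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Step S311 sketch: concat RGB (3ch) + depth (1ch) -> two 3x3 convs -> 3-channel feature X'."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(4, 16, kernel_size=3, padding=1)   # intermediate width is an assumption
        self.conv2 = nn.Conv2d(16, 3, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)    # Input: (b, 4, h, w)
        x = self.act(self.conv1(x))           # X
        return self.conv2(x)                  # X': (b, 3, h, w)
```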
Step S312: the FCN module changes the last layer of the classification network into a convolution layer and performs pooling on the 5 th layer of the classification network to obtain the characteristic Feat5Performing upsampling, performing convolution to obtain features with fewer channels, and performing activation function to obtain a final significance prediction graph, wherein the final significance prediction graph is expressed by a formula:
out=FCN(X′)
S=Sigmoid(out)
wherein, FCN (-) represents FCN module, out represents output of network model, Sigmoid (-) represents Sigmoid activating function, and S represents saliency map predicted by network model.
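Step S312 can likewise be sketched as an FCN-style head on top of a classification backbone; the choice of ResNet50 as encoder mirrors step S12, but the channel widths and upsampling details here are assumptions rather than the patent's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class SaliencyFCN(nn.Module):
    """Step S312 sketch: fully convolutional head over a classification backbone."""
    def __init__(self):
        super().__init__()
        self.fusion = FeatureFusion()                                   # from the step S311 sketch above
        backbone = resnet50(weights="IMAGENET1K_V1")                    # torchvision >= 0.13
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])   # keep conv stages, drop avgpool/fc
        self.reduce = nn.Conv2d(2048, 64, kernel_size=3, padding=1)     # fewer channels (assumed width)
        self.out = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, rgb, depth):
        x = self.fusion(rgb, depth)                                     # X': (b, 3, h, w)
        feat5 = self.encoder(x)                                         # deep feature Feat5
        y = F.interpolate(feat5, size=rgb.shape[-2:], mode="bilinear", align_corners=False)
        y = F.relu(self.reduce(y))
        return torch.sigmoid(self.out(y))                               # predicted saliency map S in [0, 1]
```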
Step S32: and constructing a mixed loss function comprising weighted cross entropy loss, conditional random field inference loss and edge loss, and training the network model by using the mixed loss function to obtain the network model with good robustness. The step S32 specifically includes the following steps:
Step S321: reconstruct the original cross-entropy loss function into a weighted cross-entropy loss function to reduce the influence of label noise during network model training, formulated as:
Lwce = Σ(i,j) w · Lce(S[i,j], Ynoisy[i,j])
Lce(S, Y) = -(Y·log(S) + (1 - Y)·log(1 - S))
w = |Y[i,j] - 0.5|
where w denotes the loss weight acting on a pixel, Lwce denotes the weighted cross-entropy loss function, Ynoisy denotes the pseudo label generated in step S23, Lce denotes the original cross-entropy loss function, Y denotes the label, i and j denote the row and column indices of a pixel, log(·) denotes the logarithmic function, and |·| denotes the absolute value operator.
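A sketch of the weighted cross-entropy of step S321 follows, assuming the per-pixel cross-entropy is weighted by w = |Ynoisy - 0.5| (so that uncertain pseudo-label pixels near 0.5 contribute little) and that the weighted terms are averaged over the image.

```python
import torch

def weighted_ce_loss(pred, y_noisy, eps=1e-6):
    """Step S321 sketch: per-pixel BCE weighted by the pseudo label's confidence |Y - 0.5|."""
    pred = pred.clamp(eps, 1.0 - eps)
    ce = -(y_noisy * torch.log(pred) + (1.0 - y_noisy) * torch.log(1.0 - pred))  # original cross-entropy
    w = (y_noisy - 0.5).abs()                                                    # per-pixel loss weight
    return (w * ce).mean()
```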
Step S322: constructing a conditional random field inference loss function, so that the network model can infer uncertain regions in the pseudo labels through the determined labels, and the formula expression is as follows:
Scrf=CRF(S,Irgb)
Figure BDA0002964086550000101
wherein CRF (. cndot.) represents conditional random field optimization, ScrfRepresenting the saliency map after conditional random field optimization, in this step as saliency map S for label supervised prediction,
Figure BDA0002964086550000102
representing a conditional random field inference loss function.
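Step S322 can be sketched as follows, reusing the hypothetical dense_crf helper from the pseudo-label sketch; supervising the prediction with its CRF-refined (detached) version via a cross-entropy is one plausible reading of the loss described above.

```python
import torch

def crf_inference_loss(pred, rgb, eps=1e-6):
    """Step S322 sketch: use the CRF-refined prediction Scrf as a label for S."""
    with torch.no_grad():
        s_crf = dense_crf(rgb, pred.detach())   # assumed helper; returns a tensor shaped like pred
    pred = pred.clamp(eps, 1.0 - eps)
    return -(s_crf * torch.log(pred) + (1.0 - s_crf) * torch.log(1.0 - pred)).mean()
```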
Step S323: constructing an edge loss function optimizes predicting the edges of the saliency map.
Firstly, a color image IrgbConverted into a grey-scale map IgrayAnd obtaining a global edge map I through an edge detection operatoredgeIt is formulated as:
Iedge=ΔIgray
where Δ represents the gradient operation in edge detection.
Secondly, the predicted saliency map S is subjected to expansion and erosion operations to generate a mask map ImaskActing on the edge map to filter out redundant edges to obtainA label for edge loss, formulated as:
Sdil=Dilate(S)
Sero=Erode(S)
Imask=Sdil-Sero
Figure BDA0002964086550000103
wherein, the die (-) represents the dilation operation, the Erode (-) represents the erosion operation,
Figure BDA0002964086550000104
indicating a pixel-by-pixel dot multiplication, YedgeIndicating the label acting on the edge loss.
Defining an edge loss function
Figure BDA0002964086550000105
Comprises the following steps:
Figure BDA0002964086550000106
here, AS represents an edge graph of the predicted saliency map.
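The edge loss of step S323 can be sketched with dilation and erosion implemented via max pooling and a simple finite-difference gradient as the edge detector; the L1 comparison between the prediction's edge map and the masked global edge map is an assumption, since the exact loss formula appears only as an image in the source.

```python
import torch
import torch.nn.functional as F

def image_gradient(x):
    """Simple edge map via finite differences (stand-in for the edge detection operator)."""
    dx = x[..., :, 1:] - x[..., :, :-1]
    dy = x[..., 1:, :] - x[..., :-1, :]
    return F.pad(dx.abs(), (0, 1, 0, 0)) + F.pad(dy.abs(), (0, 0, 0, 1))

def edge_loss(pred, rgb, k=5):
    """Step S323 sketch: mask the global edge map by Dilate(S) - Erode(S), compare with the prediction's edges."""
    gray = rgb.mean(dim=1, keepdim=True)                        # crude grayscale map Igray
    i_edge = image_gradient(gray)                               # global edge map Iedge
    s_dil = F.max_pool2d(pred, k, stride=1, padding=k // 2)     # dilation
    s_ero = -F.max_pool2d(-pred, k, stride=1, padding=k // 2)   # erosion
    mask = s_dil - s_ero                                        # Imask around the predicted boundaries
    y_edge = mask * i_edge                                      # edge label Yedge
    return (image_gradient(pred) - y_edge).abs().mean()         # assumed L1 edge discrepancy
```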
Step S324: the losses in steps S321-S323 are summed to obtain the final mixed loss function:
Figure BDA0002964086550000108
wherein,
Figure BDA0002964086550000107
representing the mixing loss function.
Then, the hybrid loss function is optimized with an Adam optimizer to obtain the optimal parameters of the network model, which are used for testing the network model.
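Putting steps S321-S324 together, the following hedged sketch shows the hybrid loss and the Adam optimization step, reusing the loss sketches above and a hypothetical data loader yielding (rgb, depth, y_noisy) batches; the learning rate is an assumption.

```python
import torch

def hybrid_loss(pred, y_noisy, rgb):
    """Step S324 sketch: sum of the three losses from steps S321-S323."""
    return weighted_ce_loss(pred, y_noisy) + crf_inference_loss(pred, rgb) + edge_loss(pred, rgb)

model = SaliencyFCN()                                         # from the step S311/S312 sketches
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)     # learning rate is an assumption

for rgb, depth, y_noisy in loader:                            # hypothetical loader of training batches
    loss = hybrid_loss(model(rgb, depth), y_noisy, rgb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```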
The invention also provides a weakly supervised RGBD image saliency detection system based on image classification, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein when the computer program is run by the processor, the above-mentioned method steps are implemented.
The depth map expresses the spatial distance between the object and the camera and can provide sufficient position information; a depth map with low noise can also provide complete object structure information, so the depth map is treated as additional auxiliary information for weakly supervised image saliency detection. The invention provides a weakly supervised RGBD image saliency detection framework, designs a depth optimization strategy to optimize the pseudo label, takes into account both the noise in the pseudo label and the incompleteness of the labeled objects, and designs a hybrid loss that enables the model to effectively infer the full extent of the object, thereby markedly improving the accuracy of weakly supervised RGBD salient object detection.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. However, any simple modification, equivalent change and modification of the above embodiments according to the technical essence of the present invention are within the protection scope of the technical solution of the present invention.

Claims (8)

1. A weakly supervised RGBD image saliency detection method based on image classification, characterized by comprising the following steps:
step S1: for the images in the training data set, generate a class response map Icam and an initial saliency map Scdcp using a gradient-based class response mechanism and an RGBD salient object detection algorithm, respectively;
step S2: perform depth optimization on the class response map and the initial saliency map, and fuse them to generate an initial saliency pseudo label Ynoisy;
step S3: construct a network model and a hybrid loss function for RGBD image saliency detection; train the network model, learning its optimal parameters by minimizing the hybrid loss, to obtain the trained network model;
step S4: predict the saliency map of an RGBD image with the trained network model.
2. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 1, wherein the step S1 specifically includes the following steps:
step S11: scaling each color image and the corresponding depth image in the training data set together to ensure that the sizes of all RGBD images in the training data set are the same;
step S12: color map I after zoomingrgbInputting a pre-trained classification network model ResNet50 for image classification to obtain a final layer generation characteristic diagram set of ResNet50 convolutional layer, and defining the final layer generation characteristic diagram set as a matrix A belonging to RH×W×NWherein H, W represents the height and width of the feature map and N represents the number of channels; in the gradient-based class response mechanism, a feature map set A is linearly combined into a class response map, and the weight of the linear combination is determined by the partial derivative of the classification probability on the feature map; the method specifically comprises the following steps: first, the classification result y of the last layer is divided intocAnd the kth feature map A in the feature map setkPartial derivatives are calculated and linear combination weights acting on the feature map are obtained through global average pooling
Figure FDA0002964086540000015
It is formulated as:
Figure FDA0002964086540000011
wherein GAP (-) represents a global average pooling operator,
Figure FDA0002964086540000012
represents a partial derivative operation;
secondly, the feature maps are linearly groupedCombined and filtered by Relu function to generate preliminary class response graph
Figure FDA0002964086540000013
It is formulated as:
Figure FDA0002964086540000014
wherein Relu (-) denotes a Relu activation function, and Σ denotes a summing operation;
finally, normalizing the preliminary class response graph to obtain a final class response graph IcamIt is formulated as:
Figure FDA0002964086540000021
wherein MaxPool represents maximum pooling;
step S13: color drawing IrgbAnd depth map IdepthMeanwhile, an initial saliency map S is generated through an RGBD image saliency detection algorithm based on central dark channel priorcdcpIt is formulated as:
Scdcp=functioncdcp(Irgb,Idepth)
wherein the functioncdcp(. to) shows an RGBD image saliency detection algorithm based on a central dark channel prior.
3. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 2, wherein the step S2 specifically includes the following steps:
step S21: first, depth-enhance the class response map Icam with the depth map Idepth to obtain the depth-enhanced class response map Icam_de (a pixel-wise product, ⊙, of Icam and Idepth modulated by a hyperparameter α); then perform depth optimization through a conditional random field to obtain the optimized class response map
Icam_crf = CRF(Icam_de, Irgb)
where ⊙ denotes pixel-wise dot multiplication, CRF(·) denotes conditional random field optimization, and α denotes a hyperparameter greater than 1;
step S22: depth-enhance the initial saliency map Scdcp with the depth map Idepth to obtain the depth-enhanced saliency map Scdcp_de (a pixel-wise product, ⊙, of Scdcp and Idepth modulated by a hyperparameter β); then perform depth optimization through a conditional random field to obtain the optimized saliency map
Scdcp_crf = CRF(Scdcp_de, Irgb)
where ⊙ denotes pixel-wise dot multiplication, CRF(·) denotes conditional random field optimization, and β denotes a hyperparameter greater than 1;
step S23: fuse the optimized class response map Icam_crf and the optimized saliency map Scdcp_crf into a pseudo label Ynoisy with lower noise, used for training the network model; the fusion is the weighted combination
Ynoisy = δ × Icam_crf + (1 - δ) × Scdcp_crf
where × denotes multiplication and δ denotes a parameter greater than 0 and less than 1.
4. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 3, wherein the step S3 specifically includes the following steps:
step S31: constructing a network model for RGBD image significance detection, wherein the network model consists of a feature fusion module and a full convolution neural network (FCN) module;
step S32: and constructing a mixed loss function comprising weighted cross entropy loss, conditional random field inference loss and edge loss, and training the network model by using the mixed loss function to obtain the network model with good robustness.
5. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 4, wherein the step S31 specifically includes the following steps:
step S311: constructing a characteristic fusion module which is formed by two 3 multiplied by 3 convolutions and used for inputting a color image I of the network modelrgbAnd depth map IdepthCarrying out feature fusion; firstly, carrying out channel splicing on an input color image and a depth image to generate a network model input with the size of (b, 4, h, w); this input is then convolved by two layers 3X 3 to obtain a feature X' of size (b, 3, h, w), which is expressed by the formula:
Input=Concat(Irgb,Idepth)
X=Conv3×3(Input)
X′=Conv3×3(X)
wherein, Concat () represents a splicing operator, Input represents the Input of the network model, and X represents the intermediate feature of convolution;
step S312: the FCN module changes the last layer of the classification network into a convolution layer and performs pooling on the 5 th layer of the classification network to obtain the characteristic Feat5Performing upsampling, performing convolution to obtain features with fewer channels, and performing activation function to obtain a final significance prediction graph, wherein the final significance prediction graph is expressed by a formula:
out=FCN(X′)
S=Sigmoid(out)
wherein, FCN (-) represents FCN module, out represents output of network model, Sigmoid (-) represents Sigmoid activating function, and S represents saliency map predicted by network model.
6. The weakly supervised RGBD image saliency detection method based on image classification as claimed in claim 5, wherein the step S32 specifically includes the following steps:
step S321: reconstruct the original cross-entropy loss function into a weighted cross-entropy loss function to reduce the influence of label noise during network model training, formulated as:
Lwce = Σ(i,j) w · Lce(S[i,j], Ynoisy[i,j])
Lce(S, Y) = -(Y·log(S) + (1 - Y)·log(1 - S))
w = |Y[i,j] - 0.5|
where w denotes the loss weight acting on a pixel, Lwce denotes the weighted cross-entropy loss function, Ynoisy denotes the pseudo label generated in step S23, Lce denotes the original cross-entropy loss function, Y denotes the label, i and j denote the row and column indices of a pixel, log(·) denotes the logarithmic function, and |·| denotes the absolute value operator;
step S322: construct a conditional random field inference loss function so that the network model can infer the uncertain regions of the pseudo label from the determined labels, formulated as:
Scrf = CRF(S, Irgb)
Lcrf = Lce(S, Scrf)
where CRF(·) denotes conditional random field optimization, Scrf denotes the saliency map after conditional random field optimization, used in this step as the label supervising the predicted saliency map S, and Lcrf denotes the conditional random field inference loss function;
step S323: construct an edge loss function to optimize the edges of the predicted saliency map;
first, convert the color map Irgb into a grayscale map Igray and obtain a global edge map Iedge through an edge detection operator, formulated as:
Iedge = ΔIgray
where Δ denotes the gradient operation in edge detection;
secondly, apply dilation and erosion operations to the predicted saliency map S to generate a mask map Imask, which acts on the edge map to filter out redundant edges and yields the label for the edge loss, formulated as:
Sdil = Dilate(S)
Sero = Erode(S)
Imask = Sdil - Sero
Yedge = Imask ⊙ Iedge
where Dilate(·) denotes the dilation operation, Erode(·) denotes the erosion operation, ⊙ denotes pixel-wise dot multiplication, and Yedge denotes the label acting on the edge loss;
the edge loss function Ledge is then defined between ΔS and Yedge, where ΔS denotes the edge map of the predicted saliency map;
step S324: sum the losses of steps S321-S323 to obtain the final hybrid loss function:
Lhybrid = Lwce + Lcrf + Ledge
where Lhybrid denotes the hybrid loss function.
7. The weakly supervised RGBD image saliency detection method based on image classification according to claim 6, characterized in that the hybrid loss function is optimized by an Adam optimizer to obtain the optimal parameters of the network model, which are used for testing the network model.
8. A weakly supervised RGBD image saliency detection system based on image classification, comprising a memory, a processor and a computer program stored on the memory and being executable on the processor, the computer program, when executed by the processor, implementing the method steps of any of claims 1-7.
CN202110245920.XA 2021-03-05 2021-03-05 Weak supervision RGBD image saliency detection method and system based on image classification Active CN112861880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245920.XA CN112861880B (en) 2021-03-05 2021-03-05 Weak supervision RGBD image saliency detection method and system based on image classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110245920.XA CN112861880B (en) 2021-03-05 2021-03-05 Weak supervision RGBD image saliency detection method and system based on image classification

Publications (2)

Publication Number Publication Date
CN112861880A true CN112861880A (en) 2021-05-28
CN112861880B CN112861880B (en) 2021-12-07

Family

ID=75994082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245920.XA Active CN112861880B (en) 2021-03-05 2021-03-05 Weak supervision RGBD image saliency detection method and system based on image classification

Country Status (1)

Country Link
CN (1) CN112861880B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436115A (en) * 2021-07-30 2021-09-24 西安热工研究院有限公司 Image shadow detection method based on depth unsupervised learning
CN115080748A (en) * 2022-08-16 2022-09-20 之江实验室 Weak supervision text classification method and device based on noisy label learning
CN116978008A (en) * 2023-07-12 2023-10-31 睿尔曼智能科技(北京)有限公司 RGBD-fused semi-supervised target detection method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102364560A (en) * 2011-10-19 2012-02-29 华南理工大学 Traffic sign convenient for electronic identification and method for identifying traffic sign
CN105791660A (en) * 2014-12-22 2016-07-20 中兴通讯股份有限公司 Method and device for correcting photographing inclination of photographed object and mobile terminal
CN107292318A (en) * 2017-07-21 2017-10-24 北京大学深圳研究生院 Image significance object detection method based on center dark channel prior information
CN107452030A (en) * 2017-08-04 2017-12-08 南京理工大学 Method for registering images based on contour detecting and characteristic matching
CN108399406A (en) * 2018-01-15 2018-08-14 中山大学 The method and system of Weakly supervised conspicuousness object detection based on deep learning
CN109410171A (en) * 2018-09-14 2019-03-01 安徽三联学院 A kind of target conspicuousness detection method for rainy day image
CN110598609A (en) * 2019-09-02 2019-12-20 北京航空航天大学 Weak supervision target detection method based on significance guidance

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102364560A (en) * 2011-10-19 2012-02-29 华南理工大学 Traffic sign convenient for electronic identification and method for identifying traffic sign
CN105791660A (en) * 2014-12-22 2016-07-20 中兴通讯股份有限公司 Method and device for correcting photographing inclination of photographed object and mobile terminal
CN107292318A (en) * 2017-07-21 2017-10-24 北京大学深圳研究生院 Image significance object detection method based on center dark channel prior information
CN107452030A (en) * 2017-08-04 2017-12-08 南京理工大学 Method for registering images based on contour detecting and characteristic matching
CN108399406A (en) * 2018-01-15 2018-08-14 中山大学 The method and system of Weakly supervised conspicuousness object detection based on deep learning
CN109410171A (en) * 2018-09-14 2019-03-01 安徽三联学院 A kind of target conspicuousness detection method for rainy day image
CN110598609A (en) * 2019-09-02 2019-12-20 北京航空航天大学 Weak supervision target detection method based on significance guidance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUNBIAO ZHU ET AL.: "Exploiting the Value of the Center-dark Channel Prior for Salient Object Detection", 《ARXIV:1805.05132V1》 *
RAMPRASAATH R. SELVARAJU ET AL.: "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization", 《2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436115A (en) * 2021-07-30 2021-09-24 西安热工研究院有限公司 Image shadow detection method based on depth unsupervised learning
CN113436115B (en) * 2021-07-30 2023-09-19 西安热工研究院有限公司 Image shadow detection method based on depth unsupervised learning
CN115080748A (en) * 2022-08-16 2022-09-20 之江实验室 Weak supervision text classification method and device based on noisy label learning
CN115080748B (en) * 2022-08-16 2022-11-11 之江实验室 Weak supervision text classification method and device based on learning with noise label
CN116978008A (en) * 2023-07-12 2023-10-31 睿尔曼智能科技(北京)有限公司 RGBD-fused semi-supervised target detection method and system
CN116978008B (en) * 2023-07-12 2024-04-26 睿尔曼智能科技(北京)有限公司 RGBD-fused semi-supervised target detection method and system

Also Published As

Publication number Publication date
CN112861880B (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN112861880B (en) Weak supervision RGBD image saliency detection method and system based on image classification
CN104268594B (en) A kind of video accident detection method and device
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN113657560B (en) Weak supervision image semantic segmentation method and system based on node classification
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN112906485B (en) Visual impairment person auxiliary obstacle perception method based on improved YOLO model
CN111861925B (en) Image rain removing method based on attention mechanism and door control circulation unit
CN113807355A (en) Image semantic segmentation method based on coding and decoding structure
CN111696110B (en) Scene segmentation method and system
CN110298387A (en) Incorporate the deep neural network object detection method of Pixel-level attention mechanism
CN113065546B (en) Target pose estimation method and system based on attention mechanism and Hough voting
CN111882002A (en) MSF-AM-based low-illumination target detection method
CN108062756A (en) Image, semantic dividing method based on the full convolutional network of depth and condition random field
CN108256562A (en) Well-marked target detection method and system based on Weakly supervised space-time cascade neural network
CN111104903A (en) Depth perception traffic scene multi-target detection method and system
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
CN111368637B (en) Transfer robot target identification method based on multi-mask convolutional neural network
CN112801104B (en) Image pixel level pseudo label determination method and system based on semantic segmentation
CN114693930B (en) Instance segmentation method and system based on multi-scale features and contextual attention
CN114821050B (en) Method for dividing reference image based on transformer
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN116740362A (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN114581789A (en) Hyperspectral image classification method and system
CN117253184B (en) Foggy day image crowd counting method guided by foggy priori frequency domain attention characterization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant