Semi- and weakly-supervised semantic segmentation with deep convolutional neural networks
Proceedings of the 23rd ACM international conference on Multimedia, 2015•dl.acm.org
Successful semantic segmentation methods typically rely on training datasets containing a large number of pixel-wise labeled images. To alleviate the dependence on such fully annotated training data, in this paper we propose a semi- and weakly-supervised learning framework that exploits images that mostly carry only image-level labels, together with very few images carrying pixel-level labels, and that comprises two stages of Convolutional Neural Network (CNN) training. First, a pixel-level supervised CNN is trained on the very few fully annotated images. Second, given a large number of images with only image-level labels available, a collaborative-supervised CNN is designed to jointly perform the pixel-level and image-level classification tasks, where the pixel-level labels are predicted by the fully-supervised network from the first stage. The collaborative-supervised network retains the discriminative ability of the fully-supervised model learned from the fully labeled images, and further improves performance by incorporating more weakly labeled data. Our experiments on two challenging datasets, i.e., PASCAL VOC 2007 and LabelMe LMO, demonstrate the satisfactory performance of our approach, which nearly matches the results achieved when all training images have pixel-level labels.
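The two-stage pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture or loss: the "model" is a trivial placeholder, and the joint objective (pixel-level agreement with the stage-1 pseudo labels plus an image-level penalty for predicting absent classes) is an assumed, simplified form of the collaborative supervision.

```python
# Hedged sketch of the two-stage semi-/weakly-supervised pipeline.
# Images are lists of pixel values; masks are lists of class labels.
# All model/loss details here are placeholder assumptions.

def train_fully_supervised(pixel_labeled):
    """Stage 1: fit a segmenter on the few pixel-wise labeled images.
    Placeholder 'model': predict the label most frequent in training masks."""
    counts = {}
    for _, mask in pixel_labeled:
        for lab in mask:
            counts[lab] = counts.get(lab, 0) + 1
    majority = max(counts, key=counts.get)

    def model(image):
        return [majority] * len(image)
    return model

def pseudo_label(model, weak_images):
    """Use the stage-1 model to predict pixel-level (pseudo) labels for
    images that only carry image-level tags."""
    return [(img, model(img), tags) for img, tags in weak_images]

def collaborative_loss(pred_mask, pseudo_mask, tags):
    """Stage 2 joint objective (assumed form): pixel-level disagreement
    with the pseudo mask plus an image-level penalty for each predicted
    class absent from the image-level tags."""
    pixel_term = sum(p != q for p, q in zip(pred_mask, pseudo_mask)) / len(pred_mask)
    image_term = sum(1 for p in set(pred_mask) if p not in tags)
    return pixel_term + image_term

# Toy usage: one fully labeled image, one weakly labeled image.
pixel_labeled = [([0, 1, 2], ["cat", "cat", "bg"])]
weak_images = [([3, 4], ["cat"])]

stage1 = train_fully_supervised(pixel_labeled)
pseudo = pseudo_label(stage1, weak_images)
img, pseudo_mask, tags = pseudo[0]
print(collaborative_loss(["cat", "cat"], pseudo_mask, tags))  # perfect agreement -> 0.0
```

In the actual framework both stages train CNNs end-to-end; the point of the sketch is only the data flow: few pixel-labeled images train the first network, its predictions become pseudo pixel labels for the weakly labeled pool, and the second network is supervised by both those pseudo masks and the image-level tags.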