Abstract
Edge detection is among the most fundamental vision problems for its role in perceptual grouping and its wide applications. Recent advances in representation learning have led to considerable improvements in this area. Many state-of-the-art edge detection models are learned with fully convolutional networks (FCNs). However, FCN-based edge learning tends to be vulnerable to misaligned labels due to the delicate structure of edges. While this problem has been considered in evaluation benchmarks, a similar issue has not been explicitly addressed in general edge learning. In this paper, we show that label misalignment can considerably degrade edge learning quality, and address this issue by proposing a simultaneous edge alignment and learning framework. To this end, we formulate a probabilistic model where edge alignment is treated as latent variable optimization, and is learned end-to-end during network training. Experiments show several applications of this work, including improved edge detection with state-of-the-art performance, and automatic refinement of noisy annotations.
Keywords
- Edge Alignment
- Learning Edge
- Fully Convolutional Network (FCNs)
- Label Alignment
- Sigmoid Cross Entropy Loss
1 Introduction
Over the past decades, edge detection has played a significant role in computer vision. Early edge detection methods often formulated the task as a low-level or mid-level grouping problem where Gestalt laws and perceptual grouping play considerable roles in algorithm design [7, 16, 23, 44]. Later works started to consider learning edges in a data-driven way, by looking into the statistics of features near boundaries [1, 2, 12, 13, 25, 31, 34, 39]. More recently, advances in deep representation learning [18, 26, 43] have further led to significant improvements in edge detection, pushing the boundaries of state-of-the-art performance [3, 20, 24, 49, 50] to new levels. The associated tasks have also expanded from conventional binary edge detection to the more challenging category-aware edge detection problems [4, 17, 22, 38, 52]. As a result of such advancement, a wide variety of other vision problems have enjoyed the benefits of reliable edge detectors. Examples of these applications include, but are not limited to, (semantic) segmentation [1, 4, 5, 9, 51], object proposal generation [4, 50, 53], object detection [29], depth estimation [19, 32], and 3D vision [21, 33, 42].
With the strong representation abilities of deep networks and the dense labeling nature of edge detection, many state-of-the-art edge detectors are based on FCNs. Despite the underlying resemblance to other dense labeling tasks, edge learning problems face some typical challenges and issues. First, in light of the highly imbalanced amounts of positive samples (edge pixels) and negative samples (non-edge pixels), using reweighted losses where positive samples are weighted higher has become a predominant choice in recent deep edge learning frameworks [22, 24, 30, 49, 52]. While such a strategy to some extent renders better learning behaviors (see Note 1), it also induces thicker detected edges as well as more false positives. An example of this issue is illustrated in Figs. 1(c) and (g), where the edge maps predicted by CASENet [52] contain thick object boundaries. A direct consequence is that many local details are missing, which is undesirable for other potential applications using edge detectors.
Another challenging issue for edge learning is the training label noise caused by inevitable misalignment during annotation. Unlike segmentation, edge learning is generally more vulnerable to such noise because edge structures are by nature much more delicate than regions. Even slight misalignment can lead to a significant proportion of mismatches between ground truth and prediction. In order to predict sharp edges, a model should learn to distinguish the few true edge pixels while suppressing edge responses near them. This already presents a considerable challenge to the model as non-edge pixels near edges are likely to be hard negatives with similar features, while the presence of misalignment further causes significant confusion by continuously sending false positives during training. The problem is further aggravated under reweighted losses, where predicting more false positives near the edge is an effective way to decrease the loss due to the significantly higher weights of positive samples.
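The interaction between reweighting and thick predictions can be illustrated with a small numerical sketch. The weighting used below (positives weighted by the fraction of non-edge pixels, negatives by the fraction of edge pixels) is an assumed class-balancing scheme chosen for illustration and is not taken from this paper; the 20-pixel row, the probabilities 0.9 and 0.1, and all function names are likewise hypothetical.

```python
import math

def bce(pred, label, w_pos=1.0, w_neg=1.0):
    """Sigmoid cross-entropy summed over pixels, with optional class weights."""
    total = 0.0
    for s, y in zip(pred, label):
        total -= w_pos * math.log(s) if y else w_neg * math.log(1.0 - s)
    return total

def row(n, high_at, high=0.9, low=0.1):
    """Toy 1D prediction: probability `high` at the given pixels, `low` elsewhere."""
    return [high if i in high_at else low for i in range(n)]

n = 20
label = [1 if i == 10 else 0 for i in range(n)]  # one labeled edge pixel
beta = sum(1 - y for y in label) / n             # fraction of non-edge pixels (assumed weighting)

sharp = row(n, {10})         # one-pixel-wide response on the label
thick = row(n, {9, 10, 11})  # same response plus two false positives
miss = row(n, set())         # no response at all

def penalty_ratio(w_pos, w_neg):
    """Cost of two extra false positives relative to the cost of one missed positive."""
    base = bce(sharp, label, w_pos, w_neg)
    thick_penalty = bce(thick, label, w_pos, w_neg) - base
    miss_penalty = bce(miss, label, w_pos, w_neg) - base
    return thick_penalty / miss_penalty

ratio_unweighted = penalty_ratio(1.0, 1.0)
ratio_reweighted = penalty_ratio(beta, 1.0 - beta)
print(ratio_unweighted, ratio_reweighted)
```

In this toy setting, two false positives cost twice as much as one missed positive under the unweighted loss, but only about a tenth as much under the reweighted one, so a model that is unsure where a possibly misaligned label lies is rewarded for predicting a thick band.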
Unfortunately, completely eliminating misalignment during annotation is almost impossible given the limit of human precision and the diminishing gain of annotation quality from additional efforts as a result. For datasets such as Cityscapes [11] where high quality labels are generated by professional annotators, misalignment can still be frequently observed. For datasets with crowdsourcing annotations where quality control presents another challenge, the issue can become even more severe. Our proposed solution is an end-to-end framework towards Simultaneous Edge Alignment and Learning (SEAL). In particular, we formulate the problem with a probabilistic model, treating edge labels as latent variables to be jointly learned during training. We show that the optimization of latent edge labels can be transformed into a bipartite graph min-cost assignment problem, and present an end-to-end learning framework towards model training. Figure 2 shows some examples where the model gradually learns how to align noisy edge labels to more accurate positions along with edge learning.
Contrary to the widely believed intuition that reweighted loss benefits edge learning problems, an interesting and counter-intuitive observation made in this paper is that (regular) sigmoid cross-entropy loss works surprisingly well under the proposed framework despite the extremely imbalanced distribution. The underlying reason is that edge alignment significantly reduces the training confusion by increasing the purity of positive edge samples. Without edge alignment, on the other hand, the presence of label noise together with the imbalanced distribution makes it more difficult for the model to correctly learn the positive classes. As a result of the increased label quality and the benefit of better negative suppression using unweighted loss, our proposed framework produces state-of-the-art detection performance with high-quality sharp edges (see Figs. 1(d) and (h)).
2 Related Work
2.1 Boundary Map Correspondence
Our work is partly motivated by the early work of boundary evaluation using precision-recall and F-measure [34]. To address misalignment between prediction and human ground truth, [34] proposed to compute a one-to-one correspondence for the subset of matchable edge pixels from both domains by solving a min-cost assignment problem. However, [34] only considers the alignment between fixed boundary maps, while our work addresses a more complicated learning problem where edge alignment becomes part of the optimization with learnable inputs.
2.2 Mask Refinement via Energy Minimization
Yang et al. [50] proposed to use dense-CRF to refine object mask and contour. Despite the similar goal, our method differs from [50] in that: 1. The refinement framework in [50] is a separate preprocessing step, while our work jointly learns refinement with the model in an end-to-end fashion. 2. The CRF model in [50] only utilizes low-level features, while our model considers both low-level and high-level information via a deep network. 3. The refinement framework in [50] is segmentation-based, while our framework directly targets edge refinement.
2.3 Object Contour and Mask Learning
A series of works [8, 37, 40] seek to learn object contours/masks in a supervised fashion. Deep active contour [40] uses learned CNN features to steer contour evolution given the input of an initialized contour. Polygon-RNN [8] introduced a semi-automatic approach for object mask annotation, by learning to extract polygons given input bounding boxes. DeepMask [37] proposed an object proposal generation method to output class-agnostic segmentation masks. These methods require accurate ground truth for contour/mask learning, while this work only assumes noisy ground truths and seeks to refine them automatically.
2.4 Noisy Label Learning
Our work can be broadly viewed as a structured noisy label learning framework where we leverage abundant structural priors to correct label noise. The existing noisy label learning literature has proposed directed graphical models [48], conditional random fields (CRF) [45], neural networks [46, 47], robust losses [35] and knowledge graphs [27] to model and correct image-level noisy labels. In contrast, our work considers pixel-level labels instead of image-level ones.
2.5 Virtual Evidence in Bayesian Networks
Our work also shares similarity with virtual evidence [6, 28, 36], where the uncertainty of an observation is modeled by a distribution rather than a single value. In our problem, noisy labels can be regarded as uncertain observations which give conditional prior distributions over different configurations of aligned labels.
3 A Probabilistic View Towards Edge Learning
In many classification problems, training of the models can be formulated as maximizing the following likelihood function with respect to the parameters:
\[\max _{\mathbf{W}}\ \mathcal{L}(\mathbf{W}) = P(\mathbf{y}|\mathbf{x};\mathbf{W}), \tag{1}\]
where \(\mathbf {y}\), \(\mathbf {x}\) and \(\mathbf {W}\) indicate respectively training labels, observed inputs and model parameters. Depending on how the conditional probability is parameterized, the above likelihood function may correspond to different types of models. For example, a generalized linear model function leads to the well known logistic regression. If the parameterization is formed as a layered representation, the model may turn into CNNs or multilayer perceptrons. One may observe that many traditional supervised edge learning models can also be regarded as special cases under the above probabilistic framework. Here, we are mostly concerned with edge detection using fully convolutional neural networks. In this case, the variable \(\mathbf {y}\) indicates the set of edge prediction configurations at every pixel, while \(\mathbf {x}\) and \(\mathbf {W}\) denote the input image and the network parameters, respectively.
4 Simultaneous Edge Alignment and Learning
To introduce the ability of correcting edge labels during training, we consider the following model. Instead of treating the observed annotation \(\mathbf {y}\) as the fitting target, we assume there is an underlying ground truth \(\hat{\mathbf {y}}\) that is more accurate than \(\mathbf {y}\). Our goal is to treat \(\hat{\mathbf {y}}\) as a latent variable to be jointly estimated during learning, which leads to the following likelihood maximization problem:
\[\max _{\hat{\mathbf{y}},\mathbf{W}}\ \mathcal{L}(\hat{\mathbf{y}},\mathbf{W}) = P(\mathbf{y},\hat{\mathbf{y}}|\mathbf{x};\mathbf{W}) = P(\mathbf{y}|\hat{\mathbf{y}})\,P(\hat{\mathbf{y}}|\mathbf{x};\mathbf{W}), \tag{2}\]
where \(\hat{\mathbf {y}}\) indicates the underlying true ground truth. The former part \(P(\mathbf {y}|\hat{\mathbf {y}})\) can be regarded as an edge prior probabilistic model of an annotator generating labels given the observed ground truths, while the latter part \(P(\hat{\mathbf {y}}|\mathbf {x}; \mathbf {W})\) is the standard likelihood of the prediction model.
4.1 Multilabel Edge Learning
Consider the multilabel edge learning setting where one assumes that \(\mathbf {y}\) does not need to be mutually exclusive at each pixel. In other words, any pixel may correspond to the edges of multiple classes. The likelihood can be decomposed into a set of class-wise joint probabilities assuming inter-class independence:
\[\mathcal{L}(\hat{\mathbf{y}},\mathbf{W}) = \prod _{k} P(\mathbf{y}^{k}|\hat{\mathbf{y}}^{k})\,P(\hat{\mathbf{y}}^{k}|\mathbf{x};\mathbf{W}), \tag{3}\]
where \(\mathbf {y}^{k}\in \{0,1\}^N\) indicates the set of binary labels corresponding to the k-th class. A typical multilabel edge learning example which also assumes inter-class independence is CASENet [52]. In addition, binary edge detection methods such as HED [49] can be viewed as special cases of multilabel edge learning.
4.2 Edge Prior Model
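Solving Eq. (2) is not easy given the additional huge search space of \(\hat{\mathbf {y}}\). Fortunately, there is some prior knowledge one could leverage to effectively regularize \(\hat{\mathbf {y}}\). One of the most important priors is that \(\hat{\mathbf {y}}^{k}\) should not be too different from \(\mathbf {y}^{k}\). In addition, we assume that edge pixels in \(\mathbf {y}^{k}\) are generated from those in \(\hat{\mathbf {y}}^{k}\) through a one-to-one assignment process, which indicates \(|\mathbf {y}^{k}|=|\hat{\mathbf {y}}^{k}|\). In other words, let \(y_{\mathbf {q}}^{k}\) denote the label of class k at pixel \(\mathbf {q}\), and similarly for \(\hat{y}_{\mathbf {p}}^{k}\), there exists a set of one-to-one correspondences between edge pixels in \(\hat{\mathbf {y}}^{k}\) and \(\mathbf {y}^{k}\):
\[\mathcal{M}(\mathbf{y}^{k},\hat{\mathbf{y}}^{k}) = \big\{ m(\cdot ) \mid \forall \mathbf{u},\mathbf{v}\in \{\mathbf{q}\mid y_{\mathbf{q}}^{k}=1\}:\ \hat{y}_{m(\mathbf{u})}^{k}=1,\ \hat{y}_{m(\mathbf{v})}^{k}=1,\ \mathbf{u}\ne \mathbf{v}\Rightarrow m(\mathbf{u})\ne m(\mathbf{v}) \big\}, \tag{4}\]
where each \(m(\cdot )\) is associated with a finite set of pairs:
\[E(m) = \big\{ (\mathbf{p},\mathbf{q}) \mid y_{\mathbf{q}}^{k}=1,\ \hat{y}_{\mathbf{p}}^{k}=1,\ \mathbf{p}=m(\mathbf{q}) \big\}. \tag{5}\]
The edge prior therefore can be modeled as a product of Gaussian similarities maximized over all possible correspondences:
\[P(\mathbf{y}^{k}|\hat{\mathbf{y}}^{k}) \propto \sup _{m\in \mathcal{M}(\mathbf{y}^{k},\hat{\mathbf{y}}^{k})}\ \prod _{(\mathbf{p},\mathbf{q})\in E(m)} \exp \Big(-\frac{\Vert \mathbf{p}-\mathbf{q}\Vert ^{2}}{2\sigma ^{2}}\Big) = \exp \Big(-\inf _{m\in \mathcal{M}(\mathbf{y}^{k},\hat{\mathbf{y}}^{k})}\ \sum _{(\mathbf{p},\mathbf{q})\in E(m)} \frac{\Vert \mathbf{p}-\mathbf{q}\Vert ^{2}}{2\sigma ^{2}}\Big), \tag{6}\]
where \(\sigma \) is the bandwidth that controls the sensitivity to misalignment. The misalignment is quantified by measuring the lowest possible sum of squared distances between pairwise pixels, which is determined by the tightest correspondence.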
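As a small worked example of this prior (the pixel coordinates are invented for illustration, and squared Euclidean distance with \(\sigma = 1\) is assumed), suppose \(\mathbf{y}^{k}\) has edge pixels \(\mathbf{q}_1=(0,2)\), \(\mathbf{q}_2=(1,2)\) and a candidate \(\hat{\mathbf{y}}^{k}\) has edge pixels \(\mathbf{p}_1=(0,1)\), \(\mathbf{p}_2=(1,1)\). There are exactly two one-to-one correspondences:

```latex
\begin{aligned}
m_1 &:\ \mathbf{q}_1 \mapsto \mathbf{p}_1,\ \mathbf{q}_2 \mapsto \mathbf{p}_2,
  \qquad \textstyle\sum \|\mathbf{p}-\mathbf{q}\|^2 = 1 + 1 = 2, \\
m_2 &:\ \mathbf{q}_1 \mapsto \mathbf{p}_2,\ \mathbf{q}_2 \mapsto \mathbf{p}_1,
  \qquad \textstyle\sum \|\mathbf{p}-\mathbf{q}\|^2 = 2 + 2 = 4, \\
P(\mathbf{y}^{k}|\hat{\mathbf{y}}^{k}) &\propto \exp\!\Big(-\frac{\min(2,4)}{2\sigma^2}\Big) = e^{-1}.
\end{aligned}
```

The tightest correspondence \(m_1\) determines the prior, so shifting both labels by one pixel is penalized by a factor of \(e^{-1}\) relative to leaving them in place.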
4.3 Network Likelihood Model
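We now consider the likelihood of the prediction model, where we assume that the class-wise joint probability can be decomposed into a set of pixel-wise probabilities modeled by Bernoulli distributions with binary configurations:
\[P(\hat{\mathbf{y}}^{k}|\mathbf{x};\mathbf{W}) = \prod _{\mathbf{p}} P(\hat{y}_{\mathbf{p}}^{k}|\mathbf{x};\mathbf{W}) = \prod _{\mathbf{p}} h_{k}(\mathbf{p}|\mathbf{x};\mathbf{W})^{\hat{y}_{\mathbf{p}}^{k}}\,\big(1-h_{k}(\mathbf{p}|\mathbf{x};\mathbf{W})\big)^{(1-\hat{y}_{\mathbf{p}}^{k})}, \tag{7}\]
where \(\mathbf {p}\) is the pixel location index, and \(h_k\) is the hypothesis function indicating the probability of the k-th class. We consider the prediction model as FCNs with k sigmoid outputs. As a result, the hypothesis function in Eq. (7) becomes the sigmoid function, which will be denoted as \(\sigma (\cdot )\) in the rest of this section.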
4.4 Learning
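Taking Eqs. (6) and (7) into Eq. (3), and taking log of the likelihood, we have (up to an additive constant):
\[\log \mathcal{L}(\hat{\mathbf{y}},\mathbf{W}) = \sum _{k}\Big\{ -\inf _{m\in \mathcal{M}(\mathbf{y}^{k},\hat{\mathbf{y}}^{k})}\ \sum _{(\mathbf{p},\mathbf{q})\in E(m)} \frac{\Vert \mathbf{p}-\mathbf{q}\Vert ^{2}}{2\sigma ^{2}} + \sum _{\mathbf{p}}\big[\hat{y}_{\mathbf{p}}^{k}\log \sigma _{k}(\mathbf{p}|\mathbf{x};\mathbf{W}) + (1-\hat{y}_{\mathbf{p}}^{k})\log (1-\sigma _{k}(\mathbf{p}|\mathbf{x};\mathbf{W}))\big]\Big\}, \tag{8}\]
where the first part is the misalignment cost from the edge prior and the second part is the widely used sigmoid cross-entropy loss. Accordingly, learning the model requires solving the constrained optimization:
\[\min _{\hat{\mathbf{y}},\mathbf{W}}\ -\log \mathcal{L}(\hat{\mathbf{y}},\mathbf{W}) \quad \text{s.t.}\quad |\hat{\mathbf{y}}^{k}|=|\mathbf{y}^{k}|,\ \forall k. \tag{9}\]
Given a training set, we take an alternating optimization strategy where \(\mathbf {W}\) is updated with \(\hat{\mathbf {y}}\) fixed, and vice versa. When \(\hat{\mathbf {y}}\) is fixed, the optimization becomes:
\[\min _{\mathbf{W}}\ \sum _{k}\sum _{\mathbf{p}} -\big[\hat{y}_{\mathbf{p}}^{k}\log \sigma _{k}(\mathbf{p}|\mathbf{x};\mathbf{W}) + (1-\hat{y}_{\mathbf{p}}^{k})\log (1-\sigma _{k}(\mathbf{p}|\mathbf{x};\mathbf{W}))\big], \tag{10}\]
which is the typical network training with the aligned edge labels and can be solved with standard gradient descent. When \(\mathbf {W}\) is fixed, the optimization can be modeled as a constrained discrete optimization problem for each class:
\[\min _{\hat{\mathbf{y}}^{k}}\ \inf _{m\in \mathcal{M}(\mathbf{y}^{k},\hat{\mathbf{y}}^{k})}\ \sum _{(\mathbf{p},\mathbf{q})\in E(m)} \frac{\Vert \mathbf{p}-\mathbf{q}\Vert ^{2}}{2\sigma ^{2}} - \sum _{\mathbf{p}}\big[\hat{y}_{\mathbf{p}}^{k}\log \sigma (\mathbf{p}) + (1-\hat{y}_{\mathbf{p}}^{k})\log (1-\sigma (\mathbf{p}))\big] \quad \text{s.t.}\quad |\hat{\mathbf{y}}^{k}|=|\mathbf{y}^{k}|, \tag{11}\]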
where \(\sigma (\mathbf {p})\) denotes \(\sigma (\mathbf {p}|\mathbf {x};\mathbf {W})\) for short. Solving the above optimization is seemingly difficult, since one would need to enumerate all possible configurations of \(\hat{\mathbf {y}}^k\) satisfying \(|\hat{\mathbf {y}}^{k}| = |\mathbf {y}^{k}|\) and evaluate the associated cost. It turns out, however, that the above optimization can be elegantly transformed to a bipartite graph assignment problem with available solvers. We first have the following definition:
Definition 1
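Let \(\hat{\mathbf {Y}}=\{\hat{\mathbf {y}} \mid |\hat{\mathbf {y}}|=|\mathbf {y}|\}\). A mapping space \(\mathbf {M}\) is the space consisting of all possible one-to-one mappings:
\[\mathbf{M} = \big\{ m \mid m\in \mathcal{M}(\mathbf{y},\hat{\mathbf{y}}),\ \hat{\mathbf{y}}\in \hat{\mathbf{Y}} \big\}.\]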
Definition 2
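A label realization is a function which maps a correspondence to the corresponding label, given \(\mathbf {y}\):
\[f_{L}:\ (\mathbf{y},m)\mapsto \hat{\mathbf{y}}\in \hat{\mathbf{Y}}, \qquad \hat{y}_{\mathbf{p}}=1 \iff \exists \,\mathbf{q}:\ y_{\mathbf{q}}=1,\ m(\mathbf{q})=\mathbf{p}.\]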
Lemma 1
The mapping \(f_{L}(\cdot )\) is surjective.
Remark
Lemma 1 shows that a certain label configuration \(\hat{\mathbf {y}}\) may correspond to multiple underlying mappings. This is obviously true since there could be multiple ways in which pixels in \(\mathbf {y}\) are assigned to \(\hat{\mathbf {y}}\).
Lemma 2
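Under the constraint \(|\hat{\mathbf {y}}|=|\mathbf {y}|\), if:
\[\hat{\mathbf{y}}^{*} = \mathop{\mathrm{arg\,min}}_{\hat{\mathbf{y}}}\ -\sum _{\mathbf{p}}\big[\hat{y}_{\mathbf{p}}\log \sigma (\mathbf{p}) + (1-\hat{y}_{\mathbf{p}})\log (1-\sigma (\mathbf{p}))\big],\]
\[m^{*} = \mathop{\mathrm{arg\,min}}_{m\in \mathbf{M}}\ \sum _{(\mathbf{p},\mathbf{q})\in E(m)}\big[\log (1-\sigma (\mathbf{p})) - \log \sigma (\mathbf{p})\big],\]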
then \(f_{L}(\mathbf {y},m^{*})=\hat{\mathbf {y}}^{*}\).
Proof
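Suppose in the beginning all pixels in \(\hat{\mathbf {y}}\) are 0. The corresponding loss therefore is:
\[\mathcal{C}_{N}(\mathbf{0}) = -\sum _{\mathbf{p}}\log (1-\sigma (\mathbf{p})).\]
Flipping \(\hat{y}_{\mathbf {p}}\) to 1 will accordingly introduce a cost \(\log (1-\sigma (\mathbf {p}))-\log \sigma (\mathbf {p})\) at pixel \(\mathbf {p}\). As a result, we have:
\[\mathcal{C}_{N}(\hat{\mathbf{y}}) = \mathcal{C}_{N}(\mathbf{0}) + \sum _{\mathbf{p}\in \{\mathbf{p}\mid \hat{y}_{\mathbf{p}}=1\}}\big[\log (1-\sigma (\mathbf{p})) - \log \sigma (\mathbf{p})\big].\]
In addition, Lemma 1 states that the mapping \(f_{L}(\cdot )\) is surjective, which implies that the mapping search space \(\mathbf {M}\) exactly covers \(\hat{\mathbf {Y}}\). Thus the top optimization problem in Lemma 2 can be transformed into the bottom problem.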
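The counting argument in this proof can be checked numerically. The sketch below uses made-up probabilities for four pixels; `cross_entropy`, `flip_cost`, and `best_labels` are illustrative helper names and not part of any released implementation.

```python
import math

def cross_entropy(y_hat, prob):
    """Sigmoid cross-entropy of binary labels y_hat under per-pixel probabilities."""
    return -sum(math.log(s) if y else math.log(1.0 - s) for y, s in zip(y_hat, prob))

def flip_cost(s):
    """Change in loss when a pixel's label is flipped from 0 to 1."""
    return math.log(1.0 - s) - math.log(s)

def best_labels(prob, k):
    """Indices of the k cheapest flips, i.e. the loss-minimizing y_hat with |y_hat| = k."""
    cheapest = sorted(range(len(prob)), key=lambda p: flip_cost(prob[p]))[:k]
    return sorted(cheapest)

prob = [0.1, 0.8, 0.6, 0.3]  # hypothetical sigmoid outputs
y_hat = [0, 1, 1, 0]

all_zero_loss = cross_entropy([0] * len(prob), prob)
decomposed = all_zero_loss + sum(flip_cost(s) for y, s in zip(y_hat, prob) if y)
direct = cross_entropy(y_hat, prob)
print(direct, decomposed, best_labels(prob, 2))
```

The directly evaluated loss and the "all-zero loss plus flip costs" form agree, which is why minimizing over labels with a fixed number of positives amounts to choosing which pixels to flip.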
Lemma 2 motivates us to reformulate the optimization in Eq. (11) by alternatively looking to the following problem:
\[\min _{m\in \mathbf{M}}\ \sum _{(\mathbf{p},\mathbf{q})\in E(m)}\Big[\frac{\Vert \mathbf{p}-\mathbf{q}\Vert ^{2}}{2\sigma ^{2}} + \log (1-\sigma (\mathbf{p})) - \log \sigma (\mathbf{p})\Big]. \tag{12}\]
Equation (12) is a typical minimum cost bipartite assignment problem which can be solved by standard solvers, where the cost of each assignment pair \((\mathbf {p},\mathbf {q})\) is associated with the weight of a bipartite graph's edge. Following [34], we formulate a sparse assignment problem and use Goldberg's CSA package, which is among the best known algorithms for min-cost sparse assignment [10, 15]. Upon obtaining the mapping, one can recover \(\hat{\mathbf {y}}\) through label realization.
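To make the assignment view concrete, the following minimal sketch aligns three noisy label pixels on a 3x4 toy image. It replaces the CSA solver with an exhaustive search over one-to-one mappings, which is only feasible for tiny examples, and it searches all pixels rather than a sparse neighborhood. The probability map, the choice \(\sigma = 1\), and the function names are assumptions made for illustration.

```python
import itertools
import math

def pair_cost(p, q, prob, sigma=1.0):
    """Cost of assigning label pixel q to pixel p: distance term plus flip cost at p."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return d2 / (2.0 * sigma ** 2) + math.log(1.0 - prob[p]) - math.log(prob[p])

def align_labels(label_pixels, prob, sigma=1.0):
    """Exhaustively search one-to-one mappings from label pixels to image pixels."""
    best_cost, best_targets = math.inf, None
    for targets in itertools.permutations(sorted(prob), len(label_pixels)):
        cost = sum(pair_cost(p, q, prob, sigma) for p, q in zip(targets, label_pixels))
        if cost < best_cost:
            best_cost, best_targets = cost, list(targets)
    return best_targets, best_cost

# 3x4 toy image: the network responds in column 1, the noisy labels sit in column 2.
prob = {(r, c): (0.9 if c == 1 else 0.1) for r in range(3) for c in range(4)}
noisy = [(0, 2), (1, 2), (2, 2)]
aligned, cost = align_labels(noisy, prob)
print(aligned, cost)
```

Each label moves one pixel to the left, because the gain in the network term outweighs the distance penalty of \(1/(2\sigma^2)\); the aligned label map is then recovered by setting exactly the returned pixels to 1.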
However, solving Eq. (12) assumes an underlying relaxation where the search space contains m which may not follow the infimum requirement in Eq. (11). In other words, it may be possible that the minimization problem in Eq. (12) is an approximation to Eq. (11). The following theorem, however, proves the optimality of Eq. (12):
Theorem 1
Given a solver that minimizes Eq. (12), the solution is also a minimizer of the problem in Eq. (11).
Proof
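We use contradiction to prove Theorem 1. Suppose there exists a solution \(m^{*}\) of (12) where:
\[f_{L}(\mathbf{y},m^{*})=\hat{\mathbf{y}}, \qquad \sum _{(\mathbf{p},\mathbf{q})\in E(m^{*})} \frac{\Vert \mathbf{p}-\mathbf{q}\Vert ^{2}}{2\sigma ^{2}} > \inf _{m\in \mathcal{M}(\mathbf{y},\hat{\mathbf{y}})}\ \sum _{(\mathbf{p},\mathbf{q})\in E(m)} \frac{\Vert \mathbf{p}-\mathbf{q}\Vert ^{2}}{2\sigma ^{2}}.\]
There must exist another mapping \(m'\) which satisfies:
\[f_{L}(\mathbf{y},m')=\hat{\mathbf{y}}, \qquad \sum _{(\mathbf{p},\mathbf{q})\in E(m')} \frac{\Vert \mathbf{p}-\mathbf{q}\Vert ^{2}}{2\sigma ^{2}} < \sum _{(\mathbf{p},\mathbf{q})\in E(m^{*})} \frac{\Vert \mathbf{p}-\mathbf{q}\Vert ^{2}}{2\sigma ^{2}}.\]
Since \(f_{L}(\mathbf {y},m')=f_{L}(\mathbf {y},m^{*})=\hat{\mathbf {y}}\), the network terms of the two mappings are identical, so substituting \(m'\) into (12) leads to an even lower cost, which contradicts the assumption that \(m^{*}\) is the minimizer of (12).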
In practice, we follow the mini-batch SGD optimization, where \(\hat{\mathbf {y}}\) of each image and \(\mathbf {W}\) are both updated once in every batch. To begin with, \(\hat{\mathbf {y}}\) is initialized as \(\mathbf {y}\) for every image in the first batch. Basically, the optimization can be written as a loss layer in a network, and is fully compatible with end-to-end training.
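The alternation can be sketched on a toy 1D problem. Everything below is an illustrative assumption rather than the training setup of this paper: a two-parameter logistic model stands in for the FCN, full-batch gradient descent stands in for mini-batch SGD, each "image" is a row of 9 pixels with a single edge so that the assignment reduces to an arg-min over nearby pixels, and the search radius, learning rate and step count are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def align(q, probs, sigma=1.0, radius=2):
    """Alignment step for a single edge label q with the model fixed."""
    def cost(p):
        return (p - q) ** 2 / (2.0 * sigma ** 2) + math.log(1.0 - probs[p]) - math.log(probs[p])
    candidates = range(max(0, q - radius), min(len(probs) - 1, q + radius) + 1)
    return min(candidates, key=cost)

def train(features, noisy, steps=200, lr=0.5):
    """Alternate between aligning labels (model fixed) and a gradient step (labels fixed)."""
    w, b = 0.0, 0.0
    aligned = list(noisy)  # aligned labels start as the observed labels
    n_pix = sum(len(x) for x in features)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for i, x in enumerate(features):
            probs = [sigmoid(w * v + b) for v in x]
            aligned[i] = align(noisy[i], probs)
            for p, v in enumerate(x):
                err = probs[p] - (1.0 if p == aligned[i] else 0.0)
                grad_w += err * v
                grad_b += err
        w -= lr * grad_w / n_pix
        b -= lr * grad_b / n_pix
    return w, b, aligned

true_edges = [4, 3, 5, 4, 2]
features = [[1.0 if p == c else 0.0 for p in range(9)] for c in true_edges]
noisy = [4, 3, 6, 3, 2]  # two of the five labels are off by one pixel
w, b, aligned = train(features, noisy)
print(aligned)
```

Because most labels agree with the image evidence, the model first learns a positive weight on the edge feature; once that weight is large enough to outweigh the distance penalty, the two misaligned labels snap onto the true edge positions, which in turn makes the remaining training signal cleaner.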
4.5 Inference
We now consider the inference problem given a trained model. Ideally, the inference problem of the model trained by Eq. (2) would be the following:
\[\hat{\mathbf{y}}^{*} = \mathop{\mathrm{arg\,max}}_{\hat{\mathbf{y}}}\ P(\mathbf{y}|\hat{\mathbf{y}})\,P(\hat{\mathbf{y}}|\mathbf{x};\mathbf{W}). \tag{13}\]
However, in cases where \(\mathbf {y}\) is not available during testing, we can alternatively look into the second part of (2), which is the model learned under \(\hat{\mathbf {y}}\):
\[\hat{\mathbf{y}}^{*} = \mathop{\mathrm{arg\,max}}_{\hat{\mathbf{y}}}\ P(\hat{\mathbf{y}}|\mathbf{x};\mathbf{W}). \tag{14}\]
Both cases can find real applications. In particular, (14) corresponds to general edge prediction, whereas (13) corresponds to refining noisy edge labels in a dataset. In the latter case, \(\mathbf {y}\) is available and the inferred \(\hat{\mathbf {y}}\) is used to output the refined label. In the experiments, we will show examples of both applications.
5 Biased Gaussian Kernel and Markov Prior
SEAL turns out to be a nontrivial task, as the alignment tends to generate artifacts in the presence of cluttered backgrounds. A major cause of this failure is the fragmented aligned labels, as shown in Fig. 3(a). This is not surprising since we assume an isotropic Gaussian kernel, where labels tend to break and shift along the edges towards easy locations. In light of this issue, we assume that the edge prior follows a biased Gaussian (B.G.), with the long axis of the kernel perpendicular to the local boundary tangent. Accordingly, such a model encourages alignment perpendicular to edge tangents while suppressing shifts along them.
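Another direction is to consider the Markov properties of edges. Good edge labels should be relatively continuous, and nearby alignment vectors should be similar. Taking these into consideration, we can model the edge prior as:
\[P(\mathbf{y}|\hat{\mathbf{y}}) \propto \sup _{m\in \mathcal{M}(\mathbf{y},\hat{\mathbf{y}})}\ \prod _{(\mathbf{p},\mathbf{q})\in E(m)} \exp \big(-\mathbf{m}_{\mathbf{q}}^{\top }\mathbf{\Sigma }_{\mathbf{q}}\mathbf{m}_{\mathbf{q}}\big) \prod _{(\mathbf{u},\mathbf{v})\in E(m),\ \mathbf{v}\in \mathcal{N}(\mathbf{q})} \exp \big(-\lambda \Vert \mathbf{m}_{\mathbf{q}}-\mathbf{m}_{\mathbf{v}}\Vert ^{2}\big), \tag{15}\]
where \(\lambda \) controls the strength of the smoothness. \(\mathcal {N}(\mathbf {q})\) is the neighborhood of \(\mathbf {q}\) defined by the geodesic distance along the edge. \(\mathbf {m}_{\mathbf {q}} = \mathbf {p}-\mathbf {q}\), and \(\mathbf {m}_{\mathbf {v}} = \mathbf {u}-\mathbf {v}\). An example of the improved alignment and a graphical illustration are shown in Figs. 3(b) and (c). In addition, the precision matrix \(\mathbf {\Sigma }_{\mathbf {q}}\) is defined as:
\[\mathbf{\Sigma }_{\mathbf{q}} = \frac{\mathbf{t}_{\mathbf{q}}\mathbf{t}_{\mathbf{q}}^{\top }}{2\sigma _{x}^{2}} + \frac{\mathbf{n}_{\mathbf{q}}\mathbf{n}_{\mathbf{q}}^{\top }}{2\sigma _{y}^{2}},\]
where \(\mathbf{t}_{\mathbf{q}}\) and \(\mathbf{n}_{\mathbf{q}}\) are the unit tangent and unit normal of the edge at \(\mathbf{q}\), \(\theta _{\mathbf {q}}\) is the angle between the edge tangent and the positive x-axis (so that \(\mathbf{t}_{\mathbf{q}}\) is the unit vector at angle \(\theta _{\mathbf {q}}\)), \(\sigma _x\) is the kernel bandwidth along the edge tangent, and \(\sigma _y\) corresponds to the kernel bandwidth perpendicular to the edge tangent. With the new prior, the alignment optimization becomes the following problem:
\[\min _{m\in \mathbf{M}}\ \mathcal{C}(m) = \mathcal{C}_{Unary}(m) + \mathcal{C}_{Pair}(m) = \sum _{(\mathbf{p},\mathbf{q})\in E(m)}\Big[\mathbf{m}_{\mathbf{q}}^{\top }\mathbf{\Sigma }_{\mathbf{q}}\mathbf{m}_{\mathbf{q}} + \log \frac{1-\sigma (\mathbf{p})}{\sigma (\mathbf{p})}\Big] + \lambda \sum _{(\mathbf{p},\mathbf{q})\in E(m)}\ \sum _{(\mathbf{u},\mathbf{v})\in E(m),\ \mathbf{v}\in \mathcal{N}(\mathbf{q})} \Vert \mathbf{m}_{\mathbf{q}}-\mathbf{m}_{\mathbf{v}}\Vert ^{2}. \tag{16}\]
Note that Theorem 1 still holds for (16). However, solving (16) becomes more difficult as pairwise dependencies are included. As a result, standard assignment solvers cannot be directly applied, and we alternatively decouple \(\mathcal {C}_{Pair}\) as:
\[\mathcal{C}_{Pair}(m,m') = \lambda \sum _{(\mathbf{p},\mathbf{q})\in E(m)}\ \sum _{(\mathbf{u},\mathbf{v})\in E(m'),\ \mathbf{v}\in \mathcal{N}(\mathbf{q})} \Vert \mathbf{m}_{\mathbf{q}}-\mathbf{m}'_{\mathbf{v}}\Vert ^{2}, \tag{17}\]
and take an iterative approximation, similar to iterated conditional modes, where the alignments of neighboring pixels are taken from the alignment in the previous round:
\[\text{Assign:}\ \ m^{(t+1)} = \mathop{\mathrm{arg\,min}}_{m\in \mathbf{M}}\ \mathcal{C}_{Unary}(m) + \mathcal{C}_{Pair}(m,m^{(t)}), \qquad \text{Update:}\ \ \mathcal{C}_{Pair}(m,m^{(t)}) \rightarrow \mathcal{C}_{Pair}(m,m^{(t+1)}), \tag{18}\]
where the Assign and Update steps are repeated multiple times. The algorithm converges very fast in practice. Usually two or even one Assign is sufficient.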
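The effect of the biased kernel on a single alignment vector can be sketched as follows. The construction assumes that the tangent direction is \((\cos\theta, \sin\theta)\) in the same (x, y) coordinates as the displacement, that the quadratic cost is \(\mathbf{m}^{\top}\mathbf{\Sigma}\mathbf{m}\) with entries scaled by \(1/(2\sigma^2)\), and that \(\sigma_y = 2\) (the text only requires \(\sigma_y > \sigma_x = 1\)); the function names are hypothetical.

```python
import math

def precision_matrix(theta, sigma_x=1.0, sigma_y=2.0):
    """2x2 precision matrix: bandwidth sigma_x along the tangent, sigma_y across it."""
    c, s = math.cos(theta), math.sin(theta)
    ax, ay = 1.0 / (2.0 * sigma_x ** 2), 1.0 / (2.0 * sigma_y ** 2)
    off = (ax - ay) * c * s
    return ((ax * c * c + ay * s * s, off),
            (off, ax * s * s + ay * c * c))

def shift_cost(m, theta, sigma_x=1.0, sigma_y=2.0):
    """Quadratic cost m^T Sigma m of an alignment vector m = p - q."""
    (a, b), (_, d) = precision_matrix(theta, sigma_x, sigma_y)
    mx, my = m
    return a * mx * mx + 2.0 * b * mx * my + d * my * my

# Horizontal edge (theta = 0): compare a shift along the edge with one across it.
along_h = shift_cost((2.0, 0.0), 0.0)
across_h = shift_cost((0.0, 2.0), 0.0)

# Diagonal edge (theta = 45 degrees): tangent is (1, 1)/sqrt(2), normal is (-1, 1)/sqrt(2).
along_d = shift_cost((1.0, 1.0), math.pi / 4)
across_d = shift_cost((-1.0, 1.0), math.pi / 4)
print(along_h, across_h, along_d, across_d)
```

For equal displacement lengths, moving a label across the edge is cheaper than sliding it along the edge by a factor of \(\sigma_y^2/\sigma_x^2\), which is the behavior the biased Gaussian is meant to encourage.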
6 Experimental Results
In this section, we comprehensively test the performance of SEAL on category-aware semantic edge detection, where the detector not only needs to localize object edges, but also to classify them into a predefined set of semantic classes.
6.1 Backbone Network
In order to guarantee fair comparison across different methods, a fixed backbone network is needed for controlled evaluation. We choose CASENet [52] since it is the current state of the art on our task. For additional implementation details such as choice of hyperparameters, please refer to the supplementary material.
6.2 Evaluation Benchmarks
We follow [17] to evaluate edges with class-wise precision recall curves. However, the benchmarks of our work differ from [17] by imposing considerably stricter rules. In particular: 1. We consider non-suppressed edges inside an object as false positives, while [17] ignores these pixels. 2. We accumulate false positives on any image, while the benchmark code from [17] only accumulates false positives of a certain class on images containing that class. Our benchmark can also be regarded as a multiclass extension of the BSDS benchmark [34].
Both [17] and [34] by default thin the prediction before matching. We propose to match the raw predictions with unthinned ground truths whose width is kept the same as training labels. The benchmark therefore also considers the local quality of predictions. We refer to this mode as “Raw” and the previous conventional mode as “Thin”. Similar to [34], both settings use maximum F-Measure (MF) at optimal dataset scale (ODS) to evaluate the performance.
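The MF (ODS) score can be sketched as below, assuming the matching between predicted and ground-truth edge pixels has already been performed and has produced true positive, false positive and false negative counts for every image at every threshold. The counts are invented numbers and the function names are illustrative; the matching step itself is not shown.

```python
def f_measure(tp, fp, fn):
    """F-measure from matched counts; equal to 2PR / (P + R)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def max_f_at_ods(counts):
    """counts[i][t] = (tp, fp, fn) for image i at threshold index t.

    ODS uses one threshold for the whole dataset, so counts are summed
    over images before the F-measure is computed.
    """
    n_thresholds = len(counts[0])
    scores = []
    for t in range(n_thresholds):
        tp = sum(image[t][0] for image in counts)
        fp = sum(image[t][1] for image in counts)
        fn = sum(image[t][2] for image in counts)
        scores.append(f_measure(tp, fp, fn))
    best_t = max(range(n_thresholds), key=scores.__getitem__)
    return best_t, scores[best_t]

# Two images, three thresholds (hypothetical counts).
counts = [
    [(8, 8, 2), (7, 3, 3), (4, 0, 6)],
    [(9, 6, 1), (8, 2, 2), (5, 1, 5)],
]
best_t, best_f = max_f_at_ods(counts)
print(best_t, best_f)
```

In the "Raw" mode, thick predictions contribute extra false positives to these counts at every threshold, which is why that setting is sensitive to edge sharpness.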
Another difference between the problem settings of our work and [17] is that we consider edges between any two instances as positive, even though the instances may belong to the same class. This differs from [17] where such edges are ignored. Our motivation for making such changes is twofold: 1. We believe instance-sensitive edges are important and it makes better sense to distinguish these locations. 2. The instance-sensitive setting may better benefit other potential applications where instances need to be distinguished.
6.3 Experiment on the SBD Dataset
The Semantic Boundary Dataset (SBD) [17] contains 11355 images from the trainval set of PASCAL VOC2011 [14], with 8498 images divided as training set and 2857 images as test set. The dataset contains both category-level and instance-level semantic segmentation annotations, with semantic classes defined following the 20 class definitions in PASCAL VOC.
Parameter Analysis.
We set \(\sigma _x=1\) and \(\sigma _y > \sigma _x\) to favor alignment perpendicular to edge tangents. Details on the validation of \(\sigma _y\) and \(\lambda \) are in the supplementary material.
Results on SBD Test Set.
We compare SEAL with CASENet, CASENet trained with regular sigmoid cross-entropy loss (CASENet-S), and CASENet-S trained on labels refined by dense-CRF following [50] (CASENet-C), with the results visualized in Fig. 5 and quantified in Table 1. Results show that SEAL is on par with CASENet-S under the “Thin” setting, while significantly outperforming all other baselines when edge sharpness is taken into account.
Results on Re-annotated SBD Test Set.
A closer analysis shows that SEAL actually outperforms CASENet-S considerably under the “Thin” setting. The original SBD labels turn out to be noisy, which can influence the validity of evaluation. We re-annotated more than 1000 images on the SBD test set using LabelMe [41], and report evaluation using these high-quality labels in Table 2. Results indicate that SEAL outperforms CASENet-S in both settings.
Results of SBD GT Refinement.
We output the SEAL aligned labels and compare against both dense-CRF and original annotation. We match the aligned labels with re-annotated labels by varying the tolerance threshold and generating F-Measure scores. Figure 4 shows that SEAL indeed can improve the label quality, while dense-CRF performs even worse than original labels. In fact, the result of CASENet-C also indicates the decreased model performance.
Non-Instance-Sensitive (non-IS) Mode.
We also train/evaluate under non-IS mode, with the evaluation using re-annotated SBD labels. Table 3 shows that the scores have high correlation with IS mode.
Comparison with State of the Art.
Although proposing different evaluation criteria, we still follow [52] by training SEAL with instance-insensitive labels and evaluating with the same benchmark and ground truths. Results in Table 4 show that this work outperforms previous state of the art by a significant margin.
6.4 Experiment on the Cityscapes Dataset
Results on Validation Set.
The Cityscapes dataset contains 2975 training images and 500 images as validation set. Following [52], we train SEAL on the training set and test on the validation set, with the results visualized in Fig. 6 and quantified in Table 5. Again, SEAL overall outperforms all compared baselines.
Alignment Visualization.
We show that misalignment can still be found on Cityscapes. Figure 7 shows misaligned labels and the corrections made by SEAL.
7 Concluding Remarks
In this paper, we proposed SEAL: an end-to-end learning framework for joint edge alignment and learning. Our work considers a novel pixel-level noisy label learning problem, leveraging structured priors to address an open issue in edge learning. Extensive experiments demonstrate that the proposed framework is able to correct noisy labels and generate sharp edges with better quality.
Notes
1. E.g., more stable training, and more balanced prediction towards smaller classes.
References
1. Arbeláez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE Trans. PAMI 33(5), 898–916 (2011)
2. Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: CVPR (2014)
3. Bertasius, G., Shi, J., Torresani, L.: DeepEdge: a multiscale bifurcated deep network for top-down contour detection. In: CVPR (2015)
4. Bertasius, G., Shi, J., Torresani, L.: High-for-low, low-for-high: efficient boundary detection from deep object features and its applications to high-level vision. In: ICCV (2015)
5. Bertasius, G., Shi, J., Torresani, L.: Semantic segmentation with boundary neural fields. In: CVPR (2016)
6. Bilmes, J.: On virtual evidence and soft evidence in Bayesian networks. Technical report (2004)
7. Canny, J.: A computational approach to edge detection. IEEE Trans. PAMI 6, 679–698 (1986)
8. Castrejón, L., Kundu, K., Urtasun, R., Fidler, S.: Annotating object instances with a polygon-RNN. In: CVPR (2017)
9. Chen, L.C., Barron, J.T., Papandreou, G., Murphy, K., Yuille, A.L.: Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform. In: CVPR (2016)
10. Cherkassky, B.V., Goldberg, A.V.: On implementing push-relabel method for the maximum flow problem. In: Balas, E., Clausen, J. (eds.) IPCO 1995. LNCS, vol. 920, pp. 157–171. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-59408-6_49
11. Cordts, M., et al.: The Cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)
12. Dollár, P., Tu, Z., Belongie, S.: Supervised learning of edges and object boundaries. In: CVPR (2006)
13. Dollár, P., Zitnick, C.L.: Fast edge detection using structured forests. IEEE Trans. PAMI 37(8), 1558–1570 (2015)
14. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes challenge 2011 (VOC2011) results. http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html
15. Goldberg, A.V., Kennedy, R.: An efficient cost scaling algorithm for the assignment problem. SIAM J. Discrete Math. (1993)
16. Hancock, E.R., Kittler, J.: Edge-labeling using dictionary-based relaxation. IEEE Trans. PAMI 12(2), 165–181 (1990)
17. Hariharan, B., Arbeláez, P., Bourdev, L., Maji, S., Malik, J.: Semantic contours from inverse detectors. In: ICCV (2011)
18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
19. Hoiem, D., Efros, A.A., Hebert, M.: Geometric context from a single image. In: ICCV (2005)
20. Hwang, J., Liu, T.L.: Pixel-wise deep learning for contour detection. In: ICLR (2015)
21. Karsch, K., Liao, Z., Rock, J., Barron, J.T., Hoiem, D.: Boundary cues for 3D object shape recovery. In: CVPR (2013)
22. Khoreva, A., Benenson, R., Omran, M., Hein, M., Schiele, B.: Weakly supervised object boundaries. In: CVPR (2016)
23. Kittler, J.: On the accuracy of the Sobel edge detector. Image Vis. Comput. 1(1), 37–42 (1983)
24. Kokkinos, I.: Pushing the boundaries of boundary detection using deep learning (2016)
25. Konishi, S., Yuille, A.L., Coughlan, J.M., Zhu, S.C.: Statistical edge detection: learning and evaluating edge cues. IEEE Trans. PAMI 25(1), 57–74 (2003)
26. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
27. Li, Y., Yang, J., Song, Y., Cao, L., Luo, J., Li, L.J.: Learning from noisy labels with distillation. In: CVPR (2017)
28. Liao, L., Choudhury, T., Fox, D., Kautz, H.A.: Training conditional random fields using virtual evidence boosting. In: IJCAI (2007)
29. Lim, J., Zitnick, C., Dollár, P.: Sketch tokens: a learned mid-level representation for contour and object detection. In: CVPR (2013)
30. Liu, Y., Cheng, M.M., Hu, X., Wang, K., Bai, X.: Richer convolutional features for edge detection. In: CVPR (2017)
31. Maire, M., Yu, S.X., Perona, P.: Reconstructive sparse code transfer for contour detection and semantic labeling. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) ACCV 2014. LNCS, vol. 9006, pp. 273–287. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16817-3_18
32. Malik, J.: Interpreting line drawings of curved objects. Int. J. Comput. Vis. 1(1), 73–103 (1987)
33. Malik, J., Maydan, D.: Recovering three-dimensional shape from a single image of curved objects. IEEE Trans. PAMI 11(6), 555–566 (1989)
34. Martin, D.R., Fowlkes, C.C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. PAMI 26(5), 530–549 (2004)
35. Patrini, G., Rozza, A., Menon, A.K., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: CVPR (2017)
36. Pearl, J.: Probabilistic reasoning in intelligent systems: networks of plausible inference (1988)
37. Pinheiro, P.O., Collobert, R., Dollár, P.: Learning to segment object candidates. In: NIPS (2015)
38. Prasad, M., Zisserman, A., Fitzgibbon, A., Kumar, M.P., Torr, P.H.S.: Learning class-specific edges for object detection and segmentation. In: Kalra, P.K., Peleg, S. (eds.) ICVGIP 2006. LNCS, vol. 4338, pp. 94–105. Springer, Heidelberg (2006). https://doi.org/10.1007/11949619_9
39. Ren, X., Fowlkes, C.C., Malik, J.: Learning probabilistic models for contour completion in natural images. Int. J. Comput. Vis. 77(1–3), 47–63 (2008)
40. Rupprecht, C., Huaroc, E., Baust, M., Navab, N.: Deep active contours. arXiv preprint arXiv:1607.05074 (2016)
41. Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T.: LabelMe: a database and web-based tool for image annotation. IJCV 77(1–3), 157–173 (2008)
42. Shan, Q., Curless, B., Furukawa, Y., Hernandez, C., Seitz, S.: Occluding contours for multi-view stereo. In: CVPR (2014)
43. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
44. Sugihara, K.: Machine Interpretation of Line Drawings. MIT Press, Wiley (1986)
45. Vahdat, A.: Toward robustness against label noise in training deep discriminative neural networks. In: NIPS (2017)
46. Veit, A., Alldrin, N., Chechik, G., Krasin, I., Gupta, A., Belongie, S.: Learning from noisy large-scale datasets with minimal supervision. In: CVPR (2017)
47. Wang, Y., et al.: Iterative learning with open-set noisy labels. In: CVPR (2018)
48. Xiao, T., Xia, T., Yang, Y., Huang, C., Wang, X.: Learning from massive noisy labeled data for image classification. In: CVPR (2015)
49. Xie, S., Tu, Z.: Holistically-nested edge detection. In: ICCV (2015)
50. Yang, J., Price, B., Cohen, S., Lee, H., Yang, M.H.: Object contour detection with a fully convolutional encoder-decoder network. In: CVPR (2016)
51. Yu, Z., Liu, W., Liu, W., Peng, X., Hui, Z., Kumar, B.V.: Generalized transitive distance with minimum spanning random forest. In: IJCAI (2015)
52. Yu, Z., Feng, C., Liu, M.Y., Ramalingam, S.: CASENet: deep category-aware semantic edge detection. In: CVPR (2017)
53. Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_26
© 2018 Springer Nature Switzerland AG
Yu, Z. et al. (2018). Simultaneous Edge Alignment and Learning. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds) Computer Vision – ECCV 2018. ECCV 2018. Lecture Notes in Computer Science(), vol 11207. Springer, Cham. https://doi.org/10.1007/978-3-030-01219-9_24
DOI: https://doi.org/10.1007/978-3-030-01219-9_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01218-2
Online ISBN: 978-3-030-01219-9