Abstract
We propose a new CNN-CRF end-to-end learning framework based on joint stochastic optimization with respect to both Convolutional Neural Network (CNN) and Conditional Random Field (CRF) parameters. While stochastic gradient descent is a standard technique for CNN training, it has not previously been applied to joint CNN-CRF models. We show that our learning method is (i) general, i.e. it applies to arbitrary CNN and CRF architectures and potential functions; (ii) scalable, i.e. it has a low memory footprint and parallelizes straightforwardly on GPUs; and (iii) easy to implement. Additionally, the unified CNN-CRF optimization approach simplifies a potential hardware implementation. We empirically evaluate our method on the task of semantic labeling of body parts in depth images and show that it compares favorably to competing techniques.
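The core idea of joint stochastic optimization can be illustrated on a toy model: a linear "CNN" layer produces unary potentials for a two-node binary chain CRF with a single pairwise weight, and one SGD step updates both parameter sets from the exact likelihood gradient. This is a minimal sketch under assumed names (`joint_sgd_step`, a hand-picked architecture), not the paper's actual model, which uses sampling-based gradient estimates for larger graphs.

```python
import numpy as np

def joint_sgd_step(W, w, x, y_true, lr=0.1):
    """One joint SGD step on a toy 2-node binary chain CRF whose unary
    potentials come from a linear 'CNN' layer. Hypothetical minimal model
    for illustration, not the paper's architecture."""
    # Unary scores: u[i, k] = score of label k at node i.
    u = x @ W                      # x: (2, d), W: (d, 2) -> u: (2, 2)
    # Enumerate all four joint labelings of the 2-node chain (exact Z).
    configs = [(a, b) for a in (0, 1) for b in (0, 1)]
    scores = np.array([u[0, a] + u[1, b] + w * (a == b) for a, b in configs])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Gradient of the negative log-likelihood:
    # model expectation of the features minus their observed value.
    gW = np.zeros_like(W)
    gw = 0.0
    for p, (a, b) in zip(probs, configs):
        gW[:, a] += p * x[0]
        gW[:, b] += p * x[1]
        gw += p * float(a == b)
    a, b = y_true
    gW[:, a] -= x[0]
    gW[:, b] -= x[1]
    gw -= float(a == b)
    # Single stochastic update of CNN weights W and CRF weight w together.
    return W - lr * gW, w - lr * gw
```

In the full framework the expectation over labelings is intractable and is replaced by a sampling-based estimate, but the update pattern, one gradient step touching CNN and CRF parameters simultaneously, is the same.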
Notes
- We use the commonly adopted terminology from the CNN literature for technical details, to allow reproducibility of our results.
Acknowledgements
This work was supported by: European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 647769); German Federal Ministry of Education and Research (BMBF, 01IS14014A-D); EPSRC EP/I001107/2; ERC grant ERC-2012-AdG 321162-HELIOS. The computations were performed on an HPC Cluster at the Center for Information Services and High Performance Computing (ZIH) at TU Dresden.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Kirillov, A., Schlesinger, D., Zheng, S., Savchynskyy, B., Torr, P.H.S., Rother, C. (2017). Joint Training of Generic CNN-CRF Models with Stochastic Optimization. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science(), vol 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_14
Print ISBN: 978-3-319-54183-9
Online ISBN: 978-3-319-54184-6