DSDNet: Deep Structured Self-driving Network

Wenyuan Zeng, Shenlong Wang, Renjie Liao, Yun Chen, Bin Yang & Raquel Urtasun

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12366)

Included in the following conference series:

European Conference on Computer Vision

Abstract

In this paper, we propose the Deep Structured self-Driving Network (DSDNet), which performs object detection, motion prediction, and motion planning with a single neural network. Towards this goal, we develop a deep structured energy-based model which considers the interactions between actors and produces socially consistent multimodal future predictions. Furthermore, DSDNet explicitly exploits the predicted future distributions of actors to plan a safe maneuver by using a structured planning cost. Our sample-based formulation allows us to overcome the difficulty in probabilistic inference of continuous random variables. Experiments on a number of large-scale self-driving datasets demonstrate that our model significantly outperforms the state of the art.
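To make the sample-based idea concrete, the sketch below shows one way a planner could score candidate ego trajectories against a discrete set of sampled actor futures. It is only an illustration under stated assumptions, not the cost used in the paper: the function names, the distance-based collision test, the `radius` and `collision_weight` parameters, and the toy scene are all hypothetical. Each actor is represented by a finite set of trajectory samples with energies, the energies are turned into probabilities proportional to exp(-E), and the expected collision cost becomes a finite sum instead of an integral over a continuous space.

```python
import math

def sample_probabilities(energies):
    """Discrete distribution over samples: p_k proportional to exp(-E_k)."""
    lowest = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - lowest)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def collides(traj_a, traj_b, radius=1.0):
    """True if the two trajectories are closer than `radius` at any shared time step."""
    return any(math.hypot(ax - bx, ay - by) < radius
               for (ax, ay), (bx, by) in zip(traj_a, traj_b))

def expected_plan_cost(plan, base_cost, actors, collision_weight=10.0):
    """Base cost plus the weighted collision probability, summed over actors."""
    risk = 0.0
    for samples, energies in actors:
        probs = sample_probabilities(energies)
        risk += sum(p for p, traj in zip(probs, samples) if collides(plan, traj))
    return base_cost + collision_weight * risk

def best_plan(candidates, actors):
    """Pick the (plan, base_cost) candidate with the lowest expected cost."""
    return min(candidates, key=lambda c: expected_plan_cost(c[0], c[1], actors))

# Toy scene (hypothetical numbers): one actor with two sampled futures.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
swerve = [(0.0, 0.0), (1.0, -2.0), (2.0, -4.0)]
crossing = [(2.0, 2.0), (1.5, 1.0), (2.0, 0.2)]   # cuts across the straight plan
parked = [(2.0, 5.0), (2.0, 5.0), (2.0, 5.0)]     # stays out of the way
actors = [([crossing, parked], [0.0, 2.0])]       # lower energy = more likely
chosen, _ = best_plan([(straight, 0.0), (swerve, 1.0)], actors)
```

In this toy scene the likely "crossing" sample makes the straight plan expensive, so the swerving plan is selected despite its higher base cost.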

Notes

1. We would like the samples to cover the original continuous space and to have high recall with respect to the ground-truth future trajectories.
2. Although the sum-product algorithm is exact only for tree structures, it has been shown to work well in practice for graphs with cycles [35, 48].
3. We find that using only $\mathcal{L}_{planning}$ without the other two terms prevents the model from learning reasonable detection and prediction.
4. Numbers are reported on the official validation split, since there is no joint detection and prediction benchmark.
5. For a fair comparison, [7] replaced the original encoder (which takes the ground-truth detection and tracking as input) with a learned CNN that takes LiDAR as input.
6. We conduct the comparison on the official validation split, as our model currently focuses only on vehicles while the testing benchmark is built for multi-class detection.
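As a rough illustration of the sum-product algorithm mentioned in note 2, the following sketch runs synchronous message passing on a small pairwise graph with discrete states and compares the result with brute-force marginals. It is a generic textbook-style example, not the paper's inference code: the function names, the dictionary-based graph layout, and the toy potentials are assumptions made for illustration. One could read each node as an actor, each state as a trajectory sample, and the potentials as exp(-energy) terms.

```python
import itertools

def sum_product(unary, pairwise, n_iters=10):
    """Synchronous sum-product on a pairwise graph with discrete states.

    unary:    {node: [phi(k) for each state k]}
    pairwise: {(i, j): table with table[k_i][k_j]}, one entry per undirected edge
    Returns approximate marginals {node: [belief(k)]}.
    """
    # Directed copies of each potential, always indexed [source state][target state].
    psi = {}
    for (i, j), table in pairwise.items():
        psi[(i, j)] = table
        psi[(j, i)] = [list(col) for col in zip(*table)]  # transpose
    # msgs[(i, j)][k_j] is the message from i to j, initialised uniform.
    msgs = {(i, j): [1.0 / len(unary[j])] * len(unary[j]) for (i, j) in psi}
    for _ in range(n_iters):
        new = {}
        for (i, j) in psi:
            out = []
            for kj in range(len(unary[j])):
                total = 0.0
                for ki in range(len(unary[i])):
                    incoming = 1.0
                    for (h, t) in psi:  # messages into i, excluding the one from j
                        if t == i and h != j:
                            incoming *= msgs[(h, i)][ki]
                    total += unary[i][ki] * psi[(i, j)][ki][kj] * incoming
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]  # normalise to avoid under/overflow
        msgs = new
    beliefs = {}
    for i, phi in unary.items():
        b = list(phi)
        for (h, t) in psi:
            if t == i:
                b = [bv * m for bv, m in zip(b, msgs[(h, i)])]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs

def brute_force_marginals(unary, pairwise):
    """Exact marginals by enumerating every joint assignment (tiny graphs only)."""
    nodes = sorted(unary)
    marg = {i: [0.0] * len(unary[i]) for i in nodes}
    for states in itertools.product(*(range(len(unary[i])) for i in nodes)):
        assign = dict(zip(nodes, states))
        w = 1.0
        for i in nodes:
            w *= unary[i][assign[i]]
        for (i, j), table in pairwise.items():
            w *= table[assign[i]][assign[j]]
        for i in nodes:
            marg[i][assign[i]] += w
    return {i: [v / sum(m) for v in m] for i, m in marg.items()}

# Toy chain a - b - c (a tree), with hypothetical potentials.
unary = {"a": [1.0, 2.0], "b": [1.0, 1.0], "c": [3.0, 1.0]}
chain = {("a", "b"): [[2.0, 1.0], [1.0, 2.0]], ("b", "c"): [[2.0, 1.0], [1.0, 2.0]]}
tree_beliefs = sum_product(unary, chain)
exact = brute_force_marginals(unary, chain)

# Adding the edge a - c creates a cycle; the same routine still runs ("loopy" BP),
# but its beliefs are then only approximations of the true marginals.
loop = dict(chain)
loop[("a", "c")] = [[2.0, 1.0], [1.0, 2.0]]
loopy_beliefs = sum_product(unary, loop)
```

On the chain, the beliefs agree with the brute-force marginals, which reflects the exactness on trees; on the cyclic graph the routine returns normalised beliefs without that guarantee.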

References

1. Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., Savarese, S.: Social LSTM: human trajectory prediction in crowded spaces. In: CVPR (2016)
2. Bandyopadhyay, T., Won, K.S., Frazzoli, E., Hsu, D., Lee, W.S., Rus, D.: Intention-aware motion planning. In: Frazzoli, E., Lozano-Perez, T., Roy, N., Rus, D. (eds.) Algorithmic Foundations of Robotics X. STAR, vol. 86, pp. 475–491. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36279-8_29
3. Belanger, D., McCallum, A.: Structured prediction energy networks. In: ICML (2016)
4. Bojarski, M., et al.: End to end learning for self-driving cars. arXiv (2016)
5. Buehler, M., Iagnemma, K., Singh, S.: The DARPA Urban Challenge: Autonomous Vehicles in City Traffic (2009)
6. Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. arXiv (2019)
7. Casas, S., Gulino, C., Liao, R., Urtasun, R.: Spatially-aware graph neural networks for relational behavior forecasting from sensor data. arXiv (2019)
8. Casas, S., Gulino, C., Suo, S., Luo, K., Liao, R., Urtasun, R.: Implicit latent variable model for scene-consistent motion forecasting. In: ECCV (2020)
9. Casas, S., Gulino, C., Suo, S., Urtasun, R.: The importance of prior knowledge in precise multimodal prediction. In: IROS (2020)
10. Casas, S., Luo, W., Urtasun, R.: IntentNet: learning to predict intention from raw sensor data. In: Proceedings of The 2nd Conference on Robot Learning (2018)
11. Chai, Y., Sapp, B., Bansal, M., Anguelov, D.: Multipath: multiple probabilistic anchor trajectory hypotheses for behavior prediction. arXiv (2019)
12. Chen, L.C., Schwing, A., Yuille, A., Urtasun, R.: Learning deep structured models. In: ICML (2015)
13. Codevilla, F., Müller, M., López, A., Koltun, V., Dosovitskiy, A.: End-to-end driving via conditional imitation learning. In: ICRA (2018)
14. Deo, N., Trivedi, M.M.: Convolutional social pooling for vehicle trajectory prediction. In: CVPR (2018)
15. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: Carla: an open urban driving simulator. arXiv (2017)
16. Fan, H., et al.: Baidu apollo em motion planner. arXiv (2018)
17. Graber, C., Meshi, O., Schwing, A.: Deep structured prediction with nonlinear output transformations. In: NeurIPS (2018)
18. Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., Alahi, A.: Social GAN: socially acceptable trajectories with generative adversarial networks. In: CVPR (2018)
19. Hardy, J., Campbell, M.: Contingency planning over probabilistic obstacle predictions for autonomous road vehicles. IEEE Trans. Robot. D (2013)
20. Hong, J., Sapp, B., Philbin, J.: Rules of the road: predicting driving behavior with a convolutional model of semantic interactions. In: CVPR (2019)
21. Ihler, A., McAllester, D.: Particle belief propagation. In: Artificial Intelligence and Statistics (2009)
22. Jain, A., et al.: Discrete residual flow for probabilistic pedestrian behavior prediction. arXiv (2019)
23. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv (2013)
24. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: CVPR (2019)
25. Lee, N., Choi, W., Vernaza, P., Choy, C.B., Torr, P.H., Chandraker, M.: Desire: distant future prediction in dynamic scenes with interacting agents. In: CVPR (2017)
26. Li, L., et al.: End-to-end contextual perception and prediction with interaction transformer. In: IROS (2020)
27. Liang, M., et al.: Learning lane graph representations for motion forecasting. In: ECCV (2020)
28. Liang, M., et al.: PnPNet: end-to-end perception and prediction with tracking in the loop. In: CVPR (2020)
29. Luo, W., Yang, B., Urtasun, R.: Fast and furious: real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net
30. Manivasagam, S., et al.: LiDARsim: realistic lidar simulation by leveraging the real world. In: CVPR (2020)
31. Marcos, D., et al.: Learning deep structured active contours end-to-end. In: CVPR (2018)
32. Min Choi, H., Kang, H., Hyun, Y.: Multi-view reprojection architecture for orientation estimation. In: ICCV (2019)
33. Montemerlo, M., et al.: Junior: the Stanford entry in the urban challenge. J. Field Robot. (2008)
34. Müller, M., Dosovitskiy, A., Ghanem, B., Koltun, V.: Driving policy transfer via modularity and abstraction. arXiv (2018)
35. Murphy, K.P., Weiss, Y., Jordan, M.I.: Loopy belief propagation for approximate inference: an empirical study. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (1999)
36. Phan-Minh, T., Grigore, E.C., Boulton, F.A., Beijbom, O., Wolff, E.M.: CoverNet: multimodal behavior prediction using trajectory sets. In: CVPR (2020)
37. Pomerleau, D.A.: ALVINN: an autonomous land vehicle in a neural network. In: NeurIPS (1989)
38. Rhinehart, N., Kitani, K.M., Vernaza, P.: r2p2: a reparameterized pushforward policy for diverse, precise generative path forecasting. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 794–811. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_47
39. Rhinehart, N., McAllister, R., Kitani, K., Levine, S.: PRECOG: prediction conditioned on goals in visual multi-agent settings. arXiv (2019)
40. Sadat, A., Casas, S., Ren, M., Wu, X., Dhawan, P., Urtasun, R.: Perceive, predict, and plan: safe motion planning through interpretable semantic representations. In: ECCV (2020)
41. Sadat, A., Ren, M., Pokrovsky, A., Lin, Y.C., Yumer, E., Urtasun, R.: Jointly learnable behavior and trajectory planning for self-driving vehicles. arXiv (2019)
42. Sadeghian, A., Legros, F., Voisin, M., Vesel, R., Alahi, A., Savarese, S.: CAR-Net: clairvoyant attentive recurrent network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 162–180. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_10
43. Schwing, A.G., Urtasun, R.: Fully connected deep structured networks. arXiv (2015)
44. Sohn, K., Lee, H., Yan, X.: Learning structured output representation using deep conditional generative models. In: NeurIPS (2015)
45. Sudderth, E.B., Ihler, A.T., Isard, M., Freeman, W.T., Willsky, A.S.: Nonparametric belief propagation. Commun. ACM (2010)
46. Tang, Y.C., Salakhutdinov, R.: Multiple futures prediction. arXiv (2019)
47. Wang, T.H., Manivasagam, S., Liang, M., Yang, B., Zeng, W., Urtasun, R.: V2VNet: vehicle-to-vehicle communication for joint perception and prediction. In: ECCV (2020)
48. Weiss, Y., Pearl, J.: Belief propagation: technical perspective. Commun. ACM (2010)
49. Wulfmeier, M., Ondruska, P., Posner, I.: Maximum entropy deep inverse reinforcement learning. arXiv (2015)
50. Yamaguchi, K., Hazan, T., McAllester, D., Urtasun, R.: Continuous Markov random fields for robust stereo estimation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 45–58. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_4
51. Yamaguchi, K., McAllester, D., Urtasun, R.: Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 756–771. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_49
52. Yang, B., Luo, W., Urtasun, R.: Pixor: real-time 3D object detection from point clouds
53. Yedidia, J.S., Freeman, W.T., Weiss, Y.: Understanding belief propagation and its generalizations. In: Exploring Artificial Intelligence in the New Millennium (2003)
54. Zeng, W., Luo, W., Suo, S., Sadat, A., Yang, B., Casas, S., Urtasun, R.: End-to-end interpretable neural motion planner. In: CVPR (2019)
55. Zhai, S., Cheng, Y., Lu, W., Zhang, Z.: Deep structured energy based models for anomaly detection. In: ICML (2016)
56. Zhan, W., Liu, C., Chan, C.Y., Tomizuka, M.: A non-conservatively defensive strategy for urban autonomous driving. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC) (2016)
57. Zhao, T., et al.: Multi-agent tensor fusion for contextual trajectory prediction. In: CVPR (2019)
58. Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: CVPR (2018)
59. Zhu, B., Jiang, Z., Zhou, X., Li, Z., Yu, G.: Class-balanced grouping and sampling for point cloud 3D object detection. arXiv (2019)
60. Ziebart, B.D., Maas, A.L., Bagnell, J.A., Dey, A.K.: Maximum entropy inverse reinforcement learning. In: AAAI (2008)
61. Ziegler, J., Bender, P., Dang, T., Stiller, C.: Trajectory planning for Bertha–a local, continuous method. In: Intelligent Vehicles Symposium Proceedings, 2014 IEEE (2014)

Author information

Authors and Affiliations

Uber ATG, Pittsburgh, USA
Wenyuan Zeng, Shenlong Wang, Renjie Liao, Yun Chen, Bin Yang & Raquel Urtasun
University of Toronto, Toronto, Canada
Wenyuan Zeng, Shenlong Wang, Renjie Liao, Bin Yang & Raquel Urtasun

Corresponding author

Correspondence to Wenyuan Zeng.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1928 KB)

Supplementary material 2 (mp4 51724 KB)

About this paper

Cite this paper

Zeng, W., Wang, S., Liao, R., Chen, Y., Yang, B., Urtasun, R. (2020). DSDNet: Deep Structured Self-driving Network. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_10

DOI: https://doi.org/10.1007/978-3-030-58589-1_10
Published: 12 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58588-4
Online ISBN: 978-3-030-58589-1
eBook Packages: Computer Science, Computer Science (R0)
