Abstract
In this paper, we propose the Deep Structured self-Driving Network (DSDNet), which performs object detection, motion prediction, and motion planning with a single neural network. Towards this goal, we develop a deep structured energy-based model which considers the interactions between actors and produces socially consistent multimodal future predictions. Furthermore, DSDNet explicitly exploits the predicted future distributions of actors to plan a safe maneuver by using a structured planning cost. Our sample-based formulation allows us to overcome the difficulty in probabilistic inference of continuous random variables. Experiments on a number of large-scale self-driving datasets demonstrate that our model significantly outperforms the state of the art.
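To illustrate the sample-based idea in the abstract, here is a minimal Python sketch, not the authors' implementation. It assumes that each actor's continuous trajectory space has been replaced by K sampled trajectories and that a network has already assigned an energy to each sample; the function names, energies, and collision costs are all hypothetical. Probabilities follow from a softmax over negative energies, and an expected planning cost is a weighted sum over samples. The pairwise interaction terms between actors that the structured model considers are omitted here.

```python
import math


def sample_probabilities(energies):
    """Turn per-sample energies into a categorical distribution.

    Uses p_k proportional to exp(-E_k), so lower energy means higher
    probability. Energies are shifted by their minimum for numerical
    stability; the shift cancels in the normalization.
    """
    shift = min(energies)
    weights = [math.exp(-(e - shift)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def expected_cost(probs, costs):
    """Expected cost of a candidate plan under the sample distribution."""
    return sum(p * c for p, c in zip(probs, costs))


# Hypothetical energies for K = 4 sampled future trajectories of one actor.
energies = [0.5, 2.0, 0.5, 4.0]
probs = sample_probabilities(energies)

# Hypothetical collision indicator of one ego plan against each sample
# (1.0 means the plan would collide with that sampled trajectory).
collision_costs = [0.0, 1.0, 0.0, 1.0]
risk = expected_cost(probs, collision_costs)

print(probs, risk)
```

Because the distribution is categorical over a finite sample set, normalization and expectations reduce to finite sums, which is what makes inference tractable in this simplified setting.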
Notes
- 1. We would like the samples to cover the original continuous space and have high recall with respect to the ground-truth future trajectories.
- 2.
- 3. We find that using only \(\mathcal {L}_{planning}\) without the other two terms prevents the model from learning reasonable detection and prediction.
- 4. Numbers are reported on the official validation split, since there is no joint detection and prediction benchmark.
- 5. [7] replaced the original encoder (taking the ground-truth detection and tracking as input) with a learned CNN that takes LiDAR as input for a fair comparison.
- 6. We conduct the comparison on the official validation split, as our model currently only focuses on vehicles while the testing benchmark is built for multi-class detection.
References
Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., Savarese, S.: Social LSTM: human trajectory prediction in crowded spaces. In: CVPR (2016)
Bandyopadhyay, T., Won, K.S., Frazzoli, E., Hsu, D., Lee, W.S., Rus, D.: Intention-aware motion planning. In: Frazzoli, E., Lozano-Perez, T., Roy, N., Rus, D. (eds.) Algorithmic Foundations of Robotics X. STAR, vol. 86, pp. 475–491. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36279-8_29
Belanger, D., McCallum, A.: Structured prediction energy networks. In: ICML (2016)
Bojarski, M., et al.: End to end learning for self-driving cars. arXiv (2016)
Buehler, M., Iagnemma, K., Singh, S.: The DARPA Urban Challenge: Autonomous Vehicles in City Traffic (2009)
Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. arXiv (2019)
Casas, S., Gulino, C., Liao, R., Urtasun, R.: Spatially-aware graph neural networks for relational behavior forecasting from sensor data. arXiv (2019)
Casas, S., Gulino, C., Suo, S., Luo, K., Liao, R., Urtasun, R.: Implicit latent variable model for scene-consistent motion forecasting. In: ECCV (2020)
Casas, S., Gulino, C., Suo, S., Urtasun, R.: The importance of prior knowledge in precise multimodal prediction. In: IROS (2020)
Casas, S., Luo, W., Urtasun, R.: IntentNet: learning to predict intention from raw sensor data. In: Proceedings of The 2nd Conference on Robot Learning (2018)
Chai, Y., Sapp, B., Bansal, M., Anguelov, D.: Multipath: multiple probabilistic anchor trajectory hypotheses for behavior prediction. arXiv (2019)
Chen, L.C., Schwing, A., Yuille, A., Urtasun, R.: Learning deep structured models. In: ICML (2015)
Codevilla, F., Müller, M., López, A., Koltun, V., Dosovitskiy, A.: End-to-end driving via conditional imitation learning. In: ICRA (2018)
Deo, N., Trivedi, M.M.: Convolutional social pooling for vehicle trajectory prediction. In: CVPR (2018)
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: Carla: an open urban driving simulator. arXiv (2017)
Fan, H., et al.: Baidu Apollo EM motion planner. arXiv (2018)
Graber, C., Meshi, O., Schwing, A.: Deep structured prediction with nonlinear output transformations. In: NeurIPS (2018)
Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., Alahi, A.: Social GAN: socially acceptable trajectories with generative adversarial networks. In: CVPR (2018)
Hardy, J., Campbell, M.: Contingency planning over probabilistic obstacle predictions for autonomous road vehicles. IEEE Trans. Robot. D (2013)
Hong, J., Sapp, B., Philbin, J.: Rules of the road: predicting driving behavior with a convolutional model of semantic interactions. In: CVPR (2019)
Ihler, A., McAllester, D.: Particle belief propagation. In: Artificial Intelligence and Statistics (2009)
Jain, A., et al.: Discrete residual flow for probabilistic pedestrian behavior prediction. arXiv (2019)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv (2013)
Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: CVPR (2019)
Lee, N., Choi, W., Vernaza, P., Choy, C.B., Torr, P.H., Chandraker, M.: Desire: distant future prediction in dynamic scenes with interacting agents. In: CVPR (2017)
Li, L., et al.: End-to-end contextual perception and prediction with interaction transformer. In: IROS (2020)
Liang, M., et al.: Learning lane graph representations for motion forecasting. In: ECCV (2020)
Liang, M., et al.: PnPNet: end-to-end perception and prediction with tracking in the loop. In: CVPR (2020)
Luo, W., Yang, B., Urtasun, R.: Fast and furious: real time end-to-end 3d detection, tracking and motion forecasting with a single convolutional net
Manivasagam, S., et al.: LiDARsim: realistic lidar simulation by leveraging the real world. In: CVPR (2020)
Marcos, D., et al.: Learning deep structured active contours end-to-end. In: CVPR (2018)
Min Choi, H., Kang, H., Hyun, Y.: Multi-view reprojection architecture for orientation estimation. In: ICCV (2019)
Montemerlo, M., et al.: Junior: the Stanford entry in the urban challenge. J. Field Robot. (2008)
Müller, M., Dosovitskiy, A., Ghanem, B., Koltun, V.: Driving policy transfer via modularity and abstraction. arXiv (2018)
Murphy, K.P., Weiss, Y., Jordan, M.I.: Loopy belief propagation for approximate inference: an empirical study. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (1999)
Phan-Minh, T., Grigore, E.C., Boulton, F.A., Beijbom, O., Wolff, E.M.: CoverNet: multimodal behavior prediction using trajectory sets. In: CVPR (2020)
Pomerleau, D.A.: ALVINN: an autonomous land vehicle in a neural network. In: NeurIPS (1989)
Rhinehart, N., Kitani, K.M., Vernaza, P.: r2p2: a reparameterized pushforward policy for diverse, precise generative path forecasting. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 794–811. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_47
Rhinehart, N., McAllister, R., Kitani, K., Levine, S.: PRECOG: prediction conditioned on goals in visual multi-agent settings. arXiv (2019)
Sadat, A., Casas, S., Ren, M., Wu, X., Dhawan, P., Urtasun, R.: Perceive, predict, and plan: Safe motion planning through interpretable semantic representations. In: ECCV (2020)
Sadat, A., Ren, M., Pokrovsky, A., Lin, Y.C., Yumer, E., Urtasun, R.: Jointly learnable behavior and trajectory planning for self-driving vehicles. arXiv (2019)
Sadeghian, A., Legros, F., Voisin, M., Vesel, R., Alahi, A., Savarese, S.: CAR-Net: clairvoyant attentive recurrent network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 162–180. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_10
Schwing, A.G., Urtasun, R.: Fully connected deep structured networks. arXiv (2015)
Sohn, K., Lee, H., Yan, X.: Learning structured output representation using deep conditional generative models. In: NeurIPS (2015)
Sudderth, E.B., Ihler, A.T., Isard, M., Freeman, W.T., Willsky, A.S.: Nonparametric belief propagation. Commun. ACM (2010)
Tang, Y.C., Salakhutdinov, R.: Multiple futures prediction. arXiv (2019)
Wang, T.H., Manivasagam, S., Liang, M., Yang, B., Zeng, W., Urtasun, R.: V2VNet: vehicle-to-vehicle communication for joint perception and prediction. In: ECCV (2020)
Weiss, Y., Pearl, J.: Belief propagation: technical perspective. Commun. ACM (2010)
Wulfmeier, M., Ondruska, P., Posner, I.: Maximum entropy deep inverse reinforcement learning. arXiv (2015)
Yamaguchi, K., Hazan, T., McAllester, D., Urtasun, R.: Continuous Markov random fields for robust stereo estimation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 45–58. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_4
Yamaguchi, K., McAllester, D., Urtasun, R.: Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 756–771. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_49
Yang, B., Luo, W., Urtasun, R.: Pixor: Real-time 3D object detection from point clouds
Yedidia, J.S., Freeman, W.T., Weiss, Y.: Understanding belief propagation and its generalizations. In: Exploring Artificial Intelligence in the New Millennium (2003)
Zeng, W., Luo, W., Suo, S., Sadat, A., Yang, B., Casas, S., Urtasun, R.: End-to-end interpretable neural motion planner. In: CVPR (2019)
Zhai, S., Cheng, Y., Lu, W., Zhang, Z.: Deep structured energy based models for anomaly detection. In: ICML (2016)
Zhan, W., Liu, C., Chan, C.Y., Tomizuka, M.: A non-conservatively defensive strategy for urban autonomous driving. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC) (2016)
Zhao, T., et al.: Multi-agent tensor fusion for contextual trajectory prediction. In: CVPR (2019)
Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: CVPR (2018)
Zhu, B., Jiang, Z., Zhou, X., Li, Z., Yu, G.: Class-balanced grouping and sampling for point cloud 3D object detection. arXiv (2019)
Ziebart, B.D., Maas, A.L., Bagnell, J.A., Dey, A.K.: Maximum entropy inverse reinforcement learning. In: AAAI (2008)
Ziegler, J., Bender, P., Dang, T., Stiller, C.: Trajectory planning for bertha–a local, continuous method. In: Intelligent Vehicles Symposium Proceedings, 2014 IEEE (2014)
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zeng, W., Wang, S., Liao, R., Chen, Y., Yang, B., Urtasun, R. (2020). DSDNet: Deep Structured Self-driving Network. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_10
DOI: https://doi.org/10.1007/978-3-030-58589-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58588-4
Online ISBN: 978-3-030-58589-1
eBook Packages: Computer Science, Computer Science (R0)