
Visual SLAM and Structure from Motion in Dynamic Environments: A Survey

Published: 20 February 2018

Abstract

In the last few decades, Structure from Motion (SfM) and visual Simultaneous Localization and Mapping (visual SLAM) techniques have gained significant interest from both the computer vision and robotics communities. Many variants of these techniques have started to make an impact in a wide range of applications, including robot navigation and augmented reality. Despite some remarkable results in these areas, most SfM and visual SLAM techniques assume that the observed environment is static, so moving objects in the scene can jeopardize overall system accuracy. In this article, we present for the first time a survey of visual SLAM and SfM techniques that are targeted toward operation in dynamic environments. We identify three main problems: how to perform reconstruction (robust visual SLAM), how to segment and track dynamic objects, and how to achieve joint motion segmentation and reconstruction. Based on this categorization, we provide a comprehensive taxonomy of existing approaches. Finally, the advantages and disadvantages of each solution class are critically discussed from the perspective of practicality and robustness.

Reviews

Giuseppina Carla Gini

Reconstructing an environment's 3D models is traditionally a computer vision problem, crucial for virtual reality (VR) applications and mobile robots that have to estimate the pose of the camera that moves with them. Well-known vision methods, such as structure from motion (SfM), and robotics methods, such as visual simultaneous localization and mapping (SLAM), while effective in static environments, still face challenges in dynamic environments. This survey illustrates the state of the art of vision and robotics methods for real-time rendering in real-world environments containing dynamic objects. It proposes a taxonomy of the available approaches divided into three main themes: building static maps by rejecting dynamic features (robust visual SLAM), extracting moving objects while ignoring the static background (dynamic object segmentation and 3D tracking), and simultaneously handling the static and dynamic components of the world (joint motion segmentation and reconstruction). It also critically discusses the advantages and disadvantages of the many illustrated approaches, which rely on methods spanning from geometry to statistics to machine learning. The authors nicely organize about 200 references, using figures with flow diagrams and summarizing the existing approaches in tables. The paper can serve as an introduction for researchers new to the field, as well as a practical guide to specific approaches for application-oriented developers.

Information

Published In

ACM Computing Surveys, Volume 51, Issue 2
March 2019
748 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3186333
  • Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2018
Accepted: 01 December 2017
Revised: 01 December 2017
Received: 01 August 2017
Published in CSUR Volume 51, Issue 2

Author Tags

  1. 3D reconstruction
  2. 3D tracking
  3. Structure from motion
  4. deep learning
  5. dynamic environments
  6. dynamic object segmentation
  7. motion segmentation
  8. visual SLAM
  9. visual odometry

Qualifiers

  • Survey
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 644
  • Downloads (Last 6 weeks): 57
Reflects downloads up to 15 Jan 2025

Citations

Cited By

  • (2025) RDynaSLAM: Fusing 4D Radar Point Clouds to Visual SLAM in Dynamic Environments. Journal of Intelligent & Robotic Systems 111:1. DOI: 10.1007/s10846-024-02204-1. Online publication date: 4-Jan-2025
  • (2024) A Review of SLAM Research on Mobile Robot Vision. Frontiers in Science and Engineering 4:11 (17-23). DOI: 10.54691/gcwwng31. Online publication date: 24-Nov-2024
  • (2024) Three-Dimensional Dense Reconstruction: A Review of Algorithms and Datasets. Sensors 24:18 (5861). DOI: 10.3390/s24185861. Online publication date: 10-Sep-2024
  • (2024) A Comparative Review on Enhancing Visual Simultaneous Localization and Mapping with Deep Semantic Segmentation. Sensors 24:11 (3388). DOI: 10.3390/s24113388. Online publication date: 24-May-2024
  • (2024) Simultaneous localization and mapping in a multi-robot system in a dynamic environment with unknown initial correspondence. Frontiers in Robotics and AI 10. DOI: 10.3389/frobt.2023.1291672. Online publication date: 11-Jan-2024
  • (2024) Optimisation of key algorithms for vision-based SLAM in highly dynamic environments. Applied Mathematics and Nonlinear Sciences 9:1. DOI: 10.2478/amns-2024-3631. Online publication date: 29-Nov-2024
  • (2024) Dynamic Object Detection and Tracking in Vision SLAM. Applied Mathematics and Nonlinear Sciences 9:1. DOI: 10.2478/amns-2024-1174. Online publication date: 22-May-2024
  • (2024) Accurate localization of indoor high similarity scenes using visual slam combined with loop closure detection algorithm. PLOS ONE 19:12 (e0312358). DOI: 10.1371/journal.pone.0312358. Online publication date: 30-Dec-2024
  • (2024) Visual place recognition with fusion event cameras. Journal of Image and Graphics 29:4 (1018-1029). DOI: 10.11834/jig.230003. Online publication date: 2024
  • (2024) Generative-AI based Map Representation and Localization. Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Advances in Urban-AI (34-42). DOI: 10.1145/3681780.3697276. Online publication date: 29-Oct-2024
