
Relative Pose Estimation From Image Correspondences Under a Remote Center of Motion Constraint

2018, IEEE Robotics and Automation Letters

Francisco Vasconcelos, Evangelos Mazomenos, John Kelly, Sebastien Ourselin, and Danail Stoyanov

Abstract—This letter proposes an algorithm to estimate the relative pose between two image viewpoints assuming that a camera is moving under a remote center of motion constraint. This is useful in minimally invasive robotic surgery, where the motion of a laparoscopic camera is constrained by the keyhole insertion point. Our method uses point correspondences between the two images and does not require any knowledge about the position of the remote center of motion. The pipeline consists of a 4-point minimal closed-form solver, used within a robust RANSAC framework to filter outlier correspondences, followed by a Levenberg-Marquardt refinement step. Our method compares favorably against the classic relative pose solution for unconstrained motion (the 5-point algorithm), both with synthetic data and with real footage of endoscopic robotic surgery.

Index Terms—Surgical robotics: laparoscopy, visual-based navigation.

Manuscript received September 10, 2017; accepted February 5, 2018. Date of publication February 27, 2018; date of current version May 8, 2018. This letter was recommended for publication by Associate Editor I. I. Iordachita and Editor K. Masamune upon evaluation of the reviewers' comments. This work was supported in part by the Wellcome Trust [WT101957] through an Innovative Engineering for Health award and in part by the Engineering and Physical Sciences Research Council (EPSRC) [NS/A000027/1]. (Corresponding author: Francisco Vasconcelos.) F. Vasconcelos, E. Mazomenos, S. Ourselin, and D. Stoyanov are with the Centre for Medical Image Computing (CMIC), University College London, London W1W 7TS, U.K. (e-mail: v.vasconcelos@ucl.ac.uk; e.mazomenos@ucl.ac.uk; s.ourselin@ucl.ac.uk; danail.stoyanov@ucl.ac.uk). J. Kelly is with the Division of Surgery and Interventional Science, University College London, London NW3 2PS, U.K. (e-mail: j.d.kelly@ucl.ac.uk). Digital Object Identifier 10.1109/LRA.2018.2809617

Fig. 1. Image guided minimally invasive procedure performed with a surgical robot. The motion of both the camera and the tools is constrained by the trocar placement.

I. INTRODUCTION

RELATIVE pose estimation between two image views is a common way to obtain visual odometry of a moving camera sensor [1]. It is a basic component of more complex navigation and 3D reconstruction systems such as Simultaneous Localisation and Mapping (SLAM) [2] or Structure-from-Motion (SfM) [3]. The classic relative pose problem, considering six degree-of-freedom unconstrained motion in 3D space, has been widely validated in computer vision and robotics applications [4]. When prior knowledge about the camera motion is available, other formulations have been proposed that add constraints to reduce the number of pose parameters to be estimated [5]–[8]. These algorithms generally outperform the general relative pose solution in their respective domains.

In this letter we address the relative pose estimation problem in the context of image guided minimally invasive surgery. In this type of procedure, the surgical tools are manipulated through trocars that are placed in small incisions on the patient (Fig. 1), and are guided by an endoscopic camera that is also inserted through a trocar.
Due to this set-up, the camera motion is bounded by the trocar placement in a way that is usually modelled by a remote center of motion constraint [9], i.e., the endoscope must always intersect the 3D point where the trocar is located. This means that the endoscope motion has only 4 degrees of freedom: three rotation parameters and a single translation component. Some minimally invasive procedures are currently performed with a surgical robot (e.g., prostatectomy [10]) that enforces the trocar motion constraints by assuming a static remote center of motion [11], [12]. Given that in practice a trocar is not strictly static due to patient motion or breathing, some approaches propose a more flexible kinematic control by incorporating force feedback [13]. There have been previous works on localisation problems related to a center of motion constraint, including trocar localisation and detection from the robot kinematics [14], [15], and tool pose tracking under remote center of motion constraints [16]. However, to the best of our knowledge, the relative pose problem between two camera views under a remote center of motion constraint has not been previously addressed. A solution to this problem can be useful for tackling multiple problems in image guided surgery, including real-time localisation of the endoscopic camera and the surgical tools, accurate 3D reconstruction of the anatomical site, as well as 3D registration with pre-operative imaging.

With a surgical robot, the camera motion can be estimated through the kinematic chain of the manipulator holding the camera. However, visual localisation across multiple views is still necessary for representing both the camera and the human anatomy in the same reference frame, or whenever robot hand-eye calibration is challenging in the surgical setting. An alternative is to estimate the camera motion directly from the change of perspective between different frames using a Structure-from-Motion or SLAM approach [17]. This is a very challenging task, as reliable motion estimation requires that a sufficiently descriptive part of the scene remains static across different frames. This contrasts with the highly dynamic nature of most surgical scenes, which include deformable tissue and moving tools. In this letter we use the remote center of motion constraints to relax the strict requirements on both the quantity and the quality of the static image features needed to estimate an accurate relative pose between two views.

The contributions of this letter are summarised as follows:

- A simplified model (aligned axis assumption) for expressing the remote center of motion constraints as a single linear equation in terms of one essential matrix parameter, or as a quadratic equation in terms of translation and rotation parameters.
- Formulation of the relative camera pose problem with remote center of motion constraints, leading to a minimal solution that requires only 4 point correspondences between two images.
- Comparison between our algorithm and the classic 5-point relative pose solution for unconstrained motion [4]. Our solution outperforms the 5-point algorithm with both synthetic data and real video footage from a radical prostatectomy procedure performed with the da Vinci surgical robot [18].
- Robustness evaluation of our algorithm when the aligned axis assumption is not strictly verified. In simulation, our solution shows no signs of degradation for moderate deviations from the aligned axis assumption, considering motions where there is sufficient field of view overlap between the two views. With real data, our model is a sufficiently good approximation for our 4-point solution to work, even though a stereo camera that does not conform to the aligned axis assumption is used.

II. NOTATION

Scalars are represented by plain letters, e.g., λ, vectors are indicated by bold symbols, e.g., t, and matrices are denoted by letters in sans serif font, e.g., T. 2D points and lines are expressed in homogeneous coordinates as 3 × 1 vectors. The operator [v]× designates the 3 × 3 skew-symmetric matrix of a 3 × 1 vector v, such that [v]× x = v × x.

Fig. 2. Remote center of motion formulation under the aligned axis assumption.

III. PROBLEM FORMULATION

Consider a rigid endoscopic camera with known intrinsic parameters being inserted into a patient through a keyhole incision point O (Fig. 2). We aim at estimating the relative pose with rotation R and translation t,

$$T = T_1 T_2^{-1} = \begin{bmatrix} R & \mathbf{t} \\ 0 & 1 \end{bmatrix} \quad (1)$$

when the endoscope moves between the world-to-camera transformations T_1 and T_2, given a set of pairwise point correspondences (x_i, x'_i) between the two views. We start by briefly reviewing the classic relative pose estimation with unconstrained motion, and then we introduce the remote center of motion constraint defined by the point O.

A. Unconstrained Relative Pose

Two image point correspondences x_i, x'_i that represent the same 3D point X under two different calibrated views are related by the epipolar constraint

$$\mathbf{x}_i'^T E \mathbf{x}_i = 0 \quad (2)$$

where the essential matrix [19]

$$E = [\mathbf{t}]_\times R \quad (3)$$

must verify the following cubic relations:

$$E E^T E - \tfrac{1}{2}\,\mathrm{trace}(E E^T)\,E = 0, \qquad \det E = 0 \quad (4)$$

The essential matrix E has 5 degrees of freedom and can be estimated from a minimum of 5 point correspondences [4]. Although multiple implementations of the 5-point algorithm exist, they typically proceed as follows: first, a 4-dimensional linear solution subspace for E is generated from 5 or more instances of (2); then, the 4 remaining unknown parameters are determined by solving the cubic system of ten equations (4). This procedure generates up to 10 algebraic solutions for the matrix E, which can only be disambiguated by verifying the epipolar consistency (2) of at least 6 correspondences. Finally, a rotation R and an up-to-scale translation t can be uniquely factorised from E [19]. Additionally, the performance of the 5-point algorithm is greatly enhanced by using it within a RANSAC framework [20] for outlier filtering, followed by an iterative Levenberg-Marquardt refinement step that minimises the re-projection error of inlier correspondences.
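As a sanity check on this classical machinery, the following numpy sketch (our illustration, not code from the letter) builds an essential matrix via (3) and verifies both the epipolar relation (2) and the cubic constraints (4); both identities hold exactly for any matrix of the form [t]×R:

```python
import numpy as np

def skew(v):
    """[v]x such that skew(v) @ x == np.cross(v, x)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rodrigues(axis, angle):
    """Rotation matrix from an axis-angle pair (Rodrigues' formula)."""
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    A = skew(a)
    return np.eye(3) + np.sin(angle) * A + (1 - np.cos(angle)) * (A @ A)

R = rodrigues([0.2, -0.3, 0.1], 0.15)   # an arbitrary small rotation
t = np.array([0.4, -0.1, 0.25])         # an arbitrary translation
E = skew(t) @ R                         # essential matrix, eq. (3)

# Epipolar constraint (2): project one 3D point into both views.
X = np.array([0.3, -0.2, 5.0])          # point in view-1 camera coordinates
Xp = R @ X + t                          # the same point in view-2 coordinates
x, xp = X / X[2], Xp / Xp[2]            # normalised homogeneous image points
assert abs(xp @ E @ x) < 1e-12

# Cubic constraints (4).
C = E @ E.T @ E - 0.5 * np.trace(E @ E.T) * E
assert np.allclose(C, 0, atol=1e-12) and abs(np.linalg.det(E)) < 1e-12
```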
B. Remote Center of Motion Constraint

Consider now that the endoscope motion is constrained such that it must go through the remote center of motion O. To model this constraint we work under the following assumption: for any possible camera pose, the optical axis of the endoscopic camera intersects the remote center of motion O (Fig. 2). In the context of this letter we designate this as the aligned axis assumption. Note that this assumption might not be strictly verified in practice with a real endoscope; however, we leave the discussion of its validity for later sections.

The remote center of motion constraint under the aligned axis assumption is a generalisation of the spherical camera motion modelled in [5] to the case of a varying sphere radius, and we follow an analogous strategy to derive our formulation. Consider that O is the origin of the world reference frame W. From the aligned axis assumption, it follows that any transformation T_i between W and the camera reference frame can be represented as

$$T_i = \begin{bmatrix} R_i & \mathbf{z}_i \\ 0 & 1 \end{bmatrix} \quad (5)$$

where the translation z_i = (0 0 z_i)^T has only one degree of freedom and represents the distance between O and the principal point of the camera. Consider now a camera motion between transformations T_1 and T_2 (Fig. 2). Assume, without loss of generality, that

$$T_1 = \begin{bmatrix} I & \mathbf{z}_1 \\ 0 & 1 \end{bmatrix}, \qquad T_2 = \begin{bmatrix} R^T & \mathbf{z}_2 \\ 0 & 1 \end{bmatrix} \quad (6)$$

with z_1 = (0 0 z_1)^T, z_2 = (0 0 z_2)^T, and I being the 3 × 3 identity matrix. The relative pose T between the two views becomes

$$T = T_1 T_2^{-1} = \begin{bmatrix} R & \mathbf{z}_1 - R\mathbf{z}_2 \\ 0 & 1 \end{bmatrix} \quad (7)$$

By substituting this into (3), the essential matrix under the remote center of motion constraint has the format

$$E = [\mathbf{z}_1 - R\mathbf{z}_2]_\times R = [\mathbf{z}_1]_\times R - R[\mathbf{z}_2]_\times \quad (8)$$

$$E = \begin{pmatrix} -r_{1,2}z_2 - r_{2,1}z_1 & r_{1,1}z_2 - r_{2,2}z_1 & -r_{2,3}z_1 \\ r_{1,1}z_1 - r_{2,2}z_2 & r_{1,2}z_1 + r_{2,1}z_2 & r_{1,3}z_1 \\ -r_{3,2}z_2 & r_{3,1}z_2 & 0 \end{pmatrix} \quad (9)$$

where r_{i,j} is the element in the ith row and jth column of R. It follows that the remote center of motion, under the aligned axis assumption, is constrained by

$$e_{3,3} = 0 \quad (10)$$

where e_{3,3} is the element in the third row and third column of E. This constraint can also be represented in terms of translation and rotation as

$$r_{2,3}t_1 - r_{1,3}t_2 = 0 \quad (11)$$

where t_1 and t_2 are the first and second components of the relative translation t.

IV. 4-POINT MINIMAL SOLUTION

The constraint in (10) eliminates one degree of freedom of the essential matrix E, which can therefore be estimated minimally from 4 point correspondences instead of the 5 required for unconstrained motion. Given the simplicity of the remote center of motion constraint, we propose a 4-point relative pose algorithm that closely mirrors its 5-point counterpart. Given a set of N ≥ 4 correspondences (x_i, x'_i), up to 10 algebraic solutions for the relative pose can be obtained as follows:

1) Build a linear system by stacking 4 or more instances of (2) in terms of the 8 up-to-scale unknown parameters e_{i,j} of the essential matrix:

$$\begin{pmatrix} x_{1,1}\mathbf{x}_1'^T & x_{1,2}\mathbf{x}_1'^T & x_{1,1}'x_{1,3} & x_{1,2}'x_{1,3} \\ x_{2,1}\mathbf{x}_2'^T & x_{2,2}\mathbf{x}_2'^T & x_{2,1}'x_{2,3} & x_{2,2}'x_{2,3} \\ \vdots & \vdots & \vdots & \vdots \end{pmatrix} \begin{pmatrix} e_{1,1} \\ e_{2,1} \\ e_{3,1} \\ e_{1,2} \\ e_{2,2} \\ e_{3,2} \\ e_{1,3} \\ e_{2,3} \end{pmatrix} = 0 \quad (12)$$

2) Determine a 4-dimensional linear solution subspace of (12) using the SVD. This defines

$$E = aE_1 + bE_2 + cE_3 + E_4 \quad (13)$$

where {E_1, E_2, E_3, E_4} is the linear basis of the solution and a, b, c are unknown parameters.

3) Substitute (13) into the cubic constraints (4), forming a polynomial system of 10 equations in the 3 unknowns a, b, c.

4) Solve the polynomial system using the action matrix method [21].

5) Factorise E into a rotation R and a translation t [19].
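To make the constraint and the linear stage of the solver concrete, the following numpy sketch (our illustration under the aligned axis assumption; the letter's own implementation is in Matlab) synthesises a motion satisfying (5)-(7), checks (10) and (11), and performs steps 1)-2) on a minimal set of 4 correspondences. The polynomial steps 3)-4) are omitted:

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rodrigues(axis, angle):
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    A = skew(a)
    return np.eye(3) + np.sin(angle) * A + (1 - np.cos(angle)) * (A @ A)

# Relative motion satisfying the aligned axis assumption, eqs. (5)-(7).
R = rodrigues([0.3, -0.2, 0.5], 0.2)
z1, z2 = 60.0, 75.0                                  # insertion depths (mm, arbitrary)
t = np.array([0, 0, z1]) - R @ np.array([0, 0, z2])  # eq. (7)
E = skew(t) @ R                                      # eq. (3)

assert abs(E[2, 2]) < 1e-9                           # RCM constraint, eq. (10)
assert abs(R[1, 2] * t[0] - R[0, 2] * t[1]) < 1e-9   # equivalent form, eq. (11)

# Steps 1)-2): stack eq. (12) for a minimal set of 4 correspondences.
rng = np.random.default_rng(0)
X = rng.uniform(-30, 30, (4, 3)) + [0, 0, 200.0]     # 3D points, first view's frame
x = X / X[:, 2:3]                                    # normalised points x_i
Xp = X @ R.T + t
xp = Xp / Xp[:, 2:3]                                 # normalised points x'_i

A = np.array([np.concatenate([p[0] * q, p[1] * q, [p[2] * q[0], p[2] * q[1]]])
              for p, q in zip(x, xp)])               # one row of eq. (12) per match
basis = np.linalg.svd(A)[2][-4:]                     # 4-dim subspace, eq. (13)

# Sanity check: the true E (minus its zero e_{3,3} entry) lies in that subspace.
e = E.flatten(order="F")[:8] / np.linalg.norm(E)
assert np.allclose(basis.T @ (basis @ e), e, atol=1e-6)
```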
V. RELATIVE POSE ESTIMATION PIPELINE

Our relative pose estimation pipeline follows the same structure as the traditional pipeline for unconstrained motion built around the 5-point algorithm [4]. The 4-point minimal solution is used within a RANSAC framework to remove outlier correspondences. Considering a camera with known intrinsics K, the result is then refined with Levenberg-Marquardt non-linear optimisation by minimising the distances r_i, r'_i, in pixels, between the image point correspondences (x_i, x'_i) and their corresponding epipolar lines (l_i = E^T x'_i, l'_i = E x_i). The epipolar distances r_i, r'_i are equivalent to the residue of (2) when normalised to the pixel units of each camera view. The rotation is parametrised as a quaternion q, while the translation is parametrised by only two of its components, t_2 and t_3; the missing component t_1 is implicitly defined by the remote center of motion constraint (11). The refinement solves

$$\min_{\mathbf{q},\, t_2,\, t_3} \; \sum_{i=1}^{N} \|r_i\|^2 + \|r_i'\|^2 \quad (14)$$

with

$$r_i = K\mathbf{d}_i \frac{|\mathbf{x}_i^T \mathbf{l}_i|}{\|I_{2\times 3}\,\mathbf{l}_i\|}, \qquad r_i' = K\mathbf{d}_i' \frac{|\mathbf{x}_i'^T \mathbf{l}_i'|}{\|I_{2\times 3}\,\mathbf{l}_i'\|} \quad (15)$$

where d_i and d'_i are unit 2D homogeneous vectors orthogonal to the epipolar lines l_i and l'_i, respectively. To ensure valid rotations, the rotation quaternion is scaled to unit norm each time the epipolar distances are computed.

VI. COMMENTS ON THE ALIGNED AXIS ASSUMPTION

In this section we discuss the limits of the aligned axis assumption and its impact on the applicability of our 4-point algorithm. First, note that we make no assumption on the location of the remote center of motion and consider only two frames; our problem is therefore equivalent to estimating the relative pose between any two cameras whose optical axes intersect. A pure translation between two camera views (parallel optical axes) can also be estimated with our algorithm, since parallel axes intersect at infinity in projective space and the corresponding essential matrix has the format

$$E = [\mathbf{t}]_\times = \begin{pmatrix} 0 & -t_3 & t_2 \\ t_3 & 0 & -t_1 \\ -t_2 & t_1 & 0 \end{pmatrix} \quad (16)$$

which verifies (10). On the other hand, pure rotations are degenerate configurations, with all elements of the essential matrix close to zero.

We now consider the remote center of motion constraint in cases where the aligned axis assumption is not verified. When there is a translation offset (Fig. 3(a)), the camera axis is always tangent to a spherical surface with radius equal to the distance between the optical axis and the remote centre of motion. When there is a rotation offset (Fig. 3(b)), the camera axis goes through a sphere whose radius is defined by the maximum or minimum depth allowed by the surgical setup. Finally, we observe that two camera views that share a significant field of view overlap (and thus are broadly facing the same direction) generally have optical axes that are very close to intersecting at a certain point (displayed in red in Fig. 3). In the experimental section we validate our algorithm using a stereo camera pair with a baseline of approximately 5 mm, so both cameras correspond to a configuration similar to Fig. 3(a).

Fig. 3. Remote center of motion configurations that do not verify the aligned axis assumption, due to translation and rotation offsets. Within a broad range of motions, two views are still close to having intersecting optical axes at a given point (red dots). Offsets are exaggerated for visualisation purposes.
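For reference, the per-correspondence residual of the refinement cost (14)-(15) can be coded compactly. The sketch below is our reading of those formulas (an illustration, not the authors' implementation), assuming a single intrinsic matrix K shared by both views as in Section V:

```python
import numpy as np

def epipolar_residuals(E, K, x, xp):
    """Pixel residuals ||r_i||, ||r'_i|| of eq. (15).

    E     : 3x3 essential matrix
    K     : 3x3 intrinsic matrix
    x, xp : Nx3 normalised homogeneous correspondences (x_i, x'_i)
    The sum of squares of the returned array is the cost of eq. (14).
    """
    out = []
    for a, b in zip(x, xp):
        for pt, ln in ((a, E.T @ b), (b, E @ a)):   # l_i = E^T x'_i, l'_i = E x_i
            n = np.hypot(ln[0], ln[1])              # ||I_{2x3} l||
            d = np.array([ln[0], ln[1], 0.0]) / n   # unit normal to the epipolar line
            out.append(np.linalg.norm(K @ d) * abs(pt @ ln) / n)
    return np.array(out)
```

Inside the Levenberg-Marquardt loop, E would be rebuilt at each iteration from the unit quaternion q and from the translation components, with t_1 recovered from (11).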
VII. EXPERIMENTAL RESULTS

We compare our 4-point algorithm against the 5-point algorithm for unconstrained motion. Although publicly available implementations exist, we implemented the 5-point algorithm using the action matrix method [21] in order to use the same methodology as in our 4-point implementation. Both algorithms are tested on synthetic data and on real video footage from a radical prostatectomy. We also validate the robustness of our method when the aligned axis assumption is not verified.

A. Simulation

A simulator was designed to approximate the imaging conditions of a surgical robot. We consider a pinhole camera with resolution 1920 × 1080 and intrinsic parameters

$$K = \begin{pmatrix} 1500 & 0.01 & 800 \\ 0 & 1400 & 600 \\ 0 & 0 & 1 \end{pmatrix} \quad (17)$$

A set of 3D points is randomly generated within a 60 mm cube. The remote center of motion is set at a distance of 200 mm from the center of mass of the scene 3D points. Camera poses are randomly generated within a distance interval between 40 and 80 mm from the remote center of motion, while the rotation is generated within the maximum range that keeps all 3D points visible in the images.

We start by analysing the behaviour of the minimal 4-point solver with noise-free data in 100 random trials. Fig. 4(a) displays the epipolar error, as defined in (15), while Fig. 4(b) displays the computation time.

Fig. 4. Minimal solver simulation results for 100 trials, using minimal data and no noise. Both the 4-point and 5-point algorithms are pure Matlab implementations. The computation times were obtained on a MacBook Pro (Mid 2015) with a 2.5 GHz Intel Core i7.

We also compare both algorithms in the presence of point correspondences with 1 pixel variance Gaussian noise and non-minimal data. In this case we use the complete relative pose pipeline, including RANSAC and Levenberg-Marquardt optimisation, with a threshold of 1 pixel to filter outliers in RANSAC. We start by considering that the aligned axis assumption holds true (Fig. 5(a)). Because translations are estimated up to scale, we use the angle between the estimated and groundtruth translation vectors as the error metric. Our algorithm is consistently more accurate both in terms of rotation and translation (Fig. 5(c) and (d)). For 15 correspondences, it obtains median translation and rotation errors of 2.22 and 0.44 degrees respectively, while the 5-point solution reaches median errors of 2.91 and 0.64 degrees.

Fig. 5. Simulation results: distributions are Matlab boxplots, where the center mark is the median, the box limits are the 1st and 3rd quartiles, the whisker limits are minimum and maximum values, and cross marks are outliers. (a) Sample simulated camera motion under the aligned axis assumption. (b) Sample simulated camera motion with an axis offset (in both rotation and translation) that violates the aligned axis assumption (zoomed in detail). (c), (d) Translation and rotation errors under the aligned axis assumption for a varying number of point correspondences and 1 pixel noise. Translation error is the angle between groundtruth and estimated vectors. (e) Simulated distances between optical axes when the aligned axis assumption is not verified; this distance is the orthogonal Euclidean distance between lines in 3D space. (f) Simulated residues of e_{3,3} (10) when the aligned axis assumption is not verified. (g), (h) Translation and rotation errors for a varying axis offset (i.e., aligned axis assumption not true) in both translation and rotation, using 15 point correspondences with 1 pixel variance Gaussian noise.
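For readers wishing to reproduce this setup, a possible generator is sketched below (our code, not the authors'). The intrinsics, the 60 mm point cube, the 200 mm distance to the remote center of motion, and the 40-80 mm camera depths follow the text; the 10-degree cap on sampled rotations is our stand-in for the visibility condition described above:

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rodrigues(axis, angle):
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    A = skew(a)
    return np.eye(3) + np.sin(angle) * A + (1 - np.cos(angle)) * (A @ A)

K = np.array([[1500.0, 0.01, 800.0],
              [0.0, 1400.0, 600.0],
              [0.0, 0.0, 1.0]])                        # eq. (17)

rng = np.random.default_rng(42)
# 3D points in a 60 mm cube whose centre is 200 mm from the RCM (world origin).
X = rng.uniform(-30.0, 30.0, (15, 3)) + [0.0, 0.0, 200.0]

def rcm_pose():
    """World-to-camera pose of eq. (5): the optical axis passes through the origin."""
    R = rodrigues(rng.standard_normal(3), np.deg2rad(rng.uniform(0.0, 10.0)))
    z = np.array([0.0, 0.0, rng.uniform(40.0, 80.0)])  # 40-80 mm from the RCM
    return R, z

def project(R, z, X, noise=1.0):
    Xc = X @ R.T + z                                   # world -> camera, eq. (5)
    uv = (Xc / Xc[:, 2:3]) @ K.T                       # pinhole projection
    return uv[:, :2] + rng.normal(0.0, noise, (len(X), 2))

x1 = project(*rcm_pose(), X)
x2 = project(*rcm_pose(), X)   # (x1[i], x2[i]) are the correspondences fed to the solvers
```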
Finally, we add an offset rotation and translation with increasing magnitude to the simulated camera views, so that the aligned axis assumption is no longer true (Fig. 5(b)). The maximum tested offset of 2.5 mm corresponds to the expected scenario in our real experiment using data from a stereo camera with a 5 mm baseline between the left and right cameras. Fig. 5(e) and (f) quantify how this offset affects the aligned axis assumption. Fig. 5(e) displays the distance between the optical axes of both views. In line with our observations in Section VI, after an offset is applied the optical axes are still relatively close to intersecting, e.g., an offset of 2.5 mm and 0.5 degrees corresponds to a median distance between optical axes below 0.2 mm, and a maximum distance around 1 mm. Since our simulation guarantees sufficient overlap in the fields of view of both cameras, the motions are strictly restricted to a small working space, where the camera axes of both views remain close to intersecting. We expect this to be the case in a real scenario where point matches can be established between two views. Fig. 5(f) represents the groundtruth value of e_{3,3} (10) after normalising the essential matrix to unit Frobenius norm. Fig. 5(g) and (h) represent the translation and rotation errors of the 4-point and 5-point algorithms in these conditions. The 4-point algorithm shows no degradation in performance for an increasing offset, using 15 point correspondences and 1 pixel variance Gaussian noise.

B. Real Data

We compare the performance of the 4-point and 5-point algorithms when estimating a camera trajectory on video sequences from a radical prostatectomy performed with the da Vinci Si surgical robot. The camera is a stereo laparoscope, an interesting case for two reasons: 1) it allows us to test the robustness of our algorithm in a scenario where the aligned axis assumption is not verified by design of the scope; 2) in the absence of groundtruth motion data, we can evaluate the relative pose algorithms indirectly by measuring the discrepancy between the left/right trajectory estimations in terms of the left-to-right stereo transformation changes along the estimated trajectories.

Fig. 6. Stereo motion experiment. Left and right camera trajectories are estimated independently, except for the translation scale factor, which is obtained using a point cloud from sparse stereo triangulation. The trajectories are compared by measuring their consistency with the fixed and known stereo extrinsic calibration.

Fig. 7. Prostate stereo dataset captured with the da Vinci. (a)-(i) Results for the same forward motion trajectory when estimated using different frame steps: 2, 5, and 10, respectively, for each row. (j)-(l) Results for a circular motion trajectory. Left and right camera motions are estimated independently except for the translation scale. Only point matches between consecutive frames are considered, and no global refinement is performed. The stereo translation and rotation distributions represent the different transformations between left/right cameras obtained from the two independent trajectory estimations, while the green line represents the stereo transformation obtained from an independent offline stereo camera calibration.

We select two sequences from the procedure where there is significant camera motion. The first one is a forward motion (55 frames), and the second is a circular motion (86 frames).
Both videos contain two static surgical tools and a live tissue background presenting slight deformations over time. We use SIFT descriptors [22] to establish image point correspondences between different frames. The camera trajectories are estimated by successively applying a relative pose algorithm between frames at regular intervals. Note, however, that the monocular relative pose algorithms addressed in this letter only provide an up-to-scale translation. In order to find the correct scale, we compare 3D point reconstructions from a stereo pair with the correct baseline against 3D point clouds obtained from two consecutive frames. The scale estimation is the only step where stereo information is used. The consecutive up-to-scale relative poses are estimated independently for the left and right cameras. The complete pipeline for the stereo sequence experiment is summarised in Fig. 6. Although better trajectory estimations could obviously be obtained by incorporating stereo information during the relative pose estimation step, our main goal is to establish unbiased consistency metrics to compare the performance of the 5-point and 4-point algorithms.

An aspect that must be taken into account in this experiment is that relative pose estimation degenerates for very small translations [19]. This affects both the classic 5-point algorithm and our 4-point algorithm. To evaluate their breaking points we compare both algorithms on the forward motion sequence using different frame steps. We observe that when estimating the relative pose every two frames (Fig. 7(c)), both the 5-point and the 4-point algorithms perform badly, although the 4-point is able to hold an accurate trajectory for a longer period. Estimating the relative pose every five frames (Fig. 7(f)) represents the threshold at which our 4-point method starts working with greater stability, while the 5-point algorithm still fails at some frames. Finally, using every ten frames (Fig. 7(i)), both the 5-point and 4-point algorithms are able to estimate consistent trajectories, with our method taking a slight advantage. The circular motion is faster, and thus our 4-point algorithm is able to recover a consistent trajectory estimate for the whole duration of the sequence by estimating a relative pose every two frames (Fig. 7(l)). The 5-point algorithm, however, presents a significant discrepancy in terms of the stereo transformation between the two trajectories. In Fig. 8 we represent the circular trajectory with respect to the 3D scene as reconstructed from a stereo view.

Fig. 8. Circular motion trajectory represented in the same reference frame as a 3D stereo reconstruction of the scene.
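The scale-recovery step described above can be sketched as follows. This is a simplified reading with hypothetical helper names and plain linear triangulation; the letter specifies only that monocular and stereo point clouds are compared, so the details here are our assumptions:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of Nx3 normalised homogeneous correspondences."""
    out = []
    for a, b in zip(x1, x2):
        A = np.vstack([a[0] * P1[2] - P1[0], a[1] * P1[2] - P1[1],
                       b[0] * P2[2] - P2[0], b[1] * P2[2] - P2[1]])
        h = np.linalg.svd(A)[2][-1]                  # homogeneous 3D point
        out.append(h[:3] / h[3])
    return np.array(out)

def translation_scale(R, t_unit, x_prev, x_curr, X_stereo):
    """Scale factor for an up-to-scale relative translation t_unit.

    X_stereo: metric point cloud of the same features, triangulated from the
    calibrated stereo pair and expressed in the previous frame's camera frame.
    """
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t_unit.reshape(3, 1)])
    X_mono = triangulate(P1, P2, x_prev, x_curr)     # reconstruction, unit baseline
    # The monocular reconstruction grows linearly with the assumed baseline,
    # so a robust median of point-wise ratios recovers the metric scale.
    return np.median(np.linalg.norm(X_stereo, axis=1) /
                     np.linalg.norm(X_mono, axis=1))
```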
VIII. CONCLUSIONS

We propose a new algorithm for estimating the relative pose between two camera views under a remote center of motion constraint. We use the aligned axis assumption to greatly simplify the formulation, making our algorithm extremely simple to implement. Although the aligned axis assumption is not strictly verified in practice, in all our tests this did not stop our algorithm from outperforming the classic 5-point solution for unconstrained motion estimation. Although our current formulation enforces a remote center of motion constraint between two views, it requires neither a known position for the center nor the same point for different pairs of frames. Therefore, our method can be used for any problem where two consecutive views have intersecting camera axes. A trivial example is the planar 2D motion of a ground vehicle equipped with a non-tilted camera. Although it is possible that enforcing a fixed remote center over more than two frames could improve the estimation of motion sequences, it is yet unclear whether the current flexibility of our formulation copes better with moderate RCM motions. This trade-off requires further experiments to be properly evaluated.

A complete motion estimation pipeline for endoscopy cannot be built solely on a relative pose solution, since estimating the relative pose via the essential matrix is not adequate for motions with very small translations. As observed in Fig. 7(c), it is very challenging for our method (as well as the 5-point algorithm) to work reliably on video sequences with high frame rates. The next step is therefore to extend the remote center of motion constraint to other components of SfM/SLAM systems, such as the resectioning (PnP) problem [23], the relative pose between stereo pairs [24], and multi-view bundle adjustment [25].

REFERENCES

[1] D. Nistér, O. Naroditsky, and J. Bergen, "Visual odometry," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1, 2004, pp. I-652–I-659.
[2] J. Fuentes-Pacheco, J. Ruiz-Ascencio, and J. M. Rendón-Mancha, "Visual simultaneous localization and mapping: A survey," Artif. Intell. Rev., vol. 43, no. 1, pp. 55–81, 2015.
[3] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: Exploring photo collections in 3D," ACM Trans. Graph., vol. 25, no. 3, pp. 835–846, 2006.
[4] D. Nistér, "An efficient solution to the five-point relative pose problem," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 6, pp. 756–770, Jun. 2004.
[5] J. Ventura, "Structure from motion on a sphere," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 53–68.
[6] B. Li, L. Heng, G. H. Lee, and M. Pollefeys, "A 4-point algorithm for relative pose estimation of a calibrated camera with a known relative rotation angle," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2013, pp. 1595–1601.
[7] D. Scaramuzza, F. Fraundorfer, and R. Siegwart, "Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC," in Proc. IEEE Int. Conf. Robot. Autom., 2009, pp. 4293–4299.
[8] F. Fraundorfer, P. Tanskanen, and M. Pollefeys, "A minimal case solution to the calibrated relative pose problem for the case of two known orientation angles," in Proc. Eur. Conf. Comput. Vis., 2010, pp. 269–282.
[9] R. H. Taylor, J. Funda, D. D. Grossman, J. P. Karidis, and D. A. LaRose, "Remote center-of-motion robot for surgery," U.S. Patent 5,397,323, Mar. 14, 1995.
[10] A. Tewari et al., "Technique of da Vinci robot-assisted anatomic radical prostatectomy," Urology, vol. 60, no. 4, pp. 569–572, 2002.
[11] R. H. Taylor et al., "A telerobotic assistant for laparoscopic surgery," IEEE Eng. Med. Biol. Mag., vol. 14, no. 3, pp. 279–288, May/Jun. 1995.
[12] N. Aghakhani, M. Geravand, N. Shahriari, M. Vendittelli, and G. Oriolo, "Task control with remote center of motion constraint for minimally invasive robotic surgery," in Proc. IEEE Int. Conf. Robot. Autom., 2013, pp. 5807–5812.
[13] A. Krupa, C. Doignon, J. Gangloff, M. de Mathelin, L. Soler, and G. Morel, "Towards semi-autonomy in laparoscopic surgery through vision and force feedback control," in Experimental Robotics VII. Berlin, Germany: Springer-Verlag, 2001, pp. 189–198.
[14] L. Dong and G. Morel, "Robust trocar detection and localization during robot-assisted endoscopic surgery," in Proc. IEEE Int. Conf. Robot. Autom., 2016, pp. 4109–4114.
[15] Z. Wang et al., "Vision-based calibration of dual RCM-based robot arms in human-robot collaborative minimally invasive surgery," IEEE Robot. Autom. Lett., vol. 3, no. 2, pp. 672–679, Apr. 2018.
[16] C. Doignon, F. Nageotte, and M. de Mathelin, "Segmentation and guidance of multiple rigid objects for intra-operative endoscopic vision," in Dynamical Vision. Berlin, Germany: Springer-Verlag, 2007, pp. 314–327.
[17] P. Mountney, D. Stoyanov, A. Davison, and G.-Z. Yang, "Simultaneous stereoscope localization and soft-tissue mapping for minimal invasive surgery," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervention, 2006, pp. 347–354.
[18] G. H. Ballantyne and F. Moll, "The da Vinci telerobotic surgical system: The virtual operative field and telepresence surgery," Surgical Clinics, vol. 83, no. 6, pp. 1293–1304, 2003.
[19] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[20] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.
[21] M. Byröd, K. Josephson, and K. Åström, "Fast and stable polynomial equation solving and its application to computer vision," Int. J. Comput. Vis., vol. 84, no. 3, pp. 237–256, 2009.
[22] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, pp. 91–110, Nov. 2004.
[23] F. Moreno-Noguer, V. Lepetit, and P. Fua, "Accurate non-iterative O(n) solution to the PnP problem," in Proc. IEEE 11th Int. Conf. Comput. Vis., 2007, pp. 1–8.
[24] F. Vasconcelos and J. P. Barreto, "Towards a minimal solution for the relative pose between axial cameras," in Proc. Brit. Mach. Vis. Conf., 2013.
[25] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, "Bundle adjustment—A modern synthesis," in Proc. Int. Workshop Vis. Algorithms, 1999, pp. 298–372.