Euclidean Reconstruction of 3D Scene from 2D Images Following a Non-Rigid Transformation
FIELD OF THE INVENTION
The invention relates generally to the fields of photogrammetry and image processing, and more particularly to systems and methods for generating reconstructions of surface elements of three-dimensional objects in a scene from a set of two-dimensional images of the scene. The invention specifically provides an arrangement for facilitating such reconstruction using images recorded by, for example, a rig supporting a set of optical sensors, such as cameras, which record the scene, following a non-rigid transformation of the rig after the rig has been calibrated.
BACKGROUND OF THE INVENTION
Reconstruction of surface features of three-dimensional objects in a scene from a set of two-dimensional images of the scene has been the subject of research since the late 19th century. Such reconstruction may be useful in a number of areas, including obtaining information about physical (three-dimensional) characteristics of objects in the scene, such as determination of the actual three-dimensional shapes and volumes of the objects. Reconstruction has also recently become particularly important in, for example, computer vision and robotics. The geometric relation between three-dimensional objects and the images created by a simple image recorder such as a pin-hole camera (that is, a camera without a lens) is a source of information to facilitate a three-dimensional reconstruction. Current practical commercial systems for object reconstruction generally rely on reconstruction from aerial photographs or from satellite images. In both cases, one or more cameras are used which record images from two locations whose positions relative to a scene are precisely determinable. In reconstruction from aerial photographs, one or more cameras mounted on an airborne platform may be used; if one camera is used, information from objects on the ground whose relative positions are known can be used in the reconstruction, whereas if more cameras are used, the geometries of the cameras relative to each other are fixed in a known condition, which information can be used in the reconstruction. With satellites, the positions and orientations of the satellites can be determined with great accuracy, thereby providing the geometrical information required for reconstruction with corresponding precision. In any case, reconstruction of the desired objects shown in the images can be performed from two-dimensional photographic or video images taken from such an arrangement.
Generally, reconstruction methods are non-linear, and they generally do not behave well in the presence of errors in measurement of the various camera calibration parameters and in the images from which the objects are to be reconstructed. Conventional reconstruction methods rely on the successful decoupling of two sets of parameters known as intrinsic and extrinsic parameters. The extrinsic parameters are related to the external geometry or arrangement of the cameras, including the rotation and translation between the coordinate frame of one camera in relation to the coordinate frame of the second camera. The intrinsic parameters associated with each camera are related to the camera's internal geometry in a manner that describes a transformation between a virtual camera coordinate system and the true relationship between the camera's image plane and its center of projection (COP). The intrinsic parameters can be represented by the image's aspect ratio, the skew and the location of the principal point, that is, the location of the intersection of the camera's optical axis and the image plane. (Note that the camera's focal length is related to the identified intrinsic parameters, in particular the aspect ratio, and thus it need not be considered as a parameter.)
These intrinsic and extrinsic parameters are coupled together, and it is possible to recover the Euclidean three-dimensional structure of the scene depicted in two views only if these two sets of parameters can be decoupled. The precise manner in which the intrinsic and extrinsic parameters are coupled together is as follows. If the intrinsic parameters for the cameras are used to form respective three-by-three matrices M and M', and R and t represent the rotational and translational external parameters, then for points p and p' having coordinates p = (x, y, 1)^T and p' = (x', y', 1)^T ("T" represents the matrix transpose operation) representing the projection in the two images of a single point P in the scene, the coordinates of the points are related by

z'p' = z M'RM^{-1}p + M't    (1)

where z and z' represent respective depth values relative to the two camera locations.
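Equation (1) can be checked numerically. The following is a minimal Python sketch, assuming synthetic values for the intrinsic matrices M and M', the rotation R, the translation t, and a scene point P (none of these values come from the text):

```python
import numpy as np

# Synthetic two-camera setup (illustrative values only).
M = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])           # intrinsics of camera 1
Mp = np.array([[760.0,   5.0, 300.0],
               [  0.0, 780.0, 250.0],
               [  0.0,   0.0,   1.0]])          # intrinsics M' of camera 2
th = 0.1
R = np.array([[ np.cos(th), 0.0, np.sin(th)],
              [        0.0, 1.0,        0.0],
              [-np.sin(th), 0.0, np.cos(th)]])  # rotation between frames
t = np.array([0.5, -0.1, 0.05])                 # translation between frames

P = np.array([0.3, -0.2, 4.0])   # a scene point, in camera-1 coordinates

# Pinhole projections: z p = M P and z' p' = M' (R P + t).
v1 = M @ P
z, p = v1[2], v1 / v1[2]
v2 = Mp @ (R @ P + t)
zp, pp = v2[2], v2 / v2[2]

# Equation (1): z' p' = z M' R M^{-1} p + M' t
lhs = zp * pp
rhs = z * (Mp @ R @ np.linalg.inv(M) @ p) + Mp @ t
print(np.allclose(lhs, rhs))     # True
```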
In general, there are two conventional methods for reconstruction. In one method, the values of the internal parameters are determined by a separate and independent "internal camera calibration" procedure that relies on images of specialized patterns. In the second reconstruction method, more than two views of a scene are taken and processed, and the two sets of parameters are decoupled by
assuming that the internal camera parameters are fixed for all views. Processing to determine the values of the parameters proceeds using non-linear methods, such as recursive estimation, non-linear optimization techniques such as Levenberg-Marquardt iterations, and, more recently, projective geometry tools using the concept of "the absolute conic." One significant problem with the first approach (using a separate internal camera calibration step) is that even small errors in calibration lead to significant errors in reconstruction. The methods for recovering the extrinsic parameters following the internal calibration are known to be extremely sensitive to minor errors in image measurements and require a relatively large field of view in order to behave properly. In the second approach (using more than two views of a scene), the processing techniques are iterative, based on an initial approximation, and are quite sensitive to that initial approximation. In addition, the assumption that the internal camera parameters are fixed is not always a good assumption.
U.S. Patent No. 5,598,515, issued January 28, 1997, in the name of Amnon Shashua, entitled "System And Method For Reconstructing Surface Elements Of Solid Objects In A Three-Dimensional Scene From A Plurality Of Two Dimensional Images Of The Scene" (hereinafter referred to as the "Shashua patent"), describes an arrangement for generating a three-dimensional reconstruction of a scene from a set of two images of the scene which are recorded at diverse locations. In the arrangement described in the Shashua patent, the reconstruction is generated without requiring recovery of the extrinsic parameters, as in the first approach, and also without requiring iterative approximation, as required in the second approach. The arrangement described in the Shashua patent can be used in connection with a rig including, for example, two optical sensors, which can be directed at a scene from two diverse locations to record the set of images required for the reconstruction; alternatively, the images can be recorded using a single optical sensor which records a set of images, one image from each of the two locations.
Generally, during a reconstruction from a set of images in the arrangement described in the Shashua patent, the rig performs a calibration operation to generate a "projective-to-Euclidean" matrix. The projective-to-Euclidean matrix relates coordinates in a projective coordinate system to coordinates in a Euclidean coordinate system, to facilitate generation of a Euclidean reconstruction from the projective reconstruction generated by the arrangement described in the Shashua patent. Calibration generally requires the use of "control points," that is, points in the scene used for the calibration operation whose Euclidean positions relative to each other are known. Several approaches have been used for calibration. In one approach, Euclidean information in the form of pre-measured control
points, or Euclidean constraints such as distances between points in the scene or angles between lines or edges in the scene, needs to be known. Another approach avoids the requirement of having Euclidean information, but does require multiple overlapping images recorded by moving a single optical sensor or rig of optical sensors. If, following the calibration operation, the rig were to undergo a rigid transformation, for example, when the rig is moved but the optical sensors are fixed relative to each other (that is, there is no change in the positions of their centers of projection relative to each other, and the optical sensors are not tilted or panned relative to each other) and there are no changes in their focal lengths or aspect ratios, the projective-to-Euclidean matrix is preserved across the transformation, and the same projective-to-Euclidean matrix can be used during reconstruction after the transformation as was generated during the calibration operation before the transformation. However, if the transformation is non-rigid, for example, if there is a change in focal length in connection with an optical sensor, which can facilitate focusing onto the scene from the new position, or if the optical sensors have been tilted or panned relative to one another to direct them at a desired portion of the scene, or if a change has occurred in connection with the aspect ratio of an optical sensor, the projective-to-Euclidean matrix after the transformation will be different from that used before the transformation, which would necessitate performing another calibration operation. A problem arises, however, in that control points may not be available in the scene as recorded following the non-rigid transformation.
SUMMARY OF THE INVENTION
The invention provides a new and improved system and method for facilitating reconstruction of surface elements of three-dimensional objects in a scene from a set of two-dimensional images of the scene following a non-rigid transformation, without requiring control points whose Euclidean positions are known following the non-rigid transformation, and without requiring multiple images be recorded by respective optical sensors before and after the non-rigid transformation.
In brief summary, the invention provides an arrangement for use in connection with a system that generates a Euclidean representation of surface elements of objects of a scene, from a projective reconstruction, following a non-rigid transformation in connection with the optical sensors which record the images, using a projective-to-Euclidean matrix generated before the non-rigid transformation. Essentially, the arrangement determines the changes which occur in connection with the optical sensors during a non-rigid transformation in which the centers of projection of the optical sensors remain fixed in relation to each other, by reference to the changes in the positions of the epipoles on the
respective images as recorded after the non-rigid transformation. Following a non-rigid transformation, the arrangement first determines respective relationships between the coordinates of the epipoles before and after the non-rigid transformation and, using those relationships and the set of images recorded after the non-rigid transformation, a set of processed images essentially undoing the non-rigid aspects of the non-rigid transformation is generated. Using at least some of the processed images, the projective representation is generated, and, using the projective representation and the projective-to-Euclidean matrix generated prior to the non-rigid transformation, the Euclidean representation is generated.
Depending on the type of non-rigid transformation, three, four or five images, in respective sets of images recorded before and after a non-rigid transformation, may be used to enable the arrangement to generate the processed images after the non-rigid transformation. In particular, using five images in sets recorded before and after the non-rigid transformation, the non-rigid transformation can include, for example, changes in focal length of one or more of the optical sensors which are used to record the images to facilitate focusing on object(s) in the scene, tilting and/or panning of one or more of the optical sensors, and changing the aspect ratio of the image(s) as recorded by respective optical sensors. On the other hand, with as few as three images in each set, a non-rigid transformation may include, for example, changes in both the focal length(s) of one or more of the optical sensors, as well as a pan, or lateral movement, of one or more of the optical sensors, and, with four images in each set, a non-rigid transformation may also include tilting as well as panning.
BRIEF DESCRIPTION OF THE DRAWINGS
This invention is pointed out with particularity in the appended claims. The above and further advantages of this invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 schematically depicts a system including an omni-configurational rig and a Euclidean reconstruction generator for reconstructing three-dimensional surface features of objects in a scene from a plurality of two-dimensional images of the scene, constructed in accordance with the invention;
FIG. 2 schematically depicts a sensor useful in the omni-configurational rig depicted in FIG. 1;
FIG. 3 is a flowchart depicting operations performed by the Euclidean reconstruction generator depicted in FIG. 1, in reconstructing three-dimensional surface features of objects in the scene from a plurality of two-dimensional images of the scene; and
FIG. 4 schematically depicts a second embodiment of a rig useful in the system depicted in FIG. 1, described herein as a "catadioptric" rig.
DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
FIG. 1 schematically depicts a system 10 including an omni-configurational rig 11 and a Euclidean reconstruction generator 16 for reconstructing three-dimensional surface features of objects in a scene from a plurality of two-dimensional images of the scene, constructed in accordance with the invention. The system 10 generates reconstructions of surface features of objects in a scene following a non-rigid transformation in connection with the rig, if the parameters of system 10 were properly determined prior to the transformation, even if no elements of the scene following the transformation were in the scene prior to the transformation. The rig 11 includes one or more optical sensors (in the following it will be assumed that the rig 11 includes a plurality of optical sensors) 12(1) through 12(S) (generally identified by reference numeral 12(s)) which are generally directed at a three-dimensional scene 13 for recording images thereof. The optical sensors 12(s) are mounted in a support structure generally identified by reference numeral 14. Structural details of the optical sensors 12(s) used in the rig 11 will be described below in connection with FIG. 2. Generally, each optical sensor 12(s) includes a lens and an image recording device such as a charge-coupled device (CCD), the lens projecting an image of the scene 13 onto a two-dimensional image plane defined by the image recording device. It will be appreciated that the collection of optical sensors 12(s) thus can contemporaneously record "S" two-dimensional images of the three-dimensional scene 13. Each optical sensor 12(s) generates a respective SENSOR_s_OUTPUT sensor "s" output signal (index "s" ranging from one to S, inclusive) that is representative of the image recorded by its image recording device.
System 10 further includes a rig motion control circuit 15. The rig motion control circuit 15 generates signals generally identified as "RIG_MOT_CTRL" rig motion control signals, which control motion by the support 14 to facilitate recording of images of the scene 13 from a plurality of orientations, as well as to facilitate recording of images of various scenes (additional scenes not shown). The rig motion control circuit 15 also generates SENSOR_s_MOT_CTRL sensor "s" motion control signals (index "s" ranging from one to S, inclusive) for controlling predetermined
optical characteristics of the respective optical sensor 12(s). In particular, in one embodiment, the SENSOR_s_MOT_CTRL signal controls the positioning of the image recording device of the respective image sensor 12(s) relative to the lens to selectively facilitate focusing, and tilting and panning, as will be described below in connection with FIG. 2.
As noted above, each optical sensor 12(s) generates a respective SENSOR_s_OUTPUT sensor "s" output signal that is representative of the image recorded by its image recording device. The Euclidean reconstruction generator 16 receives the SENSOR_s_OUTPUT signals from all of the optical sensors and generates information defining a three-dimensional Euclidean reconstruction of the scene 13, as well as of other scenes to which the rig motion control circuit 15 may direct the rig 11, from sets of images as recorded by the optical sensors 12(s). A set of images may include two or more images of the scene. Generally, following a calibration operation during which the rig motion control circuit 15 directs the optical sensors 12(s) at objects in a scene to obtain a set of images used by the Euclidean reconstruction generator 16 to generate calibration information, the Euclidean reconstruction generator 16 can, from a set of images recorded after a non-rigid transformation in connection with the rig 11, operate in accordance with operations described below in connection with FIG. 3 to generate Euclidean representations of surface features of object(s) in the scene at which the support 14 directs the optical sensors 12(s), without requiring additional calibration operations.
FIG. 2 schematically depicts, in section, an optical sensor 12(s) constructed in accordance with the invention. The optical sensor includes a housing 20 having a circular forward opening in which a lens 21 is mounted. Behind the lens 21 is positioned an image recording device 22, such as a CCD device, which generates the SENSOR_s_OUTPUT signal for the optical sensor 12(s). The image recording device 22 defines a plane which forms an image plane for the optical sensor 12(s), on which the lens 21 projects an image. The position and orientation of the image recording device 22 relative to the lens 21 is controlled by a motor 23, which, in turn, is controlled by the SENSOR_s_MOT_CTRL sensor "s" motor control signal generated by the rig motion control circuit 15.
Generally, the lens 21 will be mounted in the housing 20 relatively rigidly, that is, in a manner such that the position and orientation of the lens 21 is fixed relative to the support 14. On the other hand, the motor 23 can enable the image recording device 22 to move relative to the lens 21. The manner in which the image recording device 22 can be permitted to move relative to the lens 21 during a transformation, and still allow the Euclidean reconstruction generator 16 to generate a reconstruction from images recorded after the transformation, depends on the number of images recorded before and after the transformation. In particular, if at least three images are recorded before and after the transformation, the respective motors 23 can enable the image recording devices 22 to, for example, move backwards and forwards in the sensors 12(s) relative to the lens, that is, closer to or farther away from the lens 21, during the transformation, which can facilitate focusing of the image of the scene cast by the lens on the respective sensor's image recording device 22. In addition, if at least four images are recorded before and after the transformation, the respective motors 23 can enable the respective image recording devices 22 to move relative to the lens 21 as described in connection with the three-image case above, and, in addition, to, for example, change their respective angular orientations (that is, tilting and panning) relative to the lens 21 during the transformation. Finally, if at least five images are recorded before and after the transformation, in addition to allowing for focusing, tilting and panning of the respective image recording devices, the aspect ratios of the respective image recording devices can, for example, be changed during the transformation. These modifications, that is, changes in focus, tilting and panning, and changes in the aspect ratio, which are referred to herein as "non-rigid parameters," can be performed individually for each respective image sensor 12(s). If any such modifications are made when, for example, the rig 11 is moved from one location to another, the modification is referred to as a "non-rigid transformation."
As noted above, the Euclidean reconstruction generator 16, after it has been calibrated prior to a non-rigid transformation, can, after a non-rigid transformation, generate three-dimensional Euclidean representations of surface features of object(s) in the scene 13 as recorded by the optical sensors 12(s) without requiring additional calibration operations. Operations performed by the Euclidean reconstruction generator 16 in this connection will be described in connection with the flowchart depicted in FIG. 3. By way of background, the Euclidean reconstruction generator 16 processes sets of images recorded by the optical sensors 12(s) before and after a non-rigid transformation. During a calibration operation performed before a non-rigid transformation, the Euclidean reconstruction generator 16 generates a four-by-four "projective-to-Euclidean" matrix W, which, as is known in the art, is determined using the set of images recorded before the non-rigid transformation and other Euclidean information. In addition, if the Euclidean reconstruction generator is to generate a Euclidean representation of the scene before the non-rigid transformation,
it can construct a projective representation using the images in the set recorded before the non-rigid transformation, and, using the projective representation and the projective-to-Euclidean matrix W, generate the Euclidean representation.
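By way of illustration, applying the four-by-four projective-to-Euclidean matrix W is a single homogeneous transformation followed by dehomogenization. The following is a minimal Python sketch (the function name and the N x 4 row-per-point layout are illustrative assumptions, not part of the described system):

```python
import numpy as np

def apply_projective_to_euclidean(W, X_projective):
    """Map an N x 4 array of homogeneous projective points through the
    4x4 projective-to-Euclidean matrix W and dehomogenize, yielding
    N x 3 Euclidean coordinates."""
    Xe = (W @ X_projective.T).T      # apply W to every point
    return Xe[:, :3] / Xe[:, 3:4]    # divide out the homogeneous coordinate
```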
If, following the calibration operation, the rig 11 were to undergo a rigid transformation and record images of a scene, which may be the same scene or a different scene, the Euclidean reconstruction generator 16 can generate a Euclidean representation of the scene by constructing a projective representation using the images recorded after the rigid transformation, and thereafter applying the same projective-to-Euclidean matrix W to generate the Euclidean representation. However, if the transformation is non-rigid, the Euclidean reconstruction generator 16 cannot use the projective-to-Euclidean matrix W generated prior to the non-rigid transformation to generate a Euclidean representation from a projective representation constructed using the images in the set recorded after the non-rigid transformation. As will be shown below, however, if, during a non-rigid transformation, the positions of the centers of projection of the optical sensors 12(s) remain fixed relative to each other, the Euclidean reconstruction generator 16 can use the projective-to-Euclidean matrix W generated prior to the non-rigid transformation to generate a Euclidean representation from a projective representation if the projective representation is constructed from a set of processed images, where the processed images correspond to the images in the set recorded after the non-rigid transformation, multiplied by respective inverse collineation matrices A_i^{-1}. Essentially, during a non-rigid transformation in which the centers of projection remain fixed relative to each other, the image planes at which the respective images are recorded undergo respective mappings defined by the collineations A_i, that is, IP'_i = A_i IP_i, where "IP_i" refers to the "i-th" image plane, and A_i refers to the three-by-three collineation matrix for the "i-th" image plane.
This will be clear from the following. If a point P on the surface of an object in the scene 13 projects onto points p_i (i = 1, ..., I) in the respective images in a set, then the coordinates (x_i, y_i, 1) of the point p_i in the image "i" are related to the coordinates (x_j, y_j, 1) of the point p_j (j ≠ i) in the image "j" as recorded by optical sensor 12(j) by

p_j^T F_ij p_i = 0    (2),

where F_ij is the three-by-three fundamental matrix relating the two images and "T" represents the transpose operation. Equation (2) relates the coordinates between images "i" and "j" for all points P (not separately shown) in the scene 13 which are projected onto the "i-th" image as points p_i and onto the "j-th" image as points p_j. A fundamental matrix F_ij exists relating the coordinates of the points projected from the scene 13 onto each respective pair of images "i" and "j." The values of the matrix elements for the fundamental matrices as among the various pairs of images may be different. Since the centers of projection of the various optical sensors 12(s) will be in the same relative positions before and after the transformation of the rig 11, the fundamental matrices for the pairs of images before and after the transformation will be related by

F_ij ≅ A_j^T F'_ij A_i    (3),

for collineations A_1, ..., A_I, where F_ij is the fundamental matrix between the "i-th" and "j-th" images prior to the non-rigid transformation, F'_ij is the fundamental matrix between the "i-th" and "j-th" images after the non-rigid transformation, each collineation A_i is a three-by-three matrix, and "≅" means equality up to a scale.
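Equation (3) can be verified numerically for the special case in which each collineation arises from a change of intrinsic parameters alone, with the centers of projection held fixed. The sketch below reuses the synthetic M, Mp (M'), R and t from the sketch following equation (1), together with the standard construction F = M'^{-T}[t]_x R M^{-1}; that construction is an assumption consistent with equation (1), not a formula given in the text:

```python
def skew(v):
    """3x3 antisymmetric matrix [v]_x such that skew(v) @ u = v x u."""
    return np.array([[  0.0, -v[2],  v[1]],
                     [ v[2],   0.0, -v[0]],
                     [-v[1],  v[0],   0.0]])

# Fundamental matrix before the transformation (cameras M, Mp, R, t).
F = np.linalg.inv(Mp).T @ skew(t) @ R @ np.linalg.inv(M)

# Non-rigid change: new intrinsics N, Np (focal lengths altered), same
# centers of projection; the induced collineations are p' = A p.
N = M.copy();  N[0, 0] = N[1, 1] = 900.0
Np = Mp.copy(); Np[0, 0] = 700.0
A_i = N @ np.linalg.inv(M)        # collineation of the "i-th" image
A_j = Np @ np.linalg.inv(Mp)      # collineation of the "j-th" image

# Fundamental matrix after the transformation.
Fp = np.linalg.inv(Np).T @ skew(t) @ R @ np.linalg.inv(N)

# Equation (3): F_ij ≅ A_j^T F'_ij A_i (up to scale; with this
# construction the scale factor works out to exactly 1).
print(np.allclose(A_j.T @ Fp @ A_i, F))   # True
```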
In addition, in each set, each image will have a center of projection, which, in turn, will be projected as an epipole onto the image planes of the other images in the respective set. Thus, the center of projection of the "i-th" image is, in turn, projected onto the image plane of the "j-th" image as epipole e_ij before the non-rigid transformation and epipole e'_ij after the non-rigid transformation, with each epipole e_ij and e'_ij having coordinates in the respective image plane. (Depending on the sizes of the respective images, the points of epipoles e_ij and e'_ij may not actually be in the respective images, but they will be in the image planes therefor.) In that case, since, for each pair of images, the fundamental matrix F_ij for the pair is related to the coordinates of the epipole e_ij by F_ij^T e_ij = 0, then from equation (3) the coordinates of the epipoles e_ij0 before the transformation are related to the coordinates of the epipoles e'_ij0 after the transformation by

0 = F_ij0^T e_ij0 ≅ A_i^T F'_ij0^T A_j0 e_ij0    (4)

for all i ≠ j0. Since F'_ij0^T e'_ij0 = 0, the null space of F'_ij0^T is spanned by e'_ij0, and each A_i^T is invertible, it follows from equation (4) that

A_j0 e_ij0 ≅ e'_ij0    (5).

Therefore, the collineation matrix A_j0 for each j0 is determined by the four pairs of matching epipoles e_ij0, e'_ij0 for all i ≠ j0. Since each two-dimensional collineation A is uniquely determined by four matching points on the two-dimensional plane, five images are needed to be able to generate the collineations A_i, i = 1, ..., I, for the non-rigid transformation, in which case the number of images "I" in each set will equal five. Essentially, equations (3) through (5) represent the fact that the changes in the image planes which occur when the rig 11 undergoes a non-rigid transformation, while maintaining the positions of the centers of projection of the sensors 12(s) fixed relative to each other, can be determined in relation to the changes in the positions of the epipoles on the respective image planes which occur when the rig 11 undergoes the non-rigid transformation.
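Equation (5) reduces the recovery of each collineation A_j0 to the classical problem of fitting a two-dimensional projective transformation to four point matches. The following is a minimal sketch of the standard direct linear transformation (DLT); the function name is illustrative, and the epipoles are assumed to be supplied as dehomogenized (x, y) coordinates:

```python
import numpy as np

def collineation_from_epipoles(before, after):
    """Return a 3x3 collineation A with A e_k equal to e'_k up to scale
    (equation (5)), from four matching points, via the standard DLT.
    `before` and `after` are sequences of four (x, y) pairs."""
    rows = []
    for (x, y), (u, v) in zip(before, after):
        # Cross-multiplied form of (u, v, 1) ~ A (x, y, 1).
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)   # null vector holds the nine entries of A
```

With five images, the epipoles e_ij0 for the four images i ≠ j0 supply exactly the four matches needed, provided, as noted below, that the sensors are in general position.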
Thus, after the collineations A_i are determined in accordance with equation (5) following a non-rigid transformation, the effects on the respective image planes that resulted from the non-rigid transformation can effectively be undone by multiplying each image by the inverse of its respective collineation, that is,

p_i^u = A_i^{-1} p'_i    (6)

where "p_i^u" refers to the "i-th" image (or, more specifically, the coordinates of the points in the "i-th" image) in the set after the non-rigid transformation, for which the non-rigid effects of the transformation have been undone. Thereafter, a Euclidean representation of the scene whose images were recorded following the non-rigid transformation can be generated by first generating a projective representation using at least some of the images p_i^u as processed in accordance with equation (6), and then applying the projective-to-Euclidean matrix W to generate the Euclidean representation.
With this background, the operations performed by the Euclidean reconstruction generator 16 will be described in connection with the flowchart depicted in FIG. 3. With reference to FIG. 3, initially the system 10 will perform a calibration operation to generate the projective-to-Euclidean matrix W in connection with a set of images recorded prior to a non-rigid transformation (step 100). Operations performed by the system in step 100 in determining the projective-to-Euclidean matrix W and the epipoles are known to those skilled in the art and will not be described further herein. Following step 100, the rig motion control 15 enables the rig 11 to perform a non-rigid transformation, and in that process it can enable the optical sensors 12(s) to train on a scene and facilitate recording of images thereof (step 101). The scene on which the optical sensors 12(s) are trained in step 101 can be, for example, the same scene as was used in step 100 but from a different position or orientation, overlap with the scene as was used in step 100, or be a completely different scene, that is, a scene in which none of the objects in the scene were in the scene used in step 100.
Following the non-rigid transformation, the Euclidean reconstruction generator 16 performs a number of steps to determine the collineations A_i for the respective images recorded in step 101 and generate the images p_i^u for the set after the non-rigid transformation, for which the non-rigid effects of the transformation have been undone. Initially, the Euclidean reconstruction generator 16 determines the coordinates in each of the image planes for the epipole e'_ij associated with each of the other images in the set (step 102). Thereafter, for each image "j0" and the coordinates of the epipoles e_ij0 in the image plane associated therewith for the optical sensors 12(s) which record the respective "i-th" image, the Euclidean reconstruction generator 16 uses equation (5) to generate the collineations A_j0 for the respective images p'_i recorded following the non-rigid transformation (step 103). Thereafter, the Euclidean reconstruction generator 16, using the collineations A_j0 generated in step 103, generates the set of images p_i^u in accordance with equation (6) (step 104). After the set of images p_i^u has been generated, the Euclidean reconstruction generator 16 uses those images to construct a projective representation of the scene after the non-rigid transformation, and, using that projective representation and the projective-to-Euclidean matrix W generated in step 100, generates the Euclidean representation (step 105). Operations performed by the Euclidean reconstruction generator 16 in step 105 to construct a projective representation and a Euclidean representation using the projective representation and projective-to-Euclidean matrix W are known to those skilled in the art and will not be described further herein.
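Steps 102 through 105 can be summarized by the following sketch, which reuses the helper sketches above. The callables find_epipoles and projective_reconstruction stand in for the known-art operations that the text does not detail, and every name here is hypothetical:

```python
def euclidean_after_nonrigid(images_after, epipoles_before, W,
                             find_epipoles, projective_reconstruction):
    """Sketch of steps 102-105 of FIG. 3. Each entry of `images_after`
    is an N x 3 array of homogeneous matched-point coordinates;
    `epipoles_before` holds the calibration-time epipoles e_ij0."""
    epipoles_after = find_epipoles(images_after)                # step 102
    A = [collineation_from_epipoles(epipoles_before[j0],        # step 103
                                    epipoles_after[j0])
         for j0 in range(len(images_after))]
    undone = [undo_collineation(A[i], images_after[i])          # step 104
              for i in range(len(images_after))]
    X_projective = projective_reconstruction(undone)            # step 105
    return apply_projective_to_euclidean(W, X_projective)
```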
As described above, if the rig records sets of five images ("I"=5) before and after a non-rigid transformation, if the positions of the optical sensors 12(s) at which the images are recorded are such that the positions of the centers of projection are fixed relative to one another, and if the images are recorded with the optical sensors 12(s) in "general" position (that is, generally no more than four of the images are recorded with the optical sensors being in the same plane and no more than three with the optical sensors along the same line), the Euclidean reconstruction generator 16 will be able to determine the collineations A_i, thereby facilitating the generation of a Euclidean representation following a non-rigid transformation. Illustrative types of changes that may happen during such a non-rigid transformation include, for example, changes in focal length of one or more of the optical sensors 12(s) to facilitate focusing on object(s) in the scene, tilting or panning of one or more of the optical sensors 12(s), and changing the aspect ratio of the image(s) as recorded by respective optical sensors 12(s). If it is desired to further constrain the rig 11 during a non-rigid
transformation, for example, to not allow one or more of these types of changes (changes in focal length, tilting and/or panning, and changes in aspect ratio) during a non-rigid transformation, the Euclidean reconstruction generator 16 can perform the operations described above with fewer than five images in each set. For example, if the rig 11 records three images in each set, the Euclidean reconstruction generator 16 can determine the collineations and generate a Euclidean representation after a non-rigid transformation in which, for example, changes can occur in both the focal length(s) of one or more of the optical sensors 12(s) and one or more of the optical sensors 12(s) undergoes a pan, or lateral movement. Similarly, if the rig 11 records four images, the non-rigid transformation can also include tilting as well as panning. In either case, by allowing for changes in focal length, the system can allow for re-focusing after the transformation, which is particularly helpful if the scene 13 includes surfaces which are slanted relative to the rig 11.
Thus, it will be apparent that, if "E" represents the number of types of parameters describing a non-rigid transformation, and "X" represents the number of images recorded before and after a non-rigid transformation, the relationship between X and E corresponds to 2(X-1)>E. This relation holds in the case of a rig, such as rig 11, in which the image plane homographies (reference the Shashua patent) do not coincide. On the other hand, if all of the image plane homographies coincide, the relation between X and E corresponds to 2X(X-1)>E. In that case, a rig which maintains the image planes of the optical sensors such that the image plane homographies coincide can allow for all of the above-described non-rigid transformations (that is, changes in focal length, tilt and pan, and changes in aspect ratio) with three optical sensors, and for fewer transformations only two optical sensors are required.
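The counting can be made concrete with a small check, assuming the pairings of image counts and parameter types described above (focal length, pan, tilt, and aspect ratio):

```python
# X images and E parameter types: 2(X-1) > E when the image-plane
# homographies do not coincide.
for X, E in [(5, 4), (4, 3), (3, 2)]:
    assert 2 * (X - 1) > E
# When the homographies coincide, 2X(X-1) > E: three sensors admit all
# four parameter types, and two sensors admit E < 4.
assert 2 * 3 * (3 - 1) > 4
assert 2 * 2 * (2 - 1) > 3
```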
FIG. 4 schematically depicts a portion of a rig, referred to herein as a "catadioptric rig" 150, which constrains the image planes of two images such that their image plane homographies coincide. With reference to FIG. 4, the catadioptric rig 150 includes a mirror arrangement 151 and a unitary optical sensor 152. The mirror arrangement 151 includes a plurality of mirrors represented by thick lines identified by reference numerals 151(1) through 151(4). Mirrors 151(1) and 151(2) are oriented to reflect light scattered from a point 154 in a scene onto respective mirrors 151(3) and 151(4), which, in turn, are oriented to reflect the light onto a single image recording device 153. Respective beams of light as scattered from point 154 are represented by the dashed lines 155(1) and 155(2). The image recording device 153, in turn, records images of the point 154 in separate image recording areas represented by dashed boxes 153(1) and 153(2). The image recording device 153 may be
similar to the image recording device 22 used in each optical sensor 12(s) described above in connection with FIG. 2, and may, for example, comprise a CCD (charge-coupled device). The optical sensor 152 may include a motor similar to motor 23 for moving the image recording device 153. As with the rig 11, the optical sensor 152 used in the catadioptric rig 150 provides a signal representing the images to a Euclidean reconstruction generator (not shown) which can operate in a manner similar to that described above in connection with Euclidean reconstruction generator 16.
Although the catadioptric rig 150 has been described as having a mirror arrangement 151 that enables two images 153(1) and 153(2) to be recorded by the image recording device 153, it will be appreciated that a similar mirror arrangement may be provided to enable three or more images to be recorded by the image recording device. In addition, the catadioptric rig, instead of having a single, relatively large image recording device on which the images are recorded in respective image recording areas, may have a plurality of image recording devices, one at each of the image recording areas, with the image recording devices constrained to lie in a single image plane.
In the methodology described above in connection with FIG. 3, the Euclidean reconstruction generator 16 generates a Euclidean representation of a scene following a non-rigid transformation by generating, from a set of images p'_i recorded after the non-rigid transformation, a set of images p_i^u for which the non-rigid aspects of the transformation have effectively been undone. Instead, the Euclidean reconstruction generator 16 can generate a Euclidean representation from a projective representation generated from a set of images p'_i generated after the non-rigid transformation, without the necessity of generating a set of images p_i^u for which the non-rigid aspects of the transformation have been undone, and making use of a projective-to-Euclidean matrix W. This will be clear from the following. Prior to a non-rigid transformation, the Euclidean reconstruction generator 16 can, during a calibration operation, generate respective three-by-four camera matrices G_i for the respective "i-th" image as
p_i ≅ G_i P    (7),

where P represents respective points in the scene defined by respective Euclidean coordinates (x, y, z, 1), "p_i" represents respective points projected onto the "i-th" image defined by coordinates (x, y, 1) in the image plane of the respective "i-th" image, and "≅" represents equality up to a scale. From each camera matrix G_i, the position, in Euclidean coordinates, of the respective center of projection c_i associated with each image can be determined from

G_i c_i = 0    (8).
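Equation (8) says that the center of projection spans the one-dimensional null space of the three-by-four camera matrix, which can be extracted with a singular value decomposition. A minimal sketch (function name illustrative; a finite center with nonzero fourth coordinate is assumed):

```python
import numpy as np

def center_of_projection(G):
    """Solve G c = 0 (equation (8)) for the homogeneous center of
    projection of a 3x4 camera matrix, normalized so that c[3] = 1."""
    _, _, Vt = np.linalg.svd(G)
    c = Vt[-1]          # right singular vector of the zero singular value
    return c / c[3]     # assumes a finite center (c[3] != 0)
```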
After the non-rigid transformation, the camera matrices G'_i and centers of projection c'_i can also be determined in a manner described in the Shashua patent. Since the sets of centers of projection c_i and c'_i are in the same relative positions, the set of centers of projection prior to the non-rigid transformation are related to the set after the non-rigid transformation by

W c'_i ≅ c_i    (9).

Five points are needed to determine the projective-to-Euclidean matrix using equation (9), since each point contributes three equations and equation (9) determines the projective-to-Euclidean matrix W only up to a scale. After the projective-to-Euclidean matrix is determined, it can be used to construct the Euclidean representation of the scene for the image set recorded after the non-rigid transformation.
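Solving equation (9) for W is again a null-space problem: writing c_i = (x_i, y_i, z_i, 1), each correspondence W c'_i ≅ c_i cross-multiplies into three linear equations in the sixteen entries of W, and five correspondences give the fifteen equations that determine W up to scale. A sketch under those assumptions (names illustrative):

```python
import numpy as np

def solve_projective_to_euclidean(centers_euclidean, centers_projective):
    """Recover the 4x4 matrix W with W c'_i equal to c_i up to scale
    (equation (9)), from five matching centers of projection.
    `centers_euclidean` holds the c_i as (x, y, z, 1) arrays;
    `centers_projective` holds the homogeneous c'_i from the
    post-transformation projective reconstruction."""
    rows = []
    for c, cp in zip(centers_euclidean, centers_projective):
        for k in range(3):               # (W c')_k = c_k (W c')_4
            row = np.zeros(16)
            row[4 * k:4 * k + 4] = cp    # k-th row of W dotted with c'
            row[12:16] -= c[k] * cp      # minus c_k times the fourth row
            rows.append(row)
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(4, 4)
```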
The invention provides a number of advantages. In particular, the invention provides an arrangement which facilitates Euclidean reconstruction of surface features of objects in a scene after a non-rigid transformation of the rig 11 recording the scene, without additional Euclidean information after the non-rigid transformation and without requiring multiple images be recorded by respective optical sensors before and after the non-rigid transformation. Thus, the arrangement can be calibrated once in, for example, a laboratory in which Euclidean information can be readily provided, and thereafter used at other locations without requiring Euclidean information at those locations.
It will be appreciated that a number of modifications may be made to the system 10 as described above in connection with FIGS. 1 through 4. For example, although the system 10 was described as implementing various methodologies described in the Shashua patent, including, for example, methodologies for determining the fundamental matrices F_ij and epipoles e_ij (step 100) and for reconstruction (step 105), it will be appreciated that other methodologies can be used for any or all of these operations.
Furthermore, although the system 10 has been described as making use of optical sensors 12(s) which record images using CCD (charge-coupled device) recorders, it will be appreciated that the optical sensors may comprise any convenient mechanism for recording images.
In addition, although the system 10 has been described as making use of, in one embodiment, five optical sensors 12(s), or, otherwise stated, a number of optical sensors 12(s) sufficient to facilitate the recording of five images (which may include one or more catadioptric arrangements, such as the rig 150, capable of recording two images, or similar arrangements capable of recording more than two images), which will allow for determining the fundamental matrices F'_ij following a non-rigid transformation which can include, for example, changes in focal length of one or more of the optical sensors 12(s) to facilitate focusing on object(s) in the scene, tilting or panning of one or more of the optical sensors 12(s), and changing the aspect ratio of the image(s) as recorded by respective optical sensors 12(s), a system in accordance with the invention may instead include three optical sensors 12(s) (S=3) (or a number of optical sensors sufficient to facilitate recording three images) to allow determination of the fundamental matrices F'_ij and generation of a three-dimensional reconstruction after a non-rigid transformation in which, for example, changes can occur in both the focal length(s) of one or more of the optical sensors 12(s) and one or more of the optical sensors 12(s) undergoes a pan, or lateral movement. Similarly, a system in accordance with the invention may instead include four optical sensors 12(s) (S=4) (or a number of optical sensors sufficient to facilitate recording four images) to allow determination of the fundamental matrices F'_ij and generation of a three-dimensional reconstruction following a non-rigid transformation which can further include tilting as well as panning. In either case, by allowing for changes in focal length, the system can allow for re-focusing after the transformation, which is particularly helpful if the scene 13 includes surfaces which are slanted relative to the rig 11.
Furthermore, although the invention has been described in connection with a system 10 including a rig 11 on which one or more image sensors 12(s) are mounted, it will be appreciated that the images may instead be recorded by one or more image sensors held by, for example, respective operators, who direct the image sensor(s) at the scene to record images thereof.
It will be appreciated that a system in accordance with the invention can be constructed in whole or in part from special purpose hardware or a general purpose computer system, or any combination thereof, any portion of which may be controlled by a suitable program. Any program may in whole or in part comprise part of or be stored on the system in a conventional manner, or it may in whole or in part be provided to the system over a network or other mechanism for transferring information in a conventional manner. In addition, it will be appreciated that the system may be operated and/or otherwise controlled by means of information provided by an operator using operator input elements (not shown) which may be connected directly to the system or which may
transfer the information to the system over a network or other mechanism for transferring information in a conventional manner.
The foregoing description has been limited to a specific embodiment of this invention. It will be apparent, however, that various variations and modifications may be made to the invention, with the attainment of some or all of the advantages of the invention. It is the object of the appended claims to cover these and such other variations and modifications as come within the true spirit and scope of the invention.
What is claimed as new and desired to be secured by Letters Patent of the United States is: