GB2389500A - Generating 3D body models from scanned data
- Publication number
- GB2389500A (application GB0308234A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- data
- texture
- points
- template
- mesh
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/10: Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
- G06T17/20: Finite element generation, e.g. wire-frame surface description, tesselation
- G06T7/30: Determination of transform parameters for the alignment of images, i.e. image registration
- G06T2200/08: Indexing scheme for image data processing or generation, in general, involving all processing steps from image acquisition to 3D model generation
Abstract
Methods of generating 3D models of objects from data generated by 3D scanners include: aligning a plurality of point clouds obtained from a scanner; bringing a set of 3D data points obtained by an initial alignment into precise registration with a mean body surface derived from the point clouds; fitting an existing mesh-type body model template to a set of 3D data points; ensuring consistent parameterization of mesh-type body model templates when fitted to point cloud data; optimising templates used for fitting to point cloud data; integrating a plurality of texture maps obtained from a scanner and corresponding to a plurality of point clouds into a single texture map for application to a mesh-type body model derived from the point cloud data; and modelling the hair of a subject for use in a body model derived from point cloud data.
Description
Body Models from Scanned Data

1 Introduction
1.1 Field of the Invention
The present invention relates to methods for generating 3D models of 3D objects, particularly the human body, from data generated by 3D body scanners.
1.2 Background to the Invention
3D body scanners generally use a plurality of cameras or camera positions to obtain a plurality of images of a body from different viewpoints. The output from the scanner consists of a set of point clouds, one for each viewpoint. Each point cloud consists of a set of points in 3D space, and information about the texture at each point. For many practical applications, including "virtual tailoring" applications as described, for example, in the present applicant's co-pending International Patent Application No PCT/GB02/00205, it is necessary or desirable to convert the point cloud data output from the scanner into a coherent or "conformable" 3D body model, generally comprising a surface mesh defining the 3D surface of the body and a texture map representing the colors and texture of the surface. A body model of this type is capable of being modified ("conformed") in a variety of ways, including changes of pose, scaling, deformation etc., and of being animated.
The present invention concerns methods that facilitate the generation of useful body models from point cloud data generated by 3D body scanners.
1.3 Summary of the Invention
A first aspect of the invention provides a method of aligning a plurality of point clouds obtained from a 3D body scanner, each of which defines part of the surface of a scanned body, by rotating and translating the various point clouds. The method employs a generalized Iterated Closest Point (ICP) algorithm to perform a global rigid body alignment of the set of point clouds. A mean surface is calculated by taking a subset of the points from each point cloud and identifying points from each subset that match points in the other subsets. The mean surface is derived from the mean values of matched points. The points of each point cloud are then aligned to the mean surface in an iterative manner until convergence is achieved (as judged in accordance with predetermined criteria). The method may include exactly aligning selected points that are identified as corresponding to predetermined landmarks on the scanned body. This first aspect of the invention and embodiments thereof are described in more detail in Section 3.1.2 below.
A second aspect of the invention provides a method of bringing a set of 3D data points obtained by an initial alignment of a plurality of point clouds obtained from a 3D body scanner (e.g. in accordance with the method of the first aspect) into precise registration with a mean body surface derived from the point clouds (such as that derived using the method of the first aspect). This method employs a generalized ICP algorithm to perform an elastic (or nonlinear) warp (or registration). Points on the mean surface are matched to points in the previously aligned data cloud to define an optimal elastic registration between the mean surface and the previously aligned data cloud surface, and hence to define a warp that is applied to the points of the previously aligned data clouds in an iterative manner until convergence is achieved (as judged in accordance with predetermined criteria). This second aspect of the invention and embodiments thereof are described in more detail in Section 3.2 below.
A third aspect of the invention provides a method of fitting an existing mesh-type body model template to a set of 3D data points obtained from a 3D body scanner (e.g. from a plurality of data clouds that have been processed in accordance with the first and/or second aspects of the invention). A space warp is determined that fits the template model to the surface defined by the 3D data points, representing the changes of pose and shape that fit the template to the 3D data points. An initial fit of the template model to the 3D data points may be established by identifying points of correspondence between corresponding landmarks on the template and the 3D data points. The spatial relationships between the template landmarks and data landmarks are used to define a radial basis warp that is applied to the template so as to fit the template to the data. This process may be automated by employing an algorithm that initially aligns the template with the 3D data points using oriented bounding boxes and then refines the fit using a non-rigid (elastic or nonlinear) registration (or warp) algorithm (such as that employed in the second aspect of the invention). The initial fit may be refined by defining additional, substantially uniformly distributed landmarks on the template and calculating a warp that brings the additional landmarks into alignment with the data. This third aspect of the invention and embodiments thereof are described in more detail in Section 4.3 below.
A fourth aspect of the invention provides a method of ensuring consistent parameterization of mesh-type body model templates when fitted to point cloud data obtained from a 3D body scanner (e.g. using the method of the third aspect). Fitting is done globally, by optimizing the initial fit provided by the third aspect of the invention. The overall deformation, measured as the deviation from an affine warp, is minimized via an iterative process, described in Section 4.6 below.
A fifth aspect of the invention provides a method for optimizing mesh-type body model templates used for fitting to point cloud data obtained from a 3D body scanner (e.g. as in the third aspect of the invention). The parameterization provided by the fourth aspect of the invention minimizes (globally) the degree of warping that must be applied to a template model to fit the scanned data. The fit can be improved by optimizing the template. The template is optimized by averaging a representative sample of fitted models, so as to generate an average body model template. This fifth aspect of the invention and embodiments thereof are described in more detail in Section 4.6.1 below.
A sixth aspect of the invention provides a method of integrating a plurality of texture maps obtained from a 3D body scanner and corresponding to a plurality of point clouds obtained from the body scanner into a single texture map for application to a mesh-type body model derived from the point cloud data. Patch borders are employed to ensure seamless integration of the original plurality of texture maps. Patches on the body model mesh are generated corresponding to the original texture maps. Vertices along the borders of the patches are assigned average texture values based on weighted averages of values from the corresponding original texture maps. These average texture values are mapped back to the original texture maps and difference images derived therefrom. The difference images are added to the original texture maps and the resultant texture maps are projected back onto the body model. Gaps (holes) in the resultant texture map may be filled by means of texture synthesis based on texture data adjacent to the gaps. This sixth aspect of the invention and embodiments thereof are described in more detail in Section 5 below.
A seventh aspect of the invention provides a method of modelling the hair of a subject for use in a body model derived from point cloud data obtained from a 3D body scanner. The method employs video images of the subject's hair from a number of different viewpoints, obtained from the body scanner. The video images are used to construct a smooth 3D surface corresponding to the 3D shape of the hair. Preferably, the surface is constructed using silhouettes derived from the video images. The 3D surface is textured using texture data derived from the video images. Texturing may be performed using the methods of the sixth aspect of the invention. This seventh aspect of the invention and embodiments thereof are described in more detail in Section 8 below.
Further aspects and preferred or alternative features of the invention will be apparent from the following description of the preferred embodiments of the invention.
2 Description of the Preferred Embodiments of the Invention

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

Fig. 1 is an example of a cosine-shaded point cloud;
Fig. 2 is an example of a raw scan of a human head;
Fig. 3 shows the scan of Fig. 2 filtered in x and z with a quadratic convolution;
Fig. 4 illustrates the subdivision of a triangular face by adding a vertex at the midpoint of each edge and replacing the face by four new faces;
Fig. 5 illustrates extra face subdivision to avoid gaps in the detailed mesh;
Fig. 6 illustrates initial alignment of a template (shaded) and data (wireframe) by aligning their oriented bounding boxes;
Fig. 7 illustrates elastic registration of a template (shaded) and data (wireframe);
Fig. 8 illustrates a template mesh, with associated texture map;
Fig. 9 illustrates a template mesh, with associated texture map from a scan;
Fig. 10 shows (left) a texture with a missing central region and (right) the same texture with the missing region filled using a texture synthesis algorithm;
Fig. 11 shows (left) another texture with a missing central region and (right) the same texture with the missing region filled using the texture synthesis algorithm;
Fig. 12 illustrates a neutral skin texture;
Fig. 13 illustrates a coloured skin texture;
Fig. 14 illustrates a textured body model with synthetic skin;
Fig. 15 illustrates a head model, showing (left) the head model and some of the body model seen from the view it was created for, (middle) the geometry of the head model in wireframe and (right) the head model from a different view;
Fig. 16 illustrates (top left) an original image from a scanner camera, (top right) a mask defining pixels belonging to the head in the original image, (bottom left) a mesh constructed for the hair, in wireframe, and (bottom right) the mesh constructed for the hair after being textured;
Fig. 17 illustrates (left) a conformed and textured template model without an additional hair model, (middle) the model with the constructed hair model and (right) an image from the scanner for comparison;
Fig. 18 is an example of an image of a face with diffuse lighting;
Fig. 19 shows an image (left) and its calibrated version (right);
Fig. 20 illustrates the GretagMacbeth colour checker;
Fig. 21 is an example of an image in early morning light;
Fig. 22 shows the image of Fig. 21 calibrated with default values;
Fig. 23 illustrates the location of colour patches from four corner points;
Fig. 24 illustrates an initially conformed template mesh;
Fig. 25 illustrates a data mesh; and
Fig. 26 illustrates results for 20 evaluations of the LBFGS algorithm, in which the final average distance of the variable template mesh vertices to the smooth model is 9.43 x 10^-5 m, with 22632 variables, lambda = 1000 and a solution time of 3.37 minutes.
The increasing availability of 3D body scanners, which are capable of providing high quality 3D scans of human bodies, produces a requirement for efficient and automatic methods for creating 3D body models from scanned data. This paper describes a system which can automatically produce high quality 3D models, including textures, from body scans. The purpose of the modelling system is to provide data with sufficient accuracy to allow for clothing to be built for the scanned bodies, and with sufficient quality to allow for stylization.

Body scanners, such as the WB4 scanner from Cyberware or the Triform scanner from Wicks and Wilson, produce as output a set of point clouds, each cloud consisting of a set of points in 3-D, together with information about the texture at each sampled 3-D point. Our focus in this paper will be on models generated by the Wicks & Wilson Triform scanner, where the eight point clouds are organized as images (i.e. are arranged in a regular rectangular structure) and where the texture mapping is generated from images captured with a video camera.
2.1 Creating an Initial Surface Mesh

In order to generate an output triangulated mesh from this data, the following semi-automatic approach is typically followed.
1. A certain amount of manual cleaning of the point clouds may be undertaken in order to remove spurious points.
2. The point clouds are aligned, typically using a variant of the Iterated Closest Point (ICP) algorithm [Chen and Medioni, 1991, Besl and McKay, 1992]. Alignment consists of applying a rigid body transform (a rotation and translation) to each point cloud. Alignment is required because the relative orientation between the hardware units which capture each cloud can vary, and because, in the case of the Wicks & Wilson scanner, the point clouds are captured sequentially so there may be body movement between the individual clouds. It is assumed that this movement will be rigid.
3. The point clouds are triangulated. There are a variety of ways this can be done. Possibly the earliest was [Boissonat, 1984]. In [Hoppe et al., 1992] an implicit surface is constructed. In [Mencl and Muller, 1998] Euclidean Minimum Spanning Trees are constructed. In [Amenta et al., 1998] a Voronoi approach is used, and in [Bernardini et al., 1999] a ball-pivoting algorithm was developed. An alternative is to triangulate the point clouds separately and then zipper the separate meshes together [Turk and Levoy, 1994].
2.2 Mesh templates

The output of this process is a triangle mesh. This is for many purposes a sub-optimal representation since it encodes no body-specific information. We are interested in conforming a standardized template body to the scanned data. This has several advantages:

- A scan misses data in regions of occlusion (typically the groin and underarm regions) and in areas where there is insufficient detail, such as the hands and feet. A template model can be used to fill in missing detail in an economical manner.
- The use of a template assists in automatic construction of body models, since template fitting procedures can be robust to noise as well as missing data. Body scanners can be prone to noise in the form of additional data (caused by reflections, etc.) which is difficult to deal with automatically other than by use of a template.
- A template makes it simpler to provide animation for the body, since it can be designed to be animatable.
- When measurements are required from a body, the use of a template means that the measurements can be defined on the template body and read off the conformed body. This requires a consistent parameterization of the conformed template, of course, which is one of the principal themes of this paper.
There has been little work in fitting template bodies to the output of body scanners. There has however been interest in fitting body templates to silhouette images [Hilton et al., 2000] or to video sequences [Fua et al., 1999, Gu et al., 1998, Kakadiaris and Metaxas, 1995, Plaenkers and Fua, 2001]. In both applications a template model is useful as it provides constraints on the output, which allow reasonable guesses as to the actual body shape to be made, even though the input data may contain insufficient information.
If we move away from the human body there is a vast literature on elastic matching in the computer vision and medical literature. Elastic matching is a natural way of conforming a template body to scanned data. Probably the first attempt in this area was [Bajcsy and Broit, 1982], refined in [Bajcsy and Kovacic, 1989]. Elastic matching is usually done between images in 2-D or volumes in 3-D. Surveys of deformable medical image registration can be found in [McInerney and Terzopoulos, 1996, Maintz and Viergever, 1998]. One approach to using elastic matching for surfaces (as opposed to volumes) is [Szeliski and Lavallee, 1996]. Their approach is to impose smoothness constraints on a hierarchical free form deformation (FFD) [Sederberg and Parry, August 1986].
Here the template model is fitted to the data using an approximate radial basis warp.
Points of correspondence are specified on both surfaces and the warp brings these points, and therefore the surfaces, into close alignment. The fitting is optimized by allowing the points to move locally on one of the surfaces to maximize the smoothness of the warp. Most of the points of correspondence are defined automatically after an initial alignment of the model to the data. The initial alignment is found either by aligning a small number of manually specified landmarks or by aligning oriented bounding boxes and then applying an elastic registration algorithm.
2.3 Consistent parameterization

The primary requirement for the conformed template is that it should provide a consistent parameterization [Praun et al., 2001] of the scanned body. In [Praun et al., 2001] a set of parameterizations is consistent when they (a) share a base domain and (b) respect features. The first of these is trivial in our case since the same mesh topology is used for each mesh. The second criterion is more difficult, in that the notion of feature is itself somewhat imprecise.
Our notion of feature is that of an anatomical landmark. Such landmarks are usually defined in terms of a general location together with a precise description relating to the local surface geometry. For example the Pronasale is defined as the most prominent point on the top of the nose. The general location is the nose, the specific point the most prominent point on the nose. This definition itself depends on the orientation of the head when the measurement is taken, and in general it is not possible to provide a definition of the location of landmarks that allows for localization within less than a millimeter.

Our approach, discussed in more detail in Section 4.6, is to develop a procedure which, given a landmark L whose true location is p, and any estimate p0 of the location of L with ||p0 - p|| < ε, will adjust the estimate p0 to the true location p. Typically we would expect ε to be of the order of 2-3 cm. The motivation behind this definition is that if we were to ask several people to locate landmarks they would all be placed in slightly different locations. Our procedure ensures that the landmarks are placed in consistent positions by finding a local minimum of an energy measure defined over the point locations.

Rather than use local surface geometry to define the precise location of landmarks, we use the relationship between the scanned data and the template mesh. Specifically, we use the smoothest local mappings between the scanned data and the template to define the position of landmarks. The use of a local mapping is more robust than use of local surface geometry precisely because it is a model-based approach. An additional advantage is that there is no requirement to define the local surface geometry at each landmark. In Section 4.6 we demonstrate that this approach can provide a consistent parameterization.
2.4 Texture mapping

The fitted template model is textured using a texture map derived from the body scanner camera images. The texturing process performs the following operations:

1. For each model triangle, suitable cameras from which to determine the texture are identified. This is based upon visibility.
2. For each model triangle one camera is selected from the suitable cameras so as to maximize the size of triangle patches assigned to the same camera. This benefits the image merging (a sketch of these first two steps is given after this list).
3. The images from the cameras are merged along the patch boundaries.
4. Texture pixels are transferred from the camera images into a unified texture image.
5. Model triangles for which no suitable camera was found are filled with a synthesized texture.
This process is described in detail in Section 5.
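The camera selection of steps 1 and 2 might be sketched as follows. This is an illustrative Python sketch rather than the patent's implementation: the facing threshold, the greedy patch-growing passes, and the function name `assign_cameras` are all assumptions, and a real system would use a proper occlusion test in step 1 rather than a simple facing test.

```python
import numpy as np

def assign_cameras(tri_normals, tri_centroids, cam_positions, adjacency, passes=5):
    """Steps 1-2: pick one camera per triangle, then grow same-camera patches.

    tri_normals   : (T, 3) unit triangle normals
    tri_centroids : (T, 3) triangle centroids
    cam_positions : (C, 3) camera centres
    adjacency     : list of neighbour-index lists, one per triangle
    """
    view = cam_positions[None, :, :] - tri_centroids[:, None, :]   # (T, C, 3)
    view /= np.linalg.norm(view, axis=2, keepdims=True)
    facing = np.einsum('tj,tcj->tc', tri_normals, view)            # cosine of view angle
    suitable = facing > 0.1        # crude stand-in for a full visibility test (step 1)
    choice = np.where(suitable.any(axis=1), facing.argmax(axis=1), -1)
    for _ in range(passes):        # step 2: enlarge patches greedily
        for t, nbrs in enumerate(adjacency):
            votes = [choice[n] for n in nbrs if choice[n] >= 0]
            if votes:
                best = max(set(votes), key=votes.count)
                if suitable[t, best]:
                    choice[t] = best
    return choice                  # -1: no suitable camera (handled by step 5)
```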
2.5 Hair modelling

One of the limitations of 3D scanning systems is that they fail to capture hair reliably. This is problematic if the objective is to provide a model with a realistic appearance, as a person's hair has a very significant impact on this. To rectify this, an algorithm has been developed to create a simple model of hair from a small number of images of the head. This algorithm is able to create a model of the hair from the cameras in the Triform scanner, for example.

No attempt is made to model the hair in detail; this is impractical given only a small number of images. Instead a smooth surface is constructed whose shape matches the silhouette of the hair as seen from all of the available cameras. This surface is then textured in a similar fashion to the texturing of the template model, using the camera images. Although the resulting hair model is very simple, the improvement in the realism of the model's appearance is significant. The details of the hair modelling algorithm are given in Section 8.
3 Creating an Initial Surface Model of the Data

The data output from the Wicks & Wilson scanner consists of eight point clouds, each captured from a separate camera. One of the point clouds, with cosine shading applied, is shown in Figure 1. Typically each point cloud contains 100,000 or more points, giving 1.6 million points in a single scan. There is considerable overlap between scans, so the total of unique points is about 1 million.

As Figure 1 shows, the points within each cloud are organized as a rectangular image. The view of the point cloud as an image is useful, as it allows us to generate a virtual camera for each point cloud. We call a point cloud model with this property a range map.
The initial step in the conformation process is to generate a surface model S from the point cloud data. We need to be able to answer the following queries:

1. Given a point p in R^3, find the closest point on S to p. In other words q* = arg min{ ||p - q|| : q in S }.
2. Given a point p in R^3 and a (normal) direction n of unit length, find the smallest α such that p + αn is on S and 0 <= α < d. This is the closest point on S to the ray (p, n) within a distance d of p.

Data structures which can be used to answer queries of the above form include triangle meshes with some form of hierarchical subdivision and implicit surfaces (perhaps implemented via octrees).

If only query 2 is required, a simple and efficient implementation w.r.t. each point cloud can be obtained by projecting (p, n) onto the range map image and performing a line search.
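A minimal sketch of query 2 against a single range map follows, assuming a virtual-camera projection function is available for the point cloud. The uniform sampling of α and all names used here are illustrative assumptions; a real implementation would step pixel-by-pixel along the projected ray.

```python
import numpy as np

def ray_query(p, n, d, range_map, mask, project, eps=1e-3, steps=200):
    """Smallest alpha in [0, d) with p + alpha*n on the range-map surface, or None.

    range_map : (H, W, 3) array of 3-D points, one per pixel
    mask      : (H, W) booleans, True where a pixel is valid
    project   : maps a 3-D point to integer (row, col) via the virtual camera
    """
    H, W, _ = range_map.shape
    for alpha in np.linspace(0.0, d, steps):     # crude line search along the ray
        q = p + alpha * n
        r, c = project(q)
        if 0 <= r < H and 0 <= c < W and mask[r, c]:
            if np.linalg.norm(range_map[r, c] - q) < eps:
                return alpha
    return None
```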
We form a surface model in three steps.

1. An initial rigid-body alignment of the point clouds is performed. This step is required as the Wicks & Wilson scanner produces clouds which may be misaligned due to mechanical movement of mirrors and of the subject (since the overall capture time is several seconds).
2. A further non-linear adjustment is undertaken in order to fully align the point clouds. This step is necessary since movement by the subject may be nonlinear, and there are residual distortions in the system. The end result at this stage is a set of point clouds that are completely aligned, by which we mean that given any point p in point cloud S_i with local surface normal n, if the ray (p, n) intersects another point cloud S_j at point q, then if ||p - q|| < ε the points are considered to be the same. Typically ε will be about 1 mm.
3. The point clouds are meshed. This process produces a surface which can be displayed. Queries of the form 1 and 2 above can be performed either on a data structure based on this mesh, or on the point clouds from which it is derived.
3.1 Rigid body alignment of point clouds

There is a need for a mechanism to align data scanned by a variety of 3-D digitizers. Starting with the work of Chen & Medioni [Chen and Medioni, 1991] and Besl and McKay [Besl and McKay, 1992], considerable attention has been paid to the problem of aligning two scanned surfaces. The most common method for performing such an alignment is some variation of the "Iterated Closest Point" or ICP algorithm (a term introduced by [Besl and McKay, 1992]).
ICP variants align two surfaces by selecting pairs of points, one point from each surface, and then aligning the two point sets element-wise.
The sub-problem of aligning two point sets via a rigid body transform has been known to statisticians since the '60s [Schoneman, 1966], and a generalization of this problem where a scale factor can also be applied to the points is known as Ordinary Procrustes Analysis (OPA) [Dryden and Mardia, 1997].
A closed-form solution to the Procrustes problem was introduced in [Schoneman, 1966] using Singular Value Decomposition (SVD) (see e.g. [Golub and Loan, 1996a]). Alternative methods are known, including [Horn, 1987, Horn et al., 1988]. A recent study of four closed-form algorithms [Eggert et al., 1997] has shown that there is little difference in performance between these methods.
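For reference, the SVD-based closed-form solution of the rigid (partial) Procrustes problem can be sketched as below. This is the standard construction rather than text reproduced from the patent, and the function name is illustrative.

```python
import numpy as np

def rigid_procrustes(P, Q):
    """Closed-form rigid alignment (R, t) minimising sum ||R p_i + t - q_i||^2
    via SVD of the cross-covariance.  P, Q: (n, 3) corresponding point sets."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])   # guard against reflections
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```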
The problem of aligning multiple surfaces has received less attention than the two-surface problem. It is possible to use a pairwise algorithm to align multiple surfaces, but this is likely to lead to inaccuracy, as no global error minimization is performed.
Previous work on multi-surface registration includes [Pulli, 1999, Bergevin et al., 1996, Eggert et al., 1998, Stoddart and Hilton, 1996]. (The work in [Bergevin et al., 1996] is the basis for the alignment in PolyWorks.) We can consider the problem of aligning two surfaces to be an extension of Ordinary Procrustes, by reducing it to a sequence of Procrustes problems. Similarly it is possible to use the solution to the "Generalized Procrustes" problem to match multiple surfaces together. Although this approach has not been taken before, it offers a simple and robust method for alignment of multiple surfaces acquired from a range scanner. We call this algorithm the "Generalized Iterated Closest Point" or GICP algorithm for obvious reasons.
Generalized Procrustes Analysis (GPA) was introduced by [Gower, 1975] and [Berge, 1977]. Other work includes [Goodall, 1991, Dryden and Mardia, 1997]. We are interested in rigid body motions only (termed Generalized Partial Procrustes, or GPPA, in [Dryden and Mardia, 1997]).
The Generalized (Partial) Procrustes problem can be posed as follows. We are given a set of configurations C_i, 0 <= i < n, with each configuration C_i consisting of an ordered set of 3-dimensional points p_ij, 0 <= j < m. Let the distance between configurations C_i and C_k be d(C_i, C_k) = Σ_{j=0}^{m-1} ||p_ij - p_kj||². Let a rigid body transform T take a configuration C to a new configuration TC by applying T to the points in C. Then we seek a mean configuration C̄ and transformations T_i, 0 <= i < n, to minimize the expression

    inf_{T_i, C̄} Σ_i d(T_i C_i, C̄)
Although this looks daunting there is a straightforward iterative algorithm to determine the mean configuration C̄ and the transforms T_i [Berge, 1977]:

1. Center the configurations to remove location, so that the "center of mass" of each is at (0, 0, 0)^T. Call these configurations C_i.
2. For the i-th configuration let C̄_i = 1/(n-1) Σ_{j≠i} C_j, where arithmetic on configurations is defined point-wise. Then perform an Ordinary Procrustes superposition of C_i on C̄_i.
3. Superpose the n configurations in turn, as described in step 2. Repeat until the Procrustes sum of squares is not reduced further.
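A sketch of this iterative GPPA algorithm, reusing the `rigid_procrustes` function from the earlier sketch; the iteration cap and tolerance are illustrative assumptions.

```python
import numpy as np

def generalized_procrustes(configs, iters=50, tol=1e-10):
    """GPPA sketch: configs is a list of (m, 3) arrays of corresponding points.
    Requires rigid_procrustes() from the sketch above."""
    X = [C - C.mean(axis=0) for C in configs]           # step 1: center
    prev = np.inf
    for _ in range(iters):
        for i in range(len(X)):                         # step 2: superpose on mean of others
            others = sum(X[j] for j in range(len(X)) if j != i) / (len(X) - 1)
            R, t = rigid_procrustes(X[i], others)
            X[i] = X[i] @ R.T + t
        mean = sum(X) / len(X)
        ss = sum(((Xi - mean) ** 2).sum() for Xi in X)  # Procrustes sum of squares
        if prev - ss < tol:                             # step 3: stop on no improvement
            break
        prev = ss
    return X, mean
```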
We seek to generalize this algorithm to operate with surfaces rather than configurations. This generalization is described in Section 3.1.2. The generalized algorithm requires a pairwise alignment algorithm, and this is described in Section 3.1.1. The pairwise algorithm is a variant of standard ICP, which itself can be viewed as a generalization of Ordinary Partial Procrustes Analysis to operate with surfaces rather than configurations.
3.1.1 Two-surface alignment

The two-surface alignment algorithm we use is specialized to point cloud data, although variants could be used with data of any kind. We have point clouds corresponding to the two surfaces. Each point cloud consists of:

- A pose, containing information about the translation and rotation applied to the point cloud. This information is updated during computation. The pose is represented as a rigid body transform T.
- A point cloud (size 768 x 576, or 442,368 points, for the Wicks & Wilson scanner), where each point contains an x, y, z location in space. The point cloud is arranged as a rectangular array of pixels. Given a point p = (x, y, z) in the point cloud, its location is Tp.
- A binary mask image, the same size as the point cloud, where a pixel is 1 if the corresponding x, y, z pixel is valid, 0 otherwise.
- (Optionally) a colour image, again of the same size, which shows the appearance of the surface. This information may be useful for matching in some circumstances.
We assume that coarse alignment has already been performed, by which we mean that points on the two surfaces are less than some fixed distance (say 2-3 cm) apart.
In [Rusinkiewicz and Levoy, 2001] a taxonomy of ICP variants is given. The taxonomy is based upon the following stages of the ICP algorithm.²

Selection: How to determine which points on either surface are selected to be paired with points on the other surface.
Matching: Given a selection of points on a surface, how they are matched with the "corresponding" points on the other surface.
Weighting: How point pairs are weighted. This can be a uniform weighting, or weighting based upon the likelihood of the points matching, or weighting to account for the effects of scanner noise and geometry.
Rejection Criteria: What criteria to use to completely reject a point pair. This is effectively the same as assigning a weight of 0.
Error Metric: The metric used to measure the distance between surfaces. A sum of squared distances measure is the most common.
Error Minimization: The search strategy used for error minimization.
Convergence: This is not one of the elements of the taxonomy of [Rusinkiewicz and Levoy, 2001], but is important to both the correctness and efficiency of the algorithm.

We consider each of these areas in turn, after a brief overview of the algorithm.

² They assume that meshes are used to represent the surfaces, but our use of point clouds makes little difference.
Overview. Given two point clouds A and B we align them by fixing A and iteratively finding a rigid body transform T that minimizes ||TB - A||².

At each step k of the iteration we have a transformation T_k, where T_0 is the identity. Each iteration involves the following steps.
1. Select a set of points S_B = {p_i, 0 <= i < n} on T_k B (see "Selection" below for details).
2. For each p_i in S_B choose a corresponding point q_i on A. Reject p_i if the corresponding point is too far away, or the normals are too different (the dot product of the normals at p_i and q_i is too small). This prunes S_B and creates a corresponding S_A.
3. Do Ordinary (Partial) Procrustes Analysis to determine the transform S_k taking S_B to S_A. In order to improve noise resistance a reweighted least squares approach is used, described in more detail under "Weighting and Error Minimization" below. If the error ||T_k B - A||² has not improved, STOP (see "Convergence" below for details of the convergence criterion).
4. Set T_{k+1} = S_k T_k.
5. Goto 1.

Selection. There are many point selection strategies. A comprehensive discussion can be found in [Rusinkiewicz and Levoy, 2001]. Strategies include:

- Selection of all points [Besl and McKay, 1992]. This is expensive when we are dealing with point clouds, as there are typically 200,000 or more points.
- Uniform subsampling [Turk and Levoy, 1994]. This is the approach we take. Typically 1 point in 16 is selected (subsampling the point cloud by 4 in each direction). Since the point cloud is in the form of an image there is an argument for subsampling by treating the point cloud as an image and resampling by averaging; this would reduce any aliasing effects. This approach produced results indistinguishable from those presented, so has been ignored for the sake of simplicity.
- Random sampling (with a different sample at each iteration) [Masuda et al., 1996]. This is an interesting (and relatively cheap to implement) strategy. It seems likely to be more effective when only a small subset of the points is selected.
- Maximizing the distribution of normals [Rusinkiewicz and Levoy, 2001]. This strategy is effective when registration is determined by a small "feature" on the surface. Again, this is less necessary when a reasonably large proportion of the surface is sampled (although see the comments above about aliasing).

It is also possible to select points from both surfaces (rather than just one), with any of these strategies.
We have employed two strategies. The first is the simplest: a uniform subsampling of the point cloud, typically taking every 16th point. The second³, which is much more effective for the human body, involves selecting more points near the boundary of the point clouds, and selecting relatively more points on the body's extremities. We have found that the selection strategy has a significant effect on the performance of the ICP algorithm.
Matching. A critical aspect of an ICP algorithm is the method by which matching points on a surface are found. Various strategies are possible for matching point p on surface B to a point on surface A. These include:

1. Use the closest point on the other surface [Besl and McKay, 1992].
2. Shoot a normal from p to A. We use a slight variant of this [Chen and Medioni, 1991]: if q is the point on A where the normal from p intersects, then we choose the closest point to p on the tangent plane to A through q.
3. Project the point p to A using the camera used to capture point cloud A. Our experiments indicate that, although fast, this method is less useful than the others, as it is possible that the optimal direction of movement is almost perpendicular to the camera normal.

It is also possible to use information from the photographic images to refine the match position. This approach has potential but is contraindicated for two reasons in our body scanning application, and has not been implemented. First, the angle between cameras is often greater than 90 degrees, which makes any form of intensity-based matching problematic; and second, large areas of uniform skin do not provide a great deal of useful matching information.
³ Due to Kevin Keeler of Wicks & Wilson Ltd.
The approach we have followed is a variant of strategy 2 above. A hierarchical bounding box structure is used to accelerate the computation involved in intersecting a ray with a surface. The unoptimized system we are using can compute 20,000 intersections per second, which is sufficient.
Normal shooting is a very effective technique. It does however depend on reasonably accurate surface normals. In some circumstances the Wicks & Wilson scanner can produce significant noise in surface normals. We have found a certain amount of image-based smoothing produces acceptable results. Figure 2 shows an unprocessed scan (cosine shaded) containing noise. Figure 3 shows the filtered scan. Some image resolution has been lost, but the normals are consistent enough for normal shooting to be successful.
Rejection Criteria. Since scans can be noisy it is important to reject matches which are unlikely and (less importantly) to provide appropriate weighting for other points. The weighting issue is discussed below. We reject points based on the following criteria:

- The distance between the points exceeds a user-defined threshold. Typically this will be 2-3 cm for a body scan.
- The dot product between the surface normals at the two points is less than a user-defined threshold. Typically this will correspond to an angle of about π/4 radians.
In [Turk and Levoy, 1994] points were also rejected if they were close to the boundary of a surface. This does not seem to be an issue for the Wicks & Wilson scanner, so we have not implemented it.
Weighting and Error Minimization. The distinction between weighting and rejection of points is not clear cut, since a weight of 0 is rejection. A qualitative distinction can be made however. Points that are "clearly" wrong should be rejected, and the remaining points should provide an "approximately" correct match without further weighting; weighting should be used to refine an initial estimate.

In our implementation weighting is implemented via an iteratively reweighted version of Ordinary (Partial) Procrustes. The algorithm works as follows.

1. Set all point weights to 1.
2. Solve the weighted Ordinary Procrustes problem. This is exactly the same as the unweighted problem, with the exception that each source and destination point is scaled by its appropriate weight beforehand.
3. Calculate the mean and standard deviation σ of the residual distances between matching points. Define two thresholds t_a and t_b; typically t_a is 1 and t_b is 2. For any point pair, if its deviation d from the mean is less than t_a σ its weight is set to 1; if d is between t_a σ and t_b σ its weight is set to (t_a σ / d)²; otherwise the weight is set to 0. This is a standard reweighted least squares algorithm. Convergence to the "correct" solution will happen, assuming that the initial solution is sufficiently close to the "correct" solution.
4. Compute the sum of squares difference of the new and previous weights. If this is within a user-defined threshold, stop. This threshold is typically set to 10^-2.
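Step 3 of the reweighting scheme can be sketched as a small function; the boundary handling at the thresholds and the function name are assumptions.

```python
import numpy as np

def update_weights(residuals, t_a=1.0, t_b=2.0):
    """One reweighting pass (step 3): residuals are per-pair match distances.
    Weight 1 inside t_a*sigma, (t_a*sigma/d)^2 between t_a and t_b sigma, else 0."""
    sigma = residuals.std()
    d = np.abs(residuals - residuals.mean())   # deviation from the mean residual
    w = np.zeros_like(residuals)
    w[d < t_a * sigma] = 1.0
    mid = (d >= t_a * sigma) & (d < t_b * sigma)
    w[mid] = (t_a * sigma / d[mid]) ** 2
    return w
```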
Convergence. Detection of when the algorithm has converged is important for efficiency. The Procrustes distance between the two point sets, as found in the algorithm above, is used to measure the goodness of fit. There seems to be no absolutely best way to determine whether convergence has been reached, so a heuristic is used. The stopping criterion is that, given the sequence of Procrustes distances at each iteration d_0, ..., d_k, and an integer length l, the running average of the last l iterations is calculated at each iteration, and if the current average has improved by less than a threshold, iteration is stopped.
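A sketch of this running-average stopping heuristic; interpreting the threshold as a fractional improvement follows the GICP convergence test of Section 3.1.2 and is an assumption here.

```python
def converged(distances, window=3, threshold=0.10):
    """Stop when the running average of the last `window` Procrustes distances
    improves by less than `threshold` (fractional) over the previous window."""
    if len(distances) < 2 * window:
        return False
    prev = sum(distances[-2 * window:-window]) / window
    curr = sum(distances[-window:]) / window
    return (prev - curr) < threshold * prev
```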
3.1.2 Multi-surface alignment

The multi-surface alignment algorithm is based on the Generalized Partial Procrustes algorithm described in Section 3.1. The central idea is to construct a (point based) mean model, and to align all the models with this mean model. Thus, unlike the approach of [Bergevin et al., 1996], all the point clouds are moved.

The algorithm we use is as follows:

1. Compute the mean surface (see below for details).
2. Align each surface to the mean, using re-weighted Ordinary Partial Procrustes.
3. If convergence is achieved, STOP, else goto 1.
The Mean Surface. The mean configuration C̄ for Generalized Partial Procrustes Analysis (GPPA) is defined as 1/n Σ_{i=0}^{n-1} C_i, where arithmetic is point-wise. What we term the mean surface is a generalization of the mean configuration, and is also defined point-wise.

The mean surface is constructed point-wise. Let there be n surfaces S_0, ..., S_{n-1} to be aligned. For each surface S_i select m points p_0^i, ..., p_{m-1}^i. Each point p_k^i has an associated normal n_k^i.

For each point p_k^i and each surface S_j, j ≠ i, locate the matching point q_k^j, as defined in Section 3.1.1, with corresponding normal n_k^j. Reject q_k^j if the rejection conditions of Section 3.1.1 are met. Define the mean point q̄_k as 1/|J| Σ_{j in J} q_k^j, where J is the set of j such that q_k^j exists and meets the conditions of Section 3.1.1. The mean normal n̄_k is defined similarly, as the (normalized) average of the n_k^j. If |J| > 0 (i.e. at least one match was accepted) then q̄_k is an element of the mean surface, and the mean surface is the set containing all such points, together with the corresponding normals.

Surface alignment. Once the mean surface has been constructed, each surface S_i can be aligned against it as follows. The points of the mean surface are P = {q̄_k, k in K}, with associated normals n̄_k. The ICP algorithm outlined in Section 3.1.1 can be used to align S_i with P, by finding points on S_i corresponding to the points in P and proceeding as in Section 3.1.1. Once the transform T aligning S_i to P is determined, the pose of S_i is updated accordingly.
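The mean surface construction might be sketched as follows, assuming a `match(i, j, p, n)` helper that applies the matching and rejection tests of Section 3.1.1 and returns a matched (point, normal) pair or None. The helper and the `min_matches` cut-off are illustrative assumptions.

```python
import numpy as np

def mean_surface(samples, normals, match, min_matches=1):
    """Point-wise mean surface.

    samples[i], normals[i] : (m, 3) points/normals selected on surface i
    match(i, j, p, n)      : (q, nq) on surface j matching (p, n), or None
    """
    n_surf = len(samples)
    mean_pts, mean_nrm = [], []
    for i in range(n_surf):
        for p, nrm in zip(samples[i], normals[i]):
            hits = [match(i, j, p, nrm) for j in range(n_surf) if j != i]
            hits = [h for h in hits if h is not None]     # the set J
            if len(hits) >= min_matches:
                qs = np.array([q for q, _ in hits])
                ns = np.array([nq for _, nq in hits])
                mean_pts.append(qs.mean(axis=0))          # mean matched point
                nbar = ns.mean(axis=0)
                mean_nrm.append(nbar / np.linalg.norm(nbar))
    return np.array(mean_pts), np.array(mean_nrm)
```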
Convergence. We use the same strategy for determining the convergence of GICP as for ICP. At each iteration the squared residuals for each point are summed, and the overall residual recorded. If the running average over some user-defined number of iterations (typically 3) does not decrease by more than a specified factor (typically 10%), convergence is achieved.
Use of tie points. It can be useful, although not in fully automatic operation, to insist that the algorithm brings identified points into exact alignment. This is the case when the user can identify anatomical landmarks between different point clouds.
It is possible to allow for tie points within the GICP framework in a natural manner. Assume that we wish to tie point p_i on surface S_i and p_j on surface S_j, where the points have associated normals n_i and n_j. We can add an additional point to the mean surface with location (p_i + p_j)/2 and normal (n_i + n_j)/2 (normalized). This point should be weighted appropriately, and the alignment process of Section 3.1.2 then takes place with this additional point.

3.2 Non-rigid alignment of point clouds

Even with optimal adjustment via rigid body transforms, the point clouds are not perfectly aligned. The residual differences are due to movement by the subject (since in the Wicks & Wilson scanner each point cloud takes some time to scan, and the point clouds are scanned sequentially), and to other residual errors which have not been quantified.

Since the final output from the scanner is a consensual surface, derived from the point clouds, we need some method of reconciling the differences between point clouds.

We have developed a novel method of non-rigid alignment of point clouds which brings the different point clouds into exact registration. The consensual surface is then defined unambiguously by the union of the individual point clouds. A suitable definition is: given a point p on point cloud S_i, and a ray r = (p, n) with origin p and direction n normal to S_i at p, then if r intersects a point cloud S_j, i ≠ j, at point q, and ||p - q|| < ε, then p and q are considered to be the same point, otherwise they are different. Typically, for a body scan, ε will be less than 1 mm.
In terms of queries 1 and 2 of Section 3, the closest point query can be answered from the point clouds by taking the closest point among any point cloud, and the intersection query by taking the closest intersection with any point cloud.
Our method of producing the exact alignment between point clouds uses a generalization of the ICP algorithm, using elastic warps rather than rigid body transforms. This is used by the Generalized ICP algorithm in the same way as it uses the standard ICP algorithm. We now explain the non-rigid ICP algorithm.
3.2.1 Non-rigid ICP

Let there be two surfaces S_0 and S_1, with S_0 being the mean surface, consisting of points p_i, 0 <= i < n, and normals n_i. The surface S_1 is a point cloud, and will be non-rigidly aligned with S_0.

For convenience we align S_0 to S_1 and then reverse this alignment to align S_1 to S_0. We do this since the mean surface S_0 is approximated by a sparse set of points, whereas S_1 is approximated more densely as a range map. As will be seen, it is computationally simpler to align S_0 to S_1 in this case.

The motivation for the alignment is to generate a warp from S_0 to S_1 which matches the two surfaces in regions of overlap, and which is in some sense minimal, by which we mean it produces the least distortion of S_0. Since noise is present in the sensor measurements the warp should be approximating, and allow alignment errors within ε to remain. The approximating warp replaces the rigid body transform used in standard ICP.
The approximating warp W is performed using a radial basis warp. This is a warp from R³ to R³, which generates a warp from S_0 to S_1. The form of the warp is:

    W(p) = Σ_{i=0}^{n} w_i ||p - p_i|| + Tp                                   (1)

where w_i = (w_i^x, w_i^y, w_i^z) and T is a 4 x 4 affine matrix. Let the initial matches to the points p_i be q_i, and let d_ij = ||p_i - p_j||; then the coefficients w_i can be obtained by solving Equation 2:

    | d_00 + λ ...  d_0n      1  p_0^x  p_0^y  p_0^z | | w_0 |   | q_0 |
    |   ...         ...                              | | ... |   | ... |
    | d_n0     ...  d_nn + λ  1  p_n^x  p_n^y  p_n^z | | w_n |   | q_n |
    | 1        ...  1         0  0      0      0     | | d   | = | 0   |     (2)
    | p_0^x    ...  p_n^x     0  0      0      0     | | a   |   | 0   |
    | p_0^y    ...  p_n^y     0  0      0      0     | | b   |   | 0   |
    | p_0^z    ...  p_n^z     0  0      0      0     | | c   |   | 0   |

where λ is a smoothing term. Writing D for the matrix of distances d_ij and P for the matrix whose rows are (1, p_i^x, p_i^y, p_i^z), we can write Equation 2 as

    | D + λI   P | | W |   | Q |
    | P^T      0 | | A | = | 0 |                                             (3)

The bending energy is defined as BE(q) = sum(W^T D W), where sum(v) = Σ_{i=0}^{k-1} v_i for any vector v of length k. If we solve Equation 3 by inverting the left-hand matrix, we get

    | W |   | A_1  A_2 | | Q |
    | A | = | A_3  A_4 | | 0 |                                               (4)

It can be shown (see e.g. [Dryden and Mardia, 1997]) that BE(q) = q^T A_1 q. Thus the bending energy, for any particular set of points p_i, depends only on the locations of the points q_i. The derivative of the bending energy is ∂BE(q)/∂q = 2 A_1 q. It can be shown that the bending energy is non-positive, hence λ should be negative.
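A sketch of solving Equation 2 with numpy and evaluating the warp of Equation 1 follows. Dense `np.linalg.solve` is an illustrative choice that only suits the few-hundred-point systems mentioned later in this section, and the function name is an assumption.

```python
import numpy as np

def solve_rbf_warp(P, Q, lam=0.0):
    """Solve Equation 2 for W(p) = sum_i w_i ||p - p_i|| + Tp.
    P: (n, 3) source points, Q: (n, 3) target matches, lam: smoothing term."""
    n = len(P)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)   # d_ij
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = D + lam * np.eye(n)        # D + lambda*I block
    A[:n, n] = 1.0;  A[:n, n + 1:] = P     # affine columns (1, x, y, z)
    A[n, :n] = 1.0;  A[n + 1:, :n] = P.T   # side conditions P^T
    rhs = np.zeros((n + 4, 3))
    rhs[:n] = Q
    sol = np.linalg.solve(A, rhs)
    w, affine = sol[:n], sol[n:]           # radial weights and affine part

    def warp(p):
        r = np.linalg.norm(p - P, axis=1)  # ||p - p_i||
        return r @ w + affine[0] + p @ affine[1:]
    return warp

# The bending energy of Equation 3 can then be evaluated as, per coordinate,
# BE = np.einsum('ic,ij,jc->', w, D, w)
```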
The iterative algorithm uses this relationship to search for optimal locations for the points q_i, given an initial approximation. The q_i are constrained to lie on the surface S_1 by the simple expedient of reparameterizing in terms of the u-v coordinates of the depth map image for S_1. At any image location u = (u, v) there is a corresponding 3-D point q. We therefore have that ∂BE/∂u = ∂BE/∂q · ∂q/∂u, and thus a direct route for optimization of the bending energy.
It is useful, in terms of the computational efficiency of the solution, for the function ∂BE(q)/∂u to have continuous derivatives, since nonlinear solvers that can make use of function derivatives are in general more efficient. This requires that ∂q/∂u be continuously differentiable; in other words, the depth map must be continuously differentiable. A simple way to ensure this is to approximate the image using a piecewise bicubic function. Assuming that the image at integer location (i, j) has value I(i, j), then for any point (u, v) the approximated image value is

    I(u, v) = Σ_{k=0}^{3} Σ_{l=0}^{3} B_k(s) B_l(t) I(i + k, j + l)          (5)

where s = u - ⌊u⌋, t = v - ⌊v⌋, i = ⌊u⌋ - 1 and j = ⌊v⌋ - 1. The B_k are defined as:

    B_0(t) = a t³ - 2a t² + a t
    B_1(t) = (a + 2) t³ - (a + 3) t² + 1
    B_2(t) = -(a + 2) t³ + (2a + 3) t² - a t
    B_3(t) = -a t³ + a t²

The parameter a must lie in the range -1 <= a < 0; a reasonable value is -0.5 (see e.g. [Wolberg, 1990], page 130). The partial derivatives of the image w.r.t. s and t can be found from Equation 5.
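The four kernels and Equation 5 in code; a sketch that assumes (u, v) lies far enough inside the image for the 4 x 4 neighbourhood to exist.

```python
import math

def cubic_weights(t, a=-0.5):
    """The four kernels B_0..B_3 of Equation 5 (cubic convolution, a in [-1, 0))."""
    return (
        a*t**3 - 2*a*t**2 + a*t,               # B_0
        (a + 2)*t**3 - (a + 3)*t**2 + 1,       # B_1
        -(a + 2)*t**3 + (2*a + 3)*t**2 - a*t,  # B_2
        -a*t**3 + a*t**2,                      # B_3  (weights sum to 1)
    )

def bicubic(I, u, v):
    """Continuously differentiable image value I(u, v), per Equation 5."""
    i, j = math.floor(u) - 1, math.floor(v) - 1
    s, t = u - math.floor(u), v - math.floor(v)
    Bs, Bt = cubic_weights(s), cubic_weights(t)
    return sum(Bs[k] * Bt[l] * I[i + k][j + l]
               for k in range(4) for l in range(4))
```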
To recapitulate: the overall problem is to find q = {q_0, ..., q_n} which minimizes BE(q), subject to the q_i being on surface S_1. We reparameterize by defining a mapping R: R² -> R³ from the range image to the range surface. Given that R(u_i) = q_i, with u_i = (u_i, v_i), we want to minimize BE(R(u)).
The number of points defining S_0 which overlap with points of S_1 is typically between 100 and 500. This number is chosen as the practical limit for solving Equation 2 efficiently. In order to optimize the bending energy we therefore need to minimize BE(R(u)), with u consisting of 200-1000 variables. A suitable code for this is the LBFGS method [Liu and Nocedal, 1989]. This method requires a function with analytic derivatives (and that the function is continuously differentiable). These requirements are met as described above.
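With SciPy, the minimization might look like the sketch below. `bending_energy_and_grad` is an assumed callable returning the pair (BE, ∂BE/∂u) by chaining ∂BE/∂q with ∂q/∂u from the bicubic approximation; `scipy.optimize.minimize` with `jac=True` expects exactly such a pair. This stands in for the LBFGS code of [Liu and Nocedal, 1989] used in the text.

```python
import numpy as np
from scipy.optimize import minimize

def optimise_uv(u0, bending_energy_and_grad):
    """Minimise BE(R(u)) over the flattened (u, v) parameters u0: (n, 2)."""
    res = minimize(bending_energy_and_grad, u0.ravel(),
                   jac=True, method='L-BFGS-B',
                   options={'maxiter': 200})
    return res.x.reshape(u0.shape)   # optimised u-v locations of the q_i
```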
The choice of the smoothing parameter λ of Equation 2 should be made so as to reflect the amount of noise in the range map. It can be shown (see e.g. [Duchon, 1977, Wahba, 1990]) that the solution to Equation 2 minimizes Σ_{i=0}^{n} ||q_i - p_i||² + λ BE(q). If the range errors are normally distributed with mean 0 and variance σ², then a reasonable value for Σ ||q_i - p_i||² is (n+1)σ², since we are not then requiring more accuracy than is in the data. Call this value d_T. For any choice of q = (q_0, ..., q_n) on S_1, let d(q) = Σ_i ||q_i - p_i||².

We can estimate an appropriate value for λ as follows. Let d_0 be the initial value of d(q), given the initial choice of q; in this case the bending energy will be 0. Let BE_0 be the value of the bending energy when Equation 2 is solved with λ = 0; in this case d will be zero. An approximate choice for λ is then given by BE_0 d_T / d_0.
The end result of this procedure is a set of points q* on S_1 which match the points p_i and define an optimal elastic registration between S_0 and S_1. These points are used to define a warp from q* to p, which is applied to S_1.
3.2.2 Non-rigid Generalized ICP

Since S_0 is an approximation of the true mean surface, exactly the same procedure as with the rigid-body version of Generalized ICP is used to iterate the non-rigid Generalized ICP. The only difference between the two algorithms is in the procedure aligning a point cloud with the current mean surface. In general few iterations of the non-rigid algorithm are performed, assuming that the value of λ is set as described. A higher value of λ allows less elasticity in the registration warp, and therefore requires more iterations. A reasonable strategy in order to avoid local minima in the registration is to start with a high value of λ and finish with the λ suggested above.
3.3 Creation of meshed surface

Once the individual point clouds have been aligned, the point clouds can be used for queries 1 and 2 of Section 3. For display purposes it is more convenient to display the data using a triangle mesh. In the interests of speed this mesh is constructed in pieces, one for each point cloud. The mesh for each point cloud is constructed using a quadtree structure. This allows the mesh to be constructed within a given error tolerance (typically within 1 mm).
4 Fitting a Template Model

This section discusses the problem of fitting a template model to the initial surface model created for the scanned data. The template model represents a standard body in a standard pose and we would like to change the shape of the template to match the data. This is done by determining a space warp that fits the template model surface to the data surface. This warp represents both the change of pose and shape between the template and the data; factoring these out is left until later, if it is required.
4.1 The Multi-resolution Template Model

A multi-resolution mesh is used for the template body model. This multi-resolution mesh comprises two levels of detail, namely a coarse resolution mesh and a fine resolution mesh. The vertices of the coarse resolution mesh are a subset of the vertices of the fine resolution mesh. The fine resolution mesh is created by subdividing the faces of the coarse resolution mesh until a sufficient level of detail is attained. More detail of the creation of the template model is given in Section 4.2.
The topology of the template model is symmetric about a central plane. Each vertex can be considered to be either a central vertex, a left vertex or a right vertex. Each left vertex has a counterpart right vertex and vice versa. This allows the optional creation of geometrically symmetric body models, which can be advantageous in some applications.
The template body model is also segmented into nine separate components connected by joints, which allows some basic posing of the model. These components are the trunk, the upper and lower arms (both left and right) and the upper and lower legs (again both left and right). The upper arms are connected to the trunk by the shoulder joints, the upper and lower arms are connected by the elbow joints, the upper legs are connected to the trunk by the hip joints and the upper and lower legs are connected by the knee joints. Note that the component and joint names are only a convenient notation and not necessarily anatomically correct. The joint position between two components is simply the centroid of the vertices shared between those two components.
The multi-resolution mesh enables an initial coarse fit followed by a detailed fit. First the coarse mesh is fitted to the body data to get the gross shape and pose. The fit of the coarse mesh is then used to estimate where the vertices of the detailed mesh should be placed.
Finally the vertices of the detailed mesh are moved onto the body surface to get the fine detail of the body shape.
4.2 Creation of the Multi-Resolution Mesh

The initial coarse mesh is created manually, with a maximum of a couple of hundred vertices. These vertices are placed onto a suitable body shape which we will call the template body shape. The detailed mesh is then created by recursively subdividing faces which are not well matched to the template body shape. A face is said to match the template body shape if the distance from the edge midpoints and the face midpoint to the template body shape is below a specified threshold.

A face is subdivided by adding a vertex at the midpoint of each of its edges and then replacing the face by four smaller faces. This is demonstrated in Figure 4. Each of the new vertices must also be moved onto the surface of the template shape. This is done by moving the vertex along the original face normal until it is on the surface of the template shape. Because these distances can be relatively large the placement might not be exactly right, so some manual adjustment is allowed.

Apart from the fit test there are some additional considerations when subdividing faces. To maintain topological symmetry, if a face on the left is subdivided then the corresponding face on the right must be subdivided, and vice versa. Some extra subdivision has to be done to avoid gaps in the mesh if adjacent faces are subdivided differently. This is demonstrated in Figure 5.
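The 1-to-4 subdivision of Figure 4 can be sketched as follows; vertex positions are left to the caller (new vertices must still be projected onto the template surface as described above), and the function names are illustrative.

```python
def subdivide(faces):
    """1-to-4 triangle subdivision: add a vertex at the midpoint of each edge
    and replace each face (a, b, c) by four faces.  Faces are index triples;
    returns the new faces plus the edge -> midpoint-index map."""
    midpoint = {}
    next_index = 1 + max(i for f in faces for i in f)
    def mid(a, b):
        nonlocal next_index
        key = (min(a, b), max(a, b))          # shared edges share one midpoint
        if key not in midpoint:
            midpoint[key] = next_index
            next_index += 1
        return midpoint[key]
    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return new_faces, midpoint
```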
4.3 Initial Fit

The initial fit of the template brings the vertices of the coarse template mesh into appropriate positions on the surface of the body. This captures both the gross shape and pose of the body. Both manual and automatic approaches for the initial fit are discussed in the following two subsections.
4.3.1 Manual Fitting

A user interface is provided which enables the user to quickly manipulate the coarse mesh onto the data. For an experienced user this process should take no more than a couple of minutes. To begin with, the template mesh and the data are approximately aligned by aligning their oriented bounding boxes. The user can then translate and scale the template to get the trunk approximately aligned. Different scale factors can be introduced along each axis.
The user then moves the trunk vertices into position on the surface of the body data. Because the coarse mesh does not convey much of the detailed shape of the template body, it is difficult to assess exactly where on the data to place template mesh vertices. To avoid this problem the user has the option to display the full resolution mesh whilst manipulating the coarse mesh. The details of how the shape of the fine mesh is determined from the coarse mesh are described in Section 4.4. Determining this mesh is relatively computationally expensive, so it is not displayed while the vertices are being dragged.
Once the trunk is in position the upper arms and upper legs are posed by dragging the elbow joints and knee joints into place respectively. This is done in both front and side views so that the 3-dimensional position of the joint is properly specified. Again the vertices of these body parts are then moved onto the surface of the body data. Note that any twisting of these parts in the data with respect to the model is accounted for when moving the template vertices onto the body data surface.
Finally the forearms and lower legs are posed by dragging the wrist and ankle joints into position respectively and the associated vertices are moved onto the body data surface.
4.3.2 Automatic Fitting
The manual approach for initial fitting is acceptable in some situations, but in general it is advantageous to have this process automated. The later stages of the modelling process require no manual intervention, so by automating the initial fit the whole process becomes fully automated.
Provided that the detailed template mesh and the scanned body data are in a similar pose, an initial alignment can be determined by aligning their oriented bounding boxes. An oriented bounding box can be represented by a transformation T, where the transformation maps a unit-sided, axis-aligned box at the origin to the oriented bounding box. This is discussed further in Appendix B. Given the oriented bounding boxes for the template model and the data model, T_t and T_d respectively, the initial alignment is given by T_d T_t^{-1}. This is demonstrated in Figure 6.
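A minimal sketch of this initial alignment, assuming each oriented bounding box is supplied as the 4x4 homogeneous transform described above:

    import numpy as np

    def initial_alignment(T_template, T_data):
        # Transform mapping the template into alignment with the data:
        # T_d composed with the inverse of T_t, i.e. T_d @ T_t^{-1}.
        return T_data @ np.linalg.inv(T_template)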
The alignment of the template and data is further refined using an elastic registration algorithm. This is the same as the non-rigid ICP algorithm described in Section 3.2.1, except that here the surfaces being registered are represented by meshes rather than point clouds. Figure 7 presents the results of this algorithm. Although the global alignment is good, some of the smaller features are not so well aligned. This fit should be sufficient to transfer the vertices of the coarse resolution template mesh onto the data.
4.4 Detailed Fit
The initial positions of the vertices in the detailed template mesh are determined from the positions of the vertices in the posed coarse resolution mesh. For each component of the template a space warp is determined which maps template vertices from their original positions to their positions on the body data. The warp determined for each component is applied to the detailed mesh vertices belonging to that component.
The warp for each component is determined as a radial basis warp that maps the vertices of the coarse resolution mesh from their original positions to their positions on the body data. Because adjacent components of the coarse template mesh share vertices, the warps join up at these locations. In general, the warps for two adjacent components will map shared vertices in the fine resolution mesh (that are not in the coarse resolution mesh) to different locations. The positions of these vertices are determined by selecting one of the warps. For vertices shared between the upper arms and the trunk the trunk warp is used. For vertices shared between the lower arm and the upper arm the upper arm warp is used. For vertices shared between the upper leg and the trunk the trunk warp is used. For vertices shared between the lower leg and the upper leg the upper leg warp is used.
4.5 Detailed Conformation
The warp applied to the detailed mesh vertices should bring these vertices relatively close to the surface of the body data, but in general they will not lie on the surface of the body. For the final fit of the template, the template mesh vertices should lie on the body data surface. To achieve this, the positions of the template mesh vertices are optimized so as to minimize their distance from the data surface whilst minimizing the bending energy in the warp needed to achieve this. The details of the algorithm used to achieve this are presented in Appendix C.

4.6 Consistent Parameterization
We assume that an initial fit of a surface to the scanned data exists. The data surface, S, can be represented as a triangle mesh or otherwise. We assume that the data surface supports a function closest_S : R^3 -> R^3 which locates the closest point on the data surface to any point in space. This is the only requirement on the data surface.
The task is to conform a template mesh to the data surface. The template mesh is segmented into poseable components. Typically these will be trunk, head, upper-arm, lower-arm, hand, upper-leg, lower-leg, foot. Each has a fixed shape for any particular body, with its location and orientation variable, subject to attachment constraints. We assume that the initial fit poses the template components correctly.
We call the conformed template mesh the data mesh. The template and data meshes are topologically equivalent. Let the template mesh vertices be at locations P = {p_i, 1 <= i <= n} and the data mesh vertices be at locations Q = {q_i, 1 <= i <= n}. The initial fit ensures that each point q_i lies on the data surface. There are many ways to map from P to S by choosing the mapping from P to Q, where the points in Q lie on S. We wish the final mapping from P to Q to be correct. We call such a mapping a consistent parameterization of S. A consistent parameterization has two purposes:
1. Minor differences in the initial fit lead to the same final fit.
2. It is possible to study body shape using Principal Component Analysis via point-wise means of body shapes [Dryden and Mardia, 1997].
We define the mapping to be correct in this sense if it is as smooth as possible, by which we mean as close as possible to an affine mapping, namely a linear transformation in homogeneous coordinates of the form

    [ m_00  m_01  m_02  t_x ]
    [ m_10  m_11  m_12  t_y ]
    [ m_20  m_21  m_22  t_z ]
    [  0     0     0     1  ]
In principle such a mapping could be found using radial basis functions by minimizing bending energy, as described in Section 3.2.1. In practice this would be difficult, since |P|, the number of vertices in the mesh, will typically be 10,000-20,000, and the size of the data space will be too large.
Our method is to provide a discrete approximation to the problem of minimizing overall bending energy, which is computationally tractable. The method can be viewed as a type of ICP algorithm [Besl and McKay, 1992].
We determine mappings component-wise. For the body we start with the trunk, and find a mapping for trunk vertices on the template mesh. This is then fixed, and components attached to the trunk are mapped, with common vertices held fixed. We start by describing the method when no vertices are fixed, and then explain the alteration necessary when the mapping of some vertices on a component is fixed.
Let the template vertices for the component be P = {p_i, 1 <= i <= n} as above, and let the initial fit function map P to Q^0 = {q_i^0, 1 <= i <= n}, where the q_i^0 lie on S. The first step of the algorithm determines an optimal affine mapping as follows:
1. Let k = 0. Let A_0 = I (the identity 4 x 4 matrix). Let ε be a threshold.
2. Determine an affine map A_{k+1} by finding the least squares fit affine map from P to Q^k.
3. Set Q^{k+1} = {closest_S(A_{k+1} p_i), 1 <= i <= n}.
4. If Σ_i |q_i^{k+1} - q_i^k|^2 > ε, increment k and go to step 2; else return A_{k+1}.
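A sketch of this first step, assuming a closest_on_surface query that returns, for an array of points, their closest points on the data surface S (any mesh closest-point routine would serve); the least squares affine fit is solved in homogeneous coordinates:

    import numpy as np

    def fit_affine(P, Q):
        # Least-squares affine map, as a 4x4 homogeneous matrix, taking P to Q.
        Ph = np.hstack([P, np.ones((len(P), 1))])        # n x 4 homogeneous points
        M, _, _, _ = np.linalg.lstsq(Ph, Q, rcond=None)  # solves Ph @ M ~ Q; M is 4x3
        A = np.eye(4)
        A[:3, :] = M.T
        return A

    def optimal_affine(P, closest_on_surface, eps=1e-6, max_iter=50):
        # Alternate a least-squares affine fit with closest-point projection
        # onto S; Q^0 is taken as the closest points to P (the initial fit
        # would supply it in the full method).
        Q = closest_on_surface(P)
        A = np.eye(4)
        for _ in range(max_iter):
            A = fit_affine(P, Q)                             # step 2
            Ph = np.hstack([P, np.ones((len(P), 1))])
            Q_new = closest_on_surface((Ph @ A.T)[:, :3])    # step 3
            if np.sum((Q_new - Q) ** 2) < eps:               # step 4: converged
                break
            Q = Q_new
        return A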
In the case where some vertices are fixed, the fixed locations are used in step 3 above, rather than the closest point on S. The output at this stage is an affine map A, which provides the closest affine map between the template mesh and the data surface. This mapping is not a mapping onto the data surface, and by construction it is possible for it to be discontinuous. In what follows we assume that the set P has been transformed by A. The second step of the algorithm produces a smooth mapping, as close as possible to affine.
This step is iterative, with iteration variable k. We produce a sequence of maps P -> Q^k from P to the data surface S. For each k and p_i in P we define d_i^k = q_i^k - p_i. Let d be a difference function, with the d_i being samples of this function at the p_i. We have that p_i + d(p_i) lies on S. The goal of a smooth mapping from P to S is to maximize the smoothness of the function d. Affine functions (e.g. d(p) = Ap, where A is affine) are, by definition, perfectly smooth.
Our method involves developing a discrete approximation to d, smoothing the approximation, and using this approximation to move the q_i. The function d is then re-computed and the process iterates until convergence is achieved.
Let the bounding box of P ∪ S be V. We define a fine grid of points G = {g_ijk = (x_0 + iδ, y_0 + jδ, z_0 + kδ)^T, 0 <= i, j, k < m}. Typically δ will be 1 cm for a body. The value δ is chosen so that most cells in the grid contain at most one of the p_i. These points are used to define a smooth function d', which agrees with the d_i on the p_i and approximates the smooth function d. A method similar to that described in [Lee et al., 1997] is used to generate the function. Rather than cubic splines, linear splines are used. This means that the values of d' inside grid cells are determined by trilinear interpolation from the values at the cell vertices. Linear splines have the property that the values of the grid coefficients are the values of the distance function at these points. The function d' defined in this way is continuous, with piecewise continuous derivatives.
The function d' is smoothed by treating the grid G as an image and convolving with a radially symmetric smoothing filter (e.g. Gaussian) with weights summing to 1. Affine functions are invariant under such filters; other functions are smoothed. Repeated application of the filter produces an affine function.
With these preliminaries in place, the method used to generate the smooth mapping is as follows:
1. Let k = 0. Q^0 is the set of points on S closest to the corresponding points in P.
2. Determine the set d_i^k, 1 <= i <= n, as the difference vectors between points in P and points in Q^k. If k > 0 and Σ_i |d_i^k - d_i^{k-1}|^2 < ε for some threshold ε, then stop.
3. Extend the d_i^k to a smooth function d^k on grid G as described above. Note that p_i + d^k(p_i) = q_i^k.
4. Apply a Gaussian filter to d^k. This smooths d^k, giving d^{k+1}.
5. Let q_i^{k+1}, 1 <= i <= n, be the closest point on S to p_i + d^{k+1}(p_i).
6. Increment k and go to step 2.
The effect is to smooth the difference in transform between neighboring points in P, and to generate a mapping as close as possible to affine, subject to the constraint that the template mesh must map point-wise onto the data surface. In effect, this is a finite difference version of the radial basis warp described in Section 3.2.1. There is one parameter to the warp, namely the extent of the Gaussian filter. This is typically chosen to be about twice the average distance from any point in P to its closest neighbor.
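The following sketch illustrates the structure of this iteration, again assuming a closest_on_surface query on S. For brevity the linear-spline extension of [Lee et al., 1997] is approximated by normalized convolution: the difference vectors are splatted onto the grid, values and weights are Gaussian-filtered, and the ratio is read back at the p_i.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def smooth_mapping(P, closest_on_surface, origin, delta, dims,
                       sigma=2.0, eps=1e-6, max_iter=100):
        # Iteratively smooth the difference field d between the template
        # points P and their images on the data surface S.
        Q = closest_on_surface(P)
        idx = np.clip(((P - origin) / delta).astype(int), 0, np.array(dims) - 1)
        ix, iy, iz = idx[:, 0], idx[:, 1], idx[:, 2]
        d_prev = None
        for _ in range(max_iter):
            d = Q - P                                   # samples d_i of the field
            if d_prev is not None and np.sum((d - d_prev) ** 2) < eps:
                break                                   # step 2: converged
            vals = np.zeros(tuple(dims) + (3,))
            wts = np.zeros(tuple(dims))
            vals[ix, iy, iz] = d                        # step 3: extend d onto G
            wts[ix, iy, iz] = 1.0
            for c in range(3):                          # step 4: Gaussian smoothing
                vals[..., c] = gaussian_filter(vals[..., c], sigma)
            wts = gaussian_filter(wts, sigma)
            w = np.maximum(wts[ix, iy, iz], 1e-12)[:, None]
            d_smooth = vals[ix, iy, iz] / w             # smoothed field at the p_i
            Q = closest_on_surface(P + d_smooth)        # step 5: re-project onto S
            d_prev = d
        return Q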
If a point p_i in P is fixed to a particular q_i, because we are dealing with a component whose border has already been fitted, then the point q_i is fixed in step 5 above.
4.6.1 The optimization algorithm
The method for consistent parameterization of meshes uses a standard template mesh. Results are improved if the mesh is replaced with the mean shape obtained by averaging a set of bodies. The optimization of the template mesh with respect to a set M_1, ..., M_n of meshes proceeds as follows:
1. All meshes are posed identically.
2. Set the mean mesh M to M_1.
3. For each mesh M_i of M_1, ..., M_n, the optimal affine transform mapping it to M is found, giving mesh M_i'.
4. Let M* be the point-wise average of the M_i'. If the difference between M and M* is less than a preset threshold, stop; else set M = M* and go to step 3.
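A sketch of this optimisation, reading step 3 as aligning each mesh to the current mean by a least-squares affine map; the meshes are assumed to be (n, 3) vertex arrays in identical pose with consistent vertex correspondence:

    import numpy as np

    def affine_align(src, dst):
        # Map src onto dst by the least-squares affine transform and return
        # the mapped points.
        src_h = np.hstack([src, np.ones((len(src), 1))])
        M, _, _, _ = np.linalg.lstsq(src_h, dst, rcond=None)   # 4 x 3
        return src_h @ M

    def optimise_template(meshes, tol=1e-4, max_iter=20):
        M = meshes[0].copy()                            # step 2: mean starts at M_1
        for _ in range(max_iter):
            aligned = [affine_align(Mi, M) for Mi in meshes]   # step 3: M_i -> M
            M_new = np.mean(aligned, axis=0)            # step 4: point-wise average
            if np.max(np.linalg.norm(M_new - M, axis=1)) < tol:
                return M_new
            M = M_new
        return M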
5 Creating a Photorealistic Texture for the Model
Each point cloud captured by the scanner has an associated texture map, containing the visual appearance of that point cloud. It is desirable to use these texture maps to generate a seamless rendering of the appearance of the conformed template mesh. To do this effectively several issues must be addressed.
1. Since the texture maps are captured by different cameras, the overall intensity and color balance of the images can vary. We have developed a simple technique for color calibration of cameras, described in Appendix A, which minimizes these effects. It is important that best efforts be made to color-calibrate the cameras identically in the first place.
2. Since each camera views the body from a different angle, even with identical cameras there can be differences in illumination at a given point if the surface is not Lambertian at that point.
3. There are inevitably errors in the calibration of the cameras and acquisition of the model, which means that the images will not be in perfect registration. In this case any blending of the images is likely to lead to blurring.
4. Some areas of the model will not be visible from any camera due to occlusion. Generally, a body scanner misses parts of the body, such as under the arms or the crotch. Since we are fitting a complete template mesh, some areas of the body will have no texture assigned.
There has been extensive research addressing issues 1-3 in the context of image-based rendering. See for example [Buehler et al., 2001, Debevec et al., 1996, Debevec et al., 1998]. These approaches assume, for the sake of quality, that the texture mapping can be done in a view-based fashion, and that the individual cameras being used are color and intensity calibrated.
Since the models we produce are designed to be viewed on a wide range of hardware, it is infeasible to generate view-dependent models. Instead we generate a single view-independent texture map which blends the images from the eight (or more) individual textures captured by the scanner. We assume that the target conformed mesh M has vertices V = {v_0, ..., v_n} and triangles T = {t_0, ..., t_m}. Each triangle t_i has vertices v_i0, v_i1, v_i2. We assume that each triangle has u-v coordinates associated with each of its vertices. Figure 8 shows a mesh textured in this manner, with a single texture map. Our aim is to replace this generic texture with the texture corresponding to the person being imaged. This is shown in Figure 9. We assume that for each texture we have the camera parameters (internal and external orientation), so that, as in [Debevec et al., 1996], we can project the texture map onto the triangle mesh M. Given this information the method for generating a single texture map proceeds as follows. We assume that the texture maps form a set m_0, ..., m_k.
1. For each triangle t in M determine which texture maps project to t and record this information. A map projects to t if it projects to each of the vertices of t. A map projects to a vertex if there is a line from the camera center to the vertex which does not intersect M. For each map which projects onto t, determine the angle between the ray from the camera to t and the mesh normal at t. This angle will be between 0 and π/2. Set the target map to be the map with minimal angle.
2. Although each triangle is in some sense optimally assigned in step 1, many isolated triangles with a map different to their neighbors are more difficult to merge than larger groupings. We therefore group triangles as far as possible by reassigning triangles from small patches to larger patches. A triangle can only be assigned a texture map if the map projects to that triangle. The result of this step is that large groups of neighboring triangles are assigned the same target map.
3. Since the mesh is manifold we can define an edge on the mesh by two triangles e = (t, t'). A border is defined as a sequence of edges e_0, ..., e_k where the edges adjoin and the triangles of each edge have the same two maps. This is a border between maps, m_a and m_b say. In order to proceed we assign weights to each map on border vertices. All weights start at zero. If a vertex lies on a border between m_a and m_b then the weights of m_a and m_b each get incremented by 1. When all borders have been accounted for, the weights at each border vertex are normalized to sum to 1.
4. For each vertex v with a non-zero weight, let the texture maps be m_0, ..., m_k with weights w_0, ..., w_k, where Σ_{l=0}^{k} w_l = 1. We assume that the texture maps are single channel images with intensities in the range 0-1. The extension to multi-channel images is straightforward. Since we know the projective camera associated with each texture map, we can determine the pixel value p_i for each map m_i at v. Since the projection of v in m_i is in general at non-integral coordinates, we use bilinear interpolation to generate the pixel value. We assign the intensity value Σ_{i=0}^{k} w_i p_i to vertex v, which is the average value of the vertex in patches neighboring v.
5. For each texture map m_i, list all vertices v in M which have non-zero weight and which project into m_i. Let v project onto pixel p_v = (p_x, p_y), which has intensity I_v. Let the "average" value of v as determined in step 4 above be a_v. The difference is δ_v = a_v - I_v. We can now construct a problem in scattered data interpolation. Each vertex v which projects into m_i generates an image location p_v and a value δ_v. We construct a function φ : R^2 -> R which interpolates these values, so that φ(p_v) = δ_v for all v projecting to m_i. The function should be as smooth as possible, so that not only are the images blended but also the overall intensities and colors are, as far as possible, balanced.
Many methods of calculating a suitable φ are possible. One approach is to use a radial basis function. The calculations are similar to those described in Section 3.2.1, but in two dimensions. Specifically, given n 2-D coordinates p_i, with associated values δ_i, we construct a function

    φ(p) = Σ_{i=0}^{n} λ_i ||p - p_i||^2 log(||p - p_i||) + a p_x + b p_y + c        (6)

satisfying the constraint that φ(p_i) = δ_i, 0 <= i <= n. The function of equation 6 can be constructed by solving the system

    [ d_00 ... d_0n   1   p_0x  p_0y ] [ λ_0 ]   [ δ_0 ]
    [  :        :     :    :     :   ] [  :  ]   [  :  ]
    [ d_n0 ... d_nn   1   p_nx  p_ny ] [ λ_n ] = [ δ_n ]        (7)
    [  1   ...  1     0    0     0   ] [  c  ]   [  0  ]
    [ p_0x ... p_nx   0    0     0   ] [  a  ]   [  0  ]
    [ p_0y ... p_ny   0    0     0   ] [  b  ]   [  0  ]

where d_ij = ||p_i - p_j||^2 log(||p_i - p_j||). This function is used as it is optimal in 2D [Powell, 1992].
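A sketch of solving system (7) with numpy; the ordering of the unknowns (λ, then c, a, b) is an implementation choice, and the radial term r^2 log r is evaluated as 0.5 r^2 log r^2 to avoid the singularity at r = 0:

    import numpy as np

    def tps_fit(points, values):
        # points: (n, 2) image locations p_i; values: (n,) differences delta_i.
        n = len(points)
        diff = points[:, None, :] - points[None, :, :]
        r2 = np.sum(diff ** 2, axis=-1)
        with np.errstate(divide="ignore", invalid="ignore"):
            D = np.where(r2 > 0, 0.5 * r2 * np.log(r2), 0.0)   # d_ij = r^2 log r
        K = np.zeros((n + 3, n + 3))
        K[:n, :n] = D
        K[:n, n] = 1.0                 # column for c
        K[:n, n + 1:] = points         # columns for a, b
        K[n, :n] = 1.0                 # side conditions of system (7)
        K[n + 1:, :n] = points.T
        rhs = np.zeros(n + 3)
        rhs[:n] = values
        sol = np.linalg.solve(K, rhs)
        lam, c, a, b = sol[:n], sol[n], sol[n + 1], sol[n + 2]
        return lam, a, b, c

    def tps_eval(p, points, lam, a, b, c):
        # Evaluate phi of equation (6) at a single 2-D point p.
        r2 = np.sum((points - p) ** 2, axis=-1)
        with np.errstate(divide="ignore", invalid="ignore"):
            u = np.where(r2 > 0, 0.5 * r2 * np.log(r2), 0.0)
        return lam @ u + a * p[0] + b * p[1] + c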
The function φ, considered as an image and evaluated at integer coordinates, is then added to the texture map m_i. This creates a new texture map m_i' which is consistent with the pixel values at the vertices v.
6. The final step of the texture creation method is the creation of a unified texture map. We assume that each triangle has texture coordinates which refer to a unified texture map. The process of filling in this map proceeds as follows:
(a) Each triangle, using its default texture coordinates, is projected into the default texture map and painted with a unique color. A suitable value is an integer representing the index of the triangle. At the end of this process the texture map contains pixels which indicate their triangle of origin in the mesh.
(b) Each pixel in the default texture map is visited. If the pixel is labelled with a triangle index, then the barycentric coordinates of the pixel with respect to the triangle's texture coordinates are computed. These coordinates are used to determine the corresponding location with respect to the triangle on the mesh. This value is then projected into the appropriate scanned texture map to determine the value to substitute for the pixel in the default texture map. In this manner all pixels can be assigned values.
In order to avoid aliasing during this procedure it is useful to ensure that the dimensions of the default texture map are consistent with the scanned maps, or to sample pixel values from the scanned maps in such a manner as to ensure that aliasing does not occur.
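A sketch of the per-pixel fill of step (b), assuming hypothetical helpers: uv_coords and verts3d give each triangle's texture coordinates and 3-D vertices, project maps a 3-D point into the scanned map assigned to the triangle, and sample reads that map (bilinearly, to limit aliasing):

    import numpy as np

    def barycentric(p, a, b, c):
        # Barycentric coordinates of 2-D point p w.r.t. triangle (a, b, c).
        v0, v1, v2 = b - a, c - a, p - a
        d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
        d20, d21 = v2 @ v0, v2 @ v1
        denom = d00 * d11 - d01 * d01
        v = (d11 * d20 - d01 * d21) / denom
        w = (d00 * d21 - d01 * d20) / denom
        return np.array([1.0 - v - w, v, w])

    def fill_pixel(px, tri_index, uv_coords, verts3d, project, sample):
        # Barycentric weights in the unified map carry the pixel to the same
        # location on the mesh triangle, which is then projected into the
        # assigned scanned texture map.
        bary = barycentric(np.asarray(px, float), *uv_coords[tri_index])
        point3d = bary @ verts3d[tri_index]
        return sample(project(point3d))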
To summarise the method: we first generate patches on the mesh corresponding to the different texture maps. The patches are made as large as possible, and are seeded from an initial optimal assignment of triangles to texture maps. The vertices along the borders of the patches are then assigned "average" texture values. These average values are then mapped back to the individual texture maps and "difference" images formed by means of scattered data interpolation. These difference images, when added to the scanned images, provide a seamless view when projected back onto the mesh.
The main advantage of this method is that there is no explicit blending of images. Blending (see e.g. [Burt and Adelson, 1983]) almost always involves combining a number of pixels on either side of an image boundary. This is inadequate in this case since the scanned texture maps contain a large amount of background, not part of the mesh, and the background can be arbitrarily close to the patch borders. If background is blended with an image the results are wrong and highly displeasing!

5.1 Texture Synthesis
Some areas of the body are not visible from any of the cameras due to self occlusions.
Typical examples include the underside of the arms, between the fingers and around the crotch area. This presents a problem as there is no image data representing these areas from which to create the appropriate regions of the texture maps.
To address this problem, regions of the texture map for which there is no associated image data are filled with synthesized texture pixels. The pixels are synthesized in such a way as to ensure that:
1. The filled region is similar in appearance to neighboring regions for which there is texture data.
2. The boundary between synthesized and existing texture pixels is seamless.
The earlier process of texture mapping the model establishes which texture image is most appropriate for texturing each triangle. This process also identifies those model triangles which cannot be textured because they are not entirely visible from any of the cameras. It is these triangles for which appropriate texture must be synthesized.
Note that the occlusion test for a given triangle is based upon the visibility of its vertices only. If a line from any triangle vertex to the camera intersects another part of the model then the triangle is said to be occluded. Although this is not a sufficient test, as part of a triangle could be occluded whilst its vertices are visible, in practice this does not present a problem. More sophisticated occlusion tests could be employed if this were the case.
5.1.1 The Texture Synthesis Algorithm
The texture synthesis algorithm is based upon the work of Wei [Wei, 2001]. The principal idea is to match pixels to be synthesized to pixels in a region of source texture, based upon the context of neighboring pixels.
The texture synthesis process is set up by associating each pixel from the region of source texture with a feature vector derived from its neighboring pixels. The neighbors to include are defined by a mask, and the feature vector is constructed by concatenating each color component of each included pixel. To synthesize a given pixel, that pixel's feature vector is constructed and the closest matching feature vector in the source texture is identified. The value of the pixel associated with the closest feature vector is assigned to the pixel being synthesized. The Euclidean distance between the feature vectors is the metric used for matching.
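A minimal single-channel sketch of this matching scheme; the mask is given as a list of (dy, dx) neighbour offsets, and undetermined target pixels are marked NaN so that they can be ignored during matching (anticipating the sparse matching of Section 5.1.2 below):

    import numpy as np

    def build_feature_vectors(source, mask_offsets):
        # Associate each source pixel (whose full neighbourhood is in bounds)
        # with the concatenation of its masked neighbours.
        h, w = source.shape
        feats, positions = [], []
        for y in range(h):
            for x in range(w):
                if all(0 <= y + dy < h and 0 <= x + dx < w
                       for dy, dx in mask_offsets):
                    feats.append([source[y + dy, x + dx]
                                  for dy, dx in mask_offsets])
                    positions.append((y, x))
        return np.array(feats), positions

    def synthesize_pixel(target, y, x, feats, positions, source, mask_offsets):
        # Copy the source pixel whose feature vector is closest (Euclidean)
        # to the target pixel's feature vector.
        h, w = target.shape
        query = np.array([target[y + dy, x + dx]
                          if 0 <= y + dy < h and 0 <= x + dx < w else np.nan
                          for dy, dx in mask_offsets])
        valid = ~np.isnan(query)       # ignore undetermined components
        dists = np.sum((feats[:, valid] - query[valid]) ** 2, axis=1)
        sy, sx = positions[int(np.argmin(dists))]
        target[y, x] = source[sy, sx]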
5.1.2 Sparse Feature Vectors
For a given pixel in the region to be filled, any number of the neighboring pixels included in the mask may have undetermined values. When matching the feature vector for this pixel against the feature vectors for the source region these undetermined components are simply ignored. The best match is determined in a feature space whose dimensionality matches the available data.
5.1.3 Improved Blending using a Spiral Path
So far it has been assumed that the order in which pixels are synthesized is row by row and from left to right. The problem with this is that the values of pixels to the right of and at the bottom of the region being filled are not considered until their immediate neighbors are synthesized. The consequence is that the blending at these boundaries is poor.
This problem is reduced, although not completely solved, by defining the order in which pixels are synthesized to be a spiral path within the region being filled. This pattern still exhibits poorly blended pixels, but these are more scattered and so less visible.
The algorithm to determine a spiral path for a given region is as follows. The 4-connected boundary of the region is tracked and each pixel visited is removed and added to the path. Note that removing pixels can result in the region splitting into subregions. Tracking is continued for the current subregion until no pixels remain. The process is then repeated for any subregions that may have become detached.
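The following sketch approximates this ordering by repeatedly peeling the 4-connected boundary of the region (given as a set of (y, x) tuples); detached subregions are simply consumed in later passes rather than tracked explicitly:

    def spiral_path(region):
        # Order the pixels of a region by peeling boundary layers inward.
        remaining = set(region)
        path = []
        while remaining:
            boundary = [p for p in sorted(remaining)
                        if any(q not in remaining for q in
                               ((p[0] - 1, p[1]), (p[0] + 1, p[1]),
                                (p[0], p[1] - 1), (p[0], p[1] + 1)))]
            for p in boundary:
                remaining.discard(p)   # removed pixels may split the region
                path.append(p)
        return path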
5.1.4 Selecting a Region of Source Texture
The region from which pixel samples are taken to form the texture synthesizer should be close to the region being filled. The assumption here is that regions in close proximity will have a similar appearance. It is important to recognize, however, that pixels in close proximity in the camera images are not necessarily in close proximity on the surface of the body. Neighboring pixels in the camera images can belong to separate parts of the body which are separated spatially.
One of the advantages of using the unified texture image described previously is that it is organized in such a way that pixels adjacent in the image are also adjacent spatially. The consequence is that, to form a texture synthesizer, pixels can simply be sampled from around the region to be filled. Only pixels that have already been assigned values are sampled, so that the background and other holes in the texture are ignored.
Another minor advantage of taking pixel samples from the unified texture image is that in general the sampling is more uniform spatially over the body surface. Conversely, pixels in the camera images represent samples at different spatial resolutions depending upon the orientation of the body surface relative to the camera. The unified texture image, to some degree, decouples the texture image from the geometry of the body.
5.1.5 Demonstration
Figures 10 and 11 present results of the texture synthesis algorithm. In each figure the image on the left presents a texture with a missing central region. The image on the right presents the texture after the missing region has been filled using the texture synthesis algorithm. The texture in Figure 11 is part of a real body texture image captured in the Wicks and Wilson Triform scanner.
In both examples the masks included the 8 immediate neighbors, and the source texture pixels included all pixels up to 4 pixels away from the missing region.
6 Creating a Synthetic Skin Texture for the Model
A texture is applied to the body model to provide a realistic visual representation. This texture can either be derived from images captured of the body, as described in Section 5, or generated synthetically. To obtain a high level of realism it is necessary to derive the texture for the head from images, but for the rest of the body a synthetic texture is often sufficient. Given a good model of the head, and provided that the skin texture looks like skin and has the correct colour, the overall impression is that the body model is a good likeness. An approach is described here for creating such a skin texture. Initially a colour neutral skin texture is prepared. This can either be done manually using texture painting tools or generated programmatically. In the implementation used here the skin texture was generated by modulating a light-map for the body with a noise function extracted from a sample image of skin. The light-map was created using a global illumination algorithm.
Figure 12 presents the colour neutral skin texture.
Next an estimate of the skin colour is determined from the captured images of the individual being modelled. Only certain pixels are used to estimate the skin colour, as defined by a mask. This prevents the colour of eyes, hair and so on being used in the estimate. Here a simple average of these pixels has been used as the colour estimate, but alternative statistics could be used. A problem with the average is that the skin colour estimate is not necessarily the colour of any of the sample pixels. The skin texture is created by modulating the colour neutral skin texture by the estimated skin colour. Figure 13 presents such a result. Finally, Figure 14 presents the body model textured using this skin model.
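A sketch of this colour estimation and modulation step, assuming float RGB images in [0, 1] and boolean masks selecting the skin pixels:

    import numpy as np

    def synthetic_skin_texture(neutral, images, masks):
        # Estimate the skin colour as the simple average of the masked pixels
        # over all captured images, then modulate the colour-neutral texture.
        samples = np.concatenate([img[mask] for img, mask in zip(images, masks)])
        skin_colour = samples.mean(axis=0)     # (3,) average RGB, as in the text
        return np.clip(neutral * skin_colour, 0.0, 1.0)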
7 Modelling the Head for Restricted Views
If the body model can be viewed from arbitrary viewpoints then a full, 3-dimensional model of the head is required. Creating such a model of the head is challenging and is discussed in Section 8. Sometimes an application may impose certain restrictions on the viewpoints from which a model is displayed. In this case a full, 3-dimensional model may not be required.
In this Section the case where a model is only viewed from one, or possibly a small number, of viewpoints is considered. It is assumed that an image of the head from each of these views, or at least close to these views, is available. It is also assumed that these images are calibrated in the sense that for each pixel the line of sight for that pixel relative to the body model is known. The implication is that either the images and 3-dimensional capture are taken with the body in the same position, or the images are somehow aligned with the body model.
Each head image is rendered using a technique similar to billboarding. Billboarding is a well known technique in computer graphics where a 3-dimensional object is modelled by projecting an image of the object onto some planar geometry. The orientation of the plane is controlled so that it is always facing towards the camera. By using transparency, arbitrarily shaped objects can be modelled this way.
To render the head, the body model's head geometry is replaced by some approximately planar geometry and the image of the head is projected on as a texture. The background of the head image is rendered as transparent. The new head geometry is only approximately planar because it is joined to the rest of the body model at the neck. This keeps the head image and the rest of the body joined even if the viewpoint is not exactly the same as that of the camera capturing the head image.
To prevent a visible join, the body texture and face image must be adjusted so that they have the same colour where they are adjacent. This is the texture blending problem discussed earlier in Section 5.
If the model is to be viewed from more than a single fixed position then extra geometry is created for each position, but only the geometry and texture for the current position is displayed. When blending the body texture with each of the head images only the head image data is adjusted. This produces a single body texture onto which any of the head images can be displayed seamlessly.
Figure 15 presents a body model with a head modelled this way.
8 Creating a 3-Dimensional Hair Model
Systems for imaging the human body in three dimensions are becoming increasingly popular. A drawback of the systems currently available is that they cannot capture the hair.
In this section we describe and demonstrate a method for constructing a simple hair model given a number of calibrated images and a partial model of the head.
Modelling a person's hair is a notoriously difficult problem, as the body of the hair typically comprises 100,000 individual strands, each with complex dynamic and optical behavior. Capturing this complexity in detail from a small number of images is unrealistic. Instead we aim to represent the volume of the hair using a single surface and to texture map this surface using images derived from the ones captured of the head. The problem then is to estimate the shape of the surface that defines the volume of the hair and then to determine the textures needed to render over the top of this.
8.1 Problem Statement
The input data to the system are a number of calibrated images of the subject's head taken from different viewpoints and a partial model of the head. The images are calibrated in the sense that the cameras used to capture the images are calibrated. The cameras are calibrated in the sense that the mapping from points in the world to image coordinates is known. In addition, we assume that the pixels in each image are labelled as belonging to either the head or the background. The region of pixels labelled as belonging to the head is known here as the head silhouette. Segmenting the images into foreground and background pixels is a classical image-processing problem and is not discussed further here. Note that no distinction is made between pixels on the face and pixels on the hair. Automatically classifying these different regions is in general problematic, so an algorithm has been developed that does not require this.
The output of the system is a textured model that augments the supplied partial model of the head to produce a complete head model.
A requirement of a candidate model of the head is that, when projected into the images, it covers all of the pixels that are labelled as belonging to the head. In fact, an infinite family of shapes exists that meets this criterion, so in addition it is also necessary to select a model with a plausible shape. Intuitively the smoothest shape that meets the constraints is a good candidate. To summarise:
1. The silhouette of the surface viewed from each camera should match the head silhouette identified for that camera.
2. The shape of the surface should be reasonably smooth. Given that smoothness is only a heuristic used to obtain a plausible shape, it is not necessary to create a shape that is maximally smooth.
In practice the first requirement can only be approximately satisfied because of errors in the calibration of the cameras and errors in identifying head silhouette pixels. This issue is discussed in more detail later.
8.2 The Hair Modelling Algorithm
The hair modelling algorithm that has been developed comprises the following steps:
1. A surface is created that encloses the volume defined by the re-projection of the head silhouettes for each camera.
2. This initial surface is smoothed to produce a more plausible shape for the hair.
3. The smoothed surface is warped so that its silhouette matches the head silhouettes for each camera.
4. Each model face is assigned a camera and the face is textured accordingly.
5. Geometry not contributing to the hair is removed and any gaps are closed.
8.2.1 Creating a surface that encloses the re-projection volume
The re-projection of each head silhouette from the image plane into 3 dimensions defines a volume which encloses the head. The intersection of the volumes defined by all of the cameras, known here as the re-projection volume, is used as an initial estimate of the shape of the head. A 3D point p is contained within the re-projection volume if it projects inside the silhouette in all of the camera images.
An initial model of the head is created as a mesh with vertices that lie on the surface of the re-projection volume. A variety of techniques can be used to create this mesh. Here this is achieved by shrink-wrapping a mesh model of a sphere onto the volume's surface.
Initially a sphere model is constructed that bounds the re-projection volume, with the center of the sphere placed roughly at the center of the volume. To determine this center point, the centroid of each head silhouette is calculated and this is projected into 3 dimensions, producing a line. The point closest to all of these lines in a least squares sense is used as the center of the sphere.
After the bounding sphere model has been positioned, each vertex is moved along the line defined by its current position and the center of the sphere until it lies on the surface of the re-projection volume. To implement this efficiently, a bisection search is used.
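A sketch of this shrink-wrap step, assuming cameras is a list of projection functions returning pixel coordinates and silhouettes the corresponding boolean head-silhouette images:

    import numpy as np

    def inside_volume(p, cameras, silhouettes):
        # A point is in the re-projection volume if it projects inside the
        # head silhouette of every camera.
        for project, sil in zip(cameras, silhouettes):
            x, y = project(p)
            xi, yi = int(x), int(y)
            if not (0 <= yi < sil.shape[0] and 0 <= xi < sil.shape[1]
                    and sil[yi, xi]):
                return False
        return True

    def shrink_wrap_vertex(v, centre, cameras, silhouettes, iters=30):
        # Bisection along the ray from v (assumed outside the volume) to the
        # sphere centre (assumed inside), converging on the volume surface.
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            p = (1 - mid) * v + mid * centre
            if inside_volume(p, cameras, silhouettes):
                hi = mid        # inside: the surface lies between lo and mid
            else:
                lo = mid
        return (1 - hi) * v + hi * centre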
8.2.2 Enforcing head silhouette consistency
In practice, errors in the calibration of the cameras and errors in identifying pixels belonging to the head in each camera image will result in inconsistencies amongst the head silhouettes. If this is the case, a shape does not exist which will match each head silhouette shape when projected into all of the cameras. This causes problems later when determining a warp that aims to fit the shape of the mesh to the head silhouettes. To avoid this problem the head silhouettes are corrected, by removing pixels, so that they are consistent. This is achieved by projecting the mesh enclosing the re-projection volume into each camera and removing head silhouette pixels that are not contained within the projected mesh. This results in a consistent set of head silhouettes that most closely match the original silhouettes.

8.2.3 Smoothing
To create a hair model with a more plausible shape the initial mesh is smoothed. This is achieved by iteratively applying a Laplacian smoothing operator to the mesh. This operator simply moves each vertex to the mean position of the vertices connected to this vertex by an edge.
A consequence of Laplacian smoothing is that the volume enclosed by the surface is reduced. More sophisticated algorithms exist which preserve volume whilst smoothing, and these are particularly useful for removing noise. In this case, however, we want the volume to be reduced, as the volume of the head will in general be less than the volume of the re-projection volume.
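A minimal sketch of the smoothing operator; neighbours[i] is assumed to list the indices of the vertices sharing an edge with vertex i:

    import numpy as np

    def laplacian_smooth(vertices, neighbours, iterations=10):
        # Move each vertex to the mean of its edge-connected neighbours.
        # Volume shrinkage is expected and, here, desirable.
        V = vertices.copy()
        for _ in range(iterations):
            V = np.array([V[nbrs].mean(axis=0) for nbrs in neighbours])
        return V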
8.2.4 Fixing the silhouette constraints
The application of smoothing changes the shape of the model and, in general, the silhouette of the mesh will no longer match the head silhouettes in the cameras. To correct this disparity a warp is determined and applied to the mesh.
If the silhouettes of the head in the camera images contain no holes and the surface model is a closed surface, the silhouette matching requirement is satisfied by matching the silhouette boundaries. A number of points are identified on the boundary of the head silhouette, at uniform intervals along the boundary length, for each camera. For each such boundary point a line is projected into space and the point on the mesh that is closest to the line is determined. A displacement vector is then established that moves this point on the mesh to the closest point on the line.
A continuous warping function is then established which interpolates the set of displacement vectors determined for all of the boundary points and for all of the cameras. A radial basis warp has been used in this instance.
The motivation for using only a sample of points along the silhouette boundaries is that the computation of the radial basis warp parameters increases rapidly with the number of constraints, the constraints in this instance being the displacement vectors. A consequence of using only a sample of points is that the warped surface may have vertices that lie outside of the head silhouette boundaries. This is easily rectified by moving any such points back onto the surface of the re-projection volume.
8.2.5 Iterative smoothing and constraint fixing
Although not given as a primary requirement earlier, it is desirable that the created hair mesh is well structured. That is, it comprises faces with reasonably similar sizes and little distortion.
The original bounding sphere mesh has a very uniform structure and will generally maintain a reasonable structure after shrink-wrapping to the re-projection volume and smoothing. The warping process can introduce significant distortion if care is not taken, however. If the boundary of the projected mesh and the head silhouette boundaries are close, then the calculated displacement vectors, and hence the warp, are well structured. This limits the amount of smoothing that can be applied, since the smoothing moves the boundary of the projected mesh away from the silhouette boundaries.
A solution to this is to apply a proportion of the total amount of smoothing required, warp the model to meet the boundary constraints, and then repeat. The boundary of the projected mesh never moves very far from the silhouette boundary, so little distortion is introduced, but the mesh can be smoothed very significantly. The only disadvantage is that this takes a little longer to perform.
8.2.6 Texturing
Textures are derived directly from the images of the head. Each model face is assigned to the camera that has the best view of its surface and as such contains the best texture data. The vertices of the face are then projected into this camera to determine the appropriate texture coordinates.
Where textures from different images are adjacent on the model, it is usual to see a discontinuity. This may be due to a number of factors, including differences in image color balance and brightness, and errors in the geometry of the model causing the textures not to line up correctly. A texture merging algorithm is used to modify the texture images to avoid this problem.
The details of the texturing process are the same as those discussed for the texturing of the template model in Section 5.
8.2.7 Removing excess geometry and gap filling
The model at this stage is a closed surface representing the complete head, and so it is necessary to remove the geometry covering the face and any other areas that are not hair, such as the neck and ears. To be more precise, geometry should be removed where there is already geometry provided by the imaging system. Let this be known as the face model, to avoid confusion.
To do this the face model is first projected into each of the cameras to create a mask. Any hair model face that projects, via its camera, entirely onto this mask is removed. This will generally leave a gap between the edge of the hair and the face that should be closed by adding extra faces.
If some user intervention is allowed then it is possible to produce a more refined model.
For each of the cameras the user defines a mask representing pixels that do not belong to the hair. Hair model faces are then removed that project inside this mask, as before. The mask is then used to define transparent pixels in the texture maps, defining the edge of the hair in much more detail. In this case, the top of the head above the line of the hair should be modelled and the gap between the hair and the head left open. This generally looks better, as the hair is no longer stuck onto the head.
8.3 Results
Figure 16 demonstrates the hair modelling process with a real example. The top left image presents one of the images of the subject's head from the Wicks and Wilson Triform scanner. The top right image presents the mask, shown in red, which specifies pixels belonging to the head. This mask was specified manually, but with improved lighting in the scanner this process will be reasonably straightforward to automate. The bottom left image shows the constructed hair model as a red wireframe. It can be seen that this closely matches the silhouette of the head in the image. Note also that redundant geometry has been removed. Finally, the bottom right image shows the hair model after it has been textured. Figure 17 presents a conformed, textured body model with and without a hair model, together with an image of the subject in the scanner for comparison. Improvements and modifications may be incorporated without departing from the scope of the invention.
References
[Adams et al., 1998] Jim Adams, Ken Parulski, and Kevin Spaulding. Color processing in digital cameras. IEEE Micro, 18(6):20-30, November/December 1998.
[Amenta et al., 1998] Nina Amenta, Marshall Bern, and Manolis Kamvysselis. A new Voronoi-based surface reconstruction algorithm. In Michael Cohen, editor, SIGGRAPH 98 Conference Proceedings, Annual Conference Series, pages 415-422. ACM SIGGRAPH, Addison Wesley, July 1998. ISBN 0-89791-999-8.
[Bajcsy and Broit, 1982] R. Bajcsy and C. Broit. Matching of deformed images. In Proc. 6th Int. Joint Conf. Patt. Recog., pages 351-353, 1982.
[Bajcsy and Kovacic, 1989] R. Bajcsy and S. Kovacic. Multiresolution elastic matching. Computer Vision, Graphics, and Image Processing, 46(1):1-21, April 1989.
[Berge, 1977] J. M. F. Ten Berge. Orthogonal Procrustes rotation for two or more matrices. Psychometrika, 42:267-276, 1977.
[Bergevin et al., 1996] R. Bergevin, M. Soucy, H. Gagnon, and D. Laurendeau. Towards a general multi-view registration technique. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(5), 1996.
[Bernardini et al., 1999] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin. The ball-pivoting algorithm for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 5(4):349-359, October/December 1999.
[Besl and McKay, 1992] P. J. Besl and N. D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239-256, February 1992.
[Boissonnat, 1984] J. D. Boissonnat. Representing 2D and 3D shapes with the Delaunay triangulation. In Seventh International Conference on Pattern Recognition (Montreal, Canada, July 30-August 2, 1984), IEEE Publ. 84CH2046-1, pages 745-748. IEEE, 1984.
[Buehler et al., 2001] Chris Buehler, Michael Bosse, Leonard McMillan, Steven J. Gortler, and Michael F. Cohen. Unstructured lumigraph rendering. In Eugene Fiume, editor, SIGGRAPH 2001, Computer Graphics Proceedings, Annual Conference Series, pages 425-432. ACM Press / ACM SIGGRAPH, 2001.
[Burt and Adelson, 1983] P. J. Burt and E. H. Adelson. A multiresolution spline with application to image mosaics. ACM Transactions on Graphics, 2(4):217-236, October 1983.
[Chen and Medioni, 1991] Yang Chen and Gerard Medioni. Object modeling by registration of multiple range images. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation, Sacramento, California, April 1991.
[Debevec and Malik, 1997] Paul E. Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In Turner Whitted, editor, SIGGRAPH 97 Conference Proceedings, Annual Conference Series, pages 369-378. ACM SIGGRAPH, Addison Wesley, August 1997. ISBN 0-89791-896-7.
[Debevec et al., 1996] Paul E. Debevec, Camillo J. Taylor, and Jitendra Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In Holly Rushmeier, editor, SIGGRAPH 96 Conference Proceedings, Annual Conference Series, pages 11-20. ACM SIGGRAPH, Addison Wesley, August 1996. Held in New Orleans, Louisiana, 04-09 August 1996.
[Debevec et al., 1998] Paul Debevec, Yizhou Yu, and George Boshokov. Efficient view-dependent image-based rendering with projective texture-mapping. Technical Report CSD-98-1003, University of California, Berkeley, May 20, 1998.
[Debevec et al., 2000] Paul Debevec, Tim Hawkins, Chris Tchou, Haarm-Pieter Duiker, Westley Sarokin, and Mark Sagar. Acquiring the reflectance field of a human face. In Kurt Akeley, editor, SIGGRAPH 2000, Computer Graphics Proceedings, Annual Conference Series, pages 145-156. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.
[Dryden and Mardia, 1997] I. L. Dryden and K. V. Mardia. Statistical Shape Analysis. Wiley, Chichester, 1997.
[Duchon, 1977] J. Duchon. Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In W. Schempp and K. Zeller, editors, Constructive Theory of Functions of Several Variables, volume 571 of Lecture Notes in Mathematics, pages 85-100, 1977.
[Eggert et al., 1997] D. Eggert, A. Lorusso, and R. B. Fisher. Estimating 3-D rigid body transformations: A comparison of four major algorithms. Machine Vision and Applications, 9(5/6):272-290, 1997.
[Eggert et al., 1998] David W. Eggert, Andrew W. Fitzgibbon, and Robert B. Fisher. Simultaneous registration of multiple range views for use in reverse engineering of CAD models. Computer Vision and Image Understanding: CVIU, 69(3):253-272, March 1998.
[Fua et al., 1999] P. Fua, R. Plankers, and D. Thalmann. From synthesis to analysis: Fitting human animation models to image data. In Bob Werner, editor, Proceedings of the Conference on Computer Graphics International 1999, Los Alamitos, California, June 7-11, 1999. IEEE Computer Society.
[Golub and Loan, 1996a] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. The Johns Hopkins University Press, Baltimore, MD, USA, third edition, 1996.
[Golub and Loan, 1996b] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. The Johns Hopkins University Press, Baltimore, MD, USA, third edition, 1996.
[Goodall, 1991] C. R. Goodall. Procrustes methods in the statistical analysis of shape (with discussion). J. Roy. Statist. Soc. Ser. B, 53:285-339, 1991.
[Gower, 1975] J. C. Gower. Generalized Procrustes analysis. Psychometrika, 40:33-51, 1975.
[Gu et al., 1998] J. Gu, T. Chang, I. Mak, and S. Gopalsamy. A 3D reconstruction system for human body modeling. Lecture Notes in Computer Science, 1537:229-??, 1998.
[Hilton et al., 2000] Adrian Hilton, Daniel Beresford, Thomas Gentils, Raymond Smith, Wei Sun, and John Illingworth. Whole-body modelling of people from multiview images to populate virtual worlds. The Visual Computer, 16(7):411-436. Springer, 2000.
[Hoppe et al., 1992] Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, and Werner Stuetzle. Surface reconstruction from unorganized points. In Edwin E. Catmull, editor, Computer Graphics (SIGGRAPH '92 Proceedings), pages 71-78, July 1992.
[Horn et al., 1988] Berthold K. P. Horn, H. Hilden, and S. Negahdaripour. Closed-form solution of absolute orientation using orthonormal matrices. Journal of the Optical Society of America, 5(7), 1988.
[Horn, 1987] Berthold K. P. Horn. Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America, 4(4), April 1987.
[Intel, 2000] Intel. Intel Image Processing Library: Reference Manual, 2000.
[Kakadiaris and Metaxas, 1995] I. Kakadiaris and D. Metaxas. Human body model acquisition from multiple views, 1995.
[Lee et al., 1997] Seungyong Lee, George Wolberg, and Sung Yong Shin. Scattered data interpolation with multilevel B-splines. IEEE Transactions on Visualization and Computer Graphics, 3(3):228-244, July 1997.
[Liu and Nocedal, 1989] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Math. Programming, 45(3, (Ser. B)):503-528, 1989.
[Maintz and Viergever, 1998] J. Maintz and M. Viergever. A survey of medical image registration, 1998.
[Masuda et al., 1996] T. Masuda, K. Sakaue, and N. Yokoya. Registration and integration of multiple range images for 3-D model construction. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 1996, pages 879-883, 1996.
[McInerney and Terzopoulos, 1996] T. McInerney and D. Terzopoulos. Deformable models in medical image analysis: a survey, 1996.
[Mencl and Muller, 1998] R. Mencl and H. Muller. Graph-based surface reconstruction using structures in scattered point sets. In Franz-Erich Wolter and Nicholas M. Patrikalakis, editors, Proceedings of the Conference on Computer Graphics International 1998 (CGI-98), pages 298-311, Los Alamitos, California, June 22-26, 1998. IEEE Computer Society.
[Plaenkers and Fua, 2001] Ralf Plaenkers and Pascal Fua. Articulated soft objects for video-based body modeling. In Proceedings of the Eighth International Conference on Computer Vision (ICCV-01), pages 394-401, Los Alamitos, CA, July 9-12, 2001. IEEE Computer Society.
[Powell, 1992] M. J. B. Powell. The theory of radial basis functions in 1990. In W. Light, editor, Advances in Numerical Analysis, Volume II: Wavelets, Subdivision Algorithms and Radial Basis Functions, volume 2, pages 105-210. Oxford Scientific Publications, 1992.
[Praun et al., 2001] Emil Praun, Wim Sweldens, and Peter Schroder. Consistent mesh parameterizations. In Eugene Fiume, editor, SIGGRAPH 2001, Computer Graphics Proceedings, Annual Conference Series, pages 179-184. ACM Press / ACM SIGGRAPH, 2001.
[Pulli, 1999] K. Pulli. Multiview registration for large data sets. In Proceedings of the Second International Conference on 3-D Imaging and Modeling (3DIM'99), pages 160-168. IEEE, October 1999.
[Rusinkiewicz and Levoy, 2001] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In Proceedings of the Third International Conference on 3-D Imaging and Modeling (3DIM'01), 2001.
[Schoneman, 1966] P. H. Schoneman. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31:1-10, 1966.
[Sederberg and Parry, 1986] Thomas W. Sederberg and Scott R. Parry. Free-form deformation of solid geometric models. Computer Graphics (Proceedings of SIGGRAPH 86), 20(4):151-160, August 1986. Held in Dallas, Texas.
[Stoddart and Hilton, 1996] A. J. Stoddart and A. Hilton. Registration of multiple point sets. In Proceedings of the 13th IAPR International Conference on Pattern Recognition, pages 40-44, 1996.
[Szeliski and Lavallee, 1996] R. Szeliski and S. Lavallee. Matching 3-D anatomical surfaces with non-rigid deformations using octree-splines, 1996.
[Turk and Levoy, 1994] Greg Turk and Marc Levoy. Zippered polygon meshes from range images. In Andrew Glassner, editor, Proceedings of SIGGRAPH '94 (Orlando, Florida, July 24-29, 1994), Computer Graphics Proceedings, Annual Conference Series, pages 311-318. ACM SIGGRAPH, ACM Press, July 1994. ISBN 0-89791-667-0.
[Wahba, 1990] G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, 1990.
[Wei, 2001] Li-Yi Wei. Texture Synthesis by Fixed Neighborhood Searching. PhD thesis, Department of Electrical Engineering, Stanford University, 2001.
[Wolberg, 1990] George Wolberg. Digital Image Warping. IEEE Computer Society Press, 10662 Los Vaqueros Circle, Los Alamitos, CA, 1990. IEEE Computer Society Press Monograph.
A Color Calibration
The Virtual Mirror process requires high quality images of the human form, and in particular the human face, in order to provide sufficient realism for users to believe that they are using a virtual mirror.
The challenge of providing high quality images in 3D is that illumination is a (very complex) function of the position of the viewer as well as the shape and color of objects in the scene. An ideal would be to be able to determine the exact irradiance function at each point on a person's skin, in order to be able to relight in any illumination. An initial step in this direction is reported in [Debevec et al., 2000]. This approach is impractical for us, in that hundreds of images must be captured under varying lighting conditions.
The design of the body scanning booth does allow considerable control of lighting; as the booth is closed, there is reasonable latitude in the placement and control of lights.
This document is primarily concerned with the issue of color balance, so that good quality photographic renderings of users of the scanning booth can be obtained. The requirements are:
1. Captured images should have a consistent color balance, both for cameras within a single booth and between booths.
2. It should be possible to re-balance images to provide different lighting effects, such as indoor or outdoor lighting on the skin.
3. The lighting of the individual images of a capture should be capable of being integrated so that a seamless view of the whole person is obtained.
We argue that the simplest way to achieve these goals is to provide uniform diffuse (white) lighting within the booth.
An example of a face captured under diffuse lighting is shown in Figure 18. Notice that even this image contains some specular highlights on the lips and under the eyes. One method of further reducing the highlights is to use polarizing filters on the projector and camera. Unfortunately this process is both too expensive and loses too much light to be practicable. The advantage of removing specular lighting is that re-balancing and seamless lighting become simpler. In terms of seamless integration, specularities are viewpoint dependent, and so can cause problems. In terms of re-lighting, specularities contain information about the light source, which can complicate re-lighting.
This document is concerned largely with color calibration and re-balancing (items 1 and 2 above). A simple calibration procedure which fulfils both functions has been developed and is described. The calibration procedure relies on the availability of a calibration object, in this case the GretagMacbeth ColorChecker. A calibration object with a greater number of targets can be helpful in terms of providing more data points for the correction. It is, however, more difficult to recognize the individual color patches as the number increases.
In Section A.1 the color calibration procedure being used is described.

A.1 Color calibration procedure

The desired effect of the calibration procedure can be seen in Figure 19. On the left is an uncalibrated image; on the right, a calibrated image. It can be seen that the uncalibrated image has a greenish cast. The image on the right not only has this cast corrected but also has a fuller (or stretched) histogram. This latter effect of calibration, although visually pleasing, is undesirable, in that the original image capture should use the full range of the camera if possible. Some ideas relating to this issue are discussed in Section A.2. It should be noted that the image contains a considerable amount of (undesirable) specularity.
The basic calibration procedure uses the GretagMacbeth ColorChecker. A synthetic image of the ColorChecker can be seen in Figure 20. The calibration procedure itself (for each camera) is as follows.
1. The RGB value of each patch in the ColorChecker is set (see discussion below for details).
2. An image containing the ColorChecker is captured with the camera. This is called the calibration image. The exposure settings and lighting should be as close as possible to those used during later image capture.
3. The 24 color patches of the ColorChecker are identified in the calibration image.
4. The median RGB value of the central portion of each patch is calculated.
5. An optimal color twist matrix mapping the median colors found in step 4 to those set in step 1 is calculated, and associated with the image.
6. When a new image is captured, the color twist matrix is applied to each pixel to provide a new value.
We now discuss these steps in more detail, providing algorithms where appropriate.
A.1.1 Setting the Calibration Object

The ColorChecker is supplied with a set of default RGB values for the patches. These are the colors shown in Figure 20. These values are ideal, and cannot be realized in a physical setting. As an example, consider the image of Figure 21. This was taken with a Sony DCR-DC5 on a clear morning. Figure 22 shows the same image calibrated using the default values of Figure 20. This is arguably inferior to the original, and looks unrealistic. In particular the colors on the ColorChecker look very unrealistic, although they are exactly the default values.
Looked at analytically, the difference between Figures 21 and 22 is that the first has a slightly blue cast, due to the presence of sunlight, and the second has had its histogram stretched. What I propose is that the values used for the color patches on the ColorChecker should be taken from good quality images, carefully checked by hand, and taken in a variety of lighting situations. This will provide a first approximation to different lighting conditions, such as sunlight, morning, evening, artificial light etc. The particular "cast" of lighting desired can then be chosen by the user at any time.

The advantage of this approach is that the color balance achieved should be "natural", in that the colors used for calibration were actually achieved under real lighting conditions.
A.1.2 Locating color patches

In order for the color calibration procedure to be useful it should be automated as far as possible, so that relatively unskilled staff can carry it out. I believe that automatic location of the color chart in a scene, especially a constrained scene, will be straightforward. At present, however, the implementation that has been undertaken is to locate the 24 patches given the location of the 4 bounding corners of the target.

In the current implementation the four bounding corners are located by hand. Given this information, and our knowledge of the dimensions of the ColorChecker, it is straightforward to determine the perspective transform from the ColorChecker to its image. This allows us to determine where the color patches lie in the image. An example is shown in Figure 23. The algorithm used is described in Appendix A.3.

A.1.3 Computing the Color Twist

A good overall introduction to color calibration in digital cameras can be found in [Adams et al., 1998]. The calculation of the color twist matrix I use is broadly similar, although I use a 4 x 4 twist matrix, which sets offset as well as gain and cross-correlation. The least squares estimation procedure for the color twist is very similar to that for the perspective transform, and is detailed in Appendix A.4.
Assuming that the twist matrix is M, each pixel (r, g, b) is written (r, g, b, 1), and is transformed by M. The Intel Processing Library [Intel, 2000], which we have implemented inside the Java Advanced Imaging framework, contains optimized functions for performing 4 x 4 color twists. Note that the transform matrix M is not a general 4 x 4 matrix, but is constrained to the form

M = \begin{pmatrix} m_{00} & m_{01} & m_{02} & m_{03} \\ m_{10} & m_{11} & m_{12} & m_{13} \\ m_{20} & m_{21} & m_{22} & m_{23} \\ 0 & 0 & 0 & 1 \end{pmatrix}
A color twist is adequate, but it does not provide perfect color registration. I have also done some tests with a non-linear registration procedure, the outline of which is described in Section A.5 and which is implemented in the test program.
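To make the application of the constrained twist concrete, the following is a minimal sketch in Python/NumPy. The original implementation used the Intel library's optimized routines inside Java Advanced Imaging, so the function name, the vectorised formulation, and the sample matrix here are illustrative assumptions only.

```python
import numpy as np

def apply_color_twist(image, M):
    """Apply a 4x4 color twist matrix to an RGB image.

    image: (H, W, 3) float array, channel values in [0, 1].
    M: (4, 4) twist matrix whose last row is (0, 0, 0, 1); the fourth
       (homogeneous) pixel component carries the per-channel offsets.
    """
    h, w, _ = image.shape
    # Append a homogeneous coordinate of 1 to every pixel: (r, g, b) -> (r, g, b, 1).
    pixels = np.concatenate([image.reshape(-1, 3),
                             np.ones((h * w, 1))], axis=1)
    # Transform all pixels at once and drop the homogeneous component.
    twisted = pixels @ M.T
    return np.clip(twisted[:, :3].reshape(h, w, 3), 0.0, 1.0)

# Example: identity gain with a small constant lift on the red channel.
M = np.eye(4)
M[0, 3] = 0.05
corrected = apply_color_twist(np.random.rand(4, 4, 3), M)
```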
A.2 Some additional techniques

Here are some additional ideas which can improve the color rendition.
A.2.1 Gamma Correction

Most cameras have gamma correction implemented in hardware, and are factory set to the correct gamma value. If this is not the case, then gamma correction (using the intensity component values of the color patches) could be performed for each color channel, before undertaking the linear (and non-linear) corrections of Appendices A.4 and A.5. This is a non-linear least squares problem, but the solution would be very rapid with only 24 data points and a single variable, and a gamma value of 1 would provide an appropriate starting point.
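As a hedged sketch of a per-channel gamma estimate: rather than the full non-linear least squares fit described above, the simplification below linearises the assumed model m = l^g in log space, which gives a closed-form answer. The model, clipping thresholds and function names are assumptions for illustration, not the original implementation.

```python
import numpy as np

def estimate_gamma(measured, reference):
    """Estimate a single gamma g such that measured ~= reference**g,
    for one color channel, from the 24 ColorChecker patch values.

    measured, reference: 1-D arrays of patch values scaled to (0, 1].
    With m = l**g we have log m = g log l, so g has a closed-form
    least squares solution; g = 1 is recovered exactly when the
    channel is already linear.
    """
    lm = np.log(np.clip(measured, 1e-4, 1.0))
    lr = np.log(np.clip(reference, 1e-4, 1.0))
    return float(np.dot(lm, lr) / np.dot(lr, lr))

def linearise(channel, g):
    """Undo the fitted gamma before applying the linear color twist."""
    return np.clip(channel, 0.0, 1.0) ** (1.0 / g)
```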
A.2.2 Multiple captures

There may well be scope in capturing multiple images of the calibration target at different exposures. On the assumption that illumination levels can be kept constant and that the sensor is linear in response over the range of exposures being used, multiple exposures would allow more accurate calibration of the radiance function (see [Debevec and Malik, 1997] for an example of this).
Multiple captures are attractive if the camera exposure can be controlled remotely.
A.3 Estimating a Perspective Transform

The transformation we are looking for is a 3 x 3 homography H, satisfying the equation

\lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} (8)

This form assumes that h_{22} \neq 0, which is correct for non-degenerate cases. By dividing through to remove the \lambda term we have

u = \frac{h_{00}x + h_{01}y + h_{02}}{h_{20}x + h_{21}y + 1}, \qquad v = \frac{h_{10}x + h_{11}y + h_{12}}{h_{20}x + h_{21}y + 1}

This can be rewritten in matrix form as

\begin{pmatrix} x & y & 1 & 0 & 0 & 0 & -ux & -uy \\ 0 & 0 & 0 & x & y & 1 & -vx & -vy \end{pmatrix} \left( h_{00}, h_{01}, h_{02}, h_{10}, h_{11}, h_{12}, h_{20}, h_{21} \right)^{T} = \begin{pmatrix} u \\ v \end{pmatrix}

Given points (x_0, y_0), \ldots, (x_n, y_n) and their images (u_0, v_0), \ldots, (u_n, v_n) (where n = 3 in this case, since the four bounding corners of the target are used), we have the matrix equation

\begin{pmatrix} x_0 & y_0 & 1 & 0 & 0 & 0 & -u_0 x_0 & -u_0 y_0 \\ 0 & 0 & 0 & x_0 & y_0 & 1 & -v_0 x_0 & -v_0 y_0 \\ & & & & \vdots & & & \\ x_n & y_n & 1 & 0 & 0 & 0 & -u_n x_n & -u_n y_n \\ 0 & 0 & 0 & x_n & y_n & 1 & -v_n x_n & -v_n y_n \end{pmatrix} h = \begin{pmatrix} u_0 \\ v_0 \\ \vdots \\ u_n \\ v_n \end{pmatrix} (9)

This equation can be solved for H straightforwardly.
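A minimal sketch of Eq. (9) and its use for patch location (Section A.1.2) follows. The corner coordinates, the unit-square chart frame, and the 6 x 4 grid spacing are hypothetical stand-ins, not values from the original implementation.

```python
import numpy as np

def estimate_homography(src, dst):
    """Solve Eq. (9): src and dst are (n, 2) arrays of corresponding
    points; returns the 3x3 homography H with h22 = 1."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs += [u, v]
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (n, 2) points through H, dividing out the scale term."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# Hypothetical example: map the unit square of the chart onto four
# hand-picked image corners, then predict the 24 patch centres of a
# 6x4 grid expressed in chart coordinates.
corners_chart = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
corners_image = np.array([[102, 88], [532, 95], [540, 390], [98, 380]], float)
H = estimate_homography(corners_chart, corners_image)
centres = np.array([[(c + 0.5) / 6, (r + 0.5) / 4]
                    for r in range(4) for c in range(6)])
patch_pixels = apply_homography(H, centres)
```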
A.4 Estimating the color twist

We are given a set of samples (24 for the ColorChecker), each of which has a standard (destination) value. The standard value is an RGB triple, t'_i = (r'_i, g'_i, b'_i)^T, 0 \le i \le 23. The measured value of patch i is t_i = (r_i, g_i, b_i)^T. In order to correct the colors in the image we need to find a 4 x 4 color twist matrix M such that t'_i \approx M t_i, where each t_i is extended with a fourth (homogeneous) coordinate equal to 1. This is an over-constrained problem, and we seek a least squares solution.

Let the matrix M be

M = \begin{pmatrix} m_{00} & m_{01} & m_{02} & m_{03} \\ m_{10} & m_{11} & m_{12} & m_{13} \\ m_{20} & m_{21} & m_{22} & m_{23} \\ 0 & 0 & 0 & 1 \end{pmatrix}

A matrix equation for the least squares solution of M, given the values of the t_i and the t'_i, is (where n = 24 for the ColorChecker)

\begin{pmatrix} r_0 & g_0 & b_0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & r_0 & g_0 & b_0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & r_0 & g_0 & b_0 & 1 \\ & & & & & \vdots & & & & & & \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & r_{n-1} & g_{n-1} & b_{n-1} & 1 \end{pmatrix} \begin{pmatrix} m_{00} \\ m_{01} \\ \vdots \\ m_{23} \end{pmatrix} = \begin{pmatrix} r'_0 \\ g'_0 \\ b'_0 \\ \vdots \\ b'_{n-1} \end{pmatrix} (10)

There are a variety of ways in which this equation can be solved. We do it using the SVD decomposition [Golub and Loan, 1996b].
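A minimal sketch of this least squares solve, exploiting the fact that the three output channels in Eq. (10) decouple into independent fits. The function name and the stand-in data are illustrative; NumPy's lstsq is SVD-based, which matches the solution method described above.

```python
import numpy as np

def estimate_color_twist(measured, standard):
    """Least squares solution of Eq. (10).

    measured, standard: (n, 3) arrays of RGB patch values (n = 24 for
    the ColorChecker). Returns the constrained 4x4 twist matrix M with
    last row (0, 0, 0, 1) such that standard ~= M @ (r, g, b, 1).
    """
    n = len(measured)
    # Homogeneous measured colors, one row per patch: (r, g, b, 1).
    T = np.column_stack([measured, np.ones(n)])
    # Each of the first three rows of M is an independent least squares
    # fit of one output channel against all four inputs.
    M = np.eye(4)
    for c in range(3):
        M[c, :], *_ = np.linalg.lstsq(T, standard[:, c], rcond=None)
    return M

# Usage with random stand-in data (real use: the medians from step 4
# and the reference values from step 1 of the calibration procedure).
rng = np.random.default_rng(0)
measured = rng.random((24, 3))
standard = np.clip(measured * 1.1 + 0.02, 0, 1)
M = estimate_color_twist(measured, standard)
```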
A.5 Nonlinear fitting

The color twist estimation described in Section A.4 provides very acceptable color estimates. I have implemented an additional non-linear fit, based upon the use of Radial Basis Functions. Initial experiments indicate that this provides superior performance when the linear approach is a good match, but overfits when it is not. This could be remedied by providing more calibration points (color targets). If this is too difficult, a continuous (perhaps gray-scale) target might prove interesting, as it would provide many more target points.

The basis of the non-linear fit is as follows. Given RGB points t_i and t'_i as above, let us write the color component c of t as t_c, where c can be r, g or b. We are seeking functions \phi_c(t), defined as

\phi_c(t) = \sum_{i} \lambda_i^c\, \phi(\lVert t - t_i \rVert) + p_c(t)

where p_c is a linear polynomial, subject to the interpolation constraints

\phi_c(t_i) = t'_{i,c}, \quad 0 \le i < n

and the side conditions \sum_i \lambda_i^c\, q(t_i) = 0 for every linear polynomial q. This provides n + 4 constraints in n + 4 variables. These radial basis functions have useful variational properties, and can be shown to have stable numerical solutions in most circumstances; see [Powell, 1992].
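A minimal sketch of this interpolation problem, assembled as the standard (n + 4) x (n + 4) linear system. The choice of kernel phi(r) = r (the same kernel as Eq. (20)) and all names are assumptions for illustration; the original test-program implementation is not reproduced here.

```python
import numpy as np

def fit_rbf_colour_map(t, t_prime):
    """Fit the RBF interpolant of Section A.5, all three channels at once.

    t: (n, 3) measured RGB points; t_prime: (n, 3) target RGB points.
    Returns (weights, poly) with shapes (n, 3) and (4, 3).
    """
    n = len(t)
    r = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=2)  # phi(r) = r
    P = np.column_stack([np.ones(n), t])                       # (n, 4)
    # (n + 4) x (n + 4) system; the bottom-left block encodes the side
    # conditions sum(lambda) = 0 and sum(lambda * t) = 0.
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = r
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 4, 3))
    b[:n] = t_prime
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]

def evaluate_rbf(t_query, t_centres, weights, poly):
    """Evaluate the fitted colour map at new RGB values."""
    r = np.linalg.norm(t_query[:, None, :] - t_centres[None, :, :], axis=2)
    return r @ weights + np.column_stack(
        [np.ones(len(t_query)), t_query]) @ poly
```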
A.6 Conclusion
The color checking procedure proposed is more accurate than calibration of a camera alone, as it can guarantee a degree of color constancy between cameras. It does not, however, replace the requirement for careful camera setup. In particular it is important that the full dynamic range of the camera is used, and that a basic degree of color balance is applied in the camera; otherwise not enough information will be available to produce clean images.
The procedure is capable of complete automation, by the addition of a function for recognition of the outline of the ColorChecker (or perhaps the addition of markers to the border of the ColorChecker).
B Oriented Bounding Box

Given a mesh M = \{F_i, v_j \mid 1 \le i \le m, 1 \le j \le N\} with faces F_i and vertices v_j, we would like to determine an oriented box that bounds the mesh. The orientation of the box should be determined so that the volume of the box is close to the minimum necessary to bound the mesh. A weight w_j is calculated for each vertex by summing the areas of the faces to which the vertex belongs. The centroid \bar{v} of the weighted vertices is then determined:

\bar{v} = \frac{\sum_{j=1}^{N} w_j v_j}{\sum_{j=1}^{N} w_j} (11)

A weighted correlation matrix S is then computed for the vertices:

S = \sum_{j=1}^{N} w_j \left[ v_j - \bar{v} \right] \left[ v_j - \bar{v} \right]^T (12)

The principal directions of the oriented bounding box are determined by performing an eigenvalue decomposition of S. Let the three eigenvectors be e_1, e_2 and e_3:

[e_1, e_2, e_3] = \mathrm{EigenVectors}(S) (13)

The maximum extent d_k of the data along each of the principal directions is determined by projecting the mesh vertices onto the principal axes. Let e_k be the kth eigenvector with its length normalised:

d_k = \max_j \left| \left[ v_j - \bar{v} \right] \cdot e_k \right| (14)

The lower corner p_{lower} of the bounding box is then given by

p_{lower} = \bar{v} - \sum_{k=1}^{3} d_k e_k (15)

The upper corner p_{upper} of the bounding box is then given by

p_{upper} = \bar{v} + \sum_{k=1}^{3} d_k e_k (16)

It is useful to represent the bounding box using a transformation matrix T. This transformation maps an axis-aligned box, centered at the origin with lower corner (-1, -1, -1) and upper corner (1, 1, 1), to the oriented bounding box:

T = \begin{pmatrix} d_1 e_{1x} & d_2 e_{2x} & d_3 e_{3x} & \bar{v}_x \\ d_1 e_{1y} & d_2 e_{2y} & d_3 e_{3y} & \bar{v}_y \\ d_1 e_{1z} & d_2 e_{2z} & d_3 e_{3z} & \bar{v}_z \\ 0 & 0 & 0 & 1 \end{pmatrix} (17)
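The construction of Eqs. (11)-(17) is compact enough to summarise in a short sketch. This is a paraphrase in Python/NumPy under the assumption that per-vertex weights have already been computed from adjacent face areas; names are illustrative.

```python
import numpy as np

def oriented_bounding_box(vertices, weights):
    """Compute the oriented bounding box of Eqs. (11)-(17).

    vertices: (N, 3) array; weights: (N,) per-vertex weights (the sum
    of the areas of the faces adjacent to each vertex).
    Returns the 4x4 matrix T that maps the box [-1, 1]^3 at the origin
    onto the oriented bounding box.
    """
    w = weights / weights.sum()          # normalisation leaves the
    centroid = w @ vertices              # centroid (11) unchanged
    d = vertices - centroid
    S = (d * w[:, None]).T @ d           # weighted correlation, Eq. (12)
    # S is symmetric, so eigh returns orthonormal eigenvectors, Eq. (13).
    _, eigvecs = np.linalg.eigh(S)
    extents = np.abs(d @ eigvecs).max(axis=0)   # d_k of Eq. (14)
    T = np.eye(4)                                # Eq. (17)
    T[:3, :3] = eigvecs * extents                # column k = d_k * e_k
    T[:3, 3] = centroid
    # The corners of Eqs. (15)-(16) are T applied to (-1,-1,-1) and (1,1,1).
    return T
```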
C Global Conformation

This appendix describes the algorithm that is used to globally conform the template mesh to a smooth model of a consumer derived from the scan data.
C.1 Preliminaries

For a given set of points S_i = (x_i, y_i, z_i)^T, i = 1, \ldots, N, and a one-to-one corresponding set of points T_i = (x'_i, y'_i, z'_i)^T, i = 1, \ldots, N, we can find a warp that maps S_i into T_i for all i by solving the linear equation

\begin{pmatrix} \Phi & P \\ P^T & 0 \end{pmatrix} \begin{pmatrix} \Lambda \\ Q \end{pmatrix} = \begin{pmatrix} X \\ 0 \end{pmatrix} (18)

where \Phi is (N x N) with entries \phi_{ij}, P is (N x 4) with ith row (1, x_i, y_i, z_i), \Lambda is the (N x 3) matrix of spline weights, Q is the (4 x 3) matrix of polynomial coefficients, and X is the (N x 3) matrix whose rows are the target points T_i. For example, written out in full for N = 4 this system is given by

\begin{pmatrix} \phi_{11} & \phi_{12} & \phi_{13} & \phi_{14} & 1 & x_1 & y_1 & z_1 \\ \phi_{21} & \phi_{22} & \phi_{23} & \phi_{24} & 1 & x_2 & y_2 & z_2 \\ \phi_{31} & \phi_{32} & \phi_{33} & \phi_{34} & 1 & x_3 & y_3 & z_3 \\ \phi_{41} & \phi_{42} & \phi_{43} & \phi_{44} & 1 & x_4 & y_4 & z_4 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ x_1 & x_2 & x_3 & x_4 & 0 & 0 & 0 & 0 \\ y_1 & y_2 & y_3 & y_4 & 0 & 0 & 0 & 0 \\ z_1 & z_2 & z_3 & z_4 & 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \lambda_1 & \mu_1 & \rho_1 \\ \lambda_2 & \mu_2 & \rho_2 \\ \lambda_3 & \mu_3 & \rho_3 \\ \lambda_4 & \mu_4 & \rho_4 \\ A_0 & B_0 & C_0 \\ A_1 & B_1 & C_1 \\ A_2 & B_2 & C_2 \\ A_3 & B_3 & C_3 \end{pmatrix} = \begin{pmatrix} x'_1 & y'_1 & z'_1 \\ x'_2 & y'_2 & z'_2 \\ x'_3 & y'_3 & z'_3 \\ x'_4 & y'_4 & z'_4 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} (19)

where \phi_{ij} is shorthand notation for \phi(\lVert S_i - S_j \rVert), and \phi is the thin plate spline function defined by

\phi(r) = r (20)

We solve the above system for the weights \lambda_i, \mu_i and \rho_i, and the polynomial constants A_j, B_j and C_j, to determine the warp from S_i to T_i.

Equation (18) can be re-arranged to give

\begin{pmatrix} \Lambda \\ Q \end{pmatrix} = \begin{pmatrix} \Phi & P \\ P^T & 0 \end{pmatrix}^{-1} \begin{pmatrix} X \\ 0 \end{pmatrix} (21)

Defining A as the leading (N x N) submatrix of the inverse of the matrix on the left-hand side of Eq. (18), we note that the bending energy W_b for the associated warp is given by

W_b = x^T A x + y^T A y + z^T A z (22)

where the (N x 1) vectors x, y and z are the first, second and third columns of X respectively. The derivatives are easy to calculate, as follows:
\frac{\partial W_b}{\partial x} = 2 A x, \qquad \frac{\partial W_b}{\partial y} = 2 A y, \qquad \frac{\partial W_b}{\partial z} = 2 A z (23)

C.2 Formulation

C.2.1 Introduction
We formulate the problem as an unconstrained minimisation problem that is solved via the LBFGS algorithm. The function f we are minimising is given by

f = -E + \lambda D (24)

where E is a measure of the energy of the mesh, D is the sum of the squares of the distances from the template mesh vertices to the smooth model, and \lambda is a positive constant. Not all template mesh vertices are used in the calculations of E and D; indeed, the set of vertices used to calculate D (given by S_D) is a subset of the set of vertices used to calculate E (given by S_E). The sets S_E and S_D are discussed in Sec. (C.2.2). Note that E is negated in Eq. (24) since E < 0.
The mesh energy E is calculated as follows. We define the crown of a vertex v to consist of v plus all its immediate vertex neighbours. We define the local warp energy of v as the bending energy of the warp calculated from the vertices of the crown of v. To get the mesh energy we sum the local warp energy over all crowns of vertices in the set S_E.
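A hedged sketch of this crown-energy computation, reusing solve_tps_warp and bending_energy from the sketch in Sec. C.1. It assumes triangular faces and crowns whose vertices are not exactly coplanar (so each small spline system is solvable); all names are illustrative, not the original implementation.

```python
import numpy as np
# Reuses solve_tps_warp and bending_energy from the Sec. C.1 sketch.

def vertex_crowns(faces, n_vertices):
    """Crown of v = v plus its immediate neighbours, from triangle faces."""
    nbrs = [set() for _ in range(n_vertices)]
    for a, b, c in faces:
        nbrs[a].update((b, c))
        nbrs[b].update((a, c))
        nbrs[c].update((a, b))
    return [[v] + sorted(nb) for v, nb in enumerate(nbrs)]

def mesh_energy(rest_vertices, warped_vertices, faces, active):
    """Sum the local warp (bending) energies over the crowns of the
    vertices in the active set S_E, as described above."""
    crowns = vertex_crowns(faces, len(rest_vertices))
    E = 0.0
    for v in active:
        idx = crowns[v]
        # Bending energy of the warp taking the rest crown to its
        # current (warped) position.
        _, _, A = solve_tps_warp(rest_vertices[idx], warped_vertices[idx])
        E += bending_energy(A, warped_vertices[idx])
    return E
```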
C.2.2 The Vertex Sets

Some of the template mesh vertices are to remain fixed, namely those that are closest to each template landmark. These vertices are included in the calculation of E, but not D. Since the positions of these vertices do not vary we do not need to include them when calculating the derivatives of E and D.

The set S_D consists of all variable template mesh vertices that have a corresponding 'close' point on the data mesh. Specifically, we project along the normal of each variable template mesh vertex (in both directions) and add to S_D any vertex whose closest intersection with the data mesh is less than some specified distance (say 2-3 cm).

The set S_E consists of all template mesh vertices except for vertices that are in parts of the mesh we wish to discard (e.g. the hands, feet, eyeballs, etc.). The set S_E can be split into three disjoint subsets: the set of vertices whose positions are to remain fixed, the set S_D described above, and the set of variable vertices that do not have a corresponding close point on the data mesh.
C.3 Results

This section describes some results from running the global conforming algorithm on the template mesh shown in Fig. (24) and the data mesh shown in Fig. (25). Note that the template mesh has already undergone various warps to closely align itself with the data mesh. Due to a potentially lengthy solution time, it is necessary to specify the maximum number of function evaluations for the LBFGS algorithm. A maximum of 20 was shown to reduce the average distance from the variable template mesh vertices to corresponding close points on the smooth model to approximately 9.43 x 10^-5 m. The result of the global conforming algorithm is shown in Fig. (26). Here \lambda = 1000 and the time taken was 3.37 minutes for 20 iterations of the LBFGS algorithm. The values of 22632 variables were solved for (corresponding to 7544 template mesh vertices). Close alignment can be seen between the template and data mesh in virtually all areas. The hands (and feet) are excluded from the conforming and are therefore not in good agreement.
Claims (17)
Claims

1. A method of aligning a plurality of point clouds obtained from a 3D body scanner, each of which defines part of the surface of a scanned body, by rotating and translating the various point clouds, comprising:
employing a generalized Iterated Closest Point (ICP) algorithm to perform a global rigid body alignment of the set of point clouds;
calculating a mean surface by taking a subset of the points from each point cloud, identifying points from each subset that match points in the other subsets, and deriving the mean surface from the mean values of matched points; and
aligning the points of each point cloud to the mean surface in an iterative manner until convergence is achieved (as judged in accordance with predetermined criteria).

2. The method of claim 1, further comprising exactly aligning selected points that are identified as corresponding to predetermined landmarks on the scanned body.

3. A method of bringing a set of 3D data points obtained by an initial alignment of a plurality of point clouds obtained from a 3D body scanner into precise registration with a mean body surface derived from the point clouds, comprising:
employing a generalized ICP algorithm to perform an elastic (or non-linear) warp (or registration);
matching points on the mean surface to points in the previously aligned data cloud to define an optimal elastic registration between the mean surface and the previously aligned data cloud surface, and hence to define a warp that is applied to the points of the previously aligned data clouds in an iterative manner until convergence is achieved (as judged in accordance with predetermined criteria).

4. The method of claim 3, wherein the initial alignment of the plurality of point clouds is performed and the mean body surface is derived in accordance with the method of claim 1 or claim 2.

5. A method of fitting an existing mesh-type body model template to a set of 3D data points obtained from a 3D body scanner, comprising:
determining a space warp that fits the template model to the surface defined by the 3D data points, representing changes of pose and shape that fit the template to the 3D data points.

6. The method of claim 5, further comprising:
establishing an initial fit of the template model to the 3D data points by identifying points of correspondence between corresponding landmarks on the template and the 3D data points; and
using the spatial relationships between the template landmarks and data landmarks to define a radial basis warp that is applied to the template so as to fit the template to the data.

7. The method of claim 6, further comprising:
employing an algorithm that initially aligns the template with the 3D data points using oriented bounding boxes and then refines the fit using a non-rigid (elastic or non-linear) registration (or warp) algorithm.

8. The method of claim 7, wherein the non-rigid (elastic or non-linear) registration (or warp) algorithm is that employed in the method of claim 3 or claim 4.

9. The method of any of claims 6 to 8, wherein the initial fit is refined by defining additional, substantially uniformly distributed landmarks on the template and calculating a warp that brings the additional landmarks into alignment with the data.

10. The method of any of claims 5 to 9, wherein the set of 3D data points obtained from a 3D body scanner is derived from a plurality of data clouds that have been processed in accordance with the method of any of claims 1 to 4.

11. A method of ensuring consistent parameterization of mesh-type body model templates when fitted to point cloud data obtained from a 3D body scanner, comprising:
globally fitting the mesh-type body model template to the scanner data by optimizing an initial fit obtained by the method of any one of claims 5 to 10, and using an iterative process to minimize the overall deformation of the body model template, measured as the deviation from an affine warp.

12. A method for optimizing mesh-type body model templates used for fitting to point cloud data obtained from a 3D body scanner, comprising:
using the parameterisation provided by the method of claim 11 to minimise (globally) the degree of warping that must be applied to a template model to fit the scanned data; and
improving the fit by optimizing the template, wherein the template is optimized by averaging a representative sample of fitted models, so as to generate an average body model template.

13. A method of integrating a plurality of texture maps obtained from a 3D body scanner and corresponding to a plurality of point clouds obtained from the body scanner into a single texture map for application to a mesh-type body model derived from the point cloud data, comprising:
employing patch borders to ensure substantially seamless integration of the original plurality of texture maps, by generating patches on the body model mesh corresponding to the original texture maps;
assigning average texture values to vertices along the borders of the patches based on weighted averages of values from the corresponding original texture maps;
mapping these average texture values back to the original texture maps and deriving difference images therefrom;
adding the difference images to the original texture maps and projecting the resultant texture maps back onto the body model.

14. The method of claim 13, wherein gaps (holes) in the resultant texture map are filled by means of texture synthesis based on texture data adjacent to the gaps.

15. A method of modelling the hair of a subject for use in a body model derived from point cloud data obtained from a 3D body scanner, comprising:
employing video images of the subject's hair from a number of different viewpoints obtained from the body scanner to construct a smooth 3D surface corresponding to the 3D shape of the hair.

16. The method of claim 15, wherein the surface is constructed using silhouettes derived from the video images and the 3D surface is textured using texture data derived from the video images.

17. The method of claim 16, wherein texturing is performed using the methods of claim 13 or claim 14.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB0209080.1A GB0209080D0 (en) | 2002-04-20 | 2002-04-20 | Methods of generating body models from scanned data |
Publications (2)
Publication Number | Publication Date |
---|---|
GB0308234D0 GB0308234D0 (en) | 2003-05-14 |
GB2389500A true GB2389500A (en) | 2003-12-10 |
Family
ID=9935214
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GBGB0209080.1A Ceased GB0209080D0 (en) | 2002-04-20 | 2002-04-20 | Methods of generating body models from scanned data |
GB0308234A Withdrawn GB2389500A (en) | 2002-04-20 | 2003-04-10 | Generating 3D body models from scanned data |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GBGB0209080.1A Ceased GB0209080D0 (en) | 2002-04-20 | 2002-04-20 | Methods of generating body models from scanned data |
Country Status (1)
Country | Link |
---|---|
GB (2) | GB0209080D0 (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005088489A2 (en) * | 2004-03-05 | 2005-09-22 | The Procter & Gamble Company | Virtual prototyping system and method |
WO2005106087A1 (en) * | 2004-05-04 | 2005-11-10 | The University Of Manchester | Pressure garment |
EP1851727A1 (en) * | 2005-02-23 | 2007-11-07 | Craig Summers | Automatic scene modeling for the 3d camera and 3d video |
CN100444201C (en) * | 2006-08-14 | 2008-12-17 | 东南大学 | Mark point matching method for point-cloud registration in 3D scanning system |
EP2026279A1 (en) * | 2007-08-13 | 2009-02-18 | Aqsense, S.L. | Method and system for aligning three-dimensional surfaces |
US7657081B2 (en) | 2004-09-03 | 2010-02-02 | National Research Council Of Canada | Recursive 3D model optimization |
US7979256B2 (en) | 2007-01-30 | 2011-07-12 | The Procter & Gamble Company | Determining absorbent article effectiveness |
CN102169579A (en) * | 2011-03-31 | 2011-08-31 | 西北工业大学 | Rapid and accurate registration method of dense point cloud model |
GB2497517A (en) * | 2011-12-06 | 2013-06-19 | Toshiba Res Europ Ltd | Reconstructing 3d surfaces using point clouds derived from overlapping camera images |
CN103236064A (en) * | 2013-05-06 | 2013-08-07 | 东南大学 | Point cloud automatic registration method based on normal vector |
CN103729872A (en) * | 2013-12-30 | 2014-04-16 | 浙江大学 | Point cloud enhancement method based on subsection resampling and surface triangularization |
CN103927784A (en) * | 2014-04-17 | 2014-07-16 | 中国科学院深圳先进技术研究院 | Three-dimensional scanning method |
CN104299211A (en) * | 2014-09-25 | 2015-01-21 | 周翔 | Free-moving type three-dimensional scanning method |
US9161019B2 (en) | 2012-09-10 | 2015-10-13 | Aemass, Inc. | Multi-dimensional data capture of an environment using plural devices |
CN106021550A (en) * | 2016-05-27 | 2016-10-12 | 湖南拓视觉信息技术有限公司 | Hair style designing method and system |
US9522328B2 (en) | 2009-10-07 | 2016-12-20 | Microsoft Technology Licensing, Llc | Human tracking system |
CN104166989B (en) * | 2014-07-04 | 2017-02-15 | 电子科技大学中山学院 | Rapid ICP method for two-dimensional laser radar point cloud matching |
WO2017028961A1 (en) | 2015-08-14 | 2017-02-23 | Thomson Licensing | 3d reconstruction of a human ear from a point cloud |
US9760996B2 (en) | 2015-08-11 | 2017-09-12 | Nokia Technologies Oy | Non-rigid registration for large-scale space-time 3D point cloud alignment |
CN109345570A (en) * | 2018-09-10 | 2019-02-15 | 大连理工大学 | A kind of multichannel three-dimensional colour point clouds method for registering based on geometry |
WO2019035768A1 (en) | 2017-08-17 | 2019-02-21 | Iko Pte. Ltd. | Systems and methods for analyzing cutaneous conditions |
CN109596104A (en) * | 2018-12-06 | 2019-04-09 | 林嘉恒 | Image reconstructing method based on the scanning of half covering type hierarchical fusion |
WO2019135979A1 (en) * | 2018-01-05 | 2019-07-11 | Microsoft Technology Licensing, Llc | Fusing, texturing, and rendering views of dynamic three-dimensional models |
WO2019164497A1 (en) * | 2018-02-23 | 2019-08-29 | Sony Mobile Communications Inc. | Methods, devices, and computer program products for gradient based depth reconstructions with robust statistics |
US10621788B1 (en) | 2018-09-25 | 2020-04-14 | Sony Corporation | Reconstructing three-dimensional (3D) human body model based on depth points-to-3D human body model surface distance |
US11461914B2 (en) | 2018-06-06 | 2022-10-04 | Sizey Oy | Measuring surface distances on human bodies |
US11704537B2 (en) | 2017-04-28 | 2023-07-18 | Microsoft Technology Licensing, Llc | Octree-based convolutional neural network |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106960467A (en) * | 2017-03-22 | 2017-07-18 | 北京太阳花互动科技有限公司 | A kind of face reconstructing method and system with bone information |
CN110189401B (en) * | 2019-05-21 | 2023-05-23 | 中建三局集团有限公司 | Reverse modeling method for curve tubular enclosure structure |
CN111858954B (en) * | 2020-06-29 | 2022-12-13 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Task-oriented text-generated image network model |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010040573A1 (en) * | 1997-11-13 | 2001-11-15 | Kenneth R. Kressin | Method for generating surface representations of objects from layered data |
-
2002
- 2002-04-20 GB GBGB0209080.1A patent/GB0209080D0/en not_active Ceased
-
2003
- 2003-04-10 GB GB0308234A patent/GB2389500A/en not_active Withdrawn
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010040573A1 (en) * | 1997-11-13 | 2001-11-15 | Kenneth R. Kressin | Method for generating surface representations of objects from layered data |
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7937253B2 (en) | 2004-03-05 | 2011-05-03 | The Procter & Gamble Company | Virtual prototyping system and method |
WO2005088489A3 (en) * | 2004-03-05 | 2006-01-12 | Procter & Gamble | Virtual prototyping system and method |
WO2005088489A2 (en) * | 2004-03-05 | 2005-09-22 | The Procter & Gamble Company | Virtual prototyping system and method |
WO2005106087A1 (en) * | 2004-05-04 | 2005-11-10 | The University Of Manchester | Pressure garment |
US7657081B2 (en) | 2004-09-03 | 2010-02-02 | National Research Council Of Canada | Recursive 3D model optimization |
EP1851727A1 (en) * | 2005-02-23 | 2007-11-07 | Craig Summers | Automatic scene modeling for the 3d camera and 3d video |
EP1851727A4 (en) * | 2005-02-23 | 2008-12-03 | Craig Summers | Automatic scene modeling for the 3d camera and 3d video |
CN100444201C (en) * | 2006-08-14 | 2008-12-17 | 东南大学 | Mark point matching method for point-cloud registration in 3D scanning system |
US7979256B2 (en) | 2007-01-30 | 2011-07-12 | The Procter & Gamble Company | Determining absorbent article effectiveness |
US8340401B2 (en) | 2007-08-13 | 2012-12-25 | Aqsense S.L. | Method and system for aligning three-dimensional surfaces |
EP2026279A1 (en) * | 2007-08-13 | 2009-02-18 | Aqsense, S.L. | Method and system for aligning three-dimensional surfaces |
WO2009021959A1 (en) * | 2007-08-13 | 2009-02-19 | Aqsense S. L. | Method and system for aligning three-dimensional surfaces |
US9821226B2 (en) | 2009-10-07 | 2017-11-21 | Microsoft Technology Licensing, Llc | Human tracking system |
KR101802125B1 (en) | 2009-10-07 | 2017-11-27 | 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 | Human tracking system |
US9522328B2 (en) | 2009-10-07 | 2016-12-20 | Microsoft Technology Licensing, Llc | Human tracking system |
CN102169579A (en) * | 2011-03-31 | 2011-08-31 | 西北工业大学 | Rapid and accurate registration method of dense point cloud model |
GB2497517B (en) * | 2011-12-06 | 2016-05-25 | Toshiba Res Europe Ltd | A reconstruction system and method |
GB2497517A (en) * | 2011-12-06 | 2013-06-19 | Toshiba Res Europ Ltd | Reconstructing 3d surfaces using point clouds derived from overlapping camera images |
US10244228B2 (en) | 2012-09-10 | 2019-03-26 | Aemass, Inc. | Multi-dimensional data capture of an environment using plural devices |
US9161019B2 (en) | 2012-09-10 | 2015-10-13 | Aemass, Inc. | Multi-dimensional data capture of an environment using plural devices |
US10893257B2 (en) | 2012-09-10 | 2021-01-12 | Aemass, Inc. | Multi-dimensional data capture of an environment using plural devices |
CN103236064B (en) * | 2013-05-06 | 2016-01-13 | 东南大学 | A kind of some cloud autoegistration method based on normal vector |
CN103236064A (en) * | 2013-05-06 | 2013-08-07 | 东南大学 | Point cloud automatic registration method based on normal vector |
CN103729872B (en) * | 2013-12-30 | 2016-05-18 | 浙江大学 | A kind of some cloud Enhancement Method based on segmentation resampling and surface triangulation |
CN103729872A (en) * | 2013-12-30 | 2014-04-16 | 浙江大学 | Point cloud enhancement method based on subsection resampling and surface triangularization |
CN103927784A (en) * | 2014-04-17 | 2014-07-16 | 中国科学院深圳先进技术研究院 | Three-dimensional scanning method |
CN103927784B (en) * | 2014-04-17 | 2017-07-18 | 中国科学院深圳先进技术研究院 | A kind of active 3-D scanning method |
CN104166989B (en) * | 2014-07-04 | 2017-02-15 | 电子科技大学中山学院 | Rapid ICP method for two-dimensional laser radar point cloud matching |
CN104299211A (en) * | 2014-09-25 | 2015-01-21 | 周翔 | Free-moving type three-dimensional scanning method |
CN104299211B (en) * | 2014-09-25 | 2020-12-29 | 苏州笛卡测试技术有限公司 | Free-moving type three-dimensional scanning method |
US9760996B2 (en) | 2015-08-11 | 2017-09-12 | Nokia Technologies Oy | Non-rigid registration for large-scale space-time 3D point cloud alignment |
WO2017028961A1 (en) | 2015-08-14 | 2017-02-23 | Thomson Licensing | 3d reconstruction of a human ear from a point cloud |
CN106021550A (en) * | 2016-05-27 | 2016-10-12 | 湖南拓视觉信息技术有限公司 | Hair style designing method and system |
CN106021550B (en) * | 2016-05-27 | 2020-06-26 | 湖南拓视觉信息技术有限公司 | Hair style design method and system |
US11704537B2 (en) | 2017-04-28 | 2023-07-18 | Microsoft Technology Licensing, Llc | Octree-based convolutional neural network |
WO2019035768A1 (en) | 2017-08-17 | 2019-02-21 | Iko Pte. Ltd. | Systems and methods for analyzing cutaneous conditions |
US11504055B2 (en) | 2017-08-17 | 2022-11-22 | Iko Pte. Ltd. | Systems and methods for analyzing cutaneous conditions |
EP3668387A4 (en) * | 2017-08-17 | 2021-05-12 | IKO Pte. Ltd. | Systems and methods for analyzing cutaneous conditions |
WO2019135979A1 (en) * | 2018-01-05 | 2019-07-11 | Microsoft Technology Licensing, Llc | Fusing, texturing, and rendering views of dynamic three-dimensional models |
US10504274B2 (en) | 2018-01-05 | 2019-12-10 | Microsoft Technology Licensing, Llc | Fusing, texturing, and rendering views of dynamic three-dimensional models |
US11074752B2 (en) | 2018-02-23 | 2021-07-27 | Sony Group Corporation | Methods, devices and computer program products for gradient based depth reconstructions with robust statistics |
WO2019164497A1 (en) * | 2018-02-23 | 2019-08-29 | Sony Mobile Communications Inc. | Methods, devices, and computer program products for gradient based depth reconstructions with robust statistics |
US11461914B2 (en) | 2018-06-06 | 2022-10-04 | Sizey Oy | Measuring surface distances on human bodies |
CN109345570B (en) * | 2018-09-10 | 2021-05-14 | 大连理工大学 | Multi-channel three-dimensional color point cloud registration method based on geometric shape |
CN109345570A (en) * | 2018-09-10 | 2019-02-15 | 大连理工大学 | A kind of multichannel three-dimensional colour point clouds method for registering based on geometry |
US10621788B1 (en) | 2018-09-25 | 2020-04-14 | Sony Corporation | Reconstructing three-dimensional (3D) human body model based on depth points-to-3D human body model surface distance |
CN109596104B (en) * | 2018-12-06 | 2021-08-06 | 林嘉恒 | Image reconstruction method based on semi-covering type layered fusion scanning |
CN109596104A (en) * | 2018-12-06 | 2019-04-09 | 林嘉恒 | Image reconstructing method based on the scanning of half covering type hierarchical fusion |
Also Published As
Publication number | Publication date |
---|---|
GB0209080D0 (en) | 2002-05-29 |
GB0308234D0 (en) | 2003-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
GB2389500A (en) | Generating 3D body models from scanned data | |
Maier et al. | Intrinsic3D: High-quality 3D reconstruction by joint appearance and geometry optimization with spatially-varying lighting | |
Bernardini et al. | The 3D model acquisition pipeline | |
Johnson et al. | Registration and integration of textured 3D data | |
Pulli et al. | Surface reconstruction and display from range and color data | |
Weise et al. | In-hand scanning with online loop closure | |
Scharstein | View synthesis using stereo vision | |
Rocchini et al. | Acquiring, stitching and blending diffuse appearance attributes on 3D models | |
US20050140670A1 (en) | Photogrammetric reconstruction of free-form objects with curvilinear structures | |
Zhu et al. | Video-based outdoor human reconstruction | |
Cross et al. | Surface reconstruction from multiple views using apparent contours and surface texture | |
Wang et al. | Plane-based optimization of geometry and texture for RGB-D reconstruction of indoor scenes | |
Lin et al. | Vision system for fast 3-D model reconstruction | |
Burschka et al. | Recent Methods for Image-Based Modeling and Rendering. | |
Hilton et al. | From 3D Shape Capture to Animated Models. | |
CN118247429A (en) | Air-ground cooperative rapid three-dimensional modeling method and system | |
Wu et al. | Photogrammetric reconstruction of free-form objects with curvilinear structures | |
Ikeuchi et al. | Modeling from reality: creating virtual reality models through observation | |
Nguyen et al. | High resolution 3d content creation using unconstrained and uncalibrated cameras | |
Szeliski | From images to models (and beyond): a personal retrospective | |
Hülsken et al. | Modeling and animating virtual humans for real-time applications | |
Budd et al. | Temporal alignment of 3d video sequences using shape and appearance | |
Jankó et al. | Creating entirely textured 3d models of real objects using surface flattening | |
Fournier | Mesh filtering algorithm using an adaptive 3D convolution kernel applied to a volume-based vector distance field | |
Neumann et al. | Constructing a realistic head animation mesh for a specific person |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |