WO2023184139A1

WO2023184139A1 - Methods and systems for rendering three-dimensional scenes

Info

Publication number: WO2023184139A1
Application number: PCT/CN2022/083633
Authority: WO
Inventors: Mohamed Ibrahim; Yang Liu; Tengyi LIN; Sergei SACHKOV
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2022-03-29
Filing date: 2022-03-29
Publication date: 2023-10-05
Also published as: CN118985009A

Abstract

Processing a three dimensional (3D) scene model, including, at a first computing system: defining, for each surface region of a plurality of surface regions, a respective local reference frame and a bin structure, the bin structure discretizing the local reference frame into a set of bins, each bin corresponding to a respective range of light directions that intersect the surface region; computing, for each surface region, a respective light tensor comprising a respective color measurement for each bin of the bin structure for the surface region, the respective color measurement for each bin being based on a path trace of one or more light ray samples that fall within the respective range of light directions corresponding to the bin; assembling, a data structure for the 3D scene model that indicates the respective local reference frames, bin structures, and light tensors for the surface regions.

Description

METHODS AND SYSTEMS FOR RENDERING THREE-DIMENSIONAL SCENES

TECHNICAL FIELD

The present disclosure generally relates to realistic rendering of three-dimensional (3D) scenes, in particular, to methods and systems for enabling limited resource devices to render realistic 3D scenes.

BACKGROUND

Physically based rendering (PBR) is used to generate realistic color renderings of three-dimensional (3D) scenes, which are in high demand for the entertainment industry, and in particular the gaming industry. These techniques mimic physical interactions of light rays with objects in 3D scenes to produce plausible, high-quality, physically accurate renderings. However, PBR techniques typically rely on computationally intensive path tracing algorithms, with the result that rendering images of 3D scenes that are of comparable quality to real scene photographs can be computationally and time intensive. Thus, rendering high quality images of 3D scenes, particularly in an interactive environment of changing viewing perspectives, is a highly challenging and computationally demanding task. The rendering task can be particularly difficult for computationally constrained devices such as mobile devices that have limited processing power and storage capacity and are powered by a limited energy battery.

Accordingly, there is a need for methods and systems that can enable realistic rendering of 3D scenes in a quick and efficient manner using computationally constrained device s such as mobile devices.

SUMMARY

In various examples, the present disclosure describes a method for processing a three dimensional (3D) scene model that defines geometry and appearance of one or more objects in a 3D scene space. In a first example aspect, the method includes, at a first computing system: defining, for each surface region of a plurality of surface regions that collectively represent a surface of the one or more objects, a respective local reference frame and a bin structure, the bin structure discretizing the local reference frame into a set of bins, each bin corresponding to a respective range of light directions that intersect the surface region; computing, for each surface region, a respective light tensor comprising a respective color measurement for each bin of the bin structure for the surface region, the respective color measurement for each bin being based on a path trace of one or more light ray samples that fall within the respective range of light directions corresponding to the bin; assembling, a data structure for the 3D scene model that indicates the respective local reference frames, bin structures, and light tensors for the surface regions; and storing the data structure.

Computing the light tensor at the first computing system can reduce the computational resources required to render a photo-realistic image for the 3D scene model at a rendering device.

In an example of the preceding aspect of the method, comprising, at the first computing system, the method includes: parametrizing a mesh for the 3D scene space, the mesh comprising a plurality of points that collectively represent the surface of the one or more objects, each point having a respective unique point location defined by a respective set of 3D coordinates in a 3D spatial coordinate system for the 3D scene space; grouping the plurality of points into a plurality of point clusters, each point cluster forming a respective one of the plurality of surface regions; and mapping each surface region to a respective discrete pixel of a two-dimensional (2D) image map.

In an example of any of the preceding example aspects of the method, the plurality of points of the mesh includes vertex points that define respective corners of polygonal primitives that form respective surfaces areas, wherein parametrizing the mesh comprises mapping each vertex point to a pair of continuous coordinate variables indicating a location within the 2D map such that each polygonal primitive has a unique location within the 2D map; and the method comprises, at the first computing system, storing data indicating the mapping for each vertex point as part of the 3D scene model.

In an example of any of the preceding example aspects of the method, assembling the data structure comprises populating the discrete pixels of the 2D map such that each discrete pixel includes data indicating the local reference frame and the respective light tensor for the surface region that is mapped to the discrete pixel.

In an example of any of the preceding example aspects of the method, the method includes computing a respective visibility probability for each of the surface regions that indicates a probability that the surface region is visible to one or more light sources, and assembling the data structure comprises indicating in the data structure the respective computed visibility probability for each surface region.

In an example of any of the preceding example aspects of the method, computing the respective light tensor for each surface region comprises path tracing multiple light ray samples for each bin in the bin structure for the surface region and averaging results for the path tracing.

In an example of any of the preceding example aspects of the method, the multiple light ray samples for each bin represent both direct and indirect lighting of the respective surface region, and a multiple bounce limit within the 3D scene space is defined for the light ray samples.

In an example of any of the preceding example aspects of the method, the (3D) scene model conforms to a graphics language transmission format (glTF) .

In an example of any of the preceding example aspects of the method, the respective color measurement for each bin for each surface region is represented as an RGB color value.

In an example of any of the preceding example aspects of the method, the method further includes sending the 3D scene model with the data structure to a rendering device.

In an example of any of the preceding example aspects of the method, the method includes, at the rendering device: obtaining the 3D scene model with the data structure; and rendering a scene image for the 3D scene model based on an input view direction, wherein pixel colors in the rendered scene image are determined based on the color measurements included in the data structure.

According to a further example aspect is a method performed at a rendering device for rendering a scene image corresponding to a view direction for a three-dimensional (3D) scene represented in a 3D scene model. The method includes obtaining a 3D scene model that includes a data structure that indicates a respective local reference frame and light tensor for each of a plurality of surface regions that are included in the 3D scene, wherein the light tensor for each surface region indicates a respective color measurement for a respective light ray intersection direction for the surface region; and rendering a scene image for the 3D scene model based on an input view direction, wherein rendered colors of surface regions represented in the rendered scene image are determined based on the local reference frame and color measurements included in the light tensors for the respective surface regions.

In some example aspects, the present disclosure describes a system comprising one or more processors and one or more non-transitory memories that store executable instructions for the one or more processors, wherein the executable instructions, when executed by the one or more processors, configure the system to perform the method of any one of the preceding example aspects.

In some example aspects, the present disclosure describes a computer readable medium storing computer executable instructions that when executed by one or more processors of a computer system, configure the computer system to perform the method of any one of the above example aspects.

In some example aspects, the present disclosure describes computer program that when executed by one or more processors of a computer system, configure the computer system to perform the method of any one of any one of the above example aspects.

BRIEF DESCRIPTION OF THE DRAWINGS

Reference will now be made, by way of example, to the accompanying drawings which show example embodiments of the present disclosure, and in which:

FIG. 1 is a schematic diagram of a system that can be used for physically based rendering (PBR) in accordance with examples aspects of the present disclosure;

FIG. 2A schematically illustrates an example of a 3D scene model and corresponding 3D scene space;

FIG. 2B is a high level representation of a graphics language transmission format (glTF) that can be used for a 3D scene model;

FIG. 3 is a block diagram illustrating example of operations performed by a server of the system of FIG. 1 to generate an enhanced 3D scene model;

FIG. 4 is a schematic representation of actions performed by a scene editing operation of the server of FIG. 2;

FIG. 5 illustrates selected parameter definitions for an ‘Xatlas’ library that can be used by the server of FIG. 2;

FIG. 6 represents a coded example of defining a ‘tinyGLTF’ image that is used by the example server of FIG. 1;

FIG. 7 shows a simulated illumination of a point cluster x _p by a light source.

FIG. 8 illustrates a pseudocode representation of light capture operation performed by the server of FIG. 2;

FIG. 9A is an perspective view of a light capture reference frame structure;

FIG. 9B illustrates plan views of two examples of light capture reference frame structures in accordance with examples of the present disclosure;

FIG. 10 is a light capture data structure generated by the example server of FIG. 1;

FIG. 11 is a flowchart illustrating an example method for physically based rendering a scene image that is implemented by an example client device of FIG. 1;

FIG. 12 shows an example of a view ray shot into a 3D scene space;

FIG. 13 is a flowchart illustrating an example method for a rendering operation of the method of FIG. 11;

FIG. 14 is an example of a method that can be performed at a server of the system of FIG. 1; and

FIG. 15 is a block diagram illustrating an example hardware structure of a computing system that may be used for implementing methods to process a 3D scene model, in accordance with examples of the present disclosure.

Similar reference numerals may be used in different figures to denote similar components.

DETAILED DESCRIPTION

The following describes example technical solutions of this disclosure with reference to accompanying drawings.

The present disclosure discloses methods and systems for rendering high quality, realistic images of a 3D scene without consuming extensive computational and storage resources at the device that the images are rendered at. In this regard, a server/client based solution is disclosed wherein a computationally powerful system (e.g., a server) is utilized to perform computationally demanding PBR tasks, such as path tracing. The server generates 3D scene data that can enable a computationally constrained device (e.g., a client device such as a mobile device) to render a high-quality, realistic, scene image for the 3d scene data.

FIG. 1 illustrates a schematic diagram of a rendering system 100, including a first computer system (e.g., server 102) and a second computer system (e.g., client device 104) , to render a realistic image of a 3D scene, in accordance with example aspects of the disclosure. As shown in FIG. 1, the server 102 receives 3D scene model 106 as input. As will be explained in greater detail below, server 102 processes the input 3D scene model 106 using one or more computationally intensive algorithms to generate an enhanced 3D scene model 108 that includes edited 3D scene model 106’ with an appended light texture component L _map. The client device 104 receives, as inputs, the enhanced 3D scene model 108 and a view direction 112. Client device 104 renders a scene image 114 representation of a 3D scene from the enhanced 3D scene model 108 that corresponds to the view direction 112. Additional images can be rendered using the enhanced 3D scene model 108 for additional view directions 112, enabling an interactive user experience at client device 104. As will be explained in greater detail below, the data that has been added to enhanced 3D scene model 108 through computationally intensive algorithms at the server 102 can enable the client device 104 to render scene images 114 for different view directions 112 in a computationally and time efficient manner.

In the example of FIG. 1, the server 102 is a computationally capable system that provides resources, data, services, or programs to other devices, such as one or more client devices 104, over a digital communications network 105 that may include one or both of wired and wireless networks. The client device 104 can be any device that is able to realistically render images of 3D scenes with colors. The client device 104 may include a laptop, a desktop personal computer (PC) , tablet, mobile station (MS) , mobile terminal, smartphone, mobile telephone, or other display enabled mobile device.

With reference to FIG. 2A, in an example embodiment, the input 3D scene model 106 describes a 3D scene space 402. Input 3D scene model 106 is a set of data structures that collectively encode data that defines a geometry and appearance of content represented in the 3D scene space 402. Locations (also referred to as points “pt” ) within the 3D scene space can be defined by a set of point coordinates that reference a three dimensional spatial coordinate system, for example an orthogonal X, Y, Z coordinate system. The geometry of objects within the 3D scene space are represented as a collection of basic geometric units, referred to as primitives 406. A primitive 406 can, for example, be a point, a line, or a polygon such as a triangle. A primitive 406 can define a face area of a geometric object 408 that is included in a scene. The geometry of each primitive is defined by one or more points, with each point having a respective set of (x, y, z) coordinates. In the case of a simple polygon shaped primitive, the geometry of the primitive can be defined by the set of points that form the vertices of the polygon. For example, the three points pt (x ₁, y ₁, z ₁) , pt (x ₂, y ₂, z ₂) and pt (x ₃, y ₃, z ₃) can define a respective triangle primitive 406.

Primitives 406 can share common points and edges. Each primitive 406 can have an associated material that indicates visual appearance properties of the primitive, including for example texture properties for a surface of the primitive.

In an illustrative example, 3D scene model 106 conforms to the graphic label transmission format (glTF ^TM) as maintained by The Khronos Group. The glTF specifies a data format for the efficient transmission and loading of 3D scenes by computer applications. In example embodiments, the input 3D scene model 106 (also known as a glTF asset) is represented by a set of files, including: (i) a JavaScript Object Notation (JSON) file (e.g., . gtlf file) containing a full scene description: node hierarchy, materials, cameras, as well as descriptor information for meshes, animations, and other constructs; (ii) Binary files (e.g., . bin file ) containing binary resources that can include geometry, animation, and other buffer-based data; and (iii) Image files (e.g., jpg, . png files) containing image resources such as texture maps. In some examples, binary and image resources may be embedded in the . gtlf file.

To provide context, FIG. 2B is a block diagram overview of the top-level components of a JSON file 230 for a glTF based 3D scene model 106. The JSON file 230 includes a description of the scene structure itself, which is given by a hierarchy of node components 232 that define a scene graph. Scene content (for example one or geometric scene objects) is defined using mesh components 233 that are attached to the node components 232. Material components 236 (together with the components that they reference) define the appearance of scene content to be rendered, including the surface material of such content. Animation components 240 describe how scene content is transformed (e.g., rotated or translated) over time, and skin components 235 define how the geometry of the scene content is deformed based on a skeleton pose. Camera components 234 describe the view configuration for a scene.

The mesh components 233 are stored in arrays in the JSON file and can be accessed using the index of the respective component in the array. These indices are also used to define the relationships between the components.

In the overview of FIG. 2B, a scene component 231 is an entry point for a description of a scene that is represented by 3D scene model 106. Scene component 231 refers to one or more node components 232 that collectively define the screen graph. A node component 232 corresponds to a respective node in the screen graph hierarchy. A node component 232 can contain a transformation (e.g., rotation or translation) , and it may refer to further (child) nodes. Additionally, it may refer to mesh components 233 or camera components 234 that are attached to the node component 232, or to a skin component 235 that describes a mesh deformation.

A mesh component 233 can describe a geometry and appearance of scene content, including a structure of one or more objects that appear in the scene, and can refer to one or more accessor components 237 and material components 236. An accessor component 273 is used for accessing the actual geometry data for the scene content, and functions as an abstract source of arbitrary data. It is used by the mesh component 233, skin component 235, and animation component 240, and provides geometry data, skinning parameters and time-dependent animation values required to render a scene image. Accessor component 273 refers to one or more bufferView components 239, which refers to one or more buffers 243. A buffer 243 contains actual raw binary data for the geometry of 3D objects, animations, and skinning. The bufferview component 239 add structural information to the data contained in the buffer 243. In the example of FIG. 2B, accessor components 237, bufferview components 239, and buffers 243, (hereafter referred to collectively as geometry data 252) cooperatively define data references and data layout descriptions that provide the geometry of the scene content that is represented by mesh component 233.

A material component 236 contains parameters that define the appearance of the scene content being rendered. It can refer to texture components 238 that define surface appearances of scene objects. Each texture component 238 is defined by a sampler component 241 and an image component 242. The sampler component 241 defines how a texture map that is specified in image component 242 should be placed on a surface of scene content when the content is rendered. The texture components 238, sampler components and image components 242, (hereafter referred to collectively as appearance data 251) cooperatively describe the surface appearance of the scene content that is the subject of mesh component 233.

In the input 3D scene model 106, primitives 406 that are defined in a mesh 404 will typically be mapped to respective appearance data (e.g., color and texture data that is present in the input 3D scene model 106) .

Referring again to FIG. 2A, a mesh 404 is shown that is a collection of primitives 406 within a 3D scene.

Referring to FIG. 3, processing of an input 3D scene model 106 by server 102 to generate a respective enhanced 3D scene model 108 will now be described in greater detail. In the illustrated example, server 102 is configured to perform a scene editing operation 302 and a light capture operation 308. In the illustrated example, input 3D scene model 106 is loaded by the server 102 using the ‘tinyglTF’ loader/saver available in the GITUB ‘tinyglTF’ C+++library (Reference 1: S. Fjuita, "Header only C++ tiny glTF library (loader/saver) ., " [Online] . Available: https: //github. com/syoyo/tinygltf. [Accessed Feb. 26 2022] . )

Scene editing operation 302 includes, as sub-operations, a light texture generation operation 304 and a geometry editing operation 306. Referring to FIG. 4, light texture generation operation 304 is configured to generate a set of light texture data L _data for the scene content represented in 3D scene model 106. Light texture data L _data is a collection of geometric data L _gm that corresponds to a blank 2D light texture map L _Map. The 2D light texture map L _Map can be a 2D array of pixels, with each pixel p indexed by a respective pair of w, h coordinates. In an example embodiment, light texture map L _Map can have a size that is set by a defined parameter, for example w=512 by h=512 pixels. Geometric data L _gm can include coordinate mapping data that maps vertex points pt and surface regions (also referred to as point clusters x _p) within the 3D scene space of the 3D scene model 106 to respective locations in the light texture map L _Map.

In example embodiments, one function of light texture generation operation 304 is to identify clusters of points pt (n ₁) , …, pt (n _p) that are close to each other in the 3D scene space 402 and group the identified close points into respective point clusters x _p . Each point cluster x _p can correspond to a respective surface region that will be assigned a set of captured gathered light data as described below. Closeness can be a function of geometrical distance. Each point cluster x _p is then mapped to a respective pixel p (i.e., a unique integer w, h coordinate pair) in the light texture map L _Map. The number of points pt included per point cluster x _p can be determined based on the volume and density of the 3D scene space 402. In some examples, the points pt that correspond to multiple primitives 406 (for example adjacent triangle primitives 406 that share an edge or vertex. ) may be included in a single point cluster x _p. In some examples, only the points pt that correspond to a single primitive 406 may be included in a single point cluster x _p. In some examples, the points pt included in a single point cluster x _p may include only a subset of the points that make up a primitive 406.

In addition to having locations that correspond to pixels indexed according to w, h integer coordinate frame, locations in light texture map L _Map can also be referenced by a continuous u, v coordinate frame. The u, v coordinate frame can overlay the w, h coordinate frame, and the respective frames can be scaled to each other based on the volume of the 3D scene space 402 represented in the 3D scene model 106. Multiple u, v coordinate values can fall within a single pixel p. A further function of light texture generation operation 304 is to map each unique point pt in 3D scene space 402 that defines a primitive (e.g., each vertex point) to a respective, unique u, v coordinate of light texture map L _Map.

Accordingly, light texture generation operation 304 generates geometric data L _gm that includes coordinate data that maps locations in the 3D scene space 402 to locations in the 2D light texture map L _Map. This coordinate mapping data can include point pt to unique u, v coordinate frame mapping. The coordinate mapping data can also include explicit or implicit indications of: multiple point pt to point cluster x _p mapping and point cluster x _p to unique pixel p mapping. In an example, these mapping functions are performed using an application such as the Xatlas function. Xatlas is available as a C++ library available on Github (Reference 2: J. Young, "xatlas, " [Online] . Available: https: //github. com/jpcy/xatlas. https: //github. com/syoyo/tinygltf. [Accessed Feb. 26 2022] ) . Xatlas processes the 3D scene model 106 to parametrizes a mesh that represents the contents of 3D scene space. The parametrized mesh is cut and projected onto a 2D texture map (also referred to as an image) such that every vertex point pt within the mesh is assigned a unique u, v entry in the light texture map L _map and close points are grouped together into respective point clusters x _p (that each correspond to a respective pixel p) .

FIG. 5 presents an example of selected Xatlas parameters 220 that can be defined to enable the Xatlas function to generate geometric data L _gm for a light texture map L _Map corresponding to 3D scene model space 106. A first parameter 222, ‘packOptions. bruteForce’ , determines quality of the 2D projection that corresponds to light texture map L _Map. In an example where both a best projection quality and small size of the 2D light texture map L _Map are desired, the first parameter 222 is set to be true. A second parameter 224, ‘packOptions. texelsPerUnit’ , is used to control unit to texel (i.e. pixel) scale. The second parameter 224 can be set to 0 in example embodiments, causing the Xatlas function to estimate and use a scale that matches a resolution that is determined by a third parameter 226, ‘packOptions. resolution’ . In this example, the third parameter 226 can be set to 512 such that pixel resolution of the 2D light texture map L _Map is set to be close to a 512×512 image.

The output of light texture generation operation 304 is a set of geometric data L _gm corresponding to the light texture map L _Map . The geometric data L _gm includes sets of coordinates that define the corresponding point pt to unique u, v coordinate frame mappings, geometric data L _gm can also define multiple point pt to point cluster x _p mappings, and point cluster x _p to unique pixel p mappings. The geometric data L _gm may include new vertices and faces beyond those defined in the original 3D Scene model 106. These new vertices and faces are added by the Xatlas function to ensure a unique u, v coordinate entry per point pt into the light texture map L _map.

Referring again to FIG. 3, geometry editing operation 306 is configured to add the geometric data L _gm and references to the light texture map L _Map into the glTF file (e.g., a JSON file) for the glTF 3D scene model 106, and generate a blank version of light texture map L _Map for inclusion in the image files that are part of glTF 3D scene model 106.

In this regard, the scene appearance data 251 (i.e., texture component 238, sampler component 241 and image component 242) of glTF 3D scene model 106 is updated to add references to the light texture map L _Map are added to the scene appearance data 251. The scene geometry data 252 of glTF 3D scene model 106 is edited to add the newly generated geometric data L _gm. In particular, the buffer 243, bufferView component 239 and accessor component 237 are updated.

An example of the generation of a blank version of light texture map L _Map is illustrated in the process 260 illustrated by code in FIG. 6, which appends a blank version of light texture map L _Map to the image files that are included in the glTF 3D scene model 106. In the illustrated example, light texture map L _Map is set to have a width (x=w) and height (y=h) , the number of components per pixel is set to 4, and the bits for each component is set to 8.

At the conclusion of light texture generation operation 304 and geometry editing operation 306, the edited glTF 3D scene model 106E, including appended blank light texture map L _Map, can be saved. In some examples, the 3D scene model that is input to server 102 can be pre-edited to include light texture data L _data, in which case scene editing operation 302 can be skipped.

Light capture operation 308 receives the edited glTF 3D scene model 106E

(including the added geometric data L _gm and blank 2D light texture map L _Map) as input and is configured to populate light texture map L _Map with light capture data. An enlarged sample of pixels taken from light texture map L _Map is graphically represented in a left side of FIG. 7. As will be explained below, each pixel p is used to store a set of captured light data for a respective point cluster x _p.

The right side of FIG. 7 graphically illustrates a ray path trace for a view ray 702 (e.g., a ray that represents the reverse direction of a light ray) from view plane 706 to a respective point cluster x _p (e.g., a surface region that corresponds to a pixel 2D light texture map L _Map) and then towards a light source 704. The path trace for view ray 702 includes several bounces, representing that the path between light source 704 and point cluster x _p includes a number of reflection points within the 3D scene 402 before the incoming light ray intersects the point cluster x _p.

The purpose of light capture operation 308 is to capture the lighting for each point cluster x _p that is included in 3D scene 402 and represented by a respective pixel p in 2D light texture map L _map. The lighting for each point cluster x _p is captured for a plurality of light directions d∈D that intersect the point cluster x _p. In example embodiments, each light direction d is defined with respect to a local reference frame for the point cluster x _p. For each point cluster x _p represented by a pixel p in light texture map L _Map, light capture operation 308 is configured to generate a respective gathered light tensor G _p, D that represents the incoming sources of light on the point cluster x _p. In some examples, the captured light represents all incoming direct and indirect light at cluster x _p for each direction d∈D. Direct light refers to light from a light source that intersects point cluster x _p without any intervening bounces (e.g., only one bounce occurs, namely at point cluster x _p, between a view plane 706 and the light source 704) . Indirect light refers to light from a light source 704 that experiences one or more bounces before intersecting point cluster x _p.

FIG. 8 shows a pseudocode representation of a process 810 that can be performed as part of light capture operation 308 for capturing light data for the gathered light tensor G _p, _D for each pixel p∈P. As noted above, each pixel p maps to respective point cluster x _p.

Step 1: As indicated at line 810, a local reference frame and bin structure is defined and stored for each point cluster x _p. The local reference frame and bin structure for a point cluster x _p remains constant through the light capture operation 308 and also for a reconstruction operation (described below) that is performed at client device 104. With reference to FIG. 9A, in one example, the local reference frame 918 is defined by defining three orthonormal vectors relative to the 3D scene space coordinate system. The orthonormal vectors of local reference frame 918 includes a normal vector n _xp that is normal to the face of target point cluster x _p. The direction d of a ray 702 intersecting point cluster x _p can be defined using a pair of spherical coordinates [θ, φ] in the local reference frame 918.

In the illustrated example, the local reference frame 918 is divided into a spherical bin structure 920 that includes a set of discrete bins b∈B that discretize all directions Ω about normal vector n _xp. Each bin b (one of which (bin bi) is shown in the right diagram of FIG. 9A) corresponds to a defined range of light directions (e.g., each bin corresponds to a respective subset of light directions d∈D) relative to vector n _xp. In example embodiments, a plurality of pre-defined bin structures are available for use with local reference frame 920. For example, FIG. 9B shows plan-view examples of spherical bin structures 920_1 and 920_2, each having different number |B| of bins b. For example bin structures 920_1, 920_2 each include |B|=8×8 bins, and |B| = 4×4 bins, respectively. Each bin b corresponds to a respective range (e.g., [θb, Φb] to [θb+θd, Φb+Φd] of directions d in spherical coordinates with respect to normal vector n _xp.

In example embodiments, the local reference frame for all of the respective point clusters x _p in a scene will use the same bin structure type (e.g., all the point clusters x _p will have a respective local reference frame with the same number of bins) .

Step 2: As indicated by lines 812 in FIG. 8, an iterative sampling routine is performed to collect gathered light samples, corresponding to each light direction d∈D, for the point cluster x _p. The gathered light samples are mapped to respective bins b. FIG. 9A illustrates a mapping operation 922 for view ray 702 wherein the ray direction -d is mapped to a bin bi that has a corner coordinate of (r, θb, Φb) . In example embodiments, the value of r is set to a constant, for example r=1, and thus does not to be specified in stored or transmitted coordinate data.

In one example, this iterative sampling routine is performed using a path tracing algorithm (represented in FIG. 8 as “pathTrace (d, B) ” ) . The path tracing algorithm loops through all directions d∈D to capture samples of gathered direct and indirect light from all sources for each direction d (where each direction d can be mapped to a respective bin b of the bin structure 920 for the local reference frame 918) . For example, for each direction d, the path tracer algorithm pathTrace (d, B) can be called with a negative direction –d (e.g, a view direction) and a maximum number of light bounces B as input parameters. The output generated by the path tracer algorithm pathTrace (d, B) is the gathered light for direction d, which can be mapped to a respective bin b. By way of example, the gathered light from a direction d can be represented using a known color coding system, for example the red-green-blue (RGB) color coding system wherein an RGB color value is specified with respective r, g, b (red, green, blue) values. Each parameter (red, green, and blue) defines the intensity of the color of the received light as an integer value between 0 and 255.

The process is repeated to acquire S gathered light samples, which can include multiple captured light samples (e.g., RGB values) for each bin b. The gathered light samples for each bin b are averaged to provide a final respective gathered light measurement G _p, d (e.g., an RGB value) for the bin 940. The gathered light measurements G _p, d for all d∈D for a point cluster x _p are represented in the gathered light tensor G _p, D = {G _p, 1, …, G _p, |B|} , where |B| is the number of bins b in the bin structure 920) . Thus, gathered light tensor G _p, D includes the set of averaged r, g, b color intensity values for each bin b∈B of the local reference frame respective to point cluster x _p.

In the above example, direct and indirect light is included in the respective gathered light measurement G _p, d. However, in some examples, direct light sources (e.g., bounce only at cluster point x _p) can be omitted from the samples used to calculated the gathered light measurement G _p, d such that the gathered light measurement G _p, d corresponds to only indirect light sources.

Step 3: as indicated in line 814 “Capture Visibility” , a further parameter that is determined for each respective point cluster x _p is a visibility probability v _p. The visibility probability v _p for a point cluster x _p is a value that indicates the probability that the point cluster x _p is directly visible from scene light sources 604. Visibility probability v _p depends on types of light sources. For example, with respect to a point light source, a point cluster x _p is either visible or invisible, in which case the visibility probability v _p will have a value of 0 or 1. In the case of an area light source, a point cluster x _p may be visible, invisible, or partially visible, in which case the visibility probability v _p of the point cluster x _p cluster is between 0 and 1. In an example embodiment, in order to determine the visibility probability v _p for a point cluster x _p, a respective sample view ray is projected from point cluster x _p to each of a plurality of predefined light source locations. A visibility probability v _p is calculated by dividing the total number of times that the point cluster x _p is visible by the total number of light ray samples projected. In some examples, visibility probability v _p may be calculated only in respect of direct light sources

In summary, process 810 generates the following light capture data for each point cluster x _p (which corresponds to a respective pixel p in light texture map L _Map) : (i) a local reference frame definition that defines the local reference frame 918 relative to the coordinate system for the 3D scene space 402; (ii) a gathered light tensor G _p, _D , including respective sets of r,g, b, color values for each of the bins b∈B of the selected bin structure 920; and (iii) a visibility probability v _p for the point cluster x _p. The light capture operation 308 is configured to store the light capture data for all point clusters x _p in a light capture data structure L _capture. An example of a data structure that can be used for light capture data structure L _capture is shown in FIG. 10.

In example embodiments, all of the light capture data values that are computed by the process of FIG. 8 are either positive or can be made positive without losing accuracy. This property can be exploited to minimize a size of light capture data structure L _capture as only positive values need to be represented. For example: the spherical coordinates [θ, φ] for the local reference frame each can fall within the range of [0, 2π] ; gathered light tensor G _p, D comprises a set of r, g, b color values within a known range of [0, 255] ; and visibility probability v _p is a probability value and thus inherently has a value in the range of [0, 1] . As the minimum and maximum of the three types of light capture data values are known, the values can each be mapped to predefined ranges while maintaining accuracy up to a defined level. In example embodiments, the [0, 2π] range for each of spherical coordinates [θ, φ] is mapped to a [0, 1] range, and the [0, 255] range for the r, g, b, color values for gathered light G _p, D can be scaled to [0, 1] , using floating point values with an accuracy of 1/255, enabling each data variable to be stored as a single byte.

In this regard, as shown in FIG. 10, light capture data structure L _capture includes a header 450 that comprises first, second, third, and fourth descriptors 452-458 that provide information about the format of the remainder of the light capture data structure L _capture. In one example: (i) descriptor 452 is a byte in length and be used to indicate the number |B| of bins b included in the selected bin structure 920 (which corresponds to the number of gathered light measurements G _p, d included in each gathered light tensor G _p, _D) (ii) descriptor 454 is a byte in length and indicates a number n _var of variables that are used, in addition to gathered color light measurements, for each point cluster x _p (for example, n _var can be used to indicate that the structure 930 also includes two additional variables, namely a local reference frame definition and the visibility probability v _p, for each point cluster) ; and (iii)

descriptors

456 and 458 respectively store the values of w and h and are each 2 bytes in length (where (w X h) is the total number (n) of point clusters x _p that are represented in light texture component L) .

Header 450 is followed by n = (w X h) pixel data sections 459 (1) to 459 (n) (with 459 (p) referring to a generic pixel section) . Each pixel section 459 corresponds to a respective pixel p and includes the light capture data collected by light capture operation 308 for a respective point cluster x _p. In particular, each pixel section 459 (i) includes a color data field 460 for gathered light tensor G _p, D = {G _p, 1, …, G _p, |B|} . In the illustrated example, color data field 460 is (|B|X 3) bytes long, with three bytes used for the point cluster x _p specific gathered light G _p=i, b values for each bin b. One byte is used for each of the r, g, b color values, respectively.

Each pixel section 459 (i) also includes a local reference frame section 462 that can be, in an example embodiment, 4 bytes in length for including a definition of the local frame reference. For example 2 bytes can be used for storing coordinates for the normal vector n _xp and two bytes to store coordinates for one of the two reference frame coordinate vectors that are orthogonal to it (the third orthonormal vector can be computed during a future rendering task based on the provided vector data for the other two orthogonal vectors) .

Each pixel section 459 (i) also includes a visibility probability section 464 that can be, in an example embodiment, one byte in length, for including the visibility probability v _p computed for the point cluster x _p.

The light capture data structure L _capture contains the light texture data that is used to populate the light texture map L _Map. In particular, light capture data structure L _capture is converted by server 102 into a portable network graphics (PNG) format that is used to populate the light texture map L _Map. Although other transmission formats can be used for light capture data structure L _capture, conversion into a . png format allows light texture map L _Map to take advantage of lossless compression and is well-suited for storing color data that covers area will small color variation.

Referring again to FIG. 3, the server 102 outputs the edited glTF 3D scene model 106E with appended light texture map L _Map as populated by light capture operation 308, to provide enhanced 3D scene model 108. As will be explained below, the data that has been added to the enhanced 3D scene model 108 relative to the input 3D scene model 106 can enable scene rendering tasks to be performed using relatively fewer computational resources that would be required for the same tasks using only the input 3D scene model 106. Accordingly, enhanced 3D scene model 108 can be used to enable realistic and fast scene image rendering by support devices with lower computational resources, such as client device 104.

Client device 104 processing of an enhanced 3D scene model 108 to render a physically based realistic scene image 114 will now be explained in greater detail with reference to FIG. 11, according to example aspects of the disclosure. As indicated at block 502, the enhanced 3D scene model 108, including edited 3D scene model 106E and PNG format light texture map L _Map, is obtained as input by client device 104 through a communication network or other medium. The edited 3D scene model 106E is loaded using a . gltf loader. The PNG format light texture map L _Map is decompressed to provide a recovered version of light capture data structure L _capture.

As indicated at block 504, the light capture data structure L _capture can then be processed to recover captured light data. For example, the light capture data structure L _capture can be parsed to extract the header 450 parameters n _D, n _var, w and h. The light capture data structure L _capture can be further parsed to recover, for each of the n point clusters x _p : a respective gathered light tensor G _p, D = {G _p, 1, …, G _p, |B|} , the two vectors that define the respective local fame reference data for the point cluster x _p, and the respective visibility probability v _P for the point cluster x _p. The color values for each of the gathered light tensors can be scaled back up from [0, 1] to [0, 255] , and similarly, any scaling to [0, 1] performed in respect of the coordinate frame reference values can also be reversed.

As indicated at block 506, the light capture data structure L _capture and loaded edited glTF edited 3D scene model 106E can be used by client device 104 to render a globally illuminated scene image 114 that corresponds to a respective view direction 112. One or more view directions 112 can, for example, be provided through successive user interactions with an input interface of the client device 104, thereby enabling an interactive viewing experience of successive images from different view directions of the 3D scene space.

With reference to FIG. 12, in order to construct and render a scene image 114, client device 104 can apply a version of a light path tracer algorithm. The light tracer algorithm simulates a respective reverse direction (-d) light ray (i.e., a view ray) 1206 into the 3D scene space 402 shot through each of the pixels p _v of an image plane 1204. The image plane 1204, which can correspond to rendered scene image 114, is positioned relative to the 3D scene space 402 in a location that corresponds to the input view direction 112. The image plane 1204 is a w _r by h _r matrix of pixels p _v.

Actions that can be performed by client device 104 as part of rendering block 506 to render scene image 114 for a specified view direction 112 are illustrated in FIG. 13, according to an example embodiment. In example embodiments, a light path tracer algorithm is applied to simulate one or more respective view rays 1206 for each image plane pixel p _v of the view plane 1204. The following set of actions is performed for each view ray 1206:

Block 5062: for each view ray 1206, determine the x, y, z coordinates (i.e., a point hit location x) in the 3D scene space 402 for the point at which where the view ray 1206 first interacts with a surface. Based on the point hit location x, fetch the corresponding surface material that is specified for the point hit location x in the appearance data 251 of enhanced 3D scene model 108. In examples, the surface material will be specified as part of the appearance data 251 that was included in the input 3D scene model 106, and may for example include one or more of a color or a texture or a combination thereof. Based on the angle of the view ray 1206 and the properties of the fetched surface material, a direction γ of the reflected view ray 1206R is computed.

Block 5064: Based on the point hit coordinates, and the geometric data L _gm included in the enhanced glTF 3D scene model 108, the point hit location x is mapped to a respective point cluster x _p represented in the light capture data structure L _capture (which corresponds to a respective pixel of the light texture map L _map) .

Block 5066: Obtain the local reference frame definition data for the point cluster x _p from the light capture data structure L _capture. For example, this can include information that defines two of the three orthogonal vectors that define the respective local fame reference data for the point cluster x _p. The third orthogonal vector for the local reference frame can be computed using a cross product between the two known orthogonal vectors.

Block 5068: Map the direction γ of the reflected view ray 1206R to a respective gathered light measurement G _p, _d (i.e., a respective bin b) within the gathered light tensor G _p, D = {G _p, 1, …, G _p, |B|} .

Block 5070: Calculate a final rendering color value for the image plane pixel p _v based on: gathered light measurement G _p, d (which is a set of r, g, b color values in the illustrated example) for the point cluster x _p; the visibility probability v _p for the point cluster x _p; and the material property extracted from the edited glTF 3D scene model 106E. For a hit point ‘x’ , the visibility probability is used to attenuate the value of incoming direct light towards ‘x’ . If ‘x’ is completely visible, the visibility probability will be 1, and therefore the direct light value arriving at ‘x’ won’t be changed. However, if ‘x’ is partially visible, or completely invisible, the visibility probability will be less than 1 and attenuate the amount of direct light arriving at ‘x’ . The fetched indirect light values along with the visibility, material, and direct light values comprise the components needed to solve an approximation to the rendering equation in order to compute the final color of the pixel. The final rendering color value computed for the image plane pixel p _v is the color value for a corresponding pixel in rendered scene image 114.

As indicated at block 5072, the process is repeated for all image plane pixels p _v to generate rendered scene image 114 for view direction 112.

By way of overview, FIG. 14 illustrates an example of a method performed at the server 102 according to an example embodiment. In the example of FIG. 14, the server 102 processes 3D scene model 106E that defines geometry and appearance of one or more objects in 3D scene space 104. As indicated at Block 1402, for each surface region (e.g., each point cluster x _p) a plurality of surface regions that collectively represent a surface of the one or more objects, the server 102 defines a respective local reference frame 918 and a bin structure 920, the bin structure 920 discretizing the local reference frame 918 into a set of bins b∈B, each bin corresponding to a respective range of light directions that intersect the surface region. As indicated at Block 1404, for each surface region, the server 102 computes a respective light tensor G _p, D = {G _p, 1, …, G _p, |B|} comprising a respective color measurement G _p, _b for each bin b of the bin structure 920 for the surface region, the respective color measurement for each bin being based on a path trace of one or more light ray samples that fall within the respective range of light directions corresponding to the bin. As indicated at Block 1406, the server 102 assembles a data structure (e.g., light capture data structure L _capture or light texture map L _Map) . As indicated at Block 1408, the server 102 stores the data structure.

It will be appreciated that in at least some scenarios the systems and methods described above can shift computationally demanding operations related to path tracing to calculate colors for incoming light directions to a computationally capable server, such that a client device can render a photorealistic image without excessive computational resources costs. The client device obtains pre-computed parameters that are stored in a data structure when the physically realistic rendering is performed, and thus calculation of a color of bounces for a cluster is avoided at the client device. Thus, the physically realistic rending may be increased at a computationally constrained client device.

FIG. 15 is a block diagram illustrating an example hardware structure of a computing system 600 that is suitable for implementing embodiments described herein, such as instances of the server 102 or the client device 104 in the rendering system 100. Examples of the present disclosure may be implemented in other computing systems, which may include components different from those discussed below.

Although FIG. 15 shows a single instance of each component, there may be multiple instances of each component in the computing system 600. Further, although the computing system 600 is illustrated as a single block, the computing system 600 may be a single physical machine or device (e.g., implemented as a single computing device, such as a single workstation, single end user device, single server, etc. ) , or may comprise a plurality of physical machines or devices (e.g., implemented as a cluster of servers or a cluster of client devices) . For example, the computing system 600 may represent a group of servers or cloud computing platform using the first tracing algorithm to calculate the one or more parameters (e.g., a calculated color, a visibility probability, and a local frame) of captured incoming lights from a plurality of directions for each cluster in an edited 3D scene.

The computing system 600 includes one or more processors 602, such as a central processing unit, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC) , a field-programmable gate array (FPGA) , a dedicated logic circuitry, a dedicated artificial intelligence processor unit, a graphics processing unit (GPU) , or combinations thereof.

The computing system 600 may include an input/output (I/O) interface 604 to enable interaction with the system through I/O devices. The computing system 600 may include a communications interface 614 for wired or wireless communication with other computing systems via one or more intermediate networks. The communications interface 614 may include wired link interfaces (e.g., Ethernet cable) and/or wireless link interfaces (e.g., one or more antennas) for intra-network and/or inter-network communications.

The computing system 600 may include one or more memories 616 (collectively referred to as “memory 616” ) , which may include a volatile and non-volatile memories. Non-transitory memory 616 may store instructions 617 for execution by the one or more processors 602, such as to carry out examples described in the present disclosure. For example, the memory 616 may store instructions for implementing any of the methods disclosed herein. The memory 616 may include other software instructions, such as for implementing an operating system (OS) and other applications/functions.

The memory 616 may also store other data 618, information, rules, policies, and machine-executable instructions described herein.

In some examples, instructions for performing the methods described herein may be stored on non-transitory computer readable media.

It should be noted that noted that, although the present disclosure applies static scenes with static light sources, this is not intended to be limiting. In some examples, dynamic scenes and dynamic light sources may be applied in other suitable scenarios.

The present disclosure provides certain example algorithms and calculations for implementing examples of the disclosed methods and systems. However, the present disclosure is not bound by any particular algorithm or calculation. Although the present disclosure describes methods and processes with steps in a certain order, one or more steps of the methods and processes may be omitted or altered as appropriate. One or more steps may take place in an order other than that in which they are described, as appropriate.

A person of ordinary skill in the art may be aware that, in combination with the examples described in the embodiments disclosed in this disclosure, units and algorithm steps may be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on particular applications and design constraint conditions of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this disclosure.

It may be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, refer to a corresponding process in the foregoing method embodiments, and details are not described herein again.

It should be understood that the disclosed systems and methods may be implemented in other manners. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual requirements to achieve the objectives of the solutions of the embodiments. In addition, functional units in the embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.

When the functions are implemented in the form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this disclosure essentially, or the part contributing to the prior art, or some of the technical solutions may be implemented in a form of a software product. The software product is stored in a storage medium, and includes serveral instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a universal serial bus (USB) flash drive, a removable hard disk, a read-only memory (ROM) , a random access memory (RAM) , a magnetic disk, or an optical disc, among others.

As used herein, statements that a second item (e.g., a signal, value, scalar, vector, matrix, calculation, or bit sequence) is “based on” a first item can mean that characteristics of the second item are affected or determined at least in part by characteristics of the first item. The first item can be considered an input to an operation or calculation, or a series of operations or calculations that produces the second item as an output that is not independent from the first item. Where possible, any terms expressed in the singular form herein are meant to also include the plural form and vice versa, unless explicitly stated otherwise. In the present disclosure, use of the term “a, ” “an” , or “the” is intended to include the plural forms as well, unless the context clearly indicates otherwise. Also, the term “includes, ” “including, ” “comprises, ” “comprising, ” “have, ” or “having” when used in this disclosure specifies the presence of the stated elements, but do not preclude the presence or addition of other elements. As used here, the term “tensor” can mean a data structure that includes a set of discrete values where the order of the values in the data structure has meaning. Vectors and matrices are examples of tensors.

The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this disclosure. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this disclosure shall fall within the protection scope of this disclosure.

Claims

A method for processing a three dimensional (3D) scene model that defines geometry and appearance of one or more objects in a 3D scene space, the method comprising:

at a first computing system:

defining, for each surface region of a plurality of surface regions that collectively represent a surface of the one or more objects, a respective local reference frame and a bin structure, the bin structure discretizing the local reference frame into a set of bins, each bin corresponding to a respective range of light directions that intersect the surface region;

computing, for each surface region, a respective light tensor comprising a respective color measurement for each bin of the bin structure for the surface region, the respective color measurement for each bin being based on a path trace of one or more light ray samples that fall within the respective range of light directions corresponding to the bin;

assembling, a data structure for the 3D scene model that indicates the respective local reference frames, bin structures, and light tensors for the surface regions; and

storing the data structure.
The method of claim 1 comprising, at the first computing system:

parametrizing a mesh for the 3D scene space, the mesh comprising a plurality of points that collectively represent the surface of the one or more objects, each point having a respective unique point location defined by a respective set of 3D coordinates in a 3D spatial coordinate system for the 3D scene space;

grouping the plurality of points into a plurality of point clusters, each point cluster forming a respective one of the plurality of surface regions; and

mapping each surface region to a respective discrete pixel of a two-dimensional (2D) image map.
The method of claim 2 wherein the plurality of points of the mesh includes vertex points that define respective corners of polygonal primitives that form respective surfaces areas,

wherein parametrizing the mesh comprises mapping each vertex point to a pair of continuous coordinate variables indicating a location within the 2D map such that each polygonal primitive has a unique location within the 2D map; and

the method comprises, at the first computing system, storing data indicating the mapping for each vertex point as part of the 3D scene model.
The method of claim 2 or 3 wherein assembling the data structure comprises populating the discrete pixels of the 2D map such that each discrete pixel includes data indicating the local reference frame and the respective light tensor for the surface region that is mapped to the discrete pixel.
The method of any one of claims 1 to 4 comprising computing a respective visibility probability for each of the surface regions that indicates a probability that the surface region is visible to one or more light sources, and assembling the data structure comprises indicating in the data structure the respective computed visibility probability for each surface region.
The method of anyone of claims 1 to 5 wherein computing the respective light tensor for each surface region comprises path tracing multiple light ray samples for each bin in the bin structure for the surface region and averaging results for the path tracing.
The method of claim 6 wherein the multiple light ray samples for each bin represent both direct and indirect lighting of the respective surface region, and a multiple bounce limit within the 3D scene space is defined for the light ray samples.
The method of any one of claims 1 to 7 wherein the (3D) scene model conforms to a graphics language transmission format (glTF) .
The method of any one of claims 1 to 8 wherein the respective color measurement for each bin for each surface region is represented as an RGB color value.
The method of any one of claims 1 to 9 comprising:

sending the 3D scene model with the data structure to a rendering device.
The method of claim 10 comprising:

at the rendering device:

obtaining the 3D scene model with the data structure;

rendering a scene image for the 3D scene model based on an input view direction, wherein pixel colors in the rendered scene image are determined based on the color measurements included in the data structure.
A method performed at a rendering device for rendering a scene image corresponding to a view direction for a three-dimensional (3D) scene represented in a 3D scene model comprising:

obtaining a 3D scene model that includes a data structure that indicates a respective local reference frame and light tensor for each of a plurality of surface regions that are included in the 3D scene, wherein the light tensor for each surface region indicates a respective color measurement for a respective light ray intersection direction for the surface region; and

rendering a scene image for the 3D scene model based on an input view direction, wherein rendered colors of surface regions represented in the rendered scene image are determined based on the local reference frame and color measurements included in the light tensors for the respective surface regions.
A system comprising one or more processors and one or more non-transitory memories that store executable instructions for the one or more processors, wherein the executable instructions, when executed by the one or more processors, configure the system to perform the method of any one of claims 1 to 12.
A computer readable medium storing computer executable instructions that when executed by one or more processors of a computer system, configure the computer system to perform the method of any one of claims 1 to 12.
A computer program that when executed by one or more processors of a computer system, configure the computer system to perform the method of any one of claims 1 to 12.