US20130070049A1 - System and method for converting two dimensional to three dimensional video - Google Patents
System and method for converting two dimensional to three dimensional video
- Publication number
- US20130070049A1 (application US13/233,341; US201113233341A)
- Authority
- US
- United States
- Prior art keywords
- depth
- depth map
- dimensional image
- values
- objects
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/144—Movement detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/261—Image signal generators with monoscopic-to-stereoscopic image conversion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/261—Image signal generators with monoscopic-to-stereoscopic image conversion
- H04N13/264—Image signal generators with monoscopic-to-stereoscopic image conversion using the relative movement of objects in two video frames or fields
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/271—Image signal generators wherein the generated image signals comprise depth maps or disparity maps
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
Description
- This disclosure relates to systems and methods for converting two dimensional video to three dimensional video.
- In two dimensional to three dimensional video conversion, depth information is extracted from input pictures of the two dimensional video in order to generate a pair of stereo output pictures. Extracting depth information is a key step in the two dimensional to three dimensional conversion process.
- Preconstructed three dimensional video, such as the video displayed at properly equipped movie theaters, considers the three dimensional aspects of the video from the start. Essentially, three dimensional processing technicians can properly isolate objects in a scene and place these objects in a depth map along the z-axis. Because these are preconstructed videos, the technicians have the benefit of knowing when objects will enter into the video both before and after the present image of the video. This knowledge allows these technicians to properly place these objects in the depth map.
- However, in many instances, converting two dimensional video to three dimensional video, such as converting broadcast video into three dimensional video, is much more difficult, particularly when it is done in real time. Unlike preconstructed three dimensional movies, there are no technicians that have preprocessed the video with the knowledge of which objects will enter the scene or leave the scene. Accordingly, it is very difficult to properly identify objects and place these objects in a depth map.
- The system may be better understood with reference to the following drawings and description.
- FIG. 1 illustrates a system for converting two dimensional video to three dimensional video;
- FIG. 2 illustrates logic flow for a method for converting two dimensional images to three dimensional images;
- FIG. 3 illustrates the system of FIG. 2 for converting two dimensional images to three dimensional images in greater detail;
- FIG. 4 illustrates a system of FIG. 3 for converting two dimensional images to three dimensional images in greater detail;
- FIGS. 5A and 5B illustrate side views of depth maps;
- FIGS. 6A, 6B, and 6C illustrate a row accumulated depth map, a column accumulated depth map, and a baseline depth map, respectively;
- FIGS. 7A and 7B illustrate a spatial neighborhood in a global depth surface;
- FIG. 8 illustrates a coordinate system in a space relative to an image;
- FIG. 9 illustrates logic flow for the motion analysis module of FIG. 3;
- FIG. 10 illustrates a method for converting a two dimensional image to a three dimensional image using sharpness information; and
- FIG. 11 illustrates a method for adjusting the depth value of a specific object over a period of time as the size of the specific object changes. - Referring to
FIG. 1, a system 100 for determining the location of images in a depth map is shown. The system 100 includes a three dimensional display 102 having a viewing area 104, a processor 106 in communication with the three dimensional display 102, and a storage device 108. As one example, the three dimensional display 102 can be a polarized three dimensional display. A polarized three dimensional display is configured to project two images superimposed on the display area 104 of the three dimensional display 102 at the same time. Generally, two images are projected superimposed onto the display area 104 of the three dimensional display 102 through orthogonal polarizing filters. For example, pixels forming a left view image can be linearly polarized at 45 degrees and pixels forming a right view image can be linearly polarized at 135 degrees. In order for a viewer to see the left view image with their left eye and the right view image with their right eye, the viewer can wear a pair of passive glasses 110 with the left lens polarized in the same way as the left view image pixels on the display 102 and the right lens polarized in the same way as the right view image pixels on the display 102. By so doing, the viewer can see both images simultaneously, with the left eye seeing the left view image and the right eye seeing the right view image. - The
processor 106 can include an instruction set 112 having instructions that are executed by an execution unit 114 of the processor 106. It should be understood that the processor 106 can be a single processor, can be multiple processors located within the same package, or can be multiple processors that are in communication with each other and distributed on one or more circuit boards. - Alternatively, the
instruction set 112 can be stored in the memory device 108, and can be read and executed by the processor 106 from the memory device 108. The memory device 108 can be any suitable memory device capable of storing digital information. For example, the memory device 108 can be a solid state memory device, a magnetic memory device, such as a hard disk, or an optical memory device. Further, the memory device 108 can be incorporated into the processor 106 or can be located separately from the processor 106. Further, the memory device 108 can be in direct physical and electrical communication with the processor 106, but can also be remote from the processor 106 and can communicate with the processor 106 through a wired or wireless communications network. -
FIG. 2 shows logic for converting two dimensional images to three dimensional images. Each of the modules or functional steps can be implemented, for example, as instructions executed by the processor 106 of FIG. 1, in hardware only, or as a combination of hardware and software. - A two
dimensional input picture 202 can be provided to a depth map generation module 204. The input picture 202 can be a two dimensional image, such as a frame of video of any resolution (e.g., output from a DVD or Blu-Ray™ player), or can be any other type of two dimensional image. The depth map generation module 204 extracts depth information in the spatial domain. As examples, the depth map generation module 204 can analyze two dimensional image information including, for example, luma, chroma, and edge information to estimate the depth level for certain regions or objects in the two dimensional images. - The depth map can then be provided from the depth
map generation module 204 to a depth to disparity conversion module 206. The depth to disparity conversion module 206 calculates the distance that a particular pixel will need to be shifted in a left view output picture and/or the amount that a particular pixel will need to be shifted in a right view of the picture for a viewer to perceive the intended depth level in a stereo view. The depth to disparity conversion module 206 determines the shifts for any desired pixels in the image and builds a disparity map that identifies the shift applicable to each pixel. The depth to disparity conversion module 206 provides the disparity map to the stereo view rendering module 208. - The stereo
view rendering module 208 renders the three dimensional image. To that end, the stereo view rendering module 208 generates the left view picture 210 and the right view picture 212 with the appropriate pixel shift values applied so that the viewer can perceive each pixel at the appropriate depth level. Accordingly, the stereo view rendering module 208 can provide a left view output picture 210 and a right view output picture 212 to the display area 104 of a display 102 of FIG. 1. -
FIG. 3 shows a more detailed example of a system 300 for generating a three dimensional image based on a two dimensional input picture. The input picture 302 can be provided to a variety of different image processing modules. Examples of the image processing modules include an edge analysis module 304, a scene content analysis module 306, a sharpness analysis module 308, and a brightness analysis module 310. Additional, fewer, or different modules can be present to preprocess the two dimensional input picture. - There are many different methods for edge detection that the
edge analysis module 304 can implement. Examples include zero crossing based edge detection and search-based edge detection. Zero crossing based edge detection methods identify zero crossings in a second order derivative expression computed from the image in order to determine the location of the edges; the zero crossings may be taken, for example, from the luma values, from the Laplacian, or from a non-linear differential expression. Prior to applying zero crossing based edge detection methods, a pre-processing step of smoothing may be applied to the image. In search based methods, edges may be detected by computing an edge strength using a first order derivative expression such as a gradient magnitude expression. Then local directional maxima of the gradient magnitude may be identified by computing an estimate of the local orientation of the edge. The edges may then be segmented and/or categorized and labeled. The edge detection results may be provided from the edge analysis module 304 to a scene based global depth surface module 310. - In addition, the
input picture 302 can be provided to a scene content analysis module 306. The scene content analysis module 306 can analyze changes in the scene, for example, the changing of camera perspective including zoom, pan, and tilt, as well as the change of various objects or regions within the scene. For example, objects can be identified in the video scene by image processing techniques including adaptive background subtraction or other techniques. The scene content analysis data from the scene content analysis module 306 can be provided to the scene based global depth surface module 310. - In addition, the
input picture 302 can be provided to a sharpness analysis module 308. The sharpness analysis module 308 can analyze the sharpness of certain regions within the image to determine an approximate depth of that region of the image. The lack of sharpness and/or gradient of a particular region corresponds to the depth of that region in the image relative to the focal point of the optical system providing the image to the camera. - More specifically, the
sharpness analysis module 308 receives the input picture 302 and segments objects in the input picture 302 based on variations in sharpness in the input picture 302 to identify and locate specific objects in the input picture 302. Once these objects are segmented and identified, the sharpness analysis module 308 determines a depth value in the depth map for the objects based on the sharpness of the object. For example, objects that are determined to be sharper can be given a lower value in the depth map, thereby placing the object closer as perceived by the viewer. The opposite can be true as well, where a less sharp object is placed further away from the viewer and therefore given a greater value in the depth map, or the object could be placed in the middle of the depth map. Further, if multiple objects are identified and each of these objects has similar sharpness, these objects can be given the same depth value. In this way, the system 300 can control the fluctuation of depth values within the region and for objects with similar sharpness. The system 300 can implement any desired mapping between sharpness values and depth.
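To make the mapping concrete, the sketch below assigns each pre-segmented object a single depth value derived from its average sharpness, with sharper segments receiving lower (closer) values. This is only a minimal illustration of the idea, not the patented implementation: the gradient-based sharpness measure, the `labels` segmentation array, and the 0-255 depth range are assumptions.

```python
import numpy as np

def sharpness_map(luma):
    """Per-pixel sharpness proxy: larger of the horizontal/vertical differences."""
    f = luma.astype(np.float32)
    gx = np.abs(np.diff(f, axis=1, prepend=f[:, :1]))
    gy = np.abs(np.diff(f, axis=0, prepend=f[:1, :]))
    return np.maximum(gx, gy)

def depth_from_sharpness(luma, labels, max_depth=255.0):
    """Give every segment one depth value; sharper segments map to lower
    (closer) depth values, so similarly sharp objects get similar depth."""
    sharp = sharpness_map(luma)
    seg_ids = np.unique(labels)
    seg_sharp = np.array([sharp[labels == s].mean() for s in seg_ids])
    rng = seg_sharp.max() - seg_sharp.min()
    norm = (seg_sharp - seg_sharp.min()) / rng if rng > 0 else np.zeros_like(seg_sharp)
    depth = np.zeros(luma.shape, dtype=np.float32)
    for s, n in zip(seg_ids, norm):
        depth[labels == s] = (1.0 - n) * max_depth  # sharpest segment -> depth 0 (closest)
    return depth
```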
- The results of the sharpness analysis module 308 can then be provided to a depth adjustment module 312. In addition, the results of the sharpness analysis module 308 can also be provided to the scene content analysis module 306. The scene content analysis module 306 can adjust the scene content analysis based on the sharpness analysis data. - The
input picture 302 can also be provided to a brightness analysis module 314. The brightness analysis module 314 can analyze the input picture 302 to identify and segment various objects or regions within the image based on brightness characteristics of the image. The brightness analysis module 314 can provide brightness data to the depth adjustment module 312. - The scene based global
depth surface module 310 accepts the edge detection data from the edge analysis module 304 and the scene content data from the scene content analysis module 306. The scene based global depth surface module 310 generates a global depth surface model that identifies the various regions within the image based on one or more surface segments. Each surface segment can have a given depth contour calculated based on the edge detection information and the scene content data. The global depth surface model from the scene based global depth surface module 310 can be provided to the depth enhancement module 312. - As stated above, the global depth surface module can be used in conjunction with the sharpness data and the brightness data to adjust the depth of certain regions within the input picture. These regions can be placed into particular depth layers, where each depth layer, as described above, can have a layer identification, a layer depth origin, and a layer depth volume. Accordingly, particular regions and/or objects can be tracked across multiple images while providing efficient use of the available depth range. The
depth enhancement module 312 can provide depth adjustment information, including layer configuration information such as the number of layers and the volume of each layer, to a spatial temporal depth consistency and depth tuning module 316. - The spatial temporal depth consistency and
depth tuning module 316 can generate a depth map based on the depth adjustment information as well as the pixel information provided from the input picture. The spatial temporal depth consistency and depth tuning module 316 can provide a depth map to a stereo view rendering module 318. For example, after the depth map is created, it becomes available for final adjustment of depth values according to user-specified settings. This allows gain around a programmable pivot point as well as the ability to shift the global depth. The shift can be characterized by the equation y=A*(x−p)+B, where x is the current depth value, A is a gain factor, p is a pivot point, B is an offset/shift, and y is the output depth value. - A disparity map can be generated from the depth-to-
disparity conversion module 317. The stereo view rendering module 318 can utilize the disparity map to generate a left view output picture 320 and a right view output picture 322 that can be provided to a three dimensional display 102 to present a three dimensional image to a viewer. Still referring to FIG. 3, the system 300 can include a motion analysis module 324 that receives an input picture and outputs motion data to the depth adjustment module 312. -
FIG. 4 is another more detailed example of a system 400 for converting two dimensional images to three dimensional images. Moreover, FIG. 4 is a more detailed example of the system 300 of FIG. 3. In FIG. 4, the system 400 receives two dimensional pictures 402. These two dimensional pictures 402 can be separated into single images 404. Each individual image 404 can be provided to a downscaling module 406, where decimation (or downsampling) occurs. The advantages of decimating the input image 404 are: (1) the cost of all signal processing operations is reduced significantly in terms of logic size, bandwidth, memory size, etc.; (2) the algorithm is more robust to noise (random analog and digital compression noise); and (3) very smooth depth surfaces can be generated easily by maintaining only the dominant features in the input images. - An
edge analysis module 408 can receive data from the downscaling module 406. Edges are widely used spatial features in image processing. The following steps can be executed by the edge analysis module 408 to generate edges that are suitable for two dimensional to three dimensional conversion: (1) compute horizontal and vertical edges; (2) take the larger of the horizontal and vertical edges; (3) find the maximum edge magnitude in a spatial neighborhood of a (2N+1)-by-(2M+1) kernel; (4) suppress edges due to noise, letterbox, and pillarbox; and (5) normalize the edge values to occupy the full X-bit range. - In
step 1, horizontal and vertical edges are computed separately. In step 2, the maximum of the horizontal edge, Hedge, and the vertical edge, Vedge, is computed as: -
edge=MAX(Hedge, Vedge). - In step 3, each edge value edge(x,y), located at pixel coordinates (x,y), is replaced by the maximum edge magnitude in its (2N+1)-by-(2M+1) neighboring kernel.
-
edge(x,y)=MAX(edge(x+i, y+j)); where i=[−N, +N] and j=[−M, +M]
- A
- A brightness analysis module 410 can receive data from the downscaling module 406. Brightness information can be used as object/region segmentation cues. Brightness information does not necessarily provide depth information, but it does provide natural segmentation of the objects/regions in the input picture, especially in decimated pictures. The idea is to codify this brightness information in such a way that depth values are adjusted accordingly based on the brightness of the region. That is, when brightness (or luma) values are similar in the local region, the depth values are not allowed to change much in that region. Therefore, when a single object/region has a uniform brightness (or luma) level, the method ensures that depth values stay uniform in the uniform brightness region. As an example, this method can be used to limit the maximum range of depth value fluctuation of a given local region based on the brightness (or luma) value fluctuation of the same region.
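One possible reading of this brightness constraint is sketched below: within each local block, the allowed spread of depth values is capped in proportion to the luma spread of the same block, so regions of uniform brightness keep near-uniform depth. The block size and the proportionality gain are assumed tuning parameters, not values from the patent.

```python
import numpy as np

def limit_depth_fluctuation(depth, luma, block=8, gain=0.5):
    """Cap depth variation inside each block by the block's brightness variation."""
    out = depth.astype(np.float32).copy()
    H, W = out.shape
    for y in range(0, H, block):
        for x in range(0, W, block):
            d = out[y:y + block, x:x + block]
            l = luma[y:y + block, x:x + block].astype(np.float32)
            allowed = gain * (l.max() - l.min())  # luma fluctuation sets the depth budget
            center = d.mean()
            np.clip(d, center - allowed / 2.0, center + allowed / 2.0, out=d)
    return out
```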
- A sharpness analysis module 412 can receive data from the downscaling module 406. There are other cases where a single object/region consists of different brightness (or luma) levels. The depth values assigned to such objects/regions should also remain uniform. The goal is to measure sharpness/blurriness and use it as an indicator of which portion of the input picture is in focus. Because the camera lens focus is a function of the distance from the camera to the object/region in focus, all pixels sharing similar levels of sharpness/blurriness share similar depth values (i.e., distance from the camera). When pixels with very different brightness values are nearby in a local region, the sharpness/blurriness of these pixels is measured. If the sharpness/blurriness measurements are very similar, then all of these pixels receive similar depth values. In this way, no significant fluctuation of depth values is allowed within a region with similar brightness and sharpness. - A
motion analysis module 414 can receive data from the downscaling module 406. In practice, motion present in the input image sequence may not be a perfect translational motion of rigid bodies. Many of the motion types detected in natural scenes are complex and non-rigid. In order to create pleasing three dimensional effects, such complex and non-rigid motion types have to first be detected and identified in a robust manner. One advantage in two dimensional to three dimensional conversion is that the motion information need not be very accurate or precise. Some other applications, such as compression, noise reduction, de-interlacing, or frame-rate conversion, require pixel-level or sub-pixel-level precision of the motion information; that is not necessarily the case for the two dimensional to three dimensional conversion problem. Therefore, motion detection techniques that are more robust to different motion types are preferred over highly precise and accurate techniques that are not robust to different motion types. Successful motion detection for two dimensional to three dimensional conversion may incorporate motion detection techniques that perform well on natural scenes with complex non-rigid motion types. - A scene
content analysis module 414 can receive data from the downscaling module 406 and the edge analysis module 408. For example, assume there is a scene that includes a sky with clouds. In the scene content analysis performed by the scene content analysis module 414, the sky and clouds are detected. The presence of the sky and clouds is a very useful depth cue because they are farther from the viewer than other objects in most cases. The processing steps are: (1) locate major horizontal edges or lines that represent the horizon; (2) analyze color content to detect the spatial concentration of blue and white colors in the upper region of the image; (3) identify the sky and clouds based on the horizon and color contents; and (4) adjust the depth values of the sky region such that they are farther than other objects in the image.
step 1, major horizontal edges are detected. In one embodiment, the number of edges with magnitudes larger than a threshold is counted in the N-by-M neighborhood. If the number exceeds another predetermined threshold, the center location of the N-by-M neighborhood is detected as part of major horizontal edge. For each row of the image, the number of major horizontal edge is counted. The row with the highest number of major horizontal edges is declared as the horizon. - In
step 2, the number of blue and white colored pixels is counted for each row of the image. In step 3, the information fromstep step 1, then the region above the horizon is identified as the sky region. In step 4, the depth values of the sky region are adjusted such that the sky and the clouds are farther away from the viewer compared to other objects/regions in the image. - A
- A global scene model 419 module can receive data from the downscaling module 406 and the edge analysis module 408. The global scene model 419 approximates the depth surface of the scene in the input picture. For example, the 'Top-to-Bottom' model approximates the depth surface of a scene where the top portion is far away from the viewer (behind the screen) and the bottom portion is closer to the viewer (in front of the screen). The scene in the input picture is analyzed and one of the available global depth models is selected. Then, a baseline depth map is constructed based on the model and the edge information. - A global
depth surface module 418 can receive data from the edge analysis module 408 and the global scene model 419. A depth map can be generated by the global depth surface module 418 and a depth enhancement module 420. The depth enhancement module 420 can receive data from the downscaling module 406 and at least one of the sharpness analysis module 412, the brightness analysis module 410, and the motion analysis module 414. The global depth surface module 418 generates the global depth surface given a suitable global depth model for the input image. The depth enhancement module 420 modifies the global depth surface to maximize the amount of local depth variation and, as a result, enhance depth perception. - With regards to the global
depth surface module 418, global geometric depth models can be used in two dimensional to three dimensional conversion. Each global depth model provides a pattern of depth gradient suitable for a given scene. For example, as shown in FIG. 5A, a top-to-bottom model provides a depth surface pattern that gradually increases from the top of the image toward the bottom such that the top portion is far away and the bottom portion is closer to the viewer. Assuming a global depth model is given, a global depth surface is generated based on the edges extracted from the input image. As a first step, two baseline depth surfaces, row-accumulated and column-accumulated depth surfaces, are generated and combined into one baseline depth map. As a second step, the baseline depth map from step 1 is combined with edges to generate the final global depth surface.
FIG. 5B . - The value of delta between the ith row and (i+1)th row is computed as the absolute difference between the maximum edge value in the ith row and maximum edge value in the (i+1)th row.
-
delta=abs(max_edge(i)−max_edge(i+1)) - The final row-accumulated depth map becomes the baseline depth map for the top-to-bottom global model. For the center convex model in which the isotropic center region of the picture is closest to the viewer, the depth values start decreasing at the mid-point of the map towards the bottom of the map. For the vertical concave model in which the horizontal band of rows in the middle of the picture is farthest away from the viewer, the row-accumulated depth map of the center convex model is inverted so that the largest depth values are located at the top and bottom of the map.
- The column-accumulated depth map is created in a very similar manner except that the delta between two successive columns (instead of rows) is computed.
- The row- and column-accumulated depth maps are merged together by taking the smaller of the two values to generate the baseline depth map. Examples of this are shown in
FIGS. 6A , 6B, and 6C. The two depth maps can also be merged by taking maximum or average of the two corresponding values. - The baseline depth map is blended with the edge map. The blending factors are programmable parameters. Blending of the edges with baseline depth values is performed to increase relative depth perception because the edges increase the depth variation at object boundaries of the input image. The edge values are clamped prior to blending to achieve the effect of depth layer segmentation.
- Referring back to
- Referring back to FIG. 4, the depth enhancement module 420 can perform additional modifications. A spatial neighborhood in the global depth surface is examined to find a depth value that can replace the current depth value. The corresponding luma values in the same spatial neighborhood are examined for this purpose. When the absolute difference between the luma values of the current pixel and a neighboring pixel is less than a predetermined threshold, the current depth value is modified by taking the minimum (or average, maximum, median, etc.) of the current depth value and the neighboring depth value. Referring to FIG. 7A, a 2D neighborhood of size 5×5 is used for this purpose, where z0 indicates the current depth value. The depth values in the triangular neighborhood, z1~z16, are examined. In another embodiment, shown in FIG. 7B, depth values in a diamond-shaped neighborhood, z1~z16, are examined.
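This luma-guided replacement might look like the sketch below, which visits a fixed set of neighborhood offsets (standing in for the triangular or diamond-shaped patterns of FIGS. 7A and 7B) and takes the minimum of the current and neighboring depth wherever the luma values are close. The threshold, the offset pattern, and the wrap-around border handling are simplifying assumptions.

```python
import numpy as np

def enhance_depth(depth, luma, offsets, luma_thresh=8):
    """Replace the current depth with the minimum of itself and similar-luma neighbors."""
    out = depth.astype(np.float32).copy()
    luma_i = luma.astype(np.int32)
    for dy, dx in offsets:
        nb_depth = np.roll(np.roll(depth.astype(np.float32), dy, axis=0), dx, axis=1)
        nb_luma = np.roll(np.roll(luma_i, dy, axis=0), dx, axis=1)
        similar = np.abs(luma_i - nb_luma) < luma_thresh
        out = np.where(similar, np.minimum(out, nb_depth), out)
    return out

# An illustrative 5x5 diamond-shaped set of neighbor offsets around the current pixel.
diamond_offsets = [(dy, dx) for dy in range(-2, 3) for dx in range(-2, 3)
                   if 0 < abs(dy) + abs(dx) <= 2]
```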
- A spatiotemporal depth processing module 422 can receive data from the depth enhancement module 420. The spatiotemporal depth processing module 422 provides additional filtering. More specifically, the depth map goes through spatial and temporal filtering to ensure smooth variation of depth values in time and space, and to avoid blurring the depth values across sharp edges that separate objects and regions.
- Temporal filtering reduces temporal flickering of depth values in video and ensures quick updates of depth values using FIR temporal filtering after scene change. The depth values computed from the current input picture are blended with the depth values computed from the previous input picture. Let the depth map of the current picture to be denoted as D(t) and the depth map of the previous picture as D(t−1). Then, blending of the two depth maps is simply:
-
D′(t)=ALPHA·D(t)+(1−ALPHA)·D(t−1); where ALPHA=0˜1. -
- The blending factor ALPHA may be programmable.
- A convergence control/
- A convergence control/depth control module 424 receives data from the spatiotemporal depth processing module 422. The convergence control/depth control module 424 may provide a global shift of the depth values based on a programmable amount. After the depth map is created and upsampled to full native resolution, it becomes available for final adjustment of depth values according to user-specified settings. It allows gain around a programmable pivot point (depth control) as well as the ability to shift the global depth uniformly (convergence control). See the equation below:
y=A*(x−p)+B - where x is the current depth value, A is a gain factor, p is a pivot point, B is an offset/shift, and y is the output depth value.
- There is also programmable clamping of minimum and maximum of the output depth value y. These variables are fully controlled through programmable registers
- A depth to
- A depth to disparity conversion module 426 can receive data from the convergence control/depth control module 424. Disparity is the difference between the locations of a stereo pixel pair in the left- and right-view images, measured in number of pixels. When disparity is measured directly on the display screen in millimeters, it is called parallax. Conversion from depth values to disparity (or parallax) values is performed as follows: the input is a depth map containing depth values, and the output is a disparity map containing disparity values. Depth to disparity conversion is done for every depth value in the depth map such that the size of the output disparity map is equal to the size of the input depth map. An efficient and flexible conversion method from depth to disparity values is possible using a k-point LUT (look up table). Each entry of the LUT is programmable, and the data points between two LUT entries are linearly or non-linearly interpolated.
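A k-point LUT with linear interpolation can be expressed directly with `np.interp`, as in the sketch below; the five LUT entries are illustrative values only.

```python
import numpy as np

def depth_to_disparity(depth, lut_depth, lut_disparity):
    """Map every depth value through the programmable LUT with linear interpolation."""
    return np.interp(depth.astype(np.float32), lut_depth, lut_disparity)

# Illustrative 5-point LUT: depth 0..255 mapped to disparity in pixels.
lut_depth = np.array([0.0, 64.0, 128.0, 192.0, 255.0])
lut_disparity = np.array([-12.0, -4.0, 0.0, 6.0, 14.0])
```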
- After the depth to disparity conversion module 426 has performed its processing, the stereo rendering can be accomplished. To that end, a left view picture 428 and a right view picture 430 are generated with the appropriate pixel shift values applied so that the viewer can perceive each pixel at the appropriate depth level. A plurality of left view pictures 428 and right view pictures 430 can then be outputted as left view pictures 432 and right view pictures 434. - Referring to
FIG. 8, a three dimensional coordinate system 800 is shown having an x-axis 802, a y-axis 804, and a z-axis 806. An object 808 from the input picture 302 of FIG. 3 is located in the depth map along the z-axis 806. The motion of the object 808 in space can be decomposed into a number of motion elements including translations along the x-axis 802, y-axis 804, and z-axis 806, as well as rotation about the x-axis 802, y-axis 804, and z-axis 806. - Translational motion across the image 302 can be detected by the motion analysis module 324 along the x-axis 802 and the y-axis 804 of the image 302. This encompasses both global translation motion due to camera panning and local translation motions of objects or regions. Further, occlusion information can be extracted from translational motion on the x-y plane as objects enter or leave the input picture 302, and the depth of each object can be adjusted as objects enter or leave the input picture 302. - Translational motion on the z-axis 806 is a special case for three dimensional rendering because it is directly related to depth changes. Translational motion of individual objects or local regions along the z-axis 806 is also useful for three dimensional rendering because the depth values of these local moving objects change directly proportional to the motion. For example, as the size of the object 808 changes (e.g., increases) in the input picture 302 over a period of time, the depth value of the object 808 can be adjusted (e.g., increased) over that period of time. Motion present in the input picture 302 does not need to be a perfect translational motion. For example, most of the motion types detected in natural scenes are complex and non-rigid. In converting a two dimensional image to a three dimensional image, the motion information can be approximate and need not meet any particular level of accuracy or precision. Other processing, such as compression, noise reduction, de-interlacing, or frame rate conversion, can require pixel level or subpixel level precision of the motion information, however. -
FIG. 9 illustrates a more detailed diagram of the motion analysis module 324. As described above, the motion analysis module 324 receives an input picture 302 and outputs motion analysis data to the depth map adjustment module 312. The motion analysis module 324 includes a complex motion mainly on x-y plane module 902, which is configured to receive the input picture and determine if any objects in the input picture 302 are moving primarily along the x-y plane. This can be determined by first segmenting the objects, as described in the paragraphs above, and then determining if there is movement of these objects primarily along the x-y plane. Complex non-rigid motion is detected with a much more relaxed accuracy requirement. The occlusion information is extracted from the motion on the x-y plane such that depth ordering is determined for different regions of the image. Depth ordering means placing certain pixels/regions in front of the others in terms of depth. - The motion analysis module 324 also includes a complex motion with motion components along the z-axis module 904, which is configured to determine if there is motion along the z-axis of specified objects in the input pictures. This can be determined by first segmenting the objects, as described in the paragraphs above, and then determining if there is a change in size of these objects. For complex motion along the z-axis: (a) detection of global zoom in/out motion is used for adjusting the global depth levels, where the depth is shifted uniformly on the entire scene based on the zoom motion information; and (b) detection of object/region motion along the z-axis allows local adjustment of depth for certain pixels/regions in the image. - The motion analysis module 324 can also include an occlusion detector and depth layers generation module 906, which receives data indicating if there is any motion along the x-y plane from module 902. From there, the occlusion detection and depth layers generation module 906 can determine if any new objects have entered into the input picture 302 along the x-y plane and then assign these objects an appropriate depth value that can be based on various different variables, such as the brightness of the object or the sharpness of the object. In turn, this data can then be provided to the depth map adjustment module 312. - The motion analysis module 324 can also include a global zoom detection and depth adjustment module 908 and a local object/region depth adjustment module 910, both of which receive data from the complex motion with motion components along the z-axis module 904. The global zoom detection and depth adjustment module 908 determines if the movement of objects along the z-axis is global in nature, e.g., all of the detected objects are increasing or decreasing in size, therefore indicating that all of the objects are moving along the z-axis. For example, the camera capturing the images can be zooming in or zooming out, which has the effect of making all objects in the scene appear either closer or farther away. The global zoom detection and depth adjustment module 908 determines if the zoom function is occurring, adjusts the objects in the depth map accordingly, and provides this data to the depth map adjustment module 312. - The local object/region depth adjustment module 910 adjusts single objects or regions of the input picture 302 based on data from the complex motion with motion components along the z-axis module 904. For example, if there is a determination that there is motion on the z-axis of an object, the local object/region depth adjustment module 910 will then adjust the depth of the object and provide this data to the depth map adjustment module 312. - The motion analysis module 324 can also include a spatial depth cues module 912. The spatial depth cues module 912 receives the input picture 302 and determines if there are any spatial depth cues in the input picture 302. Spatial depth cues can include sharpness data of the input picture, regions of the input picture, or specific objects located within the input picture 302. -
FIG. 10 illustrates logic 1000 for converting a two dimensional image to a three dimensional image using sharpness information. The logic 1000 begins in block 1002, where a two dimensional image is received. Next, in block 1004, the two dimensional image is segmented so as to identify specific objects within the two dimensional image. The segmenting of the two dimensional image can use brightness information found in the two dimensional image to identify objects located within the two dimensional image and then segment these objects in the two dimensional image. For example, in an image containing several objects, the objects in the image can have different colors and therefore different brightness. Using these different brightness levels, the logic 1000 can segment and identify these separate objects due to their differences in brightness. - Next, in block 1006, a depth map is generated comprising depth values indicating the depth of the specific object in the two dimensional image. A variety of different depth maps can be used, including those described in the paragraphs above, such as a top-to-bottom model depth map, a vertical concave depth map, a column accumulated depth map, a row accumulated depth map, or a baseline depth map. In block 1008, a determination is made of the depth value for the specific object based on the sharpness of the object. An object that is sharper can be given a lower depth value, indicating that the object is in the foreground. However, any depth value can be assigned to the specific object. Last, in block 1010, a three dimensional image is generated that comprises the specific object located according to the depth map, based on the depth value previously assigned to the object in block 1008. The logic 1000 then returns to block 1002 in a continuous fashion. -
FIG. 11 illustrates logic 1100 for adjusting the depth value of a specific object over a period of time as the size of the specific object changes. In block 1102, a two dimensional video is received comprising two dimensional images that are arranged in sequential fashion over a period of time. In block 1104, specific objects in a two dimensional image from the two dimensional video are segmented. As described for the logic 1000, specific objects in the two dimensional image can be segmented based on brightness information. - In block 1106, the depth map is generated comprising depth values indicating the object depth of the specific objects in the two dimensional image. As described in FIG. 6 and in previous paragraphs, the depth map can be any one of a number of different depth maps. In block 1108, a depth value is determined in the depth map for the specific object. Any one of a number of different methodologies can be used to determine the depth value of the specific object, including the brightness information or sharpness information described in the logic 1000 of FIG. 10.
block 1110, the depth value of the specific object is adjusted over a period of time as the size of the object changes. Essentially, if one assumes that a specific object has a set size, the size of the object can only change if there is translational motion along the z-axis and/or rotation along the z-axis. When this occurs, the size of the object will change and the depth value of the object should be adjusted. If the object is increased in size, the object should receive a lower depth value and be visualized by the viewer as being closer. However, if the object is decreasing in size, the object should receive a high depth value, and be perceived by the viewer as being farther away than previously was perceived by the viewer. Last, inblock 1112, a three dimensional image is generated comprising the specific object located according to the depth map. Thelogic 1100 then returns to block 1102. - The methods, devices, and logic described above can be implemented in many different ways in many different combinations of hardware, software or both hardware and software. For example, all or parts of the system 50? can include circuitry in a controller, a microprocessor, or an application specific integrated circuit (ASIC), or can be implemented with discrete logic or components, or a combination of other types of analog or digital circuitry, combined on a single integrated circuit or distributed among multiple integrated circuits. All or part of the logic described above can be implemented as instructions for execution by a processor, controller, or other processing device and can be stored in a tangible or non-transitory machine-readable or computer-readable medium such as flash memory, random access memory (RAM) or read only memory (ROM), erasable programmable read only memory (EPROM) or other machine-readable medium such as a compact disc read only memory (CDROM), or magnetic or optical disk. Thus, a product, such as a computer program product, can include a storage medium and computer readable instructions stored on the medium, which when executed in an endpoint, computer system, or other device, cause the device to perform operations according to any of the description above.
- As a person skilled in the art will readily appreciate, the above description is meant as an illustration of implementation of the principles this invention. This description is not intended to limit the scope or application of this invention in that the invention is susceptible to modification, variation and change, without departing from the spirit of this invention, as defined in the following claims.
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/233,341 US20130070049A1 (en) | 2011-09-15 | 2011-09-15 | System and method for converting two dimensional to three dimensional video |
US13/277,852 US9036007B2 (en) | 2011-09-15 | 2011-10-20 | System and method for converting two dimensional to three dimensional video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/233,341 US20130070049A1 (en) | 2011-09-15 | 2011-09-15 | System and method for converting two dimensional to three dimensional video |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/277,852 Continuation US9036007B2 (en) | 2011-09-15 | 2011-10-20 | System and method for converting two dimensional to three dimensional video |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130070049A1 true US20130070049A1 (en) | 2013-03-21 |
Family
ID=47880296
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/233,341 Abandoned US20130070049A1 (en) | 2011-09-15 | 2011-09-15 | System and method for converting two dimensional to three dimensional video |
US13/277,852 Active 2033-04-16 US9036007B2 (en) | 2011-09-15 | 2011-10-20 | System and method for converting two dimensional to three dimensional video |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/277,852 Active 2033-04-16 US9036007B2 (en) | 2011-09-15 | 2011-10-20 | System and method for converting two dimensional to three dimensional video |
Country Status (1)
Country | Link |
---|---|
US (2) | US20130070049A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140023230A1 (en) * | 2012-07-18 | 2014-01-23 | Pixart Imaging Inc | Gesture recognition method and apparatus with improved background suppression |
US20150086127A1 (en) * | 2013-09-20 | 2015-03-26 | Samsung Electronics Co., Ltd | Method and image capturing device for generating artificially defocused blurred image |
US11176728B2 (en) * | 2016-02-29 | 2021-11-16 | Interdigital Ce Patent Holdings, Sas | Adaptive depth-guided non-photorealistic rendering method and device |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20120023431A (en) * | 2010-09-03 | 2012-03-13 | 삼성전자주식회사 | Method and apparatus for converting 2-dimensinal image to 3-dimensional image with adjusting depth of the 3-dimensional image |
JP5954668B2 (en) * | 2011-10-11 | 2016-07-20 | パナソニックIpマネジメント株式会社 | Image processing apparatus, imaging apparatus, and image processing method |
US9210405B2 (en) * | 2012-03-22 | 2015-12-08 | Qualcomm Technologies, Inc. | System and method for real time 2D to 3D conversion of video in a digital camera |
GB2502047B (en) * | 2012-04-04 | 2019-06-05 | Snell Advanced Media Ltd | Video sequence processing |
US9619884B2 (en) * | 2013-10-03 | 2017-04-11 | Amlogic Co., Limited | 2D to 3D image conversion device and method |
US20150103200A1 (en) * | 2013-10-16 | 2015-04-16 | Broadcom Corporation | Heterogeneous mix of sensors and calibration thereof |
US20150271467A1 (en) * | 2014-03-20 | 2015-09-24 | Neal Weinstock | Capture of three-dimensional images using a single-view camera |
CN104185008B (en) * | 2014-07-22 | 2017-07-25 | 上海聚力传媒技术有限公司 | A kind of method and apparatus of generation 3D media datas |
US9984494B2 (en) * | 2015-01-26 | 2018-05-29 | Uber Technologies, Inc. | Map-like summary visualization of street-level distance data and panorama data |
KR102286572B1 (en) * | 2015-03-04 | 2021-08-06 | 한국전자통신연구원 | Device and Method for new 3D Video Representation from 2D Video |
US11024047B2 (en) * | 2015-09-18 | 2021-06-01 | The Regents Of The University Of California | Cameras and depth estimation of images acquired in a distorting medium |
US10389804B2 (en) * | 2015-11-11 | 2019-08-20 | Adobe Inc. | Integration of content creation and sharing |
US10249061B2 (en) * | 2015-11-11 | 2019-04-02 | Adobe Inc. | Integration of content creation and sharing |
US10783431B2 (en) | 2015-11-11 | 2020-09-22 | Adobe Inc. | Image search using emotions |
US10198590B2 (en) | 2015-11-11 | 2019-02-05 | Adobe Inc. | Content sharing collections and navigation |
US9818044B2 (en) | 2015-11-11 | 2017-11-14 | Adobe Systems Incorporated | Content update suggestions |
US10552970B2 (en) * | 2018-01-12 | 2020-02-04 | Qualcomm Incorporated | Efficient guide filter for depth refinement |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080150945A1 (en) * | 2006-12-22 | 2008-06-26 | Haohong Wang | Complexity-adaptive 2d-to-3d video sequence conversion |
US20110026808A1 (en) * | 2009-07-06 | 2011-02-03 | Samsung Electronics Co., Ltd. | Apparatus, method and computer-readable medium generating depth map |
US20110032329A1 (en) * | 2009-08-06 | 2011-02-10 | Qualcomm Incorporated | Transforming video data in accordance with three dimensional input formats |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120268562A1 (en) * | 2011-04-20 | 2012-10-25 | Himax Technologies Limited | Image processing module and image processing method thereof for 2d/3d images conversion and frame rate conversion |
-
2011
- 2011-09-15 US US13/233,341 patent/US20130070049A1/en not_active Abandoned
- 2011-10-20 US US13/277,852 patent/US9036007B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080150945A1 (en) * | 2006-12-22 | 2008-06-26 | Haohong Wang | Complexity-adaptive 2d-to-3d video sequence conversion |
US20110026808A1 (en) * | 2009-07-06 | 2011-02-03 | Samsung Electronics Co., Ltd. | Apparatus, method and computer-readable medium generating depth map |
US20110032329A1 (en) * | 2009-08-06 | 2011-02-10 | Qualcomm Incorporated | Transforming video data in accordance with three dimensional input formats |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140023230A1 (en) * | 2012-07-18 | 2014-01-23 | Pixart Imaging Inc | Gesture recognition method and apparatus with improved background suppression |
US9842249B2 (en) * | 2012-07-18 | 2017-12-12 | Pixart Imaging Inc. | Gesture recognition method and apparatus with improved background suppression |
US20150086127A1 (en) * | 2013-09-20 | 2015-03-26 | Samsung Electronics Co., Ltd | Method and image capturing device for generating artificially defocused blurred image |
US11176728B2 (en) * | 2016-02-29 | 2021-11-16 | Interdigital Ce Patent Holdings, Sas | Adaptive depth-guided non-photorealistic rendering method and device |
Also Published As
Publication number | Publication date |
---|---|
US9036007B2 (en) | 2015-05-19 |
US20130070050A1 (en) | 2013-03-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9036007B2 (en) | System and method for converting two dimensional to three dimensional video | |
US9407896B2 (en) | Multi-view synthesis in real-time with fallback to 2D from 3D to reduce flicker in low or unstable stereo-matching image regions | |
US9171372B2 (en) | Depth estimation based on global motion | |
US9525858B2 (en) | Depth or disparity map upscaling | |
US8711204B2 (en) | Stereoscopic editing for video production, post-production and display adaptation | |
US9123115B2 (en) | Depth estimation based on global motion and optical flow | |
RU2692432C2 (en) | Method and apparatus for three-dimensional image forming | |
KR101938205B1 (en) | Method for depth video filtering and apparatus thereof | |
US20140176672A1 (en) | Systems and methods for image depth map generation | |
US20150379720A1 (en) | Methods for converting two-dimensional images into three-dimensional images | |
JP2015522198A (en) | Depth map generation for images | |
US10855965B1 (en) | Dynamic multi-view rendering for autostereoscopic displays by generating reduced number of views for less-critical segments based on saliency/depth/eye gaze map | |
US9100642B2 (en) | Adjustable depth layers for three-dimensional images | |
CA2988360A1 (en) | Method and apparatus for determining a depth map for an image | |
US10631008B2 (en) | Multi-camera image coding | |
US20160180514A1 (en) | Image processing method and electronic device thereof | |
EP3616399B1 (en) | Apparatus and method for processing a depth map | |
Kim et al. | Visual comfort enhancement for stereoscopic video based on binocular fusion characteristics | |
KR20170070148A (en) | Processing of disparity of a three dimensional image | |
US9787980B2 (en) | Auxiliary information map upsampling | |
Coria et al. | Automatic stereoscopic 3D video reframing | |
US9531943B2 (en) | Block-based digital refocusing system and method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: BROADCOM CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HA, HYEONG-SEOK VICTOR;YANG, JASON CHUI-HSUN;SIGNING DATES FROM 20110913 TO 20110916;REEL/FRAME:026921/0686 Owner name: BROADCOM CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HA, HYEONG-SEOK VICTOR;YANG, JASON CHUI-HSUN;SIGNING DATES FROM 20110913 TO 20110916;REEL/FRAME:026919/0323 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH CAROLINA Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:037806/0001 Effective date: 20160201 Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:037806/0001 Effective date: 20160201 |
|
AS | Assignment |
Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD., SINGAPORE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:041706/0001 Effective date: 20170120 Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:041706/0001 Effective date: 20170120 |
|
AS | Assignment |
Owner name: BROADCOM CORPORATION, CALIFORNIA Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:041712/0001 Effective date: 20170119 |