
WO2013173548A2 - Adapting large format stereoscopic content to other platforms - Google Patents

Adapting large format stereoscopic content to other platforms Download PDF

Info

Publication number
WO2013173548A2
WO2013173548A2 PCT/US2013/041286
Authority
WO
WIPO (PCT)
Prior art keywords
picture
destination
source
selected area
resolution
Prior art date
Application number
PCT/US2013/041286
Other languages
French (fr)
Other versions
WO2013173548A3 (en)
Inventor
Pierre Hughes ROUTHIER
Brian J. Dorini
Original Assignee
Thomson Licensing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Publication of WO2013173548A2 publication Critical patent/WO2013173548A2/en
Publication of WO2013173548A3 publication Critical patent/WO2013173548A3/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/139 Format conversion, e.g. of frame-rate or size
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/398 Synchronisation thereof; Control thereof

Definitions

  • Implementations are described that relate to digital pictures. Various particular implementations relate to processing 3D digital pictures.
  • a source picture having a source resolution is accessed.
  • a destination resolution for a destination picture is accessed.
  • a lower bound is determined.
  • the lower bound is on a size of an area in the source picture that can be selected for use in generating the destination picture.
  • the lower bound is based on a disparity metric associated with the destination resolution.
  • a selected area of the source picture is scaled. The selected area is at least as large as the lower bound.
  • the destination picture is based on the scaled selected area.
  • At least a portion of a source picture having a source resolution is displayed.
  • a minimum size for cropping is identified on the display of at least the portion.
  • Input is accepted that identifies a selected area of the source picture for cropping.
  • the selected area is at least as large as the identified minimum size.
  • the selected area is scaled to form a scaled picture.
  • an implementation may be performed as a method, or embodied as an apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal.
  • FIG. 1 provides a pictorial diagram depicting an example of a particular parallax and disparity situation.
  • FIG. 2 provides a pictorial diagram depicting an example of a 2k window in an 8k picture.
  • FIG. 3 provides a pictorial diagram depicting an example of a large format picture that includes a maximum crop window and a pan and scan window.
  • FIG. 4 provides a pictorial diagram depicting an example of the pan and scan window of FIG. 3 being scaled down to a smaller format screen.
  • FIG. 5 provides a pictorial diagram depicting an example of an 8k picture overlaid with a maximum crop window and a pan and scan window.
  • FIG. 6 provides a pictorial diagram depicting an example of the pan and scan window of FIG. 5 scaled down to a 2k picture.
  • FIG. 7 provides a flow diagram depicting an example of a process for scaling a picture based on a disparity metric.
  • FIG. 8 provides a flow diagram depicting another example of a process for scaling a picture.
  • FIG. 9 provides a block diagram depicting an example of a system for scaling a picture based on a disparity metric.
  • composition and aspect ratio often leads, in 2D, to a pan and scan process which cuts out a significant portion of the source frame.
  • Using the same approach for stereoscopic content is frequently unsuitable, because of the excessive parallax this would cause.
  • At least one implementation described in this application provides a methodology that attempts to ensure that excessive parallax is not possible when resizing large screen items to smaller viewing platforms.
  • Parallax is the angular difference between two sight-lines to an object.
  • the sight-lines originate at each of a viewer's left- and right-eyes and proceed to corresponding left- and right-eye image views of the object (or portion thereof).
  • Disparity is the linear difference in the positions of an object (or portion thereof) in each of the left- and right-eye images of a stereoscopic image pair. Disparity may be expressed as a physical measure (for example, in centimeters), or in an image-relative measure (for example, in pixels or as a percentage of image width). A conversion between the two forms is possible when the size of the images as displayed is known. Depth, as perceived in a stereoscopic presentation, can be determined from parallax. However, depth is trigonometrically, not linearly, related to disparity and parallax. Depth is shown by the apparent distance (D) in FIG. 1, as discussed below, and can in theory range from zero to infinity.
  • D apparent distance
  • FIG. 1 shows a situation 100 where a viewer 110 is watching a stereoscopic presentation on a screen 120.
  • the viewer 110 perceives a particular object 130 whose apparent distance (D) from the viewer results from parallax (Θ), which is induced by the combination of physical disparity (dP), viewing distance (V), and the viewer's interocular spacing (tE).
  • the situation 100 is shown with left and right sightlines 131, 132 forming a right triangle with a line between the viewer's left and right eyes 111, 112, the line having a length of (tE). Further, and again for simplicity, that line is considered to be parallel to the screen 120.
  • the physical disparity (dP) is by some conventions, and herein, considered to be negative.
  • the physical disparity (dP) is negative whenever the left-eye image of the object is to the left of the right-eye image of the object.
  • the parallax angle (Θ), in the situation 100, is positive, and is positive for all values of (dP) greater than (-tE).
  • the parallax (Θ) can be displayed, by presenting a stereoscopic image pair in which the left- and right-eye images of the object 130 have a disparity of less than (-tE).
  • in these circumstances, the parallax (Θ) becomes negative, at which point the presentation of the object 130 by the screen 120 ceases to have a meaningful interpretation.
  • geometrically, the sightlines 131, 132 would intersect behind the viewer 110, but the images of the object 130 still appear on the screen 120. This produces a contradiction that the human visual system typically does not enjoy.
  • the relationship between interocular spacing (tE), distance (V) from the viewer to the screen, physical disparity (dP), and an object's apparent distance (D) from the viewer can be described in an equation obtained by inspecting the similar triangles of FIG. 1 (and recalling that the sign of dP in situation 100 is negative): dP = -tE (D - V)/D.
  • Physical disparity (dP) is proportional to the disparity (di) intrinsic to the images of a stereoscopic image pair and the size of the screen (S). As the size (S) of the screen grows larger (and with it the size of the stereoscopic images), a particular intrinsic disparity (di) will produce a larger physical disparity (dP): dP = di × S.
  • the apparent distance (D) of the object from the viewer is altered by varying the viewer's distance from the screen (V) and/or the size of the screen (S).
  • while interocular spacing (tE) is essentially constant for an individual over long spans of time, individuals within a population may differ substantially, especially if comparing adults to children.
  • examples of elements for which default values may be used include, for example, interocular distance (tE), screen size (S), and/or viewing distance (V).
  • tE interocular distance
  • S screen size
  • V viewing distance
  • the use of default values can allow other values (for example, distance D) to be presented in a contingent fashion, in which final values might depend upon how actual values differ from the default assumptions.
  • any head turning could result in a deviation in either direction, with square-facing being the median condition.
  • when tE is small compared to V, theta is typically small, and the approximation is generally considered acceptable because both sin() and tan() are fairly linear in that region.
  • such situations involve a parallax value that is too large for comfort or safety. It should be understood, however, that all of the implementations can be used to increase parallax values if, for example, the parallax is considered to be too small. Frequently, an implementation will provide safety and/or comfort levels for both hyperdivergence and hyperconvergence.
  • a large format image frame (such as an Imax® asset, for example) has a composition that is typically very different from theatrical and home viewing compositions.
  • Typical conditions include a very large screen, and a very high ratio between the width of the screen and the average distance of the viewer to the screen.
  • the viewer typically only actively perceives a very small portion of the image, often located in the lower third of the screen.
  • the rest of the frame is displayed, for example, to immerse the viewer in the image, filling the viewer's peripheral vision.
  • HDTV High Definition Television
  • Cropping refers generally to selecting a portion of a source picture and discarding (or ignoring) the remainder of the source picture, with the selected portion to be converted into a destination picture.
  • the selected portion is typically less than the entire source picture.
  • the selected portion (rather than the unselected, or discarded, portion) is typically referred to as the cropped portion.
  • the destination picture can have a different resolution, including a different aspect ratio, than the source picture.
  • the cropped portion is typically scaled to adapt the cropped portion to the resolution of the destination picture.
  • cropping and scaling are generally referred to as separate processes, although various implementations combine the two operations of cropping (selecting) and scaling.
  • disparity (sometimes referred to as parallax, in a general sense) is often expressed as a percentage of a screen's width, and this value (expressible, for example, as a ratio of disparity over screen width) will change if the image is simply cropped instead of being scaled down.
  • FIG. 2 includes an 8k (8192 pixels wide) picture 200 that includes a 2k (2048 pixels wide) window 210.
  • the window 210 shows a tree 220. If the window 210 is used as a cropping window to generate an output 2k picture, then no scaling will be needed. As a result, the 2048 columns of pixels in the window 210 will become the pixels of the output 2k picture. Accordingly, the disparity present in, for example, the tree 220 will be completely preserved.
  • if the disparity present in the tree 220 in the 8k picture 200 is 82 pixels, then the disparity present in the tree 220 in the output 2k picture will also be 82 pixels.
  • post production companies typically perform a manual combination of cropping and scaling. Unfortunately, because such cropping and scaling is a manual operation, the result is that a high percentage of pictures (or shots) typically still exceed the maximum thresholds for disparity. Consequently, the pictures (or shots) that exceed the maximum thresholds will generally be reformatted, until a final result is attained that meets the thresholds. This often results in an iterative and time-consuming process.
  • FIG. 3 provides a pictorial diagram depicting an example of a large format picture that includes a maximum crop window and a pan and scan window.
  • FIG. 4 provides a pictorial diagram depicting an example of the pan and scan window of FIG. 3 being scaled down to a smaller format screen.
  • a large format picture is shown as a large screen source picture 310.
  • a measure of maximum observed negative and positive disparity is calculated for the picture 310.
  • the disparity calculations are performed using, for example, a disparity estimator, feature point analysis, or any other suitable method, as is known in the art.
  • MRR Minimum Reduction Ratio
  • MRR = largest of MRR (negative) and MRR (positive)
  • the source has an observed maximum negative disparity of 60 pixels
  • the destination specification is 15 pixels.
  • the MRR in negative disparity is 60 px / 15 px, producing an MRR (negative) of 4.
  • the specification is provided, for example, in various implementations, by a device manufacturer, by a content producer, or by a user's viewing preferences.
  • the source has an observed maximum positive disparity of 100 pixels, and the destination specification is 30 pixels. Then the MRR in positive disparity is 100 px / 30 px, producing an MRR (positive) of 3.33.
  • the MRR is the largest of MRR (negative) and MRR (positive). Therefore, in this case, the MRR is 4. This means that the source image has to be scaled down by a minimum factor of 4. A scaling down of at least 4 results in a minimum reduction in size of 75% because the image is reduced to no more than 25% of its original size.
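  • purely as an illustration (our sketch, not part of the patent; the function and variable names are ours), the MRR arithmetic above fits in a few lines of Python:

```python
def minimum_reduction_ratio(src_neg_px, src_pos_px, dst_neg_px, dst_pos_px):
    """Sketch of the MRR computation described above.

    Inputs are maximum observed source disparities and destination
    specifications, all in pixels. The MRR is the largest of the
    negative-disparity and positive-disparity ratios.
    """
    mrr_negative = src_neg_px / dst_neg_px   # e.g. 60 px / 15 px = 4.0
    mrr_positive = src_pos_px / dst_pos_px   # e.g. 100 px / 30 px = 3.33...
    return max(mrr_negative, mrr_positive)

# Using the example values above: MRR = 4, so the source must be scaled
# down by at least a factor of 4 (to no more than 25% of its size).
print(minimum_reduction_ratio(60, 100, 15, 30))  # 4.0
```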
  • a Maximum Crop Window (MCW) 320 appears on the screen, shown in dashed lines in FIG. 3, overlaying the picture 310.
  • this window 320 indicates the smallest possible cropping size that still satisfies the MRR. That is, if the cropping size is less than the MCW 320, then the required scaling to achieve the destination resolution will be less than the MRR.
  • the size of the MCW 320 is calculated in the following way:
  • MCW Width = Source Frame Width / MRR
  • the dimensions of the MCW 320 follow directly from this equation.
  • the MCW 320 is then locked in the pan and scan software used by an operator that is performing a conversion of the source content to a new destination resolution. This prevents the operator from selecting a final cropped window that is smaller than the MCW 320, which would result in a final picture (or a final shot) with excessive disparity on the destination device.
  • the operator selects the final, desired Pan and Scan Window (PSW) 330 for the picture.
  • the PSW 330 can be static or dynamic (for example, one dynamic implementation changes the location and/or size of the PSW 330 in each picture), but it is automatically locked to be at least as large as the MCW 320.
  • This lock ensures that the final picture in the destination resolution does not exceed the destination device's parallax or disparity specifications.
  • FIG. 3 shows this lock by depicting the MCW 320 as being smaller than the PSW 330, with the arrows in FIG. 3 indicating that the PSW 330 has a range of allowable sizes but that all allowable sizes are at least as large as the MCW 320.
  • the final operation is to apply the proper scale down ratio to the PSW 330 so that the final picture (or, in various implementations, the final shot) is in the destination's resolution and aspect ratio.
  • This is depicted in FIG. 4 by showing the PSW 330 being scaled down into a final picture 340.
  • the final picture 340 will typically be smaller than the MCW 320. This is because typical implementations will have an MRR that is greater than 1, which will have the effect of reducing the size of the PSW 330 so that it is smaller than the MCW 320.
  • FIG. 5 provides a pictorial diagram depicting an example of an 8k picture overlaid with a maximum crop window and a pan and scan window.
  • FIG. 6 provides a pictorial diagram depicting an example of the pan and scan window of FIG. 5 scaled down to a 2k picture.
  • FIG. 5 includes an 8k picture 510 that includes a tree 515.
  • FIG. 5 also includes an MCW 520 that does not include the entire tree 515.
  • the MCW 520 is 4096 pixels wide. If the MCW is calculated using the equation discussed above with respect to FIGS. 3-4, then the MCW is calculated as:
  • MCW Width = Source Frame Width / MRR
  • the MRR can be solved for as Source Frame Width / MCW Width.
  • the MRR is 2.
  • FIG. 5 also includes a PSW 530 that does include the entire tree 515. The operator is presumed to have selected the PSW 530 to include the entire tree 515.
  • the PSW 530 is scaled down, by at least the value of the MRR, to produce the final (destination) picture 540 having a width of 2048 pixels. It is clear that the PSW 530 is scaled down by more than the value of MRR, which is 2, but the exact scale down value is not known because the size of the PSW 530 is not indicated.
  • MCW Width = MRR * Final Resolution Width
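  • for the FIG. 5 numbers, the two MCW equations above give the same width; the short Python sketch below (our helper names, not the patent's) checks both:

```python
# Both MCW formulations from the text, checked against the FIG. 5
# example (8192-pixel source, 2048-pixel destination, MRR = 2).

def mcw_width_from_source(source_frame_width, mrr):
    # MCW Width = Source Frame Width / MRR
    return source_frame_width / mrr

def mcw_width_from_destination(mrr, final_resolution_width):
    # MCW Width = MRR * Final Resolution Width
    return mrr * final_resolution_width

assert mcw_width_from_source(8192, 2) == 4096
assert mcw_width_from_destination(2, 2048) == 4096
```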
  • a flow diagram depicts an example of a process 700 for scaling a picture based on a disparity metric.
  • a disparity metric is used in this application to refer, for example, to any variable or quantity that is based on, or derivable from, disparity.
  • a disparity metric also includes any variable or quantity that is based on, or derivable from, a value that is derivable from or based on disparity such as, for example, parallax or depth.
  • Disparity metrics include, for example, an MRR whether based on disparity or parallax, and an MCW size.
  • the process 700 includes accessing a source picture having a source resolution (710).
  • the process 700 also includes accessing a destination resolution for a destination picture (720).
  • the process 700 includes determining a lower bound, based on a disparity metric associated with the destination resolution (730).
  • the operation 730 includes determining a lower bound on a size of an area in the source picture that can be selected for use in generating the destination picture, the lower bound being based on a disparity metric associated with the destination resolution.
  • Various implementations determine a lower bound by determining a single dimension, with the second dimension being implicit. Additionally, various implementations provide a display of the lower bound in two dimensions (for example, an MCW). However, other implementations provide a display of only a single dimension, such as, for example, the width of the MCW.
  • the process 700 includes scaling a selected area (740).
  • the operation 740 includes scaling a selected area of the source picture, the selected area being at least as large as the lower bound, wherein the destination picture is based on the scaled selected area.
  • the process 700 also includes an optional operation (not shown in FIG. 7) of receiving or accessing an indication of the selected area.
  • the indication is provided, for example, by a user choosing an area on a screen that is displaying the source picture.
  • the process 700 does not specifically recite the operation of generating the destination picture.
  • the scaled selected area is the destination picture exactly and has the destination resolution.
  • the process 700 does generate the destination picture by virtue of generating the scaled selected area. In such implementations in which the destination picture is the scaled selected area, the destination picture is clearly based on the scaled selected area.
  • the process 700 further includes an optional operation (not shown in FIG. 7) of performing additional processing in order to generate the destination picture.
  • additional processing can include, for example, one or more of (i) truncating part of the scaled selected area, (ii) padding the scaled selected area to increase the number of pixels, (iii) adapting the color or luminance of at least some of the pixels in the scaled selected area, or (iv) performing some other filtering operation on at least some of the pixel values in the scaled selected area.
  • Various implementations of the process 700 include one or more of the following additional features:
  • - the disparity metric is further associated with the source picture,
  • - the disparity metric is a minimum reduction ratio,
  • - the minimum reduction ratio is based on (i) a maximum disparity for the source picture and (ii) a maximum disparity for the destination picture,
  • - the maximum disparity for the source picture is measured or calculated, and the maximum disparity for the destination picture is specified,
  • - the minimum reduction ratio is the ratio of the maximum disparity for the source picture over the maximum disparity for the destination picture,
  • - the lower bound is further based on the source resolution,
  • - the lower bound is expressed or displayed with respect to only one dimension,
  • - the lower bound is based on a ratio of the source resolution to a minimum reduction ratio,
  • - the lower bound is displayed by being overlaid on the source picture,
  • - receiving the selected area includes accepting input from a user identifying the selected area, and/or
  • - the destination picture is transmitted, encoded, and/or displayed.
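  • as a sketch only (our code, not the patent's reference implementation), the operations 710-740 might fit together as follows, assuming the lower bound is an MCW width derived from a disparity-based MRR, and using the Pillow library for the crop-and-scale step:

```python
from PIL import Image

def process_700(source: Image.Image, destination_size: tuple,
                max_src_disparity_px: float, max_dst_disparity_px: float,
                selected_box: tuple) -> Image.Image:
    """Illustrative sketch of process 700 (all names are ours).

    destination_size is (width, height) in pixels; selected_box is a
    PIL-style (left, upper, right, lower) box in source coordinates.
    """
    # 730: determine the lower bound from a disparity metric -- here,
    # an MCW width computed from the MRR.
    mrr = max_src_disparity_px / max_dst_disparity_px
    mcw_width = source.width / mrr

    # Enforce that the selected area is at least as large as the bound.
    left, upper, right, lower = selected_box
    if (right - left) < mcw_width:
        raise ValueError("selected area is narrower than the maximum crop window")

    # 740: scale the cropped selected area; the destination picture is
    # based on the scaled selected area.
    return source.crop(selected_box).resize(destination_size)
```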
  • a flow diagram depicts an example of a process 800 for scaling a picture.
  • the process 800 includes displaying a source picture (810).
  • the operation 810 includes displaying at least a portion of a source picture having a source resolution.
  • the process 800 includes identifying a minimum size for cropping (820).
  • the operation 820 includes identifying, on a display of at least a portion of the source picture having a source resolution, a minimum size for cropping.
  • the process 800 includes accepting input identifying a selected area at least as large as the minimum size (830).
  • the operation 830 includes accepting input identifying a selected area of the source picture for cropping, the selected area being at least as large as the identified minimum size.
  • Input can identify the selected area by, for example, indicating one or more coordinates of the selected area (for example, corner or center coordinates).
  • Input can be accepted by, for example, (i) receiving input from a touch screen, a mouse, or a keyboard, and/or (ii) accessing input from a stored memory location (for example, using a default value for the input, or accessing a stored profile for the input, or accessing input used and stored for another picture).
  • the process 800 includes scaling the selected area to form a scaled picture (840).
  • the scaled selected area need not be a final destination picture.
  • the scaled picture is a destination picture having a destination resolution.
  • the process 800 ends without generating the destination picture.
  • the process 800 includes further processing of the scaled selected area to generate the destination picture. Such further processing is, in various implementations, for example, as described above with respect to the process 700.
  • Various implementations of the process 800 include one or more of the following additional features:
  • - displaying an entire source picture,
  • - identifying the minimum size includes displaying an outline of a rectangle over the display of at least a portion of the source picture, the rectangle indicating the minimum size for cropping, and/or
  • - accepting input identifying the selected area includes allowing a user to adjust the size of a window to identify the selected area.
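  • one possible shape for the input-acceptance operation (830) is sketched below; this is our illustration, not the patent's, with a callback standing in for whatever touch screen, mouse, keyboard, or stored-profile source supplies the input:

```python
def accept_selected_area(get_candidate, min_width, min_height):
    """Sketch of operation 830: accept input identifying a selected area
    that is at least as large as the identified minimum size.

    get_candidate() returns an (x, y, width, height) rectangle from any
    input source (touch screen, mouse, keyboard, stored profile, ...).
    """
    while True:
        x, y, width, height = get_candidate()
        if width >= min_width and height >= min_height:
            return (x, y, width, height)
        # Candidate is smaller than the minimum size: reject it and ask
        # again, mirroring the lock on the pan and scan window.

# Example: a stored default selection, accepted only if large enough.
print(accept_selected_area(lambda: (0, 0, 4096, 2160), 4096, 2160))
```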
  • a block diagram depicts an example of a system 900 for scaling a picture based on a disparity metric.
  • the system 900 includes a processor 910 that is communicatively coupled to a display 920 for displaying, for example, digital pictures. Pictures are displayed, in various implementations, before, during, and/or after being processed by the processor 910.
  • the display 920 is also able to send signals back to the processor 910 in the event, for example, that the display 920 is a touch-screen and provides input information from the touch-screen to the processor 910.
  • the processor 910 is also communicatively coupled to a user interface 930 for accepting input from, for example, an operator.
  • the user interface 930 is also communicatively coupled to the display 920 to allow, for example, operator input to be displayed on the display 920 directly without intervention by the processor 910.
  • the processor 910 is communicatively coupled to a storage device 940, an encoder 950, and a transmitter 960.
  • the processor 910 provides digital pictures to one or more of the storage device 940 for storing the pictures, the encoder 950 for encoding of the pictures, or the transmitter 960 for transmitting the pictures.
  • the encoder 950 is also communicatively coupled to the storage device 940 and/or the transmitter 960. This allows, for example, encoded pictures from the encoder 950 to be (i) stored in the storage device 940 and/or (ii) transmitted using the transmitter 960.
  • the communication is two-way (not shown in FIG. 9) between the processor 910 on the one hand, and one or more of the storage device 940, the encoder 950, or the transmitter 960, on the other hand.
  • Such two-way communication allows, for example, stored pictures to be retrieved by the processor 910, encoded pictures to be provided directly to the processor 910 (without going through the storage device 940), and transmitted pictures also to be provided to the processor 910. Additionally, parameters or other information can be provided to the processor 910 from one or more of the storage device 940, the encoder 950, or the transmitter 960.
  • the system 900 is used, in various implementations, to perform the process 700 as well as any of the additional features described with respect to the process 700.
  • the processor 910 accesses a source picture having a source resolution (710) from the storage device 940.
  • the processor 910 also accesses a destination resolution for a destination picture (720) from the user interface 930 or the storage device 940.
  • the processor 910 determines a lower bound on a size of an area in the source picture that can be selected for use in generating the destination picture, the lower bound being based on a disparity metric associated with the destination resolution (730). The processor 910 uses, in various implementations, any of the equations described in this application for calculating an MCW as the lower bound.
  • the processor 910 scales a selected area of the source picture, the selected area being at least as large as the lower bound, wherein the destination picture is based on the scaled selected area (740). In at least one implementation, the processor 910 receives the selected area from the user interface 930.
  • the system 900 is used, in various implementations, to perform the process 800 as well as any of the additional features described with respect to the process 800. In one such implementation:
  • the processor 910 displays a source picture (810) on the display 920 after accessing the source picture from the storage device 940.
  • the processor 910 identifies a minimum size for cropping (820).
  • the processor 910 accepts input, from the user interface 930, identifying a selected area at least as large as the minimum size (830).
  • the processor 910 scales the selected area to form a scaled picture (840).
  • the display 920 includes, in various implementations, one or more of a computer display, a laptop display, a tablet display, a cell phone display, a television display, or any of the other displays mentioned in this application or known in the art, including projected displays that may be visible on any surface, such as, for example, a wall, a ceiling, a floor, or a sidewalk.
  • the user interface 930 includes, in various implementations, one or more of a mouse, a track pad, a keyboard, a touch screen, a microphone for accepting voice commands that are interpreted by the processor 910, a remote control, a cell phone, a separate computer whether remote or local, or any other input device mentioned in this application or known in the art.
  • the storage device 940 includes, in various implementations, any of the storage devices mentioned in this application or known in the art.
  • the encoder 950 includes, in various implementations, an AVC or H.264 (as defined elsewhere in this application) encoder, an encoder for any other standard, or any other encoding device mentioned in this application or known in the art.
  • the transmitter 960 includes, in various implementations, an output pin of any integrated circuit, a Universal Asynchronous Receiver/Transmitter (UART), a broadcast transmitter, a satellite transmitter, a cable transmitter, or any other transmitting device mentioned in this application or known in the art.
  • the transmitter 960 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto.
  • Typical transmitters perform functions such as, for example, one or more of providing error-correction coding (which may alternatively, or additionally, be performed in the encoder 950), interleaving the data in the signal (which may alternatively, or additionally, be performed in the encoder 950), randomizing the energy in the signal, and modulating the signal onto one or more carriers using a modulator.
  • the transmitter 960 may include, or interface with, an antenna (not shown). Further, implementations of the transmitter 960 may be limited to a modulator.
  • This application provides multiple figures, including the block diagram of FIG. 9, the flow diagrams of FIGS. 7-8, and the pictorial diagrams of FIGS. 1-6. Each of these figures provides disclosure for a variety of implementations.
  • the block diagram certainly describes an interconnection of functional blocks of an apparatus or system.
  • FIG. 9 also presents a flow diagram for performing various processes that include the functions of the blocks of FIG. 9.
  • the block for the encoder 950 also represents the operation of encoding pictures.
  • the block for the transmitter 960 also represents the operation of transmitting pictures.
  • the interconnection between the encoder 950 and the transmitter 960 represents a process in which pictures are encoded and then transmitted.
  • Other blocks of FIG. 9 are similarly interpreted in describing this flow process.
  • the flow diagram certainly describes a flow process.
  • FIG. 7 also presents a block diagram for performing the functions of the process 700.
  • reference element 720 also represents a block for performing the function of accessing a destination resolution
  • reference element 730 represents a block for performing the function of determining a lower bound
  • the interconnection between elements 720 and 730 represents an apparatus having a component for accessing a destination resolution coupled to a component for determining a lower bound.
  • Other blocks of FIG. 7 are similarly interpreted in describing this system/apparatus.
  • FIG. 4 also describes a process of accepting a PSW 330 from a user and then scaling the PSW 330 to generate the final output 340.
  • FIGS. 2-3 and 5-6 can also be interpreted in a similar fashion to describe respective flow processes.
  • Various implementations are described with respect to a picture. Many implementations perform one or more of the described processes on every picture in a series of pictures. Other implementations, however, apply more consistency across pictures that belong to the same shot, or scene, or movie, for example. It is often advantageous, for example, to apply the same cropping and scaling to every picture in a shot. Additionally, it is often advantageous for an operator to view an entire shot, for example, before selecting the pan and scan window for that shot or for an individual picture in the shot. Various implementations calculate the maximum disparity of a source picture. Typically, the maximum disparity will be in a region of the source picture that an operator desires to preserve in the pan and scan window.
  • the "effective" MRR will be different.
  • Various implementations take this into consideration when creating the maximum crop window, and can vary the size of the maximum crop window depending on the location, selected by an operator, of the pan and scan window.
  • AVC refers to the existing International Organization for Standardization/International Electrotechnical Commission (“ISO/IEC”) Moving Picture Experts Group-4 (“MPEG-4") Part 10 Advanced Video Coding (“AVC”) standard/International Telecommunication Union, Telecommunication Sector (“ITU-T”) H.264 Recommendation (variously referred to throughout this document as the “H.264/MPEG-4 AVC Standard” or variations thereof, such as the "AVC standard”, the “H.264 standard”, or simply "AVC” or “H.264”).
  • ISO/IEC International Organization for Standardization/International Electrotechnical Commission
  • MPEG-4 Moving Picture Experts Group-4
  • AVC Advanced Video Coding
  • ITU-T International Telecommunication Union, Telecommunication Sector
  • references to "one embodiment” or “an embodiment” or “one implementation” or “an implementation” of the present principles, as well as other variations thereof mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles.
  • Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
  • Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, or predicting the information.
  • Receiving is, as with “accessing”, intended to be a broad term.
  • Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
  • receiving is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • the terms "image" and/or "picture" are used interchangeably throughout this document, and are intended to be broad terms.
  • An “image” or a “picture” may be, for example, all o part of a frame or of a field.
  • video refers to a sequence of images (or pictures).
  • An image, or a picture, may include, for example, any of various video components or their combinations.
  • Such components include, for example, luminance, chrominance, Y (of YUV or YCbCr or YPbPr), U (of YUV), V (of YUV), Cb (of YCbCr), Cr (of YCbCr), Pb (of YPbPr), Pr (of YPbPr), red (of RGB), green (of RGB), blue (of RGB), S-Video, and negatives or positives of any of these components.
  • An “image” or a “picture” may also, or alternatively, refer to various different types of content, including, for example, typical two-dimensional video, a disparity map for a 2D video picture, a depth map that corresponds to a 2D video picture, or an edge map.
  • implementations may refer to a "frame”. However, such implementations are assumed to be equally applicable to a "picture” or "image”.
  • a “depth map”, or “disparity map”, or “edge map”, or similar terms are also intended to be broad terms.
  • a map generally refers, for example, to a picture that includes a particular type of information. However, a map may include other types of information not indicated by its name. For example, a depth map typically includes depth information, but may also include other information such as, for example, video or edge information. This application refers to "encoders" and "decoders" in a variety of contexts.
  • an encoder can include, for example, one or more (or no) source encoders and/or one or more (or no) channel encoders, as well as one or more (or no) modulators.
  • a decoder can include, for example, one or more (or no) demodulators as well as one or more (or no) channel decoders and/or one or more (or no) source decoders.
  • the use of any of "/", "and/or", and "at least one of", for example, in the cases of "A/B", "A and/or B", and "at least one of A and B", is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B).
  • in the cases of "A, B, and/or C" and "at least one of A, B, and C", such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
  • This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items as are listed.
  • the processes described may be implemented in processors such as, for example, a post-processor or a pre-processor.
  • the processors discussed in this application do, in various implementations, include multiple processors (sub-processors) that are collectively configured to perform, for example, a process, a function, or an operation.
  • the processor 910, as well as other processing components such as, for example, the encoder 950 and the transmitter 960, are, in various implementations, composed of multiple sub-processors that are collectively configured to perform the operations of that component.
  • the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal.
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
  • Processors also include communication devices, such as, for example, computers, cell phones, tablets, portable/personal digital assistants ("PDAs”), and other devices that facilitate communication of information between end- users.
  • PDAs portable/personal digital assistants
  • Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications.
  • Examples of such equipment include an encoder, a decoder, a post-processor, a pre-processor, a video coder, a video decoder, a video codec, a web server, a television, a set-top box, a router, a gateway, a modem, a laptop, a personal computer, a tablet, a cell phone, a PDA, and other communication devices.
  • the equipment may be mobile and even installed in a mobile vehicle.
  • the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier, or another storage device such as, for example, a hard disk, a compact diskette ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc, or a Blu-ray disc), a random access memory ("RAM"), a read-only memory ("ROM"), a USB thumb drive, or some other storage device.
  • the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination.
  • a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
  • implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
  • the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal may be formatted to carry as data the rules for writing or reading syntax, or to carry as data the actual syntax-values generated using the syntax rules.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries may be, for example, analog or digital information.
  • the signal may be transmitted over a variety of different wired or wireless links, as is known.
  • the signal may be stored on a processor-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Various implementations relate to scaling a picture to a different resolution. In one implementation, a source picture having a source resolution is accessed, and a destination resolution for a destination picture is accessed. A lower bound is determined on a size of an area in the source picture that can be selected for use in generating the destination picture. The lower bound is based on a disparity metric associated with the destination resolution. A selected area of the source picture, at least as large as the lower bound, is scaled. The destination picture is based on the scaled selected area. In another implementation, a portion of the source picture is displayed. A minimum size for cropping is identified on the display. Input is accepted that identifies a selected area of the displayed source picture for cropping, and the selected area is at least the minimum size.

Description

ADAPTING LARGE FORMAT STEREOSCOPIC
CONTENT TO OTHER PLATFORMS
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. provisional application No. 61/688,58"/ filed May 17, 2012, and titled "Method for Adapting Large Format Stereoscopic Content to Other Platforms", the contents of which are hereby incorporated by reference for all purposes.
TECHNICAL FIELD
Implementations are described that relate to digital pictures. Various particular implementations relate to processing 3D digital pictures.
BACKGROUND
It is frequently desirable to convert a large format stereoscopic digital picture to a smaller format. However, frequently the resulting picture has a different stereoscopic effect on a viewer.
SUMMARY
According to a general aspect, a source picture having a source resolution is accessed. A destination resolution for a destination picture is accessed. A lower bound is determined. The lower bound is on a size of an area in the source picture that can be selected for use in generating the destination picture. The lower bound is based on a disparity metric associated with the destination resolution. A selected area of the source picture is scaled. The selected area is at least as large as the lower bound. The destination picture is based on the scaled selected area.
According to a general aspect, at least a portion of a source picture having a source resolution is displayed. A minimum size for cropping is identified on the display of at least the portion. Input is accepted that identifies a selected area of the source picture for cropping. The selected area is at least as large as the identified minimum size. The selected area is scaled to form a scaled picture. The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as an apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 provides a pictorial diagram depicting an example of a particular parallax and disparity situation.
FIG. 2 provides a pictorial diagram depicting an example of a 2k window in an 8k picture. FIG. 3 provides a pictorial diagram depicting an example of a large format picture that includes a maximum crop window and a pan and scan window.
FIG. 4 provides a pictorial diagram depicting an example of the pan and scan window of FIG. 3 being scaled down to a smaller format screen.
FIG. 5 provides a pictorial diagram depicting an example of an 8k picture overlaid with a maximum crop window and a pan and scan window.
FIG. 6 provides a pictorial diagram depicting an example of the pan and scan window of FIG. 5 scaled down to a 2k picture.
FIG. 7 provides a flow diagram depicting an example of a process for scaling a picture based on a disparity metric. FIG. 8 provides a flow diagram depicting another example of a process for scaling a picture.
FIG. 9 provides a block diagram depicting an example of a system for scaling a picture based on a disparity metric.
DETAILED DESCRIPTION
When adapting large format assets to smaller screens, the change in
composition and aspect ratio often leads, in 2D, to a pan and scan process which cuts out a significant portion of the source frame. Using the same approach for stereoscopic content is frequently unsuitable, because of the excessive parallax this would cause.
At least one implementation described in this application provides a methodology that attempts to ensure that excessive parallax is not possible when resizing large screen items to smaller viewing platforms. The implementation
automatically computes a minimum scaling ratio, and uses a maximum crop window in the post production software, thereby preventing the source image from being cropped beyond the minimum scaling ratio.
We now provide a discussion of several stereoscopic concepts, using FIG. 1 as a point of reference. Parallax, disparity, and depth are related in stereoscopic presentations. Parallax is the angular difference between two sight-lines to an object. In the context of stereoscopic presentation, the sight-lines originate at each of a viewer's left- and right-eyes and proceed to corresponding left- and right-eye image views of the object (or portion thereof).
Disparity is the linear difference in the positions of an object (or portion thereof) in each of the left- and right-eye images of a stereoscopic image pair. Disparity may be expressed as a physical measure (for example, in centimeters), or in an image-relative measure (for example, in pixels or as a percentage of image width). A conversion between the two forms is possible when the size of the images as displayed is known. Depth, as perceived in a stereoscopic presentation, can be determined from parallax. However, depth is trigonometrically, not linearly, related to disparity and parallax. Depth is shown by the apparent distance (D) in FIG. 1, as discussed below, and can in theory range from zero to infinity.
The three measures of parallax, disparity, and depth are, as previously explained, mutually derivable. All three are considered to be measures of "depth", in a broad sense, and are treated in this application as "depth indicators". These three measures are considered to be interchangeable for purposes of this application, unless otherwise required. We provide the following brief examples of how these three measures correspond in Table 1 below.
[Table 1: brief examples of corresponding parallax, disparity, and depth values.]
FIG. 1 shows a situation 100 where a viewer 110 is watching a stereoscopic presentation on a screen 120. The viewer 110 perceives a particular object 130 whose apparent distance (D) from the viewer results from parallax (Θ), which is induced by the combination of physical disparity (dP), viewing distance (V), and the viewer's interocular spacing (tE).
For simplicity of explanation and illustration, the situation 100 is shown with left and right sightlines 131, 132 forming a right triangle with a line between the viewer's left and right eyes 111, 112, the line having a length of (tE). Further, and again for simplicity, that line is considered to be parallel to the screen 120.
In the example situation 100, the physical disparity (dP) is, by some conventions and herein, considered to be negative. The physical disparity (dP) is negative whenever the left-eye image of the object is to the left of the right-eye image of the object. The parallax angle (Θ), in the situation 100, is positive, and is positive for all values of (dP) greater than (-tE). Parallax (Θ) will be zero (not illustrated in FIG. 1) when the sight-lines 131 and 132 are parallel, in which case object 130 would appear to be at an infinite distance (D = ∞). This would be the case if the physical disparity (dP) of the object, as displayed on the screen 120, were negative and equal in magnitude to the viewer's interocular distance (that is, dP is equal to -tE).
There are cases where the parallax (Θ) can be displayed, by presenting a stereoscopic image pair in which the left- and right-eye images of the object 130 have a disparity of less than (-tE). In these circumstances, the parallax becomes negative, at which point the presentation of the object 130 by the screen 120 ceases to have a meaningful interpretation. In such a case, geometrically, the sightlines 131, 132 would intersect behind the viewer 110, but the images of the object 130 still appear on the screen 120. This produces a contradiction that the human visual system typically does not enjoy.
As long as the physical disparity is at least (-tE), corresponding to a parallax (Θ) of at least zero: when the parallax becomes more positive, that is, the left eye turns further to the right and/or the right eye turns further to the left when viewing the object 130, the object 130 appears to come closer to the viewer 110. As the parallax becomes less positive, the object 130 appears further away. In the case for which disparity is zero, the object appears to reside at the same distance from the viewer as the screen (D = V).
The relationship between interocular spacing (tE), distance (V) from the viewer to the screen, physical disparity (dP), and an object's apparent distance (D) from the viewer can be described in an equation obtained by inspecting the similar triangles of FIG. 1 (and recalling that the sign of dP in situation 100 is negative):
EQ. 1: dP = -tE (D - V)/D
Which, solved for D, gives: EQ. 2: D = tEV/(dP + tE)
Recall that in the convention from above, positive physical disparity dP places the left-eye image of an object rightward of the corresponding right-eye image of the object. As a result, at least five interesting conditions occur with EQ. 2:
1. When dP is positive, then (dP + tE) is greater than tE, and D will be less than V. That is, an object displayed with positive physical disparity dP will appear to be closer to the viewer than the screen. For example, if tE = dP, D will equal V/2, resulting in the object appearing halfway between the viewer and the screen.
2. If dP is zero, then D will equal V. That is, an object with zero disparity will appear at the screen.
3. When dP is negative, but of smaller magnitude than tE, as shown in the situation 100, then D will be greater than V. That is, the object will appear behind the screen.
4. If dP is equal to -tE, then the object will appear to be at infinity.
5. If dP is more negative than -tE, then the contradiction mentioned above in conjunction with negative parallax occurs. That is, EQ. 1 indicates that D is negative, which suggests that the object appears behind the viewer, even as the images of the object appear in front of the viewer on the screen 120. Because humans do not see objects behind them, such a presentation can result in a perceptual conflict that should typically be avoided.
Physical disparity (dP) is proportional to the disparity (di) intrinsic to the images of a stereoscopic image pair and the size of the screen (S). As the size (S) of the screen grows larger (and with it the size of the stereoscopic images), a particular intrinsic disparity (di) will produce a larger physical disparity (dP):
EQ. 3: dP = di × S, where di is expressed as a fraction of image width. Combining this with EQ. 1 yields:
EQ. 4: D = tEV/(diS + tE)
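As a quick numerical illustration (the code, its variable names, and the 6.5 cm / 3 m default values are our assumptions, not the patent's), EQ. 2 and EQ. 4 can be evaluated directly:

```python
def apparent_distance(d_p, t_e, v):
    """EQ. 2: D = tE * V / (dP + tE), all in the same physical units.

    d_p is negative when the left-eye image is left of the right-eye
    image; a negative result corresponds to the perceptual conflict
    discussed for condition 5 above.
    """
    if d_p == -t_e:
        return float("inf")  # condition 4: the object appears at infinity
    return t_e * v / (d_p + t_e)

def apparent_distance_intrinsic(d_i, s, t_e, v):
    """EQ. 4: dP = di * S substituted into EQ. 2 (di is a fraction of width)."""
    return apparent_distance(d_i * s, t_e, v)

t_e, v = 6.5, 300.0                      # assumed: 6.5 cm eyes, 3 m viewing
print(apparent_distance(t_e, t_e, v))    # condition 1: dP = tE -> D = V/2
print(apparent_distance(0.0, t_e, v))    # condition 2: dP = 0  -> D = V
print(apparent_distance(-3.0, t_e, v))   # condition 3: D > V (behind screen)
```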
Thus, for a particular stereoscopic image pair showing an object with an intrinsic disparity (di), the apparent distance (D) of the object from the viewer is altered by varying the viewer's distance from the screen (V) and/or the size of the screen (S). Additionally, while interocular spacing (tE) is essentially constant for an individual over long spans of time, individuals within a population may differ substantially, especially if comparing adults to children.
From the above discussion and equations, it should be clear (i) that all of these measures can be calculated from the others, and (ii) how parallax, disparity, and apparent distance are all interrelated. In some embodiments, these equations may be employed using predetermined "default" values for certain elements.
Examples of elements for which default values may be used include, for example, interocular distance (tE), screen size (S), and/or viewing distance (V). The use of default values can allow other values (for example, distance D) to be presented in a contingent fashion, in which final values might depend upon how actual values differ from the default assumptions.
It is useful to note that the discussion of parallax with respect to FIG. 1 is an approximation. The calculations are made easier, and the number of variables made fewer, by assuming that one sight line is perpendicular. The perpendicular sight line results in a right angle at the right eye 112 between the sight line 132 and the interocular spacing tE. It is a further approximation to assume that the head is facing the screen, which results in the baseline of the eyes and the plane of the screen being parallel (that is, the eyes are equidistant from the screen). The head facing the screen squarely is typically a good assumption, because this would be the center case when considering what might go wrong in a viewing situation. As such, any head turning could result in a deviation in either direction, with square-facing being the median condition. When tE is small compared to V, theta is typically small, and the approximation is generally considered acceptable because both sin() and tan() are fairly linear in that region.
The three measures being discussed each have advantages. Parallax is typically considered to be closest to how the human brain actually perceives 3D. Depth is typically considered to be most descriptive of an actual scene. Disparity exactly describes what is happening at the screen. Many variables, such as, for example, a head rotating, a viewer moving closer to or further from a screen, or screen size changing, can affect depth and parallax, but do not impact disparity (at least not disparity as a proportion of the image size). It should also be clear that disparity can be directly controlled and modified. Note that implementations discussed in this application may refer to reducing parallax values. That language is frequently used because, at least, offensive parallax values often correspond to a 3D effect in which an object appears closer to the viewer than the screen. Such situations involve a parallax value that is too large for comfort or safety. It should be understood, however, that all of the implementations can be used to increase parallax values if, for example, the parallax is considered to be too small. Frequently, an implementation will provide safety and/or comfort levels for both hyperdivergence and hyperconvergence.
We now describe a typical situation in which conversion from a large format to a small format can create a problem for stereoscopic content. A large format image frame (such as an Imax® asset, for example) has a composition that is typically very different from theatrical and home viewing compositions. Typical conditions include a very large screen, and a very high ratio between the width of the screen and the average distance of the viewer to the screen. As a result, the viewer typically only actively perceives a very small portion of the image, often located in the lower third of the screen. The rest of the frame is displayed, for example, to immerse the viewer in the image, filling the viewer's peripheral vision.
If such an image were scaled down to entirely fit on a High Definition Television (HDTV), the reduction ratio would be such that in many scenes, people's faces and most of the action would be too small to see. To alleviate this problem, post production companies typically crop a significant part of the source frame to present the asset on the smaller screen.
"Cropping", as used in this application, refers generally to selecting a portion of ί source picture and discarding (or ignoring) the remainder of the source picture, with the selected portion to be converted into a destination picture. The selectee portion is typically less than the entire source picture. The selected portion (rather than the unselected, or discarded, portion) is typically referred to as the cropped portion. The destination picture can have a different resolution, including a different aspect ratio, than the source picture. The cropped portion is typically scaled to adapt the cropped portion to the resolution of the destination picture. Thus, cropping and scaling are generally referred to as separate processes, although various implementations combine the two operations of cropping (selecting) and scaling.
This method generally works properly for non-stereoscopic (2D) content.
However, when dealing with stereoscopic assets, the method is often not practical, because the resulting disparity is often excessive, and does not meet home viewing specifications. This is due to the fact that disparity (sometimes referred to as parallax, in a general sense) is often expressed as a percentage of a screen's width, and this value (expressible, for example, as a ratio of disparity over screen width) will change if the image is simply cropped instead of being scaled down.
For example, consider an 8k large screen source frame, which has 8192 pixels of width. Assume this asset has a maximum negative disparity (sometimes referred to as parallax, in a general sense) value of 1% of screen width. This means an absolute negative disparity value of 8192 x 0.01 = 82 pixels. Scaling down the image to a 2K frame, at 2048 pixels of width, would give us a maximum negative disparity value of 2048 x 0.01 = 20 pixels.
However, if a 2K window is cropped from the 8k image, then there will be no scaling, and the final maximum negative disparity value will be 82 pixels. This value of 82 pixels exceeds, by a factor of 4, the maximum value of 20 pixels. This situation is depicted in FIG. 2. FIG. 2 includes an 8k (8192 pixels wide) picture 200 that includes a 2k (2048 pixels wide) window 210. The window 210 shows a tree 220. If the window 210 is used as a cropping window to generate an output 2k picture, then no scaling will be needed. As a result, the 2048 columns of pixels in the window 210 will become the pixels of the output 2k picture. Accordingly, the disparity present in, for example, the tree 220 will be completely preserved. Therefore, if the disparity present in the tree 220 in the 8k picture 200 is 82 pixels, then the disparity present in the tree 220 in the output 2k picture will also be 82 pixels.

In order to alleviate this problem, post production companies typically perform a manual combination of cropping and scaling. Unfortunately, because such cropping and scaling is a manual operation, the result is that a high percentage of pictures (or shots) typically still exceed the maximum thresholds for disparity. Consequently, the pictures (or shots) that exceed the maximum thresholds will generally be reformatted, until a final result is attained that meets the thresholds. This often results in an iterative and time-consuming process.
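The crop-versus-scale arithmetic of this example can be summarized in a short sketch (illustrative only; the variable names are not from the original text):

source_width = 8192        # 8k source frame width, in pixels
dest_width = 2048          # 2k destination frame width, in pixels
disparity_fraction = 0.01  # maximum negative disparity: 1% of screen width

# Scaling the whole frame down keeps disparity at 1% of the new width:
scaled_disparity_px = round(dest_width * disparity_fraction)    # 20 pixels

# Cropping a 2k window out of the 8k frame involves no scaling, so the
# absolute pixel disparity of the source survives unchanged:
cropped_disparity_px = round(source_width * disparity_fraction) # 82 pixels

assert cropped_disparity_px > 4 * scaled_disparity_px  # exceeds the limit by a factor of about 4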
We now describe an implementation associated with FIGS. 3-4. FIG. 3 provides a pictorial diagram depicting an example of a large format picture that includes a maximum crop window and a pan and scan window. FIG. 4 provides a pictorial diagram depicting an example of the pan and scan window of FIG. 3 being scaled down to a smaller format screen.
Referring to FIG. 3, a large format picture is shown as a large screen source picture 310. Upon loading the large screen source picture 310, a measure of maximum observed negative and positive disparity is calculated for the picture 310. The disparity calculations are performed using, for example, a disparity estimator, feature point analysis, or any other suitable method, as is known in the art.
Next, a Minimum Reduction Ratio (MRR) for the picture 310, or for an entire shot that includes the picture 310, is calculated using the following formulas:

MRR (negative) = Source Maximum Negative Disparity (calculated) / Destination Maximum Negative Disparity (specified)

MRR (positive) = Source Maximum Positive Disparity (calculated) / Destination Maximum Positive Disparity (specified)

MRR = Largest of MRR (negative) and MRR (positive)
For example, assume that the source has an observed maximum negative disparity of 60 pixels, and the destination specification is 15 pixels. Then the MRR in negative disparity is 60 px / 15 px, producing an MRR (negative) of 4. Note that the specification is provided, for example, in various implementations, by a device manufacturer, by a content producer, or by a user's viewing preferences.
Continuing this example, assume that the source has an observed maximum positive disparity of 100 pixels, and the destination specification is 30 pixels. Then the MRR in positive disparity is 100 px / 30 px, producing an MRR (positive) of 3.33.
The MRR is the largest of MRR (negative) and MRR (positive). Therefore, in this case, the MRR is 4. This means that the source image has to be scaled down by a minimum factor of 4. A scaling down of at least 4 results in a minimum reduction in size of 75% because the image is reduced to no more than 25% of its original size.
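A minimal sketch of this MRR computation follows; the function and argument names are assumptions made for illustration:

def minimum_reduction_ratio(src_max_neg, dst_max_neg, src_max_pos, dst_max_pos):
    """MRR per the formulas above; all disparity values are in pixels.

    Source values are measured or calculated from the content; destination
    values come from a specification (for example, a device manufacturer,
    a content producer, or a user's viewing preferences).
    """
    mrr_negative = src_max_neg / dst_max_neg
    mrr_positive = src_max_pos / dst_max_pos
    return max(mrr_negative, mrr_positive)

# The example above: MRR (negative) = 60/15 = 4, MRR (positive) = 100/30 = 3.33,
# so the source image must be scaled down by a minimum factor of 4:
mrr = minimum_reduction_ratio(60, 15, 100, 30)  # 4.0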
After the MRR has been calculated automatically for the picture 310, a Maximum Crop Window (MCW) 320 appears on the screen, shown in dashed lines in FIG. 3, overlaying the picture 310. This window 320 is an indication of the smallest possible cropping size that still satisfies the MRR. That is, if the cropping size is less than the MCW 320, then the required scaling to achieve the destination resolution will be less than the MRR. The size of the MCW 320 is calculated in the following way:
MCW Width = Source Frame Width / MRR
MCW Height = MCW Width / Destination Aspect Ratio = Source Frame Width / MRR / Destination Aspect Ratio
For example, given an MRR of 2, a source frame that is 8192 pixels wide, and a destination aspect ratio of 1.78, the dimensions of the MCW 320 will be:
MCW Width = 8192 / 2 = 4096 pixels wide
MCW Height = 4096 / 1.78 = 2301 pixels high
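The MCW computation can be sketched as follows (illustrative only; the names are assumptions):

def mcw_dimensions(source_frame_width, mrr, dest_aspect_ratio):
    """Maximum Crop Window size for the implementation of FIGS. 3-4."""
    mcw_width = source_frame_width / mrr
    mcw_height = mcw_width / dest_aspect_ratio
    return round(mcw_width), round(mcw_height)

# The example above: an 8192-pixel-wide source, an MRR of 2, and a 1.78
# destination aspect ratio give a 4096 x 2301 window:
assert mcw_dimensions(8192, 2, 1.78) == (4096, 2301)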
The MCW 320 is then locked in the pan and scan software used by an operator that is performing a conversion of the source content to a new destination resolution. This prevents the operator from selecting a final cropped window that is smaller than the MCW 320, which would result in a final picture (or a final shot) with excessive disparity on the destination device.
The operator then selects the final, desired Pan and Scan Window (PSW) 330 for the picture. The PSW 330 can be static or dynamic (for example, one dynamic implementation changes the location and/or size of the PSW 330 in each picture), but it is automatically locked to be at least as large as the MCW 320. This lock ensures that the final picture in the destination resolution does not exceed the destination device's parallax or disparity specifications. FIG. 3 shows this lock by depicting the MCW 320 as being smaller than the PSW 330, with the arrows in FIG. 3 indicating that the PSW 330 has a range of allowable sizes, but that all allowable sizes are at least as large as the MCW 320.

The final operation is to apply the proper scale down ratio to the PSW 330 so that the final picture (or, in various implementations, the final shot) is in the destination's resolution and aspect ratio. This is depicted in FIG. 4 by showing the PSW 330 being scaled down into a final picture 340. The arrows of FIG. 4 indicate the scaling down, which is performed using the following formula:

Scale Down Ratio = PSW Width / Final Resolution Width
For example, assume that the PSW 330 is 4096 pixels wide and the final picture 340 has a Final Resolution Width of 2048 pixels. Then the Scale Down Ratio will be 4096 / 2048 = 2. Accordingly, the image will first be cropped to the PSW's specifications. Then the PSW 330 will be scaled down by a factor of 2, reducing the PSW 330 to 50% of the original size of the PSW 330.
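A sketch of this final cropping-and-scaling step, including the lock that keeps the PSW at least as large as the MCW, is given below. The names and the error-handling style are assumptions made for illustration; the original does not prescribe an implementation.

def scale_down_ratio(psw_width, final_resolution_width):
    """Scale Down Ratio = PSW Width / Final Resolution Width."""
    return psw_width / final_resolution_width

def check_psw(psw_width, psw_height, mcw_width, mcw_height):
    """The pan-and-scan lock: reject any PSW smaller than the MCW."""
    if psw_width < mcw_width or psw_height < mcw_height:
        raise ValueError("PSW must be at least as large as the MCW")

# The example above: a 4096-pixel-wide PSW scaled to a 2048-pixel-wide picture.
check_psw(4096, 2301, mcw_width=4096, mcw_height=2301)
ratio = scale_down_ratio(4096, 2048)  # 2.0, reducing the PSW to 50% of its size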
Although not necessarily evident from FIG. 4, the final picture 340 will typically be smaller than the MCW 320. This is because typical implementations will have an MRR that is greater than 1, which has the effect of scaling the PSW 330 down to a result that is smaller than the MCW 320.
Referring to FIGS. 5-6, we now describe the above implementation using pictures that have an object in them. FIG. 5 provides a pictorial diagram depicting an example of an 8k picture overlaid with a maximum crop window and a pan and scan window. FIG. 6 provides a pictorial diagram depicting an example of the pan and scan window of FIG. 5 scaled down to a 2k picture.
FIG. 5 includes an 8k picture 510 that includes a tree 515. FIG. 5 also includes an MCW 520 that does not include the entire tree 515. The MCW 520 is 4096 pixels wide. If the MCW is calculated using the equation discussed above with respect to FIGS. 3-4, then the MCW is calculated as:
MCW Width = Source Frame Width / MRR
In such a case, the MRR can be solved for as Source Frame Width / MCW Width. Given that both of the variables that determine MRR are known from FIG. 5, the MRR is 2. FIG. 5 also includes a PSW 530 that does include the entire tree 515. The operator is presumed to have selected the PSW 530 to include the entire tree 515.

Referring to FIG. 6, the PSW 530 is scaled down, by at least the value of the MRR, to produce the final (destination) picture 540 having a width of 2048 pixels. It is clear that the PSW 530 is scaled down by more than the value of MRR, which is 2, but the exact scale down value is not known because the size of the PSW 530 is not indicated.
We now describe another implementation. This implementation calculates the MCW width differently than the implementation that is discussed above with respect to FIGS. 3-6. The size of the MCW is calculated in the following way:

MCW Width = MRR * Final Resolution Width
MCW Height = MCW Width / Destination Aspect Ratio
= MRR * Final Resolution Width / Destination Aspect Ratio
= MRR * Final Resolution Height

For example, given an MRR of 2, a source frame that is 8192 pixels wide, a destination (final) resolution that is 2048 pixels wide, and a destination aspect ratio of 1.78, the dimensions of the MCW will be:
MCW Width = 2 * 2048 = 4096 pixels wide
MCW Height = 4096 / 1.78 = 2301 pixels high

The scale down ratio to apply to the PSW is determined as in the previous implementation, for which:
Scale Down Ratio = PSW Width / Final Resolution Width
For example, assume that the PSW is 4096 pixels wide and the final picture has a Final Resolution Width of 2048 pixels. Then the Scale Down Ratio will be 4096 / 2048 = 2.
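Before continuing with further examples, this alternative MCW computation, which follows the destination resolution rather than the source width, can be sketched as (names are assumptions for illustration):

def mcw_dimensions_alt(mrr, final_resolution_width, dest_aspect_ratio):
    """Maximum Crop Window size for the alternative implementation."""
    mcw_width = mrr * final_resolution_width
    mcw_height = mcw_width / dest_aspect_ratio  # equals MRR * final height
    return round(mcw_width), round(mcw_height)

# The example above: an MRR of 2, a 2048-pixel-wide destination, and a 1.78
# aspect ratio again give a 4096 x 2301 window:
assert mcw_dimensions_alt(2, 2048, 1.78) == (4096, 2301)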
As another example, assume that the Final Resolution Width is 4096. Then:

MCW Width = 2 * 4096 = 8192,
PSW must be 8192, and
Scale Down Ratio = 8192 / 4096 = 2 (which is the same as the MRR).

This results in an example that has no pan and scan because the entire 8k picture is to be selected for scaling down.
As another example, assume that the Final Resolution Width is 3072. Then:

MCW Width = 2 * 3072 = 6144,
PSW = 7680 (can be any value equal to or greater than 6144), and
Scale Down Ratio = 7680 / 3072 = 2.5.

Referring now to FIG. 7, a flow diagram depicts an example of a process 700 for scaling a picture based on a disparity metric. A disparity metric is used in this application to refer, for example, to any variable or quantity that is based on, or derivable from, disparity. A disparity metric also includes any variable or quantity that is based on, or derivable from, a value that is itself derivable from or based on disparity, such as, for example, parallax or depth. Disparity metrics include, for example, an MRR (whether based on disparity or parallax) and an MCW size.
The process 700 includes accessing a source picture having a source resolution (710). The process 700 also includes accessing a destination resolution for a destination picture (720).
The process 700 includes determining a lower bound, based on a disparity metric associated with the destination resolution (730). In at least one implementation, the operation 730 includes determining a lower bound on a size of an area in the source picture that can be selected for use in generating the destination picture, the lower bound being based on a disparity metric associated with the destination resolution.
Various implementations determine a lower bound by determining a single dimension, with the second dimension being implicit. Additionally, various implementations provide a display of the lower bound in two dimensions (for example, an MCW). However, other implementations provide a display of only a single dimension, such as, for example, the width of the MCW.
The process 700 includes scaling a selected area (740). In at least one implementation, the operation 740 includes scaling a selected area of the source picture, the selected area being at least as large as the lower bound, wherein the destination picture is based on the scaled selected area.
In various implementations, the process 700 also includes an optional operation (not shown in FIG. 7) of receiving or accessing an indication of the selected area. The indication is provided, for example, by a user choosing an area on a screen that is displaying the source picture.

The process 700 does not specifically recite the operation of generating the destination picture. However, in various implementations, the scaled selected area is the destination picture exactly and has the destination resolution. In such implementations, the process 700 does generate the destination picture by virtue of generating the scaled selected area. In such implementations in which the destination picture is the scaled selected area, the destination picture is clearly based on the scaled selected area.
In other implementations, the process 700 further includes an optional operation (not shown in FIG. 7) of performing additional processing in order to generate the destination picture. Such additional processing can include, for example, one or more of (i) truncating part of the scaled selected area, (ii) padding the scaled selected area to increase the number of pixels, (iii) adapting the color or luminance of at least some of the pixels in the scaled selected area, or (iv) performing some other filtering operation on at least some of the pixel values in the scaled selected area.
Various implementations of the process 700 include one or more of the following additional features:
- the scaled selected area is processed to generate the destination picture,
- the disparity metric is further associated with the source picture,
- the disparity metric is a minimum reduction ratio,
- the minimum reduction ratio is based on (i) a maximum disparity for the source picture and (ii) a maximum disparity for the destination picture,
- the maximum disparity for the source picture is measured or calculated, and the maximum disparity for the destination picture is specified,
- the minimum reduction ratio is the ratio of the maximum disparity for the source picture over the maximum disparity for the destination picture,
- the lower bound is further based on the destination resolution,
- the lower bound is further based on the source resolution,
- the lower bound is expressed or displayed with respect to only one dimension,
- the dimension is horizontal,
- the lower bound is based on a ratio of the source resolution to a minimum reduction ratio,
- the lower bound is based on the destination resolution multiplied by a minimum reduction ratio,
- the source picture is displayed, the lower bound is displayed by being overlaid on the source picture, and receiving the selected area includes accepting input from a user identifying the selected area, and/or
- the destination picture is transmitted, encoded, and/or displayed.
Referring now to FIG. 8, a flow diagram depicts an example of a process 800 for scaling a picture. The process 800 includes displaying a source picture (810). In at least one implementation, the operation 810 includes displaying at least a portion of a source picture having a source resolution.

The process 800 includes identifying a minimum size for cropping (820). In at least one implementation, the operation 820 includes identifying, on a display of at least a portion of the source picture having the source resolution, a minimum size for cropping.
The process 800 includes accepting input identifying a selected area at least as large as the minimum size (830). In at least one implementation, the operation 830 includes accepting input identifying a selected area of the source picture for cropping, the selected area being at least as large as the identified minimum size. Input can identify the selected area by, for example, (i) indicating one or more coordinates of the selected area (for example, corner or center coordinates), (ii) selecting among a set of possible areas (for example, a pre-supplied list of selected areas), (iii) identifying one or more objects that are to be included in the selected area, and/or (iv) identifying a different picture from which the coordinates for the selected area are to be copied. Input can be accepted by, for example, (i) receiving input from a touch screen, a mouse, or a keyboard, and/or (ii) accessing input from a stored memory location (for example, using a default value for the input, accessing a stored profile for the input, or accessing input used and stored for another picture).
The process 800 includes scaling the selected area to form a scaled picture (840). The scaled selected area need not be a final destination picture.
However, in various implementations, the scaled picture is a destination picture having a destination resolution.
In various implementations, the process 800 ends without generating the destination picture. Conversely, in other implementations, the process 800 includes further processing of the scaled selected area to generate the destination picture. Such further processing is, in various implementations, for example, as described above with respect to the process 700.
Various implementations of the process 800 include one or more of the following additional features:

- displaying an entire source picture,
- identifying the minimum size includes displaying an outline of a rectangle over a display of at least a portion of the source picture, the rectangle indicating the minimum size for cropping,
- accepting input identifying the selected area includes allowing a user to adjust a size of a window to identify the selected area, and/or
- preventing the size of the window from being adjusted so that it is smaller than the identified minimum size (a minimal sketch of such a lock follows this list).
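One way such a lock might be enforced in pan and scan software is sketched below; the helper name and clamping behavior are hypothetical, since the original does not prescribe an implementation:

def adjust_window(requested_width, requested_height, min_width, min_height):
    """Clamp a user's requested crop-window size so it can never fall
    below the identified minimum size (for example, the MCW)."""
    return max(requested_width, min_width), max(requested_height, min_height)

# A user dragging the window down to 3000 x 1686 pixels is held at the
# 4096 x 2301 minimum from the earlier example:
width, height = adjust_window(3000, 1686, 4096, 2301)  # (4096, 2301)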
Referring to FIG. 9, a block diagram depicts an example of a system 900 for scaling a picture based on a disparity metric. The system 900 includes a processor 910 that is communicatively coupled to a display 920 for displaying, for example, digital pictures. Pictures are displayed, in various implementations, before, during, and/or after being processed by the processor 910. The display 920 is also able to send signals back to the processor 910 in the event, for example, that the display 920 is a touch-screen and provides input information from the touch-screen to the processor 910.
The processor 910 is also communicatively coupled to a user interface 930 for accepting input from, for example, an operator. The user interface 930 is also communicatively coupled to the display 920 to allow, for example, operator input to be displayed on the display 920 directly without intervention by the processor 910.
The processor 910 is communicatively coupled to a storage device 940, an encoder 950, and a transmitter 960. In typical implementations, the processor 910 provides digital pictures to one or more of the storage device 940 for storing the pictures, the encoder 950 for encoding of the pictures, or the transmitter 960 for transmitting the pictures.
The encoder 950 is also communicatively coupled to the storage device 940 and/or the transmitter 960. This allows, for example, encoded pictures from the encoder 950 to be (i) stored in the storage device 940 and/or (ii) transmitted using the transmitter 960.
In various implementations, the communication is two-way (not shown in FIG. 9) between the processor 910 on the one hand, and one or more of the storage device 940, the encoder 950, or the transmitter 960, on the other hand. Such two-way communication allows, for example, stored pictures to be retrieved by the processor 910, encoded pictures to be provided directly to the processor 910 without going through the storage device 940, and transmitted pictures also to be provided to the processor 910. Additionally, parameters or other information can be provided to the processor 910 from one or more of the storage device 940, the encoder 950, or the transmitter 960.
The system 900 is used, in various implementations, to perform the process 700, as well as any of the additional features described with respect to the process 700. In one such implementation:

- The processor 910 accesses a source picture having a source resolution (710) from the storage device 940.
- The processor 910 also accesses a destination resolution for a destination picture (720) from the user interface 930 or the storage device 940.
- The processor 910 determines a lower bound on a size of an area in the source picture that can be selected for use in generating the destination picture, the lower bound being based on a disparity metric associated with the destination resolution (730). The processor 910 uses, in various implementations, any of the equations described in this application for calculating an MCW as the lower bound.
- The processor 910 scales a selected area of the source picture, the selected area being at least as large as the lower bound, wherein the destination picture is based on the scaled selected area (740). In at least one implementation, the processor 910 receives the selected area from the user interface 930.
The system 900 is used, in various implementations, to perform the process 800, as well as any of the additional features described with respect to the process 800. In one such implementation:
- The processor 910 displays a source picture (810) on the display 920 after accessing the source picture from the storage device 940.
- The processor 910 identifies a minimum size for cropping (820).
- The processor 910 accepts input, from the user interface 930, identifying a selected area at least as large as the minimum size (830).
- The processor 910 scales the selected area to form a scaled picture (840).

The display 920 includes, in various implementations, one or more of a computer display, a laptop display, a tablet display, a cell phone display, a television display, or any of the other displays mentioned in this application or known in the art, including projected displays that may be visible on any surface, such as, for example, a wall, a ceiling, a floor, or a sidewalk.

The user interface 930 includes, in various implementations, one or more of a mouse, a track pad, a keyboard, a touch screen, a microphone for accepting voice commands that are interpreted by the processor 910, a remote control, a cell phone, a separate computer whether remote or local, or any other input device mentioned in this application or known in the art.
The storage device 940 includes, in various implementations, any of the storage devices mentioned in this application or known in the art.
The encoder 950 includes, in various implementations, an AVC or H.264 (as defined elsewhere in this application) encoder, an encoder for any other standard, or any other encoding device mentioned in this application or known in the art.
The transmitter 960 includes, in various implementations, an output pin of any integrated circuit, a Universal Asynchronous Receiver/Transmitter (UART), a broadcast transmitter, a satellite transmitter, a cable transmitter, or any other transmitting device mentioned in this application or known in the art. The transmitter 960 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding (which may alternatively, or additionally, be performed in the encoder 950), interleaving the data in the signal (which may alternatively, or additionally, be performed in the encoder 950), randomizing the energy in the signal, and modulating the signal onto one or more carriers using a modulator. The transmitter 960 may include, or interface with, an antenna (not shown). Further, implementations of the transmitter 960 may be limited to a modulator.
This application provides multiple figures, including the block diagram of FIG. 9, the flow diagrams of FIGS. 7-8, and the pictorial diagrams of FIGS. 1-6. Each of these figures provides disclosure for a variety of implementations.
- For example, the block diagram certainly describes an interconnection of functional blocks of an apparatus or system. However, it should also be clear that the block diagram provides a description of various process flows. As an example, FIG. 9 also presents a flow diagram for performing various processes that include the functions of the blocks of FIG. 9. For example, (i) the block for the encoder 950 also represents the operation of encoding pictures, (ii) the block for the transmitter 960 also represents the operation of transmitting pictures, and (iii) the interconnection between the encoder 950 and the transmitter 960 represents a process in which pictures are encoded and then transmitted. Other blocks of FIG. 9 are similarly interpreted in describing this flow process.
- For example, the flow diagram certainly describes a flow process. However, it should also be clear that the flow diagram provides an interconnection between functional blocks of a system or apparatus for performing the flow process. As an example, FIG. 7 also presents a block diagram for performing the functions of the process 700. For example, (i) reference element 720 also represents a block for performing the function of accessing a destination resolution, (ii) reference element 730 represents a block for performing the function of determining a lower bound, and (iii) the interconnection between elements 720 and 730 represents an apparatus having a component for accessing a destination resolution coupled to a component for determining a lower bound. Other blocks of FIG. 7 are similarly interpreted in describing this system/apparatus.
- For example, the pictorial diagrams of FIGS. 2-6 certainly describe output screens shown to a user. However, it should also be clear that the pictorial diagrams describe one or more flow processes for interacting with the user. For example, FIG. 4 also describes a process of accepting a PSW 330 from a user and then scaling the PSW 330 to generate the final output 340. Further, FIGS. 2-3 and 5-6 can also be interpreted in a similar fashion to describe respective flow processes.
We have thus provided a number of implementations. It should be noted that variations of the described implementations, as well as additional applications, are contemplated and are considered to be within our disclosure. Additionally, features and aspects of described implementations may be adapted for other implementations.
Several of the implementations refer to features that are automated or that are performed automatically. Variations of such implementations, however, are not automated and/or do not perform all or part of the features automatically.
Various implementations are described with respect to a picture. Many implementations perform one or more of the described processes on every picture in a series of pictures. Other implementations, however, apply more consistency across pictures that belong to the same shot, or scene, or movie, for example. It is often advantageous, for example, to apply the same cropping and scaling to every picture in a shot. Additionally, it is often advantageous for an operator to view an entire shot, for example, before selecting the pan and scan window for that shot or for an individual picture in the shot.

Various implementations calculate the maximum disparity of a source picture. Typically, the maximum disparity will be in a region of the source picture that an operator desires to preserve in the pan and scan window. However, if the maximum disparity is not in the pan and scan window, then the "effective" MRR will be different. Various implementations take this into consideration when creating the maximum crop window, and can vary the size of the maximum crop window depending on the location, selected by an operator, of the pan and scan window.
Several of the implementations and features described in this application may be used in the context of the AVC Standard, and/or AVC with the MVC (Multiview Video Coding) extension (Annex H), and/or AVC with the SVC (Scalable Video Coding) extension (Annex G). Additionally, these implementations and features may be used in the context of another standard (existing or future), or in a context that does not involve a standard. Note that AVC refers to the existing International Organization for Standardization/International Electrotechnical Commission ("ISO/IEC") Moving Picture Experts Group-4 ("MPEG-4") Part 10 Advanced Video Coding ("AVC") standard/International Telecommunication Union, Telecommunication Sector ("ITU-T") H.264 Recommendation (variously referred to throughout this document as the "H.264/MPEG-4 AVC Standard" or variations thereof, such as the "AVC standard", the "H.264 standard", or simply "AVC" or "H.264").
Reference to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" o "in an embodiment" or "in one implementation" or "in an implementation", as well any other variations, appearing in various places throughout the specification an not necessarily all referring to the same embodiment.
Additionally, this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting th information, or retrieving the information from memory.
Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example receiving the information, retrieving the information (for example, memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the
information, or estimating the information. Additionally, this application or its claims may refer to "receiving" various pieces of information. Receiving is, as with "accessing", intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
Further, "receiving" is typically involved, in one way or another, during operation: such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
Various implementations refer to "images" and/or "pictures". The terms "image" and "picture" are used interchangeably throughout this document, and are intended to be broad terms. An "image" or a "picture" may be, for example, all o part of a frame or of a field. The term "video" refers to a sequence of images (or pictures). An image, or a picture, may include, for example, any of various videc components or their combinations. Such components, or their combinations, include, for example, luminance, chrominance, Y (of YUV or YCbCr or YPbPr), I (of YUV), V (of YUV), Cb (of YCbCr), Cr (of YCbCr), Pb (of YPbPr), Pr (of YPbPr), red (of RGB), green (of RGB), blue (of RGB), S-Video, and negatives oi positives of any of these components. An "image" or a "picture" may also, or alternatively, refer to various different types of content, including, for example, typical two-dimensional video, a disparity map for a 2D video picture, a depth map that corresponds to a 2D video picture, or an edge map.
Further, many implementations may refer to a "frame". However, such implementations are assumed to be equally applicable to a "picture" or "image".
A "depth map", or "disparity map", or "edge map", or similar terms are also intended to be broad terms. A map generally refers, for example, to a picture that includes a particular type of information. However, a map may include othe types of information not indicated by its name. For example, a depth map typically includes depth information, but may also include other information such as, for example, video or edge information. This application refers to "encoders" and "decoders" in a variety of
implementations. It should be clear that an encoder can include, for example, one or more (or no) source encoders and/or one or more (or no) channel encoders, as well as one or more (or no) modulators. Similarly, it should be dec that a decoder can include, for example, one or more (or no) modulators as well as one or more (or no) channel encoders and/or one or more (or no) source encoders.
It is to be appreciated that the use of any of the following "/", "and/or", and "at least one of", for example, in the cases of "A/B", "A and/or B" and "at least one of A and B", is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of "A, B, and/or C" and "at least one of A, B, and C" and "at least one of A, B, or C", such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
Additionally, many implementations may be implemented in a processor, such as, for example, a post-processor or a pre-processor. The processors discussed in this application do, in various implementations, include multiple processors (sub-processors) that are collectively configured to perform, for example, a process, a function, or an operation. For example, the processor 910, as well as other processing components such as, for example, the encoder 950 and the transmitter 960, are, in various implementations, composed of multiple sub-processors that are collectively configured to perform the operations of that component.

The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
Processors also include communication devices, such as, for example, computers, cell phones, tablets, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end- users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor, a pre-processor, a video coder, a video decoder, a video codec, a web server, a television, a set-top box, a router, a gateway, a modem, a laptop, a personal computer, a tablet, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier, or other storage device such as, for example, a hard disk, a compact diskette ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc, or a Blu-ray disc), a random access memory ("RAM"), a read-only memory ("ROM"), a USB thumb drive, or some other storage device. The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading syntax, or to carry as data the actual syntax-values generated using the syntax rules. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.

Claims:
1. A method comprising:
accessing a source picture having a source resolution;
accessing a destination resolution for a destination picture;
determining a lower bound on a size of an area in the source picture that can be selected for use in generating the destination picture, the lower bound being based on a disparity metric associated with the destination resolution; and
scaling a selected area of the source picture, the selected area being at least as large as the lower bound, wherein the destination picture is based on the scaled selected area.
2. The method of claim 1 further comprising processing the scaled selected area to generate the destination picture.
3. The method of any of claims 1 or 2 wherein the disparity metric is further associated with the source picture.
4. The method of any of claims 1 or 2 wherein the disparity metric is a minimum reduction ratio.
5. The method of claim 4 wherein the minimum reduction ratio is based on (i) a maximum disparity for the source picture and (ii) a maximum disparity for the destination picture.
6. The method of claim 5 wherein:
the maximum disparity for the source picture is measured or calculated, and
the maximum disparity for the destination picture is specified.
7. The method of any of claims 5 or 6 wherein the minimum reduction ratio is the ratio of the maximum disparity for the source picture over the maximum disparity for the destination picture.
8. The method of claim 1 wherein the lower bound is further based on the destination resolution.
9. The method of claim 1 wherein the lower bound is further based on the source resolution.
10. The method of claim 1 wherein the lower bound is expressed or displayed with respect to only one dimension.
11. The method of claim 10 wherein the dimension is horizontal.
12. The method of claim 1 wherein the lower bound is based on a ratio of the source resolution to a minimum reduction ratio.
13. The method of claim 1 wherein the lower bound is based on the destination resolution multiplied by a minimum reduction ratio.
14. The method of claim 1 further comprising:
displaying the source picture; and
displaying the lower bound overlaid on the source picture,
wherein receiving the selected area comprises accepting input from a user identifying the selected area.
15. The method of claim 1 further comprising one or more of:
transmitting the destination picture,
encoding the destination picture, or
displaying the destination picture.
16. An apparatus comprising:
means for accessing a source picture having a source resolution;
means for accessing a destination resolution for a destination picture;
means for determining a lower bound on a size of an area in the source picture that can be selected for use in generating the destination picture, the lower bound being based on a disparity metric associated with the destination resolution; and
means for scaling a selected area of the source picture, the selected area being at least as large as the lower bound, wherein the destination picture is based on the scaled selected area.
17. An apparatus comprising one or more processors collectively configured to perform:
accessing a source picture having a source resolution;
accessing a destination resolution for a destination picture;
determining a lower bound on a size of an area in the source picture that can be selected for use in generating the destination picture, the lower bound being based on a disparity metric associated with the destination resolution; and
scaling a selected area of the source picture, the selected area being at least as large as the lower bound, wherein the destination picture is based on the scaled selected area.
18. A processor readable medium having stored thereon instructions for causing one or more processors to collectively perform:
accessing a source picture having a source resolution;
accessing a destination resolution for a destination picture;
determining a lower bound on a size of an area in the source picture that can be selected for use in generating the destination picture, the lower bound being based on a disparity metric associated with the destination resolution; and
scaling a selected area of the source picture, the selected area being at least as large as the lower bound, wherein the destination picture is based on the scaled selected area.
19. A method comprising:
displaying at least a portion of a source picture having a source resolution;
identifying, on the display of at least the portion, a minimum size for cropping;
accepting input identifying a selected area of the source picture for cropping, the selected area being at least as large as the identified minimum size; and
scaling the selected area to form a scaled picture.
20. The method of claim 19 wherein displaying at least the portion of the source picture comprises displaying an entire source picture.
21. The method of claim 19 wherein identifying the minimum size comprises displaying an outline of a rectangle over the display of at least the portion, the rectangle indicating the minimum size for cropping.
22. The method of claim 19 wherein accepting input identifying the selected area comprises allowing a user to adjust a size of a window to identify the selected area.
23. The method of claim 22 further comprising preventing the size of the window from being adjusted so that it is smaller than the identified minimum size.
PCT/US2013/041286 2012-05-17 2013-05-16 Adapting large format stereoscopic content to other platforms WO2013173548A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261688587P 2012-05-17 2012-05-17
US61/688,587 2012-05-17

Publications (2)

Publication Number Publication Date
WO2013173548A2 true WO2013173548A2 (en) 2013-11-21
WO2013173548A3 WO2013173548A3 (en) 2014-02-27

Family

ID=48626599

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/041286 WO2013173548A2 (en) 2012-05-17 2013-05-16 Adapting large format stereoscopic content to other platforms

Country Status (1)

Country Link
WO (1) WO2013173548A2 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004349736A (en) * 2003-05-08 2004-12-09 Sharp Corp Apparatus and program for stereoscopic image processing, and recording medium recorded the program
US20110050857A1 (en) * 2009-09-03 2011-03-03 Electronics And Telecommunications Research Institute Apparatus and method for displaying 3d image in 3d image system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015100490A1 (en) * 2014-01-06 2015-07-09 Sensio Technologies Inc. Reconfiguration of stereoscopic content and distribution for stereoscopic content in a configuration suited for a remote viewing environment
CN112684966A (en) * 2017-12-13 2021-04-20 创新先进技术有限公司 Picture scaling method and device and electronic equipment
WO2020000393A1 (en) * 2018-06-29 2020-01-02 深圳市大疆创新科技有限公司 Image processing method and apparatus, first electronic device, and image processing system
CN111448587A (en) * 2018-08-16 2020-07-24 华为技术有限公司 Display method, uploading method and device of advertisement pictures
CN111448587B (en) * 2018-08-16 2023-11-10 花瓣云科技有限公司 Advertisement picture display method, advertisement picture uploading method and advertisement picture uploading device
CN110314376A (en) * 2019-06-04 2019-10-11 福建天晴数码有限公司 The adaptation method and computer readable storage medium of background picture and background light efficiency
WO2022089076A1 (en) * 2020-10-29 2022-05-05 海信视像科技股份有限公司 Display device, mobile terminal and picture synchronous scaling method
US20220181001A1 (en) * 2020-11-17 2022-06-09 Trumpf Medizin Systeme Gmbh + Co. Kg Operating room control and communication system

Also Published As

Publication number Publication date
WO2013173548A3 (en) 2014-02-27

Similar Documents

Publication Publication Date Title
US9986258B2 (en) Efficient encoding of multiple views
KR101810845B1 (en) Scale-independent maps
Zinger et al. Free-viewpoint depth image based rendering
RU2554465C2 (en) Combination of 3d video and auxiliary data
JP5575778B2 (en) Method for processing disparity information contained in a signal
AU2011234163B2 (en) 3D disparity maps
WO2013173548A2 (en) Adapting large format stereoscopic content to other platforms
US9031356B2 (en) Applying perceptually correct 3D film noise
US20090284584A1 (en) Image processing device
US9501815B2 (en) Processing panoramic pictures
EP2553932B1 (en) Disparity value indications
EP2875636A1 (en) Metadata for depth filtering
JP6148154B2 (en) Image processing apparatus and image processing program
CN102447863A (en) Multi-view stereo video subtitle processing method
EP2982106B1 (en) Guided 3d display adaptation
RU2632404C2 (en) Depth signaling data
Patana et al. Adaptive 3D color anaglyph generation for printing
Ti et al. ROI-preserving 3D video compression method utilizing depth information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13729131

Country of ref document: EP

Kind code of ref document: A2

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112014025864

Country of ref document: BR

122 Ep: pct application non-entry in european phase

Ref document number: 13729131

Country of ref document: EP

Kind code of ref document: A2

ENP Entry into the national phase in:

Ref document number: 112014025864

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20141016