US20170163999A1 - Encoder decisions based on results of hash-based block matching - Google Patents
- Publication number
- US20170163999A1 (application US15/321,536)
- Authority
- US
- United States
- Prior art keywords
- hash
- block
- encoder
- picture
- blocks
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/43—Hardware specially adapted for motion estimation or compensation
- H04N19/433—Hardware specially adapted for motion estimation or compensation characterised by techniques for memory access
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Definitions
- Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form.
- a “codec” is an encoder/decoder system.
- Extensions to the H.265/HEVC standard are currently under development.
- a video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding.
- a video codec standard also provides details about the decoding operations a decoder should perform to achieve conforming results in decoding.
- various proprietary codec formats define other options for the syntax of an encoded video bitstream and corresponding decoding operations.
- video compression techniques include “intra-picture” compression and “inter-picture” compression.
- Intra-picture compression techniques compress individual pictures, and inter-picture compression techniques compress pictures with reference to a preceding and/or following picture (often called a reference or anchor picture) or pictures.
- Motion estimation is a process for estimating motion between pictures.
- an encoder using motion estimation attempts to match a current block of sample values in a current picture with a candidate block of the same size in a search area in another picture, the reference picture.
- a reference picture is, in general, a picture that contains sample values that may be used for prediction in the decoding process of other pictures.
- MV motion vector
- An MV is conventionally a two-dimensional value, having a horizontal MV component that indicates left or right spatial displacement and a vertical MV component that indicates up or down spatial displacement.
- motion compensation is a process of reconstructing pictures from reference picture(s) using motion data.
- An MV can indicate a spatial displacement in terms of an integer number of samples starting from a co-located position in a reference picture for a current block. For example, for a current block at position (32, 16) in a current picture, the MV (-3, 1) indicates position (29, 17) in the reference picture. Or, an MV can indicate a spatial displacement in terms of a fractional number of samples from a co-located position in a reference picture for a current block. For example, for a current block at position (32, 16) in a current picture, the MV (-3.5, 1.25) indicates position (28.5, 17.25) in the reference picture.
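The two worked examples above amount to adding the MV to the block position. A minimal sketch follows; the function name `referenced_position` is hypothetical and used only for illustration.

```python
def referenced_position(block_pos, mv):
    """Return the position in the reference picture that an MV points to.

    block_pos is the (x, y) position of the current block; mv is the
    (horizontal, vertical) displacement from the co-located position.
    """
    x, y = block_pos
    dx, dy = mv
    return (x + dx, y + dy)


# Integer-sample displacement: (32, 16) with MV (-3, 1)
print(referenced_position((32, 16), (-3, 1)))
# Fractional-sample displacement: (32, 16) with MV (-3.5, 1.25)
print(referenced_position((32, 16), (-3.5, 1.25)))
```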
- To determine sample values at fractional offsets in the reference picture, the encoder typically interpolates between sample values at integer-sample positions. Such interpolation can be computationally intensive. During motion compensation, a decoder also performs the interpolation as needed to compute sample values at fractional offsets in reference pictures.
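To make the cost of fractional offsets concrete, the sketch below computes one sample value at a fractional position. It uses bilinear interpolation, which is an assumption chosen for brevity; the text does not specify a filter, and practical codecs typically use longer filters. The name `sample_at` is hypothetical, and the position is assumed to lie inside the picture.

```python
import math


def sample_at(picture, x, y):
    """Bilinearly interpolate a sample value at fractional position (x, y).

    picture is a list of rows of sample values. Bilinear weights are a
    stand-in for whatever interpolation filter a real codec defines.
    """
    height, width = len(picture), len(picture[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    # Clamp the right/bottom neighbours at the picture boundary.
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    top = (1 - fx) * picture[y0][x0] + fx * picture[y0][x1]
    bottom = (1 - fx) * picture[y1][x0] + fx * picture[y1][x1]
    return (1 - fy) * top + fy * bottom


reference = [[0, 10],
             [20, 30]]
print(sample_at(reference, 0.5, 0.5))  # halfway between all four samples
```

Every fractional-offset sample needs several multiplications, whereas an integer offset is a plain array lookup.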
- When encoding a block using motion estimation and motion compensation, an encoder often computes the sample-by-sample differences (also called residual values or error values) between the sample values of the block and its motion-compensated prediction. The residual values may then be encoded. For the residual values, encoding efficiency depends on the complexity of the residual values and how much loss or distortion is introduced as part of the compression process. In general, a good motion-compensated prediction closely approximates a block, such that the residual values include few significant values, and the residual values can be efficiently encoded. On the other hand, a poor motion-compensated prediction often yields residual values that include many significant values, which are more difficult to encode efficiently. Encoders typically spend a large proportion of encoding time performing motion estimation, attempting to find good matches and thereby improve rate-distortion performance.
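The difference between a good and a poor prediction can be illustrated with a small sketch. The helper names and the `threshold` parameter are hypothetical; "significant" is taken here to mean an absolute residual above the threshold, which is an assumption.

```python
def residual(block, prediction):
    """Sample-by-sample differences between a block and its prediction."""
    return [[b - p for b, p in zip(block_row, pred_row)]
            for block_row, pred_row in zip(block, prediction)]


def count_significant(residual_values, threshold=0):
    """Count residual values whose magnitude exceeds the threshold."""
    return sum(1 for row in residual_values for v in row if abs(v) > threshold)


block = [[10, 12], [14, 16]]
good_prediction = [[10, 12], [13, 16]]
poor_prediction = [[0, 0], [0, 0]]

print(count_significant(residual(block, good_prediction)))
print(count_significant(residual(block, poor_prediction)))
```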
- With integer-sample MV precision, an MV component indicates an integer number of sample values for spatial displacement.
- With a fractional-sample MV precision, an MV component can indicate an integer number of sample values or fractional number of sample values for spatial displacement. For example, if the MV precision is 1/4-sample MV precision, an MV component can indicate a spatial displacement of 0 samples, 0.25 samples, 0.5 samples, 0.75 samples, 1.0 samples, 1.25 samples, and so on.
- With integer-sample MV precision, an encoder and decoder need not perform interpolation operations between sample values of reference pictures for motion compensation.
- With fractional-sample MV precision, an encoder and decoder perform interpolation operations between sample values of reference pictures for motion compensation (adding computational complexity), but motion-compensated predictions tend to more closely approximate blocks (leading to residual values with fewer significant values), compared to integer-sample MV precision.
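A sketch of how MV precision relates to interpolation, assuming (for illustration only) that MV components are stored as integers in units of 1/4 sample. The storage convention and both function names are hypothetical.

```python
def to_samples(mv_units, units_per_sample=4):
    """Convert MV components stored in 1/units_per_sample units to samples."""
    return tuple(c / units_per_sample for c in mv_units)


def needs_interpolation(mv_units, units_per_sample=4):
    """True if any MV component points between integer-sample positions."""
    return any(c % units_per_sample != 0 for c in mv_units)


# (-14, 5) in quarter-sample units is (-3.5, 1.25) samples: interpolation needed.
print(to_samples((-14, 5)), needs_interpolation((-14, 5)))
# (-12, 4) in quarter-sample units is (-3.0, 1.0) samples: plain lookup suffices.
print(to_samples((-12, 4)), needs_interpolation((-12, 4)))
```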
- Some video codec standards and formats support switching of MV precision during encoding. Encoder-side decisions about which MV precision to use are not made effectively, however, in certain encoding scenarios. In particular, such encoder-side decisions are not made effectively in various situations when encoding artificially-created video content such as screen capture content.
- multiple reference pictures are available at a given time for use for motion-compensated prediction.
- Such video codec standards/formats specify how to manage the multiple reference pictures. For example, reference pictures can be added or dropped automatically according to rules during video encoding and decoding. Or, parameters in a bitstream may indicate information about reference pictures used during video encoding and decoding.
- a reference picture set (“RPS”) is a set of reference pictures available for use in motion-compensated prediction at a given time.
- RPS reference picture set
- an RPS can be updated to add newly decoded pictures and remove older pictures that are no longer used as reference pictures.
- an RPS is updated during encoding and decoding, and syntax elements signaled in the bitstream indicate how to update the RPS.
- Encoder-side decisions about how to update an RPS are not made effectively in certain encoding scenarios, however. In particular, such decisions are not made effectively in various situations when encoding artificially-created video content such as screen capture content.
- a video encoder or video decoder can apply one or more filters to reconstructed sample values of pictures.
- deblock filtering and sample adaptive offset (“SAO”) filtering can be applied to reconstructed sample values.
- Deblock filtering tends to reduce blocking artifacts due to block-based coding, and is adaptively applied to sample values at block boundaries.
- SAO filtering is adaptively applied to sample values that satisfy certain conditions, such as presence of a gradient across the sample values.
- SAO filtering can be enabled or disabled for a sequence.
- SAO filtering can be enabled or disabled on a slice-by-slice basis for luma content of a slice and/or for chroma content of the slice.
- SAO filtering can also be enabled or disabled for blocks within a slice.
- SAO filtering can be enabled or disabled for coding tree blocks (“CTBs”) of a coding tree unit (“CTU”) in a slice, where a CTU typically includes a luma CTB and corresponding chroma CTBs.
- CTB coding tree blocks
- a type index indicates whether SAO filtering is disabled, uses band offsets, or uses edge offsets.
- If SAO filtering uses band offsets or edge offsets, additional syntax elements indicate parameters for the SAO filtering for the CTB.
- a CTB can reuse syntax elements from an adjacent CTB to control SAO filtering. In any event, when SAO filtering is used, it increases the computational complexity of encoding and decoding.
- the detailed description presents innovations in encoder-side decisions that use the results of hash-based block matching when setting parameters during encoding.
- some of the innovations relate to ways to select motion vector (“MV”) precision depending on the results of hash-based block matching.
- Other innovations relate to ways to selectively disable sample adaptive offset (“SAO”) filtering depending on the results of hash-based block matching.
- Still other innovations relate to ways to select which reference pictures to retain in a reference picture set (“RPS”) depending on the results of hash-based block matching.
- the innovations can provide computationally-efficient ways to set parameters during encoding of artificially-created video content such as screen capture content.
- a video encoder encodes video to produce encoded data and outputs the encoded data in a bitstream.
- the encoder determines an MV precision for a unit of the video based at least in part on the results of hash-based block matching.
- the unit can be a sequence, series of pictures between scene changes, group of pictures, picture, tile, slice, coding unit or other unit of video.
- the MV precision can be integer-sample precision, quarter-sample precision, or some other fractional-sample precision.
- the encoder splits the unit into multiple blocks. For a given block of the multiple blocks of the unit, the encoder determines a hash value for the given block, then determines whether there is a match for it among multiple candidate blocks of reference picture(s). The match can signify matching hash values between the given block and one of the multiple candidate blocks, which provides a fast result. Or, the match can further signify sample-by-sample matching between the given block and the one of the multiple candidate blocks, which is slower but may be more reliable. Then, for a non-matched block among the multiple blocks of the unit, the encoder can classify the non-matched block as containing natural video content or artificially-created video content. For example, when classifying the non-matched block, the encoder measures a number of different colors in the non-matched block, then compares the number of different colors to a threshold.
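The matching and classification steps described above can be sketched as follows. Several details are assumptions made for illustration: CRC-32 stands in for the hash function, sample values are 8-bit, candidate blocks are taken at every position of a single reference picture, and a block with few distinct colors is treated as artificially created. All function names and the `max_colors` threshold are hypothetical.

```python
import zlib
from collections import defaultdict


def block_at(picture, x, y, size):
    """Extract a size x size block with top-left corner (x, y)."""
    return tuple(tuple(row[x:x + size]) for row in picture[y:y + size])


def block_hash(block):
    """Hash a block of 8-bit sample values (CRC-32 is an assumed stand-in)."""
    return zlib.crc32(bytes(v for row in block for v in row))


def build_hash_table(reference, size):
    """Map hash value -> list of candidate block positions in the reference."""
    table = defaultdict(list)
    height, width = len(reference), len(reference[0])
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            table[block_hash(block_at(reference, x, y, size))].append((x, y))
    return table


def find_match(block, reference, table, size, verify=True):
    """Return a matching candidate position, or None.

    With verify=False a matching hash value is enough (fast); with
    verify=True the samples are also compared (slower, more reliable).
    """
    for x, y in table.get(block_hash(block), []):
        if not verify or block_at(reference, x, y, size) == block:
            return (x, y)
    return None


def looks_artificial(block, max_colors=4):
    """Classify a non-matched block by its number of different colors."""
    return len({v for row in block for v in row}) <= max_colors


reference = [[0, 0, 0, 0],
             [0, 10, 20, 0],
             [0, 30, 40, 0],
             [0, 0, 0, 0]]
table = build_hash_table(reference, 2)
print(find_match(((10, 20), (30, 40)), reference, table, 2))
print(find_match(((1, 2), (3, 4)), reference, table, 2))
```

Building the table once per reference picture lets each lookup avoid a sample-by-sample search over the whole search area.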
- an image encoder or video encoder encodes an image or video to produce encoded data, and outputs the encoded data in a bitstream.
- the encoder performs hash-based block matching for a current block of a current picture. Based on whether a condition is satisfied, the encoder determines whether to disable SAO filtering for the current block. Based on results of the determining, the encoder selectively disables SAO filtering for the current block. If SAO filtering is not disabled for the current block, the encoder can check one or more other conditions to decide whether to use SAO filtering for the current block and, if SAO filtering is used, determine parameters for SAO filtering for the current block.
- the condition depends on whether a match is found during the hash-based block matching for the current block.
- the condition can also depend on expected quality of the current block relative to quality of a candidate block for the match (e.g., as indicated by a quantization parameter (“QP”) value that applies for the current block and a QP value that applies for the candidate block, respectively).
- QP quantization parameter
- the encoder determines a hash value for the current block, then attempts to find the match for it among multiple candidate blocks of reference picture(s).
- the current block can be a coding tree block (“CTB”) of a coding tree unit (“CTU”), in which case SAO filtering is also selectively disabled for one or more other CTBs of the CTU.
- CTB coding tree block
- CTU coding tree unit
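The SAO decision described above can be sketched as a small predicate. The exact condition is an assumption: here SAO filtering is disabled when a hash match is found and the candidate block's QP is no higher than the current block's QP (that is, the candidate is expected to be at least as good in quality). The function names and the CTB labels are hypothetical.

```python
def should_disable_sao(match_found, current_qp, candidate_qp):
    """Decide whether to disable SAO filtering for the current block.

    Assumed condition: a match exists and the matched candidate block was
    coded with a QP no higher than the QP that applies to the current block.
    """
    if not match_found:
        return False
    return candidate_qp <= current_qp


def sao_decision_for_ctu(match_found, current_qp, candidate_qp,
                         ctb_names=("luma", "cb", "cr")):
    """Apply the same decision to every CTB of a CTU."""
    disable = should_disable_sao(match_found, current_qp, candidate_qp)
    state = "disabled" if disable else "check_other_conditions"
    return {name: state for name in ctb_names}


print(sao_decision_for_ctu(True, current_qp=30, candidate_qp=28))
print(sao_decision_for_ctu(True, current_qp=26, candidate_qp=28))
```

When the result is `check_other_conditions`, the encoder would still go on to decide whether to use SAO filtering and, if so, with which parameters.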
- a video encoder encodes video to produce encoded data and outputs the encoded data in a bitstream. As part of the encoding, the encoder determines which of multiple reference pictures to retain in an RPS based at least in part on results of hash-based block matching.
- For each of the multiple reference pictures, the encoder uses the hash-based block matching to estimate how well that reference picture predicts a next picture of a sequence.
- the encoder drops the reference picture that is expected to predict the next picture worse than the other reference pictures predict the next picture. For example, the encoder performs the hash-based block matching between blocks of the next picture and candidate blocks of a reference picture, where a count indicates how many of the blocks of the next picture have matching blocks in the reference picture. With this information, the encoder drops the reference picture having the lowest count.
- the multiple reference pictures can include one or more previous reference pictures previously in the RPS for encoding of a current picture.
- the multiple reference pictures can also include a current reference picture that is a reconstructed version of the current picture.
- For each of the previous reference picture(s), the encoder uses the hash-based block matching to estimate similarity to the current reference picture.
- the encoder drops one of the previous reference picture(s) that is estimated to be most similar to the current reference picture. For example, the encoder performs the hash-based block matching between blocks of the current reference picture and candidate blocks of a previous reference picture, where a count indicates how many of the blocks of the current reference picture have matching blocks in the previous reference picture. With this information, the encoder drops the previous reference picture having the highest count.
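Both RPS policies described above reduce to counting hash matches and then taking a minimum or a maximum. The sketch below assumes CRC-32 as the hash function, 8-bit samples, non-overlapping blocks in the picture being tested, candidate blocks at every position of each reference picture, and hash-only matching. All names are hypothetical.

```python
import zlib


def hash_block(picture, x, y, size):
    """CRC-32 (an assumed stand-in) of the size x size block at (x, y)."""
    data = bytes(v for row in picture[y:y + size] for v in row[x:x + size])
    return zlib.crc32(data)


def candidate_hashes(reference, size):
    """Hash values of candidate blocks at every position of a reference."""
    height, width = len(reference), len(reference[0])
    return {hash_block(reference, x, y, size)
            for y in range(height - size + 1)
            for x in range(width - size + 1)}


def match_count(picture, reference, size):
    """How many non-overlapping blocks of picture match in the reference."""
    candidates = candidate_hashes(reference, size)
    height, width = len(picture), len(picture[0])
    return sum(hash_block(picture, x, y, size) in candidates
               for y in range(0, height - size + 1, size)
               for x in range(0, width - size + 1, size))


def drop_worst_predictor(next_picture, references, size):
    """First policy: drop the reference with the lowest match count."""
    counts = {name: match_count(next_picture, ref, size)
              for name, ref in references.items()}
    return min(counts, key=counts.get)


def drop_most_redundant(current_reference, previous_references, size):
    """Second policy: drop the previous reference with the highest count."""
    counts = {name: match_count(current_reference, ref, size)
              for name, ref in previous_references.items()}
    return max(counts, key=counts.get)


picture = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
refs = {"A": [row[:] for row in picture],
        "B": [[100 + v for v in row] for row in picture]}
print(drop_worst_predictor(picture, refs, 2))
print(drop_most_redundant(picture, refs, 2))
```

The first policy keeps the references that best predict upcoming content, while the second avoids keeping a previous reference that largely duplicates the reconstructed current picture.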
- the innovations for encoder-side decisions can be implemented as part of a method, as part of a computing device adapted to perform the method, or as part of tangible computer-readable media storing computer-executable instructions for causing a computing device to perform the method.
- the various innovations can be used in combination or separately.
- any of the innovations for selecting MV precision can be used separately or in combination with any of the innovations for selectively disabling SAO filtering and/or any of the innovations for deciding which reference pictures to retain in an RPS.
- any of the innovations for selectively disabling SAO filtering can be used separately or in combination with any of the innovations for selecting MV precision and/or any of the innovations for deciding which reference pictures to retain in an RPS.
- any of the innovations for deciding which reference pictures to retain in an RPS can be used separately or in combination with any of the innovations for selectively disabling SAO filtering and/or any of the innovations for selecting MV precision.
- FIG. 1 is a diagram of an example computing system in which some described embodiments can be implemented.
- FIGS. 2 a and 2 b are diagrams of example network environments in which some described embodiments can be implemented.
- FIG. 3 is a diagram of an example encoder system in conjunction with which some described embodiments can be implemented.
- FIGS. 4 a and 4 b are diagrams illustrating an example video encoder in conjunction with which some described embodiments can be implemented.
- FIG. 5 is a diagram illustrating a computer desktop environment with content that may provide input for screen capture.
- FIG. 6 is a diagram illustrating composite video with natural video content and artificially-created video content.
- FIG. 7 is a table illustrating hash values for candidate blocks in hash-based block matching.
- FIGS. 8 a -8 c are tables illustrating example data structures that organize candidate blocks for hash-based block matching.
- FIGS. 9 a -9 c are tables illustrating example data structures that organize candidate blocks for iterative hash-based block matching.
- FIGS. 10 a and 10 b are diagrams illustrating motion compensation with MV values having an integer-sample spatial displacement and fractional-sample spatial displacement, respectively.
- FIGS. 11, 12 and 15 are flowcharts illustrating techniques for selecting MV precision depending on the results of hash-based block matching.
- FIG. 13 is a diagram illustrating characteristics of blocks of natural video content and blocks of screen capture content.
- FIG. 14 is a flowchart illustrating a generalized technique for classifying a block of video depending on a measure of the number of different colors in the block.
- FIGS. 16 and 17 are flowcharts illustrating techniques for selectively disabling SAO filtering depending on the results of hash-based block matching.
- FIG. 18 is a diagram illustrating updates to reference pictures of an RPS.
- FIGS. 19-21 are flowcharts illustrating techniques for deciding which reference pictures to retain in an RPS depending on the results of hash-based block matching.
- Screen capture content typically includes repeated structures (e.g., graphics, text characters).
- Screen capture content is usually encoded in a format (e.g., YUV 4:4:4 or RGB 4:4:4) with high chroma sampling resolution, although it may also be encoded in a format with lower chroma sampling resolution (e.g., YUV 4:2:0).
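The difference in chroma sampling resolution between the formats named above can be made concrete by computing chroma plane dimensions. This is a minimal sketch covering only 4:4:4 and 4:2:0; the function name is hypothetical, and odd dimensions are rounded up as an assumption.

```python
def chroma_plane_size(width, height, chroma_format):
    """Return (width, height) of each chroma plane for a luma plane size."""
    if chroma_format == "4:4:4":
        # Chroma is sampled at full luma resolution.
        return (width, height)
    if chroma_format == "4:2:0":
        # Chroma is halved horizontally and vertically (rounded up).
        return ((width + 1) // 2, (height + 1) // 2)
    raise ValueError(f"unsupported chroma format: {chroma_format}")


print(chroma_plane_size(1920, 1080, "4:4:4"))
print(chroma_plane_size(1920, 1080, "4:2:0"))
```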
- Common scenarios for encoding/decoding of screen capture content include remote desktop conferencing and encoding/decoding of graphical overlays on natural video or other “mixed-content” video.
- FIG. 1 illustrates a generalized example of a suitable computing system ( 100 ) in which several of the described innovations may be implemented.
- the computing system ( 100 ) is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in diverse general-purpose or special-purpose computing systems.
- the computing system ( 100 ) includes one or more processing units ( 110 , 115 ) and memory ( 120 , 125 ).
- the processing units ( 110 , 115 ) execute computer-executable instructions.
- a processing unit can be a general-purpose central processing unit (“CPU”), processor in an application-specific integrated circuit (“ASIC”) or any other type of processor.
- CPU central processing unit
- ASIC application-specific integrated circuit
- FIG. 1 shows a central processing unit ( 110 ) as well as a graphics processing unit or co-processing unit ( 115 ).
- the tangible memory ( 120 , 125 ) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing unit(s).
- the memory ( 120 , 125 ) stores software ( 180 ) implementing one or more innovations for encoder decisions based on the results of hash-based block matching (e.g., for selecting MV precision, for selectively disabling SAO filtering and/or for deciding which reference pictures to retain in an RPS), in the form of computer-executable instructions suitable for execution by the processing unit(s).
- a computing system may have additional features.
- the computing system ( 100 ) includes storage ( 140 ), one or more input devices ( 150 ), one or more output devices ( 160 ), and one or more communication connections ( 170 ).
- An interconnection mechanism such as a bus, controller, or network interconnects the components of the computing system ( 100 ).
- operating system software provides an operating environment for other software executing in the computing system ( 100 ), and coordinates activities of the components of the computing system ( 100 ).
- the tangible storage ( 140 ) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing system ( 100 ).
- the storage ( 140 ) stores instructions for the software ( 180 ) implementing one or more innovations for encoder decisions based on the results of hash-based block matching.
- the input device(s) ( 150 ) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing system ( 100 ).
- the input device(s) ( 150 ) may be a camera, video card, TV tuner card, screen capture module, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video input into the computing system ( 100 ).
- the output device(s) ( 160 ) may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system ( 100 ).
- the communication connection(s) ( 170 ) enable communication over a communication medium to another computing entity.
- the communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal.
- a modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media can use an electrical, optical, RF, or other carrier.
- Computer-readable media are any available tangible media that can be accessed within a computing environment.
- computer-readable media include memory ( 120 , 125 ), storage ( 140 ), and combinations of any of the above.
- program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
- the functionality of the program modules may be combined or split between program modules as desired in various embodiments.
- Computer-executable instructions for program modules may be executed within a local or distributed computing system.
- the terms “system” and “device” are used interchangeably herein. Unless the context clearly indicates otherwise, neither term implies any limitation on a type of computing system or computing device. In general, a computing system or computing device can be local or distributed, and can include any combination of special-purpose hardware and/or general-purpose hardware with software implementing the functionality described herein.
- the disclosed methods can also be implemented using specialized computing hardware configured to perform any of the disclosed methods.
- the disclosed methods can be implemented by an integrated circuit (e.g., an ASIC such as an ASIC digital signal processor (“DSP”), a graphics processing unit (“GPU”), or a programmable logic device (“PLD”) such as a field programmable gate array (“FPGA”)) specially designed or configured to implement any of the disclosed methods.
- FIGS. 2 a and 2 b show example network environments ( 201 , 202 ) that include video encoders ( 220 ) and video decoders ( 270 ).
- the encoders ( 220 ) and decoders ( 270 ) are connected over a network ( 250 ) using an appropriate communication protocol.
- the network ( 250 ) can include the Internet or another computer network.
- each real-time communication (“RTC”) tool ( 210 ) includes both an encoder ( 220 ) and a decoder ( 270 ) for bidirectional communication.
- a given encoder ( 220 ) can produce output compliant with a variation or extension of the H.265/HEVC standard, SMPTE 421M standard, ISO/IEC 14496-10 standard (also known as H.264 or AVC), another standard, or a proprietary format, with a corresponding decoder ( 270 ) accepting encoded data from the encoder ( 220 ).
- the bidirectional communication can be part of a video conference, video telephone call, or other two-party or multi-party communication scenario.
- although the network environment ( 201 ) in FIG. 2 a includes two real-time communication tools ( 210 ), the network environment ( 201 ) can instead include three or more real-time communication tools ( 210 ) that participate in multi-party communication.
- a real-time communication tool ( 210 ) manages encoding by an encoder ( 220 ).
- FIG. 3 shows an example encoder system ( 300 ) that can be included in the real-time communication tool ( 210 ).
- the real-time communication tool ( 210 ) uses another encoder system.
- a real-time communication tool ( 210 ) also manages decoding by a decoder ( 270 ).
- an encoding tool ( 212 ) includes an encoder ( 220 ) that encodes video for delivery to multiple playback tools ( 214 ), which include decoders ( 270 ).
- the unidirectional communication can be provided for a video surveillance system, web camera monitoring system, remote desktop conferencing presentation or other scenario in which video is encoded and sent from one location to one or more other locations.
- although the network environment ( 202 ) in FIG. 2 b includes two playback tools ( 214 ), the network environment ( 202 ) can include more or fewer playback tools ( 214 ).
- a playback tool ( 214 ) communicates with the encoding tool ( 212 ) to determine a stream of video for the playback tool ( 214 ) to receive.
- the playback tool ( 214 ) receives the stream, buffers the received encoded data for an appropriate period, and begins decoding and playback.
- FIG. 3 shows an example encoder system ( 300 ) that can be included in the encoding tool ( 212 ).
- the encoding tool ( 212 ) uses another encoder system.
- the encoding tool ( 212 ) can also include server-side controller logic for managing connections with one or more playback tools ( 214 ).
- a playback tool ( 214 ) can also include client-side controller logic for managing connections with the encoding tool ( 212 ).
- FIG. 3 is a block diagram of an example encoder system ( 300 ) in conjunction with which some described embodiments may be implemented.
- the encoder system ( 300 ) can be a general-purpose encoding tool capable of operating in any of multiple encoding modes such as a low-latency encoding mode for real-time communication, a transcoding mode, and a higher-latency encoding mode for producing media for playback from a file or stream, or it can be a special-purpose encoding tool adapted for one such encoding mode.
- the encoder system ( 300 ) can be adapted for encoding of a particular type of content (e.g., screen capture content).
- the encoder system ( 300 ) can be implemented as an operating system module, as part of an application library or as a standalone application. Overall, the encoder system ( 300 ) receives a sequence of source video frames ( 311 ) from a video source ( 310 ) and produces encoded data as output to a channel ( 390 ). The encoded data output to the channel can include content encoded using encoder-side decisions as described herein.
- the video source ( 310 ) can be a camera, tuner card, storage media, screen capture module, or other digital video source.
- the video source ( 310 ) produces a sequence of video frames at a frame rate of, for example, 30 frames per second.
- the term “frame” generally refers to source, coded or reconstructed image data.
- a frame is a progressive-scan video frame.
- an interlaced video frame might be de-interlaced prior to encoding.
- two complementary interlaced video fields are encoded together as a single video frame or encoded as two separately-encoded fields.
- the term “frame” or “picture” can indicate a single non-paired video field, a complementary pair of video fields, a video object plane that represents a video object at a given time, or a region of interest in a larger image.
- the video object plane or region can be part of a larger image that includes multiple objects or regions of a scene.
- An arriving source frame ( 311 ) is stored in a source frame temporary memory storage area ( 320 ) that includes multiple frame buffer storage areas ( 321 , 322 , . . . , 32 n ).
- a frame buffer ( 321 , 322 , etc.) holds one source frame in the source frame storage area ( 320 ).
- a frame selector ( 330 ) selects an individual source frame from the source frame storage area ( 320 ).
- the order in which frames are selected by the frame selector ( 330 ) for input to the encoder ( 340 ) may differ from the order in which the frames are produced by the video source ( 310 ), e.g., the encoding of some frames may be delayed in order, so as to allow some later frames to be encoded first and to thus facilitate temporally backward prediction.
- the encoder system ( 300 ) can include a pre-processor (not shown) that performs pre-processing (e.g., filtering) of the selected frame ( 331 ) before encoding.
- the pre-processing can include color space conversion into primary (e.g., luma) and secondary (e.g., chroma differences toward red and toward blue) components and resampling processing (e.g., to reduce the spatial resolution of chroma components) for encoding.
- YUV indicates any color space with a luma (or luminance) component and one or more chroma (or chrominance) components, including Y′UV, YIQ, Y′IQ and YDbDr as well as variations such as YCbCr and YCoCg.
- the chroma sample values may be sub-sampled to a lower chroma sampling rate (e.g., for YUV 4:2:0 format), or the chroma sample values may have the same resolution as the luma sample values (e.g., for YUV 4:4:4 format).
- the video can be encoded in another format (e.g., RGB 4:4:4 format, GBR 4:4:4 format or BGR 4:4:4 format).
- the encoder ( 340 ) encodes the selected frame ( 331 ) to produce a coded frame ( 341 ) and also produces memory management control operation (“MMCO”) signals ( 342 ) or reference picture set (“RPS”) information.
- the RPS is the set of frames that may be used for reference in motion compensation for a current frame or any subsequent frame. If the current frame is not the first frame that has been encoded, when performing its encoding process, the encoder ( 340 ) may use one or more previously encoded/decoded frames ( 369 ) that have been stored in a decoded frame temporary memory storage area ( 360 ). Such stored decoded frames ( 369 ) are used as reference frames for inter-frame prediction of the content of the current source frame ( 331 ).
- the MMCO/RPS information ( 342 ) indicates to a decoder which reconstructed frames may be used as reference frames, and hence should be stored in a frame storage area. Example ways to make decisions about which reference pictures to retain in an RPS are described below.
- the encoder ( 340 ) includes multiple encoding modules that perform encoding tasks such as partitioning into tiles, intra prediction estimation and prediction, motion estimation and compensation, frequency transforms, quantization and entropy coding.
- the exact operations performed by the encoder ( 340 ) can vary depending on compression format.
- the format of the output encoded data can be a variation or extension of H.265/HEVC format, Windows Media Video format, VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, H.264), or another format.
- the encoder ( 340 ) can partition a frame into multiple tiles of the same size or different sizes. For example, the encoder ( 340 ) splits the frame along tile rows and tile columns that, with frame boundaries, define horizontal and vertical boundaries of tiles within the frame, where each tile is a rectangular region. Tiles are often used to provide options for parallel processing.
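As a rough illustration of splitting a frame along tile rows and tile columns, the following Python sketch computes rectangular tiles of nearly equal size. The function names `tile_boundaries` and `tiles` are hypothetical, the near-uniform spacing is only one possible choice (tiles can also have different sizes), and the units are arbitrary (an actual encoder would typically place tile boundaries on CTB boundaries).

```python
def tile_boundaries(size, num_tiles):
    """Return num_tiles + 1 boundary positions that split `size` units
    as evenly as possible (assumed uniform spacing, for illustration)."""
    return [(i * size) // num_tiles for i in range(num_tiles + 1)]


def tiles(width, height, cols, rows):
    """Return (x, y, w, h) for each rectangular tile, in raster order."""
    xs = tile_boundaries(width, cols)    # vertical boundaries (tile columns)
    ys = tile_boundaries(height, rows)   # horizontal boundaries (tile rows)
    return [
        (xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r])
        for r in range(rows)
        for c in range(cols)
    ]
```

Because every tile is an independent rectangle, each tuple returned by `tiles` could be handed to a separate worker, which is the parallel-processing option mentioned above.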
- a frame can also be organized as one or more slices, where a slice can be an entire frame or region of the frame.
- a slice can be decoded independently of other slices in a frame, which improves error resilience.
- the content of a slice or tile is further partitioned into blocks or other sets of sample values for purposes of encoding and decoding.
- a coding tree unit (“CTU”) includes luma sample values organized as a luma coding tree block (“CTB”) and corresponding chroma sample values organized as two chroma CTBs.
- the size of a CTU (and its CTBs) is selected by the encoder, and can be, for example, 64×64, 32×32 or 16×16 sample values.
- a CTU includes one or more coding units.
- a coding unit (“CU”) has a luma coding block (“CB”) and two corresponding chroma CBs.
- a CTU with a 64×64 luma CTB and two 64×64 chroma CTBs can be split into four CUs, with each CU including a 32×32 luma CB and two 32×32 chroma CBs, and with each CU possibly being split further into smaller CUs.
- a CTU with a 64×64 luma CTB and two 32×32 chroma CTBs (YUV 4:2:0 format) can be split into four CUs, with each CU including a 32×32 luma CB and two 16×16 chroma CBs, and with each CU possibly being split further into smaller CUs.
- the smallest allowable size of CU (e.g., 8×8, 16×16) can be signaled in the bitstream.
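The CTU and CU sizes in the examples above follow from the chroma sampling format and from a quadtree split. A minimal Python sketch, assuming the hypothetical names `CHROMA_SHIFT`, `cb_sizes` and `split_cu` and square luma blocks, is:

```python
# Horizontal and vertical chroma subsampling shifts per format.
CHROMA_SHIFT = {"4:4:4": (0, 0), "4:2:2": (1, 0), "4:2:0": (1, 1)}


def cb_sizes(luma_size, chroma_format):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for a square luma block."""
    shift_x, shift_y = CHROMA_SHIFT[chroma_format]
    return (luma_size, luma_size), (luma_size >> shift_x, luma_size >> shift_y)


def split_cu(x, y, size):
    """Split a square block at (x, y) into four equal quadrants (x, y, size)."""
    half = size // 2
    return [
        (x, y, half),
        (x + half, y, half),
        (x, y + half, half),
        (x + half, y + half, half),
    ]
```

For example, `split_cu(0, 0, 64)` yields four 32×32 CUs, and `cb_sizes(32, "4:2:0")` gives a 32×32 luma CB with 16×16 chroma CBs, matching the YUV 4:2:0 case described above.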
- a CU has a prediction mode such as inter or intra.
- a CU includes one or more prediction units for purposes of signaling of prediction information (such as prediction mode details, displacement values, etc.) and/or prediction processing.
- a prediction unit (“PU”) has a luma prediction block (“PB”) and two chroma PBs.
- for an intra-predicted CU, the PU has the same size as the CU, unless the CU has the smallest size (e.g., 8×8). In that case, the CU can be split into four smaller PUs (e.g., each 4×4 if the smallest CU size is 8×8) or the PU can have the smallest CU size, as indicated by a syntax element for the CU.
- a CU also has one or more transform units for purposes of residual coding/decoding, where a transform unit (“TU”) has a luma transform block (“TB”) and two chroma TBs.
- a PU in an intra-predicted CU may contain a single TU (equal in size to the PU) or multiple TUs.
- the encoder decides how to partition video into CTUs, CUs, PUs, TUs, etc.
- a slice can include a single slice segment (independent slice segment) or be divided into multiple slice segments (independent slice segment and one or more dependent slice segments).
- a slice segment is an integer number of CTUs ordered consecutively in a tile scan, contained in a single network abstraction layer (“NAL”) unit.
- a slice segment header includes values of syntax elements that apply for the independent slice segment.
- a truncated slice segment header includes a few values of syntax elements that apply for that dependent slice segment, and the values of the other syntax elements for the dependent slice segment are inferred from the values for the preceding independent slice segment in decoding order.
- block can indicate a macroblock, prediction unit, residual data unit, or a CB, PB or TB, or some other set of sample values, depending on context.
- the encoder represents an intra-coded block of a source frame ( 331 ) in terms of prediction from other, previously reconstructed sample values in the frame ( 331 ).
- an intra-picture estimator or motion estimator estimates displacement of a block with respect to the other, previously reconstructed sample values in the same frame.
- An intra-frame prediction reference region is a region of sample values in the frame that are used to generate BC-prediction values for the block.
- the intra-frame prediction region can be indicated with a block vector (“BV”) value, which can be represented in the bitstream as a motion vector (“MV”) value.
- the intra-picture estimator estimates extrapolation of the neighboring reconstructed sample values into the block.
- Prediction information (such as BV/MV values for intra BC prediction, or prediction mode (direction) for intra spatial prediction) can be entropy coded and output.
- An intra-frame prediction predictor (or motion compensator for BV/MV values) applies the prediction information to determine intra prediction values.
- the encoder ( 340 ) represents an inter-frame coded, predicted block of a source frame ( 331 ) in terms of prediction from one or more reference frames ( 369 ).
- a motion estimator estimates the motion of the block with respect to the one or more reference frames ( 369 ).
- the motion estimator can select an MV precision (e.g., integer-sample MV precision, 1/2-sample MV precision, or 1/4-sample MV precision), for example, using an approach described herein, then use the selected MV precision during motion estimation.
- the multiple reference frames can be from different temporal directions or the same temporal direction.
- a motion-compensated prediction reference region is a region of sample values in the reference frame(s) that are used to generate motion-compensated prediction values for a block of sample values of a current frame.
- the motion estimator outputs motion information such as MV information, which is entropy coded.
- a motion compensator applies MVs to reference frames ( 369 ) to determine motion-compensated prediction values for inter-frame prediction.
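To make the role of the motion compensator concrete, the following Python sketch applies an integer-sample MV to a reference frame to fetch prediction values for a block. The name `motion_compensate` is hypothetical, fractional-sample interpolation is omitted, and clamping coordinates at the frame edges is an assumption made here to keep the example self-contained.

```python
def motion_compensate(ref, x, y, w, h, mv):
    """Return the w x h prediction for the block at (x, y) of the current
    frame, displaced by the integer-sample MV (dx, dy) into `ref`.

    `ref` is a list of rows of sample values. Coordinates outside the
    reference frame are clamped to the nearest edge (an assumption)."""
    dx, dy = mv
    ref_h, ref_w = len(ref), len(ref[0])
    pred = []
    for r in range(h):
        row = []
        for c in range(w):
            rr = min(max(y + dy + r, 0), ref_h - 1)
            cc = min(max(x + dx + c, 0), ref_w - 1)
            row.append(ref[rr][cc])
        pred.append(row)
    return pred
```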
- the encoder can determine the differences (if any) between a block's prediction values (intra or inter) and corresponding original values. These prediction residual values are further encoded using a frequency transform, quantization and entropy encoding. For example, the encoder ( 340 ) sets values for quantization parameter (“QP”) for a picture, tile, slice and/or other portion of video, and quantizes transform coefficients accordingly.
- the entropy coder of the encoder ( 340 ) compresses quantized transform coefficient values as well as certain side information (e.g., MV information, selected MV precision, SAO filtering parameters, RPS update information, QP values, mode decisions, other parameter choices).
- Typical entropy coding techniques include Exponential-Golomb coding, Golomb-Rice coding, arithmetic coding, differential coding, Huffman coding, run length coding, variable-length-to-variable-length (“V2V”) coding, variable-length-to-fixed-length (“V2F”) coding, Lempel-Ziv (“LZ”) coding, dictionary coding, probability interval partitioning entropy coding (“PIPE”), and combinations of the above.
- the entropy coder can use different coding techniques for different kinds of information, can apply multiple techniques in combination (e.g., by applying Golomb-Rice coding followed by arithmetic coding), and can choose from among multiple code tables within a particular coding technique.
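As one concrete instance of the techniques listed above, order-0 Exponential-Golomb coding of an unsigned integer can be sketched in Python as follows. The function names are hypothetical, and the bitstring representation (a Python `str` of "0" and "1" characters) is chosen only for readability.

```python
def exp_golomb_encode(value):
    """Encode a non-negative integer as an order-0 Exp-Golomb bitstring."""
    if value < 0:
        raise ValueError("value must be non-negative")
    bits = bin(value + 1)[2:]              # binary form of value + 1
    return "0" * (len(bits) - 1) + bits    # zero prefix, then the binary form


def exp_golomb_decode(bitstring):
    """Decode one codeword; return (value, remaining bits)."""
    zeros = 0
    while bitstring[zeros] == "0":         # count the leading zeros
        zeros += 1
    end = 2 * zeros + 1                    # codeword length is 2 * zeros + 1
    return int(bitstring[zeros:end], 2) - 1, bitstring[end:]
```

Small values get short codewords (0 maps to "1", 1 maps to "010"), which suits side information whose small values are the most frequent.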
- An adaptive deblocking filter is included within the motion compensation loop in the encoder ( 340 ) to smooth discontinuities across block boundary rows and/or columns in a decoded frame.
- Other filtering (such as de-ringing filtering, adaptive loop filtering (“ALF”), or SAO filtering) can alternatively or additionally be applied as in-loop filtering operations. Example approaches to making decisions about enabling or disabling SAO filtering are described below.
- the encoded data produced by the encoder ( 340 ) includes syntax elements for various layers of bitstream syntax.
- a picture parameter set (“PPS”) is a syntax structure that contains syntax elements that may be associated with a picture.
- a PPS can be used for a single picture, or a PPS can be reused for multiple pictures in a sequence.
- a PPS is typically signaled separate from encoded data for a picture (e.g., one NAL unit for a PPS, and one or more other NAL units for encoded data for a picture).
- a syntax element indicates which PPS to use for the picture.
- a sequence parameter set (“SPS”) is a syntax structure that contains syntax elements that may be associated with a sequence of pictures.
- a bitstream can include a single SPS or multiple SPSs.
- an SPS is typically signaled separate from other data for the sequence, and a syntax element in the other data indicates which SPS to use.
- the coded frames ( 341 ) and MMCO/RPS information ( 342 ) are processed by a decoding process emulator ( 350 ).
- the decoding process emulator ( 350 ) implements some of the functionality of a decoder, for example, decoding tasks to reconstruct reference frames.
- the decoding process emulator ( 350 ) determines whether a given coded frame ( 341 ) needs to be reconstructed and stored for use as a reference frame in inter-frame prediction of subsequent frames to be encoded.
- the decoding process emulator ( 350 ) models the decoding process that would be conducted by a decoder that receives the coded frame ( 341 ) and produces a corresponding decoded frame ( 351 ). In doing so, when the encoder ( 340 ) has used decoded frame(s) ( 369 ) that have been stored in the decoded frame storage area ( 360 ), the decoding process emulator ( 350 ) also uses the decoded frame(s) ( 369 ) from the storage area ( 360 ) as part of the decoding process.
- the decoded frame temporary memory storage area ( 360 ) includes multiple frame buffer storage areas ( 361 , 362 , . . . , 36 n ).
- the decoding process emulator ( 350 ) manages the contents of the storage area ( 360 ) in order to identify any frame buffers ( 361 , 362 , etc.) with frames that are no longer needed by the encoder ( 340 ) for use as reference frames.
- the decoding process emulator ( 350 ) stores a newly decoded frame ( 351 ) in a frame buffer ( 361 , 362 , etc.) that has been identified in this manner.
- the coded frames ( 341 ) and MMCO/RPS information ( 342 ) are buffered in a temporary coded data area ( 370 ).
- the coded data that is aggregated in the coded data area ( 370 ) contains, as part of the syntax of an elementary coded video bitstream, encoded data for one or more pictures.
- the coded data that is aggregated in the coded data area ( 370 ) can also include media metadata relating to the coded video data (e.g., as one or more parameters in one or more supplemental enhancement information (“SEI”) messages or video usability information (“VUI”) messages).
- the aggregated data ( 371 ) from the temporary coded data area ( 370 ) are processed by a channel encoder ( 380 ).
- the channel encoder ( 380 ) can packetize and/or multiplex the aggregated data for transmission or storage as a media stream (e.g., according to a media program stream or transport stream format such as ITU-T H.222.0 | ISO/IEC 13818-1 or an Internet real-time transport protocol format such as IETF RFC 3550), in which case the channel encoder ( 380 ) can add syntax elements as part of the syntax of the media transmission stream.
- the channel encoder ( 380 ) can organize the aggregated data for storage as a file (e.g., according to a media container format such as ISO/IEC 14496-12), in which case the channel encoder ( 380 ) can add syntax elements as part of the syntax of the media storage file. Or, more generally, the channel encoder ( 380 ) can implement one or more media system multiplexing protocols or transport protocols, in which case the channel encoder ( 380 ) can add syntax elements as part of the syntax of the protocol(s).
- the channel encoder ( 380 ) provides output to a channel ( 390 ), which represents storage, a communications connection, or another channel for the output.
- the channel encoder ( 380 ) or channel ( 390 ) may also include other elements (not shown), e.g., for forward-error correction (“FEC”) encoding and analog signal modulation.
- FIGS. 4 a and 4 b are a block diagram of a generalized video encoder ( 400 ) in conjunction with which some described embodiments may be implemented.
- the encoder ( 400 ) receives a sequence of video pictures including a current picture as an input video signal ( 405 ) and produces encoded data in a coded video bitstream ( 495 ) as output.
- the encoder ( 400 ) is block-based and uses a block format that depends on implementation. Blocks may be further sub-divided at different stages, e.g., at the prediction, frequency transform and/or entropy encoding stages. For example, a picture can be divided into 64×64 blocks, 32×32 blocks or 16×16 blocks, which can in turn be divided into smaller blocks of sample values for coding and decoding. In implementations of encoding for the H.265/HEVC standard, the encoder partitions a picture into CTUs (CTBs), CUs (CBs), PUs (PBs) and TUs (TBs).
- the encoder ( 400 ) compresses pictures using intra-picture coding and/or inter-picture coding. Many of the components of the encoder ( 400 ) are used for both intra-picture coding and inter-picture coding. The exact operations performed by those components can vary depending on the type of information being compressed.
- a tiling module ( 410 ) optionally partitions a picture into multiple tiles of the same size or different sizes. For example, the tiling module ( 410 ) splits the picture along tile rows and tile columns that, with picture boundaries, define horizontal and vertical boundaries of tiles within the picture, where each tile is a rectangular region. In H.265/HEVC implementations, the encoder ( 400 ) partitions a picture into one or more slices, where each slice includes one or more slice segments.
- the general encoding control ( 420 ) receives pictures for the input video signal ( 405 ) as well as feedback (not shown) from various modules of the encoder ( 400 ). Overall, the general encoding control ( 420 ) provides control signals (not shown) to other modules (such as the tiling module ( 410 ), transformer/scaler/quantizer ( 430 ), scaler/inverse transformer ( 435 ), intra-picture estimator ( 440 ), motion estimator ( 450 ), filtering control ( 460 ) and intra/inter switch) to set and change coding parameters during encoding.
- the general encoding control ( 420 ) can manage decisions about MV precision, whether to enable or disable SAO filtering and which reference pictures to retain in an RPS.
- the general encoding control ( 420 ) can also evaluate intermediate results during encoding, for example, performing rate-distortion analysis.
- the general encoding control ( 420 ) produces general control data ( 422 ) that indicates decisions made during encoding, so that a corresponding decoder can make consistent decisions.
- the general control data ( 422 ) is provided to the header formatter/entropy coder ( 490 ).
- a motion estimator ( 450 ) estimates the motion of blocks of sample values of a current picture of the input video signal ( 405 ) with respect to one or more reference pictures.
- the decoded picture buffer (“DPB”) ( 470 ) buffers one or more reconstructed previously coded pictures for use as reference pictures.
- the multiple reference pictures can be from different temporal directions or the same temporal direction.
- the motion estimator ( 450 ) can select an MV precision (e.g., integer-sample MV precision, 1/2-sample MV precision, or 1/4-sample MV precision) using an approach described herein, then use the selected MV precision during motion estimation.
- the motion estimator ( 450 ) can use the block hash dictionary ( 451 ) to find an MV value for a current block.
- the block hash dictionary ( 451 ) is a data structure that organizes candidate blocks for hash-based block matching.
- the block hash dictionary ( 451 ) is an example of a hash table. In FIG.
- the block hash dictionary ( 451 ) is constructed based upon input sample values.
- a block hash dictionary can be constructed based upon reconstructed sample values and updated during encoding to store information about new candidate blocks, as those candidate blocks become available for use in hash-based block matching.
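A block hash dictionary of this kind can be sketched in Python as a hash table that maps a block hash value to the positions of candidate blocks with that hash. This is only an illustrative sketch under stated assumptions: the names `block_bytes`, `build_block_hash_dictionary` and `find_mv` are hypothetical, CRC-32 from the standard `zlib` module stands in for whatever hash function an encoder actually uses, sample values are assumed to be 8-bit, and the sample-by-sample check that guards against hash collisions is an added precaution.

```python
import zlib
from collections import defaultdict


def block_bytes(pic, x, y, size):
    """Serialize the size x size block at (x, y); samples must be 0..255."""
    return bytes(pic[y + r][x + c] for r in range(size) for c in range(size))


def build_block_hash_dictionary(pic, size):
    """Map hash value -> list of (x, y) positions of candidate blocks."""
    table = defaultdict(list)
    for y in range(len(pic) - size + 1):
        for x in range(len(pic[0]) - size + 1):
            table[zlib.crc32(block_bytes(pic, x, y, size))].append((x, y))
    return table


def find_mv(table, ref, cur, x, y, size):
    """Return an MV (dx, dy) to an exactly matching candidate block, or None."""
    target = block_bytes(cur, x, y, size)
    for cx, cy in table.get(zlib.crc32(target), []):
        if block_bytes(ref, cx, cy, size) == target:   # rule out collisions
            return (cx - x, cy - y)
    return None
```

Looking up a hash value replaces an exhaustive sample-by-sample search over every candidate position, which is the main appeal of hash-based block matching for content with exactly repeated blocks.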
- the motion estimator ( 450 ) produces as side information motion data ( 452 ) such as MV data, merge mode index values, and reference picture selection data, and the selected MV precision. These are provided to the header formatter/entropy coder ( 490 ) as well as the motion compensator ( 455 ).
- the motion compensator ( 455 ) applies MVs to the reconstructed reference picture(s) from the DPB ( 470 ).
- the motion compensator ( 455 ) produces motion-compensated predictions for the current picture.
- an intra-picture estimator ( 440 ) determines how to perform intra-picture prediction for blocks of sample values of a current picture of the input video signal ( 405 ).
- the current picture can be entirely or partially coded using intra-picture coding.
- using values of a reconstruction ( 438 ) of the current picture, for intra spatial prediction, the intra-picture estimator ( 440 ) determines how to spatially predict sample values of a current block of the current picture from neighboring, previously reconstructed sample values of the current picture.
- the intra-picture estimator ( 440 ) can determine the direction of spatial prediction to use for a current block.
- the intra-picture estimator ( 440 ) or motion estimator ( 450 ) estimates displacement of the sample values of the current block to different candidate reference regions within the current picture, as a reference picture.
- the intra-picture estimator ( 440 ) or motion estimator ( 450 ) can use a block hash dictionary (not shown) to find a BV/MV value for a current block.
- pixels of a block are encoded using previous sample values stored in a dictionary or other location, where a pixel is a set of co-located sample values (e.g., an RGB triplet or YUV triplet).
- the intra-picture estimator ( 440 ) produces as side information intra prediction data ( 442 ), such as mode information, prediction mode direction (for intra spatial prediction), and offsets and lengths (for dictionary mode).
- the intra prediction data ( 442 ) is provided to the header formatter/entropy coder ( 490 ) as well as the intra-picture predictor ( 445 ).
- the intra-picture predictor ( 445 ) spatially predicts sample values of a current block of the current picture from neighboring, previously reconstructed sample values of the current picture.
- the intra-picture predictor ( 445 ) or motion compensator ( 455 ) predicts the sample values of the current block using previously reconstructed sample values of an intra-picture prediction reference region, which is indicated by a BV/MV value for the current block.
- the intra-picture predictor ( 445 ) reconstructs pixels using offsets and lengths.
- the intra/inter switch selects whether the prediction ( 458 ) for a given block will be a motion-compensated prediction or intra-picture prediction.
- the difference (if any) between a block of the prediction ( 458 ) and a corresponding part of the original current picture of the input video signal ( 405 ) provides values of the residual ( 418 ), for a non-skip-mode block.
- reconstructed residual values are combined with the prediction ( 458 ) to produce an approximate or exact reconstruction ( 438 ) of the original content from the video signal ( 405 ).
- In lossy compression some information is lost from the video signal ( 405 ).
- a frequency transformer converts spatial-domain video information into frequency-domain (i.e., spectral, transform) data.
- the frequency transformer applies a discrete cosine transform (“DCT”), an integer approximation thereof, or another type of forward block transform (e.g., a discrete sine transform or an integer approximation thereof) to blocks of prediction residual data (or sample value data if the prediction ( 458 ) is null), producing blocks of frequency transform coefficients.
- the transformer/scaler/quantizer ( 430 ) can apply a transform with variable block sizes.
- the encoder ( 400 ) can also skip the transform step in some cases.
- the scaler/quantizer scales and quantizes the transform coefficients.
- the quantizer applies dead-zone scalar quantization to the frequency-domain data with a quantization step size that varies on a picture-by-picture basis, tile-by-tile basis, slice-by-slice basis, block-by-block basis, frequency-specific basis or other basis.
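Dead-zone scalar quantization can be illustrated with a short Python sketch. The names `deadzone_quantize` and `dequantize` and the default rounding offset of 1/3 are assumptions for illustration; any offset below 1/2 widens the interval around zero that maps to level 0, compared with plain rounding.

```python
def deadzone_quantize(coeff, step, rounding=1 / 3):
    """Quantize one transform coefficient with quantization step size `step`.

    A rounding offset below 0.5 creates a dead zone around zero, so small
    coefficients are more likely to quantize to level 0."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + rounding)


def dequantize(level, step):
    """Reconstruct an approximate coefficient value from a quantized level."""
    return level * step
```

With `step=10`, a coefficient of 5 quantizes to level 0 here, whereas conventional rounding (`rounding=0.5`) would give level 1.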
- the quantized transform coefficient data ( 432 ) is provided to the header formatter/entropy coder ( 490 ).
- a scaler/inverse quantizer performs inverse scaling and inverse quantization on the quantized transform coefficients.
- an inverse frequency transformer performs an inverse frequency transform, producing blocks of reconstructed prediction residual values or sample values.
- the encoder ( 400 ) combines reconstructed residual values with values of the prediction ( 458 ) (e.g., motion-compensated prediction values, intra-picture prediction values) to form the reconstruction ( 438 ).
- When residual values have not been encoded or signaled, the encoder ( 400 ) uses the values of the prediction ( 458 ) as the reconstruction ( 438 ).
- the values of the reconstruction ( 438 ) can be fed back to the intra-picture estimator ( 440 ) and intra-picture predictor ( 445 ).
- the values of the reconstruction ( 438 ) can similarly be fed back to provide reconstructed sample values.
- the values of the reconstruction ( 438 ) can be used for motion-compensated prediction of subsequent pictures.
- a filtering control ( 460 ) determines how to perform deblock filtering and SAO filtering on values of the reconstruction ( 438 ), for a given picture of the video signal ( 405 ). With the general encoding control ( 420 ) and the block hash dictionary ( 451 ), the filtering control ( 460 ) can make decisions about enabling or disabling SAO filtering, as explained below.
- the filtering control ( 460 ) produces filter control data ( 462 ), which is provided to the header formatter/entropy coder ( 490 ) and merger/filter(s) ( 465 ).
- the encoder ( 400 ) merges content from different tiles into a reconstructed version of the picture.
- the encoder ( 400 ) selectively performs deblock filtering and/or SAO filtering according to the filter control data ( 462 ).
- Other filtering (such as de-ringing filtering or ALF) can alternatively or additionally be applied.
- Tile boundaries can be selectively filtered or not filtered at all, depending on settings of the encoder ( 400 ), and the encoder ( 400 ) may provide syntax within the coded bitstream to indicate whether or not such filtering was applied.
- the DPB ( 470 ) buffers the reconstructed current picture for use in subsequent motion-compensated prediction.
- reference pictures in the RPS can be buffered in the DPB ( 470 ).
- the DPB ( 470 ) has limited memory space, however. If the reconstructed current picture is retained in the DPB ( 470 ) for use as a reference picture, another picture may be removed from the DPB ( 470 ) (and dropped from the RPS).
- the general encoding control ( 420 ) decides which pictures to retain in the RPS and buffer in the DPB ( 470 ). Using the block hash dictionary ( 451 ), the general encoding control ( 420 ) can make decisions about which reference pictures to retain in the RPS, as explained below.
- the header formatter/entropy coder ( 490 ) formats and/or entropy codes the general control data ( 422 ), quantized transform coefficient data ( 432 ), intra prediction data ( 442 ), motion data ( 452 ) and filter control data ( 462 ).
- the header formatter/entropy coder ( 490 ) can select and entropy code merge mode index values, or a default MV predictor can be used.
- the header formatter/entropy coder ( 490 ) also determines MV differentials for MV values (relative to MV predictors), then entropy codes the MV differentials, e.g., using context-adaptive binary arithmetic coding.
- the header formatter/entropy coder ( 490 ) provides the encoded data in the coded video bitstream ( 495 ).
- the format of the coded video bitstream ( 495 ) can be a variation or extension of H.265/HEVC format, Windows Media Video format, VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, H.264), or another format.
- modules of an encoder ( 400 ) can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules.
- encoders with different modules and/or other configurations of modules perform one or more of the described techniques.
- Specific embodiments of encoders typically use a variation or supplemented version of the encoder ( 400 ).
- the relationships shown between modules within the encoder ( 400 ) indicate general flows of information in the encoder; other relationships are not shown for the sake of simplicity.
- screen capture content represents the output of a computer screen or other display.
- FIG. 5 shows a computer desktop environment ( 510 ) with content that may provide input for screen capture.
- video of screen capture content can represent a series of images of the entire computer desktop ( 511 ).
- video of screen capture content can represent a series of images for one of the windows of the computer desktop environment, such as the app window ( 513 ) including game content, browser window ( 512 ) with Web page content or window ( 514 ) with word processor content.
- screen capture content tends to have relatively few discrete sample values, compared to natural video content that is captured using a video camera. For example, a region of screen capture content often includes a single uniform color, whereas a region in natural video content more likely includes colors that gradually vary. Also, screen capture content typically includes distinct structures (e.g., graphics, text characters) that are exactly repeated from frame-to-frame, even if the content may be spatially displaced (e.g., due to scrolling).
- Screen capture content is usually encoded in a format (e.g., YUV 4:4:4 or RGB 4:4:4) with high chroma sampling resolution, although it may also be encoded in a format with lower chroma sampling resolution (e.g., YUV 4:2:0, YUV 4:2:2).
- FIG. 6 shows composite video ( 620 ) that includes natural video content ( 621 ) and artificially-created video content.
- the artificially-created video content includes a graphic ( 622 ) beside the natural video content ( 621 ) and ticker ( 623 ) running below the natural video content ( 621 ).
- the artificially-created video content shown in FIG. 6 tends to have relatively few discrete sample values. It also tends to have distinct structures (e.g., graphics, text characters) that are exactly repeated from frame-to-frame or gradually offset from frame-to-frame (e.g., due to scrolling).
- a video encoder uses the results of hash-based block matching when making decisions about parameters during encoding. This section describes examples of hash-based block matching.
- the encoder determines a hash value for each of multiple candidate blocks of one or more reference pictures.
- a hash table stores the hash values for the candidate blocks.
- the encoder also determines a hash value for a current block by the same hashing approach, and then searches the hash table for a matching hash value. If two blocks are identical, their hash values are the same.
- an encoder can quickly and efficiently identify candidate blocks that have the same hash value as the current block, and filter out candidate blocks that have different hash values.
- the encoder may then further evaluate those candidate blocks having the same hash value as the current block. (Different blocks can have the same hash value. So, among the candidate blocks with the same hash value, the encoder can further identify a candidate block that matches the current block.)
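The basic flow described above (hash every candidate block, store the hash values in a table, hash the current block the same way, then verify any hits) can be sketched as follows. This is a minimal sketch under stated assumptions: pictures are lists of rows of 8-bit sample values, the hash function is CRC-32 from Python's `zlib`, and the helper names `block_bytes`, `build_hash_table` and `find_match` are hypothetical.

```python
import zlib
from collections import defaultdict


def block_bytes(picture, x, y, size):
    """Serialize the size x size block whose top-left sample is at (x, y)."""
    return bytes(picture[y + r][x + c] for r in range(size) for c in range(size))


def build_hash_table(picture, size):
    """Map each hash value to the top-left positions of candidate blocks."""
    table = defaultdict(list)
    height, width = len(picture), len(picture[0])
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            table[zlib.crc32(block_bytes(picture, x, y, size))].append((x, y))
    return table


def find_match(table, reference, current, cx, cy, size):
    """Return the position of an identical candidate block, or None."""
    target = block_bytes(current, cx, cy, size)
    for (x, y) in table.get(zlib.crc32(target), []):
        # Different blocks can share a hash value, so confirm sample-wise.
        if block_bytes(reference, x, y, size) == target:
            return (x, y)
    return None
```

For example, if `reference` is an 8×8 picture with distinct sample values and `current` is the 4×4 region cut from position (3, 2) of that picture, then `find_match(build_hash_table(reference, 4), reference, current, 0, 0, 4)` returns `(3, 2)`.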
- hash values for candidate blocks are determined from the input sample values for the pictures (reference pictures) that include the candidate blocks.
- the encoder determines the hash value for a current block using input sample values. The encoder compares it (or otherwise uses the hash value) against the hash values determined from input sample values for candidate blocks. Even so, reconstructed sample values from the matching block are used to represent the current block. Thus, prediction operations still use reconstructed sample values.
- the candidate blocks considered in hash-based block matching include reconstructed sample values. That is, the candidate blocks are part of previously encoded then reconstructed content in a picture. Hash values for the candidate blocks are determined from the reconstructed sample values.
- the encoder determines the hash value for a current block using input sample values. The encoder compares it (or otherwise uses the hash value) against the hash values determined from reconstructed sample values for candidate blocks.
- FIG. 7 illustrates hash values ( 700 ) for candidate blocks B(x, y) in hash-based block matching, where x and y indicate horizontal and vertical coordinates, respectively, for the top-left position of a given candidate block.
- the candidate blocks have hash values determined using a hash function h( ).
- the encoder determines a hash value h(B) for the candidate block from input sample values for the reference picture.
- the encoder can determine hash values for all candidate blocks in the reference picture. Or, the encoder can screen out some candidate blocks.
- the hash function h( ) yields n possible hash values, designated h 0 to h n-1 .
- For a given hash value, the candidate blocks with that hash value are grouped.
- the candidate blocks B(1266, 263), B(1357, 365), B(1429, 401), B(502, 464), . . . have the hash value h 0 .
- Groups can include different numbers of candidate blocks.
- the group for hash value h 4 includes a single candidate block, while the group for hash value h 0 includes more than four candidate blocks.
- the possible candidate blocks are distributed into n categories.
- the number of candidate blocks per hash value can be further reduced by eliminating redundant, identical blocks with that hash value, or by screening out candidate blocks having certain patterns of sample values.
- the encoder can iteratively winnow down the number of candidate blocks using different hash functions.
- the hash function used for hash-based block matching depends on implementation.
- a hash function can produce hash values with 8 bits, 12 bits, 16 bits, 24 bits, 32 bits, or some other number of bits. If a hash value has fewer bits, the data structure includes fewer categories, but each category may include more candidate blocks. On the other hand, using hash values with more bits tends to increase the size of the data structure that organizes candidate blocks. If a hash value has more bits, the data structure includes more categories, but each category may include fewer candidate blocks.
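The trade-off between the number of hash bits and the number of candidate blocks per category can be estimated with simple arithmetic. The sketch below assumes that hash values are uniformly distributed and that a candidate block starts at every sample position of the picture; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def candidate_count(width: int, height: int, block_size: int) -> int:
    """Number of block positions when a candidate starts at every sample."""
    return (width - block_size + 1) * (height - block_size + 1)


def average_list_length(num_candidates: int, hash_bits: int) -> float:
    """Expected candidates per category, assuming uniformly distributed hashes."""
    return num_candidates / (1 << hash_bits)
```

Under these assumptions, a 1920×1080 picture has 2,052,649 candidate 8×8 blocks. A 12-bit hash gives 4,096 categories with roughly 501 candidates each, while a 16-bit hash gives 65,536 categories with roughly 31 candidates each.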
- the hash function h( ) can be a cryptographic hash function, part of a cryptographic hash function, cyclic redundancy check (“CRC”) function, part of a CRC, or another hash function (e.g., using averaging and XOR operations to determine the signature of a candidate block or current block).
- Some types of hash function (e.g., CRC function) map similar blocks to different hash values, which may be efficient when seeking a matching block that exactly corresponds with a current block.
- Other types of hash function (e.g., locality-sensitive hash function) map similar blocks to the same hash value.
- the encoder determines the hash value for the current block B current .
- the hash value h(B current ) is h 3 .
- the encoder can identify candidate blocks that have the same hash value (shown in outlined box in FIG. 7 ), and filter out the other candidate blocks.
- the identified candidate blocks include blocks that might be identical to the current block.
- the identified candidate blocks include blocks that might be identical to the current block or might be close approximations of the current block. Either way, from these identified candidate blocks, the encoder can further identify a matching block for the current block (e.g., using sample-wise block matching operations, using a second hash function).
- hash-based block matching can make the process of evaluating the candidate blocks in reference picture(s) much more efficient.
- hash values for candidate blocks can be reused in hash-based block matching for different blocks within a picture during encoding. In this case, the cost of computing the hash values for the candidate blocks can be amortized across hash-based block matching operations for the entire picture, for other pictures that use the same reference picture, and for other encoder-side decisions that use the hash values.
- the encoder uses a data structure that organizes candidate blocks according to their hash values.
- the data structure can help make hash-based block matching more computationally efficient.
- the data structure implements, for example, a block hash dictionary or hash table as described herein.
- FIG. 8 a illustrates an example data structure ( 800 ) that organizes candidate blocks for hash-based block matching.
- the n possible hash values are h 0 to h n-1 .
- Candidate blocks with the same hash value are classified in the same candidate block list.
- a given candidate block list can include zero or more entries.
- the candidate block list for the hash value h 2 has no entries
- the list for the hash value h 6 has two entries
- the list for the hash value h 1 has more than four entries.
- An entry(h i , k) includes information for the k th candidate block with the hash value h i .
- an entry in a candidate block list can include the address of a block B(x, y) (e.g., horizontal and vertical coordinates for the top-left position of the block).
- an entry in a candidate block list can include the address of a block B(x, y) and a hash value from a second hash function, which can be used for iterative hash-based block matching.
- the encoder determines the hash value of the current block h(B current ).
- the encoder retains the candidate block list with the same hash value and rules out the other n−1 lists.
- the encoder can compare the current block with the candidate block(s), if any, in the retained candidate block list.
- the encoder can eliminate (n−1)/n of the candidate blocks (on average), and focus on the remaining 1/n candidate blocks (on average) in the retained list, significantly reducing the number of sample-wise block matching operations.
- an entry for a candidate block in the data structure stores information indicating the reference picture that includes the candidate block, which can be used in hash-based block matching.
- different data structures can be used for different sizes of blocks.
- one data structure includes hash values for 8×8 candidate blocks
- a second data structure includes hash values for 16×16 candidate blocks
- a third data structure includes hash values for 32×32 candidate blocks, and so on.
- the data structure used during hash-based block matching depends on the size of the current block.
- a single, unified data structure can be used for different sizes of blocks.
- a hash function can produce an n-bit hash value, where m bits of the n-bit hash value indicate a hash value among the possible blocks of a given block size according to an m-bit hash function, and the remaining n-m bits of the n-bit hash value indicate the given block size.
- the first two bits of a 14-bit hash function can indicate a block size, while the remaining 12 bits indicate a hash value according to a 12-bit hash function.
- a hash function can produce an m-bit hash value regardless of the size of the block, and an entry for a candidate block in the data structure stores information indicating the block size for the candidate block, which can be used in hash-based block matching.
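The 14-bit variant described above, in which two bits carry the block size and twelve bits carry a hash value, might be composed as in the following sketch. The mapping from block sizes to 2-bit codes and the use of truncated CRC-32 as the 12-bit hash function are assumptions made for this example.

```python
import zlib

# Hypothetical 2-bit codes for the supported block sizes.
SIZE_CODES = {8: 0, 16: 1, 32: 2, 64: 3}


def unified_hash(samples: bytes, block_size: int) -> int:
    """Return a 14-bit value: the top 2 bits encode the block size,
    the low 12 bits are a hash of the block's sample values."""
    size_code = SIZE_CODES[block_size]
    hash12 = zlib.crc32(samples) & 0xFFF  # keep 12 bits (assumed hash function)
    return (size_code << 12) | hash12
```

Because the size code occupies the leading bits, blocks of different sizes can never collide in a single, unified table, even if their 12-bit hash values are equal.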
- the data structure can store information representing a very large number of candidate blocks.
- the encoder can eliminate redundant values. For example, the encoder can skip adding identical blocks to the data structure. In general, reducing the size of the data structure by eliminating identical blocks can hurt coding efficiency. Thus, by deciding whether to eliminate identical blocks, the encoder can trade off memory size for the data structure and coding efficiency.
- the encoder can also screen out candidate blocks, depending on the content of the blocks.
- the encoder can rule out n−1 lists of candidate blocks based on the hash value of a current block, but the encoder may still need to perform sample-wise block matching operations for the remaining candidate block(s), if any, for the list with the matching hash value. Also, when updating a data structure that organizes candidate blocks, the encoder may need to perform sample-wise block matching operations to identify identical blocks. Collectively, these sample-wise block matching operations can be computationally intensive.
- the encoder uses iterative hash-based block matching. Iterative hash-based block matching can speed up the block matching process and also speed up the process of updating a data structure that organizes candidate blocks.
- Iterative hash-based block matching uses multiple hash values determined with different hash functions. For a block B (current block or candidate block), in addition to the hash value h(B), the encoder determines another hash value h′(B) using a different hash function h′( ). With the first hash value h(B current ) for a current block, the encoder identifies candidate blocks that have the same hash value for the first hash function h( ). To further rule out some of these identified candidate blocks, the encoder uses a second hash value h′(B current ) for the current block, which is determined using a different hash function.
- the encoder compares the second hash value h′(B current ) with the second hash values for the previously identified candidate blocks (which have same first hash value), in order to filter out more of the candidate blocks.
- a hash table tracks hash values for the candidate blocks according to the different hash functions.
- an entry includes a block address and a second hash value h′(B) from the hash function h′( ).
- the encoder compares the second hash value h′(B current ) for the current block with the second hash values h′(B) for the respective candidate blocks with entry(3, 0), entry (3, 1), entry(3, 2), entry(3, 3), . . . .
- the encoder can rule out more of the candidate blocks, leaving candidate blocks, if any, that have first and second hash values matching h(B current ) and h′(B current ), respectively.
- the encoder can perform sample-wise block matching on any remaining candidate blocks to select a matching block.
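Iterative hash-based block matching with two hash functions can be sketched as below. Each entry stores a block address together with its second hash value, as described above. The choice of truncated CRC-32 for h( ) and Adler-32 for h′( ) is an assumption for illustration only, as are the function names.

```python
import zlib
from collections import defaultdict


def h1(samples: bytes) -> int:
    """First hash function h( ): 16 bits of CRC-32 (assumed choice)."""
    return zlib.crc32(samples) & 0xFFFF


def h2(samples: bytes) -> int:
    """Second hash function h'( ): Adler-32 (assumed choice)."""
    return zlib.adler32(samples)


def add_candidate(table, address, samples: bytes) -> None:
    """Store the block address and its second hash under the first hash."""
    table[h1(samples)].append((address, h2(samples)))


def surviving_candidates(table, current: bytes):
    """Addresses whose first and second hash values both match the current block."""
    second = h2(current)
    return [addr for addr, cand_second in table.get(h1(current), [])
            if cand_second == second]
```

Only the addresses returned by `surviving_candidates` would need sample-wise block matching, which is typically a much shorter list than the full candidate list for the first hash value.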
- FIGS. 9 a -9 c show another example of iterative hash-based block matching that uses a different data structure.
- the data structure ( 900 ) in FIG. 9 a organizes candidate blocks by first hash value from a first hash function h( ), which has n 1 possible hash values.
- the data structure ( 900 ) includes lists for hash values from h 0 . . . h n1-1 .
- the list ( 910 ) for h 2 includes multiple lists that further organize the remaining candidate blocks by second hash value from a second hash function h′( ), which has n 2 possible hash values.
- the list ( 910 ) includes lists for hash values from h′ 0 . . . h′ n2-1 , each including entries with block addresses (e.g., horizontal and vertical coordinates for top-left positions of respective candidate blocks), as shown for the entry ( 920 ) in FIG. 9 c .
- the encoder can perform sample-wise block matching to select a matching block.
- the lists for the second hash values are specific to a given list for the first hash value.
- there is one set of lists for the second hash values and the encoder identifies any candidate blocks that are (1) in the matching list for the first hash values and also (2) in the matching list for the second hash values.
- the second hash function h′( ) can be used to simplify the process of updating a data structure that organizes candidate blocks. For example, when the encoder checks whether a new candidate block is identical to a candidate block already represented in the data structure, the encoder can use multiple hash values with different hash functions to filter out non-identical blocks. For remaining candidate blocks, the encoder can perform sample-wise block matching to identify any identical block.
- the iterative hash-based block matching and updating use two different hash functions.
- the encoder uses three, four or more hash functions to further speed up hash-based block matching or filter out non-identical blocks, and thereby reduce the number of sample-wise block matching operations.
- the encoder can skip sample-wise block matching operations when hash values match. For hash functions with a large number of possible hash values, there is a high probability that two blocks are identical if hash values for the two blocks match.
- the encoder considers, as the results of hash-based block matching, whether hash values match, but does not perform any sample-wise block matching operations.
- This section presents various approaches to selection of motion vector (“MV”) precision during encoding, depending on the results of hash-based block matching (e.g., matching hash values). By selecting appropriate MV precisions during encoding, these approaches can facilitate compression that is effective in terms of rate-distortion performance and/or computational efficiency of encoding and decoding.
- When encoding artificially-created video content, MV values usually represent integer-sample spatial displacements, and very few MV values represent fractional-sample spatial displacements. This provides opportunities for reducing MV precision to improve overall performance.
- FIG. 10 a shows motion compensation with an MV ( 1020 ) having an integer-sample spatial displacement.
- the MV ( 1020 ) indicates a spatial displacement of four samples to the left, and one sample up, relative to the co-located position ( 1010 ) in a reference picture for a current block.
- the MV ( 1020 ) indicates a 4×4 prediction region ( 1030 ) whose position is (60, 95) in the reference picture.
- the prediction region ( 1030 ) includes reconstructed sample values at integer-sample positions in the reference picture. An encoder or decoder need not perform interpolation to determine the values of the prediction region ( 1030 ).
- FIG. 10 b shows motion compensation with an MV ( 1021 ) having a fractional-sample spatial displacement.
- the MV ( 1021 ) indicates a spatial displacement of 3.75 samples to the left, and 0.5 samples up, relative to the co-located position ( 1010 ) in a reference picture for a current block.
- the MV ( 1021 ) indicates a 4×4 prediction region ( 1031 ) whose position is (60.25, 95.5) in the reference picture.
- the prediction region ( 1031 ) includes interpolated sample values at fractional-sample positions in the reference picture.
- An encoder or decoder performs interpolation to determine the sample values of the prediction region ( 1031 ).
- With higher MV precision, the quality of motion-compensated prediction usually improves, at least for some types of video content (e.g., natural video content).
- MV values are typically represented using integer values whose meaning depends on MV precision.
- For integer-sample MV precision, for example, an integer value of 1 indicates a spatial displacement of 1 sample, an integer value of 2 indicates a spatial displacement of 2 samples, and so on.
- For ¼-sample MV precision, for example, an integer value of 1 indicates a spatial displacement of 0.25 samples.
- Integer values of 2, 3, 4 and 5 indicate spatial displacements of 0.5, 0.75, 1.0 and 1.25 samples, respectively.
- the integer value can indicate a magnitude of the spatial displacement, and a separate flag value can indicate whether the displacement is negative or positive.
- the horizontal MV component and vertical MV component of a given MV value can be represented using two integer values.
- the meaning of two integer values representing an MV value depends on MV precision. For example, for an MV value having a 2-sample horizontal displacement and no vertical displacement, if MV precision is ¼-sample MV precision, the MV value is represented as (8, 0). If MV precision is integer-sample MV precision, however, the MV value is represented as (2, 0).
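The dependence of the integer representation on MV precision can be made concrete with a small conversion routine. In this sketch the precision is given as the step size in samples (1 for integer-sample, 1/4 for ¼-sample); the function name and the convention that leftward and upward displacements are negative are assumptions for illustration.

```python
from fractions import Fraction


def mv_to_units(displacement, precision):
    """Convert a (horizontal, vertical) displacement in samples to the pair of
    integers that represents it at the given MV precision (step size in samples)."""
    units = []
    for component in displacement:
        quotient = Fraction(component) / precision
        if quotient.denominator != 1:
            raise ValueError("displacement is not representable at this MV precision")
        units.append(int(quotient))
    return tuple(units)
```

A 2-sample horizontal displacement becomes `(8, 0)` at ¼-sample precision and `(2, 0)` at integer-sample precision. A displacement of 3.75 samples left and 0.5 samples up becomes `(-15, -2)` at ¼-sample precision and cannot be represented at integer-sample precision.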
- MV values in a bitstream of encoded video data are typically entropy coded (e.g., on an MV-component-wise basis).
- An MV value may also be differentially encoded relative to a predicted MV value (e.g., on an MV-component-wise basis). In many cases, the MV value equals the predicted MV value, so the differential MV value is zero, which can be encoded very efficiently.
- a differential MV value (or MV value, if MV prediction is not used) can be entropy encoded using Exponential-Golomb coding, context-adaptive binary arithmetic coding or another form of entropy coding.
- Although the number of encoded bits depends on the form of entropy coding used, in general, smaller values are encoded more efficiently (that is, using fewer bits) because they are more common, and larger values are encoded less efficiently (that is, using more bits) because they are less common.
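To illustrate why smaller MV values cost fewer bits, the sketch below computes the length of a signed order-0 Exponential-Golomb codeword, using the signed-to-unsigned mapping of the se(v) descriptor in H.264 and H.265. It is only an illustration of one of the entropy coding options named above; an actual encoder may code MV differentials with context-adaptive binary arithmetic coding instead, and the function name is hypothetical.

```python
def signed_exp_golomb_bits(value: int) -> int:
    """Length in bits of the signed order-0 Exp-Golomb codeword for value."""
    # Map signed values to code numbers: 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    # A code number k uses 2 * floor(log2(k + 1)) + 1 bits.
    return 2 * (code_num + 1).bit_length() - 1
```

A zero differential costs 1 bit. For a 2-sample displacement, the component value 2 (integer-sample precision) costs 5 bits, whereas the component value 8 (¼-sample precision) costs 9 bits.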
- using MV values with integer-sample MV precision tends to reduce bit rate associated with signaling MV values and reduce computational complexity of encoding and decoding (by avoiding interpolation of sample values at fractional-sample positions in reference pictures), but may reduce the quality of motion-compensated prediction, at least for some types of video content.
- using MV values with fractional-sample MV precision tends to increase bit rate associated with signaling MV values and increase computational complexity of encoding and decoding (by including interpolation of sample values at fractional-sample positions in reference pictures), but may improve the quality of motion-compensated prediction, at least for some types of video content.
- the added costs of fractional-sample MV precision may be unjustified. For example, if most MV values represent integer-sample spatial displacements, and very few MV values represent fractional-sample spatial displacements, the added costs of fractional-sample MV precision are not warranted.
- the encoder can skip searching at fractional-sample positions (and interpolation operations to determine sample values at those positions) during motion estimation. For such content, bit rate and computational complexity can be reduced, without a significant penalty to the quality of motion-compensated prediction, by using MV values with integer-sample MV precision.
- an encoder and decoder can be adapted to switch between MV precisions.
- an encoder and decoder can use integer-sample MV precision for artificially-created video content, but use a fractional-sample MV precision (such as ¼-sample MV precision) for natural video content.
- Approaches that an encoder may follow when selecting MV precision are described in the next section.
- the encoder can signal the selected MV precision to the decoder using one or more syntax elements in the bitstream.
- the encoder selects an MV precision on a slice-by-slice basis.
- a flag value in a sequence parameter set (“SPS”), picture parameter set (“PPS”) or other syntax structure indicates whether adaptive selection of MV precision is enabled. If so, one or more syntax elements in a slice header for a given slice indicate the selected MV precision for blocks of that slice. For example, a flag value of 0 indicates ¼-sample MV precision, and a flag value of 1 indicates integer-sample MV precision.
- the encoder selects an MV precision on a picture-by-picture basis or slice-by-slice basis.
- a syntax element in a PPS indicates one of three MV precision modes: (0) ¼-sample MV precision for MV values of slice(s) of a picture associated with the PPS, (1) integer-sample MV precision for MV values of slice(s) of a picture associated with the PPS, or (2) slice-adaptive MV precision depending on a flag value signaled per slice header, where the flag value in the slice header can indicate ¼-sample MV precision or integer-sample MV precision for MV values of the slice.
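The three PPS-level modes can be resolved to a per-slice MV precision as in the following sketch. The function and parameter names are hypothetical, and the slice-level flag semantics (0 for ¼-sample, 1 for integer-sample) are assumed to follow the slice-header example given earlier; the text does not fix them for this mode.

```python
def slice_mv_precision(pps_mode: int, slice_flag: int = 0) -> str:
    """Resolve the MV precision for a slice from the PPS mode and,
    in slice-adaptive mode, from the slice-header flag."""
    if pps_mode == 0:
        return "quarter-sample"
    if pps_mode == 1:
        return "integer-sample"
    if pps_mode == 2:
        # Assumed flag semantics: 0 -> quarter-sample, 1 -> integer-sample.
        return "integer-sample" if slice_flag == 1 else "quarter-sample"
    raise ValueError("unknown MV precision mode")
```

In modes 0 and 1 the slice-header flag is ignored, which matches the idea that the per-slice flag is only signaled when the PPS selects slice-adaptive MV precision.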
- the encoder selects an MV precision on a CU-by-CU basis.
- One or more syntax elements in a structure for a given CU indicate the selected MV precision for blocks of that CU. For example, a flag value in a CU syntax structure for a CU indicates whether MV values for all PUs associated with the CU have integer-sample MV precision or ¼-sample MV precision.
- the encoder and decoder can use different MV precisions for horizontal and vertical MV components. This can be useful when encoding artificially-created video content that has been scaled horizontally or vertically (e.g., using integer-sample MV precision in an unscaled dimension, and using a fractional-sample MV precision in a scaled dimension).
- an encoder may resize video horizontally or vertically to reduce bit rate, then encode the resized video.
- the video is scaled back to its original dimensions after decoding.
- the encoder can signal the MV precision for horizontal MV components and also signal the MV precision for vertical MV components to the decoder.
- the encoder selects an MV precision and signals the selected MV precision in some way.
- a flag value in a SPS, PPS or other syntax structure can indicate whether adaptive selection of MV precision is enabled.
- one or more syntax elements in sequence-layer syntax, GOP-layer syntax, picture-layer syntax, slice-layer syntax, tile-layer syntax, block-layer syntax or another syntax structure can indicate the selected MV precision for horizontal and vertical components of MV values.
- one or more syntax elements in sequence-layer syntax, GOP-layer syntax, picture-layer syntax, slice-header-layer syntax, slice-data-layer syntax, tile-layer syntax, block-layer syntax or another syntax structure can indicate MV precisions for different MV components.
- Where there are two available MV precisions, a flag value can indicate a selection between the two MV precisions. Where there are more available MV precisions, an integer value can indicate a selection between those MV precisions.
- decoding can be modified to change how signaled MV values are interpreted depending on the selected MV precision.
- the details of how MV values are encoded and reconstructed can vary depending on MV precision. For example, when the MV precision is integer-sample precision, predicted MV values can be rounded to the nearest integer, and differential MV values can indicate integer-sample offsets. Or, when the MV precision is ¼-sample precision, predicted MV values can be rounded to the nearest ¼-sample offset, and differential MV values can indicate ¼-sample offsets. Or, MV values can be signaled in some other way.
- chroma MV values can be derived by scaling, etc., which may result in ½-sample displacements for chroma. Or, chroma MV values can be rounded to integer values.
- the encoder does not change how MV values are predicted or how MV differences are signaled in the bitstream, nor does the decoder change how MV values are predicted or how MV differences are reconstructed, but the interpretation of reconstructed MV values changes depending on the selected MV precision. If the selected MV precision is integer-sample precision, a reconstructed MV value is scaled by a factor of 4 before being used in a motion compensation process (which operates at quarter-sample precision). If the selected MV precision is quarter-sample precision, the reconstructed MV value is not scaled before being used in the motion compensation process.
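The reinterpretation step described above amounts to a conditional scaling of the reconstructed MV before motion compensation. The following is a minimal sketch; the function name and the boolean parameter are hypothetical.

```python
def mv_for_motion_compensation(reconstructed_mv, integer_precision: bool):
    """Return the MV in quarter-sample units, as expected by a motion
    compensation process that operates at quarter-sample precision."""
    scale = 4 if integer_precision else 1
    return tuple(component * scale for component in reconstructed_mv)
```

With integer-sample precision selected, a reconstructed MV of `(2, 0)` is passed to motion compensation as `(8, 0)`. With quarter-sample precision selected, a reconstructed MV of `(8, 0)` is passed through unchanged, so both cases describe the same 2-sample displacement.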
- an encoder selects an MV precision for a unit of video (e.g., the MV precision for one or both components of MV values for the unit).
- the encoder can select the MV precision to use depending on the results of hash-based block matching (e.g., matching hash values).
- the selection of the MV precision can also depend on other factors, such as classification of blocks as natural video content or artificially-created video content.
- FIG. 11 shows a generalized technique ( 1100 ) for selecting MV precision depending on the results of hash-based block matching.
- the technique ( 1100 ) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4 a and 4 b , or by another encoder.
- the encoder encodes ( 1110 ) video to produce encoded data, then outputs ( 1120 ) the encoded data in a bitstream.
- the encoder determines an MV precision for a unit of the video based at least in part on the results of hash-based block matching.
- the MV precision can apply for one or both components of MV values.
- the hash-based block matching can use a hash table as described in section VI or use another hash table. For example, if at least a threshold number of blocks of the unit of the video have matching blocks identified in the hash-based block matching (according to matching hash values, without performing sample-wise block matching), the encoder selects integer-sample MV precision. Otherwise, the encoder selects a fractional-sample MV precision.
- FIG. 12 shows a more specific technique ( 1200 ) for adapting MV precision during encoding, where MV precision is selected depending on the results of hash-based block matching.
- the technique ( 1200 ) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4 a and 4 b , or by another encoder.
- the encoder determines an MV precision from among multiple MV precisions for units of the video. Specifically, when encoding a unit of video, the encoder determines ( 1210 ) whether to change MV precision. At the start of encoding, the encoder can initially set the MV precision according to a default value, or proceed as if changing the MV precision. For later units of video, the encoder may use the current MV precision (which was used for one or more previously encoded units) or change the MV precision. For example, the encoder can decide to change MV precision upon the occurrence of a defined event (e.g., after encoding of a threshold number of units, after a scene change, after a determination that the type of video has changed).
- the encoder determines ( 1220 ) the MV precision for the unit of video based at least in part on the results of hash-based block matching. For example, the encoder splits the unit into multiple blocks. For a given block of the multiple blocks, the encoder determines a hash value, then determines whether there is a match for it among multiple candidate blocks of one or more reference pictures. The encoder can evaluate a single reference picture (e.g., first reference picture in a reference picture list) or multiple reference pictures (e.g., each reference picture in the reference picture list). The match can signify matching hash values between the given block and one of the multiple candidate blocks.
- the match can further signify sample-by-sample matching between the given block and the one of the multiple candidate blocks. (That is, sample-wise comparisons confirm the match.)
- the hash-based block matching can use a hash table as described in section VI or use another hash table. If at least a threshold number of blocks of the unit have matching blocks identified in the hash-based block matching, the encoder can select integer-sample MV precision. Otherwise, the encoder can select a fractional-sample MV precision (such as quarter-sample MV precision).
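- As a rough sketch of the selection described above: the code below builds a hash table over candidate blocks of one reference picture, counts how many blocks of a unit have a match, and picks a precision from that count. Everything named here is an assumption for illustration: CRC32 stands in for whatever hash function section VI uses, pictures are plain lists of rows of 8-bit sample values, and the helper names are hypothetical.

```python
import zlib

def block_hash(block):
    # CRC32 is only a stand-in hash; sample values must be in 0..255.
    return zlib.crc32(bytes(v for row in block for v in row))

def extract_block(picture, x, y, size):
    return [row[x:x + size] for row in picture[y:y + size]]

def build_hash_table(reference, size):
    # Map hash value -> list of (x, y) positions of candidate blocks.
    table = {}
    for y in range(len(reference) - size + 1):
        for x in range(len(reference[0]) - size + 1):
            key = block_hash(extract_block(reference, x, y, size))
            table.setdefault(key, []).append((x, y))
    return table

def count_matched_blocks(unit, reference, size, confirm_samples=False):
    """Return (matched, total) over non-overlapping size x size blocks of unit."""
    table = build_hash_table(reference, size)
    matched = total = 0
    for y in range(0, len(unit) - size + 1, size):
        for x in range(0, len(unit[0]) - size + 1, size):
            total += 1
            block = extract_block(unit, x, y, size)
            candidates = table.get(block_hash(block), [])
            if confirm_samples:
                # Optional sample-wise comparison to confirm the hash match.
                candidates = [c for c in candidates
                              if extract_block(reference, c[0], c[1], size) == block]
            if candidates:
                matched += 1
    return matched, total

def select_mv_precision(matched, threshold_count):
    # "At least a threshold number" of matched blocks selects integer precision.
    return "integer" if matched >= threshold_count else "fractional"
```

- Building the table over every candidate position is the expensive part; a real encoder would presumably reuse one table per reference picture rather than rebuild it per unit, as this sketch does.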
- the encoder encodes ( 1230 ) the unit using the selected MV precision.
- MV values of blocks e.g., prediction units, macroblocks, or other blocks
- the encoder outputs encoded data for the current unit in a bitstream.
- the encoded data can include syntax elements that indicate the selected MV precision.
- the encoder decides ( 1240 ) whether to continue with the next unit. If so, the encoder decides ( 1210 ) whether to change the MV precision for the next unit.
- MV precision can be selected for each unit. Or, to reduce complexity, the MV precision for a unit can be changed from time-to-time (e.g., periodically or upon the occurrence of a defined event), then repeated for one or more subsequent units.
- the unit of video can be a sequence, series of pictures between scene changes, group of pictures, picture, slice, tile, CU, PU, other block or other type of unit of video.
- the encoder can select MV precision on a highly-local basis (e.g., CU-by-CU basis), a larger region-by-region basis (e.g., tile-by-tile basis or slice-by-slice basis), whole picture basis, or more global basis (e.g., per encoding session, per sequence, per GOP, or per series of pictures between detected scene changes).
- the encoder can select between using 1/4-sample MV precision and integer-sample MV precision. More generally, the encoder selects between multiple available MV precisions, which can include integer-sample MV precision, 1/2-sample MV precision, 1/4-sample MV precision and/or another MV precision.
- the selected MV precision can apply for horizontal components and/or vertical components of MV values for the unit of video.
- the hash-based block matching uses hash values determined from input sample values of the unit and (for candidate blocks) input sample values for one or more reference pictures.
- the hash-based block matching can use hash values determined from reconstructed sample values.
- when determining the MV precision for a unit of video, the encoder can also consider other factors, such as whether non-matched blocks contain a significant amount of natural video content (camera-captured video), as described in the next sections.
- This section presents various ways to classify a non-matched block as natural, camera-captured video content or artificially-created video content (such as screen capture content).
- hash-based block matching may fail to find a matching block for at least some of the blocks of the unit.
- the encoder can classify the non-matched block as containing natural video content or artificially-created video content. By providing a high-probability way to differentiate natural video content from artificially-created video content in non-matched blocks, the encoder can select a more appropriate MV precision.
- FIG. 13 shows characteristics of typical blocks of natural video content and screen capture content, which depict the same general pattern.
- the block ( 1310 ) of natural video content includes gradually changing sample values and irregular lines.
- the block ( 1320 ) of artificially-created video content includes sharper lines and patterns of uniform sample values.
- the number of different color values varies between the block ( 1310 ) of natural video content and block ( 1320 ) of screen capture content.
- the block ( 1320 ) of screen capture content includes three colors, and the block ( 1310 ) of natural video content includes many more different colors.
- FIG. 14 shows a technique ( 1400 ) for classifying a block of video depending on a measure of the number of different colors in the block.
- the technique ( 1400 ) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4 a and 4 b , or by another encoder.
- the encoder measures ( 1410 ) the number of different colors in the non-matched block. For example, the encoder counts the distinct colors among sample values in the block. Or, the encoder counts the distinct colors among sample values in the block after clustering of the sample values into fewer colors (e.g., quantizing the sample values such that similar sample values become the same sample value). Or, the encoder measures the number of different colors in the block in some other way.
- the sample values can be organized as a histogram or organized in some other way.
- the way the encoder measures the number of different colors in the block depends on the color space used. If the color space is YUV (e.g., YCbCr, YCoCg), for example, the encoder can count different Y values in the unit of video. Or, the encoder can count different YUV triplets (that is, distinct combinations of Y, U and V sample values for pixels at locations). If the color space is RGB (or GBR or BGR), the encoder can count sample values in one color component or multiple color components. Or, the encoder can count different triplets (that is, distinct combinations of R, G and B sample values for pixels at locations).
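- The counting options above can be sketched in a few lines. This is an illustrative helper, not the patent's method: the name `count_colors`, the tuple-per-pixel layout, and the use of a right shift as a simple form of clustering are all assumptions.

```python
def count_colors(pixels, components=(0, 1, 2), shift=0):
    """Count distinct colors among pixels.

    pixels: iterable of (c0, c1, c2) tuples, e.g. (Y, U, V) or (R, G, B).
    components: which components to include (e.g. (0,) for Y only).
    shift: number of low bits dropped per sample, so that similar
           sample values collapse to the same value before counting.
    """
    return len({tuple(p[i] >> shift for i in components) for p in pixels})
```

- For example, `count_colors(pixels, components=(0,))` counts distinct values of the first component only, while the default counts distinct triplets.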
- the encoder compares ( 1420 ) the number of different colors in the non-matched block to a threshold count.
- the value of the threshold count depends on implementation and can be, for example, 5, 8, 10, 20, or 50.
- the threshold count can be the same for all sizes of units (e.g., regardless of block size). Or, the threshold count can be different for different unit sizes (e.g., different block sizes).
- the threshold can be pre-defined and static, or the threshold can be adjustable (tunable). In any case, the presence of a small number of discrete sample values in a non-matched block tends to indicate screen capture content, and the presence of a large number of discrete sample values in a non-matched block tends to indicate natural video content.
- if the number of different colors is greater than the threshold, the encoder classifies ( 1440 ) the block as natural video content. If the number of different colors is less than the threshold, the encoder classifies ( 1430 ) the block as artificially-created video content.
- the boundary condition (count equals threshold) can be handled using either option, depending on implementation.
- the encoder repeats the technique ( 1400 ) on a block-by-block basis for non-matched blocks of the unit. In some example implementations, when more than a defined proportion of the non-matched blocks of the unit are classified as natural video content, the encoder selects a fractional-sample MV precision, since integer-sample MV precision is primarily useful when encoding artificially-created video content.
- the encoder otherwise considers statistics from the collected sample values of a non-matched block. For example, the encoder determines whether the x most common collected sample values account for more than y % of the sample values.
- the values of x and y depend on implementation. The value of x can be 10 or some other count. The value of y can be 80, 90 or some other percentage less than 100. If the x most common sample values account for more than y % of the sample values in the block, the block is classified as containing artificially-created video content. Otherwise, the block is classified as containing natural video content.
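- A minimal sketch of this statistical test follows, assuming the sample values of a block are available as a flat list. The function name `is_artificial` and the defaults x = 10 and y = 80 are chosen for illustration from the example values mentioned above.

```python
from collections import Counter

def is_artificial(samples, x=10, y=80.0):
    """Return True if the x most common sample values cover more than y percent
    of the samples, which suggests artificially-created (screen) content."""
    counts = Counter(samples)
    covered = sum(n for _, n in counts.most_common(x))
    # Compare as covered / len(samples) > y / 100 without dividing.
    return covered * 100 > y * len(samples)
```

- A block that fails the test would be classified as natural video content.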
- FIG. 15 shows an example technique ( 1500 ) for selecting MV precision depending on the results of hash-based block matching and further depending on classification of non-matched blocks.
- the technique ( 1500 ) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4 a and 4 b , or by another encoder.
- the encoder splits ( 1510 ) a unit of video into T blocks.
- the T blocks are non-overlapped M×N blocks.
- the M×N blocks are 8×8 blocks.
- the M×N blocks have another size.
- the encoder compares ( 1520 ) T to a block count threshold.
- the block count threshold is 10.
- the block count threshold has another value.
- the block count threshold can be pre-defined and static, or the block count threshold can be adjustable (tunable).
- the block count threshold ensures that the encoder considers a sufficient number of blocks when selecting the MV precision for the unit. If T is less than the block count threshold, the encoder selects ( 1580 ) quarter-sample MV precision for the unit.
- the boundary condition (T equals the block count threshold) can be handled using this option or the other option, depending on implementation.
- the encoder performs ( 1530 ) hash-based block matching for the T blocks of the unit. For each of the T blocks, the encoder calculates a hash value and finds if there is a candidate block of a reference picture that has an identical hash value. Of the T blocks of the unit, the encoder finds M blocks that have matching blocks (according to matching hash values) in the hash-based block matching. This leaves T − M non-matched blocks.
- the encoder compares ( 1540 ) the proportion M/T to a matched block threshold.
- the matched block threshold is 25%.
- the matched block threshold has another value.
- the matched block threshold can be pre-defined and static, or the matched block threshold can be adjustable (tunable).
- the matched block threshold ensures that a sufficient number of matched blocks has been found when selecting the MV precision for the unit. If M/T is less than the matched block threshold, the encoder selects ( 1580 ) quarter-sample MV precision for the unit.
- the boundary condition (M/T equals the matched block threshold) can be handled using this option or the other option, depending on implementation. Alternatively, instead of using M/T, the encoder compares some other measure that relates to the number of matched blocks to a threshold.
- the encoder classifies ( 1550 ) each of the T − M non-matched blocks into one of two categories depending on the histogram of color values (number of different colors) in the block.
- the two categories are (1) natural video content, for blocks more likely to contain camera-captured video content, and (2) artificially-created video content, for blocks more likely to contain screen capture content.
- the encoder counts the number of different colors contained in the block.
- the encoder can count a single color component (e.g., luma, G) or count all of the color components (e.g., luma and chroma; R, G and B).
- the encoder compares the count to a color threshold, whose value depends on implementation.
- the color threshold is 8 for an 8×8 block.
- the color threshold has another value.
- the color threshold can be the same for all sizes of blocks. Or, the color threshold can be different for different block sizes.
- the color threshold can be pre-defined and static, or the color threshold can be adjustable (tunable).
- if the count is less than the color threshold, the non-matched block is classified as artificially-created video content. If the count is greater than the color threshold, the non-matched block is classified as natural video content.
- the boundary condition (count equals the color threshold) can be handled using either option, depending on implementation.
- the encoder compares ( 1560 ) the proportion C/T to a natural video block threshold, where C is the number of non-matched blocks classified as natural video content.
- the natural video block threshold is 3%.
- the natural video block threshold has another value.
- the natural video block threshold can be pre-defined and static, or the natural video block threshold can be adjustable (tunable).
- the natural video block threshold ensures that integer-sample MV precision is not selected if there are too many blocks of natural video content. If C/T is greater than the natural video block threshold, the encoder selects ( 1580 ) quarter-sample MV precision for the unit. If C/T is less than the natural video block threshold, the encoder selects ( 1570 ) integer-sample MV precision for the unit.
- the boundary condition (C/T equals the natural video block threshold) can be handled using either option, depending on implementation. Alternatively, instead of using C/T, the encoder compares some other measure that relates to the number of natural video blocks to a threshold.
- the encoder selects the MV precision based on one or more of: (a) a comparison of a number of the multiple blocks to a blocks threshold, (b) a comparison of a measure of the multiple blocks that have matching blocks from the hash-based block matching to a matched blocks threshold, and (c) a comparison of a measure of the multiple blocks classified as natural video content to a natural video blocks threshold.
- the encoder selects integer-sample MV precision if: (a) the number of the multiple blocks is greater than the blocks threshold, (b) the measure of the multiple blocks that have matching blocks from the hash-based block matching is greater than the matched blocks threshold, AND (c) the measure of the multiple blocks classified as natural video content is less than the natural video blocks threshold. Otherwise, when any of these conditions (a)-(c) is not satisfied, the encoder selects quarter-sample MV precision. As noted, handling of the boundary conditions depends on implementation.
- the encoder can repeat per-tile MV precisions from picture-to-picture. Co-located tiles from picture-to-picture can use the same MV precision. Similarly, co-located slices from picture-to-picture can use the same MV precision. For example, suppose video depicts a computer desktop, and part of the desktop has a window displaying natural video content. A fractional-sample MV precision may be used within that region of the desktop from picture-to-picture, while other areas that show text or other rendered content are encoded using integer-sample MV precision.
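- Conditions (a)-(c) can be combined into one decision, sketched below. The function name and argument names are hypothetical; the default thresholds (10 blocks, 25%, 3%) are the example values given for FIG. 15. Treating each boundary case as a failure of the condition is an assumption, since boundary handling is left to the implementation.

```python
def select_precision_fig15(num_blocks, num_matched, num_natural,
                           block_count_threshold=10,
                           matched_block_threshold=0.25,
                           natural_block_threshold=0.03):
    """Return "integer" or "quarter" for a unit of video.

    num_blocks:  T, number of blocks the unit was split into.
    num_matched: M, blocks with a match from hash-based block matching.
    num_natural: C, non-matched blocks classified as natural video content.
    """
    # (a) enough blocks to make a meaningful decision
    if num_blocks <= block_count_threshold:
        return "quarter"
    # (b) enough matched blocks
    if num_matched / num_blocks <= matched_block_threshold:
        return "quarter"
    # (c) not too many natural video blocks
    if num_natural / num_blocks >= natural_block_threshold:
        return "quarter"
    return "integer"
```

- Raising the matched-block threshold or lowering the natural-video threshold in this sketch biases the decision toward quarter-sample precision.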
- the encoder can adjust an amount of bias towards or against integer-sample MV precision based at least in part on a degree of confidence that integer-sample MV precision is appropriate.
- the encoder can also adjust an amount of bias towards or against integer-sample MV precision based at least in part on target computational complexity of encoding and/or decoding (favoring integer-sample MV precision to reduce computational complexity). For example, the encoder can adjust thresholds used in comparison operations to make it more likely or less likely that integer-sample MV precision is selected.
- the selected MV precision can be for horizontal MV components and/or vertical MV components of the MV values of blocks within the unit of the video, where the horizontal MV components and vertical MV components are permitted to have different MV precisions.
- the selected MV precision can be for both horizontal MV components and vertical MV components of the MV values of blocks within the unit of the video, where the horizontal MV components and vertical MV components have the same MV precision.
- the encoded video in the bitstream includes one or more syntax elements that indicate the selected MV precision for the unit.
- a decoder parses the syntax element(s) indicating the selected MV precision and interprets MV values according to the selected MV precision.
- the encoded video in the bitstream can lack any syntax elements that indicate the selected MV precision. For example, even if the bitstream supports signaling of MV values with a fractional-sample MV precision, the encoder can constrain motion estimation for the unit of the video to use only MV values with fractional parts of zero, and only MV values that indicate integer-sample offsets are used in motion compensation.
- a decoder reconstructs and applies MV values at the fractional-sample MV precision (where the MV values indicate integer-sample offsets). This may reduce computational complexity of decoding by avoiding interpolation operations.
- This section presents various approaches to selectively disabling sample adaptive offset (“SAO”) filtering depending on the results of hash-based block matching (e.g., matching hash values).
- SAO filtering involves non-linear filtering operations that can be used, for example, to enhance edge sharpness or suppress banding artifacts or ringing artifacts.
- SAO filtering can be adaptively applied to sample values that satisfy certain conditions, such as presence of a gradient across the sample values.
- SAO filtering can be enabled or disabled for a sequence. Specifically, whether SAO filtering is performed for pictures of a sequence can be controlled by a syntax element in the SPS. If sample_adaptive_offset_enabled_flag is 1, SAO filtering may be applied to slices of reconstructed pictures after deblocking filtering. If sample_adaptive_offset_enabled_flag is 0, SAO filtering is not applied.
- when enabled for a sequence, SAO filtering can be enabled or disabled on a slice-by-slice basis for luma content of a slice and/or chroma content of the slice.
- two slice segment header flags control SAO filtering for a slice. If slice_sao_luma_flag is 1, SAO filtering is enabled for the luma component of the slice. If slice_sao_luma_flag is 0 (default value, if not present), SAO filtering is disabled for the luma component of the slice. If slice_sao_chroma_flag is 1, SAO filtering is enabled for the chroma component of the slice. If slice_sao_chroma_flag is 0 (default value, if not present), SAO filtering is disabled for the chroma component of the slice.
- SAO filtering can be enabled or disabled for CTBs of a CTU in a slice, where a CTU typically includes a luma CTB and corresponding chroma CTBs.
- a type index indicates whether SAO filtering is disabled, uses band offsets, or uses edge offsets. If the type index is 0, SAO filtering is disabled for the CTB. If the type index is 1, the type of SAO filtering used for the CTB is band offset. Finally, if the type index is 2, the type of SAO filtering used for the CTB is edge offset.
- a CTB can reuse syntax elements from an adjacent CTB to control SAO filtering.
- the relevant sample value range is split into 32 bands. Sample values in four consecutive bands are modified by adding band offsets.
- a syntax element indicates the starting position of the bands to be modified, and other syntax elements indicate the band offsets.
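- A simplified sketch of band-offset filtering follows. It assumes 32 equal-width bands over the sample range, a starting band index low enough that the four modified bands do not wrap around, and clipping of results to the valid range; the function and parameter names are hypothetical and do not correspond to bitstream syntax element names.

```python
def sao_band_offset(samples, band_position, offsets, bit_depth=8):
    """Add offsets to samples that fall in four consecutive bands.

    band_position: index (0..28 in this sketch) of the first modified band.
    offsets: four offsets, one per consecutive band.
    """
    shift = bit_depth - 5              # 2**5 = 32 bands over the sample range
    max_value = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = (s >> shift) - band_position
        if 0 <= k < 4:
            # Clipping to the valid sample range is an assumption of this sketch.
            s = min(max(s + offsets[k], 0), max_value)
        out.append(s)
    return out
```

- For 8-bit samples each band spans 8 values, so a starting band of 8 modifies sample values 64 through 95 and leaves all others untouched.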
- a syntax element indicates whether a horizontal, vertical, 45 degree or 135 degree gradient is used in SAO filtering.
- Each sample value of a CTB is classified based on relations to its neighbor sample values along the selected gradient (e.g., classified as a flat area, local minimum, edge, or local maximum). For categories other than “flat area,” an offset (indicated by syntax elements in the bitstream) is added to the sample value.
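- The classification step can be illustrated on a single row of samples with a horizontal gradient. This is a sketch under assumptions: the class index is computed as a sum of two sign comparisons, border samples are left unmodified, and the names `edge_class` and `apply_edge_offset` are hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)

def edge_class(a, c, b):
    """Classify sample c against its two neighbors a and b along the gradient.

    -2: local minimum, -1 or +1: edge, +2: local maximum, 0: flat area / none.
    """
    return sign(c - a) + sign(c - b)

def apply_edge_offset(row, offsets):
    """Apply per-class offsets to interior samples of a row (horizontal gradient).

    offsets: dict mapping class index to offset; class 0 gets no offset.
    Neighbors are always read from the unfiltered row.
    """
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = row[i] + offsets.get(edge_class(row[i - 1], row[i], row[i + 1]), 0)
    return out
```

- With positive offsets for minima and negative offsets for maxima, the filter pulls local extremes toward their neighbors, which is how it can suppress ringing.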
- SAO filtering can enhance edge sharpness and suppress certain types of artifacts, but it increases the computational complexity of encoding and decoding, and it consumes some bits signaling SAO parameters.
- the added costs of SAO filtering may be unjustified. For example, if blocks of a screen content region of a current picture are predicted well using candidate blocks in a reference picture, and the expected quality of the blocks is at least as good as the quality of the candidate blocks in the reference picture, SAO filtering may fail to improve quality. For such content, bit rate and computational complexity can be reduced, without a significant penalty to quality, by disabling SAO filtering.
- FIG. 16 shows a generalized technique ( 1600 ) for selectively disabling SAO filtering depending on the results (e.g., matching hash values) of hash-based block matching.
- the technique ( 1600 ) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4 a and 4 b , or by another encoder.
- the encoder encodes an image or video to produce encoded data, which the encoder outputs as part of a bitstream.
- the encoder performs ( 1610 ) hash-based block matching for a current block of a current picture.
- the current block can be a CTB of a CTU, or some other block.
- the encoder determines a hash value for the current block, then attempts to find a match for it among multiple candidate blocks of one or more reference pictures.
- the encoder can evaluate a single reference picture (e.g., first reference picture in a reference picture list) or multiple reference pictures (e.g., each reference picture in the reference picture list). The match can signify matching hash values between the given block and one of the multiple candidate blocks.
- the match can further signify sample-by-sample matching between the given block and the one of the multiple candidate blocks. (That is, sample-wise comparisons confirm the match.)
- the hash-based block matching can use a hash table as described in section VI or use another hash table.
- the encoder determines ( 1620 ) whether to disable SAO filtering for the current block.
- the condition depends on whether a match is found during the hash-based block matching for the current block (e.g., considering matching hash values, but not sample-wise comparisons). Reconstructed sample values may be different than the input sample values used to determine hash values. Thus, the condition can also depend on other factors, such as expected quality of the current block relative to quality of a candidate block for the match. Alternatively, the condition depends on other and/or additional factors.
- the expected quality of the current block can be indicated by a quantization parameter (“QP”) value that applies for the current block
- the QP values can be picture QP values (QP value for the current picture versus QP value for the reference picture that includes the candidate block) or block-level QP values.
- when a candidate block overlaps multiple blocks that have different QP values, the QP value that applies for the candidate block can be (a) a smallest QP value among the different QP values for the blocks, (b) a QP value of whichever block covers a largest portion of the candidate block, (c) an average QP value among the different QP values for the blocks, (d) a weighted average QP value among the different QP values for the blocks, (e) a largest QP value among the different QP values for the blocks, or (f) some other QP value derived from one or more of the different QP values for the blocks.
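- Options (a) through (e) can be sketched as one helper. The input format (a list of QP and covered-sample-count pairs for the coded blocks that the candidate block overlaps), the rule names, and weighting by covered area in option (d) are assumptions made for this example.

```python
def candidate_block_qp(overlaps, rule="largest_area"):
    """Derive one QP value for a candidate block.

    overlaps: list of (qp, covered_samples) pairs, one per overlapped block.
    """
    qps = [qp for qp, _ in overlaps]
    if rule == "min":                      # option (a)
        return min(qps)
    if rule == "largest_area":             # option (b)
        return max(overlaps, key=lambda o: o[1])[0]
    if rule == "mean":                     # option (c)
        return sum(qps) / len(qps)
    if rule == "weighted":                 # option (d), weighted by covered area
        total = sum(area for _, area in overlaps)
        return sum(qp * area for qp, area in overlaps) / total
    if rule == "max":                      # option (e)
        return max(qps)
    raise ValueError(f"unknown rule: {rule}")
```

- Choosing the largest QP value is the most conservative option for a quality check, because it assumes the worst reconstruction quality among the overlapped blocks.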
- the encoder can check that the QP value for the current picture is greater than or equal to the QP value for the reference picture that includes the candidate block. Or, as part of the condition, the encoder can check that the QP value that applies for the current block is greater than or equal to the QP value that applies for the candidate block. If the QP value for the current picture is greater than or equal to the QP value for the reference picture, the expected error for the current picture is equivalent to or worse than the expected error for the reference picture. Similarly, if the QP value that applies for the current block is greater than or equal to the QP value that applies for the candidate block, the expected error for the current block is equivalent to or worse than the expected error for the candidate block. Alternatively, instead of checking QP values for the current block and candidate block, the encoder evaluates expected quality of the current block relative to quality of a candidate block for the match in some other way.
- based on results of the determining ( 1620 ), the encoder selectively disables ( 1630 ) SAO filtering for the current block. If SAO filtering is not disabled for the current block, the encoder can check one or more other conditions to decide whether to use SAO filtering for the current block and, if SAO filtering is used, determine parameters for SAO filtering for the current block. As part of the SAO determination process, the encoder can evaluate different options for type of SAO filter (edge offset or band offset), gradients, bands, offset values, etc.
- the encoder can repeat the technique ( 1600 ) on a block-by-block basis for other blocks of a CTU, a slice or picture.
- FIG. 17 illustrates a more detailed example technique ( 1700 ) for selectively disabling SAO filtering depending on the results of hash-based block matching.
- the technique ( 1700 ) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4 a and 4 b , or by another encoder.
- the encoder selectively disables SAO filtering for a current block of a current picture.
- the encoder performs ( 1710 ) hash-based block matching for the current block. For example, the encoder performs hash-based block matching using one of the hash tables described in section VI.
- the encoder checks ( 1720 ) if hash-based block matching yields a match (here, matching hash values) for the current block. If the hash-based block matching yields a match, the encoder determines ( 1730 ) QP values for the current block and the candidate block (e.g., from picture-level, slice-level and/or CU-level QP values), then determines ( 1740 ) whether the candidate block passes a quality check (e.g., reconstruction quality of the candidate block (or reference picture) is not worse than the expected quality of the current block (or current picture)).
- if both checks ( 1720 , 1740 ) are passed, the encoder disables ( 1750 ) SAO filtering for the current block, bypassing any other SAO filtering checking for the current block (according to one or more other conditions). Otherwise, if either of the two checks ( 1720 , 1740 ) fails, the encoder performs ( 1760 ) SAO filtering checking for the current block. That is, if either of the two checks ( 1720 , 1740 ) fails, the encoder can still determine whether SAO filtering should or should not be used for the current block (according to one or more other conditions) and, if SAO filtering is used, determine the parameters of SAO filtering for the current block.
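- The two checks of the technique ( 1700 ) reduce to a short predicate, sketched here under the assumption that the quality check is the QP comparison described earlier (current QP greater than or equal to candidate QP). The function and argument names are hypothetical.

```python
def should_disable_sao(hash_match_found, current_qp, candidate_qp):
    """Return True if SAO filtering can be disabled for the current block.

    hash_match_found: result of hash-based block matching (check 1720).
    current_qp, candidate_qp: QP values that apply for the current block
        and the matching candidate block (inputs to check 1740).
    """
    if not hash_match_found:
        return False
    # The candidate's reconstruction quality must not be worse than the
    # expected quality of the current block.
    return current_qp >= candidate_qp
```

- When the predicate returns False, the encoder would fall back to its regular SAO decision process rather than forcing SAO filtering on.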
- the encoder can repeat the technique ( 1700 ) on a block-by-block basis for other blocks of a CTU, a slice or picture.
- This section presents various approaches to deciding which reference pictures to retain in a reference picture set (“RPS”) depending on the results of hash-based block matching (e.g., matching hash values). By selecting reference pictures that facilitate effective motion-compensated prediction, these approaches can facilitate compression that is effective in terms of rate-distortion performance.
- a reference picture is, in general, a picture that contains samples that may be used for prediction in the decoding process of other pictures, which typically follow the reference picture in decoding order (also called coding order, coded order or decoded order). Multiple reference pictures may be available at a given time for use for motion-compensated prediction.
- an RPS is a set of reference pictures available for use in motion-compensated prediction.
- an encoder or decoder determines an RPS that includes reference pictures in a decoded frame storage area such as a decoded picture buffer (“DPB”).
- the size of the RPS can be pre-defined or set according to a syntax element in a bitstream.
- a syntax element indicates a constraint on the maximum number of reference pictures contained in the RPS.
- the reference pictures in the RPS may be adjacent in display order (also called temporal order) or separated from each other in display order.
- a given reference picture in the RPS can precede a current picture in display order or follow the current picture in display order.
- an RPS is updated—reference pictures in the RPS change from time to time to add newly decoded pictures and drop older pictures that are no longer used as reference pictures.
- the RPS is a description of the reference pictures used in the decoding process of the current and future coded pictures. Reference pictures included in the RPS are listed explicitly in the bitstream. Specifically, the RPS includes reference pictures in multiple groups (also called RPS lists). The encoder can determine the RPS once per picture. For a current picture, the encoder determines groups of short-term reference pictures and long-term reference pictures that may be used in inter-picture prediction of the current picture and/or a following picture (in decoding order). Collectively, the groups of reference pictures define the RPS for the current picture. The encoder signals syntax elements in a slice segment header to indicate how the decoder should update the RPS for the current picture.
- the decoder determines the RPS after decoding a slice segment header for a slice of the current picture, using syntax elements signaled in the slice header.
- Reference pictures are identified with picture order count (“POC”) values, parts thereof and/or other information signaled in the bitstream.
- the decoder determines groups of short-term reference pictures and long-term reference pictures that may be used in inter-picture prediction of the current picture and/or a following picture (in decoding order), which define the RPS for the current picture.
- FIG. 18 shows an example (1800) of updates to reference pictures of an RPS.
- the RPS includes up to four reference pictures, which are separated from each other in display order in FIG. 18.
- at least some of the reference pictures in the RPS can be adjacent in display order.
- three of the reference pictures in the RPS precede the current picture in display order, but one reference picture follows the current picture in display order.
- when picture 222 is the current picture, the RPS includes reference pictures 37, 156, 221 and 230. After picture 222 is encoded/decoded, the RPS is updated. Picture 221 is dropped from the RPS, and picture 222 is added to the RPS.
- after the update, the RPS includes reference pictures 37, 156, 222 and 230.
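- The update in the FIG. 18 example can be sketched as follows. This is a minimal illustration, assuming pictures are identified by the numbers used in the example; the function name `update_rps` and its parameters are hypothetical and are not syntax elements of any standard.

```python
def update_rps(rps, current_picture, picture_to_drop, capacity=4):
    """Return the RPS after the current picture has been encoded/decoded.

    All names here are illustrative assumptions, not standard syntax.
    """
    updated = set(rps)
    updated.add(current_picture)          # newly decoded picture joins the RPS
    if len(updated) > capacity:
        updated.discard(picture_to_drop)  # drop a picture no longer needed
    if len(updated) > capacity:
        raise ValueError("RPS still exceeds its capacity after the update")
    return sorted(updated)


# Picture 222 is the current picture; picture 221 is dropped afterwards.
print(update_rps([37, 156, 221, 230], current_picture=222, picture_to_drop=221))
```

- In this sketch the choice of which picture to drop is passed in by the caller; how an encoder might make that choice is a separate decision.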
- a reference picture list (“RPL”) is a list of reference pictures used for motion-compensated prediction.
- An RPL is constructed from the RPS.
- an RPL is constructed for a slice. Reference pictures in the RPL are addressed with reference indices.
- reference pictures in the RPL can change to reflect changes to the RPS and/or to reorder reference pictures within the RPL to make signaling of the more commonly used reference indices more efficient.
- an RPL is constructed during encoding and decoding based upon available information about the RPL (e.g., available pictures in the RPS), modifications according to rules and/or modifications signaled in the bitstream.
- the H.265/HEVC standard allows an encoder to decide which pictures are retained in an RPS, but does not define the patterns of reference pictures retained or criteria for retaining reference pictures.
- An encoder can apply a simple, fixed strategy such as dropping the oldest reference picture in the RPS, but that may result in dropping a useful reference picture. Sophisticated approaches to evaluating which reference pictures to retain can be computationally-intensive.
- This section describes computationally efficient and effective approaches to deciding which reference pictures to retain in an RPS.
- the approaches are adapted for encoding of artificially-created video content, but can also be applied for other types of video content.
- FIG. 19 shows a generalized technique (1900) for deciding which reference pictures to retain in an RPS depending on the results (e.g., matching hash values) of hash-based block matching.
- the technique (1900) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- the encoder encodes (1910) video to produce encoded data and outputs (1920) the encoded data in a bitstream.
- the encoder determines which of multiple reference pictures to retain in an RPS based at least in part on the results of hash-based block matching.
- the multiple reference pictures include one or more previous reference pictures, which were previously in the RPS for encoding of a current picture, as well as a current reference picture that is a reconstructed version of the current picture.
- the encoder can use the approach shown in FIG. 20 , the approach shown in FIG. 21 , or another approach.
- the RPS can include reference pictures pic_ref1, pic_ref2, pic_ref3 and pic_ref4 for encoding of a current picture.
- the encoder updates the RPS.
- a reconstructed version of the current picture (pic_current) can be added to the RPS, in which case one of the reference pictures previously in the RPS is dropped if the capacity of the RPS is exceeded.
- any four of pic_ref1, pic_ref2, pic_ref3, pic_ref4 and pic_current can be included in the RPS, and the remaining picture is dropped.
- hash values for the hash-based block matching are computed from input sample values for a picture, whether the picture is a next picture, current picture (current reference picture) or previous reference picture. That is, even though the encoder is making decisions about reference pictures, which include reconstructed sample values, the hash values are computed from input sample values for those pictures.
- FIG. 20 shows a first example technique (2000) for deciding which reference pictures to retain in an RPS depending on the results of hash-based block matching.
- the technique (2000) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- the encoder drops the candidate reference picture that is expected to be least effective in predicting the next picture.
- the encoder evaluates the candidate reference pictures (current reference picture and previous reference pictures) in succession. For each of the candidate reference pictures, the encoder uses hash-based block matching to estimate how well the candidate reference picture predicts the next picture. After evaluating the candidate reference pictures, the encoder drops the candidate reference picture that is expected to predict the next picture worst.
- the encoder can simply add the current picture to the RPS as a new reference picture and retain the previous reference pictures.
- the approach shown in FIG. 20 retains in the RPS those candidate reference pictures best suited for motion-compensated prediction of the next picture, but the retained reference pictures might not be as useful for motion-compensated prediction of pictures further in the future (e.g., after a scene change).
- the encoder adds (2010) the current picture as a candidate reference picture.
- the encoder checks (2020) whether the RPS, counting the current picture (current reference picture) and previous reference pictures as candidate reference pictures, would exceed its capacity. If not, the updated RPS includes the previous reference pictures, if any, and the current picture (current reference picture), and the technique (2000) ends.
- otherwise, the encoder determines which candidate reference picture to drop. For a given candidate reference picture, the encoder performs (2030) hash-based block matching between blocks of the next picture and the candidate reference picture. For example, the encoder splits the next picture into M×N blocks (where the M×N blocks can be 8×8 blocks or blocks of some other size), and attempts to find matching hash values for the respective blocks of the next picture and candidate blocks of the candidate reference picture.
- the encoder counts (2040) blocks of the next picture with matches in the candidate reference picture (e.g., matching hash values from the hash-based block matching, without sample-wise comparisons).
- a count value count_cand_x indicates how many of the blocks of the next picture have matching blocks in the candidate reference picture.
- the encoder checks (2050) whether to continue with another candidate reference picture. If so, the encoder performs (2030) hash-based block matching between blocks of the next picture and the other candidate reference picture. Thus, the encoder evaluates the previous reference pictures in the RPS (from encoding of the current picture) as well as the current reference picture (after reconstruction of the current picture) as candidate reference pictures. After determining counts of matches for all of the candidate reference pictures, the encoder drops (2060) the candidate reference picture with the lowest count.
- the encoder evaluates pic_ref1, pic_ref2, pic_ref3, pic_ref4 and pic_current as candidate reference pictures, performing (2030) hash-based block matching for blocks of the next picture.
- the encoder determines (2040) count values count_cand_ref1, count_cand_ref2, count_cand_ref3, count_cand_ref4 and count_cand_current, for the respective candidate reference pictures.
- the encoder determines which of count_cand_ref1, count_cand_ref2, count_cand_ref3, count_cand_ref4 and count_cand_current is lowest, and drops (2060) the candidate reference picture having the lowest count.
- the encoder can repeat the technique (2000) on a picture-by-picture basis.
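- The counting and dropping steps of the first example technique can be sketched as follows. This is a simplified illustration under stated assumptions: pictures are 2-D lists of 8-bit input sample values, `zlib.crc32` stands in for whatever hash function an encoder actually uses, candidate blocks are taken at every sample position of the reference picture, and all function names are hypothetical.

```python
import zlib


def block_hash(picture, x, y, m, n):
    """Hash the block of width m and height n whose top-left sample is (x, y)."""
    samples = bytes(picture[y + dy][x + dx] for dy in range(n) for dx in range(m))
    return zlib.crc32(samples)  # stand-in hash function (assumption)


def candidate_hashes(reference, m, n):
    """Hash values of candidate blocks at every position of a reference picture."""
    height, width = len(reference), len(reference[0])
    return {block_hash(reference, x, y, m, n)
            for y in range(height - n + 1)
            for x in range(width - m + 1)}


def count_matches(next_picture, reference, m=8, n=8):
    """Count MxN blocks of next_picture whose hash value occurs in reference."""
    ref_hashes = candidate_hashes(reference, m, n)
    height, width = len(next_picture), len(next_picture[0])
    return sum(block_hash(next_picture, x, y, m, n) in ref_hashes
               for y in range(0, height - n + 1, n)
               for x in range(0, width - m + 1, m))


def choose_picture_to_drop(next_picture, candidates, m=8, n=8):
    """Return the name of the candidate with the lowest count, plus all counts."""
    counts = {name: count_matches(next_picture, picture, m, n)
              for name, picture in candidates.items()}
    return min(counts, key=counts.get), counts
```

- Matching is by hash value only, without sample-wise comparison, so a hash collision can inflate a count. When two candidates tie for the lowest count, this sketch drops whichever appears first in `candidates`; the text does not specify a tie-breaking rule.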
- FIG. 21 shows a second example technique (2100) for deciding which reference pictures to retain in an RPS depending on the results of hash-based block matching.
- the technique (2100) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- the encoder adds the current picture (current reference picture) to the RPS but drops the candidate previous reference picture that is estimated to be most similar to the current picture (current reference picture). This tends to maintain diversity among the reference pictures in the RPS.
- the encoder evaluates the candidate previous reference pictures in succession. For each of the candidate previous reference pictures (which were in the RPS for encoding of the current picture), the encoder uses hash-based block matching to estimate similarity to the current reference picture. After evaluating the candidate previous reference pictures, the encoder drops the candidate previous reference picture that is estimated to be most similar to the current reference picture.
- the encoder can simply add the current reference picture to the RPS as a new reference picture and retain the previous reference pictures. In this way, the approach shown in FIG. 21 can retain in the RPS reference pictures that are useful for motion-compensated prediction even if future pictures change significantly (e.g., after a scene change).
- the encoder adds (2110) the current picture as a current reference picture. Compared to the next picture to be encoded, the current reference picture tends to have small temporal differences and a high correlation, so the encoder retains it as a reference picture.
- the encoder checks (2120) whether the RPS, counting the current reference picture and previous reference pictures as candidate reference pictures, would exceed its capacity. If not, the new RPS includes the previous reference pictures, if any, and the current reference picture, and the technique (2100) ends.
- otherwise, the encoder determines which candidate previous reference picture to drop. For a given candidate previous reference picture, the encoder performs (2130) hash-based block matching between blocks of the current reference picture and the candidate previous reference picture. For example, the encoder splits the current reference picture into M×N blocks (where the M×N blocks can be 8×8 blocks or blocks of some other size), and attempts to find matching hash values for the respective blocks of the current reference picture and candidate blocks of the candidate previous reference picture.
- the encoder counts (2140) blocks of the current reference picture with matches in the candidate previous reference picture (e.g., matching hash values from the hash-based block matching, without sample-wise comparisons).
- a count value count_cand_x indicates how many of the blocks of the current reference picture have matching blocks in the candidate previous reference picture.
- the encoder checks (2150) whether to continue with another candidate previous reference picture. If so, the encoder performs (2130) hash-based block matching between blocks of the current reference picture and the other candidate previous reference picture. Thus, the encoder evaluates the previous reference pictures in the RPS (from encoding of the current picture) as candidate reference pictures. After determining counts of matches for all of the candidate previous reference pictures, the encoder drops (2160) the candidate previous reference picture with the highest count.
- the encoder evaluates pic_ref1, pic_ref2, pic_ref3 and pic_ref4 as candidate reference pictures, performing (2130) hash-based block matching for blocks of the current reference picture.
- the encoder determines (2140) count values count_cand_ref1, count_cand_ref2, count_cand_ref3 and count_cand_ref4, for the respective candidate previous reference pictures.
- the encoder determines which of count_cand_ref1, count_cand_ref2, count_cand_ref3 and count_cand_ref4 is highest, and drops (2160) the candidate previous reference picture having the highest count.
- the encoder can repeat the technique (2100) on a picture-by-picture basis.
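- The second example technique can be sketched in the same spirit. Again this is only an illustration under assumptions: pictures are 2-D lists of 8-bit input sample values, blocks are square, `zlib.crc32` is a stand-in hash function, and the names `similarity_count` and `update_rps` are hypothetical.

```python
import zlib


def block_hash(picture, x, y, size):
    """Hash the size x size block whose top-left sample is (x, y)."""
    samples = bytes(picture[y + dy][x + dx]
                    for dy in range(size) for dx in range(size))
    return zlib.crc32(samples)  # stand-in hash function (assumption)


def similarity_count(current, previous, size):
    """Count blocks of the current reference picture matched in a previous one."""
    height, width = len(previous), len(previous[0])
    previous_hashes = {block_hash(previous, x, y, size)
                       for y in range(height - size + 1)
                       for x in range(width - size + 1)}
    return sum(block_hash(current, x, y, size) in previous_hashes
               for y in range(0, len(current) - size + 1, size)
               for x in range(0, len(current[0]) - size + 1, size))


def update_rps(previous_refs, current_name, current_picture, capacity=4, size=8):
    """Always keep the current reference picture; if the RPS would be past
    full, drop the previous reference picture most similar to it."""
    updated = dict(previous_refs)
    if previous_refs and len(previous_refs) + 1 > capacity:
        counts = {name: similarity_count(current_picture, picture, size)
                  for name, picture in previous_refs.items()}
        del updated[max(counts, key=counts.get)]  # highest count = most similar
    updated[current_name] = current_picture
    return updated
```

- Dropping the picture with the highest count, rather than the lowest, is what distinguishes this approach from the first one: it favors diversity among the retained reference pictures.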
Description
- Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A “codec” is an encoder/decoder system.
- Over the last two decades, various video codec standards have been adopted, including the ITU-T H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263 and H.264 (MPEG-4 AVC or ISO/IEC 14496-10) standards, the MPEG-1 (ISO/IEC 11172-2) and MPEG-4 Visual (ISO/IEC 14496-2) standards, and the SMPTE 421M (VC-1) standard. More recently, the H.265/HEVC standard (ITU-T H.265 or ISO/IEC 23008-2) has been approved. Extensions to the H.265/HEVC standard (e.g., for scalable video coding/decoding, for coding/decoding of video with higher fidelity in terms of sample bit depth or chroma sampling rate, for screen capture content, or for multi-view coding/decoding) are currently under development. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding. In many cases, a video codec standard also provides details about the decoding operations a decoder should perform to achieve conforming results in decoding. Aside from codec standards, various proprietary codec formats define other options for the syntax of an encoded video bitstream and corresponding decoding operations.
- In general, video compression techniques include “intra-picture” compression and “inter-picture” compression. Intra-picture compression techniques compress individual pictures, and inter-picture compression techniques compress pictures with reference to a preceding and/or following picture (often called a reference or anchor picture) or pictures.
- Inter-picture compression techniques often use motion estimation and motion compensation to reduce bit rate by exploiting temporal redundancy in a video sequence. Motion estimation is a process for estimating motion between pictures. In one common technique, an encoder using motion estimation attempts to match a current block of sample values in a current picture with a candidate block of the same size in a search area in another picture, the reference picture. A reference picture is, in general, a picture that contains sample values that may be used for prediction in the decoding process of other pictures.
- For a current block, when the encoder finds an exact or “close enough” match in the search area in the reference picture, the encoder parameterizes the change in position between the current and candidate blocks as motion data such as a motion vector (“MV”). An MV is conventionally a two-dimensional value, having a horizontal MV component that indicates left or right spatial displacement and a vertical MV component that indicates up or down spatial displacement. In general, motion compensation is a process of reconstructing pictures from reference picture(s) using motion data.
- An MV can indicate a spatial displacement in terms of an integer number of samples starting from a co-located position in a reference picture for a current block. For example, for a current block at position (32, 16) in a current picture, the MV (−3, 1) indicates position (29, 17) in the reference picture. Or, an MV can indicate a spatial displacement in terms of a fractional number of samples from a co-located position in a reference picture for a current block. For example, for a current block at position (32, 16) in a current picture, the MV (−3.5, 1.25) indicates position (28.5, 17.25) in the reference picture. To determine sample values at fractional offsets in the reference picture, the encoder typically interpolates between sample values at integer-sample positions. Such interpolation can be computationally intensive. During motion compensation, a decoder also performs the interpolation as needed to compute sample values at fractional offsets in reference pictures.
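- The two MV examples above can be reproduced with a few lines of code. The sketch below also shows one way to obtain a sample value at a fractional offset; it uses bilinear interpolation purely as a simplified stand-in, since codec standards specify their own interpolation filters. The function names are hypothetical.

```python
import math


def referenced_position(block_position, mv):
    """Add the MV displacement to the co-located position of the current block."""
    x, y = block_position
    mv_x, mv_y = mv
    return (x + mv_x, y + mv_y)


def sample_at(picture, x, y):
    """Bilinear interpolation at a possibly fractional position (x, y).

    Simplified stand-in (assumption); real codecs use their own filters.
    """
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    x1 = min(x0 + 1, len(picture[0]) - 1)  # clamp at the right edge
    y1 = min(y0 + 1, len(picture) - 1)     # clamp at the bottom edge
    top = (1 - fx) * picture[y0][x0] + fx * picture[y0][x1]
    bottom = (1 - fx) * picture[y1][x0] + fx * picture[y1][x1]
    return (1 - fy) * top + fy * bottom


print(referenced_position((32, 16), (-3, 1)))       # integer-sample MV
print(referenced_position((32, 16), (-3.5, 1.25)))  # fractional-sample MV
```

- With an integer-sample MV the interpolation step collapses to reading a stored sample, which is why integer-sample precision avoids the interpolation cost.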
- When encoding a block using motion estimation and motion compensation, an encoder often computes the sample-by-sample differences (also called residual values or error values) between the sample values of the block and its motion-compensated prediction. The residual values may then be encoded. For the residual values, encoding efficiency depends on the complexity of the residual values and how much loss or distortion is introduced as part of the compression process. In general, a good motion-compensated prediction closely approximates a block, such that the residual values include few significant values, and the residual values can be efficiently encoded. On the other hand, a poor motion-compensated prediction often yields residual values that include many significant values, which are more difficult to encode efficiently. Encoders typically spend a large proportion of encoding time performing motion estimation, attempting to find good matches and thereby improve rate-distortion performance.
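- As a small illustration of residual values, the sketch below subtracts a motion-compensated prediction from a block and counts the significant residual values. The threshold used to decide what counts as "significant" is an assumption made for this example, as are the function names.

```python
def residual(block, prediction):
    """Sample-by-sample differences between a block and its prediction."""
    return [[b - p for b, p in zip(block_row, prediction_row)]
            for block_row, prediction_row in zip(block, prediction)]


def significant_count(residual_values, threshold=0):
    """Count residual values whose magnitude exceeds the (assumed) threshold."""
    return sum(abs(value) > threshold
               for row in residual_values for value in row)


block = [[10, 12], [9, 8]]
good_prediction = [[10, 12], [9, 9]]
poor_prediction = [[4, 20], [15, 1]]
print(significant_count(residual(block, good_prediction)))
print(significant_count(residual(block, poor_prediction)))
```

- A prediction that closely approximates the block leaves few nonzero residual values, which is the property that makes the residual cheaper to encode.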
- Different video codec standards and formats have used MVs with different MV precisions. For integer-sample MV precision, an MV component indicates an integer number of sample values for spatial displacement. For a fractional-sample MV precision such as ½-sample MV precision or ¼-sample MV precision, an MV component can indicate an integer number of sample values or fractional number of sample values for spatial displacement. For example, if the MV precision is ¼-sample MV precision, an MV component can indicate a spatial displacement of 0 samples, 0.25 samples, 0.5 samples, 0.75 samples, 1.0 samples, 1.25 samples, and so on. When a codec uses MVs with integer-sample MV precision, an encoder and decoder need not perform interpolation operations between sample values of reference pictures for motion compensation. When a codec uses MVs with fractional-sample MV precision, an encoder and decoder perform interpolation operations between sample values of reference pictures for motion compensation (adding computational complexity), but motion-compensated predictions tend to more closely approximate blocks (leading to residual values with fewer significant values), compared to integer-sample MV precision.
- Some video codec standards and formats support switching of MV precision during encoding. Encoder-side decisions about which MV precision to use are not made effectively, however, in certain encoding scenarios. In particular, such encoder-side decisions are not made effectively in various situations when encoding artificially-created video content such as screen capture content.
- In some video codec standards and formats, multiple reference pictures are available at a given time for use for motion-compensated prediction. Such video codec standards/formats specify how to manage the multiple reference pictures. For example, reference pictures can be added or dropped automatically according to rules during video encoding and decoding. Or, parameters in a bitstream may indicate information about reference pictures used during video encoding and decoding.
- In some video codec standards and formats, a reference picture set (“RPS”) is a set of reference pictures available for use in motion-compensated prediction at a given time. During encoding and decoding, an RPS can be updated to add newly decoded pictures and remove older pictures that are no longer used as reference pictures. In some recent codec standards (such as the H.265/HEVC standard), an RPS is updated during encoding and decoding, and syntax elements signaled in the bitstream indicate how to update the RPS.
- Encoder-side decisions about how to update an RPS are not made effectively in certain encoding scenarios, however. In particular, such decisions are not made effectively in various situations when encoding artificially-created video content such as screen capture content.
- A video encoder or video decoder can apply one or more filters to reconstructed sample values of pictures. According to the H.265/HEVC standard, for example, deblock filtering and sample adaptive offset (“SAO”) filtering can be applied to reconstructed sample values. Deblock filtering tends to reduce blocking artifacts due to block-based coding, and is adaptively applied to sample values at block boundaries. Within a region, SAO filtering is adaptively applied to sample values that satisfy certain conditions, such as presence of a gradient across the sample values.
- According to the H.265/HEVC standard, SAO filtering can be enabled or disabled for a sequence. When enabled for a sequence, SAO filtering can be enabled or disabled on a slice-by-slice basis for luma content of a slice and/or for chroma content of the slice. SAO filtering can also be enabled or disabled for blocks within a slice. For example, SAO filtering can be enabled or disabled for coding tree blocks (“CTBs”) of a coding tree unit (“CTU”) in a slice, where a CTU typically includes a luma CTB and corresponding chroma CTBs. For a CTB, a type index indicates whether SAO filtering is disabled, uses band offsets, or uses edge offsets. If SAO filtering uses band offsets or edge offsets, additional syntax elements indicate parameters for the SAO filtering for the CTB. In some cases, a CTB can reuse syntax elements from an adjacent CTB to control SAO filtering. In any event, when SAO filtering is used, it increases the computational complexity of encoding and decoding.
- There are many conditions and situations in which SAO filtering should be disabled. Encoder-side decisions about when to use SAO filtering are not made effectively, however, in certain encoding scenarios. In particular, such decisions are not made effectively in various situations when encoding artificially-created video content such as screen capture content.
- In summary, the detailed description presents innovations in encoder-side decisions that use the results of hash-based block matching when setting parameters during encoding. For example, some of the innovations relate to ways to select motion vector (“MV”) precision depending on the results of hash-based block matching. Other innovations relate to ways to selectively disable sample adaptive offset (“SAO”) filtering depending on the results of hash-based block matching. Still other innovations relate to ways to select which reference pictures to retain in a reference picture set (“RPS”) depending on the results of hash-based block matching. In particular, the innovations can provide computationally-efficient ways to set parameters during encoding of artificially-created video content such as screen capture content.
- According to a first aspect of the innovations described herein, a video encoder encodes video to produce encoded data and outputs the encoded data in a bitstream. As part of the encoding, the encoder determines an MV precision for a unit of the video based at least in part on the results of hash-based block matching. The unit can be a sequence, series of pictures between scene changes, group of pictures, picture, tile, slice, coding unit or other unit of video. The MV precision can be integer-sample precision, quarter-sample precision, or some other fractional-sample precision.
- For example, when determining the MV precision, the encoder splits the unit into multiple blocks. For a given block of the multiple blocks of the unit, the encoder determines a hash value for the given block, then determines whether there is a match for it among multiple candidate blocks of reference picture(s). The match can signify matching hash values between the given block and one of the multiple candidate blocks, which provides a fast result. Or, the match can further signify sample-by-sample matching between the given block and the one of the multiple candidate blocks, which is slower but may be more reliable. Then, for a non-matched block among the multiple blocks of the unit, the encoder can classify the non-matched block as containing natural video content or artificially-created video content. For example, when classifying the non-matched block, the encoder measures a number of different colors in the non-matched block, then compares the number of different colors to a threshold.
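- The classification step for a non-matched block can be sketched as follows. The text above states only that the number of different colors is compared to a threshold; the direction of the comparison (few colors suggests artificially-created content) and the threshold value of 8 are assumptions made for this illustration, as are the function names.

```python
def count_colors(block):
    """Number of distinct sample values (or color tuples) in a block."""
    return len({sample for row in block for sample in row})


def classify_block(block, threshold=8):
    """Classify a non-matched block by its color count.

    Assumption: fewer than `threshold` colors suggests artificially-created
    content; the threshold value is hypothetical.
    """
    return "artificial" if count_colors(block) < threshold else "natural"


text_like = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
photo_like = [[x + 8 * y for x in range(8)] for y in range(8)]
print(classify_block(text_like), classify_block(photo_like))
```

- Samples can be single values or tuples such as (Y, U, V); any hashable representation works with the set-based count.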
- According to another aspect of the innovations described herein, an image encoder or video encoder encodes an image or video to produce encoded data, and outputs the encoded data in a bitstream. As part of the encoding, the encoder performs hash-based block matching for a current block of a current picture. Based on whether a condition is satisfied, the encoder determines whether to disable SAO filtering for the current block. Based on results of the determining, the encoder selectively disables SAO filtering for the current block. If SAO filtering is not disabled for the current block, the encoder can check one or more other conditions to decide whether to use SAO filtering for the current block and, if SAO filtering is used, determine parameters for SAO filtering for the current block.
- The condition (for whether to enable or disable SAO filtering for the current block) depends on whether a match is found during the hash-based block matching for the current block. The condition can also depend on expected quality of the current block relative to quality of a candidate block for the match (e.g., as indicated by a quantization parameter (“QP”) value that applies for the current block and a QP value that applies for the candidate block, respectively).
- For example, when performing the hash-based block matching for the current block, the encoder determines a hash value for the current block, then attempts to find the match for it among multiple candidate blocks of reference picture(s). The current block can be a coding tree block (“CTB”) of a coding tree unit (“CTU”), in which case SAO filtering is also selectively disabled for one or more other CTBs of the CTU.
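- The condition for disabling SAO filtering can be sketched as a small predicate. The text above says the condition depends on whether a match is found and can also depend on relative quality as indicated by QP values; the specific comparison used here (the candidate block's QP is no higher than the current block's QP, where a lower QP means finer quantization) is an assumption for illustration, and the function name is hypothetical.

```python
def should_disable_sao(match_found, qp_current=None, qp_candidate=None):
    """Decide whether to disable SAO filtering for the current block.

    Assumed rule: disable when a hash match exists and the matching
    candidate block is expected to be of at least equal quality.
    """
    if not match_found:
        return False  # no match: fall through to other SAO checks
    if qp_current is None or qp_candidate is None:
        return True   # condition based on the match alone
    return qp_candidate <= qp_current


print(should_disable_sao(True, qp_current=30, qp_candidate=28))
print(should_disable_sao(True, qp_current=30, qp_candidate=34))
print(should_disable_sao(False))
```

- When the predicate returns False, the encoder can still go on to check other conditions and, if SAO filtering is used, determine its parameters.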
- According to another aspect of the innovations described herein, a video encoder encodes video to produce encoded data and outputs the encoded data in a bitstream. As part of the encoding, the encoder determines which of multiple reference pictures to retain in an RPS based at least in part on results of hash-based block matching.
- In one approach to determining which reference pictures to retain, for each of the multiple reference pictures, the encoder uses the hash-based block matching to estimate how well the reference picture predicts a next picture of a sequence. The encoder drops the reference picture that is expected to predict the next picture worst. For example, the encoder performs the hash-based block matching between blocks of the next picture and candidate blocks of a reference picture, where a count indicates how many of the blocks of the next picture have matching blocks in the reference picture. With this information, the encoder drops the reference picture having the lowest count.
- The multiple reference pictures can include one or more previous reference pictures previously in the RPS for encoding of a current picture. In this case, the multiple reference pictures can also include a current reference picture that is a reconstructed version of the current picture.
- In another approach to determining which reference pictures to retain, for each of the previous reference picture(s) in the RPS, the encoder uses the hash-based block matching to estimate similarity to the current reference picture. The encoder drops one of the previous reference picture(s) that is estimated to be most similar to the current reference picture. For example, the encoder performs the hash-based block matching between blocks of the current reference picture and candidate blocks of a previous reference picture, where a count indicates how many of the blocks of the current reference picture have matching blocks in the previous reference picture. With this information, the encoder drops the previous reference picture having the highest count.
- The innovations for encoder-side decisions can be implemented as part of a method, as part of a computing device adapted to perform the method or as part of tangible computer-readable media storing computer-executable instructions for causing a computing device to perform the method. The various innovations can be used in combination or separately. For example, any of the innovations for selecting MV precision can be used separately or in combination with any of the innovations for selectively disabling SAO filtering and/or any of the innovations for deciding which reference pictures to retain in an RPS. As another example, any of the innovations for selectively disabling SAO filtering can be used separately or in combination with any of the innovations for selecting MV precision and/or any of the innovations for deciding which reference pictures to retain in an RPS. As another example, any of the innovations for deciding which reference pictures to retain in an RPS can be used separately or in combination with any of the innovations for selectively disabling SAO filtering and/or any of the innovations for selecting MV precision.
- The foregoing and other objects, features, and advantages of the invention will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.
- FIG. 1 is a diagram of an example computing system in which some described embodiments can be implemented.
- FIGS. 2a and 2b are diagrams of example network environments in which some described embodiments can be implemented.
- FIG. 3 is a diagram of an example encoder system in conjunction with which some described embodiments can be implemented.
- FIGS. 4a and 4b are diagrams illustrating an example video encoder in conjunction with which some described embodiments can be implemented.
- FIG. 5 is a diagram illustrating a computer desktop environment with content that may provide input for screen capture.
- FIG. 6 is a diagram illustrating composite video with natural video content and artificially-created video content.
- FIG. 7 is a table illustrating hash values for candidate blocks in hash-based block matching.
- FIGS. 8a-8c are tables illustrating example data structures that organize candidate blocks for hash-based block matching.
- FIGS. 9a-9c are tables illustrating example data structures that organize candidate blocks for iterative hash-based block matching.
- FIGS. 10a and 10b are diagrams illustrating motion compensation with MV values having an integer-sample spatial displacement and fractional-sample spatial displacement, respectively.
- FIGS. 11, 12 and 15 are flowcharts illustrating techniques for selecting MV precision depending on the results of hash-based block matching.
- FIG. 13 is a diagram illustrating characteristics of blocks of natural video content and blocks of screen capture content.
- FIG. 14 is a flowchart illustrating a generalized technique for classifying a block of video depending on a measure of the number of different colors in the block.
- FIGS. 16 and 17 are flowcharts illustrating techniques for selectively disabling SAO filtering depending on the results of hash-based block matching.
- FIG. 18 is a diagram illustrating updates to reference pictures of an RPS.
- FIGS. 19-21 are flowcharts illustrating techniques for deciding which reference pictures to retain in an RPS depending on the results of hash-based block matching.
- The detailed description presents innovations in encoder-side decisions that use the results of hash-based block matching when setting parameters during encoding. For example, some of the innovations relate to ways to select motion vector (“MV”) precision depending on the results of hash-based block matching. Other innovations relate to ways to selectively disable sample adaptive offset (“SAO”) filtering depending on the results of hash-based block matching. Still other innovations relate to ways to select which reference pictures to retain in a reference picture set (“RPS”) depending on the results of hash-based block matching. In particular, the innovations can provide computationally-efficient ways to set parameters during encoding of artificially-created video content such as screen capture content.
- Although operations described herein are in places described as being performed by a video encoder, in many cases the operations can be performed by another type of media processing tool (e.g., image encoder).
- Some of the innovations described herein are illustrated with reference to syntax elements and operations specific to the H.265/HEVC standard. For example, reference is made to the draft version JCTVC-P1005 of the H.265/HEVC standard—“High Efficiency Video Coding (HEVC) Range Extensions Text Specification: Draft 6,” JCTVC-P1005_v1, February 2014. The innovations described herein can also be implemented for other standards or formats.
- Many of the innovations described herein can improve decision-making processes when encoding certain artificially-created video content such as screen capture content from a screen capture module. Screen capture content typically includes repeated structures (e.g., graphics, text characters). Screen capture content is usually encoded in a format (e.g., YUV 4:4:4 or RGB 4:4:4) with high chroma sampling resolution, although it may also be encoded in a format with lower chroma sampling resolution (e.g., YUV 4:2:0). Common scenarios for encoding/decoding of screen capture content include remote desktop conferencing and encoding/decoding of graphical overlays on natural video or other “mixed-content” video. Several of the innovations described herein (e.g., selecting MV precision, selectively disabling SAO filtering, determining which reference pictures to retain in an RPS) are adapted for encoding of artificially-created video content, or for encoding of mixed-content video that includes at least some artificially-created video content. These innovations can also be used for natural video content, but may not be as effective.
- More generally, various alternatives to the examples described herein are possible. For example, some of the methods described herein can be altered by changing the ordering of the method acts described, by splitting, repeating, or omitting certain method acts, etc. The various aspects of the disclosed technology can be used in combination or separately. Different embodiments use one or more of the described innovations. Some of the innovations described herein address one or more of the problems noted in the background. Typically, a given technique/tool does not solve all such problems.
- FIG. 1 illustrates a generalized example of a suitable computing system (100) in which several of the described innovations may be implemented. The computing system (100) is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in diverse general-purpose or special-purpose computing systems.
- With reference to FIG. 1, the computing system (100) includes one or more processing units (110, 115) and memory (120, 125). The processing units (110, 115) execute computer-executable instructions. A processing unit can be a general-purpose central processing unit (“CPU”), processor in an application-specific integrated circuit (“ASIC”) or any other type of processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. For example, FIG. 1 shows a central processing unit (110) as well as a graphics processing unit or co-processing unit (115). The tangible memory (120, 125) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing unit(s). The memory (120, 125) stores software (180) implementing one or more innovations for encoder decisions based on the results of hash-based block matching (e.g., for selecting MV precision, for selectively disabling SAO filtering and/or for deciding which reference pictures to retain in an RPS), in the form of computer-executable instructions suitable for execution by the processing unit(s).
- A computing system may have additional features. For example, the computing system (100) includes storage (140), one or more input devices (150), one or more output devices (160), and one or more communication connections (170). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing system (100). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing system (100), and coordinates activities of the components of the computing system (100).
- The tangible storage (140) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing system (100). The storage (140) stores instructions for the software (180) implementing one or more innovations for encoder decisions based on the results of hash-based block matching.
- The input device(s) (150) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing system (100). For video, the input device(s) (150) may be a camera, video card, TV tuner card, screen capture module, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video input into the computing system (100). The output device(s) (160) may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system (100).
- The communication connection(s) (170) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.
- The innovations can be described in the general context of computer-readable media. Computer-readable media are any available tangible media that can be accessed within a computing environment. By way of example, and not limitation, with the computing system (100), computer-readable media include memory (120, 125), storage (140), and combinations of any of the above.
- The innovations can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing system on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system.
- The terms “system” and “device” are used interchangeably herein. Unless the context clearly indicates otherwise, neither term implies any limitation on a type of computing system or computing device. In general, a computing system or computing device can be local or distributed, and can include any combination of special-purpose hardware and/or general-purpose hardware with software implementing the functionality described herein.
- The disclosed methods can also be implemented using specialized computing hardware configured to perform any of the disclosed methods. For example, the disclosed methods can be implemented by an integrated circuit (e.g., an ASIC (such as an ASIC digital signal processor (“DSP”)), a graphics processing unit (“GPU”), or a programmable logic device (“PLD”) such as a field programmable gate array (“FPGA”)) specially designed or configured to implement any of the disclosed methods.
- For the sake of presentation, the detailed description uses terms like “determine” and “use” to describe computer operations in a computing system. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
- FIGS. 2a and 2b show example network environments (201, 202) that include video encoders (220) and video decoders (270). The encoders (220) and decoders (270) are connected over a network (250) using an appropriate communication protocol. The network (250) can include the Internet or another computer network.
- In the network environment (201) shown in FIG. 2a, each real-time communication (“RTC”) tool (210) includes both an encoder (220) and a decoder (270) for bidirectional communication. A given encoder (220) can produce output compliant with a variation or extension of the H.265/HEVC standard, SMPTE 421M standard, ISO-IEC 14496-10 standard (also known as H.264 or AVC), another standard, or a proprietary format, with a corresponding decoder (270) accepting encoded data from the encoder (220). The bidirectional communication can be part of a video conference, video telephone call, or other two-party or multi-party communication scenario. Although the network environment (201) in FIG. 2a includes two real-time communication tools (210), the network environment (201) can instead include three or more real-time communication tools (210) that participate in multi-party communication.
- A real-time communication tool (210) manages encoding by an encoder (220). FIG. 3 shows an example encoder system (300) that can be included in the real-time communication tool (210). Alternatively, the real-time communication tool (210) uses another encoder system. A real-time communication tool (210) also manages decoding by a decoder (270).
- In the network environment (202) shown in FIG. 2b, an encoding tool (212) includes an encoder (220) that encodes video for delivery to multiple playback tools (214), which include decoders (270). The unidirectional communication can be provided for a video surveillance system, web camera monitoring system, remote desktop conferencing presentation or other scenario in which video is encoded and sent from one location to one or more other locations. Although the network environment (202) in FIG. 2b includes two playback tools (214), the network environment (202) can include more or fewer playback tools (214). In general, a playback tool (214) communicates with the encoding tool (212) to determine a stream of video for the playback tool (214) to receive. The playback tool (214) receives the stream, buffers the received encoded data for an appropriate period, and begins decoding and playback.
- FIG. 3 shows an example encoder system (300) that can be included in the encoding tool (212). Alternatively, the encoding tool (212) uses another encoder system. The encoding tool (212) can also include server-side controller logic for managing connections with one or more playback tools (214). A playback tool (214) can also include client-side controller logic for managing connections with the encoding tool (212).
- FIG. 3 is a block diagram of an example encoder system (300) in conjunction with which some described embodiments may be implemented. The encoder system (300) can be a general-purpose encoding tool capable of operating in any of multiple encoding modes such as a low-latency encoding mode for real-time communication, a transcoding mode, and a higher-latency encoding mode for producing media for playback from a file or stream, or it can be a special-purpose encoding tool adapted for one such encoding mode. The encoder system (300) can be adapted for encoding of a particular type of content (e.g., screen capture content). The encoder system (300) can be implemented as an operating system module, as part of an application library or as a standalone application. Overall, the encoder system (300) receives a sequence of source video frames (311) from a video source (310) and produces encoded data as output to a channel (390). The encoded data output to the channel can include content encoded using encoder-side decisions as described herein.
- The video source (310) can be a camera, tuner card, storage media, screen capture module, or other digital video source. The video source (310) produces a sequence of video frames at a frame rate of, for example, 30 frames per second. As used herein, the term “frame” generally refers to source, coded or reconstructed image data. For progressive-scan video, a frame is a progressive-scan video frame. For interlaced video, in example embodiments, an interlaced video frame might be de-interlaced prior to encoding. Alternatively, two complementary interlaced video fields are encoded together as a single video frame or encoded as two separately-encoded fields. Aside from indicating a progressive-scan video frame or interlaced-scan video frame, the term “frame” or “picture” can indicate a single non-paired video field, a complementary pair of video fields, a video object plane that represents a video object at a given time, or a region of interest in a larger image. The video object plane or region can be part of a larger image that includes multiple objects or regions of a scene.
- An arriving source frame (311) is stored in a source frame temporary memory storage area (320) that includes multiple frame buffer storage areas (321, 322, . . . , 32n). A frame buffer (321, 322, etc.) holds one source frame in the source frame storage area (320). After one or more of the source frames (311) have been stored in frame buffers (321, 322, etc.), a frame selector (330) selects an individual source frame from the source frame storage area (320). The order in which frames are selected by the frame selector (330) for input to the encoder (340) may differ from the order in which the frames are produced by the video source (310), e.g., the encoding of some frames may be delayed in order, so as to allow some later frames to be encoded first and to thus facilitate temporally backward prediction. Before the encoder (340), the encoder system (300) can include a pre-processor (not shown) that performs pre-processing (e.g., filtering) of the selected frame (331) before encoding. The pre-processing can include color space conversion into primary (e.g., luma) and secondary (e.g., chroma differences toward red and toward blue) components and resampling processing (e.g., to reduce the spatial resolution of chroma components) for encoding. Typically, before encoding, video has been converted to a color space such as YUV, in which sample values of a luma (Y) component represent brightness or intensity values, and sample values of chroma (U, V) components represent color-difference values. The precise definitions of the color-difference values (and conversion operations to/from YUV color space to another color space such as RGB) depend on implementation. In general, as used herein, the term YUV indicates any color space with a luma (or luminance) component and one or more chroma (or chrominance) components, including Y′UV, YIQ, Y′IQ and YDbDr as well as variations such as YCbCr and YCoCg. The chroma sample values may be sub-sampled to a lower chroma sampling rate (e.g., for YUV 4:2:0 format), or the chroma sample values may have the same resolution as the luma sample values (e.g., for YUV 4:4:4 format). Or, the video can be encoded in another format (e.g., RGB 4:4:4 format, GBR 4:4:4 format or BGR 4:4:4 format).
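- As a rough illustration of the chroma sampling formats mentioned above, the following sketch computes the size of each chroma plane from the size of the luma plane. The function name and the round-up treatment of odd luma dimensions are assumptions made for this example; they are not taken from the description.

```python
# Illustrative sketch only. chroma_plane_size is a hypothetical helper name.
# Horizontal and vertical subsampling factors for the two YUV formats
# mentioned in the text.
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),  # chroma has the same resolution as luma
    "4:2:0": (2, 2),  # chroma is halved horizontally and vertically
}


def chroma_plane_size(luma_width, luma_height, chroma_format):
    """Return (width, height) of one chroma plane for the given format."""
    sx, sy = CHROMA_SUBSAMPLING[chroma_format]
    # Assumption: round up so that odd luma dimensions are still covered.
    return ((luma_width + sx - 1) // sx, (luma_height + sy - 1) // sy)


if __name__ == "__main__":
    print(chroma_plane_size(1920, 1080, "4:4:4"))
    print(chroma_plane_size(1920, 1080, "4:2:0"))
```

- With these factors, a 64×64 luma block corresponds to 64×64 chroma blocks in YUV 4:4:4 format and 32×32 chroma blocks in YUV 4:2:0 format.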
- The encoder (340) encodes the selected frame (331) to produce a coded frame (341) and also produces memory management control operation (“MMCO”) signals (342) or reference picture set (“RPS”) information. The RPS is the set of frames that may be used for reference in motion compensation for a current frame or any subsequent frame. If the current frame is not the first frame that has been encoded, when performing its encoding process, the encoder (340) may use one or more previously encoded/decoded frames (369) that have been stored in a decoded frame temporary memory storage area (360). Such stored decoded frames (369) are used as reference frames for inter-frame prediction of the content of the current source frame (331). The MMCO/RPS information (342) indicates to a decoder which reconstructed frames may be used as reference frames, and hence should be stored in a frame storage area. Example ways to make decisions about which reference pictures to retain in an RPS are described below.
- Generally, the encoder (340) includes multiple encoding modules that perform encoding tasks such as partitioning into tiles, intra prediction estimation and prediction, motion estimation and compensation, frequency transforms, quantization and entropy coding. The exact operations performed by the encoder (340) can vary depending on compression format. The format of the output encoded data can be a variation or extension of H.265/HEVC format, Windows Media Video format, VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, H.264), or another format.
- The encoder (340) can partition a frame into multiple tiles of the same size or different sizes. For example, the encoder (340) splits the frame along tile rows and tile columns that, with frame boundaries, define horizontal and vertical boundaries of tiles within the frame, where each tile is a rectangular region. Tiles are often used to provide options for parallel processing. A frame can also be organized as one or more slices, where a slice can be an entire frame or region of the frame. A slice can be decoded independently of other slices in a frame, which improves error resilience. The content of a slice or tile is further partitioned into blocks or other sets of sample values for purposes of encoding and decoding.
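- The way tile rows and tile columns, together with the frame boundaries, define rectangular tiles can be sketched as follows. The function name and the representation of a tile as an (x, y, width, height) tuple are assumptions for illustration; an actual encoder would typically place tile boundaries on CTU boundaries, which this sketch does not enforce.

```python
# Illustrative sketch only. tile_rectangles is a hypothetical helper name.
def tile_rectangles(frame_width, frame_height, column_boundaries, row_boundaries):
    """Return tiles as (x, y, width, height), in raster order of tiles.

    column_boundaries / row_boundaries are the interior x / y positions at
    which the frame is split; the frame edges are added automatically.
    """
    xs = [0] + sorted(column_boundaries) + [frame_width]
    ys = [0] + sorted(row_boundaries) + [frame_height]
    return [
        (xs[i], ys[j], xs[i + 1] - xs[i], ys[j + 1] - ys[j])
        for j in range(len(ys) - 1)
        for i in range(len(xs) - 1)
    ]


if __name__ == "__main__":
    # One interior column boundary and one interior row boundary: four tiles.
    for tile in tile_rectangles(1920, 1080, [960], [540]):
        print(tile)
```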
- For syntax according to the H.265/HEVC standard, the encoder splits the content of a frame (or slice or tile) into coding tree units. A coding tree unit (“CTU”) includes luma sample values organized as a luma coding tree block (“CTB”) and corresponding chroma sample values organized as two chroma CTBs. The size of a CTU (and its CTBs) is selected by the encoder, and can be, for example, 64×64, 32×32 or 16×16 sample values. A CTU includes one or more coding units. A coding unit (“CU”) has a luma coding block (“CB”) and two corresponding chroma CBs. For example, a CTU with a 64×64 luma CTB and two 64×64 chroma CTBs (YUV 4:4:4 format) can be split into four CUs, with each CU including a 32×32 luma CB and two 32×32 chroma CBs, and with each CU possibly being split further into smaller CUs. Or, as another example, a CTU with a 64×64 luma CTB and two 32×32 chroma CTBs (YUV 4:2:0 format) can be split into four CUs, with each CU including a 32×32 luma CB and two 16×16 chroma CBs, and with each CU possibly being split further into smaller CUs. The smallest allowable size of CU (e.g., 8×8, 16×16) can be signaled in the bitstream.
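- The recursive splitting of a CTU into CUs described above can be sketched as a simple quadtree. The names split_ctu, chroma_cb_size and the should_split callback are hypothetical; in a real encoder the split decision would come from rate-distortion analysis or a similar process, which is not modeled here.

```python
# Illustrative sketch only; all names are hypothetical.
def split_ctu(x, y, size, min_cu_size, should_split):
    """Return a list of CUs as (x, y, luma_cb_size) covering the CTU.

    should_split(x, y, size) is a caller-supplied decision function.
    """
    if size > min_cu_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(split_ctu(x + dx, y + dy, half, min_cu_size, should_split))
        return cus
    return [(x, y, size)]


def chroma_cb_size(luma_cb_size, chroma_format):
    """Size of each chroma CB for the two formats discussed in the text."""
    if chroma_format == "4:4:4":
        return luma_cb_size
    if chroma_format == "4:2:0":
        return luma_cb_size // 2
    raise ValueError("unsupported chroma format: " + chroma_format)


if __name__ == "__main__":
    # Split a 64x64 CTU once, giving four CUs with 32x32 luma CBs.
    cus = split_ctu(0, 0, 64, 8, lambda x, y, size: size == 64)
    for cu in cus:
        print(cu, "chroma CB size (4:2:0):", chroma_cb_size(cu[2], "4:2:0"))
```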
- Generally, a CU has a prediction mode such as inter or intra. A CU includes one or more prediction units for purposes of signaling of prediction information (such as prediction mode details, displacement values, etc.) and/or prediction processing. A prediction unit (“PU”) has a luma prediction block (“PB”) and two chroma PBs. For an intra-predicted CU, the PU has the same size as the CU, unless the CU has the smallest size (e.g., 8×8). In that case, the CU can be split into four smaller PUs (e.g., each 4×4 if the smallest CU size is 8×8) or the PU can have the smallest CU size, as indicated by a syntax element for the CU. A CU also has one or more transform units for purposes of residual coding/decoding, where a transform unit (“TU”) has a luma transform block (“TB”) and two chroma TBs. A PU in an intra-predicted CU may contain a single TU (equal in size to the PU) or multiple TUs. The encoder decides how to partition video into CTUs, CUs, PUs, TUs, etc.
- In H.265/HEVC implementations, a slice can include a single slice segment (independent slice segment) or be divided into multiple slice segments (independent slice segment and one or more dependent slice segments). A slice segment is an integer number of CTUs ordered consecutively in a tile scan, contained in a single network abstraction layer (“NAL”) unit. For an independent slice segment, a slice segment header includes values of syntax elements that apply for the independent slice segment. For a dependent slice segment, a truncated slice segment header includes a few values of syntax elements that apply for that dependent slice segment, and the values of the other syntax elements for the dependent slice segment are inferred from the values for the preceding independent slice segment in decoding order.
- As used herein, the term “block” can indicate a macroblock, prediction unit, residual data unit, or a CB, PB or TB, or some other set of sample values, depending on context.
- Returning to FIG. 3, the encoder represents an intra-coded block of a source frame (331) in terms of prediction from other, previously reconstructed sample values in the frame (331). For intra block copy (“BC”) prediction, an intra-picture estimator or motion estimator estimates displacement of a block with respect to the other, previously reconstructed sample values in the same frame. An intra-frame prediction reference region is a region of sample values in the frame that are used to generate BC-prediction values for the block. The intra-frame prediction region can be indicated with a block vector (“BV”) value, which can be represented in the bitstream as a motion vector (“MV”) value. For intra spatial prediction for a block, the intra-picture estimator estimates extrapolation of the neighboring reconstructed sample values into the block. Prediction information (such as BV/MV values for intra BC prediction, or prediction mode (direction) for intra spatial prediction) can be entropy coded and output. An intra-frame prediction predictor (or motion compensator for BV/MV values) applies the prediction information to determine intra prediction values.
- The encoder (340) represents an inter-frame coded, predicted block of a source frame (331) in terms of prediction from one or more reference frames (369). A motion estimator estimates the motion of the block with respect to the one or more reference frames (369). The motion estimator can select an MV precision (e.g., integer-sample MV precision, ½-sample MV precision, or ¼-sample MV precision), for example, using an approach described herein, then use the selected MV precision during motion estimation. When multiple reference frames are used, the multiple reference frames can be from different temporal directions or the same temporal direction. A motion-compensated prediction reference region is a region of sample values in the reference frame(s) that are used to generate motion-compensated prediction values for a block of sample values of a current frame. The motion estimator outputs motion information such as MV information, which is entropy coded. A motion compensator applies MVs to reference frames (369) to determine motion-compensated prediction values for inter-frame prediction.
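- For the simplest case of an MV with integer-sample precision, applying an MV to a reference frame amounts to copying a displaced region of sample values. The following is a minimal sketch under that assumption; the function name is hypothetical, frames are plain lists of rows, and the fractional-sample interpolation and boundary padding that a real motion compensator needs are deliberately left out.

```python
# Illustrative sketch only. predict_block is a hypothetical helper name.
def predict_block(reference, x, y, width, height, mv):
    """Return motion-compensated prediction values for a block.

    reference is a list of rows of sample values; (x, y) is the top-left
    position of the current block; mv = (mv_x, mv_y) is in whole samples.
    """
    mv_x, mv_y = mv
    ref_x, ref_y = x + mv_x, y + mv_y
    # Assumption: reference regions outside the frame are rejected rather
    # than padded, to keep the sketch short.
    if (ref_x < 0 or ref_y < 0
            or ref_y + height > len(reference)
            or ref_x + width > len(reference[0])):
        raise ValueError("reference region lies outside the reference frame")
    return [row[ref_x:ref_x + width] for row in reference[ref_y:ref_y + height]]


if __name__ == "__main__":
    ref = [[r * 10 + c for c in range(8)] for r in range(8)]
    # 2x2 block at (2, 2) predicted with a displacement of (+1, -1) samples.
    print(predict_block(ref, 2, 2, 2, 2, (1, -1)))
```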
- The encoder can determine the differences (if any) between a block's prediction values (intra or inter) and corresponding original values. These prediction residual values are further encoded using a frequency transform, quantization and entropy encoding. For example, the encoder (340) sets values for quantization parameter (“QP”) for a picture, tile, slice and/or other portion of video, and quantizes transform coefficients accordingly. The entropy coder of the encoder (340) compresses quantized transform coefficient values as well as certain side information (e.g., MV information, selected MV precision, SAO filtering parameters, RPS update information, QP values, mode decisions, other parameter choices). Typical entropy coding techniques include Exponential-Golomb coding, Golomb-Rice coding, arithmetic coding, differential coding, Huffman coding, run length coding, variable-length-to-variable-length (“V2V”) coding, variable-length-to-fixed-length (“V2F”) coding, Lempel-Ziv (“LZ”) coding, dictionary coding, probability interval partitioning entropy coding (“PIPE”), and combinations of the above. The entropy coder can use different coding techniques for different kinds of information, can apply multiple techniques in combination (e.g., by applying Golomb-Rice coding followed by arithmetic coding), and can choose from among multiple code tables within a particular coding technique.
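- Of the entropy coding techniques listed above, Exponential-Golomb coding is simple enough to show in a few lines. The sketch below implements the common order-0 code for unsigned integers, with bits represented as a string of "0" and "1" characters for readability. The function names are hypothetical, and this is not presented as the entropy coder used by the encoder (340).

```python
# Illustrative sketch only; function names are hypothetical.
def exp_golomb_encode(value):
    """Order-0 Exp-Golomb code for a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]              # binary form of value + 1
    return "0" * (len(code) - 1) + code    # prefix of leading zeros, then code


def exp_golomb_decode(bits):
    """Decode one order-0 Exp-Golomb codeword from the start of a bit string."""
    zeros = 0
    while bits[zeros] == "0":              # count the leading zeros
        zeros += 1
    return int(bits[zeros:2 * zeros + 1], 2) - 1


if __name__ == "__main__":
    for v in range(5):
        print(v, exp_golomb_encode(v))
```

- Small values get short codewords, which suits syntax elements whose values are usually small.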
- An adaptive deblocking filter is included within the motion compensation loop in the encoder (340) to smooth discontinuities across block boundary rows and/or columns in a decoded frame. Other filtering (such as de-ringing filtering, adaptive loop filtering (“ALF”), or SAO filtering) can alternatively or additionally be applied as in-loop filtering operations. Example approaches to making decisions about enabling or disabling SAO filtering are described below.
- The encoded data produced by the encoder (340) includes syntax elements for various layers of bitstream syntax. For syntax according to the H.265/HEVC standard, for example, a picture parameter set (“PPS”) is a syntax structure that contains syntax elements that may be associated with a picture. A PPS can be used for a single picture, or a PPS can be reused for multiple pictures in a sequence. A PPS is typically signaled separate from encoded data for a picture (e.g., one NAL unit for a PPS, and one or more other NAL units for encoded data for a picture). Within the encoded data for a picture, a syntax element indicates which PPS to use for the picture. Similarly, for syntax according to the H.265/HEVC standard, a sequence parameter set (“SPS”) is a syntax structure that contains syntax elements that may be associated with a sequence of pictures. A bitstream can include a single SPS or multiple SPSs. A SPS is typically signaled separate from other data for the sequence, and a syntax element in the other data indicates which SPS to use.
- The coded frames (341) and MMCO/RPS information (342) (or information equivalent to the MMCO/RPS information (342), since the dependencies and ordering structures for frames are already known at the encoder (340)) are processed by a decoding process emulator (350). The decoding process emulator (350) implements some of the functionality of a decoder, for example, decoding tasks to reconstruct reference frames. In a manner consistent with the MMCO/RPS information (342), the decoding process emulator (350) determines whether a given coded frame (341) needs to be reconstructed and stored for use as a reference frame in inter-frame prediction of subsequent frames to be encoded. If a coded frame (341) needs to be stored, the decoding process emulator (350) models the decoding process that would be conducted by a decoder that receives the coded frame (341) and produces a corresponding decoded frame (351). In doing so, when the encoder (340) has used decoded frame(s) (369) that have been stored in the decoded frame storage area (360), the decoding process emulator (350) also uses the decoded frame(s) (369) from the storage area (360) as part of the decoding process.
- The decoded frame temporary memory storage area (360) includes multiple frame buffer storage areas (361, 362, . . . , 36n). In a manner consistent with the MMCO/RPS information (342), the decoding process emulator (350) manages the contents of the storage area (360) in order to identify any frame buffers (361, 362, etc.) with frames that are no longer needed by the encoder (340) for use as reference frames. After modeling the decoding process, the decoding process emulator (350) stores a newly decoded frame (351) in a frame buffer (361, 362, etc.) that has been identified in this manner.
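- The buffer management described above can be sketched as follows: frame buffers holding frames that are no longer listed as reference frames are identified as free, and a newly decoded frame is stored in one of them. The class name, the method names and the use of a set of retained frame identifiers to stand in for the MMCO/RPS information are all assumptions made for this example.

```python
# Illustrative sketch only; the class and its methods are hypothetical.
class DecodedFrameStore:
    """A fixed number of frame buffers, each holding one decoded frame."""

    def __init__(self, num_buffers):
        # Each slot is None (free) or a (frame_id, frame) pair.
        self.buffers = [None] * num_buffers

    def store(self, frame_id, frame, retained_ids):
        """Store a newly decoded frame and return the buffer index used.

        retained_ids stands in for the RPS: the identifiers of previously
        decoded frames that may still be used as reference frames.
        """
        # Identify buffers whose frames are no longer needed for reference.
        for i, slot in enumerate(self.buffers):
            if slot is not None and slot[0] not in retained_ids:
                self.buffers[i] = None
        # Store the new frame in the first buffer identified as free.
        for i, slot in enumerate(self.buffers):
            if slot is None:
                self.buffers[i] = (frame_id, frame)
                return i
        raise RuntimeError("no free frame buffer")

    def frame_ids(self):
        return sorted(slot[0] for slot in self.buffers if slot is not None)


if __name__ == "__main__":
    store = DecodedFrameStore(2)
    store.store(0, "frame 0", set())
    store.store(1, "frame 1", {0})
    store.store(2, "frame 2", {1})   # frame 0 is dropped, frame 2 takes its buffer
    print(store.frame_ids())
```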
- The coded frames (341) and MMCO/RPS information (342) are buffered in a temporary coded data area (370). The coded data that is aggregated in the coded data area (370) contains, as part of the syntax of an elementary coded video bitstream, encoded data for one or more pictures. The coded data that is aggregated in the coded data area (370) can also include media metadata relating to the coded video data (e.g., as one or more parameters in one or more supplemental enhancement information (“SEI”) messages or video usability information (“VUI”) messages).
- The aggregated data (371) from the temporary coded data area (370) are processed by a channel encoder (380). The channel encoder (380) can packetize and/or multiplex the aggregated data for transmission or storage as a media stream (e.g., according to a media program stream or transport stream format such as ITU-T H.222.0 | ISO/IEC 13818-1 or an Internet real-time transport protocol format such as IETF RFC 3550), in which case the channel encoder (380) can add syntax elements as part of the syntax of the media transmission stream. Or, the channel encoder (380) can organize the aggregated data for storage as a file (e.g., according to a media container format such as ISO/IEC 14496-12), in which case the channel encoder (380) can add syntax elements as part of the syntax of the media storage file. Or, more generally, the channel encoder (380) can implement one or more media system multiplexing protocols or transport protocols, in which case the channel encoder (380) can add syntax elements as part of the syntax of the protocol(s). The channel encoder (380) provides output to a channel (390), which represents storage, a communications connection, or another channel for the output. The channel encoder (380) or channel (390) may also include other elements (not shown), e.g., for forward-error correction (“FEC”) encoding and analog signal modulation.
- FIGS. 4a and 4b are a block diagram of a generalized video encoder (400) in conjunction with which some described embodiments may be implemented. The encoder (400) receives a sequence of video pictures including a current picture as an input video signal (405) and produces encoded data in a coded video bitstream (495) as output.
- The encoder (400) is block-based and uses a block format that depends on implementation. Blocks may be further sub-divided at different stages, e.g., at the prediction, frequency transform and/or entropy encoding stages. For example, a picture can be divided into 64×64 blocks, 32×32 blocks or 16×16 blocks, which can in turn be divided into smaller blocks of sample values for coding and decoding. In implementations of encoding for the H.265/HEVC standard, the encoder partitions a picture into CTUs (CTBs), CUs (CBs), PUs (PBs) and TUs (TBs).
- The encoder (400) compresses pictures using intra-picture coding and/or inter-picture coding. Many of the components of the encoder (400) are used for both intra-picture coding and inter-picture coding. The exact operations performed by those components can vary depending on the type of information being compressed.
- A tiling module (410) optionally partitions a picture into multiple tiles of the same size or different sizes. For example, the tiling module (410) splits the picture along tile rows and tile columns that, with picture boundaries, define horizontal and vertical boundaries of tiles within the picture, where each tile is a rectangular region. In H.265/HEVC implementations, the encoder (400) partitions a picture into one or more slices, where each slice includes one or more slice segments.
- The general encoding control (420) receives pictures for the input video signal (405) as well as feedback (not shown) from various modules of the encoder (400). Overall, the general encoding control (420) provides control signals (not shown) to other modules (such as the tiling module (410), transformer/scaler/quantizer (430), scaler/inverse transformer (435), intra-picture estimator (440), motion estimator (450), filtering control (460) and intra/inter switch) to set and change coding parameters during encoding. For example, during encoding the general encoding control (420) can manage decisions about MV precision, whether to enable or disable SAO filtering and which reference pictures to retain in an RPS. The general encoding control (420) can also evaluate intermediate results during encoding, for example, performing rate-distortion analysis. The general encoding control (420) produces general control data (422) that indicates decisions made during encoding, so that a corresponding decoder can make consistent decisions. The general control data (422) is provided to the header formatter/entropy coder (490).
- If the current picture is predicted using inter-picture prediction, a motion estimator (450) estimates the motion of blocks of sample values of a current picture of the input video signal (405) with respect to one or more reference pictures. The decoded picture buffer (“DPB”) (470) buffers one or more reconstructed previously coded pictures for use as reference pictures. When multiple reference pictures are used, the multiple reference pictures can be from different temporal directions or the same temporal direction.
- Working with the general encoding control (420) and a block hash dictionary (451), the motion estimator (450) can select an MV precision (e.g., integer-sample MV precision, ½-sample MV precision, or ¼-sample MV precision) using an approach described herein, then use the selected MV precision during motion estimation. For hash-based block matching during the motion estimation, the motion estimator (450) can use the block hash dictionary (451) to find an MV value for a current block. The block hash dictionary (451) is a data structure that organizes candidate blocks for hash-based block matching. The block hash dictionary (451) is an example of a hash table. In FIG. 4b, the block hash dictionary (451) is constructed based upon input sample values. Alternatively, a block hash dictionary can be constructed based upon reconstructed sample values and updated during encoding to store information about new candidate blocks, as those candidate blocks become available for use in hash-based block matching.
- The motion estimator (450) produces as side information motion data (452) such as MV data, merge mode index values, and reference picture selection data, and the selected MV precision. These are provided to the header formatter/entropy coder (490) as well as the motion compensator (455).
- The motion compensator (455) applies MVs to the reconstructed reference picture(s) from the DPB (470). The motion compensator (455) produces motion-compensated predictions for the current picture.
- In a separate path within the encoder (400), an intra-picture estimator (440) determines how to perform intra-picture prediction for blocks of sample values of a current picture of the input video signal (405). The current picture can be entirely or partially coded using intra-picture coding. Using values of a reconstruction (438) of the current picture, for intra spatial prediction, the intra-picture estimator (440) determines how to spatially predict sample values of a current block of the current picture from neighboring, previously reconstructed sample values of the current picture. The intra-picture estimator (440) can determine the direction of spatial prediction to use for a current block.
- Or, for intra BC prediction using BV/MV values, the intra-picture estimator (440) or motion estimator (450) estimates displacement of the sample values of the current block to different candidate reference regions within the current picture, as a reference picture. For hash-based block matching, the intra-picture estimator (440) or motion estimator (450) can use a block hash dictionary (not shown) to find a BV/MV value for a current block. Or, for an intra-picture dictionary coding mode, pixels of a block are encoded using previous sample values stored in a dictionary or other location, where a pixel is a set of co-located sample values (e.g., an RGB triplet or YUV triplet).
- The intra-picture estimator (440) produces as side information intra prediction data (442), such as mode information, prediction mode direction (for intra spatial prediction), and offsets and lengths (for dictionary mode). The intra prediction data (442) is provided to the header formatter/entropy coder (490) as well as the intra-picture predictor (445).
- According to the intra prediction data (442), the intra-picture predictor (445) spatially predicts sample values of a current block of the current picture from neighboring, previously reconstructed sample values of the current picture. Or, for intra BC prediction, the intra-picture predictor (445) or motion compensator (455) predicts the sample values of the current block using previously reconstructed sample values of an intra-picture prediction reference region, which is indicated by a BV/MV value for the current block. Or, for intra-picture dictionary mode, the intra-picture predictor (445) reconstructs pixels using offsets and lengths.
- The intra/inter switch selects whether the prediction (458) for a given block will be a motion-compensated prediction or intra-picture prediction.
- The difference (if any) between a block of the prediction (458) and a corresponding part of the original current picture of the input video signal (405) provides values of the residual (418), for a non-skip-mode block. During reconstruction of the current picture, for a non-skip-mode block, reconstructed residual values are combined with the prediction (458) to produce an approximate or exact reconstruction (438) of the original content from the video signal (405). (In lossy compression, some information is lost from the video signal (405).)
- In the transformer/scaler/quantizer (430), a frequency transformer converts spatial-domain video information into frequency-domain (i.e., spectral, transform) data. For block-based video coding, the frequency transformer applies a discrete cosine transform (“DCT”), an integer approximation thereof, or another type of forward block transform (e.g., a discrete sine transform or an integer approximation thereof) to blocks of prediction residual data (or sample value data if the prediction (458) is null), producing blocks of frequency transform coefficients. The transformer/scaler/quantizer (430) can apply a transform with variable block sizes. The encoder (400) can also skip the transform step in some cases.
- The scaler/quantizer scales and quantizes the transform coefficients. For example, the quantizer applies dead-zone scalar quantization to the frequency-domain data with a quantization step size that varies on a picture-by-picture basis, tile-by-tile basis, slice-by-slice basis, block-by-block basis, frequency-specific basis or other basis. The quantized transform coefficient data (432) is provided to the header formatter/entropy coder (490).
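- As a minimal sketch of the dead-zone scalar quantization mentioned above, the following Python example maps coefficients to integer levels and back. The function names, the rounding offset of 1/3, and the step size of 10 are assumptions chosen for illustration; they are not taken from the described encoder (400).

```python
def deadzone_quantize(coeff, step, rounding_offset=1 / 3):
    """Map a transform coefficient to an integer level.

    A rounding_offset below 0.5 widens the interval that maps to level 0
    (the "dead zone") compared with plain rounding. The default is an
    illustrative assumption.
    """
    sign = -1 if coeff < 0 else 1
    # int() truncates, which is a floor for the non-negative value here.
    level = int(abs(coeff) / step + rounding_offset)
    return sign * level


def dequantize(level, step):
    """Inverse scaling: reconstruct an approximate coefficient."""
    return level * step


coeffs = [0.0, 5.0, -5.0, 7.0, 23.0, -41.0]
step = 10.0  # hypothetical quantization step size
levels = [deadzone_quantize(c, step) for c in coeffs]
recon = [dequantize(level, step) for level in levels]
print(levels)  # [0, 0, 0, 1, 2, -4]
print(recon)   # [0.0, 0.0, 0.0, 10.0, 20.0, -40.0]
```

- With plain rounding, the coefficients 5.0 and −5.0 would map to nonzero levels at this step size; with the dead zone they map to zero, which tends to lower bit rate at the cost of some distortion.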
- In the scaler/inverse transformer (435), a scaler/inverse quantizer performs inverse scaling and inverse quantization on the quantized transform coefficients. When the transform stage has not been skipped, an inverse frequency transformer performs an inverse frequency transform, producing blocks of reconstructed prediction residual values or sample values. For a non-skip-mode block, the encoder (400) combines reconstructed residual values with values of the prediction (458) (e.g., motion-compensated prediction values, intra-picture prediction values) to form the reconstruction (438). For a skip-mode block or dictionary-mode block, the encoder (400) uses the values of the prediction (458) as the reconstruction (438).
- For spatial intra-picture prediction, the values of the reconstruction (438) can be fed back to the intra-picture estimator (440) and intra-picture predictor (445). For intra BC prediction, the values of the reconstruction (438) can similarly be fed back to provide reconstructed sample values. Also, the values of the reconstruction (438) can be used for motion-compensated prediction of subsequent pictures.
- The values of the reconstruction (438) can be further filtered. A filtering control (460) determines how to perform deblock filtering and SAO filtering on values of the reconstruction (438), for a given picture of the video signal (405). With the general encoding control (420) and the block hash dictionary (451), the filtering control (460) can make decisions about enabling or disabling SAO filtering, as explained below. The filtering control (460) produces filter control data (462), which is provided to the header formatter/entropy coder (490) and merger/filter(s) (465).
- In the merger/filter(s) (465), the encoder (400) merges content from different tiles into a reconstructed version of the picture. The encoder (400) selectively performs deblock filtering and/or SAO filtering according to the filter control data (462). Other filtering (such as de-ringing filtering or ALF) can alternatively or additionally be applied. Tile boundaries can be selectively filtered or not filtered at all, depending on settings of the encoder (400), and the encoder (400) may provide syntax within the coded bitstream to indicate whether or not such filtering was applied.
- The DPB (470) buffers the reconstructed current picture for use in subsequent motion-compensated prediction. In particular, reference pictures in the RPS can be buffered in the DPB (470). The DPB (470) has limited memory space, however. If the reconstructed current picture is retained in the DPB (470) for use as a reference picture, another picture may be removed from the DPB (470) (and dropped from the RPS). The general encoding control (420) decides which pictures to retain in the RPS and buffer in the DPB (470). Using the block hash dictionary (451), the general encoding control (420) can make decisions about which reference pictures to retain in the RPS, as explained below.
- The header formatter/entropy coder (490) formats and/or entropy codes the general control data (422), quantized transform coefficient data (432), intra prediction data (442), motion data (452) and filter control data (462). For the motion data (452), the header formatter/entropy coder (490) can select and entropy code merge mode index values, or a default MV predictor can be used. In some cases, the header formatter/entropy coder (490) also determines MV differentials for MV values (relative to MV predictors), then entropy codes the MV differentials, e.g., using context-adaptive binary arithmetic coding.
- The header formatter/entropy coder (490) provides the encoded data in the coded video bitstream (495). The format of the coded video bitstream (495) can be a variation or extension of H.265/HEVC format, Windows Media Video format, VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, H.264), or another format.
- Depending on implementation and the type of compression desired, modules of an encoder (400) can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, encoders with different modules and/or other configurations of modules perform one or more of the described techniques. Specific embodiments of encoders typically use a variation or supplemented version of the encoder (400). The relationships shown between modules within the encoder (400) indicate general flows of information in the encoder; other relationships are not shown for the sake of simplicity.
- The approaches described herein for selecting MV precision, selectively disabling SAO filtering and determining which reference pictures to retain in an RPS can be applied when encoding any type of video. In particular, however, these approaches can improve performance when encoding certain artificially-created video content such as screen capture content.
- In general, screen capture content represents the output of a computer screen or other display. FIG. 5 shows a computer desktop environment (510) with content that may provide input for screen capture. For example, video of screen capture content can represent a series of images of the entire computer desktop (511). Or, video of screen capture content can represent a series of images for one of the windows of the computer desktop environment, such as the app window (513) including game content, browser window (512) with Web page content or window (514) with word processor content.
- As computer-generated, artificially-created video content, screen capture content tends to have relatively few discrete sample values, compared to natural video content that is captured using a video camera. For example, a region of screen capture content often includes a single uniform color, whereas a region in natural video content more likely includes colors that gradually vary. Also, screen capture content typically includes distinct structures (e.g., graphics, text characters) that are exactly repeated from frame-to-frame, even if the content may be spatially displaced (e.g., due to scrolling). Screen capture content is usually encoded in a format (e.g., YUV 4:4:4 or RGB 4:4:4) with high chroma sampling resolution, although it may also be encoded in a format with lower chroma sampling resolution (e.g., YUV 4:2:0, YUV 4:2:2).
- FIG. 6 shows composite video (620) that includes natural video content (621) and artificially-created video content. The artificially-created video content includes a graphic (622) beside the natural video content (621) and ticker (623) running below the natural video content (621). Like the screen capture content shown in FIG. 5, the artificially-created video content shown in FIG. 6 tends to have relatively few discrete sample values. It also tends to have distinct structures (e.g., graphics, text characters) that are exactly repeated from frame-to-frame or gradually offset from frame-to-frame (e.g., due to scrolling).
- In various innovations described herein, a video encoder uses the results of hash-based block matching when making decisions about parameters during encoding. This section describes examples of hash-based block matching.
- A. Hash-Based Block Matching.
- When an encoder uses hash-based block matching, the encoder determines a hash value for each of multiple candidate blocks of one or more reference pictures. A hash table stores the hash values for the candidate blocks. The encoder also determines a hash value for a current block by the same hashing approach, and then searches the hash table for a matching hash value. If two blocks are identical, their hash values are the same. Using hash values, an encoder can quickly and efficiently identify candidate blocks that have the same hash value as the current block, and filter out candidate blocks that have different hash values. Depending on implementation and the goals of the hash-based block matching, the encoder may then further evaluate those candidate blocks having the same hash value as the current block. (Different blocks can have the same hash value. So, among the candidate blocks with the same hash value, the encoder can further identify a candidate block that matches the current block.)
- In some example implementations, hash values for candidate blocks are determined from the input sample values for the pictures (reference pictures) that include the candidate blocks. During hash-based block matching, the encoder determines the hash value for a current block using input sample values. The encoder compares it (or otherwise uses the hash value) against the hash values determined from input sample values for candidate blocks. Even so, reconstructed sample values from the matching block are used to represent the current block. Thus, prediction operations still use reconstructed sample values.
- Alternatively, the candidate blocks considered in hash-based block matching include reconstructed sample values. That is, the candidate blocks are part of previously encoded then reconstructed content in a picture. Hash values for the candidate blocks are determined from the reconstructed sample values. During hash-based block matching, the encoder determines the hash value for a current block using input sample values. The encoder compares it (or otherwise uses the hash value) against the hash values determined from reconstructed sample values for candidate blocks.
- FIG. 7 illustrates hash values (700) for candidate blocks B(x, y) in hash-based block matching, where x and y indicate horizontal and vertical coordinates, respectively, for the top-left position of a given candidate block. The candidate blocks have hash values determined using a hash function h( ). For a candidate block B(x, y) in a reference picture, the encoder determines a hash value h(B) for the candidate block from input sample values for the reference picture. The encoder can determine hash values for all candidate blocks in the reference picture. Or, the encoder can screen out some candidate blocks.
- In general, the hash function h( ) yields n possible hash values, designated h_0 to h_{n-1}. For a given hash value, the candidate blocks with that hash value are grouped. For example, in FIG. 7, the candidate blocks B(1266, 263), B(1357, 365), B(1429, 401), B(502, 464), . . . have the hash value h_0. Groups can include different numbers of candidate blocks. For example, in FIG. 7, the group for hash value h_4 includes a single candidate block, while the group for hash value h_0 includes more than four candidate blocks.
- In this way, the possible candidate blocks are distributed into n categories. For example, if the hash function h( ) produces 12-bit hash values, the candidate blocks are split into 2^12 = 4,096 categories. The number of candidate blocks per hash value can be further reduced by eliminating redundant, identical blocks with that hash value, or by screening out candidate blocks having certain patterns of sample values. Also, the encoder can iteratively winnow down the number of candidate blocks using different hash functions.
- The hash function used for hash-based block matching depends on implementation. A hash function can produce hash values with 8 bits, 12 bits, 16 bits, 24 bits, 32 bits, or some other number of bits. If a hash value has fewer bits, the data structure includes fewer categories, but each category may include more candidate blocks. On the other hand, using hash values with more bits tends to increase the size of the data structure that organizes candidate blocks. If a hash value has more bits, the data structure includes more categories, but each category may include fewer candidate blocks. The hash function h( ) can be a cryptographic hash function, part of a cryptographic hash function, cyclic redundancy check (“CRC”) function, part of a CRC, or another hash function (e.g., using averaging and XOR operations to determine the signature of a candidate block or current block). Some types of hash function (e.g., CRC function) map similar blocks to different hash values, which may be efficient when seeking a matching block that exactly corresponds with a current block. Other types of hash function (e.g., locality-sensitive hash function) map similar blocks to the same hash value.
- During hash-based block matching, with the hash function h( ), the encoder determines the hash value for the current block Bcurrent. In FIG. 7, the hash value h(Bcurrent) is h_3. Using the hash value of the current block, the encoder can identify candidate blocks that have the same hash value (shown in outlined box in FIG. 7), and filter out the other candidate blocks. When a hash function maps similar blocks to different hash values, the identified candidate blocks (same hash value as the current block) include blocks that might be identical to the current block. When a hash function maps similar blocks to the same hash value, the identified candidate blocks (same hash value as the current block) include blocks that might be identical to the current block or might be close approximations of the current block. Either way, from these identified candidate blocks, the encoder can further identify a matching block for the current block (e.g., using sample-wise block matching operations, using a second hash function).
- Overall, since hash value comparisons are much simpler than sample-wise block matching, hash-based block matching can make the process of evaluating the candidate blocks in reference picture(s) much more efficient. Also, hash values for candidate blocks can be reused in hash-based block matching for different blocks within a picture during encoding. In this case, the cost of computing the hash values for the candidate blocks can be amortized across hash-based block matching operations for the entire picture, for other pictures that use the same reference picture, and for other encoder-side decisions that use the hash values.
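- The grouping and lookup described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the encoder's actual implementation: the hash function is CRC-32 truncated to 12 bits, the picture is a toy 8×8 array of 8-bit samples, and all function names (get_block, block_hash, build_hash_table, find_matches) are hypothetical.

```python
import zlib
from collections import defaultdict

HASH_BITS = 12  # 2**12 = 4,096 possible hash values (assumed width)


def get_block(picture, x, y, size):
    """Return the size x size block whose top-left sample is at (x, y)."""
    return [row[x:x + size] for row in picture[y:y + size]]


def block_hash(block):
    """CRC-32 of the block's sample values, truncated to HASH_BITS bits."""
    data = bytes(v for row in block for v in row)
    return zlib.crc32(data) & ((1 << HASH_BITS) - 1)


def build_hash_table(picture, size):
    """Group the addresses of all candidate blocks by hash value."""
    table = defaultdict(list)
    height, width = len(picture), len(picture[0])
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            table[block_hash(get_block(picture, x, y, size))].append((x, y))
    return table


def find_matches(table, picture, current, size):
    """Look up the current block's hash value, then confirm sample-wise."""
    candidates = table.get(block_hash(current), [])
    return [(x, y) for (x, y) in candidates
            if get_block(picture, x, y, size) == current]


# Toy 8x8 "reference picture" with 8-bit sample values.
reference = [[(3 * x + 7 * y) % 256 for x in range(8)] for y in range(8)]
table = build_hash_table(reference, size=4)
current = get_block(reference, 2, 3, 4)  # stands in for the current block
matches = find_matches(table, reference, current, size=4)
print(matches)  # [(2, 3)]
```

- Only the candidate blocks that share the current block's hash value are compared sample by sample; all other groups are skipped after a single table lookup.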
- B. Data Structures for Hash-Based Block Matching.
- In some example implementations, the encoder uses a data structure that organizes candidate blocks according to their hash values. The data structure can help make hash-based block matching more computationally efficient. The data structure implements, for example, a block hash dictionary or hash table as described herein.
- FIG. 8a illustrates an example data structure (800) that organizes candidate blocks for hash-based block matching. For the hash function h( ), the n possible hash values are h_0 to h_{n-1}. Candidate blocks with the same hash value are classified in the same candidate block list. A given candidate block list can include zero or more entries. For example, the candidate block list for the hash value h_2 has no entries, the list for the hash value h_6 has two entries, and the list for the hash value h_1 has more than four entries.
- An entry(h_i, k) includes information for the kth candidate block with the hash value h_i. As shown in FIG. 8b, an entry in a candidate block list can include the address of a block B(x, y) (e.g., horizontal and vertical coordinates for the top-left position of the block). Or, as shown in FIG. 8c, an entry in a candidate block list can include the address of a block B(x, y) and a hash value from a second hash function, which can be used for iterative hash-based block matching.
- During hash-based block matching for a current block, the encoder determines the hash value of the current block h(Bcurrent). The encoder retains the candidate block list with the same hash value and rules out the other n−1 lists. To select the matching block, the encoder can compare the current block with the candidate block(s), if any, in the retained candidate block list. Thus, by a simple lookup operation using the hash value h(Bcurrent), the encoder can eliminate (n−1)/n of the candidate blocks (on average), and focus on the remaining 1/n candidate blocks (on average) in the retained list, significantly reducing the number of sample-wise block matching operations.
- Different data structures can be used for different reference pictures. Alternatively, an entry for a candidate block in the data structure stores information indicating the reference picture that includes the candidate block, which can be used in hash-based block matching.
- Also, different data structures can be used for different sizes of blocks. For example, one data structure includes hash values for 8×8 candidate blocks, a second data structure includes hash values for 16×16 candidate blocks, a third data structure includes hash values for 32×32 candidate blocks, and so on. The data structure used during hash-based block matching depends on the size of the current block. Alternatively, a single, unified data structure can be used for different sizes of blocks. A hash function can produce an n-bit hash value, where m bits of the n-bit hash value indicate a hash value among the possible blocks of a given block size according to an m-bit hash function, and the remaining n−m bits of the n-bit hash value indicate the given block size. For example, the first two bits of a 14-bit hash value can indicate a block size, while the remaining 12 bits indicate a hash value according to a 12-bit hash function. Or, a hash function can produce an m-bit hash value regardless of the size of the block, and an entry for a candidate block in the data structure stores information indicating the block size for the candidate block, which can be used in hash-based block matching.
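- The 14-bit example above can be sketched as follows. The 2-bit block-size codes and the use of CRC-32 truncated to 12 bits as the m-bit hash function are assumptions for illustration, as are the function names.

```python
import zlib

SIZE_CODES = {8: 0, 16: 1, 32: 2, 64: 3}  # hypothetical 2-bit block-size codes
M_BITS = 12                                # bits from the m-bit hash function


def unified_hash(samples, block_size):
    """Return a 14-bit value: 2 block-size bits, then a 12-bit hash value."""
    m_bit_hash = zlib.crc32(bytes(samples)) & ((1 << M_BITS) - 1)
    return (SIZE_CODES[block_size] << M_BITS) | m_bit_hash


def split_unified_hash(value):
    """Recover (block-size code, 12-bit hash value) from a unified value."""
    return value >> M_BITS, value & ((1 << M_BITS) - 1)


flat_8x8 = [128] * 64      # uniform 8x8 block, samples in raster order
flat_16x16 = [128] * 256   # uniform 16x16 block
h8 = unified_hash(flat_8x8, 8)
h16 = unified_hash(flat_16x16, 16)
print(split_unified_hash(h8)[0], split_unified_hash(h16)[0])  # 0 1
```

- Because the block-size bits differ, blocks of different sizes never land in the same list of a unified data structure, even when their 12-bit hash values happen to coincide.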
- For a high-resolution picture, the data structure can store information representing a very large number of candidate blocks. To reduce the amount of memory used for the data structure, the encoder can eliminate redundant values. For example, the encoder can skip adding identical blocks to the data structure. In general, reducing the size of the data structure by eliminating identical blocks can hurt coding efficiency. Thus, by deciding whether to eliminate identical blocks, the encoder can trade off memory size for the data structure and coding efficiency. The encoder can also screen out candidate blocks, depending on the content of the blocks.
- C. Iterative Hash-Based Block Matching.
- When the encoder uses a single hash function with n possible hash values, the encoder can rule out n−1 lists of candidate blocks based on the hash value of a current block, but the encoder may still need to perform sample-wise block matching operations for the remaining candidate block(s), if any, for the list with the matching hash value. Also, when updating a data structure that organizes candidate blocks, the encoder may need to perform sample-wise block matching operations to identify identical blocks. Collectively, these sample-wise block matching operations can be computationally intensive.
- Therefore, in some example implementations, the encoder uses iterative hash-based block matching. Iterative hash-based block matching can speed up the block matching process and also speed up the process of updating a data structure that organizes candidate blocks.
- Iterative hash-based block matching uses multiple hash values determined with different hash functions. For a block B (current block or candidate block), in addition to the hash value h(B), the encoder determines another hash value h′(B) using a different hash function h′( ). With the first hash value h(Bcurrent) for a current block, the encoder identifies candidate blocks that have the same hash value for the first hash function h( ). To further rule out some of these identified candidate blocks, the encoder uses a second hash value h′(Bcurrent) for the current block, which is determined using a different hash function. The encoder compares the second hash value h′(Bcurrent) with the second hash values for the previously identified candidate blocks (which have same first hash value), in order to filter out more of the candidate blocks. A hash table tracks hash values for the candidate blocks according to the different hash functions.
- In the example of FIG. 8a, if h(Bcurrent)=h_3, the encoder selects the candidate blocks with entry(3, 0), entry(3, 1), entry(3, 2), entry(3, 3), . . . for further refinement. As shown in FIG. 8c, for a candidate block B, an entry includes a block address and a second hash value h′(B) from the hash function h′( ). The encoder compares the second hash value h′(Bcurrent) for the current block with the second hash values h′(B) for the respective candidate blocks with entry(3, 0), entry(3, 1), entry(3, 2), entry(3, 3), . . . . Based on results of the second hash value comparisons, the encoder can rule out more of the candidate blocks, leaving candidate blocks, if any, that have first and second hash values matching h(Bcurrent) and h′(Bcurrent), respectively. The encoder can perform sample-wise block matching on any remaining candidate blocks to select a matching block.
- FIGS. 9a-9c show another example of iterative hash-based block matching that uses a different data structure. The data structure (900) in FIG. 9a organizes candidate blocks by first hash value from a first hash function h( ), which has n1 possible hash values. The data structure (900) includes lists for hash values from h_0 . . . h_{n1-1}. In the example, the encoder determines a first hash value h(Bcurrent)=h_2 for the current block, and selects the list for h_2 from the structure (900).
- As shown in FIG. 9b, the list (910) for h_2 includes multiple lists that further organize the remaining candidate blocks by second hash value from a second hash function h′( ), which has n2 possible hash values. The list (910) includes lists for hash values from h′_0 . . . h′_{n2-1}, each including entries with block addresses (e.g., horizontal and vertical coordinates for top-left positions of respective candidate blocks), as shown for the entry (920) in FIG. 9c. In the example, the encoder determines a second hash value h′(Bcurrent)=h′_0 for the current block, and selects the list for h′_0 from the list (910). For the candidate blocks in the list for h′_0, the encoder can perform sample-wise block matching to select a matching block. In this example, the lists for the second hash values are specific to a given list for the first hash value. Alternatively, there is one set of lists for the second hash values, and the encoder identifies any candidate blocks that are (1) in the matching list for the first hash values and also (2) in the matching list for the second hash values.
- Aside from hash-based block matching, the second hash function h′( ) can be used to simplify the process of updating a data structure that organizes candidate blocks. For example, when the encoder checks whether a new candidate block is identical to a candidate block already represented in the data structure, the encoder can use multiple hash values with different hash functions to filter out non-identical blocks. For remaining candidate blocks, the encoder can perform sample-wise block matching to identify any identical block.
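- A nested arrangement like that of FIGS. 9a-9c, together with the identical-block check during updating, might be sketched as follows. The class name TwoLevelTable, its methods, and the choice of truncated CRC-32 and Adler-32 as the two hash functions are all assumptions made for illustration; the source does not prescribe particular hash functions.

```python
import zlib
from collections import defaultdict


def h1(samples):
    """First hash function h(): low 12 bits of CRC-32 (an assumption)."""
    return zlib.crc32(bytes(samples)) & 0xFFF


def h2(samples):
    """Second hash function h'(): low 16 bits of Adler-32 (an assumption)."""
    return zlib.adler32(bytes(samples)) & 0xFFFF


class TwoLevelTable:
    """Lists of candidate block addresses keyed by h(), then by h'()."""

    def __init__(self):
        self.lists = defaultdict(lambda: defaultdict(list))
        self.samples_at = {}  # address -> sample values, for sample-wise checks

    def candidates(self, samples):
        """Addresses whose first and second hash values both match."""
        first = self.lists.get(h1(samples), {})
        return first.get(h2(samples), [])

    def find_match(self, samples):
        """Sample-wise check on the candidates that survive both hashes."""
        for address in self.candidates(samples):
            if self.samples_at[address] == samples:
                return address
        return None

    def add(self, address, samples):
        """Add a candidate block unless an identical block is already stored."""
        if self.find_match(samples) is not None:
            return False
        self.lists[h1(samples)][h2(samples)].append(address)
        self.samples_at[address] = samples
        return True


table = TwoLevelTable()
text_glyph = [0, 255, 255, 0] * 4  # flattened 4x4 block
background = [200] * 16
added = [table.add((16, 32), text_glyph),
         table.add((48, 32), background),
         table.add((80, 32), text_glyph)]  # identical to the first block
match = table.find_match(list(text_glyph))
print(added)  # [True, True, False]
print(match)  # (16, 32)
```

- In this sketch the same two-hash filter serves both purposes described above: finding a matching block for a current block, and skipping a new candidate block that is identical to one already stored.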
- In the preceding examples, the iterative hash-based block matching and updating use two different hash functions. Alternatively, the encoder uses three, four or more hash functions to further speed up hash-based block matching or filter out non-identical blocks, and thereby reduce the number of sample-wise block matching operations. Also, for a low-complexity encoder or for faster decision-making processes, the encoder can skip sample-wise block matching operations when hash values match. For hash functions with a large number of possible hash values, there is a high probability that two blocks are identical if hash values for the two blocks match. In particular, in some example implementations of encoder-side decisions described below, the encoder considers, as the results of hash-based block matching, whether hash values match, but does not perform any sample-wise block matching operations.
- This section presents various approaches to selection of motion vector (“MV”) precision during encoding, depending on the results of hash-based block matching (e.g., matching hash values). By selecting appropriate MV precisions during encoding, these approaches can facilitate compression that is effective in terms of rate-distortion performance and/or computational efficiency of encoding and decoding.
- A. Different MV Precisions.
- When encoding artificially-created video content, MV values usually represent integer-sample spatial displacements, and very few MV values represent fractional-sample spatial displacements. This provides opportunities for reducing MV precision to improve overall performance.
- FIG. 10a shows motion compensation with an MV (1020) having an integer-sample spatial displacement. The MV (1020) indicates a spatial displacement of four samples to the left, and one sample up, relative to the co-located position (1010) in a reference picture for a current block. For example, for a 4×4 current block at position (64, 96) in a current picture, the MV (1020) indicates a 4×4 prediction region (1030) whose position is (60, 95) in the reference picture. The prediction region (1030) includes reconstructed sample values at integer-sample positions in the reference picture. An encoder or decoder need not perform interpolation to determine the values of the prediction region (1030).
- FIG. 10b shows motion compensation with an MV (1021) having a fractional-sample spatial displacement. The MV (1021) indicates a spatial displacement of 3.75 samples to the left, and 0.5 samples up, relative to the co-located position (1010) in a reference picture for a current block. For example, for a 4×4 current block at position (64, 96) in a current picture, the MV (1021) indicates a 4×4 prediction region (1031) whose position is (60.25, 95.5) in the reference picture. The prediction region (1031) includes interpolated sample values at fractional-sample positions in the reference picture. An encoder or decoder performs interpolation to determine the sample values of the prediction region (1031). When fractional-sample spatial displacements are allowed, there are more candidate prediction regions that may match a current block, and thus the quality of motion-compensated prediction usually improves, at least for some types of video content (e.g., natural video content).
- B. Representation of MV Values.
- MV values are typically represented using integer values whose meaning depends on MV precision. For integer-sample MV precision, for example, an integer value of 1 indicates a spatial displacement of 1 sample, an integer value of 2 indicates a spatial displacement of 2 samples, and so on. For ¼-sample MV precision, for example, an integer value of 1 indicates a spatial displacement of 0.25 samples. Integer values of 2, 3, 4 and 5 indicate spatial displacements of 0.5, 0.75, 1.0 and 1.25 samples, respectively. Regardless of MV precision, the integer value can indicate a magnitude of the spatial displacement, and a separate flag value can indicate whether the displacement is negative or positive. The horizontal MV component and vertical MV component of a given MV value can be represented using two integer values. Thus, the meaning of two integer values representing an MV value depends on MV precision. For example, for an MV value having a 2-sample horizontal displacement and no vertical displacement, if MV precision is ¼-sample MV precision, the MV value is represented as (8, 0). If MV precision is integer-sample MV precision, however, the MV value is represented as (2, 0).
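- The dependence of the integer representation on MV precision can be illustrated with a short sketch. The function name `mv_to_integers` is hypothetical, and the sketch uses signed integers for simplicity rather than a magnitude plus a separate sign flag.

```python
from fractions import Fraction


def mv_to_integers(dx, dy, precision):
    """Express a displacement (in samples) in integer units of `precision` samples.

    `precision` is 1 for integer-sample MV precision and Fraction(1, 4) for
    quarter-sample MV precision.
    """
    step = Fraction(precision)
    units = []
    for displacement in (dx, dy):
        quotient = Fraction(displacement) / step
        if quotient.denominator != 1:
            raise ValueError("displacement is not representable at this MV precision")
        units.append(int(quotient))
    return tuple(units)


# The 2-sample horizontal displacement from the text:
quarter = mv_to_integers(2, 0, Fraction(1, 4))  # (8, 0)
integer = mv_to_integers(2, 0, 1)               # (2, 0)
```

A displacement of 0.5 samples cannot be represented at integer-sample MV precision, so the sketch raises an error in that case.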
- MV values in a bitstream of encoded video data are typically entropy coded (e.g., on an MV-component-wise basis). An MV value may also be differentially encoded relative to a predicted MV value (e.g., on an MV-component-wise basis). In many cases, the MV value equals the predicted MV value, so the differential MV value is zero, which can be encoded very efficiently. A differential MV value (or MV value, if MV prediction is not used) can be entropy encoded using Exponential-Golomb coding, context-adaptive binary arithmetic coding or another form of entropy coding. Although the exact relationship between MV value (or differential MV value) and encoded bits depends on the form of entropy coding used, in general, smaller values are encoded more efficiently (that is, using fewer bits) because they are more common, and larger values are encoded less efficiently (that is, using more bits) because they are less common.
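- To make the relationship between value magnitude and bit cost concrete, the following sketch uses order-0 Exponential-Golomb coding with a signed-to-unsigned mapping of the kind used for signed syntax elements in H.264 and HEVC. It is only an illustration under that assumption. An actual codec may binarize MV differences differently (for example, with context-adaptive binary arithmetic coding), and the function names are hypothetical.

```python
def signed_to_code_num(value):
    # Map 0, 1, -1, 2, -2, ... to code numbers 0, 1, 2, 3, 4, ...
    return 2 * value - 1 if value > 0 else -2 * value


def exp_golomb_encode(code_num):
    # Order-0 Exp-Golomb: (n - 1) leading zeros followed by the
    # n-bit binary representation of code_num + 1.
    binary = format(code_num + 1, "b")
    return "0" * (len(binary) - 1) + binary


def mv_component_bits(value):
    return len(exp_golomb_encode(signed_to_code_num(value)))


# A 2-sample displacement costs fewer bits at integer-sample precision
# (value 2) than at quarter-sample precision (value 8).
bits_integer = mv_component_bits(2)   # 5 bits
bits_quarter = mv_component_bits(8)   # 9 bits
```

A differential MV component of zero is encoded as the single bit `1`, which is why accurate MV prediction is so effective.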
- C. Adaptive MV Precision—Introduction.
- To summarize the preceding two sections, using MV values with integer-sample MV precision tends to reduce bit rate associated with signaling MV values and reduce computational complexity of encoding and decoding (by avoiding interpolation of sample values at fractional-sample positions in reference pictures), but may reduce the quality of motion-compensated prediction, at least for some types of video content. On the other hand, using MV values with fractional-sample MV precision tends to increase bit rate associated with signaling MV values and increase computational complexity of encoding and decoding (by including interpolation of sample values at fractional-sample positions in reference pictures), but may improve the quality of motion-compensated prediction, at least for some types of video content. In general, computational complexity, bit rate for signaling MV values, and quality of motion-compensated prediction increase as MV precision increases (e.g., from integer-sample to ½-sample, or from ½-sample to ¼-sample), up to a point of diminishing returns.
- When encoding artificially-created video content, the added costs of fractional-sample MV precision (in terms of bit rate and computational complexity) may be unjustified. For example, if most MV values represent integer-sample spatial displacements, and very few MV values represent fractional-sample spatial displacements, the added costs of fractional-sample MV precision are not warranted. The encoder can skip searching at fractional-sample positions (and interpolation operations to determine sample values at those positions) during motion estimation. For such content, bit rate and computational complexity can be reduced, without a significant penalty to the quality of motion-compensated prediction, by using MV values with integer-sample MV precision.
- Since fractional-sample MV precision may still be useful for other types of video content (e.g., natural video captured by camera), an encoder and decoder can be adapted to switch between MV precisions. For example, an encoder and decoder can use integer-sample MV precision for artificially-created video content, but use a fractional-sample MV precision (such as ¼-sample MV precision) for natural video content. Approaches that an encoder may follow when selecting MV precision are described in the next section. The encoder can signal the selected MV precision to the decoder using one or more syntax elements in the bitstream.
- In one approach to signaling MV precision, when adaptive selection of MV precision is enabled, the encoder selects an MV precision on a slice-by-slice basis. A flag value in a sequence parameter set (“SPS”), picture parameter set (“PPS”) or other syntax structure indicates whether adaptive selection of MV precision is enabled. If so, one or more syntax elements in a slice header for a given slice indicate the selected MV precision for blocks of that slice. For example, a flag value of 0 indicates ¼-sample MV precision, and a flag value of 1 indicates integer-sample MV precision.
- In another approach to signaling MV precision, the encoder selects an MV precision on a picture-by-picture basis or slice-by-slice basis. A syntax element in a PPS indicates one of three MV precision modes: (0) ¼-sample MV precision for MV values of slice(s) of a picture associated with the PPS, (1) integer-sample MV precision for MV values of slice(s) of a picture associated with the PPS, or (2) slice-adaptive MV precision depending on a flag value signaled per slice header, where the flag value in the slice header can indicate ¼-sample MV precision or integer-sample MV precision for MV values of the slice.
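- A decoder-side reading of this three-mode scheme might look like the sketch below. The function and constant names are hypothetical, and the polarity of the slice-header flag (0 for ¼-sample, 1 for integer-sample) is an assumption carried over from the slice-by-slice example above; the text does not fix it for this approach.

```python
QUARTER_SAMPLE = "quarter-sample"
INTEGER_SAMPLE = "integer-sample"


def resolve_slice_mv_precision(pps_mode, slice_flag=None):
    """Return the MV precision for a slice given the PPS mode (0, 1 or 2)."""
    if pps_mode == 0:
        return QUARTER_SAMPLE
    if pps_mode == 1:
        return INTEGER_SAMPLE
    if pps_mode == 2:
        # Slice-adaptive mode: the slice header must carry a flag.
        if slice_flag is None:
            raise ValueError("slice-adaptive mode requires a slice-header flag")
        # Assumed polarity: 1 selects integer-sample, 0 selects quarter-sample.
        return INTEGER_SAMPLE if slice_flag else QUARTER_SAMPLE
    raise ValueError("unknown MV precision mode")
```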
- In still another approach to signaling MV precision, when adaptive selection of MV precision is enabled, the encoder selects an MV precision on a CU-by-CU basis. One or more syntax elements in a structure for a given CU indicate the selected MV precision for blocks of that CU. For example, a flag value in a CU syntax structure for a CU indicates whether MV values for all PUs associated with the CU have integer-sample MV precision or ¼-sample MV precision.
- In any of these approaches, the encoder and decoder can use different MV precisions for horizontal and vertical MV components. This can be useful when encoding artificially-created video content that has been scaled horizontally or vertically (e.g., using integer-sample MV precision in an unscaled dimension, and using a fractional-sample MV precision in a scaled dimension). In some example implementations, if rate control cannot be achieved solely through adjustment of QP values, an encoder may resize video horizontally or vertically to reduce bit rate, then encode the resized video. At the decoder side, the video is scaled back to its original dimensions after decoding. The encoder can signal the MV precision for horizontal MV components and also signal the MV precision for vertical MV components to the decoder.
- More generally, when adaptive selection of MV precision is enabled, the encoder selects an MV precision and signals the selected MV precision in some way. For example, a flag value in a SPS, PPS or other syntax structure can indicate whether adaptive selection of MV precision is enabled. When adaptive MV precision is enabled, one or more syntax elements in sequence-layer syntax, GOP-layer syntax, picture-layer syntax, slice-layer syntax, tile-layer syntax, block-layer syntax or another syntax structure can indicate the selected MV precision for horizontal and vertical components of MV values. Or, one or more syntax elements in sequence-layer syntax, GOP-layer syntax, picture-layer syntax, slice-header-layer syntax, slice-data-layer syntax, tile-layer syntax, block-layer syntax or another syntax structure can indicate MV precisions for different MV components. When there are two available MV precisions, a flag value can indicate a selection between the two MV precisions. When there are more than two available MV precisions, an integer value can indicate a selection between those MV precisions.
- Aside from modifications to signal/parse the syntax elements that indicate selected MV precision(s), decoding can be modified to change how signaled MV values are interpreted depending on the selected MV precision. The details of how MV values are encoded and reconstructed can vary depending on MV precision. For example, when the MV precision is integer-sample precision, predicted MV values can be rounded to the nearest integer, and differential MV values can indicate integer-sample offsets. Or, when the MV precision is ¼-sample precision, predicted MV values can be rounded to the nearest ¼-sample offset, and differential MV values can indicate ¼-sample offsets. Or, MV values can be signaled in some other way. When MV values have integer-sample MV precision and the video uses 4:2:2 or 4:2:0 chroma sampling, chroma MV values can be derived by scaling, etc., which may result in ½-sample displacements for chroma. Or, chroma MV values can be rounded to integer values.
- Alternatively, the encoder does not change how MV values are predicted or how MV differences are signaled in the bitstream, nor does the decoder change how MV values are predicted or how MV differences are reconstructed, but the interpretation of reconstructed MV values changes depending on the selected MV precision. If the selected MV precision is integer-sample precision, a reconstructed MV value is scaled by a factor of 4 before being used in a motion compensation process (which operates at quarter-sample precision). If the selected MV precision is quarter-sample precision, the reconstructed MV value is not scaled before being used in the motion compensation process.
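- The scaling step in this alternative can be sketched as follows, assuming a motion compensation process that always works in quarter-sample units. The function names are hypothetical. The second helper reproduces the prediction-region positions from FIGS. 10a and 10b.

```python
def to_quarter_sample_units(mv, integer_precision):
    """Interpret a reconstructed MV for quarter-sample motion compensation."""
    mvx, mvy = mv
    if integer_precision:
        # Integer-sample MV: scale by a factor of 4.
        return (mvx * 4, mvy * 4)
    # Quarter-sample MV: already in the right units.
    return (mvx, mvy)


def prediction_position(block_position, mv_quarter_units):
    """Top-left position of the prediction region in the reference picture."""
    x, y = block_position
    mvx, mvy = mv_quarter_units
    return (x + mvx / 4, y + mvy / 4)


# FIG. 10a: four samples left, one sample up, signaled at integer-sample precision.
mv_a = to_quarter_sample_units((-4, -1), integer_precision=True)    # (-16, -4)
# FIG. 10b: 3.75 samples left, 0.5 samples up, signaled at quarter-sample precision.
mv_b = to_quarter_sample_units((-15, -2), integer_precision=False)  # (-15, -2)
```

With the current block at (64, 96), `prediction_position` gives (60.0, 95.0) for `mv_a` and (60.25, 95.5) for `mv_b`.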
- D. Selecting MV Precision Using Results of Hash-Based Block Matching.
- When MV precision can be adapted during video encoding, an encoder selects an MV precision for a unit of video (e.g., the MV precision for one or both components of MV values for the unit). The encoder can select the MV precision to use depending on the results of hash-based block matching (e.g., matching hash values). The selection of the MV precision can also depend on other factors, such as classification of blocks as natural video content or artificially-created video content. These approaches can provide a computationally-efficient way to select appropriate MV precisions.
- 1. Example Techniques for Selecting MV Precision.
- FIG. 11 shows a generalized technique (1100) for selecting MV precision depending on the results of hash-based block matching. The technique (1100) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- The encoder encodes (1110) video to produce encoded data, then outputs (1120) the encoded data in a bitstream. As part of the encoding (1110), the encoder determines an MV precision for a unit of the video based at least in part on the results of hash-based block matching. The MV precision can apply for one or both components of MV values. The hash-based block matching can use a hash table as described in section VI or use another hash table. For example, if at least a threshold number of blocks of the unit of the video have matching blocks identified in the hash-based block matching (according to matching hash values, without performing sample-wise block matching), the encoder selects integer-sample MV precision. Otherwise, the encoder selects a fractional-sample MV precision.
- FIG. 12 shows a more specific technique (1200) for adapting MV precision during encoding, where MV precision is selected depending on the results of hash-based block matching. The technique (1200) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- According to the technique (1200), during encoding of video, the encoder determines an MV precision from among multiple MV precisions for units of the video. Specifically, when encoding a unit of video, the encoder determines (1210) whether to change MV precision. At the start of encoding, the encoder can initially set the MV precision according to a default value, or proceed as if changing the MV precision. For later units of video, the encoder may use the current MV precision (which was used for one or more previously encoded units) or change the MV precision. For example, the encoder can decide to change MV precision upon the occurrence of a defined event (e.g., after encoding of a threshold number of units, after a scene change, after a determination that the type of video has changed).
- To change the MV precision, the encoder determines (1220) the MV precision for the unit of video based at least in part on the results of hash-based block matching. For example, the encoder splits the unit into multiple blocks. For a given block of the multiple blocks, the encoder determines a hash value, then determines whether there is a match for it among multiple candidate blocks of one or more reference pictures. The encoder can evaluate a single reference picture (e.g., first reference picture in a reference picture list) or multiple reference pictures (e.g., each reference picture in the reference picture list). The match can signify matching hash values between the given block and one of the multiple candidate blocks. (That is, only hash values are checked.) Or, the match can further signify sample-by-sample matching between the given block and the one of the multiple candidate blocks. (That is, sample-wise comparisons confirm the match.) Considering hash values only is faster, but potentially less reliable since hash values from non-identical blocks can match. The hash-based block matching can use a hash table as described in section VI or use another hash table. If at least a threshold number of blocks of the unit have matching blocks identified in the hash-based block matching, the encoder can select integer-sample MV precision. Otherwise, the encoder can select a fractional-sample MV precision (such as quarter-sample MV precision).
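- The hash-only variant of this step can be sketched as follows. The sketch assumes pictures stored as lists of rows of 8-bit sample values, uses CRC-32 as a stand-in hash function, and builds a plain set of hash values for every candidate block position in a single reference picture. The function names, the set-based table and the `min_matches` parameter are illustrative assumptions, not the hash table of section VI.

```python
import zlib


def block_hash(picture, x, y, size):
    # Hash of the size x size block whose top-left sample is at (x, y).
    rows = [bytes(picture[y + r][x:x + size]) for r in range(size)]
    return zlib.crc32(b"".join(rows))


def build_hash_set(reference, size):
    # Hash values for candidate blocks at every integer-sample position.
    height, width = len(reference), len(reference[0])
    return {block_hash(reference, x, y, size)
            for y in range(height - size + 1)
            for x in range(width - size + 1)}


def count_hash_matches(unit, reference_hashes, size):
    # Split the unit into non-overlapping blocks and check hash values only.
    height, width = len(unit), len(unit[0])
    positions = [(x, y)
                 for y in range(0, height - size + 1, size)
                 for x in range(0, width - size + 1, size)]
    matched = sum(block_hash(unit, x, y, size) in reference_hashes
                  for x, y in positions)
    return matched, len(positions)


def select_precision(unit, reference, size=8, min_matches=1):
    matched, _ = count_hash_matches(unit, build_hash_set(reference, size), size)
    return "integer" if matched >= min_matches else "quarter"
```

Because only hash values are compared, a hash collision between non-identical blocks would be counted as a match, which is the reliability tradeoff noted above.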
- Whether or not the MV precision has changed, the encoder encodes (1230) the unit using the selected MV precision. MV values of blocks (e.g., prediction units, macroblocks, or other blocks) within the unit of the video have the selected MV precision. The encoder outputs encoded data for the current unit in a bitstream. The encoded data can include syntax elements that indicate the selected MV precision.
- The encoder decides (1240) whether to continue with the next unit. If so, the encoder decides (1210) whether to change the MV precision for the next unit. Thus, MV precision can be selected for each unit. Or, to reduce complexity, the MV precision for a unit can be changed from time-to-time (e.g., periodically or upon the occurrence of a defined event), then repeated for one or more subsequent units.
- In the techniques (1100, 1200) of FIGS. 11 and 12, the unit of video can be a sequence, series of pictures between scene changes, group of pictures, picture, slice, tile, CU, PU, other block or other type of unit of video. Depending on a desired tradeoff between complexity and flexibility, the encoder can select MV precision on a highly-local basis (e.g., CU-by-CU basis), a larger region-by-region basis (e.g., tile-by-tile basis or slice-by-slice basis), whole picture basis, or more global basis (e.g., per encoding session, per sequence, per GOP, or per series of pictures between detected scene changes).
- In the techniques (1100, 1200) of FIGS. 11 and 12, the encoder can select between using ¼-sample MV precision and integer-sample MV precision. More generally, the encoder selects between multiple available MV precisions, which can include integer-sample MV precision, ½-sample MV precision, ¼-sample MV precision and/or another MV precision. The selected MV precision can apply for horizontal components and/or vertical components of MV values for the unit of video.
- In the techniques (1100, 1200) of FIGS. 11 and 12, the hash-based block matching uses hash values determined from input sample values of the unit and (for candidate blocks) input sample values for one or more reference pictures. Alternatively, for candidate blocks represented in a hash table, the hash-based block matching can use hash values determined from reconstructed sample values.
- In the techniques (1100, 1200) of FIGS. 11 and 12, when determining the MV precision for a unit of video, the encoder can also consider other factors, such as whether non-matched blocks contain a significant amount of natural video content (camera-captured video), as described in the next sections.
- 2. Classifying Non-Matched Blocks.
- This section presents various ways to classify a non-matched block as natural, camera-captured video content or artificially-created video content (such as screen capture content). When determining the MV precision for a unit of video, hash-based block matching may fail to find a matching block for at least some of the blocks of the unit. For a non-matched block among the blocks of the unit, the encoder can classify the non-matched block as containing natural video content or artificially-created video content. By providing a high-probability way to differentiate natural video content from artificially-created video content in non-matched blocks, the encoder can select a more appropriate MV precision.
- FIG. 13 shows characteristics of typical blocks of natural video content and screen capture content, which depict the same general pattern. The block (1310) of natural video content includes gradually changing sample values and irregular lines. In contrast, the block (1320) of artificially-created video content includes sharper lines and patterns of uniform sample values. Also, the number of different color values varies between the block (1310) of natural video content and block (1320) of screen capture content. The block (1320) of screen capture content includes three colors, and the block (1310) of natural video content includes many more different colors.
- FIG. 14 shows a technique (1400) for classifying a block of video depending on a measure of the number of different colors in the block. The technique (1400) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- To start, the encoder measures (1410) the number of different colors in the non-matched block. For example, the encoder counts the distinct colors among sample values in the block. Or, the encoder counts the distinct colors among sample values in the block after clustering of the sample values into fewer colors (e.g., quantizing the sample values such that similar sample values become the same sample value). Or, the encoder measures the number of different colors in the block in some other way. The sample values can be organized as a histogram or organized in some other way.
- The way the encoder measures the number of different colors in the block depends on the color space used. If the color space is YUV (e.g., YCbCr, YCoCg), for example, the encoder can count different Y values in the unit of video. Or, the encoder can count different YUV triplets (that is, distinct combinations of Y, U and V sample values for pixels at locations). If the color space is RGB (or GBR or BGR), the encoder can count sample values in one color component or multiple color components. Or, the encoder can count different triplets (that is, distinct combinations of R, G and B sample values for pixels at locations).
- Then, the encoder compares (1420) the number of different colors in the non-matched block to a threshold count. The value of the threshold count depends on implementation and can be, for example, 5, 8, 10, 20, or 50. The threshold count can be the same for all sizes of units (e.g., regardless of block size). Or, the threshold count can be different for different unit sizes (e.g., different block sizes). The threshold can be pre-defined and static, or the threshold can be adjustable (tunable). In any case, the presence of a small number of discrete sample values in a non-matched block tends to indicate screen capture content, and the presence of a large number of discrete sample values in a non-matched block tends to indicate natural video content.
- If the number of different colors is greater than the threshold, the encoder classifies (1440) the block as natural video content. If the number of different colors is less than the threshold, the encoder classifies (1430) the block as artificially-created video content. The boundary condition (count equals threshold) can be handled using either option, depending on implementation. The encoder repeats the technique (1400) on a block-by-block basis for non-matched blocks of the unit. In some example implementations, when more than a defined proportion of the non-matched blocks of the unit are classified as natural video content, the encoder selects a fractional-sample MV precision, since integer-sample MV precision is primarily useful when encoding artificially-created video content.
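- The classification in the technique (1400) can be sketched as follows. The sketch counts distinct pixel triplets, optionally after a simple clustering step that drops low-order bits. The function names, the bit-shift form of clustering and the choice to classify the boundary case (count equals threshold) as artificially-created content are assumptions made for the example.

```python
def count_colors(block, shift=0):
    """Count distinct colors among pixels given as tuples of sample values.

    A nonzero `shift` quantizes each sample by dropping low-order bits,
    so that similar sample values are counted as the same color.
    """
    return len({tuple(sample >> shift for sample in pixel) for pixel in block})


def classify_block(block, threshold=8, shift=0):
    # More colors than the threshold suggests natural video content.
    if count_colors(block, shift) > threshold:
        return "natural"
    return "artificial"


# An 8x8 block with three colors, like the screen capture block of FIG. 13.
screen_block = [(0, 0, 0)] * 40 + [(255, 255, 255)] * 20 + [(200, 30, 30)] * 4
# An 8x8 block with a smooth gradient, so every pixel is a distinct color.
gradient_block = [(v, v, v) for v in range(64)]
```

Here `classify_block(screen_block)` returns `"artificial"` and `classify_block(gradient_block)` returns `"natural"`. With `shift=4`, the gradient collapses to four colors and is classified as artificial, which shows how strongly the clustering step affects the outcome.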
- Alternatively, the encoder otherwise considers statistics from the collected sample values of a non-matched block. For example, the encoder determines whether the x most common collected sample values account for more than y % of the sample values. The values of x and y depend on implementation. The value of x can be 10 or some other count. The value of y can be 80, 90 or some other percentage less than 100. If the x most common sample values account for more than y % of the sample values in the block, the block is classified as containing artificially-created video content. Otherwise, the block is classified as containing natural video content.
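- This alternative test can be sketched with a histogram of sample values. The function name is hypothetical, and the defaults of x = 10 and y = 80 are taken from the example values in the text.

```python
from collections import Counter


def is_artificial_by_dominance(samples, x=10, y=80):
    """True if the x most common sample values cover more than y % of samples."""
    histogram = Counter(samples)
    covered = sum(count for _, count in histogram.most_common(x))
    # Integer comparison avoids floating-point rounding at the boundary.
    return covered * 100 > y * len(samples)


# 100 samples dominated by two values, plus ten values that occur once each.
mostly_flat = [0] * 50 + [255] * 40 + list(range(1, 11))
# 64 samples that are all different.
all_distinct = list(range(64))
```

For `mostly_flat`, the ten most common values cover 98 of 100 samples, so the block is classified as artificially-created content. For `all_distinct`, they cover only 10 of 64 samples, so the block is classified as natural video content.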
- 3. Example Decision-Making Processes.
- FIG. 15 shows an example technique (1500) for selecting MV precision depending on the results of hash-based block matching and further depending on classification of non-matched blocks. The technique (1500) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- The encoder splits (1510) a unit of video into T blocks. For example, the T blocks are non-overlapped M×N blocks. In some example implementations, the M×N blocks are 8×8 blocks. Alternatively, the M×N blocks have another size.
- The encoder compares (1520) T to a block count threshold. In the example implementations, the block count threshold is 10. Alternatively, the block count threshold has another value. The block count threshold can be pre-defined and static, or the block count threshold can be adjustable (tunable). The block count threshold ensures that the encoder considers a sufficient number of blocks when selecting the MV precision for the unit. If T is less than the block count threshold, the encoder selects (1580) quarter-sample MV precision for the unit. The boundary condition (T equals the block count threshold) can be handled using this option or the other option, depending on implementation.
- If T is greater than the block count threshold, the encoder performs (1530) hash-based block matching for the T blocks of the unit. For each of the T blocks, the encoder calculates a hash value and determines whether there is a candidate block of a reference picture that has an identical hash value. Of the T blocks of the unit, the encoder finds M blocks that have matching blocks (according to matching hash values) in the hash-based block matching. This leaves T−M non-matched blocks.
- The encoder compares (1540) the proportion M/T to a matched block threshold. In the example implementations, the matched block threshold is 25%. Alternatively, the matched block threshold has another value. The matched block threshold can be pre-defined and static, or the matched block threshold can be adjustable (tunable). The matched block threshold ensures that a sufficient number of matched blocks has been found when selecting the MV precision for the unit. If M/T is less than the matched block threshold, the encoder selects (1580) quarter-sample MV precision for the unit. The boundary condition (M/T equals the matched block threshold) can be handled using this option or the other option, depending on implementation. Alternatively, instead of using M/T, the encoder compares some other measure that relates to the number of matched blocks to a threshold.
- If M/T is greater than the matched block threshold, the encoder classifies (1550) each of the T−M non-matched blocks into one of two categories depending on the histogram of color values (number of different colors) in the block. The two categories are (1) natural video content, for blocks more likely to contain camera-captured video content, and (2) artificially-created video content, for blocks more likely to contain screen capture content. Of the T−M non-matched blocks of the unit, the encoder finds C blocks that are classified as natural video content, and S blocks that are classified as artificially-created video content. T=M+C+S.
- For example, for a given non-matched block, the encoder counts the number of different colors contained in the block. When counting the number of different colors, the encoder can count a single color component (e.g., luma, G) or count all of the color components (e.g., luma and chroma; R, G and B). The encoder compares the count to a color threshold, whose value depends on implementation. In the example implementations, the color threshold is 8 for an 8×8 block. Alternatively, the color threshold has another value. The color threshold can be the same for all sizes of blocks. Or, the color threshold can be different for different block sizes. The color threshold can be pre-defined and static, or the color threshold can be adjustable (tunable). If the count is less than the color threshold, the non-matched block is classified as artificially-created video content. If the count is greater than the color threshold, the non-matched block is classified as natural video content. The boundary condition (count equals the color threshold) can be handled using either option, depending on implementation.
- The encoder compares (1560) the proportion C/T to a natural video block threshold. In the example implementations, the natural video block threshold is 3%. Alternatively, the natural video block threshold has another value. The natural video block threshold can be pre-defined and static, or the natural video block threshold can be adjustable (tunable). The natural video block threshold ensures that integer-sample MV precision is not selected if there are too many blocks of natural video content. If C/T is greater than the natural video block threshold, the encoder selects (1580) quarter-sample MV precision for the unit. If C/T is less than the natural video block threshold, the encoder selects (1570) integer-sample MV precision for the unit. The boundary condition (C/T equals the natural video block threshold) can be handled using either option, depending on implementation. Alternatively, instead of using C/T, the encoder compares some other measure that relates to the number of natural video blocks to a threshold.
- Thus, the encoder selects the MV precision based on one or more of: (a) a comparison of a number of the multiple blocks to a blocks threshold, (b) a comparison of a measure of the multiple blocks that have matching blocks from the hash-based block matching to a matched blocks threshold, and (c) a comparison of a measure of the multiple blocks classified as natural video content to a natural video blocks threshold. For example, the encoder selects integer-sample MV precision if: (a) the number of the multiple blocks is greater than the blocks threshold, (b) the measure of the multiple blocks that have matching blocks from the hash-based block matching is greater than the matched blocks threshold, AND (c) the measure of the multiple blocks classified as natural video content is less than the natural video blocks threshold. Otherwise, when any of these conditions (a)-(c) is not satisfied, the encoder selects quarter-sample MV precision. As noted, handling of the boundary conditions depends on implementation.
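- Given the counts T, M and C, the decision logic of the technique (1500) reduces to three comparisons. The sketch below uses the thresholds from the example implementations (10 blocks, 25% and 3%). The function name is hypothetical, and resolving every boundary condition in favor of quarter-sample MV precision is an assumption, since the text leaves boundary handling to the implementation.

```python
def select_mv_precision(total_blocks, matched_blocks, natural_blocks,
                        block_count_threshold=10,
                        matched_block_threshold=0.25,
                        natural_block_threshold=0.03):
    """Return "integer" or "quarter" from the counts T, M and C."""
    # (a) Too few blocks to make a reliable decision.
    if total_blocks <= block_count_threshold:
        return "quarter"
    # (b) Too few blocks with matching hash values.
    if matched_blocks / total_blocks <= matched_block_threshold:
        return "quarter"
    # (c) Too many non-matched blocks that look like natural video.
    if natural_blocks / total_blocks >= natural_block_threshold:
        return "quarter"
    return "integer"
```

For example, a unit with T = 100, M = 60 and C = 1 satisfies all three conditions and gets integer-sample MV precision, while the same unit with C = 5 falls back to quarter-sample MV precision.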
- 4. Alternatives and Variations
- When the encoder uses the same pattern of tiles from picture-to-picture, the encoder can repeat per-tile MV precisions from picture-to-picture. Co-located tiles from picture-to-picture can use the same MV precision. Similarly, co-located slices from picture-to-picture can use the same MV precision. For example, suppose video depicts a computer desktop, and part of the desktop has a window displaying natural video content. A fractional-sample MV precision may be used within that region of the desktop from picture-to-picture, whereas other areas that show text or other rendered content are encoded using integer-sample MV precision.
- The encoder can adjust an amount of bias towards or against integer-sample MV precision based at least in part on a degree of confidence that integer-sample MV precision is appropriate. The encoder can also adjust an amount of bias towards or against integer-sample MV precision based at least in part on target computational complexity of encoding and/or decoding (favoring integer-sample MV precision to reduce computational complexity). For example, the encoder can adjust thresholds used in comparison operations to make it more likely or less likely that integer-sample MV precision is selected.
- The selected MV precision can be for horizontal MV components and/or vertical MV components of the MV values of blocks within the unit of the video, where the horizontal MV components and vertical MV components are permitted to have different MV precisions. Or, the selected MV precision can be for both horizontal MV components and vertical MV components of the MV values of blocks within the unit of the video, where the horizontal MV components and vertical MV components have the same MV precision.
- In most of the preceding examples of selection of MV precision, the encoded video in the bitstream includes one or more syntax elements that indicate the selected MV precision for the unit. A decoder parses the syntax element(s) indicating the selected MV precision and interprets MV values according to the selected MV precision. Alternatively, the encoded video in the bitstream can lack any syntax elements that indicate the selected MV precision. For example, even if the bitstream supports signaling of MV values with a fractional-sample MV precision, the encoder can constrain motion estimation for the unit of the video to use only MV values with fractional parts of zero, and only MV values that indicate integer-sample offsets are used in motion compensation. A decoder reconstructs and applies MV values at the fractional-sample MV precision (where the MV values indicate integer-sample offsets). This may reduce computational complexity of decoding by avoiding interpolation operations.
- This section presents various approaches to selectively disabling sample adaptive offset (“SAO”) filtering depending on the results of hash-based block matching (e.g., matching hash values). By disabling SAO filtering when it is unlikely to be effective, these approaches can facilitate compression that is effective in terms of rate-distortion performance and/or computational efficiency of encoding and decoding.
- A. SAO Filtering.
- SAO filtering involves non-linear filtering operations that can be used, for example, to enhance edge sharpness or suppress banding artifacts or ringing artifacts. Within a region, SAO filtering can be adaptively applied to sample values that satisfy certain conditions, such as presence of a gradient across the sample values.
- According to the H.265/HEVC standard, SAO filtering can be enabled or disabled for a sequence. Specifically, whether SAO filtering is performed for pictures of a sequence can be controlled by a syntax element in the SPS. If sample_adaptive_offset_enabled_flag is 1, SAO filtering may be applied to slices of reconstructed pictures after deblocking filtering. If sample_adaptive_offset_enabled_flag is 0, SAO filtering is not applied.
- According to the H.265/HEVC standard, when enabled for a sequence, SAO filtering can be enabled or disabled on a slice-by-slice basis for luma content of a slice and/or chroma content of the slice. Specifically, two slice segment header flags control SAO filtering for a slice. If slice_sao_luma_flag is 1, SAO filtering is enabled for the luma component of the slice. If slice_sao_luma_flag is 0 (default value, if not present), SAO filtering is disabled for the luma component of the slice. If slice_sao_chroma_flag is 1, SAO filtering is enabled for the chroma component of the slice. If slice_sao_chroma_flag is 0 (default value, if not present), SAO filtering is disabled for the chroma component of the slice.
- Further, according to the H.265/HEVC standard, SAO filtering can be enabled or disabled for CTBs of a CTU in a slice, where a CTU typically includes a luma CTB and corresponding chroma CTBs. For a CTB, a type index (sao_type_idx_luma or sao_type_idx_chroma) indicates whether SAO filtering is disabled, uses band offsets, or uses edge offsets. If the type index is 0, SAO filtering is disabled for the CTB. If the type index is 1, the type of SAO filtering used for the CTB is band offset. Finally, if the type index is 2, the type of SAO filtering used for the CTB is edge offset. In some cases, a CTB can reuse syntax elements from an adjacent CTB to control SAO filtering.
- For band-offset SAO filtering according to the H.265/HEVC standard, the relevant sample value range is split into 32 bands. Sample values in four consecutive bands are modified by adding band offsets. A syntax element indicates the starting position of the bands to be modified, and other syntax elements indicate the band offsets.
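- As a rough illustration of the band-offset idea, the sketch below splits the sample range into 32 equal bands and adds an offset to samples that fall in four consecutive bands. The function and parameter names are hypothetical, the clipping to the valid sample range is an assumption, and the sketch does not model wraparound of bands past the top of the range or the bitstream syntax.

```python
def apply_band_offset(samples, band_position, offsets, bit_depth=8):
    """Add offsets[k] to samples in band (band_position + k), for k = 0..3."""
    shift = bit_depth - 5              # 2**bit_depth values / 32 bands
    max_value = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = (s >> shift) - band_position   # index relative to the first modified band
        if 0 <= k < 4:
            # Assumed behavior: clip the result to the valid sample range.
            s = min(max(s + offsets[k], 0), max_value)
        out.append(s)
    return out


# 8-bit samples: each band covers 8 values, so bands 1..4 cover values 8..39.
filtered = apply_band_offset([0, 8, 16, 40, 255], band_position=1, offsets=[1, 2, 3, 4])
```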
- For edge-offset SAO filtering according to the H.265/HEVC standard, a syntax element (sao_eo_class) indicates whether a horizontal, vertical, 45 degree or 135 degree gradient is used in SAO filtering. Each sample value of a CTB is classified based on relations to its neighbor sample values along the selected gradient (e.g., classified as a flat area, local minimum, edge, or local maximum). For categories other than “flat area,” an offset (indicated by syntax elements in the bitstream) is added to the sample value.
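- The classification step can be sketched for a single row of samples and a horizontal gradient. This is a simplified illustration using the category names from the paragraph above; the helper names are hypothetical, samples on a monotonic ramp are grouped with "flat" here, and the sketch does not reproduce the standard's exact category numbering or clipping.

```python
def sign(d):
    return (d > 0) - (d < 0)


def classify_sample(a, cur, b):
    """Classify cur against its two neighbors a and b along the gradient."""
    s = sign(cur - a) + sign(cur - b)
    if s == -2:
        return "local_minimum"   # smaller than both neighbors
    if s == 2:
        return "local_maximum"   # larger than both neighbors
    if s == 0:
        return "flat"            # no offset applied (also covers monotonic ramps)
    return "edge"                # equal to one neighbor, different from the other


def apply_edge_offsets(row, offsets):
    """Add offsets[category] to each interior sample; end samples are left unchanged."""
    out = list(row)
    for i in range(1, len(row) - 1):
        category = classify_sample(row[i - 1], row[i], row[i + 1])
        out[i] = row[i] + offsets.get(category, 0)
    return out


smoothed = apply_edge_offsets([10, 8, 10, 10, 12, 10],
                              {"local_minimum": 1, "local_maximum": -1})
```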
- SAO filtering can enhance edge sharpness and suppress certain types of artifacts, but it increases the computational complexity of encoding and decoding, and it consumes some bits signaling SAO parameters. When encoding artificially-created video content, the added costs of SAO filtering (in terms of bit rate and computational complexity) may be unjustified. For example, if blocks of a screen content region of a current picture are predicted well using candidate blocks in a reference picture, and the expected quality of the blocks is at least as good as the quality of the candidate blocks in the reference picture, SAO filtering may fail to improve quality. For such content, bit rate and computational complexity can be reduced, without a significant penalty to quality, by disabling SAO filtering.
- B. Selectively Disabling SAO Filtering Using Results of Hash-Based Block Matching.
- FIG. 16 shows a generalized technique (1600) for selectively disabling SAO filtering depending on the results (e.g., matching hash values) of hash-based block matching. The technique (1600) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- The encoder encodes an image or video to produce encoded data, which the encoder outputs as part of a bitstream. During the encoding, the encoder performs (1610) hash-based block matching for a current block of a current picture. The current block can be a CTB of a CTU, or some other block. For example, the encoder determines a hash value for the current block, then attempts to find a match for it among multiple candidate blocks of one or more reference pictures. The encoder can evaluate a single reference picture (e.g., first reference picture in a reference picture list) or multiple reference pictures (e.g., each reference picture in the reference picture list). The match can signify matching hash values between the current block and one of the multiple candidate blocks. (That is, only hash values are checked.) Or, the match can further signify sample-by-sample matching between the current block and the one of the multiple candidate blocks. (That is, sample-wise comparisons confirm the match.) Considering hash values only is faster, but potentially less reliable since hash values from non-identical blocks can match. The hash-based block matching can use a hash table as described in section VI or use another hash table.
- Based on whether a condition is satisfied, the encoder determines (1620) whether to disable SAO filtering for the current block. The condition depends on whether a match is found during the hash-based block matching for the current block (e.g., considering matching hash values, but not sample-wise comparisons). Reconstructed sample values may be different than the input sample values used to determine hash values. Thus, the condition can also depend on other factors, such as expected quality of the current block relative to quality of a candidate block for the match. Alternatively, the condition depends on other and/or additional factors.
- The expected quality of the current block can be indicated by a quantization parameter (“QP”) value that applies for the current block, and the quality of the candidate block can be indicated by a QP value that applies for the candidate block. The QP values can be picture QP values (QP value for the current picture versus QP value for the reference picture that includes the candidate block) or block-level QP values. If the candidate block (which matches the current block) covers parts of blocks that have different QP values, the QP value that applies for the candidate block can be (a) a smallest QP value among the different QP values for the blocks, (b) a QP value of whichever block covers a largest portion of the candidate block, (c) an average QP value among the different QP values for the blocks, (d) a weighted average QP value among the different QP values for the blocks, (e) a largest QP value among the different QP values for the blocks, or (f) some other QP value derived from one or more of the different QP values for the blocks.
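- Options (a) through (e) for deriving a single QP value for a candidate block that straddles several blocks can be sketched as below. The function name, the rule labels and the representation of the overlap as (QP, covered area) pairs are hypothetical, and weighting by covered area is only one possible weighting for option (d).

```python
def candidate_qp(overlaps, rule="smallest"):
    """overlaps: list of (qp, covered_area) pairs for blocks the candidate block covers."""
    qps = [qp for qp, _ in overlaps]
    if rule == "smallest":            # option (a)
        return min(qps)
    if rule == "largest_portion":     # option (b)
        return max(overlaps, key=lambda pair: pair[1])[0]
    if rule == "average":             # option (c)
        return sum(qps) / len(qps)
    if rule == "weighted_average":    # option (d), weighted by covered area (assumption)
        total_area = sum(area for _, area in overlaps)
        return sum(qp * area for qp, area in overlaps) / total_area
    if rule == "largest":             # option (e)
        return max(qps)
    raise ValueError(f"unknown rule: {rule}")


# An 8x8 candidate block covering 48 samples of a QP-30 block and 16 samples of a QP-26 block.
overlaps = [(30, 48), (26, 16)]
```

Choosing the largest QP value is the most conservative option when the candidate block's quality is later compared against the current block's expected quality.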
- In particular, as part of the condition, the encoder can check that the QP value for the current picture is greater than or equal to the QP value for the reference picture that includes the candidate block. Or, as part of the condition, the encoder can check that the QP value that applies for the current block is greater than or equal to the QP value that applies for the candidate block. If the QP value for the current picture is greater than or equal to the QP value for the reference picture, the expected error for the current picture is equivalent to or worse than the expected error for the reference picture. Similarly, if the QP value that applies for the current block is greater than or equal to the QP value that applies for the candidate block, the expected error for the current block is equivalent to or worse than the expected error for the candidate block. Alternatively, instead of checking QP values for the current block and candidate block, the encoder evaluates expected quality of the current block relative to quality of a candidate block for the match in some other way.
- Based on results of the determining (1620), the encoder selectively disables (1630) SAO filtering for the current block. If SAO filtering is not disabled for the current block, the encoder can check one or more other conditions to decide whether to use SAO filtering for the current block and, if SAO filtering is used, determine parameters for SAO filtering for the current block. As part of the SAO determination process, the encoder can evaluate different options for type of SAO filter (edge offset or band offset), gradients, bands, offset values, etc.
- The encoder can repeat the technique (1600) on a block-by-block basis for other blocks of a CTU, a slice or picture.
- FIG. 17 illustrates a more detailed example technique (1700) for selectively disabling SAO filtering depending on the results of hash-based block matching. The technique (1700) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- During encoding, the encoder selectively disables SAO filtering for a current block of a current picture. The encoder performs (1710) hash-based block matching for the current block. For example, the encoder performs hash-based block matching using one of the hash tables described in section VI.
- The encoder checks (1720) if hash-based block matching yields a match (here, matching hash values) for the current block. If the hash-based block matching yields a match, the encoder determines (1730) QP values for the current block and the candidate block (e.g., from picture-level, slice-level and/or CU-level QP values), then determines (1740) whether the candidate block passes a quality check (e.g., reconstruction quality of the candidate block (or reference picture) is not worse than the expected quality of the current block (or current picture)). If both checks (1720, 1740) are passed, the encoder disables (1750) SAO filtering for the current block, bypassing any other SAO filtering checking for the current block (according to one or more other conditions). Otherwise, if either of the two checks (1720, 1740) fails, the encoder performs (1760) SAO filtering checking for the current block. That is, if either of the two checks (1720, 1740) fails, the encoder can still determine whether SAO filtering should or should not be used for the current block (according to one or more other conditions) and, if SAO filtering is used, determine the parameters of SAO filtering for the current block.
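- The two checks can be summarized in a short sketch. It assumes the quality check (1740) is the QP comparison described earlier (current QP greater than or equal to candidate QP); the function and parameter names are hypothetical, and the regular SAO decision process (1760) is not modeled.

```python
def should_disable_sao(hash_match_found, qp_current, qp_candidate):
    """Return True to disable SAO filtering for the current block (step 1750)."""
    if not hash_match_found:
        # Check (1720) failed: fall through to the regular SAO checks (1760).
        return False
    # Check (1740): the candidate block's quality is not worse than the expected
    # quality of the current block when the current QP is at least the candidate QP.
    return qp_current >= qp_candidate


decisions = [
    should_disable_sao(True, 32, 30),   # match found, quality check passes
    should_disable_sao(True, 28, 30),   # match found, candidate has lower quality
    should_disable_sao(False, 32, 30),  # no hash match
]
```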
- The encoder can repeat the technique (1700) on a block-by-block basis for other blocks of a CTU, a slice or picture.
- IX. Determining which Reference Pictures to Retain.
- This section presents various approaches to deciding which reference pictures to retain in a reference picture set (“RPS”) depending on the results of hash-based block matching (e.g., matching hash values). By selecting reference pictures that facilitate effective motion-compensated prediction, these approaches can facilitate compression that is effective in terms of rate-distortion performance.
- A. Reference Picture Sets.
- A reference picture is, in general, a picture that contains samples that may be used for prediction in the decoding process of other pictures, which typically follow the reference picture in decoding order (also called coding order, coded order or decoded order). Multiple reference pictures may be available at a given time for use for motion-compensated prediction.
- In general, an RPS is a set of reference pictures available for use in motion-compensated prediction. For a current picture, for example, an encoder or decoder determines an RPS that includes reference pictures in a decoded frame storage area such as a decoded picture buffer (“DPB”). The size of the RPS can be pre-defined or set according to a syntax element in a bitstream. For example, a syntax element indicates a constraint on the maximum number of reference pictures contained in the RPS. The reference pictures in the RPS may be adjacent in display order (also called temporal order) or separated from each other in display order. Also, a given reference picture in the RPS can precede a current picture in display order or follow the current picture in display order. During encoding and decoding, an RPS is updated—reference pictures in the RPS change from time to time to add newly decoded pictures and drop older pictures that are no longer used as reference pictures.
- According to the H.265/HEVC standard, for a current picture, the RPS is a description of the reference pictures used in the decoding process of the current and future coded pictures. Reference pictures included in the RPS are listed explicitly in the bitstream. Specifically, the RPS includes reference pictures in multiple groups (also called RPS lists). The encoder can determine the RPS once per picture. For a current picture, the encoder determines groups of short-term reference pictures and long-term reference pictures that may be used in inter-picture prediction of the current picture and/or a following picture (in decoding order). Collectively, the groups of reference pictures define the RPS for the current picture. The encoder signals syntax elements in a slice segment header to indicate how the decoder should update the RPS for the current picture.
- According to the H.265/HEVC standard, for the current picture, the decoder determines the RPS after decoding a slice segment header for a slice of the current picture, using syntax elements signaled in the slice header. Reference pictures are identified with picture order count (“POC”) values, parts thereof and/or other information signaled in the bitstream. The decoder determines groups of short-term reference pictures and long-term reference pictures that may be used in inter-picture prediction of the current picture and/or a following picture (in decoding order), which define the RPS for the current picture.
- FIG. 18 shows an example (1800) of updates to reference pictures of an RPS. The RPS includes up to four reference pictures, which are separated from each other in display order in FIG. 18. Alternatively, at least some of the reference pictures in the RPS can be adjacent in display order. In FIG. 18, three of the reference pictures in the RPS precede the current picture in display order, but one reference picture follows the current picture in display order. In FIG. 18, when picture 222 is the current picture, the RPS includes reference pictures 37, 156, 221 and 230. After picture 222 is encoded/decoded, the RPS is updated. Picture 221 is dropped from the RPS, and picture 222 is added to the RPS. Thus, when picture 223 is the current picture, the RPS includes reference pictures 37, 156, 222 and 230.
- In general, a reference picture list (“RPL”) is a list of reference pictures used for motion-compensated prediction. An RPL is constructed from the RPS. According to the H.265/HEVC standard, an RPL is constructed for a slice. Reference pictures in the RPL are addressed with reference indices. During encoding and decoding, when an RPL is constructed, reference pictures in the RPL can change to reflect changes to the RPS and/or to reorder reference pictures within the RPL to make signaling of the more commonly used reference indices more efficient. Typically, an RPL is constructed during encoding and decoding based upon available information about the RPL (e.g., available pictures in the RPS), modifications according to rules and/or modifications signaled in the bitstream.
- The H.265/HEVC standard allows an encoder to decide which pictures are retained in an RPS, but does not define the patterns of reference pictures retained or criteria for retaining reference pictures. An encoder can apply a simple, fixed strategy such as dropping the oldest reference picture in the RPS, but that may result in dropping a useful reference picture. Sophisticated approaches to evaluating which reference pictures to retain can be computationally-intensive.
- B. Updating an RPS Using Results of Hash-Based Block Matching.
- This section describes computationally efficient and effective approaches to deciding which reference pictures to retain in an RPS. The approaches are adapted for encoding of artificially-created video content, but can also be applied for other types of video content.
- FIG. 19 shows a generalized technique (1900) for deciding which reference pictures to retain in an RPS depending on the results (e.g., matching hash values) of hash-based block matching. The technique (1900) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- The encoder encodes (1910) video to produce encoded data and outputs (1920) the encoded data in a bitstream. As part of the encoding (1910), the encoder determines which of multiple reference pictures to retain in an RPS based at least in part on the results of hash-based block matching. For example, the multiple reference pictures include one or more previous reference pictures, which were previously in the RPS for encoding of a current picture, as well as a current reference picture that is a reconstructed version of the current picture. To determine which reference pictures to retain in an RPS, the encoder can use the approach shown in FIG. 20, the approach shown in FIG. 21, or another approach.
- For example, suppose an RPS includes at most four reference pictures. The RPS can include reference pictures picref1, picref2, picref3 and picref4 for encoding of a current picture. When encoding the next picture, the encoder updates the RPS. A reconstructed version of the current picture (piccurrent) can be added to the RPS, in which case one of the reference pictures previously in the RPS is dropped if the capacity of the RPS is exceeded. For example, any four of picref1, picref2, picref3, picref4 and piccurrent can be included in the RPS, and the remaining picture is dropped.
- In the approaches shown in FIGS. 20 and 21, hash values for the hash-based block matching are computed from input sample values for a picture, whether the picture is a next picture, current picture (current reference picture) or previous reference picture. That is, even though the encoder is making decisions about reference pictures, which include reconstructed sample values, the hash values are computed from input sample values for those pictures.
- FIG. 20 shows a first example technique (2000) for deciding which reference pictures to retain in an RPS depending on the results of hash-based block matching. The technique (2000) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- In general, in the approach shown in FIG. 20, if the RPS is already full when the current picture has been encoded, the encoder drops the candidate reference picture that is expected to be least effective in predicting the next picture. The encoder evaluates the candidate reference pictures (current reference picture and previous reference pictures) in succession. For each of the candidate reference pictures, the encoder uses hash-based block matching to estimate how well the candidate reference picture predicts the next picture. After evaluating the candidate reference pictures, the encoder drops the candidate reference picture that is expected to predict the next picture worst. On the other hand, if the RPS is not full, the encoder can simply add the current picture to the RPS as a new reference picture and retain the previous reference pictures. Typically, the approach shown in FIG. 20 retains in the RPS those candidate reference pictures best suited for motion-compensated prediction of the next picture, but the retained reference pictures might not be as useful for motion-compensated prediction of pictures further in the future (e.g., after a scene change).
- As shown in FIG. 20, the encoder adds (2010) the current picture as a candidate reference picture. The encoder checks (2020) whether the RPS, counting the current picture (current reference picture) and previous reference pictures as candidate reference pictures, would be past full. If not, the updated RPS includes the previous reference pictures, if any, and the current picture (current reference picture), and the technique (2000) ends.
- Otherwise (RPS was already at capacity with the previous reference pictures), the encoder determines which candidate reference picture to drop. For a given candidate reference picture, the encoder performs (2030) hash-based block matching between blocks of the next picture and the candidate reference picture. For example, the encoder splits the next picture into M×N blocks (where the M×N blocks can be 8×8 blocks or blocks of some other size), and attempts to find matching hash values for the respective blocks of the next picture and candidate blocks of the candidate reference picture. The encoder counts (2040) blocks of the next picture with matches in the candidate reference picture (e.g., matching hash values from the hash-based block matching, without sample-wise comparisons). A count value count_cand_x indicates how many of the blocks of the next picture have matching blocks in the candidate reference picture.
- The encoder checks (2050) whether to continue with another candidate reference picture. If so, the encoder performs (2030) hash-based block matching between blocks of the next picture and the other candidate reference picture. Thus, the encoder evaluates the previous reference pictures in the RPS (from encoding of the current picture) as well as the current reference picture (after reconstruction of the current picture) as candidate reference pictures. After determining counts of matches for all of the candidate reference pictures, the encoder drops (2060) the candidate reference picture with the lowest count.
- For example, the encoder evaluates picref1, picref2, picref3, picref4 and piccurrent as candidate reference pictures, performing (2030) hash-based block matching for blocks of the next picture. The encoder determines (2040) count values count_cand_ref1, count_cand_ref2, count_cand_ref3, count_cand_ref4 and count_cand_current for the respective candidate reference pictures. The encoder determines which of count_cand_ref1, count_cand_ref2, count_cand_ref3, count_cand_ref4 and count_cand_current is lowest, and drops (2060) the candidate reference picture having the lowest count.
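- The technique (2000) can be sketched as follows, under simplifying assumptions: each picture is represented only by a set of hash values of its candidate blocks, a match means equal hash values (no sample-wise comparison), and ties are broken by list order. The function names and data layout are hypothetical.

```python
def count_matches(block_hashes, candidate_hashes):
    """Count blocks whose hash value appears among a picture's candidate block hashes."""
    return sum(1 for h in block_hashes if h in candidate_hashes)


def update_rps_drop_worst_predictor(rps, current, next_block_hashes,
                                    hashes_by_picture, capacity=4):
    candidates = rps + [current]                      # step 2010
    if len(candidates) <= capacity:                   # step 2020: not past full
        return candidates
    counts = {pic: count_matches(next_block_hashes, hashes_by_picture[pic])
              for pic in candidates}                  # steps 2030-2050
    drop = min(candidates, key=lambda pic: counts[pic])
    return [pic for pic in candidates if pic != drop]  # step 2060


# Made-up hash values for four previous reference pictures and the current picture.
hashes_by_picture = {
    "ref1": {1, 2, 3}, "ref2": {9}, "ref3": {1, 2},
    "ref4": {1, 2, 3, 4}, "cur": {1, 2, 3, 4, 5},
}
new_rps = update_rps_drop_worst_predictor(
    ["ref1", "ref2", "ref3", "ref4"], "cur", [1, 2, 3, 4, 5, 5], hashes_by_picture)
```

In this example "ref2" has no blocks matching the next picture, so it is the one dropped.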
- The encoder can repeat the technique (2000) on a picture-by-picture basis.
- FIG. 21 shows a second example technique (2100) for deciding which reference pictures to retain in an RPS depending on the results of hash-based block matching. The technique (2100) can be performed by an encoder such as one described with reference to FIG. 3 or FIGS. 4a and 4b, or by another encoder.
- In general, in the approach shown in FIG. 21, if the RPS is already full when the current picture has been encoded, the encoder adds the current picture (current reference picture) to the RPS but drops the candidate previous reference picture that is estimated to be most similar to the current picture (current reference picture). This tends to maintain diversity among the reference pictures in the RPS. The encoder evaluates the candidate previous reference pictures in succession. For each of the candidate previous reference pictures (which were in the RPS for encoding of the current picture), the encoder uses hash-based block matching to estimate similarity to the current reference picture. After evaluating the candidate previous reference pictures, the encoder drops the candidate previous reference picture that is estimated to be most similar to the current reference picture. On the other hand, if the RPS is not full, the encoder can simply add the current reference picture to the RPS as a new reference picture and retain the previous reference pictures. In this way, the approach shown in FIG. 21 can retain in the RPS reference pictures that are useful for motion-compensated prediction even if future pictures change significantly (e.g., after a scene change).
- As shown in FIG. 21, the encoder adds (2110) the current picture as a current reference picture. Compared to the next picture to be encoded, the current reference picture tends to have small temporal differences and a high correlation, so the encoder retains it as a reference picture. The encoder checks (2120) whether the RPS, counting the current reference picture and previous reference pictures as candidate reference pictures, would be past full. If not, the new RPS includes the previous reference pictures, if any, and the current reference picture, and the technique (2100) ends.
- Otherwise (RPS was already at capacity with the previous reference pictures), the encoder determines which candidate previous reference picture to drop. For a given candidate previous reference picture, the encoder performs (2130) hash-based block matching between blocks of the current reference picture and the candidate previous reference picture. For example, the encoder splits the current reference picture into M×N blocks (where the M×N blocks can be 8×8 blocks or blocks of some other size), and attempts to find matching hash values for the respective blocks of the current reference picture and candidate blocks of the candidate previous reference picture. The encoder counts (2140) blocks of the current reference picture with matches in the candidate previous reference picture (e.g., matching hash values from the hash-based block matching, without sample-wise comparisons). A count value count_cand_x indicates how many of the blocks of the current reference picture have matching blocks in the candidate previous reference picture.
- The encoder checks (2150) whether to continue with another candidate previous reference picture. If so, the encoder performs (2130) hash-based block matching between blocks of the current reference picture and the other candidate previous reference picture. Thus, the encoder evaluates the previous reference pictures in the RPS (from encoding of the current picture) as candidate reference pictures. After determining counts of matches for all of the candidate previous reference pictures, the encoder drops (2160) the candidate previous reference picture with the highest count.
- For example, the encoder evaluates picref1, picref2, picref3 and picref4 as candidate reference pictures, performing (2130) hash-based block matching for blocks of the current reference picture. The encoder determines (2140) count values count_cand_ref1, count_cand_ref2, count_cand_ref3 and count_cand_ref4 for the respective candidate previous reference pictures. The encoder determines which of count_cand_ref1, count_cand_ref2, count_cand_ref3 and count_cand_ref4 is highest, and drops (2160) the candidate previous reference picture having the highest count.
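- The technique (2100) can be sketched in the same style. Here the current reference picture is always retained, and the previous reference picture with the most blocks matching the current reference picture is dropped. As before, pictures are represented only by hash values, a match means equal hash values, ties are broken by list order, and all names are hypothetical.

```python
def update_rps_keep_diverse(rps, current, current_block_hashes,
                            hashes_by_picture, capacity=4):
    if len(rps) + 1 <= capacity:                      # step 2120: not past full
        return rps + [current]
    # Steps 2130-2150: count blocks of the current reference picture that have
    # matching hash values in each candidate previous reference picture.
    counts = {
        pic: sum(1 for h in current_block_hashes if h in hashes_by_picture[pic])
        for pic in rps
    }
    drop = max(rps, key=lambda pic: counts[pic])      # most similar to the current picture
    return [pic for pic in rps if pic != drop] + [current]   # step 2160


# Made-up hash values for the candidate blocks of four previous reference pictures.
previous_hashes = {"ref1": {1, 2, 3, 4}, "ref2": {9}, "ref3": {1}, "ref4": {1, 2}}
diverse_rps = update_rps_keep_diverse(
    ["ref1", "ref2", "ref3", "ref4"], "cur", [1, 2, 3, 4], previous_hashes)
```

In this example "ref1" duplicates the current reference picture most closely, so it is dropped, while the dissimilar "ref2" is kept.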
- The encoder can repeat the technique (2100) on a picture-by-picture basis.
- In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.
Claims (22)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/080481 WO2015196322A1 (en) | 2014-06-23 | 2014-06-23 | Encoder decisions based on results of hash-based block matching |
Publications (2)
Publication Number | Publication Date |
---|---|
US20170163999A1 (en) | 2017-06-08 |
US10681372B2 (en) | 2020-06-09 |
Family
ID=54936402
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/321,536 Active US10681372B2 (en) | 2014-06-23 | 2014-06-23 | Encoder decisions based on results of hash-based block matching |
Country Status (5)
Country | Link |
---|---|
US (1) | US10681372B2 (en) |
EP (2) | EP3598758B1 (en) |
KR (1) | KR102287779B1 (en) |
CN (1) | CN105706450B (en) |
WO (1) | WO2015196322A1 (en) |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160261882A1 (en) * | 2015-03-06 | 2016-09-08 | Qualcomm Incorporated | Method and apparatus for low complexity quarter pel generation in motion search |
US9875552B1 (en) * | 2016-07-26 | 2018-01-23 | Teradici Corporation | Content independent method of motion determination using sparse matrices |
US10291928B2 (en) * | 2017-01-10 | 2019-05-14 | Blackberry Limited | Methods and devices for inter-prediction using motion vectors for video coding |
US10368092B2 (en) | 2014-03-04 | 2019-07-30 | Microsoft Technology Licensing, Llc | Encoder-side decisions for block flipping and skip mode in intra block copy prediction |
US10390039B2 (en) | 2016-08-31 | 2019-08-20 | Microsoft Technology Licensing, Llc | Motion estimation for screen remoting scenarios |
US20190349593A1 (en) * | 2017-02-02 | 2019-11-14 | Hewlett-Packard Development Company, L.P. | Video compression |
US10567754B2 (en) | 2014-03-04 | 2020-02-18 | Microsoft Technology Licensing, Llc | Hash table construction and availability checking for hash-based block matching |
US10672113B2 (en) * | 2016-11-09 | 2020-06-02 | AI Analysis, Inc. | Methods and systems for normalizing images |
US10681372B2 (en) | 2014-06-23 | 2020-06-09 | Microsoft Technology Licensing, Llc | Encoder decisions based on results of hash-based block matching |
US20200195959A1 (en) * | 2018-06-29 | 2020-06-18 | Beijing Bytedance Network Technology Co., Ltd. | Concept of using one or multiple look up tables to store motion information of previously coded in order and use them to code following blocks |
US20200413044A1 (en) | 2018-09-12 | 2020-12-31 | Beijing Bytedance Network Technology Co., Ltd. | Conditions for starting checking hmvp candidates depend on total number minus k |
US11025923B2 (en) | 2014-09-30 | 2021-06-01 | Microsoft Technology Licensing, Llc | Hash-based encoder decisions for video coding |
US11076171B2 (en) | 2013-10-25 | 2021-07-27 | Microsoft Technology Licensing, Llc | Representing blocks with hash values in video and image coding and decoding |
US11095877B2 (en) | 2016-11-30 | 2021-08-17 | Microsoft Technology Licensing, Llc | Local hash-based motion estimation for screen remoting scenarios |
US11134267B2 (en) | 2018-06-29 | 2021-09-28 | Beijing Bytedance Network Technology Co., Ltd. | Update of look up table: FIFO, constrained FIFO |
US11134244B2 (en) | 2018-07-02 | 2021-09-28 | Beijing Bytedance Network Technology Co., Ltd. | Order of rounding and pruning in LAMVR |
US11140385B2 (en) | 2018-06-29 | 2021-10-05 | Beijing Bytedance Network Technology Co., Ltd. | Checking order of motion candidates in LUT |
US11140383B2 (en) | 2019-01-13 | 2021-10-05 | Beijing Bytedance Network Technology Co., Ltd. | Interaction between look up table and shared merge list |
US11146785B2 (en) | 2018-06-29 | 2021-10-12 | Beijing Bytedance Network Technology Co., Ltd. | Selection of coded motion information for LUT updating |
US20210329291A1 (en) * | 2019-01-02 | 2021-10-21 | Beijing Bytedance Network Technology Co., Ltd. | Hash-based motion searching |
US11159807B2 (en) | 2018-06-29 | 2021-10-26 | Beijing Bytedance Network Technology Co., Ltd. | Number of motion candidates in a look up table to be checked according to mode |
US11159817B2 (en) | 2018-06-29 | 2021-10-26 | Beijing Bytedance Network Technology Co., Ltd. | Conditions for updating LUTS |
US11202085B1 (en) | 2020-06-12 | 2021-12-14 | Microsoft Technology Licensing, Llc | Low-cost hash table construction and hash-based block matching for variable-size blocks |
US11290714B2 (en) * | 2018-11-15 | 2022-03-29 | Korea Electronics Technology Institute | Motion-constrained AV1 encoding method and apparatus for tiled streaming |
US11343494B2 (en) * | 2018-06-13 | 2022-05-24 | Huawei Technologies Co., Ltd. | Intra sharpening and/or de-ringing filter for video coding |
US11528501B2 (en) | 2018-06-29 | 2022-12-13 | Beijing Bytedance Network Technology Co., Ltd. | Interaction between LUT and AMVP |
US11528500B2 (en) | 2018-06-29 | 2022-12-13 | Beijing Bytedance Network Technology Co., Ltd. | Partial/full pruning when adding a HMVP candidate to merge/AMVP |
US11546617B2 (en) * | 2020-06-30 | 2023-01-03 | At&T Mobility Ii Llc | Separation of graphics from natural video in streaming video content |
US11589071B2 (en) | 2019-01-10 | 2023-02-21 | Beijing Bytedance Network Technology Co., Ltd. | Invoke of LUT updating |
US11641483B2 (en) | 2019-03-22 | 2023-05-02 | Beijing Bytedance Network Technology Co., Ltd. | Interaction between merge list construction and other tools |
US20230196623A1 (en) * | 2021-12-22 | 2023-06-22 | Red Hat, Inc. | Content-based encoding of digital images |
US20230209064A1 (en) * | 2021-12-23 | 2023-06-29 | Ati Technologies Ulc | Identifying long term reference frame using scene detection and perceptual hashing |
US11956464B2 (en) | 2019-01-16 | 2024-04-09 | Beijing Bytedance Network Technology Co., Ltd | Inserting order of motion candidates in LUT |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018111131A1 (en) * | 2016-12-15 | 2018-06-21 | Huawei Technologies Co., Ltd | Intra sharpening and/or de-ringing filter for video coding based on a bitstream flag |
JP7233819B2 (en) | 2017-11-09 | 2023-03-07 | ソニーグループ株式会社 | Image processing device and image processing method |
CN115842912A (en) | 2018-08-04 | 2023-03-24 | 抖音视界有限公司 | Interaction between different decoder-side motion vector derivation modes |
KR20200081328A (en) | 2018-12-27 | 2020-07-07 | 인텔렉추얼디스커버리 주식회사 | Video encoding/decoding method and apparatus |
JP7194651B2 (en) | 2019-07-12 | 2022-12-22 | 信越化学工業株式会社 | COMPOSITION FOR FORMING RESIST UNDERLAYER FILM, PATTERN FORMING METHOD AND POLYMER |
CN114365490B (en) | 2019-09-09 | 2024-06-18 | 北京字节跳动网络技术有限公司 | Coefficient scaling for high precision image and video codecs |
CN114731392A (en) | 2019-09-21 | 2022-07-08 | 北京字节跳动网络技术有限公司 | High precision transform and quantization for image and video coding |
Citations (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5613004A (en) * | 1995-06-07 | 1997-03-18 | The Dice Company | Steganographic method and device |
US6904110B2 (en) * | 1997-07-31 | 2005-06-07 | Francois Trans | Channel equalization system and method |
US20050166040A1 (en) * | 2002-12-02 | 2005-07-28 | Walmsley Simon R. | Embedding data and information related to function with which data is associated into a payload |
US6983020B2 (en) * | 2002-03-25 | 2006-01-03 | Citrix Online Llc | Method and apparatus for fast block motion detection |
US20060224594A1 (en) * | 2005-04-04 | 2006-10-05 | Oracle International Corporation | Methods and systems for identifying highly contended blocks in a database |
US7216232B1 (en) * | 1999-04-20 | 2007-05-08 | Nec Corporation | Method and device for inserting and authenticating a digital signature in digital data |
US7430670B1 (en) * | 1999-07-29 | 2008-09-30 | Intertrust Technologies Corp. | Software self-defense systems and methods |
US7702127B2 (en) * | 2005-10-21 | 2010-04-20 | Microsoft Corporation | Video fingerprinting using complexity-regularized video watermarking by statistics quantization |
US7747584B1 (en) * | 2006-08-22 | 2010-06-29 | Netapp, Inc. | System and method for enabling de-duplication in a storage system architecture |
US20100268836A1 (en) * | 2009-03-16 | 2010-10-21 | Dilithium Holdings, Inc. | Method and apparatus for delivery of adapted media |
US7949186B2 (en) * | 2006-03-15 | 2011-05-24 | Massachusetts Institute Of Technology | Pyramid match kernel and related techniques |
US20110128810A1 (en) * | 2008-06-30 | 2011-06-02 | Fujitsu Semiconductor Limited | Memory device and memory control for controlling the same |
US20110243234A1 (en) * | 2008-10-02 | 2011-10-06 | Sony Corporation | Image processing apparatus and method |
US8041677B2 (en) * | 2005-10-12 | 2011-10-18 | Datacastle Corporation | Method and system for data backup |
US20110311042A1 (en) * | 2008-10-23 | 2011-12-22 | University Of Ulster | Encryption method |
US8086052B2 (en) * | 2003-05-20 | 2011-12-27 | Peter Toth | Hybrid video compression method |
US8099415B2 (en) * | 2006-09-08 | 2012-01-17 | Simply Hired, Inc. | Method and apparatus for assessing similarity between online job listings |
US8099601B2 (en) * | 1999-06-08 | 2012-01-17 | Intertrust Technologies Corp. | Methods and systems for encoding and protecting data using digital signature and watermarking techniques |
US20120170653A1 (en) * | 2010-12-30 | 2012-07-05 | General Instrument Corporation | Block based sampling coding systems |
US20130013618A1 (en) * | 2008-06-06 | 2013-01-10 | Chrysalis Storage, Llc | Method of reducing redundancy between two or more datasets |
US20130036289A1 (en) * | 2010-09-30 | 2013-02-07 | Nec Corporation | Storage system |
US20130148721A1 (en) * | 2011-12-07 | 2013-06-13 | Cisco Technology, Inc. | Reference Frame Management for Screen Content Video Coding Using Hash or Checksum Functions |
US8515123B2 (en) * | 2008-07-03 | 2013-08-20 | Verimatrix, Inc. | Efficient watermarking approaches of compressed media |
US20130243089A1 (en) * | 2010-02-17 | 2013-09-19 | Electronics And Telecommunications Research Institute | Device for encoding ultra-high definition image and method thereof, and decoding device and method thereof |
US20130266078A1 (en) * | 2010-12-01 | 2013-10-10 | Vrije Universiteit Brussel | Method and device for correlation channel estimation |
US20130268621A1 (en) * | 2012-04-08 | 2013-10-10 | Broadcom Corporation | Transmission of video utilizing static content information from video source |
US20130266073A1 (en) * | 2012-04-08 | 2013-10-10 | Broadcom Corporation | Power saving techniques for wireless delivery of video |
US20130272394A1 (en) * | 2012-04-12 | 2013-10-17 | Activevideo Networks, Inc | Graphical Application Integration with MPEG Objects |
US20130279564A1 (en) * | 2012-04-20 | 2013-10-24 | Qualcomm Incorporated | Video coding with enhanced support for stream adaptation and splicing |
US20140010294A1 (en) * | 2012-07-09 | 2014-01-09 | Vid Scale, Inc. | Codec architecture for multiple layer video coding |
US20140092994A1 (en) * | 2012-09-28 | 2014-04-03 | Qualcomm Incorporated | Supplemental enhancement information message coding |
Family Cites Families (175)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2239538A (en) | 1939-03-30 | 1941-04-22 | Zeiss Carl Fa | Photographic teleobjective |
US2718173A (en) | 1950-09-26 | 1955-09-20 | Cycloptic Anstalt Fur Optik Un | High intensity five component photographic objective |
US3059528A (en) | 1957-07-02 | 1962-10-23 | Allan Ted | Panoramic motion picture camera |
US3142236A (en) | 1961-03-08 | 1964-07-28 | American Optical Corp | Cameras and high speed optical system therefor |
CH486707A (en) | 1968-06-14 | 1970-02-28 | Voigtlaender Ag | A bright lens made up of at least four elements of the extended triplet type standing in the air |
US5016980A (en) | 1985-11-12 | 1991-05-21 | Waldron Robert D | Systems for deviating and (optionally) converging radiation |
US4918583A (en) | 1988-04-25 | 1990-04-17 | Nikon Corporation | Illuminating optical device |
US5565921A (en) * | 1993-03-16 | 1996-10-15 | Olympus Optical Co., Ltd. | Motion-adaptive image signal processing system |
US5610841A (en) | 1993-09-30 | 1997-03-11 | Matsushita Electric Industrial Co., Ltd. | Video server |
US5850312A (en) | 1993-10-22 | 1998-12-15 | Olympus Optical Co., Ltd. | Three-unit zoom lens |
JP3580869B2 (en) | 1994-09-13 | 2004-10-27 | オリンパス株式会社 | Stereoscopic endoscope |
US5774271A (en) | 1996-07-29 | 1998-06-30 | Welch Allyn, Inc. | Lamp assembly |
JP3869895B2 (en) | 1996-12-27 | 2007-01-17 | キヤノン株式会社 | Optical system with anti-vibration function |
US7206346B2 (en) | 1997-06-25 | 2007-04-17 | Nippon Telegraph And Telephone Corporation | Motion vector predictive encoding method, motion vector decoding method, predictive encoding apparatus and decoding apparatus, and storage media storing motion vector predictive encoding and decoding programs |
US6879266B1 (en) | 1997-08-08 | 2005-04-12 | Quickshift, Inc. | Memory module including scalable embedded parallel data compression and decompression engines |
JPH1166301A (en) | 1997-08-15 | 1999-03-09 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for classifying color image and record medium recorded with this method |
US6895048B2 (en) | 1998-03-20 | 2005-05-17 | International Business Machines Corporation | Adaptive encoding of a sequence of still frames or partially still frames within motion video |
US6487440B2 (en) | 1998-07-08 | 2002-11-26 | Lifespex, Inc. | Optical probe having and methods for difuse and uniform light irradiation |
US6332092B1 (en) | 1998-07-08 | 2001-12-18 | Lifespex, Incorporated | Optical probe having and methods for uniform light irradiation and/or light collection over a volume |
US6400764B1 (en) | 1999-04-06 | 2002-06-04 | Koninklijke Philips Electronics N. V. | Motion estimation method featuring orthogonal-sum concurrent multi matching |
US6671407B1 (en) | 1999-10-19 | 2003-12-30 | Microsoft Corporation | System and method for hashing digital images |
JP2001228401A (en) | 2000-02-16 | 2001-08-24 | Canon Inc | Projection optical system, projection aligner by this projection optical system and method for manufacturing device |
GB0006153D0 (en) | 2000-03-14 | 2000-05-03 | Inpharmatica Ltd | Database |
CA2304433A1 (en) | 2000-04-05 | 2001-10-05 | Cloakware Corporation | General purpose access recovery scheme |
GB2364459B (en) * | 2000-06-30 | 2004-03-31 | Nokia Mobile Phones Ltd | Video error resilience |
US6938128B1 (en) | 2000-07-20 | 2005-08-30 | Silicon Graphics, Inc. | System and method for reducing memory latency during read requests |
US6915387B1 (en) | 2000-07-20 | 2005-07-05 | Silicon Graphics, Inc. | System and method for handling updates to memory in a distributed shared memory system |
US6920175B2 (en) | 2001-01-03 | 2005-07-19 | Nokia Corporation | Video coding architecture and methods for using same |
US6765963B2 (en) | 2001-01-03 | 2004-07-20 | Nokia Corporation | Video decoder architecture and method for using same |
GB2375673A (en) | 2001-05-14 | 2002-11-20 | Salgen Systems Ltd | Image compression method using a table of hash values corresponding to motion vectors |
WO2003009277A2 (en) | 2001-07-20 | 2003-01-30 | Gracenote, Inc. | Automatic identification of sound recordings |
DE10158658A1 (en) | 2001-11-30 | 2003-06-12 | Bosch Gmbh Robert | Method for directional prediction of an image block |
US6819322B2 (en) | 2002-01-04 | 2004-11-16 | Hewlett-Packard Development Company, L.P. | Method and apparatus for detecting potential lock-up conditions in a video graphics controller |
CA2574126A1 (en) * | 2002-01-18 | 2003-07-31 | Kabushiki Kaisha Toshiba | Video encoding method and apparatus and video decoding method and apparatus |
US6894289B2 (en) | 2002-02-22 | 2005-05-17 | Xenogen Corporation | Fluorescence illumination assembly for an imaging apparatus |
US6922246B2 (en) | 2002-02-22 | 2005-07-26 | Xenogen Corporation | Bottom fluorescence illumination assembly for an imaging apparatus |
JP4151374B2 (en) | 2002-03-29 | 2008-09-17 | セイコーエプソン株式会社 | Moving picture coding apparatus and moving picture coding method |
US7400774B2 (en) | 2002-09-06 | 2008-07-15 | The Regents Of The University Of California | Encoding and decoding of digital data using cues derivable at a decoder |
US20040174570A1 (en) | 2002-12-02 | 2004-09-09 | Plunkett Richard Thomas | Variable size dither matrix usage |
US7792121B2 (en) | 2003-01-03 | 2010-09-07 | Microsoft Corporation | Frame protocol and scheduling system |
JP4499370B2 (en) | 2003-04-04 | 2010-07-07 | オリンパス株式会社 | Imaging optical system and imaging apparatus using the same |
DE10316428A1 (en) | 2003-04-08 | 2004-10-21 | Carl Zeiss Smt Ag | Catadioptric reduction lens |
US8264489B2 (en) | 2003-07-11 | 2012-09-11 | Intel Corporation | Interface remoting |
US7609763B2 (en) | 2003-07-18 | 2009-10-27 | Microsoft Corporation | Advanced bi-directional predictive coding of video frames |
US20050060643A1 (en) | 2003-08-25 | 2005-03-17 | Miavia, Inc. | Document similarity detection and classification system |
US7349583B2 (en) | 2003-09-05 | 2008-03-25 | The Regents Of The University Of California | Global motion estimation image coding and processing |
EP2270447A1 (en) | 2003-10-27 | 2011-01-05 | The General Hospital Corporation | Method and apparatus for performing optical imaging using frequency-domain interferometry |
US20050105621A1 (en) | 2003-11-04 | 2005-05-19 | Ju Chi-Cheng | Apparatus capable of performing both block-matching motion compensation and global motion compensation and method thereof |
EP1542053B1 (en) | 2003-12-11 | 2007-11-14 | Tokendo | Measuring device for video-endoscopic probe |
US20040133548A1 (en) | 2003-12-15 | 2004-07-08 | Alex Fielding | Electronic Files Digital Rights Management. |
US7095568B2 (en) | 2003-12-19 | 2006-08-22 | Victor Company Of Japan, Limited | Image display apparatus |
KR100995398B1 (en) | 2004-01-20 | 2010-11-19 | 삼성전자주식회사 | Global motion compensated deinterlacing method considering horizontal and vertical patterns |
CN100440170C (en) | 2004-05-26 | 2008-12-03 | 英特尔公司 | Automatic caching generation in network applications |
US7672005B1 (en) | 2004-06-30 | 2010-03-02 | Teradici Corporation | Methods and apparatus for scan block caching |
US20060062303A1 (en) | 2004-09-17 | 2006-03-23 | Sony Corporation | Hybrid global motion estimator for video encoding |
US7526607B1 (en) | 2004-09-23 | 2009-04-28 | Juniper Networks, Inc. | Network acceleration and long-distance pattern detection using improved caching and disk mapping |
AU2005295331A1 (en) | 2004-10-15 | 2006-04-27 | The Regents Of The University Of Colorado, A Body Corporate | Revocable biometrics with robust distance metrics |
JP2006265087A (en) | 2004-12-13 | 2006-10-05 | Ohara Inc | Preform for optical element |
US20060153295A1 (en) | 2005-01-12 | 2006-07-13 | Nokia Corporation | Method and system for inter-layer prediction mode coding in scalable video coding |
KR100716999B1 (en) | 2005-06-03 | 2007-05-10 | 삼성전자주식회사 | Method for intra prediction using the symmetry of video, method and apparatus for encoding and decoding video using the same |
CN100484233C (en) * | 2005-06-03 | 2009-04-29 | 中国科学院研究生院 | Safety certification device for digital TV signal, and TV equipment with the device |
US20070025442A1 (en) | 2005-07-28 | 2007-02-01 | Sanyo Electric Co., Ltd. | Coding method for coding moving images |
US8787460B1 (en) | 2005-07-28 | 2014-07-22 | Teradici Corporation | Method and apparatus for motion vector estimation for an image sequence |
US8107527B1 (en) | 2005-07-28 | 2012-01-31 | Teradici Corporation | Progressive block encoding using region analysis |
KR101211665B1 (en) | 2005-08-12 | 2012-12-12 | 삼성전자주식회사 | Method and apparatus for intra prediction encoding and decoding of image |
JP4815944B2 (en) | 2005-08-19 | 2011-11-16 | 富士ゼロックス株式会社 | Hologram recording method and apparatus |
JP2007066191A (en) | 2005-09-01 | 2007-03-15 | Toshiba Corp | Device and method of reproduction |
FR2891685B1 (en) * | 2005-10-03 | 2008-04-18 | Envivio France Entpr Uniperson | METHOD AND DEVICE FOR MULTIPLE REFERENCE MOTION ESTIMATING, METHOD AND DEVICE FOR ENCODING, COMPUTER PROGRAM PRODUCTS, AND CORRESPONDING STORAGE MEANS. |
RU2298226C1 (en) | 2005-10-28 | 2007-04-27 | Самсунг Электроникс Ко., Лтд. | Method for improving digital images |
GB2431798A (en) | 2005-10-31 | 2007-05-02 | Sony Uk Ltd | Motion vector selection based on integrity |
US7986844B2 (en) | 2005-11-22 | 2011-07-26 | Intel Corporation | Optimized video compression using hashing function |
US20070199011A1 (en) | 2006-02-17 | 2007-08-23 | Sony Corporation | System and method for high quality AVC encoding |
US20070217702A1 (en) * | 2006-03-14 | 2007-09-20 | Sung Chih-Ta S | Method and apparatus for decoding digital video stream |
KR100763917B1 (en) | 2006-06-21 | 2007-10-05 | 삼성전자주식회사 | The method and apparatus for fast motion estimation |
US7636824B1 (en) | 2006-06-28 | 2009-12-22 | Acronis Inc. | System and method for efficient backup using hashes |
DE102006045565B3 (en) | 2006-08-04 | 2008-06-26 | Leica Camera Ag | Wide-angle viewfinder on rangefinder cameras for photographing with different focal lengths |
GB0618057D0 (en) | 2006-09-14 | 2006-10-25 | Perkinelmer Ltd | Improvements in and relating to scanning confocal microscopy |
US8443398B2 (en) | 2006-11-01 | 2013-05-14 | Skyfire Labs, Inc. | Architecture for delivery of video content responsive to remote interaction |
US8320683B2 (en) | 2007-02-13 | 2012-11-27 | Sharp Kabushiki Kaisha | Image processing method, image processing apparatus, image reading apparatus, and image forming apparatus |
US20080212687A1 (en) | 2007-03-02 | 2008-09-04 | Sony Corporation And Sony Electronics Inc. | High accurate subspace extension of phase correlation for global motion estimation |
US8494234B1 (en) | 2007-03-07 | 2013-07-23 | MotionDSP, Inc. | Video hashing system and method |
US8817878B2 (en) | 2007-11-07 | 2014-08-26 | Broadcom Corporation | Method and system for motion estimation around a fixed reference vector using a pivot-pixel approach |
KR101365444B1 (en) | 2007-11-19 | 2014-02-21 | 삼성전자주식회사 | Method and apparatus for encoding/decoding moving image efficiently through adjusting a resolution of image |
CN101904173B (en) | 2007-12-21 | 2013-04-03 | 艾利森电话股份有限公司 | Improved pixel prediction for video coding |
US8213515B2 (en) * | 2008-01-11 | 2012-07-03 | Texas Instruments Incorporated | Interpolated skip mode decision in video compression |
KR101446771B1 (en) | 2008-01-30 | 2014-10-06 | 삼성전자주식회사 | Apparatus of encoding image and apparatus of decoding image |
WO2009102013A1 (en) | 2008-02-14 | 2009-08-20 | Nec Corporation | Motion vector detection device |
JP2009230537A (en) | 2008-03-24 | 2009-10-08 | Olympus Corp | Image processor, image processing program, image processing method, and electronic equipment |
US8295617B2 (en) | 2008-05-19 | 2012-10-23 | Citrix Systems, Inc. | Systems and methods for enhanced image encoding |
GB2460844B (en) | 2008-06-10 | 2012-06-06 | Half Minute Media Ltd | Automatic detection of repeating video sequences |
US9235577B2 (en) | 2008-09-04 | 2016-01-12 | Vmware, Inc. | File transfer using standard blocks and standard-block identifiers |
US8213503B2 (en) | 2008-09-05 | 2012-07-03 | Microsoft Corporation | Skip modes for inter-layer residual video coding and decoding |
US20100119170A1 (en) | 2008-11-07 | 2010-05-13 | Yahoo! Inc. | Image compression by comparison to large database |
US20100166073A1 (en) | 2008-12-31 | 2010-07-01 | Advanced Micro Devices, Inc. | Multiple-Candidate Motion Estimation With Advanced Spatial Filtering of Differential Motion Vectors |
US8599929B2 (en) | 2009-01-09 | 2013-12-03 | Sungkyunkwan University Foundation For Corporate Collaboration | Distributed video decoder and distributed video decoding method |
WO2010086548A1 (en) | 2009-01-28 | 2010-08-05 | France Telecom | Method and device for encoding an image, method and device for decoding and corresponding computer programmes |
WO2010085899A1 (en) | 2009-02-02 | 2010-08-05 | Calgary Scientific Inc. | Image data transmission |
CN102308579B (en) | 2009-02-03 | 2017-06-06 | 汤姆森特许公司 | The method and apparatus of the motion compensation of the gradable middle use smooth reference frame of locating depth |
US7868792B2 (en) | 2009-02-05 | 2011-01-11 | Polytechnic Institute Of New York University | Generating a boundary hash-based hierarchical data structure associated with a plurality of known arbitrary-length bit strings and using the generated hierarchical data structure for detecting whether an arbitrary-length bit string input matches one of a plurality of known arbitrary-length bit springs |
US8724707B2 (en) | 2009-05-07 | 2014-05-13 | Qualcomm Incorporated | Video decoding using temporally constrained spatial dependency |
US9113169B2 (en) | 2009-05-07 | 2015-08-18 | Qualcomm Incorporated | Video encoding with temporally constrained spatial dependency for localized decoding |
US8355585B2 (en) | 2009-05-12 | 2013-01-15 | Red Hat Israel, Ltd. | Data compression of images using a shared dictionary |
US8694547B2 (en) | 2009-07-07 | 2014-04-08 | Palo Alto Research Center Incorporated | System and method for dynamic state-space abstractions in external-memory and parallel graph search |
JP2011024066A (en) * | 2009-07-17 | 2011-02-03 | Sony Corp | Image processing apparatus and method |
KR101712097B1 (en) | 2009-08-19 | 2017-03-03 | 삼성전자 주식회사 | Method and apparatus for encoding and decoding image based on flexible orthogonal transform |
US8345750B2 (en) | 2009-09-02 | 2013-01-01 | Sony Computer Entertainment Inc. | Scene change detection |
US8411750B2 (en) | 2009-10-30 | 2013-04-02 | Qualcomm Incorporated | Global motion parameter estimation using block-based motion vectors |
US8633838B2 (en) | 2010-01-15 | 2014-01-21 | Neverfail Group Limited | Method and apparatus for compression and network transport of data in support of continuous availability of applications |
AU2011207444A1 (en) | 2010-01-22 | 2012-08-09 | Duke University | Multiple window processing schemes for spectroscopic optical coherence tomography (OCT) and fourier domain low coherence interferometry |
US9237355B2 (en) | 2010-02-19 | 2016-01-12 | Qualcomm Incorporated | Adaptive motion resolution for video coding |
EP2365456B1 (en) | 2010-03-11 | 2016-07-20 | CompuGroup Medical SE | Data structure, method and system for predicting medical conditions |
US8442942B2 (en) | 2010-03-25 | 2013-05-14 | Andrew C. Leppard | Combining hash-based duplication with sub-block differencing to deduplicate data |
US8619857B2 (en) | 2010-04-09 | 2013-12-31 | Sharp Laboratories Of America, Inc. | Methods and systems for intra prediction |
EP2559239A2 (en) | 2010-04-13 | 2013-02-20 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus for intra predicting a block, apparatus for reconstructing a block of a picture, apparatus for reconstructing a block of a picture by intra prediction |
EP2559238B1 (en) * | 2010-04-13 | 2015-06-03 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Adaptive image filtering method and apparatus |
KR20110123651A (en) | 2010-05-07 | 2011-11-15 | 한국전자통신연구원 | Apparatus and method for image coding and decoding using skip coding |
US9140888B2 (en) | 2010-06-01 | 2015-09-22 | Hoya Corporation | Objective lens for endoscope, and endoscope |
US8417039B2 (en) | 2010-06-03 | 2013-04-09 | Microsoft Corporation | Motion detection techniques for improved image remoting |
CN101866366B (en) | 2010-07-15 | 2012-01-18 | 哈尔滨工业大学 | Image formula Chinese document retrieval method based on content |
GB2483294B (en) | 2010-09-03 | 2013-01-02 | Canon Kk | Method and device for motion estimation of video data coded according to a scalable coding structure |
EP3962088B1 (en) | 2010-11-04 | 2023-06-21 | GE Video Compression, LLC | Picture coding supporting block merging and skip mode |
KR20120095610A (en) * | 2011-02-21 | 2012-08-29 | 삼성전자주식회사 | Method and apparatus for encoding and decoding multi-view video |
CN103416064A (en) | 2011-03-18 | 2013-11-27 | 索尼公司 | Image-processing device, image-processing method, and program |
JP6061150B2 (en) | 2011-03-18 | 2017-01-18 | ソニー株式会社 | Image processing apparatus, image processing method, and program |
US8480743B2 (en) | 2011-03-25 | 2013-07-09 | Vicente Vanaclocha Vanaclocha | Universal disc prosthesis |
US8582886B2 (en) | 2011-05-19 | 2013-11-12 | Microsoft Corporation | Compression of text contents for display remoting |
CN103563374B (en) | 2011-05-27 | 2017-02-08 | 索尼公司 | Image-processing device and method |
US9167020B2 (en) | 2011-06-10 | 2015-10-20 | Microsoft Technology Licensing, Llc | Web-browser based desktop and application remoting solution |
US8644620B1 (en) | 2011-06-21 | 2014-02-04 | Google Inc. | Processing of matching regions in a stream of screen images |
US9521418B2 (en) | 2011-07-22 | 2016-12-13 | Qualcomm Incorporated | Slice header three-dimensional video extension for slice header prediction |
US11496760B2 (en) | 2011-07-22 | 2022-11-08 | Qualcomm Incorporated | Slice header prediction for depth maps in three-dimensional video codecs |
JP5651560B2 (en) | 2011-09-07 | 2015-01-14 | 日本放送協会 | Motion vector prediction apparatus, encoding apparatus, decoding apparatus, and programs thereof |
US10031636B2 (en) | 2011-09-08 | 2018-07-24 | Microsoft Technology Licensing, Llc | Remoting desktop displays using move regions |
US9351808B2 (en) | 2011-09-27 | 2016-05-31 | Sharon M. E. McCarthy | Apparatus for removing dental appliance and dental system |
GB2495301B (en) | 2011-09-30 | 2018-01-17 | Advanced Risc Mach Ltd | Method of and apparatus for encoding data |
US9357235B2 (en) | 2011-10-13 | 2016-05-31 | Qualcomm Incorporated | Sample adaptive offset merged with adaptive loop filter in video coding |
US9609217B2 (en) | 2011-11-02 | 2017-03-28 | Mediatek Inc. | Image-based motion sensor and related multi-purpose camera system |
US9332271B2 (en) | 2011-11-03 | 2016-05-03 | Cisco Technology, Inc. | Utilizing a search scheme for screen content video coding |
GB201119206D0 (en) | 2011-11-07 | 2011-12-21 | Canon Kk | Method and device for providing compensation offsets for a set of reconstructed samples of an image |
EP2781091B1 (en) | 2011-11-18 | 2020-04-08 | GE Video Compression, LLC | Multi-view coding with efficient residual handling |
KR101874100B1 (en) | 2011-12-02 | 2018-07-04 | 삼성전자주식회사 | Method and apparatus for encoding and decoding image |
US9223534B1 (en) | 2011-12-30 | 2015-12-29 | hopTo Inc. | Client side detection of motion vectors for cross-platform display |
WO2013103376A1 (en) | 2012-01-05 | 2013-07-11 | Intel Corporation | Device, system and method of video encoding |
US9235313B2 (en) | 2012-01-11 | 2016-01-12 | Google Inc. | Efficient motion estimation for remote desktop sharing |
US9380320B2 (en) | 2012-02-10 | 2016-06-28 | Broadcom Corporation | Frequency domain sample adaptive offset (SAO) |
US20130258052A1 (en) | 2012-03-28 | 2013-10-03 | Qualcomm Incorporated | Inter-view residual prediction in 3d video coding |
US9286862B2 (en) | 2012-04-09 | 2016-03-15 | Oracle International Corporation | System and method for detecting a scrolling event during a client display update |
US20130271565A1 (en) | 2012-04-16 | 2013-10-17 | Qualcomm Incorporated | View synthesis based on asymmetric texture and depth resolutions |
AU2012202352A1 (en) | 2012-04-20 | 2013-11-07 | Canon Kabushiki Kaisha | Method, system and apparatus for determining a hash code representing a portion of an image |
US9549180B2 (en) * | 2012-04-20 | 2017-01-17 | Qualcomm Incorporated | Disparity vector generation for inter-view prediction for video coding |
US9479776B2 (en) | 2012-07-02 | 2016-10-25 | Qualcomm Incorporated | Signaling of long-term reference pictures for video coding |
US9264713B2 (en) | 2012-07-11 | 2016-02-16 | Qualcomm Incorporated | Rotation of prediction residual blocks in video coding with transform skipping |
US9277237B2 (en) | 2012-07-30 | 2016-03-01 | Vmware, Inc. | User interface remoting through video encoding techniques |
US9467692B2 (en) | 2012-08-31 | 2016-10-11 | Qualcomm Incorporated | Intra prediction improvements for scalable video coding |
CN103841426B (en) | 2012-10-08 | 2017-04-26 | 华为技术有限公司 | Method and device for setting up motion vector list for motion vector predication |
US9225979B1 (en) | 2013-01-30 | 2015-12-29 | Google Inc. | Remote access encoding |
US11317123B2 (en) | 2013-04-25 | 2022-04-26 | Vmware, Inc. | Systems and methods for using pre-calculated block hashes for image block matching |
CN104142939B (en) | 2013-05-07 | 2019-07-02 | 杭州智棱科技有限公司 | A kind of method and apparatus based on body dynamics information matching characteristic code |
CN103281538B (en) | 2013-06-06 | 2016-01-13 | 上海交通大学 | Based on the inner frame coding method of rolling Hash and block rank infra-frame prediction |
US9210434B2 (en) | 2013-06-12 | 2015-12-08 | Microsoft Technology Licensing, Llc | Screen map and standards-based progressive codec for screen content coding |
US20140369413A1 (en) | 2013-06-18 | 2014-12-18 | Vmware, Inc. | Systems and methods for compressing video data using image block matching |
US10812694B2 (en) | 2013-08-21 | 2020-10-20 | Faro Technologies, Inc. | Real-time inspection guidance of triangulation scanner |
US20150063451A1 (en) | 2013-09-05 | 2015-03-05 | Microsoft Corporation | Universal Screen Content Codec |
JP6212345B2 (en) | 2013-10-02 | 2017-10-11 | ルネサスエレクトロニクス株式会社 | Video encoding apparatus and operation method thereof |
CN105684409B (en) | 2013-10-25 | 2019-08-13 | 微软技术许可有限责任公司 | Each piece is indicated using hashed value in video and image coding and decoding |
WO2015058395A1 (en) | 2013-10-25 | 2015-04-30 | Microsoft Technology Licensing, Llc | Hash-based block matching in video and image coding |
KR102185245B1 (en) | 2014-03-04 | 2020-12-01 | 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 | Hash table construction and availability checking for hash-based block matching |
WO2015131326A1 (en) | 2014-03-04 | 2015-09-11 | Microsoft Technology Licensing, Llc | Encoder-side decisions for block flipping and skip mode in intra block copy prediction |
US10136140B2 (en) | 2014-03-17 | 2018-11-20 | Microsoft Technology Licensing, Llc | Encoder-side decisions for screen content encoding |
EP3598758B1 (en) | 2014-06-23 | 2021-02-17 | Microsoft Technology Licensing, LLC | Encoder decisions based on results of hash-based block matching |
CN106797458B (en) | 2014-07-31 | 2019-03-08 | 惠普发展公司,有限责任合伙企业 | Virtual changes to a real object |
AU2014408223B2 (en) | 2014-09-30 | 2019-12-05 | Microsoft Technology Licensing, Llc | Hash-based encoder decisions for video coding |
CN104574440A (en) | 2014-12-30 | 2015-04-29 | 安科智慧城市技术(中国)有限公司 | Video movement target tracking method and device |
US10390039B2 (en) | 2016-08-31 | 2019-08-20 | Microsoft Technology Licensing, Llc | Motion estimation for screen remoting scenarios |
US11095877B2 (en) | 2016-11-30 | 2021-08-17 | Microsoft Technology Licensing, Llc | Local hash-based motion estimation for screen remoting scenarios |
2014
- 2014-06-23 EP EP19182387.1A patent/EP3598758B1/en active Active
- 2014-06-23 KR KR1020177002065A patent/KR102287779B1/en active IP Right Grant
- 2014-06-23 WO PCT/CN2014/080481 patent/WO2015196322A1/en active Application Filing
- 2014-06-23 EP EP14895767.3A patent/EP3158751B1/en active Active
- 2014-06-23 US US15/321,536 patent/US10681372B2/en active Active
- 2014-06-23 CN CN201480048046.9A patent/CN105706450B/en active Active
Patent Citations (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7870393B2 (en) * | 1995-06-07 | 2011-01-11 | Wistaria Trading, Inc. | Steganographic method and device |
US5687236A (en) * | 1995-06-07 | 1997-11-11 | The Dice Company | Steganographic method and device |
US5613004A (en) * | 1995-06-07 | 1997-03-18 | The Dice Company | Steganographic method and device |
US7761712B2 (en) * | 1995-06-07 | 2010-07-20 | Wistaria Trading, Inc. | Steganographic method and device |
US6904110B2 (en) * | 1997-07-31 | 2005-06-07 | Francois Trans | Channel equalization system and method |
US7216232B1 (en) * | 1999-04-20 | 2007-05-08 | Nec Corporation | Method and device for inserting and authenticating a digital signature in digital data |
US8099601B2 (en) * | 1999-06-08 | 2012-01-17 | Intertrust Technologies Corp. | Methods and systems for encoding and protecting data using digital signature and watermarking techniques |
US7430670B1 (en) * | 1999-07-29 | 2008-09-30 | Intertrust Technologies Corp. | Software self-defense systems and methods |
US6983020B2 (en) * | 2002-03-25 | 2006-01-03 | Citrix Online Llc | Method and apparatus for fast block motion detection |
US20050166040A1 (en) * | 2002-12-02 | 2005-07-28 | Walmsley Simon R. | Embedding data and information related to function with which data is associated into a payload |
US8086052B2 (en) * | 2003-05-20 | 2011-12-27 | Peter Toth | Hybrid video compression method |
US20060224594A1 (en) * | 2005-04-04 | 2006-10-05 | Oracle International Corporation | Methods and systems for identifying highly contended blocks in a database |
US8041677B2 (en) * | 2005-10-12 | 2011-10-18 | Datacastle Corporation | Method and system for data backup |
US7912244B2 (en) * | 2005-10-21 | 2011-03-22 | Microsoft Corporation | Video fingerprinting using watermarks |
US7702127B2 (en) * | 2005-10-21 | 2010-04-20 | Microsoft Corporation | Video fingerprinting using complexity-regularized video watermarking by statistics quantization |
US7949186B2 (en) * | 2006-03-15 | 2011-05-24 | Massachusetts Institute Of Technology | Pyramid match kernel and related techniques |
US7747584B1 (en) * | 2006-08-22 | 2010-06-29 | Netapp, Inc. | System and method for enabling de-duplication in a storage system architecture |
US8099415B2 (en) * | 2006-09-08 | 2012-01-17 | Simply Hired, Inc. | Method and apparatus for assessing similarity between online job listings |
US20130013618A1 (en) * | 2008-06-06 | 2013-01-10 | Chrysalis Storage, Llc | Method of reducing redundancy between two or more datasets |
US20110128810A1 (en) * | 2008-06-30 | 2011-06-02 | Fujitsu Semiconductor Limited | Memory device and memory control for controlling the same |
US8515123B2 (en) * | 2008-07-03 | 2013-08-20 | Verimatrix, Inc. | Efficient watermarking approaches of compressed media |
US20110243234A1 (en) * | 2008-10-02 | 2011-10-06 | Sony Corporation | Image processing apparatus and method |
US20110311042A1 (en) * | 2008-10-23 | 2011-12-22 | University Of Ulster | Encryption method |
US20100268836A1 (en) * | 2009-03-16 | 2010-10-21 | Dilithium Holdings, Inc. | Method and apparatus for delivery of adapted media |
US20130243089A1 (en) * | 2010-02-17 | 2013-09-19 | Electronics And Telecommunications Research Institute | Device for encoding ultra-high definition image and method thereof, and decoding device and method thereof |
US20130036289A1 (en) * | 2010-09-30 | 2013-02-07 | Nec Corporation | Storage system |
US20130266078A1 (en) * | 2010-12-01 | 2013-10-10 | Vrije Universiteit Brussel | Method and device for correlation channel estimation |
US20120170653A1 (en) * | 2010-12-30 | 2012-07-05 | General Instrument Corporation | Block based sampling coding systems |
US20130148721A1 (en) * | 2011-12-07 | 2013-06-13 | Cisco Technology, Inc. | Reference Frame Management for Screen Content Video Coding Using Hash or Checksum Functions |
US20130268621A1 (en) * | 2012-04-08 | 2013-10-10 | Broadcom Corporation | Transmission of video utilizing static content information from video source |
US20130266073A1 (en) * | 2012-04-08 | 2013-10-10 | Broadcom Corporation | Power saving techniques for wireless delivery of video |
US20130272394A1 (en) * | 2012-04-12 | 2013-10-17 | Activevideo Networks, Inc | Graphical Application Integration with MPEG Objects |
US20130279564A1 (en) * | 2012-04-20 | 2013-10-24 | Qualcomm Incorporated | Video coding with enhanced support for stream adaptation and splicing |
US20140010294A1 (en) * | 2012-07-09 | 2014-01-09 | Vid Scale, Inc. | Codec architecture for multiple layer video coding |
US20140092994A1 (en) * | 2012-09-28 | 2014-04-03 | Qualcomm Incorporated | Supplemental enhancement information message coding |
Non-Patent Citations (2)
Title |
---|
ITU-T, "SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS Infrastructure of audiovisual services - Coding of moving video - High efficiency video coding" (04/2013) * |
Cited By (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11076171B2 (en) | 2013-10-25 | 2021-07-27 | Microsoft Technology Licensing, Llc | Representing blocks with hash values in video and image coding and decoding |
US10368092B2 (en) | 2014-03-04 | 2019-07-30 | Microsoft Technology Licensing, Llc | Encoder-side decisions for block flipping and skip mode in intra block copy prediction |
US10567754B2 (en) | 2014-03-04 | 2020-02-18 | Microsoft Technology Licensing, Llc | Hash table construction and availability checking for hash-based block matching |
US10681372B2 (en) | 2014-06-23 | 2020-06-09 | Microsoft Technology Licensing, Llc | Encoder decisions based on results of hash-based block matching |
US11025923B2 (en) | 2014-09-30 | 2021-06-01 | Microsoft Technology Licensing, Llc | Hash-based encoder decisions for video coding |
US10291932B2 (en) * | 2015-03-06 | 2019-05-14 | Qualcomm Incorporated | Method and apparatus for low complexity quarter pel generation in motion search |
US20160261882A1 (en) * | 2015-03-06 | 2016-09-08 | Qualcomm Incorporated | Method and apparatus for low complexity quarter pel generation in motion search |
US9875552B1 (en) * | 2016-07-26 | 2018-01-23 | Teradici Corporation | Content independent method of motion determination using sparse matrices |
US10390039B2 (en) | 2016-08-31 | 2019-08-20 | Microsoft Technology Licensing, Llc | Motion estimation for screen remoting scenarios |
US10672113B2 (en) * | 2016-11-09 | 2020-06-02 | AI Analysis, Inc. | Methods and systems for normalizing images |
US11095877B2 (en) | 2016-11-30 | 2021-08-17 | Microsoft Technology Licensing, Llc | Local hash-based motion estimation for screen remoting scenarios |
US10291928B2 (en) * | 2017-01-10 | 2019-05-14 | Blackberry Limited | Methods and devices for inter-prediction using motion vectors for video coding |
US20190349593A1 (en) * | 2017-02-02 | 2019-11-14 | Hewlett-Packard Development Company, L.P. | Video compression |
US11134253B2 (en) * | 2017-02-02 | 2021-09-28 | Hewlett-Packard Development Company, L.P. | Video compression |
US11343494B2 (en) * | 2018-06-13 | 2022-05-24 | Huawei Technologies Co., Ltd. | Intra sharpening and/or de-ringing filter for video coding |
US11146786B2 (en) | 2018-06-20 | 2021-10-12 | Beijing Bytedance Network Technology Co., Ltd. | Checking order of motion candidates in LUT |
US11159817B2 (en) | 2018-06-29 | 2021-10-26 | Beijing Bytedance Network Technology Co., Ltd. | Conditions for updating LUTS |
US11159807B2 (en) | 2018-06-29 | 2021-10-26 | Beijing Bytedance Network Technology Co., Ltd. | Number of motion candidates in a look up table to be checked according to mode |
US11706406B2 (en) | 2018-06-29 | 2023-07-18 | Beijing Bytedance Network Technology Co., Ltd | Selection of coded motion information for LUT updating |
US11134267B2 (en) | 2018-06-29 | 2021-09-28 | Beijing Bytedance Network Technology Co., Ltd. | Update of look up table: FIFO, constrained FIFO |
US11140385B2 (en) | 2018-06-29 | 2021-10-05 | Beijing Bytedance Network Technology Co., Ltd. | Checking order of motion candidates in LUT |
US12058364B2 (en) * | 2018-06-29 | 2024-08-06 | Beijing Bytedance Network Technology Co., Ltd. | Concept of using one or multiple look up tables to store motion information of previously coded in order and use them to code following blocks |
US11877002B2 (en) | 2018-06-29 | 2024-01-16 | Beijing Bytedance Network Technology Co., Ltd | Update of look up table: FIFO, constrained FIFO |
US11146785B2 (en) | 2018-06-29 | 2021-10-12 | Beijing Bytedance Network Technology Co., Ltd. | Selection of coded motion information for LUT updating |
US11895318B2 (en) * | 2018-06-29 | 2024-02-06 | Beijing Bytedance Network Technology Co., Ltd | Concept of using one or multiple look up tables to store motion information of previously coded in order and use them to code following blocks |
US11153557B2 (en) | 2018-06-29 | 2021-10-19 | Beijing Bytedance Network Technology Co., Ltd. | Which LUT to be updated or no updating |
US11909989B2 (en) | 2018-06-29 | 2024-02-20 | Beijing Bytedance Network Technology Co., Ltd | Number of motion candidates in a look up table to be checked according to mode |
US11973971B2 (en) | 2018-06-29 | 2024-04-30 | Beijing Bytedance Network Technology Co., Ltd | Conditions for updating LUTs |
US11528500B2 (en) | 2018-06-29 | 2022-12-13 | Beijing Bytedance Network Technology Co., Ltd. | Partial/full pruning when adding a HMVP candidate to merge/AMVP |
US11695921B2 (en) | 2018-06-29 | 2023-07-04 | Beijing Bytedance Network Technology Co., Ltd | Selection of coded motion information for LUT updating |
US11528501B2 (en) | 2018-06-29 | 2022-12-13 | Beijing Bytedance Network Technology Co., Ltd. | Interaction between LUT and AMVP |
US12034914B2 (en) | 2018-06-29 | 2024-07-09 | Beijing Bytedance Network Technology Co., Ltd | Checking order of motion candidates in lut |
US11245892B2 (en) | 2018-06-29 | 2022-02-08 | Beijing Bytedance Network Technology Co., Ltd. | Checking order of motion candidates in LUT |
US20200195959A1 (en) * | 2018-06-29 | 2020-06-18 | Beijing Bytedance Network Technology Co., Ltd. | Concept of using one or multiple look up tables to store motion information of previously coded in order and use them to code following blocks |
US11463685B2 (en) | 2018-07-02 | 2022-10-04 | Beijing Bytedance Network Technology Co., Ltd. | LUTS with intra prediction modes and intra mode prediction from non-adjacent blocks |
US11134244B2 (en) | 2018-07-02 | 2021-09-28 | Beijing Bytedance Network Technology Co., Ltd. | Order of rounding and pruning in LAMVR |
US11153559B2 (en) | 2018-07-02 | 2021-10-19 | Beijing Bytedance Network Technology Co., Ltd. | Usage of LUTs |
US11153558B2 (en) | 2018-07-02 | 2021-10-19 | Beijing Bytedance Network Technology Co., Ltd. | Update of look-up tables |
US11134243B2 (en) | 2018-07-02 | 2021-09-28 | Beijing Bytedance Network Technology Co., Ltd. | Rules on updating luts |
US20200413044A1 (en) | 2018-09-12 | 2020-12-31 | Beijing Bytedance Network Technology Co., Ltd. | Conditions for starting checking hmvp candidates depend on total number minus k |
US11159787B2 (en) | 2018-09-12 | 2021-10-26 | Beijing Bytedance Network Technology Co., Ltd. | Conditions for starting checking HMVP candidates depend on total number minus K |
US11997253B2 (en) | 2018-09-12 | 2024-05-28 | Beijing Bytedance Network Technology Co., Ltd | Conditions for starting checking HMVP candidates depend on total number minus K |
US20210297659A1 (en) | 2018-09-12 | 2021-09-23 | Beijing Bytedance Network Technology Co., Ltd. | Conditions for starting checking hmvp candidates depend on total number minus k |
US11290714B2 (en) * | 2018-11-15 | 2022-03-29 | Korea Electronics Technology Institute | Motion-constrained AV1 encoding method and apparatus for tiled streaming |
US20210329291A1 (en) * | 2019-01-02 | 2021-10-21 | Beijing Bytedance Network Technology Co., Ltd. | Hash-based motion searching |
US11805274B2 (en) | 2019-01-02 | 2023-10-31 | Beijing Bytedance Network Technology Co., Ltd | Early determination of hash-based motion searching |
US11558638B2 (en) * | 2019-01-02 | 2023-01-17 | Beijing Bytedance Network Technology Co., Ltd. | Hash-based motion searching |
US11616978B2 (en) * | 2019-01-02 | 2023-03-28 | Beijing Bytedance Network Technology Co., Ltd. | Simplification of hash-based motion searching |
US11589071B2 (en) | 2019-01-10 | 2023-02-21 | Beijing Bytedance Network Technology Co., Ltd. | Invoke of LUT updating |
US11140383B2 (en) | 2019-01-13 | 2021-10-05 | Beijing Bytedance Network Technology Co., Ltd. | Interaction between look up table and shared merge list |
US11909951B2 (en) | 2019-01-13 | 2024-02-20 | Beijing Bytedance Network Technology Co., Ltd | Interaction between lut and shared merge list |
US11962799B2 (en) | 2019-01-16 | 2024-04-16 | Beijing Bytedance Network Technology Co., Ltd | Motion candidates derivation |
US11956464B2 (en) | 2019-01-16 | 2024-04-09 | Beijing Bytedance Network Technology Co., Ltd | Inserting order of motion candidates in LUT |
US11641483B2 (en) | 2019-03-22 | 2023-05-02 | Beijing Bytedance Network Technology Co., Ltd. | Interaction between merge list construction and other tools |
US11202085B1 (en) | 2020-06-12 | 2021-12-14 | Microsoft Technology Licensing, Llc | Low-cost hash table construction and hash-based block matching for variable-size blocks |
US11546617B2 (en) * | 2020-06-30 | 2023-01-03 | At&T Mobility Ii Llc | Separation of graphics from natural video in streaming video content |
US12026919B2 (en) * | 2021-12-22 | 2024-07-02 | Red Hat, Inc. | Content-based encoding of digital images |
US20230196623A1 (en) * | 2021-12-22 | 2023-06-22 | Red Hat, Inc. | Content-based encoding of digital images |
US11956441B2 (en) * | 2021-12-23 | 2024-04-09 | Ati Technologies Ulc | Identifying long term reference frame using scene detection and perceptual hashing |
US20230209064A1 (en) * | 2021-12-23 | 2023-06-29 | Ati Technologies Ulc | Identifying long term reference frame using scene detection and perceptual hashing |
Also Published As
Publication number | Publication date |
---|---|
KR102287779B1 (en) | 2021-08-06 |
EP3158751A1 (en) | 2017-04-26 |
KR20170021337A (en) | 2017-02-27 |
EP3598758A1 (en) | 2020-01-22 |
CN105706450B (en) | 2019-07-16 |
WO2015196322A1 (en) | 2015-12-30 |
US10681372B2 (en) | 2020-06-09 |
CN105706450A (en) | 2016-06-22 |
EP3158751B1 (en) | 2019-07-31 |
EP3598758B1 (en) | 2021-02-17 |
EP3158751A4 (en) | 2017-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11736701B2 (en) | | Hash-based encoder decisions for video coding |
US10681372B2 (en) | | Encoder decisions based on results of hash-based block matching |
US11979600B2 (en) | | Encoder-side search ranges having horizontal bias or vertical bias |
US11303889B2 (en) | | Encoder-side decisions for sample adaptive offset filtering |
US10136140B2 (en) | | Encoder-side decisions for screen content encoding |
US10567754B2 (en) | | Hash table construction and availability checking for hash-based block matching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LI, BIN; XU, JIZHENG; REEL/FRAME: 041065/0990; Effective date: 20160309 |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |