WO2010021665A1 - Hypothetical reference decoder - Google Patents


Info

Publication number
WO2010021665A1
WO2010021665A1 (PCT/US2009/004544)
Authority
WO
WIPO (PCT)
Prior art keywords
bitstream
parameters
modified
picture
modifying
Prior art date
Application number
PCT/US2009/004544
Other languages
French (fr)
Inventor
Jiancong Luo
Lihua Zhu
Cristina Gomila
Original Assignee
Thomson Licensing
Priority date
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Publication of WO2010021665A1

Classifications

    • H04N19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N13/161 - Stereoscopic/multi-view video: encoding, multiplexing or demultiplexing different image signal components
    • H04N13/178 - Stereoscopic/multi-view video: metadata, e.g. disparity information
    • H04N13/194 - Stereoscopic/multi-view video: transmission of image signals
    • H04N13/243 - Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H04N19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N21/23424 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/234327 - Reformatting operations of video signals by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N21/2404 - Monitoring of server processing errors or hardware failure
    • H04N21/2665 - Gathering content from different sources, e.g. Internet and satellite
    • H04N21/44016 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/8451 - Structuring of content, e.g. decomposing content into time segments, using Advanced Video Coding [AVC]
    • H04N21/8456 - Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N2213/005 - Aspects relating to the "3D+depth" image format

Definitions

  • TECHNICAL FIELD Implementations are described that relate to coding systems. Various particular implementations relate to a hypothetical reference decoder.
  • HRD Hypothetical reference decoder
  • An HRD generally presents a set of requirements on the bitstream.
  • An HRD verifier may include software and/or hardware and is typically used to verify conformance of a bitstream to the requirements by examining the bitstream, detecting whether any HRD errors exist and, if so, reporting such errors.
  • In the context of video compression standards such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263, and H.264/MPEG-4 part 10 AVC (hereinafter the "H.264/MPEG-4 AVC Standard" or variations thereof, such as "H.264", the "AVC Standard", or simply "AVC"), a bitstream is conformant if it adheres to the syntactical and semantic rules embodied in the standard.
  • VBV video buffer verifier
  • the HRD specifies rules that bitstreams generated by a video encoder adhere to for such an encoder to be considered conformant under a given standard.
  • HRD is typically a normative part of video coding standards and, hence, any bitstream under a given standard has to adhere to the HRD rules and constraints, and a real decoder can assume that such rules have been conformed with and such constraints have been met.
  • a bitstream including data that has been encoded and parameters describing how to decode the encoded data, is accessed. It is determined whether the bitstream is not compliant with a standard. One or more of the parameters are modified to produce modified parameters. A modified bitstream is produced that is compliant with the standard. The modified bitstream includes the modified parameters.
  • implementations may be configured or embodied in various manners.
  • an implementation may be performed as a method, or embodied as apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal.
  • Figure 1 is a diagram of an implementation of a hypothetical reference decoder (HRD) verifier system.
  • Figure 2 is a diagram of an implementation of an H.264 HRD buffer model.
  • Figure 3 is a diagram of an implementation of an HRD verifier.
  • Figure 4 is a diagram of an implementation of an HRD verifier with a patch feature.
  • Figure 5 is a diagram of an implementation of a video transmission system.
  • Figure 6 is a diagram of an implementation of a video receiving system.
  • Figure 7 is a diagram of an implementation of a video processing device.
  • Figure 8 is a diagram of an implementation of a system for transmitting and receiving multi-view video with depth information.
  • Figure 9 is a diagram of an implementation of a framework for generating nine output views (N = 9) out of 3 input views with depth (K = 3).
  • Figure 10 is a diagram of an implementation of a hypothetical reference decoder (HRD) conformance error correction process.
  • Figure 11 is a diagram of an implementation of a parallel encoding system.
  • HRD errors often arise in applications that require stream splicing or splitting. These applications include parallel encoding, segment re-encoding, advertisement insertion, and so forth. In fact, when independently created HRD-compliant streams are concatenated, it is generally not possible to guarantee that the spliced stream will also be HRD-compliant.
  • Parallel encoding is frequently used in storage, broadcasting, and internet transmission to improve encoding throughput.
  • a sequence is cut into segments.
  • these segments are typically encoded simultaneously, or at least during partially overlapping time periods. More generally, parallel encoding occurs when these segments are encoded separately, regardless of the actual time during which they are encoded.
  • the final bitstream is obtained by concatenating the segmented bitstreams.
  • some information needed to set up the HRD parameters to encode a scene depends on the buffer status at the end of the preceding scene. For example, setting up the initial coded picture buffer removal delay requires access to the value of the final arrival time of the last picture of the preceding scene.
  • Advertisement insertion has a similar problem, since the inserted advertisement is usually encoded independently of the program into which the advertisement bitstream will be inserted. Although similar, advertisement insertion and parallel encoding do present differences. Thus, an HRD verifier is used to check the HRD compliance of the compressed bitstream. An HRD verifier is often used in various products such as encoders, multiplexers, conformance analyzers, and so forth.
  • An HRD verifier typically takes as input a bitstream, and the result indicates whether the bitstream is compliant or non-compliant, the type of the error if there is any, and the location in the bitstream where the error occurs. To correct the HRD errors, the bitstream usually needs to be re-encoded with modified parameters.
  • the proposed methods and apparatuses can be applied to HRD verifiers, bitstream analyzers, and/or multiplexers to obtain a compliant bitstream.
  • the present principles may be implemented in a decoder that is performing HRD-type activities prior to, for example, re-transmission and/or storage.
  • One problem addressed by at least one disclosed implementation is a non-compliant bitstream.
  • methods and apparatuses are provided that convert an HRD non-conformant AVC bitstream into an HRD conformant bitstream without re-encoding the full sequence. In at least one embodiment, this is done by replacing the buffering period and picture timing parameters that cause the HRD errors with correct parameters derived from the variation in bit rate over time in the bitstream, and/or by replacing parts of the bitstream with new bitstreams.
  • HRD or VBV is a normative part of most of the recent video compression standards.
  • the rules and operations of HRD and VBV are defined in the standard specifications.
  • HRD is defined in Annex C of H.264.
  • VBV is defined in Annex C of MPEG-2.
  • FIG. 1 shows an exemplary hypothetical reference decoder (HRD) verifier system 100 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the HRD verifier system 100 commences operation at a step 105.
  • a video encoder encodes the bitstream at a given bit rate.
  • the HRD verifier 104 examines the bitstream.
  • At step 125, one or more parameters are modified, and the encoder is run again with the modified bit rate per step 110. The process continues until the output bitstream is identified as conformant to the HRD requirements.
  • implementations of the present principles may involve, for example, the manual and/or automatic modification of encoding parameters.
  • a human operator may modify the encoding parameters.
  • the HRD verifier 104 is configured to propose new parameters in an automated way (e.g., percentage by which the bit rate should be lowered).
  • In H.264 decoding, data buffering occurs in order to maintain a transmission bit rate and a video presentation rate.
  • Pictures vary greatly in the amount of data used to encode them.
  • the decoding buffer constantly fluctuates with respect to how much data it contains, and this is referred to as the buffer fullness.
  • the buffer fullness is carefully monitored to avoid overflow and underflow.
  • the encoder mimics the decoder buffer's behavior with a "virtual buffer”.
  • the virtual buffer attaches at the output of the encoder, and employs a mathematical equation(s) to determine how much data is entering and leaving the buffer at a given encode rate and a given buffer size. This virtual decoder is referred to as the hypothetical reference decoder (HRD).
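
As a rough illustration of that arithmetic, the sketch below tracks the fullness of such a virtual buffer under a constant delivery rate and a fixed frame rate. It is a simplified, non-normative model (the exact procedure is given by Annex C of H.264), and all names are illustrative.

```python
# Simplified leaky-bucket sketch of a virtual decoder buffer. Bits arrive
# at a constant rate; each picture is removed instantaneously once per
# frame interval. This is illustrative only, not the normative HRD.

def simulate_virtual_buffer(picture_sizes_bits, bit_rate, buffer_size, frame_rate):
    """Return a list of (picture_index, violation_kind) tuples."""
    violations = []
    fullness = 0.0
    interval = 1.0 / frame_rate                  # seconds between removals
    for i, bits in enumerate(picture_sizes_bits):
        fullness += bit_rate * interval          # bits delivered this interval
        if fullness > buffer_size:               # more bits arrived than fit
            violations.append((i, "overflow"))
            fullness = buffer_size
        if bits > fullness:                      # picture has not fully arrived
            violations.append((i, "underflow"))
        fullness = max(0.0, fullness - bits)     # instantaneous removal
    return violations

# Example: 30 fps, 4 Mbit/s channel, 2 Mbit buffer, one oversized picture.
sizes = [120_000] * 10 + [3_000_000] + [120_000] * 10
print(simulate_virtual_buffer(sizes, 4_000_000, 2_000_000, 30.0))
```
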
  • FIG. 2 shows an exemplary H.264 hypothetical reference decoder (HRD) buffer model 200 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the H.264 HRD buffer model 200 includes a hypothetical stream scheduler (HSS) 205, a coded picture buffer (CPB) 210, an instantaneous decoding process 215, and a decoded picture buffer (DPB) 220 as shown in Figure 2.
  • the H.264 HRD buffer model 200 operates as follows. Data associated with access units that flow into the CPB 210 according to a specified arrival schedule are delivered by the HSS 205. The data associated with each access unit is removed and decoded instantaneously by the instantaneous decoding process 215 at CPB removal times.
  • Each decoded picture is placed in the DPB 220 at its CPB removal time unless it is output at its CPB removal time and is a non-reference picture.
  • the picture is removed from the DPB 220 at the later of the DPB output time or the time that it is marked as "unused for reference”.
  • HSS and HRD information concerning the number of enumerated delivery schedules and their associated bit rates and buffer sizes is specified in the video usability information (VUI).
  • VUI video usability information
  • the HRD is initialized as specified by the buffering period SEI message.
  • the removal timing of access units from the CPB 210 and output timing from the DPB 220 are specified in the picture timing SEI message. All timing information relating to a specific access unit shall arrive prior to the CPB removal time of the access unit.
  • An HRD conformance verifier is, for example, software and/or hardware that analyzes the bitstream based on the HRD rules and constraints and indicates the conformity of the bitstream.
  • FIG. 3 shows an exemplary HRD verifier 300 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the HRD verifier 300 includes a bitstream parser 310, a CPB arrival and removal time calculator 320, and a constraint checker 330.
  • the bitstream parser 310 takes as input the compressed bitstream 301 and extracts the buffer size and maximum bit rate information 311 from the sequence parameters set (SPS), and the picture size 312 and picture timing information 313 from the picture timing SEI message.
  • the information is used to calculate the CPB arrival time and removal time of each picture, by the CPB arrival and removal time calculator 320.
  • the initial arrival time 321, final arrival time 322, and removal time 323 of each picture, and the initial CPB removal delay of each buffering period 314, are input to the constraint checker 330.
  • the conformance indicator 331 indicates any constraint violation.
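
In code, the data flow of verifier 300 can be sketched as a single pass that derives arrival times from picture sizes and compares them against the removal times signalled in the SEI messages. The constant-rate delivery schedule and the specific checks below are simplifying assumptions, not the full Annex C rule set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class AccessUnit:
    size_bits: int        # picture size from the bitstream parser (312)
    removal_time: float   # CPB removal time from the picture timing SEI (313)

def check_constraints(aus: List[AccessUnit], bit_rate: float,
                      buffer_size: float) -> Optional[Tuple[int, str]]:
    """Constraint checker 330: return (index, reason), or None if conformant (331)."""
    t_af = 0.0                                    # final arrival time so far
    for i, au in enumerate(aus):
        t_ai = t_af                               # initial arrival time (321)
        t_af = t_ai + au.size_bits / bit_rate     # final arrival time (322)
        if t_af > au.removal_time:                # last bit arrives after decode
            return (i, "CPB underflow")
        if (au.removal_time - t_ai) * bit_rate > buffer_size:
            return (i, "CPB overflow")            # too many bits buffered
    return None
```
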
  • an HRD verifier in accordance with the present principles can check bitstream compliance, and is also able to correct some types of errors in the bitstream and indicate a subset of the bitstream that is to be re-encoded.
  • an HRD verifier in accordance with the present principles avoids re-encoding the full bitstream when the bitstream is non-compliant.
  • Figure 4 shows an exemplary HRD verifier 400 with a patch feature, to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the HRD verifier 400 includes a bitstream parser 310, a CPB arrival and removal time calculator 320, a constraint checker 330, and a bitstream patcher 440.
  • the violation type 432 indicates which rule is violated.
  • the bitstream patcher 440 will patch the bitstream based on the violation type, by changing the picture timing in the picture timing SEI and the initial CPB removal delay (and delay offset) in the buffering period SEI, in order to provide a conformant bitstream 442.
  • "patch feature” refers to an ability of the HRD verifier 400 to patch (i.e., correct) a detected HRD error in a bitstream. This may also be referred to as a "rewriting feature” because the HRD verifier 400 recalculates the HRD parameters and writes them back to the bitstream.
  • FIG. 5 shows an exemplary video transmission system 500, to which the present principles may be applied, in accordance with an implementation of the present principles.
  • the video transmission system 500 may be, for example, a head-end or transmission system for transmitting a signal using any of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast.
  • the transmission may be provided over the Internet or some other network.
  • the video transmission system 500 is capable of generating and delivering video content encoded using inter-view skip mode with depth. This is achieved by generating an encoded signal(s) including depth information or information capable of being used to synthesize the depth information at a receiver end that may, for example, have a decoder.
  • the video transmission system 500 includes an encoder 510 and a transmitter 520 capable of transmitting the encoded signal.
  • the encoder 510 receives video information and generates an encoded signal(s) therefrom using inter-view skip mode with depth.
  • the encoder 510 may be, for example, the encoder 500 described in detail above.
  • the encoder 510 may include sub-modules, including for example an assembly unit for receiving and assembling various pieces of information into a structured format for storage or transmission.
  • the various pieces of information may include, for example, coded or uncoded video, coded or uncoded depth information, and coded or uncoded elements such as, for example, motion vectors, coding mode indicators, and syntax elements.
  • the transmitter 520 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers.
  • the transmitter may include, or interface with, an antenna (not shown). Accordingly, implementations of the transmitter 520 may include, or be limited to, a modulator.
  • FIG. 6 shows an exemplary video receiving system 600 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the video receiving system 600 may be configured to receive signals over a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast.
  • the signals may be received over the Internet or some other network.
  • the video receiving system 600 may be, for example, a cell-phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage.
  • the video receiving system 600 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.
  • the video receiving system 600 is capable of receiving and processing video content including video information.
  • the video receiving system 600 includes a receiver 610 capable of receiving an encoded signal, such as for example the signals described in the implementations of this application, and a decoder 620 capable of decoding the received signal.
  • the receiver 610 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal.
  • the receiver 610 may include, or interface with, an antenna (not shown). Implementations of the receiver 610 may include, or be limited to, a demodulator.
  • the decoder 620 outputs video signals including video information and depth information.
  • the decoder 620 may be, for example, the decoder 600 described in detail above.
  • FIG. 7 shows an exemplary video processing device 700 to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the video processing device 700 may be, for example, a set top box or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage.
  • the video processing device 700 may provide its output to a television, computer monitor, or a computer or other processing device.
  • the video processing device 700 includes a front-end (FE) device 705 and a decoder 710.
  • the front-end device 705 may be, for example, a receiver adapted to receive a program signal having a plurality of bitstreams representing encoded pictures, and to select one or more bitstreams for decoding from the plurality of bitstreams.
  • Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal, decoding one or more encodings (for example, channel coding and/or source coding) of the data signal, and/or error-correcting the data signal.
  • the front-end device 705 may receive the program signal from, for example, an antenna (not shown).
  • the front-end device 705 provides a received data signal to the decoder 710.
  • the decoder 710 receives a data signal 720.
  • the data signal 720 may include, for example, one or more Advanced Video Coding (AVC), Scalable Video Coding (SVC), or Multi-view Video Coding (MVC) compatible streams.
  • AVC refers more specifically to the existing International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the "H.264/MPEG-4 AVC Standard” or variations thereof, such as the "AVC standard” or simply "AVC”).
  • MVC refers more specifically to a multi-view video coding ("MVC") extension (Annex H) of the AVC standard, referred to as H.264/MPEG-4 AVC, MVC extension (the "MVC extension” or simply “MVC”).
  • SVC refers more specifically to a scalable video coding ("SVC") extension (Annex G) of the AVC standard.
  • the decoder 710 decodes all or part of the received signal 720 and provides as output a decoded video signal 730.
  • the decoded video 730 is provided to a selector 750.
  • the device 700 also includes a user interface 760 that receives a user input 770.
  • the user interface 760 provides a picture selection signal 780, based on the user input 770, to the selector 750.
  • the picture selection signal 780 and the user input 770 indicate which of multiple pictures, sequences, scalable versions, views, or other selections of the available decoded data a user desires to have displayed.
  • the selector 750 provides the selected picture(s) as an output 790.
  • the selector 750 uses the picture selection information 780 to select which of the pictures in the decoded video 730 to provide as the output 790.
  • the selector 750 includes the user interface 760, and in other implementations no user interface 760 is needed because the selector 750 receives the user input 770 directly without a separate interface function being performed.
  • the selector 750 may be implemented in software or as an integrated circuit, for example.
  • the selector 750 is incorporated with the decoder 710, and in another implementation, the decoder 710, the selector 750, and the user interface 760 are all integrated.
  • front-end 705 receives a broadcast of various television shows and selects one for processing. The selection of one show is based on user input of a desired channel to watch.
  • front-end device 705 receives the user input 770.
  • the front-end 705 receives the broadcast and processes the desired show by demodulating the relevant part of the broadcast spectrum, and decoding any outer encoding of the demodulated show.
  • the front-end 705 provides the decoded show to the decoder 710.
  • the decoder 710 is an integrated unit that includes devices 760 and 750.
  • the decoder 710 thus receives the user input, which is a user-supplied indication of a desired view to watch in the show.
  • the decoder 710 decodes the selected view, as well as any required reference pictures from other views, and provides the decoded view 790 for display on a television (not shown).
  • the user may desire to switch the view that is displayed and may then provide a new input to the decoder 710.
  • the decoder 710 decodes both the old view and the new view, as well as any views that are in between the old view and the new view. That is, the decoder 710 decodes any views that are taken from cameras that are physically located in between the camera taking the old view and the camera taking the new view.
  • the front-end device 705 also receives the information identifying the old view, the new view, and the views in between. Such information may be provided, for example, by a controller (not shown in Figure 7) having information about the locations of the views, or the decoder 710.
  • Other implementations may use a front-end device that has a controller integrated with the front-end device.
  • the decoder 710 provides all of these decoded views as output 790.
  • a post-processor (not shown in Figure 7) interpolates between the views to provide a smooth transition from the old view to the new view, and displays this transition to the user. After transitioning to the new view, the post-processor informs (through one or more communication links not shown) the decoder 710 and the front-end device 705 that only the new view is needed. Thereafter, the decoder 710 only provides as output 790 the new view.
  • the system 700 may be used to receive multiple views of a sequence of images, and to present a single view for display, and to switch between the various views in a smooth manner.
  • the smooth manner may involve interpolating between views to move to another view.
  • the system 700 may allow a user to rotate an object or scene, or otherwise to see a three-dimensional representation of an object or a scene.
  • the rotation of the object for example, may correspond to moving from view to view, and interpolating between the views to obtain a smooth transition between the views or simply to obtain a three-dimensional representation. That is, the user may "select" an interpolated view as the "view" that is to be displayed.
  • 3D Video is a new framework that includes a coded representation for multiple view video and depth information and targets the generation of high-quality 3D rendering at the receiver. This enables 3D visual experiences with auto-multiscopic displays.
  • Figure 8 shows an exemplary system 800 for transmitting and receiving multi-view video with depth information, to which the present principles may be applied, according to an embodiment of the present principles.
  • video data is indicated by a solid line
  • depth data is indicated by a dashed line
  • meta data is indicated by a dotted line.
  • the system 800 may be, for example, but is not limited to, a free-viewpoint television system.
  • the system 800 includes a three-dimensional (3D) content producer 820, having a plurality of inputs for receiving one or more of video, depth, and meta data from a respective plurality of sources.
  • 3D three-dimensional
  • Such sources may include, but are not limited to, a stereo camera 811 , a depth camera 812, a multi-camera setup 813, and 2-dimensional/3-dimensional (2D/3D) conversion processes 814.
  • One or more networks 830 may be used to transmit one or more of video, depth, and meta data relating to multi-view video coding (MVC) and digital video broadcasting (DVB).
  • MVC multi-view video coding
  • DVB digital video broadcasting
  • a depth image-based renderer 850 performs depth image-based rendering to project the signal to various types of displays.
  • the depth image-based renderer 850 is capable of receiving display configuration information and user preferences.
  • An output of the depth image-based renderer 850 may be provided to one or more of a 2D display 861, an M-view 3D display 862, and/or a head-tracked stereo display 863.
  • the framework 900 involves an auto-stereoscopic 3D display 910, which supports output of multiple views, a first depth image-based renderer 920, a second depth image-based renderer 930, and a buffer for decoded data 940.
  • the decoded data is a representation known as Multiple View plus Depth (MVD) data.
  • MVD Multiple View plus Depth
  • In Figure 9, the nine cameras are denoted by V1 through V9.
  • Corresponding depth maps for the three input views are denoted by D1, D5, and D9.
  • Any virtual camera positions in between the captured camera positions (e.g., Pos 1, Pos 2, Pos 3) can be generated using the available depth maps (D1, D5, D9), as shown in Figure 9.
  • the baseline between the actual cameras (V1, V5, and V9) used to capture data can be large.
  • the correlation between these cameras is significantly reduced and coding efficiency of these cameras may suffer since the coding efficiency would only rely on temporal correlation.
  • views 2, 3, 4, 6, 7 and 8 are skipped in coding.
  • the decoder will need views 1, 5, and 9 and their depth images. Note that the depth images for view 5 are obtained for rendering the skipped views. On the other hand, in order to encode view 1 and view 9 based on view 5, a different depth image for view 5 may be identified. Thus, two different depth images for view 5 need to be coded. In at least one implementation, we propose to code the refinement on the original depth images for coding other views.
  • an HRD verifier for H.264/MPEG-4 AVC bitstream has two features: conformance verification; and HRD error correction.
  • the primary goal of an HRD verifier is to check the HRD conformance of an input bitstream.
  • the verifier does not need to actually decode the VCL-NAL unit. Instead, the VCL-NAL unit will be by-passed and only the size of the NAL unit is counted.
  • the non-VCL NAL units that include HRD parameters, such as, for example, the sequence parameter set (SPS), Buffering Period supplemental enhancement information (SEI), and Picture Timing SEI, will be parsed and the HRD parameters will be extracted in order to calculate the timing and buffer status.
  • SPS sequence parameter set
  • SEI Buffering Period supplemental enhancement information
  • When the beginning of a new access unit (AU) is identified, the verifier will (1) update the timing and buffer status based on the size of the previous AU, and (2) check for any HRD violation.
  • 1.1 Update the HRD scheduler
  • the HRD monitors picture CPB removal time (time to decode a picture), initial arrival time (time when the first bit of a picture enters the CPB), final arrival time (time when the last bit of a picture entered the CPB), and buffer fullness. Based on these values, the HRD detects buffer overflow, underflow, and other violations.
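
These per-picture quantities map naturally onto a small record; the field names below simply mirror the terms used in the text.

```python
from dataclasses import dataclass

@dataclass
class HrdPictureState:
    """Per-access-unit quantities tracked by the HRD (illustrative names)."""
    t_removal: float           # CPB removal time: when the picture is decoded
    t_initial_arrival: float   # when the first bit of the picture enters the CPB
    t_final_arrival: float     # when the last bit of the picture entered the CPB
    cpb_fullness_bits: float   # buffer occupancy, checked for over/underflow
```
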
  • the variable t_c is derived as follows and is called a clock tick: t_c = num_units_in_tick / time_scale
  • when access unit n is not the first access unit of a buffering period, the nominal removal time of the access unit from the CPB is specified by: t_r,n( n ) = t_r,n( n_b ) + t_c * cpb_removal_delay( n )
  • cpb_removal_delay( n ) is extracted from the picture timing SEI, and n_b is the first AU of the current buffering period.
  • when access unit n is the first access unit of a buffering period, the nominal removal time of the access unit from the CPB is specified by: t_r,n( n ) = t_r,n( n_b ) + t_c * cpb_removal_delay( n )
  • t_r,n( n_b ) is the nominal removal time of the first access unit of the previous buffering period
  • cpb_removal_delay( n ) is the value of cpb_removal_delay specified in the picture timing SEI message associated with access unit n.
  • the initial arrival time is determined as follows: t_ai( n ) = Max( t_af( n - 1 ), t_ai,earliest( n ) )
  • t_ai,earliest( n ) is derived as follows: t_ai,earliest( n ) = t_r,n( n ) - ( initial_cpb_removal_delay[ SchedSelIdx ] + initial_cpb_removal_delay_offset[ SchedSelIdx ] ) / 90000, with initial_cpb_removal_delay[ SchedSelIdx ] and initial_cpb_removal_delay_offset[ SchedSelIdx ] being specified in the previous buffering period SEI message.
  • the final arrival time for AU n is derived by: t_af( n ) = t_ai( n ) + b( n ) / BitRate[ SchedSelIdx ], where b( n ) is the size in bits of access unit n.
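
The recurrences above translate almost line for line into code. The sketch below assumes the SEI-derived values have already been parsed and expresses the delays in their signalled units (clock ticks, and a 90 kHz clock for the initial removal delays).

```python
# Direct transcription of the timing recurrences above (illustrative).

def nominal_removal_time(t_r_nb: float, cpb_removal_delay: int, t_c: float) -> float:
    # t_r,n(n) = t_r,n(n_b) + t_c * cpb_removal_delay(n)
    return t_r_nb + t_c * cpb_removal_delay

def earliest_initial_arrival(t_r_n: float, init_delay: int, init_delay_offset: int) -> float:
    # t_ai,earliest(n); the delays are in units of a 90 kHz clock
    return t_r_n - (init_delay + init_delay_offset) / 90000.0

def initial_arrival_time(t_af_prev: float, t_ai_earliest: float) -> float:
    # t_ai(n) = Max(t_af(n - 1), t_ai,earliest(n))
    return max(t_af_prev, t_ai_earliest)

def final_arrival_time(t_ai_n: float, b_n: int, bit_rate: float) -> float:
    # t_af(n) = t_ai(n) + b(n) / BitRate[SchedSelIdx]
    return t_ai_n + b_n / bit_rate
```
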
  • A Type I bitstream is a NAL unit stream containing only the VCL NAL units and filler data NAL units for all access units in the bitstream.
  • A Type II bitstream contains, in addition to the VCL NAL units and filler data NAL units for all access units in the bitstream, at least one of the following: additional non-VCL NAL units other than filler data NAL units; all leading_zero_8bits, zero_byte, start_code_prefix_one_3bytes, and trailing_zero_8bits syntax elements that form a byte stream from the NAL unit stream (as specified in Annex B of the H.264/MPEG-4 AVC Standard).
  • the CPB fullness is derived by the following:
  • n_n indicates the picture that follows picture n in output order.
  • Type 3 violations can occur because, for example, the boundary is not correct.
  • the first sequence starts at time zero, and we designate its removal time t_r,n( 0 ) from the CPB (encoder-side) buffer, which stores the compressed bitstream just as a decoder would store the received compressed bitstream. All removal times for subsequent frames in this sequence are referred to as frame n, t_r,n( n ), and they are all calculated with respect to the removal time for "0".
  • the next sequences start at time n_b, t_r,n( n_b ). All such sequence starts refer back to time "0" to get their removal time. All removal times for subsequent frames in the subsequent sequences are referred to as frame n, t_r,n( n ), and they are all calculated with respect to the removal time for "n_b", so they necessarily also rely on time "0".
  • Type 4 and 5 violations can also relate to boundaries. Type 4 violations can occur, for example, if a subsequent frame is removed before a previous frame. Type 5 violations can occur, for example, if you are using a fixed frame rate but the delta (time separation difference) between frames is not a constant. To fix these violations, in at least one implementation, we recalculate the removal time for the first picture/frame in this buffering period, t_r,n( n_b ). We set the removal time for the first picture/frame in this buffering period equal to the sum of the removal time for the previous frame and a constant separation distance (presuming a fixed frame rate).
  • the corresponding cpb_removal_delay value of this frame is the difference between its removal time and the removal time of the first picture of the previous buffering period. For example, if the removal time of the previous picture is 1 s, the removal time of the first picture of the previous buffering period is 0.5 s, and the frame rate is 24 frames per second (therefore the constant separation distance between frames is approximately 0.042 seconds), then the removal time of the first picture in this buffering period is 1.042 s.
  • the cpb_removal_delay of this picture is 0.542 s.
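
The arithmetic of this example can be checked directly (values in seconds):

```python
prev_removal = 1.0              # removal time of the previous picture
first_of_prev_period = 0.5      # first picture of the previous buffering period
separation = 1.0 / 24           # constant frame separation, about 0.042 s
this_removal = prev_removal + separation
delay = this_removal - first_of_prev_period
print(round(this_removal, 3), round(delay, 3))   # 1.042 0.542
```
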
  • Violations of conditions 3 through 5 can be corrected by, for example, changing the picture timing SEI or buffering period SEI. Violation of conditions 1 and 2 cannot be corrected by simply changing the SEIs, and re-encoding a subset of the bitstream is generally required. In the following, we describe the derivation of the new timing values to satisfy the conditions 3 through 5, and how to select which part of the bitstream should be re-encoded when condition 1 or 2 is violated.
  • the initial_cpb_removal_delay[ SchedSelIdx ] in the buffering period SEI will be modified to equal Ceil( Δt_g,90( n ) ), where Δt_g,90( n ) = 90000 * ( t_r,n( n ) - t_af( n - 1 ) ).
  • the cpb_removal_delay of the first picture of the buffering period is replaced by the following derived value:
  • cpb_removal_delay( n ) = ( t_r,n( n - 1 ) + DeltaTfiDivisor * t_c - t_r,n( n_b ) ) / t_c
  • n b is the first picture of the previous buffering period.
  • buffer overflow or underflow is caused by too few or too many bits being created for some pictures.
  • the buffer overflow may be a result of propagation and accumulation and, hence, is not necessarily due solely to the picture at which the condition is violated.
  • re-encoding is generally, but not always, required.
  • Other implementations re-encode additional pictures, depending on the extent of the problem and the over/under-flow.
  • FIG. 10 shows an exemplary method 1000 for hypothetical reference decoder (HRD) error correction, in accordance with an embodiment of the present principles.
  • the method 1000 may be implemented, for example, in an HRD verifier, a bitstream verifier, a multiplexer, a video encoder, a video decoder, and so forth.
  • the preceding devices/applications are merely illustrative and, thus, given the teachings of the present principles provided herein, one of ordinary skill in this and related arts will contemplate these and various other devices/applications to which the present principles may be applied, while maintaining the spirit of the present principles.
  • a bitstream is parsed, and HRD parameters are read therefrom.
  • At step 1020, HRD conformance is checked.
  • At step 1025, it is determined whether or not an HRD error exists. If so, then control is passed to a step 1030. Otherwise, the method is terminated.
  • At step 1030, it is determined whether the error is a Type 1 or Type 2 error, a Type 3 error, or a Type 4 or Type 5 error. If a Type 1 or Type 2 error, then control is passed to a step 1035. If a Type 3 error, then control is passed to a step 1040. If a Type 4 or Type 5 error, then control is passed to a step 1045. At step 1035, the bitstream is partially re-encoded. At step 1040, the initial_cpb_removal_delay[ SchedSelIdx ] is recalculated and set to Ceil( Δt_g,90( n ) ). At step 1050, the Buffering Period SEI is modified with the updated initial_cpb_removal_delay[ SchedSelIdx ].
  • At step 1045, the cpb_removal_delay( n ) is recalculated and set to ( t_r,n( n - 1 ) + DeltaTfiDivisor * t_c - t_r,n( n_b ) ) / t_c.
  • the Picture Timing SEI is modified with the updated cpb_removal_delay(n).
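
The dispatch of method 1000 can be summarized as below. The violation record and the stream-editing helpers bundled in `tools` (partial re-encoder, SEI rewriters) are hypothetical names standing in for whatever implementation is available; only the formulas come from the text above.

```python
import math

def correct_violation(v, stream, tools):
    """Steps 1030-1050: pick a remedy based on the violation type.
    `v` carries the violation type and the timing values already computed;
    `tools` bundles hypothetical stream-editing helpers."""
    if v.type in (1, 2):                     # over/underflow: step 1035
        tools.reencode_subset(stream, around=v.au_index)
    elif v.type == 3:                        # steps 1040 and 1050
        delay = math.ceil(v.delta_t_g90)     # Ceil(Δt_g,90(n)), 90 kHz units
        tools.rewrite_buffering_period_sei(stream, v.au_index, delay)
    else:                                    # types 4 and 5: step 1045
        d = (v.t_removal_prev + v.delta_tfi_divisor * v.t_c
             - v.t_removal_first_of_prev_period) / v.t_c
        tools.rewrite_picture_timing_sei(stream, v.au_index, d)
```
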
  • Figure 11 shows an exemplary parallel encoding system 1100, to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • the parallel encoding system 1100 includes a segmentation module 1105 for outputting video segment 1 1110 through video segment n 1119.
  • video segment 1 1110 through video segment n 1119 are input to respective video encoder 1 1120 through video encoder n 1129.
  • a concatenator 1130 receives the outputs of the video encoders and concatenates them to provide an output bitstream.
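
A possible realization of this pipeline, with the encoder, concatenator, and HRD patcher supplied by the caller (so nothing here presumes a particular codec library):

```python
from concurrent.futures import ProcessPoolExecutor

def parallel_encode(segments, encode, concatenate, patch_hrd):
    """Encode segments independently (encoders 1120..1129), join them
    (concatenator 1130), then patch HRD timing at the segment boundaries."""
    with ProcessPoolExecutor() as pool:
        bitstreams = list(pool.map(encode, segments))
    return patch_hrd(concatenate(bitstreams))
```
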
  • phrasing such as "at least one of A, B, and C" is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
  • This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
  • a picture may include, for example, either a frame or a field.
  • Implementations may signal information using a variety of techniques including, but not limited to, in-band information, out-of-band information, datastream data, implicit signaling, and explicit signaling.
  • In-band information and explicit signaling may include, for various implementations and/or standards, slice headers, SEI messages, other high level syntax, and non-high-level syntax. Accordingly, although implementations described herein may be described in a particular context, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.
  • the implementations and features described herein may be used in the context of the MPEG-4 AVC Standard, or the MPEG-4 AVC Standard with the MVC extension, or the MPEG-4 AVC Standard with the SVC extension. However, these implementations and features may be used in the context of another standard and/or recommendation (existing or future), or in a context that does not involve a standard and/or recommendation.
  • the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
  • processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
  • PDAs portable/personal digital assistants
  • Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding.
  • equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices.
  • the equipment may be mobile and even installed in a mobile vehicle.
  • the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory ("RAM"), or a read-only memory (“ROM").
  • the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two.
  • a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process.
  • a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
  • implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries may be, for example, analog or digital information.
  • the signal may be transmitted over a variety of different wired or wireless links, as is known.
  • the signal may be stored on a processor-readable medium.


Abstract

Various implementations are described. Several implementations relate to a hypothetical reference decoder. According to one aspect, a bitstream, including data that has been encoded and parameters describing how to decode the encoded data, is accessed. It is determined whether the bitstream is not compliant with a standard. One or more of the parameters are modified to produce a modified bitstream that is compliant with the standard. The modified bitstream includes the encoded data and the modified parameters.

Description

HYPOTHETICAL REFERENCE DECODER
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application Serial No. 61/189,516, filed on August 20, 2008, titled "Hypothetical Reference Decoder", the contents of which are hereby incorporated by reference in their entirety for all purposes.
TECHNICAL FIELD Implementations are described that relate to coding systems. Various particular implementations relate to a hypothetical reference decoder.
BACKGROUND
Hypothetical reference decoder (HRD) conformance is typically a normative part of video compression standards. An HRD generally presents a set of requirements on the bitstream. An HRD verifier may include software and/or hardware and is typically used to verify conformance of a bitstream to the requirements by examining the bitstream, detecting whether any HRD errors exist and, if so, reporting such errors. In the context of video compression standards, such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263, and H.264/MPEG-4 part 10 AVC (hereinafter the "H.264/MPEG-4 AVC Standard" or variations thereof, such as "H.264", the "AVC Standard" or simply "AVC"), a bitstream is determined to be conformant if the bitstream adheres to the syntactical and semantic rules embodied in the standard. One such set of rules takes the form of a successful flow of the bitstream through a mathematical or hypothetical model of the decoder, which is conceptually connected to the output of an encoder and receives the bitstream from the encoder. Such a model decoder is referred to as a hypothetical reference decoder (HRD) in some standards or the video buffer verifier (VBV) in other standards. In other words, the HRD specifies rules that bitstreams generated by a video encoder adhere to for such an encoder to be considered conformant under a given standard. HRD is typically a normative part of video coding standards and, hence, any bitstream under a given standard has to adhere to the HRD rules and constraints, and a real decoder can assume that such rules have been conformed with and such constraints have been met.
SUMMARY According to a general aspect, a bitstream, including data that has been encoded and parameters describing how to decode the encoded data, is accessed. It is determined whether the bitstream is not compliant with a standard. One or more of the parameters are modified to produce modified parameters. A modified bitstream is produced that is compliant with the standard. The modified bitstream includes the modified parameters.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a diagram of an implementation of a hypothetical reference decoder (HRD) verifier system.
Figure 2 is a diagram of an implementation of an H.264 HRD buffer model.
Figure 3 is a diagram of an implementation of an HRD verifier.
Figure 4 is a diagram of an implementation of an HRD verifier with a patch feature.
Figure 5 is a diagram of an implementation of a video transmission system.
Figure 6 is a diagram of an implementation of a video receiving system.
Figure 7 is a diagram of an implementation of a video processing device.
Figure 8 is a diagram of an implementation of a system for transmitting and receiving multi-view video with depth information.
Figure 9 is a diagram of an implementation of a framework for generating nine output views (N = 9) out of 3 input views with depth (K = 3).
Figure 10 is a diagram of an implementation of a hypothetical reference decoder (HRD) conformance error correction process.
Figure 11 is a diagram of an implementation of a parallel encoding system.
DETAILED DESCRIPTION
The inventors have determined that HRD errors often arise in applications that require stream splicing or splitting. These applications include parallel encoding, segment re-encoding, advertisement insertion, and so forth. In fact, when independently created HRD-compliant streams are concatenated, it is generally not possible to guarantee that the spliced stream will also be HRD-compliant.
Parallel encoding is frequently used in storage, broadcasting, and internet transmission to improve encoding throughput. With parallel encoding, a sequence is cut into segments. In parallel encoding, these segments are typically encoded simultaneously, or at least during partially overlapping time periods. More generally, parallel encoding occurs when these segments are encoded separately, regardless of the actual time during which they are encoded. The final bitstream is obtained by concatenating the segmented bitstreams. In this scenario, some information needed to set up the HRD parameters to encode a scene depends on the buffer status at the end of the preceding scene. For example, setting up the initial coded picture buffer removal delay requires access to the value of the final arrival time of the last picture of the preceding scene. Due to the parallel encoding, this information is not available as it would be with serial encoding. The bitstream created by parallel encoding may not be HRD-compliant at the segment boundary. Moreover, the support for re-encoding selected segments adds further difficulties. For instance, the re-encoding of a scene will change the final arrival time of the last picture of the scene, which makes the initial coded picture buffer removal delay of the succeeding scene invalid.
Advertisement insertion has a similar problem, since the inserted advertisement is usually encoded independently of the program into which the advertisement bitstream will be inserted. Although similar, advertisement insertion and parallel encoding do present differences. In either case, an HRD verifier is used to check the HRD compliancy of the compressed bitstream. An HRD verifier is often used in various products such as encoders, multiplexers, conformance analyzers, and so forth.
An HRD verifier typically takes a bitstream as input, and its result indicates whether the bitstream is compliant or non-compliant, the type of the error if there is any, and the location in the bitstream where the error occurs. To correct the HRD errors, the bitstream usually needs to be re-encoded with modified parameters.
In at least one implementation, we propose a framework for using a hypothetical reference decoder (HRD). The inventors have determined that re-encoding of the full sequence is not necessary in many cases. In some implementations, some HRD errors can be corrected by changing the buffering period parameters or picture timing associated with the bitstream, such that the bitstream becomes compliant without re-encoding. Other types of HRD violations cannot be corrected by modifying parameters alone, and instead call for re-encoding a subset of the bitstream. In at least some implementations, we present methods and apparatuses for HRD conformance error correction that do not involve re-encoding the full bitstream. In at least one implementation, this is done by modifying the SEI messages associated with the bitstream or by re-encoding a subset of the bitstream. The proposed methods and apparatuses can be applied to HRD verifiers, bitstream analyzers, and/or multiplexers to obtain a compliant bitstream. In at least one implementation, the present principles may be implemented in a decoder that is performing HRD-type activities prior to, for example, re-transmission and/or storage.
It is to be appreciated that the elements and applications described in this application are merely illustrative. Thus, these and other elements and applications to which the present principles may be applied are readily determined by one of ordinary skill in this and related arts given the teachings of the present principles provided herein and are, hence, within the spirit of the present principles.
One problem addressed by at least one disclosed implementation is a non-compliant bitstream. In at least one implementation, methods and apparatuses are provided that convert an HRD non-conformant AVC bitstream into an HRD conformant bitstream without re-encoding the full sequence. In at least one embodiment, this is done by replacing the buffering period and picture timing parameters that cause the HRD errors with correct parameters derived based on the variation in bit rate over time in the bitstream, and/or by replacing parts of the bitstream with new bitstreams.
The HRD or VBV is a normative part of most recent video compression standards. The rules and operations of the HRD and the VBV are defined in the standard specifications. For example, the HRD is defined in Annex C of H.264; the VBV is defined in Annex C of MPEG-2.
Figure 1 shows an exemplary hypothetical reference decoder (HRD) verifier system 100 to which the present principles may be applied, in accordance with an embodiment of the present principles. The HRD verifier system 100 commences operation at a step 105. At step 110, a video encoder encodes the bitstream at a given bit rate. At step 115, the HRD verifier 104 examines the bitstream. At step 120, it is determined whether or not the bitstream is compliant. If so (i.e., the bitstream is identified as conformant), then the operation of the HRD verifier system 100 is terminated. Otherwise, control is passed to a step 125. At step 125, one or more parameters are modified, and the encoder is run again with the modified parameters per step 110. The process thus continues until the output bitstream is identified as conformant to the HRD requirements.
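By way of illustration only, the loop of Figure 1 may be sketched as follows. The helper callables encode and verify, and the ten-percent rate reduction, are hypothetical stand-ins and are not mandated by any standard:

```python
# Hypothetical sketch of the Figure 1 loop: encode, verify, lower the bit
# rate, and repeat until the verifier reports conformance.
def encode_until_conformant(source, bit_rate, encode, verify, max_passes=10):
    for _ in range(max_passes):
        bitstream = encode(source, bit_rate)      # step 110
        if verify(bitstream).conformant:          # steps 115 and 120
            return bitstream
        bit_rate *= 0.9                           # step 125: modify parameters
    raise RuntimeError("no conformant bitstream within max_passes")
```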
It is to be appreciated that implementations of the present principles may involve, for example, the manual and/or automatic modification of encoding parameters. For example, in one implementation, a human operator may modify the encoding parameters. In another implementation, the HRD verifier 104 is configured to propose new parameters in an automated way (e.g., percentage by which the bit rate should be lowered). These and other variations of the present principles are readily contemplated by one of ordinary skill in this and related arts and are within the spirit of the present principles.
In H.264 decoding, data is buffered in order to maintain the transmission bit rate and the video presentation rate. Pictures vary greatly in the amount of data used to encode them. Accordingly, how much data the decoding buffer contains constantly fluctuates; this quantity is referred to as the buffer fullness. The buffer fullness is carefully monitored to avoid overflow and underflow. To be certain that the bitstream being created will not violate the buffer fullness constraints, the encoder mimics the decoder buffer's behavior with a "virtual buffer". The virtual buffer attaches at the output of the encoder and uses mathematical equations to determine how much data enters and leaves the buffer at a given encoding rate and a given buffer size. This virtual decoder is referred to as the hypothetical reference decoder (HRD).
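By way of illustration only, the virtual-buffer idea may be sketched as follows. This is a simplified, non-normative model that assumes a constant channel rate and one picture removed per frame interval; the normative timing rules are given in Section 1.1 below:

```python
# Simplified decoder-buffer trace: bits arrive at the channel rate and each
# picture's bits are removed instantaneously at its (regular) decode time.
def virtual_buffer_trace(picture_sizes_bits, bit_rate, buffer_size, frame_rate):
    fullness = 0.0
    events = []
    bits_per_interval = bit_rate / frame_rate
    for n, size in enumerate(picture_sizes_bits):
        fullness += bits_per_interval        # bits arriving during one interval
        if fullness > buffer_size:
            events.append((n, "overflow"))   # more bits than the buffer holds
            fullness = buffer_size
        if fullness < size:
            events.append((n, "underflow"))  # picture not fully received in time
        fullness -= size                     # instantaneous decode removes b(n)
    return events
```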
Figure 2 shows an exemplary H.264 hypothetical reference decoder (HRD) buffer model 200 to which the present principles may be applied, in accordance with an embodiment of the present principles. The H.264 HRD buffer model 200 includes a hypothetical stream scheduler (HSS) 205, a coded picture buffer (CPB) 210, an instantaneous decoding process 215, and a decoded picture buffer (DPB) 220, as shown in Figure 2. The H.264 HRD buffer model 200 operates as follows. The HSS 205 delivers data associated with access units into the CPB 210 according to a specified arrival schedule. The data associated with each access unit is removed and decoded instantaneously by the instantaneous decoding process 215 at the CPB removal time. Each decoded picture is placed in the DPB 220 at its CPB removal time, unless it is output at its CPB removal time and is a non-reference picture. When a picture is placed in the DPB 220, it is removed from the DPB 220 at the later of its DPB output time or the time at which it is marked as "unused for reference".
HSS and HRD information concerning the number of enumerated delivery schedules and their associated bit rates and buffer sizes is specified in the video usability information (VUI). The HRD is initialized as specified by the buffering period SEI message. The removal timing of access units from the CPB 210 and output timing from the DPB 220 are specified in the picture timing SEI message. All timing information relating to a specific access unit shall arrive prior to the CPB removal time of the access unit.
An HRD conformance verifier is, for example, software and/or hardware that analyzes the bitstream based on the HRD rules and constraints and indicates the conformity of the bitstream.
Figure 3 shows an exemplary HRD verifier 300 to which the present principles may be applied, in accordance with an embodiment of the present principles. The HRD verifier 300 includes a bitstream parser 310, a CPB arrival and removal time calculator 320, and a constraint checker 330. The bitstream parser 310 takes as input the compressed bitstream 301 and extracts the buffer size and maximum bit rate information 311 from the sequence parameter set (SPS), and the picture size 312 and picture timing information 313 from the picture timing SEI message. This information is used by the CPB arrival and removal time calculator 320 to calculate the CPB arrival time and removal time of each picture. The initial arrival time 321, final arrival time 322, and removal time 323 of each picture, and the initial CPB removal delay of each buffering period 314, are input to the constraint checker 330. The conformance indicator 331 indicates any constraint violation.
Thus, in at least one implementation, an HRD verifier in accordance with the present principles can check bitstream compliancy, and is also able to correct some types of errors in the bitstream and indicate a subset of the bitstream that is to be re-encoded. In at least one implementation, an HRD verifier in accordance with the present principles avoids re-encoding the full bitstream when the bitstream is non-compliant. Figure 4 shows an exemplary HRD verifier 400 with a patch feature, to which the present principles may be applied, in accordance with an embodiment of the present principles. The HRD verifier 400 includes a bitstream parser 310, a CPB arrival and removal time calculator 320, a constraint checker 330, and a bitstream patcher 440. The violation type 432 indicates which rule is violated. The bitstream patcher 440 patches the bitstream based on the violation type, by changing the picture timing in the picture timing SEI and the initial CPB removal delay (and delay offset) in the buffering period SEI, in order to provide a conformant bitstream 442. As used herein, "patch feature" refers to an ability of the HRD verifier 400 to patch (i.e., correct) a detected HRD error in a bitstream. This may also be referred to as a "rewriting feature" because the HRD verifier 400 recalculates the HRD parameters and writes them back to the bitstream. One or more types of corrections that may be employed by the HRD verifier 400 are described in further detail herein below with respect to the various types of HRD errors to which an HRD verifier in accordance with the present principles may be applied.

Figure 5 shows an exemplary video transmission system 500, to which the present principles may be applied, in accordance with an implementation of the present principles. The video transmission system 500 may be, for example, a head-end or transmission system for transmitting a signal using any of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The transmission may be provided over the Internet or some other network.
The video transmission system 500 is capable of generating and delivering video content encoded using inter-view skip mode with depth. This is achieved by generating an encoded signal(s) including depth information or information capable of being used to synthesize the depth information at a receiver end that may, for example, have a decoder.
The video transmission system 500 includes an encoder 510 and a transmitter 520 capable of transmitting the encoded signal. The encoder 510 receives video information and generates an encoded signal(s) there from using inter-view skip mode with depth. The encoder 510 may be, for example, the encoder 500 described in detail above. The encoder 510 may include sub-modules, including for example an assembly unit for receiving and assembling various pieces of information into a structured format for storage or transmission. The various pieces of information may include, for example, coded or uncoded video, coded or uncoded depth information, and coded or uncoded elements such as, for example, motion vectors, coding mode indicators, and syntax elements.
The transmitter 520 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers. The transmitter may include, or interface with, an antenna (not shown). Accordingly, implementations of the transmitter 520 may include, or be limited to, a modulator.
Figure 6 shows an exemplary video receiving system 600 to which the present principles may be applied, in accordance with an embodiment of the present principles. The video receiving system 600 may be configured to receive signals over a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The signals may be received over the Internet or some other network.
The video receiving system 600 may be, for example, a cell-phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage. Thus, the video receiving system 600 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.
The video receiving system 600 is capable of receiving and processing video content including video information. The video receiving system 600 includes a receiver 610 capable of receiving an encoded signal, such as for example the signals described in the implementations of this application, and a decoder 620 capable of decoding the received signal.
The receiver 610 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal. The receiver 610 may include, or interface with, an antenna (not shown). Implementations of the receiver 610 may include, or be limited to, a demodulator.
The decoder 620 outputs video signals including video information and depth information. The decoder 620 may be, for example, the decoder 600 described in detail above.
Figure 7 shows an exemplary video processing device 700 to which the present principles may be applied, in accordance with an embodiment of the present principles. The video processing device 700 may be, for example, a set top box or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage. Thus, the video processing device 700 may provide its output to a television, computer monitor, or a computer or other processing device.
The video processing device 700 includes a front-end (FE) device 705 and a decoder 710. The front-end device 705 may be, for example, a receiver adapted to receive a program signal having a plurality of bitstreams representing encoded pictures, and to select one or more bitstreams for decoding from the plurality of bitstreams. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal, decoding one or more encodings (for example, channel coding and/or source coding) of the data signal, and/or error-correcting the data signal. The front-end device 705 may receive the program signal from, for example, an antenna (not shown). The front-end device 705 provides a received data signal to the decoder 710.
The decoder 710 receives a data signal 720. The data signal 720 may include, for example, one or more Advanced Video Coding (AVC), Scalable Video Coding (SVC), or Multi-view Video Coding (MVC) compatible streams. AVC refers more specifically to the existing International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the "H.264/MPEG-4 AVC Standard" or variations thereof, such as the "AVC standard" or simply "AVC").
MVC refers more specifically to a multi-view video coding ("MVC") extension (Annex H) of the AVC standard, referred to as H.264/MPEG-4 AVC, MVC extension (the "MVC extension" or simply "MVC"). SVC refers more specifically to a scalable video coding ("SVC") extension (Annex G) of the AVC standard, referred to as H.264/MPEG-4 AVC, SVC extension (the "SVC extension" or simply "SVC").
The decoder 710 decodes all or part of the received signal 720 and provides as output a decoded video signal 730. The decoded video 730 is provided to a selector 750. The device 700 also includes a user interface 760 that receives a user input 770. The user interface 760 provides a picture selection signal 780, based on the user input 770, to the selector 750. The picture selection signal 780 and the user input 770 indicate which of multiple pictures, sequences, scalable versions, views, or other selections of the available decoded data a user desires to have displayed. The selector 750 provides the selected picture(s) as an output 790. The selector 750 uses the picture selection information 780 to select which of the pictures in the decoded video 730 to provide as the output 790.
In various implementations, the selector 750 includes the user interface 760, and in other implementations no user interface 760 is needed because the selector 750 receives the user input 770 directly without a separate interface function being performed. The selector 750 may be implemented in software or as an integrated circuit, for example. In one implementation, the selector 750 is incorporated with the decoder 710, and in another implementation, the decoder 710, the selector 750, and the user interface 760 are all integrated. In one application, front-end 705 receives a broadcast of various television shows and selects one for processing. The selection of one show is based on user input of a desired channel to watch. Although the user input to front-end device 705 is not shown in Figure 7, front-end device 705 receives the user input 770. The front-end 705 receives the broadcast and processes the desired show by demodulating the relevant part of the broadcast spectrum, and decoding any outer encoding of the demodulated show. The front-end 705 provides the decoded show to the decoder 710. The decoder 710 is an integrated unit that includes devices 760 and 750. The decoder 710 thus receives the user input, which is a user-supplied indication of a desired view to watch in the show. The decoder 710 decodes the selected view, as well as any required reference pictures from other views, and provides the decoded view 790 for display on a television (not shown).
Continuing the above application, the user may desire to switch the view that is displayed and may then provide a new input to the decoder 710. After receiving a "view change" from the user, the decoder 710 decodes both the old view and the new view, as well as any views that are in between the old view and the new view. That is, the decoder 710 decodes any views that are taken from cameras that are physically located in between the camera taking the old view and the camera taking the new view. The front-end device 705 also receives the information identifying the old view, the new view, and the views in between. Such information may be provided, for example, by a controller (not shown in Figure 7) having information about the locations of the views, or the decoder 710. Other implementations may use a front-end device that has a controller integrated with the front-end device.
The decoder 710 provides all of these decoded views as output 790. A post-processor (not shown in Figure 7) interpolates between the views to provide a smooth transition from the old view to the new view, and displays this transition to the user. After transitioning to the new view, the post-processor informs (through one or more communication links not shown) the decoder 710 and the front-end device 705 that only the new view is needed. Thereafter, the decoder 710 only provides as output 790 the new view.
The system 700 may be used to receive multiple views of a sequence of images, to present a single view for display, and to switch between the various views in a smooth manner. The smooth manner may involve interpolating between views to move to another view. Additionally, the system 700 may allow a user to rotate an object or scene, or otherwise to see a three-dimensional representation of an object or a scene. The rotation of the object, for example, may correspond to moving from view to view, and interpolating between the views to obtain a smooth transition between the views or simply to obtain a three-dimensional representation. That is, the user may "select" an interpolated view as the "view" that is to be displayed.

Returning to a description of the present principles and environments in which they may be applied, it is to be appreciated that, advantageously, the present principles may be applied to 3D Video (3DV). 3D Video is a new framework that includes a coded representation for multiple view video and depth information and targets the generation of high-quality 3D rendering at the receiver. This enables 3D visual experiences with auto-multiscopic displays.
Figure 8 shows an exemplary system 800 for transmitting and receiving multi-view video with depth information, to which the present principles may be applied, according to an embodiment of the present principles. In Figure 8, video data is indicated by a solid line, depth data is indicated by a dashed line, and meta data is indicated by a dotted line. The system 800 may be, for example, but is not limited to, a free-viewpoint television system. At a transmitter side 810, the system 800 includes a three-dimensional (3D) content producer 820, having a plurality of inputs for receiving one or more of video, depth, and meta data from a respective plurality of sources. Such sources may include, but are not limited to, a stereo camera 811, a depth camera 812, a multi-camera setup 813, and 2-dimensional/3-dimensional (2D/3D) conversion processes 814. One or more networks 830 may be used to transmit one or more of video, depth, and meta data relating to multi-view video coding (MVC) and digital video broadcasting (DVB). At a receiver side 840, a depth image-based renderer 850 performs depth image-based rendering to project the signal to various types of displays. The depth image-based renderer 850 is capable of receiving display configuration information and user preferences. An output of the depth image-based renderer 850 may be provided to one or more of a 2D display 861, an M-view 3D display 862, and/or a head-tracked stereo display 863.
In order to reduce the amount of data to be transmitted, the dense array of cameras (V1, V2, ..., V9) may be sub-sampled so that only a sparse set of cameras actually captures the scene. Figure 9 shows an exemplary framework 900 for generating nine output views (N = 9) out of 3 input views with depth (K = 3), to which the present principles may be applied, in accordance with an embodiment of the present principles. The framework 900 involves an auto-stereoscopic 3D display 910, which supports output of multiple views, a first depth image-based renderer 920, a second depth image-based renderer 930, and a buffer for decoded data 940. The decoded data is a representation known as Multiple View plus Depth (MVD) data. The nine cameras are denoted by V1 through V9. Corresponding depth maps for the three input views are denoted by D1, D5, and D9. Any virtual camera positions in between the captured camera positions (e.g., Pos 1, Pos 2, Pos 3) can be generated using the available depth maps (D1, D5, D9), as shown in Figure 9. As can be seen in Figure 9, the baseline between the actual cameras (V1, V5, and V9) used to capture data can be large. As a result, the correlation between these cameras is significantly reduced, and the coding efficiency of these cameras may suffer since it would rely only on temporal correlation. Moreover, as shown in Figure 9, views 2, 3, 4, 6, 7, and 8 are skipped in coding. On one hand, in order to reconstruct the skipped views, the decoder will need views 1, 5, and 9 and their depth images. Note that the depth images for view 5 are obtained for rendering the skipped views. On the other hand, in order to encode view 1 and view 9 based on view 5, a different depth image for view 5 may be identified. Thus, two different depth images for view 5 need to be coded. In at least one implementation, we propose to code the refinement on the original depth images used for coding the other views.
Moreover, in at least one implementation, instead of simply skipping the coding of certain views, the encoder can send the residual signal for selected pictures or views among the skipped views (i.e., views 2, 3, 4, 6, 7, and 8 in Figure 9) to enhance the rendering quality.

In an embodiment, an HRD verifier for an H.264/MPEG-4 AVC bitstream has two features: conformance verification and HRD error correction.
1. Conformance Verification
The primary goal of an HRD verifier is to check the HRD conformance of an input bitstream. As a hypothetical decoder, the verifier does not need to actually decode the VCL NAL units. Instead, the VCL NAL units are bypassed and only the size of each NAL unit is counted. The non-VCL NAL units that carry HRD parameters, such as, for example, the sequence parameter set (SPS), the buffering period supplemental enhancement information (SEI) message, and the picture timing SEI message, are parsed and the HRD parameters are extracted in order to calculate the timing and buffer status.
When the beginning of a new access unit (AU) is identified, the verifier will (1) update the timing and buffer status based on the size of the previous AU, and (2) check for any HRD violation.

1.1 Update the HRD scheduler
In at least one implementation, the HRD monitors the picture CPB removal time (the time at which a picture is decoded), the initial arrival time (the time at which the first bit of a picture enters the CPB), the final arrival time (the time at which the last bit of a picture enters the CPB), and the buffer fullness. Based on these values, the HRD detects buffer overflow, underflow, and other violations.
The variable tc is derived as follows and is called a clock tick:
tc = num_units_in_tick / time_scale
Update the CPB removal time:
For access unit 0, the nominal removal time of the access unit from the CPB is specified by the following:
tr,n( 0 ) = initial_cpb_removal_delay[ SchedSelIdx ] / 90000

For any other access unit n:

tr,n( n ) = tr,n( nb ) + tc * cpb_removal_delay( n )

where cpb_removal_delay( n ) is extracted from the picture timing SEI message and nb is the first AU of the current buffering period.
For the first access unit of a buffering period that does not initialize the HRD, the nominal removal time of the access unit from the CPB is specified by the following:
tr,n( n ) = tr,n( nb ) + tc * cpb_removal_delay( n )
where tr,n( nb ) is the nominal removal time of the first access unit of the previous buffering period and cpb_removal_delay( n ) is the value of cpb_removal_delay specified in the picture timing SEI message associated with access unit n.
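By way of illustration only, the above recursion may be sketched as follows. This is a sketch of the description above, not the normative Annex C procedure; the set period_first_aus (marking the first access unit of each buffering period) and the use of floating-point seconds are assumptions of the sketch:

```python
# Sketch of the nominal CPB removal-time recursion; notation mirrors the text
# (t_c = clock tick, nb = first AU of the current buffering period).
def nominal_removal_times(cpb_removal_delays, initial_delay_90khz,
                          num_units_in_tick, time_scale, period_first_aus):
    t_c = num_units_in_tick / time_scale       # clock tick
    t_r = [initial_delay_90khz / 90000.0]      # access unit 0
    nb = 0
    for n in range(1, len(cpb_removal_delays)):
        # cpb_removal_delay(n) is always coded relative to t_r(nb); for the
        # first AU of a new period, nb still points at the previous period.
        t_r.append(t_r[nb] + t_c * cpb_removal_delays[n])
        if n in period_first_aus:              # n starts a new buffering period
            nb = n
    return t_r
```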
Update the initial arrival time:
The initial arrival time is determined as follows:

tai( n ) = Max( taf( n - 1 ), tai,earliest( n ) )

where tai,earliest( n ) is derived as follows.

If AU n is not the first AU of a subsequent buffering period, then tai,earliest( n ) is derived as follows:

tai,earliest( n ) = tr,n( n ) - ( initial_cpb_removal_delay[ SchedSelIdx ] + initial_cpb_removal_delay_offset[ SchedSelIdx ] ) / 90000

with initial_cpb_removal_delay[ SchedSelIdx ] and initial_cpb_removal_delay_offset[ SchedSelIdx ] being specified in the previous buffering period SEI message.

Otherwise (AU n is the first AU of a subsequent buffering period), tai,earliest( n ) is derived as follows:

tai,earliest( n ) = tr,n( n ) - ( initial_cpb_removal_delay[ SchedSelIdx ] / 90000 )

with initial_cpb_removal_delay[ SchedSelIdx ] being specified in the buffering period SEI message associated with AU n.
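By way of illustration only, the two branches above may be sketched as follows. Times are in seconds; a single, fixed SchedSelIdx is assumed and omitted, and the 90 kHz delay and offset values are assumed to have been read from the relevant buffering period SEI message:

```python
# Sketch of tai(n) = Max(taf(n-1), tai,earliest(n)) with the two branches
# for tai,earliest(n) described above.
def initial_arrival_time(t_af_prev, t_rn, init_delay_90k, init_offset_90k,
                         first_au_of_new_period):
    if first_au_of_new_period:
        t_ai_earliest = t_rn - init_delay_90k / 90000.0
    else:
        t_ai_earliest = t_rn - (init_delay_90k + init_offset_90k) / 90000.0
    return max(t_af_prev, t_ai_earliest)
```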
Update the final arrival time: The final arrival time for AU n is derived by the following
taf( n ) = tai( n ) + b( n ) / BitRate[ SchedSelIdx ]

where b( n ) is the size in bits of AU n, counting the bits of the VCL NAL units and the filler data NAL units for the Type I conformance point, or all bits of the Type II bitstream for the Type II conformance point. Two types of bitstreams are subject to HRD conformance checking for H.264/MPEG-4 AVC. The first type, called a Type I bitstream, is a NAL unit stream containing only the VCL NAL units and filler data NAL units for all access units in the bitstream. The second type, called a Type II bitstream, contains, in addition to the VCL NAL units and filler data NAL units for all access units in the bitstream, at least one of the following: additional non-VCL NAL units other than filler data NAL units; and all leading_zero_8bits, zero_byte, start_code_prefix_one_3bytes, and trailing_zero_8bits syntax elements that form a byte stream from the NAL unit stream (as specified in Annex B of the H.264 Specification).
Update the CPB fullness:
The CPB fullness is derived by the following:
B(n) = B(n-1) + b(n)
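By way of illustration only, the final-arrival and fullness updates may be combined in a single sketch, mirroring the simplified equations above (b_n is the size in bits of AU n for the chosen conformance point):

```python
# Sketch of the final-arrival and fullness updates for access unit n.
def update_final_arrival_and_fullness(t_ai_n, b_n, bit_rate, cpb_fullness):
    t_af_n = t_ai_n + b_n / bit_rate     # taf( n ) = tai( n ) + b( n ) / BitRate
    cpb_fullness = cpb_fullness + b_n    # B( n ) = B( n-1 ) + b( n )
    return t_af_n, cpb_fullness
```

In a full verifier, these updated values feed the constraint checks of Section 1.2 below.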
1.2 Check for HRD violation
In at least one implementation, the following conditions (types of violations) have to be satisfied in a HRD conformant bitstream:
1. CPB no overflow:

B(n) < CpbSize[ SchedSelIdx ]

where CpbSize[ SchedSelIdx ] is extracted from the SPS.
2. CPB no underflow:

taf( n ) <= tr,n( n )
3. C-14/C-16 conditions on the buffering period boundary

If the current AU is the beginning of a buffering period, then the following condition has to be satisfied:

If cbr_flag[ SchedSelIdx ] is equal to 0:

initial_cpb_removal_delay[ SchedSelIdx ] <= Ceil( Δtg,90( n ) )

Otherwise (cbr_flag[ SchedSelIdx ] is equal to 1):

Floor( Δtg,90( n ) ) <= initial_cpb_removal_delay[ SchedSelIdx ] <= Ceil( Δtg,90( n ) )

where Δtg,90( n ) = 90000 * ( tr,n( n ) - taf( n - 1 ) ). This type of HRD error usually happens when concatenating two independently encoded bitstreams.
4. cpb_removal_delay increment

tr,n( n ) > tr,n( n-1 )

This type of HRD error usually happens when concatenating two independently encoded bitstreams.
5. Fixed Frame Rate
If fixed_frame_rate_flag is equal to 1:

Δto,dpb( n ) ÷ DeltaTfiDivisor = tc

where DeltaTfiDivisor is derived from Table E-6 of Annex E of the AVC Standard (reproduced below), tc = num_units_in_tick / time_scale (num_units_in_tick and time_scale are parameters that are fixed for a sequence), and Δto,dpb( n ) is derived by:

Δto,dpb( n ) = to,dpb( nn ) - to,dpb( n )
where nn indicates the picture that follows after picture n in output order.
This type of HRD error usually happens when concatenating two independently encoded bitstreams.
Table E-6

[Table E-6 of Annex E of the AVC Standard, giving the DeltaTfiDivisor values, is reproduced as images in the original filing and is omitted here.]
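By way of illustration only, the five checks of this section may be gathered into a single sketch. This is an illustrative simplification with one fixed SchedSelIdx; a real verifier iterates over the enumerated delivery schedules and reports error locations:

```python
import math

# Sketch of the five conformance checks for access unit n (times in seconds;
# inputs are assumed to come from the scheduler updates of Section 1.1).
def check_violations(n, B_n, cpb_size, t_af, t_rn, is_bp_start, cbr_flag,
                     init_delay_90k, fixed_frame_rate, dt_out_dpb,
                     delta_tfi_divisor, t_c, eps=1e-9):
    errors = []
    if not B_n < cpb_size:                       # condition 1: CPB overflow
        errors.append("type 1: CPB overflow")
    if not t_af[n] <= t_rn[n]:                   # condition 2: CPB underflow
        errors.append("type 2: CPB underflow")
    if is_bp_start and n > 0:                    # condition 3: C-14/C-16
        dt90 = 90000.0 * (t_rn[n] - t_af[n - 1])
        ok = (math.floor(dt90) <= init_delay_90k <= math.ceil(dt90)
              if cbr_flag else init_delay_90k <= math.ceil(dt90))
        if not ok:
            errors.append("type 3: initial_cpb_removal_delay out of range")
    if n > 0 and not t_rn[n] > t_rn[n - 1]:      # condition 4: removal order
        errors.append("type 4: removal times not increasing")
    if fixed_frame_rate and abs(dt_out_dpb / delta_tfi_divisor - t_c) > eps:
        errors.append("type 5: fixed frame rate violated")  # condition 5
    return errors
```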
Further regarding Type 3, Type 3 violations can occur because, for example, the boundary is not correct. As background, note that the first sequence starts at time zero, and we designate the removal time tr,n( 0 ) from the CPB (encoder) buffer, which stores the compressed bitstream just as a decoder would store the received compressed bitstream. All removal times for subsequent frames in this sequence are referred to as frame n, tr,n( n ), and they are all calculated with respect to the removal time for "0". The next sequences start at time nb, tr,n( nb ). All such sequence starts refer back to time "0" to get their removal time. All removal times for subsequent frames in the subsequent sequences are referred to as frame n, tr,n( n ), and they are all calculated with respect to the removal time for "nb", so they necessarily also rely on time "0".
Type 4 and 5 violations can also relate to boundaries. Type 4 violations can occur, for example, if a subsequent frame is removed before a previous frame. Type 5 violations can occur, for example, if a fixed frame rate is used but the delta (time separation difference) between frames is not constant. To fix these violations, in at least one implementation, we recalculate the removal time for the first picture/frame in this buffering period, tr,n( nb ). We set the removal time for the first picture/frame in this buffering period equal to the sum of the removal time for the previous frame and a constant separation distance (presuming a fixed frame rate). The corresponding cpb_removal_delay value of this frame is the difference between its removal time and the removal time of the first picture of the previous buffering period. For example, if the removal time of the previous picture is 1 s, the removal time of the first picture of the previous buffering period is 0.5 s, and the frame rate is 30 frames per second (so the constant separation distance between frames is about 0.033 second), then the removal time of the first picture in this buffering period is about 1.033 s, and the cpb_removal_delay of this picture corresponds to about 0.533 s. Of course, it is to be appreciated that the present principles are not limited solely to the preceding types of violations with respect to bitstream conformance with respect to an HRD and, thus, given the teachings of the present principles provided herein, one of ordinary skill in this and related arts will contemplate these and various other types of violations to which the present principles may be applied, while maintaining the spirit of the present principles.
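By way of illustration only, the numeric example above can be checked as follows (a frame rate of 30 frames per second is assumed, as in the example):

```python
# Worked check of the example above.
t_prev_removal = 1.0        # removal time of the previous picture, in seconds
t_prev_bp_first = 0.5       # removal time of the first picture of the
                            # previous buffering period, in seconds
separation = 1.0 / 30.0     # constant frame separation, about 0.033 s
t_new = t_prev_removal + separation   # about 1.033 s
delay = t_new - t_prev_bp_first       # about 0.533 s: the new
                                      # cpb_removal_delay expressed in seconds
```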
2. Error Correction
There are five exemplary types of violations listed hereinbefore. Violations of conditions 3 through 5 can be corrected by, for example, changing the picture timing SEI or the buffering period SEI. Violations of conditions 1 and 2 cannot be corrected by simply changing the SEIs, and re-encoding a subset of the bitstream is generally required. In the following, we describe the derivation of the new timing values that satisfy conditions 3 through 5, and how to select which part of the bitstream should be re-encoded when condition 1 or 2 is violated.
When a violation of condition 3 is detected, the initial_cpb_removal_delay[ SchedSelIdx ] in the buffering period SEI will be modified to equal Ceil( Δtg,90( n ) ). When a violation of condition 4 or 5 is detected at the first picture of a buffering period, the cpb_removal_delay of the first picture of the buffering period is replaced by the following derived value:
cpb_removal_delay( n ) = ( tr,n( n-1 ) + DeltaTfiDivisor * tc - tr,n( nb ) ) / tc
where nb is the first picture of the previous buffering period.

When condition 1 or 2 is violated, the buffer overflows or underflows. The overflow or underflow is caused by too few or too many bits being created for some pictures. The buffer overflow, for example, may be a result of propagation and accumulation and, hence, is not necessarily due solely to the picture at which the condition is violated. When a violation of condition 1 or 2 happens, re-encoding is generally, but not always, required. In various implementations, we re-encode the buffering period that includes the error and the buffering period immediately before it. Such implementations are able to correct many violations of conditions 1 or 2. Other implementations re-encode additional pictures, depending on the extent of the problem and the over/under-flow.

Figure 10 shows an exemplary method 1000 for hypothetical reference decoder (HRD) error correction, in accordance with an embodiment of the present principles. It is to be appreciated that the method 1000 may be implemented, for example, in an HRD verifier, a bitstream verifier, a multiplexer, a video encoder, a video decoder, and so forth. Moreover, it is to be appreciated that the preceding devices/applications are merely illustrative and, thus, given the teachings of the present principles provided herein, one of ordinary skill in this and related arts will contemplate these and various other devices/applications to which the present principles may be applied, while maintaining the spirit of the present principles. At step 1015, a bitstream is parsed, and HRD parameters are read therefrom.
At step 1020, HRD conformance is checked. At step 1025, it is determined whether or not an HRD error exists. If so, then control is passed to a step 1030. Otherwise, the method is terminated.
At step 1030, it is determined whether the error is a Type 1 or Type 2 error, a Type 3 error, or a Type 4 or Type 5 error. If a Type 1 or Type 2 error, then control is passed to a step 1035. If a Type 3 error, then control is passed to a step 1040. If a Type 4 or Type 5 error, then control is passed to a step 1045. At step 1035, the bitstream is partially re-encoded. At step 1040, the initial_cpb_removal_delay[ SchedSelIdx ] is recalculated and set to Ceil( Δtg,90( n ) ). At step 1050, the buffering period SEI is modified with the updated initial_cpb_removal_delay[ SchedSelIdx ].
At step 1045, the cpb_removal_delay( n ) is recalculated and set to ( tr,n( n-1 ) + DeltaTfiDivisor * tc - tr,n( nb ) ) / tc. At step 1055, the picture timing SEI is modified with the updated cpb_removal_delay( n ).

Figure 11 shows an exemplary parallel encoding system 1100, to which the present principles may be applied, in accordance with an embodiment of the present principles. The parallel encoding system 1100 includes a segmentation module 1105 for outputting video segment 1 1110 through video segment n 1119. Video segment 1 1110 through video segment n 1119 are input to respective video encoder 1 1120 through video encoder n 1129. A concatenator 1130 receives the outputs of the video encoders and concatenates them to provide an output bitstream.
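By way of illustration only, the parameter rewrites of steps 1040/1050 and 1045/1055 may be sketched as follows. The SEI objects and their field names are hypothetical stand-ins; an actual patcher rewrites and re-emits the SEI payload bytes:

```python
import math

# Sketch of the correction rules of Figure 10 for one detected violation.
def patch_parameters(violation_type, bp_sei, pt_sei, t_rn, n, nb,
                     delta_tfi_divisor, t_c, dt90):
    if violation_type == 3:
        # steps 1040/1050: set the initial removal delay to Ceil(dt_g,90(n))
        bp_sei.initial_cpb_removal_delay = math.ceil(dt90)
    elif violation_type in (4, 5):
        # steps 1045/1055: previous removal time plus one constant frame
        # separation, re-expressed as a delay in clock ticks from t_r,n(nb)
        pt_sei.cpb_removal_delay = round(
            (t_rn[n - 1] + delta_tfi_divisor * t_c - t_rn[nb]) / t_c)
    else:
        # types 1 and 2: parameter rewriting alone is not enough
        raise ValueError("partial re-encoding required (step 1035)")
```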
Reference in the specification to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
It is to be appreciated that the use of any of the following "/", "and/or", and "at least one of", for example, in the cases of "A/B", "A and/or B" and "at least one of A and B", is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of "A, B, and/or C" and "at least one of A, B, and C", such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
Throughout this application the term "picture" is used. A picture may include, for example, either a frame or a field.
Implementations may signal information using a variety of techniques including, but not limited to, in-band information, out-of-band information, datastream data, implicit signaling, and explicit signaling. In-band information and explicit signaling may include, for various implementations and/or standards, slice headers, SEI messages, other high level syntax, and non-high-level syntax. Accordingly, although implementations described herein may be described in a particular context, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.
The implementations and features described herein may be used in the context of the MPEG-4 AVC Standard, or the MPEG-4 AVC Standard with the MVC extension, or the MPEG-4 AVC Standard with the SVC extension. However, these implementations and features may be used in the context of another standard and/or recommendation (existing or future), or in a context that does not involve a standard and/or recommendation. The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle. Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation. As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application and are within the scope of the following claims.

CLAIMS:
1. A method comprising: accessing a bitstream including data that has been encoded and parameters describing how to decode the encoded data; determining that the bitstream is not compliant with a standard; modifying one or more of the parameters to produce modified parameters; and producing a modified bitstream that is compliant with the standard, the modified bitstream including the modified parameters.
2. The method of claim 1, wherein the bitstream is modified without fully re-encoding the data.

3. The method of claim 1, wherein the bitstream is modified without re-encoding any of the data.

4. The method of claim 1, wherein the step of modifying one or more of the parameters contributes to correcting a problem causing the non-compliance.

5. The method of claim 1, further comprising re-encoding at least some of the data, and wherein the re-encoding and the modifying both contribute to correcting a problem causing the non-compliance.

6. The method of claim 1, wherein the step of modifying one or more of the parameters comprises changing parameters associated with the bitstream, the parameters corresponding to at least one of buffering period parameters and picture timing parameters.

7. The method of claim 1, wherein non-compliance is caused by buffer underflow or overflow.

8. The method of claim 1, wherein the step of modifying one or more of the parameters comprises changing a parameter that sets a removal time for a picture.

9. The method of claim 1, wherein the step of modifying one or more of the parameters comprises replacing some of the parameters with parameters derived based on a variation in bit rate over time in the bitstream.
10. The method of claim 1, wherein the standard is the MPEG-4 AVC Standard.
11. The method of claim 1, wherein a determined non-compliant feature is a coded picture buffer overflow, and the bitstream is modified with a re-encoding of a current picture and previous pictures with respect to the current picture.

12. The method of claim 1, wherein a determined non-compliant feature is a coded picture buffer underflow, and the bitstream is modified with a re-encoding of a current picture and previous pictures with respect to the current picture.

13. The method of claim 1, wherein: a determined non-compliant feature relates to a buffering period boundary, and a parameter controlling a delay period between arrival of a current picture and removal of the current picture is set to a value greater than a difference between removal time of the current picture and final arrival time of a previous picture.
14. The method of claim 10, wherein: a determined non-compliant feature relates to a buffering period boundary, and an initial_cpb_removal_delay[SchedSelIdx], as defined in the MPEG-4 AVC Standard, in buffering period supplemental enhancement information is modified.
15. The method of claim 1, wherein: a determined non-compliant feature relates to a subsequent picture being removed from a coded picture buffer before a previous picture with respect to the subsequent picture, and modifying the one or more parameters comprises modifying removal time for a first picture of a buffering period corresponding to the bitstream by setting the removal time with respect to the first picture equal to a removal time of a previous picture plus a constant separation distance.
16. The method of claim 15, wherein modifying the removal time comprises modifying cpb_removal_delay(n), as defined in the MPEG-4 AVC Standard.
17. The method of claim 10, wherein: a determined non-compliant feature relates to an irregularity in a fixed picture rate, and modifying the one or more parameters comprises modifying removal time for a first picture of a buffering period corresponding to the bitstream by setting the removal time with respect to the first picture equal to a removal time of a previous picture plus a constant separation distance.
18. The method of claim 17, wherein modifying the removal time comprises modifying cpb_removal_delay(n), as defined in the MPEG-4 AVC Standard.
19. The method of claim 1, wherein the bitstream includes a first portion and a second portion, and the method includes receiving the bitstream from a parallel encoder that has encoded the first portion and the second portion in parallel.

20. The method of claim 1, wherein the modifying is performed at one or more of a hypothetical reference decoder verifier or an encoder.
21. An apparatus comprising: means for accessing a bitstream including data that has been encoded and parameters describing how to decode the encoded data; means for determining that the bitstream is not compliant with a standard; means for modifying one or more of the parameters to produce modified parameters; and means for producing a modified bitstream that is compliant with the standard, the modified bitstream including the modified parameters.
22. A processor readable medium having stored thereon instructions for causing a processor to perform at least the following: accessing a bitstream including data that has been encoded and parameters describing how to decode the encoded data; determining that the bitstream is not compliant with a standard; modifying one or more of the parameters to produce modified parameters; and producing a modified bitstream that is compliant with the standard, the modified bitstream including the modified parameters.
23. An apparatus, comprising a processor configured to perform at least the following: accessing a bitstream including data that has been encoded and parameters describing how to decode the encoded data; determining that the bitstream is not compliant with a standard; modifying one or more of the parameters to produce modified parameters; and producing a modified bitstream that is compliant with the standard, the modified bitstream including the modified parameters.
24. An apparatus comprising: a bitstream parser for accessing a bitstream including data that has been encoded and parameters describing how to decode the encoded data; a hypothetical reference decoder verifier for determining that the bitstream is not compliant with a standard; and a bitstream patcher for modifying one or more of the parameters to produce modified parameters, and for producing a modified bitstream that is compliant with the standard, the modified bitstream including the modified parameters.
25. The apparatus of claim 24 wherein the apparatus includes an encoder.
26. The apparatus of claim 24 wherein the apparatus includes a decoder.
27. An apparatus comprising: a bitstream parser for accessing a bitstream including data that has been encoded and parameters describing how to decode the encoded data; a hypothetical reference decoder verifier for determining that the bitstream is not compliant with a standard; a bitstream patcher for modifying one or more of the parameters to produce modified parameters, and for producing a modified bitstream that is compliant with the standard, the modified bitstream including the modified parameters; and a modulator for modulating a signal, the signal including the modified bitstream.
28. An apparatus comprising: a demodulator for receiving and demodulating a signal, the signal including a bitstream; a bitstream parser for accessing the bitstream including data that has been encoded and parameters describing how to decode the encoded data; a hypothetical reference decoder verifier for determining that the bitstream is not compliant with a standard; and a bitstream patcher for modifying one or more of the parameters to produce modified parameters, and for producing a modified bitstream that is compliant with the standard, the modified bitstream including the modified parameters.