EP2174500A2 - Video indexing method, and video indexing device - Google Patents

Video indexing method, and video indexing device

Info

Publication number
EP2174500A2
EP2174500A2 EP08761351A EP08761351A EP2174500A2 EP 2174500 A2 EP2174500 A2 EP 2174500A2 EP 08761351 A EP08761351 A EP 08761351A EP 08761351 A EP08761351 A EP 08761351A EP 2174500 A2 EP2174500 A2 EP 2174500A2
Authority
EP
European Patent Office
Prior art keywords
interest
regions
picture
region
video data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP08761351A
Other languages
German (de)
French (fr)
Inventor
Sylvain Fabre
Régis Sochard
Pierre Laurent Lagalaye
Olivier Le Meur
Philippe Guillotel
Samuel Vermeulen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Licensing SAS
Original Assignee
Thomson Licensing SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing SAS
Publication of EP2174500A2

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47: End-user applications
    • H04N21/472: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • the invention relates to a video indexing method, and a video indexing device.
  • ROI: regions of interest
  • coding applications often detect regions of interest and deploy more resources for coding these regions.
  • the detection of regions of interest is today principally used prior to coding, so as to favor the regions of interest during coding by allocating them more bandwidth, for example by reducing the quantization step for these regions.
  • the present invention is principally concerned not with the detection of regions of interest, but rather with the transmission of these regions of interest to the devices or applications that take them into account for different applications and can at least resolve the picture display problem on a terminal with a low display capacity, whether mobile or not.
  • the present invention proposes a method for indexing a coded video data stream.
  • the video data stream comprises information relative to the location of regions of interest of each picture, the method comprises steps of:
  • the selected regions of interest are recorded in a temporary memory as they are being selected and decoded, - when all the selected regions of interest are recorded in the temporary memory, the selected regions of interest are transferred to a permanent memory support (503).
  • the regions of interest are formatted in order to obtain a homogenous size for all the selected regions of interest.
  • the method comprises a step of encrypting the location of the regions of interest using an encryption key.
  • the method comprises a step of obtaining a decryption key upon payment by the user.
  • the video data stream is coded according to the coding standard H.264/AVC and the location information is contained in a Supplemental Enhancement Information (SEI) type message.
  • SEI: Supplemental Enhancement Information
  • the SEI messages are encapsulated into real-time protocol packets (RTP), the RTP packets being encrypted.
  • the Supplemental Enhancement Information type messages relative to regions of interest location information are inserted in the coded data before or after each picture to which they refer.
  • the location information comprises information chosen from:
  • the selection step of a region of interest per picture selects a region of interest according to the weight relative to the importance of the region of interest.
  • the video coding standard uses flexible macroblock ordering, the regions of interest being coded into slice groups, independently from the other picture data, the location information of regions of interest comprising the slice group numbers in which the regions of interest are coded.
  • the Supplemental Enhancement Information message comprises an identifier indicating for each slice group if it is related to one region of interest.
  • the method comprises a further step of reading the SEI messages, and the step of decoding of video data decodes only the slice groups containing the regions of interest.
  • the invention concerns also a device for indexing a coded video data stream.
  • the video data stream comprises information relative to the location of regions of interest of each picture
  • the device comprises means for:
  • the detection of the regions of interest of a picture is made in general prior to coding. This data is then used to facilitate the encoding.
  • the inventors realized that the location of regions of interest can also be of interest during the decoding of a picture and particularly during the display on a device whose display capacity is limited. The reception terminal can in fact choose to display the regions of interest only, which gives better visibility of these regions than the display of the complete picture.
  • Figure 1 shows a coding device according to a preferred embodiment of the invention
  • Figure 2 shows a coding method according to a preferred embodiment of the invention
  • Figure 3 shows a decoding device according to a preferred embodiment of the invention
  • Figure 4 shows a decoding method according to another embodiment of the invention
  • Figure 5 shows a personal recording type device according to another embodiment of the invention
  • Figure 6 shows an indexing method in a personal recording type device implementing an embodiment of the invention.
  • Figure 1 shows a coding device in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
  • a video stream is coded.
  • a current frame Fn is presented at the coder input to be coded by it.
  • This frame is coded in the form of slices, namely it is divided into sub-units which each contain a certain number of macroblocks corresponding to groups of 16x16 pixels.
  • Each macroblock is coded in intra or inter mode. Whether in intra mode or inter mode, a macroblock is coded by being based on a reconstructed frame.
  • a module 109 decides the coding mode in intra mode of the current picture, according to the content of the picture.
  • P shown in figure 2 comprises samples of the current frame Fn that were previously coded, decoded and reconstructed (uF'n on figure 2, u meaning non-filtered).
  • in inter mode, P is formed from a motion estimation based on one or more F'n-i frames.
  • a motion estimation module 101 establishes an estimation of motion between the current frame Fn and at least one preceding frame F'n-1. From this motion estimation, a motion compensation module 102 produces a frame P when the current picture Fn must be coded in inter mode. A subtractor 103 produces a signal Dn, the difference between the picture Fn to be coded and the picture P. Then this picture is transformed by a DCT transform in a module 104. The transformed picture is then quantized by a quantization module 105. Then, the pictures are reorganized by a module 111.
  • a CABAC (Context-based Adaptive Binary Arithmetic Coding) type entropy coding module 112 then codes each picture.
  • the modules 106 and 107, of inverse quantization and inverse transformation respectively, enable a difference D'n to be reconstituted after transformation and quantization then inverse quantization and inverse transformation.
  • an intra prediction module 108 codes the picture.
  • a uF'n picture is obtained at the adder output 114, as is the sum of the D'n signal and the P signal.
  • This module 108 also receives at input the reconstructed non-filtered F'n picture.
  • a filter module 110 can obtain an F'n picture reconstructed and filtered from a uF'n picture.
  • the entropy coding module 112 transmits the coded slices encapsulated in NAL type units.
  • the NALs contain, as well as the slices, information relating to the headers for example.
  • the NAL type units are transmitted to a module 113.
  • a module 116 enables the regions of interest to be determined.
  • the means 116 then establish a salience map for each picture of the video.
  • parameters entered by the user can also be taken into account. For example, it is possible to define, according to the event to which the video is related, certain important objects of the filmed scene and particularly for sporting events to specify that it concerns a football match.
  • this allows a salience map to be obtained that weights the salience zones according to the event. In a football match, it would be preferable to focus on the ball rather than on the terraces.
  • the region of interest module therefore enables one or more salient zones to be extracted, also referred to as regions of interest. These regions of interest are then geographically located on the picture. They are identified by their coordinates according to the height and width of the picture. Their size can also be extracted for each of the regions of interest. It is also possible to associate them with an element of semantic information. In fact for a football match, one may require information on a region of interest if the user can select the regions of interest to be displayed from a choice of several regions of interest to be displayed.
  • the module 115 receives information relating to the regions of interest in order to code them into an SEI ("Supplemental Enhancement Information") type message.
  • the SEI message is coded as indicated in the table below:
  • uuid_iso_iec_11578: single word of 128 bits to indicate our message type to the decoder.
  • user_data_payload_byte: 8 bits comprising a part of the SEI message.
  • payloadSize: 17 (bytes), thus 16 for the UUID and 1 for the proprietary data.
  • number_of_ROI: number of regions of interest present in the picture.
  • roi_x_16: position X in the picture of the region of interest, in multiples of 16 pixels.
  • roi_y_16: position Y in the picture of the region of interest, in multiples of 16 pixels.
  • roi_w_16: width in the picture of the region of interest, in multiples of
  • semantic_information: title characterizing the region of interest.
  • Relative weights: gives the weight of each region of interest of the picture, so as to know which region of interest is in principle of most interest.
  • Macroblock_alignment: gives the number of the starting macroblock in which the region of interest is found, as well as the size of the region of interest in number of macroblocks, in width and in height.
  • the regions of interest are classified as salient if their salience is higher than a certain threshold predetermined by the method for obtaining salience maps.
  • the regions of interest are classed in increasing order of salience for all regions where the salience is higher than a fixed threshold.
  • the module 113 inserts the SEI message into the data stream and sends the video stream thus coded to the transmission network.
  • An SEI message is transmitted before each picture to which it refers. In other embodiments, it is also possible to transmit the SEI message only when the location of at least one region of interest changes between two or more pictures. Hence, during decoding, the decoder takes into account the last SEI message received, whether it is immediately before the picture to be decoded or if it relates to a picture previously received if the current picture is not preceded by such an SEI message.
  • Figure 2 shows a coding method in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
  • the salience map associated with the video to be broadcast is determined.
  • information relating to the video content can also be received to take account of this information during the establishment of the salience map.
  • the position of the ball corresponds to a region of interest for the user and in this case, privilege the zones of the picture in which the ball is situated.
  • the presenter corresponds to a region of interest, and in this case, determine the regions of interest by privileging the zones containing the presenter by detecting for example the face using known picture processing techniques.
  • one or more regions of interest relating to the video content are thus obtained.
  • the coordinates of the regions of interest in the pictures are determined.
  • the size of the regions of interest can also be determined in pixels and semantic information on the content can be associated with each region of interest.
  • the video stream is coded according to the coding standard H.264.
  • zones are privileged that were detected as regions of interest.
  • a lower quantization step is applied to them.
  • an SEI message is created from location and semantic information associated with the regions of interest.
  • the SEI message thus created is in accordance with the SEI message previously described in tables 1 and 2.
  • the stream is constituted by inserting SEI messages into the stream to obtain a coded stream according to the H.264 standard.
  • the video stream thus coded is transmitted to decoding devices in real time or in a deferred manner during a step E6, the decoding devices can be local or remote.
  • Figure 3 represents a preferred embodiment of a decoding device according to the invention, in accordance with the coding standard H.264/AVC.
  • a module 209 receives SEI messages at the input. It extracts the different SEI messages.
  • the NALs of useful data are transmitted to an entropy decoding module 201.
  • the SEI messages are analyzed by a module 210. This module enables decoding of the content of SEI messages representative of the regions of interest. The regions of interest of each picture are thus identified at the level of the decoding device in a simple manner and prior to the decoding of each picture using information contained in the field macroblock_alignment.
  • the macroblocks are transmitted to a re-ordering module 202 to obtain a set of coefficients. These coefficients undergo an inverse quantization in the module 203 and an inverse DCT transformation in the module 204 at the output of which D'n macroblocks are obtained, D'n being a deformed version of Dn.
  • a predictive block P is added to D'n, by an adder 205, to reconstruct a macroblock uF'n.
  • the block P is obtained after motion compensation, carried out by a module 208, of the preceding decoded frame, during a coding in inter mode or after intra prediction of the macroblock uF'n, by the module 207, in the case of coding in intra mode.
  • a filter 206 is applied to the signal uF'n to reduce the effects of the distortion and the reconstructed frame F'n is created from a series of macroblocks.
  • using the SEI messages, the blocks representative of regions of interest are detected in the stream; prior to display, these blocks are identified and can be cropped according to the choice of the user and transmitted for display to a device such as a PDA or mobile telephone.
  • this region of interest is displayed in zoom on the screen to take up the full screen.
  • the decoding device thus only decodes the macroblocks likely to contain information of interest to the user. In this way the decoding is faster and requires less resources at the level of the decoding device and therefore at reception. This is particularly advantageous when the receiving device is a mobile terminal comprising limited processing capacity.
  • Figure 4 shows a decoding method in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
  • Such a method can be implemented in a mobile terminal having a limited display capacity.
  • step S1 the type of display required is selected.
  • the selection is made by means of the user interface present on the mobile terminal. Either it is decided to function in full picture mode, in which case the entirety of the video stream is displayed as it is transmitted by the transmitter, or it is decided to display only the regions of interest of the picture. This particular mode constitutes the particularity of the invention.
  • step S2 if not it passes to step S8.
  • step S8 it is understood that different types of SEI messages can be inserted into the video stream for other applications and in this case, prior to step S8 or during step S8, there can be a step of SEI message analysis.
  • the user selects the use that he wants to make of the regions of interest. Particularly, he can select: - the maximum number of regions of interest that he wants to display.
  • the regions of interest whose "semantic information" field comprises the keyword. It is also possible to specify whether it is required to display a single region of interest per picture comprising the keyword (in this case the one for which the salience is maximum) or several regions of interest comprising the keyword.
  • the SEI messages present in the stream are analyzed as they are being received.
  • the SEI message is used to code the location of regions of interest of the picture as they were detected prior to the picture coding. Hence for each picture, there can be one or more regions of interest according to the visual properties of the picture or according to picture content or both.
  • the SEI message is coded according to the tables 1 and 2 previously described. Information relating to SEI messages is recorded temporarily up until the display of the corresponding picture.
  • the pictures are all decoded in conformance with the decoding standard.
  • the decoded regions of interest are processed according to those that the user selected during the S2 step. If the user selects a zoom of the principal region of interest of the picture, then during step S6, the zone is magnified so as to reach the maximum size of display. If the user has selected a mosaic of regions of interest then the picture is recomposed of regions of interest, each being magnified according to the screen size and the number of regions of interest selected for display. If the user has specified a keyword, then the regions of interest comprising the keyword are displayed and zoomed. During a step S7, the regions of interest are displayed on the screen of the mobile terminal, according to the user's desire.
  • in a step S8, following a non-selection by the user to display only the regions of interest, the entire video stream is decoded for display.
  • Figure 5 shows a video indexing application of the invention.
  • FIG. 5 partially shows a personal recorder (PVR) type device 500.
  • PVR: personal recorder
  • the PVR 500 receives a compressed video stream at its input.
  • this video data stream is in accordance with the coding standard H.264.
  • the compressed video stream comprises particularly
  • This video data stream is partly transmitted to a recording support 503.
  • Recording support can be understood as hard disk, holographic support, memory card or "Blu-ray" disk. This recording support can be remote in other embodiments.
  • the video data stream is transmitted in another part to a decoder 501 to be decoded in real time, this for example to be displayed on a television set.
  • the stream is transmitted to the decoder 501 when the user wants to view it in real time. If not, it is not decoded but simply recorded, when recording is requested.
  • the present invention offers to decode part of the video data stream, even when viewing in real time is not requested.
  • by a part of the video stream, it is understood particularly the regions of interest or certain regions of interest.
  • when the decoder 501 receives a video stream for which a recording is requested, the data is transmitted to the recording support 503.
  • the recording support 503 records the data as it is received.
  • the decoder 501 receives the video data stream and progressively decodes the SEI messages.
  • the decoded regions of interest are transmitted to the video indexing module 502 responsible for their temporary recording before transmitting them to the recording support 503.
  • Figure 6 illustrates the method implemented by the decoder 501 and the indexing module 502.
  • the video data stream is received by the decoder 501.
  • the decoder 501 decodes the SEI messages present in the video data stream.
  • the decoded SEI messages are SEI messages as previously described in the tables 1 and 2.
  • the decoder can also decode other SEI messages but that is not the object of the present invention.
  • Each SEI message can describe one or more regions of interest per picture as described in tables 1 and 2.
  • the decoder 501 analyzes each SEI message and decodes each picture. During this step, the weight indicated in the SEI message is used to select which region of interest will be recorded for each picture. In a preferred embodiment, the region of interest with the maximum salience is kept, i.e. the one having the highest weight.
  • the indexing module 502 decides which picture is used to index the video. According to the preferred embodiment described here, only about 10 pictures will be selected for a video of one and a half hours. It can be imagined that in other embodiments the number of pictures will be greater. These 10 pictures are taken at regular intervals. These selected pictures are recorded temporarily in a RAM type memory comprised in the indexing module 502 and not shown.
  • the pictures are zoomed during a step T5, that is they are enlarged so that they are all the same size.
  • this size can be the size of the picture.
  • they are read in the temporary memory and re-recorded after their enlargement.
  • the pictures are enlarged prior to their recording in the temporary memory.
  • the images are presented as a mosaic on the display. Therefore, instead of being enlarged, the images are reduced to one single size, the same for all of them.
  • the indexing pictures are also transferred from the temporary memory to the recording support 503 and recorded in a file.
  • the regions of interest are used for the indexation or can also be used for display on a PVR type device when the user wants to consult the content of the database.
  • this encrypting step in respect of figure 2, would be a step E4' (not shown) but inserted after the step E4.
  • Obtaining the decryption key could be the object of a paid-for service from the programme broadcaster, for example.
  • the SEI messages relating to regions of interest are encapsulated in RTP (Real-time Transport Protocol) type packets and transmitted on a different video port.
  • Temporal CTS type labels can link the SEI messages relating to regions of interest with corresponding pictures.
  • this transmission mode enables encrypting only RTP packets containing the SEI messages and not the video.
  • the decryption is carried out at the level of the terminal receiver.
  • the encrypting standard used is DVB-CSA and SEI messages relating to regions of interest are encapsulated in a different PID than that of the video.
  • the SEI messages relating to regions of interest are linked to corresponding pictures via the PTS (timestamp) of the PES packet header. This transmission mode allows encryption only of the PIDs that contain SEI messages relating to regions of interest and not the video PID.
  • the video data stream is coded in accordance with the coding standard H.264/AVC using FMO (Flexible Macroblock Ordering) which enables coding of different parts of the picture independently and so decoding of them independently.
  • FMO: Flexible Macroblock Ordering
  • the FMO mode uses "slice groups".
  • the "slice groups" are defined in the standard.
  • the regions of interest are coded in groups different from the rest of the picture.
  • a PPS type NAL comprises a map of "slice groups". SEI messages are inserted such as those described hereafter indicating in which "slice groups" the regions of interest are coded.
  • uuid_iso_iec_11578: single word of 128 bits to indicate our message type to the decoder.
  • user_data_payload_byte: 8 bits comprising a part of the SEI message.
  • for each slice_group representing a region of interest, semantic information, a relative weight and the macroblock it concerns can be specified.
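The relation between a region position given in multiples of 16 pixels and the macroblocks it covers can be illustrated with a small sketch. It assumes 16x16 macroblocks numbered in plain raster-scan order (left to right, top to bottom); the function name, the `roi_h_16` height parameter and the picture width used below are hypothetical and are not taken from the message tables.

```python
MB_SIZE = 16  # macroblocks are groups of 16x16 pixels


def roi_macroblocks(roi_x_16, roi_y_16, roi_w_16, roi_h_16, picture_width_px):
    """Return the raster-scan numbers of the macroblocks covered by a region.

    Positions and sizes are expressed in multiples of 16 pixels, so they map
    directly onto macroblock columns and rows. Raster-scan numbering is an
    assumption of this sketch.
    """
    # Number of macroblocks in one picture row, rounding up partial blocks.
    mbs_per_row = (picture_width_px + MB_SIZE - 1) // MB_SIZE
    numbers = []
    for row in range(roi_y_16, roi_y_16 + roi_h_16):
        for col in range(roi_x_16, roi_x_16 + roi_w_16):
            numbers.append(row * mbs_per_row + col)
    return numbers


# A 2x2-macroblock region starting at column 1, row 1 of a 64-pixel-wide picture.
covered = roi_macroblocks(1, 1, 2, 2, picture_width_px=64)
```

The first element of the returned list plays the role of the starting macroblock, and the width and height in macroblocks are simply the `roi_w_16` and `roi_h_16` arguments.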

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a method and a device for indexing a coded video data stream. According to the invention, the video data stream comprises information relative to the location of regions of interest of each picture, said method comprises steps of: reception (T1) of coded video stream, recording the coded video stream on a recording support, decoding (T2) location information of regions of interest, selection (T3) of a region of interest per picture, decoding (T3) of video data, selecting (T4) a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture, recording (T6) of the selected regions of interest.

Description

VIDEO INDEXING METHOD, AND VIDEO INDEXING DEVICE
FIELD OF THE INVENTION
The invention relates to a video indexing method, and a video indexing device.
BACKGROUND OF THE INVENTION
Several picture processing applications use the detection of regions of interest (ROI) to improve picture quality. For example, coding applications often detect regions of interest and deploy more resources for coding these regions.
Different methods enable detection of regions of interest in a picture.
Particularly, methods are known based on the establishment of salience maps of a picture or a video that take into account the visual parameters and enable definition of regions on which the human eye lingers when viewing a picture or a video.
The detection of regions of interest is today principally used prior to coding, so as to favor the regions of interest during coding by allocating them more bandwidth, for example by reducing the quantization step for these regions.
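To illustrate why a smaller quantization step favors a region, the following sketch applies plain uniform scalar quantization to a few made-up coefficient values with two step sizes. The function names, the coefficient values and the step sizes are assumptions chosen for illustration; the quantizer actually specified by a coding standard such as H.264 is more elaborate.

```python
def quantize(values, step):
    """Uniform scalar quantization: snap each value to a nearby multiple of step."""
    return [round(v / step) * step for v in values]


def max_error(values, step):
    """Largest absolute difference between a value and its quantized version."""
    return max(abs(v - q) for v, q in zip(values, quantize(values, step)))


coeffs = [3.0, 17.0, -42.0, 8.0]       # hypothetical transform coefficients
background_error = max_error(coeffs, 16)  # coarse step outside the region of interest
roi_error = max_error(coeffs, 4)          # finer step inside the region of interest
```

With uniform quantization the error on each value is bounded by half the step, so the finer step keeps the region of interest closer to the original at the cost of more bits.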
The emergence of mobile terminals, such as mobile telephones, PDAs, game consoles, portable DVD players, the development of display and screen techniques and the emergence of new services have all combined to render necessary the display of video on terminals with a low display capacity. For example, the possibility to receive television on a mobile telephone raises display problems for dense pictures on low dimension screens.
The present invention is principally concerned not with the detection of regions of interest, but rather with the transmission of these regions of interest to the devices or applications that take them into account for different applications and can at least resolve the picture display problem on a terminal with a low display capacity, whether mobile or not.
SUMMARY OF THE INVENTION
For this purpose, the present invention proposes a method for indexing a coded video data stream. According to the invention, the video data stream comprises information relative to the location of regions of interest of each picture, the method comprises steps of:
- reception of coded video stream,
- recording the coded video stream on a recording support,
- decoding location information of regions of interest,
- selection of a region of interest per picture,
- decoding of video data,
- selecting a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,
- recording of the selected regions of interest.
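The selection steps listed above can be sketched as follows. This is a minimal illustration, assuming that the location information has already been decoded into per-picture lists of regions, that the per-picture choice is made on the highest weight, and that the predetermined number of entries is taken at regular intervals; the `Roi` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Roi:
    x: int
    y: int
    w: int
    h: int
    weight: int  # relative importance within its picture


def best_roi(rois):
    """Per-picture selection: keep the region with the highest weight."""
    return max(rois, key=lambda r: r.weight) if rois else None


def pick_index_entries(per_picture, count):
    """Keep `count` entries taken at regular intervals from the per-picture list."""
    if count <= 0 or not per_picture:
        return []
    if len(per_picture) <= count:
        return list(per_picture)
    stride = len(per_picture) / count
    return [per_picture[int(i * stride)] for i in range(count)]


def index_stream(pictures, count=10):
    """pictures: iterable of (picture_number, [Roi, ...]) pairs in stream order."""
    temporary = []  # stands in for the temporary memory
    for number, rois in pictures:
        roi = best_roi(rois)
        if roi is not None:
            temporary.append((number, roi))
    # The returned entries are the ones that would be recorded for indexing.
    return pick_index_entries(temporary, count)


pictures = [(n, [Roi(0, 0, 16, 16, 1), Roi(16, 16, 32, 32, 5)]) for n in range(100)]
entries = index_stream(pictures, count=10)
```

Keeping the two selections separate mirrors the two distinct steps of the method: one choice per picture, then one choice for the whole stream.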
According to a preferred embodiment, during the recording step,
- the selected regions of interest are recorded in a temporary memory as they are being selected and decoded,
- when all the selected regions of interest are recorded in the temporary memory, the selected regions of interest are transferred to a permanent memory support (503).
Preferentially, prior to their recording the regions of interest are formatted in order to obtain a homogenous size for all the selected regions of interest.
Preferentially, the method comprises a step of encrypting the location of the regions of interest thanks to an encryption key.
Preferentially, the method comprises a step of obtaining a decryption key upon payment by the user.
Preferentially, the video data stream is coded according to the coding standard H.264/AVC and the location information is contained in a Supplemental Enhancement Information (SEI) type message. According to a preferred embodiment, the SEI messages are encapsulated into real-time protocol packets (RTP), the RTP packets being encrypted.
Preferentially, the Supplemental Enhancement Information type messages relative to regions of interest location information are inserted in the coded data before or after each picture to which they refer.
According to a preferred embodiment, the location information comprises information chosen from:
- the number of regions of interest in each picture,
- the coordinates of each region of interest for each of the picture dimensions,
- the surface of each region of interest,
- a weight relative to the importance of the region of interest with respect to other regions of interest of the picture,
- information relating to the content of each region of interest, and any combination of this information.
Preferentially, the selection step of a region of interest per picture selects a region of interest according to the weight relative to the importance of the region of interest.
Preferentially, the video coding standard uses flexible macro-bloc ordering, the regions of interest being coded into slice groups, independently from the other picture data, the location information of regions of interest comprising the slice group numbers in which the regions of interest are coded.
Preferentially, the Supplemental Enhancement Information message comprises an identifier indicating for each slice group if it is related to one region of interest. Preferentially, the method comprises a further step of reading the SEI messages, and the step of decoding of video data decodes only the slice groups containing the regions of interest.
The invention concerns also a device for indexing a coded video data stream. According to the invention, the video data stream comprises information relative to the location of regions of interest of each picture, the device comprises means for:
- receiving the coded video stream,
- recording the coded video stream on a recording support (503),
- decoding (501 ) location information of the regions of interest,
- decoding (501 ) video data,
- selecting (502) a region of interest per picture,
- selecting (502) a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,
- recording (503) the selected regions of interest.
The detection of the regions of interest of a picture is made in general prior to coding. This data is then used to facilitate the encoding. The inventors realized that the location of regions of interest can also be of interest during the decoding of a picture and particularly during the display on a device whose display capacity is limited. The reception terminal can in fact choose to display the regions of interest only, which gives better visibility of these regions than the display of the complete picture.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be better understood and illustrated by means of embodiments and implementations, by no means limiting, with reference to the figures attached in the appendix, wherein:
- figure 1 shows a coding device according to a preferred embodiment of the invention,
- figure 2 shows a coding method according to a preferred embodiment of the invention,
- figure 3 shows a decoding device according to a preferred embodiment of the invention,
- figure 4 shows a decoding method according to another embodiment of the invention,
- figure 5 shows a personal recording type device according to another embodiment of the invention,
- figure 6 shows an indexing method in a personal recording type device implementing an embodiment of the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Figure 1 shows a coding device in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention. In this preferred embodiment, a video stream is coded.
A current frame Fn is presented at the coder input to be coded by it.
This frame is coded in the form of slices, namely it is divided into sub-units which each contain a certain number of macroblocks corresponding to groups of 16x16 pixels. Each macroblock is coded in intra or inter mode. Whether in intra mode or inter mode, a macroblock is coded on the basis of a reconstructed frame. A module 109 decides, according to the content of the picture, whether the current picture is coded in intra mode. In intra mode, P (shown in figure 2) comprises samples of the current frame Fn that were previously coded, decoded and reconstructed (uF'n on figure 2, u meaning non-filtered). In inter mode, P is formed from a motion estimation based on one or more F'n-i frames.
A motion estimation module 101 establishes an estimation of motion between the current frame Fn and at least one preceding frame F'n-1. From this motion estimation, a motion compensation module 102 produces a frame P when the current picture Fn must be coded in inter mode. A subtractor 103 produces a signal Dn, the difference between the picture Fn to be coded and the picture P. Then this picture is transformed by a DCT transform in a module 104. The transformed picture is then quantized by a quantization module 105. Then, the pictures are reorganized by a module 111. A CABAC (Context-based Adaptive Binary Arithmetic Coding) type entropic coding module 112 then codes each picture.
The modules 106 and 107, respectively of inverse quantization and inverse transformation, enable a difference D'n to be reconstituted after transformation and quantization followed by inverse quantization and inverse transformation.
When a picture is coded in intra mode, according to module 109, an intra prediction module 108 codes the picture. A uF'n picture is obtained at the output of the adder 114 as the sum of the D'n signal and the P signal. This module 108 also receives at input the reconstructed non-filtered picture uF'n. A filter module 110 can obtain a reconstructed and filtered F'n picture from a uF'n picture.
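The reconstruction loop described above (Dn = Fn - P, quantization, inverse quantization, uF'n = D'n + P) can be illustrated with a minimal sketch. It is not the coder of figure 1: the DCT is deliberately omitted, samples are plain integers, and the function names and the scalar `step` parameter are assumptions introduced only for illustration.

```python
def quantize(residual, step):
    """Map each residual sample to an integer level; this is the lossy operation."""
    return [round(d / step) for d in residual]


def dequantize(levels, step):
    """Inverse quantization: scale the integer levels back to sample values."""
    return [q * step for q in levels]


def reconstruct(current, prediction, step):
    """Return uF'n, the non-filtered reconstruction of the current samples."""
    residual = [f - p for f, p in zip(current, prediction)]         # Dn
    residual_rec = dequantize(quantize(residual, step), step)       # D'n
    return [p + d for p, d in zip(prediction, residual_rec)]        # uF'n = P + D'n


current = [101, 104, 96, 90]      # samples of Fn (illustrative values)
prediction = [98, 100, 99, 95]    # samples of P (illustrative values)
fine = reconstruct(current, prediction, step=1)
coarse = reconstruct(current, prediction, step=4)
```

With a step of 1 the reconstruction equals the input, whereas a larger step makes D'n differ from Dn, which is why the coder must predict from the reconstructed picture rather than from the original one.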
The entropic coding module 112 transmits the coded slices encapsulated in NAL type units. In addition to the slices, the NALs contain, for example, information relating to the headers. The NAL type units are transmitted to a module 113.
A module 116 enables the regions of interest to be determined. Several methods now enable regions of interest to be located in a picture. Particularly known are methods based on the establishment of salience maps. For example, the patent application WO2006/07263, filed in the name of Thomson Licensing on 10th January 2006 and published on 13th July 2006, discloses an effective method for establishing a salience map.
The means 116 then establish a salience map for each picture of the video. To establish this salience map, parameters entered by the user can also be taken into account. For example, it is possible to define, according to the event to which the video is related, certain important objects of the filmed scene and particularly for sporting events to specify that it concerns a football match. Advantageously, this allows a salience map to be obtained that weights the salience zones according to the event. In a football match, it would be preferable to focus on the ball rather than on the terraces.
The region of interest module therefore enables one or more salient zones to be extracted, also referred to as regions of interest. These regions of interest are then geographically located on the picture. They are identified by their coordinates according to the height and width of the picture. Their size can also be extracted for each of the regions of interest. It is also possible to associate them with an element of semantic information. For a football match, for example, such information is useful if the user can select the regions of interest to be displayed from among several regions of interest.
The module 115 receives information relating to the regions of interest in order to code them into an SEI ("Supplemental Enhancement Information") type message.
The SEI message is coded as indicated in the table below:
Table 1
uuid_iso_iec_11578: single word of 128 bits to indicate our message type to the decoder.
user_data_payload_byte: 8 bits comprising a part of the SEI message.
Typically in this case:
• payloadSize = 17 (bytes) thus 16 for the UUID and 1 for the proprietary data.
• user_data_payload_byte:
Table 2
Where:
number_of_ROI: Number of regions of interest present in the picture (or the following pictures).
roi_x_16: Position X in the picture of the region of interest, in multiples of 16 pixels.
roi_y_16: Position Y in the picture of the region of interest, in multiples of 16 pixels.
roi_w_16: Width in the picture of the region of interest, in multiples of 16 pixels.
roi_h_16: Height in the picture of the region of interest, in multiples of 16 pixels.
semantic_information: title characterizing the region of interest.
Relative weights: gives the weight of each region of interest of the picture, so that it is known which region of interest is in principle of most interest.
Macroblock_alignment: gives the number of the starting macroblock in which the region of interest is found, as well as the size of the region of interest in number of macroblocks, in width and in height.
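As an illustration of the fields listed above, the following sketch holds one region of interest and converts its macroblock-aligned fields to pixels. The byte layout of Table 2 is not reproduced here, so no parsing is attempted; the class name, the helper methods and the raster-scan numbering of macroblocks are assumptions made for the example.

```python
from dataclasses import dataclass

MB_SIZE = 16  # macroblock edge in pixels (16x16, as in the description)


@dataclass
class RegionOfInterest:
    roi_x_16: int                   # X position, in multiples of 16 pixels
    roi_y_16: int                   # Y position, in multiples of 16 pixels
    roi_w_16: int                   # width, in multiples of 16 pixels
    roi_h_16: int                   # height, in multiples of 16 pixels
    semantic_information: str = ""  # title characterizing the region
    relative_weight: int = 0        # importance relative to the other regions

    def pixel_rect(self):
        """Return (x, y, width, height) of the region in pixels."""
        return (self.roi_x_16 * MB_SIZE, self.roi_y_16 * MB_SIZE,
                self.roi_w_16 * MB_SIZE, self.roi_h_16 * MB_SIZE)

    def first_macroblock(self, picture_width_px):
        """Number of the starting macroblock, assuming raster-scan numbering."""
        mbs_per_row = picture_width_px // MB_SIZE
        return self.roi_y_16 * mbs_per_row + self.roi_x_16


roi = RegionOfInterest(roi_x_16=10, roi_y_16=4, roi_w_16=6, roi_h_16=5,
                       semantic_information="ball", relative_weight=3)
rect = roi.pixel_rect()
start_mb = roi.first_macroblock(picture_width_px=720)
```

Expressing positions and sizes in multiples of 16 pixels keeps every region aligned on macroblock boundaries, which is what allows a decoder to map a region directly onto the macroblocks it covers.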
When regions of interest are detected using the salience maps, a rate of salience is obtained for each region of interest. The regions are classified as salient if their salience is higher than a certain threshold predetermined by the method for obtaining salience maps. Hence, in the SEI messages, the regions of interest are classed in increasing order of salience for all regions where the salience is higher than a fixed threshold. The module 113 inserts the SEI message into the data stream and sends the video stream thus coded to the transmission network.
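The thresholding and ordering just described can be sketched as follows. The representation of a candidate as a (label, salience) pair, the function name and the numeric values are assumptions for illustration; the actual salience scale depends on the method used to obtain the salience map.

```python
def regions_for_sei(candidates, threshold):
    """Keep the candidates whose salience exceeds the threshold and
    return them in increasing order of salience."""
    salient = [c for c in candidates if c[1] > threshold]
    return sorted(salient, key=lambda c: c[1])


candidates = [("terraces", 0.10), ("ball", 0.90), ("goalkeeper", 0.55), ("sky", 0.05)]
ordered = regions_for_sei(candidates, threshold=0.5)
```

With this ordering, the most salient region is the last entry of the list written into the SEI message.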
An SEI message is transmitted before each picture to which it refers. In other embodiments, it is also possible to transmit the SEI message only when the location of at least one region of interest changes between two or more pictures. Hence, during decoding, the decoder takes into account the last SEI message received, whether it is immediately before the picture to be decoded or if it relates to a picture previously received if the current picture is not preceded by such an SEI message.
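The rule that the decoder applies the last SEI message received can be sketched as a small state machine. The event representation (`"sei"` and `"picture"` tuples in stream order) and the function name are assumptions for illustration only.

```python
def regions_per_picture(events):
    """Associate each picture with the regions of the most recent SEI message.

    `events` is an iterable of ("sei", regions) or ("picture", picture_id)
    tuples in stream order.
    """
    current = []   # regions from the last SEI message seen so far
    result = {}
    for kind, value in events:
        if kind == "sei":
            current = list(value)
        elif kind == "picture":
            result[value] = list(current)
    return result


stream = [("sei", ["A"]), ("picture", 0), ("picture", 1),
          ("sei", ["B", "C"]), ("picture", 2)]
assignment = regions_per_picture(stream)
```

Picture 1 is not preceded by its own SEI message and therefore inherits the regions announced before picture 0.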
Figure 2 shows a coding method in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
During a step E1, the salience map associated with the video to be broadcast is determined. In order to determine this salience map that shows the regions of interest, information relating to the video content can also be received and taken into account during the establishment of the salience map. Particularly, during a sporting event, it can be considered that the position of the ball corresponds to a region of interest for the user and, in this case, the zones of the picture in which the ball is situated are privileged. When the video corresponds to the broadcast of a televised report, it can also be assumed that the presenter corresponds to a region of interest and, in this case, the regions of interest are determined by privileging the zones containing the presenter, for example by detecting the face using known picture processing techniques.
At the end of the E1 step, one or more regions of interest relating to the video content are thus obtained.
During a step E2, the coordinates of the regions of interest in the pictures are determined. The size of the regions of interest can also be determined in pixels and semantic information on the content can be associated with each region of interest.
In parallel, during a step E3, the video stream is coded according to the coding standard H.264. During the coding, zones are privileged that were detected as regions of interest. In order to privilege the regions of interest at the coding level, a lower quantization step is applied to them.
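One possible way of applying a lower quantization step to the regions of interest is a per-macroblock map of quantization parameters, sketched below. The function name, the base value of 30 and the offset of 6 are assumptions; regions are given as (x, y, width, height) in macroblock units, and values are clamped to the 0 to 51 range used by H.264.

```python
def qp_map(mb_cols, mb_rows, rois, base_qp=30, roi_delta=6):
    """Return a mb_rows x mb_cols grid of quantization parameters in which
    macroblocks inside a region of interest get a lower value (finer step)."""
    roi_qp = max(0, min(51, base_qp - roi_delta))
    grid = [[base_qp] * mb_cols for _ in range(mb_rows)]
    for x, y, w, h in rois:
        for row in range(max(0, y), min(y + h, mb_rows)):
            for col in range(max(0, x), min(x + w, mb_cols)):
                grid[row][col] = roi_qp
    return grid


grid = qp_map(mb_cols=4, mb_rows=3, rois=[(1, 1, 2, 1)])
```

In H.264 the quantization step roughly doubles for every increase of 6 in the quantization parameter, so the offset of 6 used here halves the step inside the regions of interest.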
Following step E2, during a step E4, an SEI message is created from location and semantic information associated with the regions of interest. The SEI message thus created is in accordance with the SEI message previously described in tables 1 and 2.
During a step E5, the stream is constituted by inserting SEI messages into the stream to obtain a coded stream according to the H.264 standard.
The video stream thus coded is transmitted to decoding devices in real time or in a deferred manner during a step E6; the decoding devices can be local or remote.
Figure 3 represents a preferred embodiment of a decoding device according to the invention, in accordance with the coding standard H.264/AVC.
A module 209 receives SEI messages at the input. It extracts the different SEI messages. The NALs of useful data are transmitted to an entropic decoding module 201. The SEI messages are analyzed by a module 210. This module enables decoding of the content of SEI messages representative of the regions of interest. The regions of interest of each picture are thus identified at the level of the decoding device in a simple manner and prior to the decoding of each picture, using information contained in the field macroblock_alignment.
The macroblocks are transmitted to a re-ordering module 202 to obtain a set of coefficients. These coefficients undergo an inverse quantization in the module 203 and an inverse DCT transformation in the module 204 at the output of which D'n macroblocks are obtained, D'n being a deformed version of Dn. A predictive block P is added to D'n, by an adder 205, to reconstruct a macroblock uF'n. The block P is obtained after motion compensation, carried out by a module 208, of the preceding decoded frame, during a coding in inter mode or after intra prediction of the macroblock uF'n, by the module 207, in the case of coding in intra mode. A filter 206 is applied to the signal uF'n to reduce the effects of the distortion and the reconstructed frame F'n is created from a series of macroblocks.
Using information relating to the regions of interest comprised in the SEI messages, the blocks representative of regions of interest are detected in the stream and prior to display, these blocks are identified and can be cropped according to the choice of the user and transmitted for display to a device such as a PDA or mobile telephone.
It is also possible to leave the choice to the user to choose which macroblock he wants to display, by entering semantic information for example. He enters for example "ball" and in this case the regions of interest containing a ball are displayed. If no region of interest is associated with this semantic, then all the regions of interest can be displayed. The different regions of interest can be displayed in the form of a mosaic on the screen.
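The keyword-based choice can be sketched as a simple filter with the fallback described above. The dictionary keys `semantic` and `weight`, the case-insensitive substring match and the `single` option (keeping only the most salient match, as allowed in the decoding method of figure 4) are assumptions made for this illustration.

```python
def select_by_keyword(regions, keyword, single=False):
    """Return the regions whose semantic information contains the keyword.

    If no region matches, all regions are returned, as in the description.
    With single=True only the matching region of highest weight is kept.
    """
    matches = [r for r in regions if keyword.lower() in r["semantic"].lower()]
    if not matches:
        return list(regions)
    if single:
        return [max(matches, key=lambda r: r["weight"])]
    return matches


regions = [
    {"semantic": "ball", "weight": 5},
    {"semantic": "goalkeeper", "weight": 3},
    {"semantic": "ball boy", "weight": 1},
]
balls = select_by_keyword(regions, "ball")
best_ball = select_by_keyword(regions, "ball", single=True)
everything = select_by_keyword(regions, "referee")
```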
When a single region of interest is displayed, this region of interest is displayed in zoom on the screen to take up the full screen.
The decoding device thus only decodes the macroblocks likely to contain information of interest to the user. In this way the decoding is faster and requires fewer resources at the level of the decoding device and therefore at reception. This is particularly advantageous when the receiving device is a mobile terminal comprising limited processing capacity.
Figure 4 shows a decoding method in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
Such a method can be implemented in a mobile terminal having a limited display capacity.
During a step S1, the type of display required is selected. The selection is made by means of the user interface present on the mobile terminal. Either it is decided to function in full picture mode, in which case the entirety of the video stream is displayed as it is transmitted by the transmitter, or it is decided to display only the regions of interest of the picture. This latter mode constitutes the particularity of the invention. When it is decided to display the regions of interest, the method passes to step S2; if not, it passes to step S8. It is understood that different types of SEI messages can be inserted into the video stream for other applications and in this case, prior to step S8 or during step S8, there can be a step of SEI message analysis.
During a step S2, the user selects the use that he wants to make of the regions of interest. Particularly, he can select:
- the maximum number of regions of interest that he wants to display,
- the manner in which he wants to display the various regions of interest on the screen, for example in the form of a mosaic,
- the degree of zoom that he wants on the region of interest,
- using a keyword, the regions of interest whose "semantic information" field comprises the keyword. In this case, for each picture, it is also possible to specify whether it is required to display a single region of interest per picture comprising the keyword (in this case the one for which the salience is maximum) or several regions of interest comprising the keyword.
During a step S3, the SEI messages present in the stream are analyzed as they are being received. The SEI message is used to code the location of regions of interest of the picture as they were detected prior to the picture coding. Hence for each picture, there can be one or more regions of interest according to the visual properties of the picture or according to picture content or both. The SEI message is coded according to the tables 1 and 2 previously described. Information relating to SEI messages is recorded temporarily up until the display of the corresponding picture.
During a step S4, the pictures are all decoded in conformance with the decoding standard. During a step S5, the decoded regions of interest are processed according to the choices that the user made during step S2. If the user selects a zoom of the principal region of interest of the picture, then during step S6, the zone is magnified so as to reach the maximum size of display. If the user has selected a mosaic of regions of interest, then the picture is recomposed of regions of interest, each being magnified according to the screen size and the number of regions of interest selected for display. If the user has specified a keyword, then the regions of interest comprising the keyword are displayed and zoomed. During a step S7, the regions of interest are displayed on the screen of the mobile terminal, according to the user's desire.
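The magnification of steps S5 and S6 can be sketched as below. The description does not specify how the zoom factor or the mosaic grid is computed, so the aspect-ratio-preserving zoom and the near-square grid used here are assumptions, as are the function names.

```python
import math


def zoom_factor(roi_w, roi_h, screen_w, screen_h):
    """Largest uniform scale at which the region still fits on the screen."""
    return min(screen_w / roi_w, screen_h / roi_h)


def mosaic_layout(count, screen_w, screen_h):
    """Return one (x, y, width, height) tile per region on a near-square grid."""
    if count <= 0:
        return []
    cols = math.ceil(math.sqrt(count))
    rows = math.ceil(count / cols)
    tile_w, tile_h = screen_w // cols, screen_h // rows
    return [((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
            for i in range(count)]


factor = zoom_factor(roi_w=96, roi_h=80, screen_w=320, screen_h=240)
tiles = mosaic_layout(count=3, screen_w=320, screen_h=240)
```

Each region would then be scaled to fit its tile, for example with `zoom_factor` applied to the tile dimensions instead of the full screen.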
During a step S8, when the user has not chosen to display only the regions of interest, the entire video stream is decoded for display.
Figure 5 shows a video indexing application of the invention.
Figure 5 partially shows a personal recorder (PVR) type device 500.
The PVR 500 receives a compressed video stream at its input. According to the embodiment described, this video data stream is in accordance with the coding standard H.264. The compressed video stream comprises particularly SEI messages as previously described in the tables 1 and 2.
This video data stream is partly transmitted to a recording support 503. A recording support can be understood as a hard disk, holographic support, memory card or "Blu-ray" disc. This recording support can be remote in other embodiments.
The video data stream is transmitted in another part to a decoder 501 to be decoded in real time, this for example to be displayed on a television set. In the known devices, the stream is transmitted to the decoder 501 when the user wants to view it in real time. If not, it is not decoded but simply recorded, when recording is requested.
The present invention, according to this aspect, proposes to decode part of the video data stream, even when viewing in real time is not requested. Part of the video stream is understood to mean particularly the regions of interest or certain regions of interest.
When the decoder 501 receives a video stream for which a recording is requested, the data is transmitted to the recording support 503. The recording support 503 records the data as it is received. In a simultaneous manner, the decoder 501 receives the video data stream and progressively decodes the SEI messages. The decoded regions of interest are transmitted to the video indexing module 502 responsible for their temporary recording before transmitting them to the recording support 503. Figure 6 illustrates the method implemented by the decoder 501 and the indexing module 502.
During a step T1, the video data stream is received by the decoder 501. During a step T2, the decoder 501 decodes the SEI messages present in the video data stream. The decoded SEI messages are SEI messages as previously described in the tables 1 and 2. The decoder can also decode other SEI messages but that is not the object of the present invention. Each SEI message can describe one or more regions of interest per picture as described in tables 1 and 2. During a step T3, the decoder 501 analyzes each SEI message and decodes each picture. During this step, the weight indicated in the SEI message is used to select which region of interest will be recorded for each picture. In a preferred embodiment, the region of interest with the maximum salience is kept, i.e. the one having the highest weight.
Once the region of interest has been decoded, during a step T4, it is transmitted to the indexing module 502. The recording of a region of interest per picture, and this for all the pictures, is of little interest as it represents a large volume of information and also does not enable an efficient indexing of the video. Hence the indexing module decides which picture is used to index the video. According to the preferred embodiment described here, only about 10 pictures will be selected for a video of one and a half hours. It can be imagined that in other embodiments the number of pictures will be greater. These 10 pictures are taken at regular intervals. These selected pictures are recorded temporarily in a RAM type memory comprised in the indexing module 502 and not shown. In order to display them in the best manner, the pictures are zoomed during a step T5, that is they are enlarged so that they are all the same size. According to a preferred embodiment, this size can be the size of the picture. For that, they are read in the temporary memory and re-recorded after their enlargement. According to another embodiment, the pictures are enlarged prior to their recording in the temporary memory.
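Steps T3 and T4 can be sketched as two selections: the highest-weight region per picture, then a fixed number of pictures at regular intervals. The sketch assumes that the total number of pictures is known when the selection is made and that each region is a dictionary with a `weight` key; the function names are likewise introduced only for illustration.

```python
def best_region(regions):
    """Region of highest weight in a picture, or None if the picture has none."""
    return max(regions, key=lambda r: r["weight"]) if regions else None


def index_picture_numbers(total_pictures, wanted=10):
    """Picture numbers taken at regular intervals over the whole video."""
    if total_pictures <= 0 or wanted <= 0:
        return []
    wanted = min(wanted, total_pictures)
    return [k * total_pictures // wanted for k in range(wanted)]


def build_index(regions_by_picture, wanted=10):
    """Return (picture number, best region) pairs used to index the video."""
    index = []
    for n in index_picture_numbers(len(regions_by_picture), wanted):
        region = best_region(regions_by_picture[n])
        if region is not None:
            index.append((n, region))
    return index


# A video of one and a half hours at an assumed 25 pictures per second.
numbers = index_picture_numbers(90 * 60 * 25, wanted=10)
```

For a recording whose length is not known in advance, a device would need another policy, for example keeping one candidate per fixed time interval and thinning the list once the recording ends.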
According to another embodiment, the images are presented as a mosaic on the display. Therefore, instead of being enlarged, the images are reduced to one single size, the same for all of them. When the entire video is received and so recorded in the recording support 503, during step T6, the indexing pictures are also transferred from the temporary memory to the recording support 503 and recorded in a file.
Then according to the desired use, the regions of interest are used for the indexation or can also be used for display on a PVR type device when the user wants to consult the content of the database.
According to another aspect of the invention, it is also possible to encrypt the location data of the regions of interest during the coding of SEI messages. Hence, only users having the decryption key can access the regions of interest and so access the visualization of regions of interest or indexing of video streams due to the location information of the regions of interest. This encrypting step, in respect of figure 2, would be a step E4' (not shown) but inserted after the step E4.
Obtaining the decryption key could, for example, be offered as a paid-for service by the programme broadcaster.
To do this, the SEI messages relating to regions of interest are encapsulated in RTP (Real-time Transport Protocol) type packets and transmitted on a port different from that of the video. Temporal CTS type labels can link the SEI messages relating to regions of interest with the corresponding pictures. Advantageously, this transmission mode enables encrypting only the RTP packets containing the SEI messages and not the video.
The decryption is carried out at the level of the terminal receiver. In the case of an MPEG-2 TS encapsulation, the encrypting standard used is DVB-CSA and SEI messages relating to regions of interest are encapsulated in a different PID than that of the video. The SEI messages relating to regions of interest are linked to corresponding pictures via the PTS (timestamp) of the PES packet header. This transmission mode allows encryption only of the PIDs that contain SEI messages relating to regions of interest and not the video PID.
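When the SEI messages travel separately from the video, the receiver has to re-associate them with pictures through their timestamps. A minimal sketch of that matching is given below; packet parsing and decryption are left out, and the function name and the representation of messages as (timestamp, regions) pairs are assumptions.

```python
def link_regions_to_pictures(picture_timestamps, roi_messages):
    """Map each picture timestamp to the regions carried by the SEI message
    bearing the same timestamp (an empty list if none was received)."""
    by_timestamp = {ts: regions for ts, regions in roi_messages}
    return {ts: by_timestamp.get(ts, []) for ts in picture_timestamps}


pictures = [3600, 7200, 10800]                      # e.g. PTS values of three pictures
messages = [(3600, ["ball"]), (10800, ["presenter"])]
linked = link_regions_to_pictures(pictures, messages)
```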
According to another embodiment, the video data stream is coded in accordance with the coding standard H.264/AVC using FMO (Flexible Macroblock Ordering) which enables coding of different parts of the picture independently and so decoding of them independently. The FMO mode uses "slice groups". The "slice groups" are defined in the standard. In this embodiment, the regions of interest are coded in groups different from the rest of the picture. A PPS type NAL comprises a map of "slice groups". SEI messages are inserted such as those described hereafter indicating in which "slice groups" the regions of interest are coded.
The tables below illustrate the format of the SEI message used according to this embodiment:
Table 3
uuid_iso_iec_11578: single word of 128 bits to indicate our message type to the decoder. user_data_payload_byte: 8 bits comprising a part of the SEI message.
Typically in this case:
• payloadSize = 17 (bytes) thus 16 for the UUID and 1 for the proprietary data.
• user_data_payload_byte:
Table 4
- Slice_group(i)_id: if the slice_group_id equals "1" then the slice_group represents a region of interest, if it equals "0" then the slice_group represents the rest of the picture.
For each slice_group representing a region of interest, semantic information, a relative weight and the macroblocks it concerns can be specified.
Hence, only macroblocks corresponding to the regions of interest can be decoded during reception as they are identified and coded independently.
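The 17-byte payload (16 bytes of UUID and 1 byte of proprietary data) and the per-slice-group identifier can be sketched as follows. Table 4 is not reproduced here, so the packing of one flag per slice group into the bits of the data byte is purely an assumption, as are the placeholder UUID and the function names; the sketch also assumes at most eight slice groups.

```python
import uuid

# Placeholder UUID identifying the message type; the real value is not given in the text.
ROI_SEI_UUID = uuid.uuid5(uuid.NAMESPACE_DNS, "roi.example.invalid")


def build_payload(slice_group_is_roi):
    """Build a 17-byte payload: 16-byte UUID followed by one flag byte,
    where bit i is set if slice group i carries a region of interest."""
    if len(slice_group_is_roi) > 8:
        raise ValueError("this sketch supports at most eight slice groups")
    flags = 0
    for i, is_roi in enumerate(slice_group_is_roi):
        if is_roi:
            flags |= 1 << i
    return ROI_SEI_UUID.bytes + bytes([flags])


def roi_slice_groups(payload):
    """Return the slice group numbers to decode, or None for a foreign message."""
    if len(payload) != 17 or payload[:16] != ROI_SEI_UUID.bytes:
        return None
    return [i for i in range(8) if payload[16] & (1 << i)]


payload = build_payload([False, True, True])   # group 0 is the rest of the picture
to_decode = roi_slice_groups(payload)
```

A decoder operating in region-of-interest mode would then decode only the slice groups returned by `roi_slice_groups` and skip the others.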

Claims

1. Method for indexing a coded video data stream, characterized in that said video data stream comprises information relative to the location of regions of interest of each picture, said method comprises steps of:
- reception (T1 ) of coded video stream,
- recording the coded video stream on a recording support,
- decoding (T2) location information of regions of interest,
- selection (T3) of a region of interest per picture,
- decoding (T3) of video data,
- selecting (T4) a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,
- recording (T6) of the selected regions of interest.
2. Indexing method according to claim 1 characterized in that during the recording step,
- the selected regions of interest are recorded in a temporary memory as they are being selected and decoded,
- when all the selected regions of interest are recorded in said temporary memory, the said selected regions of interest are transferred to a permanent memory support (503).
3. Indexing method according to one of claims 1 or 2, characterized in that prior to their recording said regions of interest are formatted in order to obtain a homogenous size for all the selected regions of interest.
4. Indexing method according to any of the previous claims characterized in that it comprises a step of encrypting the location of the regions of interest thanks to an encryption key.
5. Indexing method according to claim 4 characterized in that it comprises a step of obtaining a decryption key upon payment by the user.
6. Indexing method according to any of the previous claims characterized in that the video data stream is coded according to the coding standard H.264/AVC and the location information is contained in a Supplemental Enhancement Information(SEI) type message.
7. Indexing method according to claims 5 and 6 characterized in that said SEI messages are encapsulated into real-time protocol packets (RTP), said RTP packets being encrypted.
8. Indexing method according to one of claims 5 or 6, characterized in that said Supplemental Enhancement Information type messages relative to regions of interest location information are inserted in the coded data before or after each picture to which they refer.
9. Indexing method according to one of the preceding claims, characterized in that said location information comprises information chosen from:
- the number of regions of interest in each picture,
- the coordinates of each region of interest for each of the picture dimensions,
- the surface of each region of interest,
- a weight relative to the importance of the region of interest with respect to other regions of interest of said picture,
- information relating to the content of each region of interest, and any combination of this information.
10. Indexing method according to any of the previous claims, characterized in that said selection step (T3) of a region of interest per picture selects a region of interest according to the weight relative to the importance of said region of interest.
11. Indexing method according to any of claims 6 to 10 characterized in that the video coding standard uses flexible macro-bloc ordering, said regions of interest being coded into slice groups, independently from the other picture data, said location information of regions of interest comprising the slice group numbers in which the regions of interest are coded.
12. Indexing method according to claim 11 characterized in that Supplemental Enhancement Information message comprises an identifier indicating for each slice group if it is related to one region of interest.
13. Indexing method according to claim 12 characterized in that it comprises a further step of reading the SEI messages and in that the step of decoding of video data (T3) decodes only the slice groups containing the regions of interest.
14. Device for indexing a coded video data stream, characterized in that said video data stream comprises information relative to the location of regions of interest of each picture, said device comprises means for:
- receiving the coded video stream,
- recording the coded video stream on a recording support (503),
- decoding (501 ) location information of the regions of interest,
- decoding (501 ) video data,
- selecting (502) a region of interest per picture,
- selecting (502) a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,
- recording (503) the selected regions of interest.
EP08761351A 2007-06-29 2008-06-25 Video indexing method, and video indexing device Ceased EP2174500A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0756181 2007-06-29
PCT/EP2008/058050 WO2009003885A2 (en) 2007-06-29 2008-06-25 Video indexing method, and video indexing device

Publications (1)

Publication Number Publication Date
EP2174500A2 (en) 2010-04-14

Family

ID=39204994

Family Applications (1)

Application Number Title Priority Date Filing Date
EP08761351A Ceased EP2174500A2 (en) 2007-06-29 2008-06-25 Video indexing method, and video indexing device

Country Status (5)

Country Link
EP (1) EP2174500A2 (en)
JP (1) JP5346338B2 (en)
KR (1) KR101488548B1 (en)
CN (1) CN101690228B (en)
WO (1) WO2009003885A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9532086B2 (en) 2013-11-20 2016-12-27 At&T Intellectual Property I, L.P. System and method for product placement amplification

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5305451B2 (en) * 2009-06-03 2013-10-02 独立行政法人情報通信研究機構 Hologram encoding apparatus and hologram decoding apparatus, and hologram encoding program and hologram decoding program
JP2011009949A (en) 2009-06-24 2011-01-13 Toshiba Corp Video processor and video processing method
CN103096049A (en) 2011-11-02 2013-05-08 华为技术有限公司 Video processing method and system and associated equipment
WO2013077236A1 (en) * 2011-11-21 2013-05-30 Canon Kabushiki Kaisha Image coding apparatus, image coding method, image decoding apparatus, image decoding method, and storage medium
CN103246658B (en) * 2012-02-03 2017-02-08 展讯通信(上海)有限公司 Index table building method and coding method
HUE031183T2 (en) 2012-04-13 2017-06-28 Ge Video Compression Llc Scalable data stream and network entity
EP3905697A1 (en) 2012-06-29 2021-11-03 GE Video Compression, LLC Video data stream method and apparatus
US9247225B2 (en) 2012-09-25 2016-01-26 Intel Corporation Video indexing with viewer reaction estimation and visual cue detection
WO2014168972A1 (en) * 2013-04-08 2014-10-16 Sony Corporation Region of interest scalability with shvc
EP3028472B1 (en) * 2013-07-29 2020-02-26 Koninklijke KPN N.V. Providing tile video streams to a client
US20150237351A1 (en) * 2014-02-18 2015-08-20 Penne Lee Techniques for inclusion of region of interest indications in compressed video data
US10694192B2 (en) 2014-06-27 2020-06-23 Koninklijke Kpn N.V. HEVC-tiled video streaming
WO2015197815A1 (en) 2014-06-27 2015-12-30 Koninklijke Kpn N.V. Determining a region of interest on the basis of a hevc-tiled video stream
US10715843B2 (en) 2015-08-20 2020-07-14 Koninklijke Kpn N.V. Forming one or more tile streams on the basis of one or more video streams
US10674185B2 (en) 2015-10-08 2020-06-02 Koninklijke Kpn N.V. Enhancing a region of interest in video frames of a video stream
US10582201B2 (en) * 2016-05-19 2020-03-03 Qualcomm Incorporated Most-interested region in an image
WO2018043143A1 (en) * 2016-08-30 2018-03-08 ソニー株式会社 Transmitting device, transmitting method, receiving device and receiving method
EP3823275B1 (en) * 2016-11-17 2024-08-14 INTEL Corporation Indication of suggested regions of interest in the metadata of an omnidirectional video
CN108810600B (en) * 2017-04-28 2020-12-22 华为技术有限公司 Video scene switching method, client and server
US10771163B2 (en) * 2017-10-24 2020-09-08 Mediatek Inc. Apparatus and method for decoding ROI regions in image
US11523185B2 (en) 2019-06-19 2022-12-06 Koninklijke Kpn N.V. Rendering video stream in sub-area of visible display area
CN111510752B (en) * 2020-06-18 2021-04-23 平安国际智慧城市科技股份有限公司 Data transmission method, device, server and storage medium
CN113747151B (en) * 2021-07-30 2024-04-12 咪咕文化科技有限公司 Video encoding and decoding method, device, equipment and computer readable storage medium
CN116074585B (en) * 2023-03-03 2023-06-23 乔品科技(深圳)有限公司 Super-high definition video coding and decoding method and device based on AI and attention mechanism

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070034943A (en) * 2005-09-26 2007-03-29 한국전자통신연구원 Apparatus and Method for Restoring Multiple Roi Settings in Scalable Video Coding

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07148155A (en) * 1993-11-26 1995-06-13 Toshiba Corp Computerized tomographic apparatus
US20020044696A1 (en) * 1999-11-24 2002-04-18 Sirohey Saad A. Region of interest high resolution reconstruction for display purposes and a novel bookmarking capability
US6549674B1 (en) * 2000-10-12 2003-04-15 Picsurf, Inc. Image compression based on tiled wavelet-like transform using edge and non-edge filters
US6909745B1 (en) * 2001-06-05 2005-06-21 At&T Corp. Content adaptive video encoder
FR2833132B1 (en) * 2001-11-30 2004-02-13 Eastman Kodak Co METHOD FOR SELECTING AND SAVING A SUBJECT OF INTEREST IN A DIGITAL STILL IMAGE
JP3966461B2 (en) * 2002-08-09 2007-08-29 株式会社リコー Electronic camera device
JP2005110145A (en) * 2003-10-02 2005-04-21 Ricoh Co Ltd Code string converter, code string converting method, photographing system, image display system, monitoring system, program, and information recording
US20060045381A1 (en) * 2004-08-31 2006-03-02 Sanyo Electric Co., Ltd. Image processing apparatus, shooting apparatus and image display apparatus
US7598977B2 (en) * 2005-04-28 2009-10-06 Mitsubishi Electric Research Laboratories, Inc. Spatio-temporal graphical user interface for querying videos
EP1748385A3 (en) * 2005-07-28 2009-12-09 THOMSON Licensing Method and device for generating a sequence of images of reduced size
US8024768B2 (en) * 2005-09-15 2011-09-20 Penthera Partners, Inc. Broadcasting video content to devices having different video presentation capabilities

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9532086B2 (en) 2013-11-20 2016-12-27 At&T Intellectual Property I, L.P. System and method for product placement amplification
US10412421B2 (en) 2013-11-20 2019-09-10 At&T Intellectual Property I, L.P. System and method for product placement amplification

Also Published As

Publication number Publication date
CN101690228B (en) 2012-08-08
KR101488548B1 (en) 2015-02-02
WO2009003885A2 (en) 2009-01-08
JP2010532121A (en) 2010-09-30
CN101690228A (en) 2010-03-31
KR20100042632A (en) 2010-04-26
WO2009003885A3 (en) 2009-03-26
JP5346338B2 (en) 2013-11-20

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20091223

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA MK RS

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20140606

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20151020