
WO2014094912A1 - Processing media data - Google Patents

Processing media data

Info

Publication number
WO2014094912A1
Authority
WO
WIPO (PCT)
Prior art keywords
media
piece
user
audio data
outputted
Prior art date
Application number
PCT/EP2012/076811
Other languages
French (fr)
Inventor
Steve Hamilton SHAW
Daniel Laurence
Original Assignee
Rocket Pictures Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rocket Pictures Limited filed Critical Rocket Pictures Limited
Priority to PCT/EP2012/076811 priority Critical patent/WO2014094912A1/en
Publication of WO2014094912A1 publication Critical patent/WO2014094912A1/en


Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/102 Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G11B27/105 Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/254 Management at additional data server, e.g. shopping server, rights management server

Definitions

  • the present invention relates to processing media.
  • a user watches a piece of media (which may, for example, be a film, a live television programme, a pre-recorded television programme, a music video or an advertisement), he may see something in the video data of the piece of media which he would like to remember later, e.g. when the piece of media has finished being outputted at the user device.
  • the user may see a particular object being displayed in the film (e.g. a watch worn in a scene by an action hero) for which he would like to find out more details.
  • a user has to either pause the film to identify the object (if he can do that) or attempt to recall the object when the film finishes. For example, in a cinema, pausing the film is not an option for a viewer.
  • OVM Onscreen Visual Media
  • users may have a smartphone, a tablet, a laptop, a personal computer (PC) or gaming device.
  • PC personal computer
  • Such devices are capable of executing applications (Apps) which interface with a user.
  • the pieces of media referred to herein include both video data and audio data output in synchronisation.
  • the application can record a frame of the piece of media corresponding to the moment within the piece of media that the user wanted to remember for display to the user. This can be done when the movie or program is finished, or during the movie or program. In methods described herein, this is achieved by identifying a piece of media using the audio data of the piece of media. A small portion of audio data (e.g. of the order of 1 to 10 seconds) of a piece of media is usually sufficient to identify a piece of media and the temporal position of the audio data within the piece of media. The application can then track the output of the piece of media.
  • a method of processing media data comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving notification of a user input at a user device during the output of the piece of media to the user; and storing an indication of a portion, e.g. a frame or scene, of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • the method of the first aspect may be performed at a server or at the user device.
  • a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the server to perform the method of the first aspect.
  • a method of processing media data comprising: receiving at a user device audio data of a piece of media outputted to a user; sending the audio data of the outputted piece of media for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media; receiving a user input from the user during the output of the piece of media to the user; and sending a notification of the user input to the server for use in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the user device to perform the method of the second aspect.
  • the audio data of a piece of media is used to identify the piece of media.
  • the timing of a user input within the piece of media corresponds to a frame of the video data of the piece of media, and an indication of that frame is stored. This allows the user to provide a user input during the output of the piece of media in order to remember a moment within the piece of media.
  • the invention also provides a computer device configured to process media data, the computer device comprising: a receiving module configured to: (i) receive audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data, and (ii) receive a notification of a user input during the output of the piece of media to the user; a data store configured to store known pieces of media; a comparing module configured to compare the received audio data of the outputted piece of media to audio data of the known pieces of media stored in the data store, to thereby identify the outputted piece of media; and a storing module configured to store an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • the invention also provides a user device configured to process media, the user device comprising: an audio data receiving module configured to receive audio data of a piece of media output to a user of the user device, said piece of media comprising synchronised video data and audio data; a user interface configured to receive a user input from the user during the output of the piece of media to the user; and a sending module configured to: (i) send audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media, and (ii) send a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • Figure 1 shows a schematic illustration of a network
  • Figure 2 is a schematic functional block diagram of a user device
  • Figure 3 is a schematic functional block diagram of a server
  • Figure 4 is a flow chart for a process of processing media according to a preferred embodiment
  • Figure 5 is an example of a graph showing the amplitude of an audio signal as a function of time.
  • Figures 6a to 6d, 7a and 7b show examples of user interfaces displayed at a user device.
  • Figure 1 shows a system including a user device 102 which is useable by a user 104.
  • the user 104 is watching a piece of media on another device 103, referred to as a screen device, which could be a cinema screen, or computer, DVD or television, for example.
  • the user device 102 can connect to the network 106, which may for example be the Internet.
  • the user device 102 may for example be a mobile phone (e.g. a smartphone), a tablet, a laptop, a personal computer (“PC”), a gaming device or other embedded device able to communicate over the network 106.
  • the user device 102 is arranged to receive information from and output information to the user 104.
  • the network 106 comprises a server 108 which has access to a data store such as a database 110.
  • IP Internet Protocol
  • FIG. 2 illustrates a detailed view of the user device 102.
  • the user device 102 comprises a processor (“CPU") 202 configured to process data on the user device 102.
  • CPU processor
  • Connected to the CPU 202 is a display 204 which may be implemented as a touch screen for inputting data to the CPU 202.
  • Also connected to the CPU 202 are speakers 206 for outputting audio data, a microphone 208 for receiving audio data, a keypad 210, a memory 212 for storing data and a network interface 214 for connecting to the network 106.
  • the display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 are integrated into the user device 102 (e.g. when the user device 102 is a mobile phone).
  • the display 204 and speakers 206 act as output apparatus of the user device 102 for outputting video and audio data respectively.
  • the display 204 (when implemented as a touch screen), microphone 208 and keypad 210 act as input apparatus of the user device 102.
  • one or more of the display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 may not be integrated into the user device 102 and may be connected to the CPU 202 via respective interfaces.
  • One example of such an interface is a USB interface.
  • the user device 102 may include other components which are not shown in Figure 2. For example, when the user device 102 is a PC, the CPU 202 may be connected to a mouse via a USB interface.
  • the CPU 202 may be connected to a touchpad via a USB interface.
  • the CPU 202 may be connected to a remote control via a wireless (e.g. infra-red) interface.
  • An operating system (OS) 216 is running on the CPU 202.
  • the user device 102 is configured to execute a media application 218 on top of the OS 216.
  • the media application 218 is a computer program product which is configured to process media data at the user device 102.
  • the media application 218 is stored in the memory 212 and when executed on the CPU 202 performs methods described in more detail below for processing media data at the user device 102.
  • FIG 3 illustrates a detailed view of the server 108.
  • the server 108 comprises a processor (“CPU") 302 configured to process data on the server 108.
  • CPU central processing unit
  • the server 108 also includes a memory for storing data which may include the database 110.
  • a computer program product may be stored in the memory at the server 108 and configured such that, when it is executed on the CPU 302, it performs methods described in more detail below for processing media at the server 108.
  • steps S402, S404, S410, S412, S414, S418, S420, S424 and S426) are implemented at the user device 102, whereas the steps shown in the right hand column (i.e. steps S406, S408, S416 and S422) are implemented at the server 108.
  • a piece of media is outputted to the user 104 at the screen device 103.
  • the screen device 103 may be showing a film or TV program.
  • the film includes synchronized streams of video and audio data.
  • the piece of media may be a television programme, music video, advert, or any other OVM.
  • the piece of media can come from any source, and can be shown on any suitable screen device. Moreover, it could be streamed to the user device or stored at the user device for display at the user device itself. In that case, the screen device 103 is the same as the user device 102.
  • the user 104 opens the media application 218, such that the media application 218 executes on the CPU 202.
  • the media application 218 takes a sample of the audio data of the piece of media currently being output at the screen device 103.
  • the sample may for example have a duration in the range of 1 second to 10 seconds.
  • the sample has a sufficient duration for the piece of media to be identified as described below.
  • step S404 the audio data of the piece of media which is outputted at the screen device 103 is received by the user device 102 which sends it over the network 106 to the server 108 (e.g. via the network interface 214 and the network interface 304).
  • the network interface 214 may connect the user device 102 to the network 106 via a Wi-Fi router or via a cellular telephone network (e.g. implementing 3rd or 4th generation of mobile telecommunications (3G or 4G) technology).
  • the server 108 receives the audio data sent from the user device 102, and in step S406 the server 108 uses the received audio data to identify the piece of media being outputted at the user device 102.
  • the server 108 can also identify the exact point in the piece of media at which the audio data occurs. This takes a few seconds, and is implemented as described below.
  • the audio data may be represented as a graph of amplitude against time, such as that shown in Figure 5.
  • a piece of media has a unique signature through samples of its audio data. This is true even for audio samples which have a duration of the order of 1 to 10 seconds.
  • the server 108 has access to a data store which stores audio data of known pieces of media.
  • the data store is implemented as a database 110 stored at the server 108.
  • the database 110 may for example store data representing power functions (such as that shown in Figure 5) for the audio data of the known pieces of media.
  • the known pieces of media may include, for example, a film, a live television programme, a pre-recorded television programme, a music video, an advert, or any other OVM which may be output.
  • step S406 the audio data received from the user device 102 is compared to audio data of known pieces of media stored in the database 110, to thereby identify the piece of media being outputted from the user device 102.
  • the audio signature of a piece of media is used to differentiate between known pieces of media and the exact position of the audio data within a piece of media.
  • the comparison of the received audio data with the known pieces of audio data may be performed using known algorithms. This involves comparing features (e.g. audio fingerprints) of audio data using statistical analysis to determine whether two samples of audio data match each other. For example, applications such as Shazam, Soundhound, SoundPrint and IntoNow implement algorithms for identifying audio data by comparing it with audio data from a database of audio data from known pieces of media. As an example, the IntoNow implementation is described in the US patent publication number 2012/0209612 A1. Since such algorithms are known in the art, they are not described in detail herein.
  • Step S406 identifies which piece of media the user 104 is viewing and also identifies the exact temporal position within the piece of media to which the audio data matches.
  • the server 108 sends an indication of the identified piece of media to the user device 102.
  • the server 108 also sends an identifier of the temporal position within the identified piece of media to the user device 102.
  • the user device 102 receives the indication of the piece of media (e.g. the title of the piece of media) and the identifier of the temporal position (e.g. a time from the start of the piece of media).
  • step S410 using the indication of the piece of media and the identifier of the temporal position, the media application 218 tracks the outputting of the piece of media.
  • the tracking of the piece of media may continue until completion of the outputting of the piece of media, that is, until the piece of media finishes being output.
  • the media application 218 will need to reconnect to the server 108 in order to correctly track the output of the piece of media.
  • the film may be paused on TV, or buffering issues may interrupt the playout if the media is being streamed.
  • a reconnect process would repeat steps S404 to S410 as described above.
  • the media application 218 is able to obtain information (e.g. title) indicating what media is being output and how far through that media (in time) the outputting of the media is.
  • the media application 218 may display details about the piece of media on the display 204 when it has received them from the server 108. For example, the media application 218 may display the title, current point and total length of the piece of media currently being output.
  • the media application 218 receives a user input from the user 104 during the output of the piece of media.
  • the user 104 may provide the user input via the user interface of the user device 102.
  • the user interface of the user device 102 comprises input apparatus by which the user can provide a user input.
  • the input apparatus may for example comprise one or more of a touch screen, a button, a mouse, a touch pad, a voice recognition system and a remote control.
  • the user taps the touch screen 204 to provide the user input.
  • a gesture can act as an input.
  • the user may provide the user input when he sees something in the outputted piece of media which he wants to record, either to remember after the piece of media has finished or at the time the object of interest is displayed.
  • the user 104 may decide that he would like to buy an object being displayed in the film (e.g. a watch worn in a scene by an action hero).
  • the user 104 may not want to interrupt his viewing experience of the film, so he decides that he will follow up on the object when the film has finished.
  • the user might not proceed to buy the object, for one of many possible reasons.
  • the user may forget about his intention to buy the object, he may not know how to buy the object or he may proceed to do something else when the film finishes instead of buying the object.
  • a viewer may want additional information about something shown in a TV documentary, such as an animal or location in a wildlife or holiday program.
  • the user 104 will be reminded (e.g. when the film finishes) of what was on the screen at the time of providing the user input.
  • an alternative is to present to the user information related to what was on the screen at the time of providing the user input.
  • This information can be connected to the content or subject matter of the onscreen visual media itself, or only connected to a particular object in the frame or scene indicated by the user, and not connected to the overall content of the OVM.
  • a notification of the user input is sent to the server 108.
  • the notification of the user input may comprise a time within the piece of media at which the user input was received in step S412.
  • All of the communications between the user device 102 and the server 108 occur over the network 106, e.g. via the network interface 214 and the network interface 304.
  • the server 108 receives the notification of the user input, and in step S416 the server 108 stores an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • the portion can be a scene or frame or any other defined time period around a frame. Other examples include a particular camera shot (that is, a point of view in a scene), or a sequence of frames.
  • a delayed motion event can cause a number of frames in a scene to be displayed to a user at the user device 102, to allow him to pick the precise frame/scene of interest.
  • the indication of the portion (scene or frame) may be stored in a memory at the server 108.
  • the notification of the user input may indicate a time within the piece of media at which the user input is received, and step S416 may include determining which frame of the identified piece of media occurs at the identified time.
  • the frame occurring at the identified time may then be stored at the server 108. That is, the frame itself may be stored. In this way, a screenshot can be saved of whatever is displayed on the display 204 at the time at which the user input is received.
  • the timing of the frame within the piece of media may be stored instead of the frame itself. In that case, the stored timing can subsequently be used to determine the frame of the video data of the identified piece of media occurring at the identified time.
  • Steps S412 to S416 may be repeated throughout the outputting of the piece of media for each appropriate user input that is received by the media application 218.
  • the dotted line in Figure 4 indicates a passage of time, after which in step S418 the outputting of the piece of media finishes. The finishing of the piece of media is detected by the media application 218.
  • the media application 218 sends an indication to the server 108 to indicate that the output of the piece of media at the user device 102 has finished.
  • The indication that the output of the piece of media has finished is received at the server 108.
  • In response, in step S422, the server 108 sends the frame(s) of the piece of media, which are indicated by the indication(s) which were stored in step S416, to the user device 102. If the frames themselves were stored in step S416, then step S422 simply involves retrieving the frames from the data store where they were saved in step S416 and then sending the frames to the user device 102.
  • step S422 involves retrieving the timings from the data store where they were saved in step S416, using the timings and the video data of the known piece of media to determine the frames at the relevant timings, and sending the determined frames to the user device 102.
  • the frame(s) are received at the user device 102.
  • the received frames are displayed to the user 104 on the display 204 at the user device 102. In this way the user 104 is reminded of what was on the screen when he decided to provide the user input in step S412.
  • a link to a relevant webpage may be displayed with the frame that is displayed in step S424.
  • the relevant web page may relate to an object displayed in the frame of the video data. For example, if the frame of video data includes a character wearing a dress, then there may be provided a link to a web page from which the dress can be purchased. Interaction with the link may cause a remote retailer to take action, such as to send a brochure or advertisement to the user device.
  • step S426 is not necessary, and instead step S424 of displaying the frame(s) includes automatically directing the user 104 to a webpage of an online store in which the frame(s) are displayed.
  • the user 104 may be taken straight to an online store when the piece of media finishes, so that the user 104 can review their screenshots (i.e. the frames which caused them to provide the user input in step S412).
  • the user 104 can login separately to browse the internet for the items that attracted them in the frames which caused them to provide the user input in step S412).
  • Figures 6a to 6d show representations of an example web page to which the user 104 may be directed.
  • Figure 6a shows a screen 602 to which the user 104 may first be directed in order to view the frames which he chose to be reminded about.
  • Three frames are shown in screen 602 which show the frames of the piece of media which the user chose to save.
  • the user 104 can select one of the saved scenes, e.g. by clicking on one of the frames 604, 606 or 608 using for example, the touch screen 204 or keypad 210.
  • When the user 104 selects a scene from screen 602, screen 610 is displayed as shown in Figure 6b. Screen 610 requests that the user 104 select a category in order to shop for items shown in the scene of the piece of media which he has selected.
  • the example categories shown in Figure 6b are clothes 612, products 614 and accessories 616.
  • When the user 104 selects a category from screen 610, screen 618 is displayed as shown in Figure 6c. Screen 618 requests that the user 104 select between the characters which are included in the selected scene. For example, as shown in Figure 6c, three characters are included in the scene, those being character A 620, character B 622 and character C 624.
  • Screen 626 presents the user 104 with online shopping opportunities relating to the category and character selected in screens 610 and 618. For example, if the user has selected clothes 612 and character A 620, then screen 626 may present options for the user 104 to buy a dress or shoes by clicking on the respective links 630 and 632.
  • the dress or shoes may be those worn by the selected character in the scene of the piece of media which includes the frame at the time for which the user provided the user input in step S412. Other information relating to the relevant products and/or characters may also be displayed in screen 626.
  • the example implementation illustrated in Figures 6a to 6d enables the user 104 to purchase products and/or services via the media outputted at the user device 102 without interrupting the viewing experience, i.e. the purchasing of products and/or services occurs after the media has finished being outputted at the user device 102.
  • the media application 218 bridges the gap between product placement and product purchasing. This opens up huge potential for the way that viewers of media interact with the products included in, or extrapolated from, the media.
  • the media application can also be used to provide instant information about an object on the screen 103.
  • the media application 218 may be downloaded to the user device 102 over the network 106 and stored in the memory 212 of the user device 102.
  • the media application 218 may, or may not, be downloaded in return for payment from the user 104 to a software provider of the media application 218.
  • a piece of media could have multiple versions, including for example a director's cut, extended cut, international cut and the final cut.
  • the audio data is matched to audio data of a known piece of media (in step S406) it may match with more than one of these versions.
  • the default result which is assumed in this case is the final cut.
  • the database 110 will store the temporal positions within the piece of media where the versions differ, and at this point the media application 218 may reconnect to the server 108 in order to verify which version of the piece of media is being output, i.e. to perform steps S404 to S410 again. This is done without requiring involvement from the user 104.
  • the connection between the user device 102 and the server 108 may be maintained (e.g. using a Wi-Fi connection) and the sampling of the audio data of the outputted piece of media is continued, e.g. at regular intervals (e.g., every 5 seconds), to detect the presence of adverts.
  • adverts are detected the tracking of the output of the piece of media is paused until the adverts are finished and the output of the piece of media re-commences.
  • the media application 218 can maintain the sound sampling over Wi-Fi or other wireless connection to differentiate between the piece of media and the advertisements, to ensure that the tracking of the output of the piece of media proceeds correctly. A sketch illustrating this periodic re-sampling is given at the end of this list.
  • To avoid the need to continuously track the media and advertisements so as to identify the adverts, it would be possible for the broadcaster of the television program to provide data to the user device which would indicate when the adverts were starting and stopping, so as to indicate to the user device when to reconnect to the audio data for tracking.
  • When the application is used with a voting TV show, the user can be notified of the points at which voting is allowed, the user's vote data can be sent to the show, and data on the overall votes can be sent to the user device.
  • the user 104 wishes to remember a frame of the outputted media in order to purchase a product shown in the frame, or obtain more details about an object in the frame. These details can be connected to the content of the subject matter of the OVM, or only connected to the object itself and not the overall content of the OVM. As an alternative to details concerning an object, services or products connected to the OVM can constitute information available to a user of the user device. However, in other embodiments, the user 104 may wish to remember the frame for some other reason, which might not be related to the purchasing of products or services.
  • the analysis of the audio data for comparison with audio data of known pieces of media provides a fast and reliable method to identify a piece of media, and a temporal position within the piece of media, which is outputted from the user device 102.
  • the method steps described above and shown in Figure 4 may be implemented as functional modules at the user device 102 and the server 108 as appropriate, e.g. in hardware or software.
  • the method steps may be implemented by executing a computer program product on the respective CPU (202 or 302) to implement the steps in software.
  • the media application 218 described above is an example of a suitable computer program product for implementing the steps at the user device 102.
  • the screen of the user device 102 is not displaying the media and can be considered effectively to be blank.
  • the screen could be used to display additional information to augment the OVM content.
  • It is useful for a user to be aware that the App is open and responsive, and so the App can be designed to generate a "tracking" screen as shown in Figure 7a while the onscreen visual media is being tracked.
  • a "scene saved" screen can be displayed to a user as shown, for example, in Figure 7b.
  • the application could cause the device to adopt a cinema mode, in which the ringing tone is turned off, any camera is turned off, and notifications and recordings of any kind are prevented.
  • the screen could show black, but would still allow sending and receiving data for recognition of the onscreen visual media.
  • the piece of media could be tracked using the audio data from the beginning of a particular piece, for example, a television show, wherein the broadcasters of the show send information to the user device connected with the show which is being viewed by the user. For example, coupons or voting opportunities can be advised and displayed to a user of the user device while he is watching the television show, and based on the tracking of that show using the audio data.
  • a user could identify shows that he wishes to track in this fashion by using a tap or other input gesture at the beginning of a show once he has opened the application on his user device.
  • the show could automatically (assuming the application is open) notify the application to commence tracking such that the show can interact with the user on the display of the user device.
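The periodic re-sampling mentioned in the items above (to ride out advert breaks or a diverging cut of the film) can be illustrated with a short sketch. This is an illustration only, not the claimed implementation: the tracker object and the capture and identify helpers are assumed, injected dependencies of the kind sketched later in the Description.

```python
# Illustrative sketch of periodic re-sampling while tracking a piece of media.
# "tracker", "capture" and "identify" are assumed, injected dependencies:
#   capture()          -> a few seconds of ambient audio from the microphone
#   identify(sample)   -> offset in seconds within the tracked media, or None
#   tracker.resync(t)  -> re-align local tracking to offset t
#   tracker.finished() -> True once the piece of media has finished
import time

def track_with_resampling(tracker, capture, identify, interval_seconds: float = 5.0) -> None:
    """Keep tracking in step with the outputted media, pausing over advert breaks."""
    while not tracker.finished():
        offset = identify(capture())
        if offset is None:
            # No match: assume an advert (or a diverging cut of the film) is
            # currently playing, and wait for the tracked media to resume.
            time.sleep(interval_seconds)
            continue
        tracker.resync(offset)      # re-align tracking with the identified position
        time.sleep(interval_seconds)
```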

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Signal Processing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to a method of processing media data, the method comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising both video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving a notification of a user input during the output of the piece of media to the user; and storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.

Description

PROCESSING MEDIA DATA
Field of the Invention
The present invention relates to processing media.
Background
As a user watches a piece of media (which may, for example, be a film, a live television programme, a pre-recorded television programme, a music video or an advertisement), he may see something in the video data of the piece of media which he would like to remember later, e.g. when the piece of media has finished being outputted at the user device. For example, during a film, the user may see a particular object being displayed in the film (e.g. a watch worn in a scene by an action hero) for which he would like to find out more details. At present, a user has to either pause the film to identify the object (if he can do that) or attempt to recall the object when the film finishes. For example, in a cinema, pausing the film is not an option for a viewer.
Summary
The inventors recognise that many viewers of media, termed herein as "Onscreen Visual Media" (OVM), have a variety of user devices. For example, users may have a smartphone, a tablet, a laptop, a personal computer (PC) or gaming device. Such devices are capable of executing applications (Apps) which interface with a user.
There are described herein methods by which a user can use an application executed at a user device in order to "remember" a moment during output of a piece of media. The pieces of media referred to herein include both video data and audio data output in synchronisation. The application can record a frame of the piece of media corresponding to the moment within the piece of media that the user wanted to remember for display to the user. This can be done when the movie or program is finished, or during the movie or program. In methods described herein, this is achieved by identifying a piece of media using the audio data of the piece of media. A small portion of audio data (e.g. of the order of 1 to 10 seconds) of a piece of media is usually sufficient to identify a piece of media and the temporal position of the audio data within the piece of media. The application can then track the output of the piece of media.
In particular, in a first aspect there is provided a method of processing media data, the method comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving notification of a user input at a user device during the output of the piece of media to the user; and storing an indication of a portion, e.g. a frame or scene, of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media. The method of the first aspect may be performed at a server or at the user device. For example, a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the server to perform the method of the first aspect.
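By way of illustration only, the steps of the first aspect could be arranged at a server roughly as in the following sketch. The class names, the use of landmark dictionaries as audio fingerprints and the scoring logic are assumptions made for the example; they are not the claimed implementation.

```python
# Minimal, illustrative sketch of the first-aspect method at a server. All
# names are assumptions; audio fingerprints are assumed to be dictionaries
# mapping landmark keys to their offset (in seconds) within the audio.
from dataclasses import dataclass, field

@dataclass
class SavedMoment:
    media_id: str           # the identified piece of media
    offset_seconds: float   # timing of the user input within that piece of media

@dataclass
class MediaServer:
    known_media: dict = field(default_factory=dict)  # media_id -> reference landmarks
    saved: list = field(default_factory=list)        # stored indications of video portions

    def identify(self, query_landmarks: dict):
        """Step S406: compare received audio data to the known pieces of media."""
        best_id, best_offset, best_hits = None, None, 0
        for media_id, reference in self.known_media.items():
            hits = [reference[k] for k in query_landmarks if k in reference]
            if len(hits) > best_hits:
                # Crude temporal position: the earliest matching landmark.
                best_id, best_offset, best_hits = media_id, min(hits), len(hits)
        return best_id, best_offset

    def on_user_input(self, media_id: str, offset_seconds: float) -> SavedMoment:
        """Step S416: store an indication of the video portion at the input timing."""
        moment = SavedMoment(media_id, offset_seconds)
        self.saved.append(moment)
        return moment
```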
In a second aspect, there is provided a method of processing media data the method comprising: receiving at a user device audio data of a piece of media outputted to a user; sending the audio data of the outputted piece of media for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media; receiving a user input from the user during the output of the piece of media to the user; and sending a notification of the user input to the server for use in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
As an example, a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the user device to perform the method of the second aspect.
In this way, the audio data of a piece of media is used to identify the piece of media. The timing of a user input within the piece of media corresponds to a frame of the video data of the piece of media, and an indication of that frame is stored. This allows the user to provide a user input during the output of the piece of media in order to remember a moment within the piece of media. The invention also provides a computer device configured to process media data, the computer device comprising: a receiving module configured to: (i) receive audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data, and (ii) receive a notification of a user input during the output of the piece of media to the user; a data store configured to store known pieces of media; a comparing module configured to compare the received audio data of the outputted piece of media to audio data of the known pieces of media stored in the data store, to thereby identify the outputted piece of media; and a storing module configured to store an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
The invention also provides a user device configured to process media, the user device comprising: an audio data receiving module configured to receive audio data of a piece of media output to a user of the user device, said piece of media comprising synchronised video data and audio data; a user interface configured to receive a user input from the user during the output of the piece of media to the user; and a sending module configured to: (i) send audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media, and (ii) send a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
Brief Description of the Drawings
For a better understanding of the present invention and to show how the same may be put into effect, reference will now be made, by way of example, to the following drawings in which:
Figure 1 shows a schematic illustration of a network;
Figure 2 is a schematic functional block diagram of a user device;
Figure 3 is a schematic functional block diagram of a server;
Figure 4 is a flow chart for a process of processing media according to a preferred embodiment;
Figure 5 is an example of a graph showing the amplitude of an audio signal as a function of time; and
Figures 6a to 6d, 7a and 7b show examples of user interfaces displayed at a user device.
Detailed Description of Preferred Embodiments
Preferred embodiments of the invention will now be described by way of example only.
Figure 1 shows a system including a user device 102 which is useable by a user 104. The user 104 is watching a piece of media on another device 103, referred to as a screen device, which could be a cinema screen, or computer, DVD or television, for example. The user device 102 can connect to the network 106, which may for example be the Internet. The user device 102 may for example be a mobile phone (e.g. a smartphone), a tablet, a laptop, a personal computer ("PC"), a gaming device or other embedded device able to communicate over the network 106. The user device 102 is arranged to receive information from and output information to the user 104. The network 106 comprises a server 108 which has access to a data store such as a database 110. Many more nodes than those shown in Figure 1 may be connected to the network 106, but for clarity only the user device 102 and server 108 are shown in Figure 1. The user device 102 and the server 108 can communicate with each other over the network 106. For example, where the network 106 is the Internet, the user device 102 and the server 108 can communicate with each other by sending Internet Protocol (IP) data packets across the network 106. It will be appreciated that if the network 106 is a network other than the Internet then data packets may be formatted and sent according to some other, appropriate protocol.
Figure 2 illustrates a detailed view of the user device 102. The user device 102 comprises a processor ("CPU") 202 configured to process data on the user device 102. Connected to the CPU 202 is a display 204 which may be implemented as a touch screen for inputting data to the CPU 202. Also connected to the CPU 202 are speakers 206 for outputting audio data, a microphone 208 for receiving audio data, a keypad 210, a memory 212 for storing data and a network interface 214 for connecting to the network 106. The display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 are integrated into the user device 102 (e.g. when the user device 102 is a mobile phone). The display 204 and speakers 206 act as output apparatus of the user device 102 for outputting video and audio data respectively. The display 204 (when implemented as a touch screen), microphone 208 and keypad 210 act as input apparatus of the user device 102. In alternative user devices one or more of the display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 may not be integrated into the user device 102 and may be connected to the CPU 202 via respective interfaces. One example of such an interface is a USB interface. The user device 102 may include other components which are not shown in Figure 2. For example, when the user device 102 is a PC, the CPU 202 may be connected to a mouse via a USB interface. Similarly, when the user device 102 is a laptop, the CPU 202 may be connected to a touchpad via a USB interface. As another example, when the user device 102 is a television, the CPU 202 may be connected to a remote control via a wireless (e.g. infra-red) interface.
An operating system (OS) 216 is running on the CPU 202. The user device 102 is configured to execute a media application 218 on top of the OS 216. The media application 218 is a computer program product which is configured to process media data at the user device 102. The media application 218 is stored in the memory 212 and when executed on the CPU 202 performs methods described in more detail below for processing media data at the user device 102.
Figure 3 illustrates a detailed view of the server 108. The server 108 comprises a processor ("CPU") 302 configured to process data on the server 108. Connected to the CPU 302 is a network interface 304 for connecting to the network 106. The server 108 also includes a memory for storing data which may include the database 110. A computer program product may be stored in the memory at the server 108 and configured such that, when it is executed on the CPU 302, it performs methods described in more detail below for processing media at the server 108. With reference to Figures 4 to 6 there is now described a method of a preferred embodiment. In Figure 4, the steps shown in the left hand column (i.e. steps S402, S404, S410, S412, S414, S418, S420, S424 and S426) are implemented at the user device 102, whereas the steps shown in the right hand column (i.e. steps S406, S408, S416 and S422) are implemented at the server 108.
In step S402 a piece of media is outputted to the user 104 at the screen device 103. For example, the screen device 103 may be showing a film or TV program. The film includes synchronized streams of video and audio data. As described above, the piece of media may be a television programme, music video, advert, or any other OVM. The piece of media can come from any source, and can be shown on any suitable screen device. Moreover, it could be streamed to the user device or stored at the user device for display at the user device itself. In that case, the screen device 103 is the same as the user device 102.
The user 104 opens the media application 218, such that the media application 218 executes on the CPU 202. The media application 218 takes a sample of the audio data of the piece of media currently being output at the screen device 103. The sample may for example have a duration in the range of 1 second to 10 seconds. The sample has a sufficient duration for the piece of media to be identified as described below.
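A capture step of this kind could look roughly as follows. The use of the third-party sounddevice package, the 16 kHz sample rate and the 5-second duration are assumptions made for the example; any microphone API available on the user device 102 would serve equally well.

```python
# Illustrative only: record a few seconds of the audio currently being output
# at the screen device 103, via the microphone 208 of the user device 102.
import numpy as np
import sounddevice as sd   # assumed third-party audio-capture library

SAMPLE_RATE = 16_000       # Hz; mono is sufficient for identification
SAMPLE_SECONDS = 5         # within the 1 to 10 second range described above

def capture_sample(seconds: int = SAMPLE_SECONDS, rate: int = SAMPLE_RATE) -> np.ndarray:
    """Return `seconds` of mono audio as a 1-D float32 array."""
    recording = sd.rec(int(seconds * rate), samplerate=rate, channels=1, dtype="float32")
    sd.wait()              # block until the recording is complete
    return recording[:, 0]
```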
In step S404, the audio data of the piece of media which is outputted at the screen device 103 is received by the user device 102 which sends it over the network 106 to the server 108 (e.g. via the network interface 214 and the network interface 304). For example, the network interface 214 may connect the user device 102 to the network 106 via a Wi-Fi router or via a cellular telephone network (e.g. implementing 3rd or 4th generation of mobile telecommunications (3G or 4G) technology).
The server 108 receives the audio data sent from the user device 102, and in step S406 the server 108 uses the received audio data to identify the piece of media being outputted at the user device 102. The server 108 can also identify the exact point in the piece of media at which the audio data occurs. This takes a few seconds, and is implemented as described below.
The audio data may be represented as a graph of amplitude against time, such as that shown in Figure 5. A piece of media has a unique signature through samples of its audio data. This is true even for audio samples which have a duration of the order of 1 to 10 seconds.
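One simple way to turn such an amplitude-against-time signal into a compact, comparable signature is to pair up spectral peaks, in the spirit of well-known audio-fingerprinting schemes. The sketch below is illustrative only; the frame sizes, the peak-pair landmarks and the parameter values are assumptions, not the specific algorithm used by the invention.

```python
# Illustrative fingerprint: frame the signal, take the dominant frequency bin
# of each frame, and pair each peak with a few following peaks. Each landmark
# key (f1, f2, spacing) maps to the time (in seconds) at which it occurs
# within the sample.
import numpy as np

def fingerprint(signal: np.ndarray, rate: int = 16_000,
                frame: int = 2048, hop: int = 1024, fan_out: int = 5) -> dict:
    peaks = []
    for start in range(0, len(signal) - frame, hop):
        window = signal[start:start + frame] * np.hanning(frame)
        spectrum = np.abs(np.fft.rfft(window))
        peaks.append((start / rate, int(np.argmax(spectrum))))   # (time, peak bin)

    landmarks = {}
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            landmark = (f1, f2, round(t2 - t1, 2))               # peak pair + spacing
            landmarks.setdefault(landmark, t1)
    return landmarks
```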
The server 108 has access to a data store which stores audio data of known pieces of media. For example, in the embodiments described in detail herein, the data store is implemented as a database 110 stored at the server 108. The database 110 may for example store data representing power functions (such as that shown in Figure 5) for the audio data of the known pieces of media. As described above, the known pieces of media may include, for example, a film, a live television programme, a pre-recorded television programme, a music video, an advert, or any other OVM which may be output.
In step S406 the audio data received from the user device 102 is compared to audio data of known pieces of media stored in the database 110, to thereby identify the piece of media being outputted from the user device 102. The audio signature of a piece of media is used to differentiate between known pieces of media and the exact position of the audio data within a piece of media. The comparison of the received audio data with the known pieces of audio data may be performed using known algorithms. This involves comparing features (e.g. audio fingerprints) of audio data using statistical analysis to determine whether two samples of audio data match each other. For example, applications such as Shazam, Soundhound, SoundPrint and IntoNow implement algorithms for identifying audio data by comparing it with audio data from a database of audio data from known pieces of media. As an example, the IntoNow implementation is described in the US patent publication number 2012/0209612 A1. Since such algorithms are known in the art, they are not described in detail herein.
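Matching then reduces to asking whether many landmarks agree on a single time offset between the sample and a stored reference; the modal offset also gives the temporal position of the sample within the known piece of media. Again, this is a sketch under the assumptions introduced above, not the algorithm of any of the named products; the min_votes threshold is an arbitrary example value guarding against chance matches.

```python
# Illustrative matching: for every landmark present in both the query sample
# and a reference recording, count the implied offset of the sample within
# the reference. A strong, consistent offset indicates a match and gives the
# temporal position of the sample within the known piece of media.
from collections import Counter

def match(query: dict, reference: dict, min_votes: int = 5):
    """query/reference map landmark -> offset_seconds (see fingerprint() above)."""
    votes = Counter()
    for landmark, query_time in query.items():
        if landmark in reference:
            votes[round(reference[landmark] - query_time, 1)] += 1
    if not votes:
        return None
    offset, count = votes.most_common(1)[0]
    return offset if count >= min_votes else None   # seconds from the start of the reference
```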
Step S406 identifies which piece of media the user 104 is viewing and also identifies the exact temporal position within the piece of media to which the audio data matches. In step S408 the server 108 sends an indication of the identified piece of media to the user device 102. In step S408 the server 108 also sends an identifier of the temporal position within the identified piece of media to the user device 102. The user device 102 receives the indication of the piece of media (e.g. the title of the piece of media) and the identifier of the temporal position (e.g. a time from the start of the piece of media). Then in step S410, using the indication of the piece of media and the identifier of the temporal position, the media application 218 tracks the outputting of the piece of media. The tracking of the piece of media may continue until completion of the outputting of the piece of media, that is, until the piece of media finishes being output. However, if there is any disruption in the outputting of the piece of media then the media application 218 will need to reconnect to the server 108 in order to correctly track the output of the piece of media. For example, the film may be paused on TV, or buffering issues may interrupt the playout if the media is being streamed. A reconnect process would repeat steps S404 to S410 as described above.
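Locally, the tracking of step S410 can be as simple as anchoring the reported temporal position to a monotonic clock, as in the following sketch; the class and attribute names are assumptions made for the example.

```python
# Illustrative tracking of the outputted media (step S410): anchor the
# temporal position reported by the server 108 to a local monotonic clock,
# and re-anchor whenever a reconnect (steps S404 to S410) reports a fresh
# position, e.g. after a pause, buffering, or an advert break.
import time

class MediaTracker:
    def __init__(self, title: str, position_seconds: float, total_seconds: float):
        self.title = title
        self.total = total_seconds
        self._anchor_position = position_seconds
        self._anchor_clock = time.monotonic()

    def current_position(self) -> float:
        """Estimated time (in seconds) from the start of the piece of media."""
        return self._anchor_position + (time.monotonic() - self._anchor_clock)

    def finished(self) -> bool:
        return self.current_position() >= self.total

    def resync(self, position_seconds: float) -> None:
        """Re-anchor after a disruption in the outputting of the piece of media."""
        self._anchor_position = position_seconds
        self._anchor_clock = time.monotonic()
```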
In this way, the media application 218 is able to obtain information (e.g. title) indicating what media is being output and how far through that media (in time) the outputting of the media is. To reassure the user 104, the media application 218 may display details about the piece of media on the display 204 when it has received them from the server 108. For example, the media application 218 may display the title, current point and total length of the piece of media currently being output.
Once the media application 218 has synched to the particular piece of media being output then the temporal position within the piece of media at which a user input is subsequently received can be determined. For example, in step S412 the media application 218 receives a user input from the user 104 during the output of the piece of media. The user 104 may provide the user input via the user interface of the user device 102. The user interface of the user device 102 comprises input apparatus by which the user can provide a user input. The input apparatus may for example comprise one or more of a touch screen, a button, a mouse, a touch pad, a voice recognition system and a remote control. In a preferred embodiment, the user taps the touch screen 204 to provide the user input. Alternatively, a gesture can act as an input. As an example, the user may provide the user input when he sees something in the outputted piece of media which he wants to record, either to remember after the piece of media has finished or at the time the object of interest is displayed. For example, during a film, the user 104 may decide that he would like to buy an object being displayed in the film (e.g. a watch worn in a scene by an action hero). However, the user 104 may not want to interrupt his viewing experience of the film, so he decides that he will follow up on the object when the film has finished. In the past, when the film finished, the user might not proceed to buy the object, for one of many possible reasons. For example, the user may forget about his intention to buy the object, he may not know how to buy the object or he may proceed to do something else when the film finishes instead of buying the object. Alternatively, a viewer may want additional information about something shown in a TV documentary, such as an animal or location in a wildlife or holiday program. In accordance with the novel methods described herein, by providing the user input when the user 104 sees the object, the user 104 will be reminded (e.g. when the film finishes) of what was on the screen at the time of providing the user input. As will become clear from the following description, an alternative is to present to the user information related to what was on the screen at the time of providing the user input. For example, details about objects that were on the screen, or services or products related to what was on the screen. This information can be connected to the content or subject matter of the onscreen visual media itself, or only connected to a particular object in the frame or scene indicated by the user, and not connected to the overall content of the OVM.
When the user input is received, in step S414 a notification of the user input is sent to the server 108. The notification of the user input may comprise a time within the piece of media at which the user input was received in step S412.
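The notification of step S414 can be realised as a small payload carrying the identified media and the timing of the tap, posted to the server. A hedged sketch follows; the endpoint, field names and payload shape are hypothetical and not specified by the description above.

```python
import json
import urllib.request


def notify_user_input(server_url: str, media_title: str, position_s: float) -> None:
    """Send a notification of a user input (step S414). The payload carries the time
    within the piece of media at which the input was received (step S412)."""
    payload = json.dumps({
        "media": media_title,           # indication of the identified piece of media
        "position_seconds": position_s  # timing of the user input within the media
    }).encode("utf-8")
    request = urllib.request.Request(
        server_url,                     # hypothetical server endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)     # response handling omitted in this sketch
```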
All of the communications between the user device 102 and the server 108 occur over the network 106, e.g. via the network interface 214 and the network interface 304.
The server 108 receives the notification of the user input, and in step S416 the server 108 stores an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media. The portion can be a scene or frame or any other defined time period around a frame. Other examples include a particular camera shot (that is, a point of view in a scene), or a sequence of frames. A delayed motion event can cause a number of frames in a scene to be displayed to a user at the user device 102, to allow him to pick the precise frame/scene of interest. The indication of the portion (scene or frame) may be stored in a memory at the server 108. For example, the notification of the user input may indicate a time within the piece of media at which the user input is received, and step S416 may include determining which frame of the identified piece of media occurs at the identified time. The frame occurring at the identified time may then be stored at the server 108. That is, the frame itself may be stored. In this way, a screenshot can be saved of whatever is displayed on the display 204 at the time at which the user input is received. Alternatively, the timing of the frame within the piece of media may be stored instead of the frame itself. In that case, the stored timing can subsequently be used to determine the frame of the video data of the identified piece of media occurring at the identified time.
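On the server side, step S416 can amount to converting the reported timing into a frame index and storing either the frame itself or only its timing for later resolution. The sketch below takes the lighter option; the data structure, the in-memory store and the 24 fps default are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PortionIndication:
    """Indication of a portion of the video data corresponding to a user input."""
    media_title: str
    frame_index: int        # which frame of the identified piece of media
    timing_seconds: float   # timing of the user input within the piece of media


def store_indication(store: list, media_title: str, timing_seconds: float,
                     fps: float = 24.0) -> PortionIndication:
    """Store an indication of the portion corresponding to the user input (step S416).
    Only the timing and derived frame index are kept here; the frame image itself
    could be stored instead, as described above."""
    indication = PortionIndication(
        media_title=media_title,
        frame_index=int(timing_seconds * fps),
        timing_seconds=timing_seconds,
    )
    store.append(indication)
    return indication
```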
Steps S412 to S416 may be repeated throughout the outputting of the piece of media for each appropriate user input that is received by the media application 218. The dotted line in Figure 4 indicates a passage of time, after which in step S418 the outputting of the piece of media finishes. The finishing of the piece of media is detected by the media application 218. In step S420 the media application 218 sends an indication to the server 108 to indicate that the output of the piece of media at the user device 102 has finished.
The indication that the output of the piece of media has finished is received at the server 108. In response, in step S422 the server 108 sends the frame(s) of the piece of media, which are indicated by the indication(s) which were stored in step S416, to the user device 102. If the frames themselves were stored in step S416, then step S422 simply involves retrieving the frames from the data store where they were saved in step S416 and then sending the frames to the user device 102. Alternatively, if the timings of the frames were stored in step S416, then step S422 involves retrieving the timings from the data store where they were saved in step S416, using the timings and the video data of the known piece of media to determine the frames at the relevant timings, and sending the determined frames to the user device 102.
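Where only timings were stored, step S422 can be sketched as resolving each stored indication back to a frame before sending. The `decode_frame` helper below is hypothetical and stands in for whatever mechanism extracts a frame image from the video data of the known piece of media.

```python
def frames_for_user(store, media_title, decode_frame):
    """Resolve stored indications to frames for sending to the user device (step S422).
    `decode_frame(media_title, frame_index)` is a hypothetical helper that extracts
    the frame image from the video data of the known piece of media."""
    return [
        decode_frame(indication.media_title, indication.frame_index)
        for indication in store
        if indication.media_title == media_title
    ]
```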
The frame(s) are received at the user device 102. In step S424 the received frames are displayed to the user 104 on the display 204 at the user device 102. In this way the user 104 is reminded of what was on the screen when he decided to provide the user input in step S412.
In one implementation, as shown in Figure 4 by step S426, a link to a relevant webpage may be displayed with the frame that is displayed in step S424. The relevant web page may relate to an object displayed in the frame of the video data. For example, if the frame of video data includes a character wearing a dress, then there may be provided a link to a web page from which the dress can be purchased. Interaction with the link may cause a remote retailer to take action, such as to send a brochure or advertisement to the user device.
In another implementation, step S426 is not necessary, and instead step S424 of displaying the frame(s) includes automatically directing the user 104 to a webpage of an online store in which the frame(s) are displayed. In this way, the user 104 may be taken straight to an online store when the piece of media finishes, so that the user 104 can review their screenshots (i.e. the frames which caused them to provide the user input in step S412). Alternatively, the user 104 can log in separately to browse the internet for the items that attracted them in the frames which caused them to provide the user input in step S412.
Figures 6a to 6d show representations of an example web page to which the user 104 may be directed. Figure 6a shows a screen 602 to which the user 104 may first be directed in order to view the frames which he chose to be reminded about. Three frames (indicated as 604, 606 and 608) of the piece of media which the user chose to save are shown in screen 602. The user 104 can select one of the saved scenes, e.g. by clicking on one of the frames 604, 606 or 608 using, for example, the touch screen 204 or keypad 210.
When the user 104 selects a scene from screen 602, screen 610 is displayed as shown in Figure 6b. Screen 610 requests that the user 104 selects a category in order to shop for items shown in the scene of the piece of media which he has selected. The example categories shown in Figure 6b are clothes 612, products 614 and accessories 616. When the user 104 selects a category from screen 610, screen 618 is displayed as shown in Figure 6c. Screen 618 requests that the user 104 selects between the characters which are included in the selected scene. For example, as shown in Figure 6c, three characters are included in the scene, those being character A 620, character B 622 and character C 624. When the user selects one of the characters in screen 618, he is taken to screen 626 as shown in Figure 6d. Screen 626 presents the user 104 with online shopping opportunities relating to the category and character selected in screens 610 and 618. For example, if the user has selected clothes 612 and character A 620, then screen 626 may present options for the user 104 to buy a dress or shoes by clicking on the respective links 630 and 632. The dress or shoes may be those worn by the selected character in the scene of the piece of media which includes the frame at the time for which the user provided the user input in step S412. Other information relating to the relevant products and/or characters may also be displayed in screen 626.
It can therefore be appreciated that the example implementation illustrated in Figures 6a to 6d enables the user 104 to purchase products and/or services via the media outputted at the user device 102 without interrupting the viewing experience, i.e. the purchasing of products and/or services occurs after the media has finished being outputted at the user device 102. In this implementation the media application 218 bridges the gap between product placement and product purchasing. This opens up huge potential for the way that viewers of media interact with the products included in, or extrapolated from, the media. The media application can also be used to provide instant information about an object on the screen 103.
The media application 218 may be downloaded to the user device 102 over the network 106 and stored in the memory 212 of the user device 102. The media application 218 may, or may not, be downloaded in return for payment from the user 104 to a software provider of the media application 218.
A piece of media (e.g. film) could have multiple versions, including for example a director's cut, extended cut, international cut and the final cut. When the audio data is matched to audio data of a known piece of media (in step S406) it may match with more than one of these versions. The default result which is assumed in this case is the final cut. However, the database 110 will store the temporal positions within the piece of media where the versions differ, and at this point the media application 218 may reconnect to the server 108 in order to verify which version of the piece of media is being output, i.e. to perform steps S404 to S410 again. This is done without requiring involvement from the user 104.
When the piece of media is outputted on a television channel which includes adverts, the connection between the user device 102 and the server 108 may be maintained (e.g. using a Wi-Fi connection) and the sampling of the audio data of the outputted piece of media continues at regular intervals (e.g. every 5 seconds) to detect the presence of adverts. When adverts are detected, the tracking of the output of the piece of media is paused until the adverts finish and the output of the piece of media re-commences. In this way, the media application 218 can maintain the sound sampling over Wi-Fi or another wireless connection to differentiate between the piece of media and the advertisements, to ensure that the tracking of the output of the piece of media proceeds correctly.
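The advert handling just described can be approximated by re-sampling the ambient audio at the stated interval and pausing the tracker whenever the sample stops matching the identified piece of media. This sketch reuses the `MediaTracker` from the earlier sketch; `sample_audio` and `matches_media` are placeholders for the device's capture routine and the fingerprint comparison, neither of which is specified here.

```python
import time


def track_with_advert_detection(tracker, sample_audio, matches_media,
                                interval_s: float = 5.0) -> None:
    """Keep sampling audio while the media is output; pause tracking during adverts."""
    paused_at = None
    while tracker.current_position() < tracker.total_length:
        clip = sample_audio(duration_s=2.0)        # placeholder: capture a short clip
        if matches_media(clip, tracker.title):     # placeholder: fingerprint comparison
            if paused_at is not None:
                # Adverts finished: discard the paused interval from the tracked position.
                tracker.matched_offset -= time.monotonic() - paused_at
                paused_at = None
        elif paused_at is None:
            paused_at = time.monotonic()           # adverts detected: pause tracking
        time.sleep(interval_s)
```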
To avoid the need to continuously track the media and advertisements so as to identify the adverts, the broadcaster of the television program could provide data to the user device indicating when the adverts start and stop, so as to indicate to the user device when to reconnect to the audio data for tracking. When the application is used with a voting TV show, it can indicate the points at which voting is allowed and the user votes, the vote data can be sent to the show, and data about the overall votes can be sent to the user device.
In the embodiments described above, the user 104 wishes to remember a frame of the outputted media in order to purchase a product shown in the frame, or to obtain more details about an object in the frame. These details can be connected to the content of the subject matter of the OVM, or only connected to the object itself and not the overall content of the OVM. As an alternative to details concerning an object, the information made available to the user of the user device can relate to services or products connected to the OVM. However, in other embodiments, the user 104 may wish to remember the frame for some other reason, which might not be related to the purchasing of products or services.
The analysis of the audio data for comparison with audio data of known pieces of media provides a fast and reliable method to identify a piece of media, and a temporal position within the piece of media, which is outputted from the user device 102.
The method steps described above and shown in Figure 4 may be implemented as functional modules at the user device 102 and the server 108 as appropriate, e.g. in hardware or software. For example, the method steps may be implemented by executing a computer program product on the respective CPU (202 or 302) to implement the steps in software. The media application 218 described above is an example of a suitable computer program product for implementing the steps at the user device 102.
In particular, in the above description, a method has been described wherein audio data is received by the user device 102 and then transmitted to the server. It would be possible for the user device 102 to carry out the steps which are described above as being carried out by the server, in particular the comparison and storing steps. This would be particularly appropriate in a situation where the provider of a piece of onscreen visual media, such as a film, created an application specifically for that film and included a sound file with the application. That is, the media application described above could be made specific to a particular piece of onscreen visual media such that when the application is opened and the user input is received, there would be no requirement to have a server connection until a later time.
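Where the application ships with the sound data of a single film, the comparison can be sketched as a lookup of a window fingerprint against an index built from the bundled track. The `fingerprint` function is deliberately left abstract and is an assumption of this sketch: an exact hash as shown would be brittle against room noise, so a real implementation would substitute a noise-robust audio fingerprint.

```python
def build_index(track_windows, fingerprint):
    """Precompute fingerprints for each window of the bundled soundtrack.
    `track_windows` is an iterable of (offset_seconds, samples) pairs."""
    return {fingerprint(samples): offset for offset, samples in track_windows}


def locate_clip(index, clip_samples, fingerprint):
    """Return the temporal offset of a captured clip within the bundled track,
    or None if the clip is not recognised."""
    return index.get(fingerprint(clip_samples))
```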
It will be appreciated that in the embodiments described above where the onscreen visual media is output on the screen device 103, the screen of the user device 102 is not displaying the media and can be considered effectively to be blank. Alternatively the screen could be used to display additional information to augment the OVM content. However, it is useful for a user to be aware that the App is open and responsive, and so the App can be designed to generate a "tracking" screen as shown in Figure 7a while the onscreen visual media is being tracked. When a user has provided a user input, for example, by tap or gesture, a "scene saved" screen can be displayed to a user as shown, for example, in Figure 7b.
For watching movies in a cinema, the application could cause the device to adopt a cinema mode, in which the ringing tone is turned off, any camera is turned off, and notifications and recordings of any kind are prevented. The screen could show black, but would still allow sending and receiving data for recognition of the onscreen visual media.
In an alternative aspect of the invention, there may not be a requirement for a user to provide a user input during the output of the piece of media to identify a piece of media in which he is interested. Instead, the piece of media could be tracked using the audio data from the beginning of a particular piece, for example a television show, wherein the broadcasters of the show send information connected with the show being viewed to the user device. For example, coupons or voting opportunities can be advised and displayed to a user of the user device while he is watching the television show, based on the tracking of that show using the audio data.
A user could identify shows that he wishes to track in this fashion by using a tap or other input gesture at the beginning of a show once he has opened the application on his user device. Alternatively, the show could automatically (assuming the application is open) notify the application to commence tracking such that the show can interact with the user on the display of the user device.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A method of processing media data, the method comprising:
receiving audio data of a piece of media outputted to a user, said piece of media comprising both video data and audio data;
comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media;
receiving a notification of a user input during the output of the piece of media to the user; and
storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
2. The method of claim 1, wherein the notification of the user input indicates the timing of the user input within the piece of media.
3. The method of claim 1, wherein said comparing step further identifies a temporal position of the received audio data within the identified piece of media.
4. The method of claim 1, wherein said indication of a portion of the video data is one of:
a frame of the video data itself;
a scene of the video data;
a shot in the video data; and
a sequence of frames of the video data.
5. The method of claim 1, wherein said indication of a portion of the video data is the timing of the portion within the piece of media, wherein said timing is subsequently used to determine the portion of the video data of the identified piece of media.
6. The method of claim 1, wherein the method steps are performed at a server, the piece of media is outputted to the user at a screen device, and the audio data is received at a user device associated with the user and transmitted to the server.
7. The method of claim 6, further comprising sending an indication of the identified piece of media to the user device.
8. The method of claim 6, when dependent upon claim 3, further comprising sending the identified temporal position to the user device.
9. The method of claim 6, further comprising:
using said stored indication of the portion to determine the portion; and
sending the determined portion or information about an object contained in the determined portion to the user device.
10. The method of claim 9, wherein the determined portion or information is sent to the user device after the piece of media has finished being outputted at the user device.
11. A computer device configured to process media data, the computer device comprising:
a receiving module configured to: (i) receive audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data, and (ii) receive a notification of a user input during the output of the piece of media to the user;
a data store configured to store known pieces of media;
a comparing module configured to compare the received audio data of the outputted piece of media to audio data of the known pieces of media stored in the data store, to thereby identify the outputted piece of media; and
a storing module configured to store an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
12. The computer device of claim 11, which is one of a server and a user device.
13. A computer program product configured to process media data, the computer program product being embodied on a computer-readable storage medium and configured so as when executed on a processor of a server to perform the operations of:
receiving audio data of a piece of media outputted to a user, said piece of media comprising both video data and audio data;
comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media;
receiving notification of a user input during the output of the piece of media to the user; and
storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
14. A method of processing media data at a user device, the method comprising:
receiving at a user device audio data of a piece of media outputted to a user;
sending the audio data of the outputted piece of media for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media;
receiving a user input from the user during the output of the piece of media to the user; and
sending a notification of the user input to the server for use in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
15. The method of claim 14, further comprising receiving an indication of the identified piece of media from the server.
16. The method of claim 14, further comprising receiving, from the server, an identifier of a temporal position, within the identified piece of media, of the audio data sent to the server.
17. The method of claim 16, wherein receiving the audio data causes the user device to track the output of the piece of media.
18. The method of claim 17, further comprising using the tracking of the output of the piece of media to determine the timing of the user input within the piece of media.
19. The method of claim 18, wherein the notification of the user input sent to the server comprises an indication of the determined timing of the user input.
20. The method of claim 14, wherein said user input is received via a user interface of the user device, said user interface comprising at least one of a touch screen, a button, a mouse, a touch pad, a voice recognition system, a remote control and a gesture recognition system.
21. The method of claim 14, further comprising:
after the piece of media has finished being outputted, receiving said indicated portion of the video data of the identified piece of media or information about an object contained in the identified piece of media corresponding to the timing of the user input within the piece of media; and
displaying said portion or information at the user device.
22. The method of claim 21, further comprising providing a link to a web page relating to the object displayed in said portion of the video data.
23. A user device configured to process media, the user device comprising:
an audio data receiving module configured to receive audio data of a piece of media output to a user of the user device, said piece of media comprising synchronised video data and audio data;
a user interface configured to receive a user input from the user during the output of the piece of media to the user; and
a sending module configured to: (i) send audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media, and
(ii) send a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
24. The user device of claim 23, wherein the input apparatus comprises at least one of a touch screen, a button, a mouse, a touch pad, a voice recognition system, a gesture recognition system and a remote control.
25. The user device of claim 23, comprising:
a display operable to present a tracking screen while receiving audio data from the piece of media and a storing screen when sending the notification of the user input.
26. The user device of claim 25, operable to receive the identified piece of media or information about an object in the identified piece of media, wherein the display is operable to present the identified piece of media or information.
27. A computer program product configured to process media data at a user device, the computer program product being embodied on a computer-readable storage medium and configured so as when executed on a processor of the user device to perform the operations of:
receiving audio data of a piece of media;
sending audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media;
receiving a user input from the user during the output of the piece of media to the user; and
sending a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
28. The method of claim 1, computer device of claim 11, computer program product of claim 15, method of claim 14, user device of claim 23 or computer program product of claim 27, wherein said indication of a portion of the video data comprises information related to the portion of video data.
29. The method, computer device, user device or computer program product of claim 28, wherein said information comprises details relating to an object in the portion of video data or information about products or services relating to the content of the video data.
30. The method of claim 14, further comprising:
concurrently with the outputting of the piece of media, receiving said indicated portion of the video data of the identified piece of media or information about an object contained in the identified piece of media corresponding to the timing of the user input within the piece of media; and
displaying said portion or information at the user device.
31. A method of processing media data, the method comprising:
receiving audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data;
comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media;
tracking the outputted piece of media while it is output to a user; and
receiving at a user device items for display to a user based on said tracking.
32. A user device comprising a processor configured to execute a computer program product which when executed implements the method of claim 31, the user device further comprising a display for displaying said items.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2012/076811 WO2014094912A1 (en) 2012-12-21 2012-12-21 Processing media data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2012/076811 WO2014094912A1 (en) 2012-12-21 2012-12-21 Processing media data

Publications (1)

Publication Number Publication Date
WO2014094912A1 true WO2014094912A1 (en) 2014-06-26

Family

ID=47458992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/076811 WO2014094912A1 (en) 2012-12-21 2012-12-21 Processing media data

Country Status (1)

Country Link
WO (1) WO2014094912A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1286541A1 (en) * 2000-04-14 2003-02-26 Nippon Telegraph and Telephone Corporation Method, system, and apparatus for acquiring information concerning broadcast information
US20100095326A1 (en) * 2008-10-15 2010-04-15 Robertson Iii Edward L Program content tagging system
US20100154012A1 (en) * 2008-12-15 2010-06-17 Verizon Business Network Services Inc. Television bookmarking with multiplatform distribution
WO2011090540A2 (en) * 2009-12-29 2011-07-28 Tv Interactive Systems, Inc. Method for identifying video segments and displaying contextually targeted content on a connected television
US20110247042A1 (en) * 2010-04-01 2011-10-06 Sony Computer Entertainment Inc. Media fingerprinting for content determination and retrieval
US20120209612A1 (en) 2011-02-10 2012-08-16 Intonow Extraction and Matching of Characteristic Fingerprints from Audio Signals



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12808408

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12808408

Country of ref document: EP

Kind code of ref document: A1