US8103646B2 - Automatic tagging of content based on a corpus of previously tagged and untagged content - Google Patents
- Publication number
- US8103646B2 (application US 11/717,266)
- Authority
- US
- United States
- Prior art keywords
- tag
- content
- data
- audio
- classification model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
Definitions
- Semantic tagging and indexing is a popular way of organizing information, especially on the Internet.
- tags are used extensively for blog postings, product catalogs (e.g., of book sellers), and photo collections. Audio recordings are also becoming more popular as an information medium, with Internet momentum gaining around podcasting, audio books, and video.
- the taxonomy used for tagging this content is not pre-defined and is evolving in an ad-hoc fashion, following popular trends, for example.
- the popular taxonomy can be referred to as “folksonomy”.
- Audio and video content is often large in file size and must be reviewed serially at or near actual speed (or a small multiple thereof, such as double or triple speed) by a human in order to be tagged appropriately. This can lead to content not being tagged, or to only portions of the content being reviewed, so that the tags are not representative of the content as a whole.
- the disclosed architecture provides a mechanism for automatically tagging media files such as audio recordings containing spoken word (e.g., podcasts), blog entries, and videos with meaningful taxonomy tags.
- the architecture provides active (or automated) assistance in assigning appropriate tags to a particular piece of content (or media).
- the architecture includes a system for the automatic tagging of audio streams on the Internet, whether from audio files or from the audio tracks of audio/video files, using the folksonomy of the Internet.
- the audio streams may be provided by the media author.
- the author can make a recording to be posted on a website, and use the system to automatically suggest (via prompted author interaction) folksonomically appropriate tags for the media recording.
- the system can be used in an automated fashion to develop and assign a tag without any intervention by the author.
- the system searches for and receives the media (e.g., an audio stream) into a recognition processor (e.g., an automatic speech recognition (ASR) processor that automatically transcribes audio into text).
- an ASR process is driven by a model of language and acoustic characteristics.
- the resulting text is not expected to be perfectly accurate, but should be at least an adequate representation of what was received (or voiced).
- the ASR process should be resilient to background noise, music, and sound effects, and should provide separation or discrimination processing in multi-voice environments.
- the system then forwards the transcribed text to a classifier that uses a tag classification model to produce a short list of tags that have a likelihood of being appropriate or related to the transcribed text.
- the accuracy of the tag classifier is maintained by utilizing a crawler, for example, to locate textual content that has already been tagged.
- the text and corresponding tag are input into a tag model trainer, which updates the tag classification model.
- the crawler can look for audio content, as well as audio/video content that have already been tagged.
- the classifier can consider the source of the content, such as the specific authors or sites from which the content was obtained.
- the author can then peruse the short list of likely tags and select the tag(s) desired.
- the classifier can implement a confidence threshold to reduce the likelihood of an inappropriate tag being selected.
- FIG. 1 illustrates a method of managing information in accordance with a novel embodiment.
- FIG. 2 illustrates a computer-implemented system that facilitates information management using modeling and tagging.
- FIG. 3 illustrates a system for media recognition and tagging where the media is audio data.
- FIG. 4 illustrates a system for media recognition and tagging where the media is blog posting data.
- FIG. 5 illustrates a tagging system that employs a crawler to locate textual content from network entities for training of a tag model.
- FIG. 6 illustrates a system where tagged audio data is employed for tag model training.
- FIG. 7 illustrates a system for using transcribed and tagged audio to train classification and ASR models.
- FIG. 8 illustrates a system that employs a machine learning and reasoning (LR) component which facilitates automating one or more features of the modeling and tagging architecture.
- FIG. 9 illustrates a method of ranking and selecting tags.
- FIG. 10 illustrates a method of processing source information of various types in furtherance of generating a tag for new content.
- FIG. 11 illustrates a method of selecting information sources based on learning and reasoning processing.
- FIG. 12 illustrates a method of processing different types of information for tagging content.
- FIG. 13 illustrates a server that can employ the functionality provided by the system of FIG. 2 and/or the system of FIG. 8 .
- FIG. 14 illustrates a client device that can employ the functionality provided by the system of FIG. 2 and/or the system of FIG. 8 .
- FIG. 15 illustrates a block diagram of a computing system operable to execute modeling and tagging in accordance with the disclosed architecture.
- FIG. 16 illustrates a schematic block diagram of an exemplary computing environment for content processing and tagging in accordance with the disclosed architecture.
- the disclosed architecture facilitates a mechanism for automatically tagging media files such as podcasts, blog entries, and videos, for example, with meaningful taxonomy tags. Additionally, models are generated that can also be trained to provide a greater likelihood that the generated tags are relevant to the new content to be tagged.
- the system can be configured to allow a user to be involved in the selection process or not involved.
- a podcast is one form of audio recording that contains speech or spoken words. Accordingly, it is intended that tagging applies to all forms of audio files that can contain speech or spoken word.
- the architecture includes means for searching entities or information sources (e.g., web sites, blogs) for tagged and/or untagged content from which to develop one or more models.
- the searches in accordance with various embodiments described herein can be performed as Internet searches as well as searches within an intranet (e.g., a search of information stored on a remote desktop computer or within a corporate network). Additionally, in more robust implementations, searches can be conducted down to the device-level. For example, local desktops or more local computing systems can be capable of hosting web sites (e.g., home networks, enterprise subnets). Accordingly, the disclosed architecture can be employed on a local computing system, alternatively or in combination with server systems.
- FIG. 1 illustrates a method of managing information using modeling and tagging. While, for purposes of simplicity of explanation, the one or more methodologies shown herein, for example, in the form of a flow chart or flow diagram, are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance therewith, occur in a different order and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all acts illustrated in a methodology may be required for a novel implementation.
- a search is performed for and on information sources (e.g., Internet-based, intranet-based) for tag and content relationship data.
- a tag model is created based on the relationship data.
- one or more tags for new content are generated based on the tag model.
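The three acts above (search, model creation, tag generation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the function names `search_sources`, `build_tag_model`, and `generate_tags`, the in-memory list of sources, and the word-overlap scoring are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict

def search_sources(sources):
    """Act 1: collect (text, tags) relationship data.

    `sources` is a hypothetical list of dicts with 'text' and 'tags' keys;
    a real system would crawl the Internet or an intranet instead.
    """
    return [(s["text"], s["tags"]) for s in sources if s.get("tags")]

def build_tag_model(relationship_data):
    """Act 2: count how often each word co-occurs with each tag."""
    model = defaultdict(Counter)
    for text, tags in relationship_data:
        words = text.lower().split()
        for tag in tags:
            model[tag].update(words)
    return model

def generate_tags(model, new_text, top_n=2):
    """Act 3: score every known tag by word overlap with the new content."""
    words = new_text.lower().split()
    scores = {tag: sum(counts[w] for w in words) for tag, counts in model.items()}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked if scores[t] > 0][:top_n]

sources = [
    {"text": "guitar chords for beginners", "tags": ["music"]},
    {"text": "sourdough bread recipe", "tags": ["cooking", "baking"]},
    {"text": "untagged page about bread", "tags": []},
]
model = build_tag_model(search_sources(sources))
suggested = generate_tags(model, "easy bread recipe")
```

Only tagged sources contribute to the model here; the untagged page is skipped in the first act.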
- FIG. 2 illustrates a computer-implemented system 200 that facilitates information management using modeling and tagging.
- the system 200 includes a modeling component 202 for generating a model 204 of relationship data (or taxonomy) 206 between tags (denoted TAG) and an associated corpus of tagged content (denoted CONTENT).
- the tagged content can be media such as text, audio data, video data, or a combination thereof.
- there can be more than one tag for the same content (e.g., CONTENT 1 ). For example, TAG 11 and TAG 21 denote the first and second tags (first subscript) for the first content (second subscript).
- the system 200 can further comprise a tagging component 208 for automatically generating one or more tags 210 for new content based on taxonomy employed in creating the model 204 .
- the system 200 automatically tags audio data searched from devices and systems disposed on the Internet, whether from audio files, streaming audio, and/or from audio tracks of audio/video files, using the folksonomy of the Internet.
- folksonomy is associated with a network-based information retrieval technique consisting of collaboratively generated, open-ended tags (or labels) that categorize content such as web pages, image media (e.g., photographs, videos), and web links, for example.
- a folksonomy can be contrasted with a taxonomy in that with folksonomy, the authors of the tagging system are oftentimes the main users (and sometimes originators) of the tagged content.
- the audio data may be provided by the author.
- the author can make a recording to be posted on the Internet, and use the disclosed architecture to suggest folksonomically appropriate tags for the recording. This is particularly beneficial when the author lacks sufficient knowledge of current folksonomy, and/or when the author does not have convenient access to a text-entry tool to enter the tags (e.g., the user is making the recording over an interactive voice response (IVR) system, an answering machine, voicemail system, or many other types of voice recording systems or recorded information).
- the architecture can be used in an automated fashion by tagging content without any intervention by the author.
- the architecture receives and then feeds the audio data into an automatic speech recognition (ASR) process to transcribe the audio into text.
- the ASR process is driven by a model of language and acoustic characteristics.
- the resulting text does not need to be a perfect transcription, but should be at least a decent representation of the content of the audio media.
- the ASR process can be sufficiently robust to background noise, music, and sound effects, and can discriminate between speakers when more than one person is speaking.
- the transcribed text can then be passed into a classifier that uses a tag classification model to produce a short list of tags that are most likely to be appropriate to the transcribed text.
- the author can then peruse the short list of likely tags, and select those to apply.
- the classifier can implement a confidence threshold to reduce the likelihood of an inappropriate tag being selected.
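The short list and confidence threshold described above might look like the sketch below. The function name `short_list`, the default threshold of 0.2, and the example scores are assumptions for illustration; the source does not say how confidences are computed or what threshold is used.

```python
def short_list(tag_scores, threshold=0.2, max_tags=3):
    """Return the most likely tags whose confidence clears the threshold.

    `tag_scores` maps tag -> confidence in [0, 1]; how those confidences
    are produced is left to the classifier and is not specified here.
    """
    kept = [(tag, score) for tag, score in tag_scores.items() if score >= threshold]
    # Highest confidence first; ties are broken alphabetically.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in kept[:max_tags]]

scores = {"podcast": 0.61, "technology": 0.25, "cooking": 0.04, "news": 0.10}
likely = short_list(scores)
```

Raising the threshold trades recall for precision: fewer tags are offered, but each is less likely to be inappropriate.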
- FIG. 3 illustrates a system 300 for media recognition and tagging where the media is audio data.
- An author 302 creates the non-tagged (or untagged) audio data (e.g., file, streaming) 304 which can also include a channel of audio data that typically accompanies video content.
- the audio portion of the data 304 is input to an ASR transcriber 306 for processing the audio into digital data, and thereafter, converting the digital data into text.
- an ASR data model 308 is provided for converting the digital portion of the audio data into text.
- the transcribed text 310 is then sent to a tag classifier 312 for applying probabilistic and/or statistical analysis to the transcribed text 310 in order to classify the text for tagging.
- a tag classification model 314 is generated and evolves as tag processing continues. Based on the received transcribed text 310 , the tag classifier 312 obtains tag information from the tag model 314 and outputs the tag information as a list of likely tags 316 .
- the list 316 can be presented to a user (e.g., the author 302 ) via a user interface, for example.
- the author 302 can then select from the list 316 a tag for use in tagging the audio and/or video data 304 .
- the tag classifier 312 can include selection functionality that automatically prioritizes (or ranks) and selects the tag for associating with the audio and/or video data.
- the classifier 312 can be configured to implement a confidence threshold to reduce the likelihood of an inappropriate tag being selected.
- FIG. 4 illustrates a system 400 for media recognition and tagging where the media is blog posting data 402 .
- An author 302 creates the untagged blog posting data 402 , which is then forwarded to the tag classifier 312 .
- the tag classifier 312 obtains tag information from the tag classification model 314 and outputs the tag information as the list of likely tags 316 .
- the list 316 can be presented to a user (e.g., the author 302 ) via a user interface, for example.
- the author 302 can then select from the list 316 a tag for use in tagging the blog posting data 402 .
- the tag classifier 312 can include selection functionality that automatically prioritizes and selects the tag for associating with the blog posting data 402 .
- the accuracy of the classifier 312 can directly impact the effectiveness of the tagging process.
- the classifier 312 should be representative of the correct usage of tags in the folksonomy defined by network (e.g., Internet) content.
- the system 300 can also include a mechanism to ensure this by training the tag classification model 314 .
- FIG. 5 illustrates a tagging system 500 that employs a crawler 502 to locate textual content from network entities for training of a model.
- the system 500 can employ the network (e.g., Internet) crawler 502 to locate textual content of a network 504 that has already been tagged.
- the textual content can include human-assigned tags 506 obtained by and forwarded from the network crawler 502 to a tag model trainer 508 , as well as blog posting text 510 obtained by and forwarded from the network crawler 502 to the tag model trainer 508 .
- the text 510 and the corresponding tags 506 are fed into the tag model trainer process 508 , which updates the tag classification model 314 .
- Conventional Internet crawling and classification model training techniques can be employed, as are well-known by one skilled in the art.
- Another input to the tag model trainer 508 can be the source data of the content ( 506 and 510 ), as provided by the crawler 502 , since the classifier should ideally also be representative of local variations in tagging across the Internet.
- the source data can include the URL of the content, author, industry, for example, as well as other information that will aid in tagging the content.
- the source data can be obtained via the crawler 502 , and passed to the trainer 508 along with the corresponding human-assigned tags 506 and the blog posting text 510 , for example.
- the URL address, author data, industry information and/or other source data associated with the blog can be communicated to the trainer 508 as part of the blog posting text 510 .
- the source data can be processed as an input with respect to any data input described herein.
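One way to picture the trainer inputs described above (text, human-assigned tags, and source data) is the following sketch. The class `TagModelTrainer`, its count-based model, and the example source strings are hypothetical; the source does not specify the model format or training algorithm.

```python
from collections import Counter, defaultdict

class TagModelTrainer:
    """Hypothetical trainer that folds crawled (text, tags, source) triples
    into running counts; the real model format is not given in the source."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)   # tag -> word -> count
        self.source_tags = defaultdict(Counter)   # source -> tag -> count

    def update(self, text, tags, source):
        """Incrementally update the model with one crawled item."""
        words = text.lower().split()
        for tag in tags:
            self.word_counts[tag].update(words)
            self.source_tags[source][tag] += 1

    def tags_used_by(self, source):
        """Tags seen at a given source, most frequent first."""
        counts = self.source_tags[source]
        return sorted(counts, key=lambda t: (-counts[t], t))

trainer = TagModelTrainer()
trainer.update("new phone review", ["gadgets"], source="example.com/tech")
trainer.update("phone camera tips", ["gadgets", "photo"], source="example.com/tech")
trainer.update("street photo walk", ["photography"], source="example.org/art")
```

Keeping per-source tag counts lets a classifier notice local variation, such as one site preferring "photo" where another uses "photography".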
- FIG. 6 illustrates a system 600 where tagged audio data is employed for tag model training.
- the crawler 502 can also search for audio/video data 602 that has already been tagged.
- the network crawler 502 obtains the human assigned tags 506 and audio and/or video data 602 .
- the human assigned tag data 506 is passed directly to the tag model trainer 508 .
- the audio portion of the audio/video data 602 is passed to the ASR transcriber 306 , which employs the ASR model 308 to process the audio portion of the data 602 into ASR-transcribed text 310 .
- the text 310 is then passed to the tag model trainer 508 .
- Both the human-assigned tag information and the transcribed text 310 are then used to train the tag classification model 314 . Since the ASR transcription process 306 can, at times, be less than optimum, other inputs to the trainer 508 can be utilized to assign less weight to the text 310 when training the classification model 314 .
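Assigning less weight to ASR-transcribed text could be done by scaling its contribution to the model, as in this sketch. The helper `add_example` and the 0.5 weight are illustrative assumptions only; the source does not state how the weighting is implemented.

```python
from collections import defaultdict

def add_example(model, text, tags, weight=1.0):
    """Add word evidence for each tag, scaled by how much the text is trusted."""
    for tag in tags:
        for word in text.lower().split():
            model[tag][word] += weight

model = defaultdict(lambda: defaultdict(float))
# Human-written text counts at full weight.
add_example(model, "marathon training plan", ["running"], weight=1.0)
# ASR output may contain recognition errors, so it counts for less.
# The 0.5 factor is an arbitrary illustrative value, not one from the source.
add_example(model, "marathon training plane", ["running"], weight=0.5)
```

A misrecognized word such as "plane" still enters the model, but with half the influence of a word from human-written text.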
- FIG. 7 illustrates a system 700 for using transcribed and tagged audio to train classification and ASR models.
- the system 700 includes the crawler 502 that searches a network (e.g., network 504 of FIG. 5 ) for the human-assigned tags 506 , audio and/or video data 602 , and additionally, human-transcribed audio/video data 702 .
- the human-assigned tags 506 and human-transcribed audio portion of the data 702 are passed to the tag model trainer 508 for training of the tag classification model 314 .
- the human-transcribed audio portion of the data 702 and the untagged audio/video data 602 are passed to the ASR model trainer 704 for processing and training of the ASR model 308 to improve the language and acoustic models.
- the ASR model trainer 704 can also take into account the source of the content. For example, audio content from specific authors or sites may predominantly use the same set of speakers, and hence, speaker dependent characteristics can be incorporated into the ASR model 308 . Conventional techniques for the training of acoustic and language models can be employed in the system 700 .
- FIG. 8 illustrates a system 800 that employs a machine learning and reasoning (LR) component 802 which facilitates automating one or more features of the modeling and tagging architecture.
- the subject architecture e.g., in connection with selection
- Such classification can employ a probabilistic and/or other statistical analysis (e.g., one factoring into the analysis utilities and costs to maximize the expected value to one or more people) to prognose or infer an action that a user desires to be automatically performed.
- to infer and “inference” refer generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example.
- the inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events.
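As a worked example of computing a probability distribution over states from an observation, the sketch below applies Bayes' rule. The two states, the observation, and all probabilities are invented for illustration; the function name `posterior` is likewise an assumption.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' rule: P(state | observation) for every state.

    `prior[s]` is P(s) and `likelihood[s]` is P(observation | s).
    """
    joint = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(joint.values())
    return {s: joint[s] / total for s in joint}

# Hypothetical states: does the user want tags applied automatically?
prior = {"auto": Fraction(1, 2), "manual": Fraction(1, 2)}
# Observation: the user accepted the top suggestion without editing it.
likelihood = {"auto": Fraction(9, 10), "manual": Fraction(3, 10)}
belief = posterior(prior, likelihood)
```

With these made-up numbers the belief in the "auto" state rises from 1/2 to 3/4 after the observation.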
- Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources.
- a support vector machine is an example of a classifier that can be employed.
- the SVM operates by finding a hypersurface in the space of possible inputs that splits the triggering input events from the non-triggering events in an optimal way. Intuitively, this makes the classification correct for testing data that is near, but not identical to training data.
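To make the separating-surface idea concrete, here is a small linear soft-margin SVM trained by subgradient descent on the regularized hinge loss. It is a simplified sketch: it finds a flat hyperplane (no kernel), and the learning rate, regularization strength, epoch count, and toy data are assumptions chosen for the example rather than values from the source.

```python
def train_linear_svm(points, labels, epochs=20, lr=0.1, lam=0.01):
    """Fit w, b by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization shrinks w on every step.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                # Point is inside the margin: push the boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 (triggering) or -1 (non-triggering) for input x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy data: +1 marks triggering inputs, -1 marks non-triggering inputs.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
near_positive = predict(w, b, (2.5, 2.5))    # close to, not in, the training set
near_negative = predict(w, b, (-2.5, -2.5))
```

The two query points do not appear in the training data, yet they fall on the expected sides of the learned boundary because the margin leaves room around each class.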
- Other directed and undirected model classification approaches that can be employed include, for example, various forms of statistical regression, naïve Bayes, Bayesian networks, decision trees, neural networks, fuzzy logic models, and other statistical classification models representing different patterns of independence. Classification as used herein also is inclusive of methods used to assign rank and/or priority.
- the subject architecture can employ classifiers that are explicitly trained (e.g., via generic training data) as well as implicitly trained (e.g., via observing user behavior, receiving extrinsic information).
- SVMs are configured via a learning or training phase within a classifier constructor and feature selection module.
- the classifier(s) can be employed to automatically learn and perform a number of functions according to predetermined criteria.
- the learning and reasoning component 802 can be employed to learn and reason about different aspects of one or more of the previously-disclosed systems 200 , 300 , 400 , 500 , 600 , 700 and 800 , for example.
- the learning and reasoning component 802 can be employed in FIG. 3 to interface to one or more of the ASR transcriber 306 to analyze the data 304 and quality of the transcribed text 310 , the ASR model 308 to sample and/or analyze ASR model 308 processes and data, the tag classification model 314 for analysis of tag classification processes and data, the tag classifier 312 to analyze classification processes, and the tag list 316 to further analyze the “quality” of the output of the classifier 312 .
- the learning and reasoning component 802 can further interface to one or more of the inputs and/or outputs of the tag classifier 312 to monitor, analyze, and modify classifier 312 and model 314 processes based on the non-tagged blog posting 402 .
- the learning and reasoning component 802 can interface to one or more of the inputs and/or outputs of the network crawler 502 , the network 504 , the tag model trainer 508 , and the tag model 314 to monitor, analyze, and modify processes associated therewith.
- the learning and reasoning component 802 can interface to one or more of the inputs and/or outputs of the network crawler 502 , the inputs and/or outputs of the ASR transcriber 306 , the inputs and/or outputs of the tag model trainer 508 , and one or more of the models ( 308 and/or 314 ).
- the learning and reasoning component 802 can interface to one or more of the inputs and/or outputs of the network crawler 502 , the inputs and/or outputs of the tag model trainer 508 , the inputs and/or outputs of the ASR model trainer 704 , and one or more of the models ( 308 or/and 314 ).
- the learning and reasoning component 802 can control the network crawler 502 to search sites (e.g., web sites, blogs, etc.) that are learned to provide tagging information of a higher quality than other sites previously searched.
- the quality can be based on the amount of human interaction involved after automatically providing the tag lists, for example. If it is learned that the user frequently selects tags that are generated based on information of a site or group of sites, the crawler can be controlled to search those sites more frequently.
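The feedback loop described above, where sites whose suggested tags are frequently selected get crawled more often, can be sketched as follows. The class `SiteQualityTracker`, the selection-rate metric, and the site names are hypothetical; the source only says quality can be based on the amount of human interaction.

```python
from collections import defaultdict

class SiteQualityTracker:
    """Hypothetical tracker: a site's quality is the fraction of tags
    suggested from its content that users went on to select."""

    def __init__(self):
        self.suggested = defaultdict(int)
        self.selected = defaultdict(int)

    def record(self, site, was_selected):
        """Log one suggested tag derived from `site` and whether it was chosen."""
        self.suggested[site] += 1
        if was_selected:
            self.selected[site] += 1

    def crawl_order(self):
        """Sites with the highest selection rate are crawled first."""
        def rate(site):
            return self.selected[site] / self.suggested[site]
        return sorted(self.suggested, key=lambda s: (-rate(s), s))

tracker = SiteQualityTracker()
for site, picked in [("blog-a", True), ("blog-a", True), ("blog-b", False),
                     ("blog-b", True), ("forum-c", False)]:
    tracker.record(site, picked)
order = tracker.crawl_order()
```

A crawler could visit sites in this order, or use the rates to set how often each site is revisited.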
- FIG. 9 illustrates a method of ranking and selecting tags.
- source information is received and processed into output data based on searched information sources (e.g., web sites, blogs, forums, etc.). This can be a manual and/or automated process.
- the output is processed into a group or listing of tags.
- the group or listing of tags is then ranked according to criteria, which can be based on predetermined criteria or automatically derived criteria (e.g., using the learning and reasoning component 802 ).
- tag selection from the list or group for the current content is initiated.
- the system checks the mode of selection.
- if the selection mode is automatic, flow is to 910 to automatically select the tag from the list.
- the selected tag is then assigned to (or associated with) the content.
- if the selection mode is manual, flow is from 908 to 914 to manually select the tag based on the ranked list.
- Flow is then to 912 to tag the content.
- the selectors e.g., user, software
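The ranking and mode-dependent selection of FIG. 9 can be sketched as a single function. The name `select_tags`, the `"auto"` and `"manual"` mode strings, and the `chooser` callback (standing in for a user picking from the ranked list) are assumptions made for this example.

```python
def select_tags(candidates, mode, chooser=None, top_n=1):
    """Rank candidate tags, then pick automatically or defer to a chooser.

    `candidates` maps tag -> ranking score; `chooser` is a hypothetical
    callback standing in for a user picking from the ranked list.
    """
    ranked = sorted(candidates, key=lambda t: (-candidates[t], t))
    if mode == "auto":
        return ranked[:top_n]
    if mode == "manual":
        if chooser is None:
            raise ValueError("manual mode needs a chooser callback")
        return chooser(ranked)
    raise ValueError(f"unknown selection mode: {mode!r}")

candidates = {"travel": 0.7, "food": 0.9, "budget": 0.4}
auto_pick = select_tags(candidates, "auto")
manual_pick = select_tags(candidates, "manual", chooser=lambda ranked: ranked[1:3])
```

Either path ends with a list of chosen tags, which would then be assigned to the content.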
- FIG. 10 illustrates a method of processing source information of various types in furtherance of generating a tag for new content.
- sources of information e.g., web sites, blog sites, information servers, computing devices, smart phones, databases, . . .
- categorize the information according to type. This can be textual, audio, image and/or video, for example.
- the textual information can be in many different formats.
- textual information can be raw text as presented on a web page, text scanned and obtained from source code underlying a web page, e-mail messages, XML (extensible markup language) text, program code, and so on.
- the categorized information is prepared for analysis.
- textual data can be processed directly for content.
- Audio data can be recognized and translated into text for processing, and image data can be image processed according to conventional image processing techniques and annotated (e.g., manually, automatically), for example, as to the information depicted.
- Video data can be processed to separate the audio portion from the video portion, and the audio portion processed as previously described.
- a single frame or groups of frames of the video can be processed and annotated, as described above, and according to conventional video and image processing technologies.
- the analyzed output can be classification processed in order to update a classification model, and/or to utilize the existing state of the model to classify and obtain a listing of tags for selection.
- the list is processed (e.g., automatically, manually) to obtain one or more tags to assign to the content.
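The type-dependent preparation step of FIG. 10 amounts to a dispatch on media type, sketched below. The function `prepare_for_analysis`, the dictionary layout of an item, and the stand-in `fake_transcribe` and `fake_annotate` helpers are all hypothetical; real speech recognition and image annotation are outside the scope of this example.

```python
def prepare_for_analysis(item, transcribe, annotate):
    """Turn one categorized item into text that a tag classifier can read.

    `item` is a dict with a 'type' key; `transcribe` and `annotate` are
    hypothetical callables standing in for speech recognition and image
    annotation, which this sketch does not implement.
    """
    kind = item["type"]
    if kind == "text":
        return item["body"]
    if kind == "audio":
        return transcribe(item["audio"])
    if kind == "image":
        return annotate(item["image"])
    if kind == "video":
        # Split the video: transcribe its audio track, annotate sampled frames.
        parts = [transcribe(item["audio"])]
        parts += [annotate(frame) for frame in item["frames"]]
        return " ".join(parts)
    raise ValueError(f"unsupported type: {kind!r}")

# Stand-in recognizers so the example runs without any media libraries.
def fake_transcribe(audio):
    return f"speech from {audio}"

def fake_annotate(image):
    return f"picture of {image}"

video = {"type": "video", "audio": "clip.wav", "frames": ["frame0", "frame1"]}
prepared = prepare_for_analysis(video, fake_transcribe, fake_annotate)
```

Whatever the input type, the output is plain text, so the same classifier can handle every kind of content.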
- FIG. 11 illustrates a method of selecting information sources based on learning and reasoning processing.
- information sources e.g., network-based, device-based
- tagging information e.g., human-tagged content, untagged information
- the tagging information is processed (e.g., for classification, for learning and reasoning).
- a tag model is updated and/or utilized to obtain a list of tags for selection.
- the learning and reasoning component monitors these changes.
- the changes are learned, and the learning and reasoning component changes the information sources to be selected for future searches based on the learned changes.
- the process is repeated for subsequent searches.
- FIG. 12 illustrates a method of processing different types of information for tagging content.
- a search process is initiated for human-assigned tags, human-transcribed audio data, and raw audio and/or video file(s).
- a model trainer receives and processes the human-assigned tags and human-transcribed audio and/or video information.
- the audio and/or video file and the human-transcribed audio/video data are processed using an ASR model trainer.
- the ASR model trainer updates an ASR model.
- a tag model trainer receives the human-assigned tags and human-transcribed audio/video data.
- the tag model trainer updates a tag classification model.
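The tag-model half of this method (human-assigned tags plus human-transcribed text in, an updated tag classification model out) can be illustrated with a small incremental classifier. The sketch below swaps in a toy multinomial naive Bayes model purely for illustration; the class name `TagModel` and its methods are hypothetical, and the ASR model trainer is not modeled here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class TagModel:
    """Toy multinomial naive Bayes over transcript words, one class per tag."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def update(self, examples: Iterable[Tuple[str, List[str]]]) -> None:
        """Fold (transcript, human-assigned tags) pairs into the running counts."""
        for transcript, tags in examples:
            words = transcript.lower().split()
            self.vocab.update(words)
            for tag in tags:
                self.doc_counts[tag] += 1
                self.word_counts[tag].update(words)

    def rank(self, transcript: str) -> List[str]:
        """Return known tags ordered from most to least likely for the text."""
        words = transcript.lower().split()
        total_docs = sum(self.doc_counts.values())
        scores: Dict[str, float] = {}
        for tag, n_docs in self.doc_counts.items():
            counts = self.word_counts[tag]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior for the tag
            for w in words:
                score += math.log((counts[w] + 1) / denom)  # Laplace smoothing
            scores[tag] = score
        return sorted(scores, key=lambda t: (-scores[t], t))
```

Because `update` only adds to counts, it can be called again whenever new human-tagged material is found, which matches the idea of a trainer that updates an existing model rather than rebuilding it.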
- FIG. 13 illustrates a server 1300 that can employ the functionality provided by the system 200 of FIG. 2 and/or the system 800 of FIG. 8 .
- a subsystem 1302 includes the functionality, which typically will operate as a background process that is relatively transparent to other server processes.
- the subsystem 1302 can interface to the operating system and/or other applications that access and provide the desired data.
- FIG. 14 illustrates a client device 1400 that can employ the functionality provided by the system 200 of FIG. 2 and/or the system 800 of FIG. 8 .
- a subsystem 1402 includes the functionality, which typically will operate as a background process that is transparent to the user.
- the subsystem 1402 can interface to the operating system and/or other applications that access and provide the desired data.
- a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer.
- an application running on a server and the server can be a component.
- One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers.
- Referring now to FIG. 15, there is illustrated a block diagram of a computing system 1500 operable to execute tagging in accordance with the disclosed architecture.
- FIG. 15 and the following discussion are intended to provide a brief, general description of a suitable computing system 1500 in which the various aspects can be implemented. While the description above is in the general context of computer-executable instructions that may run on one or more computers, those skilled in the art will recognize that a novel embodiment also can be implemented in combination with other program modules and/or as a combination of hardware and software.
- program modules include routines, programs, components, data structures, etc., that perform particular tasks or implement particular abstract data types.
- inventive methods can be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, mainframe computers, as well as personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which can be operatively coupled to one or more associated devices.
- the illustrated aspects may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network.
- program modules can be located in both local and remote memory storage devices.
- Computer-readable media can be any available media that can be accessed by the computer and includes volatile and non-volatile media, removable and non-removable media.
- Computer-readable media can comprise computer storage media and communication media.
- Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
- Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital video disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
- the exemplary computing system 1500 for implementing various aspects includes a computer 1502 , the computer 1502 including a processing unit 1504 , a system memory 1506 and a system bus 1508 .
- the system bus 1508 provides an interface for system components including, but not limited to, the system memory 1506 to the processing unit 1504 .
- the processing unit 1504 can be any of various commercially available processors. Dual microprocessors and other multi-processor architectures may also be employed as the processing unit 1504 .
- the system bus 1508 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures.
- the system memory 1506 includes read-only memory (ROM) 1510 and random access memory (RAM) 1512 .
- a basic input/output system (BIOS) is stored in a non-volatile memory 1510 such as ROM, EPROM, or EEPROM; the BIOS contains the basic routines that help to transfer information between elements within the computer 1502, such as during start-up.
- the RAM 1512 can also include a high-speed RAM such as static RAM for caching data.
- the computer 1502 further includes an internal hard disk drive (HDD) 1514 (e.g., EIDE, SATA), which may also be configured for external use in a suitable chassis (not shown); a magnetic floppy disk drive (FDD) 1516 (e.g., to read from or write to a removable diskette 1518); and an optical disk drive 1520 (e.g., to read a CD-ROM disk 1522, or to read from or write to other high-capacity optical media such as a DVD).
- the hard disk drive 1514 , magnetic disk drive 1516 and optical disk drive 1520 can be connected to the system bus 1508 by a hard disk drive interface 1524 , a magnetic disk drive interface 1526 and an optical drive interface 1528 , respectively.
- the interface 1524 for external drive implementations includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.
- the drives and their associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, and so forth.
- the drives and media accommodate the storage of any data in a suitable digital format.
- Although the description of computer-readable media above refers to a HDD, a removable magnetic diskette, and removable optical media such as a CD or DVD, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the exemplary operating environment, and further, that any such media may contain computer-executable instructions for performing the methods of the disclosed architecture.
- a number of program modules can be stored in the drives and RAM 1512 , including an operating system 1530 , one or more application programs 1532 , other program modules 1534 and program data 1536 . All or portions of the operating system, applications, modules, and/or data can also be cached in the RAM 1512 . It is to be appreciated that the architecture can be implemented with various commercially available operating systems or combinations of operating systems.
- the applications 1532 and/or modules 1534 can include the components described supra in the figures, for example, the modeling component 202 , tagging component 208 , models ( 308 and 314 ), classifier 312 , transcriber 306 , and trainers ( 508 and 704 ).
- a user can enter commands and information into the computer 1502 through one or more wired/wireless input devices, for example, a keyboard 1538 and a pointing device, such as a mouse 1540 .
- Other input devices may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, touch screen, or the like.
- These and other input devices are often connected to the processing unit 1504 through an input device interface 1542 that is coupled to the system bus 1508 , but can be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, etc.
- a monitor 1544 or other type of display device is also connected to the system bus 1508 via an interface, such as a video adapter 1546 .
- a computer typically includes other peripheral output devices (not shown), such as speakers, printers, etc.
- the computer 1502 may operate in a networked environment using logical connections via wired and/or wireless communications to one or more remote computers, such as a remote computer(s) 1548 .
- the remote computer(s) 1548 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 1502 , although, for purposes of brevity, only a memory/storage device 1550 is illustrated.
- the logical connections depicted include wired/wireless connectivity to a local area network (LAN) 1552 and/or larger networks, for example, a wide area network (WAN) 1554 .
- LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, for example, the Internet.
- When used in a LAN networking environment, the computer 1502 is connected to the local network 1552 through a wired and/or wireless communication network interface or adapter 1556.
- the adapter 1556 may facilitate wired or wireless communication to the LAN 1552, which may also include a wireless access point disposed thereon for communicating with the wireless adapter 1556.
- the computer 1502 can include a modem 1558 , or is connected to a communications server on the WAN 1554 , or has other means for establishing communications over the WAN 1554 , such as by way of the Internet.
- the modem 1558, which can be internal or external and a wired or wireless device, is connected to the system bus 1508 via the serial port interface 1542.
- program modules depicted relative to the computer 1502 can be stored in the remote memory/storage device 1550 . It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
- the computer 1502 is operable to communicate with any wireless devices or entities operatively disposed in wireless communication, for example, a printer, scanner, desktop and/or portable computer, portable data assistant, communications satellite, any piece of equipment or location associated with a wirelessly detectable tag (e.g., a kiosk, news stand, restroom), and telephone.
- the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices.
- Wi-Fi (Wireless Fidelity) is a wireless technology similar to that used in a cell phone that enables such devices, for example, computers, to send and receive data indoors and out, anywhere within the range of a base station.
- Wi-Fi networks use radio technologies called IEEE 802.11x (a, b, g, etc.) to provide secure, reliable, fast wireless connectivity.
- a Wi-Fi network can be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet).
- the system 1600 includes one or more client(s) 1602 .
- the client(s) 1602 can be hardware and/or software (e.g., threads, processes, computing devices).
- the client(s) 1602 can house cookie(s) and/or associated contextual information, for example.
- the system 1600 also includes one or more server(s) 1604 .
- the server(s) 1604 can also be hardware and/or software (e.g., threads, processes, computing devices).
- the servers 1604 can house threads to perform transformations by employing the architecture, for example.
- One possible communication between a client 1602 and a server 1604 can be in the form of a data packet adapted to be transmitted between two or more computer processes.
- the data packet may include a cookie and/or associated contextual information, for example.
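As a small illustration of such a data packet, the sketch below bundles a cookie with contextual information and serializes it for transmission between two processes. The `DataPacket` class, its field names, and the choice of JSON as the wire format are assumptions made for the example; the description does not prescribe any particular encoding.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class DataPacket:
    """Hypothetical client/server packet: a cookie plus contextual information."""

    cookie: str
    context: Dict[str, str] = field(default_factory=dict)

    def to_bytes(self) -> bytes:
        # Sorted keys give a stable byte representation for the same content.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "DataPacket":
        return cls(**json.loads(raw.decode("utf-8")))
```

A client process would call `to_bytes()` before sending and the server process would call `DataPacket.from_bytes()` on receipt, recovering an equal object.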
- the system 1600 includes a communication framework 1606 (e.g., a global communication network such as the Internet) that can be employed to facilitate communications between the client(s) 1602 and the server(s) 1604 .
- Communications can be facilitated via a wired (including optical fiber) and/or wireless technology.
- the client(s) 1602 are operatively connected to one or more client data store(s) 1608 that can be employed to store information local to the client(s) 1602 (e.g., cookie(s) and/or associated contextual information).
- the server(s) 1604 are operatively connected to one or more server data store(s) 1610 that can be employed to store information local to the servers 1604 .
- the servers 1604 can employ the systems described supra, for example, the systems 200 , 300 , 400 , et seq.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/717,266 US8103646B2 (en) | 2007-03-13 | 2007-03-13 | Automatic tagging of content based on a corpus of previously tagged and untagged content |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/717,266 US8103646B2 (en) | 2007-03-13 | 2007-03-13 | Automatic tagging of content based on a corpus of previously tagged and untagged content |
Publications (2)
Publication Number | Publication Date |
---|---|
US20080228749A1 (en) | 2008-09-18 |
US8103646B2 (en) | 2012-01-24 |
Family
ID=39763684
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/717,266 Expired - Fee Related US8103646B2 (en) | 2007-03-13 | 2007-03-13 | Automatic tagging of content based on a corpus of previously tagged and untagged content |
Country Status (1)
Country | Link |
---|---|
US (1) | US8103646B2 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100138712A1 (en) * | 2008-12-01 | 2010-06-03 | Changki Lee | Apparatus and method for verifying training data using machine learning |
US20110106879A1 (en) * | 2009-10-30 | 2011-05-05 | Samsung Electronics Co., Ltd. | Apparatus and method for reproducing multimedia content |
US20110307542A1 (en) * | 2010-06-10 | 2011-12-15 | Microsoft Corporation | Active Image Tagging |
US8930308B1 (en) | 2012-08-20 | 2015-01-06 | 3Play Media, Inc. | Methods and systems of associating metadata with media |
US20170287500A1 (en) * | 2016-04-04 | 2017-10-05 | Honeywell International Inc. | System and method to distinguish sources in a multiple audio source environment |
US9953646B2 (en) | 2014-09-02 | 2018-04-24 | Belleau Technologies | Method and system for dynamic speech recognition and tracking of prewritten script |
US9953062B2 (en) | 2014-08-18 | 2018-04-24 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for providing for display hierarchical views of content organization nodes associated with captured content and for determining organizational identifiers for captured content |
CN109886211A (en) * | 2019-02-25 | 2019-06-14 | 北京达佳互联信息技术有限公司 | Data mask method, device, electronic equipment and storage medium |
US10381022B1 (en) * | 2015-12-23 | 2019-08-13 | Google Llc | Audio classifier |
US10714144B2 (en) | 2017-11-06 | 2020-07-14 | International Business Machines Corporation | Corroborating video data with audio data from video content to create section tagging |
US11217228B2 (en) * | 2016-03-22 | 2022-01-04 | Sri International | Systems and methods for speech recognition in unseen and noisy channel conditions |
US11551407B1 (en) | 2021-09-01 | 2023-01-10 | Design Interactive, Inc. | System and method to convert two-dimensional video into three-dimensional extended reality content |
Families Citing this family (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6233389B1 (en) | 1998-07-30 | 2001-05-15 | Tivo, Inc. | Multimedia time warping system |
US7680940B2 (en) * | 2007-03-28 | 2010-03-16 | Scenera Technologies, Llc | Method and system for managing dynamic associations between folksonomic data and resources |
US8880529B2 (en) | 2007-05-15 | 2014-11-04 | Tivo Inc. | Hierarchical tags with community-based ratings |
US9288548B1 (en) | 2007-05-15 | 2016-03-15 | Tivo Inc. | Multimedia content search system |
US9324082B2 (en) * | 2007-07-06 | 2016-04-26 | Ebay Inc. | System and method for providing information tagging in a networked system |
US8170916B1 (en) * | 2007-09-06 | 2012-05-01 | Amazon Technologies, Inc. | Related-item tag suggestions |
US8086504B1 (en) * | 2007-09-06 | 2011-12-27 | Amazon Technologies, Inc. | Tag suggestions based on item metadata |
US8909632B2 (en) * | 2007-10-17 | 2014-12-09 | International Business Machines Corporation | System and method for maintaining persistent links to information on the Internet |
US8126863B2 (en) * | 2007-10-25 | 2012-02-28 | Apple Inc. | Search control combining classification and text-based searching techniques |
US8185528B2 (en) * | 2008-06-23 | 2012-05-22 | Yahoo! Inc. | Assigning human-understandable labels to web pages |
US8131708B2 (en) * | 2008-06-30 | 2012-03-06 | Vobile, Inc. | Methods and systems for monitoring and tracking videos on the internet |
US20100042612A1 (en) * | 2008-07-11 | 2010-02-18 | Gomaa Ahmed A | Method and system for ranking journaled internet content and preferences for use in marketing profiles |
US9892103B2 (en) * | 2008-08-18 | 2018-02-13 | Microsoft Technology Licensing, Llc | Social media guided authoring |
US20100191658A1 (en) * | 2009-01-26 | 2010-07-29 | Kannan Pallipuram V | Predictive Engine for Interactive Voice Response System |
US20100303425A1 (en) * | 2009-05-29 | 2010-12-02 | Ziwei Liu | Protected Fiber Optic Assemblies and Methods for Forming the Same |
US8533134B1 (en) * | 2009-11-17 | 2013-09-10 | Google Inc. | Graph-based fusion for video classification |
US8452778B1 (en) | 2009-11-19 | 2013-05-28 | Google Inc. | Training of adapted classifiers for video categorization |
KR20110073756A (en) * | 2009-12-24 | 2011-06-30 | 삼성전자주식회사 | Method for tagging condition information and multimedia apparatus using the same |
US8914368B2 (en) * | 2010-03-31 | 2014-12-16 | International Business Machines Corporation | Augmented and cross-service tagging |
US8655881B2 (en) | 2010-09-16 | 2014-02-18 | Alcatel Lucent | Method and apparatus for automatically tagging content |
US8533192B2 (en) | 2010-09-16 | 2013-09-10 | Alcatel Lucent | Content capture device and methods for automatically tagging content |
US8666978B2 (en) | 2010-09-16 | 2014-03-04 | Alcatel Lucent | Method and apparatus for managing content tagging and tagged content |
US9582503B2 (en) | 2010-09-29 | 2017-02-28 | Microsoft Technology Licensing, Llc | Interactive addition of semantic concepts to a document |
US9251503B2 (en) | 2010-11-01 | 2016-02-02 | Microsoft Technology Licensing, Llc | Video viewing and tagging system |
WO2012064976A1 (en) * | 2010-11-11 | 2012-05-18 | Google Inc. | Learning tags for video annotation using latent subtags |
US9087297B1 (en) | 2010-12-17 | 2015-07-21 | Google Inc. | Accurate video concept recognition via classifier combination |
US8856051B1 (en) | 2011-04-08 | 2014-10-07 | Google Inc. | Augmenting metadata of digital objects |
US8706655B1 (en) * | 2011-06-03 | 2014-04-22 | Google Inc. | Machine learned classifiers for rating the content quality in videos using panels of human viewers |
US9063935B2 (en) | 2011-06-17 | 2015-06-23 | Harqen, Llc | System and method for synchronously generating an index to a media stream |
US9135560B1 (en) * | 2011-06-30 | 2015-09-15 | Sumo Logic | Automatic parser selection and usage |
US20130297469A1 (en) * | 2012-05-01 | 2013-11-07 | Bank Of America Corporation | Tagging, data collection and content delivery in a globally distributed computing infrastructure |
US9852215B1 (en) * | 2012-09-21 | 2017-12-26 | Amazon Technologies, Inc. | Identifying text predicted to be of interest |
US20160019202A1 (en) * | 2014-07-21 | 2016-01-21 | Charles Adams | System, method, and apparatus for review and annotation of audiovisual media content |
US10380166B2 (en) | 2015-06-29 | 2019-08-13 | The Nielson Company (Us), Llc | Methods and apparatus to determine tags for media using multiple media features |
US10002136B2 (en) * | 2015-07-27 | 2018-06-19 | Qualcomm Incorporated | Media label propagation in an ad hoc network |
US10079738B1 (en) * | 2015-11-19 | 2018-09-18 | Amazon Technologies, Inc. | Using a network crawler to test objects of a network document |
EP3532906A4 (en) * | 2016-10-28 | 2020-04-15 | Vilynx, Inc. | Video tagging system and method |
US10062039B1 (en) * | 2017-06-28 | 2018-08-28 | CS Disco, Inc. | Methods and apparatus for asynchronous and interactive machine learning using word embedding within text-based documents and multimodal documents |
US10691738B1 (en) * | 2017-08-07 | 2020-06-23 | Amdocs Development Limited | System, method, and computer program for tagging application data with enrichment information for interpretation and analysis by an analytics system |
US10108902B1 (en) * | 2017-09-18 | 2018-10-23 | CS Disco, Inc. | Methods and apparatus for asynchronous and interactive machine learning using attention selection techniques |
US10565189B2 (en) * | 2018-02-26 | 2020-02-18 | International Business Machines Corporation | Augmentation of a run-time query |
US10674197B2 (en) | 2018-02-28 | 2020-06-02 | At&T Intellectual Property I, L.P. | Media content distribution system and methods for use therewith |
US10860649B2 (en) * | 2018-03-14 | 2020-12-08 | TCL Research America Inc. | Zoomable user interface for TV |
US10755229B2 (en) | 2018-04-11 | 2020-08-25 | International Business Machines Corporation | Cognitive fashion-ability score driven fashion merchandising acquisition |
US10956928B2 (en) | 2018-05-17 | 2021-03-23 | International Business Machines Corporation | Cognitive fashion product advertisement system and method |
US11538083B2 (en) | 2018-05-17 | 2022-12-27 | International Business Machines Corporation | Cognitive fashion product recommendation system, computer program product, and method |
US10963744B2 (en) * | 2018-06-27 | 2021-03-30 | International Business Machines Corporation | Cognitive automated and interactive personalized fashion designing using cognitive fashion scores and cognitive analysis of fashion trends and data |
US11151165B2 (en) * | 2018-08-30 | 2021-10-19 | Microsoft Technology Licensing, Llc | Data classification using data flow analysis |
CN110046278B (en) * | 2019-03-11 | 2021-10-15 | 北京奇艺世纪科技有限公司 | Video classification method and device, terminal equipment and storage medium |
US11488290B2 (en) | 2019-03-31 | 2022-11-01 | Cortica Ltd. | Hybrid representation of a media unit |
US12049116B2 (en) | 2020-09-30 | 2024-07-30 | Autobrains Technologies Ltd | Configuring an active suspension |
EP4194300A1 (en) | 2021-08-05 | 2023-06-14 | Autobrains Technologies LTD. | Providing a prediction of a radius of a motorcycle turn |
Citations (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5649060A (en) | 1993-10-18 | 1997-07-15 | International Business Machines Corporation | Automatic indexing and aligning of audio and text using speech recognition |
US6006241A (en) | 1997-03-14 | 1999-12-21 | Microsoft Corporation | Production of a video stream with synchronized annotations over a computer network |
EP1079313A2 (en) | 1999-08-20 | 2001-02-28 | Digitake Software Systems Limited | An audio processing system |
WO2001022729A1 (en) | 1999-09-20 | 2001-03-29 | Tivo, Inc. | Closed caption tagging system |
US6332144B1 (en) | 1998-03-11 | 2001-12-18 | Altavista Company | Technique for annotating media |
WO2002008948A2 (en) | 2000-07-24 | 2002-01-31 | Vivcom, Inc. | System and method for indexing, searching, identifying, and editing portions of electronic multimedia files |
US6418424B1 (en) | 1991-12-23 | 2002-07-09 | Steven M. Hoffberg | Ergonomic man-machine interface incorporating adaptive pattern recognition based control system |
US20020131511A1 (en) | 2000-08-25 | 2002-09-19 | Ian Zenoni | Video tags and markers |
US20020194188A1 (en) | 2001-06-14 | 2002-12-19 | Ralf Ostermann | Method and apparatus for automatically or electronically addressing data within a file or files |
US20040090462A1 (en) * | 1997-12-22 | 2004-05-13 | Ricoh Company, Ltd. | Multimedia visualization and integration environment |
US20040199494A1 (en) * | 2003-04-04 | 2004-10-07 | Nikhil Bhatt | Method and apparatus for tagging and locating audio data |
US20050055321A1 (en) * | 2000-03-06 | 2005-03-10 | Kanisa Inc. | System and method for providing an intelligent multi-step dialog with a user |
US20050131559A1 (en) * | 2002-05-30 | 2005-06-16 | Jonathan Kahn | Method for locating an audio segment within an audio file |
US7035803B1 (en) | 2000-11-03 | 2006-04-25 | At&T Corp. | Method for sending multi-media messages using customizable background images |
US20060149558A1 (en) * | 2001-07-17 | 2006-07-06 | Jonathan Kahn | Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile |
US20060242554A1 (en) * | 2005-04-25 | 2006-10-26 | Gather, Inc. | User-driven media system in a computer network |
US20070078832A1 (en) * | 2005-09-30 | 2007-04-05 | Yahoo! Inc. | Method and system for using smart tags and a recommendation engine using smart tags |
US20070106508A1 (en) * | 2003-04-29 | 2007-05-10 | Jonathan Kahn | Methods and systems for creating a second generation session file |
US20070112630A1 (en) * | 2005-11-07 | 2007-05-17 | Scanscout, Inc. | Techniques for rendering advertisments with rich media |
US20070130112A1 (en) * | 2005-06-30 | 2007-06-07 | Intelligentek Corp. | Multimedia conceptual search system and associated search method |
US20070179968A1 (en) * | 2006-02-02 | 2007-08-02 | Fish Robert D | Information registry |
US20070288514A1 (en) * | 2006-06-09 | 2007-12-13 | Ebay Inc. | System and method for keyword extraction |
US20080021963A1 (en) * | 2006-07-21 | 2008-01-24 | At&T Corp. | Content dissemination using a multi-protocol converter |
US20080040674A1 (en) * | 2006-08-09 | 2008-02-14 | Puneet K Gupta | Folksonomy-Enhanced Enterprise-Centric Collaboration and Knowledge Management System |
US20080069480A1 (en) * | 2006-09-14 | 2008-03-20 | Parham Aarabi | Method, system and computer program for interactive spatial link-based image searching, sorting and/or displaying |
US20080082416A1 (en) * | 2006-09-29 | 2008-04-03 | Kotas Paul A | Community-Based Selection of Advertisements for a Concept-Centric Electronic Marketplace |
US20080086688A1 (en) * | 2006-10-05 | 2008-04-10 | Kubj Limited | Various methods and apparatus for moving thumbnails with metadata |
US20080097970A1 (en) * | 2005-10-19 | 2008-04-24 | Fast Search And Transfer Asa | Intelligent Video Summaries in Information Access |
US20080104032A1 (en) * | 2004-09-29 | 2008-05-01 | Sarkar Pte Ltd. | Method and System for Organizing Items |
US20080114644A1 (en) * | 2006-03-03 | 2008-05-15 | Frank Martin R | Convergence Of Terms Within A Collaborative Tagging Environment |
US20080125892A1 (en) * | 2006-11-27 | 2008-05-29 | Ramsay Hoguet | Converting web content into two-dimensional cad drawings and three-dimensional cad models |
US20080154949A1 (en) * | 2006-12-26 | 2008-06-26 | Brooks David A | Method and system for social bookmarking of resources exposed in web pages that don't follow the representational state transfer architectural style (rest) |
US20080201348A1 (en) * | 2007-02-15 | 2008-08-21 | Andy Edmonds | Tag-mediated review system for electronic content |
US20080255837A1 (en) * | 2004-11-30 | 2008-10-16 | Jonathan Kahn | Method for locating an audio segment within an audio file |
US20090287674A1 (en) * | 2008-05-15 | 2009-11-19 | International Business Machines Corporation | Method for Enhancing Search and Browsing in Collaborative Tagging Systems Through Learned Tag Hierachies |
- 2007-03-13 US US11/717,266 patent/US8103646B2/en not_active Expired - Fee Related
Patent Citations (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6418424B1 (en) | 1991-12-23 | 2002-07-09 | Steven M. Hoffberg | Ergonomic man-machine interface incorporating adaptive pattern recognition based control system |
US5649060A (en) | 1993-10-18 | 1997-07-15 | International Business Machines Corporation | Automatic indexing and aligning of audio and text using speech recognition |
US6006241A (en) | 1997-03-14 | 1999-12-21 | Microsoft Corporation | Production of a video stream with synchronized annotations over a computer network |
US20040090462A1 (en) * | 1997-12-22 | 2004-05-13 | Ricoh Company, Ltd. | Multimedia visualization and integration environment |
US6332144B1 (en) | 1998-03-11 | 2001-12-18 | Altavista Company | Technique for annotating media |
EP1079313A2 (en) | 1999-08-20 | 2001-02-28 | Digitake Software Systems Limited | An audio processing system |
WO2001022729A1 (en) | 1999-09-20 | 2001-03-29 | Tivo, Inc. | Closed caption tagging system |
US20050055321A1 (en) * | 2000-03-06 | 2005-03-10 | Kanisa Inc. | System and method for providing an intelligent multi-step dialog with a user |
WO2002008948A2 (en) | 2000-07-24 | 2002-01-31 | Vivcom, Inc. | System and method for indexing, searching, identifying, and editing portions of electronic multimedia files |
US20020131511A1 (en) | 2000-08-25 | 2002-09-19 | Ian Zenoni | Video tags and markers |
US7035803B1 (en) | 2000-11-03 | 2006-04-25 | At&T Corp. | Method for sending multi-media messages using customizable background images |
US20020194188A1 (en) | 2001-06-14 | 2002-12-19 | Ralf Ostermann | Method and apparatus for automatically or electronically addressing data within a file or files |
US20060149558A1 (en) * | 2001-07-17 | 2006-07-06 | Jonathan Kahn | Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile |
US20050131559A1 (en) * | 2002-05-30 | 2005-06-16 | Jonathan Kahn | Method for locating an audio segment within an audio file |
US20040199494A1 (en) * | 2003-04-04 | 2004-10-07 | Nikhil Bhatt | Method and apparatus for tagging and locating audio data |
US20070106508A1 (en) * | 2003-04-29 | 2007-05-10 | Jonathan Kahn | Methods and systems for creating a second generation session file |
US20080104032A1 (en) * | 2004-09-29 | 2008-05-01 | Sarkar Pte Ltd. | Method and System for Organizing Items |
US20080255837A1 (en) * | 2004-11-30 | 2008-10-16 | Jonathan Kahn | Method for locating an audio segment within an audio file |
US20060242554A1 (en) * | 2005-04-25 | 2006-10-26 | Gather, Inc. | User-driven media system in a computer network |
US20070130112A1 (en) * | 2005-06-30 | 2007-06-07 | Intelligentek Corp. | Multimedia conceptual search system and associated search method |
US20070078832A1 (en) * | 2005-09-30 | 2007-04-05 | Yahoo! Inc. | Method and system for using smart tags and a recommendation engine using smart tags |
US20080097970A1 (en) * | 2005-10-19 | 2008-04-24 | Fast Search And Transfer Asa | Intelligent Video Summaries in Information Access |
US20070112630A1 (en) * | 2005-11-07 | 2007-05-17 | Scanscout, Inc. | Techniques for rendering advertisments with rich media |
US20070179968A1 (en) * | 2006-02-02 | 2007-08-02 | Fish Robert D | Information registry |
US20080114644A1 (en) * | 2006-03-03 | 2008-05-15 | Frank Martin R | Convergence Of Terms Within A Collaborative Tagging Environment |
US20070288514A1 (en) * | 2006-06-09 | 2007-12-13 | Ebay Inc. | System and method for keyword extraction |
US20080021963A1 (en) * | 2006-07-21 | 2008-01-24 | At&T Corp. | Content dissemination using a multi-protocol converter |
US20080040674A1 (en) * | 2006-08-09 | 2008-02-14 | Puneet K Gupta | Folksonomy-Enhanced Enterprise-Centric Collaboration and Knowledge Management System |
US20080069480A1 (en) * | 2006-09-14 | 2008-03-20 | Parham Aarabi | Method, system and computer program for interactive spatial link-based image searching, sorting and/or displaying |
US20080082416A1 (en) * | 2006-09-29 | 2008-04-03 | Kotas Paul A | Community-Based Selection of Advertisements for a Concept-Centric Electronic Marketplace |
US20080086688A1 (en) * | 2006-10-05 | 2008-04-10 | Kubj Limited | Various methods and apparatus for moving thumbnails with metadata |
US20080125892A1 (en) * | 2006-11-27 | 2008-05-29 | Ramsay Hoguet | Converting web content into two-dimensional cad drawings and three-dimensional cad models |
US20080154949A1 (en) * | 2006-12-26 | 2008-06-26 | Brooks David A | Method and system for social bookmarking of resources exposed in web pages that don't follow the representational state transfer architectural style (rest) |
US20080201348A1 (en) * | 2007-02-15 | 2008-08-21 | Andy Edmonds | Tag-mediated review system for electronic content |
US20090287674A1 (en) * | 2008-05-15 | 2009-11-19 | International Business Machines Corporation | Method for Enhancing Search and Browsing in Collaborative Tagging Systems Through Learned Tag Hierachies |
Non-Patent Citations (4)
Title |
---|
Adams, et al., "IBM Research TREC-2002 Video Retrieval System", http://www-24.nist.gov/projects/t2002v/results/notebook.papers/ibm.smith.pdf. |
Jaimes, et al., "Semi-automatic, data-driven construction of multimedia ontologies", Date: 2003, http://ieeexplore.ieee.org/ieI5/8655/27433/01221034.pdf?isNumber=. |
Smith, et al., "Integrating Features, Models, and Semantics for TREC Video Retrieval", http://www.scils.rutgers.edu/~muresan/IR/TREC/Proceedings/t10-proceedings/papers/IBM-TREC-VIDEO-2001.pdf. |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8458520B2 (en) * | 2008-12-01 | 2013-06-04 | Electronics And Telecommunications Research Institute | Apparatus and method for verifying training data using machine learning |
US20100138712A1 (en) * | 2008-12-01 | 2010-06-03 | Changki Lee | Apparatus and method for verifying training data using machine learning |
US20110106879A1 (en) * | 2009-10-30 | 2011-05-05 | Samsung Electronics Co., Ltd. | Apparatus and method for reproducing multimedia content |
US9355682B2 (en) * | 2009-10-30 | 2016-05-31 | Samsung Electronics Co., Ltd | Apparatus and method for separately viewing multimedia content desired by a user |
US10268760B2 (en) | 2009-10-30 | 2019-04-23 | Samsung Electronics Co., Ltd. | Apparatus and method for reproducing multimedia content successively in a broadcasting system based on one integrated metadata |
US20110307542A1 (en) * | 2010-06-10 | 2011-12-15 | Microsoft Corporation | Active Image Tagging |
US8825744B2 (en) * | 2010-06-10 | 2014-09-02 | Microsoft Corporation | Active image tagging |
US8930308B1 (en) | 2012-08-20 | 2015-01-06 | 3Play Media, Inc. | Methods and systems of associating metadata with media |
US9953062B2 (en) | 2014-08-18 | 2018-04-24 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for providing for display hierarchical views of content organization nodes associated with captured content and for determining organizational identifiers for captured content |
US9953646B2 (en) | 2014-09-02 | 2018-04-24 | Belleau Technologies | Method and system for dynamic speech recognition and tracking of prewritten script |
US10381022B1 (en) * | 2015-12-23 | 2019-08-13 | Google Llc | Audio classifier |
US10566009B1 (en) | 2015-12-23 | 2020-02-18 | Google Llc | Audio classifier |
US11217228B2 (en) * | 2016-03-22 | 2022-01-04 | Sri International | Systems and methods for speech recognition in unseen and noisy channel conditions |
US20170287500A1 (en) * | 2016-04-04 | 2017-10-05 | Honeywell International Inc. | System and method to distinguish sources in a multiple audio source environment |
US11138987B2 (en) * | 2016-04-04 | 2021-10-05 | Honeywell International Inc. | System and method to distinguish sources in a multiple audio source environment |
US10714144B2 (en) | 2017-11-06 | 2020-07-14 | International Business Machines Corporation | Corroborating video data with audio data from video content to create section tagging |
CN109886211A (en) * | 2019-02-25 | 2019-06-14 | 北京达佳互联信息技术有限公司 | Data mask method, device, electronic equipment and storage medium |
CN109886211B (en) * | 2019-02-25 | 2022-03-01 | 北京达佳互联信息技术有限公司 | Data labeling method and device, electronic equipment and storage medium |
US11551407B1 (en) | 2021-09-01 | 2023-01-10 | Design Interactive, Inc. | System and method to convert two-dimensional video into three-dimensional extended reality content |
Also Published As
Publication number | Publication date |
---|---|
US20080228749A1 (en) | 2008-09-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8103646B2 (en) | Automatic tagging of content based on a corpus of previously tagged and untagged content | |
US11394667B2 (en) | Chatbot skills systems and methods | |
US11321535B2 (en) | Hierarchical annotation of dialog acts | |
US9594826B2 (en) | Co-selected image classification | |
US8190627B2 (en) | Machine assisted query formulation | |
US9978365B2 (en) | Method and system for providing a voice interface | |
US8260809B2 (en) | Voice-based search processing | |
US20110314011A1 (en) | Automatically generating training data | |
US20080229828A1 (en) | Establishing reputation factors for publishing entities | |
US11769064B2 (en) | Onboarding of entity data | |
US11960514B1 (en) | Interactive conversation assistance using semantic search and generative AI | |
US20110295787A1 (en) | Information processing apparatus, information processing method, and program | |
RU2720074C2 (en) | Method and system for creating annotation vectors for document | |
US10860588B2 (en) | Method and computer device for determining an intent associated with a query for generating an intent-specific response | |
US11262978B1 (en) | Voice-adapted reformulation of web-based answers | |
US20200312312A1 (en) | Method and system for generating textual representation of user spoken utterance | |
US11769013B2 (en) | Machine learning based tenant-specific chatbots for performing actions in a multi-tenant system | |
US20240256599A1 (en) | Responding to queries with voice recordings | |
JP5430960B2 (en) | Content classification apparatus, method, and program | |
US20090006344A1 (en) | Mark-up ecosystem for searching | |
US20200401638A1 (en) | Method of and system for generating search query completion suggestion on search engine | |
Sun et al. | HELPR: A framework to break the barrier across domains in spoken dialog systems | |
GB2555945A (en) | Hierarchical annotation of dialog acts | |
JP2024034157A (en) | Information retrieval support apparatus, information retrieval support method, program, and recording medium | |
KR20230014680A (en) | Bit vector based content matching for 3rd party digital assistant actions | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: MICROSOFT CORPORATION, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BROWN, ROBERT I.;REEL/FRAME:019610/0567; Effective date: 20070312 |
 | ZAAA | Notice of allowance and fees due | Free format text: ORIGINAL CODE: NOA |
 | ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=. |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034542/0001; Effective date: 20141014 |
 | FPAY | Fee payment | Year of fee payment: 4 |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 8 |
 | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
 | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20240124 |