CN112000817A

CN112000817A - Multimedia resource processing method and device, electronic equipment and storage medium

Info

Publication number: CN112000817A
Application number: CN202010847843.0A
Authority: CN
Inventors: 张志伟; 杨帆
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-08-21
Filing date: 2020-08-21
Publication date: 2020-11-27
Anticipated expiration: 2040-08-21
Also published as: CN112000817B

Abstract

The disclosure relates to a multimedia resource processing method and apparatus, an electronic device, and a storage medium. A word combination corresponding to a multimedia resource is acquired, and the prediction probability and the information domain feature of each word in the word combination are acquired. The prediction probability is the probability with which the word is recognized, and the information domain feature represents the source from which the word was obtained. To express each word more accurately, the prediction probability and the information domain feature of each word are combined into a fusion feature of that word. The word weight of each word is then estimated from its fusion feature, and the words are ranked by estimated word weight to obtain a ranked word combination. Because multimedia resource search or recommendation can then take into account the importance of each word in the word combination corresponding to a multimedia resource, multimedia resources can be provided to users efficiently and accurately.

Description

Multimedia resource processing method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for processing multimedia resources, an electronic device, and a storage medium.

Background

With the development of machine learning, deep learning has been widely applied in fields such as video and image processing, speech recognition, and natural language processing. The convolutional neural network (CNN) is an important branch of deep learning; owing to its strong fitting capability and end-to-end global optimization capability, applying CNNs has greatly improved the prediction accuracy of image classification models on video files.

In the related art, video files are analyzed by image classification models, natural language processing algorithms, speech recognition algorithms, and the like, to obtain text data of the video file in each of these information domains. The text data in these information domains can be used to generate video tags for the video files.

However, in actual business scenarios such as video search, the video files retrieved through video tags in the conventional art often match the user's expectations poorly.

Disclosure of Invention

The present disclosure provides a multimedia resource processing method and apparatus, an electronic device, and a storage medium, to at least solve the problem in the related art that video files retrieved through video tags match user expectations poorly. The technical solutions of the disclosure are as follows:

According to a first aspect of the embodiments of the present disclosure, there is provided a multimedia resource processing method, including:

acquiring a word combination corresponding to a multimedia resource, where the word combination includes a plurality of words;

acquiring a prediction probability of each word and an information domain feature of each word, where the prediction probability is the probability with which the word is recognized from the multimedia resource, and the information domain feature represents the source from which the word was obtained;

for each word, combining its prediction probability and its information domain feature to obtain a fusion feature of the word;

and estimating a word weight of each word according to its fusion feature, and ranking the words by estimated word weight to obtain a ranked word combination.

In one embodiment, the word combination corresponding to the multimedia resource is generated by:

acquiring text information corresponding to the multimedia resource, where the text information includes text in a document format and text in a tag format;

performing word segmentation on the text information in the document format to obtain the words corresponding to it;

and generating the word combination corresponding to the multimedia resource from the words corresponding to the document-format text information and the tag-format text information.

In one embodiment, generating the word combination corresponding to the multimedia resource from the words corresponding to the document-format text information and the tag-format text information includes:

merging and de-duplicating the words corresponding to the document-format text information and the tag-format text information to obtain the word combination corresponding to the multimedia resource.
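The generation steps above can be illustrated with a minimal Python sketch. It assumes the document-format text arrives as plain strings and the tag-format text as a list of tags; the regex-based `segment` function is a hypothetical placeholder for a real word segmenter, and keeping first-seen order is a choice made only for readability, since the disclosure does not fix an order at this stage.

```python
import re


def segment(text: str) -> list[str]:
    # Hypothetical stand-in for a real word segmenter: splits on runs of
    # non-word characters. Chinese text would need a dedicated segmenter.
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def build_word_combination(document_texts: list[str], tags: list[str]) -> list[str]:
    """Merge segmented document-format text with tag-format text, then de-duplicate."""
    words: list[str] = []
    for text in document_texts:
        words.extend(segment(text))
    # Tag-format text is already word-level, so it is used without segmentation.
    words.extend(tag.lower() for tag in tags)
    # dict.fromkeys drops repeats while keeping first-seen order.
    return list(dict.fromkeys(words))


combo = build_word_combination(["Cat playing piano", "a cat at home"],
                               ["cat", "piano", "funny"])
print(combo)  # ['cat', 'playing', 'piano', 'a', 'at', 'home', 'funny']
```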

In one embodiment, acquiring the prediction probability of each word and the information domain feature of each word includes:

performing word recognition on the multimedia resource to obtain the prediction probability with which each word is recognized from the multimedia resource;

and numerically encoding the information domain corresponding to the source of each word to obtain the information domain feature of the word.
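One possible numeric representation of the information domain is a one-hot vector, sketched below. The domain names in `INFO_DOMAINS` and the choice of one-hot encoding are assumptions made for illustration; the disclosure only requires that the source of each word be represented numerically.

```python
# Hypothetical list of information domains; the disclosure mentions sources such as
# image classification, speech recognition, NLP and user-provided text, but does not
# fix the list or the encoding.
INFO_DOMAINS = (
    "image_classification",
    "speech_recognition",
    "nlp",
    "user_comment",
    "user_description",
    "user_tag",
)


def encode_info_domain(domain: str) -> list[float]:
    """One-hot encode the information domain (source) a word was obtained from."""
    if domain not in INFO_DOMAINS:
        raise ValueError(f"unknown information domain: {domain!r}")
    return [1.0 if d == domain else 0.0 for d in INFO_DOMAINS]


print(encode_info_domain("speech_recognition"))  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```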

In one embodiment, estimating the word weight of each word according to its fusion feature and ranking the words by estimated word weight to obtain a ranked word combination includes:

inputting the fusion features of the words into a ranking learning model, estimating a score for each fusion feature through the ranking learning model, and ranking the words by estimated score to obtain the ranked word combination, where the estimated score represents the word weight.
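A minimal sketch of the scoring-and-ranking step, assuming the trained ranking learning model can be called as a function that maps a fusion feature to a score. `linear_score` and its weights are hypothetical stand-ins, not the model described in the disclosure.

```python
from typing import Callable, Sequence


def rank_words(
    words: Sequence[str],
    fusion_features: Sequence[Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
) -> list[tuple[str, float]]:
    """Score each word's fusion feature and sort by score, highest first."""
    if len(words) != len(fusion_features):
        raise ValueError("each word needs exactly one fusion feature")
    scored = [(w, score_fn(f)) for w, f in zip(words, fusion_features)]
    # The estimated score stands for the word weight, so sort in descending order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical linear scorer standing in for a trained ranking learning model.
WEIGHTS = [2.0, 0.5, 1.0]


def linear_score(features: Sequence[float]) -> float:
    return sum(w * x for w, x in zip(WEIGHTS, features))


ranked = rank_words(
    ["cat", "piano", "home"],
    [[0.9, 1.0, 0.0], [0.6, 0.0, 1.0], [0.2, 1.0, 0.0]],
    linear_score,
)
print([w for w, _ in ranked])  # ['cat', 'piano', 'home']
```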

In one embodiment, the ranking learning model is generated by:

constructing a training sample set, where each training sample includes the sample word combination corresponding to a sample multimedia resource and the fusion feature of each sample word in the sample word combination, the sample words in the sample word combination being ordered by their word weights;

and iteratively training an initial ranking learning model on the training sample set until a convergence condition is met, to obtain the ranking learning model.
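The disclosure does not name a specific ranking learning algorithm or convergence condition. As one illustrative choice, the sketch below trains a linear scorer with a pairwise logistic loss (in the style of RankNet) by gradient descent, and treats "the mean loss improved by less than `tol`" as the convergence condition. All names, hyperparameters and sample data are hypothetical.

```python
import math

# Each sample is a sample word combination given as the fusion features of its
# words, already ordered from highest to lowest word weight.
Sample = list[list[float]]


def pairwise_loss_and_grad(w: list[float], samples: list[Sample]):
    """Mean pairwise logistic loss and its gradient for a linear scorer s(x) = w . x."""
    loss, grad, n_pairs = 0.0, [0.0] * len(w), 0
    for feats in samples:
        for i in range(len(feats)):
            for j in range(i + 1, len(feats)):
                # Word i outranks word j, so we want w . (x_i - x_j) > 0.
                diff = [a - b for a, b in zip(feats[i], feats[j])]
                margin = sum(wk * dk for wk, dk in zip(w, diff))
                loss += math.log1p(math.exp(-margin))
                coef = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
                for k, dk in enumerate(diff):
                    grad[k] += coef * dk
                n_pairs += 1
    if n_pairs == 0:
        return 0.0, grad
    return loss / n_pairs, [g / n_pairs for g in grad]


def train_ranker(samples: list[Sample], dim: int, lr: float = 0.5,
                 tol: float = 1e-6, max_iter: int = 10_000) -> list[float]:
    """Gradient descent that stops once the loss improves by less than tol."""
    w, prev_loss = [0.0] * dim, float("inf")
    for _ in range(max_iter):
        loss, grad = pairwise_loss_and_grad(w, samples)
        if prev_loss - loss < tol:  # convergence condition
            break
        prev_loss = loss
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


# Hypothetical fusion features: [prediction probability, is-user-tag flag].
samples = [
    [[0.9, 1.0], [0.7, 0.0], [0.3, 0.0]],
    [[0.8, 1.0], [0.5, 0.0]],
]
w = train_ranker(samples, dim=2)
```

A pairwise loss suits the training data described above, because each sample word combination only fixes an order among its words rather than absolute target scores. The sketch is not numerically hardened: very large margins would make `math.exp` overflow.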

In one embodiment, constructing the training sample set includes:

acquiring original word combinations corresponding to a plurality of sample multimedia resources;

acquiring the word weight of each sample word in each original word combination;

for each original word combination, ranking the sample words by their word weights in the original word combination to obtain the sample word combination;

for each sample word in each original word combination, acquiring the prediction probability and the information domain feature of the sample word, and combining them to obtain the fusion feature of the sample word;

and constructing the training sample set from the sample word combinations corresponding to the sample multimedia resources and the fusion features of the sample words in the original word combinations.

In one embodiment, acquiring the word weight of each sample word in each original word combination includes:

for each original word combination, inputting the original word combination into a document topic generation model and detecting the word weight of each sample word through the document topic generation model to obtain a word-weight pair for each sample word, where a word-weight pair consists of one sample word and its word weight.
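The disclosure does not say how the document topic generation model turns its output into word weights. Assuming an LDA-style topic model, one common choice is to weight each word by its probability under the document's topic mixture, weight(w) = sum over topics k of p(k | d) * p(w | k). The sketch below computes word-weight pairs that way from hypothetical, hand-written distributions; in practice they would come from a fitted topic model (for example, gensim's `LdaModel`).

```python
def topic_word_weights(doc_topic: list[float],
                       topic_word: list[dict[str, float]],
                       words: list[str]) -> list[tuple[str, float]]:
    """Word-weight pairs with weight(w) = sum_k p(k | d) * p(w | k)."""
    return [
        (word, sum(p_k * topic_word[k].get(word, 0.0) for k, p_k in enumerate(doc_topic)))
        for word in words
    ]


# Hypothetical distributions, as a topic model might estimate them.
doc_topic = [0.7, 0.3]  # p(topic | original word combination)
topic_word = [          # p(word | topic), truncated to the words of interest
    {"cat": 0.5, "piano": 0.1, "home": 0.2},
    {"cat": 0.1, "piano": 0.6, "home": 0.1},
]
pairs = topic_word_weights(doc_topic, topic_word, ["cat", "piano", "home"])
print([(w, round(s, 2)) for w, s in pairs])  # [('cat', 0.38), ('piano', 0.25), ('home', 0.17)]
```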

In one embodiment, after the word weight of each sample word is detected through the document topic generation model to obtain the word-weight pair for each sample word, the method further includes:

comparing the word weight of each sample word with a preset word weight threshold to obtain target sample words whose word weights are greater than the threshold;

and the step of ranking the sample words by their word weights in the original word combination to obtain the sample word combination includes:

ranking the target sample words by their word weights to obtain the sample word combination.
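The thresholding and ranking steps can be sketched as follows. The threshold value and the word-weight pairs are hypothetical; the comparison is strict, following the "greater than" wording above.

```python
def build_sample_word_combination(word_weight_pairs: list[tuple[str, float]],
                                  threshold: float) -> list[str]:
    """Keep words whose weight exceeds the threshold, ordered by weight, highest first."""
    # Strictly greater than, matching "greater than the word weight threshold".
    targets = [(w, s) for w, s in word_weight_pairs if s > threshold]
    targets.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in targets]


pairs = [("home", 0.17), ("cat", 0.38), ("piano", 0.25)]
print(build_sample_word_combination(pairs, threshold=0.2))  # ['cat', 'piano']
```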

According to a second aspect of the embodiments of the present disclosure, there is provided a multimedia resource processing apparatus, including:

a word combination acquisition module, configured to acquire a word combination corresponding to a multimedia resource, where the word combination includes a plurality of words;

a probability feature acquisition module, configured to acquire a prediction probability of each word and an information domain feature of each word, where the prediction probability is the probability with which the word is recognized from the multimedia resource, and the information domain feature represents the source from which the word was obtained;

a probability feature merging module, configured to combine, for each word, its prediction probability and its information domain feature to obtain a fusion feature of the word;

and a word ranking module, configured to estimate the word weight of each word according to its fusion feature and rank the words by estimated word weight to obtain a ranked word combination.

In one embodiment, the processing apparatus further includes a word combination generation module, which includes a text information acquisition unit, a word segmentation unit, and a word combination generation unit;

the text information acquisition unit is configured to acquire text information corresponding to the multimedia resource, where the text information includes text in a document format and text in a tag format;

the word segmentation unit is configured to perform word segmentation on the document-format text information to obtain the words corresponding to it;

the word combination generation unit is configured to generate the word combination corresponding to the multimedia resource from the words corresponding to the document-format text information and the tag-format text information.

In one embodiment, the word combination generation unit is further configured to merge and de-duplicate the words corresponding to the document-format text information and the tag-format text information to obtain the word combination corresponding to the multimedia resource.

In one embodiment, the probability feature acquisition module is further configured to perform word recognition on the multimedia resource to obtain the prediction probability with which each word is recognized from it, and to numerically encode the information domain corresponding to the source of each word to obtain the information domain feature of the word.

In one embodiment, the word ranking module is further configured to input the fusion features of the words into a ranking learning model, estimate a score for each fusion feature through the ranking learning model, and rank the words by estimated score to obtain the ranked word combination, where the estimated score represents the word weight.

In one embodiment, the processing apparatus further includes a ranking learning model generation module, which includes a sample set construction unit and a model training unit;

the sample set construction unit is configured to construct a training sample set, where each training sample includes the sample word combination corresponding to a sample multimedia resource and the fusion feature of each sample word in the sample word combination, the sample words being ordered by their word weights;

the model training unit is configured to iteratively train an initial ranking learning model on the training sample set until a convergence condition is met, to obtain the ranking learning model.

In one embodiment, the sample set construction unit includes an original word combination acquisition subunit, a word weight acquisition subunit, a sample word combination acquisition subunit, a fusion feature acquisition subunit, and a sample set construction subunit;

the original word combination acquisition subunit is configured to acquire original word combinations corresponding to a plurality of sample multimedia resources;

the word weight acquisition subunit is configured to acquire the word weight of each sample word in each original word combination;

the sample word combination acquisition subunit is configured to rank, for each original word combination, the sample words by their word weights in the original word combination to obtain the sample word combination;

the fusion feature acquisition subunit is configured to acquire, for each sample word in each original word combination, the prediction probability and the information domain feature of the sample word, and combine them to obtain the fusion feature of the sample word;

the sample set construction subunit is configured to construct the training sample set from the sample word combinations corresponding to the plurality of sample multimedia resources and the fusion features of the sample words in the original word combinations.

In one embodiment, the word weight acquisition subunit is further configured to input, for each original word combination, the original word combination into a document topic generation model and detect the word weight of each sample word through the document topic generation model to obtain a word-weight pair for each sample word, where a word-weight pair consists of one sample word and its word weight.

In one embodiment, the sample set construction unit further includes a target sample word acquisition subunit;

the target sample word acquisition subunit is configured to compare the word weight of each sample word with a preset word weight threshold to obtain target sample words whose word weights are greater than the threshold;

the sample word combination acquisition subunit is further configured to rank the target sample words by their word weights to obtain the sample word combination.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including:

a processor;

a memory for storing instructions executable by the processor;

where the processor is configured to execute the instructions to implement the multimedia resource processing method described in any embodiment of the first aspect.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the multimedia resource processing method described in any embodiment of the first aspect.

According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product including a computer program stored in a readable storage medium; at least one processor of a device reads the computer program from the readable storage medium and executes it, so that the device performs the multimedia resource processing method described in any embodiment of the first aspect.

The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:

A word combination corresponding to a multimedia resource is acquired, together with the prediction probability and the information domain feature of each word in the word combination, where the prediction probability is the probability with which the word is recognized and the information domain feature represents the source of the word. To express each word more accurately, the prediction probability and the information domain feature of each word are combined into a fusion feature; the word weight of each word is estimated from its fusion feature, and the words are ranked by estimated word weight to obtain a ranked word combination. Multimedia resource search or recommendation can therefore take into account the importance of each word in the word combination corresponding to a multimedia resource, so that multimedia resources are provided to users efficiently and accurately. This addresses the technical problem in the conventional art that the multimedia resources provided to users match their expectations poorly, increases the click-through rate of multimedia resources, and reduces the user's operating cost.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

Fig. 1 is a diagram of an application environment of a multimedia resource processing method according to an exemplary embodiment.

Fig. 2 is a flowchart of a multimedia resource processing method according to an exemplary embodiment.

Fig. 3 is a flowchart of generating the word combination corresponding to a multimedia resource according to an exemplary embodiment.

Fig. 4 is a flowchart of generating a ranking learning model according to an exemplary embodiment.

Fig. 5 is a flowchart of constructing a training sample set according to an exemplary embodiment.

Fig. 6 is a flowchart of constructing a training sample set according to an exemplary embodiment.

Fig. 7 is a flowchart of generating a ranking learning model according to an exemplary embodiment.

Fig. 8 is a flowchart of a multimedia resource processing method according to an exemplary embodiment.

Fig. 9 is a block diagram of a multimedia resource processing apparatus according to an exemplary embodiment.

Fig. 10 is an internal structural diagram of an electronic device according to an exemplary embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The multimedia resource processing method provided by the present disclosure can be applied in the application environment shown in Fig. 1, which includes a terminal 110, a first electronic device 120, and a second electronic device 130. The first electronic device 120 and the second electronic device 130 are electronic devices with strong data storage and computing capabilities; for example, each may be a personal computer (PC) or a server, and may be implemented as an independent server or as a server cluster composed of multiple servers. The terminal 110 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device.

The second electronic device 130 may be used to build a ranking learning model and to train it. The second electronic device 130 acquires original word combinations corresponding to a plurality of sample multimedia resources; acquires the word weight of each sample word in each original word combination; for each original word combination, ranks the sample words by their word weights to obtain a sample word combination; for each sample word in each original word combination, acquires the prediction probability and the information domain feature of the sample word and combines them into the fusion feature of the sample word; and constructs a training sample set from the sample word combinations corresponding to the sample multimedia resources and the fusion features of the sample words in the original word combinations. It then iteratively trains an initial ranking learning model on the training sample set until a convergence condition is met, to obtain the ranking learning model.

An application supporting a search function, which may be at least one of a browser, a social application, a live-streaming application, a shopping application, or a payment application, is installed and runs on the terminal 110. The terminal 110 receives a search request entered by a user, the search request carrying search terms, and sends it to the first electronic device 120, which searches for the corresponding multimedia resources according to the received search terms. The first electronic device 120 may acquire text information corresponding to a multimedia resource, where the text information includes text in a document format and text in a tag format; perform word segmentation on the document-format text information to obtain the corresponding words; and generate the word combination corresponding to the multimedia resource from those words and the tag-format text information. The first electronic device 120 then acquires the word combination corresponding to the multimedia resource, which includes a plurality of words; acquires the prediction probability and the information domain feature of each word, where the prediction probability is the probability with which the word is recognized from the multimedia resource and the information domain feature represents the source of the word; combines, for each word, its prediction probability and information domain feature into a fusion feature; and estimates the word weight of each word from its fusion feature and ranks the words by estimated word weight to obtain a ranked word combination.

Fig. 2 is a flowchart of a multimedia resource processing method according to an exemplary embodiment. As shown in Fig. 2, the method is used in the terminal 110 or the first electronic device 120 in the application environment shown in Fig. 1 and includes the following steps:

In step S210, a word combination corresponding to the multimedia resource is acquired.

The multimedia resource may be at least one of a text resource, a video resource, an audio resource, a picture resource, or a web page resource. On the one hand, analyzing the multimedia resource with machine learning methods such as image classification and speech recognition yields the text information it carries; on the other hand, users may describe the multimedia resource in various ways, such as user comments, user descriptions, and tags that users have attached to it. Analyzing both kinds of information yields a plurality of words, which form the word combination corresponding to the multimedia resource. Specifically, the word combination may be obtained by analyzing the rich information carried by the multimedia resource at the time the resource is processed. Alternatively, that information may be analyzed in advance and the resulting word combination stored locally on the first electronic device or on a server connected to the first electronic device over a network, from which it is retrieved when the multimedia resource is processed.

In step S220, the prediction probability of each word and the information domain feature of each word are acquired.

The prediction probability is the probability with which each word is recognized from the multimedia resource. As described above, when the multimedia resource is analyzed with machine learning methods such as image classification and speech recognition to obtain words describing it, the likelihood that the machine learning method assigns to the presence of a word is its prediction probability. If a word describing the multimedia resource comes from user behavior, such as a tag added by a user or text entered by a user, the word is certain to exist, so its prediction probability is 1. Whether a word in the word combination was recognized by machine learning or determined from user behavior constitutes the source of the word, and the information domain feature represents that source. Specifically, when the multimedia resource is processed, each word in its word combination is recognized and the source of each word is analyzed, yielding the prediction probability and the information domain feature of each word. Alternatively, the words and their sources may be analyzed in advance and the resulting prediction probabilities and information domain features stored locally on the first electronic device or on a server connected to it over a network, from which they are retrieved when needed.
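The rule for assigning prediction probabilities described above can be sketched as follows. The domain names and the `WordSource` record are hypothetical; the only behavior taken from the text is that words from user behavior receive probability 1 while model-recognized words keep the model's own probability.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of user-behavior domains whose words are certain to exist.
USER_BEHAVIOR_DOMAINS = {"user_tag", "user_comment", "user_description"}


@dataclass
class WordSource:
    word: str
    domain: str
    model_confidence: Optional[float] = None  # only set for model-recognized words


def prediction_probability(src: WordSource) -> float:
    if src.domain in USER_BEHAVIOR_DOMAINS:
        return 1.0  # words produced by user behavior are certain to exist
    if src.model_confidence is None:
        raise ValueError(f"no model confidence for {src.word!r} from {src.domain!r}")
    return src.model_confidence


print(prediction_probability(WordSource("cat", "user_tag")))                    # 1.0
print(prediction_probability(WordSource("piano", "speech_recognition", 0.83)))  # 0.83
```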

In step S230, for each word, the corresponding prediction probability and the corresponding information domain feature are combined to obtain a fusion feature of each word.

Specifically, when the multimedia resource is processed, after the prediction probability and the information domain feature of each word are acquired, the two are combined for each word so as to express it more accurately. For example, the information domain feature of the word may be appended after its prediction probability to generate the fusion feature of the word; alternatively, the prediction probability may be appended after the information domain feature. Repeating this process for every word yields the fusion feature of each word. Because the fusion feature contains both the prediction probability and the information domain feature of the word, it expresses the word better than either part alone.
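Combining the two parts by concatenation, in either order, can be sketched as below. The sample values are hypothetical. Whichever order is chosen, using it consistently keeps every fusion feature in the same layout for the downstream model.

```python
def fuse(pred_probs: list[float], domain_feature: list[float],
         probs_first: bool = True) -> list[float]:
    """Concatenate prediction probabilities and an information domain feature."""
    if probs_first:
        return list(pred_probs) + list(domain_feature)
    return list(domain_feature) + list(pred_probs)


print(fuse([0.83], [0.0, 1.0, 0.0]))                     # [0.83, 0.0, 1.0, 0.0]
print(fuse([0.83], [0.0, 1.0, 0.0], probs_first=False))  # [0.0, 1.0, 0.0, 0.83]
```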

Illustratively, a word may be recognized in several ways (that is, through several sources), which may include several machine learning models or user-provided descriptions of the multimedia resource. Each way yields one prediction probability for the word, so a word recognized in several ways has the same number of prediction probabilities; these prediction probabilities are combined with the information domain features representing the word's sources to obtain the fusion feature of the word.
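When a word is recognized through several sources, one way to keep the fusion feature at a fixed length is to reserve one probability slot and one indicator slot per known source, filling unused slots with 0. This fixed-slot layout and the source names are assumptions for illustration only.

```python
# Hypothetical fixed list of sources; each one gets one probability slot and one
# indicator slot so that every word's fusion feature has the same length.
SOURCES = ("image_classification", "speech_recognition", "nlp", "user_tag")


def multi_source_fusion(detections: dict[str, float]) -> list[float]:
    """detections maps each source that produced the word to its prediction probability."""
    unknown = set(detections) - set(SOURCES)
    if unknown:
        raise ValueError(f"unknown sources: {sorted(unknown)}")
    probs = [detections.get(s, 0.0) for s in SOURCES]                # one probability per source
    indicators = [1.0 if s in detections else 0.0 for s in SOURCES]  # multi-hot source feature
    return probs + indicators


print(multi_source_fusion({"image_classification": 0.9, "user_tag": 1.0}))
# [0.9, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
```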

In step S240, the word weight of each word is estimated according to the fusion feature of each word, and the words are sorted according to the estimated word weight, so as to obtain a sorted word combination.

The word weight represents how important a word is for accurately expressing the central theme of the multimedia resource. When multimedia resources are searched for or recommended, the importance of each word in the word combination corresponding to each multimedia resource needs to be taken into account, so that multimedia resources can be provided to the user efficiently and accurately, the click rate of the multimedia resources is improved, and the user's operation cost is reduced. Because the fusion feature contains both the prediction probability and the information domain feature of the word, the importance of each word is analyzed on the basis of its fusion feature, and its word weight is estimated. The words in the word combination can then be arranged in descending order of estimated word weight to obtain the sorted word combination.
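Arranging the words in descending order of estimated word weight is an ordinary descending sort. A minimal sketch, assuming the estimated weights are already available as a word-to-weight mapping (the function name `sort_by_weight` is hypothetical):

```python
from typing import Dict, List


def sort_by_weight(word_weights: Dict[str, float]) -> List[str]:
    """Return the words ordered from the largest to the smallest estimated weight."""
    # sorted() stays stable with reverse=True, so words with equal weights
    # keep their original input order.
    ranked = sorted(word_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked]
```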

In the above method for processing multimedia resources, the word combination corresponding to a multimedia resource is obtained; the prediction probability and the information domain feature of each word in the word combination are acquired; to express each word more accurately, the prediction probability and the information domain feature of each word are combined into the fusion feature of that word; and the word weight of each word is estimated from its fusion feature, after which the words are sorted by the estimated word weights to obtain a sorted word combination. Multimedia resources can then be searched for or recommended according to the importance of each word in their word combinations, so that they are provided to users efficiently and accurately. This addresses the technical problem in conventional approaches that the multimedia resource files provided to users do not match user expectations well, thereby improving the click rate of multimedia resources and reducing the user's operation cost.

In an exemplary embodiment, as shown in fig. 3, the word combination corresponding to a multimedia resource may be generated through the following steps:

In step S310, text information corresponding to the multimedia resource is acquired.

The text information comes in two formats: a document format and a tag format. On the one hand, text information carried by the multimedia resource can be obtained by analyzing the resource with machine learning techniques such as image classification and speech recognition; text information obtained in this way is stored in the document format. On the other hand, users describe the multimedia resource in various ways, such as comments, descriptions, and tags they attach to the resource; text information obtained from such user behavior is stored in the tag format. Specifically, the rich information carried by the multimedia resource may be analyzed at processing time to obtain its text information. Alternatively, the rich information may be analyzed in advance and the resulting text information stored locally on the first electronic device or on a server connected to the first electronic device over a network; when the multimedia resource is processed, its text information is then retrieved from that location.

In step S320, word segmentation is performed on the text information in the document format to obtain words corresponding to the text information in the document format.

Because tag-format text information comes from users' descriptions of the multimedia resource and describes the resource fairly accurately, it is not segmented, so that its integrity is preserved. Document-format text information, by contrast, is produced by machine learning models and needs further processing to improve its accuracy. Specifically, a word segmentation tool (e.g., jieba) is deployed on the terminal 110 or the first electronic device 120 and is used to segment the document-format text information into a plurality of words. The segmentation result is then filtered, for example by removing stop words. Removing stop words improves the precision of the word combination corresponding to the multimedia resource and reduces interference from useless information.
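The segmentation-and-filtering step can be sketched as below. This is an illustration under stated assumptions, not the disclosure's implementation: the segmenter is passed in as a callable, defaulting to a plain whitespace split as a placeholder; for Chinese text a real segmenter such as jieba's `lcut` would be supplied instead. The function name and stop-word set are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Set


def segment_and_filter(
    text: str,
    stop_words: Set[str],
    segmenter: Optional[Callable[[str], Iterable[str]]] = None,
) -> List[str]:
    """Segment document-format text into words and drop stop words."""
    # Placeholder segmenter: whitespace split. A Chinese word segmenter
    # would be passed in here for real document-format text.
    seg = segmenter if segmenter is not None else str.split
    words = (w.strip() for w in seg(text))
    # Drop empty tokens and stop words; keep the remaining order.
    return [w for w in words if w and w not in stop_words]
```

Tag-format text is deliberately not passed through this function, matching the text above, which keeps user-provided tags whole.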

In step S330, a word combination corresponding to the multimedia resource is generated according to the word corresponding to the text information in the document format and the text information in the tag format.

Specifically, to express the information carried by the multimedia resource completely, the words obtained from the document-format text information and the tag-format text information are pooled together, and together they form the word combination corresponding to the multimedia resource.

In the above method for generating the word combination corresponding to a multimedia resource, both the document-format and the tag-format text information of the resource are acquired, and each format is handled in a way suited to its characteristics: the document-format text information is segmented into words, while the tag-format text information is kept whole. The word combination generated from the document-derived words and the tag-format text information can therefore express the multimedia resource accurately and completely.

In an exemplary embodiment, step S330 (generating the word combination corresponding to the multimedia resource according to the words corresponding to the document-format text information and the tag-format text information) may specifically be implemented as follows: the words corresponding to the document-format text information and the tag-format text information are fused and deduplicated to obtain the word combination corresponding to the multimedia resource.

Specifically, on the one hand, the tag-format and document-format text information may partly duplicate each other; on the other hand, the two describe the multimedia resource in different ways. Therefore, to reduce the redundancy of the description while keeping it complete, the words from the document-format text information and the tag-format text information are pooled, and any item that appears in both is kept only once, yielding the word combination corresponding to the multimedia resource.
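Fusion with deduplication can be sketched as an order-preserving union. The sketch assumes exact string matching is what counts as a duplicate and, arbitrarily, lists tag-format entries before document-derived words; neither choice is specified in the text, and the function name is hypothetical.

```python
from typing import Iterable, List


def build_word_combination(doc_words: Iterable[str], tags: Iterable[str]) -> List[str]:
    """Pool document-derived words and tag-format texts, keeping each item once."""
    seen = set()
    combo: List[str] = []
    # Tags first, then document words; the first occurrence of each string wins.
    for item in list(tags) + list(doc_words):
        if item not in seen:
            seen.add(item)
            combo.append(item)
    return combo
```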

In this embodiment, the words corresponding to the document-format text information and the tag-format text information are fused and deduplicated to obtain the word combination corresponding to the multimedia resource, so that the information carried by the multimedia resource is expressed completely and concisely.

In an exemplary embodiment, step S220 (acquiring the prediction probability and the information domain feature of each word) may specifically be implemented as follows: word recognition is performed on each multimedia resource to obtain the prediction probability with which each word is recognized from the multimedia resource; and the information domain corresponding to the source way of each word is represented numerically to obtain the information domain feature of that word.

Specifically, the multimedia resource is recognized by any one or more of image classification, optical character recognition (OCR), and speech recognition to obtain the prediction probability of each word; accordingly, the information domain may include any one or more of image classification, OCR, and speech recognition. The information domain corresponding to the source way of each word is then represented numerically to obtain the information domain feature of the word. For example, suppose the multimedia resource file is a video of the game Honor of Kings, an image classification model recognizes "Honor of Kings" with a probability of 0.9, and speech recognition recognizes it with a probability of 0.8. The information domains can be fixed in advance, here in the order (image classification, OCR, speech recognition), where 1 indicates that the word was recognized through that source way and 0 indicates that it was not. Since "Honor of Kings" was recognized by image classification and speech recognition but not by OCR, its information domain feature is (1, 0, 1).
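The numerical representation in this example is a multi-hot encoding over a fixed list of information domains. A minimal sketch, assuming the domain order (image classification, OCR, speech recognition) used in the example above; the domain labels and the function name are hypothetical:

```python
from typing import Iterable, List, Sequence

# Assumed fixed order of information domains, as in the example above.
DOMAINS: Sequence[str] = ("image_classification", "ocr", "speech_recognition")


def domain_feature(recognized_by: Iterable[str]) -> List[int]:
    """Multi-hot encoding: 1 if the word was recognized via that domain, else 0."""
    hits = set(recognized_by)
    unknown = hits - set(DOMAINS)
    if unknown:
        # Reject labels outside the agreed domain list instead of silently dropping them.
        raise ValueError(f"unknown information domain(s): {sorted(unknown)}")
    return [1 if d in hits else 0 for d in DOMAINS]
```

For "Honor of Kings" in the example, `domain_feature(["image_classification", "speech_recognition"])` gives `[1, 0, 1]`.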

In the above embodiment, word recognition is performed on each multimedia resource to obtain the prediction probability of each word, and the information domain corresponding to the source way of each word is represented numerically to obtain its information domain feature. Each word in the word combination is thus described from two aspects, its prediction probability and its source ways, so that it is expressed completely and accurately, providing a data basis for the subsequent ranking of the words.

In an exemplary embodiment, step S240 (estimating the word weight of each word according to its fusion feature and ranking the words by the estimated word weights to obtain a ranked word combination) may specifically be implemented as follows: the fusion features of all words are input into a ranking learning model, which estimates a score from each word's fusion feature, and the words are ranked by the estimated scores to obtain the ranked word combination.

The ranking learning model (a learning-to-rank model) is a machine learning model used to rank the words in the word combination corresponding to a multimedia resource. Specifically, the fusion feature of each word is input into the ranking learning model, which has an associated scoring function F. The scoring function F scores each word according to its fusion feature to obtain the estimated score of that word, and the estimated score represents the word's weight. The ranking learning model also has an associated ranking system, which ranks the words according to their estimated scores to obtain the ranked word combination.
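The disclosure does not fix the form of the scoring function F. The sketch below assumes, purely for illustration, a linear F(x) = w · x + b over the fusion feature, with the weights `w` and bias `b` coming from training; `score` and `rank_words` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple


def score(weights: Sequence[float], bias: float, feature: Sequence[float]) -> float:
    """Hypothetical linear scoring function F(x) = w . x + b."""
    if len(weights) != len(feature):
        raise ValueError("fusion feature length does not match the model weights")
    return sum(w * x for w, x in zip(weights, feature)) + bias


def rank_words(weights: Sequence[float], bias: float,
               fused: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score each word's fusion feature with F, then order words by descending score."""
    scored = [(word, score(weights, bias, feat)) for word, feat in fused.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

A linear F keeps the example small; a learning-to-rank system could equally use a tree ensemble or a neural network as F, and only the scoring function would change.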

In the above embodiment, the ranking learning model estimates a score from the fusion feature of each word, and the words are ranked by the estimated scores to obtain the ranked word combination. This orders the words in the word combination of a multimedia resource by importance, so that multimedia resources matching user expectations can be provided efficiently and accurately during multimedia resource search or recommendation.

In an exemplary embodiment, as shown in fig. 4, the ranking learning model may be generated through the following steps:

In step S410, a training sample set is constructed.

Each training sample includes the sample word combination corresponding to one sample multimedia resource and the fusion feature of each sample word in that combination, where the sample words in the sample word combination are ordered by their word weights. Specifically, a plurality of tuples are constructed from the sample word combinations of a plurality of sample multimedia resources and the fusion features of the sample words in each combination; each tuple contains the sample word combination of one sample multimedia resource together with the fusion features of its sample words. Each tuple is one training sample, and together the tuples form the training sample set.

In step S420, the initial ranking learning model is trained iteratively on the training sample set until a convergence condition is satisfied, yielding the ranking learning model.

Specifically, when the initial ranking learning model is trained on one training sample, a predicted value is obtained for that sample. A loss function value of the model is determined from the predicted value and the order of the sample words in the sample word combination, and the model is adjusted accordingly. The adjusted model is then trained on another training sample to obtain another predicted value, and its loss function value is calculated again. These steps are repeated, training the ranking learning model iteratively until a convergence condition is met, which yields the ranking learning model.
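The disclosure names neither the loss function nor the convergence condition. As one hedged illustration of the loop described above, the sketch below trains a linear scorer with a pairwise logistic loss (the RankNet-style choice, swapped in here as an assumption): for every pair of sample words in which the label order puts word i before word j, it penalises log(1 + exp(-(s_i - s_j))). Training stops when the epoch loss changes by less than `tol`, an assumed convergence condition. All names are hypothetical.

```python
import math
from typing import List, Sequence

# One training sample: fusion features of the sample words, listed in label
# order (index 0 = largest word weight in the sample word combination).
Sample = List[Sequence[float]]


def neg_sigmoid_of_neg(m: float) -> float:
    """Return -sigmoid(-m) = -1 / (1 + exp(m)) without overflowing."""
    if m >= 0:
        z = math.exp(-m)
        return -z / (1.0 + z)
    return -1.0 / (1.0 + math.exp(m))


def train_pairwise(samples: List[Sample], dim: int, lr: float = 0.1,
                   max_epochs: int = 1000, tol: float = 1e-6) -> List[float]:
    """Fit a linear scorer s(x) = w . x with a pairwise logistic loss."""
    w = [0.0] * dim
    prev_loss = math.inf
    for _ in range(max_epochs):
        total_loss = 0.0
        for ordered in samples:
            # Every pair (i, j) with i < j should end up with s(x_i) > s(x_j).
            for i in range(len(ordered)):
                for j in range(i + 1, len(ordered)):
                    diff = [a - b for a, b in zip(ordered[i], ordered[j])]
                    margin = sum(wk * dk for wk, dk in zip(w, diff))
                    # Pair loss log(1 + exp(-margin)), computed stably.
                    total_loss += math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)
                    # d(loss)/d(margin); d(margin)/d(w) is diff.
                    g = neg_sigmoid_of_neg(margin)
                    # Per-pair gradient step on the model parameters.
                    w = [wk - lr * g * dk for wk, dk in zip(w, diff)]
        # Assumed convergence condition: the epoch loss has stopped changing.
        if abs(prev_loss - total_loss) < tol:
            break
        prev_loss = total_loss
    return w
```

Pairwise losses are only one family; listwise losses compare the whole predicted order against the label order and would fit the description above equally well.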

In the above embodiment, a training sample set is constructed and used to train the initial ranking learning model iteratively until a convergence condition is satisfied, yielding the ranking learning model. The ranking learning model ranks the words in the word combination of a multimedia resource by their word weights, and the ranked word combinations make it possible to predict accurately which multimedia resources better match user expectations in real application scenarios, improving both the accuracy with which multimedia resources are applied in practice and their conversion rate.

In an exemplary embodiment, as shown in fig. 5, the training sample set may be constructed through the following steps:

In step S510, the original word combination corresponding to each sample multimedia resource is obtained.

The words in an original word combination are not yet ordered by word weight. Specifically, for each sample multimedia resource, its text information is obtained in a document format and a tag format; word segmentation is performed on the document-format text information to obtain the corresponding words; and these words and the tag-format text information are fused and deduplicated to obtain the original word combination corresponding to the sample multimedia resource. Alternatively, the text information of the sample multimedia resource may be processed in advance to obtain its original word combination, which is then stored on the second electronic device 130 or on a server connected to the second electronic device 130 over a network.

In step S520, a word weight of each sample word in each original word combination is obtained.

Specifically, the original word combination corresponding to a sample multimedia resource contains a plurality of words, and these words need to be ordered by the importance of each sample word to produce the label data required for training the ranking learning model. The word weight characterizes the importance of a word within the original word combination. Therefore, for each sample multimedia resource, the word weight of each word in its original word combination is obtained.

In step S530, for each original word combination, the sample words are ordered according to the word weight of each sample word in the original word combination, so as to obtain a sample word combination.

The output of the ranking learning model is only as accurate as the model's performance allows, and an accurate output is what makes it possible to predict which multimedia resources match user expectations. The label data of the ranking learning model in this disclosure therefore needs to be relatively complete; that is, the words in each original word combination need to be ordered by word weight. Specifically, each original word combination contains a plurality of sample words; the word weights of these sample words are obtained, and the sample words are ordered by their word weights, which completes the ordering of the original word combination and yields the sample word combination.

In step S540, for each sample word in each original word combination, the prediction probability of the sample word and the information domain feature of the sample word are obtained, and the prediction probability of the corresponding sample word and the information domain feature of the corresponding sample word are combined to obtain the fusion feature of the sample word.

Specifically, once the label data of the ranking learning model has been prepared, its feature data must also be prepared. The sample multimedia resources are recognized by any one or more of image classification, optical character recognition, and speech recognition to obtain the prediction probability of each sample word; accordingly, the information domain may include any one or more of image classification, optical character recognition, and speech recognition. The information domain corresponding to the source way of each sample word is represented numerically to obtain the information domain feature of that sample word. To express each sample word more accurately, its prediction probability and its information domain feature are combined into its fusion feature. Because the fusion feature contains both, it improves the expressive power of the features.

In step S550, a training sample set is constructed by using sample word combinations corresponding to the plurality of sample multimedia resources and fusion features of sample words in the original word combinations.

Specifically, a plurality of tuples are constructed based on sample word combinations corresponding to a plurality of sample multimedia resources and the fusion characteristics of each sample word in each sample word combination, and each tuple comprises one sample word combination corresponding to a sample multimedia resource and the fusion characteristics of each sample word in the sample word combination. Each tuple is a training sample, so that the construction of the training sample set is completed.
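The tuple construction can be sketched as follows. It assumes each sample multimedia resource is keyed by a hypothetical identifier, that the ordered sample word combination and the per-word fusion features have already been computed, and that only words appearing in the ordered combination are kept in the tuple. `TrainingSample` and `build_training_set` are illustrative names, not from the disclosure.

```python
from typing import Dict, List, NamedTuple, Sequence


class TrainingSample(NamedTuple):
    resource_id: str                       # hypothetical identifier of the sample resource
    ordered_words: List[str]               # sample word combination, by descending word weight
    features: Dict[str, Sequence[float]]   # fusion feature of each sample word


def build_training_set(ordered_combos: Dict[str, List[str]],
                       fused: Dict[str, Dict[str, Sequence[float]]]) -> List[TrainingSample]:
    """Pair each ordered sample word combination with its words' fusion features."""
    samples: List[TrainingSample] = []
    for rid, words in ordered_combos.items():
        feats = fused.get(rid, {})
        missing = [w for w in words if w not in feats]
        if missing:
            # Every ranked word needs a fusion feature, otherwise the sample is unusable.
            raise KeyError(f"{rid}: no fusion feature for {missing}")
        samples.append(TrainingSample(rid, list(words), {w: feats[w] for w in words}))
    return samples
```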

In the above embodiment, first, the sample words in each original word combination are ordered by their word weights to obtain the corresponding sample word combination; second, the prediction probability and the information domain feature of each sample word are obtained and combined into its fusion feature; finally, the training sample set is constructed from the sample word combinations of the plurality of sample multimedia resources and the fusion features of the sample words in the original word combinations. This provides both the label data and the feature data for training the ranking learning model: the completeness of the feature data improves the expressive power of the features, and the accuracy of the label data helps improve the prediction accuracy of the ranking learning model.

In an exemplary embodiment, step S520 (obtaining the word weight of each sample word in each original word combination) may specifically be implemented as follows: for each original word combination, the original word combination is input into a document topic generation model, which detects the word weight of each sample word and yields a word weight pair for each sample word, where a word weight pair consists of one sample word and its word weight.

The document topic generation model, latent Dirichlet allocation (LDA), has a three-level structure of words, topics, and documents. LDA is an unsupervised machine learning technique that can be used to identify latent topic information in large document collections or corpora. It adopts the bag-of-words approach, treating each document as a word frequency vector and thereby converting text information into numerical information that is easy to model. Specifically, to ensure the accuracy of the output of the ranking learning model, the sample words in each sample word combination constructed in the present disclosure are ordered by their word weights. To obtain these word weights quickly, the document topic generation model is used: for each original word combination, the combination is input into the model, which detects the word weight of each sample word and yields a word weight pair for each sample word, consisting of the sample word and its word weight.
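The text does not say how a word weight is read off the LDA model. One common choice, used here purely as an assumption, is the probability of the word under the document, p(w | d) = Σ_k θ_k · φ_k(w), where θ is the document's topic distribution and φ_k is the word distribution of topic k, both taken from an already fitted LDA model (for example one trained with an off-the-shelf topic-modeling library). The sketch computes that quantity in plain Python; the function and argument names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def lda_word_weights(doc_topic: Sequence[float],
                     topic_word: Sequence[Dict[str, float]],
                     words: Sequence[str]) -> List[Tuple[str, float]]:
    """Word weight pairs from a fitted LDA model: p(w|d) = sum_k theta_k * phi_k(w)."""
    if len(doc_topic) != len(topic_word):
        raise ValueError("doc_topic and topic_word must cover the same topics")
    pairs: List[Tuple[str, float]] = []
    for w in words:
        # Words missing from a topic's distribution contribute 0 for that topic.
        weight = sum(theta * phi.get(w, 0.0) for theta, phi in zip(doc_topic, topic_word))
        pairs.append((w, weight))
    return pairs
```

For a document with θ = (0.7, 0.3), where topic 0 gives "hero" 0.5 and "game" 0.3, and topic 1 gives "game" 0.6 and "music" 0.4, the weights come out as roughly 0.35 for "hero", 0.39 for "game", and 0.12 for "music".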

In the above embodiment, the word weight of each sample word is detected with the document topic generation model, so that label data for the ranking learning model can be constructed quickly.

In an exemplary embodiment, as shown in fig. 6, after the word weight of each sample word has been detected with the document topic generation model to obtain the word weight pair of each sample word, constructing the training sample set further includes the following steps:

in step S610, the word weight of each sample word is compared with a preset word weight threshold, so as to obtain a target sample word whose word weight is greater than the word weight threshold.

Because the document topic generation model is based on unsupervised machine learning, the word weight pairs of the sample words need further cleaning. Specifically, a minimum acceptable word weight is set as the preset word weight threshold. The word weight of each sample word is compared with this threshold: sample words whose word weights are less than or equal to the threshold are removed, and those whose word weights are greater than the threshold are kept as target sample words. Comparing word weights with the threshold in this way also filters out remaining stop words, which reduces interference from unnecessary words and lays a good data foundation.

Accordingly, step S530 (ordering the sample words according to the word weight of each sample word in the original word combination to obtain the sample word combination) can be implemented as follows:

In step S620, the target sample words are ranked according to their word weights to obtain a sample word combination.

Specifically, each original word combination contains a plurality of sample words; the word weights of these sample words are obtained through the LDA model, and the data are cleaned to obtain the target sample words. The target sample words are then ordered by their word weights to obtain the sample word combination.
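Steps S610 and S620 amount to a strict-threshold filter followed by a descending sort. A minimal sketch, assuming word weight pairs like those produced by the LDA step and a caller-chosen threshold (the function name and the example values in the usage note are hypothetical):

```python
from typing import List, Sequence, Tuple


def clean_and_rank(pairs: Sequence[Tuple[str, float]], threshold: float) -> List[str]:
    """Keep words whose weight is strictly greater than threshold, sorted by weight, descending."""
    # Step S610: weights less than or equal to the threshold are discarded.
    kept = [(word, weight) for word, weight in pairs if weight > threshold]
    # Step S620: order the remaining target sample words by word weight.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in kept]
```

With pairs ("hero", 0.35), ("the", 0.01), ("game", 0.39), ("music", 0.12) and a threshold of 0.05, the low-weight "the" is dropped and the sample word combination becomes ["game", "hero", "music"].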

In this embodiment, data cleaning removes unnecessary interfering information and ensures the accuracy of the data, providing high-quality training samples for subsequent model training and thereby improving the performance of the ranking learning model.

Fig. 7 is a flowchart of a method for generating a ranking learning model according to an exemplary embodiment. As shown in fig. 7, the method is used in the second electronic device 130 in the application environment shown in fig. 1 and includes the following steps:

In step S702, the original word combination corresponding to each sample multimedia resource is obtained.

Specifically, for each sample multimedia resource, text information corresponding to the sample multimedia resource is obtained, and the format of the text information comprises a document format and a label format; performing word segmentation processing on the text information in the document format to obtain words corresponding to the text information in the document format; and performing fusion and de-duplication processing on the words corresponding to the text information in the document format and the text information in the label format to obtain an original word combination corresponding to the sample multimedia resource.

In step S704, for each original word combination, the original word combination is input into the document topic generation model, which detects the word weight of each sample word to obtain a word weight pair for each sample word, where each word weight pair consists of one sample word and the word weight of that sample word.

In step S706, the word weight of each sample word is compared with a preset word weight threshold, so as to obtain a target sample word whose word weight is greater than the word weight threshold.

In step S708, for each original word combination, the target sample words are ranked according to the word weight of each target sample word, so as to obtain a sample word combination.

In step S710, for each sample word in each original word combination, the prediction probability of the sample word and the information domain feature of the sample word are obtained, and the prediction probability of the corresponding sample word and the information domain feature of the corresponding sample word are combined to obtain the fusion feature of the sample word.

Specifically, word recognition is performed on each sample multimedia resource to obtain the prediction probability of each sample word, and the information domain corresponding to the source way of each sample word is represented numerically to obtain the information domain feature of that sample word.

In step S712, a training sample set is constructed by using sample word combinations corresponding to the plurality of sample multimedia resources and the fusion features of each sample word in each original word combination.

In step S714, the fusion features of the sample words are input to the initial ranking learning model, score estimation is performed on the fusion features of the sample words through the initial ranking learning model, and the sample words are ranked according to the estimated scores to obtain ranked original word combinations.

The estimated score represents the weight of the word.

In step S716, a loss function value for the training step is calculated from the ranked original word combination and the sample word combination, and the parameters of the initial ranking learning model are adjusted accordingly.

In step S718, steps S714 to S716 are performed iteratively until the convergence condition is satisfied, at which point training stops and the ranking learning model is obtained.

In the above method for generating a ranking learning model, a training sample set with little interfering information and accurate data is constructed, and the initial ranking learning model is trained iteratively on it until the convergence condition is met, yielding a well-performing ranking learning model. The ranking learning model ranks the words in the word combinations of multimedia resources by their word weights, and the ranked word combinations make it possible to predict accurately which multimedia resources better match user expectations in real application scenarios, improving both the accuracy with which multimedia resources are applied in practice and their conversion rate.

Fig. 8 is a flowchart of a method for processing multimedia resources according to an exemplary embodiment. As shown in fig. 8, the method is used in the terminal 110 or the first electronic device 120 in the application environment shown in fig. 1 and includes the following steps:

In step S802, the original word combination corresponding to each sample multimedia resource is obtained.

Specifically, sample text information corresponding to each sample multimedia resource is obtained, and the format of the sample text information comprises a document format and a label format; performing word segmentation processing on the sample text information in the document format to obtain words corresponding to the sample text information in the document format; and carrying out fusion and de-duplication processing on the words corresponding to the sample text information in the document format and the sample text information in the label format to obtain an original word combination corresponding to the sample multimedia resource.

In step S804, each original word combination is input into the document topic generation model, which detects the word weight of each sample word and produces a word weight pair for each sample word; a word weight pair consists of one sample word and the word weight of that sample word.
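The disclosure does not state how the document topic generation model converts its output into a per-word weight. Assuming an LDA-style topic model, one common choice is the word's probability under the document, $p(w \mid d) = \sum_k \theta_{dk}\,\varphi_{kw}$, where $\theta_{dk}$ is the document's topic mixture and $\varphi_{kw}$ is the topic-word distribution. The sketch below takes both as already-fitted inputs; the function name and the weighting formula are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def lda_word_weights(
    doc_topic: Sequence[float],              # theta_d: p(topic k | document d)
    topic_word: Sequence[Dict[str, float]],  # phi_k: p(word w | topic k)
    words: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (word, weight) pairs with weight = sum_k theta_dk * phi_kw.

    Hypothetical weighting rule; words absent from a topic contribute 0.
    """
    pairs: List[Tuple[str, float]] = []
    for w in words:
        weight = sum(theta * phi.get(w, 0.0) for theta, phi in zip(doc_topic, topic_word))
        pairs.append((w, weight))
    return pairs
```

Each returned tuple plays the role of a word weight pair: one sample word together with its word weight.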

In step S806, the word weight of each sample word is compared with a preset word weight threshold, and the sample words whose word weights exceed the threshold are taken as target sample words.

In step S808, for each original word combination, the target sample words are sorted by word weight to obtain a sample word combination.
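Steps S806 and S808 amount to a threshold filter followed by a sort. A minimal sketch, assuming that "sorted by word weight" means largest weight first and that ties keep their original order (neither detail is stated in the text); the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def sample_word_combination(
    word_weight_pairs: Sequence[Tuple[str, float]],
    threshold: float,
) -> List[str]:
    # S806: keep only target sample words whose weight exceeds the preset threshold.
    targets = [(w, s) for w, s in word_weight_pairs if s > threshold]
    # S808: order target words by weight, largest first; Python's sort is stable,
    # so words with equal weights keep their original relative order.
    targets.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in targets]
```

Note the strict comparison: a word whose weight equals the threshold is discarded, matching "greater than the word weight threshold".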

In step S810, for each sample word in each original word combination, the prediction probability and the information domain feature of the sample word are obtained and merged to obtain the fusion feature of that sample word.
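The text says only that the prediction probability and the information domain feature are "merged". Vector concatenation is one simple reading and is assumed in this sketch; the function name is hypothetical, and the domain feature is taken to be an already-numeric vector.

```python
from typing import List, Sequence


def fuse_features(prediction_probability: float, domain_feature: Sequence[float]) -> List[float]:
    """Concatenate the prediction probability with the numeric information-domain feature.

    Concatenation is an assumed merge rule, not one confirmed by the disclosure.
    """
    if not 0.0 <= prediction_probability <= 1.0:
        raise ValueError("prediction probability must lie in [0, 1]")
    return [float(prediction_probability), *map(float, domain_feature)]
```

With concatenation, the ranking model downstream can learn separate weights for recognition confidence and for each source channel.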

In step S812, a training sample set is constructed from the sample word combinations of the plurality of sample multimedia resources and the fusion features of the sample words in the original word combinations.
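How the sample word combination is turned into training targets is not specified here. One plausible learning-to-rank layout, assumed in the sketch below, treats each sample multimedia resource as one query group and assigns graded relevance labels from a word's position in the sample word combination, with 0 for words filtered out by the threshold. The function name and the labelling scheme are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Feature = List[float]


def build_training_group(
    original_words: Sequence[str],
    fused: Dict[str, Feature],          # fusion feature of every sample word
    sample_combination: Sequence[str],  # target words, already sorted by word weight
) -> List[Tuple[Feature, int]]:
    """One query group per sample resource: (fusion feature, graded relevance label)."""
    n = len(sample_combination)
    # Hypothetical labelling: the top-ranked target word gets n, the next n - 1, ...;
    # words removed by the threshold get 0.
    label = {w: n - rank for rank, w in enumerate(sample_combination)}
    return [(fused[w], label.get(w, 0)) for w in original_words]
```

The training sample set is then the list of such groups, one per sample multimedia resource.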

In step S814, the initial ranking learning model is trained iteratively on the training sample set until a convergence condition is satisfied, at which point training stops and the ranking learning model is obtained.
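The disclosure does not name the ranking learning model, its loss, or its convergence condition. As an illustration only, the sketch below swaps in a linear scorer trained by gradient descent on a RankNet-style pairwise logistic loss, $\log(1 + e^{-(s_i - s_j)})$ for every pair in which word $i$ has a higher label than word $j$, and stops when the loss changes by less than a tolerance. The function name, learning rate, and tolerance are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Group = Sequence[Tuple[Sequence[float], int]]


def train_pairwise_ranker(
    groups: Sequence[Group],
    dim: int,
    lr: float = 0.1,
    tol: float = 1e-6,
    max_iter: int = 1000,
) -> List[float]:
    """Linear scorer trained with a RankNet-style pairwise logistic loss (assumed model)."""
    w = [0.0] * dim
    prev_loss = math.inf
    for _ in range(max_iter):
        grad = [0.0] * dim
        loss, n_pairs = 0.0, 0
        for group in groups:
            for xi, yi in group:
                for xj, yj in group:
                    if yi <= yj:
                        continue  # only pairs where item i should outrank item j
                    diff = [a - b for a, b in zip(xi, xj)]
                    margin = sum(wk * dk for wk, dk in zip(w, diff))
                    loss += math.log1p(math.exp(-margin))
                    # d/dmargin of log(1 + e^-margin) is -1 / (1 + e^margin)
                    coef = -1.0 / (1.0 + math.exp(margin))
                    for k in range(dim):
                        grad[k] += coef * diff[k]
                    n_pairs += 1
        if n_pairs == 0:
            break  # no ordered pairs: nothing to learn
        loss /= n_pairs
        if abs(prev_loss - loss) < tol:  # assumed convergence condition
            break
        prev_loss = loss
        w = [wk - lr * gk / n_pairs for wk, gk in zip(w, grad)]
    return w
```

A production system would more likely use an established learning-to-rank library; the hand-written loop is kept only to make the "iterate until convergence" structure explicit.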

In step S816, the text information corresponding to the multimedia resource is obtained; the text information comes in two formats, a document format and a tag format.

In step S818, word segmentation is performed on the text information in the document format to obtain words corresponding to the text information in the document format.

In step S820, the words corresponding to the text information in the document format and the text information in the tag format are fused and deduplicated to obtain a word combination corresponding to the multimedia resource.

The word combination includes a plurality of words.

In step S822, word recognition is performed on each multimedia resource to obtain, for each word, the prediction probability with which that word is recognized from the multimedia resource.

In step S824, the information domain corresponding to the source channel of each word is represented numerically to obtain the information domain feature of each word.
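One way to represent an information domain numerically is a one-hot or multi-hot vector over the possible source channels. The sketch below uses multi-hot encoding because, after fusion and de-duplication, the same word may have come from more than one channel. The channel names in `INFO_DOMAINS` and the function name are hypothetical; the disclosure does not enumerate the domains here.

```python
from typing import Iterable, List, Sequence

# Hypothetical channel names for illustration only; the disclosure does not list them.
INFO_DOMAINS = ("title", "description", "tag", "ocr", "asr")


def encode_information_domains(
    sources: Iterable[str], domains: Sequence[str] = INFO_DOMAINS
) -> List[float]:
    """Multi-hot vector: 1.0 for each channel the word was found in, else 0.0."""
    seen = set(sources)
    unknown = seen.difference(domains)
    if unknown:
        raise ValueError(f"unknown information domain(s): {sorted(unknown)}")
    return [1.0 if d in seen else 0.0 for d in domains]
```

A word found only in the tags would map to `[0.0, 0.0, 1.0, 0.0, 0.0]` under this assumed channel list.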

In step S826, for each word, the corresponding prediction probability and the corresponding information domain feature are combined to obtain a fusion feature of each word.

In step S828, the fusion features of the words are input into the ranking learning model, which estimates a score for each fusion feature; the words are then ranked by their estimated scores to obtain the ranked word combination.

The estimated score represents the word weight of the word.
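At inference time, step S828 scores each word's fusion feature and sorts by score. A minimal sketch, assuming a linear ranking model given as a weight vector; the function name is hypothetical, and any trained scorer could replace the dot product.

```python
from typing import Dict, List, Sequence


def rank_words(fused: Dict[str, Sequence[float]], weights: Sequence[float]) -> List[str]:
    """Score each word's fusion feature with a linear model; sort by descending score."""

    def score(x: Sequence[float]) -> float:
        # Estimated score, interpreted as the word weight.
        return sum(w * v for w, v in zip(weights, x))

    return sorted(fused, key=lambda word: score(fused[word]), reverse=True)
```

For instance, with weights `[1.0, -1.0, 1.0]`, a word whose feature is `[0.9, 0.0, 1.0]` scores 1.9 and is ranked ahead of one with `[0.5, 1.0, 0.0]`, which scores -0.5.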

In this method for processing multimedia resources, the word combination corresponding to a multimedia resource is obtained, together with the prediction probability and the information domain feature of each word in it. The prediction probability is the probability with which the word is recognized, and the information domain feature represents the channel from which the word originates. To characterize each word more accurately, the prediction probability and the information domain feature of each word are merged into a fusion feature, from which the word weight of the word is estimated; the words are then sorted by estimated word weight to obtain a ranked word combination. Because multimedia resource search or recommendation then takes into account the importance of each word in the word combination, multimedia resources can be provided to users efficiently and accurately. This addresses the problem in conventional techniques that the multimedia resources provided to users match their expectations poorly, increases the click-through rate of multimedia resources, and reduces the effort users must spend.

It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order in which these steps are performed is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; nor are they necessarily performed sequentially, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Fig. 9 is a block diagram of an apparatus 900 for processing multimedia resources according to an exemplary embodiment. Referring to Fig. 9, the apparatus includes a word combination obtaining module 902, a probability feature obtaining module 904, a probability feature merging module 906, and a word sorting module 908.

a word combination obtaining module 902, configured to obtain a word combination corresponding to a multimedia resource, where the word combination includes a plurality of words;

a probability feature obtaining module 904, configured to obtain the prediction probability and the information domain feature of each word, where the prediction probability is the probability with which each word is recognized and the information domain feature represents the channel from which the word originates;

a probability feature merging module 906, configured to merge, for each word, its prediction probability and its information domain feature to obtain the fusion feature of the word; and

a word sorting module 908, configured to estimate the word weight of each word from its fusion feature and to sort the words by estimated word weight to obtain a ranked word combination.
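The four modules of apparatus 900 can be pictured as a small pipeline. The sketch below is a hypothetical composition, not the apparatus itself: the class name, the injected callables, and the concatenation used for merging are all assumptions, chosen so that each field corresponds to one module's responsibility.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class MultimediaResourceProcessor:
    """Hypothetical composition mirroring modules 902-908; each stage is injected."""

    acquire_words: Callable[[object], List[str]]                  # module 902
    acquire_probability: Callable[[object, str], float]           # module 904 (probability)
    acquire_domain_feature: Callable[[object, str], List[float]]  # module 904 (domain)
    score: Callable[[Sequence[float]], float]                     # model used by module 908

    def process(self, resource: object) -> List[str]:
        words = self.acquire_words(resource)
        # Module 906: merge probability and domain feature per word (assumed concatenation).
        fused = {
            w: [self.acquire_probability(resource, w), *self.acquire_domain_feature(resource, w)]
            for w in words
        }
        # Module 908: estimate each word's weight and sort, largest first.
        return sorted(words, key=lambda w: self.score(fused[w]), reverse=True)
```

Injecting each stage keeps the recognizer, the domain encoder, and the ranking model independently replaceable, which matches the module-per-responsibility split of Fig. 9.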

In an exemplary embodiment, the processing apparatus further includes a word combination generating module, which includes a text information obtaining unit, a word segmentation processing unit, and a word combination generating unit.

In an exemplary embodiment, the word combination generating unit is further configured to fuse the words obtained from the text information in the document format with the text information in the tag format and to de-duplicate the result, obtaining the word combination corresponding to the multimedia resource.

In an exemplary embodiment, the probability feature obtaining module 904 is further configured to perform word recognition on each multimedia resource to obtain the prediction probability with which each word is recognized from the multimedia resource, and to represent numerically the information domain corresponding to the source channel of each word to obtain the information domain feature of each word.

In an exemplary embodiment, the word sorting module 908 is further configured to input the fusion features of the words into a ranking learning model, estimate a score for each fusion feature through the ranking learning model, and rank the words by estimated score to obtain the ranked word combination, where the estimated score represents the word weight.

In an exemplary embodiment, the processing apparatus further includes a ranking learning model generation module, which includes a sample set construction unit and a model training unit.

In an exemplary embodiment, the sample set construction unit includes an original word combination obtaining subunit, a word weight obtaining subunit, a sample word combination obtaining subunit, a fusion feature obtaining subunit, and a sample set construction subunit.

In an exemplary embodiment, the word weight obtaining subunit is further configured to input each original word combination into a document topic generation model, which detects the word weight of each sample word to produce a word weight pair for each sample word, where a word weight pair consists of one sample word and the word weight of that sample word.

In an exemplary embodiment, the sample set construction unit further includes a target sample word obtaining subunit.

With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.

Fig. 10 is a block diagram of a device 1000 for processing multimedia resources according to an exemplary embodiment. For example, the device 1000 may be a server. Referring to Fig. 10, the device 1000 includes a processing component 1020, which in turn includes one or more processors, and memory resources, represented by a memory 1022, for storing instructions executable by the processing component 1020, such as application programs. The application programs stored in the memory 1022 may include one or more modules, each corresponding to a set of instructions. The processing component 1020 is configured to execute the instructions to perform the above method for processing multimedia resources.

The device 1000 may further include a power component 1024 configured to perform power management of the device 1000, a wired or wireless network interface 1026 configured to connect the device 1000 to a network, and an input/output (I/O) interface 1028. The device 1000 may operate based on an operating system stored in the memory 1022, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD.

In an exemplary embodiment, a storage medium including instructions is also provided, such as the memory 1022 including instructions executable by the processor of the device 1000 to perform the above method. The storage medium may be a non-transitory computer-readable storage medium, for example a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for processing multimedia resources, comprising:

2. The method for processing multimedia resources according to claim 1, wherein the word combination corresponding to the multimedia resource is generated by:

3. The method for processing multimedia resources according to claim 2, wherein generating the word combination corresponding to the multimedia resource from the words corresponding to the text information in the document format and the text information in the tag format comprises:

4. The method for processing multimedia resources according to claim 1, wherein obtaining the prediction probability of each word and the information domain feature of each word comprises:

5. The method for processing multimedia resources according to any one of claims 1 to 4, wherein estimating the word weight of each word according to the fusion feature of each word and ranking the words according to the estimated word weights to obtain a ranked word combination comprises:

6. The method for processing multimedia resources according to claim 5, wherein the ranking learning model is generated by:

7. The method for processing multimedia resources according to claim 6, wherein constructing the training sample set comprises:

acquiring the word weight of each sample word in each original word combination;

8. An apparatus for processing a multimedia resource, comprising:

9. An electronic device, comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the method for processing multimedia resources according to any one of claims 1 to 7.

10. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method for processing multimedia resources according to any one of claims 1 to 7.