WO2018161880A1 - Media search term pushing method, apparatus, and storage medium - Google Patents
- Publication number: WO2018161880A1 (PCT/CN2018/078084)
- Authority: WO, WIPO (PCT)
- Prior art keywords: media, user, application, keyword, information
Classifications
- G06F16/00: Information retrieval; Database structures therefor; File system structures therefor (G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing)
- G06F16/9535: Search customisation based on user profiles and personalisation (G06F16/00 > G06F16/90: Details of database functions independent of the retrieved data types > G06F16/95: Retrieval from the web > G06F16/953: Querying, e.g. by the use of web search engines)
Definitions
- The present application relates to the field of Internet technologies, and in particular, to a media search term pushing method and apparatus.
- The embodiments of the present invention provide a media search term pushing method and device that recommend media search terms to a user based on the user's Internet behavior data, which can effectively improve the efficiency with which the user acquires information through a media application.
- An embodiment of the present invention provides a media search term pushing method, applied to a media search term pushing device, the method including:
- acquiring user behavior data of an associated user of the user using a second media application, where the user behavior data includes at least one piece of media information corresponding to the user behavior of the associated user using the second media application;
- An embodiment of the present invention further provides a media search term pushing device, including a processor and a memory, where the memory stores instructions executable by the processor, and when the instructions are executed,
- the processor is configured to:
- acquire user behavior data of an associated user of the user using a second media application, where the user behavior data includes at least one piece of media information corresponding to the user behavior of the associated user using the second media application;
- An embodiment of the present invention further provides a non-volatile computer storage medium storing a computer program for executing the above method.
- FIG. 1 is a schematic structural diagram of an implementation scenario of a media search term pushing method according to an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of a media search term pushing method according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of an implementation scenario of a media search term pushing method according to another embodiment of the present invention.
- FIG. 4 is a schematic flowchart of a media search term pushing method according to another embodiment of the present invention.
- FIG. 5 is a schematic flowchart of extracting media keywords according to an embodiment of the present invention.
- FIG. 6 is a schematic structural diagram of a media search term pushing apparatus according to an embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of a keyword extraction module according to an embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of a search term pushing module according to an embodiment of the present invention.
- FIG. 9 is a schematic structural diagram of the hardware components of a media search term pushing apparatus according to an embodiment of the present invention.
- Unless otherwise specified, the media search term pushing method in the embodiments of the present invention is implemented by a media search term pushing device, which may be an Internet client that obtains media information from the Internet, for example, a network music client.
- The first media application and the second media application in the embodiments of the present invention may be Internet clients with different functions, for example, a network music application, a network news application, a network video application, or a browser application.
- If the first media application is a network music application, the second media application may be a network news application, a network video application, or a browser application; if the first media application is a network video application, the second media application may be a network music application, a network news application, or a browser application; and so on by analogy.
- The first media application and the second media application in the embodiments of the present invention may be Internet applications with different functions that the user uses on the same user terminal, or on different user terminals; these correspond to different implementation scenarios.
- FIG. 1 is a schematic structural diagram of an implementation scenario of a media search term pushing method in an embodiment of the present invention.
- The media search term pushing device can be implemented in the background server 1102 of the first media application; the first media application runs on the user terminal 1101, and the second media application has its own background server 1103.
- As shown in FIG. 2, the flow of the media search term pushing method in this embodiment may include the following steps:
- The media search term pushing device acquires user identification information of the current user of the first media application.
- After being started, the first media application in the user terminal 1101 may send the user identification information of the current user to the media search term pushing device in the background server; the information may be actively reported by the first media application, or actively pulled from the first media application by the media search term pushing device. The user identification information may be a user login account, a bound mobile phone number, an email account, or the like.
- The media search term pushing device acquires, according to the user identification information and from the background server 1103 of the second media application, the user behavior data of the associated user of the user using the second media application, where the user behavior data includes at least one piece of media information corresponding to the user behavior of the associated user using the second media application.
- The background server 1103 of the second media application may share the user behavior data of users of the second media application with the background server 1102 of the first media application, so that the media search term pushing device can acquire, according to the user identification information of the current user, the user behavior data of the associated user of that user using the second media application.
- The media search term pushing device requests, according to the user identification information of the current user, the background server 1103 of the second media application to provide the user behavior data of the associated user of that user, for example through a third-party program interface provided by the background server 1103 of the second media application, or through a cooperation protocol platform established by the two parties, such as an instant messaging service open platform or an SNS open platform.
- In this implementation, the media search term pushing device only needs to provide the user identification information of the current user, for example, an openID, and the background server 1103 of the second media application can return the user behavior data of the associated user of the current user to the media search term pushing device.
- The current user and the associated user of the current user mentioned in the embodiments of the present invention may respectively be the user identity of the same physical user in the first media application and the user identity of that user in the second media application, each of which may be represented by a user account.
- The user accounts used by the current user and the associated user of the current user may be the same or different, but the association between the two user identities needs to be established in advance in the background server. For example, Xiao Ming's user login account for the first media application is ABC2005, and his user login account for the second media application is BCD2005. When creating the BCD2005 account in the second media application, Xiao Ming can request that it be associated with the user login account ABC2005 of the first media application.
- Alternatively, a request to associate the two user login accounts may be submitted later while using the second media application. After receiving the request, the background server 1103 of the second media application sends an association confirmation query message to the background server 1102 of the first media application, and after receiving the association confirmation message sent by the first media application logged in with the user login account ABC2005, establishes the association between the two user accounts.
- Xiao Ming may also request the background server 1102 of the first media application to establish the association between the two user login accounts; the association is established in the same manner.
- When the media search term pushing device requests the background server 1103 of the second media application to provide the user behavior data of the associated user, the user account of the associated user in the second media application may first need to grant authorization.
- After the user, in the second media application, authorizes the first media application, the background server 1103 of the second media application sends an authorization token to the first media application. The media search term pushing device can then send the token obtained from the first media application to the background server 1103 of the second media application, which returns the user behavior data of the associated user to the media search term pushing device according to the token.
- The authorization token can be given a validity period, during which the authorization process does not need to be repeated.
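The validity-period rule above amounts to caching the token and re-running authorization only once it has expired. The sketch below assumes a hypothetical `authorize` callable that performs the user-authorization flow and returns a token string with its lifetime in seconds; `AuthToken` and `TokenCache` are illustrative names, not part of the source.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AuthToken:
    value: str
    expires_at: float  # absolute time (seconds) after which the token is no longer valid


@dataclass
class TokenCache:
    """Reuses an authorization token until its validity period ends.

    `authorize` is a hypothetical stand-in for the user-authorization flow;
    it returns (token, lifetime_in_seconds).
    """
    authorize: Callable[[], tuple[str, float]]
    clock: Callable[[], float] = time.time
    _token: Optional[AuthToken] = field(default=None, init=False)

    def get(self) -> str:
        now = self.clock()
        if self._token is None or now >= self._token.expires_at:
            # Authorization is repeated only when no valid token is cached.
            value, lifetime = self.authorize()
            self._token = AuthToken(value, now + lifetime)
        return self._token.value
```

Injecting `clock` makes the expiry rule testable; in practice the cached token would be sent with each behavior-data request to the second application's background server.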
- The user behavior data may include browsing, playing, collecting, sharing, downloading, or rating behaviors of the associated user in the second media application, and each behavior may target a particular piece of media information.
- By using the media information corresponding to the user behavior in the second media application, the media search term pushing device in the embodiments of the present invention can analyze the user's preferences or types of interest,
- and thereby recommend targeted media search terms to the user in the first media application.
- The user behavior data may include all historical user behavior records of the associated user in the second media application, or only the records from a recent period (for example, the last month or the last week).
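As a sketch of the data described above, each behavior record can pair a behavior type with the media information it targets and a timestamp, and the "recent period" option becomes a time-window filter. The record layout and the names `Behavior`, `BehaviorRecord`, and `media_info_for` are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Behavior(Enum):
    BROWSE = "browse"
    PLAY = "play"
    COLLECT = "collect"
    SHARE = "share"
    DOWNLOAD = "download"
    RATE = "rate"


@dataclass(frozen=True)
class BehaviorRecord:
    behavior: Behavior
    media_info: str  # the piece of media information the behavior targets, e.g. a title
    timestamp: datetime


def media_info_for(records, now: datetime, window: Optional[timedelta] = None) -> list[str]:
    """Return media information from all records, or only from those within `window` of `now`."""
    return [
        r.media_info
        for r in records
        if window is None or now - r.timestamp <= window
    ]
```

Passing `window=None` corresponds to using the full history; `timedelta(weeks=1)` corresponds to "the last week".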
- The media search term pushing device extracts at least one media keyword from the word segments contained in the at least one piece of media information, according to the segmentation frequency statistics of those word segments.
- The media search term pushing device extracts media keywords from the obtained media information corresponding to the user behavior in the second media application. This can be further divided into the following steps:
- The media search term pushing device performs text segmentation on the obtained media information, for example using full-mode segmentation or search-mode segmentation, to obtain the word segments contained in the pieces of media information.
- The media information content can be preprocessed, for example by garbled-character filtering, punctuation filtering, traditional-to-simplified Chinese conversion, word segmentation, and stop word filtering.
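The preprocessing and full-mode segmentation steps can be sketched with the standard library alone. Full-mode segmentation emits every dictionary word found at every position, so overlapping words are all kept. The tiny `DICTIONARY` and `STOP_WORDS` sets are hypothetical placeholders for a real lexicon; NFKC normalization stands in for width/compatibility cleanup, and traditional-to-simplified conversion is omitted because it needs a mapping table.

```python
import re
import unicodedata

# Hypothetical toy lexicon and stop-word list; a real system would load full files.
DICTIONARY = {"周杰伦", "杰伦", "新歌", "歌", "发布", "的"}
STOP_WORDS = {"的", "了", "是"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def preprocess(text: str) -> str:
    """NFKC-normalize (e.g. full-width to half-width) and turn punctuation runs into spaces."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"[^\w]+", " ", text)


def full_mode_segment(text: str) -> list[str]:
    """Full-mode segmentation: every dictionary word at every start position, minus stop words."""
    words = []
    for chunk in preprocess(text).split():
        for start in range(len(chunk)):
            for end in range(start + 1, min(start + MAX_WORD_LEN, len(chunk)) + 1):
                word = chunk[start:end]
                if word in DICTIONARY and word not in STOP_WORDS:
                    words.append(word)
    return words
```

For "周杰伦的新歌发布了！" this yields both "周杰伦" and the overlapping "杰伦", which is the characteristic behavior of full mode; a search-mode segmenter would instead start from a best segmentation and add sub-words of long words.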
- The media search term pushing device may first perform relevance screening on the obtained media information: according to a preset associated word segment set of the first media application, it determines at least one piece of associated media information among the at least one piece of media information, where the associated media information contains at least one associated word segment of the first media application. Media information that contains no associated word segment is thereby excluded, which effectively reduces the amount of subsequent analysis and computation.
- The preset associated word segment set of the first media application may be the vocabulary of the domain to which the first media application belongs. Taking a network music application as the first media application, its preset associated word segment set may include a set of song names, a set of artist names, a set of album names, a set of song genre names, and the like.
- The associated word segments of the first media application may be matched against only part of each piece of media information. For example, only the word segments of the title and abstract of each piece of media information are checked for associated word segments of the first media application, and the other parts are not examined, which greatly reduces the processing required for relevance screening.
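The title-and-abstract screening above can be sketched as a set-intersection test per item. The `MediaInfo` fields and the pluggable `segment` function are assumptions; any segmenter (for example a full-mode one) can be passed in, and the body is deliberately never inspected.

```python
from dataclasses import dataclass


@dataclass
class MediaInfo:
    title: str
    abstract: str
    body: str  # not inspected during relevance screening


def screen_media_info(items, associated_terms, segment):
    """Keep items whose title or abstract contains at least one associated word segment.

    `segment` is any word segmentation function (str -> list[str]).
    """
    kept = []
    for item in items:
        tokens = set(segment(item.title)) | set(segment(item.abstract))
        if tokens & associated_terms:
            kept.append(item)
    return kept
```

Restricting matching to the two short fields keeps the screening cost roughly independent of article length.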
- The segmentation frequency statistics of each word segment may include term frequency, document frequency, document count, or inverse document frequency, which respectively reflect how often or how many times each word segment appears in the obtained media information, or how much meaning it carries (for example, common function words appear frequently but should not be considered keywords).
- At least one media keyword may be extracted from the segmentation included in the acquired media information by using a TF-IDF (Term Frequency-Inverse Document Frequency) algorithm or a TextRank document ranking algorithm.
- TF-IDF Term Frequency-Inverse Document Frequency
- The term frequency TF of a word segment in a given piece of media information is the number of times the word segment occurs in that piece of media information divided by the total number of word segments it contains:

$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

- where $n_{i,j}$ is the number of occurrences of word $t_i$ in document $d_j$, and the denominator $\sum_k n_{k,j}$ is the total number of occurrences of all word segments in document $d_j$.
- The inverse document frequency IDF is obtained by dividing the total number of pieces of media information by the number of pieces of media information that contain the word segment, and then taking the logarithm of the quotient:

$$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$

- where $|D|$ is the total number of pieces of media information, and $|\{j : t_i \in d_j\}|$ is the number of pieces of media information containing the word $t_i$ (i.e., the number of documents with $n_{i,j} \neq 0$).
- The TF-IDF weight combines the two:

$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$$

- It is used to assess how important a word is to a document, or to a set of domain documents, within a corpus. A high term frequency within a particular document combined with a low document frequency across the whole document set yields a high TF-IDF weight; therefore, filtering out words with low TF-IDF removes common words and retains important ones.
- A preset number (for example, 3, 5, or 10) of word segments with the highest TF-IDF values in each piece of media information may be determined as media keywords.
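The TF, IDF, and top-N rules above translate directly into code. This is a minimal sketch operating on already-segmented documents (one list of word segments per piece of media information); the function names and the alphabetical tie-break are illustrative choices, not from the source.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute tfidf_{i,j} = tf_{i,j} * idf_i for every word in every document."""
    num_docs = len(docs)
    # Document frequency: number of documents in which each word occurs at least once.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())  # sum_k n_{k,j}
        scores.append({
            word: (n / total) * math.log(num_docs / doc_freq[word])
            for word, n in counts.items()
        })
    return scores


def top_keywords(docs: list[list[str]], top_n: int = 3) -> list[list[str]]:
    """Return the top_n highest-weighted words of each document (ties broken alphabetically)."""
    return [
        [w for w, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for s in tf_idf(docs)
    ]
```

A word that appears in every document gets $\mathrm{idf} = \log 1 = 0$, which is exactly how common words are filtered out.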
- Alternatively, the TextRank algorithm can rank the importance of the word segments appearing in a piece of media information, and a preset number of the most important word segments are determined as media keywords.
- The word segments with the highest weight values or rankings, extracted using the TF-IDF algorithm or the TextRank algorithm, are referred to as weight keywords.
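TextRank can be sketched as PageRank over an undirected word co-occurrence graph. The sliding-window size, the damping factor 0.85, and the fixed iteration count are conventional assumptions, not values given in the source.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, damping=0.85, iterations=50, top_n=3):
    """Rank words by TextRank over an undirected co-occurrence graph.

    Two words are linked if they co-occur within a sliding window of
    `window` consecutive words.
    """
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    score = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Synchronous update: the right-hand side reads the previous iteration's scores.
        score = {
            w: (1 - damping) + damping * sum(score[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return sorted(score, key=lambda w: (-score[w], w))[:top_n]
```

Unlike TF-IDF, TextRank needs only the single piece of media information being analyzed, since importance comes from how central a word is in that text's co-occurrence graph.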
- The media search term pushing device may further perform relevance screening on the obtained weight keywords: according to the preset associated word segment set of the first media application, it determines at least one media keyword among the at least one weight keyword,
- where a media keyword is an associated word segment in the associated word segment set of the first media application. Weight keywords that are not associated word segments are thereby excluded as unrelated, focusing further on search terms the user is likely to use in the first media application.
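The keyword-level screening above is a membership test against the associated word segment set. In this sketch the weight keywords come as one ranked list per piece of media information; merging the lists and dropping duplicates in first-seen order is an added assumption about how results from several documents would be combined.

```python
def screen_weight_keywords(weight_keywords_per_doc, associated_terms):
    """Keep weight keywords that are associated word segments.

    Duplicates across documents are dropped, keeping first-seen (rank) order.
    """
    seen = set()
    media_keywords = []
    for keywords in weight_keywords_per_doc:
        for word in keywords:
            if word in associated_terms and word not in seen:
                seen.add(word)
                media_keywords.append(word)
    return media_keywords
```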
- The media search term pushing device pushes media search terms to the first media application according to the at least one media keyword.
- The media search term pushing device sends all or some of the determined media keywords to the first media application as media search terms, and the first media application displays them in its search bar
- so that the user can quickly enter a search term. Because these media search terms are media keywords that the user pays more attention to in another media application, they are also more likely to be used by the user as search terms in the first media application, which effectively improves the efficiency with which the user obtains information through the media application.
- The media search term pushing device may also acquire search behavior statistics of multiple users using the at least one media keyword in the first media application, then determine media search terms among the at least one media keyword according to both the segmentation frequency statistics of the keywords in the at least one piece of media information and their search behavior statistics in the first media application, and push the determined media search terms to the first media application. The segmentation frequency statistics of a media keyword in the at least one piece of media information indicate the user's degree of attention to or interest in that keyword,
- while the search behavior statistics indicate the keyword's search popularity in the first media application. Combining the two yields a recommendation score for each media keyword, and the media keywords with the highest recommendation scores
- are pushed to the first media application as media search terms.
- Here, the weight score is, for example, the TF-IDF value; qv(i) is the number of times the i-th media keyword was searched in the first media application within a period of time; and qv_max is the maximum of all qv values, used for normalization so that the recommendation score does not become too large.
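The text names the inputs of the recommendation score (weight score, qv(i), qv_max) but not the formula that combines them. The sketch below ASSUMES a multiplicative form, score(i) = weight(i) × qv(i) / qv_max; an additive blend would be an equally plausible reading. The function name and fallback behavior are illustrative.

```python
def recommend(weights: dict[str, float], qv: dict[str, int], top_n: int = 3) -> list[str]:
    """Rank media keywords by an ASSUMED score: weight(i) * qv(i) / qv_max."""
    qv_max = max(qv.values(), default=0)

    def score(word: str) -> float:
        if qv_max == 0:
            return weights[word]  # no search statistics available: fall back to weight alone
        return weights[word] * qv.get(word, 0) / qv_max  # qv / qv_max lies in [0, 1]

    return sorted(weights, key=lambda w: (-score(w), w))[:top_n]
```

Dividing by qv_max keeps the search-popularity factor in [0, 1], so it rescales the TF-IDF weight rather than swamping it, which matches the stated purpose of the normalization.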
- The media search term pushing device in the embodiments of the present invention analyzes the associated user's behavior data in the second media application, extracts media search terms from the media information corresponding to that behavior, and sends them to the first media application.
- Because these media search terms are media keywords that the user pays more attention to in another media application, they are also more likely to be used by the user as search terms in the first media application, which effectively improves the efficiency with which users obtain information through the media application.
- FIG. 3 is a schematic structural diagram of an implementation scenario of a media search term pushing method in another embodiment of the present invention.
- The media search term pushing device 1201, the first media application 1202, and the second media application 1203 all run on the same user terminal.
- As shown in FIG. 4, the flow of the media search term pushing method in this embodiment may include the following steps:
- The media search term pushing device acquires user identification information of the current user of the first media application.
- In this embodiment, the media search term pushing device may acquire, through the second media application on the same user terminal, the user behavior data of the associated user of the user in the second media application.
- The user behavior data of the associated user in the second media application may be saved in a specified local directory of the second media application, or may be recorded in the background server of the second media application, in which case the second media application obtains it from the background server and then submits it to the media search term pushing device.
- The current user and the associated user of the current user mentioned in the embodiments of the present invention may respectively be the user identity of the same physical user in the first media application and the user identity of that user in the second media application, each of which may be represented by a user account; the user accounts used by the current user and the associated user may be the same or different.
- The association between the two user identities may be established in advance in the background server of either media application. For example, Xiao Ming's user login account for the first media application is ABC2005,
- and his user login account for the second media application is BCD2005.
- When creating the BCD2005 account in the second media application, Xiao Ming can request that it be associated with the user login account ABC2005 of the first media application.
- Alternatively, a request to associate the two user login accounts may be submitted later while using the second media application; after receiving the request, the background server of the second media application sends an association confirmation query message to the background server of the first media application,
- and after receiving the association confirmation message sent by the first media application logged in with the user login account ABC2005, establishes the association between the two user accounts. Xiao Ming may also request the background server of the first media application to establish the association between the two user login accounts.
- The first media application and the second media application may trigger each other's startup, or both may be triggered by the same third-party application. That is, the user may trigger the startup of the second media application while using the first media application,
- or trigger the startup of the first media application while using the second media application;
- in either case, the current user account of the first media application is evidently associated with the current user account of the second media application. If the user triggers both the first and the second media applications while using a third application (for example, an instant messaging application or an SNS application), the current user accounts of the first and second media applications are both associated with the user account of the third application, so the current user account of the first media application is evidently associated with the current user account of the second media application as well.
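The reasoning above treats account association as transitive: two accounts both associated with the same third-party account are associated with each other. A union-find (disjoint-set) structure captures exactly that. The class name and the account identifiers in the usage below are hypothetical.

```python
class AccountAssociations:
    """Union-find over user accounts; association is treated as transitive."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, account: str) -> str:
        self._parent.setdefault(account, account)
        while self._parent[account] != account:
            # Path halving: point each visited node at its grandparent.
            self._parent[account] = self._parent[self._parent[account]]
            account = self._parent[account]
        return account

    def associate(self, a: str, b: str) -> None:
        self._parent[self._find(a)] = self._find(b)

    def are_associated(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)
```

For example, after `associate("music:ABC2005", "im:xiaoming")` and `associate("video:BCD2005", "im:xiaoming")`, the two media accounts are reported as associated even though no direct link was recorded between them.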
- The media search term pushing device may send the user identification information of the current user of the first media application to the second media application; the second media application looks up the associated user corresponding to that information and sends the user behavior data of the associated user to the media search term pushing device.
- Alternatively, the media search term pushing device may obtain information about the associated user from the first media application according to the user identification information of the current user, and then request the second media application to provide the user behavior data of that associated user.
- When the media search term pushing device does not run on the same user terminal as the first and second media applications, for example when the two media applications run on the same user terminal while the media search term pushing device is implemented in the background server of the first media application, the media search term pushing device can also request, via inter-process communication between the first media application and the second media application,
- the user behavior data of the associated user of the current user in the second media application.
- S403 in this embodiment may further include:
- S4031: Determine, according to the preset associated word segment set of the first media application, at least one piece of associated media information among the at least one piece of media information, where the associated media information contains at least one associated word segment of the first media application.
- S4032: Extract at least one weight keyword from the word segments contained in the at least one piece of associated media information, according to the segmentation frequency statistics of those word segments.
- For weight keywords, refer to S104 in the foregoing embodiment; details are not repeated here.
- S4033: Determine, according to the preset associated word segment set of the first media application, at least one media keyword among the at least one weight keyword, where a media keyword is an associated word segment in the associated word segment set of the first media application.
- The media search term pushing device may obtain, from the background server of the first media application, search behavior statistics of multiple users using the at least one media keyword in the first media application within a period of time,
- and determine the media search terms among the at least one media keyword according to both the segmentation frequency statistics and the search behavior statistics of the media keywords.
- The media search term pushing device sends the media search terms to the first media application, which displays them in its search bar so that the user can quickly enter a search term.
- These media search terms are media keywords that the user pays more attention to in another media application, so they are more likely to be used by the user as search terms in the first media application, which effectively improves the efficiency with which the user obtains information through the media application.
- The media search term pushing method of the present application can be extended to further implementation architectures.
- For example, the first media application and the second media application may run on different user terminals, and the first media application or the media search term pushing device sends a request to the second media application for the associated user's behavior data in the second media application, in order to determine the media search terms.
- Embodiments obtained through such extensions without creative effort should fall within the technical solution claimed by the present application.
- FIG. 6 is a schematic structural diagram of a media search term pushing device according to an embodiment of the present invention.
- The media search term pushing device in the embodiments of the present invention may be implemented in the same user terminal as the first media application, or may be implemented separately, for example in the background server of the first media application.
- The media search term pushing device in the embodiments of the present invention may include at least:
- a user identifier obtaining module 610, configured to obtain user identification information of the current user of the first media application.
- The user identification information may be a user login account, a bound mobile phone number, an email account, or the like.
- When the media search term pushing device is implemented on the background server of the first media application, the first media application in the user terminal may send the current user's identification information to the media search term pushing device after being started; the information may be actively reported by the first media application, or actively pulled from the first media application by the user identifier obtaining module 610 of the media search term pushing device.
- A behavior data obtaining module 620, configured to acquire, according to the user identification information, the user behavior data of the associated user of the user in the second media application, where the user behavior data includes at least one piece of media information corresponding to the user behavior of the associated user using the second media application.
- the backend server of the second media application may share users' behavior data in the second media application with the backend server of the first media application, so that the media search term pushing device can obtain, according to the current user's identification information, the behavior data of that user's associated user in the second media application.
- alternatively, the media search term pushing device requests, according to the current user's identification information, that the backend server of the second media application provide the behavior data of the user's associated user, for example through a third-party program interface provided by that backend server or through a cooperation protocol platform established by both parties, such as an instant messaging open platform or an SNS open platform; in this implementation the device only needs to provide the current user's identification information, for example an openID, and the backend server of the second media application returns the behavior data of the current user's associated user to the device.
- if the media search term pushing device, the first media application, and the second media application are implemented on the same user terminal, the device may request the associated user's behavior data directly from the second media application, or may have the first media application send an inter-process request to the second media application to obtain that data.
- the current user mentioned in the embodiments of the present invention and the current user's associated user may be, respectively, the same real person's user identity in the first media application and that person's user identity in the second media application, each of which may be represented by a user account; the accounts used by the current user and by the associated user may be the same or different, but the association between the two user identities needs to be established in advance on the backend server.
- for example, Xiao Ming's login account in the first media application is ABC2005.
- Xiao Ming's login account in the second media application is BCD2005. Xiao Ming may request an association with the first media application's account ABC2005 when creating the BCD2005 account in the second media application, or may later submit a request to associate the two login accounts while using the second media application; after receiving the request, the backend server of the second media application sends an association confirmation inquiry to the backend server of the first media application, and establishes the association between the two accounts after receiving an association confirmation message sent by the first media application logged in with ABC2005. Xiao Ming can likewise ask the backend server of the first media application to establish the association between the two login accounts; the procedure is analogous and is not described in detail in the embodiments of the present invention.
- when the media search term pushing device requests the backend server of the second media application to provide the behavior data of the user's associated user, authorization from the associated user's account in the second media application may be required: after the user, through the second media application, initiates an authorization for the first media application, the backend server of the second media application issues an authorization token to the first media application; when needed, the media search term pushing device sends the token obtained from the first media application to the backend server of the second media application, which returns the behavior data of the associated user of the first media application's current user to the media search term pushing device according to the token.
- the authorization token can be set to an expiration date, and the authorization process does not need to be repeated during the validity period.
- the user behavior data may include the associated user's browsing, playing, favoriting, sharing, downloading, or rating behaviors in the second media application, and each behavior may target a particular piece of media information; by obtaining the media information corresponding to the associated user's behaviors in the second media application, the media search term pushing device in the embodiments of the present invention can analyze the user's behavior habits, preferences, or types of interest, so as to recommend corresponding media search terms to the user in the first media application in a targeted way.
- the user behavior data may include all historical behavior records of the associated user in the second media application, or only the associated user's records within a recent period (for example, the last month or the last week).
- the keyword extraction module 630 is configured to extract at least one media keyword from the word segmentation included in the at least one media information according to the segmentation frequency statistics of the segmentation included in the at least one piece of media information.
- that is, the media search term pushing device extracts media keywords from the obtained media information corresponding to the associated user's behaviors in the second media application; this can be further divided into the following stages:
- the media search term pushing device performs text segmentation on each piece of obtained media information, for example using full-mode segmentation or search-mode segmentation, to obtain the tokens contained in the pieces of media information.
- before segmentation, the media information content can also be preprocessed, for example by filtering garbled characters and punctuation, converting traditional Chinese characters to simplified ones, and filtering stop words.
- the token frequency statistics of each token may include term frequency, document frequency, document count, or inverse document frequency, which respectively indicate how often, how many times, or how meaningfully each token occurs in the obtained media information (for example, function words such as "的", "了", "是", and "可以" should not be treated as keywords no matter how often they appear).
- At least one media keyword may be extracted from the segmentation included in the acquired media information by using a TF-IDF (Term Frequency-Inverse Document Frequency) algorithm or a TextRank document ranking algorithm.
- TF-IDF: Term Frequency-Inverse Document Frequency
- the term frequency TF of token $t_i$ in a piece of media information $d_j$ may be the number of occurrences of $t_i$ in $d_j$ divided by the total number of tokens in $d_j$:
  $$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
  where $n_{i,j}$ is the number of occurrences of the token in document $d_j$ and the denominator is the total number of occurrences of all tokens in $d_j$.
- the inverse document frequency IDF may be obtained by dividing the total number of pieces of media information by the number of pieces containing the token, and then taking the logarithm of the quotient:
  $$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
  where $|D|$ is the total number of pieces of media information and $|\{j : t_i \in d_j\}|$ is the number of pieces containing $t_i$ (that is, the pieces with $n_{i,j} \neq 0$); this measures how important a word is to a document, or to a domain document set in a corpus.
- the TF-IDF weight is
  $$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$$
  A high term frequency within a particular document combined with a low document frequency of the term across the whole document set yields a high TF-IDF weight, so filtering out words with low TF-IDF removes common words and keeps important ones.
- a preset number (for example, 3, 5, or 10) of tokens with the highest TF-IDF among the tokens of each piece of media information may be determined as media keywords.
- the importance of the word segmentation appearing in a certain media information can be sorted by the TextRank algorithm, and the most important preset number of word segments can be determined as the media keyword.
- the keyword extraction module 630 may further include as shown in FIG. 7:
- the association information filtering unit 631 is configured to determine, according to the preset associated token set of the first media application, at least one piece of associated media information among the at least one piece of media information, where the associated media information contains at least one associated token of the first media application.
- that is, before text segmentation, the association information filtering unit 631 may first screen the obtained media information for relevance: according to the preset associated token set of the first media application, it determines at least one piece of associated media information among the at least one piece of media information, the associated media information containing at least one associated token of the first media application, and excludes media information that contains no associated token as unrelated, which can effectively reduce the amount of subsequent analysis.
- the preset associated token set of the first media application may be the vocabulary of the domain to which the first media application belongs; taking an online music application as the first media application, the set may include a set of song titles, a set of artist names, a set of album titles, a set of song genre names, and the like.
- during relevance screening, token matching against the associated token set of the first media application may be performed on only part of each piece of media information, for example only checking whether its title, abstract, or keyword tags contain an associated token of the first media application without checking the rest, which can greatly reduce the processing required for relevance screening.
- the keyword extracting unit 632 is configured to extract at least one weight keyword from the word segmentation included in the at least one media information according to the word segmentation frequency statistical data of the word segment included in the at least one piece of media information.
- the associated word segment filtering unit 633 is configured to determine, according to the preset associated token set of the first media application, at least one media keyword among the at least one weight keyword, where each media keyword is an associated token in the first media application's associated token set.
- after the tokens with the highest weights or ranks are extracted as weight keywords by the TF-IDF algorithm or the TextRank document ranking algorithm, the associated word segment filtering unit 633 may further screen the weight keywords for relevance: according to the preset associated token set of the first media application, it determines at least one media keyword among the at least one weight keyword, each media keyword being an associated token in that set, so that weight keywords that are not associated tokens are excluded as unrelated and the results focus further on search terms the user is likely to use in the first media application.
- in other embodiments, only one of the association information filtering unit 631 and the associated word segment filtering unit 633 may be present.
- the search word pushing module 640 is configured to push the media search term to the first media application according to the at least one media keyword.
- the search term pushing module 640 sends the media search terms to the first media application, which displays them in the search bar so that the user can enter a search term quickly; because these media search terms are media keywords the user pays close attention to in another media application, they are also fairly likely to be the search terms the user uses in the first media application, which effectively improves the efficiency with which the user obtains information through the media application.
- the search term pushing module 640 may further include as shown in FIG. 8:
- the search data obtaining unit 641 is configured to acquire search behavior statistics data of the plurality of users using the at least one media keyword in the first media application.
- a search term determining unit 642 is configured to determine media search terms among the at least one media keyword according to the token frequency statistics of the at least one media keyword in the at least one piece of media information and the search behavior statistics of the at least one media keyword in the first media application.
- the token frequency statistics of a media keyword in the media information indicate how much attention or interest the user has in that keyword, while its search behavior statistics in the first media application indicate its search popularity there; combining the two gives a recommendation score for each media keyword, and the several media keywords with the highest recommendation scores are pushed to the first media application as media search terms.
- for example, the recommendation score may be computed as RecommScore = KeyScore(i) * qv(i) / qv_max, where KeyScore(i) is the weight score of the i-th media keyword determined from its token frequency statistics in the at least one piece of media information, for example its TF-IDF value; qv(i) is the number of times the i-th media keyword was searched in the first media application over a period of time; and qv_max is the maximum of all qv values, used for normalization so that the recommendation score does not become excessively large.
- the search word pushing unit 643 is configured to push the determined media search word to the first media application.
- the above media search term pushing device may be an electronic device such as a PC, or a portable electronic device such as a PAD, a tablet computer, or a laptop computer, and is not limited to the description herein; it may also be an electronic device formed by a server cluster, with the unit functions either merged into one entity or provided separately.
- the media search term pushing device includes at least a database for storing data and a processor for data processing, and may include a built-in storage medium or a separately provided storage medium.
- when performing processing, the processor for data processing may be implemented by a microprocessor, a central processing unit (CPU), a digital signal processor (DSP), or a programmable logic array (FPGA, Field-Programmable Gate Array).
- the storage medium stores operation instructions, which may be computer-executable code, by which the steps of the media search term pushing method in the above embodiments of the present invention, such as the flows shown in FIG. 2 or FIGS. 4-5, are implemented.
- the apparatus includes a processor 901, a storage medium 902, and at least one external communication interface 903; the processor 901, the storage medium 902, and the communication interface 903 are all connected by a bus 904.
- the processor 901 in the media search word pushing device can call the operation instruction in the storage medium 902 to execute the following process:
- obtaining, according to the user identification information, user behavior data of the user's associated user in the second media application, where the user behavior data includes at least one piece of media information corresponding to the associated user's behaviors in the second media application;
- the disclosed apparatus and method may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the unit is only a logical function division.
- there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- in addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
- the units described above as separate components may or may not be physically separated, and the components displayed as the unit may or may not be physical units, that is, may be located in one place or distributed to multiple network units; Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated into one unit;
- the unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
- all or some of the steps of the foregoing method embodiments may be implemented by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The foregoing storage medium includes a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
- ROM: read-only memory
- RAM: random access memory
- the above-described integrated unit of the present application may be stored in a computer readable storage medium if it is implemented in the form of a software function module and sold or used as a stand-alone product.
- the technical solutions of the embodiments of the present invention may be embodied, in essence, in the form of a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the methods described in the embodiments of the present invention.
- the foregoing storage medium includes various media that can store program codes, such as a mobile storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
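A media search term pushing method, comprising: obtaining user identification information of a current user of a first media application (S401); obtaining, according to the user identification information, user behavior data of an associated user of the user in a second media application, the user behavior data including at least one piece of media information corresponding to user behaviors of the associated user in the second media application (S402); extracting at least one media keyword from the tokens contained in the at least one piece of media information according to token frequency statistics of those tokens (S403); and pushing the determined media search terms to the first media application (S406).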
Description
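This application claims priority to Chinese Patent Application No. 201710135931.6, entitled "Media Search Term Pushing Method and Apparatus" and filed with the Chinese Patent Office on March 8, 2017, the entire contents of which are incorporated herein by reference.
This application relates to the field of Internet technologies, and in particular to a media search term pushing method and apparatus.
Background of the Invention
With the development of Internet technologies, people increasingly obtain information through the Internet. To shorten the process by which users obtain media information through various media applications (for example, online music applications, online news applications, online video applications, or browser applications), media applications often offer trending-search-term recommendations at the search entry. However, these trending terms are usually the high-frequency terms that users have searched through the media application over a recent period, and they are not personalized to the current user's habits or preferences. As a result, the recommended terms are rarely used and do little to improve the efficiency with which users obtain information through the media application.
Summary of the Invention
In view of this, embodiments of the present invention provide a media search term pushing method and apparatus that recommend media search terms to a user based on the user's Internet behavior data, which can effectively improve the efficiency with which the user obtains information through a media application.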
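To solve the above technical problem, an embodiment of the present invention provides a media search term pushing method, applied to a media search term pushing device, the method comprising:
obtaining user identification information of a current user of a first media application;
obtaining, according to the user identification information, user behavior data of an associated user of the user in a second media application, the user behavior data including at least one piece of media information corresponding to user behaviors of the associated user in the second media application;
extracting at least one media keyword from the tokens contained in the at least one piece of media information according to token frequency statistics of those tokens; and
pushing a media search term to the first media application according to the at least one media keyword.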
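Correspondingly, an embodiment of the present invention further provides a media search term pushing device comprising a processor and a memory, the memory storing instructions executable by the processor, wherein when executing the instructions the processor is configured to:
obtain user identification information of a current user of a first media application;
obtain, according to the user identification information, user behavior data of an associated user of the user in a second media application, the user behavior data including at least one piece of media information corresponding to user behaviors of the associated user in the second media application;
extract at least one media keyword from the tokens contained in the at least one piece of media information according to token frequency statistics of those tokens; and
push a media search term to the first media application according to the at least one media keyword.
Correspondingly, an embodiment of the present invention further provides a non-volatile computer storage medium storing a computer program for performing the above method.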
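Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of an implementation scenario of a media search term pushing method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a media search term pushing method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an implementation scenario of a media search term pushing method according to another embodiment of the present invention;
FIG. 4 is a schematic flowchart of a media search term pushing method according to another embodiment of the present invention;
FIG. 5 is a schematic flowchart of extracting media keywords according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a media search term pushing device according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a keyword extraction module according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a search term pushing module according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a hardware structure of a media search term pushing device according to an embodiment of the present invention.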
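Detailed Description of Embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.
Unless otherwise specified, the media search term pushing method in the embodiments of the present invention is performed by a media search term pushing device. A media application may be an Internet client used to obtain media information from the Internet, such as an online music application, an online news application, an online video application, or a browser application. The first media application and the second media application in the embodiments may be Internet clients with different functions: for example, if the first media application is an online music application, the second media application may be an online news application, an online video application, or a browser application; if the first media application is an online video application, the second media application may be an online music application, an online news application, or a browser application; and so on. The first and second media applications may be Internet applications with different functions that the user uses on the same user terminal or on different user terminals; these correspond to different implementation scenarios.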
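FIG. 1 is a schematic structural diagram of an implementation scenario of a media search term pushing method according to an embodiment of the present invention. As shown, in this embodiment the media search term pushing device may be implemented in the backend server 1102 of the first media application, and the flow of the method in this embodiment, shown in FIG. 2, may include:
S101: The media search term pushing device obtains user identification information of the current user of the first media application.
Specifically, after the first media application in the user terminal 1101 starts, it may send the current user's identification information to the media search term pushing device on the backend server, either by the first media application reporting it actively or by the device actively pulling it from the first media application. The user identification information may be a user login account, a bound mobile phone number, an email account, or the like.
S102-S103: According to the user identification information, the media search term pushing device obtains, from the backend server 1103 of the second media application, user behavior data of the user's associated user in the second media application, the user behavior data including at least one piece of media information corresponding to user behaviors of the associated user in the second media application.
In one embodiment, the backend server 1103 of the second media application may share users' behavior data in the second media application with the backend server 1102 of the first media application, so that the media search term pushing device can obtain, according to the current user's identification information, the behavior data of that user's associated user in the second media application. In another implementation, the device requests, according to the current user's identification information, that the backend server 1103 of the second media application provide the behavior data of the user's associated user, for example through a third-party program interface provided by the backend server 1103 or through a cooperation protocol platform established by both parties, such as an instant messaging open platform or an SNS open platform. In this implementation, the device only needs to provide the current user's identification information, for example an openID, and the backend server 1103 of the second media application returns the behavior data of the current user's associated user to the device.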
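The current user and the current user's associated user mentioned in the embodiments of the present invention may be, respectively, the user identification information of the same real person in the first media application and that person's user identification information in the second media application, each of which may be represented by a user account. The accounts used by the current user and by the associated user may be the same or different, but in either case an association between the two user identities must be established in advance on the backend server. For example, Xiao Ming's login account in the first media application is ABC2005 and his login account in the second media application is BCD2005. Xiao Ming may request an association with the first media application's account ABC2005 when creating the BCD2005 account in the second media application, or may later submit a request to associate the two login accounts while using the second media application. After receiving the request, the backend server 1103 of the second media application sends an association confirmation inquiry to the backend server 1102 of the first media application, and establishes the association between the two accounts after receiving an association confirmation message sent by the first media application logged in with ABC2005. Xiao Ming can likewise ask the backend server 1102 of the first media application to establish the association between the two login accounts; the procedure is analogous and is not repeated in the embodiments of the present invention.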
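The request-then-confirm handshake for linking two accounts can be sketched as a small state machine. This is a minimal in-memory illustration only; the class and method names (`AssociationRegistry`, `request`, `confirm`, `is_associated`) are hypothetical and not part of the described system, which keeps these records on the backend servers.

```python
from dataclasses import dataclass, field


@dataclass
class AssociationRegistry:
    """Hypothetical in-memory stand-in for the account associations kept by the backend servers."""

    pending: dict = field(default_factory=dict)  # request id -> (first-app account, second-app account)
    links: set = field(default_factory=set)      # confirmed pairs, stored order-independently
    next_id: int = 0

    def request(self, first_app_account: str, second_app_account: str) -> int:
        """Record an association request; the backend would then send a confirmation inquiry."""
        self.next_id += 1
        self.pending[self.next_id] = (first_app_account, second_app_account)
        return self.next_id

    def confirm(self, request_id: int, confirming_account: str) -> bool:
        """Establish the link only if the named first-app account sends the confirmation."""
        first, second = self.pending[request_id]
        if confirming_account != first:
            return False
        del self.pending[request_id]
        self.links.add(frozenset((first, second)))
        return True

    def is_associated(self, account_a: str, account_b: str) -> bool:
        # Even identical account names count only after an explicit, confirmed association.
        return frozenset((account_a, account_b)) in self.links
```

With the example accounts, `request("ABC2005", "BCD2005")` leaves the pair unlinked until the first media application logged in as ABC2005 confirms; a confirmation from any other account is rejected.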
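In one embodiment, when the media search term pushing device requests the backend server 1103 of the second media application to provide the behavior data of the user's associated user, authorization from the associated user's account in the second media application may be required. After the user, through the second media application, initiates an authorization for the first media application with the second media application's backend server, the backend server 1103 issues an authorization token to the first media application. When needed, the device sends the token obtained from the first media application to the backend server 1103, which then returns to the device, according to the token, the behavior data of the associated user of the first media application's current user. The authorization token may be given a validity period, within which the authorization process does not need to be repeated.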
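A token with a validity period can be sketched as follows, assuming the second media application's backend keeps an in-memory table from token to authorizing account and expiry time. `TokenIssuer`, `fetch_behavior`, and the injectable `clock` are hypothetical names chosen for illustration; a real deployment would rely on its own authorization protocol.

```python
import secrets
import time


class TokenIssuer:
    """Hypothetical sketch of the second app's backend issuing time-limited authorization tokens."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without waiting
        self._tokens = {}       # token -> (associated account, expiry time)

    def authorize(self, associated_account: str) -> str:
        token = secrets.token_urlsafe(16)
        self._tokens[token] = (associated_account, self.clock() + self.ttl)
        return token

    def account_for(self, token: str):
        """Return the authorizing account if the token is known and unexpired, else None."""
        entry = self._tokens.get(token)
        if entry is None:
            return None
        account, expires_at = entry
        if self.clock() >= expires_at:
            del self._tokens[token]  # expired: the user must authorize again
            return None
        return account


def fetch_behavior(issuer: TokenIssuer, token: str, behavior_store: dict) -> list:
    """Return the associated user's behavior records if the presented token is still valid."""
    account = issuer.account_for(token)
    if account is None:
        raise PermissionError("token missing or expired; re-authorization required")
    return behavior_store.get(account, [])
```

Within the validity period the same token can be presented repeatedly without re-running authorization, which is the behavior the paragraph above describes.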
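The user behavior data may include the associated user's browsing, playing, favoriting, sharing, downloading, or rating behaviors in the second media application, and each behavior may target a particular piece of media information; that is, each user behavior in the behavior data corresponds to a piece of media information. By obtaining the media information corresponding to the associated user's behaviors in the second media application, the media search term pushing device in the embodiments of the present invention can analyze the user's behavior habits, preferences, or types of interest, and recommend correspondingly targeted media search terms to the user in the first media application. The user behavior data may include all historical behavior records of the associated user in the second media application, or only the associated user's records within a recent period (for example, the last month or the last week).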
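Selecting the media information behind the associated user's behaviors, optionally limited to a recent window, might look like the sketch below. The record layout (`BehaviorRecord` with `action`, `media_text`, `timestamp`) and the action labels are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical labels for the behavior types listed in the text.
BEHAVIOR_TYPES = {"browse", "play", "favorite", "share", "download", "rate"}


@dataclass(frozen=True)
class BehaviorRecord:
    action: str          # one of BEHAVIOR_TYPES
    media_text: str      # text of the media information the action targets
    timestamp: datetime


def media_info_in_window(records, now, window=None):
    """Return media information for recognized behaviors, optionally only within `window`.

    window=None keeps the full history; timedelta(days=30) keeps roughly the last month.
    """
    result = []
    for r in records:
        if r.action not in BEHAVIOR_TYPES:
            continue  # ignore events that are not media behaviors (e.g. logins)
        if window is not None and now - r.timestamp > window:
            continue  # outside the recent period
        result.append(r.media_text)
    return result
```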
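S104: According to token frequency statistics of the tokens contained in the at least one piece of media information, the media search term pushing device extracts at least one media keyword from those tokens.
That is, the media search term pushing device analyzes the obtained media information corresponding to the associated user's behaviors in the second media application and extracts media keywords from it. This can be further divided into the following stages:
1) The media search term pushing device performs text segmentation on each piece of obtained media information, for example using full-mode segmentation or search-mode segmentation, to obtain the tokens contained in the pieces of media information. Before segmentation, the media information content may also be preprocessed, for example by filtering garbled characters and punctuation, converting traditional Chinese characters to simplified ones, and filtering stop words.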
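The preprocessing and segmentation stage can be sketched without external libraries. Full-mode and search-mode segmentation usually come from a dedicated Chinese segmentation library; as a self-contained stand-in, this sketch swaps in dictionary-based forward maximum matching, a different but classic segmentation technique. The traditional-to-simplified table and the dictionary are hypothetical and deliberately tiny.

```python
import unicodedata

STOP_WORDS = {"的", "了", "是", "可以"}   # examples named in the text
TRAD_TO_SIMP = {"聽": "听", "樂": "乐"}  # hypothetical tiny table; real systems use a full mapping


def preprocess(text: str) -> str:
    out = []
    for ch in text:
        if ch == "\ufffd":  # replacement character left behind by garbled bytes
            continue
        cat = unicodedata.category(ch)
        if cat.startswith("P") or cat.startswith("C"):
            out.append(" ")  # punctuation and control characters become separators
        else:
            out.append(TRAD_TO_SIMP.get(ch, ch))
    return "".join(out)


def fmm_segment(text: str, dictionary: set, max_len: int = 4) -> list:
    """Forward maximum matching: at each position take the longest dictionary word (else one char)."""
    tokens = []
    for chunk in text.split():
        i = 0
        while i < len(chunk):
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                piece = chunk[i:i + size]
                if size == 1 or piece in dictionary:
                    tokens.append(piece)
                    i += size
                    break
    return tokens


def tokenize(text: str, dictionary: set) -> list:
    """Preprocess, segment, and drop stop words."""
    return [t for t in fmm_segment(preprocess(text), dictionary) if t not in STOP_WORDS]
```

With a dictionary containing 周杰伦, 新歌, and 发布, the sentence 周杰伦的新歌发布了！ segments to 周杰伦 / 的 / 新歌 / 发布 / 了, and stop-word filtering leaves the three content words.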
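In one embodiment, before performing text segmentation on the obtained media information, the media search term pushing device may first screen the media information for relevance. Specifically, according to a preset associated token set of the first media application, it determines at least one piece of associated media information among the at least one piece of media information, where associated media information contains at least one associated token of the first media application; media information that contains no associated token is excluded as unrelated, which effectively reduces the amount of subsequent analysis. The preset associated token set of the first media application may be the vocabulary of the domain to which the first media application belongs. Taking an online music application as the first media application, the set may include a set of song titles, a set of artist names, a set of album titles, a set of song genre names, and the like. Furthermore, during relevance screening, token matching against the preset associated token set may be performed on only part of each piece of media information, for example only checking whether the title, abstract, or keyword tags contain an associated token of the first media application, without checking the rest of the content; this can greatly reduce the processing required for relevance screening.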
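Relevance screening on only the title, abstract, and keyword tags can be sketched as below. The dictionary field names and the plain substring matching are assumptions for illustration; a production system might match against segmented tokens instead.

```python
def filter_associated(items, associated_tokens, fields=("title", "abstract", "tags")):
    """Keep items whose title, abstract, or tags mention at least one associated token.

    Other fields (e.g. a hypothetical "body") are deliberately not inspected, to keep screening cheap.
    """
    kept = []
    for item in items:
        parts = []
        for f in fields:
            value = item.get(f, "")
            # Tags may be a list of strings; title and abstract are plain strings.
            parts.extend(value if isinstance(value, (list, tuple)) else [value])
        head = " ".join(parts)
        if any(tok in head for tok in associated_tokens):
            kept.append(item)
    return kept
```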
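2) Obtain token frequency statistics for each token contained in the media information. Specifically, the statistics for each token may include term frequency, document frequency, document count, or inverse document frequency, which respectively indicate how often, how many times, or how meaningfully each token occurs in the obtained media information (for example, function words such as "的", "了", "是", and "可以" should not be treated as keywords no matter how often they appear).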
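The per-token statistics can be computed from already-tokenized media information as in this sketch (the function name is hypothetical). It also shows why a function word is not a keyword: a token that occurs in every document gets an IDF of log(1) = 0.

```python
import math
from collections import Counter


def frequency_stats(docs):
    """docs: list of token lists. Returns per-document term counts, document frequency, and IDF."""
    term_counts = [Counter(doc) for doc in docs]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())  # each document counts at most once per token
    n_docs = len(docs)
    idf = {t: math.log(n_docs / df) for t, df in doc_freq.items()}
    return term_counts, doc_freq, idf
```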
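3) Extract media keywords according to the token frequency statistics of the tokens contained in the pieces of media information.
In one embodiment, at least one media keyword may be extracted from the tokens contained in the obtained media information by the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm or by the TextRank document ranking algorithm.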
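Taking the TF-IDF algorithm as an example, the term frequency TF of a given token $t_i$ in a piece of media information $d_j$ may be the number of times the token occurs in $d_j$ divided by the total number of tokens in $d_j$:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the token in document $d_j$, and the denominator is the total number of occurrences of all tokens in $d_j$. The inverse document frequency IDF may be obtained by dividing the total number of pieces of media information by the number of pieces containing the token, and then taking the logarithm of the quotient:
$$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $|D|$ is the total number of pieces of media information and $|\{j : t_i \in d_j\}|$ is the number of pieces containing $t_i$ (that is, the number of pieces with $n_{i,j} \neq 0$). This measures how important a word is to a document, or to a domain document set within a corpus.
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$$
A high term frequency within a particular document, combined with a low document frequency of that term across the whole document set, yields a high TF-IDF weight. Therefore, filtering out words with low TF-IDF removes common words and keeps important ones. In the embodiments of the present invention, a preset number (for example 3, 5, or 10) of tokens with the highest TF-IDF in each piece of media information may be determined as media keywords.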
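Similarly, the TextRank algorithm may be used to rank the importance of the tokens appearing in a piece of media information, and a preset number of the most important tokens may be determined as media keywords.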
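A compact TextRank sketch over a single document's tokens follows: tokens that co-occur within a sliding window are linked in an undirected graph, and a PageRank-style update with damping 0.85 scores them. The window size, damping factor, and iteration count are common defaults assumed here, not values given in the text.

```python
from collections import defaultdict


def textrank_keywords(tokens, k=3, window=2, damping=0.85, iterations=50):
    """Rank one document's tokens with TextRank over a co-occurrence graph.

    window=2 links each token to the next one; larger windows link more distant tokens.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    score = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Each node shares its score equally among its neighbors.
        score = {
            w: (1 - damping) + damping * sum(score[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    ranked = sorted(score, key=lambda w: (-score[w], w))
    return ranked[:k]
```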
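In one embodiment, after the several tokens with the highest weights or ranks are extracted as weight keywords by the TF-IDF algorithm or the TextRank document ranking algorithm according to the token frequency statistics of the tokens contained in the pieces of media information, the media search term pushing device may further screen the weight keywords for relevance. Specifically, according to the preset associated token set of the first media application, it determines at least one media keyword among the at least one weight keyword, the media keywords being associated tokens in the first media application's associated token set; weight keywords that are not associated tokens are excluded as unrelated, which further focuses the results on search terms the user is likely to use in the first media application.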
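Screening weight keywords against the first media application's associated token set is a set-membership filter. The sets below are hypothetical examples of the song-title, artist, album, and genre vocabularies mentioned for an online music application.

```python
# Hypothetical domain vocabulary for an online music application.
SONG_TITLES = {"七里香", "晴天"}
ARTISTS = {"周杰伦"}
ALBUMS = {"叶惠美"}
GENRES = {"流行", "摇滚"}
ASSOCIATED_TOKENS = SONG_TITLES | ARTISTS | ALBUMS | GENRES


def screen_weight_keywords(weight_keywords, associated_tokens):
    """Keep weight keywords that are associated tokens, dropping duplicates and preserving order."""
    seen = set()
    media_keywords = []
    for kw in weight_keywords:
        if kw in associated_tokens and kw not in seen:
            seen.add(kw)
            media_keywords.append(kw)
    return media_keywords
```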
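S105: The media search term pushing device pushes media search terms to the first media application according to the at least one media keyword.
In this embodiment, the media search term pushing device sends all or some of the determined media keywords to the first media application as media search terms, and the first media application displays them in the search bar so that the user can enter a search term quickly. Because these media search terms are media keywords the user pays close attention to in another media application, they are also fairly likely to be the search terms the user uses in the first media application, which effectively improves the efficiency with which the user obtains information through that media application.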
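Further, in one embodiment, after extracting the at least one media keyword, the media search term pushing device may obtain search behavior statistics of multiple users searching with the at least one media keyword in the first media application, determine media search terms among the at least one media keyword according to both the keywords' token frequency statistics in the at least one piece of media information and their search behavior statistics in the first media application, and push the determined media search terms to the first media application. The token frequency statistics of a media keyword in the media information indicate how much attention or interest the user has in that keyword, while its search behavior statistics in the first media application indicate its search popularity there. Combining the two gives a recommendation score for each media keyword, and the several media keywords with the highest scores are pushed to the first media application as media search terms. For example, the recommendation score may be computed as RecommScore = KeyScore(i) * qv(i) / qv_max, where KeyScore(i) is the weight score of the i-th media keyword determined from its token frequency statistics in the at least one piece of media information, for example its TF-IDF value; qv(i) is the number of times the i-th media keyword was searched in the first media application over a period of time; and qv_max is the maximum of all qv values, used for normalization so that the recommendation score does not become excessively large.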
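The recommendation score RecommScore = KeyScore(i) * qv(i) / qv_max can be sketched as below. Two choices here are assumptions rather than statements of the text: qv_max is taken over the candidate media keywords only, and an empty result is returned when none of them has any search history (the formula would otherwise divide by zero).

```python
def recommend(key_scores: dict, search_counts: dict, top_n: int = 3) -> list:
    """Rank media keywords by KeyScore(i) * qv(i) / qv_max and return the top_n (keyword, score) pairs.

    key_scores:    keyword -> weight score, e.g. its TF-IDF value (KeyScore)
    search_counts: keyword -> searches in the first media application over a period (qv)
    """
    qv_max = max(search_counts.get(k, 0) for k in key_scores) if key_scores else 0
    if qv_max == 0:
        return []  # assumption: no search history means nothing can be normalized
    scores = {k: s * search_counts.get(k, 0) / qv_max for k, s in key_scores.items()}
    ranked = sorted(scores, key=lambda k: (-scores[k], k))
    return [(k, scores[k]) for k in ranked[:top_n]]
```

For example, a keyword with a modest weight score (0.4) but the highest search count can outrank one with a higher weight score (0.5) and half the searches, because the normalized search factor scales the score down for the less popular term.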
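By analyzing the associated user's behavior data in the second media application, the media search term pushing device in the embodiments of the present invention extracts media search terms from the media information corresponding to that user's behaviors and sends them to the first media application. Because these media search terms are media keywords the user pays close attention to in another media application, they are also fairly likely to be the search terms the user uses in the first media application, which effectively improves the efficiency with which the user obtains information through that media application.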
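FIG. 3 is a schematic structural diagram of an implementation scenario of a media search term pushing method according to another embodiment of the present invention. In this embodiment, the media search term pushing device 1201, the first media application 1202, and the second media application 1203 all run on the same user terminal. As shown, the flow of the method in this embodiment, shown in FIG. 4, may include:
S401: The media search term pushing device obtains user identification information of the current user of the first media application.
S402: According to the user identification information, obtain the user behavior data of the user's associated user in the second media application, the user behavior data including at least one piece of media information corresponding to user behaviors of the associated user in the second media application.
Unlike the scenario structure of FIG. 1, in this embodiment the media search term pushing device may obtain the associated user's behavior data in the second media application from the second media application on the same user terminal. That behavior data may be stored in a designated local directory of the second media application, or recorded on the second media application's backend server, in which case the second media application retrieves it from its backend server and hands it to the media search term pushing device.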
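The current user and the current user's associated user mentioned in the embodiments of the present invention may be, respectively, the user identity of the same real person in the first media application and that person's user identity in the second media application, each of which may be represented by a user account. The accounts used by the current user and by the associated user may be the same or different, and the association between the two user identities may be established in advance on the backend server of either media application. For example, Xiao Ming's login account in the first media application is ABC2005 and his login account in the second media application is BCD2005. Xiao Ming may request an association with the first media application's account ABC2005 when creating the BCD2005 account in the second media application, or may later submit a request to associate the two login accounts while using the second media application. After receiving the request, the backend server of the second media application sends an association confirmation inquiry to the backend server of the first media application, and establishes the association between the two accounts after receiving an association confirmation message sent by the first media application logged in with ABC2005. Xiao Ming can likewise ask the backend server of the first media application to establish the association; the procedure is analogous and is not repeated in the embodiments of the present invention. In this embodiment, the first and second media applications may launch each other, or may both be launched by the same third-party application. That is, if the user launches the second media application while using the first, or launches the first while using the second, the current account of the first media application is evidently associated with the current account of the second media application. Likewise, if the user launches both the first and second media applications while using a third application (for example an instant messaging application or an SNS application), the current accounts of the first and second media applications are each associated with the third application's account, and are therefore evidently associated with each other.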
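In other embodiments, the media search term pushing device may send the current user's identification information in the first media application to the second media application, which looks up the associated user corresponding to that identification information and sends the associated user's behavior data to the device. In another embodiment, the device may obtain the associated user's information from the first media application according to the current user's identification information, and then request that the second media application provide that associated user's behavior data.
Further, in other scenario structures, the media search term pushing device may not run on the same user terminal as the first and second media applications; for example, the first and second media applications may run on the same user terminal while the device is implemented on the first media application's backend server. In that case the device may rely on inter-process communication between the two applications: the first media application requests, from the second media application, the current user's associated user's behavior data in the second media application.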
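S403: According to token frequency statistics of the tokens contained in the at least one piece of media information, extract at least one media keyword from those tokens.
As shown in FIG. 5, S403 in this embodiment may further include:
S4031: According to the preset associated token set of the first media application, determine at least one piece of associated media information among the at least one piece of media information, where associated media information contains at least one associated token of the first media application.
The preset associated token set of the first media application may be the vocabulary of the domain to which the first media application belongs. Taking an online music application as the first media application, the set may include a set of song titles, a set of artist names, a set of album titles, a set of song genre names, and the like.
S4032: According to token frequency statistics of the tokens contained in the at least one piece of associated media information, extract at least one weight keyword from those tokens. For how weight keywords are extracted, refer to S104 in the foregoing embodiment; details are not repeated here.
S4033: According to the preset associated token set of the first media application, determine at least one media keyword among the at least one weight keyword, the media keywords being associated tokens in the first media application's associated token set.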
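S404: Obtain search behavior statistics of multiple users searching with the at least one media keyword in the first media application.
In the scenario structure of this embodiment, the media search term pushing device may obtain, from the first media application's backend server, search behavior statistics of multiple users searching with the at least one media keyword in the first media application over a period of time.
S405: Determine media search terms among the at least one media keyword according to the token frequency statistics of the at least one media keyword in the at least one piece of media information and the search behavior statistics of the at least one media keyword in the first media application.
The token frequency statistics of a media keyword in the media information indicate how much attention or interest the user has in that keyword, while its search behavior statistics in the first media application indicate its search popularity there. Combining the two gives a recommendation score for each media keyword, and the several media keywords with the highest scores are pushed to the first media application as media search terms. For example, the recommendation score may be computed as RecommScore = KeyScore(i) * qv(i) / qv_max, where KeyScore(i) is the weight score of the i-th media keyword determined from its token frequency statistics in the at least one piece of media information, for example its TF-IDF value; qv(i) is the number of times the i-th media keyword was searched in the first media application over a period of time; and qv_max is the maximum of all qv values, used for normalization so that the recommendation score does not become excessively large.
S406: Push the determined media search terms to the first media application.
In this embodiment, the media search term pushing device sends the media search terms to the first media application, which displays them in the search bar so that the user can enter a search term quickly. Because these media search terms are media keywords the user pays close attention to in another media application, they are also fairly likely to be the search terms the user uses in the first media application, which effectively improves the efficiency with which the user obtains information through that media application.
It should be noted that the above describes the implementation of the media search term pushing method only in two exemplary scenario architectures. Based on this description, the method of this application can be extended to more scenario architectures; for example, the first and second media applications may run on different user terminals, and the first media application or the media search term pushing device may send the second media application a request for the associated user's behavior data in the second media application in order to determine media search terms. Embodiments obtained through such extensions without creative effort all fall within the technical solutions claimed in this application.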
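FIG. 6 is a schematic structural diagram of the media search term pushing device according to an embodiment of the present invention. The device may be implemented on the same user terminal as the first media application, implemented separately, or implemented on the first media application's backend server side. As shown, the device in this embodiment may include at least:
a user identifier obtaining module 610, configured to obtain user identification information of the current user of the first media application.
Specifically, the user identification information may be a user login account, a bound mobile phone number, an email account, or the like. If the media search term pushing device is implemented on the first media application's backend server, the first media application in the user terminal may send the current user's identification information to the device after starting, either with the first media application reporting it actively or with the user identifier obtaining module 610 of the device actively pulling it from the first media application.
a behavior data obtaining module 620, configured to obtain, according to the user identification information, user behavior data of the user's associated user in the second media application, the user behavior data including at least one piece of media information corresponding to user behaviors of the associated user in the second media application.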
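In one embodiment, if the media search term pushing device is implemented on the first media application's backend server, the second media application's backend server may share users' behavior data in the second media application with the first media application's backend server, so that the device can obtain, according to the current user's identification information, the behavior data of that user's associated user in the second media application. In another implementation, the device requests, according to the current user's identification information, that the second media application's backend server provide the behavior data of the user's associated user, for example through a third-party program interface provided by that backend server or through a cooperation protocol platform established by both parties, such as an instant messaging open platform or an SNS open platform; in this implementation the device only needs to provide the current user's identification information, for example an openID, and the second media application's backend server returns the behavior data of the current user's associated user to the device. If the device, the first media application, and the second media application are implemented on the same user terminal, the device may request the associated user's behavior data directly from the second media application, or may have the first media application send an inter-process request to the second media application to obtain it.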
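The current user and the current user's associated user mentioned in the embodiments of the present invention may be, respectively, the user identity of the same real person in the first media application and that person's user identity in the second media application, each of which may be represented by a user account. The accounts used by the current user and by the associated user may be the same or different, but in either case an association between the two user identities must be established in advance on the backend server. For example, Xiao Ming's login account in the first media application is ABC2005 and his login account in the second media application is BCD2005. Xiao Ming may request an association with the first media application's account ABC2005 when creating the BCD2005 account in the second media application, or may later submit a request to associate the two login accounts while using the second media application. After receiving the request, the backend server of the second media application sends an association confirmation inquiry to the backend server of the first media application, and establishes the association between the two accounts after receiving an association confirmation message sent by the first media application logged in with ABC2005. Xiao Ming can likewise ask the backend server of the first media application to establish the association; the procedure is analogous and is not repeated in the embodiments of the present invention.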
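In one embodiment, when the media search term pushing device requests the second media application's backend server to provide the behavior data of the user's associated user, authorization from the associated user's account in the second media application may be required. After the user, through the second media application, initiates an authorization for the first media application with the second media application's backend server, that backend server issues an authorization token to the first media application. When needed, the device sends the token obtained from the first media application to the second media application's backend server, which then returns to the device, according to the token, the behavior data of the associated user of the first media application's current user. The authorization token may be given a validity period, within which the authorization process does not need to be repeated.
The user behavior data may include the associated user's browsing, playing, favoriting, sharing, downloading, or rating behaviors in the second media application, and each behavior may target a particular piece of media information; that is, each user behavior in the behavior data corresponds to a piece of media information. By obtaining the media information corresponding to the associated user's behaviors in the second media application, the media search term pushing device in the embodiments of the present invention can analyze the user's behavior habits, preferences, or types of interest, and recommend correspondingly targeted media search terms to the user in the first media application. The user behavior data may include all historical behavior records of the associated user in the second media application, or only the associated user's records within a recent period (for example, the last month or the last week).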
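a keyword extraction module 630, configured to extract at least one media keyword from the tokens contained in the at least one piece of media information according to token frequency statistics of those tokens.
That is, the media search term pushing device analyzes the obtained media information corresponding to the associated user's behaviors in the second media application and extracts media keywords from it. This can be further divided into the following stages:
1) The media search term pushing device performs text segmentation on each piece of obtained media information, for example using full-mode segmentation or search-mode segmentation, to obtain the tokens contained in the pieces of media information. Before segmentation, the media information content may also be preprocessed, for example by filtering garbled characters and punctuation, converting traditional Chinese characters to simplified ones, and filtering stop words.
2) Obtain token frequency statistics for each token contained in the media information. Specifically, the statistics for each token may include term frequency, document frequency, document count, or inverse document frequency, which respectively indicate how often, how many times, or how meaningfully each token occurs in the obtained media information (for example, function words such as "的", "了", "是", and "可以" should not be treated as keywords no matter how often they appear).
3) Extract media keywords according to the token frequency statistics of the tokens contained in the pieces of media information.
In one embodiment, at least one media keyword may be extracted from the tokens contained in the obtained media information by the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm or by the TextRank document ranking algorithm.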
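Taking the TF-IDF algorithm as an example, the term frequency TF of a given token $t_i$ in a piece of media information $d_j$ may be the number of times the token occurs in $d_j$ divided by the total number of tokens in $d_j$:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the token in document $d_j$, and the denominator is the total number of occurrences of all tokens in $d_j$. The inverse document frequency IDF may be obtained by dividing the total number of pieces of media information by the number of pieces containing the token, and then taking the logarithm of the quotient:
$$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $|D|$ is the total number of pieces of media information and $|\{j : t_i \in d_j\}|$ is the number of pieces containing $t_i$ (that is, the number of pieces with $n_{i,j} \neq 0$). This measures how important a word is to a document, or to a domain document set within a corpus.
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$$
A high term frequency within a particular document, combined with a low document frequency of that term across the whole document set, yields a high TF-IDF weight. Therefore, filtering out words with low TF-IDF removes common words and keeps important ones. In the embodiments of the present invention, a preset number (for example 3, 5, or 10) of tokens with the highest TF-IDF in each piece of media information may be determined as media keywords.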
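Likewise, the TextRank algorithm may be used to rank the terms appearing in a piece of media information by importance, and a preset number of the most important terms are determined as media keywords.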
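The TextRank variant can be sketched as follows, assuming the common formulation in which terms co-occurring within a small window are linked in an undirected graph and scored with a PageRank-style iteration using damping factor 0.85. The window size, iteration count and the helper name `textrank_keywords` are assumptions; the document does not fix these parameters.

```python
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(tokens: List[str], window: int = 2, damping: float = 0.85,
                      iterations: int = 50, top_k: int = 3) -> List[str]:
    """Rank the terms of one segmented document with TextRank.

    Terms within `window` consecutive positions are linked by an undirected
    edge; scores follow S(v) = (1 - d) + d * sum_{u in N(v)} S(u) / |N(u)|.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    scores = {t: 1.0 for t in graph}
    for _ in range(iterations):
        # Synchronous update: every new score is computed from the previous round.
        scores = {v: (1 - damping) + damping * sum(scores[u] / len(graph[u]) for u in graph[v])
                  for v in graph}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


tokens = ["album", "jay", "album", "chou", "album", "hit"]
print(textrank_keywords(tokens, top_k=1))  # ['album']
```

"album" is linked to every other term here, so it ends up with the highest score.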
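In one embodiment, as shown in FIG. 7, the keyword extraction module 630 may further include:
An associated-information filtering unit 631, configured to determine at least one piece of associated media information from the at least one piece of media information according to a preset associated term set of the first media application, where the associated media information contains at least one associated term of the first media application.
That is, before text segmentation of the obtained media information, the associated-information filtering unit 631 may first screen it for relevance: according to the preset associated term set of the first media application, it determines at least one piece of associated media information containing at least one associated term of the first media application, and excludes the media information that contains no associated term as unassociated. This effectively reduces the amount of subsequent analysis and computation. The preset associated term set may be the vocabulary of the domain in which the first media application operates. Taking an online music application as the first media application, the set may include a set of song titles, a set of singer names, a set of album names, a set of song genre names, and so on. Further, during relevance screening, term matching may be performed on only part of each piece of media information, for example checking only whether its title, abstract or keyword tags contain an associated term of the first media application, without examining the rest of it, which greatly reduces the processing load of relevance screening.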
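Relevance screening on only the title, abstract and keyword tags can be sketched as below. The `MediaInfo` fields and the plain substring test are assumptions for illustration; substring matching suits unsegmented Chinese text but may over-match in other languages.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class MediaInfo:
    """Hypothetical media information item; only title, abstract and tags are screened."""
    title: str
    abstract: str = ""
    tags: List[str] = field(default_factory=list)
    body: str = ""   # not inspected during relevance screening


def filter_associated(items: List[MediaInfo], associated_terms: Set[str]) -> List[MediaInfo]:
    """Keep items whose title, abstract or tags contain at least one associated term."""
    def is_associated(item: MediaInfo) -> bool:
        screened = [item.title, item.abstract, *item.tags]
        return any(term in text for text in screened for term in associated_terms)
    return [item for item in items if is_associated(item)]


music_terms = {"Jay Chou", "Adele", "album"}   # hypothetical associated term set
items = [
    MediaInfo("Jay Chou announces tour"),
    MediaInfo("Stock market update", tags=["finance"]),
    MediaInfo("Weekend picks", tags=["album"]),
]
print([m.title for m in filter_associated(items, music_terms)])
# ['Jay Chou announces tour', 'Weekend picks']
```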
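A keyword extraction unit 632, configured to extract at least one weighted keyword from the terms contained in the at least one piece of media information according to the term frequency statistics of those terms.
An associated-term filtering unit 633, configured to determine at least one media keyword from the at least one weighted keyword according to the preset associated term set of the first media application, where each media keyword is an associated term in the associated term set of the first media application.
After the terms with the highest weights or rankings have been extracted as weighted keywords by the TF-IDF or TextRank algorithm, based on the term frequency statistics of the terms in the media information, the associated-term filtering unit 633 may further screen the weighted keywords for relevance: according to the preset associated term set of the first media application, it determines at least one media keyword that is an associated term in that set, and excludes the weighted keywords that are not associated terms. This further focuses the result on search terms the user is likely to use in the first media application.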
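The screening by unit 633 amounts to a membership test against the associated term set. This hypothetical helper keeps the weight of each surviving keyword so that it can be reused as KeyScore when recommendation scores are computed later.

```python
from typing import Dict, Set


def filter_weighted_keywords(weighted: Dict[str, float],
                             associated_terms: Set[str]) -> Dict[str, float]:
    """Keep only weighted keywords that belong to the associated term set.

    The survivors are the media keywords; their weights and their original
    order are preserved.
    """
    return {kw: w for kw, w in weighted.items() if kw in associated_terms}


weighted = {"jay chou": 0.55, "weather": 0.41, "album": 0.20}   # e.g. TF-IDF weights
music_terms = {"jay chou", "album", "adele"}                    # hypothetical term set
print(filter_weighted_keywords(weighted, music_terms))          # {'jay chou': 0.55, 'album': 0.2}
```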
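It should be noted that, in other embodiments, only one of the associated-information filtering unit 631 and the associated-term filtering unit 633 may be present.
A search term pushing module 640, configured to push media search terms to the first media application according to the at least one media keyword.
In this embodiment, the search term pushing module 640 sends the media search terms to the first media application, which displays them in its search bar so that the user can enter a search term quickly. Because these media search terms are media keywords that the user follows closely in another media application, they are also fairly likely to be search terms the user would use in the first media application, which effectively improves the efficiency with which the user obtains information through that application.
Further, in one embodiment, as shown in FIG. 8, the search term pushing module 640 may further include: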
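A search data obtaining unit 641, configured to obtain statistics on the search behavior of multiple users using the at least one media keyword in the first media application.
A search term determining unit 642, configured to determine media search terms from among the at least one media keyword according to the term frequency statistics of the at least one media keyword in the at least one piece of media information and the search behavior statistics of the at least one media keyword in the first media application.
The term frequency statistics of a media keyword in the media information indicate how much attention or interest the user pays to that keyword, while its search behavior statistics in the first media application indicate how popular it is as a search there. Combining the two yields a recommendation score for each media keyword, and the media keywords with the highest recommendation scores are pushed to the first media application as media search terms.
For example, the recommendation score may be calculated as:

$$\mathrm{RecommScore}(i) = \mathrm{KeyScore}(i) \times \frac{qv(i)}{qv_{\max}}$$

where KeyScore(i) is the weight score of the i-th media keyword determined from its term frequency statistics in the at least one piece of media information (for example, its TF-IDF value), qv(i) is the number of times the i-th media keyword was searched in the first media application over a period of time, and qv_max is the largest of all qv values. qv_max serves as a normalizer, keeping qv(i)/qv_max between 0 and 1 so that the recommendation score does not become excessively large.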
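The recommendation score can be sketched as follows. `key_scores` holds KeyScore(i) (for example TF-IDF values) and `query_volume` holds qv(i); both names are hypothetical, qv_max is taken over the candidate keywords, and the fallback used when no search statistics exist is an assumption not stated in the document.

```python
from typing import Dict, List


def recommend_search_terms(key_scores: Dict[str, float],
                           query_volume: Dict[str, int],
                           top_n: int = 3) -> List[str]:
    """Rank media keywords by RecommScore(i) = KeyScore(i) * qv(i) / qv_max."""
    qv_max = max((query_volume.get(k, 0) for k in key_scores), default=0)
    if qv_max == 0:
        # Assumption: with no search statistics, rank by KeyScore alone.
        scores = dict(key_scores)
    else:
        # Keywords never searched in the first application get qv(i) = 0.
        scores = {k: s * query_volume.get(k, 0) / qv_max for k, s in key_scores.items()}
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]


key_scores = {"jay chou": 0.5, "album": 0.2, "tour": 0.4}   # e.g. TF-IDF values
qv = {"jay chou": 1000, "album": 5000, "tour": 100}          # searches over some period
print(recommend_search_terms(key_scores, qv, top_n=2))       # ['album', 'jay chou']
```

Here "album" overtakes "jay chou" (0.2 x 5000/5000 = 0.2 versus 0.5 x 1000/5000 = 0.1) because it is searched far more often in the first application.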
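A search term pushing unit 643, configured to push the determined media search terms to the first media application.
It should be pointed out that the media search term pushing apparatus may be an electronic device such as a PC, or a portable electronic device such as a PAD, tablet computer or laptop, without being limited to the devices listed here. It may also be an electronic device formed by a server cluster, in which the functions of the units are merged into one entity or distributed across separate entities. The media search term pushing apparatus includes at least a database for storing data and a processor for processing data, and may include a built-in or a separately provided storage medium.
The processor for data processing may be implemented by a microprocessor, a central processing unit (CPU), a digital signal processor (DSP) or a field-programmable gate array (FPGA). The storage medium contains operation instructions, which may be computer-executable code; these instructions implement the steps of the media search term pushing method flows of the embodiments of the present invention shown in FIG. 2 or FIGS. 4-5.
An example of the media search term pushing apparatus as a hardware entity is shown in FIG. 9. The apparatus includes a processor 901, a storage medium 902 and at least one external communication interface 903, all connected via a bus 904.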
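The processor 901 of the media search term pushing apparatus may invoke the operation instructions in the storage medium 902 to perform the following process:
obtaining user identification information of the current user of a first media application;
obtaining, according to the user identification information, user behavior data of the user's associated user in a second media application, the user behavior data including at least one piece of media information corresponding to the associated user's behavior in the second media application;
extracting at least one media keyword from the terms contained in the at least one piece of media information according to the term frequency statistics of those terms;
pushing media search terms to the first media application according to the at least one media keyword.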
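For orientation, the four steps executed by processor 901 can be strung together in one compact sketch. Everything here is hypothetical: `account_links` stands in for the pre-established account association, `behavior_data` for the second application's behavior store (already segmented), and TF-IDF with per-term max-merging across pieces of media information stands in for keyword extraction. Relevance filtering and search-popularity weighting are omitted for brevity.

```python
import math
from collections import Counter
from typing import Dict, List


def push_search_terms(user_id: str,
                      account_links: Dict[str, str],
                      behavior_data: Dict[str, List[List[str]]],
                      top_k: int = 2) -> List[str]:
    """Hypothetical end-to-end flow: user ID -> behavior data -> keywords -> search terms."""
    # Step 1 is the user_id argument (user identification information).
    # Step 2: look up the associated account and its segmented media information.
    linked = account_links.get(user_id)
    docs = behavior_data.get(linked, []) if linked else []
    if not docs:
        return []
    # Step 3: TF-IDF per piece of media information, keeping each term's best score.
    df = Counter(t for doc in docs for t in set(doc))
    best: Dict[str, float] = {}
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        for t, c in counts.items():
            score = (c / total) * math.log(len(docs) / df[t])
            best[t] = max(best.get(t, 0.0), score)
    # Step 4: the highest-weighted keywords become the pushed search terms.
    return sorted(best, key=lambda t: (-best[t], t))[:top_k]


links = {"ABC2005": "BCD2005"}
data = {"BCD2005": [["jay", "chou", "album"], ["album", "tour", "tour"]]}
print(push_search_terms("ABC2005", links, data))  # ['tour', 'chou']
print(push_search_terms("unknown", links, data))  # []
```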
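It should be noted that the above description of the media search term pushing apparatus is similar to the description of the method above and has the same beneficial effects, which are not repeated here. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling or communication connections between the components shown or discussed may be indirect coupling or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may serve separately as one unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Alternatively, if the integrated unit of the present application is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a removable storage device, ROM, RAM, a magnetic disk or an optical disc.
The foregoing are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.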
Claims (25)
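- A media search term pushing method, applied to a media search term pushing apparatus, characterized in that the method comprises: obtaining user identification information of a current user of a first media application; obtaining, according to the user identification information, user behavior data of an associated user of the user in a second media application, the user behavior data comprising at least one piece of media information corresponding to user behavior of the associated user in the second media application; extracting at least one media keyword from terms contained in the at least one piece of media information according to term frequency statistics of the terms; and pushing a media search term to the first media application according to the at least one media keyword.
- The media search term pushing method according to claim 1, characterized in that the pushing a media search term to the first media application according to the at least one media keyword comprises: obtaining search behavior statistics of multiple users using the at least one media keyword in the first media application; determining a media search term from among the at least one media keyword according to the term frequency statistics of the at least one media keyword in the at least one piece of media information and the search behavior statistics of the at least one media keyword in the first media application; and pushing the determined media search term to the first media application.
- The media search term pushing method according to claim 1, characterized in that the extracting at least one media keyword from the terms contained in the at least one piece of media information according to the term frequency statistics of the terms comprises: determining at least one piece of associated media information from the at least one piece of media information according to a preset associated term set of the first media application, the associated media information containing at least one associated term of the first media application; and extracting at least one media keyword from the terms contained in the at least one piece of associated media information according to term frequency statistics of those terms.
- The media search term pushing method according to claim 3, characterized in that the associated media information containing at least one associated term of the first media application means that the title, abstract or keyword tags of the associated media information contain at least one associated term of the first media application.
- The media search term pushing method according to claim 1, characterized in that the extracting at least one media keyword from the terms contained in the at least one piece of media information according to the term frequency statistics of the terms comprises: extracting at least one weighted keyword from the terms contained in the at least one piece of media information according to the term frequency statistics of the terms; and determining at least one media keyword from the at least one weighted keyword according to a preset associated term set of the first media application, the media keyword being an associated term in the associated term set of the first media application.
- The media search term pushing method according to claim 1, characterized in that the term frequency statistics comprise term frequency-inverse document frequency.
- The media search term pushing method according to claim 1, characterized in that the first media application is an online music application.
- The media search term pushing method according to claim 1, characterized in that the second media application is an online news application, an online video application or a browser application.
- The media search term pushing method according to claim 1, characterized in that, before the obtaining, according to the user identification information, user behavior data of an associated user of the user in a second media application, the method further comprises: establishing in advance an association between the user identification information of the current user of the first media application and the user identification information of the user's associated user in the second media application.
- The media search term pushing method according to claim 1, characterized in that the user identification information of the current user of the first media application is a first user account, the user identification information of the user's associated user in the second media application is a second user account, and the first user account and the second user account are the same or different.
- The media search term pushing method according to claim 10, characterized in that the establishing in advance an association between the user identification information of the current user of the first media application and the user identification information of the user's associated user in the second media application comprises: establishing the association between the first user account and the second user account when the second user account is created.
- The media search term pushing method according to claim 10, characterized in that the establishing in advance an association between the user identification information of the current user of the first media application and the user identification information of the user's associated user in the second media application comprises: establishing the association between the first user account and the second user account while the second media application is being used after the second user account has been created.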
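- A media search term pushing apparatus, characterized in that the apparatus comprises a processor and a memory, the memory storing instructions executable by the processor, and when executing the instructions, the processor is configured to: obtain user identification information of a current user of a first media application; obtain, according to the user identification information, user behavior data of an associated user of the user in a second media application, the user behavior data comprising at least one piece of media information corresponding to user behavior of the associated user in the second media application; extract at least one media keyword from terms contained in the at least one piece of media information according to term frequency statistics of the terms; and push a media search term to the first media application according to the at least one media keyword.
- The media search term pushing apparatus according to claim 13, characterized in that, when executing the instructions, the processor is further configured to: obtain search behavior statistics of multiple users using the at least one media keyword in the first media application; determine a media search term from among the at least one media keyword according to the term frequency statistics of the at least one media keyword in the at least one piece of media information and the search behavior statistics of the at least one media keyword in the first media application; and push the determined media search term to the first media application.
- The media search term pushing apparatus according to claim 13, characterized in that, when executing the instructions, the processor is further configured to: determine at least one piece of associated media information from the at least one piece of media information according to a preset associated term set of the first media application, the associated media information containing at least one associated term of the first media application; and extract at least one media keyword from the terms contained in the at least one piece of associated media information according to term frequency statistics of those terms.
- The media search term pushing apparatus according to claim 15, characterized in that the associated media information containing at least one associated term of the first media application means that the title, abstract or keyword tags of the associated media information contain at least one associated term of the first media application.
- The media search term pushing apparatus according to claim 13, characterized in that, when executing the instructions, the processor is further configured to: extract at least one weighted keyword from the terms contained in the at least one piece of media information according to the term frequency statistics of the terms; and determine at least one media keyword from the at least one weighted keyword according to a preset associated term set of the first media application, the media keyword being an associated term in the associated term set of the first media application.
- The media search term pushing apparatus according to claim 13, characterized in that the term frequency statistics comprise term frequency-inverse document frequency.
- The media search term pushing apparatus according to claim 13, characterized in that the first media application is an online music application.
- The media search term pushing apparatus according to claim 13, characterized in that the second media application is an online news application, an online video application or a browser application.
- The media search term pushing apparatus according to claim 13, characterized in that, when executing the instructions, the processor is further configured to: establish in advance an association between the user identification information of the current user of the first media application and the user identification information of the user's associated user in the second media application.
- The media search term pushing apparatus according to claim 13, characterized in that the user identification information of the current user of the first media application is a first user account, the user identification information of the user's associated user in the second media application is a second user account, and the first user account and the second user account are the same or different.
- The media search term pushing apparatus according to claim 22, characterized in that, when executing the instructions, the processor is further configured to: establish the association between the first user account and the second user account when the second user account is created.
- The media search term pushing apparatus according to claim 22, characterized in that, when executing the instructions, the processor is further configured to: establish the association between the first user account and the second user account while the second media application is being used after the second user account has been created.
- A non-volatile computer storage medium, characterized in that a computer program is stored therein, the computer program being used to perform the method according to any one of claims 1 to 12.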
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710135931.6A CN108304422B (zh) | 2017-03-08 | 2017-03-08 | Media search term pushing method and apparatus |
CN201710135931.6 | 2017-03-08 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018161880A1 (zh) | 2018-09-13 |
Family
ID=62872018
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/078084 WO2018161880A1 (zh) | Media search term pushing method, apparatus and storage medium | 2017-03-08 | 2018-03-06 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108304422B (zh) |
WO (1) | WO2018161880A1 (zh) |
- 2017-03-08: CN201710135931.6A (CN), patent CN108304422B (zh), Active
- 2018-03-06: PCT/CN2018/078084 (WO), patent WO2018161880A1 (zh), Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN108304422B (zh) | 2021-12-17 |
CN108304422A (zh) | 2018-07-20 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18764798; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 18764798; Country of ref document: EP; Kind code of ref document: A1 |