CN116303983A - A keyword recommendation method, device and electronic equipment - Google Patents
- Publication number
- CN116303983A (application CN202111569802.0A)
- Authority
- CN
- China
- Prior art keywords
- candidate
- search
- word set
- word
- recommended
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present application discloses a keyword recommendation method, device and electronic equipment, relating to the field of Internet technology, to solve the problem that existing keyword recommendation technology does not match users' real needs well, resulting in poor recommendation accuracy. The method includes: acquiring an input search term and time information of when the search term was input; determining a target time category to which the time information belongs; determining, from a pre-established user search log database, a target user search record set corresponding to the target time category, where the user search log database stores a plurality of user search record sets respectively corresponding to different time categories; determining, from the target user search record set, a first candidate word set associated with the search term; and determining a recommended word set based on the first candidate word set and recommending it. By using the search records that correspond to the user's search time, embodiments of the present application can recommend information that better matches the user's expectations and improve recommendation accuracy.
Description
Technical field
The present application relates to the field of Internet technology, and in particular to a keyword recommendation method, device and electronic equipment.
Background
Keyword recommendation technology, also known as query recommendation technology, is commonly used in search engines. Using keyword recommendation during a query can help users locate the specific information they are looking for quickly and fairly accurately, improving the query experience and saving search time.
Existing keyword recommendation techniques usually select the recommended content based on the historical search behavior of multiple users. However, the recommendations produced this way still fail to match users' real needs well, resulting in poor recommendation accuracy.
Summary of the invention
Embodiments of the present application provide a keyword recommendation method, device and electronic equipment, to solve the problem that existing keyword recommendation technology does not match users' real needs well, resulting in poor recommendation accuracy.
In a first aspect, an embodiment of the present application provides a keyword recommendation method, including:
acquiring an input search term, and acquiring time information of when the search term was input;
determining a target time category to which the time information belongs;
determining, from a pre-established user search log database, a target user search record set corresponding to the target time category, where the user search log database stores a plurality of user search record sets respectively corresponding to different time categories;
determining, from the target user search record set, a first candidate word set associated with the search term;
determining a recommended word set based on the first candidate word set, and recommending it.
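The five steps above can be illustrated with a short Python sketch. It is not the applicant's implementation: the `(timestamp, query)` record format, the `classify_time` callback, the dictionary standing in for the user search log database, and the rule that a historical query is "associated" with the search term when it contains it (ranked by frequency) are all assumptions, since this section does not specify how association or ranking is done.

```python
from collections import Counter
from datetime import datetime
from typing import Callable

# Assumed record format: (search timestamp, historical query text).
Record = tuple[datetime, str]


def recommend(
    search_term: str,
    input_time: datetime,
    classify_time: Callable[[datetime], str],
    log_db: dict[str, list[Record]],
    top_k: int = 5,
) -> list[str]:
    """Sketch of the five method steps; not the applicant's implementation."""
    # Determine the target time category of the input time.
    target_category = classify_time(input_time)
    # Look up the user search record set stored under that category.
    target_records = log_db.get(target_category, [])
    # First candidate word set (assumption): historical queries that contain
    # the search term, excluding the term itself, counted by frequency.
    counts = Counter(
        query
        for _, query in target_records
        if search_term in query and query != search_term
    )
    # Recommended word set: the top_k most frequent candidates.
    return [query for query, _ in counts.most_common(top_k)]
```

For example, if the `"holiday"` entry of `log_db` holds two searches for "dragon boat festival origin" and one for "dragon boat festival food", then `recommend("dragon boat festival", datetime(2020, 6, 25, 8, 18), lambda t: "holiday", log_db)` returns both queries, the more frequent one first. Records stored under other categories are never consulted.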
Optionally, before the determining, from the pre-established user search log database, of the target user search record set corresponding to the target time category, the method further includes:
dividing time into a plurality of time categories according to a date attribute and/or a time-period attribute;
acquiring user search log data;
classifying each user search record in the user search log data according to the time category to which its historical search time belongs, to obtain a plurality of user search record sets each belonging to a different time category;
storing the plurality of user search record sets in the user search log database, tagged with their time categories.
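A minimal sketch of building such a database follows, assuming SQLite as the store and a `category` column as the time category tag. The table layout and the function names `build_log_db` and `records_for_category` are hypothetical, and `classify_time` is any function that maps a timestamp to a time category.

```python
import sqlite3
from datetime import datetime
from typing import Callable, Iterable


def build_log_db(
    records: Iterable[tuple[datetime, str]],
    classify_time: Callable[[datetime], str],
) -> sqlite3.Connection:
    """Classify each record by the category of its search time and store it tagged."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_log (category TEXT, searched_at TEXT, query TEXT)"
    )
    # Index on the tag so a whole record set can be fetched by category.
    conn.execute("CREATE INDEX idx_category ON search_log (category)")
    rows = [(classify_time(ts), ts.isoformat(), query) for ts, query in records]
    conn.executemany("INSERT INTO search_log VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def records_for_category(conn: sqlite3.Connection, category: str) -> list[str]:
    """Return the queries in the record set tagged with `category`, oldest first."""
    cur = conn.execute(
        "SELECT query FROM search_log WHERE category = ? ORDER BY searched_at",
        (category,),
    )
    return [row[0] for row in cur.fetchall()]
```

Storing one table with a tag column, rather than one table per category, keeps the set of categories open: adding a new time category only changes `classify_time`, not the schema.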
Optionally, the dividing time into a plurality of time categories according to a date attribute and/or a time-period attribute includes:
dividing time by date attribute into statutory holidays, other festivals and non-festival days, and further dividing the other festivals and the non-festival days by time-period attribute into working hours and non-working hours, to obtain five time categories: statutory holidays, working hours on other festivals, non-working hours on other festivals, working hours on non-festival days, and non-working hours on non-festival days, where the other festivals are non-statutory festivals.
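One possible classifier for the five time categories is sketched below. The holiday and festival dates and the Monday to Friday, 09:00 to 18:00 working-hours rule are illustrative assumptions; a real system would load an official calendar. Statutory holidays are not split by time period, matching the division described above.

```python
from datetime import date, datetime, time

# Illustrative calendars only (assumption); a real system would load these
# from an official holiday calendar.
STATUTORY_HOLIDAYS = {date(2020, 6, 25), date(2020, 10, 1)}
OTHER_FESTIVALS = {date(2020, 2, 14), date(2020, 8, 25)}  # non-statutory
WORK_START, WORK_END = time(9, 0), time(18, 0)  # assumed working hours


def is_working_time(ts: datetime) -> bool:
    """Assumption: Monday to Friday, 09:00 to 18:00, counts as working time."""
    return ts.weekday() < 5 and WORK_START <= ts.time() < WORK_END


def time_category(ts: datetime) -> str:
    """Map a timestamp to one of the five time categories."""
    day = ts.date()
    if day in STATUTORY_HOLIDAYS:
        return "statutory_holiday"  # not split further by time period
    prefix = "other_festival" if day in OTHER_FESTIVALS else "non_festival"
    suffix = "working" if is_working_time(ts) else "non_working"
    return f"{prefix}_{suffix}"
```

With these illustrative calendars, `time_category(datetime(2020, 6, 25, 8, 18))` returns `"statutory_holiday"`, and a Friday-morning search on February 14, 2020 falls into `"other_festival_working"`.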
Optionally, the acquiring an input search term includes:
acquiring a first search term that is initially input;
when the first search term is identified as ambiguous, displaying a plurality of meanings of the first search term;
determining, based on the meaning selected by the user, the search term that the user confirms as input.
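The disambiguation step might be sketched as follows. The sense inventory `SENSES` and the `choose` callback, which stands in for the options shown to the user, are hypothetical. How ambiguity is actually recognized (semantic recognition, per the detailed description) is not specified, so a dictionary lookup is used here as a stand-in.

```python
from typing import Callable, Optional

# Hypothetical sense inventory: search term -> its known meanings.
SENSES: dict[str, list[str]] = {
    "corn": ["corn (plant)", "fan group: corn"],
}


def confirm_search_term(
    first_term: str,
    choose: Callable[[list[str]], int],
    senses: Optional[dict[str, list[str]]] = None,
) -> str:
    """Return the search term the user confirms.

    If the first search term has more than one known meaning, the meanings
    are offered as options; `choose` stands in for the user interface and
    returns the index of the selected option.
    """
    inventory = SENSES if senses is None else senses
    meanings = inventory.get(first_term, [])
    if len(meanings) <= 1:
        return first_term  # not ambiguous: keep the initial input as-is
    return meanings[choose(meanings)]
```

The confirmed term (for example "corn (plant)") then replaces the raw input in all later steps, so candidate words tied to the unintended meaning are not retrieved.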
Optionally, after the acquiring an input search term and before the determining a recommended word set based on the first candidate word set and recommending it, the method further includes:
determining a word vector of the search term;
determining, from a pre-built word vector set, a similar word vector set whose vectors are similar to the word vector of the search term, and obtaining the similar word set corresponding to the similar word vector set;
replacing the search term with words in the similar word set to obtain a second candidate word set;
and the determining a recommended word set based on the first candidate word set and recommending it includes:
determining a recommended word set based on the first candidate word set and the second candidate word set, and recommending it.
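The word-vector step can be sketched with cosine similarity over a toy embedding table. The vectors, the 0.9 similarity threshold, and whitespace tokenization are assumptions for illustration (Chinese input would first need word segmentation); the text above does not state which similarity measure or threshold is used.

```python
import math

# Toy pre-built word vector set (assumption); in practice these would come
# from a trained embedding model.
WORD_VECTORS: dict[str, list[float]] = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.85, 0.15, 0.05],
    "orchard": [0.6, 0.4, 0.1],
    "phone": [0.0, 0.2, 0.95],
}


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_words(word: str, threshold: float = 0.9) -> list[str]:
    """Words whose vectors have cosine similarity >= threshold with `word`."""
    if word not in WORD_VECTORS:
        return []
    target = WORD_VECTORS[word]
    return [
        other
        for other, vec in WORD_VECTORS.items()
        if other != word and cosine(target, vec) >= threshold
    ]


def second_candidates(search_term: str, threshold: float = 0.9) -> list[str]:
    """Replace each token of the search term by each of its similar words."""
    tokens = search_term.split()
    candidates = []
    for i, token in enumerate(tokens):
        for sim in similar_words(token, threshold):
            candidates.append(" ".join(tokens[:i] + [sim] + tokens[i + 1:]))
    return candidates
```

With the toy vectors, "pear" is the only word similar enough to "apple" ("orchard" scores about 0.88), so `second_candidates("apple price")` returns `["pear price"]`.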
Optionally, the determining a recommended word set based on the first candidate word set and the second candidate word set and recommending it includes:
comparing the candidate words in the first candidate word set with the candidate words in the second candidate word set for similarity;
determining, based on the similarity comparison results, a first recommended word set within the first candidate word set and a second recommended word set within the second candidate word set;
recommending the first recommended word set and the second recommended word set.
Optionally, the determining, based on the similarity comparison results, a first recommended word set within the first candidate word set and a second recommended word set within the second candidate word set includes:
determining a first group of candidate words and a second group of candidate words in the first candidate word set as the first recommended word set, and determining a third group of candidate words in the second candidate word set as the second recommended word set;
where the first group includes the candidate words in the first candidate word set that are similar to any candidate word in the second candidate word set; the second group includes the candidate words in the first candidate word set that are not similar to any candidate word in the second candidate word set; and the third group includes the candidate words in the second candidate word set that are not similar to any candidate word in the first candidate word set;
and the recommending the first recommended word set and the second recommended word set includes:
sorting the first recommended word set and the second recommended word set in the order of the first group, the second group and then the third group of candidate words, and recommending them.
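The grouping and ordering rule above can be expressed directly in code. The `similar` predicate is an assumption: the text does not say how candidate words are compared, so `difflib.SequenceMatcher` with a 0.8 ratio threshold is swapped in as a default and can be replaced by, for example, a word-vector comparison. Words in the second candidate set that are similar to some word in the first set belong to no group and are therefore not recommended.

```python
from difflib import SequenceMatcher
from typing import Callable


def default_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Stand-in similarity test (assumption): character match ratio >= threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def merge_recommendations(
    first: list[str],
    second: list[str],
    similar: Callable[[str, str], bool] = default_similar,
) -> list[str]:
    """Order recommendations as group 1, then group 2, then group 3."""
    group1: list[str] = []  # first-set words similar to some second-set word
    group2: list[str] = []  # first-set words similar to no second-set word
    for word in first:
        if any(similar(word, other) for other in second):
            group1.append(word)
        else:
            group2.append(word)
    # Group 3: second-set words similar to no first-set word.
    group3 = [w for w in second if not any(similar(v, w) for v in first)]
    return group1 + group2 + group3
```

Putting group 1 first means that words supported by both the time-based search records and the word-vector expansion are shown at the top of the list.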
In a second aspect, an embodiment of the present application further provides a keyword recommendation device, including:
a first acquisition module, configured to acquire an input search term and time information of when the search term was input;
a first determining module, configured to determine a target time category to which the time information belongs;
a second determining module, configured to determine, from a pre-established user search log database, a target user search record set corresponding to the target time category, where the user search log database stores a plurality of user search record sets respectively corresponding to different time categories;
a third determining module, configured to determine, from the target user search record set, a first candidate word set associated with the search term;
a recommendation module, configured to determine a recommended word set based on the first candidate word set and recommend it.
Optionally, the keyword recommendation device further includes:
a division module, configured to divide time into a plurality of time categories according to a date attribute and/or a time-period attribute;
a second acquisition module, configured to acquire user search log data;
a classification module, configured to classify each user search record in the user search log data according to the time category to which its historical search time belongs, to obtain a plurality of user search record sets each belonging to a different time category;
a storage module, configured to store the plurality of user search record sets in the user search log database, tagged with their time categories.
Optionally, the division module is configured to divide time by date attribute into statutory holidays, other festivals and non-festival days, and to further divide the other festivals and the non-festival days by time-period attribute into working hours and non-working hours, obtaining five time categories: statutory holidays, working hours on other festivals, non-working hours on other festivals, working hours on non-festival days, and non-working hours on non-festival days, where the other festivals are non-statutory festivals.
Optionally, the first acquisition module includes:
an acquisition unit, configured to acquire a first search term that is initially input;
a display unit, configured to display a plurality of meanings of the first search term when the first search term is identified as ambiguous;
a first determining unit, configured to determine, based on the meaning selected by the user, the search term that the user confirms as input.
Optionally, the keyword recommendation device further includes:
a fourth determining module, configured to determine a word vector of the search term;
a fifth determining module, configured to determine, from a pre-built word vector set, a similar word vector set whose vectors are similar to the word vector of the search term, and to obtain the similar word set corresponding to the similar word vector set;
a processing module, configured to replace the search term with words in the similar word set to obtain a second candidate word set;
where the recommendation module is configured to determine a recommended word set based on the first candidate word set and the second candidate word set, and recommend it.
Optionally, the recommendation module includes:
a similarity comparison unit, configured to compare the candidate words in the first candidate word set with the candidate words in the second candidate word set for similarity;
a second determining unit, configured to determine, based on the similarity comparison results, a first recommended word set within the first candidate word set and a second recommended word set within the second candidate word set;
a recommending unit, configured to recommend the first recommended word set and the second recommended word set.
Optionally, the second determining unit is configured to determine a first group of candidate words and a second group of candidate words in the first candidate word set as the first recommended word set, and to determine a third group of candidate words in the second candidate word set as the second recommended word set;
where the first group includes the candidate words in the first candidate word set that are similar to any candidate word in the second candidate word set; the second group includes the candidate words in the first candidate word set that are not similar to any candidate word in the second candidate word set; and the third group includes the candidate words in the second candidate word set that are not similar to any candidate word in the first candidate word set;
and the recommending unit is configured to sort the first recommended word set and the second recommended word set in the order of the first group, the second group and then the third group of candidate words, and recommend them.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the keyword recommendation method described above.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the keyword recommendation method described above.
In the embodiments of the present application, an input search term and time information of when it was input are acquired; the target time category to which the time information belongs is determined; a target user search record set corresponding to the target time category is determined from a pre-established user search log database, which stores a plurality of user search record sets respectively corresponding to different time categories; a first candidate word set associated with the search term is determined from the target user search record set; and a recommended word set is determined based on the first candidate word set and recommended. Because user search records are classified by time to account for users' search preferences at different times, recommendations can be drawn from the search records that correspond to the user's search time, so that information better matching the user's expectations is recommended and recommendation accuracy is improved.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a first flowchart of the keyword recommendation method provided by an embodiment of the present application;
FIG. 2 is a second flowchart of the keyword recommendation method provided by an embodiment of the present application;
FIG. 3 is a flowchart of a method for determining the first candidate word set provided by an embodiment of the present application;
FIG. 4 is a flowchart of a method for determining the second candidate word set provided by an embodiment of the present application;
FIG. 5 is a flowchart of a method for determining the final recommended word set provided by an embodiment of the present application;
FIG. 6 is a structural diagram of the keyword recommendation device provided by an embodiment of the present application;
FIG. 7 is a structural diagram of the electronic device provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, which is a flowchart of the keyword recommendation method provided by an embodiment of the present application, the method includes the following steps:
Step 101: acquire an input search term, and acquire time information of when the search term was input.
In the embodiments of the present application, the recommended word set may be constructed from the preferences that users' search behavior shows on different dates and at different times.
Acquiring the input search term may mean acquiring a search keyword that the user enters to express a search intention, for example the keyword "corn", "today's weather" or "house rental contract template" entered into a search engine.
Acquiring the time information of when the search term was input means acquiring the time at which the user entered the search term; for example, the input timestamp of the search term may be acquired.
Optionally, the acquiring an input search term includes:
acquiring a first search term that is initially input;
when the first search term is identified as ambiguous, displaying a plurality of meanings of the first search term;
determining, based on the meaning selected by the user, the search term that the user confirms as input.
In one implementation, because a search term may be polysemous or otherwise ambiguous, semantic recognition may be performed on the first search term initially entered by the user to identify whether it is ambiguous, that is, whether it has multiple meanings. For example, the search term "corn" may refer to the plant or to the name of a fan group.
When the first search term initially entered by the user is recognized as ambiguous, its multiple meanings may be obtained and displayed so that the user can confirm the meaning they actually intend to search for. For example, all meanings of the first search term may be obtained and displayed as options for the user to choose from.
Based on the meaning the user selects from these options, which reflects the user's real search intention, the search term with an unambiguous meaning that the user confirms as input is determined. For example, for the initially entered search term "corn", the two meanings "corn (plant)" and "fan group: corn" may be offered; when the user selects "corn (plant)", "corn (plant)" is used as the confirmed search term. This makes the search intention clearer and avoids recommending words related to other, unintended meanings.
This implementation thus provides a new search input strategy: when the user's input is ambiguous, a user selection step is added that presents all the meanings covered by the input term for the user to choose from, guiding the user to clarify the search intention. More accurate related keywords can then be recommended and more precise search results provided.
Step 102: determine the target time category to which the time information belongs.
In the embodiments of the present application, time may be classified by dividing different dates and time periods into different time categories according to specific rules, for example holidays, working days, weekends, working hours and non-working hours. Users' historical search records may then be classified and stored by time category, so that when a search occurs, the historical search records of the matching time category can be retrieved according to the user's search time.
In this step, the time category to which the time information belongs may be determined from the time attributes of the moment the user entered the search term, giving the target time category. For example, if a user enters "origin of the Dragon Boat Festival" at 8:18 on June 25, 2020, then because June 25, 2020 was the Dragon Boat Festival, the time category of this input can be determined to be holiday.
Step 103: determine, from a pre-established user search log database, a target user search record set corresponding to the target time category, where the user search log database stores a plurality of user search record sets respectively corresponding to different time categories.
The user search log database may be a database that stores user search log data, that is, the log data generated whenever any user searches. For example, the user search log database of a search website may contain the search log data generated by all kinds of users searching on that website.
In the embodiments of the present application, the user search log database may be built by collecting users' search log data (including search timestamps and search records), analyzing the search time of each entry, and classifying the entries by search-time category to obtain user search record sets that each correspond to a different time category; the classified record sets are then stored in the user search log database. Each user search record set in the database thus has its own time category.
In this step, the target user search record set corresponding to the target time category may be determined from the user search log database; for example, the database may be searched for the user search record set whose time category is the target time category, and the set found is the target user search record set.
可选地,所述步骤103之前,所述方法还包括:Optionally, before
dividing time into multiple time categories according to a date attribute and/or a time-period attribute;
obtaining user search log data;
classifying each user search record in the user search log data according to the time category to which its historical search time belongs, to obtain multiple user search record sets belonging to different time categories; and
storing the multiple user search record sets in the user search log database, labeled by time category.
That is, in one embodiment, time can be divided into categories according to a date attribute, a time-period attribute, and so on. For example, by date attribute, time can be divided into festivals, holidays, anniversaries, non-holidays, etc.; by time-period attribute, it can be divided into working hours and non-working hours, or daytime and nighttime, etc. Dividing time in this way yields multiple distinct time categories.
The user search log data refers to the search log data generated when users perform searches, i.e., historical search records. Obtaining the user search log data may mean obtaining the data for a period of time, such as the past year, six months, three months, month, or week; the exact period can be set flexibly according to actual needs.
The obtained user search log data records when each user search record was generated, i.e., its historical search time. In this embodiment, to enable targeted recommendations based on users' search preferences at different times, the user search records in the log data are classified by historical search time. Specifically, based on the defined time categories, the category to which each record's historical search time belongs is determined, and records whose historical search times fall in the same category are grouped into one user search record set. In this way, the user search records in the log data are divided into multiple user search record sets by historical-search-time category, each set corresponding to one time category.
Finally, the multiple user search record sets can be stored in the user search log database, and each set can be given its own time category label to distinguish it.
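The classification and storage steps above can be sketched as follows. This is a minimal illustration only: it keeps the record sets in an in-memory dictionary keyed by time-category label rather than in a real database, and the `day_or_night` classifier (daytime assumed to be 06:00-18:00) is a hypothetical stand-in for whatever time categories are actually defined.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Tuple

# One parsed log entry: (historical search time, search record text)
Record = Tuple[datetime, str]


def build_record_sets(
    records: Iterable[Record],
    classify: Callable[[datetime], str],
) -> Dict[str, List[str]]:
    """Group search records into record sets keyed by time-category label."""
    record_sets: Dict[str, List[str]] = defaultdict(list)
    for searched_at, query in records:
        # Put each record into the set of the category its search time falls into
        record_sets[classify(searched_at)].append(query)
    return dict(record_sets)


def day_or_night(t: datetime) -> str:
    """Hypothetical two-category classifier; daytime is assumed to be 06:00-18:00."""
    return "daytime" if 6 <= t.hour < 18 else "nighttime"


logs = [
    (datetime(2020, 5, 7, 10, 34, 44), "steps for operating a relational database"),
    (datetime(2020, 5, 7, 20, 30, 3), "movies with high Douban scores"),
]
record_db = build_record_sets(logs, day_or_night)
print(record_db)
```

Passing the classifier in as a function keeps the grouping logic independent of how the time categories are defined.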
Thus, in this embodiment, by dividing time into categories according to the date attribute and/or time-period attribute, and classifying and storing each user search record according to the time category of its historical search time, users' search preferences on different days and in different time periods can be captured. This helps later searches recommend keywords that are more accurate and better reflect users' real search needs, based on their preferences at the time of searching.
Further, dividing time into multiple time categories according to the date attribute and/or time-period attribute includes:
dividing time by date attribute into statutory holidays, other festivals, and non-festival days, and further dividing the other festivals and non-festival days by time-period attribute into working hours and non-working hours, to obtain five time categories: statutory holidays, working hours on other festivals, non-working hours on other festivals, working hours on non-festival days, and non-working hours on non-festival days, where the other festivals are non-statutory festivals.
That is, in one embodiment, the design takes into account that what users search for is biased around different holidays and between working and non-working hours. For example, around the Dragon Boat Festival, users are more likely than usual to search for "Qu Yuan", "zongzi", and "dragon boat". A user searching for "Qu Yuan" during the Dragon Boat Festival most likely wants to know the origin of the festival, so it is appropriate for the search engine to recommend "origin of the Dragon Boat Festival"; a user searching for "Qu Yuan" at other times may be more interested in Qu Yuan's literary achievements, so recommending "Qu Yuan's poems" is more appropriate. As another example, searches during working hours tend to relate to the user's work, while searches during non-working hours tend to relate to daily life and leisure.
Therefore, in this embodiment, time is first divided by date attribute into statutory holidays (such as New Year's Day, Spring Festival, Qingming Festival, ...), other non-statutory festivals (such as Valentine's Day, Arbor Day, Youth Day, ...), and non-festival days (i.e., dates that are neither statutory holidays nor other festivals). The other festivals and non-festival days are further divided by time-period attribute into working hours (e.g., 9:00-18:00) and non-working hours (e.g., 18:00 to 9:00 the next day). This yields five categories of time: statutory holidays, working hours on other festivals, non-working hours on other festivals, working hours on non-festival days, and non-working hours on non-festival days.
By taking into account users' search habits and preferences on different holidays and during working and non-working hours, time is classified sensibly. This ensures that user search records are grouped into meaningful time categories, which in turn makes the later recommendations, based on users' search preferences in each time category, reasonable and accurate.
Step 104: determine, from the target user search record set, a first candidate word set associated with the search term.
In this step, search records associated with the search term are found in the target user search record set and used as a set of candidate words to be recommended, i.e., the first candidate word set; here, "associated with" may mean related or similar. For example, each search record (i.e., historical search term) in the target user search record set can be compared with the search term, e.g., by computing a similarity score, and the records similar to the search term are added to the first candidate word set as candidate words. "Similar" may mean that the similarity exceeds a threshold; for example, a similarity greater than 0.8 is considered similar.
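A minimal sketch of the similarity filter for the first candidate word set follows. The text does not specify the similarity measure, so `difflib.SequenceMatcher.ratio()` from the Python standard library is used here purely as a stand-in. The function names are hypothetical, and skipping exact repeats of the query and duplicate records is an assumption rather than something the text requires.

```python
from difflib import SequenceMatcher
from typing import Iterable, List

SIMILARITY_THRESHOLD = 0.8  # from the text: similarity > 0.8 counts as similar


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; the text does not fix a measure."""
    return SequenceMatcher(None, a, b).ratio()


def first_candidate_set(query: str, target_records: Iterable[str]) -> List[str]:
    """Historical search terms in the target record set that are similar to the query."""
    candidates: List[str] = []
    for record in target_records:
        if record == query or record in candidates:
            continue  # skip the query itself and repeated searches
        if similarity(query, record) > SIMILARITY_THRESHOLD:
            candidates.append(record)
    return candidates
```

Any other score in [0, 1], such as the cosine similarity of word vectors, could replace `similarity` without changing the filtering logic.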
Step 105: determine a recommended word set based on the first candidate word set, and make recommendations.
In this step, the recommended word set is determined from the candidate words in the first candidate word set. Specifically, some of the candidate words may be selected, such as several candidates with higher search frequency or higher similarity to the search term, and added to the recommended word set as recommended words; alternatively, all candidate words in the first candidate word set may be added, i.e., the first candidate word set is used directly as the recommended word set. Finally, the recommended words are presented to the user, e.g., displayed one by one near the position where the user entered the search term.
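One way to pick a subset of the first candidate word set by search frequency is sketched below, assuming the raw target record set (with repeated searches) is available for counting. `recommended_set` and the `top_n` cutoff are hypothetical; ranking by similarity to the search term instead would only change the sort key.

```python
from collections import Counter
from typing import List, Sequence


def recommended_set(candidates: Sequence[str], history: Sequence[str], top_n: int = 5) -> List[str]:
    """Pick up to top_n candidates, most frequently searched first.

    history is the raw target record set (with repeats), used only to count frequency.
    """
    freq = Counter(history)
    # sorted() is stable, so candidates with equal frequency keep their original order
    ranked = sorted(candidates, key=lambda w: freq[w], reverse=True)
    return ranked[:top_n]
```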
With reference to Fig. 3, a specific implementation of determining the first candidate word set is described below by way of example:
Analyze the user log data and extract structured (date-time, search record) pairs. Classify the time according to certain rules: for example, distinguish festivals by year, month, and day, and distinguish working hours from non-working hours by hour, minute, and second. Each time of day is classified as either working or non-working time; for example, 9:00-18:00 is the working-time interval and 18:00 to 9:00 the next day is the non-working-time interval. For each timestamp (year-month-day hour:minute:second), first determine from the year, month, and day whether it falls within a festival. In one implementation, the 3 days before a festival can be counted as part of that festival. For example, National Day runs from October 1 to October 7, but in the division used in this application, searches from September 28 to October 7 may all be treated as falling within National Day, because before a festival users typically search for festival-related content to prepare for it. If the year-month-day classification shows that the time falls on a statutory holiday, there is no need to check the hour, minute, and second for working time. If it is not a statutory holiday, the hour, minute, and second must be checked to determine whether the time is working time.
After the search keyword is entered, the current time is obtained and classified, and the words in that time category's user search records whose similarity to the search keyword is higher than 0.8 are taken as candidate recommended words for this search, yielding candidate recommended word set 1, i.e., the first candidate word set.
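The pre-festival window described above (counting the 3 days before a festival as part of it) reduces to a small date computation. The helper names are hypothetical, and the year 2020 in the usage line is chosen only for illustration. The festival ranges in the 2020 example below follow the same 3-day rule (e.g., Valentine's Day 2020.02.11-2020.02.14).

```python
from datetime import date, timedelta
from typing import Tuple

LEAD_DAYS = 3  # the text counts the 3 days before a festival as part of the festival


def festival_window(first_day: date, last_day: date, lead_days: int = LEAD_DAYS) -> Tuple[date, date]:
    """Extend a festival's date range backwards so pre-festival searches fall inside it."""
    return first_day - timedelta(days=lead_days), last_day


def in_window(day: date, window: Tuple[date, date]) -> bool:
    """True if day lies in the inclusive [start, end] window."""
    start, end = window
    return start <= day <= end


national_day = festival_window(date(2020, 10, 1), date(2020, 10, 7))
print(national_day)  # → (datetime.date(2020, 9, 28), datetime.date(2020, 10, 7))
```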
An example follows:
Structured data obtained by processing the log data:
2020-05-07 10:34:44 → "steps for operating a relational database"
2020-05-07 20:30:03 → "movies with high Douban scores"
2020-06-25 08:18:52 → "customs of the Dragon Boat Festival"
...
Classify time (taking 2020 as an example):
Statutory holidays: {New Year's Day 2019.12.29-2020.01.01,
Spring Festival 2020.01.21-2020.02.02,
Qingming Festival 2020.04.01-2020.04.06,
...}
Other festivals:
{Valentine's Day 2020.02.11-2020.02.14: {working hours 9:00-18:00, non-working hours 18:00-9:00 next day},
Arbor Day 2020.03.09-2020.03.12: {working hours 9:00-18:00, non-working hours 18:00-9:00 next day},
...}
Non-festival days: {working hours 9:00-18:00, non-working hours 18:00-9:00 next day}
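The five-way classification in this example can be sketched as a function that checks the date first and looks at the time of day only when the date is not a statutory holiday. Only the festival ranges listed above are included (the elided "..." entries are left out). Treating 9:00-18:00 as a half-open interval (18:00 itself counts as non-working) is an assumption, and the category strings are illustrative labels.

```python
from datetime import date, datetime, time
from typing import List, Tuple

DateRange = Tuple[date, date]

# Ranges copied from the 2020 example; the elided "..." entries are not included
STATUTORY_HOLIDAYS: List[DateRange] = [
    (date(2019, 12, 29), date(2020, 1, 1)),  # New Year's Day
    (date(2020, 1, 21), date(2020, 2, 2)),   # Spring Festival
    (date(2020, 4, 1), date(2020, 4, 6)),    # Qingming Festival
]
OTHER_FESTIVALS: List[DateRange] = [
    (date(2020, 2, 11), date(2020, 2, 14)),  # Valentine's Day
    (date(2020, 3, 9), date(2020, 3, 12)),   # Arbor Day
]
WORK_START, WORK_END = time(9, 0), time(18, 0)  # assumed half-open interval [9:00, 18:00)


def _in_any(day: date, ranges: List[DateRange]) -> bool:
    return any(start <= day <= end for start, end in ranges)


def classify_time(t: datetime) -> str:
    """Map a search timestamp to one of the five time categories."""
    day = t.date()
    # Date check first: a statutory holiday needs no hour/minute/second check
    if _in_any(day, STATUTORY_HOLIDAYS):
        return "statutory holiday"
    prefix = "other festival" if _in_any(day, OTHER_FESTIVALS) else "non-festival"
    # Otherwise split by time of day
    suffix = "working hours" if WORK_START <= t.time() < WORK_END else "non-working hours"
    return f"{prefix}, {suffix}"


print(classify_time(datetime(2020, 3, 21, 10, 34, 22)))  # → non-festival, working hours
```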
User log records are organized by time category, and each time category yields its own set of user search records.
The user enters the search keyword 有关房屋租赁的合同模板 ("contract template for house leasing") (search time: 2020-03-21 10:34:22).
The search time is classified as working hours on a non-festival day.
Candidate words highly similar to the keyword (similarity > 0.8) are sought in the user search record set for that time category: [江西房屋租赁合同 (Jiangxi house lease contract), 房屋出租合同模板 (house rental contract template), ...]
This yields candidate recommended word set 1: [江西房屋租赁合同, 房屋出租合同模板, ...].
Optionally, after the input search term is obtained and before step 105, the method further includes:
determining a word vector for the search term;
determining, from a pre-built word vector set, a set of similar word vectors that are similar to the word vector of the search term, and obtaining the similar word set corresponding to those vectors; and
replacing words in the search term with words from the similar word set to obtain a second candidate word set;
and step 105 includes:
determining a recommended word set based on the first candidate word set and the second candidate word set, and making recommendations.
In one embodiment, a recommended word set can also be built from semantic relationships, so that the final recommended word set combines semantic relationships with the preference information of users' search behavior on different dates and at different times.
In this embodiment, a large-scale corpus is used in advance to build a word vector set covering the vocabulary, and another candidate word set related to the search term is determined from the similarity between word vectors.
Specifically, the search term entered by the user is first preprocessed, which includes removing special characters such as "/", "、", "?", and "!". In addition, because the user's query may be a sentence rather than a single word or phrase, the search term is segmented into words, and function words or meaningless parts, such as 的 (de), 吗 (ma), and 为 (wei), are removed after segmentation. Candidate word sets are then found for the resulting content words based on semantic similarity. Similarity can be computed by converting words into word vectors and measuring the semantic similarity between words as the cosine similarity between their vectors.
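The preprocessing and similarity steps can be sketched as follows, assuming the query has already been segmented into words (Chinese word segmentation itself needs a dedicated tool and is not shown). The stop-word set contains only the function words named in the text, and the helper names are hypothetical.

```python
import math
import re
from typing import List, Sequence

STOP_WORDS = {"的", "吗", "为"}  # only the function words named in the text
SPECIAL_CHARS = re.compile(r"[/?!、？！]")  # half- and full-width forms of the listed symbols


def content_words(segmented: Sequence[str]) -> List[str]:
    """Strip special characters and drop function words from a segmented query."""
    cleaned = (SPECIAL_CHARS.sub("", token) for token in segmented)
    return [tok for tok in cleaned if tok and tok not in STOP_WORDS]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


print(content_words(["有关", "房屋", "租赁", "的", "合同", "模板"]))  # → ['有关', '房屋', '租赁', '合同', '模板']
```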
To obtain word vectors, a bidirectional multi-head attention language model, also known as BERT (Bidirectional Encoder Representations from Transformers), can be pre-trained on a large-scale unsupervised corpus to produce the word vector set. Word vectors obtained this way can, to a certain extent, address polysemy in Chinese. For example, the word 苹果 ("apple") appears in both "the apples in this orchard are growing well" and "compared with Android, I'm more used to Apple's system", with clearly different meanings. Word vectors built with traditional one-hot encoding or word2vec cannot distinguish these meanings, whereas vectors obtained from the bidirectional multi-head attention model can distinguish different senses of the same word.
The bidirectional multi-head attention language model can likewise be used to represent the search term as word vectors. For each content word obtained by segmenting the search term, similar or related words are found in turn: the similarity between the content word's vector and each vector in the word vector set is computed to decide whether that word joins the candidate set. For example, with a similarity threshold of 0.8, if the similarity between a content word and a word in the word vector set exceeds the threshold, that word is added to the content word's candidate set. Once candidate sets have been found for all content words, each content word can be replaced at its position by its candidates, and replacements can be made at any number of positions. This produces many candidate words, but they contain a lot of noise and some are not fluent, so a language model is then used to keep the candidates with high fluency, yielding the second candidate word set.
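The substitution and filtering steps can be sketched as below. Every combination of the original words and their similar words is generated with `itertools.product`, which covers replacing any number of positions. The language model is represented by a hypothetical `fluency` scoring function and `min_score` cutoff, since the text does not name a specific model or threshold.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence


def substitute(tokens: Sequence[str], similar: Dict[str, List[str]]) -> List[str]:
    """All variants of the query with any number of content words swapped for similar words."""
    # Each position may keep its word or take one of that word's similar words
    options = [[tok] + similar.get(tok, []) for tok in tokens]
    original = "".join(tokens)
    variants: List[str] = []
    for choice in product(*options):
        text = "".join(choice)
        if text != original and text not in variants:
            variants.append(text)
    return variants


def second_candidate_set(
    tokens: Sequence[str],
    similar: Dict[str, List[str]],
    fluency: Callable[[str], float],
    min_score: float,
) -> List[str]:
    """Keep the substituted variants that the (hypothetical) language-model scorer rates as fluent."""
    return [v for v in substitute(tokens, similar) if fluency(v) >= min_score]
```

The number of variants grows multiplicatively with the number of similar words per position, so a production system would likely cap it before scoring.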
In this way, the first candidate word set is built from the preference information of users' search behavior on different dates and at different times, and the second candidate word set is built from semantic relationships between words. The recommended word set is then determined from both: some candidates can be selected from the first and second candidate word sets for recommendation, or similar candidates in the two sets can be merged before recommendation, ultimately yielding high-quality recommended keywords that meet users' requirements.
By additionally building the recommended word set from semantic relationships, better-fitting recommended words can be found in terms of semantic similarity, and word polysemy can be addressed to a certain extent.
That is, in one embodiment, the keyword recommendation method shown in Fig. 2 can be used. It includes: obtaining the search keyword entered by the user; displaying all meanings of the search content so that the user can select the actual search intent; after the search content is determined, constructing recommended word set 1 from semantic relationships; constructing recommended word set 2 from the preference information of users' search behavior on different dates and at different times; post-processing the recommended word sets; and obtaining the final recommended keywords.
With reference to Fig. 4, a specific implementation of determining the second candidate word set is described below by way of example:
The user enters the search keyword: 有关房屋租赁的合同模板 ("contract template for house leasing")
Word segmentation: 有关/房屋/租赁/的/合同/模板 (about / house / lease / 的 / contract / template)
Remove function words and meaningless words: [有关, 房屋, 租赁, 合同, 模板]
Word vector set: [房屋 (house): vector (0.663896 0.921862 -1.805689 ...); 合约 (contract): vector (0.669016 2.295912 -0.409784 ...);
房子 (house): vector (0.456529 -0.898323 -1.437611 ...);
...]
Similar-word sets for the content words of the keyword (each candidate's similarity to the original content word exceeds the threshold of 0.8):
{有关 (about): [关于 (concerning), 相关 (related)],
房屋 (house): [房子 (house), 屋宇 (dwelling), 屋子 (room)],
租赁 (lease): [租用 (rent), 出租 (let out), 租借 (hire)],
合同 (contract): [协定 (agreement), 契约 (deed), 合约 (contract), 公约 (convention)],
模板 (template): [模式 (pattern), 模本 (model copy), 模印 (stencil)]}
Candidate word set after substitution (the original query with one or more content words replaced by the similar words above): [关于房屋租赁的合同模板, 相关房屋租赁的合同模板, 有关房子租赁的合同模板, 有关房屋租用的合同模板, 关于屋宇租赁的合同模本, 有关房屋租借的契约模板, 关于房子租赁的合同模本, ...]
Candidate recommended word set 2 after filtering with the language model, i.e., the second candidate word set: [关于房屋租赁的合同模板, 有关房子租赁的合同模板, 有关房屋租用的合同模板, 关于房子租赁的合同模本, ...].
Further, determining a recommended word set based on the first candidate word set and the second candidate word set, and making recommendations, includes:
comparing the candidate words in the first candidate word set with the candidate words in the second candidate word set for similarity;
determining, based on the similarity comparison result, a first recommended word set from the first candidate word set and a second recommended word set from the second candidate word set; and
recommending the first recommended word set and the second recommended word set.
That is, in one embodiment, the candidate words in the first candidate word set are compared with those in the second candidate word set for similarity. Specifically, each candidate word in the first set is compared with every candidate word in the second set to determine whether it is similar to any of them. This identifies which candidate words in the two sets are similar and which are not, i.e., the similarity comparison result, where "similar" may mean that the similarity exceeds a predetermined threshold.
Then, based on the similarity comparison result, the candidate words in the first candidate word set to be used as recommended words are determined, giving the first recommended word set, and the candidate words in the second candidate word set to be used as recommended words are determined, giving the second recommended word set. For example, candidate words that are similar across the two sets can be taken as recommended words, while for candidate words that are not similar across the two sets, whether to recommend them can be decided from other factors, such as search frequency and similarity to the search term.
Finally, the first and second recommended word sets are recommended to the user; for example, the recommended words of each set can be displayed at the search position.
Determining the recommended word set by comparing the candidate words of the first candidate word set with those of the second in this way further improves the accuracy of keyword recommendation.
Further, determining, based on the similarity comparison result, the first recommended word set from the first candidate word set and the second recommended word set from the second candidate word set includes:
determining a first group and a second group of candidate words in the first candidate word set as the first recommended word set, and determining a third group of candidate words in the second candidate word set as the second recommended word set;
where the first group consists of candidate words in the first candidate word set that are similar to at least one candidate word in the second candidate word set; the second group consists of candidate words in the first candidate word set that are not similar to any candidate word in the second candidate word set; and the third group consists of candidate words in the second candidate word set that are not similar to any candidate word in the first candidate word set;
and recommending the first recommended word set and the second recommended word set includes:
sorting the first and second recommended word sets in the order of the first group, the second group, and then the third group, and recommending them in that order.
That is, after the two sets are compared, both the similar candidates (the first group) and the non-similar candidates (the second group) of the first candidate word set are recommended, while the similar candidates of the second candidate word set are removed and only its non-similar candidates (the third group) are recommended. Here, a similar candidate word is one that is similar to some candidate word in the other set, and a non-similar candidate word is one that is not similar to any candidate word in the other set.
For example, if the i-th candidate word in the first candidate word set is similar to the j-th candidate word in the second, then the i-th word is a similar candidate of the first set and is added to the first recommended word set, while the j-th word is a similar candidate of the second set and is deleted rather than added to the second recommended word set.
As another example, if the k-th candidate word in the first set is not similar to any candidate word in the second set, and the g-th candidate word in the second set is not similar to any candidate word in the first set, then the k-th word is a non-similar candidate of the first set and is added to the first recommended word set, and the g-th word is a non-similar candidate of the second set and is added to the second recommended word set.
When the first, second, and third groups are determined as the recommended word set, the recommendation order of the groups can also be determined. Specifically, to ensure accuracy and efficiency, the first group is placed first, the second group in the middle, and the third group last; that is, the recommended words are displayed in the order first group, second group, third group.
Because the candidate words in the first set better match the user's search preferences in the current time period, while those in the second set are closely related to the current search term, recommending both the similar and non-similar candidates of the first set together with the non-similar candidates of the second set makes keyword recommendation both accurate and comprehensive. Moreover, ordering the recommendations as first group, second group, third group places the words most likely to meet the user's expectations at the top, so the user quickly notices the search terms that match their intent.
With reference to Fig. 5, a specific implementation of determining the final recommended word set is described below by way of example:
如图5所示,在得到候选推荐词集1(第一候选词集)、候选推荐词集2(第二候选词集)后,通过后处理对最终推荐词进行排序,将更能满足用户期待的推荐词显示在前。具体地,可计算候选推荐词集1中词语{a、d、e}与候选推荐词集2中词语{A、B、C}的相似度,如a与A相似,则保留a,删除A,并将a排在最前面。候选推荐词集1和候选推荐词集2中其它相似度超过阈值的词按照候选推荐词集1中词语在前的顺序进行排列,候选推荐词集1的非相似词排在中间,候选推荐词集2的非相似词排在后边。最终得到推荐词集:{a、d、e、B、C}。As shown in Figure 5, after obtaining the candidate recommended word set 1 (the first candidate word set) and the candidate recommended word set 2 (the second candidate word set), the final recommended words are sorted through post-processing, which will be more satisfying to the user. The expected recommendation is displayed first. Specifically, the similarity between the words {a, d, e} in the candidate recommended word set 1 and the words {A, B, C} in the candidate recommended word set 2 can be calculated. If a is similar to A, keep a and delete A , and place a at the top. The other words whose similarity exceeds the threshold in candidate recommended word set 1 and candidate recommended word set 2 are arranged in the order in which the words in candidate recommended word set 1 come first, and the non-similar words in candidate recommended word set 1 are arranged in the middle, and the candidate recommended word set The non-similar words of set 2 are ranked next. Finally, the recommended word set is obtained: {a, d, e, B, C}.
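The grouping and ordering rule of the Fig. 5 example can be sketched in Python as below. This is a minimal illustration, not the applicant's implementation. The `is_similar` predicate stands in for the "similarity exceeds the threshold" test, because the application does not fix a metric or threshold in this passage. The function names are hypothetical.

```python
from typing import Callable, List


def merge_candidates(
    first: List[str],
    second: List[str],
    is_similar: Callable[[str, str], bool],
) -> List[str]:
    """Order candidates as group 1, group 2, group 3 (illustrative sketch)."""
    # For each first-set word, record whether it is similar to any second-set word.
    matched = [any(is_similar(w, v) for v in second) for w in first]
    # Group 1: first-set words that have a similar counterpart in the second set.
    group1 = [w for w, m in zip(first, matched) if m]
    # Group 2: first-set words not similar to any second-set word.
    group2 = [w for w, m in zip(first, matched) if not m]
    # Group 3: second-set words not similar to any first-set word. Second-set
    # words that do have a counterpart are dropped (A is deleted in Fig. 5).
    group3 = [v for v in second if not any(is_similar(w, v) for w in first)]
    return group1 + group2 + group3


def same_ignoring_case(x: str, y: str) -> bool:
    # Toy similarity test standing in for "similarity exceeds the threshold".
    return x.lower() == y.lower()


print(merge_candidates(["a", "d", "e"], ["A", "B", "C"], same_ignoring_case))
# ['a', 'd', 'e', 'B', 'C']
```

With the toy case-insensitive test, a is matched to A, so A is dropped and a leads the list. This reproduces the Fig. 5 result. In practice `is_similar` would compare a semantic similarity score against a threshold.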
Existing keyword recommendation technologies usually obtain words similar or related to the search keyword by searching related documents. When the user's input is ambiguous, they cannot reliably push keywords that meet the user's expectations. The present application displays the possible meanings of ambiguous search content for the user to choose from, which clarifies the user's search intent and enables accurate recommendation.

The application uses a bidirectional multi-head attention model to train word vectors on a large-scale corpus. This fits the semantic information of words better than traditional feature vectors and word2vec. It provides the basis for recommending keywords by semantic similarity, and it can resolve polysemy in Chinese to some extent.

In addition, the application takes into account users' search preferences on different dates and at different times. Users search for quite different content on different festivals and during working and non-working hours. For example, clear search preferences appear during working and off-work periods, and before and after festivals. The application therefore analyses historical user logs and classifies them by time into five categories:
- statutory holidays;
- working hours on other festivals;
- non-working hours on other festivals;
- working hours on non-festival days;
- non-working hours on non-festival days.

Combining the time category with the user's search keyword when making recommendations better meets the user's needs.
The keyword recommendation method of the embodiments of the present application works as follows:
1. The input search term and the time at which it was input are obtained.
2. The target time category to which that time belongs is determined.
3. A target user search record set corresponding to the target time category is determined from a pre-established user search log database. This database stores multiple user search record sets, each corresponding to a different time category.
4. A first candidate word set associated with the search term is determined from the target user search record set.
5. A recommended word set is determined based on the first candidate word set and is recommended.

The method takes users' search preferences at different times into account and classifies user search records by time. Recommendations can therefore be drawn from the search records that correspond to the user's search time. As a result, the recommended information better matches the user's expectations and recommendation accuracy improves.
An embodiment of the present application further provides a keyword recommendation device. Referring to Fig. 6, Fig. 6 is a structural diagram of the keyword recommendation device provided by an embodiment of the present application. The device solves the problem on the same principle as the keyword recommendation method in the embodiments of the present application. Its implementation can therefore refer to that of the method, and repeated details are not described again.
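As shown in Fig. 6, the keyword recommendation device 600 includes:
a first acquisition module 601, configured to acquire an input search term and the time information of when the search term was input;
a first determining module 602, configured to determine the target time category to which the time information belongs;
a second determining module 603, configured to determine, from a pre-established user search log database, a target user search record set corresponding to the target time category, wherein the user search log database stores a plurality of user search record sets each corresponding to a different time category;
a third determining module 604, configured to determine, from the target user search record set, a first candidate word set associated with the search term;
a recommendation module 605, configured to determine a recommended word set based on the first candidate word set and make the recommendation.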
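The five modules above form a simple pipeline. The sketch below wires them together under stated assumptions:
- The log database is a plain dictionary from a time-category label to a list of past queries.
- The time classifier is passed in as a function.
- "Associated with the search term" is approximated as a past query in the target record set that contains the search term, ranked by frequency.

This passage does not specify the association rule. That rule, the function name `recommend_keywords` and the `top_k` parameter are therefore hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Dict, List


def recommend_keywords(
    search_term: str,
    input_time: datetime,
    categorize: Callable[[datetime], str],
    log_db: Dict[str, List[str]],
    top_k: int = 5,
) -> List[str]:
    """Hypothetical end-to-end sketch of modules 601 to 605."""
    # Modules 601/602: map the input time to its target time category.
    category = categorize(input_time)
    # Module 603: pick the user search record set stored under that category.
    records = log_db.get(category, [])
    # Module 604 (assumed association rule): past queries that contain the
    # search term, excluding the term itself, ranked by search frequency.
    counts = Counter(q for q in records if search_term in q and q != search_term)
    first_candidates = [q for q, _ in counts.most_common(top_k)]
    # Module 605: in this minimal sketch the first candidate set is
    # recommended as-is (no second candidate set is merged in).
    return first_candidates


# Toy data: the same term yields different suggestions per time category.
log_db = {
    "non_festival_working": ["apple price", "apple phone", "apple price", "weather"],
    "non_festival_non_working": ["apple pie recipe"],
}


def categorize(t: datetime) -> str:
    # Toy classifier using only the hour of day.
    return "non_festival_working" if 9 <= t.hour < 18 else "non_festival_non_working"


print(recommend_keywords("apple", datetime(2021, 12, 21, 10), categorize, log_db))
# ['apple price', 'apple phone']
print(recommend_keywords("apple", datetime(2021, 12, 21, 20), categorize, log_db))
# ['apple pie recipe']
```

The two calls differ only in the input time, which is the effect the application attributes to time-category-specific record sets.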
Optionally, the keyword recommendation device 600 further includes:
a division module, configured to divide time into a plurality of time categories according to a date attribute and/or a time-period attribute;
a second acquisition module, configured to acquire user search log data;
a classification module, configured to classify each user search record in the user search log data according to the time category to which its historical search time belongs, to obtain a plurality of user search record sets belonging to different time categories;
a storage module, configured to store the plurality of user search record sets in the user search log database, labelled by time category.
Optionally, the division module first divides time by date attribute into statutory holidays, other festivals and non-festival days. It then divides the other festivals and the non-festival days by time-period attribute into working hours and non-working hours. This yields five time categories:
- statutory holidays;
- working hours on other festivals;
- non-working hours on other festivals;
- working hours on non-festival days;
- non-working hours on non-festival days.

Here, "other festivals" means non-statutory festivals.
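A minimal sketch of the five-way time classification follows. It also includes the grouping of log records into per-category record sets, which is the work of the classification and storage modules.

Several details are assumptions for illustration, because the application leaves these definitions open:
- the holiday and festival calendars;
- the working window, taken as Monday to Friday, 09:00 to 18:00;
- the category labels.

```python
from collections import defaultdict
from datetime import date, datetime, time
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical calendars; a deployed system would load these from a holiday source.
STATUTORY_HOLIDAYS: Set[date] = {date(2021, 10, 1), date(2022, 1, 1)}
OTHER_FESTIVALS: Set[date] = {date(2021, 12, 24), date(2021, 12, 25)}  # non-statutory
# Assumed working window: Monday to Friday, 09:00 inclusive to 18:00 exclusive.
WORK_START, WORK_END = time(9, 0), time(18, 0)


def time_category(ts: datetime) -> str:
    """Map a timestamp to one of the five time categories."""
    d = ts.date()
    # Date attribute first: statutory holidays form a single category.
    if d in STATUTORY_HOLIDAYS:
        return "statutory_holiday"
    day_kind = "other_festival" if d in OTHER_FESTIVALS else "non_festival"
    # Time-period attribute: split the remaining days into working / non-working.
    working = ts.weekday() < 5 and WORK_START <= ts.time() < WORK_END
    return f"{day_kind}_{'working' if working else 'non_working'}"


def build_log_db(records: Iterable[Tuple[datetime, str]]) -> Dict[str, List[str]]:
    """Group (search time, query) records into per-category record sets."""
    db: Dict[str, List[str]] = defaultdict(list)
    for ts, query in records:
        db[time_category(ts)].append(query)
    return dict(db)
```

The weekday rule gives the following results:
- 24 December 2021 was a Friday, so a 10:00 search that day falls in `other_festival_working`.
- A 10:00 search on Saturday 25 December 2021 falls in `other_festival_non_working`.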
Optionally, the first acquisition module 601 includes:
an acquisition unit, configured to acquire an initially input first search term;
a display unit, configured to display a plurality of meanings of the first search term when the first search term is identified as ambiguous;
a first determining unit, configured to determine, based on the meaning selected by the user, the search term that the user confirms as input.
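The disambiguation step of the first acquisition module can be sketched as follows.
- The sense inventory `SENSES` is a hypothetical stand-in, since the application does not say how ambiguity is detected.
- The `choose` callback stands in for displaying the meanings and receiving the user's selection.
- Treating the selected meaning itself as the confirmed search term is also an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical sense inventory: ambiguous term -> disambiguated search terms.
SENSES: Dict[str, List[str]] = {
    "apple": ["apple (fruit)", "Apple (company)"],
}


def confirm_search_term(first_term: str, choose: Callable[[List[str]], int]) -> str:
    """Return the search term the user confirms for an initially input term."""
    meanings = SENSES.get(first_term, [])
    if len(meanings) < 2:
        # Not ambiguous: the initial input is used as the search term directly.
        return first_term
    # Ambiguous: show the meanings (delegated to `choose`) and take the user's pick.
    return meanings[choose(meanings)]


def console_choose(meanings: List[str]) -> int:
    """Example UI: list the meanings and read a 1-based choice from stdin."""
    for i, m in enumerate(meanings, start=1):
        print(f"{i}. {m}")
    return int(input("Select a meaning: ")) - 1
```

Calling `confirm_search_term("apple", console_choose)` prints both meanings and waits for the user's choice. A term with no entry in the inventory passes through unchanged.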
Optionally, the keyword recommendation device 600 further includes:
a fourth determining module, configured to determine a word vector of the search term;
a fifth determining module, configured to determine, from a pre-built word vector set, a set of word vectors similar to the word vector of the search term, and to obtain the similar word set corresponding to that similar word vector set;
a processing module, configured to replace the search term with words from the similar word set to obtain a second candidate word set;
and the recommendation module 605 is configured to determine a recommended word set based on the first candidate word set and the second candidate word set and make the recommendation.
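The fourth and fifth determining modules and the processing module amount to a nearest-neighbour lookup in embedding space, followed by substitution. The sketch below uses cosine similarity with a fixed threshold over a small in-memory vector table. In the application, the vectors would come from the bidirectional multi-head attention model; here they are hand-written toy values. The metric, the 0.8 threshold and the function names are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def second_candidates(
    query: str,
    term: str,
    vectors: Dict[str, List[float]],
    threshold: float = 0.8,
) -> List[str]:
    """Build the second candidate word set by similar-word substitution."""
    if term not in vectors:
        return []
    target = vectors[term]
    # Similar word set: vocabulary entries whose vectors are close to the term's.
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != term]
    similar = [w for w, s in sorted(scored, key=lambda p: -p[1]) if s >= threshold]
    # Substitute each similar word for the term. str.replace is a simplification:
    # it also rewrites the term where it appears inside a longer word.
    return [query.replace(term, w) for w in similar]


vectors = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "vehicle": [0.8, 0.3],
    "banana": [0.0, 1.0],
}
print(second_candidates("cheap car", "car", vectors))
# ['cheap automobile', 'cheap vehicle']
```

The toy scores relative to "car" are about 0.994 for "automobile", 0.936 for "vehicle" and 0.0 for "banana". The results are sorted by similarity, so the closest substitutes come first. For a single-word query, the second candidate set is simply the similar words themselves.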
Optionally, the recommendation module 605 includes:
a similarity comparison unit, configured to compare the candidate words in the first candidate word set with those in the second candidate word set for similarity;
a second determining unit, configured to determine, based on the similarity comparison result, a first recommended word set from the first candidate word set and a second recommended word set from the second candidate word set;
a recommendation unit, configured to recommend the first recommended word set and the second recommended word set.
Optionally, the second determining unit is configured to determine the first group and the second group of candidate words in the first candidate word set as the first recommended word set, and to determine the third group of candidate words in the second candidate word set as the second recommended word set;
wherein the first group of candidate words includes the candidate words in the first candidate word set that are similar to at least one candidate word in the second candidate word set; the second group includes the candidate words in the first candidate word set that are not similar to any candidate word in the second candidate word set; and the third group includes the candidate words in the second candidate word set that are not similar to any candidate word in the first candidate word set;
and the recommendation unit is configured to sort the first recommended word set and the second recommended word set in the order first group, second group, third group, and then recommend them.
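The keyword recommendation device 600 provided by the embodiments of the present application can execute the above method embodiments. Its implementation principle and technical effect are similar and are not repeated here.
The keyword recommendation device 600 of the embodiments of the present application operates as follows:
1. It acquires the input search term and the time information of its input.
2. It determines the target time category to which the time information belongs.
3. It determines the target user search record set corresponding to the target time category from the pre-established user search log database, which stores multiple user search record sets for different time categories.
4. It determines, from the target user search record set, a first candidate word set associated with the search term.
5. It determines a recommended word set based on the first candidate word set and recommends it.

The device takes users' search preferences at different times into account and classifies search records by time. It can therefore recommend information drawn from the records that match the user's search time, which better meets the user's expectations and improves recommendation accuracy.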
An embodiment of the present application further provides an electronic device. The electronic device solves the problem on the same principle as the keyword recommendation method in the embodiments of the present application. Its implementation can therefore refer to that of the method, and repeated details are not described again. As shown in Fig. 7, the electronic device of the embodiments of the present application includes:
a processor 700, configured to read a program in a memory 720 and execute the following process:
acquiring the input search term and the time information of its input;
determining the target time category to which the time information belongs;
determining, from a pre-established user search log database, a target user search record set corresponding to the target time category, wherein the user search log database stores a plurality of user search record sets each corresponding to a different time category;
determining, from the target user search record set, a first candidate word set associated with the search term;
determining a recommended word set based on the first candidate word set and making the recommendation.
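In Fig. 7, the bus architecture may include any number of interconnected buses and bridges. These link together various circuits, specifically one or more processors represented by the processor 700 and memory represented by the memory 720. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators and power management circuits. These are well known in the art and are therefore not described further here. A bus interface provides the interfaces. The processor 700 is responsible for managing the bus architecture and for general processing, and the memory 720 may store data used by the processor 700 when performing operations.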
Optionally, the processor 700 is further configured to read the program in the memory 720 and execute the following steps:
dividing time into a plurality of time categories according to a date attribute and/or a time-period attribute;
acquiring user search log data;
classifying each user search record in the user search log data according to the time category to which its historical search time belongs, to obtain a plurality of user search record sets belonging to different time categories;
storing the plurality of user search record sets in the user search log database, labelled by time category.
Optionally, the processor 700 is further configured to read the program in the memory 720 and execute the following steps:
dividing time by date attribute into statutory holidays, other festivals and non-festival days, and dividing the other festivals and non-festival days by time-period attribute into working hours and non-working hours. This yields five time categories: statutory holidays, working hours on other festivals, non-working hours on other festivals, working hours on non-festival days, and non-working hours on non-festival days. Here, "other festivals" means non-statutory festivals.
Optionally, the processor 700 is further configured to read the program in the memory 720 and execute the following steps:
acquiring an initially input first search term;
when the first search term is identified as ambiguous, displaying a plurality of meanings of the first search term;
determining, based on the meaning selected by the user, the search term that the user confirms as input.
Optionally, the processor 700 is further configured to read the program in the memory 720 and execute the following steps:
determining a word vector of the search term;
determining, from a pre-built word vector set, a set of word vectors similar to the word vector of the search term, and obtaining the similar word set corresponding to that similar word vector set;
replacing the search term with words from the similar word set to obtain a second candidate word set;
determining a recommended word set based on the first candidate word set and the second candidate word set and making the recommendation.
Optionally, the processor 700 is further configured to read the program in the memory 720 and execute the following steps:
comparing the candidate words in the first candidate word set with those in the second candidate word set for similarity;
determining, based on the similarity comparison result, a first recommended word set from the first candidate word set and a second recommended word set from the second candidate word set;
recommending the first recommended word set and the second recommended word set.
Optionally, the processor 700 is further configured to read the program in the memory 720 and execute the following steps:
determining the first group and the second group of candidate words in the first candidate word set as the first recommended word set, and determining the third group of candidate words in the second candidate word set as the second recommended word set;
wherein the first group of candidate words includes the candidate words in the first candidate word set that are similar to at least one candidate word in the second candidate word set; the second group includes the candidate words in the first candidate word set that are not similar to any candidate word in the second candidate word set; and the third group includes the candidate words in the second candidate word set that are not similar to any candidate word in the first candidate word set;
sorting the first recommended word set and the second recommended word set in the order first group, second group, third group, and then recommending them.
The electronic device provided by the embodiments of the present application can execute the above method embodiments. Its implementation principle and technical effect are similar and are not repeated here.
In addition, the computer-readable storage medium of the embodiments of the present application stores a computer program. When executed by a processor, the program implements the steps of the method embodiment shown in Fig. 1.
In the several embodiments provided in the present application, it should be understood that the disclosed method and device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. Moreover, the mutual coupling, direct coupling or communication connections shown or discussed may be indirect coupling or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional units.
An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are preferred embodiments of the present application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles described in the present application. Such improvements and refinements shall also fall within the scope of protection of the present application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111569802.0A CN116303983A (en) | 2021-12-21 | 2021-12-21 | A keyword recommendation method, device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111569802.0A CN116303983A (en) | 2021-12-21 | 2021-12-21 | A keyword recommendation method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116303983A true CN116303983A (en) | 2023-06-23 |
Family
ID=86815414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111569802.0A Pending CN116303983A (en) | 2021-12-21 | 2021-12-21 | A keyword recommendation method, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116303983A (en) |
- 2021-12-21: CN202111569802.0A filed in CN; CN116303983A active, pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117540057A (en) * | 2024-01-10 | 2024-02-09 | 广东省电信规划设计院有限公司 | AIGC-based retrieval guidance method and device |
CN117540057B (en) * | 2024-01-10 | 2024-04-30 | 广东省电信规划设计院有限公司 | AIGC-based retrieval guiding method and AIGC-based retrieval guiding device |
Similar Documents
Publication | Title |
---|---|
CN109992645B (en) | Data management system and method based on text data | |
EP2192500B1 (en) | System and method for providing robust topic identification in social indexes | |
US7860872B2 (en) | Automated media analysis and document management system | |
CN103336793B (en) | Personalized article recommendation method and system | |
WO2016179938A1 (en) | Method and device for question recommendation | |
CN113239163A (en) | Intelligent question-answering method and system based on traffic big data | |
CN105824959A (en) | Public opinion monitoring method and system | |
TWI743623B (en) | Artificial intelligence-based business intelligence system and its analysis method | |
CN101833560A (en) | Internet-based automatic ranking system for manufacturers' word-of-mouth | |
CN109492168B (en) | Visual tourism interest recommendation information generation method based on tourism photos | |
CN107967290A (en) | Knowledge graph network construction method, system and medium based on massive scientific research data | |
CN101751439A (en) | Image retrieval method based on hierarchical clustering | |
CN116244410B (en) | Index data analysis method and system based on knowledge graph and natural language | |
CN112231494A (en) | Information extraction method and device, electronic equipment and storage medium | |
CN113988057A (en) | Title generation method, apparatus, device and medium based on concept extraction | |
CN116010552A (en) | Engineering cost data analysis system and method based on keyword word library | |
CN111191153A (en) | Information technology consultation service display device | |
CN113641788B (en) | Unsupervised fine-grained opinion mining method for long and short film reviews | |
CN116303983A (en) | A keyword recommendation method, device and electronic equipment | |
WO2021136009A1 (en) | Search information processing method and apparatus, and electronic device | |
CN118277537A (en) | Intellectual property retrieval management method and device based on big data | |
CN117436421A (en) | Standard file editing system, method and equipment | |
CN111859108A (en) | Public opinion system search word recommendation system | |
Çelebi et al. | Automatic question answering for Turkish with pattern parsing | |
CN111949781B (en) | Intelligent interaction method and device based on natural sentence syntactic analysis |
Legal Events
Code | Title |
---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||