CN111428127B

CN111428127B - Personalized event recommendation method and system integrating topic matching and bidirectional preference

Info

Publication number: CN111428127B
Application number: CN202010069262.9A
Authority: CN
Inventors: 钱忠胜; 杨家秀; 朱懿敏
Original assignee: Jiangxi University of Finance and Economics
Current assignee: Jiangxi University of Finance and Economics
Priority date: 2020-01-21
Filing date: 2020-01-21
Publication date: 2023-08-11
Anticipated expiration: 2040-01-21
Also published as: CN111428127A

Abstract

The invention discloses a personalized event recommendation method and system integrating topic matching and bidirectional preference. First, the latent Dirichlet allocation (LDA) document topic model extracts topic information from each event and from the historical events a user has participated in, and the topic matching degree between the user and the event is calculated. Second, recommendation in event-based social networks is considered from both the user side and the event side: preference models are constructed for users and for events, yielding a user preference score and an event preference score, so that preference relations are mined more completely from both perspectives. Finally, the user-event topic matching degree is combined with the user-event bidirectional preference by linear weighting to obtain a final comprehensive score for each user-event pair, and the top-K ranked user-event pairs are returned as the recommendation results. The recommendation algorithm of this scheme is reported to outperform traditional recommendation schemes and to predict users' personalized preferences well, thereby achieving personalized recommendation.

Description

Personalized event recommendation method and system integrating topic matching and bidirectional preference

Technical Field

The present invention relates to the technical field of information recommendation, and in particular to a personalized event recommendation method and system integrating topic matching and bidirectional preference.

Background Art

With the rapid development of the Internet and computer technology, traditional social networks have in recent years evolved in several new directions, giving rise to new kinds of social networks. One is the location-based social network (LBSN), in which social relationships form mainly from users' geographic check-in information. Another is the event-based social network (EBSN), a complex heterogeneous network that combines online and offline interaction. Unlike traditional social networks, where friendships are established between acquaintances, users of an event-based social network build relationships through social activities: they join online interest groups and offline group activities according to their own interests or what they have in common.

Event-based social networks are developing rapidly, and a growing number of users choose to take part in social activities through them. On an event-based social network platform, users can join a variety of online groups; organizers or group members can initiate and participate in offline social activities such as parties, hikes, sports events, and concerts, and share information with other users.

Event-based social networks offer users social services that span online and offline settings and help them initiate and plan personalized event participation. Users form online group relationships around shared interests and launch offline gatherings from the online platform. Event-based social networks have broader social attributes than location-based social networks, and existing work indicates that they have better recommendation characteristics than traditional social networks.

Most current recommendation approaches for event-based social networks extract feature preferences from the user's perspective only. Although they take the social influence of the event organizer into account, they do not adequately represent the potential attractiveness of the event itself. In addition, they treat the event topic merely as one recommendation factor among others and seldom consider the user's topic factor or how well it matches the event topic.

Summary of the Invention

In view of this, it is necessary to provide a personalized event recommendation method and system that combines several main types of contextual information to calculate user preferences and potential event preferences, and that finally integrates the topic matching degree with the user-event bidirectional preference.

A personalized event recommendation method integrating topic matching and bidirectional preference includes the following steps:

Step 1: use the latent Dirichlet allocation (LDA) document topic model to extract the topic information of events, obtain user topic information from the records of historical events the user has participated in, compute the topics of the new event and of the user's historical events, and use the Jensen-Shannon (JS) divergence to calculate a topic matching score for each user-event pair;

Step 2: construct a user preference model and an event preference model, and calculate a user preference score and an event preference score respectively;

Step 3: use the Bayesian personalized ranking (BPR) algorithm to learn the weight parameters of the user preference score and the event preference score, obtaining a user-event bidirectional preference score; linearly combine the topic matching score and the bidirectional preference score with weights to obtain the final recommendation score of each user-event pair; and recommend the top K events after sorting to the user.

Further, the LDA document topic model in Step 1 has a three-layer generative Bayesian network structure comprising documents, topics, and words, in which both the document-topic and the topic-word distributions are multinomial. Each document selects a topic with a certain probability and then selects a word from that topic with a certain probability. The topics in any document follow a Dirichlet distribution, through which the relationships between texts are discovered.

Further, computing the topics of the new event and of the user's historical events in Step 1, and using the JS divergence to calculate the topic matching score of the user-event pair, specifically includes:

Step 1-1: assemble the descriptions of all events into a document set D, remove stop words, input D into the LDA document topic model, and obtain the topic distribution of each event;

Stop words and punctuation are removed from the content of all events, and the documents remaining after this noise removal are taken as the document set D. D is input into the LDA topic model, which generates the joint distribution of document topics and words, as shown in formula (1):

(1)

Gibbs sampling is then used to estimate the two unknown parameters of the model: the event-topic distribution and the topic-word distribution;
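By way of a non-limiting illustration, Step 1-1 can be sketched as a collapsed Gibbs sampler for LDA in plain Python. The function name `lda_gibbs`, the hyperparameter defaults `alpha` and `beta`, the iteration count, and the smoothed estimators used for the two distributions are assumptions introduced here for illustration; the original text specifies only that Gibbs sampling estimates the event-topic and topic-word distributions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of token lists, with stop words and punctuation already removed.
    Returns (theta, phi, vocab): theta[d][k] is the topic distribution of
    document d, phi[k][v] is the word distribution of topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # tokens of document d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    topic_total = [0] * K                      # tokens assigned to topic k overall
    assign = []                                # current topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    # Resample each token's topic from its full conditional distribution.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates of the two unknown distributions.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab
```

Each row of `theta` is one event's topic distribution and can be passed to the similarity computation of Step 1-2. In practice a library implementation of LDA would normally replace this sampler; the sketch only makes the estimated quantities concrete.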

Step 1-2: calculate the topic distribution similarity between the target user's historical events and the new event according to the JS divergence;

The topic distributions of all events have been generated according to formula (1). Given events $e_i$ and $e_j$ with topic distributions $P_{e_i}$ and $P_{e_j}$ respectively, the JS divergence $D_{JS}(P_{e_i} \| P_{e_j})$ between the two is calculated first, as shown in formula (2):

$$D_{JS}(P_{e_i} \| P_{e_j}) = \frac{1}{2} D_{KL}(P_{e_i} \| M) + \frac{1}{2} D_{KL}(P_{e_j} \| M) \tag{2}$$

where $M = \frac{1}{2}(P_{e_i} + P_{e_j})$, and $D_{KL}$ denotes the KL divergence, which describes the difference between two probability distributions $P$ and $Q$ and is calculated as shown in formula (3):

$$D_{KL}(P \| Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \tag{3}$$

Combining formulas (2) and (3) gives the topic similarity of the two events, as shown in formula (4):

(4)

The topic similarity of two events lies in [0,1]; the closer the value is to 1, the more similar the events;

Step 1-3: average the similarities over all historical events of the target user to obtain the topic matching score of the user and the new event;

Let $n$ denote the number of historical events of the target user $u$. The average of the topic similarities between the new event $e$ and each of the user's historical events $e_1, \dots, e_n$ is taken as the topic matching score of the user and the new event, as shown in formula (5):

$$S_{topic}(u, e) = \frac{1}{n} \sum_{i=1}^{n} sim(e, e_i) \tag{5}$$

where $sim(e, e_i)$ is the topic similarity given by formula (4). According to the constructed topic matching model, $S_{topic}(u, e)$ is finally used to measure the topic matching relationship between the target user and the new event.
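Steps 1-2 and 1-3 can be illustrated with the following minimal Python sketch. It rests on two assumptions that are not confirmed by the text: the similarity of formula (4) is taken to be one minus the JS divergence, and logarithms are taken to base 2 so that the divergence, and hence the similarity, lies in [0,1]. All function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL divergence D_KL(p || q) in bits; terms with p_i = 0 contribute 0.

    Assumes q_i > 0 wherever p_i > 0, which always holds for the mixture
    distribution used by js_divergence below.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_similarity(p, q):
    # Assumption: similarity = 1 - JS divergence (base-2 logs keep it in [0, 1]).
    return 1.0 - js_divergence(p, q)


def topic_matching_score(history_topics, new_event_topics):
    """Average similarity between a new event and a user's historical events."""
    if not history_topics:
        return 0.0  # assumed fallback for a user with no history
    sims = [topic_similarity(h, new_event_topics) for h in history_topics]
    return sum(sims) / len(sims)
```

For example, a user whose history holds one event with exactly the new event's topic distribution and one event with a disjoint distribution receives a matching score of 0.5 under these assumptions.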

Further, constructing the user preference model in Step 2 builds the user's single-factor preferences from three aspects, namely geographic location, social relationships, and time, and specifically includes:

Step 2-1-1: construct the geographic location preference model:

The geographic location preference model calculates the probability that the target user will participate in an event held at a given location. Kernel density estimation (KDE) is used to model the two-dimensional geographic distribution of the events the user has participated in, and the normalized event participation probability represents the user's preference for a location. The longitude and latitude coordinates of an event location are denoted by $(Lx, Ly)$, and the set of locations where user $u$ has participated in events is denoted by $L(u)$. The KDE function for user $u$ is then given by formula (6):

(6)

where $l_i = (Lx_i, Ly_i)^T$ is the two-dimensional vector of the longitude and latitude coordinates of an event location, $m_l(u, l_i)$ is the frequency with which user $u$ has participated in events held at location $l_i$, $\sigma$ is the size of the neighborhood window (bandwidth), $N$ is the number of location samples, and $K(\cdot)$ is the Gaussian kernel function, defined as shown in formula (7):

(7)

Combining formulas (6) and (7) defines the probability that user $u$ will participate in an event held at location $l$, as shown in formula (8):

(8)

This probability is normalized to obtain the user's preference score for the geographic location, as shown in formula (9):

(9)

where the denominator is the target user's maximum event participation probability;
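The geographic location preference of Step 2-1-1 may be sketched as follows. Because formulas (6) to (9) are not reproduced in this text, the sketch assumes a standard frequency-weighted two-dimensional Gaussian KDE, takes $N$ to be the total visit frequency, and normalizes by the maximum density over a caller-supplied candidate set. These choices, the default bandwidth, and all function names are assumptions rather than the definitions of the original formulas.

```python
import math


def gaussian_kernel_2d(x, y):
    """Standard two-dimensional Gaussian kernel."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)


def kde_density(location, visit_counts, bandwidth):
    """Frequency-weighted KDE of a user's visited locations.

    visit_counts maps (Lx, Ly) coordinates to the number of events the user
    attended there; bandwidth plays the role of sigma.
    """
    total = sum(visit_counts.values())
    if total == 0:
        return 0.0
    lx, ly = location
    acc = 0.0
    for (px, py), freq in visit_counts.items():
        acc += freq * gaussian_kernel_2d((lx - px) / bandwidth, (ly - py) / bandwidth)
    return acc / (total * bandwidth ** 2)


def location_preference(location, visit_counts, candidates, bandwidth=0.05):
    """Density at `location` divided by the maximum density over `candidates`."""
    peak = max((kde_density(c, visit_counts, bandwidth) for c in candidates), default=0.0)
    if peak <= 0.0:
        return 0.0
    return kde_density(location, visit_counts, bandwidth) / peak
```

With this normalization the candidate location with the highest estimated density receives a score of 1, and locations far from every visited point receive scores close to 0.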

Step 2-1-2: construct the social relationship preference model:

In the user social network, each user joins one or more online interest groups and chooses to participate in events published by different groups. The user's social relationship preference is judged from the user's online same-group relationships, which mainly comprise two kinds of interaction;

First, user-group relevance, defined by the interactions between users and all the groups they belong to and between users and the events created within those groups. Let $G(u)$ denote the set of groups to which the events attended by user $u$ belong; the user-group relevance can then be expressed as shown in formula (10):

(10)

where $m_p(u, g)$ denotes the set of events in group $g$ that user $u$ has participated in;

Second, in-group user relevance, defined by the similarity between the target user and the friends in the target user's group. The similarity $s(u, g)$ between the target user and the users in the group is calculated as shown in formula (11):

(11)

where $sim(u_i, u_j)$ denotes the similarity between users $u_i$ and $u_j$ in the same group, as shown in formula (12);

(12)

$s(u, g)$ is then normalized, as shown in formula (13):

(13)

Combining the two kinds of interaction above, and given that users belonging to the same group tend to participate in events created by other users of that group, the user-group relevance and the in-group user relevance are combined to obtain the social preference score of user $u$ for online group $g$, as shown in formula (14):

(14)

where a weight control parameter balances the two terms. In the social network, the preference association between the target user and the group is assumed to be as important as the association between users within the group, so the weight is set to 0.5, a value validated experimentally;

Step 2-1-3: construct the time factor preference model:

The time of an event is an important factor to consider when calculating user preferences. A new event $e$ that the user may choose to attend is represented as a 7×24-dimensional event time vector: when the new event takes place in a particular time slot of the week, the vector component for that slot is set to 1, otherwise to 0. In the time preference model, the user is represented as a user time vector derived from the records of historical events the user has attended, as shown in formula (15):

(15)

where $E_u$ is the set of historical events the target user has participated in. The cosine similarity between the user time vector and the new event's time vector is then calculated, as shown in formula (16):

(16)

For a new event, the user's similarity is obtained from formula (16) and normalized to give the user's time preference score for the event, as shown in formula (17):

(17)
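The time factor model of Step 2-1-3 may be sketched as follows. The sketch assumes that the user time vector of formula (15) is the component-wise mean of the time vectors of the user's historical events, that weekdays are indexed from 0 (Monday) to 6 (Sunday), and it omits the normalization of formula (17), since the cosine similarity of non-negative vectors already lies in [0,1]. The helper names are hypothetical.

```python
import math

SLOTS = 7 * 24  # one component per (weekday, hour) slot


def event_time_vector(weekday, hours):
    """7*24 indicator vector: 1 for every hour slot in which the event takes place."""
    vec = [0.0] * SLOTS
    for h in hours:
        vec[weekday * 24 + h] = 1.0
    return vec


def user_time_vector(history_vectors):
    """Assumed form of formula (15): mean of the user's historical event vectors."""
    if not history_vectors:
        return [0.0] * SLOTS
    n = len(history_vectors)
    return [sum(column) / n for column in zip(*history_vectors)]


def cosine_similarity(a, b):
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def time_preference(history_vectors, new_event_vector):
    return cosine_similarity(user_time_vector(history_vectors), new_event_vector)
```

A user who has only attended Saturday-evening events thus scores close to 1 for another Saturday-evening event and 0 for a Monday-morning one.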

Further, calculating the user preference score in Step 2 specifically includes:

For the geographic location preference model, the geographic location preference score is expressed by the predicted probability that the user participates in events held at the location. For the social relationship preference model, the target user's social preference score is calculated from two aspects: the relationship between the target user and the group, and the relevance to users within the group. For the time factor preference model, a unified vector representation at two granularities, day and hour, is constructed, and the similarity of the user-event pair computed on it serves as the target user's time preference score. These three single-factor preferences together form a user preference perception model, and they are linearly combined to obtain the overall preference score of user $u$ for event $e$, as shown in formula (18):

(18)

where the three terms are the user's preference scores on the three single factors of geographic location, social relationships, and time.

Further, constructing the event preference model in Step 2 builds the event's single-factor preferences from two aspects, event location popularity and event organizer influence, and specifically includes:

Step 2-2-1: construct the event location popularity preference model:

The popularity of a geographic location is calculated from how frequently the location is visited by user $u$ and by the users of the online group $g$ that $u$ has joined;

First, the popularity of the event location $l_e$ with respect to user $u$ is defined as shown in formula (19):

$$pop_u(l_e) = \frac{m_l(u, l_e)}{\max_{l \in L(u)} m_l(u, l)} \tag{19}$$

where the numerator is the frequency with which user $u$ has participated in events held at location $l_e$, and the denominator is the maximum frequency over the locations user $u$ has visited. Similarly, the popularity of location $l_e$ with respect to the group $g$ to which user $u$ belongs is defined as shown in formula (20):

(20)

where the numerator is the frequency with which the users in group $g$ have participated in events at location $l_e$, and the denominator is the maximum frequency over the locations the group members have visited; this gives the popularity of location $l_e$ among the users of group $g$. The two popularity values are combined to define the total popularity, for the target user $u$, of the location where the recommended event is held, as shown in formula (21):

(21)
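The location popularity of Step 2-2-1 may be sketched as follows. The user-level ratio follows the verbal description of formula (19). The group-level aggregation (summing visit counts over all group members before taking the ratio) and the equal-weight combination standing in for formula (21) are assumptions, as are the function names and the `weight` parameter.

```python
def user_location_popularity(user_counts, location):
    """Visits of the user to `location` divided by the user's maximum visit count."""
    peak = max(user_counts.values(), default=0)
    if peak == 0:
        return 0.0
    return user_counts.get(location, 0) / peak


def group_location_popularity(member_counts, location):
    """Same ratio computed on visit counts summed over all group members.

    member_counts maps each member to a dict of location -> visit count.
    Summing over members is an assumed reading of formula (20).
    """
    totals = {}
    for counts in member_counts.values():
        for loc, c in counts.items():
            totals[loc] = totals.get(loc, 0) + c
    peak = max(totals.values(), default=0)
    if peak == 0:
        return 0.0
    return totals.get(location, 0) / peak


def location_popularity(user_counts, member_counts, location, weight=0.5):
    # Assumption: formula (21) is approximated by a weighted average of the two terms.
    return (weight * user_location_popularity(user_counts, location)
            + (1.0 - weight) * group_location_popularity(member_counts, location))
```

For a user who visited a park four times and a cafe twice, the cafe's user-level popularity is 0.5; if the cafe is the most-visited location in the group, its group-level popularity is 1.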

Step 2-2-2: construct the event organizer influence preference model:

First, the influence of the event organizer on the target user. The organizer's reputation or influence is used to express the implicit preference of the event. The influence of the event on user $u$ is defined as shown in formula (22):

(22)

The formula uses the set of events hosted by organizer $u_h$ that user $u$ has participated in, and $E_h$, the set of all events hosted by organizer $u_h$;

Second, the influence of the event organizer within the group. For the online group to which the target user belongs, the influence of the event in that group is expressed analogously, by the proportion of events that users have attended, as shown in formula (23):

(23)

where $U_g$ is the set of users in the group; the formula also uses the set of events hosted by the organizer that a user has participated in and the set of events the organizer has held in the group. The organizer's influence on the target user and on the users of the group are combined to obtain the comprehensive influence score of the event organizer, as shown in formula (24):

(24)
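As a hedged illustration of Step 2-2-2, the sketch below reads formula (22) as the fraction of the organizer's events that the target user attended, reads formula (23) as the mean of that fraction over the group members (restricted to events held in the group), and combines the two with equal weights in place of formula (24). All three readings are assumptions, since the formulas are not reproduced here, and every name is hypothetical.

```python
def influence_on_user(user_attended, organizer_events):
    """Fraction of the organizer's events that the user attended (assumed formula (22))."""
    if not organizer_events:
        return 0.0
    return len(user_attended & organizer_events) / len(organizer_events)


def influence_in_group(group_attendance, organizer_group_events):
    """Mean attended fraction over group members (assumed formula (23)).

    group_attendance maps each group member to the set of event ids attended;
    organizer_group_events is the set of events the organizer held in the group.
    """
    if not group_attendance or not organizer_group_events:
        return 0.0
    ratios = [
        len(attended & organizer_group_events) / len(organizer_group_events)
        for attended in group_attendance.values()
    ]
    return sum(ratios) / len(ratios)


def organizer_influence(user_attended, organizer_events,
                        group_attendance, organizer_group_events, weight=0.5):
    # Assumption: formula (24) is approximated by a weighted average of the two terms.
    return (weight * influence_on_user(user_attended, organizer_events)
            + (1.0 - weight) * influence_in_group(group_attendance, organizer_group_events))
```

Event identifiers are plain hashable values here; any attended event not hosted by the organizer is ignored by the set intersection.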

Further, calculating the event preference score in Step 2 specifically includes:

For a new event that has not yet taken place, the event's preference is expressed through the popularity of its location and the influence of its organizer. The event location popularity and the event organizer influence constructed above are linearly combined to calculate the preference score of event $e$ for user $u$, as shown in formula (25):

(25)

Further, obtaining the user-event bidirectional preference score in Step 3, and linearly combining the topic matching score and the bidirectional preference score with weights to obtain the final recommendation score of the user-event pair, specifically includes:

Step 3-1: compute the bidirectional preference for each user-event pair:

The user preference score and the event preference score are each assigned a weight, and the two scores are fused by weighting to obtain the user-event bidirectional preference score. The problem of computing the bidirectional preference score is thus converted into finding the weight vector of the two preference scores, and implicit feedback is used as training data to learn this weight vector;

BPR, a learning algorithm based on Bayesian maximum likelihood estimation, is chosen to learn the weights by learning to rank. From users' implicit feedback on events, it learns the correct ranking of user-event pairs, so that events a user has participated in rank ahead of new events or other events. First, the posterior probability to be maximized is defined as shown in formula (26):

(26)

where $\theta$ denotes the weight vector and $R$ denotes the set of all user-event pairs; the remaining term is defined as shown in formula (27);

(27)

The formula ranges over the user-event pairs of each user, and the probability that, for a given user, one event ranks ahead of another is defined as shown in formula (28):

(28)

The quantity being compared is the bidirectional preference score. To make optimization more convenient, the model parameters are assumed to follow a normal distribution with mean 0, and expanding the derivation gives the final optimization objective, as shown in formula (29):

(29)

The objective contains a regularization term with its coefficient. The optimization objective is maximized over the implicit user-event interaction feedback data to obtain the optimal weight parameter vector. Stochastic gradient descent (SGD) is used to solve this optimization problem: in each iteration, user-event pairs of a target user are drawn at random from the training set to update the weight vector, as shown in formula (30):

(30)

The update uses a learning rate. Through this learning process, the weight vector is obtained automatically from the training set of user and event preference scores and the two hyperparameters, which yields the bidirectional preference score;
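The weight learning of Step 3-1 may be sketched with the standard BPR formulation: a logistic (sigmoid) pairwise ranking probability, an L2 regularizer arising from the zero-mean normal prior, and stochastic gradient ascent on the log-posterior. These standard forms stand in for formulas (26) to (30), which are not reproduced in this text. The bidirectional score is assumed to be the inner product of a two-component weight vector with the pair (user preference score, event preference score); the function names, initial weights, and hyperparameter values are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bidirectional_score(theta, features):
    """Weighted fusion of (user preference score, event preference score)."""
    return theta[0] * features[0] + theta[1] * features[1]


def train_bpr(pairs, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Learn the two fusion weights by BPR with stochastic gradient steps.

    pairs: list of (attended_features, other_features) for one user each,
    where the attended event should rank ahead of the other event.
    """
    rng = random.Random(seed)
    theta = [0.5, 0.5]  # assumed initial weights
    for _ in range(epochs * len(pairs)):
        pos, neg = rng.choice(pairs)
        diff = [p - n for p, n in zip(pos, neg)]
        x = sum(t * d for t, d in zip(theta, diff))  # score(pos) - score(neg)
        grad_scale = 1.0 - sigmoid(x)                # derivative of ln sigmoid(x)
        # Ascent step on ln sigmoid(x) minus the L2 penalty
        # (the constant factor of the penalty gradient is absorbed into reg).
        theta = [t + lr * (grad_scale * d - reg * t) for t, d in zip(theta, diff)]
    return theta
```

In the toy setting used to check the sketch, attended events always have the higher user preference score but a slightly lower event preference score than the events they are compared with, so training increases the first weight relative to the second.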

Step 3-2: combine topic matching and bidirectional preference to obtain the final recommendation score of the user-event pair:

First, event topics are extracted with the LDA topic model and the topic matching score of the user and the event is obtained. Second, preference models for users and events are constructed from the user-event contextual information in the EBSN, and the user-event bidirectional preference score is obtained with the BPR learning algorithm. Finally, the topic matching score and the user-event bidirectional preference score are combined by a linearly weighted sum to obtain the final recommendation score of the user-event pair, as shown in formula (31):

(31)

where the weight parameter is usually set manually based on experience; its optimal setting is determined through experiments.
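The final fusion and ranking of Step 3-2 may be sketched as follows, assuming that formula (31) is a convex combination controlled by a single weight, here called `alpha`. That form, the default value 0.5, the tie-breaking rule, and the function names are assumptions introduced for illustration.

```python
def final_scores(topic_match, bidirectional, alpha=0.5):
    """Weighted sum of the topic matching score and the bidirectional score per event.

    Both arguments map event ids to scores for one target user; only events
    present in both mappings are scored.
    """
    return {
        event: alpha * topic_match[event] + (1.0 - alpha) * bidirectional[event]
        for event in topic_match
        if event in bidirectional
    }


def recommend_top_k(topic_match, bidirectional, k, alpha=0.5):
    """Return the k highest-scoring (event, score) pairs, best first."""
    scores = final_scores(topic_match, bidirectional, alpha)
    # Sort by descending score; ties are broken by event id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

Setting `alpha` to 1.0 ranks purely by topic matching and setting it to 0.0 ranks purely by bidirectional preference, which gives a simple way to inspect the contribution of each component when tuning the weight experimentally.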

In addition, a system for implementing personalized event recommendation integrating topic matching and bidirectional preference is provided, which is used to implement the personalized event recommendation method integrating topic matching and bidirectional preference described in any of the items above. The system includes:

a document topic generation module, used to extract the topics of the user's historical events and of new events and to calculate the topic distribution and word distribution of events; the topic similarity between the user's historical events and a new event represents the topic matching degree, which is integrated into the recommendation model as one of the key recommendation factors;

a user preference construction module, used to build the user's single-factor preferences from three aspects (geographic location, social relationships, and time) and to fuse the three single-factor preferences by weighting to obtain the user's overall preference;

an event preference construction module, which expresses the event's preference through the social influence of the event organizer in the group and the popularity, within the group, of the location where the event is held;

a user-event bidirectional preference scoring module, which uses a learning-to-rank algorithm to solve for the weight parameters of the user preference score and the event preference score and obtains the user-event bidirectional preference score;

a final recommendation scoring module for user-event pairs, used to linearly combine the topic matching score and the bidirectional preference score with weights to obtain the final recommendation score of each user-event pair.

Further, the user preference module includes a geographic location preference module, a social relationship preference module, and a time factor preference module, and the event preference module includes an event location popularity preference module and an event organizer influence preference module, wherein:

the geographic location preference module is used to express the geographic location preference score by predicting the probability that a user participates in an event held at a given location;

the social relationship preference module is used to calculate the target user's social preference score from two aspects: the relationship between the target user and the group, and the relevance to users within the group;

the time factor preference module is used to construct a unified vector representation at two granularities, day and hour, and to calculate the similarity of the user-event pair as the target user's time preference score;

the event location popularity preference module reflects the fact that, when a new event is recommended, the venue is an important basis for selection by interested users; this is referred to as the popularity of the geographic location among the user group, and taking the popularity of the event location into account allows the event's attractiveness to users to be calculated more accurately;

the event organizer influence preference module is used to improve recommendation accuracy based on the influence of the event organizer in the target user's group, and calculates that influence from two aspects: the organizer's influence on the target user and the organizer's influence within the group.

In the above personalized event recommendation method and system integrating topic matching and bidirectional preference, the LDA document topic model is first used to extract the topic information of events, user topic information is obtained from the records of historical events the user has participated in, and the topic matching degree between the user and the event is calculated as an important recommendation factor in the recommendation model, since the topic factor represents feature preferences well. Second, recommendation in event-based social networks is considered from both the user side and the event side: preference models are constructed for users and for events, yielding a user preference score and an event preference score, so that preference relations are mined more completely from both perspectives. Finally, the user-event topic matching degree is combined with the user-event bidirectional preference by linear weighting to obtain the final comprehensive score of each user-event pair, and the top K (TOP-K) user-event pairs after sorting are taken as the recommendation results. Extensive experiments were conducted on a real Meetup dataset, with comparisons against other event recommendation algorithms; the results indicate that the recommendation algorithm of this scheme outperforms traditional recommendation schemes and predicts users' personalized preferences well, thereby achieving personalized recommendation.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a structural diagram of the overall recommendation fusion framework of the personalized event recommendation method and system integrating topic matching and bidirectional preference according to an embodiment of the present invention.

FIG. 2 is a structural block diagram of the document topic generation model LDA used in the personalized event recommendation method and system integrating topic matching and bidirectional preference according to an embodiment of the present invention.

DETAILED DESCRIPTION

This embodiment takes the personalized event recommendation method integrating topic matching and bidirectional preference as an example. The present invention is described in detail below with reference to specific embodiments and the accompanying drawings.

Referring to FIG. 1 and FIG. 2, a personalized event recommendation method and system integrating topic matching and bidirectional preference provided by an embodiment of the present invention is shown.

This section explains the technical details of the personalized event recommendation system integrating topic matching and bidirectional preference. The main idea is as follows. First, the topics of new events and of the user's historical events are computed with the LDA topic model, the topic matching degree of each user-event pair is computed with cosine similarity, and a user preference model and an event preference model are constructed separately. The user preference model computes the user's comprehensive preference score from three aspects: time, geography, and social relationships. The event preference model expresses the potential preference score of an event through the popularity of the new event's geographic location within the target user's group and the organizer's social influence within that group. Then, the Bayesian Personalized Ranking (BPR) algorithm is used to learn the weight parameters of the user preference score and the event preference score, yielding the user-event bidirectional preference score. Finally, this score is fused with the topic matching degree by linear weighting to obtain the final recommendation score of each user-event pair, and the TOP-K events after sorting are recommended to the user. In short, the scheme matches users and events by topic, computes user preference and potential event preference from several main kinds of contextual information, and fuses the topic matching degree with the user-event bidirectional preference to recommend events.

1. Recommendation framework integrating LDA topic matching and user-event bidirectional preference

Building on existing work, and based on the geographic location information, time information, social relationships, and other relevant user and event context in an event-based social network (EBSN), an event recommendation scheme is proposed that combines user-event topic matching with user-event bidirectional preference. The scheme considers the influence of topic matching degree, user preference, and event preference on event recommendation separately, and fuses these factors to recommend events of interest to users effectively. The overall framework of the recommendation model is shown in FIG. 1, and the recommendation process is as follows:

1) From the description documents of events in the EBSN, the LDA topic model is used to compute the topics of new events and of the target user's historical events. The topics of the user's historical events represent the user's topics. The semantic similarity between the topic distributions of the event and the user is then computed to obtain the user-event topic matching score.

2) The user preference score and the event preference score are computed. For user preference, scores are computed from three aspects (geographic location, social relationships, and time) and fused linearly. Event preference is expressed through the popularity of the event's venue and the social influence of the event's organizer, which are likewise fused linearly to obtain the event preference score. Note that the venue popularity and the organizer influence are computed only over the group to which the target user belongs and the users within that group; associations with all other users and groups are ignored, which improves recommendation performance and reduces computational complexity.

3) The above computations yield the user-event topic matching score, the user's preference score for the event, and the event's preference score for the user. The Bayesian Personalized Ranking algorithm is first used to learn the weights of the user preference score and the event preference score, and the two scores are fused according to these weights to obtain the bidirectional preference score. Finally, the topic matching score and the bidirectional preference score are combined linearly to obtain the final recommendation score of each user-event pair, and the TOP-K events with the highest scores are recommended to the user.

2. LDA-based topic matching model

In an event-based social network there is a clear topical and semantic similarity between users and events. Users usually choose to attend events of a type that interests them, and events of one type generally share similar attributes and topics. Using event topics in recommendation captures the preferences of users and events better. The topics of the historical events a user has attended are used to represent the user's topics, and the topic distribution and word distribution of each new event are computed. The topic similarity between the user's historical events and a new event expresses the topic matching degree, which is fused into the recommendation model as one of the key recommendation factors.

When two documents share features such as the same topic, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm has difficulty distinguishing them, so the Bayesian LDA topic model is chosen to compute the document topic distribution and word distribution. The LDA topic model is a Bayesian probabilistic model for computing the topic distribution of documents; it clusters latent topics across documents and generates the topics of each document. Its core idea is that each document selects a topic with a certain probability and then selects a word from that topic with a certain probability. The topics in any document are assumed to follow a Dirichlet distribution, through which relationships between texts can be discovered. LDA is a three-layer generative Bayesian network comprising documents, topics, and words, in which both the document-topic and the topic-word relations follow multinomial distributions. The generative process of the LDA topic model is shown in FIG. 2.

Given a document set, the two Dirichlet priors in FIG. 2 govern, respectively, the topic distribution of each document and the word distribution of each topic; their hyperparameters are set empirically. k is the number of topics specified in advance for the document set, N_m is the total number of words in document m, and M is the number of documents in the set. For each word in a document, LDA first determines the document's topic distribution from the topic prior and draws a topic z from it; it then determines the word distribution of the current topic from the word prior and draws a word from the word distribution corresponding to topic z. Repeating this process N_m times generates the document. The topic distribution of each document can then be estimated with the Gibbs sampling method.
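The generative process described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the helper names (`sample_dirichlet`, `sample_categorical`, `generate_document`), the hyperparameter values, and the toy vocabulary size are all hypothetical, and the sketch only generates a document; it does not perform the Gibbs sampling inference.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index according to the given probability vector."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def generate_document(n_words, alpha, topic_word, rng):
    """Generate n_words word ids following the LDA generative story."""
    theta = sample_dirichlet(alpha, rng)  # document-topic distribution
    topics, words = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)          # draw a topic for this position
        w = sample_categorical(topic_word[z], rng)  # draw a word from that topic
        topics.append(z)
        words.append(w)
    return theta, topics, words


# Hypothetical toy setting: k = 3 topics, vocabulary of 6 word ids.
rng = random.Random(0)
k, vocab_size = 3, 6
alpha = [0.5] * k          # assumed topic-prior hyperparameter
beta = [0.5] * vocab_size  # assumed word-prior hyperparameter
topic_word = [sample_dirichlet(beta, rng) for _ in range(k)]
theta, topics, words = generate_document(20, alpha, topic_word, rng)
```

In practice the inverse problem is solved: the words are observed and the topic distributions are estimated, for example with Gibbs sampling as stated above.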

To compute the topic similarity between users and events with LDA, the text content is first converted into semantic features, and the topic distribution of each event is computed with the LDA topic model. Event content consists mainly of a title and a description document, together with information such as time and venue, and the event topics can be extracted from this content. Users, by contrast, may set interest tags to express their preferences, but many users set neither interest tags nor a self-introduction. User content therefore lacks document information and is extremely sparse, leaving no usable features to represent user topics. Representing user topics by the topics of the historical events the user attended is more accurate and avoids the problems of data sparsity and missing tags. Stop words and punctuation are removed from all event content, and the resulting noise-free documents are treated as the set D of all documents, which is input to the LDA topic model. The generative process described above produces the joint distribution of document topics and words, as shown in equation (1). The Gibbs sampling method is then used to estimate the two unknown parameters of the model, namely the event topic distribution and the topic word distribution.

(1)

After the LDA process yields the topic distribution and word distribution of each event document, the Jensen-Shannon (JS) divergence is used to calculate the similarity between events from their topic distributions. JS divergence is a symmetric variant of the Kullback-Leibler (KL) divergence; it resolves the asymmetry of KL divergence and is therefore better suited to measuring the similarity of two probability distributions. With the topic distributions of all events generated according to equation (1), given events $e_i$ and $e_j$ with topic distributions $\theta_{e_i}$ and $\theta_{e_j}$, the JS divergence between the two is calculated first, as shown in equation (2).

$$D_{JS}(\theta_{e_i} \,\|\, \theta_{e_j}) = \frac{1}{2} D_{KL}(\theta_{e_i} \,\|\, M) + \frac{1}{2} D_{KL}(\theta_{e_j} \,\|\, M) \tag{2}$$

where $M = \frac{1}{2}(\theta_{e_i} + \theta_{e_j})$, and $D_{KL}$ denotes the KL divergence, which describes the difference between two probability distributions $P$ and $Q$ and is calculated as shown in equation (3).

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \tag{3}$$

Combining equation (2) and equation (3) gives the topic similarity between the two events, as shown in equation (4).

(4)

The topic similarity between events lies in [0, 1]; the closer the value is to 1, the more similar the events. As noted above, the topic similarity between a new event and the user's historical events serves as the topic similarity between the user and the event. Because a user has usually attended many events, there are multiple topic similarities with the new event. Let $n$ denote the number of historical events of the target user $u$, and let $sim(e, e_i)$ denote the topic similarity from equation (4) between the new event $e$ and the $i$-th historical event $e_i$. The average of all these similarities is taken as the topic matching score $S_{topic}(u, e)$ between the user and the new event, as shown in equation (5).

$$S_{topic}(u, e) = \frac{1}{n} \sum_{i=1}^{n} sim(e, e_i) \tag{5}$$

Algorithm 1 describes the process of calculating the topic matching degree of a user-event pair with the LDA topic model. Its notation covers the word distribution of each topic and the topic distribution of each document, with Dir() denoting the Dirichlet distribution, Mult() the multinomial distribution, and Poiss() the Poisson distribution.

Algorithm 1 gives the process of computing the user-event topic matching score with the LDA topic model and the JS divergence. First, all event descriptions are assembled into a document set, stop words are removed, and the result is used as the input of the LDA model to obtain the topic distribution of each event (lines 2 to 11). Next, the similarity between the topic distributions of the target user's historical events and the new event is computed with the JS divergence (lines 12 to 14). Finally, the similarities over all of the target user's historical events are averaged to obtain the topic matching score between the user and the new event (lines 15 to 16).
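The similarity and averaging steps of this procedure can be sketched in Python as follows, taking the per-event topic distributions as already estimated. The sketch rests on two assumptions that are not confirmed by the text above: logarithms are taken to base 2, so that the JS divergence lies in [0, 1], and the similarity of equation (4) is taken to be one minus the JS divergence. The function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL divergence with base-2 logarithms; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and within [0, 1] for base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_similarity(p, q):
    """ASSUMPTION: similarity is defined as 1 - JS divergence."""
    return 1.0 - js_divergence(p, q)


def topic_matching_score(history_topics, new_event_topics):
    """Average similarity between a new event and each historical event of a user."""
    if not history_topics:
        return 0.0  # no history: no topical evidence for this user
    sims = [topic_similarity(h, new_event_topics) for h in history_topics]
    return sum(sims) / len(sims)


# Hypothetical topic distributions over two topics.
history = [[0.5, 0.5], [1.0, 0.0]]
score = topic_matching_score(history, [0.5, 0.5])
```

Because the mixture distribution is strictly positive wherever either input is positive, the KL terms never divide by zero, which is one practical reason to prefer JS over raw KL divergence for sparse topic vectors.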

3. User-based preference model

User preference is generally learned as features from the user's relevant context, and the learned features are expressed as the user's preference. Below, single-factor preferences of the user are constructed from three aspects (geographic factors, social relationships, and time factors), and the three single-factor preferences are fused by weighting to obtain the user's overall preference.

3.1 Geographic location preference

The geographic location preference model computes the probability that the target user will attend an event held at a given location. The KDE (Kernel Density Estimation) method is used to model the two-dimensional geographic distribution of the events the user has attended, and the normalized attendance probability expresses the user's preference for the location. The longitude and latitude coordinates of an event location are denoted (Lx, Ly), and the set of locations of the events the user has attended is denoted L(u). The KDE function for user u is then as shown in equation (6).

(6)

where l_i = (Lx_i, Ly_i)^T is the two-dimensional vector of the longitude and latitude coordinates of an event location, m_l(u, l_i) is the frequency with which user u attended events held at location l_i, σ is the size of the neighborhood window (the bandwidth), N is the number of location samples, and K(·) is the Gaussian kernel function, defined as shown in equation (7).

(7)

Combining equation (6) and equation (7) defines the probability that user u will attend an event to be held at location l, as shown in equation (8).

(8)

Normalizing this probability gives the user's preference score for the geographic location, as shown in equation (9).

(9)
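The geographic preference computation can be sketched in Python as follows. Since the exact forms of equations (6) to (9) are not reproduced in this text, the sketch makes explicit assumptions: a frequency-weighted bivariate Gaussian kernel density estimate, plain Euclidean distance on raw coordinates, and normalization by the maximum density over the candidate locations. All function names and the default bandwidth are hypothetical.

```python
import math


def gaussian_kernel(dx, dy):
    """Standard bivariate Gaussian kernel evaluated at the offset (dx, dy)."""
    return math.exp(-0.5 * (dx * dx + dy * dy)) / (2.0 * math.pi)


def kde_density(location, visits, bandwidth):
    """Frequency-weighted KDE. `visits` maps (x, y) -> attendance count of the user."""
    total = sum(visits.values())
    if total == 0:
        return 0.0
    x, y = location
    density = 0.0
    for (xi, yi), count in visits.items():
        density += count * gaussian_kernel((x - xi) / bandwidth, (y - yi) / bandwidth)
    return density / (total * bandwidth ** 2)


def geo_preference(candidates, visits, bandwidth=0.05):
    """ASSUMPTION: normalize each density by the maximum over the candidate set."""
    densities = {loc: kde_density(loc, visits, bandwidth) for loc in candidates}
    peak = max(densities.values(), default=0.0)
    if peak == 0.0:
        return {loc: 0.0 for loc in candidates}
    return {loc: d / peak for loc, d in densities.items()}


# Hypothetical user who attended 3 events at (0, 0) and 1 event at (1, 1).
visits = {(0.0, 0.0): 3, (1.0, 1.0): 1}
scores = geo_preference([(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)], visits, bandwidth=1.0)
```

The bandwidth controls how far a past attendance spreads its influence: a small value rewards only venues very close to previously visited ones, while a large value smooths the preference over a wider area.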

3.2 Social relationship preference

In the user's social network, a user generally joins one or more online interest groups and may choose to attend events published by different groups. Users usually join the groups that interest them most, so members of the same group generally share interests. The user's social preference can therefore be derived from the user's online co-membership relations, which involve two main kinds of interaction.

1) Relevance between the user and the group. This is the interaction between users and all the groups they belong to, and between users and the events created within those groups. Let G(u) denote the set of groups to which the events attended by user u belong; the relevance between the user and a group can then be expressed as shown in equation (10).

(10)

where m_p(u, g) denotes the set of events that user u has attended within group g.

2) Relevance among users within the group. This relevance is defined by the similarity of friends in the group to which the target user belongs. The similarity between the target user and the users in the group is calculated as shown in equation (11).

(11)

where sim(u_i, u_j) denotes the similarity between user u_i and user u_j in the same group, as shown in equation (12).

(12)

Finally, s(u, g) is normalized, as shown in equation (13).

(13)

Taking these two kinds of interaction together, users who belong to the same or similar groups tend to attend the events created within those groups. Combining the user-group relevance with the relevance among users within the group gives the social preference score of user u with respect to online group g, as shown in equation (14).

(14)

where the weight control parameter balances the two terms. In a social network, the preference association between the target user and the group is generally regarded as equally important as the association among users within the group; accordingly, and as verified experimentally, the parameter is set to 0.5.
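The weighted combination described for equation (14) can be sketched in Python as follows. The exact forms of equations (10) to (14) are not reproduced in this text, so the sketch takes the two component scores as inputs and assumes a convex combination in which the weight applies to the user-group relevance. The function and parameter names are hypothetical.

```python
def social_preference(group_relevance, member_similarity, weight=0.5):
    """Combine user-group relevance with normalized intra-group similarity.

    ASSUMPTION: both inputs are already scaled to [0, 1], and `weight` is
    applied to the user-group term, with (1 - weight) on the intra-group term.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * group_relevance + (1.0 - weight) * member_similarity


# Hypothetical component scores for one user-group pair.
example_score = social_preference(group_relevance=0.8, member_similarity=0.4)
```

With the weight at 0.5 the two relations contribute equally, matching the setting reported above; moving it toward 1 would emphasize how strongly the user engages with the group itself over how similar the user is to its members.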

3.3 Time preference

The time of an event is another important factor to consider when computing user preference. Different users have different preferences when choosing events to attend: some prefer evening events, others prefer morning events, and others prefer particular times on weekdays or weekends. Time in practice is periodic, with the main cycles being the 7 days of a week and the 24 hours of a day. A user's choice of a day of the week and of certain hours of the day therefore forms user time preferences at two levels of granularity. The user's time preference is expressed by combining the user's choices at these two levels.

If a user chooses to attend an event in a certain time slot on a certain day of the week, this may indicate an implicit time preference, and the user may choose to attend an event in the same time slot again. To express this implicit preference in a uniform and intuitive way, each new event e that the user may attend is represented as a 7*24-dimensional event time vector. When the new event takes place in a particular time slot of the week, the vector component for that slot is set to 1; otherwise it is 0. Accordingly, in the time preference model a user can be represented as a user time vector built from the records of the historical events the user attended, as shown in equation (15).

(15)

where E_u denotes the set of historical events in which the target user has participated. The cosine similarity between the user time vector $\vec{u}$ and the new event time vector $\vec{e}$ is then calculated, as shown in equation (16).

$$\cos(\vec{u}, \vec{e}) = \frac{\vec{u} \cdot \vec{e}}{\lVert \vec{u} \rVert \, \lVert \vec{e} \rVert} \tag{16}$$

For a new event and a user, the similarity is obtained from equation (16) and then normalized to give the user's time preference score for the event, as shown in equation (17).

(17)
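The time preference computation can be sketched in Python as follows. Two points are assumptions rather than statements of the embodiment: the user time vector is built as the mean of the one-hot vectors of the attended events (cosine similarity is unaffected by this scaling), and each event is assigned only to the slot of its start hour. The normalization of equation (17) is not reproduced here, so the sketch stops at the cosine similarity. The function names are hypothetical.

```python
import math
from datetime import datetime

SLOTS = 7 * 24  # one component per (day of week, hour of day) pair


def slot_index(start):
    """Map a start time to its slot; Monday 00:00 is slot 0."""
    return start.weekday() * 24 + start.hour


def event_time_vector(start):
    """One-hot 168-dimensional vector for an event (start-hour slot only)."""
    vec = [0.0] * SLOTS
    vec[slot_index(start)] = 1.0
    return vec


def user_time_vector(history):
    """ASSUMPTION: mean of the one-hot vectors of the attended events."""
    vec = [0.0] * SLOTS
    for start in history:
        vec[slot_index(start)] += 1.0
    n = len(history)
    return [v / n for v in vec] if n else vec


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical history: two Monday 19:00 events and one Wednesday 10:00 event.
history = [datetime(2020, 1, 6, 19), datetime(2020, 1, 13, 19), datetime(2020, 1, 8, 10)]
user_vec = user_time_vector(history)
monday_evening = cosine_similarity(user_vec, event_time_vector(datetime(2020, 1, 20, 19)))
sunday_morning = cosine_similarity(user_vec, event_time_vector(datetime(2020, 1, 19, 8)))
```

A new event in a slot the user has never used scores zero under this representation, so in practice one might smooth the user vector across neighboring hours; that refinement is outside what the text describes.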

3.4 Fused user preference score

The single-factor preference models above compute the user's preference scores for geographic location, social relationships, and time. For geographic location, the preference score is expressed by the predicted probability that the user attends events held at that location. For social relationships, the target user's social preference score is computed from two aspects: the relationship between the target user and the group, and the relevance to the users within the group. For time, a unified vector is constructed at the two granularities of day and hour, and the similarity of the user-event pair computed from it serves as the target user's time preference score. These three single-factor preferences form a user preference-aware model, and their linear combination gives the overall preference score of user u for event e, as shown in equation (18).

(18)

where the three terms denote the user's preference scores for geographic location, social relationships, and time, respectively. Algorithm 2 describes the computation of the user preference score.

Algorithm 2 gives the process of computing the user's comprehensive preference score from the user's preferences for geographic location, social relationships, and time. The kernel density estimation algorithm predicts the probability that the user attends an event held at a particular location, and the normalized probability expresses the user's geographic preference (line 3). The social association between the user and the online group and its members is computed according to equations (10) and (13) to express the social preference (lines 5 to 11). The new event and the user's historical events are represented as time vectors, and their cosine similarity expresses the user's time preference (line 4). Finally, the three preference values are combined linearly to obtain the user's total preference score (lines 13 to 14).

4. Event-based preference model

Event preference is learned from the event organizer and from the information of the event itself. Compared with users, events lack active, personalized context: a new event has no history, personalized tags, or similar information. Event preference is therefore expressed through the social influence of the event organizer within the group and the popularity of the event's venue within the group.

4.1 Event location popularity

The geographic location of an event is one factor a user considers when deciding whether to attend. An online group that a user joins is generally a community of users with shared interests, and several of them may choose to attend the same event. For new event recommendation, the venue can therefore be an important basis for the choice of interested users; this relationship is called the popularity of the geographic location within the user community. Taking the popularity of the venue into account in the event preference model allows the attractiveness of the event to users to be computed more precisely. The popularity of a geographic location is computed from the frequency with which user u and the users in the online group g that u has joined have visited the location.

First, the popularity of the event location l_e with respect to user u is defined as shown in equation (19).

(19)

where the numerator is the frequency with which user u attended events held at location l_e, and the denominator is the maximum frequency among the locations user u has visited historically. Similarly, the popularity of location l_e with respect to the group g to which user u belongs can be defined as shown in equation (20).

(20)

where the numerator is the frequency with which the users in group g attended events at the location, and the denominator is the maximum frequency among the locations the group members have visited historically. This gives the popularity of location l_e with respect to the users in group g. Combining the two popularity values, the total popularity of the venue of the event to be recommended with respect to target user u is defined as shown in equation (21).

(21)
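The popularity computation can be sketched in Python as follows. The ratio structure (frequency at the venue divided by the maximum frequency over visited locations) follows the description above; the remaining details are assumptions: group frequencies are aggregated by summing the counts of all members, and the user-level and group-level values are combined by an equal-weight average, since the form of equation (21) is not reproduced in this text. Function names and the sample data are hypothetical.

```python
from collections import Counter


def user_location_popularity(location, user_visits):
    """Attendance count at `location` over the user's maximum count at any location."""
    peak = max(user_visits.values(), default=0)
    return user_visits.get(location, 0) / peak if peak else 0.0


def group_location_popularity(location, group_visits):
    """ASSUMPTION: member counts are summed per location before taking the ratio."""
    totals = Counter()
    for visits in group_visits.values():
        totals.update(visits)
    peak = max(totals.values(), default=0)
    return totals.get(location, 0) / peak if peak else 0.0


def total_location_popularity(location, user, group_visits, weight=0.5):
    """ASSUMPTION: weighted average of the user-level and group-level popularity."""
    user_part = user_location_popularity(location, group_visits.get(user, Counter()))
    group_part = group_location_popularity(location, group_visits)
    return weight * user_part + (1.0 - weight) * group_part


# Hypothetical attendance counts for the members of one group.
group_visits = {
    "u1": Counter({"cafe": 2, "park": 4}),
    "u2": Counter({"cafe": 3}),
}
cafe_popularity = total_location_popularity("cafe", "u1", group_visits)
```

Only the members of the target user's own group enter the computation, which mirrors the restriction stated earlier that associations with other users and groups are ignored.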

4.2 Event organizer influence

In an event-based social network, the initiator of each event is also an ordinary user of the network. If an event initiated by an organizer is well received, users who attended it are likely to attend again when the organizer initiates other new events. Although the event to be recommended is, for every user, a new event that has not yet taken place, its organizer may be an active organizer of events of that type and may have hosted many events before. This provides additional auxiliary information for addressing the cold-start problem in event recommendation. The influence of the event organizer among the users of the group is thus an important feature of event preference, and the scheme uses the organizer's influence within the target user's group to improve recommendation accuracy. This influence is considered from the following two aspects.

1) Influence of the event organizer on the target user. An event-based social network contains no user ratings of events, so the influence of organizers and events cannot be expressed directly. Rating an event at the end of its life cycle would also have no practical value, because it would not affect new events held later. The organizer's reputation or influence is therefore used to express the implicit preference of the event. First, the influence on user u is defined as shown in equation (22).

(22)

where the first term denotes the set of events organized by organizer u_h that user u has attended, and E_h is the set of all events organized by organizer u_h.

2) Influence of the event organizer within the group. For the online group to which the target user belongs, the influence within the group can similarly be expressed by the proportion of events that users attended. The organizer's influence within the group is defined as shown in equation (23).

(23)

where U_g denotes the set of users in the group, and the remaining two terms denote the set of events organized by the organizer that a given user has attended and the set of events the organizer has held in the group. Combining the organizer's influence on the target user with its influence on the users in the group gives the comprehensive influence score of the event organizer, as shown in equation (24).

(24)
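The organizer influence computation can be sketched in Python as follows. The user-level ratio (attended events of the organizer over all events of the organizer) follows the description of equation (22). The group-level form and the final combination are assumptions, since equations (23) and (24) are not reproduced in this text: the group influence is taken as the average attendance rate of the members over the organizer's events in the group, and the two values are combined by an equal-weight average. Function names and the sample data are hypothetical.

```python
def organizer_user_influence(user_events, organizer_events):
    """Fraction of the organizer's events that the user attended (sets of event ids)."""
    if not organizer_events:
        return 0.0
    return len(user_events & organizer_events) / len(organizer_events)


def organizer_group_influence(group_attendance, organizer_group_events):
    """ASSUMPTION: mean attendance rate of group members over the organizer's group events.

    `group_attendance` maps each member to the set of event ids that member attended.
    """
    if not group_attendance or not organizer_group_events:
        return 0.0
    attended = sum(len(events & organizer_group_events)
                   for events in group_attendance.values())
    return attended / (len(group_attendance) * len(organizer_group_events))


def organizer_influence(user, group_attendance, organizer_events,
                        organizer_group_events, weight=0.5):
    """ASSUMPTION: weighted average of the user-level and group-level influence."""
    user_part = organizer_user_influence(group_attendance.get(user, set()), organizer_events)
    group_part = organizer_group_influence(group_attendance, organizer_group_events)
    return weight * user_part + (1.0 - weight) * group_part


# Hypothetical data: the organizer ran four events, two of them inside the group.
organizer_events = {"e1", "e2", "e3", "e4"}
organizer_group_events = {"e1", "e2"}
group_attendance = {"u1": {"e1", "e3"}, "u2": {"e1", "e2"}, "u3": set()}
influence = organizer_influence("u1", group_attendance, organizer_events, organizer_group_events)
```

Because the score depends only on the organizer's past events, it remains available for a brand-new event with no attendance record, which is how this factor helps with the cold-start problem noted above.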

4.3 Potential event preference score

For a new event that has not yet taken place, the scheme treats the geographic location and the influence of the organizer as the two key factors that attract users to attend. Event preference is expressed by computing the popularity of the new event's venue and the social influence of its organizer. To reduce computational complexity and avoid interference from weakly correlated data, the venue popularity and the organizer's social influence are restricted to the group to which the target user belongs. The relevance of all other users and groups is assumed to be zero and has no effect on event preference. The venue popularity and the organizer influence constructed above are combined linearly to obtain the preference score of event e for user u, as shown in equation (25).

(25)

Algorithm 3 describes in detail how the event potential preference score is computed from the event location popularity and the organizer influence.

For the group to which the target user belongs, Algorithm 3 uses formulas (19) and (20) to compute the popularity of the event's geographic location with respect to the user and to the group, and combines the two into the total location popularity (lines 3 to 8). In the same way, it uses formulas (22) and (23) to obtain the organizer's influence on the user and on the group (lines 9 to 13) and combines the two into the organizer influence. Finally, it linearly combines the location popularity and the organizer influence to obtain the event's potential preference score (line 17).
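The flow of Algorithm 3 can be sketched roughly as below. The popularity ratio follows the description of formulas (19) and (20), visit frequency at the event location divided by the maximum visit frequency over all visited locations; the two mixing weights `w_pop` and `w_loc` and every name in the code are hypothetical, since the exact combination weights are not given in the text.

```python
from collections import Counter
from typing import Iterable


def location_popularity(visits: Iterable[str], location: str) -> float:
    """Visit frequency at `location` divided by the maximum visit
    frequency over all visited locations."""
    counts = Counter(visits)
    if not counts:
        return 0.0
    return counts[location] / max(counts.values())


def event_preference(
    user_visits: Iterable[str],
    group_visits: Iterable[str],
    location: str,
    organizer_influence: float,
    w_pop: float = 0.5,  # hypothetical weight: user vs. group popularity
    w_loc: float = 0.5,  # hypothetical weight: location vs. organizer
) -> float:
    """Linear combination of location popularity and organizer influence."""
    pop_user = location_popularity(user_visits, location)
    pop_group = location_popularity(group_visits, location)
    total_pop = w_pop * pop_user + (1.0 - w_pop) * pop_group
    return w_loc * total_pop + (1.0 - w_loc) * organizer_influence
```

Restricting `user_visits` and `group_visits` to the target user's own group mirrors the simplification stated above, where all other users and groups are treated as having zero relevance.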

5. Recommendation Algorithm Integrating Topic Matching and User-Event Bidirectional Preference

The preceding sections used the LDA topic model to obtain the topic distributions of users and events and computed the topic matching degree of each user-event pair from those distributions; they then built feature preference scoring models for users and events, yielding a user preference score and an event preference score. Topic matching and the user-event preferences are now fused into the final recommendation score in two steps. First, a learning-to-rank algorithm solves for the weight parameters of the user preference score and the event preference score, giving the user-event bidirectional preference score. Second, the topic matching score and the bidirectional preference score are combined by linear weighting to give the final recommendation score of the user-event pair. The details follow.

1) Compute the bidirectional preference score of a user-event pair. The user preference score and the event preference score are each assigned a weight, and their weighted fusion gives the user-event bidirectional preference score. The key problem is therefore to find the weight vector of the two preference scores, which is learned using implicit feedback as training data. Unlike explicit feedback, in which users rate items, implicit feedback in an event-based social network can only be represented by the interactions between users and events: the feedback is 1 if the user attended the event and 0 otherwise. For every new event, the user's feedback is therefore 0.

The BPR (Bayesian personalized ranking) learning algorithm, which is based on Bayesian maximum likelihood estimation, is chosen to learn the weights by ranking. From users' implicit feedback on events it learns the correct ordering of user-event pairs, so that the events a user attended are ranked ahead of new events and other events. First, the posterior probability to be maximized is defined, as shown in formula (26).

(26)

Here, θ denotes the weight vector and R denotes the set of all user-event pairs; the likelihood term is defined as shown in formula (27).

(27)

Here, the first term denotes the user-event pairs of a user, and the second denotes the probability that, for that user, one event is ranked ahead of another, as shown in formula (28).

(28)

Here, the score in formula (28) is the bidirectional preference score. To simplify optimization, the weight vector is assumed to follow a normal distribution with mean 0; expanding the derivation yields the final optimization objective, as shown in formula (29).

(29)

Here, the objective includes the coefficient of the regularization term. Maximizing the objective over the implicit user-event interaction feedback yields the optimal weight vector. Stochastic gradient descent (SGD) is used to solve this optimization problem: in each iteration, a user-event pair of the target user is drawn at random from the training set to update the weight vector, as shown in formula (30).

(30)

Here, the step size is the learning rate. Through this learning process, the weight vector is obtained automatically from the training set of user and event preference scores and the two hyperparameters, which in turn gives the bidirectional preference score.
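The weight-learning step can be illustrated with a small BPR-style sketch. It assumes the standard BPR setup: the probability that an attended event outranks a non-attended one is the logistic function of their score difference, and the objective is the sum of log-probabilities minus an L2 penalty on the weights. The names `lr`, `reg`, `steps`, the starting weights, and the exact scaling of the penalty are assumptions made for illustration and may differ from formulas (26) to (30).

```python
import math
import random
from typing import List, Sequence, Tuple

# Each training pair holds the (user_score, event_score) features of an
# attended event followed by those of an event the user did not attend.
Pair = Tuple[Tuple[float, float], Tuple[float, float]]


def sigmoid(x: float) -> float:
    """Logistic function used for the pairwise ranking probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bidirectional_score(theta: Sequence[float], user_score: float, event_score: float) -> float:
    """Weighted fusion of the user and event preference scores."""
    return theta[0] * user_score + theta[1] * event_score


def train_bpr_weights(
    pairs: Sequence[Pair],
    lr: float = 0.05,
    reg: float = 0.01,
    steps: int = 500,
    seed: int = 0,
) -> List[float]:
    """Stochastic gradient ascent on
    sum(ln sigmoid(x)) - (reg / 2) * ||theta||^2."""
    rng = random.Random(seed)
    theta = [0.5, 0.5]  # hypothetical starting point
    for _ in range(steps):
        pos, neg = rng.choice(pairs)  # draw one training pair at random
        diff = [p - n for p, n in zip(pos, neg)]
        x = sum(t * d for t, d in zip(theta, diff))  # score(pos) - score(neg)
        g = 1.0 - sigmoid(x)  # derivative of ln sigmoid(x) with respect to x
        theta = [t + lr * (g * d - reg * t) for t, d in zip(theta, diff)]
    return theta
```

Because the model has only two weights, each update touches the whole weight vector; the learned `theta` is then passed to `bidirectional_score` to score any user-event pair.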

2) Combine topic matching and bidirectional preference into the final recommendation score of a user-event pair. To summarize the preceding discussion of topic matching and preference computation for users and events: first, the LDA topic model extracts event topics and yields the topic matching score of the user and the event; second, preference models for users and events are built from the user-event context information in the EBSN, and the BPR learning algorithm yields the user-event bidirectional preference score; finally, the topic matching score and the user-event bidirectional preference score are combined by linear weighted summation into the final recommendation score of the user-event pair, as shown in formula (31).

(31)

Here, the weight parameter is usually set manually from experience; its optimal setting is determined through experiments. Algorithm 4 describes the process of fusing topic matching and bidirectional preference to obtain the final recommendation score of a user-event pair.

Algorithm 4 gives the process of fusing the topic matching score with the user-event bidirectional preference score. First, the Bayesian personalized ranking algorithm performs ranking learning on the training set generated from the user preference score set and the event preference score set to obtain the optimal weight vector, from which the bidirectional preference score of each of the target user's user-event pairs is computed (lines 2 to 10). Second, the topic matching score and the bidirectional preference score of each user-event pair are combined linearly into the final recommendation score (lines 11 to 13), and the TOP-K events ranked by final recommendation score are recommended to the user.
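The last stage of Algorithm 4, fusing the two scores and ranking, might look like the sketch below. It assumes the linear weighting takes the convex form `alpha * topic + (1 - alpha) * bidirectional`; the parameter name `alpha`, its default value, and the function names are hypothetical, and the text only states that the weight is set empirically.

```python
from typing import Dict, List, Tuple


def final_score(topic_match: float, bidirectional: float, alpha: float = 0.5) -> float:
    """Linear weighted sum of topic matching and bidirectional preference."""
    return alpha * topic_match + (1.0 - alpha) * bidirectional


def recommend_top_k(
    candidates: Dict[str, Tuple[float, float]],
    k: int,
    alpha: float = 0.5,
) -> List[str]:
    """Rank candidate events by final score and return the k best.

    `candidates` maps an event id to its (topic_match, bidirectional) scores.
    """
    scored = {
        event: final_score(topic, pref, alpha)
        for event, (topic, pref) in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

With `alpha = 0.5`, an event with a high bidirectional score can outrank one with a higher topic match, which is the trade-off the weight parameter is meant to control.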

At this point, topic matching and user-event bidirectional preference have been combined into a personalized event recommendation scheme, whose details are set out in the sections above.

In the personalized event recommendation method and system described above, which fuses topic matching with bidirectional preference, the document topic generation model LDA first extracts the topic information of events, and user topic information is obtained from the records of historical events the user attended; the topic matching degree between user and event is computed and used as an important recommendation factor in the model, since topic factors represent feature preferences well. Second, recommendation in the event-based social network is considered from the two directions of user and event: preference models are built for both, giving a user preference score and an event preference score, so that preference relations are mined more completely from both perspectives. Finally, the user-event matching degree is fused with the user-event bidirectional preference by linear weighted combination to obtain the final comprehensive score of each user-event pair, and the ranked TOP-K user-event pairs are taken as the recommendation result. The scheme was tested extensively on a real Meetup dataset and compared with other event recommendation algorithms; the results indicate that the proposed recommendation algorithm outperforms traditional recommendation schemes and predicts users' personalized preferences well, thereby achieving personalized recommendation.

It should be noted that the above is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes to the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A personalized event recommendation method integrating topic matching and bidirectional preference, characterized by comprising the following steps:

Step 1: Use the document topic generation model LDA to extract the topic information of the event, and obtain the user topic information based on the historical event records in which the user participated, calculate the topics of the new event and the user's historical events, and use the JS divergence algorithm to calculate the topic matching score of the user-event pair;

Step 2: construct a user preference model and an event preference model respectively, and calculate the user preference score and the event preference score respectively;

Step 3: Use the Bayesian personalized ranking algorithm BPR to learn the weight parameters of user preference scores and event preference scores to obtain the user-event bidirectional preference score, perform a linear weighted combination of the topic matching score and the bidirectional preference score to obtain the final recommendation score of the user-event pair, and recommend the top K events after ranking to the user;

In step 1, calculating the topics of the new event and the user's historical events and using the JS divergence algorithm to calculate the topic matching score of the user-event pair specifically comprises:

Step 1-1, all event description contents are grouped into a document set D and stop words are removed, the document set D is input into a document topic generation model LDA, and the topic distribution of each event is obtained respectively;

Remove stop words and punctuation marks from all event contents, take the document contents remaining after the noise words are removed as the set D of all documents, and input D into the LDA topic model to generate the joint distribution of topics and words for each document, as shown in formula (1):

(1)

Given the document set D, the topic distribution and the word distribution of a document each follow a prior Dirichlet distribution, whose hyperparameters are set from experience; k is the number of topics in the document set, specified in advance; N_m is the total number of words in document m; and M is the number of documents in the document set. For each word in a document, LDA uses the prior on topics to determine the topic distribution of the document and draws a topic z from that topic distribution; it then uses the prior on words to determine the word distribution of the current topic and draws a word from the word distribution corresponding to topic z. Repeating this process N_m times generates the document;

The Gibbs sampling method is then used to estimate the two unknown parameters in the model: the event topic distribution and the keyword distribution;

Step 1-2, calculate the topic distribution similarity between the target user's historical events and new events according to the JS divergence algorithm;

With the topic distributions of all events generated according to formula (1), given two events and their topic distributions, first calculate the JS divergence between the two distributions, as shown in formula (2):

(2)

where the KL term denotes the KL divergence, which describes the difference between two probability distributions and is calculated as shown in formula (3):

(3)

Combining formulas (2) and (3) gives the topic similarity of the two events, as shown in formula (4):

(4)

where the topic similarity of the two events takes a value in [0,1], and the closer the value is to 1, the higher the similarity of the events;

Step 1-3, average the similarities of all historical events of the target user to obtain the topic matching score between the user and the new event;

Given the number of historical events of the target user, the average of all similarities for the target user is taken as the topic matching score between the user and the new event, as shown in formula (5):

(5)

According to the constructed topic matching model, the resulting topic matching score measures the topic matching relationship between the target user and the new event.

2. The personalized event recommendation method integrating topic matching and bidirectional preference according to claim 1, characterized in that the document topic generation model LDA in step 1 has a three-layer generative Bayesian network structure comprising documents, topics and words, wherein both the document-topic and topic-word relations obey multinomial distributions; each document selects a topic with a certain probability and selects a word from that topic with a certain probability, and the topics in any document conform to a Dirichlet distribution, through which the relationships between texts are discovered.

3. The personalized event recommendation method integrating topic matching and bidirectional preference according to claim 2, characterized in that the user preference model constructed in step 2 constructs the user's single-factor preferences from three aspects, namely geographic location, social relationship, and time factor, and specifically includes:

Step 2-1-1, build a geographic location preference model:

The geographic location preference model calculates the probability that the target user will participate in an event held at a given location. The kernel density estimation (KDE) method is used to model the two-dimensional location distribution of the events in which the user participates, and the normalized event participation probability represents the user's preference for the location. The longitude and latitude coordinates of an event location are denoted (Lx, Ly), and the set of locations where user u has historically participated in events is denoted L(u). The KDE function for user u is as shown in formula (6):

(6)

where l_i = (Lx_i, Ly_i)^T denotes the two-dimensional vector of the longitude and latitude coordinates of an event location, m_l(u, l_i) denotes the frequency with which user u participates in activities held at geographic location l_i, σ denotes the size of the neighborhood window (bandwidth), N denotes the number of location samples, and K(·) denotes the Gaussian kernel function, defined as shown in formula (7):

(7)

Combining formulas (6) and (7), the probability that user u will participate in an event held at location l is defined as shown in formula (8):

(8)

Normalizing the probability gives the user's preference score for the geographic location, as shown in formula (9):

(9)

The denominator represents the target user’s maximum event participation probability;

Step 2-1-2, build a social relationship preference model:

In the user social relationship network, each user joins one or more online interest groups and chooses to participate in the events and activities published by different groups. The user's social relationship preference is judged from the user's online group relationships, which mainly comprise two types of interactive relationship;

The first type, user-group relevance, is defined as the interaction between a user and all the groups to which the user belongs and between the user and the events created within those groups. G(u) denotes the set of groups to which the events attended by user u belong; the user-group relevance can then be expressed as shown in formula (10):

(10)

where m_p(u, g) denotes the set of events in which user u has participated within group g to which the user belongs;

The second type is intra-group user relevance, which is defined by the similarity between the target user and the user's friends in the group. The similarity between the target user and the users in the group is calculated as shown in formula (11):

(11)

where sim(u_i, u_j) denotes the similarity between user u_i and user u_j in the same group, as shown in formula (12);

(12)

s(u, g) is normalized as shown in formula (13):

(13)

Combining the above two interactive relationships: users belonging to the same group tend to participate in events created by other users in that group. Combining the user-group relevance with the intra-group user relevance gives the social preference score of user u for online group g, as shown in formula (14):

(14)

where the weight control parameter balances the two relevances; because, in the social relationship network, the preference association between the target user and the group is set to be as important as the association between users within the group, the value of the parameter is set to 0.5;

Step 2-1-3, construct a time factor preference model:

The time factor of an event is an important preference factor to be considered when calculating user preferences. A new event e that the user can choose to participate in is represented as a 7*24-dimensional event time vector: when the new event occurs in a specific time period of the week, the vector component for that time period is set to 1, and otherwise to 0. In the time preference model, the user is represented as a user time vector according to the user's historical event records, as shown in formula (15):

(15)

where E_u denotes the set of historical events in which the target user has participated; the cosine similarity between the user time vector and the new event time vector is then calculated, as shown in formula (16):

(16)

For a new event and a user, the similarity is obtained according to formula (16); normalizing the similarity gives the user's time preference score for the event, as shown in formula (17):

(17).

4. The personalized event recommendation method integrating topic matching and bidirectional preference according to claim 3, wherein the step of calculating the user preference score in step 2 specifically includes:

For the geographic location preference model, the geographic location preference score is expressed by predicting the probability that the user participates in events held at the location; for the social relationship preference model, the social preference score of the target user is calculated from two aspects, namely the relationship between the target user and the group and the relevance to the users within the group; for the time factor preference model, a unified vector representation covering the two granularities of date and hour is constructed, and the similarity of the user-event pair calculated on that basis serves as the time preference score of the target user; the three single-factor preferences are linearly combined into a user preference perception model to obtain the overall preference score of user u for event e, as shown in formula (18):

(18)

where the three terms denote the user's preference scores on the three single factors of geographic location, social relationship, and time, respectively.

5. The personalized event recommendation method integrating topic matching and bidirectional preference according to claim 4, characterized in that the event preference model constructed in step 2 constructs the single-factor preferences of an event from two aspects, event location popularity and event organizer influence, and specifically includes:

Step 2-2-1, construct event location popularity preference model:

The popularity of a geographic location is calculated from the frequency with which the location is visited by user u and by the users in the online group g that user u has joined;

First, the popularity of the event location l_e with respect to user u is defined as shown in formula (19):

(19)

where the numerator is the frequency with which user u participates in activities held at geographic location l_e, and the denominator is the maximum frequency among the locations that user u has visited historically; similarly, the popularity of geographic location l_e with respect to group g, to which user u belongs, is defined as shown in formula (20):

(20)

where the numerator is the frequency with which the users in group g participate in activities held at location l_e, and the denominator is the maximum frequency among the locations that group members have visited historically; this gives the popularity of geographic location l_e with respect to the users in group g. Combining the two popularity terms, the total popularity of the recommended event's location for target user u is defined as shown in formula (21):

(21)

Step 2-2-2, build the influence preference model of event organizers:

First, the influence of the event organizer on the target user. The organizer's reputation or influence is used to express the implicit preference of the event. The influence of the event organizer on user u is defined as shown in formula (22):

(22)

where one set consists of the events held by organizer u_h in which user u has participated, and E_h is the set of all events held by organizer u_h;

Second, the influence of the event organizer in the group. For the online group to which the target user belongs, the organizer's influence in the group is expressed by a frequency ratio of user participation in the organizer's events, as shown in formula (23):

(23)

where U_g denotes the set of users in the group, and the other two sets are, respectively, the events held by the organizer in which a given user has participated and the events held by the organizer in the group; combining the organizer's influence on the target user with the organizer's influence on the users in the group gives the comprehensive influence score of the event organizer, as shown in formula (24):

(24).

6. The personalized event recommendation method integrating topic matching and bidirectional preference according to claim 5, characterized in that the calculation of the event preference score in step 2 specifically includes:

For a new event that has not yet occurred, the preference of the event is expressed by calculating the event location popularity and the event organizer influence of the new event; the constructed event location popularity and event organizer influence are linearly combined to calculate the preference score of event e for user u, as shown in formula (25):

(25).

7. The personalized event recommendation method integrating topic matching and bidirectional preference according to claim 6, characterized in that, in step 3, obtaining the user-event bidirectional preference score and performing the linear weighted combination of the topic matching score and the bidirectional preference score to obtain the final recommendation score of the user-event pair specifically comprises the following steps:

Step 3-1, find the bidirectional preference for the user-event pair:

The user preference score and the event preference score are each assigned a weight, and the weighted fusion of the two gives the user-event bidirectional preference score; the problem of obtaining the bidirectional preference score is thereby converted into finding the weight vector of the two preference scores, which is learned using implicit feedback as training data;

The learning algorithm BPR, based on Bayesian maximum likelihood estimation, is selected to learn the weights by ranking: from the implicit feedback data of users on events, it learns the correct ranking order of user-event pairs, so that the events in which a user has participated are ranked ahead of new events and other events. First, the posterior probability to be maximized is defined, as shown in formula (26):

(26)

where θ denotes the weight vector and R denotes the set of all user-event pairs; the likelihood term is defined as shown in formula (27);

(27)

where the first term denotes the user-event pairs of a user, and the second denotes the probability that, for that user, one event is ranked ahead of another, as shown in formula (28):

(28)

where the score in formula (28) is the bidirectional preference score; to facilitate optimization, the weight vector is assumed to obey a normal distribution with mean 0, and the final optimization objective function is derived as shown in formula (29):

(29)

where the objective includes the coefficient of the regularization term. Maximizing the objective function over the implicit interactive feedback data of user events gives the optimal weight parameter vector. The stochastic gradient descent algorithm SGD is used to solve the optimization problem: in the iterative process, user-event pairs of the target user are randomly drawn from the training set to update the weight vector, and the update process is shown in formula (30):

(30)

where the step size is the learning rate; through the above learning process, the weight vector is obtained automatically from the training set of user and event preference scores and the two hyperparameters, thereby giving the bidirectional preference score;

Step 3-2, combine topic matching and bidirectional preference to obtain the final recommendation score for the user-event pair:

First, the LDA topic model is used to extract event topics and obtain the topic matching score of the user and the event; second, the preference models of users and events are constructed according to the user-event context information in the EBSN, and the user-event bidirectional preference score is obtained through the BPR learning algorithm; finally, the topic matching score and the user-event bidirectional preference score are combined by linear weighted summation to obtain the final recommendation score of the user-event pair, as shown in formula (31):

(31)

where the weight parameter is usually set manually based on experience, and its optimal setting is determined through experiments.

8. A system for implementing personalized event recommendation integrating topic matching and bidirectional preference, used to implement the personalized event recommendation method integrating topic matching and bidirectional preference according to any one of claims 1 to 7, characterized in that the system comprises:

The document topic generation module is used to extract the topics of the user's historical events and new events, and calculate the topic distribution and word distribution of the events. The topic similarity between the user's historical events and new events is used to represent the topic matching degree, which is integrated into the recommendation model as one of the key factors for recommendation to perform event recommendation.

A user preference construction module, used to construct the user's single-factor preferences from three aspects, namely geographic location, social relationship, and time factor, and to fuse the three single-factor preferences by weighting to obtain the user's overall preference;

An event preference construction module, used to express the preference of an event through the social influence of the event organizer in the group and the popularity within the group of the geographic location where the event is held;

The user event bidirectional preference scoring module uses a ranking learning algorithm to solve the weight parameters of the user preference score and the event preference score to obtain the user event bidirectional preference score;

The final recommendation score module for the user-event pair is used to linearly weight the topic matching score and the two-way preference score to obtain the final recommendation score of the user-event pair.

9. The system for implementing personalized event recommendation integrating topic matching and bidirectional preference according to claim 8, characterized in that the user preference module includes a geographic location preference module, a social relationship preference module and a time factor preference module, and the event preference module includes an event location popularity preference module and an event organizer influence preference module, wherein:

The geographic location preference module is used to express the geographic location preference score by predicting the probability of a user participating in an event held in a certain geographic location;

The social relationship preference module is used to calculate the social preference score of the target user from two aspects: the relationship between the target user and the group, and the correlation with the users in the group;

The time factor preference module is used to construct a unified vector representation of two granularities, date and hour, and calculate the similarity of the user-event pair as the time preference score of the target user;

The event location popularity preference module is used, when recommending new events, to treat the location where an event is held as an important basis on which interested users choose, referred to as the popularity of the geographic location among the user group; taking the popularity of the event's geographic location into account allows the attractiveness of the event to the user to be calculated more accurately;

The event organizer influence preference module is used to improve the accuracy of recommendations based on the influence of the event organizer in the target user's group, and calculates the influence of the event organizer from two aspects: the influence of the event organizer on the target user and the influence of the event organizer in the group.