CN107133277B - A Tourist Attraction Recommendation Method Based on Dynamic Topic Model and Matrix Factorization - Google Patents

Info

Publication number
CN107133277B
Authority
CN
China
Prior art keywords
user
photo
tourist
matrix
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710237404.6A
Other languages
Chinese (zh)
Other versions
CN107133277A (en)
Inventor
陈岭
徐振兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201710237404.6A
Publication of CN107133277A
Application granted
Publication of CN107133277B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a tourist attraction recommendation method based on a dynamic topic model and matrix factorization. By analyzing a user's travel history over different time intervals, the method captures changes in the user's travel preferences and provides fine-grained travel recommendations. First, the method obtains the dataset from social networks. Second, a dynamic topic model is used to mine the implicit feature information of users and attractions from the users' history. Third, the explicit feature information of users and attractions is mined by analyzing the dataset and is combined with the implicit feature information to obtain user-user and attraction-attraction similarity information. Finally, the user-user and attraction-attraction similarity information is fused using a matrix factorization method with a joint regularization term. The method captures changes in the user's travel preferences and recommends suitable tourist attractions to the user.

Description

A Tourist Attraction Recommendation Method Based on Dynamic Topic Model and Matrix Factorization

Technical field

The invention relates to the technical field of information recommendation, and in particular to a tourist attraction recommendation method based on a dynamic topic model and matrix factorization.

Background

In recent years, with the rapid development of the mobile Internet, smartphones, and photo-sharing websites (for example Flickr, Panoramio, and Instagram), a large amount of photo data with geographic location information has appeared on the Internet, and the number of these crowd-contributed geotagged photos is growing rapidly. From these geotagged photos it is possible to mine the tourist attractions of a city, obtain the tourist routes popular with visitors, analyze tourists' travel preferences, and further provide users with personalized attraction or route recommendation services.

Current attraction recommendation methods based on mining geotagged photos usually derive user-user similarity directly from the number of times users visit attractions, and combine it with user-based collaborative filtering to recommend attractions. However, because of limits on travel time or budget, a user usually visits only a few attractions in a tourist city, so recommender systems built on the user-attraction matrix suffer from data sparsity during modeling.

To address this data sparsity problem, attraction recommendation methods based on dimensionality reduction have appeared, for example the static topic model, a popular method in text mining for obtaining the latent topics of documents. In this model, a user's travel history is treated as a document and tourist attractions are treated as words, and the model yields topic probability distributions for users and attractions. However, when a static topic model derives the topic probability distribution of a user's travel preferences from the travel history, it treats the user's travel history across all time periods (for example, years) as a single document and ignores how the user's travel preferences change across time periods.

The dynamic topic model is an extension of the static topic model that captures changes in document topics. It divides the document collection into sub-collections by time period and assumes that the topics of consecutive sub-collections depend on one another, so that topics evolve over time. The model yields the topic probability distributions of documents and words in different time periods, and these distributions reflect the topic evolution of documents and words. This makes it possible to address the inability of static topic models to capture changes in users' travel preferences in travel recommendation.

Summary of the invention

The problem to be solved by the invention is how to obtain information about changes in a user's travel preferences through a dynamic topic model, so as to provide the user with a fine-grained tourist attraction recommendation service.

A tourist attraction recommendation method based on a dynamic topic model and matrix factorization comprises:

Dataset acquisition stage: obtain the photo dataset D_photo from a social network and denoise it to obtain the travel photo dataset D_photo-travel; then obtain the check-in data D_check-in from a social network and extract the category information of the check-in places in D_check-in to obtain the check-in place category dataset D_category;

Travel preference acquisition stage: use a dynamic topic model to extract the implicit features of users and attractions from D_photo-travel; then derive the explicit features of users and attractions through statistical analysis of D_photo-travel and combine them with the implicit features to obtain user-user and attraction-attraction similarity information; finally, fuse the user-user and attraction-attraction similarity information with a matrix factorization method with a joint regularization term, complete the sparse user-attraction matrix Y, and obtain the matrix Y' containing the users' travel preference information;

Attraction recommendation stage: use the matrix Y' to score the attractions in the candidate set, and recommend the N top-scoring attractions to the user.

The specific steps of the dataset acquisition stage are:

(1-1) download geotagged photo data for the tourist city through the public API of a photo-sharing website to form the photo dataset D_photo;

(1-2) filter the non-travel photos out of D_photo using an entropy-based mobility method, removing the noise photos, to obtain the travel photo dataset D_photo-travel;

(1-3) download the users' check-in data D_check-in in the tourist city through the public API of a location-based social media website;

(1-4) extract the category information of the check-in places in D_check-in and compile it into the check-in place category dataset D_category.

Each photo record in D_photo contains: the photo's identifier, the time the photo was taken, the latitude and longitude of the place where it was taken, the text description the user added to the photo, and the uploader's identifier.

Each check-in record in D_check-in contains: the check-in's identifier, the check-in time, the latitude and longitude of the check-in place, the category information of the check-in place, and the identifier of the user who checked in.

The specific steps of the travel preference acquisition stage are:

(2-1) spatially cluster the photos in D_photo-travel with a density-based clustering method to obtain the tourist attraction set L;

(2-2) build the user-attraction matrix Y from the number of times each user visited each attraction, as counted from D_photo-travel, and the attractions in the set L;

(2-3) with documents representing users' travel histories and words representing tourist attractions, use the dynamic topic model to infer the latent topic probability distributions of users and attractions in different time periods, and combine the topic probability distributions of all time periods to obtain the full implicit features of each user and the full implicit features of each attraction;

(2-4) extract the explicit features of users and the explicit features of attractions from D_photo-travel and D_category, then combine them with the full implicit features of users and attractions to build the user profile F_user and the attraction profile F_location, and use the cosine function to construct the m×m user-user similarity matrix A and the n×n attraction-attraction similarity matrix B;

(2-5) from Y, A, and B, use the user-user and attraction-attraction similarity relations to build a matrix factorization model with joint regularization terms and factorize Y;

(2-6) from the result of the matrix factorization with joint regularization terms, complete the sparse user-attraction matrix Y to obtain the matrix Y' containing the users' travel preference information.

The specific steps of step (2-2) are:

(2-2-1) extract all visit records v = (l, u, t) from D_photo-travel, where v denotes that user u visited attraction l at time t;

(2-2-2) from all visit records v = (l, u, t), count the number of times each user visited each attraction and the total number m of users in D_photo-travel;

(2-2-3) from the per-user, per-attraction visit counts, the total number of users m, and the n attractions in the set L, build the user-attraction matrix Y, where Y ∈ R^{m×n} and the value at position (i, j) of Y is the number of times the i-th user visited the j-th attraction.

The specific steps of step (2-3) are:

(2-3-1) slice all visit records v = (l, u, t) in D_photo-travel into time periods of equal length, obtaining one sub-dataset per time period, M sub-datasets in total;

(2-3-2) use the sub-datasets as input to the dynamic topic model and obtain, through training, the topic probability distribution of each user and of each attraction in every time period T, where k is the number of topics;

(2-3-3) concatenate the user's topic probability distributions from all time periods in chronological order to form the user's full implicit features, and concatenate the attraction's topic probability distributions from all time periods in chronological order to form the attraction's full implicit features.
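Steps (2-3-1) and (2-3-3) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the numeric timestamps, and the placeholder topic distributions (which stand in for the output of a trained dynamic topic model) are all assumptions.

```python
from collections import defaultdict


def slice_visits(visits, slice_length):
    """Split visit records (attraction, user, timestamp) into equal-length time slices."""
    t0 = min(t for _, _, t in visits)
    slices = defaultdict(list)
    for l, u, t in visits:
        slices[int((t - t0) // slice_length)].append((l, u, t))
    n_slices = max(slices) + 1
    # Empty periods are kept as empty lists so slice order matches time order.
    return [slices.get(i, []) for i in range(n_slices)]


def concat_over_time(per_slice_topics):
    """Concatenate one entity's k-dimensional topic distributions in time order."""
    features = []
    for dist in per_slice_topics:
        features.extend(dist)
    return features


# Hypothetical visit records v = (l, u, t) with numeric timestamps.
visits = [("l1", "u1", 0), ("l2", "u1", 5), ("l1", "u2", 12), ("l3", "u1", 25)]
sub_datasets = slice_visits(visits, slice_length=10)

# Placeholder distributions (k = 2, three periods) for one user; a real run
# would take these from the dynamic topic model.
user_topics = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
implicit_features = concat_over_time(user_topics)
```

With M periods and k topics, each user and each attraction ends up with an implicit feature vector of length M·k.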

The specific steps of step (2-4) are:

(2-4-1) compute statistics over all visit records v = (l, u, t) in D_photo-travel and over the check-in place category dataset D_category to obtain the explicit features of users and the explicit features of attractions, where r is the total number of explicit user features and s is the total number of explicit attraction features;

(2-4-2) combine each user's explicit and implicit features to build the user profile, and combine each attraction's explicit and implicit features to build the attraction profile;

(2-4-3) obtain the user-user similarity matrix A with the cosine formula:

$$\mathrm{Sim}_{pq} = \frac{\sum_i f_{pi}\, f_{qi}}{\sqrt{\sum_i f_{pi}^2}\;\sqrt{\sum_i f_{qi}^2}}$$

where f_pi and f_qi are the i-th explicit features of users p and q, respectively;

The attraction-attraction similarity matrix B is obtained with the cosine formula in the same way:

$$\mathrm{Sim}_{xy} = \frac{\sum_i f_{xi}\, f_{yi}}{\sqrt{\sum_i f_{xi}^2}\;\sqrt{\sum_i f_{yi}^2}}$$

where f_xi and f_yi are the i-th explicit features of attractions x and y, respectively.
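The profile construction and the cosine similarity matrices can be sketched as below. The function names are hypothetical, and returning 0 for an all-zero profile is an assumption the text does not address.

```python
import math


def build_profile(explicit, implicit):
    """Join explicit and implicit features into one profile vector."""
    return list(explicit) + list(implicit)


def cosine_similarity(f_p, f_q):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(f_p, f_q))
    norm_p = math.sqrt(sum(a * a for a in f_p))
    norm_q = math.sqrt(sum(b * b for b in f_q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # assumption: an empty profile is similar to nothing
    return dot / (norm_p * norm_q)


def similarity_matrix(profiles):
    """Pairwise cosine similarities; gives A for users or B for attractions."""
    n = len(profiles)
    return [[cosine_similarity(profiles[p], profiles[q]) for q in range(n)]
            for p in range(n)]


profiles = [build_profile([1.0], [0.0]),
            build_profile([0.0], [1.0]),
            build_profile([1.0], [1.0])]
A = similarity_matrix(profiles)
```

The same `similarity_matrix` call applied to the attraction profiles yields B.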

In step (2-5), during the factorization of matrix Y, the similarity information in A and B is used as additional regularization terms that constrain the factorization.

In the objective function, R_ij is the value at position (i, j) of matrix Y; I_ij is an indicator of whether user i has visited attraction j, equal to 1 if so and 0 otherwise; Sim_ig is the similarity between user i and user g, and Sim_jq is the similarity between attraction j and attraction q; U_i, U_g, L_j, and L_q are the latent feature vectors of user i, user g, attraction j, and attraction q, respectively; the regularization uses the distance between the latent feature vectors of user i and user g; G(i) is the set of users similar to user i, and Q(j) is the set of attractions similar to attraction j; U, of dimension d×m, holds the users' latent feature vectors after the factorization of Y, and L, of dimension d×n, holds the attractions' latent feature vectors; m, n, and d are the number of users, the number of attractions, and the number of latent features, respectively.
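The objective function itself is not reproduced in this text. One common form that is consistent with the symbol definitions above is sketched here as an assumption: the factors of 1/2, the squared Euclidean norm as the distance, and the inner product U_i^T L_j as the estimate of R_ij are not confirmed by the source, and λ_1 and λ_2 are the weights set in step (a) of the procedure.

```latex
\min_{U,L}\;
\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n} I_{ij}\,\bigl(R_{ij} - U_i^{\top} L_j\bigr)^2
+ \frac{\lambda_1}{2}\sum_{i=1}^{m}\sum_{g \in G(i)} \mathrm{Sim}_{ig}\,\lVert U_i - U_g \rVert^2
+ \frac{\lambda_2}{2}\sum_{j=1}^{n}\sum_{q \in Q(j)} \mathrm{Sim}_{jq}\,\lVert L_j - L_q \rVert^2
```

Under this form, the second term pulls the latent vectors of similar users together and the third does the same for similar attractions, which is how A and B constrain the factorization of Y.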

The specific steps for factorizing Y with the matrix factorization model with joint regularization terms are as follows:

(a) randomly initialize the parameters U and L, and set the learning rate α, the error threshold δ, and the parameters λ_1 and λ_2;

(b) for each nonzero value R_ij in Y, compute the estimate X_ij of R_ij and the error between X_ij and the true value R_ij, then aggregate the total error θ over all nonzero values, where w is the number of nonzero values;

(c) check whether the total error θ is greater than the error threshold δ; if so, go to step (d); otherwise the iteration ends, the current U and L are taken as the optimal values, and the factorization of Y is complete;

(d) update U and L by gradient descent, then return to step (b).

In step (2-6), the missing values in Y are filled in from the factorization result; the filled values represent the users' travel preference information.
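Steps (a) to (d) and the completion in step (2-6) can be sketched in plain Python as below. Several choices are assumptions rather than the patent's formulas: the estimate X_ij is the inner product of U_i and L_j, θ is the root-mean-square error over the w nonzero entries, G(i) and Q(j) are taken to be all other users and attractions, the regularizer is the squared-distance form, and an iteration cap is added as a safeguard. U and L are stored row-wise (one d-dimensional list per user or attraction).

```python
import random


def factorize(Y, A, B, d=2, alpha=0.01, delta=0.05, lam1=0.01, lam2=0.01,
              max_iters=2000, seed=0):
    """Gradient-descent factorization of Y regularized by similarities A and B."""
    rng = random.Random(seed)
    m, n = len(Y), len(Y[0])
    # (a) random initialization of the latent vectors
    U = [[rng.uniform(0.0, 0.1) for _ in range(d)] for _ in range(m)]
    L = [[rng.uniform(0.0, 0.1) for _ in range(d)] for _ in range(n)]
    observed = [(i, j) for i in range(m) for j in range(n) if Y[i][j] != 0]
    w = len(observed)  # assumed to be at least 1

    def predict(i, j):
        return sum(U[i][k] * L[j][k] for k in range(d))

    theta = float("inf")
    for _ in range(max_iters):
        # (b) total error over the nonzero entries (RMSE is an assumption)
        theta = (sum((Y[i][j] - predict(i, j)) ** 2 for i, j in observed) / w) ** 0.5
        # (c) stop once the error is no longer above the threshold
        if theta <= delta:
            break
        # (d) gradient of the assumed objective, then one descent step
        gU = [[0.0] * d for _ in range(m)]
        gL = [[0.0] * d for _ in range(n)]
        for i, j in observed:
            e = Y[i][j] - predict(i, j)
            for k in range(d):
                gU[i][k] -= e * L[j][k]
                gL[j][k] -= e * U[i][k]
        for i in range(m):
            for g in range(m):
                if g != i:
                    s = lam1 * (A[i][g] + A[g][i])
                    for k in range(d):
                        gU[i][k] += s * (U[i][k] - U[g][k])
        for j in range(n):
            for q in range(n):
                if q != j:
                    s = lam2 * (B[j][q] + B[q][j])
                    for k in range(d):
                        gL[j][k] += s * (L[j][k] - L[q][k])
        for i in range(m):
            for k in range(d):
                U[i][k] -= alpha * gU[i][k]
        for j in range(n):
            for k in range(d):
                L[j][k] -= alpha * gL[j][k]
    return U, L, theta


def complete(Y, U, L):
    """Step (2-6): keep observed counts, fill zero entries with the estimates."""
    d = len(U[0])
    return [[Y[i][j] if Y[i][j] != 0
             else sum(U[i][k] * L[j][k] for k in range(d))
             for j in range(len(L))] for i in range(len(U))]


Y = [[5, 3, 0], [4, 0, 1], [0, 2, 5]]
A = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
B = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.4], [0.1, 0.4, 1.0]]
U, L, theta = factorize(Y, A, B)
Y_completed = complete(Y, U, L)
```

Accumulating the full gradient before updating keeps U and L in step with each other within an iteration; a stochastic per-entry update would also fit the description in steps (b) to (d).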

The specific steps of the attraction recommendation stage are:

(3-1) from the information entered by the user, obtain the set of attractions in the tourist city for that user; these attractions form the candidate set for recommendation;

(3-2) from the matrix Y', obtain the user's scores for the attractions in the candidate set, which identify the user's preferred attractions;

(3-3) sort the scores of the preferred attractions in descending order and recommend the N top-scoring attractions to the user.
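The three recommendation steps amount to a top-N selection over one row of Y'. A minimal sketch, with a hypothetical function name and made-up scores:

```python
def recommend_top_n(user_scores, candidates, n):
    """Rank candidate attraction indices by the user's row of Y' and keep the top n."""
    scored = [(j, user_scores[j]) for j in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending by score
    return [j for j, _ in scored[:n]]


# One user's row of Y' (illustrative values) and a candidate set of attraction indices.
row = [0.2, 4.1, 3.3, 0.9]
top = recommend_top_n(row, candidates=[0, 2, 3], n=2)
```

Attraction 1 has the highest score here but is not in the candidate set, so it is never recommended.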

Because traditional static topic models cannot capture the topic evolution of users' travel preferences, the invention proposes a tourist attraction recommendation method based on a dynamic topic model and matrix factorization. Compared with existing methods, its advantages are:

(1) The dynamic topic model yields the topic change information (implicit feature information) of the user's travel preferences.

(2) Analysis of the dataset yields a large amount of explicit feature information about users and attractions, which together with the implicit feature information describes the characteristics of users and attractions comprehensively.

(3) The matrix factorization method with joint regularization terms fuses the user-user and attraction-attraction similarity information. It constrains the latent feature vectors of users and attractions simultaneously during the factorization of the user-attraction matrix, so that the matrix can be completed accurately.

Description of the drawings

Fig. 1 is a flowchart of the tourist attraction recommendation method based on a dynamic topic model and matrix factorization;

Fig. 2 is a flowchart of the dataset acquisition stage;

Fig. 3 is a flowchart of the travel preference acquisition stage;

Fig. 4 is a schematic diagram of document generation under the dynamic topic model;

Fig. 5 is a flowchart of the attraction recommendation stage.

Detailed description

To describe the invention more specifically, its technical solution is described in detail below with reference to the drawings and specific embodiments.

As shown in Fig. 1, the tourist attraction recommendation method based on a dynamic topic model and matrix factorization comprises three stages: dataset acquisition, travel preference acquisition, and attraction recommendation.

Dataset acquisition stage

The data processing flow is shown in Fig. 2. The steps are as follows:

S1-1: download the photo dataset D_photo, containing the geotagged photos of the tourist city, through the public API of a photo-sharing website.

The specific steps for obtaining the photo data are:

S1-1-1: through the public API provided by a photo-sharing website (for example Flickr), download, city by city, the photos taken in each city together with their metadata. A geotagged photo p can be represented as p = (p_id, p_t, p_g, p_x, p_u), where p_id, p_t, p_g, p_x, and p_u are, respectively, the photo's unique identifier, the time the photo was taken, the photo's latitude and longitude, the text description the user added to the photo, and the uploader's unique identifier;

S1-1-2: group the collected photos by uploader identifier to obtain all photos taken by each user, H_i = {p_1, p_2, …, p_e}, where e is the number of photos taken by user i.
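The photo tuple p and the per-user collections H_i map naturally onto a small record type and a grouping step. The sketch below assumes the field names and sample values shown; it does not call any photo-sharing API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    pid: str         # p_id: unique photo identifier
    taken_at: float  # p_t: capture time (epoch seconds, an assumption)
    geo: tuple       # p_g: (latitude, longitude)
    text: str        # p_x: user-supplied description
    uploader: str    # p_u: unique uploader identifier


def group_by_uploader(photos):
    """Build H_i: every photo taken by each user, keyed by uploader identifier."""
    collections = defaultdict(list)
    for p in photos:
        collections[p.uploader].append(p)
    return dict(collections)


photos = [
    Photo("p1", 1.0, (30.25, 120.15), "West Lake", "u1"),
    Photo("p2", 2.0, (30.24, 120.14), "pagoda", "u1"),
    Photo("p3", 3.0, (30.26, 120.16), "temple", "u2"),
]
H = group_by_uploader(photos)
```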

S1-2: filter the non-travel photos out of D_photo using an entropy-based mobility method, removing the noise photos, to obtain the travel photo dataset D_photo-travel.

The specific steps for removing non-travel photos are as follows:

S1-2-1: manually label a small number of travel photos and non-travel photos in the users' photo collections according to photo content and empirical knowledge;

S1-2-2: divide the city into x'×y' grid cells (each cell denoted (x_i, y_j), i = 1, 2, …, x'; j = 1, 2, …, y'), count the number of the user's photos in each cell, compute the proportion a of the user's photos that falls in each cell, and compute the mobility entropy H_mobility of the user's photo collection from these proportions according to the principle of information entropy.

Using the criterion H_mobility > ε, vary ε from 0 to 1 in steps of 0.1, measure the classification accuracy on the manually labeled photo set for each value, and select an ε with high accuracy to classify the photos in the whole dataset and remove the non-travel photos.
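The entropy formula is not reproduced in this text, so the sketch below assumes the Shannon entropy of the per-cell proportions a, divided by the logarithm of the number of cells so that the value lies in [0, 1] and is comparable with ε. It also assumes that collections with H_mobility > ε are the ones kept as travel photos. Function names and the bounding-box parameter are hypothetical.

```python
import math
from collections import Counter


def mobility_entropy(points, bounds, nx, ny):
    """Normalized Shannon entropy of a user's photos over an nx-by-ny city grid."""
    min_lat, min_lon, max_lat, max_lon = bounds
    counts = Counter()
    for lat, lon in points:
        # Map each coordinate to a cell index, clamping the upper edge.
        i = min(int((lat - min_lat) / (max_lat - min_lat) * nx), nx - 1)
        j = min(int((lon - min_lon) / (max_lon - min_lon) * ny), ny - 1)
        counts[(i, j)] += 1
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(nx * ny)  # normalization is an assumption; needs nx*ny > 1


def best_epsilon(entropies, is_travel):
    """Sweep eps over 0.0, 0.1, ..., 1.0 and return the most accurate threshold."""
    def accuracy(eps):
        hits = sum((h > eps) == label for h, label in zip(entropies, is_travel))
        return hits / len(entropies)
    return max((k / 10 for k in range(11)), key=accuracy)


bounds = (0.0, 0.0, 2.0, 2.0)
h_single = mobility_entropy([(0.5, 0.5)] * 4, bounds, 2, 2)
h_spread = mobility_entropy([(0.5, 0.5), (0.5, 1.5), (1.5, 0.5), (1.5, 1.5)], bounds, 2, 2)
eps = best_epsilon([0.05, 0.1, 0.6, 0.9], [False, False, True, True])
```

When several thresholds tie on accuracy, `max` returns the smallest of them, which is one reasonable reading of "select an ε with high accuracy".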

S1-3: download the users' check-in data D_check-in in the tourist city through the public API of a location-based social media website.

The specific steps for obtaining the check-in data are:

S1-3-1: through the public API provided by a location-based social media website (for example Sina Weibo), download, city by city, the users' check-in data. A check-in q can be represented as q = (q_id, q_t, q_g, q_c, q_u), where q_id, q_t, q_g, q_c, and q_u are, respectively, the check-in's unique identifier, the check-in time, the latitude and longitude of the point of interest, the category information of the point of interest, and the unique identifier of the user who checked in;

S1-3-2: extract all check-ins of each user by the user identifier in the check-in data, Q_i = {q_1, q_2, …, q_o}, where o is the number of check-ins made by user i in the city.

S1-4: extract the category information of the check-in places in D_check-in and compile it into the check-in place category dataset D_category.

Check-in records contain category information that users attach when visiting a point of interest. Aggregating this information yields all the categories of each point of interest, which can be represented as C_POI = (c_1, c_2, …, c_z), where z is the number of categories the point of interest has.

Travel preference acquisition stage

The flow of the travel preference acquisition stage is shown in Fig. 3. The steps are as follows:

S2-1: spatially cluster the geotagged photos with a density-based clustering method to obtain the set L of tourist attractions in the tourist city.

Users usually take photos at places that interest them; if many users take photos at the same place, that place can be regarded as a tourist attraction. A density-based clustering method (for example P-DBSCAN) is applied to the large set of geotagged photos; each resulting cluster represents one tourist attraction, and the cluster center gives its location. This process yields the attraction set L = {l_1, l_2, …, l_n}, where each attraction l = {P_l, g_l}, P_l is the set of all photos belonging to the attraction, and g_l is the attraction's latitude and longitude.
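The clustering step can be illustrated with plain DBSCAN, used here as a stand-in for P-DBSCAN, whose photo-specific density definition is not reproduced. The sketch treats coordinates as planar points, which is an assumption that only holds approximately for latitude and longitude within one city; `eps` and `min_pts` are illustrative values.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point, with -1 marking noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point, so expand through it
                queue.extend(nbrs)
    return labels


def cluster_centres(points, labels):
    """Mean coordinate of each cluster, used as the attraction location g_l."""
    sums = {}
    for (x, y), c in zip(points, labels):
        if c == -1:
            continue
        sx, sy, n = sums.get(c, (0.0, 0.0, 0))
        sums[c] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
centres = cluster_centres(points, labels)
```

Each label plays the role of an attraction l, the points sharing a label form P_l, and the centre gives g_l; the isolated photo at (50, 50) is discarded as noise.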

S2-2: Based on the tourist attractions mined from the photos and the users' visit histories at those attractions, count how many times each user visited each attraction and construct the user-attraction matrix Y.

The specific steps for building the user-attraction matrix are:

S2-2-1: Extract each user's historical attraction visits from the photo set as tuples $v = (l, u, t)$, where $l$, $u$, and $t$ are the tourist attraction visited, the user's identifier, and the time of the visit, respectively.

S2-2-2: Count the number of times each user visited each attraction. The resulting user-attraction matrix is $Y \in \mathbb{R}^{m \times n}$, where $m$ and $n$ are the numbers of users and attractions, and each entry is the number of times the user visited the attraction.
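Steps S2-2-1 and S2-2-2 can be sketched as a simple counting pass over the visit tuples. The sketch assumes that every tuple $v = (l, u, t)$ counts as one visit; how photos are grouped into visits is not specified here, and the function name and sample data are hypothetical.

```python
from collections import Counter

def build_user_attraction_matrix(visits):
    """Build the m x n visit-count matrix Y from visit tuples (l, u, t).

    visits: list of (attraction, user, time) tuples.
    Returns (Y, users, attractions); Y[i][j] is the number of visits of
    users[i] to attractions[j].
    """
    users = sorted({u for _, u, _ in visits})
    attractions = sorted({l for l, _, _ in visits})
    counts = Counter((u, l) for l, u, _ in visits)
    Y = [[counts[(u, l)] for l in attractions] for u in users]
    return Y, users, attractions

# Made-up visit history: u1 visits l1 twice and l2 once, u2 visits l2 once.
visits = [("l1", "u1", 2015), ("l1", "u1", 2016), ("l2", "u1", 2016), ("l2", "u2", 2016)]
Y, users, attractions = build_user_attraction_matrix(visits)
```

Zero entries mark user-attraction pairs with no recorded visit; these are the entries that the later matrix completion step estimates.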

S2-3: Treat each document as a user's travel history and each word as a tourist attraction, use a dynamic topic model to infer the latent topic probability distributions of users and attractions in different time periods, and combine the distributions from all time periods to obtain the full implicit features of the users and of the attractions.

Dynamic topic models have been used successfully by researchers to capture how the latent topics of documents change over time. The generative process for a document is shown in Figure 4. In the figure, the document collection is divided by time into several sub-collections. The topics of documents and words in successive time periods are dependent: the topic distribution of a later period evolves from that of the preceding period. α is the parameter controlling the topic distribution of a sub-collection, θ is the parameter controlling the topic distribution of a single document, β is the parameter of the probability distribution of topics over words, z denotes a topic, k the number of topics, w a word, N a document, and A a sub-collection. With this model, the topic probability distributions of documents and words in each time period can be obtained.

In travel recommendation based on mining geotagged photos, a user's travel history can be viewed as a mixture of topics, and each topic is a probability distribution over attractions. In short, when the dynamic topic model is applied, documents represent user travel histories and words represent tourist attractions.

The specific steps for obtaining the implicit features of users and attractions are:

S2-3-1: Divide the users' travel history into sub-datasets by time slice (for example, by year).

S2-3-2: Use the sub-datasets as input to the dynamic topic model and, through training, obtain the topic probability distributions of users and attractions in each time period. For the T-th time period this gives one distribution over the k topics for each user and one for each attraction, where k is the number of topics.

S2-3-3: Concatenate the user's topic probability distributions from all time periods in chronological order to form the user's full implicit features, and likewise concatenate the attraction's topic probability distributions from all time periods to form the attraction's full implicit features.
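The time slicing in S2-3-1 and the concatenation in S2-3-3 can be sketched as below. Training the dynamic topic model itself (S2-3-2) is not shown: the per-period topic distributions in the example are made-up placeholder numbers standing in for the model's output, and the function names are hypothetical.

```python
from collections import defaultdict

def slice_by_year(visits):
    """Group visit tuples (l, u, year) into per-year sub-datasets, in time order."""
    slices = defaultdict(list)
    for l, u, year in visits:
        slices[year].append((l, u))
    return dict(sorted(slices.items()))

def concat_topic_features(per_period_dists):
    """Concatenate per-period topic distributions (each of length k) chronologically."""
    features = []
    for dist in per_period_dists:
        features.extend(dist)
    return features

visits = [("l1", "u1", 2016), ("l2", "u1", 2015), ("l2", "u2", 2016)]
slices = slice_by_year(visits)

# Placeholder output of a dynamic topic model for one user, k = 3 topics,
# two time periods (illustrative numbers only).
user_topic_dists = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
implicit_features = concat_topic_features(user_topic_dists)
```

With M time periods and k topics the concatenated implicit feature vector has M·k entries, so a shift in a user's interests between periods remains visible in the features instead of being averaged away.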

S2-4: Extract the explicit features of users and of attractions from the users' travel histories, combine them with the full implicit features of users and attractions to build the user profile $F_{user}$ and the attraction profile $F_{location}$, and use the cosine function to construct the m×m user-user similarity matrix A and the n×n attraction-attraction similarity matrix B.

The specific steps for building A and B are:

S2-4-1: Compute statistics over the users' attraction visit histories (for example, the total number of attractions a user has visited, the total number of users who have visited an attraction, and attraction category information) to obtain a large number of explicit features that describe users and attractions. Here r and s denote the total numbers of explicit user features and explicit attraction features, respectively; the individual features are listed in Table 1 and Table 2. Some of these features have to be obtained through third-party web services, as described below:

Gender and age information: This information about tourists is obtained by analyzing photo content with a third-party web service, for example www.alchemyapi.com. When the service receives an uploaded photo, it calls its API function (the Alchemy Vision Face Detection and Recognition API) to analyze the photo and returns to the uploader the gender and age of the faces that appear in it. Aggregating the gender and age of the faces in all photos taken at an attraction gives the gender and age distribution for that attraction. Similarly, aggregating the gender and age of the faces in all photos taken by a user gives the gender and age distribution of the faces in that user's photos.

Weather information: The weather at the time a photo was taken can be obtained from a third-party weather web service, using the photo's latitude and longitude and its capture time. For example, the API of wunderground.com provides the weather at each location at different points in time. Aggregating the photos taken by users under different weather conditions shows how popular an attraction is with tourists under each weather condition.

S2-4-2: Combine the explicit and implicit features of users and of attractions to build the user profile $F_{user}$ and the attraction profile $F_{location}$.

S2-4-3: Use the cosine formula to obtain the user-user similarity matrix A (m×m):

$$\mathrm{Sim}(u_p, u_q) = \frac{\sum_{i=1}^{r} f_{pi}\, f_{qi}}{\sqrt{\sum_{i=1}^{r} f_{pi}^{2}}\;\sqrt{\sum_{i=1}^{r} f_{qi}^{2}}}$$

where $f_{pi}$ and $f_{qi}$ are the i-th explicit features of users p and q, respectively.

The attraction-attraction similarity matrix B (n×n) is obtained with the same cosine formula, which in this case is:

$$\mathrm{Sim}(l_x, l_y) = \frac{\sum_{i=1}^{s} f'_{xi}\, f'_{yi}}{\sqrt{\sum_{i=1}^{s} f'^{\,2}_{xi}}\;\sqrt{\sum_{i=1}^{s} f'^{\,2}_{yi}}}$$

where $f'_{xi}$ and $f'_{yi}$ are the i-th explicit features of attractions x and y, respectively.
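The cosine-based similarity matrices can be sketched as follows. The same helper serves for both A (rows are user feature vectors) and B (rows are attraction feature vectors). Returning 0 for an all-zero vector is an assumption made to avoid division by zero, and the example vectors are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(profiles):
    """Pairwise cosine similarities; entry [p][q] compares profiles p and q."""
    size = len(profiles)
    return [[cosine(profiles[p], profiles[q]) for q in range(size)] for p in range(size)]

# Made-up feature vectors for three users.
user_profiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = similarity_matrix(user_profiles)
```

In practice the features would usually be scaled to comparable ranges first, since cosine similarity is dominated by the largest-magnitude features; that preprocessing is not described in the text and is mentioned here only as a practical consideration.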

S2-5: Using the constructed Y, A, and B, build a matrix factorization model with joint regularization terms based on the user-user and attraction-attraction similarity relations, and factorize Y.

During the factorization of Y, the similarity information in A and B serves as additional regularization terms that constrain the factorization.

In the objective function, $\mathrm{Sim}_{ig}$ denotes the similarity between users $u_i$ and $u_g$, and $\mathrm{Sim}_{jq}$ denotes the similarity between attractions $l_j$ and $l_q$. $U$ ($d \times m$) and $L$ ($d \times n$) are the latent vector representations of users and attractions obtained by factorizing Y, where $m$, $n$, and $d$ are the number of users, the number of attractions, and the dimension of the latent feature vectors, respectively.
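The objective function itself is not reproduced in this text. One plausible form, consistent with the symbols defined here and in claim 6 (the indicator $I_{ij}$ and the neighbour sets $G(i)$ and $Q(j)$ come from that claim), is sketched below. The constant factors and the absence of further norm penalties on $U$ and $L$ are assumptions, so this should be read as an illustration of a jointly regularized factorization rather than the patent's exact formula.

```latex
\min_{U,\,L}\;
\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n} I_{ij}\,\bigl(R_{ij} - U_i^{\top} L_j\bigr)^{2}
\;+\; \frac{\lambda_1}{2}\sum_{i=1}^{m}\sum_{g \in G(i)} \mathrm{Sim}_{ig}\,\lVert U_i - U_g \rVert^{2}
\;+\; \frac{\lambda_2}{2}\sum_{j=1}^{n}\sum_{q \in Q(j)} \mathrm{Sim}_{jq}\,\lVert L_j - L_q \rVert^{2}
```

The first term fits the observed visit counts, while the two similarity-weighted terms pull the latent vectors of similar users, and of similar attractions, towards each other; $\lambda_1$ and $\lambda_2$ control how strongly.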

The specific steps for factorizing Y with the jointly regularized matrix factorization model are:

(a) Randomly initialize the parameters U and L, and set the learning rate α, the error threshold δ, and the parameters $\lambda_1$ and $\lambda_2$;

(b) For each non-zero value $R_{ij}$ in Y, compute its estimate $X_{ij} = U_i^{\top} L_j$ and the error between $X_{ij}$ and the true value $R_{ij}$, then aggregate the errors over all non-zero values into the total error θ, where $w$ is the number of non-zero values;

(c) Check whether the total error θ exceeds the error threshold δ. If so, go to step (d); otherwise the iteration ends, the current U and L are taken as the optimal values, and the factorization of Y is complete;

(d) Update the values of U and L by gradient descent, then return to step (b).
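Steps (a) to (d) can be sketched in plain Python as below. Several choices are assumptions rather than details taken from the text: the objective is taken to be half the squared error on observed entries plus similarity-weighted squared distances between latent vectors (weighted by $\lambda_1/2$ and $\lambda_2/2$), every other user or attraction is treated as a neighbour, A and B are assumed symmetric, the total error θ is computed as the mean absolute error over the w non-zero entries, and `max_iter` is an added safety cap. Latent vectors are stored per user and per attraction, i.e. as the transposes of the d×m and d×n matrices.

```python
import random

def factorize(Y, A, B, d=2, alpha=0.01, delta=0.05, lam1=0.01, lam2=0.01,
              max_iter=5000, seed=0):
    """Gradient-descent factorization of Y with similarity regularization.

    U[i] and L[j] are the d-dimensional latent vectors of user i and attraction j.
    Returns (U, L, theta), theta being the last total error computed in step (b).
    """
    m, n = len(Y), len(Y[0])
    rng = random.Random(seed)
    # (a) random initialization of U and L
    U = [[rng.uniform(0.0, 0.5) for _ in range(d)] for _ in range(m)]
    L = [[rng.uniform(0.0, 0.5) for _ in range(d)] for _ in range(n)]
    observed = [(i, j) for i in range(m) for j in range(n) if Y[i][j] != 0]
    theta = float("inf")
    for _ in range(max_iter):
        # (b) estimate X_ij = U_i . L_j and its error for every non-zero R_ij
        err = {}
        for i, j in observed:
            x_ij = sum(U[i][k] * L[j][k] for k in range(d))
            err[(i, j)] = Y[i][j] - x_ij
        theta = sum(abs(e) for e in err.values()) / len(observed)  # assumed: MAE
        # (c) stop once the total error no longer exceeds the threshold
        if theta <= delta:
            break
        # (d) gradient of the assumed objective, then one descent step
        gU = [[0.0] * d for _ in range(m)]
        gL = [[0.0] * d for _ in range(n)]
        for (i, j), e in err.items():
            for k in range(d):
                gU[i][k] -= e * L[j][k]
                gL[j][k] -= e * U[i][k]
        # Factor 2: with symmetric similarities each pair occurs twice in the sum.
        for i in range(m):
            for g in range(m):
                for k in range(d):
                    gU[i][k] += 2 * lam1 * A[i][g] * (U[i][k] - U[g][k])
        for j in range(n):
            for q in range(n):
                for k in range(d):
                    gL[j][k] += 2 * lam2 * B[j][q] * (L[j][k] - L[q][k])
        U = [[U[i][k] - alpha * gU[i][k] for k in range(d)] for i in range(m)]
        L = [[L[j][k] - alpha * gL[j][k] for k in range(d)] for j in range(n)]
    return U, L, theta

# Made-up data: 3 users, 4 attractions, 0 = no recorded visit.
Y = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5]]
A = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]]
B = [[1.0, 0.5, 0.4, 0.1], [0.5, 1.0, 0.6, 0.2], [0.4, 0.6, 1.0, 0.3], [0.1, 0.2, 0.3, 1.0]]
U, L, theta = factorize(Y, A, B)
```

Note that the third attraction has no observed visits at all, so its latent vector is shaped only by the similarity term, which is the situation the joint regularization is meant to help with.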

S2-6: Using the result of the jointly regularized matrix factorization, complete the sparse matrix Y to obtain the users' travel preferences.

The missing values in Y are filled in with the estimates $X_{ij} = U_i^{\top} L_j$; the filled-in values represent the users' travel preferences.
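Completing Y can be sketched as filling each zero entry with the inner product of the corresponding latent vectors. The sketch assumes a zero means "missing" and that observed counts are kept unchanged; the latent vectors in the example are made-up values rather than the result of an actual factorization.

```python
def complete_matrix(Y, U, L):
    """Fill missing (zero) entries of Y with U_i . L_j; keep observed entries.

    U[i] and L[j] are the latent vectors of user i and attraction j.
    """
    d = len(U[0])
    return [[Y[i][j] if Y[i][j] != 0
             else sum(U[i][k] * L[j][k] for k in range(d))
             for j in range(len(L))]
            for i in range(len(U))]

Y = [[2, 0], [0, 1]]
U = [[1.0, 1.0], [0.5, 2.0]]   # illustrative latent user vectors
L = [[1.0, 1.0], [2.0, 0.0]]   # illustrative latent attraction vectors
Y_completed = complete_matrix(Y, U, L)
```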

Table 1: Explicit user features

Table 2: Explicit attraction features

Tourist attraction recommendation stage

The flow of recommending tourist attractions is shown in Figure 5 and mainly comprises the following steps:

S3-1: From the user's input, obtain the set of attractions in the user's destination tourist city; these attractions form the candidate set for recommendation.

Given the user's ID and the destination tourist city c, look up the completed user-attraction matrix to obtain the attraction set L′ of city c.

S3-2: Using the result of step S3-1 and the completed user-attraction matrix, obtain the user's scores for these attractions, that is, the user's preference for them.

Each value in the completed user-attraction matrix is the user's preference score for an attraction, so the matrix together with L′ gives the user's preferences for the attractions in the tourist city.

S3-3: Rank the attractions by the scores from S3-2 and recommend the top N tourist attractions to the user.

The attractions are sorted in descending order of the user's scores for them in the tourist city, and the first N are recommended to the user.
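Steps S3-1 to S3-3 amount to filtering one row of the completed matrix by the candidate set and taking the N highest scores. A minimal sketch, with hypothetical names and made-up scores:

```python
def recommend_top_n(user_scores, attraction_ids, candidates, n=3):
    """Return the n candidate attractions with the highest completed scores.

    user_scores: the user's row of the completed user-attraction matrix.
    attraction_ids: attraction identifier for each column of that row.
    candidates: set of attractions in the destination city (L').
    """
    scored = [(score, attraction)
              for score, attraction in zip(user_scores, attraction_ids)
              if attraction in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # descending by score
    return [attraction for _, attraction in scored[:n]]

attraction_ids = ["a", "b", "c", "d"]
user_scores = [0.2, 3.5, 1.1, 2.0]      # illustrative completed scores
candidates = {"b", "c", "d"}            # attractions located in city c
top = recommend_top_n(user_scores, attraction_ids, candidates, n=2)
```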

The specific embodiments described above explain the technical solution and beneficial effects of the present invention in detail. It should be understood that the above are only the most preferred embodiments of the present invention and are not intended to limit it; any modification, supplement, or equivalent replacement made within the scope of the principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A tourist attraction recommendation method based on a dynamic topic model and matrix factorization, comprising the following stages:
a dataset acquisition stage: obtaining a photo dataset $D_{\text{photo}}$ from a social network and denoising it to obtain a tourist photo dataset $D_{\text{photo-travel}}$; then obtaining check-in data $D_{\text{check-in}}$ from a social network and extracting the category information in the check-in data $D_{\text{check-in}}$ to obtain a category dataset $D_{\text{category}}$ of the check-in places;
a user travel preference stage, comprising the following specific steps:
(2-1) spatially clustering the photos in the tourist photo dataset $D_{\text{photo-travel}}$ with a density-based clustering method to obtain a tourist attraction set L;
(2-2) counting, from the tourist photo dataset $D_{\text{photo-travel}}$ and the attractions in the attraction set L, the number of times each user visited each attraction, and constructing a user-attraction matrix Y;
(2-3) treating documents as user travel histories and words as tourist attractions, inferring the latent topic probability distributions of users and attractions in different time periods with a dynamic topic model, and combining the topic probability distributions of all time periods to obtain the full implicit features of the users and the full implicit features of the attractions;
(2-4) extracting explicit user features and explicit attraction features from the tourist photo dataset $D_{\text{photo-travel}}$ and the category dataset $D_{\text{category}}$ of the check-in places, then combining them with the full implicit features of the users and of the attractions to build a user profile $F_{user}$ and an attraction profile $F_{location}$, and constructing an m×m user-user similarity matrix A and an n×n attraction-attraction similarity matrix B with the cosine function;
(2-5) building, from Y, A, and B, a matrix factorization model with joint regularization terms using the user-user and attraction-attraction similarity relations, and factorizing Y;
(2-6) completing the sparse user-attraction matrix Y according to the result of the jointly regularized matrix factorization to obtain a matrix Y′ containing the users' travel preference information;
a tourist attraction recommendation stage: scoring the attractions in the candidate set using the matrix Y′ and recommending the N highest-scoring tourist attractions to the user.
2. The tourist attraction recommendation method based on a dynamic topic model and matrix factorization of claim 1, wherein the specific steps of the dataset acquisition stage are:
(1-1) downloading geotagged photo data for a tourist city using the public API of a photo sharing website to form the photo dataset $D_{\text{photo}}$;
(1-2) filtering the non-tourist photos out of the photo dataset $D_{\text{photo}}$ with an entropy-based mobility method, thereby removing the noise photos and obtaining the tourist photo dataset $D_{\text{photo-travel}}$;
(1-3) downloading the user check-in data $D_{\text{check-in}}$ for the tourist city using the public API of a location-based social media website;
(1-4) extracting the category information in the check-in data $D_{\text{check-in}}$ and aggregating it to form the category dataset $D_{\text{category}}$ of the check-in places.
3. The tourist attraction recommendation method based on a dynamic topic model and matrix factorization of claim 1, wherein the specific steps of step (2-2) are:
(2-2-1) extracting the users' historical visit information $v = (l, u, t)$ from the tourist photo dataset $D_{\text{photo-travel}}$, where v indicates that user u visited tourist attraction l at time t;
(2-2-2) counting, from all the visit information $v = (l, u, t)$, the number of times each user visited each attraction and the total number m of users in the tourist photo dataset $D_{\text{photo-travel}}$;
(2-2-3) constructing the user-attraction matrix Y from the number of times each user visited each attraction, the total number m of users, and the total number n of attractions in the attraction set L, where $Y \in \mathbb{R}^{m \times n}$ and the value at position (i, j) of Y is the number of times the i-th user visited the j-th attraction.
4. The tourist attraction recommendation method based on a dynamic topic model and matrix factorization of claim 1, wherein step (2-3) specifically comprises:
(2-3-1) slicing all the visit information $v = (l, u, t)$ in the tourist photo dataset $D_{\text{photo-travel}}$ into time slices of equal length to obtain M sub-datasets, one for each time period;
(2-3-2) using the sub-datasets as input to the dynamic topic model and obtaining, through training, the topic probability distributions of the users and the attractions in the different time periods,
wherein, for the T-th time period, each user and each attraction has a topic probability distribution over the k topics, k being the number of topics, and the k-th component of each distribution is the probability of the k-th topic for that user or attraction in the T-th time period;
(2-3-3) concatenating the topic probability distributions of a user in all time periods in chronological order to form the full implicit features of the user, and concatenating the topic probability distributions of an attraction in all time periods in chronological order to form the full implicit features of the attraction.
5. The method of claim 4, wherein step (2-4) comprises the following specific steps:
(2-4-1) computing statistics over the visit information $v = (l, u, t)$ in the tourist photo dataset $D_{\text{photo-travel}}$ and over the category dataset $D_{\text{category}}$ of the check-in places to obtain the explicit user features and the explicit attraction features, where r is the total number of explicit user features and s is the total number of explicit attraction features;
(2-4-2) combining the explicit and implicit features of the user to build the user profile $F_{user}$, and combining the explicit and implicit features of the attraction to build the attraction profile $F_{location}$;
(2-4-3) obtaining the user-user similarity matrix A with the cosine formula:
$$\mathrm{Sim}(u_p, u_q) = \frac{\sum_{i=1}^{r} f_{pi}\, f_{qi}}{\sqrt{\sum_{i=1}^{r} f_{pi}^{2}}\;\sqrt{\sum_{i=1}^{r} f_{qi}^{2}}}$$
where $u_p$ and $u_q$ denote user p and user q, $f_{pi}$ and $f_{qi}$ are the i-th explicit features of $u_p$ and $u_q$, respectively, and r is the total number of explicit features;
the attraction-attraction similarity matrix B is likewise obtained with the cosine formula, which in this case is:
$$\mathrm{Sim}(l_x, l_y) = \frac{\sum_{i=1}^{s} f_{xi}\, f_{yi}}{\sqrt{\sum_{i=1}^{s} f_{xi}^{2}}\;\sqrt{\sum_{i=1}^{s} f_{yi}^{2}}}$$
where $l_x$ and $l_y$ denote attraction x and attraction y, $f_{xi}$ and $f_{yi}$ are the i-th explicit features of $l_x$ and $l_y$, respectively, and s is the number of explicit features.
6. The method of claim 5, wherein in step (2-5), during the factorization of the matrix Y, the similarity information of A and B is used as additional regularization terms to constrain the factorization of Y;
wherein, in the objective function, $R_{ij}$ is the value at position (i, j) of the matrix Y; $I_{ij}$ is an indicator of whether user i has visited attraction j, equal to 1 if user i has visited attraction j and 0 otherwise; $\mathrm{Sim}_{ig}$ denotes the similarity between user i and user g, and $\mathrm{Sim}_{jq}$ denotes the similarity between attraction j and attraction q; $U_i$ is the latent feature vector of user i, $U_g$ is the latent feature vector of user g, $L_j$ is the latent feature vector of attraction j, and $L_q$ is the latent feature vector of attraction q; the norm of $U_i - U_g$ represents the distance between the latent feature vectors of user i and user g; G(i) denotes the group of users similar to user i, and Q(j) denotes the group of attractions similar to attraction j; U holds the latent feature vectors of the users after factorizing Y and has dimension d×m; L holds the latent feature vectors of the attractions after factorizing Y and has dimension d×n; and m, n, and d are the number of users, the number of attractions, and the dimension of the latent feature vectors, respectively;
the specific steps of factorizing Y with the jointly regularized matrix factorization model are as follows:
(a) randomly initializing U and L, and setting the learning rate α, the error threshold δ, and the parameters $\lambda_1$ and $\lambda_2$;
(b) for each non-zero value $R_{ij}$ in Y, computing its estimate $X_{ij} = U_i^{\top} L_j$ and the error between $X_{ij}$ and the true value $R_{ij}$, then aggregating the errors over all non-zero values into the total error θ, where w is the number of non-zero values;
(c) checking whether the total error θ exceeds the error threshold δ; if so, executing step (d); otherwise ending the iteration, at which point U and L are the optimal values and the factorization of the matrix Y is complete;
(d) updating the values of U and L by gradient descent, then returning to step (b).
7. The method of claim 1, wherein the tourist attraction recommendation stage comprises the following specific steps:
(3-1) obtaining, from the user's input information, the set of attractions in the user's tourist city, these attractions serving as the candidate set for recommendation;
(3-2) obtaining, from the matrix Y′, the user's scores for the attractions in the candidate set, thereby obtaining the user's preferred attractions;
(3-3) sorting the preferred attractions in descending order of score and recommending the N top-ranked tourist attractions to the user.
CN201710237404.6A 2017-04-12 2017-04-12 A Tourist Attraction Recommendation Method Based on Dynamic Topic Model and Matrix Factorization Expired - Fee Related CN107133277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710237404.6A CN107133277B (en) 2017-04-12 2017-04-12 A Tourist Attraction Recommendation Method Based on Dynamic Topic Model and Matrix Factorization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710237404.6A CN107133277B (en) 2017-04-12 2017-04-12 A Tourist Attraction Recommendation Method Based on Dynamic Topic Model and Matrix Factorization

Publications (2)

Publication Number Publication Date
CN107133277A CN107133277A (en) 2017-09-05
CN107133277B true CN107133277B (en) 2019-09-06

Family

ID=59716372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710237404.6A Expired - Fee Related CN107133277B (en) 2017-04-12 2017-04-12 A Tourist Attraction Recommendation Method Based on Dynamic Topic Model and Matrix Factorization

Country Status (1)

Country Link
CN (1) CN107133277B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019091569A1 (en) * 2017-11-10 2019-05-16 Bayerische Motoren Werke Aktiengesellschaft Method and apparatus for smartly managing a plurality of potential travel destinations of a user
CN110119822B (en) * 2018-02-06 2024-03-15 阿里巴巴集团控股有限公司 Scenic spot management, journey planning method, client and server
CN108537691A (en) * 2018-06-08 2018-09-14 延晋 A kind of region visit intelligent management system and method
CN109754305A (en) * 2018-11-13 2019-05-14 北京码牛科技有限公司 The preference method of excavation and device based on matrix decomposition algorithm
CN110263256B (en) * 2019-06-21 2022-12-02 西安电子科技大学 Personalized recommendation method based on multi-mode heterogeneous information
CN110348968B (en) * 2019-07-15 2022-02-15 辽宁工程技术大学 A recommendation system and method based on user and item coupling relationship analysis
CN110569447B (en) * 2019-09-12 2022-03-15 腾讯音乐娱乐科技(深圳)有限公司 Network resource recommendation method and device and storage medium
US11402223B1 (en) 2020-02-19 2022-08-02 BlueOwl, LLC Systems and methods for generating scenic routes
US11378410B1 (en) 2020-02-19 2022-07-05 BlueOwl, LLC Systems and methods for generating calm or quiet routes
CN112257517B (en) * 2020-09-30 2023-04-21 中国地质大学(武汉) A Tourist Attraction Recommendation System Based on Attraction Clustering and Group Emotion Recognition
CN112348291B (en) * 2020-12-07 2022-08-26 福州灵和晞科技有限公司 Travel information management method
US11477603B2 (en) 2021-03-03 2022-10-18 International Business Machines Corporation Recommending targeted locations and optimal experience time
CN113505311B (en) * 2021-07-12 2022-03-11 中国科学院地理科学与资源研究所 An interactive recommendation method for tourist attractions based on "latent semantic space"
CN114139052B (en) * 2021-11-19 2022-10-21 北京百度网讯科技有限公司 Ranking model training method for intelligent recommendation, intelligent recommendation method and device
CN117575125A (en) * 2024-01-17 2024-02-20 巢湖学院 Path optimization method based on matrix complement collaborative filtering and quantum approximation optimization

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045865A (en) * 2015-07-13 2015-11-11 电子科技大学 Kernel-based collaborative theme regression tag recommendation method
CN106055713A (en) * 2016-07-01 2016-10-26 华南理工大学 Social network user recommendation method based on extraction of user interest and social topic

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542477B2 (en) * 2013-12-02 2017-01-10 Qbase, LLC Method of automated discovery of topics relatedness

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhengxing Xu, "Trip similarity computation for context-aware travel recommendation exploiting geotagged photos", 2014 IEEE 30th International Conference on Data Engineering Workshops, 2014-05-19, pp. 1-5
Lin Xiaoyong et al., "Matrix factorization recommendation algorithm based on topic model", Journal of Computer Applications (计算机应用), 2015-12-15, pp. 122-125

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190906