
CN113239271B - Recommendation method based on interest drift - Google Patents


Info

Publication number
CN113239271B
Authority
CN
China
Prior art keywords
interest
user
node
cluster
nodes
Prior art date
Legal status
Active
Application number
CN202110512435.4A
Other languages
Chinese (zh)
Other versions
CN113239271A (en)
Inventor
李川 (Li Chuan)
陈荣 (Chen Rong)
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202110512435.4A
Publication of CN113239271A
Application granted
Publication of CN113239271B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G06F16/9027 Trees
    • G06F16/906 Clustering; Classification


Abstract

The invention discloses a recommendation method based on interest drift, which comprises the following steps: acquiring a recommendation data set with time information and item attribute information; constructing a user interest tree and introducing time decay into it to create a user interest-drift model; selecting the user's current interests; and generating recommendations with a recommendation method based on heterogeneous information network representation learning. The invention first proposes a hierarchical clustering method based on content information and collaborative information to construct a basic interest model, alleviating problems of existing methods such as biased interests and insufficient interest mining. Second, taking the basic interest model as a template, an interest tree is generated from each user's history, and a multi-granularity time table and an interest-weight update mechanism are introduced to simulate how the user's interests change, realizing the modeling of interest drift. Finally, recommendation samples are selected according to a current-interest selection scheme and the final recommendation result is generated, so that the recommendation results better match reality, are more accurate, and mine the available information more fully.

Description

Recommendation method based on interest drift
Technical Field
The invention relates to the technical field of recommendation, in particular to a recommendation method based on interest drift.
Background
In a recommendation scenario, user interests are never static. For the items in a recommendation system, some characteristics are inherent and rarely change (for a movie item: its genre, country, director, and so on), while other attributes, such as users' awareness of the item and the item's popularity, vary with the passage of time; a newly released movie, for example, always receives more attention than an old movie of the same category. User interests likewise change over time under the influence of the users themselves and of external factors: a user who historically preferred comedy movies may develop a strong interest in foreign science-fiction movies because of their recent popularity and the enthusiastic discussion and strong recommendations of friends, and a user who preferred entertainment news as a student may pay more attention to financial or social news with increasing age. Recommendation systems refer to this phenomenon, in which user interests change dynamically over time, as interest drift. Interest drift is common in recommendation systems; it can be said that every user of a recommendation system undergoes interest transitions anytime and anywhere. A recommendation system with good performance needs to capture the interest differences between users and must also be able to adapt to changes in each user's interests. Therefore, for such complex systems, which must provide instant and accurate personalized services based primarily on user preferences, research on recommendation methods that can adapt to user interest drift is significant.
Building a user interest model is a common way to model interest drift. Such methods often use subjective user feedback such as scores and comments. However, generating an interest model purely from the user's perspective raises two non-negligible problems. First, the formation of user interests in a recommendation system is closely related to item characteristics; constructing the interest model only from the user's perspective easily biases the interests and can cause a mismatch with the mainstream characteristics of the items. Second, users' comment and score data are often sparse, so the interest set in the interest model cannot cover all of a user's potential interest points for a class of items, and the interest mining is insufficient.
Disclosure of Invention
The invention aims to provide a recommendation method based on interest drift, to solve the problems in the prior art that recommendation systems and methods do not consider user interest drift and that biased and insufficient interest mining leads to poor recommendation results.
The invention solves the problems through the following technical scheme:
a recommendation method based on interest drift comprises the following steps:
step S100: acquiring a recommendation data set with time information and item attribute information;
step S200: constructing a user interest tree, and introducing time attenuation into the user interest tree to create a user interest drift model;
step S300: selecting a current interest of a user;
step S400: and generating recommendations by combining the recommendation method of the heterogeneous information network representation learning.
According to the invention, a user interest tree is built from the recommendation data set and time decay is introduced into it, so that the change of user interests over time is taken into account when building the interest-drift model. This captures the dynamic change of user interests, and recommendations are generated according to the user's current interests, so that the recommendation results better match the real situation and the problems of biased interests and insufficient mining in existing methods are alleviated.
The step S200 includes:
step A1: acquiring a basic interest set;
step A2: constructing a basic interest tree;
step A3: generating a user interest tree, and introducing a time table into the interest tree;
step A4: setting a time division sequence, and acquiring interest sets of users in different periods;
step A5: calculating the attenuation weight of each interest in the interest tree in different periods;
step A6: writing the decay weights into the time table in chronological order.
The step A1 specifically includes:
(1) computing the item similarity matrix:

[equation image: sim(i, j), defined from ES_ij, IS_ij and CS_ij]

wherein i and j are different items; ES_ij is the explicit similarity,

[equation image: ES_ij]

R_i, R_j are the rating sets of items i and j; IS_ij is the implicit similarity,

[equation image: IS_ij]

U_i, U_j are the sets of users who have interacted with items i and j; CS_ij is the content similarity,

[equation image: CS_ij]

T_i is the attribute tag set of item i; T_j is the attribute tag set of item j;
(2) normalization of sim: traversing the item similarity matrix and setting the corresponding position of the neighbor matrix to 1 where the similarity is greater than or equal to the similarity threshold, and to 0 where it is below the threshold;
(3) computing the item link matrix Link = Ne · Ne^T, where Ne is the neighbor matrix;
(4) generating the basic interest set BI:
initializing the set BI for storing basic interests, where initially each item is a cluster of its own; for each cluster i, constructing a local heap q[i] that contains every cluster whose merge metric-function value with cluster i is non-zero, arranged in descending order of that metric value;
constructing a global heap Q over all clusters, where Q contains the most similar cluster from each local heap q[i];
while |Q| > M holds, executing B1-B7 in a loop, otherwise executing B8;
B1: acquiring the cluster u with the maximum similarity from the global heap Q;
B2: acquiring the cluster v with the maximum similarity to cluster u from the local heap q[u];
B3: deleting cluster v from the global heap Q, and merging clusters v and u into a new cluster w;
B4: updating the global heap Q and the local heaps q[i]; wherein M is the number of clusters;
B5: calculating the number of links between the new cluster w and every cluster x appearing in q[u] and q[v];
B6: deleting the merged clusters u and v from the local heap of each such cluster x;
B7: updating the local heaps of cluster x and the new cluster w, and updating the global heap;
B8: inserting the new cluster w into the global heap, releasing the local heaps of clusters u and v, and putting the M finally generated clusters into the set BI; the M clusters are the basic interests, i.e. the meta-interests.
The step A2 performs secondary clustering with the M clusters in the set BI as the cluster set C to generate a basic interest tree with a hierarchical structure, and specifically includes:
step 1: initializing the list C-Sequence for storing the intermediate clustering results;
step 2: putting the cluster set C into C-Sequence;
step 3: calculating the similarity between every pair of clusters in the cluster set C:

[equation image: cluster similarity measure over C_i and C_j]

where C_i, C_j are different clusters; n_i, n_j are the numbers of samples in C_i, C_j; according to the ROCK algorithm, f(θ) = (1 - θ)/(1 + θ); and link[C_i, C_j] and sim[C_i, C_j] are given respectively by

link[C_i, C_j] = Σ_{p_q ∈ C_i} Σ_{p_r ∈ C_j} link[p_q, p_r]

[equation image: sim[C_i, C_j]]

where p_q, p_r are data points;
step 4: merging the two most similar clusters C_i and C_j into a new cluster C_{i-j};
step 5: updating the cluster set C, i.e. deleting C_i and C_j, adding the new cluster C_{i-j}, and putting the updated cluster set C into C-Sequence;
step 6: judging whether the number of elements in the cluster set C is 1; if not, returning to step 2 to continue; if yes, returning the current cluster set C and the list C-Sequence recording the intermediate results;
step 7: replaying the secondary clustering process top-down according to the cluster set C and the intermediate-result list C-Sequence, and generating a basic interest tree with a hierarchical structure.
The method for generating the interest tree in step A3 is as follows:
step A31: acquiring the user's history records and arranging them in time order from the most recent to the most distant;
step A32: dividing the history records into periods of unequal time intervals;
step A33: acquiring the user's meta-interest set for each period: initializing the interest set of user u for period T_i; traversing the item set of user u in period T_i; for each item, traversing the meta-interest set MI and judging whether user u possesses the meta-interest C_j in period T_i, and if so, adding C_j to the interest set; outputting the resulting meta-interest set of user u for period T_i;
step A34: arranging all obtained meta-interest sets from the most distant period to the most recent, and calculating the user's interest weight for every interest in each of them;
step A35: on the constructed interest tree, modifying each node by adding to it a pointer to a time table, which records the user's interest weights on the corresponding interest node in different periods;
step A36: writing the interest weights of the corresponding interests of all periods into the time table, where the interest weight of a non-leaf interest in each period is determined by the accumulated sum of the weights of its sub-interests in the corresponding period.
The step S300 includes:
step S310: acquiring a meta-interest set of a user according to the user interest tree;
step S320: replacing the meta-interests according to parent-child relations among the meta-interests;
step S330: calculating the active time of each meta-interest in the replaced meta-interest set;
step S340: and acquiring the current interest of the user according to the meta-interest active time and the interest weight.
The step S400 includes:
step S410: obtaining a recommended sample according to the current interest of each user;
step S420: constructing a heterogeneous information network according to the recommended samples;
step S430: performing score prediction by using a HEFFM method, wherein the method comprises the following steps:
step S431: information extraction: performing representation learning on the nodes in the heterogeneous information network, where the nodes comprise user nodes and item nodes, and acquiring low-dimensional vectors of users and items;
step S432: directly connecting the low-dimensional vectors of users and items to the recommendation task, inputting them into the field-aware factorization machine model as recommendation-sample features, selecting features by adding group lasso as the regularization term, and completing the score prediction between users and items;
Step S440: generating a TOP-N score list, namely the final recommendation list.
The step S431 includes:
step D1: generating a semantic graph according to the meta-structure, where the meta-structures comprise complex meta-structures containing non-linear structure and linear meta-structures modeling only linear relations; the specific process of generating the semantic graph is:
step D11: extracting user nodes and comment nodes from the Yelp information network and establishing links between them to form a heterogeneous graph HG;
step D12: finding comment pairs in the Yelp information network that belong to the same user and contain the same keywords, and putting these comment pairs into a set W;
step D13: traversing the set W and establishing links for the comments in W in the heterogeneous graph HG to form the relation R-R, so that the linear meta-structure in HG carries the semantics of the meta-structure;
step D14: for a complex meta-structure, constructing the corresponding adjacency matrices from the nodes and relations present in the heterogeneous graph HG; for a linear meta-structure, generating the adjacency matrices from the original heterogeneous information network;
step D15: performing matrix multiplication along the linear meta-structure in the heterogeneous graph HG to generate the homogeneous matrix A_UU;
step D16: constructing a homogeneous graph SG from the homogeneous matrix A_UU, where SG is the semantic graph corresponding to the respective complex meta-structure or linear meta-structure;
step D2: dynamically truncated random walk on the semantic graph: acquiring a node sequence R containing both semantic and structural information, and taking R as the input of a skip-gram model to obtain low-dimensional node vectors, specifically:
step D21: projecting the nodes on the semantic graph into the heterogeneous information network and calculating the node similarity matrices of the complex meta-structure CS and the linear meta-structure LS:
constructing the adjacency matrix W_UR of user nodes and comment nodes, the adjacency matrix W_RB of comment nodes and item nodes, and the adjacency matrix W_RK of comment nodes and keyword nodes;
obtaining C_1 and C_2, where

[equation image: C_1 and C_2 in terms of the adjacency matrices]

the similarity matrix of the users on the complex meta-structure CS is

[equation image: SIM^CS]

and the node similarity matrix of the linear meta-structure LS is calculated as

[equation image: SIM^LS]

where W_UR is the adjacency matrix of user nodes and comment nodes and W_RK is the adjacency matrix of comment nodes and keyword nodes;
step D22: constraining the number of random walks started from each node: the number of walks started from node v is l = max(h(v) × maxL, minL), where maxL is the maximum number of walks started from a node, minL is the minimum number of walks started from a node, and h(v) is the importance of node v on the semantic graph;
step D23: the dynamically truncated random walk specifically comprises:
defining the semantic graph SG_s of a meta-structure s, the similarity matrix SIM_s of the nodes on the meta-structure s, the maximum number of walks maxT per node, the minimum number of walks minT per node, the maximum walk length wl, and the walk stopping probability p_stop;
initializing the list sequences for storing the node sequences;
computing the node importance H = PageRank(SG_s);
E1: taking node v as the start node and calculating its number of walks l;
E2: initializing a list sequence for storing the current node sequence, recording the current node n_now = v, and recording the maximum remaining walk length wl_t;
walking along the walk path to a node x and recording the transition probability P_trans of the walk path:

[equation image: P_trans]

where n_x is the current node of the walk path, n_i is the previous-hop node of the walk path and n_x is one of its first-order neighbors, and O(n_i) is the degree of node n_i;
adding node x to the list sequence and calculating the stopping probability P_x-stop at node x:

[equation image: P_x-stop]

where P_stop is a pre-specified fixed stopping probability and sim(n_i, n_x) is the unnormalized similarity between the previous-hop node n_i and the current node n_x;
judging whether to stop at node x: if so, ending the walk and proceeding to the next step; otherwise updating the remaining walk length wl_t ← wl_t - 1 and the current node n_now, then judging whether the number of walks has reached l; if so, proceeding to the next step, otherwise returning to E2;
E3: adding the current walk sequence to the list sequences and judging whether all nodes have been processed; if so, proceeding to the next step, otherwise returning to E1;
E4: outputting the list sequences of node sequences;
step D24: representation learning: sampling neighbors from the output node sequences through a fixed-length window to obtain the neighbor set of each node, and optimizing the representation learning with

max_Φ Σ_{u ∈ V} log Pr(N_s(u) | Φ(u))

where Φ is the mapping function that embeds nodes into a d-dimensional feature space and N_s(u) is the neighbor set of node u under the designated meta-structure;
step D25: obtaining the node sequence R by the dynamically truncated random walk:

R = DynamicTruncatedRandomWalk(SG_S, SIM_S, maxT, minT, wl, p_stop)

and taking R as the input of the skip-gram model to obtain the low-dimensional node vectors Φ = skip-gram(d, winL, R).
If direct connection relations exist between users in the heterogeneous information network, the method further includes step D3: correcting the user vectors, which specifically comprises:
step D31: specifying the user set U and defining on it triplets <u, u_i, u_j>, where u ∈ U denotes the target user, and u_i ∈ U and u_j ∈ U are a direct neighbor and an indirect neighbor of user u respectively, satisfying

[equation image: membership constraints on u_i and u_j in terms of the neighbor sets]

where N_u denotes the neighbor set of user u under the given meta-structure; all triplets meeting these requirements form the training data of user u, and the training data of all users constitute the training data set D of the meta-structure, which is used for the vector correction; the symbol >_u is defined to represent the preference relation of user u over its neighbors, i.e. the triplet <u, u_i, u_j> can be written as u_i >_u u_j;
step D32: initializing the training data set D; obtaining the neighbor set N_u of user u under the meta-structure, the direct neighbor set DN_u of user u, and the indirect neighbor set IN_u of user u;
step D33: adding the triplets composed of a target user, a direct neighbor and an indirect neighbor to the training data set;
step D34: updating the parameters according to the iterative formulas of a gradient ascent algorithm:

[equation images: gradient-ascent parameter updates]

step D35: iterating until the user vector matrix converges, and outputting the corrected user vector matrix.
The step S432 specifically includes:
step F1: score prediction:
for each observed score between a user and an item in the data set, splicing the corresponding user and item vectors and taking the spliced vector as a new recommendation sample x_n:

[equation image: x_n as the concatenation of the user and item vectors over the L meta-structures]

where u_i^l and b_j^l denote the vector representations of user u_i and item b_j on the l-th meta-structure, and d is the dimension of each vector;
step F2: calculating the score with the FFM model:

ŷ(x_n) = w_0 + Σ_{i=1}^{M} w_i x_i + Σ_{i=1}^{M} Σ_{j=i+1}^{M} ⟨V_{i,f_j}, V_{j,f_i}⟩ x_i x_j

where w_0 is the global bias, w_i is the weight of the i-th feature, ⟨V_{i,f_j}, V_{j,f_i}⟩ is the weight of the combined feature formed by the i-th and j-th features, and the parameter M is the feature dimension of sample x_n, i.e. M = 2L × d;
step F3: parameter learning:
obtaining the objective function by minimum-mean-square-error learning:

min Σ_{n=1}^{N} (ŷ_n - y_n)²

where y_n is the actual score of the n-th sample and N is the number of samples;
introducing into the objective function a group lasso that can be used to select features; the group-lasso regularization of a parameter p has the expression

Ω(p) = Σ_{g=1}^{G} ||p_g||_2

where p_g, g = 1, 2, ..., G, are the parameter groups and ||·||_2 is the l2 norm;
the features of sample x_n generated by the same meta-structure are put into the same group, so the features of x_n are divided into 2L groups, and the regularization terms for the parameters w and V are respectively

Ω(w) = Σ_{l=1}^{2L} ||w_l||_2

Ω(V) = Σ_{l=1}^{2L} ||V_l||_F

where w_l is a vector of dimension d, V_l is the matrix formed by the latent vectors of the features of the l-th meta-structure over all fields, and ||·||_F is the Frobenius norm of a matrix;
combining the objective function with the regularization terms, the optimization objective becomes

[equation image: the regularized optimization objective]

The model is optimized with the non-monotone accelerated proximal gradient algorithm nmAPG, and the optimized feature selection is output.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The method first takes item characteristics as guidance and proposes a hierarchical clustering method based on content information and collaborative information; a basic interest model is constructed from the clustering result, achieving a tight unification of user interests and item characteristics and alleviating the problems of biased interests and insufficient mining in existing methods. Second, taking the basic interest model as a template, a unique interest tree is generated for each user from that user's history; a multi-granularity time table based on skewed time division and an interest-weight update mechanism based on time decay are introduced into the user interest tree to simulate the change of user interests, realizing the modeling of interest drift. Finally, a current-interest selection scheme based on time information and interest preference is proposed, so that the selected current interests are both recent and important; recommendation samples are selected according to the current interests, and the final recommendation is generated by a recommendation method based on heterogeneous information network representation learning, so that the recommendation results better match the actual situation, are more accurate, and mine the information more fully.
(2) The invention captures the dynamic change of user interests by dividing the user's time-stamped history into skewed periods, introducing a multi-granularity time table into the user interest tree, using the time table to record the user's preference weight for each interest in different periods, and modeling the drift process with a weight update mechanism based on time decay. The user thus not only has a list of interests for each period but can also perceive the change of each interest.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
Example:
With reference to FIG. 1, a recommendation method based on interest drift comprises interest model construction, interest drift modeling, current interest selection, and recommendation generation, where the interest model construction is completed by generating the basic interests and representing the user interests.
The recommendation interests refer to all possible interests of a user in the specific type of item; the basic interests are regarded as the template of the user interests, and all of a user's interests are derived from the basic interests.
Step S100: acquiring a recommendation data set with time information and item attribute information;
step S200: constructing a user interest tree, introducing time attenuation into the user interest tree to create a user interest drift model:
step A1: acquiring the basic interest set:
(1) computing the item similarity matrix:

[equation image: sim(i, j), defined from ES_ij, IS_ij and CS_ij]

wherein i and j are different items; ES_ij is the explicit similarity,

[equation image: ES_ij]

R_i, R_j are the rating sets of items i and j; IS_ij is the implicit similarity,

[equation image: IS_ij]

U_i, U_j are the sets of users who have interacted with items i and j; CS_ij is the content similarity,

[equation image: CS_ij]

T_i is the attribute tag set of item i; T_j is the attribute tag set of item j;
(2) normalization of sim: traversing the item similarity matrix and setting the corresponding position of the neighbor matrix to 1 where the similarity is greater than or equal to the similarity threshold, and to 0 where it is below the threshold;
(3) computing the item link matrix Link = Ne · Ne^T, where Ne is the neighbor matrix;
(4) generating the basic interest set BI:
Input: item set B, item link matrix Link, item similarity matrix Sim, cluster number M
Output: basic interest set BI
The method comprises the following steps:

[algorithm pseudocode image: heap-based agglomerative clustering producing BI; see the sketch below]
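The pseudocode above survives only as an image. As a rough illustration of the heap-driven agglomerative loop of steps B1-B8, the following Python sketch re-implements it under stated assumptions: plain sorted scans stand in for the local and global heaps, the merge metric `goodness` is left abstract, and item ids are assumed orderable; all names are illustrative rather than the patent's.

```python
# Minimal sketch of the basic-interest clustering loop (steps B1-B8).
# Assumptions: goodness(set_i, set_j, link) is the ROCK-style merge metric;
# plain scans stand in for the local/global heaps of the patent.

def cluster_basic_interests(items, link, goodness, M):
    # Each item starts as its own cluster (step (4) initialization).
    clusters = {i: {i} for i in items}

    def best_partner(ci):
        # Local heap q[ci]: partners with non-zero metric, best first.
        cands = [(goodness(clusters[ci], clusters[cj], link), cj)
                 for cj in clusters if cj != ci]
        cands = [c for c in cands if c[0] > 0]
        return max(cands) if cands else None

    while len(clusters) > M:
        # Global heap Q: the cluster whose best merge is globally best (B1/B2).
        global_best = None
        for ci in clusters:
            p = best_partner(ci)
            if p and (global_best is None or p[0] > global_best[0]):
                global_best = (p[0], ci, p[1])   # (metric, u, v)
        if global_best is None:
            break                                # no mergeable pairs left
        _, u, v = global_best
        # B3: merge u and v into a new cluster w; the heaps are rebuilt
        # lazily on the next iteration instead of updated in place (B4-B7).
        clusters[u] = clusters[u] | clusters[v]
        del clusters[v]

    return list(clusters.values())               # B8: the M basic interests
```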
step A2: performing secondary clustering with the M clusters in the set BI as the cluster set C to generate a basic interest tree with a hierarchical structure:
step A21: initializing the list C-Sequence for storing the intermediate clustering results;
step A22: putting the cluster set C into C-Sequence;
step A23: calculating the similarity between every pair of clusters in the cluster set C:

[equation image: cluster similarity measure over C_i and C_j]

where C_i, C_j are different clusters; n_i, n_j are the numbers of samples in C_i, C_j; according to the ROCK algorithm, f(θ) = (1 - θ)/(1 + θ); and link[C_i, C_j] and sim[C_i, C_j] are given respectively by

link[C_i, C_j] = Σ_{p_q ∈ C_i} Σ_{p_r ∈ C_j} link[p_q, p_r]

[equation image: sim[C_i, C_j]]

where p_q, p_r are data points;
step A24: merging the two most similar clusters C_i and C_j into a new cluster C_{i-j};
step A25: updating the cluster set C, i.e. deleting C_i and C_j, adding the new cluster C_{i-j}, and putting the updated cluster set C into C-Sequence;
step A26: judging whether the number of elements in the cluster set C is 1; if not, returning to step A22 to continue; if yes, returning the current cluster set C and the list C-Sequence recording the intermediate results.
In each iteration, the two most similar clusters are selected and merged until only 1 cluster remains in the cluster set C, so the total number of iterations is M - 1. Because M is kept within a small range, the number of iterations is low and the secondary clustering result can be obtained in a short time;
step A27: replaying the secondary clustering process top-down according to the cluster set C and the intermediate-result list C-Sequence, generating a clustering tree with a hierarchical structure, i.e. the basic interest tree:
Input: cluster set C, intermediate-result list C-Sequence
Output: root node root of the basic interest tree
The method comprises the following steps:

[algorithm pseudocode image: top-down replay of the recorded merges; see the sketch below]
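The tree-construction pseudocode is likewise preserved only as an image. Below is a minimal Python sketch of steps A21-A26 and the top-down replay of A27, assuming an abstract pairwise measure `cluster_sim`, clusters represented as frozensets, and a simple node class standing in for the patent's tree representation.

```python
# Sketch of secondary clustering (A21-A26) and tree replay (A27).
# cluster_sim(ci, cj) is the goodness measure from step A23, left abstract.
import itertools

def secondary_clustering(BI, cluster_sim):
    C = [frozenset(c) for c in BI]
    c_sequence = [list(C)]                        # A21/A22: record each state
    while len(C) > 1:
        # A23/A24: pick and merge the most similar pair.
        ci, cj = max(itertools.combinations(C, 2),
                     key=lambda p: cluster_sim(*p))
        C = [c for c in C if c not in (ci, cj)] + [ci | cj]
        c_sequence.append(list(C))                # A25
    return C, c_sequence                          # A26

class Node:
    def __init__(self, members):
        self.members, self.children = members, []

def build_interest_tree(c_sequence):
    # A27: replay the merges top-down; each merged cluster becomes a parent
    # whose children are the two clusters it absorbed.
    root = Node(next(iter(c_sequence[-1])))
    frontier = {root.members: root}
    for prev, cur in zip(reversed(c_sequence[:-1]), reversed(c_sequence[1:])):
        merged = [c for c in cur if c not in prev]   # the newly merged cluster
        parts = [c for c in prev if c not in cur]    # the two clusters it absorbed
        for m in merged:
            parent = frontier[m]
            for p in parts:
                child = Node(p)
                parent.children.append(child)
                frontier[p] = child
    return root
```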
the present invention treats each node, i.e., each cluster, in the underlying interest tree as an interest that a user may have on such items, where leaf clusters are referred to as meta-interests. According to the tree structure, interests can be further divided into fine and coarse categories, wherein fine interests can be used for capturing fine changes of user interests, and coarse interests can reflect large changes of user interests.
The interest model of the user is constructed by taking a basic interest tree which is closely related to the characteristics of the item as a template:
step A3: generating the user interest tree and introducing a time table into it:
step A31: acquiring the user's history records and arranging them in time order from the most recent to the most distant;
step A32: dividing the history records into periods of unequal time intervals;
step A33: acquiring the user's meta-interest set for each period:
Input: the item set of user u in period T_i, the meta-interest set MI
Output: the interest set of user u in period T_i
The method comprises the following steps:

[algorithm pseudocode image: traverse the items of user u in T_i and add every meta-interest C_j that u possesses in T_i to the interest set]

step A34: arranging all obtained meta-interest sets from the most distant period to the most recent, and calculating the user's interest weight for every interest in each of them;
step A35: constructing the interest tree:
Input: cluster set C, intermediate-result list C-Sequence
Output: root node root of the basic interest tree
The method comprises the following steps:

[algorithm pseudocode image]
On the constructed interest tree, each node is modified by adding to it a pointer to a time table, which records the user's interest preference for the corresponding interest node in different periods. An interest preference is the description of an interest C_j and its weight w_j^{T_i} for the user in a certain period T_i, recorded as the pair (C_j, w_j^{T_i}) with the formal definition:

[equation image: formal definition of the time table record]

where w_j^{T_i} is the interest weight of interest C_j, implying the user's degree of preference for C_j.
Regarding the interest weight w_j^{T_i}, a user's preference for an interest should be reflected by the user's interactions with items: the more interactions the user has in a period T_i with items belonging to C_j, the higher the user's interest in C_j and the larger the interest weight w_j^{T_i}; likewise, the higher the user's preference in period T_i for the items belonging to C_j, the more the user's preference for C_j increases with it. Considering both aspects, the interest weight w_j^{T_i} is computed as:

[equation image: w_j^{T_i}, accumulating the preference weights of the items belonging to C_j that the user interacted with in T_i]

where B_u^{T_i} is the set of items the user interacted with in period T_i, B_{u,j}^{T_i} is the subset of B_u^{T_i} whose items belong to C_j, and w(b_q) is the user's preference weight for item b_q.
For the preference weight w(b_q), the invention provides an item-preference measure based on score deviation, which uses the difference between the user's score and the item's average score, as shown below:

[equation image: w(b_q) as a function of the deviation of r_u from the average score of b_q]

where r_u is user u's score for item b_q and the average is the mean score of item b_q over all users. The larger the difference, the larger the weight and the higher the user's preference for the item;
step A36: writing the interest weights of the corresponding interests of all periods into the time table, and updating the records of non-leaf interests according to the principle that a parent interest's weight is determined by its child interests, i.e. the interest weight of a non-leaf interest in each period is the accumulated sum of the weights of its sub-interests in the corresponding period.
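To illustrate steps A34-A36, the following Python sketch computes per-period interest weights from score deviations and fills the per-node time tables bottom-up; the data model (`interactions`, `avg_score`, dict-based tables, and a node shape like the earlier tree sketch extended with `meta_id` and `table`) is an assumption for illustration, not the patent's notation.

```python
# Sketch of interest-weight computation (A34) and time-table filling (A36).
# interactions: {period: [(item, user_score)]}; meta_of[item] -> meta-interest id.
# Assumed preference weight: deviation of the user's score from the item's
# average score, floored at 0 so weights stay non-negative.
from collections import defaultdict

def item_preference(user_score, avg):
    return max(user_score - avg, 0.0)

def interest_weights(interactions, meta_of, avg_score):
    weights = {}                                   # {period: {meta_id: weight}}
    for period, events in sorted(interactions.items()):
        w = defaultdict(float)
        for item, score in events:
            w[meta_of[item]] += item_preference(score, avg_score[item])
        weights[period] = dict(w)
    return weights

def fill_time_tables(node, weights):
    # A36: a leaf's table holds its own weights; a parent's weight in each
    # period is the accumulated sum of its children's weights in that period.
    if not node.children:
        node.table = {p: w.get(node.meta_id, 0.0) for p, w in weights.items()}
    else:
        for c in node.children:
            fill_time_tables(c, weights)
        node.table = {p: sum(c.table[p] for c in node.children)
                      for p in weights}
    return node.table
```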
Step S300: interest drift modeling and current interest selection:
interest drift modeling: the interest weight in the user interest model is an important basis for judging whether the user interest changes, so that the interest drift modeling process is set as an interest weight updating rule.
Time decay is applied to the interest weight, and the decayed weight written into the time table is called the decay weight dw.
Given the time decay function h(t), the update rule for the decay weight of a meta-interest is as follows: if the meta-interest C_j appears for the first time, its decay weight equals the interest weight computed by formula 4-9, i.e.

dw_j^{T_k} = w_j^{T_k}

If the meta-interest C_j is not appearing for the first time, it is computed as:

dw_j^{T_k} = h(t) × dw_j^{T_{k-1}} + w_j^{T_k}

where w_j^{T_k} is the interest weight of meta-interest C_j in the current period T_k; dw_j^{T_{k-1}} is the decay weight of meta-interest C_j in the previous period T_{k-1}; and h(t) is a pre-specified time decay function.
As can be seen from the above formula, if a meta-interest does not explicitly appear in the next period, only the weight of the previous period is decayed, i.e.

dw_j^{T_k} = h(t) × dw_j^{T_{k-1}}

If the meta-interest appears in two adjacent periods, then on the basis of the decay it is reinforced according to the interest-weight calculation for the current period, so that the update rule matches the real change of the user's interests.
For the time decay function h(t), Newton's law of cooling is referenced, giving the definition:

h(t) = e^(-λ×Δt) (4-12)

where Δt is the length of the current period and λ is a pre-specified decay factor.
In summary, by incorporating a time decay mechanism into the calculation of the interest weights, the proposed user interest model can simulate the change of user interests; the weights in the time table are therefore regarded as the result of user interest drift and as an important basis for judging the user's current interests.
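A minimal sketch of the decay-weight update rule as reconstructed above, assuming h(t) = e^(-λ×Δt) per formula 4-12 and additive reinforcement when the meta-interest reappears; function names are illustrative. For example, with λ = 0.1 and Δt = 5, h ≈ 0.61, so a previous decay weight of 1.0 carries roughly 0.61 into the next period.

```python
# Sketch of the decay-weight update (first occurrence vs. later periods).
import math

def h(delta_t, lam):
    # Time decay per formula 4-12 (Newton's law of cooling).
    return math.exp(-lam * delta_t)

def update_decay_weight(prev_dw, cur_w, delta_t, lam, first_time):
    if first_time:
        return cur_w                        # first occurrence: dw = w (4-9)
    # Otherwise decay the previous weight and reinforce with the current one;
    # cur_w is 0 when the meta-interest does not appear in this period.
    return h(delta_t, lam) * prev_dw + cur_w
```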
Current interest selection:
step 1: acquiring from the user interest tree all meta-interests whose time tables are not empty;
step 2: grouping all meta-interests acquired in step 1 by their parent interests, putting meta-interests with the same parent interest into the same group;
step 3: checking the number of meta-interests in each group against the number of child interests of the corresponding parent interest; if they are equal, replacing the group with the parent interest; recording the finally obtained interest set as CI;
step 4: arranging the interests in the set CI in descending order of their latest weight and intercepting the first N into the set A, where N = α × |CI| and α is a pre-specified interest interception factor;
step 5: arranging the interests in the set CI in descending order of active time and likewise intercepting the first N into the set B;
step 6: intersecting the sets A and B; the resulting interest set is the user's current interest set.
Through this selection process, the interest set mined by the proposed scheme both belongs to the user's recently accessed interests and covers the user's main interests.
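A minimal Python sketch of the six selection steps, assuming each interest node carries a `parent`, `children`, a `table` (the time table), a `latest_weight`, and an `active_time`; this data model is illustrative.

```python
# Sketch of current-interest selection (steps 1-6).
from collections import defaultdict

def select_current_interests(meta_interests, alpha):
    # Steps 1-2: keep meta-interests with a non-empty time table, group by parent.
    groups = defaultdict(list)
    for m in meta_interests:
        if m.table:
            groups[m.parent].append(m)
    # Step 3: replace a complete group by its parent interest.
    CI = []
    for parent, group in groups.items():
        if len(group) == len(parent.children):
            CI.append(parent)
        else:
            CI.extend(group)
    n = max(1, int(alpha * len(CI)))          # step 4: N = alpha * |CI|
    A = set(sorted(CI, key=lambda c: c.latest_weight, reverse=True)[:n])
    B = set(sorted(CI, key=lambda c: c.active_time, reverse=True)[:n])
    return A & B                              # step 6: intersection
```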
Step S400: generating recommendations by combining the recommendation method based on heterogeneous information network representation learning:
step S410: obtaining recommendation samples according to the current interests of each user;
step S420: constructing a heterogeneous information network from the recommendation samples;
step S430: performing score prediction with the HEFFM method, comprising the following steps:
step S431: information extraction: performing representation learning on the nodes in the heterogeneous information network, where the nodes comprise user nodes and item nodes, and acquiring low-dimensional vectors of users and items;
step D1: generating a semantic graph according to the meta-structure, where the meta-structures comprise complex meta-structures containing non-linear structure and linear meta-structures modeling only linear relations; the specific process of generating the semantic graph is:
step D11: extracting user nodes and comment nodes from the Yelp information network and establishing links between them to form a heterogeneous graph HG;
step D12: finding comment pairs in the Yelp information network that belong to the same user and contain the same keywords, and putting these comment pairs into a set W;
step D13: traversing the set W and establishing links for the comments in W in the heterogeneous graph HG to form the relation R-R, so that the linear meta-structure in HG carries the semantics of the meta-structure;
step D14: for a complex meta-structure, constructing the corresponding adjacency matrices from the nodes and relations present in the heterogeneous graph HG; for a linear meta-structure, generating the adjacency matrices from the original heterogeneous information network;
step D15: performing matrix multiplication along the linear meta-structure in the heterogeneous graph HG to generate the homogeneous matrix A_UU;
step D16: constructing a homogeneous graph SG from the homogeneous matrix A_UU, where SG is the semantic graph corresponding to the respective complex meta-structure or linear meta-structure;
step D2: dynamically truncated random walk on the semantic graph: acquiring a node sequence R containing both semantic and structural information, and taking R as the input of a skip-gram model to obtain low-dimensional node vectors, specifically:
step D21: projecting the nodes on the semantic graph into the heterogeneous information network and calculating the node similarity matrices of the complex meta-structure CS and the linear meta-structure LS:
constructing the adjacency matrix W_UR of user nodes and comment nodes, the adjacency matrix W_RB of comment nodes and item nodes, and the adjacency matrix W_RK of comment nodes and keyword nodes;
obtaining C_1 and C_2, where

[equation image: C_1 and C_2 in terms of the adjacency matrices]

the similarity matrix of the users on the complex meta-structure CS is

[equation image: SIM^CS]

and the node similarity matrix of the linear meta-structure LS is calculated as

[equation image: SIM^LS]

where W_UR is the adjacency matrix of user nodes and comment nodes and W_RK is the adjacency matrix of comment nodes and keyword nodes;
step D22: constraining the number of random walks started from each node: the number of walks started from node v is l = max(h(v) × maxL, minL), where maxL is the maximum number of walks started from a node, minL is the minimum number of walks started from a node, and h(v) is the importance of node v on the semantic graph;
step D23: the dynamically truncated random walk specifically comprises:
defining the semantic graph SG_s of a meta-structure s, the similarity matrix SIM_s of the nodes on the meta-structure s, the maximum number of walks maxT per node, the minimum number of walks minT per node, the maximum walk length wl, and the walk stopping probability p_stop;
initializing the list sequences for storing the node sequences;
computing the node importance H = PageRank(SG_s);
E1: taking node v as the start node and calculating its number of walks l;
E2: initializing a list sequence for storing the current node sequence, recording the current node n_now = v, and recording the maximum remaining walk length wl_t;
walking along the walk path to a node x and recording the transition probability P_trans of the walk path:

[equation image: P_trans]

where n_x is the current node of the walk path, n_i is the previous-hop node of the walk path and n_x is one of its first-order neighbors, and O(n_i) is the degree of node n_i;
adding node x to the list sequence and calculating the stopping probability P_x-stop at node x:

[equation image: P_x-stop]

where P_stop is a pre-specified fixed stopping probability and sim(n_i, n_x) is the unnormalized similarity between the previous-hop node n_i and the current node n_x;
judging whether to stop at node x: if so, ending the walk and proceeding to the next step; otherwise updating the remaining walk length wl_t ← wl_t - 1 and the current node n_now, then judging whether the number of walks has reached l; if so, proceeding to the next step, otherwise returning to E2;
E3: adding the current walk sequence to the list sequences and judging whether all nodes have been processed; if so, proceeding to the next step, otherwise returning to E1;
E4: outputting the list sequences of node sequences;
step D24: representation learning: sampling neighbors from the output node sequences through a fixed-length window to obtain the neighbor set of each node, and optimizing the representation learning with

max_Φ Σ_{u ∈ V} log Pr(N_s(u) | Φ(u))

where Φ is the mapping function that embeds nodes into a d-dimensional feature space and N_s(u) is the neighbor set of node u under the designated meta-structure;
step D25: obtaining the node sequence R by the dynamically truncated random walk:

R = DynamicTruncatedRandomWalk(SG_S, SIM_S, maxT, minT, wl, p_stop)

and taking R as the input of the skip-gram model to obtain the low-dimensional node vectors Φ = skip-gram(d, winL, R).
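The walk procedure of D22-D23 and E1-E4 can be illustrated with the Python sketch below. Because the transition and stopping formulas survive only as images, the sketch assumes a uniform transition among first-order neighbors and a stop test in which higher similarity between consecutive nodes lowers the chance of stopping; graph access and all names are illustrative.

```python
# Sketch of the dynamically truncated random walk (D22-D23).
# graph[v] -> list of first-order neighbors; sim[(a, b)] -> unnormalized
# similarity; importance h[v] comes from PageRank on the semantic graph.
import random

def num_walks(v, h, max_l, min_l):
    # D22: l = max(h(v) * maxL, minL)
    return max(int(h[v] * max_l), min_l)

def dynamic_truncated_random_walk(graph, sim, h, max_l, min_l, wl, p_stop):
    sequences = []
    for v in graph:                                   # E1
        for _ in range(num_walks(v, h, max_l, min_l)):
            seq, cur, steps = [v], v, wl              # E2
            while steps > 0 and graph[cur]:
                nxt = random.choice(graph[cur])       # assumed uniform P_trans
                seq.append(nxt)
                # Assumed stop rule: higher similarity between consecutive
                # nodes reduces the probability of stopping the walk.
                if random.random() < p_stop / max(sim.get((cur, nxt), 1.0), 1e-9):
                    break                             # stop at this node
                cur, steps = nxt, steps - 1
            sequences.append(seq)                     # E3
    return sequences                                  # E4
```

The resulting `sequences` would then be fed to a skip-gram implementation (for instance gensim's Word2Vec with sg=1) to produce the d-dimensional node vectors of D24-D25.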
If direct connection relations exist between users in the heterogeneous information network, the method further includes step D3: correcting the user vectors, which specifically comprises:
step D31: specifying the user set U and defining on it triplets <u, u_i, u_j>, where u ∈ U denotes the target user, and u_i ∈ U and u_j ∈ U are a direct neighbor and an indirect neighbor of user u respectively, satisfying

[equation image: membership constraints on u_i and u_j in terms of the neighbor sets]

where N_u denotes the neighbor set of user u under the given meta-structure; all triplets meeting these requirements form the training data of user u, and the training data of all users constitute the training data set D of the meta-structure, which is used for the vector correction; the symbol >_u is defined to represent the preference relation of user u over its neighbors, i.e. the triplet <u, u_i, u_j> can be written as u_i >_u u_j;
step D32: initializing the training data set D; obtaining the neighbor set N_u of user u under the meta-structure, the direct neighbor set DN_u of user u, and the indirect neighbor set IN_u of user u;
step D33: adding the triplets composed of a target user, a direct neighbor and an indirect neighbor to the training data set;
step D34: updating the parameters according to the iterative formulas of a gradient ascent algorithm:

[equation images: gradient-ascent parameter updates]

step D35: iterating until the user vector matrix converges, and outputting the corrected user vector matrix.
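The gradient-ascent formulas of D34 are preserved only as images. As one plausible realization, the sketch below performs a BPR-style pairwise update that pushes the target user's vector closer to direct neighbors than to indirect ones; the objective and learning rule here are assumptions, not the patent's exact formulas.

```python
# Sketch of triplet-based user-vector correction (D31-D35),
# assuming a BPR-style pairwise objective over <u, u_i, u_j>.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def correct_user_vectors(X, triplets, lr=0.01, reg=0.01, epochs=50):
    # X: user-vector matrix (n_users x d) from representation learning.
    X = X.copy()
    for _ in range(epochs):                 # iterate toward convergence (D35)
        for u, ui, uj in triplets:          # D33: <target, direct, indirect>
            # Gap: the direct neighbor should rank above the indirect one.
            diff = X[u] @ X[ui] - X[u] @ X[uj]
            g = 1.0 - sigmoid(diff)         # gradient-ascent scale (D34)
            X[u]  += lr * (g * (X[ui] - X[uj]) - reg * X[u])
            X[ui] += lr * (g * X[u] - reg * X[ui])
            X[uj] += lr * (-g * X[u] - reg * X[uj])
    return X                                # corrected user-vector matrix
```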
Step S432: directly connecting the low-dimensional vectors of users and items to the recommendation task, inputting them into the field-aware factorization machine model as recommendation-sample features, selecting features by adding group lasso as the regularization term, and completing the score prediction between users and items:
step F1: score prediction:
for each observed score between a user and an item in the data set, splicing the corresponding user and item vectors and taking the spliced vector as a new recommendation sample x_n:

[equation image: x_n as the concatenation of the user and item vectors over the L meta-structures]

where u_i^l and b_j^l denote the vector representations of user u_i and item b_j on the l-th meta-structure, and d is the dimension of each vector;
step F2: calculating the score with the FFM model:

ŷ(x_n) = w_0 + Σ_{i=1}^{M} w_i x_i + Σ_{i=1}^{M} Σ_{j=i+1}^{M} ⟨V_{i,f_j}, V_{j,f_i}⟩ x_i x_j

where w_0 is the global bias, w_i is the weight of the i-th feature, ⟨V_{i,f_j}, V_{j,f_i}⟩ is the weight of the combined feature formed by the i-th and j-th features, and the parameter M is the feature dimension of sample x_n, i.e. M = 2L × d;
step F3: parameter learning:
obtaining the objective function by minimum-mean-square-error learning:

min Σ_{n=1}^{N} (ŷ_n - y_n)²

where y_n is the actual score of the n-th sample and N is the number of samples;
introducing into the objective function a group lasso that can be used to select features; the group-lasso regularization of a parameter p has the expression

Ω(p) = Σ_{g=1}^{G} ||p_g||_2

where p_g, g = 1, 2, ..., G, are the parameter groups and ||·||_2 is the l2 norm;
the features of sample x_n generated by the same meta-structure are put into the same group, so the features of x_n are divided into 2L groups, and the regularization terms for the parameters w and V are respectively

Ω(w) = Σ_{l=1}^{2L} ||w_l||_2

Ω(V) = Σ_{l=1}^{2L} ||V_l||_F

where w_l is a vector of dimension d, V_l is the matrix formed by the latent vectors of the features of the l-th meta-structure over all fields, and ||·||_F is the Frobenius norm of a matrix;
combining the objective function with the regularization terms, the optimization objective becomes

[equation image: the regularized optimization objective]

The model is optimized with the non-monotone accelerated proximal gradient algorithm nmAPG, and the optimized feature selection is output.
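A compact sketch of the FFM scoring of step F2 and the group-lasso penalty of step F3 (the nmAPG optimizer itself is omitted); the field layout and names are assumptions consistent with M = 2L × d.

```python
# Sketch of FFM scoring (F2) and group-lasso regularization (F3).
# x: sample of length M = 2L*d; field(i) maps feature i to its meta-structure
# field; V[i][f]: latent vector of feature i toward field f.
import numpy as np

def ffm_score(x, w0, w, V, field):
    M = len(x)
    y = w0 + w @ x                                   # bias + linear terms
    for i in range(M):
        if x[i] == 0.0:
            continue
        for j in range(i + 1, M):
            if x[j] == 0.0:
                continue
            # Field-aware pairwise interaction <V[i,field(j)], V[j,field(i)]>.
            y += V[i][field(j)] @ V[j][field(i)] * x[i] * x[j]
    return y

def group_lasso(params, groups):
    # Omega(p) = sum over the 2L groups of the l2 norm of each group.
    return sum(np.linalg.norm(np.concatenate([np.ravel(params[i]) for i in g]))
               for g in groups)
```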
Step S440: generating a TOP-N score list, namely the final recommendation list.
Although the present invention has been described with reference to the illustrated embodiments, which are intended as preferred embodiments, the invention is not limited thereto, and those skilled in the art can devise numerous other modifications and embodiments that fall within the spirit and scope of the principles of this disclosure.

Claims (8)

1. A recommendation method based on interest drift is characterized by comprising the following steps:
step S100: acquiring a recommendation data set with time information and project attribute information;
step S200: constructing a user interest tree, and introducing time attenuation into the user interest tree to create a user interest drift model;
step S300: selecting a current interest of a user;
step S400: generating a recommendation by combining a recommendation method for representing learning by a heterogeneous information network;
the step S200 includes:
step A1: acquiring a basic interest set;
step A2: constructing a basic interest tree;
step A3: generating a user interest tree, and introducing a time table into the interest tree;
step A4: setting a time division sequence, and acquiring interest sets of users in different periods;
step A5: calculating the attenuation weight of each interest in the interest tree in different periods;
step A6: writing the decay weights into the time table in chronological order;
the step a1 specifically includes:
(1) computing item similarity matrices
Figure FDA0003431244570000011
Wherein i and j are different items, ESijIn order for the collaboration to be explicit,
Figure FDA0003431244570000012
Ri、Rja scoring set of items i, j; ISijIn order to be an implicit co-operation,
Figure FDA0003431244570000013
Ui、Ujis a set of users who have interacted with the items i, j;
CSijin order to be a similarity of the two items,
Figure FDA0003431244570000021
Tian attribute tag for item i; t isjAn attribute tag for item j;
(2) normalization of sim: traversing the project similarity matrix, and setting the corresponding position of the neighbor matrix which is greater than or equal to the similarity threshold value to 1; setting the corresponding position of the neighbor matrix smaller than the similarity threshold value to 0;
(3) calculating an item Link matrix Link, wherein Link is Ne. NeT; ne is a neighbor matrix of the neighbor set,
(4) generating a basic interest set BI:
initializing a set BI for storing basic interests, wherein each item is a cluster at first, constructing a local heap q [ i ] for a cluster i, wherein the q [ i ] comprises all clusters which are combined with the cluster of the cluster i and have metric function values not equal to 0, and arranging the clusters in a descending order according to the sizes of the metric function values combined with the clusters;
constructing a global heap Q for all clusters, wherein the global heap Q comprises the cluster with the maximum similarity in each local heap Q [ i ];
when the | Q | > M is satisfied, executing B1-B7 in a loop, otherwise executing B8;
b1: acquiring a cluster u with the maximum similarity from the global heap Q;
b2: obtaining the cluster v with the maximum similarity to the cluster u from the partial pile q [ u ] of the cluster u
B3: deleting the cluster v from the global heap Q, and combining the cluster v and the cluster u to form a new cluster w;
b4: updating a global heap Q and a local heap Q [ i ]; wherein M is the number of clusters;
b5: calculating the number of links of the new cluster w and the cluster x in the q [ u ] and the q [ v ];
b6: deleting the merged cluster u and cluster v from the local heap of cluster x;
b7: updating the local heap of the cluster x and the new cluster w, and updating the global heap;
b8: inserting the new cluster w into the global heap, releasing the local heap of the cluster u and the cluster v, and putting M clusters generated finally into a set BI; m clusters are basic interests, i.e. meta interests.
2. The recommendation method based on interest drift according to claim 1, wherein the step A2 performs secondary clustering with the M clusters in the set BI as the cluster set C to generate a basic interest tree with a hierarchical structure, specifically comprising:
step 1: initializing a list C-Sequence for storing the intermediate clustering results;
step 2: putting the cluster set C into C-Sequence;
step 3: calculating the similarity between the clusters in the cluster set C according to the ROCK goodness measure:

g(C_i, C_j) = link[C_i, C_j] / ((n_i + n_j)^(1+2f(θ)) − n_i^(1+2f(θ)) − n_j^(1+2f(θ)))

in the formula, C_i and C_j are different clusters; n_i and n_j are the numbers of samples in clusters C_i and C_j; according to the ROCK algorithm, f(θ) = (1−θ)/(1+θ); link[C_i, C_j] is the number of cross-links,

link[C_i, C_j] = Σ_{p_q ∈ C_i, p_r ∈ C_j} link(p_q, p_r)

where p_q and p_r are data points [the formula for sim[C_i, C_j] is given only as an image in the source];
step 4: merging the two clusters C_i and C_j with the greatest similarity into a new cluster C_{i-j};
step 5: updating the cluster set C, i.e., deleting C_i and C_j from it and adding the new cluster C_{i-j}, and putting the updated cluster set C into C-Sequence;
step 6: judging whether the number of elements in the cluster set C is 1; if not, returning to step 3 to continue; if yes, returning the current cluster set C and the list C-Sequence recording the intermediate results;
step 7: replaying the secondary clustering process top-down according to the cluster set C and the intermediate result list C-Sequence, generating a basic interest tree with a hierarchical structure.
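A sketch of replaying C-Sequence top-down (step 7), with hypothetical integer-indexed clusters; comparing consecutive snapshots reveals which two children were merged into which parent node:

```python
def build_interest_tree(c_sequence):
    # c_sequence[k] is the snapshot of the cluster set after the k-th
    # merge (step 5); the new cluster in each snapshot becomes a parent
    # node whose children are the two clusters it absorbed.
    tree = {}
    for prev, curr in zip(c_sequence, c_sequence[1:]):
        parents = [c for c in curr if c not in prev]   # the new merged cluster
        children = [c for c in prev if c not in curr]  # the two it absorbed
        if parents:
            tree[tuple(parents[0])] = children
    return tree

snapshots = [[[0], [1], [2]], [[0, 1], [2]], [[0, 1, 2]]]
print(build_interest_tree(snapshots))
# {(0, 1): [[0], [1]], (0, 1, 2): [[0, 1], [2]]}
```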
3. The recommendation method based on interest drift according to claim 2, wherein the method for generating the interest tree in step A3 is as follows:
step A31: acquiring the history records of the user, and arranging them in time order from near to far;
step A32: dividing the history records into periods of unequal time intervals;
step A33: acquiring the meta-interest set of user u in each period T_i (denoted MI_u^{T_i} here; the source gives the set notation only as images):
initializing the interest set MI_u^{T_i} to the empty set;
traversing the item set of user u in period T_i;
traversing the meta-interest set MI and judging whether user u possesses the meta-interest C_j in period T_i; if yes, adding the meta-interest C_j to the interest set MI_u^{T_i};
outputting the meta-interest set MI_u^{T_i};
step A34: arranging all obtained meta-interest sets MI_u^{T_i} by time period from far to near, and calculating the user's interest weight for each interest in all MI_u^{T_i};
step A35: on the constructed interest tree, modifying each node by adding to it a pointer that points to a time table, in which the interest weights of the user on the corresponding interest node in different periods are recorded;
step A36: writing the interest weights of the interests corresponding to all MI_u^{T_i} into the time table; the interest weight of a non-leaf interest in each period is the accumulated sum of the weights of its sub-interests in the corresponding period.
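A small sketch of one node's time table under an assumed exponential decay (the claims state only that a per-period decay weight is computed in step A5, not its functional form; lam and the counts are illustrative):

```python
import math

def decayed_weight(interaction_count: int, periods_ago: int, lam: float = 0.1) -> float:
    # Hypothetical exponential decay: older periods contribute less.
    return interaction_count * math.exp(-lam * periods_ago)

# time table of one interest node: one weight per period, oldest first
counts = [3, 5, 2]  # interactions in T1, T2, T3 (T3 is the most recent period)
timetable = [decayed_weight(c, len(counts) - 1 - i) for i, c in enumerate(counts)]
```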
4. The recommendation method based on interest drift according to any one of claims 1-3, wherein the step S300 comprises:
step S310: acquiring a meta-interest set of a user according to the user interest tree;
step S320: replacing the meta-interests according to parent-child relations among the meta-interests;
step S330: calculating the active time of each meta-interest in the replaced meta-interest set;
step S340: and acquiring the current interest of the user according to the meta-interest active time and the interest weight.
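A hedged sketch of the selection in steps S310-S340, assuming current interests are ranked by active time and then interest weight (the claim does not fix the exact combination rule, so this ordering is an assumption):

```python
def current_interests(meta_interests: dict, top_k: int = 3) -> list:
    # meta_interests maps an interest id to (active_time, interest_weight);
    # sort by that pair, most recently active and heaviest first.
    return sorted(meta_interests, key=meta_interests.get, reverse=True)[:top_k]
```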
5. The recommendation method based on interest drift according to claim 4, wherein the step S400 comprises:
step S410: obtaining a recommended sample according to the current interest of each user;
step S420: constructing a heterogeneous information network according to the recommended samples;
step S430: performing score prediction by using the HEFFM method, which comprises the following steps:
step S431: extracting information and performing representation learning on the nodes in the heterogeneous information network, the nodes comprising user nodes and item nodes, to acquire the low-dimensional vectors of users and items;
step S432: directly interfacing the low-dimensional vectors of users and items with the recommendation task, inputting them into a field-aware factorization machine model as recommended-sample features, performing feature selection by adding group lasso as a regularization term, and completing the score prediction between users and items;
step S440: generating a TOP-N score list, i.e., the final recommendation list.
6. The recommendation method based on interest drift according to claim 5, wherein said step S431 comprises:
step D1: generating semantic graphs according to the meta-structures, the meta-structures comprising complex meta-structures containing nonlinear structures and linear meta-structures modeling only linear relations; the specific process of generating a semantic graph comprises:
step D11: extracting user nodes and comment nodes from the Yelp information network, and establishing links between the user nodes and the comment nodes to form a heterogeneous graph HG;
step D12: finding, from the Yelp information network, pairs of comments that belong to the same user and contain the same keywords, and putting the comment pairs into a set W;
step D13: traversing the set W and establishing a link in the heterogeneous graph HG for each pair of comments in W, forming the relation R-R, so that the linear meta-structure in the heterogeneous graph HG carries the semantics of the meta-structure;
step D14: when the structure is a complex meta-structure, constructing the corresponding adjacency matrices according to the nodes and relations existing in the heterogeneous graph HG; when the structure is a linear meta-structure, generating the adjacency matrices from the original heterogeneous information network;
step D15: performing matrix operations along the linear meta-structure in the heterogeneous graph HG to generate a homogeneous matrix A_UU;
step D16: constructing a homogeneous graph SG from the homogeneous matrix A_UU, the homogeneous graph SG being the semantic graph corresponding to the corresponding complex meta-structure or linear meta-structure;
step D2: performing dynamically truncated random walks on the semantic graph, acquiring node sequences R containing both semantic information and structural information, taking the node sequences R as the input of a skip-gram model, and acquiring low-dimensional node vectors, specifically comprising:
step D21: projecting the nodes on the semantic graph back into the heterogeneous information network [notation given only as an image in the source], and calculating the node similarity matrices of the complex meta-structure CS and the linear meta-structure LS:
constructing the adjacency matrix W_UR of user nodes and comment nodes, the adjacency matrix W_RB of comment nodes and item nodes, and the adjacency matrix W_RK of comment nodes and keyword nodes;
obtaining C_1 and C_2 [formulas given only as images in the source], and from them the similarity matrix of users on the complex meta-structure CS [formula given only as an image in the source];
calculating the node similarity matrix of the linear meta-structure LS [formula given only as an image in the source];
wherein W_UR is the adjacency matrix of user nodes and comment nodes, and W_RK is the adjacency matrix of comment nodes and keyword nodes;
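A minimal sketch of the image-only LS similarity, assuming the linear meta-structure is the path U-R-K-R-U built from W_UR and W_RK (the exact matrix product in the source is not recoverable, so this chain is an assumption):

```python
import numpy as np

def linear_similarity(W_UR: np.ndarray, W_RK: np.ndarray) -> np.ndarray:
    # Chain adjacency matrices along U-R-K-R-U: users whose comments
    # share keywords end up with a large user-user similarity entry.
    W_UK = W_UR @ W_RK      # users -> keywords, via their comments
    return W_UK @ W_UK.T    # user-user similarity on the linear structure
```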
step D22: constraining the number of random walks started from each node: the number of walks started from node v is l = max(h(v) × maxL, minL), where maxL is the maximum number of walks started from a node, minL is the minimum number of walks started from a node, and h(v) is the importance of node v on the semantic graph;
step D23: the dynamically truncated random walk specifically comprises:
defining the semantic graph SG_S = (V_SG, ε_SG) of the meta-structure s, the similarity matrix SIM_S of the nodes on the meta-structure s, the maximum number of walks maxT per node, the minimum number of walks minT per node, the maximum walk length wl, and the walk stopping probability p_stop;
initializing a list sequences for storing the node sequences;
computing the node importance H = PageRank(SG_S);
D231: taking node v as the initial node, calculating the number of walks l;
D232: initializing a list sequence for storing the current node sequence, recording the current node n_now = v, and recording the maximum number of walking steps wl_t;
walking along the path to a node x and recording the transition probability p_trans [formula given only as an image in the source];
in the formula, n_x is the current node of the walk; n_i is the previous-hop node of the walk, of which n_x is a first-order neighbor; O(n_i) is the degree of node n_i;
adding node x to the list sequence and calculating the stopping probability P_{x-stop} of node x [formula given only as an image in the source];
in the formula, P_stop is a pre-specified fixed stopping probability, and sim(n_i, n_x) is the unnormalized similarity between the previous-hop node n_i and the current node n_x;
judging whether to stop at node x: if yes, ending the walk and entering the next step; otherwise, updating the walking step length wl_t ← wl_t − 1 and the current node n_now = x, then judging whether the number of walks has reached l; if so, entering the next step, otherwise returning to D232;
D233: adding the sequence to the list sequences, and judging whether all nodes have been processed; if yes, entering the next step, otherwise returning to D231;
D234: outputting the list sequences of node sequences;
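A runnable sketch of steps D22-D23, assuming networkx for PageRank; the uniform transition 1/O(n_i) over neighbors and the similarity-modulated stop rule stand in for the image-only formulas, so both are assumptions:

```python
import random
import networkx as nx

def dynamic_truncated_walks(G, sim, max_l, min_l, wl, p_stop):
    # h(v) from PageRank drives the per-node walk count l (step D22);
    # each walk hops to a uniformly random neighbor and may stop early.
    h = nx.pagerank(G)
    sequences = []
    for v in G.nodes:
        for _ in range(max(int(h[v] * max_l), min_l)):
            seq, cur = [v], v
            for _ in range(wl - 1):
                nbrs = list(G.neighbors(cur))
                if not nbrs:
                    break
                nxt = random.choice(nbrs)  # uniform transition, 1/O(n_i)
                seq.append(nxt)
                # assumed rule: similar consecutive nodes lower the stop chance
                if random.random() < p_stop / (1 + sim.get((cur, nxt), 0)):
                    break
                cur = nxt
            sequences.append(seq)
    return sequences
```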
step D24: representation learning: sampling neighbors from the output node sequences through a fixed-length window to obtain the neighbor set of each node, and optimizing the representation learning with the following formula:

max_f Σ_{u ∈ V} log P(N_u | f(u))

in the formula, f: V → R^d is the mapping function that embeds nodes into a d-dimensional feature space, and N_u ⊆ V is the neighborhood of node u under the specified meta-structure;
step D25: obtaining the node sequences R by the dynamically truncated random walk:
R = DynamicTruncatedRandomWalk(SG_S, SIM_S, maxT, minT, wl, p_stop),
taking R as the input of the skip-gram model, and obtaining the low-dimensional node vectors Φ = skip-gram(d, winL, R).
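Step D25's skip-gram call maps naturally onto gensim's Word2Vec with sg=1; a sketch under that assumption (winL is the window length, d the embedding dimension):

```python
from gensim.models import Word2Vec

def skip_gram(d, win_l, R):
    # Treat each walk as a sentence of node ids and train skip-gram.
    model = Word2Vec(sentences=[[str(n) for n in seq] for seq in R],
                     vector_size=d, window=win_l, sg=1, min_count=1)
    return model.wv  # low-dimensional node vectors Φ
```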
7. The recommendation method based on interest drift according to claim 6, wherein, if direct connection relations exist between users in the heterogeneous information network, the method further comprises step D3: correcting the user vectors, specifically comprising:
step D31: specifying a set of users U, and defining triplets ⟨u, u_i, u_j⟩ on this basis, where u ∈ U denotes the target user, and u_i ∈ U and u_j ∈ U are a direct neighbor and an indirect neighbor of user u, respectively [the membership conditions are given only as images in the source], N_u denoting the neighbor set of user u under the meta-structure; all triplets meeting the above requirements form the training data D_u of user u, and the training data of all users constitutes the training data set D under the meta-structure, which is used for the vector correction; the symbol >_u is defined to represent the preference relationship of user u over its neighbors, i.e., a triplet ⟨u, u_i, u_j⟩ can be written as u_i >_u u_j;
step D32: initializing the training data set D; obtaining the neighbor set N_u of user u under the meta-structure, the direct neighbor set DN_u of user u, and the indirect neighbor set IN_u of user u;
step D33: adding the triplets consisting of the target user, a direct neighbor and an indirect neighbor to the training data set;
step D34: updating the parameters according to the iterative formulas of the gradient ascent algorithm [formulas given only as images in the source];
step D35: iterating until the user vector matrix converges, and outputting the corrected user vector matrix.
8. The recommendation method based on interest drift according to claim 5, wherein the step S432 specifically comprises:
step F1: score prediction:
referring to the observed scores between users and items in the data set, concatenating the corresponding vector representations, and taking the concatenated vector as a new recommended sample x_n:

x_n = [u_i^1, ..., u_i^L, b_j^1, ..., b_j^L]

in the formula, u_i^l and b_j^l are the vector representations of user u_i and item b_j on the l-th meta-structure, and d is the dimension of each vector;
step F2: calculating the score with the FFM model:

ŷ_n = w_0 + Σ_{i=1}^{M} w_i x_i + Σ_{i=1}^{M} Σ_{j=i+1}^{M} ⟨V_{i,f_j}, V_{j,f_i}⟩ x_i x_j

in the formula, w_0 is the global bias; w_i is the weight of the i-th feature; ⟨V_{i,f_j}, V_{j,f_i}⟩ is the weight of the combined feature formed by the i-th and j-th features, V_{i,f_j} being the latent vector of feature i for the field f_j of feature j; the parameter M is the dimension of the sample x_n, i.e., M = 2L × d;
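A minimal sketch of the FFM score above; the field assignment array and dense numpy storage of the latent vectors are illustrative choices:

```python
import numpy as np

def ffm_score(x, field, w0, w, V):
    # x: sample of dimension M = 2L*d; field[i]: field id of feature i;
    # V[i, f]: latent vector of feature i for field f.
    score = w0 + w @ x
    nz = np.nonzero(x)[0]  # only non-zero features interact
    for a in range(len(nz)):
        for b in range(a + 1, len(nz)):
            i, j = nz[a], nz[b]
            score += V[i, field[j]] @ V[j, field[i]] * x[i] * x[j]
    return score
```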
step F3: parameter learning:
learning with the minimum mean-square error gives the objective function

min_{w,V} Σ_{n=1}^{N} (ŷ_n − y_n)^2

in the formula, y_n is the actual score of the n-th sample, and N is the number of samples;
a group lasso, which can be used to select features, is introduced into the objective function; the group lasso regularization of a parameter p has the expression

Φ(p) = Σ_{g=1}^{G} ||p_g||_2

in the formula, p_g, g = 1, 2, ..., G are the groups into which p is divided, and ||·||_2 is the l_2 norm;
the features of sample x_n generated by the same meta-structure are put into the same group, so the features of sample x_n are divided into 2L groups, with the following regularization formulas for the parameters w and V, respectively:

Φ(w) = Σ_{l=1}^{2L} ||w_l||_2
Φ(V) = Σ_{l=1}^{2L} ||V_l||_F

in the formula, w_l is a vector of dimension d; V_l is the matrix formed by the latent vectors of the l-th meta-structure's features over all fields; ||·||_F is the Frobenius norm of a matrix;
combining the objective function and the regularization formulas, the optimization objective can be transformed into

min_{w,V} Σ_{n=1}^{N} (ŷ_n − y_n)^2 + λ_w Φ(w) + λ_V Φ(V)

where λ_w and λ_V are regularization coefficients; the model is optimized with the non-monotonic accelerated proximal gradient algorithm nmAPG, and the optimized feature selection is output.
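With group lasso, the proximal step inside an accelerated proximal gradient loop such as nmAPG is a group soft-threshold; a small sketch of that operator (the step size tau and per-group application are standard for group lasso, though the claim states only that nmAPG is used):

```python
import numpy as np

def group_soft_threshold(w_group: np.ndarray, tau: float) -> np.ndarray:
    # Shrink the group's l2 norm by tau; groups whose norm falls below
    # tau are zeroed entirely, deselecting all features of that
    # meta-structure at once.
    norm = np.linalg.norm(w_group)
    if norm <= tau:
        return np.zeros_like(w_group)
    return (1 - tau / norm) * w_group
```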
CN202110512435.4A 2021-05-11 2021-05-11 Recommendation method based on interest drift Active CN113239271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110512435.4A CN113239271B (en) 2021-05-11 2021-05-11 Recommendation method based on interest drift


Publications (2)

Publication Number Publication Date
CN113239271A (en) 2021-08-10
CN113239271B (en) 2022-03-15

Family

ID=77133484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110512435.4A Active CN113239271B (en) 2021-05-11 2021-05-11 Recommendation method based on interest drift

Country Status (1)

Country Link
CN (1) CN113239271B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113836397B (en) * 2021-09-02 2024-07-12 桂林电子科技大学 Recommendation method for personalized feature modeling of shopping basket
CN114491095B (en) * 2022-01-18 2024-10-01 南京大学 Method for recommending items by using potential factor model based on time sequence drift
CN114780861B (en) * 2022-06-20 2022-10-21 上海二三四五网络科技有限公司 Clustering technology-based user multi-interest recommendation method, device, equipment and medium


Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9710470B2 (en) * 2013-09-09 2017-07-18 International Business Machines Corporation Social recommendation across heterogeneous networks

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN103955535A (en) * 2014-05-14 2014-07-30 南京大学镇江高新技术研究院 Individualized recommending method and system based on element path
CN106537376A (en) * 2014-06-06 2017-03-22 诺基亚技术有限公司 Method and apparatus for recommendation by applying efficient adaptive matrix factorization

Non-Patent Citations (3)

Title
"Modeling user interests by conceptual clustering";Godoy D ET AL.;《Information Systems》;20061231;第247-265页 *
"一种多维多粒度用户兴趣模型研究";陈辉;《小型微型计算机系统》;20171231;第38卷(第12期);第2785-2790页 *
"一种检测兴趣漂移的元路径推荐模型";石磊 等;《小型微型计算机系统》;20190331;第40卷(第3期);第612-617页 *

Also Published As

Publication number Publication date
CN113239271A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN113239271B (en) Recommendation method based on interest drift
JP7105789B2 (en) Machine learning method and apparatus for ranking network nodes after using a network with software agents at the network nodes
CN106547809B (en) Representing compound relationships in a graph database
US8949233B2 (en) Adaptive knowledge platform
Chen et al. General functional matrix factorization using gradient boosting
US7860817B2 (en) System, method and computer program for facet analysis
CN101432684B (en) Method and apparatus for efficient indexed storage for unstructured content
US10755179B2 (en) Methods and apparatus for identifying concepts corresponding to input information
US20100049766A1 (en) System, Method, and Computer Program for a Consumer Defined Information Architecture
CN113190754A (en) Recommendation method based on heterogeneous information network representation learning
CN112182424A (en) Social recommendation method based on integration of heterogeneous information and isomorphic information networks
Huang et al. Neural embedding collaborative filtering for recommender systems
US20240028997A1 (en) Method and System for Automatically Managing and Displaying a Visual Representation of Workflow Information
CN113254630A (en) Domain knowledge map recommendation method for global comprehensive observation results
Vahedian et al. Weighted random walk sampling for multi-relational recommendation
US20230306033A1 (en) Dashboard for monitoring current and historical consumption and quality metrics for attributes and records of a dataset
CN112685452B (en) Enterprise case retrieval method, device, equipment and storage medium
Deng et al. Label propagation on k-partite graphs with heterophily
Sun et al. NCGAN: A neural adversarial collaborative filtering for recommender system
CN116975346A (en) Method, apparatus, device, medium and program product for generating tag map data
Zhiyuli et al. Hsem: highly scalable node embedding for link prediction in very large-scale social networks
US20230289696A1 (en) Interactive tree representing attribute quality or consumption metrics for data ingestion and other applications
CN114996490A (en) Movie recommendation method, system, storage medium and device
US20230289839A1 (en) Data selection based on consumption and quality metrics for attributes and records of a dataset
CN116049566A (en) Object representation method, apparatus, device, storage medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant