
CN113239271B - Recommendation method based on interest drift - Google Patents


Info

Publication number
CN113239271B
Authority
CN
China
Prior art keywords
interest
user
node
cluster
nodes
Prior art date
Legal status
Active
Application number
CN202110512435.4A
Other languages
Chinese (zh)
Other versions
CN113239271A (en)
Inventor
李川 (Li Chuan)
陈荣 (Chen Rong)
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202110512435.4A
Publication of CN113239271A
Application granted
Publication of CN113239271B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G06F16/9027 Trees
    • G06F16/906 Clustering; Classification


Abstract

The invention discloses a recommendation method based on interest drift, which comprises the following steps: acquiring a recommendation data set with time information and item attribute information; constructing a user interest tree and introducing time decay into it to create a user interest-drift model; selecting the user's current interests; and generating recommendations with a recommendation method based on heterogeneous information network representation learning. The invention first proposes a hierarchical clustering method based on content information and collaborative information to construct a basic interest model, alleviating problems of existing methods such as biased interests and insufficient interest mining. Second, taking the basic interest model as a template, an interest tree is generated from each user's history, and a multi-granularity time table and an interest-weight update mechanism are introduced to simulate how the user's interests change, realizing the modeling of interest drift. Finally, recommendation samples are selected according to a current-interest selection scheme and the final recommendation result is generated, so that the recommendation results better match reality, are more accurate, and mine the available information more fully.

Description

Recommendation method based on interest drift
Technical Field
The invention relates to the technical field of recommendation, in particular to a recommendation method based on interest drift.
Background
In a recommendation scenario, user interests are never static. For the items in a recommendation system, some characteristics are inherent and rarely change (for a movie item: its genre, country, director, and so on), while other attributes, such as users' awareness of the item and the item's popularity, vary with the passage of time; a newly released movie, for example, always receives more attention than an old movie of the same category. User interests likewise change over time under the influence of the users themselves and of external factors: a user who historically preferred comedy movies may develop a strong interest in foreign science-fiction movies because of their recent popularity and the enthusiastic discussion and strong recommendations of friends, and a user who preferred entertainment news as a student may pay more attention to financial or social news with increasing age. Recommendation systems refer to this phenomenon, in which user interests change dynamically over time, as interest drift. Interest drift is common in recommendation systems; it can be said that every user of a recommendation system undergoes interest transitions anytime and anywhere. A recommendation system with good performance needs to capture the interest differences between users and must also be able to adapt to changes in each user's interests. Therefore, for such complex systems, which must provide instant and accurate personalized services based primarily on user preferences, research on recommendation methods that can adapt to user interest drift is significant.
Building a user interest model is a common way to model interest drift. Such methods often use subjective user feedback such as scores and comments. However, generating an interest model purely from the user's perspective raises two non-negligible problems. First, the formation of user interests in a recommendation system is closely related to item characteristics; constructing the interest model only from the user's perspective easily biases the interests and can cause a mismatch with the mainstream characteristics of the items. Second, users' comment and score data are often sparse, so the interest set in the interest model cannot cover all of a user's potential interest points for a class of items, and the interest mining is insufficient.
Disclosure of Invention
The invention aims to provide a recommendation method based on interest drift, to solve the problems in the prior art that recommendation systems and methods do not consider user interest drift and that biased and insufficient interest mining leads to poor recommendation results.
The invention solves the problems through the following technical scheme:
a recommendation method based on interest drift comprises the following steps:
step S100: acquiring a recommendation data set with time information and item attribute information;
step S200: constructing a user interest tree, and introducing time attenuation into the user interest tree to create a user interest drift model;
step S300: selecting a current interest of a user;
step S400: and generating recommendations by combining the recommendation method of the heterogeneous information network representation learning.
According to the invention, a user interest tree is built from the recommendation data set and time decay is introduced into it, so that the change of user interests over time is taken into account when building the interest-drift model. This captures the dynamic change of user interests, and recommendations are generated according to the user's current interests, so that the recommendation results better match the real situation and the problems of biased interests and insufficient mining in existing methods are alleviated.
The step S200 includes:
step A1: acquiring a basic interest set;
step A2: constructing a basic interest tree;
step A3: generating a user interest tree, and introducing a time table into the interest tree;
step A4: setting a time division sequence, and acquiring interest sets of users in different periods;
step A5: calculating the attenuation weight of each interest in the interest tree in different periods;
step A6: writing the decay weights into the time table in chronological order.
The step A1 specifically includes:
(1) computing the item similarity matrix:

[equation image: sim(i, j), defined from ES_ij, IS_ij and CS_ij]

wherein i and j are different items; ES_ij is the explicit similarity,

[equation image: ES_ij]

R_i, R_j are the rating sets of items i and j; IS_ij is the implicit similarity,

[equation image: IS_ij]

U_i, U_j are the sets of users who have interacted with items i and j; CS_ij is the content similarity,

[equation image: CS_ij]

T_i is the attribute tag set of item i; T_j is the attribute tag set of item j;
(2) normalization of sim: traversing the item similarity matrix and setting the corresponding position of the neighbor matrix to 1 where the similarity is greater than or equal to the similarity threshold, and to 0 where it is below the threshold;
(3) computing the item link matrix Link = Ne · Ne^T, where Ne is the neighbor matrix;
(4) generating the basic interest set BI:
initializing the set BI for storing basic interests, where initially each item is a cluster of its own; for each cluster i, constructing a local heap q[i] that contains every cluster whose merge metric-function value with cluster i is non-zero, arranged in descending order of that metric value;
constructing a global heap Q over all clusters, where Q contains the most similar cluster from each local heap q[i];
while |Q| > M holds, executing B1-B7 in a loop, otherwise executing B8;
B1: acquiring the cluster u with the maximum similarity from the global heap Q;
B2: acquiring the cluster v with the maximum similarity to cluster u from the local heap q[u];
B3: deleting cluster v from the global heap Q, and merging clusters v and u into a new cluster w;
B4: updating the global heap Q and the local heaps q[i]; wherein M is the number of clusters;
B5: calculating the number of links between the new cluster w and every cluster x appearing in q[u] and q[v];
B6: deleting the merged clusters u and v from the local heap of each such cluster x;
B7: updating the local heaps of cluster x and the new cluster w, and updating the global heap;
B8: inserting the new cluster w into the global heap, releasing the local heaps of clusters u and v, and putting the M finally generated clusters into the set BI; the M clusters are the basic interests, i.e. the meta-interests.
The step A2 performs secondary clustering with the M clusters in the set BI as the cluster set C to generate a basic interest tree with a hierarchical structure, and specifically includes:
step 1: initializing the list C-Sequence for storing the intermediate clustering results;
step 2: putting the cluster set C into C-Sequence;
step 3: calculating the similarity between every pair of clusters in the cluster set C:

[equation image: cluster similarity measure over C_i and C_j]

where C_i, C_j are different clusters; n_i, n_j are the numbers of samples in C_i, C_j; according to the ROCK algorithm, f(θ) = (1 - θ)/(1 + θ); and link[C_i, C_j] and sim[C_i, C_j] are given respectively by

link[C_i, C_j] = Σ_{p_q ∈ C_i} Σ_{p_r ∈ C_j} link[p_q, p_r]

[equation image: sim[C_i, C_j]]

where p_q, p_r are data points;
step 4: merging the two most similar clusters C_i and C_j into a new cluster C_{i-j};
step 5: updating the cluster set C, i.e. deleting C_i and C_j, adding the new cluster C_{i-j}, and putting the updated cluster set C into C-Sequence;
step 6: judging whether the number of elements in the cluster set C is 1; if not, returning to step 2 to continue; if yes, returning the current cluster set C and the list C-Sequence recording the intermediate results;
step 7: replaying the secondary clustering process top-down according to the cluster set C and the intermediate-result list C-Sequence, and generating a basic interest tree with a hierarchical structure.
The method for generating the interest tree in step A3 is as follows:
step A31: acquiring the user's history records and arranging them in time order from the most recent to the most distant;
step A32: dividing the history records into periods of unequal time intervals;
step A33: acquiring the user's meta-interest set for each period: initializing the interest set of user u for period T_i; traversing the item set of user u in period T_i; for each item, traversing the meta-interest set MI and judging whether user u possesses the meta-interest C_j in period T_i, and if so, adding C_j to the interest set; outputting the resulting meta-interest set of user u for period T_i;
step A34: arranging all obtained meta-interest sets from the most distant period to the most recent, and calculating the user's interest weight for every interest in each of them;
step A35: on the constructed interest tree, modifying each node by adding to it a pointer to a time table, which records the user's interest weights on the corresponding interest node in different periods;
step A36: writing the interest weights of the corresponding interests of all periods into the time table, where the interest weight of a non-leaf interest in each period is determined by the accumulated sum of the weights of its sub-interests in the corresponding period.
The step S300 includes:
step S310: acquiring a meta-interest set of a user according to the user interest tree;
step S320: replacing the meta-interests according to parent-child relations among the meta-interests;
step S330: calculating the active time of each meta-interest in the replaced meta-interest set;
step S340: and acquiring the current interest of the user according to the meta-interest active time and the interest weight.
The step S400 includes:
step S410: obtaining a recommended sample according to the current interest of each user;
step S420: constructing a heterogeneous information network according to the recommended samples;
step S430: performing score prediction by using a HEFFM method, wherein the method comprises the following steps:
step S431: information extraction: performing representation learning on the nodes in the heterogeneous information network, where the nodes comprise user nodes and item nodes, and acquiring low-dimensional vectors of users and items;
step S432: directly connecting the low-dimensional vectors of users and items to the recommendation task, inputting them into the field-aware factorization machine model as recommendation-sample features, selecting features by adding group lasso as the regularization term, and completing the score prediction between users and items;
Step S440: generating a TOP-N score list, namely the final recommendation list.
The step S431 includes:
step D1: generating a semantic graph according to the meta-structure, where the meta-structures comprise complex meta-structures containing non-linear structure and linear meta-structures modeling only linear relations; the specific process of generating the semantic graph is:
step D11: extracting user nodes and comment nodes from the Yelp information network and establishing links between them to form a heterogeneous graph HG;
step D12: finding comment pairs in the Yelp information network that belong to the same user and contain the same keywords, and putting these comment pairs into a set W;
step D13: traversing the set W and establishing links for the comments in W in the heterogeneous graph HG to form the relation R-R, so that the linear meta-structure in HG carries the semantics of the meta-structure;
step D14: for a complex meta-structure, constructing the corresponding adjacency matrices from the nodes and relations present in the heterogeneous graph HG; for a linear meta-structure, generating the adjacency matrices from the original heterogeneous information network;
step D15: performing matrix multiplication along the linear meta-structure in the heterogeneous graph HG to generate the homogeneous matrix A_UU;
step D16: constructing a homogeneous graph SG from the homogeneous matrix A_UU, where SG is the semantic graph corresponding to the respective complex meta-structure or linear meta-structure;
step D2: dynamically truncated random walk on the semantic graph: acquiring a node sequence R containing both semantic and structural information, and taking R as the input of a skip-gram model to obtain low-dimensional node vectors, specifically:
step D21: projecting the nodes on the semantic graph into the heterogeneous information network and calculating the node similarity matrices of the complex meta-structure CS and the linear meta-structure LS:
constructing the adjacency matrix W_UR of user nodes and comment nodes, the adjacency matrix W_RB of comment nodes and item nodes, and the adjacency matrix W_RK of comment nodes and keyword nodes;
obtaining C_1 and C_2, where

[equation image: C_1 and C_2 in terms of the adjacency matrices]

the similarity matrix of the users on the complex meta-structure CS is

[equation image: SIM^CS]

and the node similarity matrix of the linear meta-structure LS is calculated as

[equation image: SIM^LS]

where W_UR is the adjacency matrix of user nodes and comment nodes and W_RK is the adjacency matrix of comment nodes and keyword nodes;
step D22: constraining the number of random walks started from each node: the number of walks started from node v is l = max(h(v) × maxL, minL), where maxL is the maximum number of walks started from a node, minL is the minimum number of walks started from a node, and h(v) is the importance of node v on the semantic graph;
step D23: the dynamically truncated random walk specifically comprises:
defining the semantic graph SG_s of a meta-structure s, the similarity matrix SIM_s of the nodes on the meta-structure s, the maximum number of walks maxT per node, the minimum number of walks minT per node, the maximum walk length wl, and the walk stopping probability p_stop;
initializing the list sequences for storing the node sequences;
computing the node importance H = PageRank(SG_s);
E1: taking node v as the start node and calculating its number of walks l;
E2: initializing a list sequence for storing the current node sequence, recording the current node n_now = v, and recording the maximum remaining walk length wl_t;
walking along the walk path to a node x and recording the transition probability P_trans of the walk path:

[equation image: P_trans]

where n_x is the current node of the walk path, n_i is the previous-hop node of the walk path and n_x is one of its first-order neighbors, and O(n_i) is the degree of node n_i;
adding node x to the list sequence and calculating the stopping probability P_x-stop at node x:

[equation image: P_x-stop]

where P_stop is a pre-specified fixed stopping probability and sim(n_i, n_x) is the unnormalized similarity between the previous-hop node n_i and the current node n_x;
judging whether to stop at node x: if so, ending the walk and proceeding to the next step; otherwise updating the remaining walk length wl_t ← wl_t - 1 and the current node n_now, then judging whether the number of walks has reached l; if so, proceeding to the next step, otherwise returning to E2;
E3: adding the current walk sequence to the list sequences and judging whether all nodes have been processed; if so, proceeding to the next step, otherwise returning to E1;
E4: outputting the list sequences of node sequences;
step D24: representation learning: sampling neighbors from the output node sequences through a fixed-length window to obtain the neighbor set of each node, and optimizing the representation learning with

max_Φ Σ_{u ∈ V} log Pr(N_s(u) | Φ(u))

where Φ is the mapping function that embeds nodes into a d-dimensional feature space and N_s(u) is the neighbor set of node u under the designated meta-structure;
step D25: obtaining the node sequence R by the dynamically truncated random walk:

R = DynamicTruncatedRandomWalk(SG_S, SIM_S, maxT, minT, wl, p_stop)

and taking R as the input of the skip-gram model to obtain the low-dimensional node vectors Φ = skip-gram(d, winL, R).
If direct connection relations exist between users in the heterogeneous information network, the method further includes step D3: correcting the user vectors, which specifically comprises:
step D31: specifying the user set U and defining on it triplets <u, u_i, u_j>, where u ∈ U denotes the target user, and u_i ∈ U and u_j ∈ U are a direct neighbor and an indirect neighbor of user u respectively, satisfying

[equation image: membership constraints on u_i and u_j in terms of the neighbor sets]

where N_u denotes the neighbor set of user u under the given meta-structure; all triplets meeting these requirements form the training data of user u, and the training data of all users constitute the training data set D of the meta-structure, which is used for the vector correction; the symbol >_u is defined to represent the preference relation of user u over its neighbors, i.e. the triplet <u, u_i, u_j> can be written as u_i >_u u_j;
step D32: initializing the training data set D; obtaining the neighbor set N_u of user u under the meta-structure, the direct neighbor set DN_u of user u, and the indirect neighbor set IN_u of user u;
step D33: adding the triplets composed of a target user, a direct neighbor and an indirect neighbor to the training data set;
step D34: updating the parameters according to the iterative formulas of a gradient ascent algorithm:

[equation images: gradient-ascent parameter updates]

step D35: iterating until the user vector matrix converges, and outputting the corrected user vector matrix.
The step S432 specifically includes:
step F1: score prediction:
for each observed score between a user and an item in the data set, splicing the corresponding user and item vectors and taking the spliced vector as a new recommendation sample x_n:

[equation image: x_n as the concatenation of the user and item vectors over the L meta-structures]

where u_i^l and b_j^l denote the vector representations of user u_i and item b_j on the l-th meta-structure, and d is the dimension of each vector;
step F2: calculating the score with the FFM model:

ŷ(x_n) = w_0 + Σ_{i=1}^{M} w_i x_i + Σ_{i=1}^{M} Σ_{j=i+1}^{M} ⟨V_{i,f_j}, V_{j,f_i}⟩ x_i x_j

where w_0 is the global bias, w_i is the weight of the i-th feature, ⟨V_{i,f_j}, V_{j,f_i}⟩ is the weight of the combined feature formed by the i-th and j-th features, and the parameter M is the feature dimension of sample x_n, i.e. M = 2L × d;
step F3: parameter learning:
obtaining the objective function by minimum-mean-square-error learning:

min Σ_{n=1}^{N} (ŷ_n - y_n)²

where y_n is the actual score of the n-th sample and N is the number of samples;
introducing into the objective function a group lasso that can be used to select features; the group-lasso regularization of a parameter p has the expression

Ω(p) = Σ_{g=1}^{G} ||p_g||_2

where p_g, g = 1, 2, ..., G, are the parameter groups and ||·||_2 is the l2 norm;
the features of sample x_n generated by the same meta-structure are put into the same group, so the features of x_n are divided into 2L groups, and the regularization terms for the parameters w and V are respectively

Ω(w) = Σ_{l=1}^{2L} ||w_l||_2

Ω(V) = Σ_{l=1}^{2L} ||V_l||_F

where w_l is a vector of dimension d, V_l is the matrix formed by the latent vectors of the features of the l-th meta-structure over all fields, and ||·||_F is the Frobenius norm of a matrix;
combining the objective function with the regularization terms, the optimization objective becomes

[equation image: the regularized optimization objective]

The model is optimized with the non-monotone accelerated proximal gradient algorithm nmAPG, and the optimized feature selection is output.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The method first takes item characteristics as guidance and proposes a hierarchical clustering method based on content information and collaborative information; a basic interest model is constructed from the clustering result, achieving a tight unification of user interests and item characteristics and alleviating the problems of biased interests and insufficient mining in existing methods. Second, taking the basic interest model as a template, a unique interest tree is generated for each user from that user's history; a multi-granularity time table based on skewed time division and an interest-weight update mechanism based on time decay are introduced into the user interest tree to simulate the change of user interests, realizing the modeling of interest drift. Finally, a current-interest selection scheme based on time information and interest preference is proposed, so that the selected current interests are both recent and important; recommendation samples are selected according to the current interests, and the final recommendation is generated by a recommendation method based on heterogeneous information network representation learning, so that the recommendation results better match the actual situation, are more accurate, and mine the information more fully.
(2) The invention captures the dynamic change of user interests by dividing the user's time-stamped history into skewed periods, introducing a multi-granularity time table into the user interest tree, using the time table to record the user's preference weight for each interest in different periods, and modeling the drift process with a weight update mechanism based on time decay. The user thus not only has a list of interests for each period but can also perceive the change of each interest.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
Example:
With reference to FIG. 1, a recommendation method based on interest drift comprises interest model construction, interest drift modeling, current interest selection, and recommendation generation, where the interest model construction is completed by generating the basic interests and representing the user interests.
The recommendation interests refer to all possible interests of a user in the specific type of item; the basic interests are regarded as the template of the user interests, and all of a user's interests are derived from the basic interests.
Step S100: acquiring a recommendation data set with time information and item attribute information;
step S200: constructing a user interest tree, introducing time attenuation into the user interest tree to create a user interest drift model:
step A1: acquiring the basic interest set:
(1) computing the item similarity matrix:

[equation image: sim(i, j), defined from ES_ij, IS_ij and CS_ij]

wherein i and j are different items; ES_ij is the explicit similarity,

[equation image: ES_ij]

R_i, R_j are the rating sets of items i and j; IS_ij is the implicit similarity,

[equation image: IS_ij]

U_i, U_j are the sets of users who have interacted with items i and j; CS_ij is the content similarity,

[equation image: CS_ij]

T_i is the attribute tag set of item i; T_j is the attribute tag set of item j;
(2) normalization of sim: traversing the item similarity matrix and setting the corresponding position of the neighbor matrix to 1 where the similarity is greater than or equal to the similarity threshold, and to 0 where it is below the threshold;
(3) computing the item link matrix Link = Ne · Ne^T, where Ne is the neighbor matrix;
(4) generating the basic interest set BI:
Input: item set B, item link matrix Link, item similarity matrix Sim, cluster number M
Output: basic interest set BI
The method comprises the following steps:

[algorithm pseudocode image: heap-based agglomerative clustering producing BI; see the sketch below]
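The pseudocode above survives only as an image. As a rough illustration of the heap-driven agglomerative loop of steps B1-B8, the following Python sketch re-implements it under stated assumptions: plain sorted scans stand in for the local and global heaps, the merge metric `goodness` is left abstract, and item ids are assumed orderable; all names are illustrative rather than the patent's.

```python
# Minimal sketch of the basic-interest clustering loop (steps B1-B8).
# Assumptions: goodness(set_i, set_j, link) is the ROCK-style merge metric;
# plain scans stand in for the local/global heaps of the patent.

def cluster_basic_interests(items, link, goodness, M):
    # Each item starts as its own cluster (step (4) initialization).
    clusters = {i: {i} for i in items}

    def best_partner(ci):
        # Local heap q[ci]: partners with non-zero metric, best first.
        cands = [(goodness(clusters[ci], clusters[cj], link), cj)
                 for cj in clusters if cj != ci]
        cands = [c for c in cands if c[0] > 0]
        return max(cands) if cands else None

    while len(clusters) > M:
        # Global heap Q: the cluster whose best merge is globally best (B1/B2).
        global_best = None
        for ci in clusters:
            p = best_partner(ci)
            if p and (global_best is None or p[0] > global_best[0]):
                global_best = (p[0], ci, p[1])   # (metric, u, v)
        if global_best is None:
            break                                # no mergeable pairs left
        _, u, v = global_best
        # B3: merge u and v into a new cluster w; the heaps are rebuilt
        # lazily on the next iteration instead of updated in place (B4-B7).
        clusters[u] = clusters[u] | clusters[v]
        del clusters[v]

    return list(clusters.values())               # B8: the M basic interests
```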
step A2: performing secondary clustering with the M clusters in the set BI as the cluster set C to generate a basic interest tree with a hierarchical structure:
step A21: initializing the list C-Sequence for storing the intermediate clustering results;
step A22: putting the cluster set C into C-Sequence;
step A23: calculating the similarity between every pair of clusters in the cluster set C:

[equation image: cluster similarity measure over C_i and C_j]

where C_i, C_j are different clusters; n_i, n_j are the numbers of samples in C_i, C_j; according to the ROCK algorithm, f(θ) = (1 - θ)/(1 + θ); and link[C_i, C_j] and sim[C_i, C_j] are given respectively by

link[C_i, C_j] = Σ_{p_q ∈ C_i} Σ_{p_r ∈ C_j} link[p_q, p_r]

[equation image: sim[C_i, C_j]]

where p_q, p_r are data points;
step A24: merging the two most similar clusters C_i and C_j into a new cluster C_{i-j};
step A25: updating the cluster set C, i.e. deleting C_i and C_j, adding the new cluster C_{i-j}, and putting the updated cluster set C into C-Sequence;
step A26: judging whether the number of elements in the cluster set C is 1; if not, returning to step A22 to continue; if yes, returning the current cluster set C and the list C-Sequence recording the intermediate results.
In each iteration, the two most similar clusters are selected and merged until only 1 cluster remains in the cluster set C, so the total number of iterations is M - 1. Because M is kept within a small range, the number of iterations is low and the secondary clustering result can be obtained in a short time;
step A27: replaying the secondary clustering process top-down according to the cluster set C and the intermediate-result list C-Sequence, generating a clustering tree with a hierarchical structure, i.e. the basic interest tree:
Input: cluster set C, intermediate-result list C-Sequence
Output: root node root of the basic interest tree
The method comprises the following steps:

[algorithm pseudocode image: top-down replay of the recorded merges; see the sketch below]
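The tree-construction pseudocode is likewise preserved only as an image. Below is a minimal Python sketch of steps A21-A26 and the top-down replay of A27, assuming an abstract pairwise measure `cluster_sim`, clusters represented as frozensets, and a simple node class standing in for the patent's tree representation.

```python
# Sketch of secondary clustering (A21-A26) and tree replay (A27).
# cluster_sim(ci, cj) is the goodness measure from step A23, left abstract.
import itertools

def secondary_clustering(BI, cluster_sim):
    C = [frozenset(c) for c in BI]
    c_sequence = [list(C)]                        # A21/A22: record each state
    while len(C) > 1:
        # A23/A24: pick and merge the most similar pair.
        ci, cj = max(itertools.combinations(C, 2),
                     key=lambda p: cluster_sim(*p))
        C = [c for c in C if c not in (ci, cj)] + [ci | cj]
        c_sequence.append(list(C))                # A25
    return C, c_sequence                          # A26

class Node:
    def __init__(self, members):
        self.members, self.children = members, []

def build_interest_tree(c_sequence):
    # A27: replay the merges top-down; each merged cluster becomes a parent
    # whose children are the two clusters it absorbed.
    root = Node(next(iter(c_sequence[-1])))
    frontier = {root.members: root}
    for prev, cur in zip(reversed(c_sequence[:-1]), reversed(c_sequence[1:])):
        merged = [c for c in cur if c not in prev]   # the newly merged cluster
        parts = [c for c in prev if c not in cur]    # the two clusters it absorbed
        for m in merged:
            parent = frontier[m]
            for p in parts:
                child = Node(p)
                parent.children.append(child)
                frontier[p] = child
    return root
```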
the present invention treats each node, i.e., each cluster, in the underlying interest tree as an interest that a user may have on such items, where leaf clusters are referred to as meta-interests. According to the tree structure, interests can be further divided into fine and coarse categories, wherein fine interests can be used for capturing fine changes of user interests, and coarse interests can reflect large changes of user interests.
The interest model of the user is constructed by taking a basic interest tree which is closely related to the characteristics of the item as a template:
step A3: generating the user interest tree and introducing a time table into it:
step A31: acquiring the user's history records and arranging them in time order from the most recent to the most distant;
step A32: dividing the history records into periods of unequal time intervals;
step A33: acquiring the user's meta-interest set for each period:
Input: the item set of user u in period T_i, the meta-interest set MI
Output: the interest set of user u in period T_i
The method comprises the following steps:

[algorithm pseudocode image: traverse the items of user u in T_i and add every meta-interest C_j that u possesses in T_i to the interest set]

step A34: arranging all obtained meta-interest sets from the most distant period to the most recent, and calculating the user's interest weight for every interest in each of them;
step A35: constructing the interest tree:
Input: cluster set C, intermediate-result list C-Sequence
Output: root node root of the basic interest tree
The method comprises the following steps:

[algorithm pseudocode image]
On the constructed interest tree, each node is modified by adding to it a pointer to a time table, which records the user's interest preference for the corresponding interest node in different periods. An interest preference is the description of an interest C_j and its weight w_j^{T_i} for the user in a certain period T_i, recorded as the pair (C_j, w_j^{T_i}) with the formal definition:

[equation image: formal definition of the time table record]

where w_j^{T_i} is the interest weight of interest C_j, implying the user's degree of preference for C_j.
Regarding the interest weight w_j^{T_i}, a user's preference for an interest should be reflected by the user's interactions with items: the more interactions the user has in a period T_i with items belonging to C_j, the higher the user's interest in C_j and the larger the interest weight w_j^{T_i}; likewise, the higher the user's preference in period T_i for the items belonging to C_j, the more the user's preference for C_j increases with it. Considering both aspects, the interest weight w_j^{T_i} is computed as:

[equation image: w_j^{T_i}, accumulating the preference weights of the items belonging to C_j that the user interacted with in T_i]

where B_u^{T_i} is the set of items the user interacted with in period T_i, B_{u,j}^{T_i} is the subset of B_u^{T_i} whose items belong to C_j, and w(b_q) is the user's preference weight for item b_q.
For the preference weight w(b_q), the invention provides an item-preference measure based on score deviation, which uses the difference between the user's score and the item's average score, as shown below:

[equation image: w(b_q) as a function of the deviation of r_u from the average score of b_q]

where r_u is user u's score for item b_q and the average is the mean score of item b_q over all users. The larger the difference, the larger the weight and the higher the user's preference for the item;
step A36: writing the interest weights of the corresponding interests of all periods into the time table, and updating the records of non-leaf interests according to the principle that a parent interest's weight is determined by its child interests, i.e. the interest weight of a non-leaf interest in each period is the accumulated sum of the weights of its sub-interests in the corresponding period.
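To illustrate steps A34-A36, the following Python sketch computes per-period interest weights from score deviations and fills the per-node time tables bottom-up; the data model (`interactions`, `avg_score`, dict-based tables, and a node shape like the earlier tree sketch extended with `meta_id` and `table`) is an assumption for illustration, not the patent's notation.

```python
# Sketch of interest-weight computation (A34) and time-table filling (A36).
# interactions: {period: [(item, user_score)]}; meta_of[item] -> meta-interest id.
# Assumed preference weight: deviation of the user's score from the item's
# average score, floored at 0 so weights stay non-negative.
from collections import defaultdict

def item_preference(user_score, avg):
    return max(user_score - avg, 0.0)

def interest_weights(interactions, meta_of, avg_score):
    weights = {}                                   # {period: {meta_id: weight}}
    for period, events in sorted(interactions.items()):
        w = defaultdict(float)
        for item, score in events:
            w[meta_of[item]] += item_preference(score, avg_score[item])
        weights[period] = dict(w)
    return weights

def fill_time_tables(node, weights):
    # A36: a leaf's table holds its own weights; a parent's weight in each
    # period is the accumulated sum of its children's weights in that period.
    if not node.children:
        node.table = {p: w.get(node.meta_id, 0.0) for p, w in weights.items()}
    else:
        for c in node.children:
            fill_time_tables(c, weights)
        node.table = {p: sum(c.table[p] for c in node.children)
                      for p in weights}
    return node.table
```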
Step S300: interest drift modeling and current interest selection:
interest drift modeling: the interest weight in the user interest model is an important basis for judging whether the user interest changes, so that the interest drift modeling process is set as an interest weight updating rule.
Time decay is applied to the interest weight, and the decayed weight written into the time table is called the decay weight dw.
Given the time decay function h(t), the update rule for the decay weight of a meta-interest is as follows: if the meta-interest C_j appears for the first time, its decay weight equals the interest weight computed by formula 4-9, i.e.

dw_j^{T_k} = w_j^{T_k}

If the meta-interest C_j is not appearing for the first time, it is computed as:

dw_j^{T_k} = h(t) × dw_j^{T_{k-1}} + w_j^{T_k}

where w_j^{T_k} is the interest weight of meta-interest C_j in the current period T_k; dw_j^{T_{k-1}} is the decay weight of meta-interest C_j in the previous period T_{k-1}; and h(t) is a pre-specified time decay function.
As can be seen from the above formula, if a meta-interest does not explicitly appear in the next period, only the weight of the previous period is decayed, i.e.

dw_j^{T_k} = h(t) × dw_j^{T_{k-1}}

If the meta-interest appears in two adjacent periods, then on the basis of the decay it is reinforced according to the interest-weight calculation for the current period, so that the update rule matches the real change of the user's interests.
For the time decay function h(t), Newton's law of cooling is referenced, giving the definition:

h(t) = e^(-λ×Δt) (4-12)

where Δt is the length of the current period and λ is a pre-specified decay factor.
In summary, by incorporating a time decay mechanism into the calculation of the interest weights, the proposed user interest model can simulate the change of user interests; the weights in the time table are therefore regarded as the result of user interest drift and as an important basis for judging the user's current interests.
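A minimal sketch of the decay-weight update rule as reconstructed above, assuming h(t) = e^(-λ×Δt) per formula 4-12 and additive reinforcement when the meta-interest reappears; function names are illustrative. For example, with λ = 0.1 and Δt = 5, h ≈ 0.61, so a previous decay weight of 1.0 carries roughly 0.61 into the next period.

```python
# Sketch of the decay-weight update (first occurrence vs. later periods).
import math

def h(delta_t, lam):
    # Time decay per formula 4-12 (Newton's law of cooling).
    return math.exp(-lam * delta_t)

def update_decay_weight(prev_dw, cur_w, delta_t, lam, first_time):
    if first_time:
        return cur_w                        # first occurrence: dw = w (4-9)
    # Otherwise decay the previous weight and reinforce with the current one;
    # cur_w is 0 when the meta-interest does not appear in this period.
    return h(delta_t, lam) * prev_dw + cur_w
```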
Current interest selection:
step 1: acquiring from the user interest tree all meta-interests whose time tables are not empty;
step 2: grouping all meta-interests acquired in step 1 by their parent interests, putting meta-interests with the same parent interest into the same group;
step 3: checking the number of meta-interests in each group against the number of child interests of the corresponding parent interest; if they are equal, replacing the group with the parent interest; recording the finally obtained interest set as CI;
step 4: arranging the interests in the set CI in descending order of their latest weight and intercepting the first N into the set A, where N = α × |CI| and α is a pre-specified interest interception factor;
step 5: arranging the interests in the set CI in descending order of active time and likewise intercepting the first N into the set B;
step 6: intersecting the sets A and B; the resulting interest set is the user's current interest set.
Through this selection process, the interest set mined by the proposed scheme both belongs to the user's recently accessed interests and covers the user's main interests.
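A minimal Python sketch of the six selection steps, assuming each interest node carries a `parent`, `children`, a `table` (the time table), a `latest_weight`, and an `active_time`; this data model is illustrative.

```python
# Sketch of current-interest selection (steps 1-6).
from collections import defaultdict

def select_current_interests(meta_interests, alpha):
    # Steps 1-2: keep meta-interests with a non-empty time table, group by parent.
    groups = defaultdict(list)
    for m in meta_interests:
        if m.table:
            groups[m.parent].append(m)
    # Step 3: replace a complete group by its parent interest.
    CI = []
    for parent, group in groups.items():
        if len(group) == len(parent.children):
            CI.append(parent)
        else:
            CI.extend(group)
    n = max(1, int(alpha * len(CI)))          # step 4: N = alpha * |CI|
    A = set(sorted(CI, key=lambda c: c.latest_weight, reverse=True)[:n])
    B = set(sorted(CI, key=lambda c: c.active_time, reverse=True)[:n])
    return A & B                              # step 6: intersection
```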
Step S400: generating recommendations by combining the recommendation method based on heterogeneous information network representation learning:
step S410: obtaining recommendation samples according to the current interests of each user;
step S420: constructing a heterogeneous information network from the recommendation samples;
step S430: performing score prediction with the HEFFM method, comprising the following steps:
step S431: information extraction: performing representation learning on the nodes in the heterogeneous information network, where the nodes comprise user nodes and item nodes, and acquiring low-dimensional vectors of users and items;
step D1: generating a semantic graph according to the meta-structure, where the meta-structures comprise complex meta-structures containing non-linear structure and linear meta-structures modeling only linear relations; the specific process of generating the semantic graph is:
step D11: extracting user nodes and comment nodes from the Yelp information network and establishing links between them to form a heterogeneous graph HG;
step D12: finding comment pairs in the Yelp information network that belong to the same user and contain the same keywords, and putting these comment pairs into a set W;
step D13: traversing the set W and establishing links for the comments in W in the heterogeneous graph HG to form the relation R-R, so that the linear meta-structure in HG carries the semantics of the meta-structure;
step D14: for a complex meta-structure, constructing the corresponding adjacency matrices from the nodes and relations present in the heterogeneous graph HG; for a linear meta-structure, generating the adjacency matrices from the original heterogeneous information network;
step D15: performing matrix multiplication along the linear meta-structure in the heterogeneous graph HG to generate the homogeneous matrix A_UU;
step D16: constructing a homogeneous graph SG from the homogeneous matrix A_UU, where SG is the semantic graph corresponding to the respective complex meta-structure or linear meta-structure;
step D2: dynamically truncated random walk on the semantic graph: acquiring a node sequence R containing both semantic and structural information, and taking R as the input of a skip-gram model to obtain low-dimensional node vectors, specifically:
step D21: projecting the nodes on the semantic graph into the heterogeneous information network and calculating the node similarity matrices of the complex meta-structure CS and the linear meta-structure LS:
constructing the adjacency matrix W_UR of user nodes and comment nodes, the adjacency matrix W_RB of comment nodes and item nodes, and the adjacency matrix W_RK of comment nodes and keyword nodes;
obtaining C_1 and C_2, where

[equation image: C_1 and C_2 in terms of the adjacency matrices]

the similarity matrix of the users on the complex meta-structure CS is

[equation image: SIM^CS]

and the node similarity matrix of the linear meta-structure LS is calculated as

[equation image: SIM^LS]

where W_UR is the adjacency matrix of user nodes and comment nodes and W_RK is the adjacency matrix of comment nodes and keyword nodes;
step D22: constraining the number of random walks started from each node: the number of walks started from node v is l = max(h(v) × maxL, minL), where maxL is the maximum number of walks started from a node, minL is the minimum number of walks started from a node, and h(v) is the importance of node v on the semantic graph;
step D23: the dynamically truncated random walk specifically comprises:
defining the semantic graph SG_s of a meta-structure s, the similarity matrix SIM_s of the nodes on the meta-structure s, the maximum number of walks maxT per node, the minimum number of walks minT per node, the maximum walk length wl, and the walk stopping probability p_stop;
initializing the list sequences for storing the node sequences;
computing the node importance H = PageRank(SG_s);
E1: taking node v as the start node and calculating its number of walks l;
E2: initializing a list sequence for storing the current node sequence, recording the current node n_now = v, and recording the maximum remaining walk length wl_t;
walking along the walk path to a node x and recording the transition probability P_trans of the walk path:

[equation image: P_trans]

where n_x is the current node of the walk path, n_i is the previous-hop node of the walk path and n_x is one of its first-order neighbors, and O(n_i) is the degree of node n_i;
adding node x to the list sequence and calculating the stopping probability P_x-stop at node x:

[equation image: P_x-stop]

where P_stop is a pre-specified fixed stopping probability and sim(n_i, n_x) is the unnormalized similarity between the previous-hop node n_i and the current node n_x;
judging whether to stop at node x: if so, ending the walk and proceeding to the next step; otherwise updating the remaining walk length wl_t ← wl_t - 1 and the current node n_now, then judging whether the number of walks has reached l; if so, proceeding to the next step, otherwise returning to E2;
E3: adding the current walk sequence to the list sequences and judging whether all nodes have been processed; if so, proceeding to the next step, otherwise returning to E1;
E4: outputting the list sequences of node sequences;
step D24: representation learning: sampling neighbors from the output node sequences through a fixed-length window to obtain the neighbor set of each node, and optimizing the representation learning with

max_Φ Σ_{u ∈ V} log Pr(N_s(u) | Φ(u))

where Φ is the mapping function that embeds nodes into a d-dimensional feature space and N_s(u) is the neighbor set of node u under the designated meta-structure;
step D25: obtaining the node sequence R by the dynamically truncated random walk:

R = DynamicTruncatedRandomWalk(SG_S, SIM_S, maxT, minT, wl, p_stop)

and taking R as the input of the skip-gram model to obtain the low-dimensional node vectors Φ = skip-gram(d, winL, R).
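The walk procedure of D22-D23 and E1-E4 can be illustrated with the Python sketch below. Because the transition and stopping formulas survive only as images, the sketch assumes a uniform transition among first-order neighbors and a stop test in which higher similarity between consecutive nodes lowers the chance of stopping; graph access and all names are illustrative.

```python
# Sketch of the dynamically truncated random walk (D22-D23).
# graph[v] -> list of first-order neighbors; sim[(a, b)] -> unnormalized
# similarity; importance h[v] comes from PageRank on the semantic graph.
import random

def num_walks(v, h, max_l, min_l):
    # D22: l = max(h(v) * maxL, minL)
    return max(int(h[v] * max_l), min_l)

def dynamic_truncated_random_walk(graph, sim, h, max_l, min_l, wl, p_stop):
    sequences = []
    for v in graph:                                   # E1
        for _ in range(num_walks(v, h, max_l, min_l)):
            seq, cur, steps = [v], v, wl              # E2
            while steps > 0 and graph[cur]:
                nxt = random.choice(graph[cur])       # assumed uniform P_trans
                seq.append(nxt)
                # Assumed stop rule: higher similarity between consecutive
                # nodes reduces the probability of stopping the walk.
                if random.random() < p_stop / max(sim.get((cur, nxt), 1.0), 1e-9):
                    break                             # stop at this node
                cur, steps = nxt, steps - 1
            sequences.append(seq)                     # E3
    return sequences                                  # E4
```

The resulting `sequences` would then be fed to a skip-gram implementation (for instance gensim's Word2Vec with sg=1) to produce the d-dimensional node vectors of D24-D25.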
If direct connection relations exist between users in the heterogeneous information network, the method further includes step D3: correcting the user vectors, which specifically comprises:
step D31: specifying the user set U and defining on it triplets <u, u_i, u_j>, where u ∈ U denotes the target user, and u_i ∈ U and u_j ∈ U are a direct neighbor and an indirect neighbor of user u respectively, satisfying

[equation image: membership constraints on u_i and u_j in terms of the neighbor sets]

where N_u denotes the neighbor set of user u under the given meta-structure; all triplets meeting these requirements form the training data of user u, and the training data of all users constitute the training data set D of the meta-structure, which is used for the vector correction; the symbol >_u is defined to represent the preference relation of user u over its neighbors, i.e. the triplet <u, u_i, u_j> can be written as u_i >_u u_j;
step D32: initializing the training data set D; obtaining the neighbor set N_u of user u under the meta-structure, the direct neighbor set DN_u of user u, and the indirect neighbor set IN_u of user u;
step D33: adding the triplets composed of a target user, a direct neighbor and an indirect neighbor to the training data set;
step D34: updating the parameters according to the iterative formulas of a gradient ascent algorithm:

[equation images: gradient-ascent parameter updates]

step D35: iterating until the user vector matrix converges, and outputting the corrected user vector matrix.
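The gradient-ascent formulas of D34 are preserved only as images. As one plausible realization, the sketch below performs a BPR-style pairwise update that pushes the target user's vector closer to direct neighbors than to indirect ones; the objective and learning rule here are assumptions, not the patent's exact formulas.

```python
# Sketch of triplet-based user-vector correction (D31-D35),
# assuming a BPR-style pairwise objective over <u, u_i, u_j>.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def correct_user_vectors(X, triplets, lr=0.01, reg=0.01, epochs=50):
    # X: user-vector matrix (n_users x d) from representation learning.
    X = X.copy()
    for _ in range(epochs):                 # iterate toward convergence (D35)
        for u, ui, uj in triplets:          # D33: <target, direct, indirect>
            # Gap: the direct neighbor should rank above the indirect one.
            diff = X[u] @ X[ui] - X[u] @ X[uj]
            g = 1.0 - sigmoid(diff)         # gradient-ascent scale (D34)
            X[u]  += lr * (g * (X[ui] - X[uj]) - reg * X[u])
            X[ui] += lr * (g * X[u] - reg * X[ui])
            X[uj] += lr * (-g * X[u] - reg * X[uj])
    return X                                # corrected user-vector matrix
```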
Step S432: directly connecting the low-dimensional vectors of users and items to the recommendation task, inputting them into the field-aware factorization machine model as recommendation-sample features, selecting features by adding group lasso as the regularization term, and completing the score prediction between users and items:
step F1: score prediction:
for each observed score between a user and an item in the data set, splicing the corresponding user and item vectors and taking the spliced vector as a new recommendation sample x_n:

[equation image: x_n as the concatenation of the user and item vectors over the L meta-structures]

where u_i^l and b_j^l denote the vector representations of user u_i and item b_j on the l-th meta-structure, and d is the dimension of each vector;
step F2: calculating the score with the FFM model:

ŷ(x_n) = w_0 + Σ_{i=1}^{M} w_i x_i + Σ_{i=1}^{M} Σ_{j=i+1}^{M} ⟨V_{i,f_j}, V_{j,f_i}⟩ x_i x_j

where w_0 is the global bias, w_i is the weight of the i-th feature, ⟨V_{i,f_j}, V_{j,f_i}⟩ is the weight of the combined feature formed by the i-th and j-th features, and the parameter M is the feature dimension of sample x_n, i.e. M = 2L × d;
step F3: parameter learning:
obtaining the objective function by minimum-mean-square-error learning:

min Σ_{n=1}^{N} (ŷ_n - y_n)²

where y_n is the actual score of the n-th sample and N is the number of samples;
introducing into the objective function a group lasso that can be used to select features; the group-lasso regularization of a parameter p has the expression

Ω(p) = Σ_{g=1}^{G} ||p_g||_2

where p_g, g = 1, 2, ..., G, are the parameter groups and ||·||_2 is the l2 norm;
the features of sample x_n generated by the same meta-structure are put into the same group, so the features of x_n are divided into 2L groups, and the regularization terms for the parameters w and V are respectively

Ω(w) = Σ_{l=1}^{2L} ||w_l||_2

Ω(V) = Σ_{l=1}^{2L} ||V_l||_F

where w_l is a vector of dimension d, V_l is the matrix formed by the latent vectors of the features of the l-th meta-structure over all fields, and ||·||_F is the Frobenius norm of a matrix;
combining the objective function with the regularization terms, the optimization objective becomes

[equation image: the regularized optimization objective]

The model is optimized with the non-monotone accelerated proximal gradient algorithm nmAPG, and the optimized feature selection is output.
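A compact sketch of the FFM scoring of step F2 and the group-lasso penalty of step F3 (the nmAPG optimizer itself is omitted); the field layout and names are assumptions consistent with M = 2L × d.

```python
# Sketch of FFM scoring (F2) and group-lasso regularization (F3).
# x: sample of length M = 2L*d; field(i) maps feature i to its meta-structure
# field; V[i][f]: latent vector of feature i toward field f.
import numpy as np

def ffm_score(x, w0, w, V, field):
    M = len(x)
    y = w0 + w @ x                                   # bias + linear terms
    for i in range(M):
        if x[i] == 0.0:
            continue
        for j in range(i + 1, M):
            if x[j] == 0.0:
                continue
            # Field-aware pairwise interaction <V[i,field(j)], V[j,field(i)]>.
            y += V[i][field(j)] @ V[j][field(i)] * x[i] * x[j]
    return y

def group_lasso(params, groups):
    # Omega(p) = sum over the 2L groups of the l2 norm of each group.
    return sum(np.linalg.norm(np.concatenate([np.ravel(params[i]) for i in g]))
               for g in groups)
```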
Step S440: generating a TOP-N score list, namely the final recommendation list.
Although the present invention has been described with reference to the illustrated embodiments, which are intended as preferred embodiments, the invention is not limited thereto, and those skilled in the art can devise numerous other modifications and embodiments that fall within the spirit and scope of the principles of this disclosure.

Claims (8)

1. A recommendation method based on interest drift is characterized by comprising the following steps:
step S100: acquiring a recommendation data set with time information and project attribute information;
step S200: constructing a user interest tree, and introducing time attenuation into the user interest tree to create a user interest drift model;
step S300: selecting a current interest of a user;
step S400: generating a recommendation by combining a recommendation method for representing learning by a heterogeneous information network;
the step S200 includes:
step A1: acquiring a basic interest set;
step A2: constructing a basic interest tree;
step A3: generating a user interest tree, and introducing a time table into the interest tree;
step A4: setting a time division sequence, and acquiring interest sets of users in different periods;
step A5: calculating the attenuation weight of each interest in the interest tree in different periods;
step A6: writing the decay weights into the time table in chronological order;
the step a1 specifically includes:
(1) computing item similarity matrices
Figure FDA0003431244570000011
Wherein i and j are different items, ESijIn order for the collaboration to be explicit,
Figure FDA0003431244570000012
Ri、Rja scoring set of items i, j; ISijIn order to be an implicit co-operation,
Figure FDA0003431244570000013
Ui、Ujis a set of users who have interacted with the items i, j;
CSijin order to be a similarity of the two items,
Figure FDA0003431244570000021
Tian attribute tag for item i; t isjAn attribute tag for item j;
(2) normalization of sim: traversing the project similarity matrix, and setting the corresponding position of the neighbor matrix which is greater than or equal to the similarity threshold value to 1; setting the corresponding position of the neighbor matrix smaller than the similarity threshold value to 0;
(3) calculating an item Link matrix Link, wherein Link is Ne. NeT; ne is a neighbor matrix of the neighbor set,
(4) generating a basic interest set BI:
initializing a set BI for storing basic interests, wherein each item is a cluster at first, constructing a local heap q [ i ] for a cluster i, wherein the q [ i ] comprises all clusters which are combined with the cluster of the cluster i and have metric function values not equal to 0, and arranging the clusters in a descending order according to the sizes of the metric function values combined with the clusters;
constructing a global heap Q for all clusters, wherein the global heap Q comprises the cluster with the maximum similarity in each local heap Q [ i ];
when the | Q | > M is satisfied, executing B1-B7 in a loop, otherwise executing B8;
b1: acquiring a cluster u with the maximum similarity from the global heap Q;
b2: obtaining the cluster v with the maximum similarity to the cluster u from the partial pile q [ u ] of the cluster u
B3: deleting the cluster v from the global heap Q, and combining the cluster v and the cluster u to form a new cluster w;
b4: updating a global heap Q and a local heap Q [ i ]; wherein M is the number of clusters;
b5: calculating the number of links of the new cluster w and the cluster x in the q [ u ] and the q [ v ];
b6: deleting the merged cluster u and cluster v from the local heap of cluster x;
b7: updating the local heap of the cluster x and the new cluster w, and updating the global heap;
b8: inserting the new cluster w into the global heap, releasing the local heap of the cluster u and the cluster v, and putting M clusters generated finally into a set BI; m clusters are basic interests, i.e. meta interests.
2. The recommendation method based on interest drift according to claim 1, wherein the step A2 performs secondary clustering with the M clusters in the set BI as the cluster set C to generate a basic interest tree with a hierarchical structure, specifically comprising:
step 1: initializing a list C-Sequence for storing the intermediate clustering results;
step 2: putting the cluster set C into C-Sequence;
step 3: calculating the similarity between the clusters in the cluster set C according to the ROCK goodness measure:

g(C_i, C_j) = link[C_i, C_j] / ((n_i + n_j)^(1+2f(θ)) − n_i^(1+2f(θ)) − n_j^(1+2f(θ)))

in the formula, C_i and C_j are different clusters; n_i and n_j are the numbers of samples in clusters C_i and C_j; according to the ROCK algorithm, f(θ) = (1−θ)/(1+θ); link[C_i, C_j] is the number of cross-links,

link[C_i, C_j] = Σ_{p_q ∈ C_i, p_r ∈ C_j} link(p_q, p_r)

where p_q and p_r are data points [the formula for sim[C_i, C_j] is given only as an image in the source];
step 4: merging the two clusters C_i and C_j with the greatest similarity into a new cluster C_{i-j};
step 5: updating the cluster set C, i.e., deleting C_i and C_j from it and adding the new cluster C_{i-j}, and putting the updated cluster set C into C-Sequence;
step 6: judging whether the number of elements in the cluster set C is 1; if not, returning to step 3 to continue; if yes, returning the current cluster set C and the list C-Sequence recording the intermediate results;
step 7: replaying the secondary clustering process top-down according to the cluster set C and the intermediate result list C-Sequence, generating a basic interest tree with a hierarchical structure.
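A sketch of replaying C-Sequence top-down (step 7), with hypothetical integer-indexed clusters; comparing consecutive snapshots reveals which two children were merged into which parent node:

```python
def build_interest_tree(c_sequence):
    # c_sequence[k] is the snapshot of the cluster set after the k-th
    # merge (step 5); the new cluster in each snapshot becomes a parent
    # node whose children are the two clusters it absorbed.
    tree = {}
    for prev, curr in zip(c_sequence, c_sequence[1:]):
        parents = [c for c in curr if c not in prev]   # the new merged cluster
        children = [c for c in prev if c not in curr]  # the two it absorbed
        if parents:
            tree[tuple(parents[0])] = children
    return tree

snapshots = [[[0], [1], [2]], [[0, 1], [2]], [[0, 1, 2]]]
print(build_interest_tree(snapshots))
# {(0, 1): [[0], [1]], (0, 1, 2): [[0, 1], [2]]}
```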
3. The recommendation method based on interest drift according to claim 2, wherein the method for generating the interest tree in step A3 is as follows:
step A31: acquiring the history records of the user, and arranging them in time order from near to far;
step A32: dividing the history records into periods of unequal time intervals;
step A33: acquiring the meta-interest set of user u in each period T_i (denoted MI_u^{T_i} here; the source gives the set notation only as images):
initializing the interest set MI_u^{T_i} to the empty set;
traversing the item set of user u in period T_i;
traversing the meta-interest set MI and judging whether user u possesses the meta-interest C_j in period T_i; if yes, adding the meta-interest C_j to the interest set MI_u^{T_i};
outputting the meta-interest set MI_u^{T_i};
step A34: arranging all obtained meta-interest sets MI_u^{T_i} by time period from far to near, and calculating the user's interest weight for each interest in all MI_u^{T_i};
step A35: on the constructed interest tree, modifying each node by adding to it a pointer that points to a time table, in which the interest weights of the user on the corresponding interest node in different periods are recorded;
step A36: writing the interest weights of the interests corresponding to all MI_u^{T_i} into the time table; the interest weight of a non-leaf interest in each period is the accumulated sum of the weights of its sub-interests in the corresponding period.
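A small sketch of one node's time table under an assumed exponential decay (the claims state only that a per-period decay weight is computed in step A5, not its functional form; lam and the counts are illustrative):

```python
import math

def decayed_weight(interaction_count: int, periods_ago: int, lam: float = 0.1) -> float:
    # Hypothetical exponential decay: older periods contribute less.
    return interaction_count * math.exp(-lam * periods_ago)

# time table of one interest node: one weight per period, oldest first
counts = [3, 5, 2]  # interactions in T1, T2, T3 (T3 is the most recent period)
timetable = [decayed_weight(c, len(counts) - 1 - i) for i, c in enumerate(counts)]
```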
4. The recommendation method based on interest drift according to any one of claims 1-3, wherein the step S300 comprises:
step S310: acquiring a meta-interest set of a user according to the user interest tree;
step S320: replacing the meta-interests according to parent-child relations among the meta-interests;
step S330: calculating the active time of each meta-interest in the replaced meta-interest set;
step S340: and acquiring the current interest of the user according to the meta-interest active time and the interest weight.
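A hedged sketch of the selection in steps S310-S340, assuming current interests are ranked by active time and then interest weight (the claim does not fix the exact combination rule, so this ordering is an assumption):

```python
def current_interests(meta_interests: dict, top_k: int = 3) -> list:
    # meta_interests maps an interest id to (active_time, interest_weight);
    # sort by that pair, most recently active and heaviest first.
    return sorted(meta_interests, key=meta_interests.get, reverse=True)[:top_k]
```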
5. The recommendation method based on interest drift according to claim 4, wherein the step S400 comprises:
step S410: obtaining a recommended sample according to the current interest of each user;
step S420: constructing a heterogeneous information network according to the recommended samples;
step S430: performing score prediction by using the HEFFM method, which comprises the following steps:
step S431: extracting information and performing representation learning on the nodes in the heterogeneous information network, the nodes comprising user nodes and item nodes, to acquire the low-dimensional vectors of users and items;
step S432: directly interfacing the low-dimensional vectors of users and items with the recommendation task, inputting them into a field-aware factorization machine model as recommended-sample features, performing feature selection by adding group lasso as a regularization term, and completing the score prediction between users and items;
step S440: generating a TOP-N score list, i.e., the final recommendation list.
6. The recommendation method based on interest drift according to claim 5, wherein said step S431 comprises:
step D1: generating semantic graphs according to the meta-structures, the meta-structures comprising complex meta-structures containing nonlinear structures and linear meta-structures modeling only linear relations; the specific process of generating a semantic graph comprises:
step D11: extracting user nodes and comment nodes from the Yelp information network, and establishing links between the user nodes and the comment nodes to form a heterogeneous graph HG;
step D12: finding, from the Yelp information network, pairs of comments that belong to the same user and contain the same keywords, and putting the comment pairs into a set W;
step D13: traversing the set W and establishing a link in the heterogeneous graph HG for each pair of comments in W, forming the relation R-R, so that the linear meta-structure in the heterogeneous graph HG carries the semantics of the meta-structure;
step D14: when the structure is a complex meta-structure, constructing the corresponding adjacency matrices according to the nodes and relations existing in the heterogeneous graph HG; when the structure is a linear meta-structure, generating the adjacency matrices from the original heterogeneous information network;
step D15: performing matrix operations along the linear meta-structure in the heterogeneous graph HG to generate a homogeneous matrix A_UU;
step D16: constructing a homogeneous graph SG from the homogeneous matrix A_UU, the homogeneous graph SG being the semantic graph corresponding to the corresponding complex meta-structure or linear meta-structure;
step D2: performing dynamically truncated random walks on the semantic graph, acquiring node sequences R containing both semantic information and structural information, taking the node sequences R as the input of a skip-gram model, and acquiring low-dimensional node vectors, specifically comprising:
step D21: projecting the nodes on the semantic graph back into the heterogeneous information network [notation given only as an image in the source], and calculating the node similarity matrices of the complex meta-structure CS and the linear meta-structure LS:
constructing the adjacency matrix W_UR of user nodes and comment nodes, the adjacency matrix W_RB of comment nodes and item nodes, and the adjacency matrix W_RK of comment nodes and keyword nodes;
obtaining C_1 and C_2 [formulas given only as images in the source], and from them the similarity matrix of users on the complex meta-structure CS [formula given only as an image in the source];
calculating the node similarity matrix of the linear meta-structure LS [formula given only as an image in the source];
wherein W_UR is the adjacency matrix of user nodes and comment nodes, and W_RK is the adjacency matrix of comment nodes and keyword nodes;
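A minimal sketch of the image-only LS similarity, assuming the linear meta-structure is the path U-R-K-R-U built from W_UR and W_RK (the exact matrix product in the source is not recoverable, so this chain is an assumption):

```python
import numpy as np

def linear_similarity(W_UR: np.ndarray, W_RK: np.ndarray) -> np.ndarray:
    # Chain adjacency matrices along U-R-K-R-U: users whose comments
    # share keywords end up with a large user-user similarity entry.
    W_UK = W_UR @ W_RK      # users -> keywords, via their comments
    return W_UK @ W_UK.T    # user-user similarity on the linear structure
```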
step D22: constraining the number of random walks started from each node: the number of walks started from node v is l = max(h(v) × maxL, minL), where maxL is the maximum number of walks started from a node, minL is the minimum number of walks started from a node, and h(v) is the importance of node v on the semantic graph;
step D23: the dynamically truncated random walk specifically comprises:
defining the semantic graph SG_S = (V_SG, ε_SG) of the meta-structure s, the similarity matrix SIM_S of the nodes on the meta-structure s, the maximum number of walks maxT per node, the minimum number of walks minT per node, the maximum walk length wl, and the walk stopping probability p_stop;
initializing a list sequences for storing the node sequences;
computing the node importance H = PageRank(SG_S);
D231: taking node v as the initial node, calculating the number of walks l;
D232: initializing a list sequence for storing the current node sequence, recording the current node n_now = v, and recording the maximum number of walking steps wl_t;
walking along the path to a node x and recording the transition probability p_trans [formula given only as an image in the source];
in the formula, n_x is the current node of the walk; n_i is the previous-hop node of the walk, of which n_x is a first-order neighbor; O(n_i) is the degree of node n_i;
adding node x to the list sequence and calculating the stopping probability P_{x-stop} of node x [formula given only as an image in the source];
in the formula, P_stop is a pre-specified fixed stopping probability, and sim(n_i, n_x) is the unnormalized similarity between the previous-hop node n_i and the current node n_x;
judging whether to stop at node x: if yes, ending the walk and entering the next step; otherwise, updating the walking step length wl_t ← wl_t − 1 and the current node n_now = x, then judging whether the number of walks has reached l; if so, entering the next step, otherwise returning to D232;
D233: adding the sequence to the list sequences, and judging whether all nodes have been processed; if yes, entering the next step, otherwise returning to D231;
D234: outputting the list sequences of node sequences;
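A runnable sketch of steps D22-D23, assuming networkx for PageRank; the uniform transition 1/O(n_i) over neighbors and the similarity-modulated stop rule stand in for the image-only formulas, so both are assumptions:

```python
import random
import networkx as nx

def dynamic_truncated_walks(G, sim, max_l, min_l, wl, p_stop):
    # h(v) from PageRank drives the per-node walk count l (step D22);
    # each walk hops to a uniformly random neighbor and may stop early.
    h = nx.pagerank(G)
    sequences = []
    for v in G.nodes:
        for _ in range(max(int(h[v] * max_l), min_l)):
            seq, cur = [v], v
            for _ in range(wl - 1):
                nbrs = list(G.neighbors(cur))
                if not nbrs:
                    break
                nxt = random.choice(nbrs)  # uniform transition, 1/O(n_i)
                seq.append(nxt)
                # assumed rule: similar consecutive nodes lower the stop chance
                if random.random() < p_stop / (1 + sim.get((cur, nxt), 0)):
                    break
                cur = nxt
            sequences.append(seq)
    return sequences
```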
step D24: representation learning: sampling neighbors from the output node sequences through a fixed-length window to obtain the neighbor set of each node, and optimizing the representation learning with the following formula:

max_f Σ_{u ∈ V} log P(N_u | f(u))

in the formula, f: V → R^d is the mapping function that embeds nodes into a d-dimensional feature space, and N_u ⊆ V is the neighborhood of node u under the specified meta-structure;
step D25: obtaining the node sequences R by the dynamically truncated random walk:
R = DynamicTruncatedRandomWalk(SG_S, SIM_S, maxT, minT, wl, p_stop),
taking R as the input of the skip-gram model, and obtaining the low-dimensional node vectors Φ = skip-gram(d, winL, R).
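Step D25's skip-gram call maps naturally onto gensim's Word2Vec with sg=1; a sketch under that assumption (winL is the window length, d the embedding dimension):

```python
from gensim.models import Word2Vec

def skip_gram(d, win_l, R):
    # Treat each walk as a sentence of node ids and train skip-gram.
    model = Word2Vec(sentences=[[str(n) for n in seq] for seq in R],
                     vector_size=d, window=win_l, sg=1, min_count=1)
    return model.wv  # low-dimensional node vectors Φ
```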
7. The recommendation method based on interest drift according to claim 6, wherein, if direct connection relations exist between users in the heterogeneous information network, the method further comprises step D3: correcting the user vectors, specifically comprising:
step D31: specifying a set of users U, and defining triplets ⟨u, u_i, u_j⟩ on this basis, where u ∈ U denotes the target user, and u_i ∈ U and u_j ∈ U are a direct neighbor and an indirect neighbor of user u, respectively [the membership conditions are given only as images in the source], N_u denoting the neighbor set of user u under the meta-structure; all triplets meeting the above requirements form the training data D_u of user u, and the training data of all users constitutes the training data set D under the meta-structure, which is used for the vector correction; the symbol >_u is defined to represent the preference relationship of user u over its neighbors, i.e., a triplet ⟨u, u_i, u_j⟩ can be written as u_i >_u u_j;
step D32: initializing the training data set D; obtaining the neighbor set N_u of user u under the meta-structure, the direct neighbor set DN_u of user u, and the indirect neighbor set IN_u of user u;
step D33: adding the triplets consisting of the target user, a direct neighbor and an indirect neighbor to the training data set;
step D34: updating the parameters according to the iterative formulas of the gradient ascent algorithm [formulas given only as images in the source];
step D35: iterating until the user vector matrix converges, and outputting the corrected user vector matrix.
8. The recommendation method based on interest drift according to claim 5, wherein the step S432 specifically comprises:
step F1: score prediction:
referring to the observed scores between users and items in the data set, concatenating the corresponding vector representations, and taking the concatenated vector as a new recommended sample x_n:

x_n = [u_i^1, ..., u_i^L, b_j^1, ..., b_j^L]

in the formula, u_i^l and b_j^l are the vector representations of user u_i and item b_j on the l-th meta-structure, and d is the dimension of each vector;
step F2: calculating the score with the FFM model:

ŷ_n = w_0 + Σ_{i=1}^{M} w_i x_i + Σ_{i=1}^{M} Σ_{j=i+1}^{M} ⟨V_{i,f_j}, V_{j,f_i}⟩ x_i x_j

in the formula, w_0 is the global bias; w_i is the weight of the i-th feature; ⟨V_{i,f_j}, V_{j,f_i}⟩ is the weight of the combined feature formed by the i-th and j-th features, V_{i,f_j} being the latent vector of feature i for the field f_j of feature j; the parameter M is the dimension of the sample x_n, i.e., M = 2L × d;
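A minimal sketch of the FFM score above; the field assignment array and dense numpy storage of the latent vectors are illustrative choices:

```python
import numpy as np

def ffm_score(x, field, w0, w, V):
    # x: sample of dimension M = 2L*d; field[i]: field id of feature i;
    # V[i, f]: latent vector of feature i for field f.
    score = w0 + w @ x
    nz = np.nonzero(x)[0]  # only non-zero features interact
    for a in range(len(nz)):
        for b in range(a + 1, len(nz)):
            i, j = nz[a], nz[b]
            score += V[i, field[j]] @ V[j, field[i]] * x[i] * x[j]
    return score
```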
step F3: parameter learning:
learning with the minimum mean-square error gives the objective function

min_{w,V} Σ_{n=1}^{N} (ŷ_n − y_n)^2

in the formula, y_n is the actual score of the n-th sample, and N is the number of samples;
a group lasso, which can be used to select features, is introduced into the objective function; the group lasso regularization of a parameter p has the expression

Φ(p) = Σ_{g=1}^{G} ||p_g||_2

in the formula, p_g, g = 1, 2, ..., G are the groups into which p is divided, and ||·||_2 is the l_2 norm;
the features of sample x_n generated by the same meta-structure are put into the same group, so the features of sample x_n are divided into 2L groups, with the following regularization formulas for the parameters w and V, respectively:

Φ(w) = Σ_{l=1}^{2L} ||w_l||_2
Φ(V) = Σ_{l=1}^{2L} ||V_l||_F

in the formula, w_l is a vector of dimension d; V_l is the matrix formed by the latent vectors of the l-th meta-structure's features over all fields; ||·||_F is the Frobenius norm of a matrix;
combining the objective function and the regularization formulas, the optimization objective can be transformed into

min_{w,V} Σ_{n=1}^{N} (ŷ_n − y_n)^2 + λ_w Φ(w) + λ_V Φ(V)

where λ_w and λ_V are regularization coefficients; the model is optimized with the non-monotonic accelerated proximal gradient algorithm nmAPG, and the optimized feature selection is output.
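With group lasso, the proximal step inside an accelerated proximal gradient loop such as nmAPG is a group soft-threshold; a small sketch of that operator (the step size tau and per-group application are standard for group lasso, though the claim states only that nmAPG is used):

```python
import numpy as np

def group_soft_threshold(w_group: np.ndarray, tau: float) -> np.ndarray:
    # Shrink the group's l2 norm by tau; groups whose norm falls below
    # tau are zeroed entirely, deselecting all features of that
    # meta-structure at once.
    norm = np.linalg.norm(w_group)
    if norm <= tau:
        return np.zeros_like(w_group)
    return (1 - tau / norm) * w_group
```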
CN202110512435.4A 2021-05-11 2021-05-11 Recommendation method based on interest drift Active CN113239271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110512435.4A CN113239271B (en) 2021-05-11 2021-05-11 Recommendation method based on interest drift


Publications (2)

Publication Number Publication Date
CN113239271A (en) 2021-08-10
CN113239271B (en) 2022-03-15

Family

ID=77133484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110512435.4A Active CN113239271B (en) 2021-05-11 2021-05-11 Recommendation method based on interest drift

Country Status (1)

Country Link
CN (1) CN113239271B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113836397B (en) * 2021-09-02 2024-07-12 桂林电子科技大学 Recommendation method for personalized feature modeling of shopping basket
CN114491095B (en) * 2022-01-18 2024-10-01 南京大学 Method for recommending items by using potential factor model based on time sequence drift
CN114780861B (en) * 2022-06-20 2022-10-21 上海二三四五网络科技有限公司 Clustering technology-based user multi-interest recommendation method, device, equipment and medium


Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9710470B2 (en) * 2013-09-09 2017-07-18 International Business Machines Corporation Social recommendation across heterogeneous networks

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN103955535A (en) * 2014-05-14 2014-07-30 南京大学镇江高新技术研究院 Individualized recommending method and system based on element path
CN106537376A (en) * 2014-06-06 2017-03-22 诺基亚技术有限公司 Method and apparatus for recommendation by applying efficient adaptive matrix factorization

Non-Patent Citations (3)

Title
"Modeling user interests by conceptual clustering";Godoy D ET AL.;《Information Systems》;20061231;第247-265页 *
"一种多维多粒度用户兴趣模型研究";陈辉;《小型微型计算机系统》;20171231;第38卷(第12期);第2785-2790页 *
"一种检测兴趣漂移的元路径推荐模型";石磊 等;《小型微型计算机系统》;20190331;第40卷(第3期);第612-617页 *

Also Published As

Publication number Publication date
CN113239271A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN113239271B (en) Recommendation method based on interest drift
JP7105789B2 (en) Machine learning method and apparatus for ranking network nodes after using a network with software agents at the network nodes
CN106547809B (en) Representing compound relationships in a graph database
US8949233B2 (en) Adaptive knowledge platform
Chen et al. General functional matrix factorization using gradient boosting
US7860817B2 (en) System, method and computer program for facet analysis
CN101432684B (en) Method and apparatus for efficient indexed storage for unstructured content
US10755179B2 (en) Methods and apparatus for identifying concepts corresponding to input information
US20100049766A1 (en) System, Method, and Computer Program for a Consumer Defined Information Architecture
CN113190754A (en) Recommendation method based on heterogeneous information network representation learning
CN112182424A (en) Social recommendation method based on integration of heterogeneous information and isomorphic information networks
Huang et al. Neural embedding collaborative filtering for recommender systems
US20240028997A1 (en) Method and System for Automatically Managing and Displaying a Visual Representation of Workflow Information
CN113254630A (en) Domain knowledge map recommendation method for global comprehensive observation results
Vahedian et al. Weighted random walk sampling for multi-relational recommendation
US20230306033A1 (en) Dashboard for monitoring current and historical consumption and quality metrics for attributes and records of a dataset
CN112685452B (en) Enterprise case retrieval method, device, equipment and storage medium
Deng et al. Label propagation on k-partite graphs with heterophily
Sun et al. NCGAN: A neural adversarial collaborative filtering for recommender system
CN116975346A (en) Method, apparatus, device, medium and program product for generating tag map data
Zhiyuli et al. Hsem: highly scalable node embedding for link prediction in very large-scale social networks
US20230289696A1 (en) Interactive tree representing attribute quality or consumption metrics for data ingestion and other applications
CN114996490A (en) Movie recommendation method, system, storage medium and device
US20230289839A1 (en) Data selection based on consumption and quality metrics for attributes and records of a dataset
CN116049566A (en) Object representation method, apparatus, device, storage medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant