CN110781409A - Article recommendation method based on collaborative filtering - Google Patents
- Publication number
- CN110781409A (application CN201911022328.2A)
- Authority
- CN
- China
- Prior art keywords
- item
- attention
- user
- layer
- recommendation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides an item recommendation method based on collaborative filtering, relating to the technical field of recommendation systems. A dedicated dynamic weight is introduced to better predict the preference of a user u for an item i; this dynamic weight is estimated with an attention mechanism, and recommendation performance is evaluated with recall and precision, improving the effectiveness and recommendation quality of the recommendation system. The attention mechanism is shown to help estimate the contribution of the user's historically interacted items to the representation of the user's preference, making personalized recommendation more accurate. Attention scores are computed with both pointwise attention and self-attention, with marked effect; in addition, the Transformer model is combined with the recommendation algorithm and compared against conventional embedding models, showing an improvement in recommendation quality.
Description
Technical Field
The invention relates to the technical field of recommendation systems, in particular to an article recommendation method based on collaborative filtering.
Background
Collaborative Filtering (CF) is the earliest and best-known class of recommendation algorithms. Its main functions are prediction and recommendation; it has been studied in depth in academia and widely applied in industry. The algorithm discovers user preferences by mining the user's historical behavior data and recommends items of similar taste based on those preferences. Collaborative filtering recommendation algorithms fall into two main categories: User-based Collaborative Filtering (UserCF) and Item-based Collaborative Filtering (ItemCF). In short: birds of a feather flock together. User-based collaborative filtering finds, from historical behavior data, which goods or content a user likes (e.g., purchases, favorites, comments, or shares) and measures and scores these preferences. It then computes relationships between users from their attitudes toward the same goods or content and recommends goods among users with the same preferences. For example, if users A and B both purchased books x, y and z and gave them five-star reviews, A and B belong to the same class, so book w viewed by A can be recommended to user B. UserCF has found application on some websites (e.g., Digg), but the algorithm has drawbacks. First, as the number of users of a website grows, computing the user interest similarity matrix becomes harder and harder: time and space complexity grow approximately quadratically with the number of users. Second, user-based collaborative filtering makes it difficult to explain recommendation results. For these reasons Amazon, the well-known e-commerce company, proposed the alternative item-based collaborative filtering algorithm.
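As a toy illustration of the user-based scheme just described, the relationship between users can be computed as the cosine between their rating vectors. The sketch below uses hypothetical ratings of our own, not data from the invention:

```python
import numpy as np

# Hypothetical user-item rating matrix: rows = users A, B, C; columns = items x, y, z, w.
# A zero means the user has not rated the item.
R = np.array([
    [5.0, 5.0, 5.0, 4.0],   # user A rated x, y, z and viewed/rated w
    [5.0, 5.0, 5.0, 0.0],   # user B rated x, y, z but has not seen w
    [0.0, 1.0, 0.0, 5.0],   # user C has different tastes
])

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two users' rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A and B are highly similar, so item w (liked by A, unseen by B)
# becomes a candidate recommendation for B.
print(cosine_sim(R[0], R[1]))   # close to 1.0
print(cosine_sim(R[0], R[2]))   # noticeably lower
```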
An item-based collaborative filtering algorithm (ICF) recommends to users items similar to those they previously liked. For example, the algorithm may recommend "Machine Learning" to you because you purchased "Data Mining Guide". However, the ICF algorithm does not compute item similarity from the items' content attributes; it computes it mainly by analyzing users' behavior records. ICF not only provides a convincing explanation of the prediction results in many recommendation scenarios, but also facilitates real-time personalization: the main computation, estimating the similarity between items, can be done offline, while the online recommendation module only needs to perform a series of lookups of similar items, which is easily done in real time.
The earliest item-based collaborative filtering method, ItemCF, decides whether to add a target item to the user's recommendation list by computing the similarity between the items the user has interacted with in the past and the current target item. That is, the predicted score of user u for a particular item i equals the similarity $s_{ij}$ between item i and each item j that user u has interacted with, multiplied by the user's score $r_{uj}$ for j, accumulated over all interacted items:

$$\hat{r}_{ui} = \sum_{j} s_{ij} \, r_{uj}$$

where the sum runs over the items j that user u has interacted with.
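A minimal sketch of this prediction rule; the similarity matrix, ratings and shapes below are illustrative assumptions of ours, not values from the patent:

```python
import numpy as np

def predict_score(u: int, i: int, S: np.ndarray, R: np.ndarray) -> float:
    """Classic ItemCF prediction: sum of similarity(i, j) * rating(u, j)
    over all items j the user u has interacted with (R[u, j] > 0)."""
    interacted = np.nonzero(R[u])[0]          # items j with feedback from u
    return float(sum(S[i, j] * R[u, j] for j in interacted if j != i))

# Toy data: 4 items, one user. S is a hypothetical item-item similarity
# matrix; R holds the user's known scores.
S = np.array([
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.0],
    [0.3, 0.4, 0.0, 1.0],
])
R = np.array([[0.0, 5.0, 2.0, 4.0]])          # user 0 has not scored item 0
print(predict_score(0, 0, S, R))              # 0.8*5 + 0.1*2 + 0.3*4 = 5.4
```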
Early ItemCF methods used statistical measures such as the Pearson coefficient and cosine similarity to compute the similarity between the user's historical items and the target item. This approach is simple, but such heuristic similarity estimation lacks optimization tailored to the recommendation task and can therefore yield suboptimal performance. Moreover, under sparse data, cosine similarity implicitly sets the user's affinity for unrated items to 0, and the set of items co-rated by users, on which the Pearson coefficient is computed, may be small. These methods therefore need adjustment and optimization to adapt different data sets to the recommendation scheme. With the development of machine learning, a learning-based approach called SLIM was proposed. It customizes a recommendation objective function so that item similarities are learned adaptively from the data; that is, it minimizes the loss between the original user-item interaction matrix and the interaction matrix reconstructed by the item-based CF model. Although SLIM can achieve better recommendation accuracy, it has two inherent limitations. First, offline training can be very time-consuming for large-scale data, since learning the similarity matrix S directly has time complexity on the order of O(I²). Second, it can only estimate the similarity between two items that were purchased or rated together; it cannot estimate the similarity of unrelated items and therefore cannot capture transitive relationships between items. In actual recommendation tasks, particularly when data is sparse, the recommendation quality of SLIM degrades.
FISM addresses these limitations well. The method represents items as low-dimensional embedding vectors, so that the similarity $s_{ij}$ between items i and j is parameterized as the inner product of their embedding vectors. As the numbers of users and items grow, the interaction matrix becomes sparse and the effectiveness of existing Top-K recommendation methods declines; the FISM algorithm therefore proposes an item-based method for generating Top-K recommendations in which the item similarity matrix is learned as the product of two low-dimensional latent factor matrices. A full set of experiments on multiple data sets at several sparsity levels shows that the method proposed by the FISM algorithm can efficiently process sparse data sets. As a result, the recommendation accuracy of FISM is superior to other popular Top-K recommendation algorithms, and its advantage grows as the data set becomes sparser. Despite this strong performance, the assumption that all of a user's historically interacted items contribute equally to the representation of the user's preference is clearly unreasonable. For example, a basketball and an everyday household item should not have the same influence when recommending basketball shoes in real time. We therefore introduce a dedicated dynamic weight to better predict the preference of user u for item i, and estimate this dynamic weight with an attention mechanism.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an article recommendation method based on collaborative filtering.
An item recommendation method based on collaborative filtering comprises the following steps:
step 1: calculating the prediction score of user u for a target item i: using one-hot coding, obtaining through the embedding layer the embedding vector p of the predicted target item i and the embedding vectors q of the historical interacted items, where p marks an item as a predicted item and q marks it as a historically interacted item, and obtaining the evaluation score of the item; the attention-based ItemCF model is defined as:

$$a_{ij} = f(p_i, q_j), \qquad \hat{y}_{ui} = \left(\left|R_u^+ \setminus \{i\}\right|\right)^{-\alpha} \sum_{j \in R_u^+ \setminus \{i\}} a_{ij}\, p_i^{\top} q_j$$

wherein i is the predicted target item, j is a historically interacted item of the user, $a_{ij}$ is the weight, calculated with an attention network, of the historical interacted item in the representation of the user's preference, $p_i$ and $q_j$ respectively represent the embedding vector of the predicted item set and the embedding vector of the user-interacted item, $R_u^+$ represents the positive example set of user u, $R_u^+ \setminus \{i\}$ denotes the positive example set with item i removed, and $\alpha$ is a coefficient;
step 1.1: concatenating the embedding vector $p_i$ of the queried item set with the embedding vector $q_j$ of the user-interacted item set to obtain the concatenated vector $c = [p_i; q_j]$, and taking the concatenated vector as the input of a pointwise attention model; this first attempt at the attention model is named Dot;
step 1.1.1: independently performing three linear transformations on the concatenated vector c, with coefficient matrices $W_Q$, $W_K$ and $W_V$ respectively, thereby obtaining the inputs Query, Key and Value (Q, K, V) of the attention network;
step 1.1.2: implementing the dot product of Q with the transpose of K using a highly optimized matrix multiplication, and after softmax multiplying by V to obtain a weight matrix; the Attention function Attention(Q, K, V) is calculated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

wherein $d_k$ denotes the dimension of K and the softmax function converts the values into a probability distribution; if the dimensions of Q, K and V are the same, the dimension of the output attention weight matrix is the same as theirs;
step 1.2: feeding the concatenated vector c into the network as input, repeating the preceding single-head dot-product attention h times, concatenating the h result matrices, and finally converting them to the required dimension through a linear transformation, i.e., setting the attention function as a self-attention model to calculate the weight of the contribution of historical item j to user u's score for the target item i; this variant is named Self;
step 1.3: utilizing the main framework of the Transformer model, divided into an encoder module and a decoder module, and setting the input of the first sub-module of the encoder module to the embedding vector $p_i$ of the target item to be predicted; the input of each remaining sub-module is the output of the previous sub-module; each encoder sub-module consists of two layers, the first being a self-attention layer and the second a feed-forward layer; after the attention operation, the encoder and decoder both contain a fully connected feed-forward network comprising two linear transformations and a ReLU activation:

$$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$$
The input of the first sub-module of the decoder module is the set of embedding vectors $q_j$ of the user's historically interacted items, and the input of each remaining sub-module is the output of the previous sub-module; each decoder sub-module consists of three layers: the first and second are self-attention layers, except that the input Q of the second layer is the output of the previous layer while its K and V are the outputs of the encoder; the third layer is a feed-forward layer. An "Add & Normalize" layer is added after each layer to prevent gradient vanishing or explosion while preventing overfitting; the output of the model is converted to the required size by a fully connected layer and a softmax function to obtain the attention weight $a_{ij}$ for the subsequent computation, and this model is defined as Trans;
step 1.4: customizing the objective function: observed user-item interactions are treated as positive examples, negative examples are extracted from the remaining unobserved interactions, and $R^+$ and $R^-$ represent the positive and negative example sets; using the log loss as the loss term and penalizing the embedding vectors and the coefficient and bias terms of each network with the L2 norm, the loss function is:

$$L = -\frac{1}{N}\left[\sum_{(u,i)\in R^+} \log \sigma(\hat{y}_{ui}) + \sum_{(u,j)\in R^-} \log\bigl(1-\sigma(\hat{y}_{uj})\bigr)\right] + \lambda \lVert \Theta \rVert^2$$

where N represents the total number of training examples, σ is the sigmoid function converting predicted values into probability values, the hyperparameter λ controls the strength of the L2 regularization used to prevent overfitting, and $\Theta = \{\{p_i\}, \{q_j\}, W, b, h\}$ represents all trainable parameters; W, b, h and all parameters of the linear transformations carry the regularization penalty. Adagrad, a variant of the stochastic gradient descent algorithm, optimizes the objective function by applying an adaptive learning rate to each parameter, extracting random samples from all training examples, and updating the relevant parameters in the negative direction of the gradient. A mini-batch method is used to randomly pick a user and then use all of that user's interacted items as a small batch.
Step 2: and (4) performing an experiment on the real article data set on the evaluation index, judging the performance according to the recommendation result, and comparing the experiment result with other recommendation methods.
The invention has the beneficial effects that:
the method applies a machine translation attention mechanism transformer in natural language processing to a recommendation model, performs experiments on the method provided by the invention on two real data sets of a movie and a picture, and evaluates by using two common recommendation model evaluation indexes of recall ratio and precision ratio. Based on the recall ratio, the method realizes the improvement of 3.2 percent relatively, and based on the precision ratio, the method realizes the improvement of 4.3 percent relatively, so the method can generate a more accurate personalized recommendation list for the user. The efficient recommendation system can provide an efficient and intelligent information filtering technology for the user under the condition that the user lacks experience in related fields or cannot process massive data, explores potential consumption tendency of the user, and provides personalized services for numerous users. Through recommending articles to the user accurately, the interest of the user can be improved, the browsing amount of the website, the click rate and the purchase rate are improved, and great convenience is brought to the life and leisure of the user while income is brought to the website. The better recommendation method can bring business value to the enterprise entity, optimize sales boundary and profit, help the product to expand the boundary, provide more various and more intimate experience through scene construction, and finally improve profit and the like.
Drawings
FIG. 1 is the basic framework of the attention-based ItemCF model;
FIG. 2 is a point-by-point attention model structure;
FIG. 3 is the basic framework of the Transformer model;
FIG. 4 is a comparison of the performance of the models FISM, Dot, Self, Trans at an embedding size of 16;
in panel (a), ML-1M HR; in panel (b), ML-1M NDCG; in panel (c), Pinterest-20 HR; and in panel (d), Pinterest-20 NDCG.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
An item recommendation method based on collaborative filtering comprises the following steps:
step 1: as shown in FIG. 1, u is represented by multi-hot coding, i.e., over all items that user u has interacted with in the case of implicit feedback; the multi-hot coding of the user passes through the embedding layer and generates a set of vectors, where each vector represents a historical item associated with the user, and the target item to be predicted obtains its embedding vector through the embedding layer using one-hot coding. The prediction score of user u for the target item i is then calculated: one-hot coding and the embedding layer yield the embedding vector p of the predicted target item i and the embedding vectors q of the historical items, where p marks an item as a predicted item and q marks it as a historically interacted item. As shown in FIG. 1, the attention-based ItemCF model is defined as:

$$a_{ij} = f(p_i, q_j), \qquad \hat{y}_{ui} = \left(\left|R_u^+ \setminus \{i\}\right|\right)^{-\alpha} \sum_{j \in R_u^+ \setminus \{i\}} a_{ij}\, p_i^{\top} q_j$$

wherein i is the predicted target item, j is a historically interacted item of the user, $a_{ij}$ is the weight, calculated with an attention network, of the historical interacted item in the representation of the user's preference, $p_i$ and $q_j$ respectively represent the embedding vector of the predicted item set and the embedding vector of the user-interacted item, $R_u^+$ represents the positive example set of user u, $R_u^+ \setminus \{i\}$ denotes the positive example set with item i removed, and $\alpha$ is a coefficient;
step 1.1: concatenating the embedding vector $p_i$ of the queried item set with the embedding vector $q_j$ of the user-interacted item set to obtain the concatenated vector $c = [p_i; q_j]$ from which the interaction weight is learned, and taking the concatenated vector as the input of the pointwise attention model, as shown in FIG. 2; this first attempt at the attention mechanism is named Dot;
step 1.1.1: independently performing three linear transformations on the concatenated vector c, with coefficient matrices $W_Q$, $W_K$ and $W_V$ respectively, thereby obtaining the inputs Query, Key and Value (Q, K, V) of the attention network;
step 1.1.2: computing the dot product of Q with the transpose of K, implemented as a highly optimized matrix multiplication (the dot product is faster and more space-saving than alternatives and benefits from such optimized multiplication), and scaling it by the factor $1/\sqrt{d_k}$ so that the inner products do not become too large; otherwise the softmax output saturates toward 0 or 1, causing gradient vanishing or explosion, whereas the scaling keeps the values in the region where gradients are large. The softmax function converts the values into a probability distribution, which is very friendly to gradient calculation; after softmax, multiplying by V yields a weight matrix. The Attention function Attention(Q, K, V) is calculated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

wherein $d_k$ denotes the dimension of K; if the dimensions of Q, K and V are the same, the dimension of the output attention weight matrix is the same as theirs;
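The Dot variant just described comes down to standard scaled dot-product attention. The following NumPy sketch is our own illustration (shapes and names are assumptions) of the computation and of why the $1/\sqrt{d_k}$ scaling matters:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q @ K.T / sqrt(d_k)) @ V.
    Scaling by sqrt(d_k) keeps the logits in the region where the
    softmax is not saturated, so gradients neither vanish nor explode."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (n_q, n_k) raw logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V, weights                        # output and attention map

# Toy shapes: 4 queries/keys of dimension d_k = 8, values of the same size.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)   # (4, 8) (4, 4); each attention row sums to 1
```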
step 1.2: feeding the concatenated vector c into the network as input, repeating the preceding single-head dot-product attention h times, concatenating the h result matrices, and finally converting them to the required dimension through a linear transformation, i.e., setting the attention function as a self-attention model to calculate the weight of the contribution of historical item j to user u's score for the target item i; this variant is named Self.
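A sketch of the Self variant as we read step 1.2, with hypothetical per-head projection matrices: h single-head attentions run independently, their outputs are concatenated and mapped back to the required dimension:

```python
import numpy as np

def attention(Q, K, V):
    """Single-head scaled dot-product attention, as in step 1.1.2."""
    s = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

def multi_head(c, W_Q, W_K, W_V, W_O):
    """h independent heads over the concatenated vectors c; the head
    outputs are concatenated and mapped back by one linear transform W_O."""
    heads = [attention(c @ wq, c @ wk, c @ wv)
             for wq, wk, wv in zip(W_Q, W_K, W_V)]
    return np.concatenate(heads, axis=-1) @ W_O

rng = np.random.default_rng(1)
n, d, d_h, h = 4, 16, 8, 2
c = rng.normal(size=(n, d))                              # input vectors
W_Q, W_K, W_V = ([rng.normal(size=(d, d_h)) for _ in range(h)] for _ in range(3))
W_O = rng.normal(size=(h * d_h, d))                      # back to model size
print(multi_head(c, W_Q, W_K, W_V, W_O).shape)           # (4, 16)
```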
Step 1.3: the main framework of the transform model is mainly divided into an encoder module and a decoder module as shown in FIG. 3, wherein the input of a first sub-module of the encoder module is set as an embedded vector p of a target object to be predicted
iThe input of each remaining submodule is the output of the previous submodule, each encoder submodule is composed of two layers, the first layer is a self-Attention model layer, the second layer is a feedback layer, after the extension operation, the encoder and decoder both contain a fully connected forward network, including two linear transformations and a Relu activation output, and the formula is as follows:
FFN(x)=max(0,xW
1+b
1)W
2+b
2
The input of the first sub-module of the decoder module is the set of embedding vectors $q_j$ of the user's historically interacted items, which greatly enhances the interpretability of the model; the input of each remaining sub-module is the output of the previous sub-module. Each decoder sub-module consists of three layers: the first and second are self-attention layers, except that the input Q of the second layer is the output of the previous layer while its K and V are the outputs of the encoder; the third layer is a feed-forward layer. An "Add & Normalize" layer is added after each layer to prevent gradient vanishing or explosion while preventing overfitting. The output of the model is converted to the required size by a fully connected layer and a softmax function to obtain the attention weight $a_{ij}$ for the subsequent computation; this model is defined as Trans;
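One encoder sub-module of step 1.3 might be sketched as follows in TensorFlow (the framework this embodiment is implemented with, see below); the layer sizes and class name are illustrative assumptions, not the patent's implementation:

```python
import tensorflow as tf

class EncoderBlock(tf.keras.layers.Layer):
    """One encoder sub-module as we read step 1.3: self-attention, then
    FFN(x) = max(0, x W1 + b1) W2 + b2, each followed by Add & Normalize."""
    def __init__(self, d_model=16, num_heads=2, d_ff=64):
        super().__init__()
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(d_ff, activation="relu"),  # max(0, x W1 + b1)
            tf.keras.layers.Dense(d_model),                  # ... W2 + b2
        ])
        self.norm1 = tf.keras.layers.LayerNormalization()
        self.norm2 = tf.keras.layers.LayerNormalization()

    def call(self, x):
        x = self.norm1(x + self.attn(x, x))   # self-attention + Add & Normalize
        return self.norm2(x + self.ffn(x))    # feed-forward + Add & Normalize

# Illustrative input: a batch of one sequence of 5 item embeddings, d_model = 16.
p = tf.random.normal((1, 5, 16))
print(EncoderBlock()(p).shape)                # (1, 5, 16)
```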
step 1.4: customizing the objective function: observed user-item interactions are treated as positive examples, negative examples are extracted from the remaining unobserved interactions, and $R^+$ and $R^-$ represent the positive and negative example sets. Using the cross-entropy (log) loss as the objective and penalizing the embedding vectors and the coefficient and bias terms of each network with the L2 norm, the objective function is:

$$L = -\frac{1}{N}\left[\sum_{(u,i)\in R^+} \log \sigma(\hat{y}_{ui}) + \sum_{(u,j)\in R^-} \log\bigl(1-\sigma(\hat{y}_{uj})\bigr)\right] + \lambda \lVert \Theta \rVert^2$$

where N represents the total number of training examples, σ is the sigmoid function converting a predicted value into the probability that user u will interact with item i, the hyperparameter λ controls the strength of the L2 regularization used to prevent overfitting, and $\Theta = \{\{p_i\}, \{q_j\}, W, b, h\}$ represents all trainable parameters; W, b, h and all parameters of the linear transformations carry the regularization penalty. Adagrad, a variant of the stochastic gradient descent algorithm, optimizes the objective function by applying an adaptive learning rate to each parameter, extracting random samples from all training examples, and updating the relevant parameters in the negative direction of the gradient.
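A sketch of the step 1.4 objective in TensorFlow, assuming the predicted scores for the positive and sampled negative examples have already been computed; the function and symbol names are ours:

```python
import tensorflow as tf

def loss_fn(y_hat_pos, y_hat_neg, params, lam=1e-5):
    """Log loss over R+ and the sampled R-, plus lam * ||Theta||^2.
    y_hat_pos / y_hat_neg are predicted scores for positive / negative
    examples; params is the list of trainable tensors to regularize."""
    pos = tf.math.log(tf.sigmoid(y_hat_pos))         # log sigma(y_ui), (u,i) in R+
    neg = tf.math.log(1.0 - tf.sigmoid(y_hat_neg))   # log(1 - sigma(y_uj)), (u,j) in R-
    n = tf.cast(tf.size(y_hat_pos) + tf.size(y_hat_neg), tf.float32)
    l2 = tf.add_n([tf.reduce_sum(tf.square(p)) for p in params])
    return -(tf.reduce_sum(pos) + tf.reduce_sum(neg)) / n + lam * l2

# Adagrad applies a per-parameter adaptive learning rate, as described above.
optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.01)

# Toy check with dummy scores and one dummy parameter tensor.
print(loss_fn(tf.constant([2.0, 1.5]), tf.constant([-1.0]),
              [tf.ones((2, 2))]).numpy())
```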
The present embodiment implements all models using TensorFlow, which requires that all training instances in a batch have the same length. Since some active users may have interacted with thousands of items, a sampled mini-batch training set can still be very large. To solve this problem, this embodiment uses a mini-batch method that randomly picks one user and then uses all of that user's interacted items as a small batch, instead of randomly drawing a fixed number of training examples as a small training set. This approach has two advantages: 1) no masking trick is needed, so it is faster; 2) no batch size has to be specified, which avoids tuning the batch size. If the attention network and the item embedding vectors are trained simultaneously, the output of the attention network changes the item embeddings, so joint training easily causes a co-adaptation effect and slows convergence. To solve this practical problem in model training, the present embodiment pre-trains the model with the FISM algorithm proposed by Kabbur et al. and initializes it with the item embedding vectors learned by FISM. Since the FISM algorithm has no co-adaptation problem, it can learn the similarity of the embedded items well; initializing the model with FISM therefore greatly assists the learning of the attention network, yielding better performance and fast convergence.
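The user-as-batch sampling described above might look like the following sketch; the dict-of-lists data layout is a hypothetical choice of ours:

```python
import random

def user_minibatches(user_items, epochs=1):
    """Pick users in random order and yield each user's full set of
    interacted items as one batch, so no padding/masking is needed and
    the batch size never has to be fixed in advance."""
    users = list(user_items)
    for _ in range(epochs):
        random.shuffle(users)
        for u in users:
            yield u, user_items[u]          # all items u interacted with

# Toy interaction lists keyed by user id.
interactions = {0: [3, 7, 9], 1: [2], 2: [1, 4, 5, 8, 11]}
for u, batch in user_minibatches(interactions):
    print(u, batch)
```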
Step 2: and (4) performing an experiment on the real article data set on the evaluation index, judging the performance according to the recommendation result, and comparing the experiment result with other recommendation methods.
The embodiment gives a weight to each item in the user's interaction history, so that when predicting the user's score for a target item the historical item set represents the user's preference more accurately, improving the recommendation quality and making personalized recommendation more precise; these improvements are attributed to an effective attention mechanism introduced to distinguish the importance of historical items in the user representation. We performed comprehensive Top-K evaluation experiments on two real item data sets, ML-1M and Pinterest-20, with the evaluation indices HR and NDCG. Performance is measured by the Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) of the first 10 positions of the recommendation list; these two indicators are widely used for evaluating Top-K recommendation and in the information retrieval literature. HR@10 can be interpreted as a recall-based metric, the percentage of users served successfully (i.e., the positive item appears in the top 10), while NDCG@10 additionally accounts for the predicted position of the positive item; for both metrics, larger values are better.
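For reference, HR@10 and NDCG@10 for a single test user with one positive item can be computed as below; this is the standard leave-one-out formulation, which the patent describes in prose rather than formulas:

```python
import math

def hr_at_k(rank: int, k: int = 10) -> float:
    """Hit Ratio: 1 if the positive test item appears in the top-k list."""
    return 1.0 if rank < k else 0.0

def ndcg_at_k(rank: int, k: int = 10) -> float:
    """NDCG credits a hit more the closer it is to the top of the list:
    log2-discounted gain at the predicted position of the positive item."""
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0

# Example: positive item ranked 4th (0-based rank 3) in the top-10 list.
print(hr_at_k(3), round(ndcg_at_k(3), 4))   # 1.0 0.4307
```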
We compare the experimental results with several popular recommendation methods. For the embedding-based methods (MF, MLP, FISM, and the models herein) the embedding size controls the modeling capability, so we set it to 16 for all methods. The results are shown in Table 1. The attention-based models all achieve better results, and their final results are similar: they obtain the highest NDCG and HR scores on all data sets. On the ML-1M data set, all three models improve on FISM to some extent; the Self model improves on FISM by 3.1% in HR and 4.3% in NDCG, possibly because a model of relatively simple structure captures user features more fully on a less sparse data set and characterizes user preference well. On Pinterest-20, Trans outperforms the other two, reaching the highest score and improving on FISM by 3.2% in NDCG, probably because a deeper network captures sparse data better. The learning-based collaborative filtering methods generally perform much better than heuristic ones such as Pop and ItemKNN; in particular, FISM scores much higher than ItemKNN. Since the two methods use the same predictive model and differ only in how item similarity is estimated, the positive impact of customized optimization on the recommendations is clearly visible.
Table 1 Comparison of experimental results
As shown in FIG. 4, with an item embedding size of 16, the per-epoch performance of FISM and of the Dot, Self and Trans models proposed in this application is plotted. The three models reach the highest HR and NDCG scores on both data sets, attain the same performance level, and achieve a significant improvement over FISM. We believe these advantages are due to the efficient design of the attention network when learning item-to-item interactions. Even at the first epoch the performance of our models significantly exceeds FISM, and as training proceeds the experimental results keep improving until convergence.
Based on the above discussion, this research on item-based collaborative filtering algorithms connects various attention models to improve the learning of the dynamic weight coefficients, implements and tests them, and achieves better results than the compared models. The main contributions are: (1) demonstrating that the attention mechanism helps capture the dynamic weight of a new item's contribution to the similarity computation against the set of historical items the user has interacted with, making personalized recommendation more accurate; (2) computing attention scores with both pointwise attention and self-attention, with good results; (3) combining the Transformer model with the recommendation algorithm and comparing it with conventional embedding models, showing an improvement in recommendation quality.
Finally, it should be noted that the above examples are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions as defined in the appended claims.
Claims (2)
1. An item recommendation method based on collaborative filtering, characterized in that the method comprises the following steps:
step 1: calculating the prediction score of user u for a target item i: using one-hot coding, obtaining through the embedding layer the embedding vector p of the predicted target item i and the embedding vectors q of the historical interacted items, where p marks an item as a predicted item and q marks it as a historically interacted item, and obtaining the evaluation score of the item; the attention-based ItemCF model is defined as:

$$a_{ij} = f(p_i, q_j), \qquad \hat{y}_{ui} = \left(\left|R_u^+ \setminus \{i\}\right|\right)^{-\alpha} \sum_{j \in R_u^+ \setminus \{i\}} a_{ij}\, p_i^{\top} q_j$$

wherein i is the predicted target item, j is a historically interacted item of the user, $a_{ij}$ is the weight, calculated with an attention network, of the historical interacted item in the representation of the user's preference, $p_i$ and $q_j$ respectively represent the embedding vector of the predicted item set and the embedding vector of the user-interacted item, $R_u^+$ represents the positive example set of user u, $R_u^+ \setminus \{i\}$ denotes the positive example set with item i removed, and $\alpha$ is a coefficient;
step 2: conducting experiments on real item data sets with the chosen evaluation indices, judging performance from the recommendation results, and comparing the experimental results with other recommendation methods.
2. The collaborative filtering-based item recommendation method according to claim 1, characterized in that step 1 specifically comprises:
step 1.1: concatenating the embedding vector $p_i$ of the predicted item set with the embedding vector $q_j$ of the user-interacted item set to obtain the concatenated vector $c = [p_i; q_j]$, and taking the concatenated vector as the input of a pointwise attention model; this first attempt at the attention model is named Dot;
step 1.1.1: independently performing three linear transformations on the concatenated vector c, with coefficient matrices $W_Q$, $W_K$ and $W_V$ respectively, thereby obtaining the inputs Query, Key and Value (Q, K, V) of the attention network;
step 1.1.2: implementing the dot product of Q with the transpose of K using a highly optimized matrix multiplication, and after softmax multiplying by V to obtain a weight matrix; the Attention function Attention(Q, K, V) is calculated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

wherein $d_k$ denotes the dimension of K and the softmax function converts the values into a probability distribution; if the dimensions of Q, K and V are the same, the dimension of the output attention weight matrix is the same as theirs;
step 1.2: feeding the concatenated vector c into the network as input, repeating the preceding single-head dot-product attention h times, concatenating the h result matrices, and finally converting them to the required dimension through a linear transformation, i.e., setting the attention function as a self-attention model to calculate the weight of the contribution of historical item j to user u's score for the target item i; this variant is named Self;
step 1.3: utilizing the main framework of the Transformer model, divided into an encoder module and a decoder module, and setting the input of the first sub-module of the encoder module to the embedding vector $p_i$ of the target item to be predicted; the input of each remaining sub-module is the output of the previous sub-module; each encoder sub-module consists of two layers, the first being a self-attention layer and the second a feed-forward layer; after the attention operation, the encoder and decoder both contain a fully connected feed-forward network comprising two linear transformations and a ReLU activation:

$$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$$
The input of the first sub-module of the decoder module is the set of embedding vectors $q_j$ of the user's historically interacted items, and the input of each remaining sub-module is the output of the previous sub-module; each decoder sub-module consists of three layers: the first and second are self-attention layers, except that the input Q of the second layer is the output of the previous layer while its K and V are the outputs of the encoder; the third layer is a feed-forward layer. An "Add & Normalize" layer is added after each layer to prevent gradient vanishing or explosion while preventing overfitting; the output of the model is converted to the required size by a fully connected layer and a softmax function to obtain the attention weight $a_{ij}$ for the subsequent computation, and this model is defined as Trans;
step 1.4: customizing the objective function: observed user-item interactions are treated as positive examples, negative examples are extracted from the remaining unobserved interactions, and $R^+$ and $R^-$ represent the positive and negative example sets; using the log loss as the loss term and penalizing the embedding vectors and the coefficient and bias terms of each network with the L2 norm, the loss function is:

$$L = -\frac{1}{N}\left[\sum_{(u,i)\in R^+} \log \sigma(\hat{y}_{ui}) + \sum_{(u,j)\in R^-} \log\bigl(1-\sigma(\hat{y}_{uj})\bigr)\right] + \lambda \lVert \Theta \rVert^2$$

wherein N represents the total number of training examples, σ is the sigmoid function converting predicted values into probability values, the hyperparameter λ controls the strength of the L2 regularization used to prevent overfitting, and $\Theta = \{\{p_i\}, \{q_j\}, W, b, h\}$ represents all trainable parameters, with W, b, h and all parameters of the linear transformations carrying the regularization penalty; Adagrad, a variant of the stochastic gradient descent algorithm, optimizes the objective function by applying an adaptive learning rate to each parameter, extracting random samples from all training examples, and updating the relevant parameters in the negative direction of the gradient; a mini-batch method randomly picks a user and then uses all of that user's interacted items as a small batch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911022328.2A CN110781409B (en) | 2019-10-25 | 2019-10-25 | Article recommendation method based on collaborative filtering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110781409A true CN110781409A (en) | 2020-02-11 |
CN110781409B CN110781409B (en) | 2022-02-01 |
Family
ID=69388037
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911022328.2A Active CN110781409B (en) | 2019-10-25 | 2019-10-25 | Article recommendation method based on collaborative filtering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110781409B (en) |
- 2019-10-25: CN application CN201911022328.2A granted as patent CN110781409B (status: active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018212710A1 (en) * | 2017-05-19 | 2018-11-22 | National University Of Singapore | Predictive analysis methods and systems |
CN109087130A (en) * | 2018-07-17 | 2018-12-25 | 深圳先进技术研究院 | A kind of recommender system and recommended method based on attention mechanism |
CN109299396A (en) * | 2018-11-28 | 2019-02-01 | 东北师范大学 | Merge the convolutional neural networks collaborative filtering recommending method and system of attention model |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111737573A (en) * | 2020-06-17 | 2020-10-02 | 北京三快在线科技有限公司 | Resource recommendation method, device, equipment and storage medium |
CN112182156A (en) * | 2020-09-28 | 2021-01-05 | 齐鲁工业大学 | Aspect-level interpretable deep network scoring prediction recommendation method based on text processing |
CN112182156B (en) * | 2020-09-28 | 2023-02-07 | 齐鲁工业大学 | Aspect-level interpretable deep network scoring prediction recommendation method based on text processing |
CN112328908A (en) * | 2020-11-11 | 2021-02-05 | 北京工业大学 | Personalized recommendation method based on collaborative filtering |
CN112529414A (en) * | 2020-12-11 | 2021-03-19 | 西安电子科技大学 | Article scoring method based on multitask neural collaborative filtering network |
CN112529414B (en) * | 2020-12-11 | 2023-08-01 | 西安电子科技大学 | Article scoring method based on multi-task neural collaborative filtering network |
CN112784153A (en) * | 2020-12-31 | 2021-05-11 | 山西大学 | Tourist attraction recommendation method integrating attribute feature attention and heterogeneous type information |
CN112784153B (en) * | 2020-12-31 | 2022-09-20 | 山西大学 | Tourist attraction recommendation method integrating attribute feature attention and heterogeneous type information |
CN113158024B (en) * | 2021-02-26 | 2022-07-15 | 中国科学技术大学 | Causal reasoning method for correcting popularity deviation of recommendation system |
CN113158024A (en) * | 2021-02-26 | 2021-07-23 | 中国科学技术大学 | Causal reasoning method for correcting popularity deviation of recommendation system |
CN112967101A (en) * | 2021-04-07 | 2021-06-15 | 重庆大学 | Collaborative filtering article recommendation method based on multi-interaction information of social users |
WO2023020185A1 (en) * | 2021-08-18 | 2023-02-23 | 华为技术有限公司 | Image classification method and related device |
CN114548864A (en) * | 2022-02-15 | 2022-05-27 | 南京邮电大学 | Goods source recommendation method based on graph attention machine mechanism reinforcement learning |
CN115309975A (en) * | 2022-06-28 | 2022-11-08 | 中银金融科技有限公司 | Product recommendation method and system based on interactive features |
CN115309975B (en) * | 2022-06-28 | 2024-06-07 | 中银金融科技有限公司 | Product recommendation method and system based on interaction characteristics |
CN118656550A (en) * | 2024-08-16 | 2024-09-17 | 数据空间研究院 | Location-aware collaborative recommendation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110781409B (en) | 2022-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110781409B (en) | Article recommendation method based on collaborative filtering | |
Wu et al. | Session-based recommendation with graph neural networks | |
CN111538912B (en) | Content recommendation method, device, equipment and readable storage medium | |
CN109299396B (en) | Convolutional neural network collaborative filtering recommendation method and system fusing attention model | |
CN111797321B (en) | Personalized knowledge recommendation method and system for different scenes | |
CN110717098B (en) | Meta-path-based context-aware user modeling method and sequence recommendation method | |
Karatzoglou et al. | Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering | |
Wu et al. | Improving performance of tensor-based context-aware recommenders using bias tensor factorization with context feature auto-encoding | |
CN111737578B (en) | Recommendation method and system | |
CN112364976A (en) | User preference prediction method based on session recommendation system | |
CN111581520A (en) | Item recommendation method and system based on item importance in session | |
CN114202061A (en) | Article recommendation method, electronic device and medium based on generation of confrontation network model and deep reinforcement learning | |
Chen et al. | Generative inverse deep reinforcement learning for online recommendation | |
CN111178986B (en) | User-commodity preference prediction method and system | |
Chen et al. | Session-based recommendation: Learning multi-dimension interests via a multi-head attention graph neural network | |
CN116228368A (en) | Advertisement click rate prediction method based on deep multi-behavior network | |
CN114741590B (en) | Multi-interest recommendation method based on self-attention routing and Transformer | |
CN110851705A (en) | Project-based collaborative storage recommendation method and recommendation device thereof | |
CN117216281A (en) | Knowledge graph-based user interest diffusion recommendation method and system | |
CN114781503A (en) | Click rate estimation method based on depth feature fusion | |
Jang et al. | Attention-based multi attribute matrix factorization for enhanced recommendation performance | |
CN113763031A (en) | Commodity recommendation method and device, electronic equipment and storage medium | |
Wen et al. | Extended factorization machines for sequential recommendation | |
CN115687757A (en) | Recommendation method fusing hierarchical attention and feature interaction and application system thereof | |
CN110956528B (en) | Recommendation method and system for e-commerce platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||