1. Introduction
Traditional recommendation algorithms based on graph neural networks (GNNs) primarily learn user and item embedding representations from user–item interaction graphs [1]. The core principle involves utilizing the graph’s topological structure to capture similar features of higher-order neighbors. However, user–item interaction data is often sparse, leading to inaccurate embedding representations for both users and items. To address this issue, research has introduced auxiliary information, such as user or item attributes, into recommendation models to alleviate the adverse effects of sparse interaction data [2,3]. Most recommendation studies incorporate either user attributes or item attributes as the sole auxiliary information for modeling. These attributes act as bridging elements, enabling the capture of more similar features within the graph. Some studies have employed both user and item attributes for modeling [4], yet these models generally apply linear weighting to the attribute nodes, resulting in composite attribute representations. This simplistic approach assumes that attribute nodes function independently, thereby overlooking the interactions between attributes. As a result, the intricate relationships between user and item attributes are insufficiently explored, which ultimately hinders improvements in recommendation performance.
In graph-based recommendation models, researchers have increasingly introduced attributes as entity nodes to address the challenge of data sparsity. However, some studies [2,5] have neglected to distinguish between the types of neighboring nodes during the graph convolution process, overlooking the unique influences that different types of neighboring nodes can have on the target node. While other studies [4,6] have considered the interactions between attribute features, they often allow user–item attribute nodes to directly interact with user–item nodes during graph convolution. This approach, however, fails to account for the fact that different node types reside in distinct feature spaces. As a result, calculating node similarity in this manner can lead to inaccurate representations of user preferences. These inaccuracies, in turn, introduce biases into user preference modeling, which ultimately degrade the performance of the recommendation method.
To address the issues mentioned above, this paper proposes an attribute-aware graph convolutional recommendation method (AAGCNR). The model consists of several layers: an input layer, an attribute feature interaction layer, a user attribute preference mining layer, a user attribute preference fusion layer, an attribute collaborative convolution layer, and a prediction layer. First, the model applies separate embedding operations to the user attribute graph, the item attribute graph, and the user–item interaction graph to obtain the initial embedding representations for users, items, user attribute nodes, and item attribute nodes. Second, a multi-head self-attention mechanism is employed to capture complex semantic information from the user and item attribute features, considering their interactions. This mechanism allows the model to process the attribute information in both the user attribute space and the item attribute space. Linear and nonlinear interaction features between these attributes are then fused, resulting in user-fused attribute embedding representations and item-fused attribute embedding representations, which enhance the embedding quality by capturing fine-grained, complex relationships between attributes. Next, to better differentiate between the types of neighboring nodes during the graph convolution process, the model employs another self-attention mechanism. This mechanism models the influences of both group preferences and personalized preferences on users, thereby improving the accuracy of user–item interaction behavior during the convolution. By incorporating user preference features directly into the graph convolution process, the model enhances the embedding representations of both users and items. Finally, score prediction is carried out through an inner product operation based on the refined embeddings.
The primary contributions of this paper are as follows:
1. To address the feature interaction problem between attribute nodes in the graph, we first introduce user attribute features and item attribute features into the graph. We then model the interaction between these user and item attributes using a multi-head self-attention mechanism and a bilinear interaction module.
2. To address the impact of neighborhood node types on target nodes, we enhance the modeling of user preference features for item attributes. This approach guides the interaction between user and item nodes during the graph convolution process based on the users’ preferences for item attributes.
3. Methods
This paper introduces the attribute-aware graph convolutional network recommendation method (AAGCNR) in detail, with the model framework illustrated in Figure 1. The model is structured with several key components: an input layer, an attribute feature interaction layer, a user attribute preference mining layer, a user attribute preference fusion layer, an attribute collaborative convolution layer, and a prediction layer. First, embedding operations are applied separately to the user attribute graph, the item attribute graph, and the user–item interaction graph to generate the initial embeddings of user nodes, item nodes, user attribute nodes, and item attribute nodes. Next, the interactions between user attribute features and item attribute features are considered. A multi-head self-attention mechanism is utilized to capture the complex semantic information within both the user attribute space and the item attribute space. The linear and nonlinear interaction features between these attributes are then fused, resulting in user-fused attribute embeddings and item-fused attribute embeddings. By capturing the intricate relationships between attributes at a fine-grained level, the embeddings of users and items are refined further. Finally, a self-attention mechanism is employed to model the influence of group preferences and personalized preferences on users. This mechanism enhances the graph convolution process by incorporating user preference features, improving the accuracy of user–item interaction predictions. The final step involves score prediction, which is carried out through an inner product operation based on the optimized user and item embeddings.
3.1. Node Embedding and Feature Mapping
After preprocessing, the input variables are fed into the initialization mapping layer to obtain embedded representations for each variable. The dataset involved in the model includes variables such as the user ID u, the item ID v, the user attributes, and the item attributes. After preprocessing, these variables are numerically encoded as consecutive integers starting from zero. The encoded variables are then used as inputs to the initialization mapping layer, which derives the initial vector representation of each variable in the latent space by selecting the row of the initial embedding matrix that corresponds to its encoded value.
The initialization mapping layer holds one matrix for each variable type: users u, items v, user attributes, and item attributes. The entries of these matrices are initialized by random sampling from a normal distribution with a given mean and standard deviation. Here, m represents the number of users, n represents the number of items, p denotes the number of user attribute values, and q denotes the number of item attribute values. d signifies the dimensionality of the latent space vectors for users and items, while the latent space vectors for user attributes and item attributes have their own dimensionality. After encoding, the i-th user corresponds to the i-th row of matrix U, the j-th item corresponds to the j-th row of matrix V, and the k-th attribute corresponds to the k-th row of matrix X.
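The lookup described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names `init_embedding` and `lookup`, the sizes, the seeds, and the standard deviation of 0.01 are all assumptions chosen for the example.

```python
import random


def init_embedding(num_rows, dim, mean=0.0, std=0.01, seed=0):
    """Create a num_rows x dim matrix sampled from a normal distribution."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, std) for _ in range(dim)] for _ in range(num_rows)]


def lookup(matrix, index):
    """Return the row of the embedding matrix selected by an encoded ID."""
    return matrix[index]


# Hypothetical sizes: m users, n items, p user attribute values,
# d for user/item vectors, d_attr for attribute vectors.
m, n, p, d, d_attr = 4, 5, 3, 8, 6
U = init_embedding(m, d, seed=1)       # user embedding matrix
V = init_embedding(n, d, seed=2)       # item embedding matrix
X = init_embedding(p, d_attr, seed=3)  # attribute embedding matrix

user_vector = lookup(U, 2)  # initial embedding of the user encoded as 2
```

Because IDs are encoded as consecutive integers starting from zero, the encoded value can be used directly as a row index.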
3.2. The Attribute Feature Interaction Layer
3.2.1. The Self-Attention Mechanism Layer
After the input variables are processed by the initialization mapping layer in the previous section, the AAGCN model employs a multi-head self-attention mechanism in the self-attention layer. This approach is taken because user attributes and item attributes do not reside in the same semantic space. Consequently, the multi-head self-attention mechanism captures the semantic features between user and item attributes across different semantic spaces, thereby enriching the semantic representation of both user and item attributes.
Each user has a set of user attributes as input, with p representing the number of user attributes. Similarly, each item has a set of item attributes as input, with q representing the number of item attributes. Each attribute is represented as a latent vector. To capture the correlation coefficients between these attributes, the initialized user attribute vectors and the initialized item attribute vectors are used as inputs for the multi-head self-attention mechanism. The attribute initialization embedding vectors x are transformed into per-head embedding representations, where h denotes the number of attention heads.
First, the correlation weight coefficient between the i-th attribute vector and the j-th attribute vector is computed under the h-th attention head.
Next, the representation of the i-th attribute under the h-th attention head is obtained through a weighted linear combination of the attribute vectors, using these coefficients together with the weight matrix of the h-th attention head.
Thus, different interactions between pairs of attribute vectors are constructed under the various attention heads. The representations obtained under the different attention heads are then concatenated into a single vector. Here, h represents the number of heads in the multi-head self-attention mechanism. To prevent the loss of original information in the attribute vectors, the original attribute vector is incorporated into the attention-concatenated vector, and a ReLU activation function is applied to the result.
The above-described multi-head self-attention mechanism integrates the correlations between user and item attributes into their embedding representations. This process enriches the embedding representations of both the user and item attributes, thereby enhancing their expressive power.
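As a reading aid, the steps of this layer can be written in one common form of multi-head self-attention with a residual connection. This is a sketch under assumptions and not necessarily the model's exact equations: the symbols $x_i$ (the i-th attribute vector), $W_Q^{(h)}$, $W_K^{(h)}$, $W_V^{(h)}$ (per-head projection matrices; the text mentions only one per-head weight matrix), $W_{Res}$ (a projection that aligns dimensions), and $H$ (the number of heads, written h in the text) are introduced here for illustration.

```latex
% Attention weight between attributes i and j under head h (assumed dot-product scoring)
\alpha_{i,j}^{(h)} = \frac{\exp\left(\langle W_Q^{(h)} x_i,\; W_K^{(h)} x_j \rangle\right)}
                          {\sum_{k} \exp\left(\langle W_Q^{(h)} x_i,\; W_K^{(h)} x_k \rangle\right)}

% Per-head representation: weighted linear combination of projected attribute vectors
\tilde{x}_i^{(h)} = \sum_{j} \alpha_{i,j}^{(h)} \, W_V^{(h)} x_j

% Concatenation over the H heads
\tilde{x}_i = \tilde{x}_i^{(1)} \oplus \tilde{x}_i^{(2)} \oplus \cdots \oplus \tilde{x}_i^{(H)}

% Residual connection that keeps the original attribute information
x_i^{Res} = \mathrm{ReLU}\left(\tilde{x}_i + W_{Res}\, x_i\right)
```

Under this reading, the softmax makes the weights for each attribute i sum to one, and the residual term is what preserves the original attribute vector.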
3.2.2. The Bilinear Interaction Layer
In the previous section, the multi-head self-attention mechanism was used to capture the semantic features between user attributes and item attributes, integrating the semantics into the attribute embedding representations. Inspired by the NFM [20], a feature cross-pooling function is designed to capture the second-order interactions between features. It is then fused with a first-order linear weighting function to capture the interaction relationships between features. After these two modules, a fully connected layer is added to further model the complex nonlinear interactions between user attributes and item attributes.
In the NFM formulation, the prediction combines a bias term, a weighted sum over the N features in which each feature has its own weight value, and the feature cross-pooling function, in which ⊙ denotes the element-wise product of two vectors.
Therefore, after obtaining the output from the multi-head self-attention layer in this model, a linear aggregation function is employed to derive the linear representations of the user attributes and the item attributes, respectively.
In this context, p denotes the number of attributes associated with user u; each of these user attributes has an attention vector and a respective weight value. Likewise, q denotes the number of attributes associated with item v; each of these item attributes has an attention vector and a respective weight value.
Next, the embedding representations of the interactions between pairs of user attributes and between pairs of item attributes are obtained using an attribute interaction pooling function, as shown in Equations (10) and (11).
In this context, ⊙ denotes the element-wise product.
After obtaining the first-order linear representation and the second-order nonlinear representation of the user attributes, and likewise the first-order linear representation and the second-order nonlinear representation of the item attributes, the two representations of the user attributes and the two representations of the item attributes are each passed through a fully connected network. This yields the higher-order nonlinear embedding representations of the user attributes and of the item attributes.
The fully connected layer is parameterized by a weight matrix and a bias term.
By implementing the aforementioned methods, a reduction in the dimensionality of the embedding representations was achieved. Furthermore, the high-order relationships between user attributes and item attributes were thoroughly explored, resulting in more accurate and comprehensive user–item embeddings.
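The first-order aggregation and the second-order pooling can be sketched as follows. The sketch assumes the NFM-style pooling that sums the element-wise products of all attribute pairs; the function names and the toy vectors are illustrative, and the final fully connected layer is left out.

```python
from itertools import combinations


def linear_aggregate(vectors, weights):
    """First-order term: weighted sum of the attribute vectors."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def bi_interaction_pairwise(vectors):
    """Second-order term: sum of element-wise products over all attribute pairs."""
    dim = len(vectors[0])
    pooled = [0.0] * dim
    for a, b in combinations(vectors, 2):
        for k in range(dim):
            pooled[k] += a[k] * b[k]
    return pooled


def bi_interaction_fast(vectors):
    """Same result in linear time: 0.5 * ((sum v)^2 - sum v^2), per dimension."""
    dim = len(vectors[0])
    pooled = []
    for k in range(dim):
        total = sum(v[k] for v in vectors)
        total_sq = sum(v[k] ** 2 for v in vectors)
        pooled.append(0.5 * (total * total - total_sq))
    return pooled


# Three toy attribute vectors of dimension 2 (made-up values).
attrs = [[1.0, 2.0], [3.0, 0.5], [-1.0, 4.0]]
first_order = linear_aggregate(attrs, [0.5, 0.25, 0.25])
second_order = bi_interaction_fast(attrs)
```

The fast variant relies on the identity that the sum over pairs equals half of the squared sum minus the sum of squares, which avoids enumerating every pair.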
3.3. The User Attribute Preference Mining Layer
The preference of a user for an item cannot be comprehensively described solely based on personalized attribute preferences. This limitation is particularly evident when dealing with users with a limited interaction history. Conversely, expressing user preferences exclusively through the preferences of the user group to which they belong lacks personalization. The user ID and item ID embeddings initialized by the mapping layer are therefore input into the user preference mining module. This user preference modeling aims to enhance the precision of the interactions between nodes during the convolution process, resulting in more accurate user and item embedding representations. For example, if user u has three attributes, Age, Gender, and Occupation, then it is first necessary to identify the user groups that share the same Age attribute, the same Gender attribute, and the same Occupation attribute as u. Subsequently, the personalized preferences for the item attributes within each user attribute group are statistically analyzed and normalized to obtain the user attribute group preference weights.
3.3.1. Modeling Users’ Personalized Preferences
First, the frequency of attributes in the historical interaction items of user u is statistically analyzed and normalized to obtain the personalized preference weights of user u for the item attributes. Suppose user u has interacted with n items in the past. Each of these items corresponds to an original attribute vector that represents all the attribute nodes of the item, where q is the number of item attributes. The personalized preference weights of users for item attributes can be calculated using Equations (14) and (15).
The method for calculating the attribute weights is divided into two cases based on the data type of the item attribute values: boolean attributes, and multi-valued attributes (continuous or discrete).
First, the personalized preference weights of user u for boolean item attributes are calculated using Equation (14), where n represents the number of historical interaction items v for user u, and an indicator term distinguishes whether item v has the attribute in question or does not have it.
Second, because the continuous attribute (Release Year), the Actor attribute, and the Director attribute have multiple attribute values, Equation (15) is used to calculate the personalized preference of user u for the j-th attribute value under each such attribute type of the items v. In this context, n represents the number of items v that user u has historically interacted with, and an indicator term distinguishes the items in v that have the given attribute value under the given attribute type from the items that do not.
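A minimal sketch of both cases follows, assuming that each weight is the fraction of the user's n historical items that carry the attribute (boolean case) or the attribute value (multi-valued case). The record layout with `flags` and `values` keys and the toy history are hypothetical, and the exact normalization in Equations (14) and (15) may differ.

```python
from collections import Counter


def boolean_preference(history, attribute):
    """Fraction of interacted items that have a boolean attribute (e.g. a genre)."""
    hits = sum(1 for item in history if attribute in item["flags"])
    return hits / len(history)


def valued_preference(history, attribute_type):
    """Fraction of interacted items carrying each value of a multi-valued attribute."""
    counts = Counter()
    for item in history:
        for value in item["values"].get(attribute_type, []):
            counts[value] += 1
    n = len(history)
    return {value: count / n for value, count in counts.items()}


# Toy interaction history of one user (made-up items).
history = [
    {"flags": {"Comedy"}, "values": {"Director": ["A"]}},
    {"flags": {"Comedy", "Drama"}, "values": {"Director": ["B"]}},
    {"flags": {"Drama"}, "values": {"Director": ["A"]}},
    {"flags": {"Comedy"}, "values": {"Director": ["A"]}},
]
comedy_weight = boolean_preference(history, "Comedy")
director_weights = valued_preference(history, "Director")
```

In this toy history, three of the four items are comedies, so the Comedy weight is 0.75; director A appears in three items and director B in one.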
3.3.2. Group Preferences Based on User Attributes
First, a second-order neighborhood hop is performed on the user attribute graph, which yields the related attribute group for each user attribute.
Based on the personalized attribute preference weights computed above, the personalized preferences of the users within each attribute group are statistically analyzed and normalized. This process yields the group attribute preference weights of each attribute group with respect to the item attributes. The normalization methods for the attribute preference weights corresponding to the different attribute value types are presented in Equations (17) and (18).
In the first case, the normalized preference weight of user group U for an item attribute is computed from the number of individuals N in the attribute group and the preference weights of the users within group U for that item attribute.
In the second case, the normalized preference weight of the user attribute group U towards the j-th attribute value under an item attribute type is computed from the preference weights of the individual users u within the user group U towards the same attribute value.
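The group step can be illustrated as follows. The sketch assumes that the group weight is the plain average of the members' personalized weights, which is one reading of the normalization by the group size N. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def group_preferences(user_attribute, user_preference):
    """Average the personalized preference weights of users sharing an attribute value."""
    members = defaultdict(list)
    for user, attribute_value in user_attribute.items():
        members[attribute_value].append(user)

    result = {}
    for attribute_value, users in members.items():
        # Collect every item attribute that any member has a weight for.
        keys = sorted({key for user in users for key in user_preference[user]})
        result[attribute_value] = {
            key: sum(user_preference[user].get(key, 0.0) for user in users) / len(users)
            for key in keys
        }
    return result


# Toy data: an Occupation attribute and per-user genre preference weights.
user_attribute = {"u1": "student", "u2": "student", "u3": "engineer"}
user_preference = {
    "u1": {"Comedy": 0.8, "Drama": 0.2},
    "u2": {"Comedy": 0.4},
    "u3": {"Drama": 1.0},
}
groups = group_preferences(user_attribute, user_preference)
```

Grouping users by a shared attribute value corresponds to the second-order hop (user, attribute, user) on the user attribute graph.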
3.4. The User Attribute Preference Fusion Layer
Based on the individualized user preferences and the user attribute group preferences derived from the multi-head self-attention-mechanism-based item attribute and user attribute preference mining module, these features are input into the user attribute preference fusion module to construct a fused preference embedding representation for user–item attribute interactions. This representation serves as the attribute embedding for both user nodes and item nodes in the attribute collaboration graph.
When the quantity of user historical interactions is substantial, personalized preferences offer a more comprehensive and accurate portrayal of user preferences, while group attribute preferences serve as a supplementary description of the user attributes. Conversely, when historical interactions are limited, group attribute preferences dominate in characterizing user preferences, with personalized attribute preferences providing complementary insights.
Based on the weights of the users’ personalized preferences and the user attribute group preferences, weighted fusion is applied to the item attribute embeddings output by the multi-head self-attention mechanism, ultimately yielding an embedded representation of the user attributes. This fusion of the two attribute preference vectors takes into account the volume of user historical interactions, enabling adaptive fusion of the attribute preference embeddings, as depicted in Equation (19). Equation (19) combines the embedded representation of the user attribute group with the personalized embedded representation of the user attributes, and a coefficient determines the influence of the group preference representation on the overall user attribute embedding.
Two fusion methods are devised for computing the embedded representation of the user attribute group preferences. The first approach takes the mean of the attribute group embedding representations of the user, that is, their sum divided by the number of attributes possessed by the user.
The second approach first calculates the similarity between a user’s personalized preference weight vector and each user attribute group preference weight vector. This assesses how similar the preferences of the user are to those of each attribute group. Subsequently, the similarity coefficient obtained is used to weight the embedding representation of each attribute group, as depicted in Equation (21).
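The two group fusion variants and the final adaptive fusion can be sketched as below. Several choices here are assumptions rather than details from the text: cosine similarity as the similarity measure, similarities rescaled to sum to one, and a fusion coefficient `lam` passed in by the caller instead of being derived from the interaction count.

```python
import math


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def group_embedding_mean(group_embs):
    """First variant: plain mean of the user's attribute group embeddings."""
    dim = len(group_embs[0])
    return [sum(e[k] for e in group_embs) / len(group_embs) for k in range(dim)]


def group_embedding_weighted(group_embs, personal_w, group_ws):
    """Second variant: weight each group embedding by preference similarity."""
    sims = [cosine(personal_w, gw) for gw in group_ws]
    total = sum(sims)
    if total == 0.0:  # no group resembles the user: fall back to the mean
        return group_embedding_mean(group_embs)
    dim = len(group_embs[0])
    return [sum(s / total * e[k] for s, e in zip(sims, group_embs)) for k in range(dim)]


def fuse(group_emb, personal_emb, lam):
    """Blend group and personalized embeddings; lam is the weight of the group part."""
    return [lam * g + (1.0 - lam) * p for g, p in zip(group_emb, personal_emb)]


group_embs = [[2.0, 0.0], [0.0, 2.0]]
mean_emb = group_embedding_mean(group_embs)
weighted_emb = group_embedding_weighted(group_embs, [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
fused = fuse(mean_emb, [3.0, 5.0], lam=0.25)
```

A small `lam` suits users with long interaction histories, where personalized preferences dominate; a larger `lam` suits users with few interactions, where group preferences dominate.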
When fusing the embedded representations of multiple attributes of an item, each attribute embedding of the item is treated equally, as an inherent and immutable factor characterizing the item itself, because the embedding representation of user attributes in our model is already derived by fusing the preferences over each item attribute embedding. The fusion of the individual attributes yields the attribute embedding representation of the item.
In this context, q denotes the number of attributes possessed by the item, and the attribute embedding representation of the item is computed from the embedded representations corresponding to each individual attribute of the item.
The following representations are collectively fed into the attribute collaborative convolution module for graph convolution operations: the combined attribute embedding representation of the user and the combined attribute embedding representation of the item, both output from the user attribute preference fusion layer; the embedded representations of the item attributes that are input into the multi-head self-attention layer; and the user embedding representation and the item embedding representation output from the initial mapping layer.
3.5. The Attribute Collaborative Convolution Layer
Based on the modeling of user attributes to derive user preferences for item attributes in the previous section, graph convolution operations are performed on the attribute collaboration graph to obtain the embedding representations of users and items. In the attribute collaboration graph, each user u corresponds to two embedding representations: the embedding representation of the user ID and the embedding representation of user attributes. Meanwhile, each item v corresponds to multiple embedding representations: the embedding representation of the item ID, the attribute embedding representation of the item, and the embedding representations corresponding to each attribute node of the item. For the propagation and aggregation processes on the attribute collaboration graph, an information dissemination strategy is adopted, where the embedding representation of a target node is obtained by performing graph convolutional propagation and aggregation of its neighboring nodes on the attribute collaboration graph.
Unlike other convolutional aggregation operations, our approach differentiates between different types of neighboring nodes for the target node rather than treating different node types in the neighborhood equally. Furthermore, interactions of the previously obtained embedding representations of user group attributes with those of item attributes are taken into account to perform similarity calculations. These similarity coefficients are then used as connection weights for nodes during the information aggregation process, enhancing the precision of convolutional aggregation and mitigating overfitting issues that may arise from multiple layers of aggregation, ultimately improving the accuracy of the recommendations.
As both the user’s attribute embedding representation and the item’s attribute embedding representation are derived through fusion based on the item attribute embedding representations, they possess rich semantics. Consequently, incorporating their similarity as the correlation between nodes during the convolutional aggregation process refines the interaction between nodes, enhancing the interpretability of the convolutional process.
3.5.1. The Embedding Representation of Item Nodes
During the convolutional aggregation on the attribute collaboration graph, an item node v aggregates information from its adjacent user nodes and its adjacent item attribute nodes, that is, from the set of neighboring user nodes of item v and the set of neighboring attribute nodes of item v.
Given that item attributes are regarded as inherent values of the item, it is assumed that each attribute contributes equally to the embedding representation of the item’s attributes. Consequently, the embedding representations of the attribute nodes b in the neighborhood of v are aggregated by averaging.
For each user node u in the neighborhood of item v, the message passed to v is built from the embedded representation of the user ID and a coefficient that serves as a parameter controlling the flow of information from user u towards item v.
When computing this coefficient, a similarity calculation is first performed between the user’s attribute embedding representation and the item’s attribute embedding representation, both of which are obtained by fusing multiple item-based attribute embeddings. This results in the similarity between an item and its neighboring nodes. Finally, the similarity coefficients are normalized using Equation (25) to yield the normalized similarity coefficient between nodes.
As the information on the target node itself is also crucial for its embedding representation, a self-connection function is employed to aggregate the information from the node itself.
Finally, the information from the neighborhood of the target node v is fused to obtain the final embedding representation of the item node, as shown in Equation (26). Specifically, a fully connected layer, followed by a LeakyReLU activation function, is utilized to aggregate the information from the neighboring nodes of the target node. In Equation (26), the aggregation runs over the user nodes u in the neighborhood of item node v and over the attribute nodes b in the neighborhood of item node v.
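The aggregation for an item node can be sketched as follows. The sketch makes several assumptions that the text does not fix: the similarity is an inner product, the normalization is a softmax, the three messages (self, users, attributes) are summed, all vectors share one dimension, and the fully connected layer is replaced by the identity so that only the LeakyReLU remains.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Normalize similarity scores into weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x >= 0.0 else slope * x


def aggregate_item(item_id_emb, item_attr_emb, users, attr_nodes):
    """users: list of (user_id_emb, user_attr_emb); attr_nodes: attribute node embeddings."""
    # Attribute-aware weights from user-attribute / item-attribute similarity.
    weights = softmax([dot(u_attr, item_attr_emb) for _, u_attr in users])
    dim = len(item_id_emb)
    # Weighted messages from neighboring users (their ID embeddings).
    user_msg = [sum(w * u_id[k] for w, (u_id, _) in zip(weights, users)) for k in range(dim)]
    # Attribute nodes contribute equally (mean aggregation).
    attr_msg = [sum(b[k] for b in attr_nodes) / len(attr_nodes) for k in range(dim)]
    # Self-connection plus both neighborhoods, then the activation.
    combined = [item_id_emb[k] + user_msg[k] + attr_msg[k] for k in range(dim)]
    return [leaky_relu(x) for x in combined], weights


users = [([2.0, 0.0], [0.0, 1.0]), ([0.0, 2.0], [0.0, -1.0])]
attr_nodes = [[0.0, -4.0], [2.0, 0.0]]
item_emb, weights = aggregate_item([1.0, -1.0], [1.0, 0.0], users, attr_nodes)
```

The user side is symmetric: a user node would aggregate its neighboring item embeddings with weights computed in the same way, plus its own self-connection.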
3.5.2. The Embedding Representation of User Nodes
During the convolution process on the attribute collaboration graph, the target node u aggregates messages passed from its adjacent item nodes. In each message, the embedding representation vector of item v conveys information to user u. A coefficient controls the extent of the information flow from item v to user u, and its computation is analogous to the computation described for item nodes in the previous section. This yields the attention weight, reflecting the user’s perceived importance of the item’s attributes. Finally, through weight normalization, the final normalized attribute-aware attention weight is obtained.
Analogous to the fusion of multiple types of attribute information with the item nodes, and considering the significance of a user’s own node information for their embedding representation, a self-connection is employed to incorporate the node’s own information into the embedding representation.
Ultimately, the final embedding representation of user node u is derived by fusing the information from the neighborhood of target node u, as detailed in Equation (28). Specifically, the information from the neighboring nodes of the target node is aggregated through a fully connected layer, followed by a LeakyReLU activation function.
3.6. The Prediction Layer and Optimization
The parameters of the AAGCN model are optimized with the BPR (Bayesian Personalized Ranking) method. The BPR model assumes that the interactions observed are indicative of stronger user preferences and should be assigned higher predicted values than unobserved interactions, as shown in Equation (29).
In Equation (29), the training data for the AAGCN model consists of positive user–item interactions paired with non-positive user–item interactions, and the loss depends on the preference score of user u towards item i, whose scoring function is detailed in Equation (30).
In Equation (30), ⊙ denotes the inner product operation, and its two operands are the concatenated vectors for users and items, respectively. Each is formed by concatenating the composite attribute embedding with the graph convolutional behavioral preference embedding.
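A minimal sketch of the scoring function and the pairwise loss follows, assuming the standard BPR form, the negative log-sigmoid of the score difference between a positive and a non-positive item. Any regularization term is omitted, and the function names and toy vectors are illustrative.

```python
import math


def score(user_vec, item_vec):
    """Preference score as the inner product of the final user and item vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def bpr_loss(triples):
    """Sum of -ln(sigmoid(score_pos - score_neg)) over (user, pos, neg) triples."""
    loss = 0.0
    for user_vec, pos_vec, neg_vec in triples:
        diff = score(user_vec, pos_vec) - score(user_vec, neg_vec)
        # -ln(sigmoid(x)) equals ln(1 + exp(-x)).
        loss += math.log1p(math.exp(-diff))
    return loss


user = [1.0, 0.0]
good_ranking = bpr_loss([(user, [2.0, 0.0], [0.0, 1.0])])  # positive item scored higher
bad_ranking = bpr_loss([(user, [0.0, 1.0], [2.0, 0.0])])   # positive item scored lower
```

The loss shrinks as the positive item is scored further above the non-positive one, which is the ranking assumption stated above.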
4. Experiments
4.1. Datasets
To validate the effectiveness of the model, three publicly available datasets were selected, each containing rich user attribute information and item attribute information and varying in dataset size and sparsity: MovieLens-100K [4,16], MovieLens-1M [4,16], and DoubanBook [3]. MovieLens-100K and MovieLens-1M are movie rating datasets collected and released by the GroupLens research group from the MovieLens website, a movie recommendation service platform. These datasets have been used widely in the fields of machine learning and recommendation systems. They contain integer ratings ranging from 1 to 5 given by users to movies, as well as user attribute information, such as Age, Gender, and Occupation, and movie-related attribute information, such as Release Year, Genre, Actor, and Director. DoubanBook is an online book rating dataset from Douban, in which attributes such as user location and group, book author, publisher, and publication year are considered as features. All of these datasets include multiple attributes for both users and items.
Based on the input requirements of the model, unformatted and non-standard data need to be converted into the format required by the model. Thus, the three datasets require preprocessing. First, the dataset is transformed into implicit feedback based on users’ explicit rating records for items. If a user provides positive feedback, it is marked as 1, indicating an interaction between the user and the item. Negative feedback or a lack of interaction is marked as 0. For the rating threshold, scores greater than or equal to 4 are considered positive interactions. Next, user and item attribute information is converted according to the structure of the data. The relevant statistics for the MovieLens-100K, MovieLens-1M, and DoubanBook datasets after preprocessing are shown in Table 1.
The aforementioned preprocessed datasets were partitioned into training, validation, and testing sets at an 8:1:1 ratio to conduct relevant experimental comparisons and analyses.
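The two preprocessing steps can be sketched as follows. The threshold of 4 and the 8:1:1 ratio come from the text; the random shuffle, the seed, and the record layout are assumptions, since the splitting strategy (for example, global versus per user) is not specified.

```python
import random


def to_implicit(ratings, threshold=4.0):
    """Map (user, item, rating) records to (user, item, label) with label 1 or 0."""
    return [(user, item, 1 if rating >= threshold else 0) for user, item, rating in ratings]


def split_8_1_1(records, seed=42):
    """Shuffle and split records into training, validation, and test sets (8:1:1)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_valid = len(shuffled) // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0)]
labels = to_implicit(ratings)

records = [("u%d" % k, "i%d" % k, 1) for k in range(20)]
train, valid, test = split_8_1_1(records)
```

Integer arithmetic is used for the split sizes so that rounding cannot drop or duplicate a record.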
4.2. Evaluation Metrics
The Top-K recommendation system aims to present the K most relevant items to users, with K typically taken from the set {10, 20, 50, 100}. Recall and Normalized Discounted Cumulative Gain (NDCG) serve as the evaluation metrics for Top-K recommendation. These metrics are employed to benchmark the recommendation performance of the relevant models through comparative analysis.
Recall@K is the proportion of the items a user has interacted with in the test set that appear in the list of K items recommended to that user. A higher recall value indicates better recommendation performance.
Here, Recall@K is computed for user u from the list of K items recommended to u by the recommendation system and the list of all items that u interacted with in the test set. The Recall@K of the model is then obtained by averaging the per-user Recall@K values across all users.
To define the Normalized Discounted Cumulative Gain (NDCG@K), it is first necessary to establish the concept of Discounted Cumulative Gain (DCG), as outlined below.
Here, the relevance term describes the relevance of the item at the i-th position in the recommendation list to the user’s interests: a value of 1 indicates the presence of an actual interaction between the user and the item, whereas a value of 0 signifies the absence of such an interaction. Subsequently, NDCG is obtained by normalizing the DCG value, that is, by dividing it by the theoretically maximum DCG value: NDCG@K = DCG@K / IDCG@K.
Within this context, IDCG@K (Ideal Discounted Cumulative Gain at position K) represents the optimal DCG that a model could theoretically achieve in predicting the best recommendation list for a user. Compared to recall, Normalized Discounted Cumulative Gain (NDCG) not only considers the number of correctly predicted items but also accounts for their relative positions and relevance within the recommended list, making it a more comprehensive and reliable metric for evaluating the performance of recommendation systems.
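Both metrics can be computed per user as in the sketch below. It assumes binary relevance and the common logarithmic discount 1 / log2(i + 1) for the item at position i (counting from 1); the function names and the toy lists are illustrative.

```python
import math


def recall_at_k(recommended, relevant, k):
    """Share of the user's test items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def dcg_at_k(recommended, relevant, k):
    """Discounted cumulative gain with binary relevance."""
    return sum(
        1.0 / math.log2(position + 2)  # position is 0-based, hence the +2
        for position, item in enumerate(recommended[:k])
        if item in relevant
    )


def ndcg_at_k(recommended, relevant, k):
    """DCG divided by the ideal DCG, where all relevant items are ranked first."""
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg_at_k(recommended, relevant, k) / idcg if idcg > 0.0 else 0.0


recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "z"}
recall = recall_at_k(recommended, relevant, k=3)
ndcg = ndcg_at_k(recommended, relevant, k=3)
```

In this example, two of the three test items are retrieved, so recall is 2/3, while NDCG is additionally lowered because the second hit sits at position 3 rather than position 2. The model-level score is the average of these per-user values.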
4.3. Comparative Approaches
This section delves into the baseline models employed for comparison with the two proposed recommendation models in this paper. These recommendation models are categorized into four distinct groups.
(1) Graph-neural-network-based methods
NGCF [1]: The NGCF model is a GCN-based collaborative filtering recommendation algorithm. It explicitly encodes the high-order connectivity of user–item interactions as collaborative information through an information propagation and aggregation mechanism applied to the user–item interaction graph. It then uses the learned user and item embeddings, which encapsulate this high-order collaborative information, to generate recommendations.
LightGCN [11]: The LightGCN model simplifies and improves upon the NGCF model. It removes the nonlinear activation functions and feature transformation operations from the GCN and retains only a simple weighted sum aggregator, so that a lightweight GCN operation learns the user and item embeddings. This not only simplifies the model but also improves the training efficiency of the recommendation algorithm and the encoding capability of the user and item embedding vectors.
(2) Feature-interaction-focused methods
NFM [20]: The NFM model extends Factorization Machines (FMs). It enhances feature interaction modeling by replacing the inner product of latent vectors in traditional FMs with a more expressive multi-layer neural network, thereby improving recommendation performance to a certain extent.
SAIN [4]: The SAIN model introduces a self-attention mechanism into feature interactions, enabling it to capture the interplay between user attribute features and item attribute features. This effectively integrates user–item interaction information with user and item attribute information, ultimately utilizing the fused embedding representations learned to generate recommendations.
(3) Attention-mechanism-integrated methods
AFM [21]: The AFM model enhances the FM model by incorporating an attention mechanism. Unlike the NFM model, which simply performs a weighted sum operation after feature interactions across different attribute types, AFM learns an importance weight for each cross-feature formed through feature interactions across attribute types. By assigning greater attention weights to the more informative cross-features, the AFM model improves its ability to model feature interactions.
KGAT [2]: The KGAT model is a recommendation model that integrates graph neural networks with an attention mechanism. Based on a collaborative knowledge graph constructed from user–item interaction graphs and item knowledge graphs, KGAT aggregates neighbor information through an attention-based neighbor aggregator. It employs an attention mechanism to determine the contribution of different neighbor nodes to the representation of the current node. Furthermore, a multi-layer attention network is utilized to learn the distinct node representations at each layer, with node embeddings being propagated and aggregated across layers. This effectively integrates graph topological information with item attribute features, enhancing the model’s ability to learn graph embedding representations.
(4) Hybrid approaches that fuse attribute and behavioral methods
DG-ENN [6]: The DG-ENN model, based on the proposed user attribute graph and item attribute graph, leverages a GCN to independently learn the embedding representations of user and item attribute features. These learned attribute embeddings are then utilized to enhance the quality of the user and item embeddings derived from the user–item interaction graph, thereby improving the expressive power of the embedding representations.
AF-GCN [3]: The AF-GCN model introduces an attention-based attribute fusion module that integrates multiple attribute nodes for both users and items into composite attribute nodes. It then performs graph convolution on a heterogeneous graph constructed from <user, item, attribute> triplets. Finally, the user and item embeddings learned at different layers are used to generate recommendations.
AGNN [16]: The AGNN model, grounded in attribute graphs, proposes an eVAE architecture that infers the preference embeddings from the attribute distributions. Through a gated GNN structure, it effectively aggregates different types of attribute nodes in the target node’s neighborhood, enhancing the embedding representation capabilities of attributes.
AGMR [17]: The AGMR model acknowledges that the influence of attributes and behaviors on the entity nodes varies, thus devising a fine-grained preference fusion strategy that integrates attribute-based group preferences with individual behavioral preferences. This approach enhances the accuracy, comprehensiveness, and personalization of the embedding representations.
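As a concrete reference point for the LightGCN baseline described above, the sketch below shows its propagation rule in plain Python: each layer is a symmetrically normalized sum over neighbors, with no activation function and no feature transformation, and the final embedding is the uniform average over all layers. The data layout and function name are illustrative assumptions, and plain lists are used instead of sparse matrices for readability.

```python
import math


def lightgcn_embeddings(neighbors, layer0, num_layers):
    """LightGCN-style propagation on an undirected user-item graph.

    neighbors: dict mapping each node to the list of its neighboring nodes.
    layer0:    dict mapping each node to its layer-0 embedding (list of floats).
    Returns the average of the embeddings from layer 0 to layer num_layers.
    """
    dim = len(next(iter(layer0.values())))
    layers = [layer0]
    for _ in range(num_layers):
        prev, nxt = layers[-1], {}
        for node in layer0:
            nbrs = neighbors.get(node, [])
            agg = [0.0] * dim
            for nbr in nbrs:
                # Symmetric normalization 1 / sqrt(|N(node)| * |N(nbr)|).
                norm = 1.0 / math.sqrt(len(nbrs) * len(neighbors[nbr]))
                for j in range(dim):
                    agg[j] += norm * prev[nbr][j]
            nxt[node] = agg
        layers.append(nxt)
    # Uniform layer combination: mean over all num_layers + 1 layers.
    return {
        node: [sum(layer[node][j] for layer in layers) / len(layers) for j in range(dim)]
        for node in layer0
    }
```

With a single user connected to a single item and two layers, the user embedding becomes the average of its own initial embedding (twice) and the item's initial embedding (once), which illustrates how neighbor information is mixed in without any learned transformation.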
4.4. Parameter Settings
In all models, the embedding dimension was fixed at 64 for user and item IDs and was also set to 64 for user and item attributes. The number of graph neural network layers was set to 2, with a regularization parameter of 0.001 and a batch size of 1024. All model parameters were initialized with the Xavier initializer and optimized using the Adam optimizer, and the learning rate was tuned by grid search. As most mainstream models are evaluated on the Top-20 recommendation task, the comparative experiments followed the same protocol, with Recall@20 and NDCG@20 serving as the quantitative metrics for assessing model performance. An early stopping strategy was also applied during training: training was terminated if Recall@20 on the validation set failed to improve for 50 consecutive epochs.
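The settings and the early stopping rule above could be organized roughly as follows. This is a sketch under stated assumptions: the dictionary keys, the function name, and the `train_one_epoch` and `evaluate` callbacks are hypothetical placeholders, and `max_epochs` is an arbitrary cap that the paper does not specify. Only the numeric values in `SETTINGS` come from the text.

```python
# Values taken from the parameter settings above; the key names are illustrative.
SETTINGS = {
    "id_embedding_dim": 64,
    "attribute_embedding_dim": 64,
    "num_gnn_layers": 2,
    "regularization": 0.001,
    "batch_size": 1024,
    "optimizer": "Adam",
    "initializer": "Xavier",
    "top_k": 20,
    "patience": 50,
}


def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=1000, patience=50):
    """Stop when the validation score has not improved for `patience` consecutive epochs.

    train_one_epoch(epoch) runs one training pass; evaluate(epoch) returns the
    validation metric (for example Recall@20). Both are supplied by the caller.
    """
    best_score = float("-inf")
    best_epoch = -1
    stale_epochs = 0
    epochs_run = 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        score = evaluate(epoch)
        epochs_run = epoch + 1
        if score > best_score:
            best_score, best_epoch, stale_epochs = score, epoch, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_epoch, best_score, epochs_run
```

In practice the model checkpoint from `best_epoch` would be the one evaluated on the test set, although the text does not state which checkpoint the authors report.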
4.5. Experimental Results
An analysis of the experimental comparison results presented in Table 2 reveals that the AAGCNR model outperforms the ten baseline methods on all three datasets in the Top-20 recommendation task.
Compared with NGCF, the best-performing graph neural network baseline that relies on user–item interaction features, our model achieves an improvement of 5.57% in Recall@20 and 4.33% in NDCG@20 on the MovieLens-100K dataset; 15.00% in Recall@20 and 16.88% in NDCG@20 on the MovieLens-1M dataset; and 31.87% in Recall@20 and 29.88% in NDCG@20 on the DoubanBook dataset. Compared with the best results from the feature interaction models SAIN and AFM, which incorporate attention mechanisms, our model achieves an improvement of 4.28% in Recall@20 and 1.61% in NDCG@20 on the MovieLens-100K dataset; 5.34% in Recall@20 and 2.80% in NDCG@20 on the MovieLens-1M dataset; and 8.85% in Recall@20 and 7.32% in NDCG@20 on the DoubanBook dataset. This demonstrates the effectiveness of the AAGCNR model in combining user–item interaction features with attribute interaction features.
Compared with the models DG-ENN and AGMR, which combine attribute feature interactions and behavioral features, our model achieves an improvement of 2.10% in Recall@20 and 1.35% in NDCG@20 on the MovieLens-100K dataset; 6.28% in Recall@20 and 3.20% in NDCG@20 on the MovieLens-1M dataset; and 11.89% in Recall@20 and 9.45% in NDCG@20 on the DoubanBook dataset. This further confirms the effectiveness of the AAGCNR model in integrating user–item interaction features with attribute interaction features.
The comparison with baseline models from the different categories demonstrates the effectiveness of the AAGCNR model. By integrating user–item interaction features with attribute interaction features more effectively, it learns richer and more accurate embedding representations, thereby improving the performance of the recommendation system.
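The percentages reported above read as relative improvements over a baseline score. Assuming the usual definition, (ours − baseline) / baseline × 100, which the text does not state explicitly, the calculation can be sketched as follows; the function name and the numbers in the usage note are hypothetical and are not taken from Table 2.

```python
def relative_improvement(ours, baseline):
    """Relative improvement in percent of `ours` over `baseline` (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline metric must be positive")
    return (ours - baseline) / baseline * 100.0
```

For instance, a hypothetical Recall@20 of 0.22 against a baseline of 0.20 corresponds to a relative improvement of about 10%.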
4.6. Ablation Experiments and Parametric Analysis
4.6.1. Ablation Experiments
To further verify the effectiveness of combining attribute interaction feature information with graph convolution feature information compared to using only one type of feature and to assess the role of the attention fusion layer in enhancing the user and item embeddings, we designed the following variants based on the AAGCNR model:
AAGCNR_DI: This variant removes the bilinear interaction layer compared to AAGCNR and does not consider the relationships between user and item attributes.
AAGCNR_DG: This variant removes the graph convolution model compared to AAGCNR and directly uses the output of the user preference fusion module as the input to the prediction layer.
AAGCNR_DA: This variant removes the self-attention layer compared to AAGCNR and does not capture the relationships between attributes in different semantic spaces.
The effectiveness of each module in the model is validated through the Top-20 recommendation task. The results of the ablation experiments are shown in Table 3.
As shown in Table 3, the variants AAGCNR_DI, which removes the interaction feature information between attributes, and AAGCNR_DG, which removes the graph convolution module, perform the worst. Both are inferior to the AAGCNR_DA variant, which lacks the attention mechanism. This indicates that the embedding representations of users and items are significantly influenced by the characteristics of low-frequency items. During the graph convolution process, distinguishing the categories of neighboring nodes is crucial, as different types of nodes affect the target node differently. Therefore, integrating the interaction features between attributes with the behavioral features between users and items, and incorporating them through graph convolution, embeds more information into the representations of users and items and thereby improves the recommendation performance of the model. For the AAGCNR_DA variant, the adaptive attention coefficients a and b were fixed at 0.5 during the experiment, eliminating the influence of the attention mechanism on the embedding representations of users and items. The results of the AAGCNR_DA variant were significantly lower than those of the full AAGCNR model, indicating that the adaptive attention fusion mechanism is crucial for balancing the attribute embeddings and the ID embeddings of users, and likewise of items, in the final embedding representations.
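The difference between the full model and the variant with fixed coefficients can be illustrated with a small sketch of the fusion step, in which the final embedding is a weighted sum a · (ID embedding) + b · (attribute embedding). How the paper computes a and b is not shown in this section; the softmax over two scores used below is an assumption made purely for illustration, and the function name is hypothetical.

```python
import math


def fuse_embeddings(id_emb, attr_emb, scores=None):
    """Return (a * id_emb + b * attr_emb, (a, b)).

    scores=None reproduces the ablation setting a = b = 0.5.
    Otherwise (a, b) = softmax(scores), an assumed form of adaptive weighting.
    """
    if scores is None:
        a, b = 0.5, 0.5
    else:
        shift = max(scores)  # subtract the maximum for numerical stability
        exp_a = math.exp(scores[0] - shift)
        exp_b = math.exp(scores[1] - shift)
        a, b = exp_a / (exp_a + exp_b), exp_b / (exp_a + exp_b)
    fused = [a * x + b * y for x, y in zip(id_emb, attr_emb)]
    return fused, (a, b)
```

With fixed coefficients every user and item mixes the two sources equally, whereas adaptive coefficients allow, for example, an entity with few interactions to rely more heavily on its attribute embedding.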
4.6.2. Parametric Analysis
Experiments were conducted to investigate the impact of varying the hyperparameters on the performance of the AAGCNR model. These hyperparameters included the embedding dimension d for user and item IDs, the embedding dimension for user and item attributes, and the number of multi-head self-attention heads h. The results were compared to assess the influence of different parameter settings on the model’s overall performance.
Figure 2 compares the impact of varying the number of multi-head self-attention heads h on the experimental results for the Top-20 recommendation task on the MovieLens-100K, MovieLens-1M, and DoubanBook datasets.
The comparative analysis of the experimental results in Figure 2 reveals that, with all the other model parameters held at their optimal values, increasing the number of attention heads h from 1 to 2 and then to 4 improves model performance. However, when h reaches 8, performance declines on all three datasets, indicating that an excessive number of attention heads introduces noise into the interaction features between attributes and thereby degrades model performance.
Figure 3 illustrates the impact of varying the embedding dimension d for the user and item IDs on the experimental results, as assessed through the Top-20 recommendation task on the MovieLens-100K, MovieLens-1M, and DoubanBook datasets.
The comparative analysis of the experimental results in Figure 3 indicates that, with all the other model parameters held at their optimal values, increasing the embedding dimension d for user and item IDs over the values 8, 16, 32, 64, and 128 initially improves performance on all three datasets. On the MovieLens-100K and DoubanBook datasets, performance peaks at an intermediate dimension and declines when d is increased further, suggesting that excessive noise in the embedding representations hinders further performance gains. On the MovieLens-1M dataset, the best performance is observed at the same dimension, after which the results fluctuate, with a slight decrease in Recall@20 and a marginal increase in NDCG@20.
Figure 4 illustrates the impact of varying the embedding dimension of the user and item attributes on the experimental results, as evaluated through the Top-20 recommendation task on the MovieLens-100K, MovieLens-1M, and DoubanBook datasets.
A comparative analysis of the experimental results in Figure 4 shows that, with the other model parameters held at their optimal values, performance improves on all three datasets as the embedding dimension of the user and item attributes increases, until it reaches its optimal value.
Comparing the impact of the embedding dimension d for the user and item IDs with that of the embedding dimension for attributes shows that both the interaction features between attributes and the user–item interaction features are effective in enhancing the recommendation performance. Furthermore, the change in Recall and NDCG as the ID embedding dimension d varies from 8 to 128 is smaller than the change observed for the attribute embedding dimension over the same range, indicating that attribute interactions help alleviate the sparsity inherent in user–item interaction behavior features.
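The parametric analysis above varies one hyperparameter at a time while the others stay fixed. A minimal sketch of such a sweep is given below. The function names, the configuration keys, the default head count, and the grid for the attribute dimension are assumptions, and `toy_evaluate` is a made-up scoring function that stands in for training and evaluating the model; its values do not reflect the results in the figures.

```python
def one_factor_sweep(evaluate, defaults, grids):
    """Vary one hyperparameter at a time, holding all others at `defaults`."""
    results = {}
    for name, values in grids.items():
        for value in values:
            config = dict(defaults)
            config[name] = value
            results[(name, value)] = evaluate(config)
    return results


def best_value(results, name):
    """Return the value of hyperparameter `name` with the highest score."""
    candidates = {value: score for (n, value), score in results.items() if n == name}
    return max(candidates, key=candidates.get)


def toy_evaluate(config):
    """Hypothetical stand-in for a full training run; the scores are arbitrary."""
    return (-abs(config["num_heads"] - 4)
            - abs(config["id_dim"] - 64) / 64
            - abs(config["attr_dim"] - 64) / 64)


defaults = {"num_heads": 4, "id_dim": 64, "attr_dim": 64}  # num_heads default is assumed
grids = {
    "num_heads": [1, 2, 4, 8],
    "id_dim": [8, 16, 32, 64, 128],
    "attr_dim": [8, 16, 32, 64, 128],  # assumed to match the ID dimension grid
}
results = one_factor_sweep(toy_evaluate, defaults, grids)
```

Replacing `toy_evaluate` with a function that trains the model and returns the validation Recall@20 would give one curve per hyperparameter, which is the kind of comparison the figures report.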
5. Conclusions
This paper introduces an attribute-aware graph convolutional recommendation method, AAGCNR. The model first incorporates both user and item attribute features into the graph. By constructing attribute interaction features alongside user behavioral features, it addresses the limitations of relying solely on user–item interaction data, effectively mitigating the issue of sparse interaction data. Additionally, the model refines the aggregation process of neighboring nodes around the target node by constructing a collaborative convolutional graph. The interactions during the graph convolution process are informed by correlations between the composite attribute embeddings, thereby enhancing the interpretability of the graph convolution operation.
While existing graph recommendation algorithms, which integrate user and item attribute interaction features with user–item behavioral features, have shown some success, certain areas still require improvement and optimization. Based on the findings of this study, future research could explore the following directions:
(1) Most attribute-based recommendation algorithms are relatively slow to detect changes in user preferences. By integrating variables such as time factors and item popularity, models could become more responsive to newly introduced popular items.
(2) The current recommendation algorithms predominantly focus on analyzing and modeling users’ historical behaviors and the attributes of items they have previously engaged with. Introducing interactive recommendation mechanisms could increase the exposure of items that users have not yet interacted with.