
CN118096237A - A deep learning-driven customer behavior prediction model - Google Patents

A deep learning-driven customer behavior prediction model

Info

Publication number
CN118096237A
Authority
CN
China
Prior art keywords
matrix
time
feature
user
product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410263063.XA
Other languages
Chinese (zh)
Other versions
CN118096237B (en)
Inventor
滕文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiahuaming Brand Planning Co ltd
Original Assignee
Beijing Jiahuaming Brand Planning Co ltd Guangdong Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiahuaming Brand Planning Co ltd Guangdong Branch filed Critical Beijing Jiahuaming Brand Planning Co ltd Guangdong Branch
Priority to CN202410263063.XA priority Critical patent/CN118096237B/en
Publication of CN118096237A publication Critical patent/CN118096237A/en
Application granted granted Critical
Publication of CN118096237B publication Critical patent/CN118096237B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2133 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on naturality criteria, e.g. with non-negative factorisation or negative correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep learning-driven customer behavior prediction model, which comprises the following steps: step 1, constructing an interaction matrix; step 2, performing matrix decomposition on the interaction matrix; step 3, constructing two feature matrices based on the interaction sub-matrix results of non-negative matrix factorization; step 4, compressing and learning latent vector representations of the user behavior sequence with an autoencoder network; and step 5, using the joint latent vector learned by the autoencoder network as input to obtain behavior predictions for downstream tasks. The invention utilizes time series and user behavior data in combination with deep learning methods, so that the prediction model can capture user interests and behavior patterns more accurately, thereby improving prediction accuracy.

Description

A deep learning-driven customer behavior prediction model

Technical Field

The present invention belongs to the field of computer technology, and in particular relates to a deep learning-driven customer behavior prediction model.

Background

With the rapid development of information technology and big data analytics, customer behavior prediction has become increasingly important in e-commerce, financial analysis, marketing, and other fields. Traditional customer behavior prediction methods rely on statistical analysis and machine learning techniques such as logistic regression, decision trees, and support vector machines. These methods are typically based on static user features and cannot dynamically capture how user interests evolve over time.

Although existing work has attempted to track the dynamic changes of user behavior by introducing time series analysis, these methods still face challenges when handling high-dimensional, large-scale, and sparse user-product interaction data. In addition, existing time series analysis methods usually require complex data preprocessing and feature engineering, which not only increases model complexity but also limits generalization ability.

Deep learning methods can automatically extract high-level features from user behavior, but in practice they require large amounts of labeled training data. In many real-world scenarios, especially when new products are launched, the available historical interaction data is very limited, which makes it difficult for deep learning models to realize their full advantage.

Therefore, designing a prediction model that fully utilizes user behavior data, adapts to data sparsity and dynamic change, and computes current customer behavior predictions with as few computing resources as possible has become a pressing technical problem in the field of customer behavior prediction.

Summary of the Invention

In view of the defects in the prior art described above, the present invention provides a deep learning-driven customer behavior prediction method, the method comprising:

Step 1: in a data collection phase, obtain historical interaction data such as users' purchase counts, view counts, and ratings of products, and construct an interaction matrix;

Step 2: perform matrix decomposition on the interaction matrix; after converting the three-dimensional interaction matrix into a two-dimensional matrix, apply non-negative matrix factorization to decompose the two-dimensional matrix into two non-negative matrices;

Step 3: construct two feature matrices based on the interaction sub-matrix results of the non-negative matrix factorization, namely a user feature matrix W' and a time feature matrix H';

Step 4: compress and learn latent vector representations of the user behavior sequence with an autoencoder network, including a joint autoencoder based on the user feature matrix W' and the time feature matrix H' that learns a common latent vector of user and time series features;

wherein the latent vectors are fused via the Hadamard product;

Step 5: use the joint latent vector learned by the autoencoder network as input to obtain behavior predictions for downstream tasks.

Wherein step 1 specifically includes initializing a three-dimensional matrix M, whose dimensions are determined by the number of products I, the number of features F, and the number of time points T;

in the matrix initialization phase, all element values are set to zero;

afterwards, the matrix is filled with the collected data, corresponding to the values of features such as price, view count, purchase count, and rating of each product at different points in time.

Wherein step 2 further includes, during the interaction matrix decomposition, constructing a decomposed interaction sub-matrix for each user based on an optimization objective adjusted by a decay factor and confidence weights.

Wherein in step 3 the feature matrix W' is the W matrix obtained directly from the non-negative matrix factorization, and the feature matrix H' is constructed by transposing the H matrix obtained from the non-negative matrix factorization.

Wherein the goals of the downstream task in step 5 include predicting the number of purchases by a user;

and the latent vector Z is used as the input feature of a linear regression model to predict the number of purchases.

Wherein a three-dimensional matrix M is created, whose dimensions are the number of products I, the number of features F, and the number of time points T, each element of the three-dimensional matrix being initially set to 0;

where i∈{1, 2, ..., I} is the product index; f∈{1, 2, 3, 4} is the feature index, corresponding to price, view count, purchase count, and rating, respectively; t∈{1, 2, ..., T} is the time index; and M[i][f][t] is an element of the three-dimensional matrix, representing the value of the f-th feature of the i-th product at time t;

the three-dimensional matrix M of size I×F×T is filled, with fill elements denoted m_ift;

for each product i, feature f, and time point t, the corresponding elements of the matrix are filled:

m_i1t: the price of product i at time t;

m_i2t: the view count of product i at time t;

m_i3t: the purchase count of product i at time t;

m_i4t: the rating of product i at time t.

Wherein, for a time point t, the decay factor δ(t) can be defined as: δ(t) = e^(−λ(T−t));

where:

λ is the decay rate parameter, which determines how fast interest decays over time;

T is the most recent time point, used to normalize the time decay;

t is the current time point.

Wherein, for product i, feature f, and time point t, the confidence weight c_ift based on user interaction is defined as: c_ift = 1 + α·r_ift,

where:

r_ift is the user's rating of feature f of product i at time t;

α is a hyperparameter used to adjust the influence of the confidence weights.

Wherein, based on the optimization objective adjusted by the decay factor and confidence weights, a decomposed interaction sub-matrix is constructed for each user; the optimization objective function is:

min_{W,H≥0} Σ_{i,f,t} c_ift·δ(t)·(m_ift − [W·H]_ift)²

where:

m_ift is the value of feature f of product i at time t in the original three-dimensional matrix M;

[W·H]_ift is the corresponding element of the approximation matrix WH;

c_ift is the confidence weight;

δ(t) is the time decay factor.

Wherein step 4 includes defining the autoencoder network structure, determining the latent dimension, determining the loss function, and training the autoencoder; after training is complete, the latent vector is extracted;

wherein the feature matrices are input, latent vectors are generated by the encoder, the output is then reconstructed by the decoder, the loss is computed, and the network weights are updated by backpropagation to train the autoencoder.

The present invention adopts non-negative matrix factorization, which decomposes the original sparse matrix into low-dimensional feature matrices while preserving the non-negativity of the data, effectively alleviating the data sparsity problem. By learning latent vector representations of user behavior sequences with an autoencoder network, the invention learns a common latent vector of user and time series features, making the feature representation richer and more accurate.

The invention utilizes time series and user behavior data in combination with deep learning methods, so that the prediction model captures user interests and behavior patterns more accurately, thereby improving prediction accuracy.

Brief Description of the Drawings

The above and other objects, features, and advantages of exemplary embodiments of the present disclosure will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present disclosure are shown by way of example and not limitation, and the same or corresponding reference numerals denote the same or corresponding parts, wherein:

FIG. 1 is a flow chart of a deep learning-driven customer behavior prediction model according to an embodiment of the present invention.

Detailed Description

To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of the present invention.

The terms used in the embodiments of the present invention are for the purpose of describing specific embodiments only and are not intended to limit the invention. The singular forms "a", "said", and "the" used in the embodiments and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise; "multiple" generally includes at least two.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe elements, these elements should not be limited by these terms. The terms are only used to distinguish elements from one another. For example, without departing from the scope of the embodiments of the present invention, a first element could also be termed a second element, and similarly, a second element could also be termed a first element.

It should be understood that the term "and/or" used herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.

Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".

It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a product or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a product or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the product or device including that element.

Traditional user behavior prediction methods often ignore the dynamics of user interests over time, and user-product interaction data is usually sparse, which poses challenges for the prediction task. In traditional methods, user and product features are often processed independently, ignoring possible correlations between features.

As shown in FIG. 1, the present invention proposes a deep learning-driven customer behavior prediction method, the method comprising:

Step 1: in a data collection phase, obtain historical interaction data such as users' purchase counts, view counts, and ratings of products, and construct an interaction matrix. Specifically, this includes initializing a three-dimensional matrix M whose dimensions are determined by the number of products I, the number of features F, and the number of time points T. In the matrix initialization phase, all element values are set to zero. The matrix is then filled with the collected data, corresponding to the values of features such as price, view count, purchase count, and rating of each product at different time points.

Step 2: perform matrix decomposition on the interaction matrix. This step takes the time factor and the confidence of user interactions into account to reflect changes in user interests more accurately. During the interaction matrix decomposition, a decomposed interaction sub-matrix is constructed for each user based on an optimization objective adjusted by a decay factor and confidence weights, in order to find the matrix factorization that best represents the user's interaction data.

Furthermore, after converting the three-dimensional interaction data matrix (user × product feature × time) into a two-dimensional matrix, non-negative matrix factorization is applied to decompose the two-dimensional matrix into two smaller non-negative matrices, which capture the latent relationships of user and product features, respectively.

Step 3: construct feature matrices based on the result of the non-negative matrix factorization, namely a user feature matrix W' and a time feature matrix H'.

The feature matrix W' is the W matrix obtained directly from the non-negative matrix factorization.

The feature matrix H' is constructed by transposing the H matrix obtained from the non-negative matrix factorization.

Step 4: compress and learn latent vector representations of the user behavior sequence with an autoencoder network, including a joint autoencoder based on the user feature matrix W' and the time feature matrix H' that learns a common latent vector of user and time series features.

The latent vectors are fused via the Hadamard product.

Step 5: use the joint latent vector learned by the autoencoder network as input to obtain behavior predictions for the downstream task; a behavior prediction model for the downstream task is trained, and the goals of the downstream task include predicting the number of purchases by a user.

The present invention adopts non-negative matrix factorization, which decomposes the original sparse matrix into low-dimensional feature matrices while preserving the non-negativity of the data, effectively alleviating the data sparsity problem. By learning latent vector representations of user behavior sequences with an autoencoder network, the invention learns a common latent vector of user and time series features, making the feature representation richer and more accurate.

In one embodiment, in step 1, constructing the interaction matrix first requires collecting the user's historical interaction data, including a given user's purchase counts, view counts, and rating data for each product; after the matrix is initialized, it is filled with these data.

Data is collected from internal systems using database queries or API calls. If the data is stored in a relational database (e.g., MySQL, PostgreSQL, SQL Server), SQL queries are written to extract product prices, view counts, purchase counts, and rating data at different points in time. If the system provides an API, the data can be obtained through API calls.

Scheduled tasks (such as cron jobs) are set up to run the data extraction scripts periodically. Event-driven extraction can also be configured, including triggering extraction via webhooks or message queues (such as RabbitMQ or Kafka) when product information changes or a user views or purchases a product.

The cleaned data is stored in a suitable storage system, including inserting the data into relational or non-relational database tables with a table structure designed to optimize query performance, or using data warehouse technology to store and manage the data; a sketch of the extraction query follows.
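As a minimal sketch of this extraction step, the query below aggregates per-product, per-day feature values; the table and column names (product_events, product_id, event_date, price, views, purchases, rating) are hypothetical, chosen only for illustration:

```python
# Minimal extraction sketch against a hypothetical product_events table:
# one output row per (product, time point) with the four features.
import sqlite3

QUERY = """
SELECT product_id,
       event_date,
       AVG(price)     AS price,
       SUM(views)     AS views,
       SUM(purchases) AS purchases,
       AVG(rating)    AS rating
FROM product_events
GROUP BY product_id, event_date
ORDER BY product_id, event_date;
"""

def extract_interactions(db_path: str):
    """Return one row per (product, time point) with the four features."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(QUERY).fetchall()
```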

In one embodiment, suppose there are I products and T time points. The features F include price, view count, purchase count, and rating. The size of the three-dimensional matrix M is then I×F×T.

The matrix construction steps include:

1. Initialize the matrix

Create a three-dimensional matrix M with dimensions given by the number of products I, the number of features F, and the number of time points T. Each element of the matrix is initially set to 0, where i∈I, f∈F, and t∈T.

where,

i∈{1, 2, ..., I}: product index;

f∈{1, 2, 3, 4}: feature index, corresponding to price, view count, purchase count, and rating, respectively;

t∈{1, 2, ..., T}: time index;

M[i][f][t]: an element of the three-dimensional matrix, representing the value of the f-th feature of the i-th product at time t.

2. Fill the matrix

The three-dimensional matrix M of size I×F×T is filled, with fill elements denoted m_ift.

The elements A′_ift of the filled three-dimensional matrix A′ can be defined by the following formula:

A′_ift = m_ift, if a fill value is available for (i, f, t); A′_ift = A_ift = 0, otherwise;

where,

A_ift is the element at position (i, f, t) in the original three-dimensional matrix, which defaults to 0 after initialization;

A′_ift is the element at position (i, f, t) in the filled three-dimensional matrix A′;

i, f, and t are the depth, row, and column indices, respectively, satisfying 1≤i≤I, 1≤f≤F, and 1≤t≤T;

m_ift is the fill value, where:

for each element of the matrix M:

M[i][1][t]: the price of product i at time t; M[i][2][t]: the view count of product i at time t; M[i][3][t]: the purchase count of product i at time t; M[i][4][t]: the rating of product i at time t.

That is, for each product i, feature f, and time point t, the corresponding element of the matrix is filled (a code sketch follows the list):

m_i1t: the price of product i at time t;

m_i2t: the view count of product i at time t;

m_i3t: the purchase count of product i at time t;

m_i4t: the rating of product i at time t.
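The construction above maps directly onto a small helper; a minimal sketch, assuming the collected rows arrive as (product_idx, feature_idx, time_idx, value) tuples with 0-based indices and the four features ordered as price, views, purchases, rating:

```python
# Step 1 sketch: initialize M with zeros, then fill M[i][f][t] from the data.
import numpy as np

def build_interaction_matrix(rows, I, F, T):
    """Build the I x F x T interaction matrix from collected interaction rows."""
    M = np.zeros((I, F, T))
    for i, f, t, value in rows:
        M[i, f, t] = value  # price / views / purchases / rating at time t
    return M
```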

In one embodiment, step 2, interaction matrix decomposition, includes assigning a decay factor to earlier interaction data to reflect the dynamic changes of user interests, and adjusting the weight of each element according to the confidence of the user interaction (such as the rating score). A decomposed interaction sub-matrix is constructed for each user based on the optimization objective adjusted by the decay factor and confidence weights.

In one embodiment, a time decay factor is first defined to capture the dynamics of user interests changing over time. For a time point t, the decay factor δ(t) can be defined as: δ(t) = e^(−λ(T−t)),

where:

λ is the decay rate parameter, which determines how fast interest decays over time;

T is the most recent time point, used to normalize the time decay;

t is the current time point.

Confidence weighting is then defined. For product i, feature f, and time point t, the confidence weight c_ift based on user interaction is defined as: c_ift = 1 + α·r_ift,

where:

r_ift is the user's rating of feature f of product i at time t;

α is a hyperparameter used to adjust the influence of the confidence weights (see the sketch after these definitions).
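Both quantities translate directly into code; a minimal sketch, where the default values of lam (λ) and alpha (α) are placeholder hyperparameters to be tuned:

```python
import numpy as np

def decay(t, T, lam=0.1):
    """delta(t) = exp(-lam * (T - t)): recent time points weigh more."""
    return np.exp(-lam * (T - t))

def confidence(r_ift, alpha=0.5):
    """c_ift = 1 + alpha * r_ift: higher-rated interactions weigh more."""
    return 1.0 + alpha * r_ift
```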

A decomposed interaction sub-matrix is constructed for each user based on the optimization objective adjusted by the decay factor and confidence weights. Non-negative matrix factorization (NMF) is chosen as the base method for the matrix decomposition problem, and the goal is to find two non-negative matrices W and H such that their product approximates the original matrix M. Taking time decay and confidence weighting into account, the optimization problem can be written as:

min_{W,H≥0} Σ_{i,f,t} c_ift·δ(t)·(m_ift − [W·H]_ift)²

where:

m_ift is the value of feature f of product i at time t in the original three-dimensional matrix M;

[W·H]_ift is the corresponding element of the approximation matrix WH;

c_ift is the confidence weight;

δ(t) is the time decay factor.

To approximate the three-dimensional matrix M of size I×F×T by non-negative matrix factorization (NMF), M must first be converted into a two-dimensional matrix, which involves "unfolding" one or more dimensions to form a two-dimensional matrix.

The time dimension T is treated as an inner dimension, and M is unfolded into a two-dimensional matrix of size I×(F×T), where each row corresponds to a product and each column corresponds to the value of one feature of that product at one specific time point.

NMF then decomposes this two-dimensional matrix M into two smaller non-negative matrices W and H, where W is a matrix of size I×K, H is a matrix of size K×(F×T), and K is the chosen number of latent features.

For example, for a data set with I products, F features, and T time points, M can be expressed as the I×(F×T) matrix whose i-th row is (m_i11, ..., m_i1T, m_i21, ..., m_iFT).

In one embodiment, assuming K=2, W is the I×2 matrix whose i-th row is w_i = (w_i1, w_i2), and H is the 2×(F×T) matrix whose j-th column is h_j = (h_1j, h_2j),

where each row w_i of W represents the relationship between product i and the K latent features, and each column h_j of H gives the values of the K latent features for one feature f at one time point t.

The goal of the factorization is to choose W and H so as to minimize the weighted squared error:

min_{W,H≥0} Σ_{i,f,t} c_ift·δ(t)·(m_ift − [W·H]_ift)²

where c_ift is the confidence weight, δ(t) is the time decay factor, and [W·H]_ift is the corresponding element of the product matrix W·H.

For example, with 3 products (I=3), 2 features (F=2), and 4 time points (T=4), the two-dimensional form of M is a 3×8 matrix. The unfolded I×(F×T) matrix is decomposed into two matrices W and H using NMF, where W is an I×K matrix and H is a K×(F×T) matrix, with K the number of latent features.

Assuming K=2, W and H take the form W = (w_ik) with i = 1, ..., 3 and k = 1, 2, and H = (h_kj) with k = 1, 2 and j = 1, ..., 8,

where w_ik represents the relationship between product i and latent feature k, and h_kj represents the relationship between latent feature k and feature f at time point t (column j corresponding to the pair (f, t) under the unfolding). Each element [WH]_ift of the matrix WH is the dot product of the i-th row of W with the corresponding column of H.

For example, the approximation [WH]_111 of feature 1 of product 1 at time point 1 in the M matrix is computed as:

[WH]_111 = w_11·h_11 + w_12·h_21.

This value is the dot product of the first row of W with the first column of H. This computation is performed for all i, f, and t to fill in all elements of the WH matrix.

Finally, W and H are adjusted to minimize the weighted squared difference between each element m_ift of the original three-dimensional matrix M and the corresponding element [WH]_ift of the WH matrix, the weighting factor being the confidence c_ift multiplied by the time decay factor δ(t). In this way, the optimal values of W and H are obtained.

NMF may lose some important information, depending on the number of latent features K chosen for the factorization. If K is set too small, not all important features may be captured. Therefore, in practical applications, the optimal value of K must be determined through experiments; for example, in practice K may be greater than 2.

By using NMF to decompose the three-dimensional matrix into two two-dimensional matrices, the present invention greatly reduces the amount of cloud computing resources required in the subsequent training and optimization of deep learning algorithms, lowering the required computing cost.
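For concreteness, the weighted factorization can be sketched with multiplicative updates; this is a minimal sketch assuming the unfolded matrix V = M.reshape(I, F*T) and a per-entry weight matrix C holding c_ift·δ(t). A custom loop is used because off-the-shelf NMF implementations such as scikit-learn's minimize the unweighted objective:

```python
# Weighted NMF via multiplicative updates for the objective
#   sum_ij C_ij * (V_ij - (W @ H)_ij)^2, with W, H kept non-negative.
import numpy as np

def weighted_nmf(V, C, K, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    I, J = V.shape
    W = rng.random((I, K))
    H = rng.random((K, J))
    for _ in range(n_iter):
        WH = W @ H
        W *= ((C * V) @ H.T) / ((C * WH) @ H.T + eps)   # update W
        WH = W @ H
        H *= (W.T @ (C * V)) / (W.T @ (C * WH) + eps)   # update H
    return W, H
```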

Step 3: construction of the sequence feature matrices

The sequence length is determined and the feature matrices are constructed; each user's behavior sequence is converted into two two-dimensional matrices.

In one embodiment, for the NMF results W and H, the feature matrices are constructed as follows:

Construct the feature matrix W′. The W matrix represents the relationship between users and the latent features, so W′ can be set directly to W: W′ = W, an I×K matrix,

where I is the number of products and K is the number of latent features.

Construct the feature matrix H′. The H matrix represents the relationship between time steps and features, and H′ must be constructed to characterize the item features at each time step. First the sequence length L_H is determined, which equals the number of features K in H. Then H is transposed to construct H′, so that each row h′_j represents the feature vector of time step j: H′ = H^T.

After transposition, each row of H′ represents the feature vector of one time step, and the columns represent the different latent features. The dimension of H′ thus becomes (F×T)×K,

where T is the number of time steps, F is the number of features, and K is the same number of latent features as in W.

Thus, the feature matrices W′ and H′ can be viewed as encodings of user behavior over time, and they can be used to train machine learning models to predict users' future behavioral preferences.
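Continuing the sketch above, step 3 reduces to two assignments on the NMF factors:

```python
# Step 3 sketch: W' is W itself; H' is the transpose of H, so each row of H'
# is the K-dimensional latent vector of one (feature, time-point) step.
def build_feature_matrices(W, H):
    W_prime = W      # user/product feature matrix, shape (I, K)
    H_prime = H.T    # time feature matrix, shape (F*T, K)
    return W_prime, H_prime
```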

Step 4: autoencoding the temporal behavior matrices

Step 4 involves creating an autoencoder network to learn a compressed representation (latent vector) of the user behavior sequence. An autoencoder is an unsupervised learning algorithm that learns an effective representation of the data by minimizing the difference between its input and output. Step 4 is implemented as follows:

1. Define the autoencoder network structure. An autoencoder usually consists of two main parts: an encoder, which encodes the input data into a latent vector, and a decoder, which decodes the latent vector back into the original data. When defining the network structure, the number of neurons per layer and the depth of the network must be determined. To choose the number of neurons, start with a small number and increase it gradually until model performance no longer improves significantly; too many neurons may cause overfitting, too few may cause underfitting. Deeper networks can learn more complex features but are harder to train and more prone to overfitting; start with two hidden layers and increase the depth as needed.

2. Determine the latent dimension: choose a suitable latent dimension d. This is usually a hyperparameter that can be optimized through experiments; it determines the size of the latent vector and the compression capacity of the autoencoder.

3. Choose the loss function: the training objective of the autoencoder is to minimize the difference between input and output. The loss function is based on the mean squared error (MSE).

4. Train the autoencoder: input the feature matrices, generate latent vectors with the encoder, reconstruct the output with the decoder, compute the loss, and update the network weights by backpropagation.

5. After training is complete, extract the latent vectors.

In one embodiment, step 4 is based on a joint autoencoder framework in which W and H are used as inputs to the encoder part of the autoencoder, enabling the network to learn representations of the user latent vectors and the temporal behavior features simultaneously, specifically comprising:

Step 4.1: define the joint encoder. The encoder uses W and H^T as inputs, thereby learning a common latent vector Z of user and time series features. The encoder has the form:

Z_W = f_e^(W)(W) = σ(W_e^(W)·W + b_e^(W)), Z_H = f_e^(H)(H^T) = σ(W_e^(H)·H^T + b_e^(H)),

where,

W_e^(W) is the encoder's weight matrix for the W matrix;

b_e^(W) is the encoder's bias vector for the W matrix;

W_e^(H) is the encoder's weight matrix for the H^T matrix;

b_e^(H) is the encoder's bias vector for the H^T matrix;

f_e^(W) is the encoder's encoding function for W;

f_e^(H) is the encoder's encoding function for H^T;

σ is the sigmoid activation function.

Z_W and Z_H are then fused to obtain the joint latent vector Z: Z = Z_W ⊙ Z_H, where ⊙ denotes the Hadamard product (element-wise multiplication) of Z_W and Z_H.

Step 4.2: define the joint decoder. The goal of the decoder is to reconstruct W and H^T separately:

Ŵ = f_d^(W)(Z) = σ(W_d^(W)·Z + b_d^(W)), Ĥ^T = f_d^(H)(Z) = σ(W_d^(H)·Z + b_d^(H)),

where,

W_d^(W) is the decoder's weight matrix for W;

b_d^(W) is the decoder's bias vector for W;

W_d^(H) is the decoder's weight matrix for H^T;

b_d^(H) is the decoder's bias vector for H^T;

f_d^(W) is the decoder's decoding function for W;

f_d^(H) is the decoder's decoding function for H^T.

Step 4.3: define the loss function. The loss function now considers the reconstruction errors of both W and H^T, specifically:

L = (α/N_W)·‖W − Ŵ‖² + (β/N_H)·‖H^T − Ĥ^T‖²,

where:

L denotes the overall loss function;

α is the weight coefficient balancing the reconstruction error of W;

β is the weight coefficient balancing the reconstruction error of H^T;

N_W is the number of samples in the W matrix;

N_H is the number of samples in the H^T matrix.

Step 4.4: train the autoencoder, including updating all parameters of the encoder and the decoder with the gradient descent algorithm:

θ ← θ − η·∂L/∂θ for each parameter θ, where η is the learning rate.

During training of the autoencoder, the following parameters are updated by gradient descent:

W_e^(W): the encoder's weight matrix for the product feature matrix W;

b_e^(W): the encoder's bias vector for the product feature matrix W;

W_d^(W): the decoder's weight matrix for reconstructing the product feature matrix W;

b_d^(W): the decoder's bias vector for reconstructing the product feature matrix W;

W_e^(H): the encoder's weight matrix for the feature-time matrix H;

b_e^(H): the encoder's bias vector for the feature-time matrix H;

W_d^(H): the decoder's weight matrix for reconstructing the feature-time matrix H;

b_d^(H): the decoder's bias vector for reconstructing the feature-time matrix H.

The purpose of updating these parameters is to minimize the loss function L, which is usually achieved by computing the gradient of the loss function with respect to each parameter and applying the parameter update.

Step 4.5: after training is complete, the joint latent vector Z can be extracted with the encoder:

Z = Z_W ⊙ Z_H,

where Z_W and Z_H are the latent vectors obtained from the product feature matrix W and the feature-time matrix H through the encoder, respectively, and ⊙ denotes element-wise multiplication (the Hadamard product). Z is thus the joint latent vector that combines Z_W and Z_H, fusing the information of both.

Then, in step 5, the joint latent vector Z is used as a feature to train the downstream behavior prediction model; a code sketch of steps 4.1 to 4.5 follows.
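A minimal PyTorch sketch of steps 4.1 to 4.5 is given below. The text leaves the alignment of Z_W and Z_H unspecified, so this sketch flattens each factor matrix to a single vector before encoding, yielding one joint latent vector per user; the latent dimension d, the layer sizes, and the loss weights alpha and beta are illustrative assumptions:

```python
import torch
import torch.nn as nn

class JointAutoencoder(nn.Module):
    def __init__(self, in_w: int, in_h: int, d: int):
        super().__init__()
        self.enc_w = nn.Sequential(nn.Linear(in_w, d), nn.Sigmoid())  # f_e^(W)
        self.enc_h = nn.Sequential(nn.Linear(in_h, d), nn.Sigmoid())  # f_e^(H)
        self.dec_w = nn.Sequential(nn.Linear(d, in_w), nn.Sigmoid())  # f_d^(W)
        self.dec_h = nn.Sequential(nn.Linear(d, in_h), nn.Sigmoid())  # f_d^(H)

    def forward(self, w, h):
        z = self.enc_w(w) * self.enc_h(h)  # Hadamard fusion: Z = Z_W ⊙ Z_H
        return self.dec_w(z), self.dec_h(z), z

def train_autoencoder(W_flat, Ht_flat, d=32, epochs=200, alpha=1.0, beta=1.0):
    model = JointAutoencoder(W_flat.shape[1], Ht_flat.shape[1], d)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    mse = nn.MSELoss()
    for _ in range(epochs):
        w_hat, h_hat, _ = model(W_flat, Ht_flat)
        # L = alpha * MSE(W, W_hat) + beta * MSE(H^T, H_hat), as in step 4.3
        loss = alpha * mse(w_hat, W_flat) + beta * mse(h_hat, Ht_flat)
        opt.zero_grad()
        loss.backward()  # backpropagation updates all encoder/decoder weights
        opt.step()
    with torch.no_grad():  # step 4.5: extract the joint latent vector Z
        return model(W_flat, Ht_flat)[2]

# Usage, continuing the NMF sketch (one row holding a user's flattened factors):
# Z = train_autoencoder(torch.tensor(W, dtype=torch.float32).reshape(1, -1),
#                       torch.tensor(H.T, dtype=torch.float32).reshape(1, -1))
```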

Step 5: downstream behavior prediction. The user latent vector representation learned by the autoencoder network is used as the feature input of the downstream task.

In one embodiment, predicting the user's number of purchases based on the user latent vector learned by the autoencoder network includes:

using the latent vector Z as the input feature of a linear regression model to predict the number of purchases, the linear regression model being:

ŷ = θ^T·Z + γ,

where,

ŷ is the predicted number of purchases;

Z is the joint latent vector obtained by combining the product feature Z_W and the time feature Z_H;

θ is the weight parameter vector of the model;

γ is the bias parameter of the model.

The mean squared error (MSE) is used as the loss function, defined as:

J(θ, γ) = (1/N)·Σ_{n=1}^{N} (y_n − ŷ_n)²,

where N is the number of training samples and y_n is the observed purchase count. θ and γ are estimated by minimizing this loss function. Gradient descent is used as the optimization algorithm, with the parameter update rules:

θ ← θ − α·∂J/∂θ, γ ← γ − α·∂J/∂γ.

In one embodiment, the training steps of the linear regression model ŷ = θ^T·Z + γ include: initializing the parameters (θ can be initialized randomly, γ is initialized to 0); setting the learning rate (choosing a suitable learning rate α); and iteratively updating the parameters θ and γ by gradient descent.
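A minimal sketch of this downstream regressor, trained by gradient descent exactly as described (θ initialized randomly, γ initialized to 0); Z (one row per sample) and y (observed purchase counts) are assumed given:

```python
import numpy as np

def fit_linear_regression(Z, y, lr=0.01, epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=Z.shape[1])  # random initialization
    gamma = 0.0                                      # bias initialized to 0
    n = len(y)
    for _ in range(epochs):
        y_hat = Z @ theta + gamma              # y_hat = theta^T Z + gamma
        err = y_hat - y
        theta -= lr * (2.0 / n) * (Z.T @ err)  # dJ/dtheta for the MSE loss
        gamma -= lr * (2.0 / n) * err.sum()    # dJ/dgamma
    return theta, gamma
```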

The present invention adopts non-negative matrix factorization, which decomposes the original sparse matrix into low-dimensional feature matrices while preserving the non-negativity of the data, effectively alleviating the data sparsity problem. By learning latent vector representations of user behavior sequences with an autoencoder network, the invention learns a common latent vector of user and time series features, making the feature representation richer and more accurate. The invention utilizes time series and user behavior data in combination with deep learning methods, so that the prediction model captures user interests and behavior patterns more accurately, thereby improving prediction accuracy.

It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wires, optical cables, RF (radio frequency), and the like, or any suitable combination of the above.

The computer-readable medium may be included in the electronic device, or it may exist separately without being assembled into the electronic device.

Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flow charts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flow chart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in a different order than noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flow charts, and combinations of blocks in the block diagrams and/or flow charts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented in software or in hardware; the name of a unit does not in some cases constitute a limitation on the unit itself.

The preferred embodiments of the present invention have been described above to make the spirit of the invention clearer and easier to understand; they are not intended to limit the invention. Any modifications, substitutions, and improvements made within the spirit and principles of the present invention shall be included within the scope of protection defined by the appended claims.

Claims (10)

1.一种深度学习驱动的客户行为预测方法,所述方法包括:1. A deep learning driven customer behavior prediction method, the method comprising: 步骤1,通过数据收集阶段获取用户对商品的购买次数、浏览次数和评分等历史交互数据进行交互矩阵构建;Step 1: Obtain historical interaction data such as the number of purchases, views, and ratings of products by users during the data collection phase to construct an interaction matrix; 步骤2,对所述交互矩阵进行矩阵分解,将三维的交互矩阵转换为二维矩阵后,应用非负矩阵分解方法,分解二维矩阵为两个非负矩阵;Step 2, performing matrix decomposition on the interaction matrix, converting the three-dimensional interaction matrix into a two-dimensional matrix, and applying a non-negative matrix decomposition method to decompose the two-dimensional matrix into two non-negative matrices; 步骤3,基于非负矩阵分解的交互子矩阵结果构建两个特征矩阵,分别构造用户特征矩阵W'和时间特征矩阵H';Step 3, construct two feature matrices based on the interaction submatrix results of non-negative matrix decomposition, and construct the user feature matrix W' and the time feature matrix H' respectively; 步骤4,基于自编码器网络来压缩并学习用户行为序列的隐向量表示,包括基于用户特征矩阵W'和时间特征矩阵H'的联合自编码器学习用户和时序特征的共同隐向量;Step 4: compress and learn the latent vector representation of the user behavior sequence based on the autoencoder network, including learning the common latent vector of the user and time series features based on the joint autoencoder of the user feature matrix W' and the time feature matrix H'; 其中,在隐向量融合中通过哈达玛积融合;Among them, the latent vector fusion is fused through the Hadamard product; 步骤5,使用自编码器网络学习到联合隐向量作为输入,获得下游任务的行为预测。Step 5: Use the autoencoder network to learn the joint latent vector as input to obtain behavior predictions for downstream tasks. 2.如权利要求1所述的深度学习驱动的客户行为预测方法,其特征在于,所述步骤1具体包括初始化一个三维矩阵M,其维度由商品数I、特征数F和时间点数T确定;2. The deep learning driven customer behavior prediction method according to claim 1, characterized in that the step 1 specifically comprises initializing a three-dimensional matrix M, whose dimensions are determined by the number of products I, the number of features F and the number of time points T; 在矩阵初始化阶段,所有元素值均设为零;During the matrix initialization phase, all element values are set to zero; 其后,根据收集到的数据填充矩阵,对应具体商品的价格、浏览次数、购买次数和评分等特征在不同时间点的值。Afterwards, the matrix is filled in based on the collected data, corresponding to the values of the specific product’s features such as price, number of views, number of purchases, and ratings at different points in time. 3.如权利要求1所述的深度学习驱动的客户行为预测方法,其特征在于,所述步骤2中还包括在所述交互矩阵分解过程中,基于衰减因子和置信度调整后的优化目标为每个用户构建分解的交互子矩阵。3. The deep learning-driven customer behavior prediction method as described in claim 1 is characterized in that step 2 also includes constructing a decomposed interaction sub-matrix for each user based on the optimization target after the attenuation factor and confidence adjustment during the interaction matrix decomposition process. 4.如权利要求1所述的深度学习驱动的客户行为预测方法,其特征在于,所述步骤3中特征矩阵W'是直接从非负矩阵分解得到的W矩阵,特征矩阵H'通过转置非负矩阵分解得到的H矩阵构建。4. The deep learning driven customer behavior prediction method as described in claim 1 is characterized in that the feature matrix W' in step 3 is the W matrix obtained directly from non-negative matrix decomposition, and the feature matrix H' is constructed by transposing the H matrix obtained by non-negative matrix decomposition. 5.如权利要求1所述的深度学习驱动的客户行为预测方法,其特征在于,所述步骤5中所述下游任务的目标包括预测用户的购买次数;5. 
5. The deep-learning-driven customer behavior prediction method according to claim 1, wherein the goal of the downstream task in Step 5 includes predicting a user's purchase count; and the latent vector Z is used as the input feature of a linear regression model to predict the purchase count.

6. The deep-learning-driven customer behavior prediction method according to claim 2, wherein a three-dimensional matrix M is created with dimensions equal to the number of products I, the number of features F, and the number of time points T, each element of the three-dimensional matrix being initially set to 0, where i ∈ {1, 2, ..., I} is the product index; f ∈ {1, 2, 3, 4} is the feature index, corresponding to price, view count, purchase count, and rating, respectively; t ∈ {1, 2, ..., T} is the time index; and M[i][f][t] is the element of the three-dimensional matrix representing the value of the f-th feature of the i-th product at time t; the three-dimensional matrix M of size I×F×T is filled, its elements denoted m_ift; for each product i, feature f, and time point t, the corresponding element is filled in:
m_i1t: the price of product i at time t;
m_i2t: the view count of product i at time t;
m_i3t: the purchase count of product i at time t;
m_i4t: the rating of product i at time t.

7. The deep-learning-driven customer behavior prediction method according to claim 3, wherein for a time point t the decay factor δ(t) is defined as δ(t) = e^{-λ(T-t)}, where λ is the decay-rate parameter determining how fast interest decays over time, T is the most recent time point, used to normalize the time decay, and t is the current time point.

8. The deep-learning-driven customer behavior prediction method according to claim 3, wherein for product i, feature f, and time point t, the confidence weight c_ift based on user interaction is defined as c_ift = 1 + α·r_ift, where r_ift is the user's rating of feature f of product i at time t, and α is a hyperparameter that adjusts the influence of the confidence weight.

9. The deep-learning-driven customer behavior prediction method according to claim 3, wherein the decomposed interaction sub-matrix is constructed for each user under the optimization objective adjusted by the decay factor and the confidence weights, the objective function being

min_{W,H ≥ 0} Σ_{i,f,t} c_ift · δ(t) · (m_ift − [W·H]_ift)²

where m_ift is the value of feature f of product i at time t in the original three-dimensional matrix M; [W·H]_ift is the corresponding element of the approximation matrix W·H; c_ift is the confidence weight; and δ(t) is the time decay factor.
10. The deep-learning-driven customer behavior prediction method according to claim 1, wherein Step 4 comprises defining the autoencoder network structure, determining the hidden-layer dimension, determining the loss function, and training the autoencoder, the latent vector being extracted once training is complete; wherein the feature matrices are fed as input, the encoder generates the latent vector, the decoder reconstructs the output, the loss is computed, and the network weights are updated by backpropagation to train the autoencoder.
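
The claims above compress the pipeline into prose and formulas; the sketches that follow restate the main steps in Python. All of them are illustrative readings, not the patented implementation. First, the data-collection stage of claims 2 and 6: the record format and the `records` list below are hypothetical stand-ins for whatever interaction log is actually collected.

```python
import numpy as np

I, F, T = 100, 4, 30  # products, features (price, views, purchases, rating), time points

# Initialize the three-dimensional matrix M with all elements set to zero (claim 2).
M = np.zeros((I, F, T))

# Hypothetical interaction records: (product index i, feature index f, time index t, value).
records = [
    (0, 0, 5, 19.99),  # price of product 0 at time 5
    (0, 1, 5, 42.0),   # view count
    (0, 2, 5, 3.0),    # purchase count
    (0, 3, 5, 4.5),    # rating
]

# Fill M so that M[i][f][t] holds the value of feature f of product i at time t (claim 6).
for i, f, t, value in records:
    M[i, f, t] = value
```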
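
Claims 7 and 8 give the two weighting terms in closed form; a direct transcription follows, with λ and α set to arbitrary illustrative values. Broadcasting the rating slice over all four features is a simplifying assumption, since the claims leave the exact shape of r_ift open.

```python
import numpy as np

lam, alpha = 0.1, 0.5  # decay rate λ and confidence hyperparameter α (illustrative values)
I, F, T = 100, 4, 30
M = np.zeros((I, F, T))  # stands in for the filled tensor from the previous sketch

# Time decay factor δ(t) = exp(-λ(T - t)): the most recent time point gets weight 1 (claim 7).
t = np.arange(1, T + 1)
delta = np.exp(-lam * (T - t))      # shape (T,)

# Confidence weights c_ift = 1 + α·r_ift, taking r from the rating feature slice (claim 8).
r = M[:, 3, :]                      # ratings, shape (I, T)
c = 1.0 + alpha * r[:, None, :]     # shape (I, 1, T), broadcastable over the feature axis

# Combined per-entry weight used in the factorization objective of claim 9.
weights = c * delta[None, None, :]  # shape (I, 1, T); broadcasts against M's (I, F, T)
```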
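
For the weighted factorization of claims 3, 4, and 9, one standard way to minimize Σ c_ift·δ(t)·(m_ift − [W·H]_ift)² under non-negativity constraints is the weighted multiplicative update rule. The sketch below flattens the I×F×T tensor into an (I·F)×T matrix so that one factor carries product/user structure and the other carries time, matching the W'/H' construction of claim 4; both the flattening choice and the update rule are assumptions, as the claims fix the objective but not the solver.

```python
import numpy as np

def weighted_nmf(V, C, k, n_iter=200, eps=1e-9):
    """Minimize sum(C * (V - W @ H)**2) with W, H >= 0 via multiplicative updates."""
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        WH = W @ H
        W *= ((C * V) @ H.T) / (((C * WH) @ H.T) + eps)  # update keeps W non-negative
        WH = W @ H
        H *= (W.T @ (C * V)) / ((W.T @ (C * WH)) + eps)  # update keeps H non-negative
    return W, H

I, F, T, k = 100, 4, 30, 8
M = np.abs(np.random.default_rng(1).normal(size=(I, F, T)))  # stands in for the filled tensor
weights = np.ones((I, F, T))                                 # stands in for c_ift * δ(t)

V = M.reshape(I * F, T)  # 3D -> 2D conversion of Step 2
C = weights.reshape(I * F, T)
W, H = weighted_nmf(V, C, k)

W_prime = W    # user/product feature matrix W', taken directly from the factorization (claim 4)
H_prime = H.T  # time feature matrix H', constructed by transposing H (claim 4)
```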
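
Claim 10 specifies what the autoencoder training loop must do but leaves the architecture open. One plausible reading is a two-branch network whose latent codes share a dimension so the Hadamard-product fusion of claim 1 is well defined; the layer sizes, the pairing of W' rows with H' rows, and the MSE reconstruction loss are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointAutoencoder(nn.Module):
    def __init__(self, dim_user, dim_time, hidden=16):
        super().__init__()
        # One encoder per feature matrix; both map into the same latent dimension
        # so the element-wise (Hadamard) product of the two codes is well defined.
        self.enc_user = nn.Sequential(nn.Linear(dim_user, hidden), nn.ReLU())
        self.enc_time = nn.Sequential(nn.Linear(dim_time, hidden), nn.ReLU())
        # The decoders reconstruct each input from the fused joint latent vector.
        self.dec_user = nn.Linear(hidden, dim_user)
        self.dec_time = nn.Linear(hidden, dim_time)

    def forward(self, x_user, x_time):
        z = self.enc_user(x_user) * self.enc_time(x_time)  # Hadamard-product fusion
        return self.dec_user(z), self.dec_time(z), z

model = JointAutoencoder(dim_user=8, dim_time=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x_user = torch.rand(32, 8)  # stands in for a batch of W' rows
x_time = torch.rand(32, 8)  # stands in for a batch of H' rows (paired by assumption)

for _ in range(100):
    ru, rt, z = model(x_user, x_time)
    loss = loss_fn(ru, x_user) + loss_fn(rt, x_time)  # reconstruction loss
    opt.zero_grad()
    loss.backward()   # backpropagation updates the network weights, as in claim 10
    opt.step()
```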
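
Finally, claim 5 treats the joint latent vector Z as ordinary regression features. A minimal sketch with scikit-learn, using synthetic targets in place of real purchase counts:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

Z = np.random.rand(500, 16)             # joint latent vectors from the autoencoder
purchases = np.random.poisson(2, 500)   # stands in for observed purchase counts

reg = LinearRegression().fit(Z, purchases)
predicted = reg.predict(Z[:5])          # predicted purchase counts for five users
```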
CN202410263063.XA 2024-03-08 2024-03-08 A deep learning-driven customer behavior prediction model Active CN118096237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410263063.XA CN118096237B (en) 2024-03-08 2024-03-08 A deep learning-driven customer behavior prediction model

Publications (2)

Publication Number Publication Date
CN118096237A (en) 2024-05-28
CN118096237B CN118096237B (en) 2024-11-29

Family

ID=91150297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410263063.XA Active CN118096237B (en) 2024-03-08 2024-03-08 A deep learning-driven customer behavior prediction model

Country Status (1)

Country Link
CN (1) CN118096237B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120030020A1 (en) * 2010-08-02 2012-02-02 International Business Machines Corporation Collaborative filtering on spare datasets with matrix factorizations
CN110858311A (en) * 2018-08-23 2020-03-03 山东建筑大学 Link prediction method and system based on deep non-negative matrix factorization
WO2021051515A1 (en) * 2019-09-16 2021-03-25 平安科技(深圳)有限公司 Vector shift-based recommendation method, apparatus, computer device, and non-volatile readable storage medium
CN111737427A (en) * 2020-05-11 2020-10-02 华南理工大学 A MOOC forum post recommendation method integrating forum interaction behavior and user reading preference
CN111612084A (en) * 2020-05-26 2020-09-01 重庆邮电大学 An Optimization Method for Deep Non-negative Matrix Factorization Networks with Classifiers
CN112598462A (en) * 2020-12-19 2021-04-02 武汉大学 Personalized recommendation method and system based on collaborative filtering and deep learning
CN114238779A (en) * 2021-12-02 2022-03-25 中山大学 A Personalized Recommendation Method Based on Linear Variational Autoencoder
CN114139062A (en) * 2021-12-14 2022-03-04 上海交通大学 Interpretable efficient self-encoder system and method for implicit recommendation
CN115277523A (en) * 2022-07-26 2022-11-01 齐鲁工业大学 Hybrid QoS prediction method and system based on improved conditional variational autoencoder

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙雨生; 朱金宏; 李亚奇: "Research progress of big-data-based information recommendation in China: core content", 现代情报 (Modern Information), no. 08, 1 August 2020 (2020-08-01) *
张敏; 丁弼原; 马为之; 谭云志; 刘奕群; 马少平: "A hybrid recommendation method enhanced by deep learning", 清华大学学报(自然科学版) (Journal of Tsinghua University, Science and Technology), no. 10, 15 October 2017 (2017-10-15) *

Also Published As

Publication number Publication date
CN118096237B (en) 2024-11-29

Similar Documents

Publication Publication Date Title
US11928600B2 (en) Sequence-to-sequence prediction using a neural network model
CN115082147B (en) A sequence recommendation method and device based on hypergraph neural network
WO2022199504A1 (en) Content identification method and apparatus, computer device and storage medium
WO2022057658A1 (en) Method and apparatus for training recommendation model, and computer device and storage medium
CN111079532A (en) Video content description method based on text self-encoder
WO2023093205A1 (en) Entity tag association prediction method and device and computer readable storage medium
US20220237682A1 (en) Scalable architecture for recommendation
CN115099886B (en) A method, device and storage medium for recommending long and short interest sequences
CN112800344A (en) A movie recommendation method based on deep neural network
CN115048855A (en) Click rate prediction model, training method and application device thereof
CN112612920A (en) Movie click rate estimation method based on domain interaction information intensity factor decomposition machine
CN115358809A (en) Multi-intention recommendation method and device based on graph comparison learning
Yuan et al. Deep learning from a statistical perspective
CN117036006A (en) User portrait generation method and device, storage medium and electronic equipment
CN112989182A (en) Information processing method, information processing apparatus, information processing device, and storage medium
CN118626727A (en) A personalized recommendation method based on dynamic user portrait
CN105045827A (en) Familiarity based information recommendation method and apparatus
CN112948681A (en) Time series data recommendation method fusing multi-dimensional features
CN115310547B (en) Model training method, article identification method and device, electronic equipment and medium
CN117493703A (en) Recommendation method, device and equipment
CN118096237B (en) A deep learning-driven customer behavior prediction model
CN116340635A (en) Article recommendation method, model training method, device and equipment
CN112328899B (en) Information processing method, information processing apparatus, storage medium, and electronic device
CN116107619A (en) A Web API recommendation method based on factorization machine
CN115985086A (en) Traffic data completion method, system, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20241018

Address after: Tenggang, Beidai Administrative Village, Renji Township, Luyi County, Zhoukou City, Henan Province 477274

Applicant after: Teng Wenjuan

Country or region after: China

Address before: Room 1109, No. 801 Jiefang North Road, Yuexiu District, Guangzhou City, Guangdong Province, 510040

Applicant before: Beijing Jiahuaming Brand Planning Co.,Ltd. Guangdong Branch

Country or region before: China

TA01 Transfer of patent application right

Effective date of registration: 20241101

Address after: 815, 8th Floor, No.1 Yuehua Street, Gongchen Street, Fangshan District, Beijing, 102488

Applicant after: Beijing Jiahuaming Brand Planning Co.,Ltd.

Country or region after: China

Address before: Tenggang, Beidai Administrative Village, Renji Township, Luyi County, Zhoukou City, Henan Province 477274

Applicant before: Teng Wenjuan

Country or region before: China

GR01 Patent grant