
CN113469730A - Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene - Google Patents

Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene

Info

Publication number
CN113469730A
CN113469730A CN202110637643.7A CN202110637643A CN113469730A CN 113469730 A CN113469730 A CN 113469730A CN 202110637643 A CN202110637643 A CN 202110637643A CN 113469730 A CN113469730 A CN 113469730A
Authority
CN
China
Prior art keywords
repurchase
lightgbm
prediction
model
fusion model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110637643.7A
Other languages
Chinese (zh)
Other versions
CN113469730B (en)
Inventor
吴军
杨李平
牛夏夏
石力
李圆圆
孙李傲
宋鑫玉
郝伟怡
宋思聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology
Priority to CN202110637643.7A (CN113469730B)
Priority claimed from CN202110637643.7A (CN113469730B)
Publication of CN113469730A
Application granted
Publication of CN113469730B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a customer repurchase prediction method and device based on an RF-LightGBM fusion model in a non-contract scenario. The method comprises the following steps: acquiring a user's historical data and performing preprocessing and feature engineering on it; taking the preprocessed data as samples and balancing the sample set with the SMOTE-ENN method; optimizing the hyper-parameters of the random forest algorithm and the LightGBM algorithm with the TPE optimization algorithm to construct the weak classifiers; and performing ensemble learning on the training samples with the weak classifiers to obtain a strong classifier, which yields the final repurchase prediction result. The method analyzes the consumption data of the enterprise's existing customers, predicts their repurchase behavior accurately, and uses the predictions to guide customer relationship management decisions and precision marketing strategy, improving the marketing conversion rate and reducing enterprise operating costs.

Description

Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene
Technical Field
The invention relates to the technical field of computers, in particular to a customer repurchase prediction method and device based on an RF-LightGBM fusion model under a non-contract scene.
Background
With the advent of the big data age, predicting consumers' future purchasing intentions from massive historical transaction data has become an important issue in enterprise management. Predicting a customer's repeat purchasing behavior in a non-contract scenario mainly means predicting whether the customer will purchase the enterprise's product again when the enterprise and the customer have not signed a purchase contract. If consumers with repeat purchasing intention can be predicted accurately, precision marketing can match customer demand more closely, raise the value of new consumers, and convert them into loyal customers.
In the prior art, a Chinese invention patent (publication No. CN109146533B) discloses an information push method and apparatus: at least two pieces of order information of a user for items of the same item type are obtained; the user's average daily consumption of that item type over the interval is determined from the purchase amounts in the orders; and a push date for pushing information about items of that type to the user's terminal is determined from the average daily consumption and the purchase amount of the latest order, which improves the effectiveness of information push. Another Chinese invention patent (publication No. CN108171530B) discloses a method and device for increasing the customer unit price and repurchase rate: historical marketing data of a target store are selected to obtain the historical marketing campaign effect, and an initial estimate of the target store's campaign effect is obtained from the historical marketing data and the historical campaign effect; threshold adjustment factors are constructed according to the ratio of the number of orders meeting the threshold to the number of customer orders in the historical marketing activities of all stores; and the estimated campaign effect of the target store is calibrated with these factors to obtain the final estimate, which addresses the inability of existing promotion evaluation techniques to track changes in the threshold. Although the prior art achieves product recommendation and effect prediction from historical data, it cannot accurately predict customer behavior.
Existing machine learning methods are widely applied to customer behavior prediction, but most of them focus on prediction in the e-commerce scenario. In the prior art, a Chinese invention application (publication No. CN110956497A) discloses a method for predicting the repeat purchasing behavior of e-commerce platform users: the user's historical purchasing behavior data are obtained, and a deep Catboost individual model, a double-layer attention BiGRU individual model and a DeepGBM individual model are fused to model the discrete purchase-record values and the behavior-sequence features in the historical purchase data, which improves the accuracy of the prediction. Another Chinese invention application (publication No. CN108520469A) discloses a method for analyzing user repurchase behavior on an e-commerce platform: valid purchase records of users within a statistical period are selected; the data are cleaned; each valid purchase record is labeled as to whether it is a repeat purchase, a repeat purchase on the same platform, or a repeat purchase of the same insurance type; the total number of purchasing users, the number of repeat purchasing users, and the corresponding totals per platform and per insurance type are counted; and the overall, per-platform and per-insurance-type repeat purchase rates within the statistical period are calculated. In the e-commerce scenario, however, implicit feedback such as a customer's favorites and likes can be retained, and this information is not available in the broader non-contract scenario.
Moreover, current machine learning work focuses mainly on combining algorithms and neglects the influence of the data set on the prediction result. In a typical purchasing scenario, users who purchase repeatedly are fewer than users who purchase only once, so the data classes are imbalanced; this often causes the model to overfit and lowers prediction accuracy.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a customer repurchase prediction method and device based on an RF-LightGBM fusion model in a non-contract scenario, and adopts the following technical scheme:
a customer repurchase prediction method based on an RF-LightGBM fusion model under a non-contract scene comprises the following steps:
acquiring historical purchase record data of a user, preprocessing the historical purchase record data and extracting features;
balancing the data subjected to the feature extraction by using a sample balancing method to obtain a balanced sample;
training on the training sample data by using an optimization algorithm, and iteratively optimizing the weak classifiers within the specified weak-classifier hyper-parameter space;
performing ensemble learning to obtain a strong classifier by giving the same weight to each weak classifier;
predicting by using a strong classifier to obtain final results of product recommendation and repurchase behavior prediction;
and pushing product information to the terminal equipment of the user and/or sending a re-purchasing behavior prediction result to a management system according to the final result.
Further, the extracting features includes:
time of last purchase, frequency of purchases, total amount of purchases, duration of relationship, purchase interval.
Further, the sample equalization method comprises:
generating minority-class samples from the extracted features by using the SMOTE oversampling method, then checking the generated samples with the ENN (Edited KNN) method and removing any sample whose predicted class differs from its actual class label, to obtain the balanced samples.
Further, the optimization algorithm comprises:
optimizing the model hyper-parameters by using the TPE (Tree-structured Parzen Estimator) optimization algorithm, and training the model with the optimal hyper-parameters.
Further, the weak classifiers comprise a random forest (RF) model and a LightGBM model; the output of each weak classifier is a classification probability value, expressed mathematically as:
$$P(x \mid y) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} h_i(x \mid y)$$
where $N_{tree}$ is the total number of decision trees, $h_i$ is the i-th decision tree, and $P(x \mid y)$ represents the probability that the prediction sample x belongs to the class y.
Further, the ensemble learning specifically includes:
the RF model and the LightGBM model are given the same weight and are integrated by soft voting on the basis of their prediction probabilities, expressed mathematically as:
$$P_{Soft\ Voting} = (P_{RF} + P_{LightGBM})/2$$
$$Result = \begin{cases} 1, & P_{Soft\ Voting} > threshold \\ 0, & \text{otherwise} \end{cases}$$
where $P_{Soft\ Voting}$ is the prediction probability of the soft voting fusion model, $P_{RF}$ and $P_{LightGBM}$ are the prediction probabilities of the random forest and LightGBM models respectively, Result is the prediction result of the fusion model, 1 means the user belongs to the repurchase type, 0 means the user belongs to the non-repurchase type, and threshold is the classification threshold.
Further, the repurchase behavior prediction and the repurchase probability prediction are used to guide product recommendation.
The invention also provides a customer repurchase prediction device based on the RF-LightGBM fusion model in a non-contract scenario, which comprises the following components:
the acquisition module is used for acquiring historical purchase record data of a user, preprocessing the historical purchase record data and extracting features;
the balance module is used for balancing the data subjected to the feature extraction by using a sample balance method to obtain a balanced sample;
the optimization training module is used for training sample data by using an optimization algorithm and performing iterative optimization on the weak classifier in the specified hyper-parameter space of the weak classifier;
the ensemble learning module is used for performing ensemble learning to obtain a strong classifier by endowing the weak classifiers with the same weight;
the prediction module is used for predicting by using the strong classifier to obtain final results of product recommendation and repurchase behavior prediction;
and the pushing module is used for pushing the product information to the terminal equipment of the user according to the final result.
The invention also includes an electronic device comprising:
a processor, and a memory;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to perform the customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario as described above.
The present invention also includes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario as described above.
The invention achieves the following beneficial effects: analyzing according to the existing user purchasing behavior records of the enterprise, accurately predicting the existing user re-purchasing condition, and guiding a customer relationship management strategy and a marketing strategy according to the situation, so that the marketing conversion rate is improved, and the related operation cost is reduced; based on the purchasing behavior data of the customers, the re-purchasing behavior of the customers on the commodities is accurately predicted, the actual effective requirements of the customers are met, and meanwhile the enterprise communication cost can be reduced; the enterprise operation strategy is dynamically guided by the data, the data promotes decision making and assists in achieving the product marketing goal, and finally the goal of recommending a proper product to a proper user in an intelligent mode is achieved.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby. It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the embodiment discloses a customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario, which includes the following steps:
(1) Acquire the user's historical purchase record data, preprocess it, and extract features. The historical purchase record data are data that already exist. The extracted features are: time of last purchase (R), purchase frequency (F), total purchase amount (M), relationship duration (S), and purchase interval (T).
(2) Balance the feature-extracted data with the SMOTE-ENN method to obtain the model training set. The test samples are formed by repeatedly sampling, with replacement, each class of samples in the original sample set.
(3) Train on the training sample data with the TPE optimization algorithm, iteratively optimizing the weak classifiers within the specified weak-classifier hyper-parameter space.
(4) Assign the same weight to each weak classifier and perform ensemble learning to obtain a strong classifier, which gives the final results for product recommendation and repurchase behavior prediction.
In this embodiment, the preprocessing step includes: to facilitate computer processing and user labeling, character-type data are converted to numerical data, and numerical data are converted to date-type data. The extracted features are the time of last purchase (R), purchase frequency (F), total purchase amount (M), relationship duration (S), and purchase interval (T):
a) R: the time since the customer last purchased the product, in the form:
$$R = T_{last\_time} - T_{plast\_time}$$
where $T_{last\_time}$ denotes the end time of the reference period and $T_{plast\_time}$ denotes the time of the customer's last order for the item within the reference period.
b) F: the number of purchases made by the customer over the observation period.
c) M: the customer's total purchase amount for the product, in the form:
$$M = \sum_{i=1}^{n} M_i$$
where n is the total number of purchases made by the customer within the reference period and $M_i$ is the amount of the i-th purchase.
d) S: the time interval between the customer's first and last transactions within the reference period, in the form:
$$S = T_{plast\_time} - T_{pfirst\_time}$$
where $T_{plast\_time}$ denotes the time of the customer's last order for the item within the reference period and $T_{pfirst\_time}$ denotes the time of the customer's first order for the item within the reference period.
e) T: the customer's average time interval between transactions over the period, in the form:
Figure BDA0003105816590000072
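The five features above can be sketched for a single customer as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation. The function name `rfmst_features` and the input layout are hypothetical, and because the formula for T is not recoverable from the source, the sketch assumes T = S / (F - 1), that is, the mean gap between consecutive purchases.

```python
from datetime import date

def rfmst_features(orders, period_end):
    """Compute R, F, M, S, T for one customer.

    orders: list of (purchase_date, amount) within the reference period
            (hypothetical input layout).
    period_end: end date of the reference period (T_last_time).
    """
    dates = sorted(d for d, _ in orders)
    first, last = dates[0], dates[-1]
    r = (period_end - last).days        # R = T_last_time - T_plast_time
    f = len(orders)                     # F: number of purchases
    m = sum(a for _, a in orders)       # M: sum of single-purchase amounts
    s = (last - first).days             # S = T_plast_time - T_pfirst_time
    # ASSUMPTION: T is the mean gap between consecutive purchases, S / (F - 1).
    # The source formula is missing; a single purchase is given T = 0.
    t = s / (f - 1) if f > 1 else 0.0
    return {"R": r, "F": f, "M": m, "S": s, "T": t}

orders = [
    (date(2021, 1, 5), 120.0),
    (date(2021, 2, 4), 80.0),
    (date(2021, 3, 6), 50.0),
]
features = rfmst_features(orders, period_end=date(2021, 3, 31))
print(features)
```

All durations are expressed in days here; any other time unit would work as long as it is used consistently across customers.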
The invention handles the imbalanced samples with the SMOTE-ENN method, which works well on binary classification problems that have only a small number of positive samples and performed better than the other methods compared. The SMOTE-ENN method comprises the following steps:
(1) SMOTE method (Synthetic Minority Oversampling Technique):
Let A denote the minority-class sample set. For any $X_i \in A$, the Euclidean distance from $X_i$ to every other sample in A is calculated to obtain the k nearest neighbors of $X_i$; samples are randomly selected from these neighbors and denoted $X_{ij}$ (j = 1, 2, ..., n); random linear interpolation between $X_i$ and each $X_{ij}$ then constructs the new minority-class samples $Y_j$:
$$Y_j = X_i + rand(0,1) \times (X_{ij} - X_i)$$
In the formula, rand(0,1) represents a random number in the interval (0, 1).
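The interpolation step can be sketched in plain Python as below. This is an illustrative sketch rather than the patent's code: the function name `smote`, the parameter `n_per_sample` (how many synthetic samples to draw per original sample) and the toy data are assumptions introduced here.

```python
import math
import random

def smote(minority, k=5, n_per_sample=1, seed=0):
    """Generate synthetic minority samples by random linear interpolation.

    minority: list of feature vectors from the minority class A.
    k: number of nearest neighbors considered for each sample.
    n_per_sample: synthetic samples per original sample (hypothetical knob).
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbors of x within the minority class (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(n_per_sample):
            nb = rng.choice(neighbours)     # randomly chosen neighbor X_ij
            gap = rng.random()              # plays the role of rand(0,1)
            # Y_j = X_i + rand(0,1) * (X_ij - X_i)
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_samples = smote(minority, k=2, n_per_sample=2)
print(len(new_samples))
```

Because each synthetic point lies on the segment between two real minority samples, it always stays inside the region those samples span.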
(2) ENN method (Edited KNN)
Each sample in the data set ND generated by the SMOTE method is predicted with the K-nearest-neighbor method (K = 5), and the sample is rejected if the prediction differs from its actual class label. The Euclidean distance is used as the distance measure of the KNN algorithm, in the form:
$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
In the formula, x and y represent two different users, and i is the feature index.
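A minimal sketch of this cleaning rule follows, assuming a majority vote among the K nearest neighbors as the KNN prediction. The function name `enn_filter` and the toy data are hypothetical; the data consist of two well-separated clusters plus one deliberately mislabeled point.

```python
import math
from collections import Counter

def enn_filter(samples, labels, k=5):
    """Return the indices of samples whose KNN-predicted label matches their own."""
    kept = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Euclidean distance from x to every other sample, paired with its label
        others = [
            (math.dist(x, p), lab)
            for j, (p, lab) in enumerate(zip(samples, labels))
            if j != i
        ]
        nearest = sorted(others, key=lambda t: t[0])[:k]
        # Majority vote of the k nearest neighbors (assumed prediction rule)
        predicted = Counter(lab for _, lab in nearest).most_common(1)[0][0]
        if predicted == y:
            kept.append(i)
    return kept

class0 = [[0, 0], [0, 1], [1, 0], [1, 1], [0.2, 0.8], [0.8, 0.2]]
class1 = [[10, 10], [10, 11], [11, 10], [11, 11], [10.2, 10.8], [10.8, 10.2]]
noise = [[0.5, 0.5]]                       # labeled 1 but sits inside class 0
samples = class0 + class1 + noise
labels = [0] * 6 + [1] * 6 + [1]
kept = enn_filter(samples, labels, k=5)
print(kept)
```

Only the mislabeled point (index 12) is rejected; the two clusters are left intact.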
The hyper-parameter configuration space of the weak classifiers is specified, and on the sample set constructed by the SMOTE-ENN method the TPE (Tree-structured Parzen Estimator) optimization algorithm iteratively optimizes the weak classifiers within that space. The optimization problem is:
$$x^* = \arg\min_{x \in \chi} F(x)$$
where F(x) is the objective function of the weak learner and $x^*$ is the hyper-parameter configuration at which the best result is obtained.
The TPE algorithm defines the density as:
$$p(x \mid y) = \begin{cases} l(x), & y < y^* \\ g(x), & y \ge y^* \end{cases}$$
where l(x) is the density formed from the observations $\{x_i\}$ whose objective value $F(x_i)$ is less than $y^*$, and g(x) is the density formed from the observations whose objective value is greater than or equal to $y^*$. $y^*$ is taken as the γ quantile of the observed values y. The Expected Improvement (EI) is:
$$EI_{y^*}(x) \propto \left( \gamma + \frac{g(x)}{l(x)} (1 - \gamma) \right)^{-1}$$
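The role of l(x), g(x) and the quantile γ can be illustrated with a one-dimensional toy version of a single TPE step. This is a simplified sketch under stated assumptions, not the full tree-structured algorithm and not any library's implementation: it uses fixed-bandwidth Gaussian kernel densities, draws candidates around the good observations, and picks the candidate with the largest l(x)/g(x), which is equivalent to maximizing the EI expression above. The names `gaussian_kde` and `tpe_suggest` and the quadratic toy objective are hypothetical.

```python
import math
import random

def gaussian_kde(points, bandwidth):
    """Fixed-bandwidth Gaussian kernel density estimate over 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm
    return density

def tpe_suggest(observations, gamma=0.25, n_candidates=50, bandwidth=0.5, seed=0):
    """Suggest the next x from (x, loss) observations with one TPE-style step."""
    rng = random.Random(seed)
    ordered = sorted(observations, key=lambda o: o[1])
    n_good = max(1, math.ceil(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]   # losses below the gamma quantile y*
    bad = [x for x, _ in ordered[n_good:]]    # losses at or above y*
    l = gaussian_kde(good, bandwidth)
    g = gaussian_kde(bad, bandwidth)
    # Draw candidates near the good observations, then rank them by l(x) / g(x).
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))

# Toy objective F(x) = (x - 3)^2 evaluated on a coarse grid.
observations = [(float(x), (x - 3) ** 2) for x in range(10)]
suggestion = tpe_suggest(observations)
print(suggestion)
```

The suggestion lands near x = 3, where good observations are dense and bad ones are sparse. In practice the same step is repeated, appending each newly evaluated point to the observations.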
The output of the random forest model is the average of the probabilities given by all of its decision trees, expressed mathematically as:
$$P(x \mid y) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} h_i(x \mid y)$$
where $N_{tree}$ is the total number of decision trees, $h_i$ is the i-th decision tree, and $P(x \mid y)$ represents the probability that the prediction sample x belongs to the class y.
The LightGBM model also outputs classification probabilities using the method described above.
The RF model and the LightGBM model are given the same weight and are integrated by soft voting on the basis of their prediction probabilities, expressed mathematically as:
$$P_{Soft\ Voting} = (P_{RF} + P_{LightGBM})/2$$
$$Result = \begin{cases} 1, & P_{Soft\ Voting} > threshold \\ 0, & \text{otherwise} \end{cases}$$
where $P_{Soft\ Voting}$ is the prediction probability of the soft voting fusion model, $P_{RF}$ and $P_{LightGBM}$ are the prediction probabilities of the random forest and LightGBM models respectively, Result is the prediction result of the fusion model, 1 means the user belongs to the repurchase type, and 0 means the user belongs to the non-repurchase type. Based on testing, the threshold of the invention is set to 0.5: the predicted label is 1 when the fused probability is larger than 0.5 and 0 when it is smaller than 0.5, which yields the prediction matrix
Figure BDA0003105816590000094
Therefore, the forecasting of the repeated purchasing behavior of the customer can be realized.
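The equal-weight soft voting rule can be sketched as follows. The function name `soft_vote` and the probability values are made up for illustration; in practice the two probability lists would come from the tuned random forest and LightGBM models. The source does not say how a fused probability exactly equal to the threshold is labeled; this sketch assigns it label 0.

```python
def soft_vote(p_rf, p_lgbm, threshold=0.5):
    """Average two models' repurchase probabilities and apply the threshold."""
    # P_SoftVoting = (P_RF + P_LightGBM) / 2
    fused = [(a + b) / 2 for a, b in zip(p_rf, p_lgbm)]
    # Label 1 (repurchase) above the threshold, 0 (non-repurchase) otherwise.
    labels = [1 if p > threshold else 0 for p in fused]
    return fused, labels

p_rf = [0.9, 0.4, 0.2, 0.7]      # hypothetical random forest probabilities
p_lgbm = [0.7, 0.7, 0.1, 0.2]    # hypothetical LightGBM probabilities
fused, labels = soft_vote(p_rf, p_lgbm)
print(fused, labels)
```

The second and fourth customers show why the fusion matters: the two models disagree on each, and the averaged probability decides the label.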
And pushing product information to the terminal equipment of the user and/or sending a re-purchasing behavior prediction result to a management system according to the final result.
The performance of the invention is measured as follows: the accuracy rate P, the recall rate R and the F1 value are used as evaluation indexes. After the data preprocessing method of the invention is applied, the evaluation indexes are calculated from the resulting label matrix with the following formulas:
Figure BDA0003105816590000101
Figure BDA0003105816590000102
Figure BDA0003105816590000103
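The formula images are not recoverable from the source, so the following sketch assumes the standard definitions and further assumes that P denotes precision: P = TP / (TP + FP), R = TP / (TP + FN), and F1 = 2PR / (P + R). If the original P is overall accuracy, the first quantity would be computed differently. The function name `precision_recall_f1` and the label vectors are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive (repurchase) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    # ASSUMPTION: standard definitions; the source formulas are missing.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # hypothetical actual labels
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical predicted labels
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```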
The invention performs well in the multi-channel marketing of enterprises in a non-contract scenario. Taking supermarket telemarketing as an example, applying the system can raise the telemarketing conversion rate and lead to more transactions. For enterprises, it can improve marketing guidance, increase the sales success rate, increase the number of completed orders and the transaction amount, and reduce personnel costs. The performance on the data set is as follows: (1) on the training set generated by SMOTE-ENN, the model's prediction accuracy rate is 98.73%, the recall rate is 99.09%, and the F1 value is 0.9874; (2) on a validation set consisting of real samples, the model's prediction accuracy rate is 87.13%, the recall rate is 95.15%, and the F1 value is 0.8587; (3) these results are better than the prediction performance of the RF and LightGBM single models.
By improving the classic RFM model, the invention extracts user behavior features from the explicit feedback in the customer's historical purchase records to form a sample set, which addresses the prior-art problem that large amounts of implicit feedback are unavailable in a non-contract scenario. Through the SMOTE-ENN sample balancing method, the invention effectively addresses the class imbalance of the data set in the prior art. The results of the embodiment show that the method has good prediction performance and practical application value.

Claims (10)

1.一种非合同场景下基于RF-LightGBM融合模型的客户复购预测方法,其特征在于,包括如下步骤:1. a customer repurchase prediction method based on RF-LightGBM fusion model under a non-contract scenario, is characterized in that, comprises the steps: 获取用户的历史购买记录数据,对其进行预处理,并提取特征;Obtain the user's historical purchase record data, preprocess it, and extract features; 使用样本均衡方法对经过特征提取后的数据进行平衡,得到均衡后的样本;Use the sample balance method to balance the data after feature extraction to obtain balanced samples; 利用优化算法对训练样本数据进行训练,在指定的弱分类器超参数空间中对弱分类器进行迭代优化;Use the optimization algorithm to train the training sample data, and iteratively optimize the weak classifier in the specified weak classifier hyperparameter space; 通过对各弱分类器赋予相同权重,进行集成学习得到强分类器;A strong classifier is obtained by ensemble learning by assigning the same weight to each weak classifier; 使用强分类器进行预测,得到关于产品推荐、复购行为预测的最终结果;Use strong classifiers to make predictions, and get the final results of product recommendation and repurchase behavior prediction; 根据所述最终结果,向用户的终端设备推送产品信息和/或向管理系统发送复购行为预测结果。According to the final result, the product information is pushed to the user's terminal device and/or the repurchase behavior prediction result is sent to the management system. 2.根据权利要求1所述的非合同场景下基于RF-LightGBM融合模型的客户复购预测方法,其特征在于,所述提取特征包括:2. The method for predicting customer repurchase based on the RF-LightGBM fusion model in a non-contract scenario according to claim 1, wherein the extraction feature comprises: 最近购买时间、购买频次、购买总金额、关系持续时间、购买间隔。Recent purchase time, purchase frequency, total purchase amount, relationship duration, purchase interval. 3.根据权利要求1所述的非合同场景下基于RF-LightGBM融合模型的客户复购预测方法,其特征在于,所述样本均衡方法包括:3. 
The method for predicting customer repurchase based on the RF-LightGBM fusion model in a non-contract scenario according to claim 1, wherein the sample equalization method comprises: 先使用SMOTE过采样方法对所述提取特征的少数类样本进行生成,再使用ENN(EditedKNN)方法对生成样本进行判断,若预测结果和实际类别标签不同则剔除该样本,得到均衡后的样本。First, the SMOTE oversampling method is used to generate the minority class samples of the extracted features, and then the ENN (EditedKNN) method is used to judge the generated samples. If the predicted result is different from the actual class label, the sample is eliminated to obtain a balanced sample. 4.根据权利要求1所述的非合同场景下基于RF-LightGBM融合模型的客户复购预测方法,其特征在于,所述优化算法包括:4. The method for predicting customer repurchase based on the RF-LightGBM fusion model in a non-contract scenario according to claim 1, wherein the optimization algorithm comprises: 使用TPE(Tree-structured Parzen Estimator)树状Parzen估计优化算法对模型超参数进行优化,在最优超参数情况下进行模型训练。The TPE (Tree-structured Parzen Estimator) tree-structured Parzen Estimator is used to optimize the model hyperparameters, and the model is trained in the case of the optimal hyperparameters. 5.根据权利要求1所述的非合同场景下基于RF-LightGBM融合模型的客户复购预测方法,其特征在于,所述弱分类器包括随机森林RF(Random Forests)模型、Light GBM模型,弱分类器输出结果均为分类概率值,数学表达式为:5. The customer repurchase prediction method based on RF-LightGBM fusion model under non-contract scenario according to claim 1, is characterized in that, described weak classifier comprises random forest RF (Random Forests) model, Light GBM model, weak The output results of the classifier are all classification probability values, and the mathematical expression is:
Figure FDA0003105816580000021
where N_tree is the total number of decision trees, h_i is the i-th decision tree, and P(x|y) denotes the probability that the sample x being predicted belongs to class y.
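The formula image referenced in claim 5 did not survive extraction. Assuming it is the usual random-forest average of per-tree outputs (an assumption that is consistent with the symbol definitions in claim 5, not a transcription of the original figure), the expression would read as follows, where h_i(x | y) is a hypothetical notation for the estimate produced by the i-th tree:

```latex
P(x \mid y) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} h_i(x \mid y)
```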
6. The customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contractual scenario according to claim 1, characterized in that the ensemble learning specifically comprises: assigning the same weight to the RF model and the LightGBM model, and combining them by soft voting (Soft Voting) on the basis of their predicted probabilities, expressed mathematically as:
P_SoftVoting = (P_RF + P_LightGBM) / 2
Figure FDA0003105816580000022
where P_SoftVoting is the predicted probability of the soft-voting fusion model; P_RF and P_LightGBM are the predicted probabilities of the random forest and LightGBM models, respectively; Result is the prediction of the fusion model, with 1 indicating a repurchasing user and 0 indicating a non-repurchasing user; and threshold is the classification threshold.
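The equal-weight soft voting of claim 6 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `soft_vote`, the default threshold of 0.5, and the use of `>=` at the boundary are assumptions, since the image defining Result is not available in the extracted text.

```python
def soft_vote(p_rf, p_lgbm, threshold=0.5):
    """Average two repurchase probabilities with equal weight, then threshold.

    p_rf and p_lgbm stand for P_RF and P_LightGBM from claim 6.
    The default threshold of 0.5 is an assumption for illustration.
    """
    for p in (p_rf, p_lgbm):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # P_SoftVoting = (P_RF + P_LightGBM) / 2
    p_soft = (p_rf + p_lgbm) / 2
    # Assumption: a probability equal to the threshold counts as a repurchase.
    result = 1 if p_soft >= threshold else 0
    return p_soft, result


# One customer scored by both models (made-up probabilities).
probability, label = soft_vote(0.75, 0.25)
```

Raising the threshold trades recall for precision on the repurchasing class, so in practice it would be tuned on validation data rather than fixed.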
7. The customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contractual scenario according to claim 1, characterized in that the repurchase behavior prediction and the repurchase probability prediction are used as guidance for product recommendation.

8. A customer repurchase prediction device based on an RF-LightGBM fusion model in a non-contractual scenario, characterized in that it comprises:
an acquisition module, which acquires the user's historical purchase record data, preprocesses it, and extracts features;
a balancing module, which balances the feature-extracted data using a sample balancing method to obtain balanced samples;
an optimization training module, which trains on the training sample data using an optimization algorithm and iteratively optimizes each weak classifier within its specified hyperparameter space;
an ensemble learning module, which assigns the same weight to each weak classifier and performs ensemble learning to obtain a strong classifier;
a prediction module, which makes predictions with the strong classifier to obtain final results regarding product recommendation and repurchase behavior prediction;
a push module, which pushes product information to the user's terminal device according to the final results.

9. An electronic device, characterized in that it comprises a processor and a memory; the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions so as to perform the customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contractual scenario according to any one of claims 1-7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contractual scenario according to any one of claims 1-7.
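The five features named in claim 2 can be illustrated with a small standard-library sketch. The claims do not give exact definitions, so everything here is an assumption made for illustration: the function name `extract_features`, the dictionary keys, measuring recency and relationship duration in days relative to an observation date `as_of`, and summarizing the purchase interval as the mean gap between consecutive purchases.

```python
from datetime import date


def extract_features(purchases, as_of):
    """Compute claim-2 style features for one customer.

    purchases: list of (purchase_date, amount) tuples.
    as_of: observation date used as the reference point (an assumption).
    """
    if not purchases:
        raise ValueError("at least one purchase is required")
    dates = sorted(d for d, _ in purchases)
    first, last = dates[0], dates[-1]
    # Gaps in days between consecutive purchases; empty for a single purchase.
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return {
        "recency_days": (as_of - last).days,        # most recent purchase time
        "frequency": len(dates),                    # purchase frequency
        "monetary": sum(a for _, a in purchases),   # total purchase amount
        "relationship_days": (as_of - first).days,  # relationship duration
        "mean_interval_days": sum(gaps) / len(gaps) if gaps else None,
    }


# Made-up purchase history for a single customer.
history = [
    (date(2021, 1, 1), 100.0),
    (date(2021, 1, 11), 50.0),
    (date(2021, 1, 31), 25.0),
]
features = extract_features(history, as_of=date(2021, 2, 10))
```

A customer with a single purchase has no interval, so the sketch returns `None` for that field and leaves the imputation choice to the preprocessing step.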
CN202110637643.7A 2021-06-08 A customer repurchase prediction method and device based on RF-LightGBM fusion model in non-contractual scenarios Active CN113469730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110637643.7A CN113469730B (en) 2021-06-08 A customer repurchase prediction method and device based on RF-LightGBM fusion model in non-contractual scenarios

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110637643.7A CN113469730B (en) 2021-06-08 A customer repurchase prediction method and device based on RF-LightGBM fusion model in non-contractual scenarios

Publications (2)

Publication Number Publication Date
CN113469730A (en) 2021-10-01
CN113469730B (en) 2025-02-25

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049155A (en) * 2021-11-17 2022-02-15 浙江华坤道威数据科技有限公司 Marketing operation method and system based on big data analysis
CN114511330A (en) * 2022-04-18 2022-05-17 山东省计算中心(国家超级计算济南中心) Improved CNN-RF-based Ethereum Ponzi scheme detection method and system
CN114549071A (en) * 2022-02-18 2022-05-27 上海钧正网络科技有限公司 Marketing strategy determination method and device, computer equipment and storage medium
CN114863341A (en) * 2022-05-17 2022-08-05 济南大学 Online course learning supervision method and system
CN114970700A (en) * 2022-05-12 2022-08-30 安徽华云安科技有限公司 Information Classification Method and Device Based on Region Buffering KNN Algorithm
CN115204537A (en) * 2022-09-17 2022-10-18 华北理工大学 Bagging-based student achievement prediction method
CN117114807A (en) * 2023-08-24 2023-11-24 众合九通(北京)电子科技有限公司 Commodity recommendation method and system based on user relationship
CN117593044A (en) * 2024-01-18 2024-02-23 青岛网信信息科技有限公司 Dual-angle marketing campaign effect prediction method, medium and system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016569A (en) * 2017-03-21 2017-08-04 聚好看科技股份有限公司 The targeted customer's account acquisition methods and device of a kind of networking products
CN107294993A (en) * 2017-07-05 2017-10-24 重庆邮电大学 A kind of WEB abnormal flow monitoring methods based on integrated study
WO2018069817A1 (en) * 2016-10-10 2018-04-19 Tata Consultancy Services Limited System and method for predicting repeat behavior of customers
CN108171530A (en) * 2017-12-06 2018-06-15 口碑(上海)信息技术有限公司 It is a kind of to be used for visitor's unit price and the again method for improving and device of purchase rate
CN108520469A (en) * 2018-06-19 2018-09-11 南京新贝金服科技有限公司 A kind of user based on electric business platform purchases behavior analysis method again
CN108776922A (en) * 2018-06-04 2018-11-09 北京至信普林科技有限公司 Finance product based on big data recommends method and device
CN110210913A (en) * 2019-06-14 2019-09-06 重庆邮电大学 A kind of businessman frequent customer's prediction technique based on big data
CN110322085A (en) * 2018-03-29 2019-10-11 北京九章云极科技有限公司 A kind of customer churn prediction method and apparatus
CN110599336A (en) * 2018-06-13 2019-12-20 北京九章云极科技有限公司 Financial product purchase prediction method and system
CN110956497A (en) * 2019-11-27 2020-04-03 桂林电子科技大学 A method for predicting repeated purchase behavior of e-commerce platform users
CN111008871A (en) * 2019-12-10 2020-04-14 重庆锐云科技有限公司 Real estate repurchase customer follow-up quantity calculation method, device and storage medium
CN111045716A (en) * 2019-11-04 2020-04-21 中山大学 A Relevant Patch Recommendation Method Based on Heterogeneous Data
CN111899055A (en) * 2020-07-29 2020-11-06 亿达信息技术有限公司 Machine learning and deep learning-based insurance client repurchase prediction method in big data financial scene

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018069817A1 (en) * 2016-10-10 2018-04-19 Tata Consultancy Services Limited System and method for predicting repeat behavior of customers
CN107016569A (en) * 2017-03-21 2017-08-04 聚好看科技股份有限公司 The targeted customer's account acquisition methods and device of a kind of networking products
CN107294993A (en) * 2017-07-05 2017-10-24 重庆邮电大学 A kind of WEB abnormal flow monitoring methods based on integrated study
CN108171530A (en) * 2017-12-06 2018-06-15 口碑(上海)信息技术有限公司 It is a kind of to be used for visitor's unit price and the again method for improving and device of purchase rate
CN110322085A (en) * 2018-03-29 2019-10-11 北京九章云极科技有限公司 A kind of customer churn prediction method and apparatus
CN108776922A (en) * 2018-06-04 2018-11-09 北京至信普林科技有限公司 Finance product based on big data recommends method and device
CN110599336A (en) * 2018-06-13 2019-12-20 北京九章云极科技有限公司 Financial product purchase prediction method and system
CN108520469A (en) * 2018-06-19 2018-09-11 南京新贝金服科技有限公司 A kind of user based on electric business platform purchases behavior analysis method again
CN110210913A (en) * 2019-06-14 2019-09-06 重庆邮电大学 A kind of businessman frequent customer's prediction technique based on big data
CN111045716A (en) * 2019-11-04 2020-04-21 中山大学 A Relevant Patch Recommendation Method Based on Heterogeneous Data
CN110956497A (en) * 2019-11-27 2020-04-03 桂林电子科技大学 A method for predicting repeated purchase behavior of e-commerce platform users
CN111008871A (en) * 2019-12-10 2020-04-14 重庆锐云科技有限公司 Real estate repurchase customer follow-up quantity calculation method, device and storage medium
CN111899055A (en) * 2020-07-29 2020-11-06 亿达信息技术有限公司 Machine learning and deep learning-based insurance client repurchase prediction method in big data financial scene

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
JAMES BERGSTRA: "Algorithms for hyper-parameter optimization", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 31 December 2011 (2011-12-31), pages 1 *
JAMES BERGSTRA: "Algorithms for hyper-parameter optimization", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, pages 1 *
JUN WU: "User Value Identification Based on Improved RFM Model and K-Means++ Algorithm for Complex Data Analysis", WIRELESS COMMUNICATIONS AND MOBILE COMPUTING *
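季晨雨: "Research on imbalanced data classification and its application in bank marketing", 山西电子技术, no. 05 *
张李义;李一然;文璇: "Research on predicting new consumers' repeat purchase intention", 数据分析与知识发现, no. 11 *
张浩;陈龙;魏志强: "Abnormal traffic detection technology based on data augmentation and model updating", 信息网络安全, no. 02, 10 February 2020 (2020-02-10), pages 66 *
张浩等: "Abnormal traffic detection technology based on data augmentation and model updating", 信息网络安全, pages 66 *
杨霞霞;苏锋;黄戌霞: "Research on an imbalanced data classification method based on an improved random forest algorithm", 网络安全技术与应用, no. 10 *
陶新民等: "SVM classification algorithms for imbalanced data and their applications", 31 October 2011, 黑龙江科学技术出版社, pages: 43 *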

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049155A (en) * 2021-11-17 2022-02-15 浙江华坤道威数据科技有限公司 Marketing operation method and system based on big data analysis
CN114049155B (en) * 2021-11-17 2022-08-19 浙江华坤道威数据科技有限公司 Marketing operation method and system based on big data analysis
CN114549071A (en) * 2022-02-18 2022-05-27 上海钧正网络科技有限公司 Marketing strategy determination method and device, computer equipment and storage medium
CN114511330A (en) * 2022-04-18 2022-05-17 山东省计算中心(国家超级计算济南中心) Improved CNN-RF-based Ethereum Ponzi scheme detection method and system
CN114970700A (en) * 2022-05-12 2022-08-30 安徽华云安科技有限公司 Information Classification Method and Device Based on Region Buffering KNN Algorithm
CN114863341A (en) * 2022-05-17 2022-08-05 济南大学 Online course learning supervision method and system
CN114863341B (en) * 2022-05-17 2024-05-31 济南大学 A method and system for supervising online course learning
CN115204537A (en) * 2022-09-17 2022-10-18 华北理工大学 Bagging-based student achievement prediction method
CN117114807A (en) * 2023-08-24 2023-11-24 众合九通(北京)电子科技有限公司 Commodity recommendation method and system based on user relationship
CN117593044A (en) * 2024-01-18 2024-02-23 青岛网信信息科技有限公司 Dual-angle marketing campaign effect prediction method, medium and system
CN117593044B (en) * 2024-01-18 2024-05-31 青岛网信信息科技有限公司 Dual-angle marketing campaign effect prediction method, medium and system

Similar Documents

Publication Publication Date Title
CN108648074B (en) Loan assessment method, device and equipment based on support vector machine
CN110163647B (en) Data processing method and device
CN112418653A (en) Number portability and network diver identification system and method based on machine learning algorithm
CN106445988A (en) Intelligent big data processing method and system
CN107403345A (en) Best-selling product Forecasting Methodology and system, storage medium and electric terminal
CN109636482B (en) Data processing method and system based on similarity model
CN108921602B (en) User purchasing behavior prediction method based on integrated neural network
CN114612251A (en) Risk assessment method, device, equipment and storage medium
CN112417294A (en) Intelligent business recommendation method based on neural network mining model
CN118037401A (en) Agricultural products e-commerce recommendation system based on knowledge graph
CN112801693A (en) Advertisement characteristic analysis method and system based on high-value user
CN117314593A (en) Insurance item pushing method and system based on user behavior analysis
CN112288554A (en) Commodity recommendation method and device, storage medium and electronic device
Chitra et al. Customer retention in banking sector using predictive data mining technique
CN116456323A (en) A user package recommendation method and system based on user preference decoupling
CN111506813A (en) Remote sensing information accurate recommendation method based on user portrait
CN118569936B (en) Advertisement user analysis method and system
CN118735661A (en) A method and system for optimizing product information display based on real-time user interaction
CN118780899A (en) An e-commerce intelligent customer service product recommendation method based on customer behavior
CN118261637A (en) Agricultural product supply forecasting system and method based on online trading platform
CN115081501A (en) User classification method and device, cascaded user classification model and equipment
CN113763032B (en) Commodity purchase intention recognition method and device
CN116703533A (en) Business management data optimized storage analysis method
CN112150179A (en) Information pushing method and device
CN113469730A (en) Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant