CN113469730A

CN113469730A - Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene

Info

Publication number: CN113469730A
Application number: CN202110637643.7A
Authority: CN
Inventors: 吴军; 杨李平; 牛夏夏; 石力; 李圆圆; 孙李傲; 宋鑫玉; 郝伟怡; 宋思聪
Original assignee: Beijing University of Chemical Technology
Current assignee: Beijing University of Chemical Technology
Priority date: 2021-06-08
Filing date: 2021-06-08
Publication date: 2021-10-01
Anticipated expiration: 2041-06-08

Abstract

The invention relates to a customer repurchase prediction method and device based on an RF-LightGBM fusion model in a non-contract scenario. The method comprises the following steps: acquiring historical data of a user, and performing preprocessing and feature engineering on the historical data; taking the preprocessed data as samples, and balancing the sample set using the SMOTE-ENN method; performing hyper-parameter optimization on the random forest algorithm and the LightGBM algorithm with the TPE optimization algorithm to construct weak classifiers; and performing ensemble learning on the training samples through the weak classifiers to obtain a strong classifier, and obtaining the final repurchase prediction result. The method analyzes the consumption data of customers who have already purchased from the enterprise, accurately predicts the repurchase behavior of existing customers, and uses the prediction to guide customer relationship management decisions and precision marketing strategies, thereby improving the marketing conversion rate and reducing enterprise operating costs.

Description

Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene

Technical Field

The invention relates to the technical field of computers, in particular to a customer repurchase prediction method and device based on an RF-LightGBM fusion model under a non-contract scene.

Background

With the advent of the big data age, predicting the future purchasing intentions of consumers from massive historical transaction data has become an important issue in enterprise management. Predicting repeat purchasing behavior in a non-contract scenario mainly means predicting whether a customer will purchase the enterprise's products again when the enterprise and the customer have not signed a purchase contract. If consumers with an intention to repurchase can be accurately identified, customer demands can be matched more precisely through precision marketing, the value of new consumers is increased, and new consumers are converted into loyal customers.

In the prior art, a Chinese invention patent (publication No. CN109146533B) discloses an information push method and apparatus. It obtains at least two pieces of order information of a user for items of the same item type, determines the user's average daily consumption of that item type over the interval based on the purchase amounts in the at least two pieces of order information, and determines, based on the average daily consumption and the purchase amount of the latest order, a push date for pushing item information associated with items of that type to the user's terminal, thereby improving the effectiveness of information push. Another Chinese invention patent (publication No. CN108171530B) discloses a method and device for increasing the customer unit price and the repurchase rate, comprising: selecting historical marketing data of a target store to obtain the historical marketing campaign effect, and obtaining an initial estimate of the marketing campaign effect of the target store from the historical marketing data and the historical marketing campaign effect; and constructing threshold adjustment factors according to the ratio of the historical marketing activities of all stores meeting the threshold order number and meeting the customer order number, calibrating the estimated marketing campaign effect of the target store with the threshold adjustment factors, and obtaining the estimated value of the marketing campaign effect of the target store, thereby solving the problem that existing promotion evaluation techniques cannot estimate the marketing campaign effect accurately as the threshold changes. Although this prior art realizes product recommendation and effect prediction from historical data, it cannot accurately predict customer behavior.

Machine learning methods are widely applied in the field of customer behavior prediction, but most existing work focuses on prediction in e-commerce scenarios. In the prior art, a Chinese invention application (publication No. CN110956497A) discloses a method for predicting the repeat purchasing behavior of e-commerce platform users, comprising: obtaining historical purchasing behavior data of a user, fusing a deep CatBoost individual model, a double-layer attention BiGRU individual model and a DeepGBM individual model, and modeling the discrete purchase record values and the behavior sequence characteristics in the user's historical purchasing data, thereby improving the accuracy of the prediction result. Another Chinese invention application (publication No. CN108520469A) discloses a user repurchase behavior analysis method based on an e-commerce platform, which selects the valid purchase records of users within a statistical period; cleans the data; marks each valid purchase record with a label indicating whether it is a repeat purchase, whether it is a repeat purchase on the same platform, and whether it is a repeat purchase of the same insurance type; counts the total number of purchasing users, the number of repeat purchasing users, the total number of purchasing users and of repeat purchasing users for each platform, and the total number of purchasing users and of repeat purchasing users for each insurance type; and calculates the repeat purchase rate, the platform repeat purchase rate and the insurance-type repeat purchase rate within the statistical period. However, in an e-commerce scenario, "implicit" feedback behaviors of the customer, such as adding items to favorites and liking them, can be recorded, and such data is not available in the broader non-contract scenario.
In addition, current machine learning approaches focus mainly on integrating algorithms and ignore the influence of the data set on the prediction result. In a typical purchasing situation, users who purchase repeatedly are fewer than users who purchase once, so the data classes are imbalanced, which often causes the model to overfit and lowers prediction accuracy.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a client repurchase prediction method and a client repurchase prediction device based on an RF-LightGBM fusion model under a non-contract scene, and the invention adopts the following technical scheme:

a customer repurchase prediction method based on an RF-LightGBM fusion model under a non-contract scene comprises the following steps:

acquiring historical purchase record data of a user, preprocessing the historical purchase record data and extracting features;

balancing the data subjected to the feature extraction by using a sample balancing method to obtain a balanced sample;

training on the sample data by using an optimization algorithm, and performing iterative optimization on the weak classifier in a specified weak classifier hyper-parameter space;

performing ensemble learning to obtain a strong classifier by giving the same weight to each weak classifier;

predicting by using a strong classifier to obtain final results of product recommendation and repurchase behavior prediction;

and pushing product information to the terminal equipment of the user and/or sending a re-purchasing behavior prediction result to a management system according to the final result.

Further, the extracting features includes:

time of last purchase, frequency of purchases, total amount of purchases, duration of relationship, purchase interval.
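As a rough illustration of these five features, the following Python sketch computes them for a single customer from a list of orders, following the definitions of R, F, M and S given in the detailed description. The function name `rfmst_features`, the order layout and the sample data are hypothetical. The formula for T is not reproduced in the text, so T is assumed here to be S divided by the number of intervals (F - 1).

```python
from datetime import date


def rfmst_features(orders, period_end):
    """Compute R, F, M, S, T for one customer.

    orders: list of (order_date, amount) pairs inside the reference period
    (hypothetical layout). period_end: end date of the reference period.
    """
    dates = sorted(order_date for order_date, _ in orders)
    first, last = dates[0], dates[-1]
    r = (period_end - last).days        # R = T_last_time - T_plast_time
    f = len(orders)                     # F: number of purchases
    m = sum(amount for _, amount in orders)  # M: total purchase amount
    s = (last - first).days             # S = T_plast_time - T_pfirst_time
    # ASSUMPTION: the source omits the formula for T; the mean gap between
    # consecutive orders, S / (F - 1), is used here as a stand-in.
    t = s / (f - 1) if f > 1 else 0.0
    return {"R": r, "F": f, "M": m, "S": s, "T": t}


orders = [
    (date(2021, 1, 10), 120.0),
    (date(2021, 2, 9), 80.0),
    (date(2021, 3, 11), 100.0),
]
features = rfmst_features(orders, period_end=date(2021, 3, 31))
print(features)
```

All durations are expressed in days in this sketch; the source does not state the time unit.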

Further, the sample equalization method comprises:

generating minority-class samples from the extracted features by using the SMOTE oversampling method, classifying the generated samples by using the ENN (Edited KNN) method, and removing a sample if its predicted label differs from its actual class label, to obtain balanced samples.

Further, the optimization algorithm comprises:

optimizing the model hyper-parameters by using the TPE (Tree-structured Parzen Estimator) optimization algorithm, and training the model with the optimal hyper-parameters.

Further, the weak classifiers comprise a random forest (RF) model and a LightGBM model, the outputs of the weak classifiers are classification probability values, and the mathematical expression is as follows:

$$P(x \mid y) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} h_i(x \mid y)$$

In the formula, $N_{tree}$ is the total number of decision trees, $h_i$ is the i-th decision tree, and $P(x \mid y)$ represents the probability that the predicted sample x belongs to the class y.

Further, the ensemble learning specifically includes:

the RF model and the LightGBM model are given the same weight and are integrated by the soft voting method on the basis of their prediction probabilities, and the mathematical expression is as follows:

$$P_{Soft\,Voting} = \frac{P_{RF} + P_{LightGBM}}{2}$$

$$Result = \begin{cases} 1, & P_{Soft\,Voting} > threshold \\ 0, & P_{Soft\,Voting} < threshold \end{cases}$$

where $P_{Soft\,Voting}$ is the prediction probability of the soft voting fusion model, $P_{RF}$ and $P_{LightGBM}$ are the prediction probabilities of the random forest and the LightGBM model respectively, $Result$ is the prediction result of the fusion model, 1 indicates that the user belongs to the repurchase type, 0 indicates that the user belongs to the non-repurchase type, and $threshold$ is the classification threshold.

Further, the repurchase behavior prediction and the repurchase probability prediction are used as guidance for product recommendation.

The invention also provides a client repurchase prediction device based on the RF-LightGBM fusion model under a non-contract scene, which comprises the following components:

the acquisition module is used for acquiring historical purchase record data of a user, preprocessing the historical purchase record data and extracting features;

the balance module is used for balancing the data subjected to the feature extraction by using a sample balance method to obtain a balanced sample;

the optimization training module is used for training sample data by using an optimization algorithm and performing iterative optimization on the weak classifier in the specified hyper-parameter space of the weak classifier;

the ensemble learning module is used for performing ensemble learning to obtain a strong classifier by endowing the weak classifiers with the same weight;

the prediction module is used for predicting by using the strong classifier to obtain final results of product recommendation and repurchase behavior prediction;

and the pushing module is used for pushing the product information to the terminal equipment of the user according to the final result.

The invention also includes an electronic device comprising:

a processor, and a memory;

the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to perform the customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario as described above.

The present invention also includes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario as described above.

The invention achieves the following beneficial effects: by analyzing the enterprise's existing records of user purchasing behavior, it accurately predicts whether existing users will repurchase and guides the customer relationship management strategy and the marketing strategy accordingly, so that the marketing conversion rate is improved and the related operating costs are reduced; based on customers' purchasing behavior data, it accurately predicts their repurchase behavior for commodities, meets their actual effective demands, and at the same time reduces the enterprise's communication costs; and the enterprise's operating strategy is guided dynamically by data, which drives decision making and helps achieve product marketing goals, ultimately recommending suitable products to suitable users in an intelligent way.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby. It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

As shown in fig. 1, the embodiment discloses a customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario, which includes the following steps:

(1) Acquiring historical purchase record data of the user, preprocessing the historical purchase record data and extracting features. The historical purchase record data is data that already exists. The extracted features include: time of last purchase (R), frequency of purchases (F), total amount of purchases (M), duration of relationship (S), purchase interval (T).

(2) Carrying out sample equalization on the data subjected to the feature extraction by using the SMOTE-ENN method to obtain a model training set. Test samples are formed by repeatedly sampling with replacement from each class of samples in the original sample set.

(3) Training on the training sample data by using the TPE optimization algorithm, and performing iterative optimization on the weak classifier in the specified hyper-parameter space of the weak classifier.

(4) Assigning the same weight to each weak classifier, performing ensemble learning to obtain a strong classifier, and obtaining the final result for product recommendation and repeat purchasing behavior prediction.

In this embodiment, the preprocessing step includes: to facilitate computer processing and user labeling, character-type data is converted to numerical data, and numerical data representing dates is converted to date-type data. The extracted features include the time of last purchase (R), the frequency of purchases (F), the total amount of purchases (M), the relationship duration (S) and the purchase interval (T):

a) R: the time since the customer last purchased the product, in the following form:

$$R = T_{last\_time} - T_{plast\_time}$$

where $T_{last\_time}$ denotes the end time of the reference time period, and $T_{plast\_time}$ denotes the time of the customer's last order transaction for the item within the reference time period.

b) F: the number of purchases made by the customer over the observation period.

c) M: the total amount the customer has spent on the product, in the following form:

$$M = \sum_{i=1}^{n} M_i$$

where n represents the total number of purchases made by the customer over the reference time period and $M_i$ represents the amount of the i-th single purchase by the customer.

d) S: the time interval from the customer's first transaction to the customer's last transaction within the reference time period, in the following form:

$$S = T_{plast\_time} - T_{pfirst\_time}$$

where $T_{plast\_time}$ denotes the time of the customer's last order transaction for the item within the reference time period, and $T_{pfirst\_time}$ denotes the time of the customer's first order transaction for the item within the reference time period.

e) T: the customer's average transaction time interval over the reference time period, in the following form:

The invention processes the imbalanced samples with the SMOTE-ENN method, which works well on binary classification problems that have only a small number of positive samples and performed better than the other methods it was compared with. The SMOTE-ENN method comprises the following steps:

(1) SMOTE method (Synthetic Minority Oversampling Technique):

Let A denote the minority class sample set. Arbitrarily take $X_i \in A$ and, using the Euclidean distance as the measure, calculate the distance from this sample to all other samples in A to obtain the K nearest neighbors of $X_i$. Randomly select samples from these nearest neighbors, denoted $X_{ij}$ (j = 1, 2, ..., n). Random linear interpolation between $X_i$ and $X_{ij}$ (j = 1, 2, ..., n) then constructs new minority class samples $Y_j$:

$$Y_j = X_i + rand(0,1) \times (X_{ij} - X_i)$$

In the formula, $rand(0,1)$ represents a random number in the interval (0, 1).
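The interpolation step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `k_nearest` and `smote`, the sample data and the fixed random seed are all hypothetical, and `random.random()` draws from [0, 1) rather than the open interval (0, 1).

```python
import math
import random


def k_nearest(sample, pool, k):
    """Return the k samples in `pool` closest to `sample` (Euclidean distance)."""
    others = [p for p in pool if p is not sample]
    return sorted(others, key=lambda p: math.dist(sample, p))[:k]


def smote(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic minority samples by random linear interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x_i = rng.choice(minority)                       # arbitrary X_i in A
        x_ij = rng.choice(k_nearest(x_i, minority, k))   # one of its K neighbours
        gap = rng.random()                               # plays the role of rand(0,1)
        # Y_j = X_i + rand(0,1) * (X_ij - X_i), applied per feature
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x_i, x_ij)))
    return synthetic


minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Because every synthetic point lies on the segment between two existing minority samples, the new samples stay inside the region already occupied by the minority class.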

(2) ENN method (Edited KNN):

Each sample in the data set ND generated by the SMOTE method is predicted using its K nearest neighbors (K = 5), and the sample is removed if the predicted label differs from its actual class label. The Euclidean distance is selected as the distance measure of the KNN algorithm, in the following form:

$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$

In the formula, x and y represent two different users, and i represents the feature index.
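The editing rule above can be sketched as follows. The function name `enn_filter` and the toy data are hypothetical, and a simple majority vote among the K neighbours is assumed as the KNN prediction; the example uses K = 3 only because the toy data set is small.

```python
import math
from collections import Counter


def enn_filter(samples, labels, k=5):
    """Return indices of samples whose KNN-predicted label matches their own label."""
    kept = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Indices of the k nearest other samples by Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(x, samples[j]),
        )[:k]
        # Majority vote of the neighbours' labels (assumed prediction rule).
        predicted = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        if predicted == y:
            kept.append(i)
    return kept


samples = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),   # class 0 cluster
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),   # class 1 cluster
    (0.5, 0.5),                                       # class 1 point inside class 0
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
kept = enn_filter(samples, labels, k=3)
print(kept)
```

In this toy data the last sample carries label 1 but sits in the middle of the class 0 cluster, so its neighbours vote for class 0 and it is removed.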

A hyper-parameter configuration space is specified for each weak classifier, and the TPE (Tree-structured Parzen Estimator) optimization algorithm iteratively searches this space on the sample set constructed by the SMOTE-ENN method. The optimization formula is as follows:

$$x^* = \arg\min_{x \in \chi} F(x)$$

where $F(x)$ represents the objective function of the weak learner, and $x^*$ is the parameter configuration at which the best result is obtained.

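The density used by the TPE algorithm is defined as:

$$p(x \mid y) = \begin{cases} l(x), & y < y^* \\ g(x), & y \ge y^* \end{cases}$$

where $l(x)$ is the density formed from the observations $\{x^{(i)}\}$ whose objective value $F(x)$ is less than $y^*$, and $g(x)$ is the density formed from the observations $\{x^{(i)}\}$ whose objective value $F(x)$ is greater than or equal to $y^*$. $y^*$ is taken as the quantile $\gamma$ of the observed values y. The Expected Improvement (EI) is:

$$EI_{y^*}(x) = \int_{-\infty}^{y^*} (y^* - y)\, p(y \mid x)\, dy \propto \left( \gamma + \frac{g(x)}{l(x)} (1 - \gamma) \right)^{-1}$$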
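To make the roles of $l(x)$, $g(x)$ and the quantile $\gamma$ concrete, here is a heavily simplified one-dimensional sketch of the TPE idea. It is not a full TPE implementation and not the embodiment's code. Everything named here is an assumption: the function names, the fixed-bandwidth Gaussian kernels, the default values and the toy objective.

```python
import math
import random


def parzen_density(points, bandwidth):
    """Return a 1-D Gaussian kernel (Parzen window) density built on `points`."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def tpe_suggest(history, low, high, rng, gamma=0.25, n_candidates=24, bandwidth=0.5):
    """Propose the next x from past (x, y) pairs by maximizing l(x) / g(x)."""
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = min(len(ranked) - 1, max(1, math.ceil(gamma * len(ranked))))
    good = [x for x, _ in ranked[:n_good]]   # lowest-y fraction gamma, models l(x)
    bad = [x for x, _ in ranked[n_good:]]    # remaining observations, models g(x)
    l, g = parzen_density(good, bandwidth), parzen_density(bad, bandwidth)
    # Draw candidates around good points and clip them to the search range.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    # A large l(x) / g(x) corresponds to a large expected improvement; the small
    # constant only guards against division by zero.
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))


def tpe_minimize(objective, low, high, n_init=5, n_iter=25, seed=0):
    rng = random.Random(seed)
    history = [(x, objective(x)) for x in (rng.uniform(low, high) for _ in range(n_init))]
    for _ in range(n_iter):
        x = tpe_suggest(history, low, high, rng)
        history.append((x, objective(x)))
    return min(history, key=lambda pair: pair[1]), history


best, history = tpe_minimize(lambda x: (x - 2.0) ** 2, low=-5.0, high=5.0)
best_x, best_y = best
print(best_x, best_y)
```

In practice the objective would be a validation loss of the random forest or LightGBM model as a function of its hyper-parameters, and a dedicated library would normally handle the search over a multi-dimensional, tree-structured space.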

The output of the random forest model is the average of the probabilities given by all decision trees, and the mathematical expression is as follows:

$$P(x \mid y) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} h_i(x \mid y)$$

where $N_{tree}$ is the total number of decision trees, $h_i$ is the i-th decision tree, and $P(x \mid y)$ represents the probability that the predicted sample x belongs to the class y.
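The averaging of per-tree probabilities can be illustrated with a few lines of Python. The function name `forest_proba` and the per-tree outputs are made up for illustration; each inner list stands for one tree's class probabilities for a single sample.

```python
def forest_proba(tree_probas):
    """Average per-tree class probabilities into one forest-level distribution."""
    n_tree = len(tree_probas)
    n_class = len(tree_probas[0])
    return [sum(p[c] for p in tree_probas) / n_tree for c in range(n_class)]


# Hypothetical outputs of four trees for one sample: [P(class 0), P(class 1)].
tree_outputs = [[0.25, 0.75], [0.5, 0.5], [0.0, 1.0], [0.25, 0.75]]
p = forest_proba(tree_outputs)
print(p)
```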

The LightGBM model likewise outputs classification probabilities. The two models are given the same weight and fused by soft voting:

$$P_{Soft\,Voting} = \frac{P_{RF} + P_{LightGBM}}{2}$$

$$Result = \begin{cases} 1, & P_{Soft\,Voting} > threshold \\ 0, & P_{Soft\,Voting} < threshold \end{cases}$$

where $P_{Soft\,Voting}$ is the prediction probability of the soft voting fusion model, $P_{RF}$ and $P_{LightGBM}$ are the prediction probabilities of the random forest and the LightGBM model respectively, and $Result$ is the prediction result of the fusion model: 1 indicates a user of the repurchase type and 0 indicates a user of the non-repurchase type. Based on testing, the classification threshold of the invention is set to 0.5: the predicted label is 1 when the fused probability is larger than 0.5 and 0 when it is smaller than 0.5, which yields the prediction matrix.

Therefore, the forecasting of the repeated purchasing behavior of the customer can be realized.
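The equal-weight soft voting and thresholding described above can be sketched as follows. The function name `soft_vote` and the probability values are hypothetical. The text does not say which label applies when the fused probability equals the threshold exactly; this sketch assigns 0 in that case, which is an assumption.

```python
def soft_vote(p_rf, p_lgbm, threshold=0.5):
    """Average two models' repurchase probabilities and threshold the result."""
    p_soft = [(a + b) / 2 for a, b in zip(p_rf, p_lgbm)]
    # ASSUMPTION: a probability exactly equal to the threshold is labelled 0.
    result = [1 if p > threshold else 0 for p in p_soft]
    return p_soft, result


# Hypothetical repurchase probabilities for three customers.
p_rf = [0.75, 0.25, 0.5]
p_lgbm = [0.5, 0.25, 0.75]
p_soft, result = soft_vote(p_rf, p_lgbm)
print(p_soft, result)
```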

The performance of the invention is evaluated as follows. The accuracy rate P, the recall rate R and the F1 value are used as evaluation indexes. After the data preprocessing method of the invention is applied, the evaluation indexes are calculated from the resulting label matrix using the following formulas:

The invention performs well in the multi-channel marketing of enterprises in a non-contract scenario. Taking telemarketing for supermarkets and retail stores as an example, after the system is applied, the telemarketing conversion rate can be greatly improved and more transactions are generated. For enterprises, this improves marketing guidance, raises the sales success rate, increases the number of closed orders and the transaction amount, and reduces personnel costs. The performance on the data set is as follows: (1) on the training set generated by SMOTE-ENN, the model prediction accuracy is 98.73%, the recall rate is 99.09%, and the F1 value is 0.9874; (2) on the verification set consisting of real samples, the model prediction accuracy is 87.13%, the recall rate is 95.15%, and the F1 value is 0.8587; (3) these results are better than the prediction performance of the RF and LightGBM single models.
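For readers who want to compute such indexes themselves, the sketch below derives accuracy, precision, recall and F1 from true and predicted labels. The evaluation formulas are not reproduced in the text, so the standard definitions are assumed here, and it is left open whether the "accuracy rate P" in the text denotes accuracy or precision. The function name `evaluate` and the label vectors are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = repurchase)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = evaluate(y_true, y_pred)
print(scores)
```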

By improving the classic RFM model, the invention extracts user behavior features from the explicit feedback contained in customers' historical purchase records to form a sample set, which addresses the problem that large amounts of implicit feedback are not available in a non-contract scenario; through the SMOTE-ENN sample balancing method, the invention effectively addresses the data class imbalance of the data set; and the results of the embodiment show that the method has good prediction performance and practical application value.

Claims

1. A customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario, characterized in that it comprises the following steps:

Obtain the user's historical purchase record data, preprocess it, and extract features;

Use the sample balance method to balance the data after feature extraction to obtain balanced samples;

Use the optimization algorithm to train the training sample data, and iteratively optimize the weak classifier in the specified weak classifier hyperparameter space;

A strong classifier is obtained by ensemble learning by assigning the same weight to each weak classifier;

Use strong classifiers to make predictions, and get the final results of product recommendation and repurchase behavior prediction;

According to the final result, the product information is pushed to the user's terminal device and/or the repurchase behavior prediction result is sent to the management system.

2. The method for predicting customer repurchase based on the RF-LightGBM fusion model in a non-contract scenario according to claim 1, wherein the extraction feature comprises:

Recent purchase time, purchase frequency, total purchase amount, relationship duration, purchase interval.

3. The method for predicting customer repurchase based on the RF-LightGBM fusion model in a non-contract scenario according to claim 1, wherein the sample equalization method comprises:

First, the SMOTE oversampling method is used to generate the minority class samples of the extracted features, and then the ENN (EditedKNN) method is used to judge the generated samples. If the predicted result is different from the actual class label, the sample is eliminated to obtain a balanced sample.

4. The method for predicting customer repurchase based on the RF-LightGBM fusion model in a non-contract scenario according to claim 1, wherein the optimization algorithm comprises:

The TPE (Tree-structured Parzen Estimator) optimization algorithm is used to optimize the model hyperparameters, and the model is trained with the optimal hyperparameters.

5. The customer repurchase prediction method based on the RF-LightGBM fusion model in a non-contract scenario according to claim 1, characterized in that the weak classifiers comprise a random forest (RF) model and a LightGBM model, the outputs of the weak classifiers are all classification probability values, and the mathematical expression is:

$$P(x \mid y) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} h_i(x \mid y)$$

In the formula, $N_{tree}$ is the total number of decision trees, $h_i$ is the i-th decision tree, and $P(x \mid y)$ represents the probability that the predicted sample x belongs to the category y.

6. The method for predicting customer repurchase based on the RF-LightGBM fusion model in a non-contract scenario according to claim 1, wherein the integrated learning specifically comprises:

The RF model and the LightGBM model are given the same weight, and the soft voting method is used to integrate them on the basis of their predicted probabilities. The mathematical expression is as follows:

$$P_{Soft\,Voting} = \frac{P_{RF} + P_{LightGBM}}{2}$$

$$Result = \begin{cases} 1, & P_{Soft\,Voting} > threshold \\ 0, & P_{Soft\,Voting} < threshold \end{cases}$$

where $P_{Soft\,Voting}$ is the prediction probability of the soft voting fusion model, $P_{RF}$ and $P_{LightGBM}$ are the prediction probabilities of the random forest and the LightGBM model respectively, $Result$ is the prediction result of the fusion model, 1 indicates a repurchasing user, 0 indicates a non-repurchasing user, and $threshold$ is the classification threshold.

7. The method for predicting customer repurchase based on the RF-LightGBM fusion model in a non-contract scenario according to claim 1, wherein the repurchase behavior prediction and the repurchase probability prediction are used as product recommendation guidance.

8. A customer repurchase prediction device based on the RF-LightGBM fusion model in a non-contract scenario, characterized by comprising:

The acquisition module obtains the user's historical purchase record data, preprocesses it, and extracts features;

The balance module uses the sample balance method to balance the data after feature extraction to obtain balanced samples;

The optimization training module uses the optimization algorithm to train on the training sample data, and iteratively optimizes the weak classifier in the specified weak classifier hyperparameter space;

The ensemble learning module, by assigning the same weight to each weak classifier, performs ensemble learning to obtain a strong classifier;

The prediction module uses a strong classifier to make predictions, and obtains the final results of product recommendation and repurchase behavior prediction;

The push module pushes the product information to the user's terminal device according to the final result.

9. An electronic device, characterized in that:

comprising a processor and a memory;

The memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to perform the customer repurchase prediction method based on the RF-LightGBM fusion model in a non-contract scenario according to any one of claims 1-7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that: when the program is executed by a processor, the customer repurchase prediction method based on the RF-LightGBM fusion model in a non-contract scenario according to any one of claims 1-7 is implemented.