CN112990989A - Value prediction model input data generation method, device, equipment and medium - Google Patents
- Publication number
- CN112990989A (application number CN202110531498.4A)
- Authority
- CN
- China
- Prior art keywords
- target
- value
- user
- similar
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Game Theory and Decision Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application relates to the technical field of big data, and in particular to a value prediction model input data generation method, device, equipment and medium. The method comprises the following steps: acquiring target historical service data corresponding to a target user; extracting, from the target historical service data, target feature values corresponding to a plurality of preset target features; when there is a target feature whose target feature value could not be extracted, taking that target feature as a missing feature, and acquiring similar users corresponding to the target user together with the similarity between the target user and each similar user; extracting similar feature values corresponding to the missing feature from the similar service data of the similar users; calculating a missing feature value for the missing feature according to the similarity and the similar feature values of the similar users; and obtaining value prediction model input data from the calculated missing feature value and the extracted target feature values. With this method, the accuracy of the generated model input data can be improved.
Description
Technical Field
The application relates to the technical field of big data, in particular to a value prediction model input data generation method, device, equipment and medium.
Background
With the development of computer technology, traditional offline services are gradually moving online, so the amount of online data keeps growing. For companies, analyzing and processing this huge volume of online data to obtain valid data is becoming increasingly important.
For example, by analyzing online data, a company can predict the value level of a user in a future time period from the value level of that user in a historical time period, and can then carry out business activities matched to each user level in the future time period, which improves the efficiency of business execution.
In the traditional method, the future value level of a user is predicted directly from the historical data corresponding to that user; when the user's historical data is incomplete, the value level predicted from it is inaccurate.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a value prediction model input data generation method, device, apparatus, and medium capable of improving accuracy of model input data acquisition.
A value prediction model input data generation method comprises the following steps:
acquiring target historical service data corresponding to a target user;
extracting preset target characteristic values corresponding to a plurality of target characteristics from target historical service data;
when there is a target feature whose target feature value was not extracted, taking that target feature as a missing feature, and acquiring similar users corresponding to the target user and the similarity between the target user and each similar user;
extracting similar characteristic values corresponding to the missing characteristics from similar service data corresponding to similar users;
calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the similarity and the similar characteristic value of the similar user;
and obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
In one embodiment, a target feature whose target feature value is extracted from the target historical service data is taken as a normal feature; acquiring the similar users corresponding to the target user and the similarity between the target user and the similar users comprises:
extracting preset similar characteristic values corresponding to all target characteristics from similar service data corresponding to similar users;
determining a similar characteristic mean value according to each similar characteristic value;
determining a normal characteristic mean value according to normal characteristic values corresponding to all normal characteristics in the target historical service data;
determining a similarity difference value of each target feature according to the similar characteristic mean value and the similar characteristic value corresponding to each target feature;
determining a target difference value of each target characteristic according to the target characteristic mean value and a target characteristic value corresponding to each target characteristic;
and determining the similarity between the target user and the similar user according to the similarity difference corresponding to each target feature and the target difference.
In one embodiment, calculating a missing feature value corresponding to the missing feature according to the similarity of similar users and the similar feature value includes:
determining a similar adjustment value corresponding to each similar user according to the similar characteristic mean value, the similar characteristic value and the similarity respectively corresponding to each similar user;
and calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the normal characteristic mean value and the similar adjustment value respectively corresponding to each similar user.
In one embodiment, the method further comprises:
preprocessing target historical service data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
A method of user value prediction, the method comprising:
obtaining model input data according to the value prediction model input data generation method of any of the above embodiments;
inputting the model input data into a pre-constructed value prediction model, and obtaining the predicted value of the target user in a future time period according to the value prediction model; the value prediction model is constructed according to the training characteristics and the training values, and the training characteristics and the training values are obtained from historical business data corresponding to more than one user respectively.
In one embodiment, a method of constructing a value prediction model includes:
acquiring historical service data corresponding to more than one user respectively;
respectively extracting training characteristics from each historical service data;
extracting value calculation features from the training features;
acquiring the feature weight of each value calculation feature;
determining the training value of each user according to the characteristic weight and the value calculation characteristic;
and training the prediction model according to the training characteristics and the training value, and stopping the training of the prediction model when the training ending condition is met to obtain the value prediction model.
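The training-value step can be illustrated with a minimal Python sketch. It assumes the training value is a weighted sum of the value calculation features; the text does not state the exact combination rule, and the feature names and weights below are hypothetical.

```python
def training_value(value_features, feature_weights):
    """Combine value calculation features into one training value.

    Assumption: a plain weighted sum. The source only says the value is
    determined "according to" the feature weights and the features.
    """
    return sum(feature_weights[name] * value
               for name, value in value_features.items())


# Hypothetical, already-normalized value calculation features for one user.
user_features = {"premium": 0.5, "purchase_frequency": 0.2}
weights = {"premium": 0.6, "purchase_frequency": 0.4}
label = training_value(user_features, weights)
```

The resulting `label` would then serve as the training target paired with that user's training features.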
A user request processing method comprises the following steps:
receiving a user request, wherein the user request carries user data;
processing the user data by the user value prediction method of any of the above embodiments to obtain the user value;
acquiring a service strategy corresponding to the user value;
and processing the user request according to the service strategy.
A value prediction model input data generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring target historical service data corresponding to a target user;
the extraction module is used for extracting preset target characteristic values corresponding to a plurality of target characteristics from the target historical service data;
the similarity acquisition module is used for taking the target features of which the target feature values are not extracted as missing features when the target features of which the target feature values are not extracted exist, and acquiring similar users corresponding to the target users and the similarity between the target users and the similar users;
the similar data extraction module is used for extracting similar characteristic values corresponding to the missing characteristics from similar service data corresponding to similar users;
the calculation module is used for calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the similarity of the similar users and the similar characteristic value;
and the generating module is used for obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method when the processor executes the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The value prediction model input data generation method acquires target historical service data corresponding to a target user and extracts, from that data, target feature values corresponding to a plurality of preset target features. After extraction, the method checks whether any target feature value could not be extracted (that is, whether the target feature value of any target feature is a missing value), so that the accuracy of the target historical service data is assessed before the user value is predicted. When there is a target feature whose target feature value was not extracted (that is, its target feature value is a missing value), that target feature is taken as a missing feature. Similar users corresponding to the target user, and the similar service data corresponding to each similar user, are then acquired, and the missing feature values in the target historical service data are filled in according to the similar service data and the similarity between each similar user and the target user. The target historical service data is thereby completed and its accuracy improved.
Drawings
FIG. 1 is a diagram of an application scenario of a method for generating value prediction model input data in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a method for generating value prediction model input data in one embodiment;
FIG. 3 is a flow diagram illustrating a method for user value prediction in one embodiment;
FIG. 4 is a schematic flow chart illustrating a method for constructing a value prediction model according to an embodiment;
FIG. 5 is an overall schematic diagram of model training and prediction in one embodiment;
FIG. 6 is a block diagram of an embodiment of a value prediction model input data generation apparatus;
FIG. 7 is a block diagram showing the structure of a user value prediction apparatus according to an embodiment;
FIG. 8 is a block diagram of a user request processing device in one embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The value prediction model input data generation method provided by the application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The server 104 acquires target historical service data corresponding to a target user from the terminal 102; extracts, from the target historical service data, target feature values corresponding to a plurality of preset target features; when there is a target feature whose target feature value was not extracted, takes that target feature as a missing feature and acquires similar users corresponding to the target user and the similarity between the target user and each similar user; extracts similar feature values corresponding to the missing feature from the similar service data of the similar users; calculates a missing feature value for the missing feature according to the similarity and the similar feature values of the similar users; and obtains value prediction model input data from the calculated missing feature value and the extracted target feature values. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.
In one embodiment, as shown in FIG. 2, a value prediction model input data generation method is provided. The method is described here as applied to the server 104 in FIG. 1; in other embodiments it can also be applied to the terminal 102. The method includes the following steps:
The server first acquires target historical service data corresponding to a target user. The target user is a user whose value is to be predicted. The target historical service data may be data corresponding to the target user over a historical period of time. For example, it may include attribute information of the target user, historical transaction data of the target user, historical behavior information of the target user, and the like. The attribute information may be the name, gender and geographic location of the target user. The historical transaction data may be the transaction records generated by the target user, such as the products purchased, the purchase frequency and the purchase price. The behavior information may describe the behavior of the user in a transaction scenario or in other, non-transaction scenarios, such as whether a transaction by the target user succeeded.
Furthermore, the server can clean the crawled target historical service data, for example by removing erroneous data or data that does not conform to a standard format, so that the target historical service data meets the requirements of subsequent processing and the accuracy of data processing is improved. The server can also normalize the crawled target historical service data so that data of different dimensions can be used together in calculations.
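As an illustration of the normalization step, the following Python sketch rescales one feature column to the range [0, 1]. Min-max scaling is an assumption made here; the text does not say which normalization is used.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] (min-max scaling, assumed here)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly-income column, in arbitrary currency units.
scaled = min_max_normalize([10, 20, 30])
```

After scaling, features measured in different units (for example income and purchase frequency) lie on a comparable scale, which is what makes the later averaging across features meaningful.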
The server then extracts, from the target historical service data, target feature values corresponding to a plurality of preset target features. A target feature is a feature extracted from the target historical service data that characterizes the target user, and may be predetermined. The target feature value is the specific numerical value of a target feature. The target features may be, for example, the target user's purchase interval, purchase frequency, consecutive years of purchase, premium, profit, house property value, vehicle value, monthly income, and proportion of premium to income.
Step 206: when there is a target feature whose target feature value was not extracted, take that target feature as a missing feature, and acquire similar users corresponding to the target user and the similarity between the target user and each similar user.
A missing feature is a target feature for which no target feature value can be extracted from the target historical service data, because the data does not contain that feature. For example, if the target historical service data does not include the house property value of the target user, the server cannot extract a target feature value for house property value.
In one embodiment, more than one target feature may be predetermined, and the server extracts the target feature value of each predetermined target feature from the target historical service data. When the target feature values of all predetermined target features are successfully extracted, the target historical service data is qualified, and the target user's value can be predicted from it. When the target feature values of some predetermined target features cannot be extracted, the target historical service data is unqualified, that is, part of the data is missing; a value predicted for the target user from such incomplete data would be inaccurate.
In one embodiment, the server extracts the target feature value of each target feature from the target historical service data and divides the target features into missing features and normal features according to whether the target feature value is a missing value. Specifically, a target feature whose value cannot be extracted from the target historical service data is taken as a missing feature, and a target feature whose value can be extracted is taken as a normal feature. When the server determines that the target features include a missing feature, the data characterizing that feature of the user is missing from the target historical service data. After extracting the target feature values of the target features from the target historical service data, the method further includes: when the server determines that the target features include a missing feature, acquiring similar users corresponding to the target user and the similar service data corresponding to those similar users; extracting similar data corresponding to the missing feature from the similar service data; and determining the missing feature value of the missing feature from the similar data.
A similar user is a user similar to the target user, for example a user whose transaction behavior resembles that of the target user. Specifically, when the server determines that the target features include a missing feature, it obtains similar users and estimates the missing feature value of the target user from the similar service data of those users. The number of similar users may be one or more and is not limited here.
Step 208: extract similar feature values corresponding to the missing feature from the similar service data of the similar users.
The server acquires the similar service data of the similar users in order to supplement the data missing from the target historical service data. Specifically, the server may extract the similar feature value corresponding to the missing feature from the similar service data and then use it to fill in the missing feature value of the missing feature.
Step 210: calculate the missing feature value corresponding to the missing feature according to the similarity and the similar feature values of the similar users.
In one embodiment, when there are multiple similar users, the method further includes calculating the similarity between each similar user and the target user. The server then calculates the missing feature value of the missing feature from the similarity of each similar user and the similar feature value extracted for each similar user.
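The split into normal and missing features can be sketched in a few lines of Python. The record layout (a dictionary in which an absent key or `None` marks a value that could not be extracted) and the feature names are assumptions made for illustration.

```python
def split_features(record, target_features):
    """Split preset target features into normal values and missing names.

    A feature counts as missing when the record has no value for it
    (absent key or None); this encoding is an assumption of the sketch.
    """
    normal, missing = {}, []
    for name in target_features:
        value = record.get(name)
        if value is None:
            missing.append(name)
        else:
            normal[name] = value
    return normal, missing


# Hypothetical target user whose house property value was never collected.
features = ["purchase_frequency", "premium", "house_value"]
record = {"purchase_frequency": 4, "premium": 1200.0}
normal, missing = split_features(record, features)
```

When `missing` is empty the data is qualified and can be passed on directly; otherwise the similar-user completion described above is triggered.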
In one embodiment, when an insurance company collects user (client) data, it may be able to collect only basic information about the user and not value information such as monthly income, liabilities, house value and vehicle value; alternatively, other data about the user may be collected through specific business scenarios. Either way, the resulting user data forms a sparse matrix, and in many cases the target historical service data collected for the target user is incomplete. Feeding this sparse matrix into a random forest prediction model to predict user value degrades the accuracy of the model. In this embodiment, therefore, the missing data of the target user can be completed using data from users similar to the target user. Specifically, the unknown data of the target user may be estimated by weighted voting according to the correlation coefficient, filling the sparse matrix; for example, the target historical service data may be completed by local matrix voting.
The value prediction model input data generation method acquires target historical service data corresponding to a target user and extracts, from that data, target feature values corresponding to a plurality of preset target features. After extraction, the method checks whether any target feature value could not be extracted (that is, whether the target feature value of any target feature is a missing value), so that the accuracy of the target historical service data is assessed before the user value is predicted. When there is a target feature whose target feature value was not extracted (that is, its target feature value is a missing value), that target feature is taken as a missing feature. Similar users corresponding to the target user, and the similar service data corresponding to each similar user, are then acquired, and the missing feature values in the target historical service data are filled in according to the similar service data and the similarity between each similar user and the target user. The target historical service data is thereby completed and its accuracy improved.
In one embodiment, a target feature whose target feature value is extracted from the target historical service data is taken as a normal feature. Acquiring the similar users corresponding to the target user and the similarity between the target user and the similar users includes: extracting similar feature values corresponding to the preset target features from the similar service data of each similar user; determining a similar feature mean from the similar feature values, and determining a normal feature mean from the normal feature values of the normal features in the target historical service data; determining a similarity difference for each target feature from the similar feature mean and the similar feature value of that feature; determining a target difference for each target feature from the target feature mean and the target feature value of that feature; and determining the similarity between the target user and the similar user from the similarity difference and the target difference of each target feature.
Specifically, the server extracts, from the similar service data of each similar user i, the similar feature value corresponding to each target feature j, and averages all the similar feature values of that user to obtain the similar feature mean. Assume the information of the similar users forms a matrix with i rows and j columns, where $r_{ij}$ is the similar feature value of similar user i on target feature j. The similar feature mean $\bar{r}_i$ of similar user i can then be calculated by formula (1).
The server extracts the normal feature values $r_{aj}$ of all normal features from the target historical service data of target user a, which contains the missing feature, and calculates the normal feature mean $\bar{r}_a$ from them. The similarity difference of each target feature is then determined from the similar feature mean $\bar{r}_i$ and the similar feature value $r_{ij}$ of that feature, and the target difference of each target feature is determined from the target feature mean (that is, the normal feature mean $\bar{r}_a$) and the target feature value $r_{aj}$ of that feature. Finally, the similarity between the target user and the similar user is determined from the similarity difference and the target difference of each target feature; specifically, the similarity $w_{a,i}$ between target user a and similar user i is calculated as shown in formula (2).
The magnitude of the calculated similarity value can be used to evaluate how similar the target user and the similar user are. Specifically, the larger the similarity value, the more similar the two users are, and the higher the weight given to that similar user when it votes on the missing feature value of the target user.
In one embodiment, calculating the missing feature value corresponding to the missing feature according to the similarity and the similar feature values of the similar users includes: determining a similar adjustment value for each similar user from that user's similar feature mean, similar feature value and similarity; and calculating the missing feature value from the normal feature mean and the similar adjustment values of the similar users.
Specifically, for target user a with unknown information (a missing feature), the value $p_{a,j}$ of missing feature j can be estimated from the similar users by weighted voting. In one embodiment, the predicted value $p_{a,j}$ of target feature j for target user a is the normal feature mean of the normal features corresponding to the known information of target user a, plus the sum over the n similar users of the difference between each similar user's feature value on feature j and that user's similar feature mean, multiplied by that user's similarity. The specific calculation is shown in formula (3).
In this embodiment, the target historical service data of the target user is completed using the known similar service data of the similar users, which makes the target historical service data more accurate. The missing feature value of the target user is determined by voting weighted by the similarity between each similar user and the target user, which fills in the matrix information of the target user and turns the sparse matrix into a dense one, improving the accuracy of the subsequent user value prediction by the value prediction model.
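Written out from the description above, formula (1) would take the following form. The symbols are introduced here for readability and may differ from the original notation: $r_{ij}$ is the similar feature value of similar user i on target feature j, $\bar{r}_i$ is that user's similar feature mean, and $m$ is the number of target features.

```latex
\bar{r}_i = \frac{1}{m} \sum_{j=1}^{m} r_{ij} \tag{1}
```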
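A reconstruction of formula (2) that is consistent with the description, and with the reference to a correlation coefficient elsewhere in the text, is the Pearson-style correlation below. This form is an assumption: the text only states that the similarity is determined from the per-feature differences. Here $r_{aj}$ and $\bar{r}_a$ are the normal feature values and normal feature mean of target user a, $r_{ij}$ and $\bar{r}_i$ are the similar feature values and similar feature mean of similar user i, and the sums are assumed to run over the normal features of the target user.

```latex
w_{a,i} = \frac{\sum_{j} \left(r_{aj} - \bar{r}_a\right)\left(r_{ij} - \bar{r}_i\right)}
               {\sqrt{\sum_{j} \left(r_{aj} - \bar{r}_a\right)^{2}}\;
                \sqrt{\sum_{j} \left(r_{ij} - \bar{r}_i\right)^{2}}} \tag{2}
```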
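Following the wording of this paragraph term by term, formula (3) would read as below. The symbols are introduced here for readability: $p_{a,j}$ is the estimated value of missing feature j for target user a, $\bar{r}_a$ is the normal feature mean of target user a, $r_{ij}$ and $\bar{r}_i$ are the feature value and similar feature mean of similar user i, $w_{a,i}$ is the similarity between a and i, and n is the number of similar users.

```latex
p_{a,j} = \bar{r}_a + \sum_{i=1}^{n} \left(r_{ij} - \bar{r}_i\right) w_{a,i} \tag{3}
```

The description does not mention dividing the sum by the total similarity; that normalization is common in collaborative filtering but is not assumed here.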
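The completion procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the similarity is taken to be a Pearson-style correlation computed over the target user's normal features, missing values are encoded as `None`, every similar user is assumed to have a value for every target feature, and the target user is assumed to have at least one normal feature. All names are hypothetical.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def similarity(target, similar, normal, features):
    """Pearson-style similarity between the target user and one similar user.

    Assumption: the sums run over the target's normal (known) features,
    while the similar user's mean is taken over all target features.
    """
    t_mean = mean([target[f] for f in normal])      # normal feature mean
    s_mean = mean([similar[f] for f in features])   # similar feature mean
    t_diff = [target[f] - t_mean for f in normal]   # target differences
    s_diff = [similar[f] - s_mean for f in normal]  # similarity differences
    num = sum(t * s for t, s in zip(t_diff, s_diff))
    den = sqrt(sum(t * t for t in t_diff)) * sqrt(sum(s * s for s in s_diff))
    return num / den if den else 0.0


def impute(target, similar_users, features):
    """Return a copy of `target` with its missing (None) features filled in."""
    normal = [f for f in features if target.get(f) is not None]
    missing = [f for f in features if target.get(f) is None]
    t_mean = mean([target[f] for f in normal])
    weights = [similarity(target, s, normal, features) for s in similar_users]
    filled = dict(target)
    for f in missing:
        # Each similar user votes with its deviation from its own mean,
        # weighted by its similarity to the target user.
        adjustment = sum(
            (s[f] - mean([s[g] for g in features])) * w
            for s, w in zip(similar_users, weights)
        )
        filled[f] = t_mean + adjustment
    return filled


features = ["f1", "f2", "f3"]
target = {"f1": 1.0, "f2": 3.0, "f3": None}
similar_users = [{"f1": 2.0, "f2": 4.0, "f3": 6.0}]
dense = impute(target, similar_users, features)
```

With these hypothetical inputs the single similar user has similarity 1/√2 to the target, so the estimate for `f3` is the target's normal feature mean 2.0 plus (6.0 − 4.0) × 1/√2.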
In one embodiment, the method further comprises: preprocessing target historical service data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
In one embodiment, the data verification may include verifying the accuracy of the data in the target historical service data. The data cleansing may include identifying erroneous data in the user data and removing it. Specifically, because of data entry errors, differing representations across source systems, inconsistencies between data items and the like, existing data contains various kinds of dirty data, which mainly appear as illegal values, non-standard entries, inconsistent values, duplicated data, and the like. Data cleansing includes removing unnecessary fields, cleaning format and content, filling missing values, correcting logic errors, verifying data authenticity, and the like.
First, the identified customer information data is extracted from Oracle to the hive platform, the database tables and data to be cleaned are specified and extracted, and information authenticity is checked according to specific check rules, such as the province check for identity card numbers and the number-segment check for mobile phone numbers, so that only real and valid data is retained. In one embodiment, data cleansing includes:
1. The identity card number must be 15 or 18 digits long, must pass the region code check, the date check and the check digit check, and must not contain abnormal sequences such as '0000'. When the identity card number fails any of these requirements, it is set to null.
2. If the user and the field personnel have the same identity card number or the same mobile phone number, that identity card number or mobile phone number is also set to null.
3. A name may contain only Chinese characters, only letters, or spaces; names mixing Chinese and English are removed.
4. A mobile phone number is set to null if its length is not 11 digits, if it is found to be irregular under the given rules, or if it contains an irregular sequence such as '000000'.
5. Names with a length of 3 or more characters that contain the 'equal to' character are removed.
7. Records in which 3 different clients use the same identity card number and mobile phone number are rejected.
8. Names containing "company" are removed.
In one embodiment, the data cleansing rules include:
a. Cleaning rules (admission rules), including the null value check, the identity card number check, the mobile phone number check and the like. When a field does not conform to the configured rule, it is given a new value according to a specified default. When a field is empty or NULL, it is replaced with the character string 'null'. Finally, each original record is turned into a corresponding new record according to the data cleansing rules; if the row is valid, it proceeds to the next step, ID linking, and otherwise it is filtered out.
b. Empty value check: determine whether the field value is empty and, if so, assign the character string 'null' as the field default value.
c. NULL value check: determine whether the field value is NULL and, if so, assign the character string Null as the field default value.
d. Identity card number check: determine whether the identity card number is legal, that is, whether the region code is valid, whether the date is valid, whether the last (check) digit is correct, and whether the length is correct. Specifically: when the province code of the identity card number is incorrect, the number is set to null; when the number fails the regular expression check, it is set to null; when the check digit is incorrect, the number is set to null; when the number contains '0000', it is set to null.
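As an illustration only, the identity card number checks listed above may be sketched in Python as follows. The check digit calculation uses the MOD 11-2 weights of the 18-digit citizen identity number standard; the province code set, the assumption that 15-digit numbers carry a two-digit year in the 1900s, and the function name are assumptions of this sketch and are not taken from the source.

```python
import re
from datetime import datetime
from typing import Optional

# ASSUMPTION: two-digit province-level prefixes accepted by the region check.
PROVINCE_CODES = {
    "11", "12", "13", "14", "15", "21", "22", "23", "31", "32", "33", "34",
    "35", "36", "37", "41", "42", "43", "44", "45", "46", "50", "51", "52",
    "53", "54", "61", "62", "63", "64", "65", "71", "81", "82",
}
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"  # indexed by (weighted sum mod 11)


def clean_id_number(raw: Optional[str]) -> Optional[str]:
    """Return the identity card number if it passes every check, else None."""
    if not raw:
        return None
    value = raw.strip().upper()
    # Length and format check: 15 digits, or 17 digits plus a digit or X.
    if not re.fullmatch(r"\d{15}|\d{17}[\dX]", value):
        return None
    # Abnormal sequence check.
    if "0000" in value:
        return None
    # Province (region) code check.
    if value[:2] not in PROVINCE_CODES:
        return None
    # Date check; 15-digit numbers are assumed to be in the 1900s.
    birth = value[6:14] if len(value) == 18 else "19" + value[6:12]
    try:
        datetime.strptime(birth, "%Y%m%d")
    except ValueError:
        return None
    # Check digit (18-digit numbers only).
    if len(value) == 18:
        total = sum(int(d) * w for d, w in zip(value[:17], WEIGHTS))
        if CHECK_CHARS[total % 11] != value[17]:
            return None
    return value
```

Note that a literal substring test for '0000' also rejects numbers whose birth date contains that sequence, such as 20000101; the source does not state whether this is intended.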
Mobile phone number check: determine whether the mobile phone number is legal, for example by checking its length, checking whether it starts with 1, and checking whether it is an abnormal number such as 1111111111. Specifically: when the length of the mobile phone number is not equal to 11, the number is set to null; when the number does not start with 1, it is set to null; when the number contains any of the following sequences, it is set to null: '000000', '11111111', '22222222', '33333333', '44444444', '5555555555', '66666666', '77777777', '88888888', '99999999', '23456789', '12345678', '01234567', '34567890', '456789', '1380013800'.
Handling of information identical to agent information: when a record in the user basic information summary table is determined not to belong to an agent but uses the agent's information, the corresponding information is set to null.
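As an illustration only, the mobile phone number check above may be sketched in Python as follows. The sequence list is copied from the rules in the text; the function name and the additional requirement that every character be an ASCII digit are assumptions of this sketch.

```python
from typing import Optional

# Sequences that mark a number as abnormal, as listed in the text.
INVALID_SEQUENCES = (
    "000000", "11111111", "22222222", "33333333", "44444444",
    "5555555555", "66666666", "77777777", "88888888", "99999999",
    "23456789", "12345678", "01234567", "34567890", "456789",
    "1380013800",
)


def clean_mobile_number(raw: Optional[str]) -> Optional[str]:
    """Return the mobile phone number if it passes every check, else None."""
    if raw is None:
        return None
    value = raw.strip()
    # Length check (11 digits); the digits-only test is an assumption.
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return None
    # Must start with 1.
    if not value.startswith("1"):
        return None
    # Must not contain any abnormal sequence.
    if any(seq in value for seq in INVALID_SEQUENCES):
        return None
    return value
```

Keeping the sequences in one tuple means a new abnormal pattern can be added without touching the checking logic.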
In one embodiment, data normalization is performed as follows. Because a client may be reached through multiple entry points and multiple channels, the same client may be tagged with different IDs in different systems. Likewise, when a user transacts business several times, the user may be treated as two clients because the information provided differs. When analyzing the value of a client, ID normalization is therefore needed to collect the client's data across all systems and all time periods; this is referred to as client ID linking. Specifically, the ID linking rule includes: obtaining the user basic information summary table in the server, generating a new user ID for each user as the user's unique identifier, and storing the new user ID in the first field of the summary table. One client ID may correspond to multiple records, but each record belongs to only one client ID.
ID linking: a field that uniquely identifies the user according to a specified rule is added to each record, and a further field retains the primary key of each record in the source table so that the source table can be traced back; the processed result data is finally stored in the hive data warehouse. Specifically, the rules currently used to identify a user are as follows: the first two characters of the name + certificate number determine a user; name + mobile phone number determine a user; certificate number + mobile phone number determine a user; name + bank card number determine a user; name + WeChat ID determine a user; mobile phone number + bank card number determine a user; mobile phone number + WeChat ID determine a user; name + device ID determine a user; mobile phone number + device ID determine a user; and so on. The rules are not limited herein.
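As an illustration only, the ID linking rules above may be sketched in Python with a union-find structure. The field names are hypothetical, the first rule is simplified to the full name plus certificate number, and merging records transitively across different rules is an assumption of this sketch that the source does not state.

```python
from typing import Dict, List, Optional

# Each rule is a pair of (hypothetical) field names that together identify a user.
KEY_RULES = [
    ("name", "id_number"), ("name", "mobile"), ("id_number", "mobile"),
    ("name", "bank_card"), ("name", "wechat_id"), ("mobile", "bank_card"),
    ("mobile", "wechat_id"), ("name", "device_id"), ("mobile", "device_id"),
]


def link_user_ids(records: List[Dict[str, Optional[str]]]) -> List[int]:
    """Return one unified user ID per record (index of a representative record)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (rule, field values) -> index of first record with that key
    for idx, record in enumerate(records):
        for rule in KEY_RULES:
            values = tuple(record.get(field) for field in rule)
            if any(v is None or v == "" for v in values):
                continue  # a rule applies only when both fields are present
            key = (rule, values)
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])  # same user: merge
            else:
                first_seen[key] = idx
    return [find(i) for i in range(len(records))]


records = [
    {"name": "A", "mobile": "1", "id_number": "X"},
    {"name": "A", "mobile": "1", "bank_card": "B"},
    {"name": "A", "bank_card": "B", "device_id": "D"},
    {"name": "C", "mobile": "2"},
]
unified = link_user_ids(records)
```

In this example the first three records share one ID because each matches a neighbour under some rule, while the fourth stays separate.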
As shown in fig. 3, a flow chart of a user value prediction method is provided, and the method includes:
The value prediction model is a pre-trained model and can be used for predicting the value of the target user. Specifically, the server acquires historical service data corresponding to a plurality of users respectively, then extracts training features from the historical service data, and determines a training value according to the training features so as to train a model according to the training features and the training value to obtain a value prediction model. It is understood that the value prediction model may be a decision tree model, a random forest model, a regression model, or a machine learning model, and the like, and is not limited herein.
It should be noted that the training features and the target features may be the same features or different features, which is not limited herein. The value calculation features are all or part of the features extracted from the training features and are used to calculate the training value of the user.
The user value prediction method obtains target historical service data corresponding to a target user, and extracts preset target characteristic values corresponding to a plurality of target characteristics from the target historical service data. And after the target characteristic values are obtained, whether the target characteristic values corresponding to the target characteristics have missing values or not is judged, so that the accuracy of the target historical service data is judged before the user value is predicted. And only when the target characteristic values corresponding to the target characteristics are judged to have no missing values, the accurate target characteristic values are input into a pre-constructed value prediction model, the prediction value of the target user in the future time period is obtained according to the value prediction model, and the accuracy of value prediction of the target user is improved. The value prediction model is constructed according to the training characteristics and the training values, and the training characteristics and the training values are obtained from historical business data corresponding to more than one user respectively. Before the value of the target user is predicted, the accuracy of the value of the target user obtained according to target characteristic prediction in the following process is guaranteed by verifying the target characteristic value extracted from the target historical service data.
In one embodiment, as shown in fig. 4, a flow chart of a method for constructing a value prediction model is provided, and the method includes:
The historical service data may be data corresponding to a historical period of time. The historical service data may include attribute information of the user, historical transaction data of the user, historical behavior information of the user, and the like. It is understood that the user attribute information may be the name, gender, and geographic location of the user. The user's historical transaction data may be the transaction records generated by the user during transactions, such as the products purchased by the user, the purchase frequency, the purchase price, and the like. The behavior information of the user may be the user's behavior in a transaction scenario or in other non-transaction scenarios, such as whether the user's transaction succeeded.
Specifically, the server crawls historical business data of a plurality of users from a business system. Furthermore, the server can also perform cleaning processing on the crawled historical service data, such as removing error data in the historical service data or data which does not conform to a standard format, so that the historical service data conforms to subsequent data processing requirements, and the accuracy of data processing is improved. And the server can also normalize the crawled historical service data so as to enable data calculation among data of different dimensions.
The training features are features extracted from the historical service data to characterize the user. It will be appreciated that the training features may be the same as the target features; for example, the training features may be the user's interval between product purchases, purchase frequency, number of consecutive years of purchase, premium, profit, house property value, vehicle value, monthly income, premium-to-income ratio, and the like.
Wherein the value calculation feature is a part or all of the features extracted from the training features for calculating the training value of the user. Specifically, the server extracts premium characteristics and profit characteristics from the training characteristics and then calculates a training value based on the premium characteristics and the profit characteristics.
The feature weight may be predetermined, and the feature weight is set in advance for the value calculation feature. In one embodiment, the server acquires historical business data of a plurality of users and value labels corresponding to the users, analyzes and mines the historical business data of the users to find out the value calculation features with the maximum relevance with the value labels of the users from the historical business data, determines the influence of the value calculation features on the value labels of the users, and determines the feature weight of the value calculation features according to the influence.
Specifically, the server multiplies the feature weight by the corresponding value calculation feature, and sums up to obtain the training value of each user.
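As an illustration only, the weighted sum described above may be sketched in Python as follows. The feature names and weight values are hypothetical; the source states only that each value calculation feature is multiplied by its feature weight and the products are summed.

```python
from typing import Mapping


def training_value(features: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Sum of feature weight x value calculation feature.

    Only features that have a weight take part; other training
    features (for example, age) are ignored.
    """
    return sum(weight * features[name] for name, weight in weights.items())


# Hypothetical normalised feature values and weights.
user_features = {"premium": 0.8, "profit": 0.5, "age": 30.0}
feature_weights = {"premium": 0.6, "profit": 0.4}
value = training_value(user_features, feature_weights)
```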
Step 412: train the prediction model according to the training features and the training value, and stop training the prediction model when the training end condition is met, to obtain the value prediction model.
Specifically, the training characteristics and the training value are used as a training set and used for training a prediction model, and when the training precision of the obtained prediction model meets a preset precision condition or the iteration number of the training reaches a preset number, the training of the prediction model is stopped, so that the value prediction model is obtained.
In one embodiment, after the training features are obtained, extracting some or all of the features from the training features to obtain key features, and training the prediction model according to the key features extracted from the training features and the training value. Specifically, the step of extracting key features from the training features includes: extracting a feature vector corresponding to each training feature; and extracting key features from the training features according to the vector feature value of each feature vector. Specifically, the top-ranked key features can be obtained according to the scores of the vector feature values.
In an embodiment, the description will be given by taking an example that the server acquires historical service data corresponding to the user in 2015-2018 and predicts the value of the user in 2019 according to the historical service data.
Firstly, data normalization is carried out on the data of the premium characteristic value and the profit characteristic value according to the distribution situation of the premium characteristic value and the profit characteristic value data in the historical business data. And determining a premium weight for the premium characteristic and a profit weight for the profit characteristic based on the business experience. And then fitting the premium characteristic value and the premium weight and the profit characteristic value and the profit weight to construct a premium profit comprehensive consideration expression. And calculating a score according to the comprehensive consideration expression, and determining the user value according to the score.
Specifically, the users are ranked and divided into tiers according to the scores calculated by the comprehensive consideration expression. For example, users may be tiered by score rank: the top 10% of users are assigned a first training value, users in the 20%-50% range a second training value, users in the 50%-70% range a third training value, users in the 70%-90% range a fourth training value, and users in the 90%-100% range a fifth training value.
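As an illustration only, the tiering by score rank may be sketched in Python as follows. The text leaves the 10%-20% range unassigned; this sketch assumes contiguous tiers with upper bounds of 10%, 50%, 70%, 90% and 100%, which is an assumption and not a statement of the source. The function name and the bounds parameter are hypothetical.

```python
from typing import Dict, Sequence


def assign_value_tiers(
    scores: Dict[str, float],
    upper_bounds: Sequence[float] = (0.10, 0.50, 0.70, 0.90, 1.00),
) -> Dict[str, int]:
    """Map each user to a tier (1 = highest) by descending score rank."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    total = len(ranked)
    tiers = {}
    for position, user in enumerate(ranked):
        fraction = (position + 1) / total  # share of users ranked at or above
        # First tier whose upper bound covers this rank fraction.
        tiers[user] = next(
            tier for tier, bound in enumerate(upper_bounds, start=1)
            if fraction <= bound
        )
    return tiers


# Ten hypothetical users with scores 10 (best) down to 1.
example_scores = {f"u{i}": float(11 - i) for i in range(1, 11)}
example_tiers = assign_value_tiers(example_scores)
```

Passing the bounds as a parameter keeps the tier boundaries configurable, since the text presents them only as an example.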
Then, training a prediction model according to training characteristics extracted from historical service data and a training value obtained by calculation according to a comprehensive consideration expression, carrying out model training, optimizing and iterating to obtain a stable algorithm model, and storing the obtained value prediction model. In one embodiment, a prediction model can be constructed by using a random forest algorithm, and the results of a plurality of decision trees are subjected to weighted aggregation, so that the algorithm is more stable, and the risk of overfitting is reduced.
Furthermore, when constructing the prediction model with the random forest algorithm, three parameters are mainly tuned. The first is the maximum number of features, that is, the maximum number of features a single decision tree is allowed to use. Increasing it generally improves the fitting performance of the model, since more options are available at each node. However, it also reduces the diversity of the individual trees and slows the algorithm down, so an appropriate maximum number of features must be chosen. Typically, for models with fewer than 200 features, the maximum number of features may be set between 35% and 75% of the total number of features, and then adjusted according to how the model fits: decreased when the model overfits and increased when it underfits. The second is the number of trees, which affects both the fitting ability of the model and the calculation speed: the larger the number, the better the fitting ability and the lower the calculation speed. When sample diversity is low (which depends on the number of features and label classes), typically no more than two hundred trees are used. The third is the minimum number of samples per leaf node, which controls the complexity of the model and also helps ensure its robustness. In one embodiment of the application, for a scenario with few business categories and many training samples, the minimum number of samples per leaf node may be set to a large value (larger than 50).
Specifically, a grid search (grid_search) can be performed with cross-validation, and the optimal parameters are selected to obtain the optimal model. Based on general business knowledge, a small set of candidate values is listed for each hyperparameter within a specified parameter range, and the Cartesian product (all combinations) of these candidate sets forms the set of hyperparameter combinations. Each hyperparameter combination is then evaluated for error by cross-validation to obtain the optimal model.
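As an illustration only, the grid search over the Cartesian product of candidate values with cross-validation may be sketched in plain Python as follows. The `evaluate` callback, which is assumed to train a model on the training indices and return its error on the validation indices, is hypothetical, as are the parameter names and the toy error function; in practice a library routine such as scikit-learn's GridSearchCV could serve the same purpose.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(n_samples: int, k: int) -> List[Split]:
    """Split sample indices into k (train, validation) pairs."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for held_out in folds:
        held = set(held_out)
        train = [j for j in range(n_samples) if j not in held]
        splits.append((train, held_out))
    return splits


def grid_search(
    param_grid: Dict[str, Sequence],
    evaluate: Callable[[dict, List[int], List[int]], float],
    n_samples: int,
    k: int = 5,
):
    """Return (best parameters, mean validation error) over the grid."""
    names = sorted(param_grid)
    splits = k_fold_indices(n_samples, k)
    best_params, best_error = None, float("inf")
    # Cartesian product of the candidate values of every hyperparameter.
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        error = mean(evaluate(params, train, val) for train, val in splits)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_error(params: dict, train: List[int], val: List[int]) -> float:
    # Stand-in for "train on `train`, measure error on `val`".
    return abs(params["max_features"] - 0.5) + abs(params["n_trees"] - 100) / 1000


grid = {"max_features": [0.35, 0.5, 0.75], "n_trees": [50, 100, 200]}
best, err = grid_search(grid, toy_error, n_samples=10, k=5)
```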
FIG. 5 provides an overall schematic of model training and prediction. Specifically, the value prediction model shown in fig. 5 involves two stages, training and prediction. The following takes as an example training the model on 2015-2018 historical business data and predicting the user value in 2019. The section starting from the upper left corner in fig. 5 corresponds to the feature extraction stage. Specifically, the server obtains the 2015-2018 historical service data (which may include user policy data, user claim data, user report data, and the like) and performs data preprocessing on it, such as data verification, data cleaning, and data normalization. Feature engineering is then performed on the processed data: for example, preset target feature values corresponding to a plurality of target features are extracted from the processed historical service data, and feature screening is then performed on the target features to obtain the screened features related to the value of the user.
Continuing to refer to fig. 5, starting from the lower left corner in fig. 5, the server obtains the historical service data corresponding to the year 2015-. Finally, a multi-class classification problem is formulated with the screened features as inputs and the users' value categories (targets) as learning labels; the model is trained to obtain, for example, a logistic regression model or a decision tree model, and the trained model is used to predict the value categories of the customers (users) in 2019.
A user request processing method comprises the following steps: receiving a user request, wherein the user request carries user data; processing the user data by the user value prediction method illustrated in any one of the above embodiments to obtain a user value; acquiring a service strategy corresponding to the user value; and processing the user request according to the service strategy.
In a specific application scenario, value classification can be performed on users, and different services can be distributed to users with different values and different service strategies can be adopted. Wherein the service policy can be a service tactical policy, or a service content policy, etc.
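As an illustration only, the request processing flow above (predict the user value, look up the matching service policy, and process the request accordingly) may be sketched in Python as follows. The policy names, the value tiers used as keys, and the stub predictor are hypothetical and stand in for the trained value prediction model and the business's own service policies.

```python
from typing import Callable, Dict

# Hypothetical mapping from predicted value tier to service policy.
SERVICE_POLICIES: Dict[int, str] = {
    1: "dedicated_agent",
    2: "priority_queue",
    3: "standard_queue",
}


def handle_request(
    user_data: dict,
    predict_value: Callable[[dict], int],
    policies: Dict[int, str],
    default_policy: str = "standard_queue",
) -> dict:
    """Predict the user value, pick the matching policy, and return the routing."""
    value = predict_value(user_data)              # user value prediction
    policy = policies.get(value, default_policy)  # service policy lookup
    return {"user_id": user_data.get("user_id"), "value": value, "policy": policy}


def stub_predictor(user_data: dict) -> int:
    # Stand-in for the trained value prediction model.
    return 1 if user_data.get("premium", 0) > 10000 else 3


result = handle_request({"user_id": "u42", "premium": 25000},
                        stub_predictor, SERVICE_POLICIES)
```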
In a specific application scenario, a business department wishes to model user value so as to divide users by value and then adopt different service strategies for users of different value. Matching services to users of different grades in this way improves the fit between users and service strategies, reduces unnecessary computer matching processes, and saves computer resources. It also improves user experience and service accuracy.
In a specific application scenario, the user IDs in the obtained user historical service data are first normalized, so that user historical service data in different service systems is linked and given a unique identifier. A time interval is then selected and features are extracted using the normalized user ID. The target features are converted to value tags according to business logic. A random forest model is constructed, and hyperparameter tuning is carried out using a grid search CV method to obtain an optimal model. The user value generated by the model is stored in a user data analysis application platform as a user tag. Differentiated service or differentiated claim settlement is carried out using the user tags: when a user is contacted, the customer service agent can query the user's value tag through the user data analysis platform. With the tag as a reference, different scripts and service strategies are used to provide the user with personalized and targeted service.
It should be understood that although the steps in the flow charts of fig. 2-4 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2-4 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and that are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 6, there is provided a value prediction model input data generation apparatus 600 comprising:
The obtaining module 602 is configured to obtain target historical service data corresponding to a target user.
The extracting module 604 is configured to extract target feature values corresponding to a plurality of preset target features from the target historical service data.
The similarity obtaining module 606 is configured to, when there is a target feature for which the target feature value is not extracted, take that target feature as a missing feature, and obtain a similar user corresponding to the target user and a similarity between the target user and the similar user.
The similar data extracting module 608 is configured to extract a similar feature value corresponding to the missing feature from similar service data corresponding to a similar user.
The calculating module 610 is configured to calculate a missing feature value corresponding to the missing feature according to the similarity of the similar users and the similar feature value.
The generating module 612 is configured to obtain value prediction model input data according to the calculated missing feature value and the extracted target feature value.
In an embodiment, the similarity obtaining module 606 is further configured to extract, from similar service data corresponding to similar users, a preset similar feature value corresponding to each target feature; determining a similar characteristic mean value according to each similar characteristic value; determining a normal characteristic mean value according to normal characteristic values corresponding to all normal characteristics in the target historical service data; determining a similarity difference value of each target feature according to the similarity feature mean value and the similarity feature value corresponding to each target feature; determining a target difference value of each target characteristic according to the target characteristic mean value and a target characteristic value corresponding to each target characteristic; and determining the similarity between the target user and the similar user according to the similarity difference corresponding to each target feature and the target difference.
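As an illustration only, one way to turn the per-feature differences from the means into a similarity is a Pearson-style correlation, sketched in Python below. The source does not name the measure, so the choice of Pearson correlation, the restriction of both means to the features known for both users, and the function name are assumptions of this sketch.

```python
from math import sqrt
from typing import Dict, Optional


def user_similarity(
    target: Dict[str, Optional[float]],
    similar: Dict[str, Optional[float]],
) -> float:
    """Correlation between two users over the features both have values for."""
    shared = [
        name for name, value in target.items()
        if value is not None and similar.get(name) is not None
    ]
    if len(shared) < 2:
        return 0.0  # too little overlap to compare

    target_mean = sum(target[n] for n in shared) / len(shared)
    similar_mean = sum(similar[n] for n in shared) / len(shared)

    # Target difference and similarity difference for each shared feature.
    target_diff = [target[n] - target_mean for n in shared]
    similar_diff = [similar[n] - similar_mean for n in shared]

    denominator = (sqrt(sum(d * d for d in target_diff))
                   * sqrt(sum(d * d for d in similar_diff)))
    if denominator == 0:
        return 0.0  # one user has no variation across the shared features
    return sum(a * b for a, b in zip(target_diff, similar_diff)) / denominator


same_shape = user_similarity(
    {"a": 1.0, "b": 2.0, "c": 3.0, "d": None},
    {"a": 2.0, "b": 4.0, "c": 6.0, "d": 5.0},
)
```

The missing feature ("d" here) is skipped when computing the similarity, since the target user has no value for it.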
In one embodiment, the calculation module 610 is further configured to: determining a similarity adjustment value corresponding to each similar user according to the similarity characteristic mean value, the similarity characteristic value and the similarity degree respectively corresponding to each similar user; and calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the normal characteristic mean value and the similar adjustment value respectively corresponding to each similar user.
In one embodiment, the value prediction model input data generation apparatus further includes a preprocessing module 614, where the preprocessing module 614 is configured to preprocess the target historical business data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
In one embodiment, as shown in FIG. 7, there is provided a user value prediction apparatus 700, comprising:
The input data obtaining module 702 is configured to obtain model input data according to the value prediction model input data generation method in the foregoing embodiment.
The prediction module 704 is used for inputting the model input data into a pre-constructed value prediction model and obtaining the predicted value of the target user in the future time period according to the value prediction model; the value prediction model is constructed according to the training characteristics and the training values, and the training characteristics and the training values are obtained from historical business data corresponding to more than one user respectively.
In one embodiment, the user value prediction apparatus 700 further includes a model building module 706, where the model building module 706 is configured to obtain historical service data corresponding to more than one user; respectively extracting training characteristics from each historical service data; extracting value calculation features from the training features; acquiring the feature weight of each value calculation feature; determining the training value of each user according to the characteristic weight and the value calculation characteristic; and training the prediction model according to the training characteristics and the training value, and stopping the training of the prediction model when the training ending condition is met to obtain the value prediction model.
In one embodiment, as shown in fig. 8, there is provided a user request processing apparatus 800, the apparatus comprising:
The request receiving module 802 is configured to receive a user request, where the user request carries user data.
The value calculating module 804 is configured to process the user data by using the user value predicting method in the foregoing embodiment, so as to obtain the user value.
The policy matching module 806 is configured to obtain a service policy corresponding to the user value.
The processing module 808 is configured to process the user request according to the service policy.
For the specific limitations of the above apparatus, reference may be made to the limitations of the above method, which are not described herein again. The various modules in the above-described apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing relevant business data for predicting the value of the user. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a value prediction model input data generation method.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: acquiring target historical service data corresponding to a target user; extracting preset target characteristic values corresponding to a plurality of target characteristics from target historical service data; when the target features of which the target feature values are not extracted exist, the target features of which the target feature values are not extracted are used as missing features, and similar users corresponding to the target users and the similarity between the target users and the similar users are obtained; extracting similar characteristic values corresponding to the missing characteristics from similar service data corresponding to similar users; calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the similarity and the similar characteristic value of the similar user; and obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
In one embodiment, when executing the computer program to obtain the similar users corresponding to the target user and the similarity between the target user and each similar user, the processor further performs the following steps: extracting preset similar feature values corresponding to each target feature from the similar service data corresponding to the similar users; determining a similar feature mean value according to the similar feature values; determining a normal feature mean value according to the normal feature values corresponding to the normal features in the target historical service data; determining a similar difference value for each target feature according to the similar feature mean value and the similar feature value corresponding to that target feature; determining a target difference value for each target feature according to the target feature mean value and the target feature value corresponding to that target feature; and determining the similarity between the target user and the similar user according to the similar difference value and the target difference value corresponding to each target feature.
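One plausible reading of this mean-and-difference computation is a Pearson-style correlation over the features both users have. The sketch below rests on that assumption; the exact formula is not given in this passage, and the name `user_similarity` is hypothetical. It also assumes the feature values have already been normalized to a comparable scale.

```python
import math
from typing import Dict


def user_similarity(target: Dict[str, float], similar: Dict[str, float]) -> float:
    """Pearson-style similarity (an assumed form, not confirmed by the source).

    target:  normal feature values extracted for the target user
    similar: feature values extracted for one similar user
    """
    shared = [f for f in target if f in similar]
    if not shared:
        return 0.0
    target_mean = sum(target.values()) / len(target)
    similar_mean = sum(similar.values()) / len(similar)
    # Difference of each feature value from its user's own mean.
    target_diff = {f: target[f] - target_mean for f in shared}
    similar_diff = {f: similar[f] - similar_mean for f in shared}
    numerator = sum(target_diff[f] * similar_diff[f] for f in shared)
    denominator = math.sqrt(sum(d * d for d in target_diff.values())) * math.sqrt(
        sum(d * d for d in similar_diff.values())
    )
    return numerator / denominator if denominator else 0.0


same_trend = user_similarity({"a": 1.0, "b": 2.0, "c": 3.0}, {"a": 2.0, "b": 4.0, "c": 6.0})
opposite_trend = user_similarity({"a": 1.0, "b": 2.0, "c": 3.0}, {"a": 6.0, "b": 4.0, "c": 2.0})
```

Under this assumed form the similarity lies in the range from -1 to 1, so users whose features deviate from their own means in the same direction score close to 1.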
In one embodiment, when executing the computer program to calculate the missing feature value corresponding to the missing feature according to the similarity and the similar feature values of the similar users, the processor further performs the following steps: determining a similar adjustment value corresponding to each similar user according to the similar feature mean value, the similar feature value, and the similarity respectively corresponding to that similar user; and calculating the missing feature value corresponding to the missing feature according to the normal feature mean value and the similar adjustment value corresponding to each similar user.
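A common way to combine a user's own mean with similarity-weighted adjustments is the mean-centered weighted average used in neighborhood-based collaborative filtering. The sketch below assumes that form; the source only names the inputs, so the normalization by the sum of absolute similarities and the name `impute_missing_value` are assumptions.

```python
from typing import List, Tuple


def impute_missing_value(
    normal_mean: float,
    similar_users: List[Tuple[float, float, float]],
) -> float:
    """Estimate a missing feature value for the target user.

    similar_users holds (similarity, similar_feature_value, similar_feature_mean)
    for each similar user. The weighting scheme is an assumed, illustrative one.
    """
    total_weight = sum(abs(sim) for sim, _, _ in similar_users)
    if total_weight == 0:
        # No usable neighbor: fall back to the target user's own mean.
        return normal_mean
    # Similar adjustment value: how far each neighbor's value sits from that
    # neighbor's own mean, scaled by the similarity to the target user.
    adjustment = sum(sim * (value - mean) for sim, value, mean in similar_users)
    return normal_mean + adjustment / total_weight


estimate = impute_missing_value(0.5, [(1.0, 0.8, 0.6), (0.5, 0.4, 0.5)])
```

Centering on each neighbor's mean keeps a neighbor who scores uniformly high on every feature from inflating the estimate.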
In one embodiment, the processor, when executing the computer program, further performs the steps of: preprocessing target historical service data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
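The passage does not say what each preprocessing step does, so the following is only one possible interpretation: verification keeps records whose listed fields are numeric, cleaning drops exact duplicates, and normalization applies min-max scaling. The record layout and the name `preprocess` are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def preprocess(records: List[Record], fields: List[str]) -> List[Dict[str, float]]:
    """Verify, clean, and normalize business records (illustrative assumptions)."""
    # Data verification: keep records whose listed fields are all numeric.
    verified = [
        r for r in records
        if all(isinstance(r.get(f), (int, float)) for f in fields)
    ]
    # Data cleaning: drop exact duplicates while preserving order.
    seen = set()
    cleaned = []
    for r in verified:
        key = tuple(r[f] for f in fields)
        if key not in seen:
            seen.add(key)
            cleaned.append(r)
    if not cleaned:
        return []
    # Data normalization: min-max scale each field into [0, 1].
    normalized: List[Dict[str, float]] = [{} for _ in cleaned]
    for f in fields:
        values = [r[f] for r in cleaned]
        low, high = min(values), max(values)
        span = high - low
        for out, r in zip(normalized, cleaned):
            out[f] = (r[f] - low) / span if span else 0.0
    return normalized


rows = preprocess(
    [
        {"amount": 10, "logins": 1},
        {"amount": 10, "logins": 1},
        {"amount": 30, "logins": 3},
        {"amount": None, "logins": 2},
    ],
    ["amount", "logins"],
)
```

In this example the duplicate record and the record with a non-numeric `amount` are dropped before scaling.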
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: obtaining model input data by the value prediction model input data generation method of the above embodiments; inputting the model input data into a pre-constructed value prediction model, and obtaining the predicted value of the target user in a future time period according to the value prediction model; wherein the value prediction model is constructed according to training features and training values, and the training features and the training values are obtained from the historical business data respectively corresponding to more than one user.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring historical service data corresponding to more than one user respectively; respectively extracting training characteristics from each historical service data; extracting value calculation features from the training features; acquiring the feature weight of each value calculation feature; determining the training value of each user according to the characteristic weight and the value calculation characteristic; and training the prediction model according to the training characteristics and the training value, and stopping the training of the prediction model when the training ending condition is met to obtain the value prediction model.
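As a rough illustration of these training steps, the sketch below labels each user with a weighted sum of value calculation features and then fits a small linear model by gradient descent, stopping when the loss stops improving. The linear model, the learning rate, the stopping tolerance, and all names are assumptions; the source does not specify the model type or the training ending condition.

```python
from typing import Dict, List, Sequence


def training_value(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Training value as a weighted sum of the value calculation features."""
    return sum(w * features[name] for name, w in weights.items())


def train_linear_model(
    rows: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.1,
    max_epochs: int = 5000,
    tol: float = 1e-12,
) -> List[float]:
    """Fit weights plus a trailing bias term by batch gradient descent."""
    n_features = len(rows[0])
    params = [0.0] * (n_features + 1)  # last entry is the bias
    prev_loss = float("inf")
    for _ in range(max_epochs):
        grads = [0.0] * (n_features + 1)
        loss = 0.0
        for x, y in zip(rows, targets):
            err = sum(p * v for p, v in zip(params, x)) + params[-1] - y
            loss += err * err
            for i, v in enumerate(x):
                grads[i] += 2 * err * v
            grads[-1] += 2 * err
        loss /= len(rows)
        # Assumed training ending condition: loss change below tolerance.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        params = [p - lr * g / len(rows) for p, g in zip(params, grads)]
    return params


def predict(params: Sequence[float], x: Sequence[float]) -> float:
    return sum(p * v for p, v in zip(params, x)) + params[-1]


weights = {"recharge": 0.7, "logins": 0.3}  # hypothetical feature weights
users = [
    {"recharge": 0.0, "logins": 0.0},
    {"recharge": 1.0, "logins": 0.0},
    {"recharge": 0.0, "logins": 1.0},
    {"recharge": 1.0, "logins": 1.0},
]
rows = [[u["recharge"], u["logins"]] for u in users]
targets = [training_value(u, weights) for u in users]
model = train_linear_model(rows, targets)
```

In practice the training features and the value calculation features would usually come from different time windows, so that the model learns to predict future value rather than reproduce the weighted sum.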
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: receiving a user request, wherein the user request carries user data; processing the user data by the user value prediction method in the embodiment to obtain the user value; acquiring a service strategy corresponding to the user value; and processing the user request according to the service strategy.
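The request handling flow can be sketched as a lookup from predicted user value to a service policy. The thresholds, policy names, request layout, and the `predict_value` callable below are all hypothetical placeholders standing in for the user value prediction method.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical policy table: (minimum user value, policy name), highest first.
POLICIES: List[Tuple[float, str]] = [
    (0.8, "priority"),
    (0.4, "standard"),
    (0.0, "basic"),
]


def select_policy(user_value: float) -> str:
    """Return the first policy whose threshold the user value reaches."""
    for threshold, name in POLICIES:
        if user_value >= threshold:
            return name
    return POLICIES[-1][1]  # values below every threshold get the last policy


def handle_request(
    request: Dict[str, Any],
    predict_value: Callable[[Dict[str, float]], float],
) -> Dict[str, Any]:
    """Predict the user's value from the request data and pick a policy."""
    user_value = predict_value(request["user_data"])
    return {"user_value": user_value, "policy": select_policy(user_value)}


response = handle_request({"user_data": {"score": 0.9}}, lambda data: data["score"])
```

Keeping the policy table as data rather than branching logic makes it straightforward to adjust thresholds without touching the request handler.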
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor performs the steps of: acquiring target historical service data corresponding to a target user; extracting preset target characteristic values corresponding to a plurality of target characteristics from target historical service data; when the target features of which the target feature values are not extracted exist, the target features of which the target feature values are not extracted are used as missing features, and similar users corresponding to the target users and the similarity between the target users and the similar users are obtained; extracting similar characteristic values corresponding to the missing characteristics from similar service data corresponding to similar users; calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the similarity and the similar characteristic value of the similar user; and obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
In one embodiment, when executed by the processor to obtain the similar users corresponding to the target user and the similarity between the target user and each similar user, the computer program further performs the following steps: extracting preset similar feature values corresponding to each target feature from the similar service data corresponding to the similar users; determining a similar feature mean value according to the similar feature values; determining a normal feature mean value according to the normal feature values corresponding to the normal features in the target historical service data; determining a similar difference value for each target feature according to the similar feature mean value and the similar feature value corresponding to that target feature; determining a target difference value for each target feature according to the target feature mean value and the target feature value corresponding to that target feature; and determining the similarity between the target user and the similar user according to the similar difference value and the target difference value corresponding to each target feature.
In one embodiment, when executed by the processor to calculate the missing feature value corresponding to the missing feature according to the similarity and the similar feature values of the similar users, the computer program further performs the following steps: determining a similar adjustment value corresponding to each similar user according to the similar feature mean value, the similar feature value, and the similarity respectively corresponding to that similar user; and calculating the missing feature value corresponding to the missing feature according to the normal feature mean value and the similar adjustment value corresponding to each similar user.
In one embodiment, the computer program when executed by the processor further performs the steps of: preprocessing target historical service data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program that when executed by the processor performs the steps of: obtaining model input data by the value prediction model input data generation method of the above embodiments; inputting the model input data into a pre-constructed value prediction model, and obtaining the predicted value of the target user in a future time period according to the value prediction model; wherein the value prediction model is constructed according to training features and training values, and the training features and the training values are obtained from the historical business data respectively corresponding to more than one user.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring historical service data corresponding to more than one user respectively; respectively extracting training characteristics from each historical service data; extracting value calculation features from the training features; acquiring the feature weight of each value calculation feature; determining the training value of each user according to the characteristic weight and the value calculation characteristic; and training the prediction model according to the training characteristics and the training value, and stopping the training of the prediction model when the training ending condition is met to obtain the value prediction model.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program that when executed by the processor performs the steps of: receiving a user request, wherein the user request carries user data; processing the user data by the user value prediction method in the embodiment to obtain the user value; acquiring a service strategy corresponding to the user value; and processing the user request according to the service strategy.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A method of generating value prediction model input data, the method comprising:
acquiring target historical service data corresponding to a target user;
extracting preset target characteristic values corresponding to a plurality of target characteristics from the target historical service data;
when there is a target feature for which no target feature value has been extracted, taking that target feature as a missing feature, and obtaining similar users corresponding to the target user and the similarity between the target user and the similar users;
extracting similar characteristic values corresponding to the missing characteristics from similar service data corresponding to the similar users;
calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the similarity of the similar users and the similar characteristic value;
and obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
2. The value prediction model input data generation method according to claim 1, characterized in that a target feature for which a target feature value is extracted from the target historical business data is taken as a normal feature; the obtaining of the similar users corresponding to the target user and the similarity between the target user and the similar users includes:
extracting preset similar characteristic values corresponding to the target characteristics from the similar service data corresponding to the similar users;
determining a similar characteristic mean value according to each similar characteristic value;
determining a normal characteristic mean value according to normal characteristic values corresponding to the normal characteristics in the target historical service data;
determining a similarity difference value of each target feature according to the similarity feature mean value and the similarity feature value corresponding to each target feature;
determining a target difference value of each target feature according to the target feature mean value and the target feature value corresponding to each target feature;
and determining the similarity between the target user and the similar user according to the similarity difference corresponding to each target feature and the target difference.
3. The method for generating value prediction model input data according to claim 2, wherein the step of calculating the missing feature value corresponding to the missing feature according to the similarity of the similar users and the similar feature value comprises:
determining a similarity adjustment value corresponding to each similar user according to the similar feature mean value, the similar feature value and the similarity corresponding to each similar user;
and calculating to obtain a missing feature value corresponding to the missing feature according to the normal feature mean value and the similar adjustment value corresponding to each similar user.
4. The value prediction model input data generation method of any one of claims 1 to 3, further comprising:
preprocessing the target historical service data; the preprocessing comprises at least one of data verification, data cleaning and data normalization.
5. A method for predicting user value, the method comprising:
obtaining model input data by the value prediction model input data generation method according to any one of claims 1 to 4;
inputting the model input data into a pre-constructed value prediction model, and obtaining the predicted value of the target user in a future time period according to the value prediction model; the value prediction model is constructed according to training features and training values, and the training features and the training values are obtained from historical business data corresponding to more than one user respectively.
6. The method of claim 5, wherein the method of constructing the value prediction model comprises:
acquiring historical service data corresponding to more than one user respectively;
respectively extracting training characteristics from each historical service data;
extracting value calculation features from the training features;
obtaining a feature weight of each value calculation feature;
determining the training value of each user according to the feature weight and the value calculation feature;
and training a prediction model according to the training characteristics and the training value, and stopping training the prediction model when a training ending condition is met to obtain the value prediction model.
7. A method for processing a user request, the method comprising:
receiving a user request, wherein the user request carries user data;
processing the user data by the user value prediction method of any one of claims 5 to 6 to obtain a user value;
acquiring a service strategy corresponding to the user value;
and processing the user request according to the service strategy.
8. A value prediction model input data generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring target historical service data corresponding to a target user;
the extraction module is used for extracting preset target characteristic values corresponding to a plurality of target characteristics from the target historical service data;
the similarity acquisition module is used for, when there is a target feature for which no target feature value has been extracted, taking that target feature as a missing feature, and acquiring similar users corresponding to the target user and the similarity between the target user and the similar users;
a similar data extraction module, configured to extract a similar feature value corresponding to the missing feature from similar service data corresponding to the similar user;
the calculation module is used for calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the similarity of the similar users and the similar characteristic value;
and the generating module is used for obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110531498.4A CN112990989B (en) | 2021-05-17 | 2021-05-17 | Value prediction model input data generation method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110531498.4A CN112990989B (en) | 2021-05-17 | 2021-05-17 | Value prediction model input data generation method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112990989A | 2021-06-18 |
CN112990989B | 2021-07-30 |
Family
ID=76336636
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110531498.4A Active CN112990989B (en) | 2021-05-17 | 2021-05-17 | Value prediction model input data generation method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112990989B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657945A (en) * | 2021-08-27 | 2021-11-16 | 建信基金管理有限责任公司 | User value prediction method, device, electronic equipment and computer storage medium |
CN115455708A (en) * | 2022-09-19 | 2022-12-09 | 贵州航天云网科技有限公司 | Multi-model local modeling method based on vector identity |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111694830A (en) * | 2020-06-12 | 2020-09-22 | 复旦大学 | Missing data completion method based on deep ensemble learning |
CN112241916A (en) * | 2020-10-22 | 2021-01-19 | 北京大学 | Personal credit risk default early warning method, device, equipment and storage medium |
CN112269937A (en) * | 2020-11-16 | 2021-01-26 | 加和(北京)信息科技有限公司 | Method, system and device for calculating user similarity |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657945A (en) * | 2021-08-27 | 2021-11-16 | 建信基金管理有限责任公司 | User value prediction method, device, electronic equipment and computer storage medium |
CN115455708A (en) * | 2022-09-19 | 2022-12-09 | 贵州航天云网科技有限公司 | Multi-model local modeling method based on vector identity |
CN115455708B (en) * | 2022-09-19 | 2023-12-19 | 贵州航天云网科技有限公司 | Multi-model local modeling method based on vector discrimination |
Also Published As
Publication number | Publication date |
---|---|
CN112990989B (en) | 2021-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108876133B (en) | Risk assessment processing method, device, server and medium based on business information | |
CN107066616B (en) | Account processing method and device and electronic equipment | |
CN109711955B (en) | Poor evaluation early warning method and system based on current order and blacklist base establishment method | |
CN112990386B (en) | User value clustering method and device, computer equipment and storage medium | |
CN112132233A (en) | Criminal personnel dangerous behavior prediction method and system based on effective influence factors | |
US11562262B2 (en) | Model variable candidate generation device and method | |
CN109583966A (en) | A kind of high value customer recognition methods, system, equipment and storage medium | |
Kolodiziev et al. | Automatic machine learning algorithms for fraud detection in digital payment systems | |
CN112288279A (en) | Business risk assessment method and device based on natural language processing and linear regression | |
CN114202336A (en) | Risk behavior monitoring method and system in financial scene | |
CN112990989B (en) | Value prediction model input data generation method, device, equipment and medium | |
CN112035775B (en) | User identification method and device based on random forest model and computer equipment | |
CN112487284A (en) | Bank customer portrait generation method, equipment, storage medium and device | |
CN115311042A (en) | Commodity recommendation method and device, computer equipment and storage medium | |
CN111091276A (en) | Enterprise risk scoring method and device, computer equipment and storage medium | |
Khare et al. | AI-Powered Fraud Prevention: A Comprehensive Analysis of Machine Learning Applications in Online Transactions | |
CN114612239A (en) | Stock public opinion monitoring and wind control system based on algorithm, big data and artificial intelligence | |
CN110610378A (en) | Product demand analysis method and device, computer equipment and storage medium | |
CN116821759A (en) | Identification prediction method and device for category labels, processor and electronic equipment | |
CN113706258B (en) | Product recommendation method, device, equipment and storage medium based on combined model | |
CN116739764A (en) | Transaction risk detection method, device, equipment and medium based on machine learning | |
CN116361488A (en) | Method and device for mining risk object based on knowledge graph | |
CN110570301B (en) | Risk identification method, device, equipment and medium | |
CN114022712A (en) | User classification method and device, computer equipment and storage medium | |
CN115293783A (en) | Risk user identification method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||