CN109165691B - Training method and device for model for identifying cheating users and electronic equipment

Info

Publication number: CN109165691B
Application number: CN201811030204.4A
Authority: CN
Inventors: 韩冰; 陈家耀
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2018-09-05
Filing date: 2018-09-05
Publication date: 2022-04-22
Anticipated expiration: 2038-09-05
Also published as: CN109165691A

Abstract

An embodiment of the invention provides a training method and device for a model for identifying cheating users, and electronic equipment. The method comprises the following steps: acquiring and storing user information of a first type of access users; determining, as training samples, the user information that does not conform to a preset rule among the stored user information of the first type of access users, wherein the preset rule is a rule determined by an unsupervised learning algorithm based on stored user information of a second type of access users; training a preset model to be trained based on the training samples; and, when the accuracy of the output result of the model to be trained reaches a preset accuracy, stopping training to obtain an identification model for identifying cheating users. Compared with the prior art, the scheme provided by the embodiment of the invention can improve both the accuracy with which the trained identification model identifies newly appearing cheating users and the recall rate of the identification model.

Description

Training method and device for model for identifying cheating users and electronic equipment

Technical Field

The invention relates to the technical field of computers, and in particular to a training method and device for a model for identifying cheating users, and to electronic equipment.

Background

With the continuous development of internet technology, more and more users choose to publish various types of information over the network, such as videos they have shot, novels they have written, and product advertisements. These users generally want the information they publish to attract more attention, for example a higher video play count, a higher novel read count, or a higher advertisement click-through rate.

However, in some cases this attention is not genuine: among the users accessing the information there may be users that are simulated by cheating applications and do not exist in reality, that is, cheating users. Taking an advertisement as an example, cheating users among its viewers may click or play the advertisement, so that the click count or play count of the advertisement is not genuine.

In order to handle cheating users accordingly, information websites need to identify which of the users accessing the information are cheating users, that is, to perform anti-cheating. In the prior art, the anti-cheating method is generally as follows: cheating user information acquired in advance is labeled and used as sample user information; an identification model for identifying cheating users is trained on the sample user information through a machine learning algorithm; the trained identification model is then used to detect the user information of access users, and the cheating users among the access users are determined according to the detection result.

However, the inventors found that identifying cheating users by the above method has at least the following problem: when sample user information is obtained by manual labeling, only the user information of cheating users of already-discovered types can be labeled. Because the ways of simulating cheating users are updated quickly, the user information of cheating users of new types cannot be labeled manually, so the identification model obtained through training has low identification accuracy for cheating users of new types, and its recall rate is low.

Disclosure of Invention

The embodiment of the invention aims to provide a training method and a training device for a model for identifying cheating users and electronic equipment, so as to improve the identification accuracy of the trained model for identifying the cheating users to the cheating users of new types and the recall rate of the model. The specific technical scheme is as follows:

In a first aspect, an embodiment of the present invention provides a training method for a model for identifying cheating users, where the method includes:

acquiring and storing user information of a first type of access user;

determining user information which does not accord with a preset rule in the stored user information of the first type of visiting users as a training sample, wherein the preset rule is as follows: based on the stored user information of a second type of access users, determining a rule through an unsupervised learning algorithm, wherein the user information of the second type of access users is the user information of the access users which is acquired and stored before the user information of the first type of access users is acquired;

training a preset model to be trained based on the training sample, wherein the model to be trained is a model for identifying whether the first type of visiting user and the second type of visiting user are cheating users;

and when the accuracy of the output result of the model to be trained reaches the preset accuracy, stopping training to obtain the identification model for identifying the cheating user.

As an implementation manner of the embodiment of the present invention, the step of acquiring and storing the user information of the first type of access user includes:

acquiring and storing user information of a first type of access user in a current period;

the step of determining the user information which does not conform to the preset rule in the stored user information of the first type of visiting users as the training sample comprises the following steps:

when the current period is finished, determining user information which does not accord with a preset rule in the user information of the first type of access users stored in the current period as a training sample;

the step of training a preset model to be trained based on the training samples comprises:

adding the training samples into a target sample set, wherein the target sample set is a set of samples used for training a target model at the end of the last period, and the target model is a model used for identifying cheating users in the current period;

and inputting the added target sample set into the target model for training.

As an implementation manner of the embodiment of the present invention, the method further includes: and after entering the next period, storing the user information of the first-class visiting users in the current period, identifying the first-class visiting users through the identification model, and returning to the step of determining the user information which does not accord with the preset rule in the user information of the first-class visiting users stored in the current period as a training sample when the current period is finished.
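The period-by-period cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `PeriodicTrainer` and its members `conforms_to_rule`, `retrain`, `identify`, `on_access`, and `end_period` are hypothetical names introduced here, and the preset rule and the training routine are passed in as plain callables because the text does not fix their form.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

UserInfo = Dict[str, float]
Identifier = Callable[[UserInfo], bool]  # True means "cheating user"


@dataclass
class PeriodicTrainer:
    # Preset rule: returns True when the user information conforms to it.
    conforms_to_rule: Callable[[UserInfo], bool]
    # Training routine: takes the target sample set, returns a new identifier.
    retrain: Callable[[List[UserInfo]], Identifier]
    # Target model used during the current period.
    identify: Identifier
    target_sample_set: List[UserInfo] = field(default_factory=list)
    stored_this_period: List[UserInfo] = field(default_factory=list)

    def on_access(self, user: UserInfo) -> bool:
        """During a period: store the user information and identify the user."""
        self.stored_this_period.append(user)
        return self.identify(user)

    def end_period(self) -> int:
        """At the end of a period: pick rule violators, extend the set, retrain."""
        new_samples = [
            user for user in self.stored_this_period
            if not self.conforms_to_rule(user)
        ]
        self.target_sample_set.extend(new_samples)
        # The retrained model becomes the target model of the next period.
        self.identify = self.retrain(self.target_sample_set)
        self.stored_this_period = []
        return len(new_samples)
```

Calling `on_access` for every request and `end_period` once per period reproduces the loop: the model produced at the end of one period is the one applied throughout the next.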

As an implementation manner of the embodiment of the present invention, before the step of adding the training sample to the target sample set, the method further includes: and determining that the online frequency corresponding to each training sample meets the preset frequency.
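One possible reading of the online-frequency check is sketched below. The text does not define how the online frequency is measured or what "meets" means, so this sketch assumes the frequency is the number of times a user identifier appears in the period's access log and that a candidate qualifies when this count is at least the preset frequency. The function name `filter_by_online_frequency` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def filter_by_online_frequency(
    candidate_ids: List[str],
    access_log: Iterable[str],
    preset_frequency: int,
) -> List[str]:
    """Keep candidates seen at least `preset_frequency` times (assumed criterion)."""
    counts = Counter(access_log)  # user id -> number of accesses in the period
    return [uid for uid in candidate_ids if counts[uid] >= preset_frequency]
```

Only the candidates that pass this filter would then be added to the target sample set.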

As an implementation manner of the embodiment of the present invention, the step of determining, as a training sample, user information that does not meet a preset rule in user information of a first type of visiting user that has been stored in a current period includes:

acquiring user information and an operation log of a first type of access user stored in a current period;

judging whether the corresponding operation log of each first-class access user accords with a preset rule or not;

and if the user information does not accord with the preset rule, determining the user information of the first type of visiting user as a training sample.

As an implementation manner of the embodiment of the present invention, the operation log includes one type of operation data; the step of judging whether the corresponding operation log of each first-class access user accords with a preset rule comprises the following steps:

for each first-type access user, judging whether the operation data conforms to a first-type preset rule corresponding to the type of the operation data;

the step of determining, if the preset rule is not met, the user information of the first type of visiting user as the training sample comprises the following step:

and if the operation data does not accord with a first type preset rule corresponding to the type of the operation data, determining the user information of the first type visiting user as a training sample.

As an implementation manner of the embodiment of the present invention, the operation log includes a plurality of types of operation data; the step of judging whether the corresponding operation log of each first-class access user accords with a preset rule comprises the following steps:

judging whether the operation data accords with a second type preset rule corresponding to the type of the operation data aiming at the operation data of each type in the operation log of each first type access user;

if the operation data does not conform to the second type preset rule, determining the operation data as target operation data;

for each first type of access user, judging whether the quantity of target operation data corresponding to the first type of access user is not less than a preset numerical value;

and if the quantity of the target operation data corresponding to the first type of visiting user is not less than the preset numerical value, determining the user information of the first type of visiting user as a training sample.
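The multi-type check above can be sketched as follows: each type of operation data is tested against its own rule, the violations are counted as target operation data, and the user becomes a training sample when the count is not less than the preset value. This is an illustration only; the function names, the dictionary keys, and the thresholds in `EXAMPLE_RULES` are invented for this sketch and do not come from the text.

```python
from typing import Callable, Dict

Rule = Callable[[float], bool]  # returns True when the value conforms to the rule


def count_target_operation_data(op_log: Dict[str, float], rules: Dict[str, Rule]) -> int:
    """Count the operation data types whose value violates its per-type rule."""
    return sum(
        1
        for data_type, value in op_log.items()
        if data_type in rules and not rules[data_type](value)
    )


def is_training_sample(
    op_log: Dict[str, float], rules: Dict[str, Rule], preset_value: int
) -> bool:
    """True when the number of violations is not less than the preset value."""
    return count_target_operation_data(op_log, rules) >= preset_value


# Illustrative rules; the thresholds and key names are assumptions.
EXAMPLE_RULES: Dict[str, Rule] = {
    "ad_click_rate": lambda v: v <= 0.30,
    "ad_exposure_rate": lambda v: v <= 0.80,
    "night_access_share": lambda v: v <= 0.50,
}
```

The single-type case described earlier corresponds to a `rules` dictionary with one entry and `preset_value=1`.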

As an implementation manner of the embodiment of the present invention, the types of the operation data include: the click rate of the access user on the advertisement, the exposure rate of the access user to the advertisement, the distribution proportion of the access time of the access user, and the click rate proportion of the access user on the advertisements of the same video in different time periods.

As an implementation manner of the embodiment of the present invention, the step of identifying the first type of access user through the identification model includes:

acquiring user information of the first type of access users;

and inputting the user information into the identification model for detection to obtain an identification result of the first type of access users.

As an implementation manner of the embodiment of the present invention, the step of acquiring the user information of the first type of access user includes:

when the next period is finished, acquiring user information of the first-class access users stored in the next period in an off-line state; or,

when an access request sent by a first type of access user is received, user information of the first type of access user is obtained.

As an implementation manner of the embodiment of the present invention, the method further includes: and when the identification result of the first type of access user is a cheating user, shielding the access request of the first type of access user.
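The two acquisition modes and the shielding step can be sketched as follows. This is a sketch under assumptions: the identification model is represented by a callable `identify` that returns True for a cheating user, `serve` is a hypothetical request handler, and returning `None` to signal a shielded request is a choice made here, not something the text specifies.

```python
from typing import Callable, Dict, List, Optional

UserInfo = Dict[str, float]
Identifier = Callable[[UserInfo], bool]  # True means "cheating user"


def identify_offline(stored_users: List[UserInfo], identify: Identifier) -> List[UserInfo]:
    """Offline mode: scan the user information stored during a finished period."""
    return [user for user in stored_users if identify(user)]


def handle_access_request(
    user: UserInfo,
    identify: Identifier,
    serve: Callable[[UserInfo], str],
) -> Optional[str]:
    """Online mode: identify on arrival and shield requests from cheating users."""
    if identify(user):
        return None  # request shielded (None as the signal is an assumption)
    return serve(user)
```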

In a second aspect, an embodiment of the present invention provides a training apparatus for a model for identifying cheating users, where the apparatus includes:

the user information acquisition module is used for acquiring and storing the user information of the first type of access users;

a training sample determining module, configured to determine user information that does not meet a preset rule in stored user information of a first type of visiting user, as a training sample, where the preset rule is: based on the stored user information of a second type of access users, determining a rule through an unsupervised learning algorithm, wherein the user information of the second type of access users is the user information of the access users which is acquired and stored before the user information of the first type of access users is acquired;

the model training module is used for training a preset model to be trained based on the training sample, wherein the model to be trained is a model used for identifying whether the first-class visiting user and the second-class visiting user are cheating users or not;

and the identification model obtaining module is used for stopping training when the accuracy of the output result of the model to be trained reaches a preset accuracy to obtain an identification model for identifying the cheating user.

As an implementation manner of the embodiment of the present invention, the user information obtaining module includes: a user information acquisition submodule, configured to: acquiring and storing user information of a first type of access user in a current period;

the training sample determination module comprises: a training sample determination submodule to: when the current period is finished, determining user information which does not accord with a preset rule in the user information of the first type of access users stored in the current period as a training sample;

the model training module comprises: a sample set adding submodule and a model training submodule; the sample set adding submodule is used for adding the training samples into a target sample set, wherein the target sample set is a set of samples used for training a target model at the end of the last period, and the target model is a model used for identifying cheating users in the current period; and the model training submodule is used for inputting the added target sample set into the target model for training.

As an implementation manner of the embodiment of the present invention, the apparatus further includes: and the information storage and model application module is used for storing the user information of the first type of visiting users in the current period after entering the next period, identifying the first type of visiting users through the identification model and triggering the training sample determination module.

As an implementation manner of the embodiment of the present invention, the apparatus further includes: and the online frequency determining module is used for determining that the online frequency corresponding to each training sample meets a preset frequency before the training samples are added into the target sample set.

As an implementation manner of the embodiment of the present invention, the user information obtaining sub-module includes: the user information acquisition unit is used for acquiring the user information and the operation log of the first type of access users stored in the current period when the current period is finished;

the preset rule judging unit is used for judging whether the corresponding operation log of each first-class access user accords with a preset rule or not, and if not, the training sample determining unit is triggered;

and the training sample determining unit is used for determining the user information of the first type of visiting user as a training sample.

As an implementation manner of the embodiment of the present invention, the operation log includes one type of operation data; the preset rule judging unit is specifically configured to: and judging whether the operation data accords with a first type preset rule corresponding to the type of the operation data or not aiming at each first type access user, and if not, triggering the training sample determining unit.

As an implementation manner of the embodiment of the present invention, the operation log includes a plurality of types of operation data, and the preset rule determining unit includes:

the preset rule judging subunit is used for judging whether the operation data conforms to a second type preset rule corresponding to the type of the operation data or not aiming at the operation data of each type in the operation log of each first type access user, and if not, triggering the data determining subunit;

the data determining subunit is used for determining the operation data as target operation data;

and the preset value judgment subunit is used for judging whether the quantity of the target operation data corresponding to each first-class access user is not less than a preset value or not, and if so, triggering the training sample determination unit.

As an implementation manner of the embodiment of the present invention, the types of the operation data include: the click rate of the access user on the advertisement, the exposure rate of the access user to the advertisement, the distribution proportion of the access time of the access user, and the click rate proportion of the access user on the advertisements of the same video in different time periods.

As an implementation manner of the embodiment of the present invention, the information storage and model application module includes:

the access information acquisition submodule is used for acquiring the user information of the first type of access users;

and the visiting user identification submodule is used for inputting the user information into the identification model for detection to obtain an identification result of the first type of visiting users.

As an implementation manner of the embodiment of the present invention, the access information obtaining sub-module is specifically configured to:

when the next period is finished, acquiring the user information of the first-class access users stored in the next period in an off-line state; or,

and when receiving an access request sent by the first type of access user, acquiring the user information of the first type of access user.

As an implementation manner of the embodiment of the present invention, the apparatus further includes: and the access request shielding module is used for shielding the access request of the first type of access user when the identification result of the first type of access user is the cheating user.

In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the steps of any of the training methods for a model for identifying cheating users provided by the first aspect when executing the program stored in the memory.

In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform any of the above-described training methods for a model for identifying cheating users.

In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to execute any of the above-mentioned training methods for a model for identifying cheating users.

Therefore, in the scheme provided by the embodiment of the invention, the target model is applied in the current period to identify whether an access user is a cheating user, the target model having been obtained by training on the target sample set at the end of the previous period. When the current period ends, user information that can serve as training samples is determined, according to the preset rule and the preset frequency, from the user information of access users stored in the current period; the preset rule is determined by an unsupervised learning algorithm based on user information of access users obtained before the current period. The training samples are added to the target sample set to obtain a new target sample set, which is input into the target model for training. Training stops when the accuracy of the output result of the target model reaches the preset accuracy, yielding a new target model; this new target model is the identification model used to identify whether an access user is a cheating user in the next period. After the next period begins, it becomes the current period: the user information of access users in the current period is stored, the access users are identified through the newly obtained identification model, and when the current period ends the method returns to the step of determining, as training samples, the user information that does not conform to the preset rule among the user information of access users stored in the current period, and then executes the subsequent steps again. The whole scheme is thus carried out cyclically, period by period.

As can be seen from the above, in the scheme provided in the embodiment of the present invention, the user information that does not conform to the preset rule among the stored user information of the first type of access users may be determined according to the preset rule, where the preset rule is determined through unsupervised learning based on the user information of the second type of access users, which is obtained and stored before the user information of the first type of access users is obtained. The user information so determined is user information of cheating users of newly appearing types that cannot be identified by the currently used identification model. The determined user information can then be used as training samples, a preset model to be trained is trained based on the training samples, and when the accuracy of the output result of the model to be trained reaches the preset accuracy, training is stopped to obtain the identification model for identifying cheating users. In this way, since the newly obtained identification model is trained on the user information of cheating users of new types that the currently used identification model cannot identify, the newly obtained identification model can identify cheating users of new types.

In the scheme provided by the embodiment of the invention, the preset rule can be determined based on the user information of the second type of access users through an unsupervised algorithm, so that the user information of the cheating users of the new type can be determined in the stored user information of the first type of access users, and further, a new recognition model is obtained based on the determined user information through training, so that the cheating users of the new type can be recognized by the new recognition model. Therefore, the user information of the cheating users of the new type is marked through the preset rules, the phenomenon that the cheating users of the new type cannot be marked by a manual marking method is avoided, and the identification accuracy of the new identification model obtained through training on the cheating users of the new type and the recall rate of the identification model are improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

FIG. 1 is a schematic flowchart of a training method for a model for identifying cheating users according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a specific implementation of determining, among the user information of access users stored in the current period, the user information that does not conform to a preset rule, according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a training apparatus for a model for identifying cheating users according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

In the prior art, the inventor finds that the existing method at least has the following problems in the process of identifying the cheating user: in the process of obtaining the sample user information by the manual marking method, only the user information of the cheating users of the found type can be marked, and due to the fact that the mode for simulating the cheating users is updated quickly, the user information of the cheating users of the new type cannot be marked by the manual marking method, so that the recognition accuracy of the recognition model obtained through training on the cheating users of the new type is low, and the recall rate of the recognition model is low.

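In order to solve the problems in the prior art, an embodiment of the present invention provides a training method for a model for identifying cheating users, where the method includes:

acquiring and storing user information of a first type of access user;

determining user information which does not accord with a preset rule in the stored user information of the first type of visiting users as a training sample, wherein the preset rule is: a rule determined through an unsupervised learning algorithm based on stored user information of a second type of access users;

training a preset model to be trained based on the training sample;

and when the accuracy of the output result of the model to be trained reaches the preset accuracy, stopping training to obtain the identification model for identifying the cheating user.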

As can be seen from the above, in the scheme provided in the embodiment of the present invention, the unsupervised algorithm may determine the preset rule based on the user information of the second type of access user, so that the user information of a cheating user of a new type may be determined from the stored user information of the first type of access user, and further, a new recognition model may be obtained based on the user information training determined, so that the new recognition model may recognize the cheating user of the new type. Therefore, the user information of the cheating users of the new type is marked through the preset rules, the phenomenon that the cheating users of the new type cannot be marked by a manual marking method is avoided, and the identification accuracy of the new identification model obtained through training on the cheating users of the new type and the recall rate of the identification model are improved.

The training method for a model for identifying cheating users provided by the embodiment of the invention can be applied to any electronic device, such as a processor, a computer, or a server, which is not specifically limited herein; it is hereinafter referred to as the electronic device.

First, a training method for a model for identifying cheating users according to an embodiment of the present invention is described below.

FIG. 1 is a schematic flowchart of a training method for a model for identifying cheating users according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

S101: acquiring and storing user information of a first type of access user;

S102: determining user information which does not accord with a preset rule in the stored user information of the first type of visiting users as a training sample;

wherein the preset rule is: a rule determined through an unsupervised learning algorithm based on the stored user information of the second type of access users, and the user information of the second type of access users is the user information of access users obtained and stored before the user information of the first type of access users is obtained;

S103: training a preset model to be trained based on the training sample;

wherein the model to be trained is used for identifying whether the first-class visiting user and the second-class visiting user are cheating users;

S104: and when the accuracy of the output result of the model to be trained reaches the preset accuracy, stopping training to obtain the identification model for identifying the cheating user.
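Steps S102 to S104 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the perceptron stands in for the "model to be trained" (the text does not name a model type), and `select_training_samples`, `train_until_accurate`, `conforms`, and `max_epochs` are hypothetical names introduced here. Labeling rule-violating records as cheating (label 1), mixing them with previously labeled records, and measuring accuracy on the training samples themselves are likewise assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# (feature vector, label); label 1 = cheating user, 0 = normal user (assumed encoding)
Sample = Tuple[Sequence[float], int]


class Perceptron:
    """Stand-in for the model to be trained; the model type is an assumption."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: Sequence[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def fit_one_epoch(self, samples: List[Sample]) -> None:
        # Classic perceptron update: adjust weights only on misclassified samples.
        for x, y in samples:
            err = y - self.predict(x)
            if err:
                self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b += self.lr * err

    def accuracy(self, samples: List[Sample]) -> float:
        return sum(self.predict(x) == y for x, y in samples) / len(samples)


def select_training_samples(
    stored: List[Sequence[float]],
    conforms: Callable[[Sequence[float]], bool],
) -> List[Sample]:
    """S102: keep the records that violate the preset rule, labeled as cheating."""
    return [(x, 1) for x in stored if not conforms(x)]


def train_until_accurate(
    model: Perceptron,
    samples: List[Sample],
    preset_accuracy: float,
    max_epochs: int = 1000,
) -> bool:
    """S103-S104: train, and stop as soon as the preset accuracy is reached."""
    for _ in range(max_epochs):
        if model.accuracy(samples) >= preset_accuracy:
            return True
        model.fit_one_epoch(samples)
    return model.accuracy(samples) >= preset_accuracy
```

The `max_epochs` cap is a safeguard added in this sketch so that training terminates even if the preset accuracy is never reached; the return value reports whether it was.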

It should be noted that, before the step S101, the electronic device receives an access request of a second type of access user, and identifies whether the second type of access user is a cheating user by using a currently used identification model for identifying the cheating user, where the currently used identification model for identifying the cheating user is a preset model to be trained, and the model to be trained is also used for identifying whether the first type of access user is a cheating user.

As can be appreciated, the model to be trained was trained on manually labeled user information of cheating users of already-discovered types, so the electronic device can identify all cheating users of the already-discovered types among the second type of access users. Consequently, the user information of the second type of access users stored by the electronic device may contain no user information of cheating users.

However, because the ways of simulating cheating users are updated quickly, the first type of access users may include cheating users of new types by the time the electronic device receives their access requests. If the model to be trained continues to be used to identify the first type of access users, not all cheating users may be identified.

That is, in the above step S101, there may be new types of cheating users in the user information of the first type of access user acquired and stored by the electronic device, and these cheating users cannot be identified by the currently used identification model.

In order to accurately identify cheating users of newly appearing types when subsequent access users are identified, the electronic device can use the user information of these unidentifiable cheating users as training samples and train the model to be trained on them, thereby obtaining the identification model to be used for identifying cheating users from then on. During training, the identification model learns the characteristics of the user information of the cheating users of newly appearing types among the first type of access users, so that when access requests from new access users are later received, the electronic device can accurately identify the various types of cheating users that currently exist.

In order to obtain the training sample, in step S102, the electronic device may determine user information that does not conform to the preset rule in the stored user information of the first type of visiting user, and obviously, the user information that does not conform to the preset rule is user information of a new type of cheating user, so that the user information that does not conform to the preset rule may be used as the training sample.

The preset rule is determined by an unsupervised learning algorithm based on the stored user information of the second type of access users. It will be appreciated that the user information for the second type of access user is obtained and stored prior to obtaining the user information for the first type of access user.

It should be noted that the user information of the cheating user may not exist in the second type of access user, and the preset rule may be understood as being determined based on the characteristics of the user information of the real access user, so that when the user information of a certain first type of access user does not conform to the preset rule, it may be determined that the first type of access user may be a new cheating user. Then, the user information of the first type accessing user can be used as a training sample.

Specifically, the user information of an access user may include multiple types of information. After obtaining the stored user information of the second type of access users, the electronic device may take one type of information contained in all of that user information and, through an unsupervised learning algorithm, divide the user information into a plurality of information groups according to how similar that type of information is across the second type of access users.

For example, the electronic device may divide the user information of all the second type of access users into a plurality of groups according to each user's advertisement click rate, such that within each group the click rates of any two users differ by less than a preset difference value.
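As a non-limiting illustration, the click-rate grouping in the example above may be sketched as follows. This is a minimal sketch under stated assumptions: the function name `group_by_click_rate`, the user identifiers, and the threshold value are hypothetical, and the simple sorted one-dimensional grouping merely stands in for whichever unsupervised learning algorithm is actually used.

```python
def group_by_click_rate(click_rates, max_diff):
    """Group users so that any two click rates in one group differ by less than max_diff.

    click_rates maps a (hypothetical) user id to that user's advertisement click rate.
    """
    groups = []
    # Walk the users in ascending click-rate order.
    for user, rate in sorted(click_rates.items(), key=lambda kv: kv[1]):
        # Compare against the smallest rate in the current group, so the whole
        # group spans less than max_diff.
        if groups and rate - groups[-1][0][1] < max_diff:
            groups[-1].append((user, rate))
        else:
            groups.append([(user, rate)])
    return [[user for user, _ in group] for group in groups]


# Hypothetical click rates of second type of access users.
rates = {"u1": 0.0199, "u2": 0.0201, "u3": 0.0500, "u4": 0.0505, "u5": 0.0200}
groups = group_by_click_rate(rates, max_diff=0.001)
print(groups)
```

A first type of access user whose click rate fits none of the resulting groups would then be treated as an outlier in the sense described below.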

When receiving an access request from a first type of access user, the electronic device may identify that user with the model to be trained and determine whether the user is a cheating user, so that access requests from users determined to be cheating users can be blocked. As a result, the stored user information of the first type of access users does not include the user information of the cheating users identified in this way.

That is, in theory, in step S102 the electronic device should not find any outlier in the stored user information of the first type of access users according to the preset rule. An outlier is defined as follows: if the user information of one or more first type of access users cannot be classified into any of the information groups obtained from the user information of the second type of access users, that user information is an outlier.

Once the electronic device finds an outlier, this indicates that a new way of simulating cheating users may have appeared while the access requests of the first type of access users were being received, producing user information of cheating users with new characteristics that the model to be trained cannot identify. That is, the user information corresponding to the outlier is likely to be the user information of an access user that the model to be trained cannot identify, and the electronic device may therefore use the user information corresponding to the outlier as the training sample.

Therefore, in step S102, the electronic device may obtain, according to the preset rule, user information that does not conform to the preset rule from the stored user information of the first type of visiting user, and the user information that does not conform to the preset rule may be used as a training sample. That is, the electronic device may determine the outliers in the stored user information of the first type of access user according to the preset rule.

In this embodiment of the present invention, the unsupervised learning algorithm in step S102 may be any unsupervised learning algorithm capable of dividing the stored user information of the second type of access users into a plurality of groups according to the similarity of one type of information contained in the user information of all the second type of access users. For example, the k-means clustering algorithm, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, which is a representative density-based clustering algorithm, or the iForest (Isolation Forest) algorithm may be used, which is not specifically limited herein.

In the following, the preset rule is described by taking a k-means clustering algorithm as an example:

In general, the user information of the second type of access users includes access times. In this case, the stored user information of the second type of access users may be clustered with the k-means clustering algorithm according to how similar the users' access-time proportion distributions are, thereby obtaining a plurality of clusters.

It can be understood that the distance between the user information of each second type visiting user in each cluster and the cluster center of the cluster meets a preset similarity threshold, that is, the difference between the visiting time proportion distribution of each second type visiting user in each cluster and the visiting time proportion distribution of the second type visiting user corresponding to the cluster center of the cluster is within a preset range. The preset range can be set according to the identification accuracy requirement of the cheating user in practical application, when the identification accuracy requirement is high, the preset range can be smaller, and otherwise, the preset range can be larger.

It should be noted that the clusters obtained by the k-means clustering algorithm reflect the access-time proportion distributions of normal users. When the electronic device finds an outlier in the stored user information of the first type of access users according to the preset rule, the corresponding first type of access user is very likely a cheating user rather than a normal user, and the user information corresponding to the outlier can therefore be used as a training sample.
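As a non-limiting illustration, the k-means based rule described above may be sketched as follows. This is a minimal sketch under stated assumptions: each user is represented by a hypothetical four-element vector giving the share of that user's accesses in the morning, afternoon, evening, and night; the first k points are used as initial cluster centers purely for simplicity; and `max_distance` plays the role of the preset range. None of these names or values comes from the embodiment itself.

```python
import math


def dist(a, b):
    """Euclidean distance between two access-time proportion vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centers (a simplification)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def is_outlier(point, centers, max_distance):
    """A point is an outlier if it lies outside the preset range of every cluster."""
    return all(dist(point, c) > max_distance for c in centers)


# Hypothetical second type of access users (assumed to be normal users).
second_type = [
    [0.40, 0.40, 0.20, 0.00],
    [0.10, 0.20, 0.60, 0.10],
    [0.45, 0.35, 0.20, 0.00],
    [0.10, 0.15, 0.65, 0.10],
]
centers = kmeans(second_type, k=2)

# Hypothetical first type of access users checked against the preset rule.
daytime_user = [0.42, 0.38, 0.20, 0.00]
night_only_user = [0.00, 0.00, 0.00, 1.00]
print(is_outlier(daytime_user, centers, max_distance=0.15))
print(is_outlier(night_only_user, centers, max_distance=0.15))
```

In this sketch only the user whose accesses all fall at night lies outside every cluster, so only that user's information would be taken as a training sample.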

When the electronic device is a server, the user information that does not conform to the preset rule may be determined from the user information of the first type of access users that the server itself stores for each access request when it receives that request.

When the electronic device is a non-server electronic device such as a processor, a computer and the like, the electronic device can establish communication connection with the server, send a user information acquisition request to the server to request to acquire user information which does not accord with a preset rule in user information of a first type of access user stored in the server, and receive the user information which does not accord with the preset rule in the stored user information of the first type of access user and is sent by the server when the server responds to the user information acquisition request.

Alternatively, when the electronic device is a non-server electronic device such as a processor or a computer, the electronic device can establish a communication connection with the server, and the server can actively send the user information that does not conform to the preset rule among the stored user information of the first type of access users to the electronic device.

This application does not specifically limit the manner in which the electronic device determines the user information that does not conform to the preset rule among the stored user information of the first type of access users.

After obtaining the training samples, the electronic device may execute the above steps S103 to S104, train a preset model to be trained based on the training samples, and stop the training when the accuracy of the output result of the model to be trained reaches the preset accuracy, to obtain an identification model for identifying the cheating user.

In the training process, the model to be trained can learn the characteristics of the training samples, that is, the model to be trained can learn the characteristics of the user information of each new type of cheating user in the first type of visiting users. Through the learning of a large number of training samples, the model to be trained can match the characteristics of the input user information with the characteristics of the learned user information of various types of cheating users, so that the access user corresponding to the input user information is identified, whether the access user is a cheating user is determined, and then the identification model for identifying the cheating user is obtained.

The characteristics of the user information of the various types of cheating users may include: the characteristics of the user information of each type of cheating user that the model to be trained had already learned before being trained on the training samples, and the characteristics of the user information of each new type of cheating user among the first type of access users.

After the model to be trained is trained based on the training samples to obtain the recognition model for recognizing cheating users, the recognition model can be used to predict the training samples and obtain the output results of the recognition model. Then, whether the prediction result of the recognition model for each training sample is correct can be judged and the accuracy rate calculated, which gives the accuracy of the output results of the recognition model.

For example, if the number of the training samples is 200, and the prediction results of 194 training samples are correct, the accuracy rate may be calculated to be 97%, that is, the accuracy rate of the output result of the recognition model for recognizing the cheating user is 97%.
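As a non-limiting illustration, the accuracy calculation in the example above may be sketched as follows. The function name `accuracy` and the label encoding (1 for a cheating user, 0 otherwise) are assumptions made only for this sketch.

```python
def accuracy(predictions, labels):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# 200 training samples, all labeled as cheating users (1);
# the model predicts 194 of them correctly.
labels = [1] * 200
predictions = [1] * 194 + [0] * 6
acc = accuracy(predictions, labels)
print(f"{acc:.0%}")
```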

When the output accuracy of the obtained recognition model for recognizing the cheating user reaches the preset accuracy, the training can be stopped to obtain the recognition model which is finally trained, and then the recognition model can be used for recognizing the following visiting user to determine whether the visiting user is the cheating user.

The preset accuracy can be determined according to the requirement on the identification accuracy of the cheating user in practical application, and when the requirement on the identification accuracy of the cheating user is high, the preset accuracy can be high.

It should be noted that the output accuracy of the recognition model obtained by training on the training samples is generally not lower than that of the model to be trained, so the accuracy with which the trained recognition model recognizes cheating users is not lower than that of the model to be trained, and the recognition accuracy for cheating users is therefore not reduced.

Therefore, in one case, when the output accuracy of the recognition model obtained based on the training sample is always smaller than the output accuracy of the model to be trained after multiple iterations, it may be considered that the representativeness of the training sample determined in step S102 by the electronic device is insufficient, or the obtained user information that does not conform to the preset rule cannot be used as the training sample, and therefore, in this case, the model to be trained may be continuously used for recognizing the cheating user, so as to ensure that the accuracy of the recognition of the cheating user is not reduced.

In another case, when the number of training samples determined by the electronic device in step S102 is zero, no new way of simulating cheating users has appeared among the first type of access users, and the model to be trained can therefore continue to be used for identifying cheating users.
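As a non-limiting illustration, the stopping condition and the two fallback cases described above may be sketched as follows. This is a minimal sketch under stated assumptions: `update_model`, `train_one_round`, `evaluate`, and `max_rounds` are hypothetical names, the toy model is a single click threshold rather than a real classifier, and falling back to the existing model whenever the preset accuracy is not reached within `max_rounds` is a simplification of the case described in the text.

```python
def update_model(current_model, samples, train_one_round, evaluate,
                 preset_accuracy, max_rounds=20):
    """Train until the preset accuracy is reached, or keep the current model."""
    if not samples:
        # No training samples: no new way of simulating cheating users appeared.
        return current_model
    candidate = current_model
    for _ in range(max_rounds):
        candidate = train_one_round(candidate, samples)
        if evaluate(candidate, samples) >= preset_accuracy:
            return candidate  # stop training: preset accuracy reached
    # Preset accuracy never reached: continue using the existing model.
    return current_model


# Toy stand-ins (assumptions): the "model" is a threshold on clicks per minute,
# and a user is predicted to be a cheating user when value >= threshold.
def evaluate(threshold, samples):
    correct = sum(1 for value, label in samples
                  if (1 if value >= threshold else 0) == label)
    return correct / len(samples)


def train_one_round(threshold, samples):
    return threshold - 10  # each round lowers the threshold by a fixed step


samples = [(30, 1), (35, 1), (40, 1), (5, 0)]
new_model = update_model(50, samples, train_one_round, evaluate, preset_accuracy=0.95)
unchanged = update_model(50, [], train_one_round, evaluate, preset_accuracy=0.95)
print(new_model, unchanged)
```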

It should be noted that, the electronic device may periodically update the identification model applied in the current period for identifying the cheating user, and for convenience of distinguishing from the above identification model, the model for identifying the cheating user in the current period is referred to as a target model. Then, the set of samples used to train the target model at the end of the last cycle may be referred to as a target sample set.

It can then be understood that, at the end of the previous period, the electronic device may train on the target sample set to obtain the target model and, in the current period, use the target model to identify access users and determine whether they are cheating users. The electronic device can thus identify cheating users of the same types as those included in the target sample set. However, because the ways of simulating cheating users are updated quickly, cheating users of types different from those included in the target sample set may appear in the current period, and these cheating users cannot be identified by the target model used in the current period. That is, the user information of access users stored by the electronic device in the current period includes the user information of cheating users that the target model used in the current period cannot identify.

In order to accurately identify the cheating users which cannot be identified by the target model used in the current period in the next period, the electronic equipment can take the user information of the cheating users which cannot be identified as training samples, add the training samples to the target sample set, and train the target model again by using the target sample set to which the training samples are added, so that the identification model used for identifying the cheating users in the next period is obtained, the identification model can learn the characteristics of the user information of the cheating users which cannot be identified in the training process, and the cheating users of various types currently existing in the next period can be accurately identified.

Then, in a first specific implementation manner provided in the embodiment of the present invention:

In step S101, acquiring and storing the user information of the first type of access users may include: acquiring and storing the user information of the first type of access users in the current period;

In step S102, determining the user information that does not conform to the preset rule among the stored user information of the first type of access users as the training sample may include: when the current period ends, determining the user information that does not conform to the preset rule among the user information of the first type of access users stored in the current period as the training sample;

In step S103, training the preset model to be trained based on the training sample may include: adding the training samples to a target sample set, wherein the target sample set is the set of samples used for training a target model at the end of the previous period, and the target model is the model used for identifying cheating users in the current period; and inputting the target sample set to which the training samples have been added into the target model for training.

Specifically, in this implementation manner, in order to obtain the training sample, the electronic device may determine, by analyzing the user information of the visiting user that has been stored in the current period, user information that does not meet a preset rule in the user information of the visiting user that has been stored in the current period. Obviously, the user information that does not conform to the preset rule is the user information of the cheating user that cannot be identified by the target model in the current period, and then the user information that does not conform to the preset rule can be the user information of the cheating user of a new type in the current period. Therefore, the user information that does not conform to the preset rule can be used as the training sample.

The preset rule may be a rule determined by an unsupervised learning algorithm based on user information of the visiting user obtained before the current period.

Specifically, the electronic device may determine, in multiple ways, user information that does not meet a preset rule in the user information of the first type of access user that is stored in the current period, and use the user information as a training sample. For clarity, the following description will exemplify a manner in which the electronic device determines, as a training sample, user information that does not meet a preset rule in the user information of the first type of visiting user that is stored in the current period.

After determining the training samples, the electronic device may add the training samples to the target sample set. As described above, the target sample set is the set of samples used for training the target model at the end of the previous period, and the trained target model is used in the current period to identify access users and determine whether they are cheating users.

After the training samples are added to the target sample set, the electronic device may input the added target sample set into the target model for training, and when the accuracy of the output result of the target model reaches a preset accuracy, stop training to obtain the recognition model for recognizing the cheating user.

In the training process, the target model can learn the characteristics of the user information in the added target sample set, that is, the target model can learn the characteristics of the user information of various types of cheating users appearing in the current period. Through the learning of a large number of training samples in the added target sample set, the target model can match the characteristics of the input user information with the characteristics of the learned user information of various types of cheating users, so that the access user corresponding to the input user information is identified, whether the access user is the cheating user or not is determined, and then the identification model for identifying the cheating user is obtained.

After the added target sample set is input into the target model and trained to obtain the target model, the training samples in the added target sample set can be predicted by using the trained target model to obtain an output result of the target model. Then, whether the prediction result of the target model on each training sample in the added target sample set is correct or not can be judged, the accuracy rate is calculated, and the accuracy rate of the output result of the recognition model is further obtained. When the output accuracy of the target model reaches the preset accuracy, the training can be stopped, and the recognition model for recognizing the cheating user is obtained.

It can be understood that periodically expanding the number and types of samples in the target sample set increases the types of cheating users covered by the target sample set, so that the identification model for identifying cheating users can always quickly identify new types of cheating users, which ensures the identification accuracy and the recall rate of the identification model.

The identification model obtained in the first specific implementation manner may then serve as the identification model for identifying cheating users in the next period. That is, in the next period, access users may be identified using the identification model obtained in the above implementation manner to determine whether they are cheating users.

Therefore, on the basis of the first specific implementation manner, a second specific implementation manner provided as an embodiment of the present invention may further include the following steps:

Step A1: after entering the next period, storing the user information of the first type of access users in that period, identifying the first type of access users through the identification model, and returning to the step of determining, when the current period ends, the user information that does not conform to the preset rule among the user information of the first type of access users stored in the current period as the training sample.

That is, after entering the next period, the electronic device may identify access users using the obtained identification model, determine whether each access user is a cheating user, and store the user information of the access users that are not determined to be cheating users. When that period ends, the electronic device may return to the step of determining, among the user information of the first type of access users stored in that period, the user information that does not conform to the preset rule. In this way, the user information of cheating users that newly appeared in the period and could not be identified by the identification model is obtained as training samples and the target sample set is further expanded, so that the identification model trained on the expanded target sample set can identify those newly appearing cheating users in the following period. The identification accuracy of the identification model for various types of cheating users and the recall rate of the identification model can thereby be improved.
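As a non-limiting illustration, the period-by-period update of the target sample set and the model described in the first and second specific implementation manners may be sketched as follows. This is a minimal sketch under stated assumptions: `run_periods`, `train`, `is_cheater`, and `conforms_to_rule` are hypothetical names; the toy model simply memorizes a hypothetical `agent` field; and the preset rule is reduced to a fixed click-rate range. A real embodiment would use a trained classifier and a rule derived by unsupervised learning.

```python
def run_periods(periods, target_samples, train, is_cheater, conforms_to_rule):
    """Retrain once per period on the target sample set extended with new outliers."""
    model = train(target_samples)
    for requests in periods:
        # During the period: block identified cheating users, store the rest.
        stored = [user for user in requests if not is_cheater(model, user)]
        # At the end of the period: stored users that break the rule become samples.
        new_samples = [user for user in stored if not conforms_to_rule(user)]
        target_samples = target_samples + new_samples
        model = train(target_samples)  # model used in the next period
    return model, target_samples


# Toy stand-ins (assumptions made only for this sketch).
def train(samples):
    return {sample["agent"] for sample in samples}


def is_cheater(model, user):
    return user["agent"] in model


def conforms_to_rule(user):
    return 0.0 <= user["click_rate"] <= 0.05


periods = [
    [{"agent": "bot-a", "click_rate": 0.9},
     {"agent": "browser", "click_rate": 0.02},
     {"agent": "bot-b", "click_rate": 0.8}],
    [{"agent": "bot-b", "click_rate": 0.8},
     {"agent": "bot-c", "click_rate": 0.7}],
]
initial_samples = [{"agent": "bot-a", "click_rate": 0.9}]
model, samples = run_periods(periods, initial_samples, train, is_cheater, conforms_to_rule)
print(sorted(model), len(samples))
```

In this sketch `bot-b` is missed in the first period, added to the target sample set at the end of that period, and blocked in the second period, which mirrors the intended effect of the periodic update.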

It can be understood that, in many cases, when the electronic device determines the user information that does not conform to the preset rule among the user information of the first type of access users stored in the current period, the user information of a real user among that stored user information may be determined to be an outlier.

For example, assume that, after the access times of access users are grouped with an unsupervised learning algorithm based on the stored user information of the second type of access users, the access times of real users are determined to be six a.m. to twelve a.m. every day. On a certain day in the current period, a certain real user needs to send an access request at two o'clock in the morning for work reasons. Because this user is a real user, the preset model to be trained does not identify the user as a cheating user, and the user information of this user is therefore stored in the current period as part of the user information of the first type of access users. When the user information that does not conform to the preset rule is then determined among the user information of the first type of access users stored in the current period, the user information of this user becomes an outlier, so the electronic device regards it as the user information of a newly appearing cheating user and uses it as a training sample.

Obviously, if, when the current period ends, all the user information determined not to conform to the preset rule among the user information of the first type of access users stored in the current period is used as training samples for training the preset model to be trained, user information produced by the occasional behavior of real users may be misjudged as training samples, and the finally obtained recognition model for recognizing cheating users will have low recognition accuracy.

Therefore, user information produced by the occasional behavior of real users should be prevented from being misjudged as training samples, which improves the accuracy of the obtained training samples and ensures that the finally obtained recognition model for recognizing cheating users has higher recognition accuracy.

On the basis of the first implementation manner provided by the embodiment of the present invention, as a third implementation manner provided by the embodiment of the present invention, before the step of adding the training samples to the target sample set, the electronic device may determine that the online frequency corresponding to each training sample satisfies the preset frequency.

Specifically, after the training samples are obtained, the electronic device can judge whether the online frequency corresponding to each training sample meets the preset frequency, and delete the training samples of which the online frequency does not meet the preset frequency, so that the determined online frequency of each training sample meets the preset frequency. Furthermore, the electronic device may add the training samples with the determined online frequency satisfying the preset frequency to the target sample set.

The online frequency can be understood as follows: the preset period is divided into a plurality of time periods of equal length, and for each training sample the number of time periods in which that training sample appears is counted; this number is the online frequency of the training sample.

For example, assume that the current period lasts ten days. With a preset duration of 24 hours, the current period can be divided into ten time periods, numbered 1 to 10. Let the preset frequency be 6, and let a training sample be determined to meet the preset frequency when its online frequency is not less than the preset frequency.

If statistics show that training sample A appears in the 1st, 2nd, 3rd, 4th, 5th, 7th, 8th, and 9th time periods, the number of time periods in which training sample A appears is 8, that is, the online frequency of training sample A is 8. Since 8 > 6, the electronic device determines that the online frequency of training sample A meets the preset frequency, and training sample A can then be added to the target sample set.
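As a non-limiting illustration, the online-frequency check in the example above may be sketched as follows. The function names `online_frequency` and `filter_by_online_frequency`, and the second sample B, are assumptions introduced only for this sketch; sample A uses the time periods from the example.

```python
def online_frequency(active_periods):
    """Number of distinct time periods in which a training sample appears."""
    return len(set(active_periods))


def filter_by_online_frequency(appearances, preset_frequency):
    """Keep the samples whose online frequency is not less than the preset frequency."""
    return [sample for sample, periods in appearances.items()
            if online_frequency(periods) >= preset_frequency]


# Ten one-day time periods numbered 1 to 10; the preset frequency is 6.
appearances = {
    "A": [1, 2, 3, 4, 5, 7, 8, 9],  # appears in 8 time periods
    "B": [2, 2, 6],                 # hypothetical sample seen in only 2 time periods
}
kept = filter_by_online_frequency(appearances, preset_frequency=6)
print(online_frequency(appearances["A"]), kept)
```

Sample B, which appears in only two time periods, resembles the occasional behavior of a real user and is deleted rather than added to the target sample set.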

It should be noted that the foregoing example is only one implementation manner of the embodiment of the present invention, and the embodiment of the present invention does not limit the online frequency of the training sample and the specific content of the preset frequency, nor limits the specific manner for determining that the online frequency of the training sample satisfies the preset frequency.

When the electronic device determines that the online frequency corresponding to a certain training sample does not meet the preset frequency, the electronic device may continue to determine whether the online frequency corresponding to the next training sample meets the preset frequency.

It should be noted that the electronic device may check the training samples one by one, determining for each in turn whether its online frequency meets the preset frequency and adding those that do to the target sample set, or it may check all the training samples at the same time and add those whose online frequency meets the preset frequency to the target sample set. Either manner is reasonable.

Next, a manner in which the electronic device determines, as training samples, the user information that does not conform to the preset rule among the user information of the first type of access users stored in the current period is described by way of example.

Specifically, as shown in fig. 2, the method may include the following steps:

S201: acquiring user information and an operation log of a first type of access user stored in a current period;

The user information of a first type of access user may include multiple types of information that identify the characteristics of that user, such as the user IP, the user ID, and browser-related information such as the browser type and cookies. The operation log may include multiple types of operation data. The operation data may be data identifying the online state of the first type of access user, for example, data identifying the user's online time or online duration, or it may be statistics on the various operations the user performs while online, for example, the user's click rate on various types of resources while online.

Generally, when receiving an access request, a server may store the user information of the first type of access user corresponding to the access request and, for each such user, track the various operations the user performs while online, so as to store the various types of operation data of that user in an operation log. That is, after receiving an access request, the server may store the user information and the operation log of the first type of access user corresponding to the access request. The operation log may include one type of operation data or multiple types of operation data, which is not specifically limited in this application.

It should be noted that, when the electronic device is a server, the user information and the operation log of the first type of access user that have been stored in the current period may be the user information and the operation log of the first type of access user that are stored in the current period when the server receives an access request.

When the electronic device is a non-server electronic device such as a processor, a computer and the like, the electronic device can establish communication connection with the server, and when the current period is finished, the electronic device can send an information acquisition request to the server to request to acquire user information and operation logs of a first type of access user stored in the server in the current period, and when the server responds to the information acquisition request, the user information and the operation logs of the first type of access user stored in the server in the current period are received.

Alternatively, when the electronic device is a non-server electronic device such as a processor or a computer, the electronic device may establish a communication connection with the server, and the server may actively send the user information and the operation logs of the first type of access users stored in the current period to the electronic device when the current period ends.

This application does not specifically limit the manner in which the electronic device acquires the user information and the operation logs of the first type of access users stored in the current period.

S202: for each first-class access user, judging whether the corresponding operation log meets a preset rule, if the operation log corresponding to the first-class access user does not meet the preset rule, executing step S203;

After the user information and the operation logs of the first type of access users stored in the current period are obtained, for each first type of access user, the electronic device may determine whether the operation log corresponding to that user conforms to the preset rule, and if it does not, the electronic device may continue to execute step S203.

In general, when the first type of access user is a real user, the numerical value of each type of operation data of the first type of access user stored by the server is usually within a relatively determined numerical value range, or is usually a relatively determined numerical value, so that by using an unsupervised learning algorithm, when the user information of the first type of access user is grouped according to the similarity of each type of operation data, the user information of the first type of access user can be divided into a plurality of groups, and no outlier exists.

For example, when the first type of access users are real users, the advertisement click rate of users sharing the same user IP is usually within the range [1.98%, 2.02%]. As another example, when the first type of access users are real users, the access traffic of the pre-roll advertisement, the mid-roll advertisement, and the pause advertisement in a certain television series is usually in the ratio 1:7:2.

Therefore, the preset condition may be that the numerical value of each type of operation data in the operation log does not conform to the numerical range of each type of operation data corresponding to the real user, or the numerical value of each type of operation data is greatly different from the numerical value of each type of operation data corresponding to the real user, so that when the operation log corresponding to the first type of access user is far different from the operation log of the real user, the user information of the first type of access user may become an outlier due to the fact that the operation log does not conform to the preset rule, and therefore, the probability that the first type of access user is a cheating user is high, and then step S203 may be executed.

If the electronic device performs the determination result in the step S202 that the operation log corresponding to the first type of access user meets the preset rule, the electronic device may continue to determine whether the operation log corresponding to the next first type of access user meets the preset rule.

It should be noted that it is reasonable that the electronic device may sequentially determine whether the operation log corresponding to each first-type access user meets the preset rule, or may simultaneously determine whether the operation logs corresponding to all the first-type access users meet the preset rule.

S203: determining the user information of the first type of visiting user as a training sample;

When the operation log corresponding to a first-type access user does not conform to the preset rule, the values of the various types of operation data in that operation log differ from those in the operation logs of real users, so the probability that the first-type access user is a cheating user is high. Since the target model did not identify this first-type access user as a cheating user, the user is very likely a new type of cheating user that appeared in the current period, and the user information of this user can be added to the target sample set as a training sample. Accordingly, the electronic device may determine the user information of the first-type access user as a training sample.

It should be noted that the operation log that the server stores for a first-type access user upon receiving an access request generally includes all of the operation data of that user. In the embodiment of the present invention, however, the operation log of the first-type access user stored in the current period and acquired in step S201 need not include all of that operation data; the electronic device may acquire one or more types of operation data according to the requirements of the actual application.

In general, different types of operation data have different functions for determining whether the user information corresponding to the operation data is a training sample, and therefore, different types of operation data have different weight values for determining whether the user information corresponding to the operation data is a training sample, and the larger the function of a certain type of operation data is, the higher the weight value of the certain type of operation data can be.

For example, assume that the advertisement click rate of the first-type access user carries a weight of 80% in judging whether the corresponding user information is a training sample. This high weight indicates that the advertisement click rate contributes greatly to the judgment and may be decisive, so the operation log obtained when the electronic device performs step S201 may include only the advertisement click rate of the first-type access user.

As another example, assume that the advertisement click rate of the first-type access user, the access time distribution ratio of the first-type access user, and the click rate ratio of the first-type access user for advertisements in different time periods of the same video carry weights of 40%, 30%, and 30%, respectively, in judging whether the corresponding user information is a training sample. The three types of operation data then contribute about equally and none is decisive, so the operation log obtained when the electronic device executes step S201 may include all three types of operation data.

When the number of types of operation data included in the operation log of the first-class access user, which is stored in the current period and acquired by the electronic device when the step S201 is performed, is different, the preset rule adopted by the electronic device when the step S202 is performed may be different. Next, a specific manner in which the electronic device executes the step S202 when the operation log includes one type of operation data and a plurality of types of operation data respectively is illustrated.

In one implementation, the operation log may include a type of operation data.

The step S202 of determining whether the corresponding operation log of each first-class access user meets the preset rule may include:

for each first-type access user, judging whether the operation data conforms to a first-type preset rule corresponding to the type of the operation data, and if the operation data does not conform to the first-type preset rule corresponding to the type of the operation data, executing the step S203;

after obtaining the user information and the operation log of the first type of access user stored in the current period, for each first type of access user, the electronic device may determine the type of the operation data included in the operation log corresponding to the first type of access user, and further determine whether the operation data conforms to the first type preset rule corresponding to the type of the operation data, and if the operation data does not conform to the first type preset rule corresponding to the type of the operation data, the electronic device continues to perform step S203. That is, if the operation data does not conform to the first type preset rule corresponding to the type of the operation data, the electronic device may determine that the user information of the first type visiting user is a training sample.

For example, the type of the operation data is the advertisement click rate of the first-type access user, and the corresponding first-type preset rule is that the advertisement click rate lies in the interval [1.98%, 2.02%]. If the advertisement click rate in the operation log of first-type access user A acquired by the electronic device is 3.5%, the electronic device may determine that the user information of first-type access user A is a training sample, because 3.5% lies outside the interval [1.98%, 2.02%]. If the advertisement click rate in the operation log of first-type access user B acquired by the electronic device is 1.99%, the electronic device may determine that the user information of first-type access user B is not a training sample, because 1.99% lies within the interval [1.98%, 2.02%].
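The single-type check in the example above can be sketched as follows. This is only an illustrative sketch: the names `RangeRule`, `FIRST_TYPE_RULES`, and `is_training_sample` are hypothetical, and the rule is assumed to be a closed interval expressed in percent, as in the example of users A and B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRule:
    """Assumed form of a first-type preset rule: value must lie in [low, high]."""
    low: float
    high: float

    def conforms(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical rule table keyed by operation-data type; values are percentages.
FIRST_TYPE_RULES = {"ad_click_rate": RangeRule(1.98, 2.02)}


def is_training_sample(data_type: str, value: float) -> bool:
    """Step S202 for a single-type operation log.

    Returns True when the value breaks the rule for its type,
    i.e. when step S203 would be executed for this user.
    """
    return not FIRST_TYPE_RULES[data_type].conforms(value)


print(is_training_sample("ad_click_rate", 3.5))   # user A in the example
print(is_training_sample("ad_click_rate", 1.99))  # user B in the example
```

Keeping the rules in a table keyed by data type means a different rule can be looked up once the type of the operation data in the log has been determined.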

It should be noted that after the user information and the operation logs of the first-type access users stored in the current period are acquired, the electronic device may determine whether the operation data in each operation log conforms to the first-type preset rule corresponding to its type one user at a time, or for all first-type access users simultaneously; both are reasonable.

In another implementation, the operation log may include a plurality of types of operation data. In this case, the step S202 of determining whether the corresponding operation log of each first-class access user meets the preset rule may include:

step B1: for each type of operation data included in the operation log of each first type of access user, judging whether the operation data conforms to a second type preset rule corresponding to the type of the operation data, and if not, executing the step B2;

step B2: determining the operation data as target operation data;

After obtaining the user information and the operation logs of the first-type access users stored in the current period, for each first-type access user, the electronic device may determine the types of the operation data included in the corresponding operation log, and then determine whether each item of operation data conforms to the second-type preset rule corresponding to its type. If an item of operation data does not conform to the second-type preset rule corresponding to its type, the electronic device continues to execute step B2. That is, operation data that does not conform to the second-type preset rule corresponding to its type is determined as target operation data.

Different types of operation data contribute differently to judging whether the corresponding user information is a training sample. When operation data is determined to be target operation data, this indicates that it differs greatly from the operation data of real users, that is, the first-type access user corresponding to the operation data is likely to be a cheating user; however, this alone does not necessarily show that the user is a cheating user. Therefore, in order to determine the training samples among the acquired user information of first-type access users more accurately, the electronic device may continue to perform step B3 after determining the target operation data.

It should be noted that the electronic device may determine whether the operation data included in the acquired operation logs conforms to the second-type preset rule corresponding to its type one operation log at a time, or for all acquired operation logs simultaneously; both are reasonable.

Step B3: for each first-class access user, judging whether the quantity of the target operation data corresponding to the first-class access user is not less than a preset numerical value, and if the quantity of the target operation data corresponding to the first-class access user is not less than the preset numerical value, executing the step S203;

for each first-class access user, the electronic device may determine the amount of the target operation data corresponding to the first-class access user, and further may determine whether the amount is not less than a preset value, and if the amount is not less than the preset value, it indicates that the multiple types of operation data of the first-class access user are far different from the operation data of the real user, and further, it indicates that the first-class access user is a cheating user with a high possibility, the electronic device may continue to execute step S203, that is, when the amount is not less than the preset value, the electronic device may determine that the user information of the first-class access user is a training sample.

The preset value can be set according to the function of the different types of operation data on judging whether the user information corresponding to the operation data is the training sample or not, and the requirement on the identification accuracy of the cheating user in practical application, for example, the higher the requirement on the identification accuracy of the cheating user in practical application is, the larger the preset value can be.

For example, the operation log of first-type access user C acquired by the electronic device includes four types of operation data: the click rate of the first-type access user to the advertisement, the exposure rate of the first-type access user to the advertisement, the access time distribution ratio of the first-type access user, and the click rate ratio of the first-type access user to advertisements of the same video in different time periods. Through steps B1-B2, the electronic device determines that the advertisement click rate, the advertisement exposure rate, and the access time distribution ratio in the operation log of first-type access user C are target operation data, so the number of target operation data is 3. If the preset value is 3, the electronic device may determine that the number of target operation data corresponding to first-type access user C is not less than the preset value, and the user information of first-type access user C may be determined as a training sample.

It should be noted that when the number of target operation data corresponding to a first-type access user is not less than the preset value, several types of operation data of that user all differ greatly from the operation data of real users. This largely rules out the case in which a real user accidentally produces abnormal operation data, so that the electronic device can determine the training samples more accurately.

When the electronic device executes step B3, it may sequentially determine, for each first-class access user, whether the amount of the target operation data corresponding to the first-class access user is not less than a preset value, or determine, for all first-class access users, whether the amount of the target operation data corresponding to the first-class access user is not less than the preset value. This is all reasonable.
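Steps B1 to B3 can be sketched as below. This is a minimal sketch under stated assumptions: each second-type preset rule is modeled as a hypothetical predicate on a single number, every type is reduced to one scalar for simplicity, and all ranges and values other than the [1.98%, 2.02%] click-rate interval are invented purely for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical representation: one predicate per operation-data type,
# returning True when the value conforms to the second-type preset rule.
Rule = Callable[[float], bool]


def target_operation_data(log: Dict[str, float], rules: Dict[str, Rule]) -> List[str]:
    """Steps B1-B2: collect the types whose value breaks its rule."""
    return [data_type for data_type, value in log.items()
            if not rules[data_type](value)]


def is_training_sample_by_count(log: Dict[str, float],
                                rules: Dict[str, Rule],
                                preset_value: int) -> bool:
    """Step B3: True when the number of target operation data
    is not less than the preset value (step S203 then applies)."""
    return len(target_operation_data(log, rules)) >= preset_value


# Illustrative rules; only the click-rate interval comes from the text.
rules = {
    "ad_click_rate": lambda v: 1.98 <= v <= 2.02,
    "ad_exposure_rate": lambda v: 10.0 <= v <= 20.0,
    "access_time_share": lambda v: 0.2 <= v <= 0.4,
    "period_click_share": lambda v: 0.1 <= v <= 0.3,
}

# Made-up log for user C: three of the four types break their rules.
log_c = {
    "ad_click_rate": 3.5,
    "ad_exposure_rate": 45.0,
    "access_time_share": 0.9,
    "period_click_share": 0.2,
}

print(target_operation_data(log_c, rules))
print(is_training_sample_by_count(log_c, rules, preset_value=3))
```

Raising `preset_value` makes the check stricter, which matches the remark above that a larger preset value suits a higher accuracy requirement.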

Optionally, the step B3, for each first-class access user, determining whether the amount of the target operation data corresponding to the first-class access user is not less than a preset value, may include:

for each first-class access user, determining whether the sum of the weights of the target operation data corresponding to the first-class access user is not less than a preset weight value, and if the sum of the weights of the target operation data corresponding to the first-class access user is not less than the preset weight value, executing the step S203.

When judging whether the user information of the first-type access user corresponding to the target operation data is a training sample, the electronic device may determine the sum of the weights of the target operation data according to the weight that each type of operation data carries in that judgment. When the sum of the weights is not less than the preset weight value, the determined target operation data largely indicates that the corresponding first-type access user is a cheating user, and the user information of that user can be determined as a training sample.

The preset weight value may be set according to an effect of different types of operation data on judging whether the user information corresponding to the operation data is a training sample, and a requirement for the identification accuracy of the cheating user in the actual application, for example, the higher the requirement for the identification accuracy of the cheating user in the actual application is, the larger the preset weight value may be.

For example, it is assumed that types of operation data included in the operation log of the first type of access user are a click rate of the first type of access user to an advertisement, an access time distribution ratio of the first type of access user, and a click rate ratio of the first type of access user to an advertisement of the same video in different time periods, and weight values of the three types of operation data for determining that user information corresponding to the three types of operation data is training samples are respectively 80%, 5%, and 15%, and a preset weight value is 70%.

The electronic device judges the operation data included in the operation log of first-type access user D and determines that only the advertisement click rate is target operation data, so the sum of the weights of the target operation data is 80%. Since 80% > 70%, the electronic device can determine the user information of first-type access user D as a training sample.

The electronic device judges the operation data included in the operation log of first-type access user E and determines that the access time distribution ratio and the click rate ratio for advertisements of the same video in different time periods are target operation data, so the sum of the weights of the target operation data is 20%. Since 20% < 70%, the electronic device does not determine the user information of first-type access user E as a training sample.
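The weight-based variant of step B3 can be sketched as follows, using the weights (80%, 5%, 15%) and the preset weight value (70%) from the example above. The identifiers are hypothetical, and the weights are kept as integer percentages so that the comparison involves no floating-point rounding.

```python
from typing import Dict, Iterable

# Weights in whole percent, taken from the worked example; keys are assumed names.
WEIGHTS: Dict[str, int] = {
    "ad_click_rate": 80,
    "access_time_distribution": 5,
    "period_click_ratio": 15,
}
PRESET_WEIGHT = 70  # percent


def is_training_sample_by_weight(target_types: Iterable[str],
                                 weights: Dict[str, int] = WEIGHTS,
                                 preset_weight: int = PRESET_WEIGHT) -> bool:
    """True when the summed weight of the target operation data
    is not less than the preset weight value."""
    return sum(weights[t] for t in target_types) >= preset_weight


# User D: only the advertisement click rate is target operation data (80 >= 70).
print(is_training_sample_by_weight(["ad_click_rate"]))
# User E: the other two types are target operation data (5 + 15 = 20 < 70).
print(is_training_sample_by_weight(["access_time_distribution", "period_click_ratio"]))
```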

As an implementation manner of the embodiment of the present invention, the type of the operation data included in the operation log of the first type access user, which is acquired by the electronic device in the step S201 and stored in the current cycle, may be one or more of the following operation data:

the click rate of the first-type access user to the advertisement, the exposure rate of the first-type access user to the advertisement, the access time distribution ratio of the first-type access user, and the click rate ratio of the first-type access user to advertisements of the same video in different time periods.

Of course, the operation data included in the operation log of the first-class access user, which is acquired by the electronic device in the step S201 and stored in the current period, may also include other types of operation data, which is not specifically limited in this application.

As an implementation manner of the embodiment of the present invention, the step of identifying the first type of access user through the identification model in step B1 may include:

step C1: acquiring user information of a first type of access user;

after entering the next period, the electronic device may identify the first type of visiting user by using an identification model trained based on the added target sample set when the current period is over, and determine whether the first type of visiting user is a cheating user.

When the electronic equipment is a server, user information of a first type of access user corresponding to the received access request can be acquired; when the electronic device is other electronic device than the server, the electronic device establishes communication connection with the server, and when the server receives an access request, the server can send user information of a first type of access user corresponding to the access request to the electronic device. The method for acquiring the user information of the first-class access user by the electronic equipment is not specifically limited.

Step C2: and inputting the user information into the identification model for detection to obtain an identification result of the first-class access user.

After the electronic device acquires the user information of the first-type access user, the user information can be input into the identification model. Because the identification model has learned the features of the user information of each type of cheating user appearing in the current period, it can match the features of the input user information against those learned features, thereby identifying the first-type access user corresponding to the input user information and determining whether that user is a cheating user.

It should be noted that the identification model established above for identifying cheating users in the next period may be used in an offline state or in an online state. The manner in which the electronic device acquires the user information of the first-type access user in step C1 may differ between the two states, as illustrated below.

In one embodiment, the step C1 of acquiring the user information of the first type of access user may include:

and when the next period is finished, acquiring the user information of the first-class access user stored in the next period in an off-line state.

When an access request is received, the server can store user information of a first type of access user corresponding to the access request, when the next period is finished, the user information of the first type of access user stored in the next period can be obtained through obtaining an access log of the server in an off-line state, the obtained user information is input into the identification model, the first type of access user is identified by the identification model, and whether the first type of access user is a cheating user or not is determined.

When the electronic device is a server, the electronic device can acquire the locally stored user information of first-type access users for the next period; when the electronic device is a non-server device, it may establish a communication connection with the server and then either receive the user information of first-type access users stored in the next period, which the server sends when the next period ends, or send the server an information acquisition request, which may include a preset time period, and acquire the user information that the server returns in response.

In another embodiment, the step C1 of acquiring the user information of the first type of access user may include:

and when receiving an access request sent by a first-class access user, acquiring user information of the first-class access user.

In this embodiment, after entering the next period, the electronic device may obtain, in real time, the user information of the first type of access user corresponding to the received access request. That is to say, the electronic device may obtain, in an online state, user information of a first type of access user in a next period in real time, further input the obtained user information into the identification model, identify the first type of access user by using the identification model, and determine whether the first type of access user is a cheating user.

When the electronic equipment is a server, the electronic equipment can acquire user information of a first type of access user corresponding to an access request when the electronic equipment receives the access request; when the electronic device is a non-server device, the electronic device may establish a communication connection with the server, and further, when the server receives an access request, the server may send user information of a first type of access user corresponding to the access request to the electronic device. This is all reasonable.
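The difference between the offline and online uses of steps C1 and C2 can be sketched as below. The identification model is stood in for by a hypothetical callable that returns True for a cheating user; the record layout (`user_id`, `ad_click_rate`, `user_info`) and the toy threshold are assumptions made only for this illustration and are not the trained model described above.

```python
from typing import Callable, Dict, Iterable

# Hypothetical model interface: takes user information, returns True for "cheating user".
Model = Callable[[dict], bool]


def identify_offline(stored_user_info: Iterable[dict], model: Model) -> Dict[str, bool]:
    """Offline use: after the period ends, classify every stored record in one batch."""
    return {info["user_id"]: model(info) for info in stored_user_info}


def identify_online(request: dict, model: Model) -> bool:
    """Online use: classify the user behind a single access request as it arrives."""
    return model(request["user_info"])


def toy_model(info: dict) -> bool:
    # Stand-in for the trained identification model.
    return info["ad_click_rate"] > 2.02


records = [
    {"user_id": "u1", "ad_click_rate": 3.5},
    {"user_id": "u2", "ad_click_rate": 2.0},
]
print(identify_offline(records, toy_model))
print(identify_online({"user_info": records[0]}, toy_model))
```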

As an implementation manner of the embodiment of the present invention, the training method for identifying a model of a cheating user may further include:

and when the identification result of the first type of access users is the cheating user, shielding the access request of the first type of access users.

Optionally, after the first type of access user corresponding to the access request is determined to be a cheating user, the electronic device may mark target user information carried in the access request, where the mark indicates that the target user information is the cheating user information. When the access request received again by the electronic device carries the marked target user information, the electronic device can identify the first type of access user corresponding to the access request as a cheating user through the mark, and further shield the access request.

Optionally, after the first-type access user corresponding to the access request is determined to be a cheating user, the electronic device may also record the target user information carried in the access request to obtain a cheating user information statistics table. When an access request received again by the electronic device carries target user information, the electronic device can match that target user information against the information in the cheating user information statistics table and determine whether it is recorded there. In this way the electronic device can determine whether the first-type access user corresponding to the access request is a cheating user and, if so, shield the access request.

As an implementation manner of the embodiment of the present invention, after the first-type access user corresponding to the access request is determined to be a cheating user, the electronic device may further send the target user information of the first-type access user to other electronic devices in communication connection with the electronic device; when such another electronic device receives an access request carrying the target user information, it may also shield the access request.

Therefore, in this embodiment, the electronic device can shield access requests carrying the information of cheating users. This reduces the influence of cheating users on the click rate or play rate of various resources, improves the authenticity of the click rate or play rate obtained through statistics, and reduces the adverse effect of cheating users on decisions made according to the click rate or play rate of the resources.
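The statistics-table approach to shielding can be sketched as follows. This is a sketch under assumptions: the target user information is reduced to a hypothetical hashable key (for example a user IP string), the table is held in memory as a set, and `identify` is a stand-in for the identification model.

```python
from typing import Callable, Set


class CheatingUserTable:
    """Minimal in-memory stand-in for the cheating user information statistics table."""

    def __init__(self) -> None:
        self._records: Set[str] = set()

    def record(self, user_info: str) -> None:
        self._records.add(user_info)

    def is_recorded(self, user_info: str) -> bool:
        return user_info in self._records


def handle_request(user_info: str,
                   table: CheatingUserTable,
                   identify: Callable[[str], bool]) -> str:
    """Return "shielded" or "served" for one access request."""
    # A user already recorded in the table is shielded without re-identification.
    if table.is_recorded(user_info):
        return "shielded"
    # Otherwise consult the (hypothetical) identification model and record hits.
    if identify(user_info):
        table.record(user_info)
        return "shielded"
    return "served"


table = CheatingUserTable()
print(handle_request("ip-1", table, lambda u: u == "ip-1"))  # identified and recorded
print(handle_request("ip-1", table, lambda u: False))        # matched in the table
print(handle_request("ip-2", table, lambda u: False))        # not a cheating user
```

A set gives constant-time matching on each request; a persistent store would be needed if the table must survive restarts or be shared with other devices.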

Corresponding to the training method for identifying the model of the cheating user provided by the embodiment of the present invention, an embodiment of the present invention further provides a training device for identifying the model of the cheating user, as shown in fig. 3, the training device includes:

a user information obtaining module 310, configured to obtain and store user information of a first type of access user;

the training sample determining module 320 is configured to determine user information that does not meet a preset rule in the stored user information of the first type of visiting user, and use the user information as a training sample;

the model training module 330 is configured to train a preset model to be trained based on a training sample, where the model to be trained is a model for identifying whether the first-class visiting user and the second-class visiting user are cheating users;

and the recognition model obtaining module 340 is configured to stop training when the accuracy of the output result of the model to be trained reaches a preset accuracy, so as to obtain a recognition model for recognizing the cheating user.
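The stopping condition handled by modules 330 and 340 can be sketched as a loop that trains until the measured accuracy reaches the preset accuracy. The callables `train_one_round` and `evaluate` are hypothetical placeholders for whatever model and evaluation procedure are used, and `max_rounds` is an added safety guard that is not part of the described method.

```python
from typing import Callable


def train_until_accurate(train_one_round: Callable[[], None],
                         evaluate: Callable[[], float],
                         preset_accuracy: float,
                         max_rounds: int = 1000) -> int:
    """Train round by round and stop once accuracy reaches the preset accuracy.

    Returns the number of rounds used. Raises RuntimeError if the preset
    accuracy is not reached within max_rounds (an assumed guard).
    """
    for round_no in range(1, max_rounds + 1):
        train_one_round()
        if evaluate() >= preset_accuracy:
            return round_no
    raise RuntimeError("preset accuracy not reached within max_rounds")


# Toy stand-in for a model whose accuracy improves by 0.25 per round.
state = {"accuracy": 0.0}


def toy_train() -> None:
    state["accuracy"] = min(1.0, state["accuracy"] + 0.25)


def toy_evaluate() -> float:
    return state["accuracy"]


rounds_used = train_until_accurate(toy_train, toy_evaluate, preset_accuracy=0.75)
print(rounds_used)
```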

As an implementation of an embodiment of the present invention,

the user information obtaining module 310 may include:

a user information obtaining sub-module (not shown in fig. 3) for obtaining and storing user information of the first type of visiting user in the current period;

the training sample determination module 320 may include:

a training sample determining submodule (not shown in fig. 3) configured to determine, when the current period ends, user information that does not meet a preset rule in user information of a first type of access user that has been stored in the current period, as a training sample;

the model training module 330 may include: a sample set addition sub-module (not shown in FIG. 3) and a model training sub-module (not shown in FIG. 3);

a sample set adding submodule (not shown in fig. 3) for adding the training samples to the target sample set;

the target sample set is a set of samples used for training a target model when the last period is finished, and the target model is a model used for identifying cheating users in the current period.

A model training sub-module (not shown in fig. 3) for inputting the added target sample set into the target model for training;

as an implementation manner of the embodiment of the present invention, the training apparatus for identifying a model of a cheating user may further include:

and the information storage and model application module (not shown in fig. 3) is configured to store user information of the first-class visiting user in the current period after entering the next period, identify the first-class visiting user through the identification model, and trigger the training sample determination module.

and an online frequency determination module (not shown in fig. 3) configured to determine that the online frequency corresponding to each training sample satisfies a preset frequency before adding the training sample to the target sample set.

As an implementation manner of the embodiment of the present invention, the user information obtaining sub-module (not shown in fig. 3) may include:

a user information obtaining unit (not shown in fig. 3) configured to obtain, at the end of the current period, user information and an operation log of a first type of access user that have been stored in the current period;

a preset rule determining unit (not shown in fig. 3) configured to determine, for each first-class visiting user, whether the corresponding operation log meets a preset rule, and if the operation log does not meet the preset rule, trigger a training sample determining unit (not shown in fig. 3);

a training sample determination unit (not shown in fig. 3) for determining the user information of the first type of visiting user as an alternative training sample.

In one implementation, the operation log may include a type of operation data.

The preset rule judgment unit (not shown in fig. 3) may be specifically configured to: for each first-type access user, it is determined whether the operation data conforms to a first-type preset rule corresponding to the type of the operation data, and if not, a training sample determination unit (not shown in fig. 3) is triggered.

In one implementation, the operation log may include a plurality of types of operation data.

The preset rule judgment unit (not shown in fig. 3) may include:

a preset rule determining subunit (not shown in fig. 3) configured to determine, for each type of operation data included in the operation log of each first-type access user, whether the operation data meets a second-type preset rule corresponding to the type of the operation data, and if not, trigger the data determining subunit (not shown in fig. 3);

a data determination subunit (not shown in fig. 3) for determining the operation data as target operation data;

a preset value determining subunit (not shown in fig. 3) configured to determine, for each first-type access user, whether the number of items of target operation data corresponding to that user is not less than a preset value, and if so, trigger the training sample determining unit (not shown in fig. 3).
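The multi-type check described above can be illustrated with a minimal sketch. It assumes that each operation log is a mapping from a data type to a single numeric value and that each second-type preset rule is a simple upper bound; the type names, the thresholds, and the helper names `target_operation_data` and `is_candidate_sample` are hypothetical and are not taken from the embodiments.

```python
from typing import Callable, Dict, List

Rule = Callable[[float], bool]

# Hypothetical second-type preset rules, one per type of operation data.
# The bounds are illustrative; in the described scheme the rules would be
# derived by an unsupervised learning algorithm from earlier user data.
SECOND_TYPE_RULES: Dict[str, Rule] = {
    "ad_click_rate": lambda v: v <= 0.30,
    "ad_exposure_rate": lambda v: v <= 0.90,
    "night_access_share": lambda v: v <= 0.60,
}


def target_operation_data(op_log: Dict[str, float], rules: Dict[str, Rule]) -> List[str]:
    """Return the types of operation data that do not conform to their rule."""
    return [kind for kind, value in op_log.items()
            if kind in rules and not rules[kind](value)]


def is_candidate_sample(op_log: Dict[str, float], rules: Dict[str, Rule],
                        preset_value: int) -> bool:
    """True when the number of non-conforming items is not less than preset_value."""
    return len(target_operation_data(op_log, rules)) >= preset_value


logs = {
    "user_a": {"ad_click_rate": 0.05, "ad_exposure_rate": 0.40, "night_access_share": 0.20},
    "user_b": {"ad_click_rate": 0.80, "ad_exposure_rate": 0.95, "night_access_share": 0.10},
}
candidates = [user for user, log in logs.items()
              if is_candidate_sample(log, SECOND_TYPE_RULES, preset_value=2)]
print(candidates)  # ['user_b']
```

With a preset value of 2, a user is selected only when at least two types of operation data violate their rules, which makes the selection less sensitive to a single unusual metric.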

Optionally, the operation data may include: the click-through rate of the access user on advertisements, the exposure rate of advertisements to the access user, the distribution ratio of the access user's access times, and the ratio of the access user's click-through rates on advertisements of the same video in different time periods.

As an implementation manner provided by the embodiment of the present invention, the information storage and model application module (not shown in fig. 3) may include:

an access information obtaining sub-module (not shown in fig. 3) for obtaining user information of a first type of access user;

and the visiting user identification submodule (not shown in fig. 3) is used for inputting the user information into the identification model for detection, and obtaining the identification result of the first type visiting user.

As an implementation manner provided by the embodiment of the present invention, the access information obtaining sub-module (not shown in fig. 3) may be specifically configured to:

when the next period ends, acquiring, in an offline state, the user information of the first type of access users stored during the next period; or

As an implementation manner provided by an embodiment of the present invention, the training apparatus for identifying a model of a cheating user may further include:

and an access request shielding module (not shown in fig. 3) for shielding the access request of the first type of access user when the identification result of the first type of access user is the cheating user.
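As an illustration only, the interplay between the identification result and the shielding of access requests might be sketched as follows. The `AccessGate` class, its method names, and the stand-in identification function are assumptions made for this sketch; the embodiments do not prescribe any particular interface.

```python
from typing import Callable, Dict, Set

UserInfo = Dict[str, float]


class AccessGate:
    """Shields requests from users that the identification model flags as cheating."""

    def __init__(self, identify: Callable[[UserInfo], bool]) -> None:
        self._identify = identify      # hypothetical stand-in for the trained model
        self._blocked: Set[str] = set()

    def update(self, user_id: str, info: UserInfo) -> bool:
        """Run identification, remember the user if flagged, and return the result."""
        is_cheater = self._identify(info)
        if is_cheater:
            self._blocked.add(user_id)
        return is_cheater

    def allow(self, user_id: str) -> bool:
        """Return False for users whose access requests should be shielded."""
        return user_id not in self._blocked


# Toy identification function; a real identification model would replace it.
gate = AccessGate(lambda info: info["ad_click_rate"] > 0.5)
gate.update("user_a", {"ad_click_rate": 0.04})
gate.update("user_b", {"ad_click_rate": 0.90})
print(gate.allow("user_a"), gate.allow("user_b"))  # True False
```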

An embodiment of the present invention further provides an electronic device, as shown in fig. 4, including a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another via the communication bus 404;

a memory 403 for storing a computer program;

the processor 401, when executing the program stored in the memory 403, is configured to implement the steps of the training method for identifying the model of the cheating user provided by the embodiments of the present invention.

Specifically, the training method for identifying the model of the cheating user comprises:

acquiring and storing user information of a first type of access user;

determining, as a training sample, user information that does not conform to a preset rule among the stored user information of the first type of access users, wherein the preset rule is a rule determined by an unsupervised learning algorithm based on stored user information of a second type of access users, and the user information of the second type of access users is user information of access users that was acquired and stored before the user information of the first type of access users was acquired;

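training a preset model to be trained based on the training sample, wherein the model to be trained is a model for identifying whether the first type of access users and the second type of access users are cheating users;

and when the accuracy of the output result of the model to be trained reaches a preset accuracy, stopping the training to obtain the identification model for identifying cheating users.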
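The flow of the method steps listed above, together with the stopping condition stated in the abstract, can be sketched end to end under strong simplifying assumptions. Here the unsupervised step is reduced to a mean and standard deviation interval over a single feature, and a one-parameter threshold classifier stands in for the model to be trained. Neither choice is specified by the embodiments, and the function names, the labelled normal samples, and all numeric values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def learn_rule(second_type: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Derive a 'normal' interval (mean +/- k * std) from unlabeled earlier data."""
    mu, sigma = mean(second_type), pstdev(second_type)
    return mu - k * sigma, mu + k * sigma


def select_training_samples(first_type: List[float],
                            rule: Tuple[float, float]) -> List[float]:
    """Keep only the values that fall outside the learned interval."""
    low, high = rule
    return [x for x in first_type if not (low <= x <= high)]


def train_threshold(samples: List[Tuple[float, int]], preset_accuracy: float,
                    step: float = 0.01, max_iters: int = 1000) -> Tuple[float, float]:
    """Raise a decision threshold until accuracy reaches preset_accuracy."""
    threshold, accuracy = 0.0, 0.0
    for _ in range(max_iters):
        correct = sum((x > threshold) == bool(y) for x, y in samples)
        accuracy = correct / len(samples)
        if accuracy >= preset_accuracy:
            break  # stop training once the preset accuracy is reached
        threshold += step
    return threshold, accuracy


# Ad click rates of second-type (earlier) and first-type (current) access users.
second_type_rates = [0.02, 0.03, 0.04, 0.05, 0.03, 0.04]
first_type_rates = [0.03, 0.70, 0.04, 0.85]

rule = learn_rule(second_type_rates)
new_samples = select_training_samples(first_type_rates, rule)

# Assumption: selected samples are labelled 1 (cheating) and are combined with
# a few hypothetical samples labelled 0 (normal) before training.
target_set = [(0.025, 0), (0.035, 0), (0.045, 0)] + [(x, 1) for x in new_samples]
threshold, accuracy = train_threshold(target_set, preset_accuracy=0.95)
print(new_samples, accuracy)
```

In this toy setting only the two outlying click rates are selected as training samples, and the loop stops as soon as the threshold separates them from the normal samples.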

It should be noted that the other implementations of the training method for identifying the model of the cheating user, realized when the processor 401 executes the program stored in the memory 403, are the same as those provided in the foregoing method embodiments and are not described again here.

The communication bus mentioned for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the training method for identifying a model of a cheating user as described in any of the above embodiments.

In yet another embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the training method for identifying a model of a cheating user as set forth in any of the above embodiments.

In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in an interrelated manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and embodiments of the computer program product containing instructions are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A training method for identifying models of cheating users, the method comprising:

when the current period is finished, determining user information which does not accord with a preset rule in the user information of the first type of access users stored in the current period as a training sample, wherein the preset rule is as follows: based on the stored user information of a second type of access users, determining a rule through an unsupervised learning algorithm, wherein the user information of the second type of access users is the user information of the access users which is acquired and stored before the user information of the first type of access users is acquired;

2. The method of claim 1,

and inputting the added target sample set into the target model for training.

3. The method of claim 2, further comprising:

and after entering the next period, storing the user information of the first-class visiting users in the current period, identifying the first-class visiting users through the identification model, and returning to the step of determining the user information which does not accord with the preset rule in the user information of the first-class visiting users stored in the current period as a training sample when the current period is finished.

4. The method of claim 2, wherein prior to the step of adding the training sample to a set of target samples, the method further comprises:

and determining that the online frequency corresponding to each training sample meets the preset frequency.

5. The method according to claim 1, wherein the step of determining, as the training sample, user information that does not meet a preset rule in the user information of the first type of visiting users that has been stored in the current period includes:

6. The method of claim 5, wherein the operation log comprises one type of operation data;

the step of judging whether the corresponding operation log of each first-class access user accords with a preset rule comprises the following steps:

7. The method of claim 5, wherein the operation log comprises a plurality of types of operation data;

8. A training apparatus for identifying models of cheating users, the apparatus comprising:

the user information acquisition module is used for acquiring and storing the user information of the first type of access users in the current period;

a training sample determining module, configured to determine, when a current period ends, user information that does not meet a preset rule in user information of a first type of access user that is stored in the current period, as a training sample, where the preset rule is: based on the stored user information of a second type of access users, determining a rule through an unsupervised learning algorithm, wherein the user information of the second type of access users is the user information of the access users which is acquired and stored before the user information of the first type of access users is acquired;

9. The apparatus of claim 8,

10. The apparatus of claim 9, further comprising:

and the information storage and model application module is used for storing the user information of the first type of visiting users in the current period after entering the next period, identifying the first type of visiting users through the identification model and triggering the training sample determination module.

11. The apparatus of claim 9, further comprising:

and the online frequency determining module is used for determining that the online frequency corresponding to each training sample meets a preset frequency before the training samples are added into the target sample set.

12. The apparatus of claim 8, wherein the user information obtaining module comprises:

the user information acquisition unit is used for acquiring the user information and the operation log of the first type of access users stored in the current period when the current period is finished;

13. The apparatus of claim 12, wherein the operation log comprises one type of operation data;

the preset rule determining unit is specifically configured to: determine, for each first-type access user, whether the operation data conforms to the first-type preset rule corresponding to the type of the operation data, and if not, trigger the training sample determining unit.

14. The apparatus of claim 12, wherein the operation log includes a plurality of types of operation data, and the preset rule determining unit includes:

15. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.