CN110222710A - Data processing method, device and storage medium - Google Patents
- Publication number
- CN110222710A (application CN201910361278.4A)
- Authority
- CN
- China
- Prior art keywords
- sample
- target
- training
- selection
- data set
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
This application discloses a data processing method, device, and storage medium, belonging to the field of data processing. The method includes: first selecting, from multiple stored sample data sets, the sample data set corresponding to a target modeling type; then selecting at least one feature dimension from multiple feature dimensions, and selecting a target training model from multiple training models stored for the target modeling type; next, training the target training model on the data of each sample in the selected sample data set in the at least one feature dimension; and finally determining an expanded data set with the trained model according to a target expansion multiple and optimization target information. Because the target model can be selected directly from the multiple training models when it needs to be trained, operators do not need to write separate code for each training model. This simplifies training and makes determining the expanded data set more efficient.
Description
Technical field
This application relates to the field of data processing, and in particular to a data processing method, device, and storage medium.
Background
Machine learning techniques are commonly used for data mining: a model is trained on an acquired sample data set, and the trained model is then used to mine other data. In marketing, for example, a model can be trained on an acquired marketing sample data set, and the trained model can then be used to mine further data that informs marketing strategy. At present, the whole machine learning workflow is implemented mainly through code written by operators. Operators therefore need some coding background, which makes the whole workflow difficult to implement.
Summary of the invention
The embodiments of this application provide a data processing method, device, and storage medium that can solve the problem in the related art that implementing the whole machine learning workflow requires operators to have some coding background and is therefore difficult. The technical solution is as follows:
In a first aspect, a data processing method is provided. The method includes:
selecting, from multiple stored sample data sets, the sample data set corresponding to a target modeling type, where the selected sample data set includes multiple samples and each sample includes data in multiple feature dimensions;
selecting at least one feature dimension from the multiple feature dimensions, and selecting a target training model from multiple training models stored for the target modeling type;
training the target training model on the data of each sample in the selected sample data set in the at least one feature dimension; and
determining an expanded data set with the trained model according to a target expansion multiple and optimization target information, where the ratio of the number of samples in the expanded data set to the number of samples in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the expanded data set and the selected sample data set.
Optionally, before the target training model is trained on the data of each sample in the selected sample data set in the at least one feature dimension, the method further includes:
displaying a parameter setting interface that includes at least one parameter edit box; and
obtaining, from the at least one parameter edit box, at least one parameter set for the target training model.
Optionally, after the expanded data set is determined with the trained model according to the target expansion multiple and the optimization target information, the method further includes:
displaying an evaluation result, where the evaluation result evaluates the expanded data set determined by the trained model; and
adjusting, according to the evaluation result, at least one parameter of the target training model.
Optionally, the method further includes:
displaying, during training of the target training model, a training flowchart of the target training model, where the training flowchart includes multiple training nodes and each training node is shown in a first display mode, a second display mode, or a third display mode. The first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
Optionally, after the expanded data set is determined with the trained model according to the target expansion multiple and the optimization target information, the method further includes:
deploying a strategy for each sample in the expanded data set.
In a second aspect, a data processing device is provided. The device includes:
a first selection module, configured to select, from multiple stored sample data sets, the sample data set corresponding to a target modeling type, where the selected sample data set includes multiple samples and each sample includes data in multiple feature dimensions;
a second selection module, configured to select at least one feature dimension from the multiple feature dimensions, and to select a target training model from multiple training models stored for the target modeling type;
a training module, configured to train the target training model on the data of each sample in the selected sample data set in the at least one feature dimension; and
a determining module, configured to determine an expanded data set with the trained model according to a target expansion multiple and optimization target information, where the ratio of the number of samples in the expanded data set to the number of samples in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the expanded data set and the selected sample data set.
Optionally, the device further includes:
a first display module, configured to display a parameter setting interface that includes at least one parameter edit box; and
an obtaining module, configured to obtain, from the at least one parameter edit box, at least one parameter set for the target training model.
Optionally, the device further includes:
a second display module, configured to display an evaluation result, where the evaluation result evaluates the expanded data set determined by the trained model; and
an adjusting module, configured to adjust, according to the evaluation result, at least one parameter of the target training model.
Optionally, the device further includes:
a third display module, configured to display, during training of the target training model, a training flowchart of the target training model, where the training flowchart includes multiple training nodes and each training node is shown in a first display mode, a second display mode, or a third display mode. The first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
Optionally, the device further includes:
a deployment module, configured to deploy a strategy for each sample in the expanded data set.
In a third aspect, a data processing device is provided. The device includes:
a processor; and
a memory for storing instructions executable by the processor;
where the processor is configured to perform the steps of any method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided. Instructions are stored on the computer-readable storage medium, and the steps of any method of the first aspect are implemented when the instructions are executed by a processor.
In a fifth aspect, a computer program product including instructions is provided. When run on a computer, it causes the computer to perform the steps of any method of the first aspect.
The technical solutions provided by the embodiments of this application bring at least the following beneficial effects:
In the embodiments of this application, the sample data set corresponding to the target modeling type is first selected from multiple stored sample data sets, and the selected sample data set includes multiple samples. Because the selected sample data set includes multiple feature dimensions, at least one feature dimension can be selected from them, and a target training model can be selected from the multiple training models stored for the target modeling type. The target training model is then trained on the data of each sample in the selected sample data set in the at least one feature dimension. With the trained model, the expanded data set is determined according to the target expansion multiple and the optimization target information. Because a matching index exists between the expanded data set and the selected sample data set, the two data sets are similar. Because the multiple training models for the target modeling type are stored in the computer device in advance, when a target model among them needs to be trained, it can be selected directly from the multiple training models and then trained on the data of each sample in the selected sample data set in the at least one feature dimension. In addition, the target training model can be trained multiple times, or different training models can each be used as the target training model and trained. That is, with the data processing method provided by the embodiments of this application, operators do not need to write different code for the multiple training models and do not need a coding background. This simplifies training of the target training model and makes determining the expanded data set simpler and more efficient.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment provided by the embodiments of this application.
Fig. 2 is a flowchart of a first data processing method provided by the embodiments of this application.
Fig. 3 is a flowchart of a second data processing method provided by the embodiments of this application.
Fig. 4 is a schematic diagram of a first interface provided by the embodiments of this application.
Fig. 5 is a schematic diagram of a second interface provided by the embodiments of this application.
Fig. 6 is a schematic diagram of a third interface provided by the embodiments of this application.
Fig. 7 is a schematic diagram of a fourth interface provided by the embodiments of this application.
Fig. 8 is a schematic diagram of a fifth interface provided by the embodiments of this application.
Fig. 9 is a schematic diagram of a sixth interface provided by the embodiments of this application.
Fig. 10 is a schematic diagram of a seventh interface provided by the embodiments of this application.
Fig. 11 is a schematic diagram of an eighth interface provided by the embodiments of this application.
Fig. 12 is a schematic diagram of a ninth interface provided by the embodiments of this application.
Fig. 13 is a schematic diagram of a tenth interface provided by the embodiments of this application.
Fig. 14 is a block diagram of a data processing device provided by the embodiments of this application.
Fig. 15 is a schematic structural diagram of a data processing device provided by the embodiments of this application.
Detailed description of the embodiments
Example embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following example embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of devices and methods consistent with some aspects of this application.
Before the embodiments of this application are explained in detail, the implementation environment of the embodiments is introduced:
Fig. 1 is a schematic diagram of an implementation environment provided by the embodiments of this application. Referring to Fig. 1, the implementation environment includes computer device 100, which includes output device 110, input device 120, business logic module 130, life cycle management module 140, and algorithm scheduling module 150.
Output device 110 can communicate with business logic module 130 and can be used to display multiple interfaces. Output device 110 can be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
Input device 120 can communicate with business logic module 130 and can receive user input in several ways. For example, input device 120 can be a mouse, a keyboard, a touch screen device, or a sensing device.
Business logic module 130 can receive the user's operation information, generate a task of training the target training model, and write the task to life cycle management module 140.
Life cycle management module 140 includes multiple application programming interfaces (APIs) and can store the tasks, written by business logic module 130, of training the target training model. Through the APIs of life cycle management module 140, information such as the code of multiple training models, the parameters of multiple training models, and evaluation results can be stored in a code library in life cycle management module 140, or such information stored in the code library can be deleted. Life cycle management module 140 can implement these functions through the MLflow (Machine Learning flow) application.
Algorithm scheduling module 150 can schedule and execute the tasks in life cycle management module 140 in a certain execution order and at certain times. Through the APIs of life cycle management module 140, it can also add new tasks to life cycle management module 140 or delete historical tasks from it.
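The division of labor between the life cycle management module and the algorithm scheduling module can be pictured with a small sketch. It is not the patent's implementation and does not use the MLflow API: `TrainingTask`, `LifecycleStore`, `Scheduler`, and the `run_at` field are hypothetical names, and an in-memory dictionary stands in for whatever storage the module actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(order=True)
class TrainingTask:
    run_at: float                                   # hypothetical scheduled time
    name: str = field(compare=False)
    action: Callable[[], str] = field(compare=False)


class LifecycleStore:
    """Hypothetical stand-in for the task API of the life cycle management module."""

    def __init__(self) -> None:
        self._tasks: Dict[str, TrainingTask] = {}

    def add_task(self, task: TrainingTask) -> None:
        self._tasks[task.name] = task

    def delete_task(self, name: str) -> None:
        self._tasks.pop(name, None)

    def pending(self) -> List[TrainingTask]:
        # Tasks are ordered by their scheduled time only.
        return sorted(self._tasks.values())


class Scheduler:
    """Hypothetical stand-in for the algorithm scheduling module."""

    def __init__(self, store: LifecycleStore) -> None:
        self.store = store

    def run_due(self, now: float) -> List[str]:
        # Execute every task whose time has come, in time order,
        # then remove it from the store through the store's API.
        results = []
        for task in self.store.pending():
            if task.run_at <= now:
                results.append(task.action())
                self.store.delete_task(task.name)
        return results
```

In this sketch the scheduler never touches the task dictionary directly. It only goes through `add_task`, `delete_task`, and `pending`, which mirrors the text's statement that tasks are added and deleted through the module's APIs.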
Computer device 100 can be a general-purpose computer device or a special-purpose computer device. In a specific implementation, computer device 100 can be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device. The embodiments of this application do not limit the type of computer device.
Fig. 2 is a flowchart of a data processing method provided by the embodiments of this application. The method is applied to a computer device, which can be computer device 100 shown in Fig. 1. Referring to Fig. 2, the method includes:
Step 201: Select, from multiple stored sample data sets, the sample data set corresponding to a target modeling type. The selected sample data set includes multiple samples, and each sample includes data in multiple feature dimensions.
Step 202: Select at least one feature dimension from the multiple feature dimensions, and select a target training model from multiple training models stored for the target modeling type.
Step 203: Train the target training model on the data of each sample in the selected sample data set in the at least one feature dimension.
Step 204: With the trained model, determine an expanded data set according to a target expansion multiple and optimization target information. The ratio of the number of samples in the expanded data set to the number of samples in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the expanded data set and the selected sample data set.
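Steps 201 to 204 can be illustrated with a minimal sketch. The source does not say which training models are stored or how the matching index is computed, so everything below is an assumption: a centroid of the seed samples stands in for the target training model, negative squared distance stands in for the matching score, and the names `train_centroid`, `score`, and `expand` are hypothetical.

```python
from typing import Dict, List, Sequence

Sample = Dict[str, float]


def train_centroid(samples: Sequence[Sample], dims: Sequence[str]) -> Dict[str, float]:
    # "Training" here is only averaging the selected feature dimensions.
    return {d: sum(s[d] for s in samples) / len(samples) for d in dims}


def score(model: Dict[str, float], sample: Sample, dims: Sequence[str]) -> float:
    # Higher is better: negative squared distance to the centroid.
    return -sum((sample[d] - model[d]) ** 2 for d in dims)


def expand(model: Dict[str, float], candidates: Sequence[Sample],
           dims: Sequence[str], seed_size: int, multiple: int) -> List[Sample]:
    # The expanded set holds seed_size * multiple samples
    # (fewer only if there are not enough candidates).
    k = seed_size * multiple
    ranked = sorted(candidates, key=lambda s: score(model, s, dims), reverse=True)
    return ranked[:k]


# Seed data set and the feature dimensions chosen for training (made-up values).
seed = [{"age": 0.2, "spend": 0.8}, {"age": 0.4, "spend": 0.6}]
dims = ["age", "spend"]
model = train_centroid(seed, dims)

candidates = [
    {"id": 1, "age": 0.30, "spend": 0.70},
    {"id": 2, "age": 0.90, "spend": 0.10},
    {"id": 3, "age": 0.25, "spend": 0.75},
    {"id": 4, "age": 0.50, "spend": 0.50},
    {"id": 5, "age": 0.35, "spend": 0.60},
]
expanded = expand(model, candidates, dims, seed_size=len(seed), multiple=2)
print([c["id"] for c in expanded])
```

With a target expansion multiple of 2 and two seed samples, the sketch keeps the four candidates closest to the seed centroid and drops the outlier.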
In the embodiments of this application, the sample data set corresponding to the target modeling type is first selected from multiple stored sample data sets, and the selected sample data set includes multiple samples. Because the selected sample data set includes multiple feature dimensions, at least one feature dimension can be selected from them, and a target training model can be selected from the multiple training models stored for the target modeling type. The target training model is then trained on the data of each sample in the selected sample data set in the at least one feature dimension. With the trained model, the expanded data set is determined according to the target expansion multiple and the optimization target information. Because a matching index exists between the expanded data set and the selected sample data set, the two data sets are similar. Because the multiple training models for the target modeling type are stored in the computer device in advance, when a target model among them needs to be trained, it can be selected directly from the multiple training models and then trained on the data of each sample in the selected sample data set in the at least one feature dimension. In addition, the target training model can be trained multiple times, or different training models can each be used as the target training model and trained. That is, with the data processing method provided by the embodiments of this application, operators do not need to write different code for the multiple training models and do not need a coding background. This simplifies training of the target training model and makes determining the expanded data set simpler and more efficient.
Optionally, before the target training model is trained on the data of each sample in the selected sample data set in the at least one feature dimension, the method further includes:
displaying a parameter setting interface that includes at least one parameter edit box; and
obtaining, from the at least one parameter edit box, at least one parameter set for the target training model.
Optionally, after the expanded data set is determined with the trained model according to the target expansion multiple and the optimization target information, the method further includes:
displaying an evaluation result, where the evaluation result evaluates the expanded data set determined by the trained model; and
adjusting, according to the evaluation result, at least one parameter of the target training model.
Optionally, the method further includes:
displaying, during training of the target training model, a training flowchart of the target training model, where the training flowchart includes multiple training nodes and each training node is shown in a first display mode, a second display mode, or a third display mode. The first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
Optionally, after the expanded data set is determined with the trained model according to the target expansion multiple and the optimization target information, the method further includes:
deploying a strategy for each sample in the expanded data set.
All of the optional technical solutions above can be combined in any way to form optional embodiments of this application, which are not described one by one here.
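The three display modes of the training flowchart amount to a mapping from each node's position relative to the current node. The sketch below shows one way to compute that mapping. It assumes the training nodes are visited in a fixed linear order, which the source does not state, and the names `DisplayMode` and `node_modes` are hypothetical.

```python
from enum import Enum
from typing import List, Tuple


class DisplayMode(Enum):
    FIRST = "completed"       # the training node has been completed
    SECOND = "in progress"    # training is currently at this node
    THIRD = "not reached"     # the training node has not been reached yet


def node_modes(node_names: List[str], current_index: int) -> List[Tuple[str, DisplayMode]]:
    # Nodes before the current one are completed, the current one is in
    # progress, and the remaining ones have not been reached.
    modes = []
    for i, name in enumerate(node_names):
        if i < current_index:
            mode = DisplayMode.FIRST
        elif i == current_index:
            mode = DisplayMode.SECOND
        else:
            mode = DisplayMode.THIRD
        modes.append((name, mode))
    return modes


# Node names are made up for illustration.
for name, mode in node_modes(["load data", "select features", "train", "evaluate"], 2):
    print(f"{name}: {mode.value}")
```

A user interface could then map each `DisplayMode` to a color or icon of its choice.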
Fig. 3 is a flowchart of a data processing method provided by the embodiments of this application. The method is applied to a computer device. Referring to Fig. 3, the method includes:
Step 301: Select, from multiple stored sample data sets, the sample data set corresponding to a target modeling type. The selected sample data set includes multiple samples, and each sample includes data in multiple feature dimensions.
It should be noted that the target modeling type can be a consumption propensity modeling type, a crowd expansion modeling type, a potential customer assessment modeling type, a customer churn prediction modeling type, a crowd clustering modeling type, or the like. The selected sample data set usually includes a large number of samples, for example 10,000 or 20,000. Each sample in the selected sample data set can include a sample identifier that uniquely identifies it. For example, the samples can be users, in which case the sample identifier of each sample can be the user account of the corresponding user. The feature dimensions can be age, gender, educational background, hobby, region, purchase intention, and so on.
In one possible case, among the samples in the selected sample data set, the feature dimensions of two different samples can be identical, partly identical, or completely different. When two different samples share a feature dimension, their data in that feature dimension can be the same or different.
For example, referring to Table 1, sample 1, sample 2, sample 3, sample 4, and sample 5 are 5 of the samples in the selected sample data set. As Table 1 shows, the feature dimensions of sample 1 are gender and age, with the data female and 20 years old. The feature dimensions of sample 2 are gender and age, with the data female and 20 years old. The feature dimensions of sample 3 are gender and age, with the data male and 30 years old. The feature dimensions of sample 4 are educational background and hobby, with the data undergraduate and travel. The feature dimensions of sample 5 are gender and educational background, with the data male and undergraduate.
That is, sample 1, sample 2, and sample 3 have the same 2 feature dimensions, gender and age. The data of sample 1 and sample 2 in these 2 feature dimensions are all the same, while the data of sample 1 and sample 3 in these 2 feature dimensions are all different. The 2 feature dimensions of sample 1 and the 2 feature dimensions of sample 4 are entirely different. Sample 1 and sample 5 share one feature dimension, gender, but their data in it differ. Sample 4 and sample 5 also share one feature dimension, educational background, and their data in it are the same.
Table 1
Sample | Gender | Age | Educational background | Hobby
Sample 1 | Female | 20 years old | |
Sample 2 | Female | 20 years old | |
Sample 3 | Male | 30 years old | |
Sample 4 | | | Undergraduate | Travel
Sample 5 | Male | | Undergraduate |
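The comparisons drawn from Table 1 can be checked mechanically. The sketch below stores each sample as a dictionary that holds only the feature dimensions the sample actually has, then reports which dimensions two samples share and in which of those they agree. The dictionary layout and the function name `compare` are illustrative choices, not part of the source.

```python
from typing import Dict, List, Tuple

# Table 1, with absent feature dimensions simply left out of each sample.
samples: Dict[str, Dict[str, object]] = {
    "sample 1": {"gender": "female", "age": 20},
    "sample 2": {"gender": "female", "age": 20},
    "sample 3": {"gender": "male", "age": 30},
    "sample 4": {"education": "undergraduate", "hobby": "travel"},
    "sample 5": {"gender": "male", "education": "undergraduate"},
}


def compare(a: Dict[str, object], b: Dict[str, object]) -> Tuple[List[str], List[str]]:
    # shared: feature dimensions present in both samples
    # same:   shared dimensions whose data are equal in both samples
    shared = sorted(set(a) & set(b))
    same = [d for d in shared if a[d] == b[d]]
    return shared, same


for left, right in [("sample 1", "sample 2"), ("sample 1", "sample 3"),
                    ("sample 1", "sample 4"), ("sample 1", "sample 5"),
                    ("sample 4", "sample 5")]:
    shared, same = compare(samples[left], samples[right])
    print(f"{left} vs {right}: shared={shared}, same data={same}")
```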
It is worth noting that the data of the multiple feature dimensions of each sample can be obtained from a sample database. Specifically, when each sample includes a sample identifier, the data of the multiple feature dimensions of each sample can be obtained from the sample database according to that identifier. The data of the multiple feature dimensions of each sample can be represented in binary. Taking the gender feature dimension as an example, gender can be represented by 1 or 0: female can be represented as 0 and male as 1. For feature dimensions with many possible values, such as age and educational background, multiple feature dimension units can first be defined for the feature dimension. If the data of the feature dimension falls in a given feature dimension unit, that unit is represented as 1 and the other units as 0, and together they represent the data of the feature dimension. Taking age as an example, age can be divided into multiple feature dimension units, such as 0 to 18, 19 to 30, 31 to 40, and 41 to 50 years old. If the age is 20, the 19 to 30 unit is represented as 1 and the other units as 0, which represents the age data. Of course, the data of the multiple feature dimensions of each sample can also be represented in other ways, which the embodiments of this application do not limit.
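The binary representation described above can be sketched as follows. The gender mapping and the four age units come from the text. The text's list of age units ends with "etc.", so the bucket list here is incomplete by assumption, and the function names `encode_gender` and `encode_age` are hypothetical.

```python
from typing import List, Tuple

# Age units named in the text; the text implies more units exist beyond 50.
AGE_UNITS: List[Tuple[int, int]] = [(0, 18), (19, 30), (31, 40), (41, 50)]


def encode_gender(gender: str) -> int:
    # Female is represented as 0 and male as 1, as in the text.
    return {"female": 0, "male": 1}[gender]


def encode_age(age: int) -> List[int]:
    # The unit that contains the age is 1 and every other unit is 0.
    return [1 if low <= age <= high else 0 for low, high in AGE_UNITS]


print(encode_gender("female"))  # 0
print(encode_age(20))           # [0, 1, 0, 0]
```

In this sketch an age outside the listed units produces all zeros. A fuller version would add the remaining units or an explicit catch-all unit.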
In one possible implementation, the computer device can display a modeling type selection interface that includes multiple modeling types. When a selection operation on any modeling type is detected, that modeling type can be determined as the target modeling type. The computer device can then display a data set selection interface that includes multiple sample data sets. When a selection operation on any sample data set is detected, that sample data set can be determined as the sample data set corresponding to the target modeling type. If the sample data sets on the data set selection interface do not include the one the user wants to select, a sample data set can be added in the computer device so that the added sample data set is shown on the data set selection interface. Then, when a selection operation on the added sample data set is detected, the added sample data set can be determined as the sample data set corresponding to the target modeling type.
Illustratively, a modeling type usually represents one scenario, so the modeling type selection interface may also be called a scenario selection interface. Referring to Fig. 4 and Fig. 5, Fig. 4 shows the modeling type selection interface, which includes multiple modeling types: the consumption propensity modeling type, the crowd expansion modeling type, the potential customer assessment modeling type, the user churn prediction modeling type, and the crowd clustering modeling type. When a selection operation on the crowd expansion modeling type is detected, that is, when a click on the "Enter" option in the modeling type selection interface is detected, the computer device may display the data set selection interface shown in Fig. 5. This data set selection interface includes 2 sample data sets: "brand audience click crowd of the second quarter of 2018" and "brand audience click crowd of the first quarter of 2018". Then, when a selection operation on "brand audience click crowd of the second quarter of 2018" is detected, that is, when a click on the "Select" option corresponding to it on the interface is detected, the computer device may determine "brand audience click crowd of the second quarter of 2018" as the sample data set corresponding to the crowd expansion modeling type.
If these 2 sample data sets in the data set selection interface do not include the sample data set to be selected, a sample data set may also be added to the computer device, and information such as the name and path of the added sample data set is determined. Then, when a click on the "Add sample data set" option on the interface is detected, an add pop-up window is displayed on the interface. The add pop-up window includes an edit box for the sample data set. When an edit operation on this edit box is detected, the computer device may determine whether the information obtained after editing, such as the name and path of the sample data set, is consistent with the name, path and other information of the added sample data set. If it is consistent, the sample data set added to the computer device may be displayed in the data set selection interface. At this point, when a selection operation on the added sample data set is detected, that is, when a click on the "Select" option corresponding to the added sample data set is detected, the computer device may determine the added sample data set as the sample data set corresponding to the crowd expansion modeling type.
Under normal circumstances, after the computer device determines any modeling type as the target modeling type, it may directly display the data set selection interface. For example, when a click on an "Enter" option in the modeling type selection interface shown in Fig. 4 is detected, the modeling type to which that "Enter" option belongs may be directly determined as the target modeling type, and the data set selection interface shown in Fig. 5 is then displayed directly. Of course, after determining any modeling type as the target modeling type, the computer device may instead display the data set selection interface only when a selection operation on the data set selection tab is detected. For example, when a selection operation on the data set selection tab on the left side of the modeling type selection interface shown in Fig. 4 is detected, the data set selection interface shown in Fig. 5 is displayed.
Step 302: Select at least one feature dimension from multiple feature dimensions, and select a target training model from the multiple training models stored for the target modeling type.
Under normal circumstances, after the computer device determines the sample data set corresponding to the target modeling type, it may display a feature dimension selection interface that includes multiple feature dimensions, namely the multiple feature dimensions of the sample data set corresponding to the target modeling type. When a selection operation on at least one feature dimension is detected, the selected feature dimensions may be determined as the at least one feature dimension selected from the multiple feature dimensions.
Illustratively, referring to Fig. 6, the feature dimension selection interface shown in Fig. 6 includes multiple feature dimensions: gender, age, education, marital status, and so on. When a selection operation on gender, age and education is detected, that is, when a check operation on the checkboxes in front of gender, age and education is detected, gender, age and education may be determined as the selected at least one feature dimension.
Under normal circumstances, after the computer device determines the sample data set corresponding to the target modeling type, it may directly display the feature dimension selection interface. For example, when a click on a "Select" option in the data set selection interface shown in Fig. 5 is detected, the data set to which that "Select" option belongs may be directly determined as the data set corresponding to the target modeling type, and the feature dimension selection interface shown in Fig. 6 is then displayed directly. Of course, after determining the sample data set corresponding to the target modeling type, the computer device may instead display the feature dimension selection interface only when a selection operation on the feature dimension selection tab is detected. For example, when a selection operation on the feature dimension selection tab on the left side of the data set selection interface shown in Fig. 5 is detected, the feature dimension selection interface shown in Fig. 6 is displayed.
It should be noted that different modeling types may correspond to different sets of training models. For example, when the modeling type is the crowd expansion modeling type, the multiple training models stored for it may include a differential index enhancement model, a One Class Support Vector Machine (One Class SVM) model, a two-layer convolutional neural network (Convolutional Neural Networks (2layers), CNN (2layers)) model, and so on. When the modeling type is the consumption propensity modeling type, the multiple training models stored for it may include a potential customer trade-in hybrid model, a potential customer ring hybrid model, and so on. When the modeling type is the crowd clustering modeling type, the multiple training models stored for it may include a K-means clustering model, and so on. Therefore, one training model can be selected as the target training model from the multiple training models stored for the target modeling type. In addition, the multiple training models may be stored in the life cycle management module 140 of the computer device 100 shown in Fig. 1.
In one possible implementation, the computer device may display a model selection interface that includes the multiple training models stored for the target modeling type. When a selection operation on any training model is detected, that training model may be determined as the target training model.
Illustratively, referring to Fig. 7, the model selection interface shown in Fig. 7 includes 3 training models: the differential index enhancement model, the One Class SVM model and the CNN (2layers) model. When a selection operation on any of these 3 training models is detected, that training model may be determined as the target training model. That is, when a check operation on the checkbox in front of any of these 3 training models is detected, that training model may be determined as the target training model.
Under normal circumstances, after the computer device determines the at least one feature dimension, it may directly display the model selection interface. For example, when a check operation on the checkbox in front of at least one feature dimension in Fig. 6 is detected, the checked feature dimensions may be determined as the at least one feature dimension, and the model selection interface shown in Fig. 7 is then displayed directly. Of course, after the checked feature dimensions are determined as the at least one feature dimension, the model selection interface may instead be displayed when a selection operation on the model selection tab is detected, or when a selection operation on the "Next" option in the feature dimension selection interface is detected.
It is worth noting that, in practical applications, when the multiple training models stored for the target modeling type do not include the required target training model, the operator may add the target training model by storing its code in the computer device, so that the target training model can then be selected from the multiple training models stored for the target modeling type. Alternatively, if the multiple training models stored for the target modeling type include an unnecessary training model, the operator may delete the code of that training model from the computer device, thereby deleting the unnecessary training model. That is, in the embodiments of this application, training models for the target modeling type can be added or deleted according to usage requirements.
Illustratively, taking the implementation environment shown in Fig. 1 as an example, when the multiple training models stored for the target modeling type do not include the required target training model, the operator may store the code of the target training model in the code library of the life cycle management module 140 through the API of the life cycle management module 140 in the computer device 100, thereby adding the target training model. Alternatively, if the multiple training models stored for the target modeling type include an unnecessary training model, the operator may delete the code of that training model from the code library of the life cycle management module 140 through the API of the life cycle management module 140 in the computer device 100, thereby deleting the unnecessary training model.
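The per-modeling-type storage, addition, deletion and selection of training models described above can be pictured as a small registry. The following is only a sketch: the `ModelRegistry` class and its methods are hypothetical illustrations and do not represent the actual API of the life cycle management module 140, which the text does not specify.

```python
from typing import Callable, Dict, List


class ModelRegistry:
    """Hypothetical stand-in for the code library that stores training models."""

    def __init__(self) -> None:
        # modeling type -> {training model name -> factory that builds the model}
        self._models: Dict[str, Dict[str, Callable[[], object]]] = {}

    def add(self, modeling_type: str, name: str, factory: Callable[[], object]) -> None:
        """Store a training model for a modeling type (an existing name is overwritten)."""
        self._models.setdefault(modeling_type, {})[name] = factory

    def remove(self, modeling_type: str, name: str) -> None:
        """Delete a training model; unknown names are ignored."""
        self._models.get(modeling_type, {}).pop(name, None)

    def names(self, modeling_type: str) -> List[str]:
        """List the training models that a model selection interface could show."""
        return sorted(self._models.get(modeling_type, {}))

    def select(self, modeling_type: str, name: str) -> object:
        """Build the chosen target training model."""
        return self._models[modeling_type][name]()


registry = ModelRegistry()
registry.add("crowd expansion", "One Class SVM", dict)  # `dict` is a placeholder factory
registry.add("crowd expansion", "CNN (2layers)", dict)
registry.remove("crowd expansion", "CNN (2layers)")
print(registry.names("crowd expansion"))  # ['One Class SVM']
```

Keeping the models behind one lookup keyed by modeling type is what lets the selection interface list them without the operator writing separate code for each model.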
In some embodiments, each of the multiple training models may include at least one parameter. Therefore, before step 303 is performed, the at least one parameter set for the target training model may also be determined through the following steps (1)-(2).
(1): Display a parameter setting interface that includes at least one parameter edit box.
It should be noted that the parameters of the target training model may include the maximum number of iterations, the convergence, the regularization coefficient, the minimum convergence error, and so on.
In one possible implementation, the parameter setting interface may be a separate interface or a part of the model selection interface. Illustratively, referring to Fig. 8, the parameter setting interface in Fig. 8 is a part of the model selection interface. That is, the interface shown in Fig. 8 displays not only the multiple training models stored for the target modeling type but also at least one parameter edit box.
(2): Obtain, from the at least one parameter edit box, the at least one parameter set for the target training model.
In one possible case, a value is provided in advance in each of the at least one parameter edit box in the parameter setting interface, so the parameters in the edit boxes do not need to be set. In other words, the at least one parameter can be regarded as having default values. For example, when the parameters include the maximum number of iterations, the convergence, the regularization coefficient and the minimum convergence error, the maximum number of iterations may be preset to 500, the convergence to 1, the regularization coefficient to 0, and the minimum convergence error to 0.005. Of course, in this case the default values in the at least one parameter edit box may also be modified.
In another possible case, no value is provided in the at least one parameter edit box in the parameter setting interface. In this case, when an edit operation on the at least one parameter edit box is detected, the at least one parameter obtained after editing is used as the at least one parameter set for the target training model.
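The two cases above (preset defaults that may be modified, or values taken from the edit boxes) can be sketched as follows. The default values are the ones given in the example; the names `TrainingParams` and `params_from_edit_boxes` and the field names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Union


@dataclass(frozen=True)
class TrainingParams:
    # Default values from the example in the text; field names are assumptions.
    max_iterations: int = 500
    convergence: float = 1.0
    regularization_coefficient: float = 0.0
    min_convergence_error: float = 0.005


def params_from_edit_boxes(edited: Dict[str, Union[int, float]]) -> TrainingParams:
    """Start from the defaults and apply only the values the operator edited."""
    return replace(TrainingParams(), **edited)


print(params_from_edit_boxes({}))
print(params_from_edit_boxes({"max_iterations": 200}))
```

An unknown parameter name raises a `TypeError` here, which is one possible way to reject an edit box that does not belong to the selected model.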
Step 303: Train the target training model according to the data, on the at least one feature dimension, of each sample included in the selected sample data set.
In one possible implementation, training the target training model means inputting the data, on the at least one feature dimension, of each sample included in the selected sample data set into the target training model. In some embodiments, the target training model can be regarded as an algorithm, and inputting this data into the target training model means processing the data according to that algorithm. Different target training models process the data, on the at least one feature dimension, of each sample included in the selected sample data set in different ways.
Illustratively, after the data, on the at least one feature dimension, of each sample included in the selected sample data set is input into the target training model, the data of all samples on any feature dimension can be divided into multiple categories. Therefore, for any of the at least one feature dimension, the ratio between the number of samples corresponding to each category of data on that feature dimension and the total number of samples can be determined. These ratios are then determined as the multiple reference values corresponding to the multiple categories of data of that feature dimension.
For example, the selected sample data set includes 1000 samples, and its feature dimensions are gender, age and click behavior. The gender data can be divided into two categories, female and male. Age can be divided into multiple feature dimension units (0~18 years old, 19~30 years old, 31~40 years old and 41~50 years old), that is, the age data can be divided into four categories. The click behavior data can be divided into two categories, clicked and not clicked. The target training model determines that, among these 1000 samples, 600 samples are female, so the ratio between the number of female samples and the total number of samples is 600 divided by 1000, that is, 0.6. 400 samples are male, so the ratio between the number of male samples and the total number of samples is 0.4. The numbers of samples whose age falls in 0~18, 19~30, 31~40 and 41~50 years old are 100, 400, 300 and 200 respectively, so the corresponding ratios to the total number of samples are 0.1, 0.4, 0.3 and 0.2. Similarly, 800 samples have a click behavior of clicked and 200 samples have a click behavior of not clicked, so the ratio for clicked is 0.8 and the ratio for not clicked is 0.2. After determining these ratios, the target training model may determine them as the multiple reference values corresponding to the multiple categories of data of each feature dimension.
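The ratio computation in this example can be sketched as follows. This is an illustration of the counting described above, not the implementation of any particular training model; the function name `reference_values` and the dictionary layout of a sample are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List


def reference_values(
    samples: List[Dict[str, Hashable]], dimensions: List[str]
) -> Dict[str, Dict[Hashable, float]]:
    """For each feature dimension, map each category to (sample count / total samples)."""
    total = len(samples)
    result: Dict[str, Dict[Hashable, float]] = {}
    for dim in dimensions:
        counts = Counter(sample[dim] for sample in samples)
        result[dim] = {category: count / total for category, count in counts.items()}
    return result


# 600 female and 400 male samples, as in the example above.
samples = [{"gender": "female"}] * 600 + [{"gender": "male"}] * 400
refs = reference_values(samples, ["gender"])
print(refs["gender"])  # {'female': 0.6, 'male': 0.4}
```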
In addition, after the computer device starts training the target training model, it may display a model training details interface that includes the information of the target training model and its training progress, so that the training progress of the target training model can be observed through this interface.
Illustratively, assume that the target training model selected in the model selection interface shown in Fig. 7 is the differential index enhancement model. The computer device may then display the model training details interface shown in Fig. 9, which includes the information of the target training model and its training progress. The information of the target training model may include the target modeling type, the selected sample data set, the number of selected feature dimensions, the target training model, the number of times the target training model has been trained, the training progress of the target training model, and so on.
Under normal circumstances, after the computer device determines the target training model, it may directly train the target training model according to the data, on the at least one feature dimension, of each sample included in the selected sample data set. Of course, in some embodiments, the computer device may first display the model training details interface without training the target training model. In this case, the target training model is trained when a selection operation on the "Run" option on the model training details interface is detected. Illustratively, when a click on the "Run" option on the model training details interface shown in Fig. 9 is detected, training of the target training model starts.
It is worth noting that historical training records of other training models may also be displayed on the model training details interface. This allows the operator to grasp the historical training records of the target training model or of other training models more conveniently.
Optionally, during the training of the target training model, a training flowchart of the target training model is displayed. The training flowchart includes multiple training nodes, and the display mode of each training node is a first display mode, a second display mode or a third display mode.
It should be noted that, in some embodiments, the multiple training nodes may include inputting the selected sample data set, determining the reference values corresponding to the data on each feature dimension, and so on. The first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
In one possible implementation, the training flowchart of the target training model may be displayed on the model training details interface. For example, referring to Fig. 10, the training flowchart of the target training model is displayed on the model training details interface shown in Fig. 10.
Illustratively, the first, second and third display modes may be represented by different colors. For example, the first display mode sets the color of a completed training node to grey, the second display mode sets the color of the training node currently in progress to red, and the third display mode sets the color of a training node not yet reached to green. Of course, the first, second and third display modes may also be represented in other forms, which the embodiments of this application do not limit.
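A minimal sketch of how the three display modes could be assigned to the training nodes of the flowchart, assuming the nodes are executed in order. The `DisplayMode` enum and `node_display_modes` function are hypothetical; the colors are those of the example above.

```python
from enum import Enum
from typing import List


class DisplayMode(Enum):
    COMPLETED = "grey"   # first display mode: training node already completed
    CURRENT = "red"      # second display mode: training node currently in progress
    PENDING = "green"    # third display mode: training node not yet reached


def node_display_modes(node_count: int, current_index: int) -> List[DisplayMode]:
    """Assign a display mode to each node, given the index of the node in progress."""
    modes = []
    for index in range(node_count):
        if index < current_index:
            modes.append(DisplayMode.COMPLETED)
        elif index == current_index:
            modes.append(DisplayMode.CURRENT)
        else:
            modes.append(DisplayMode.PENDING)
    return modes


print([mode.value for mode in node_display_modes(4, 1)])
# ['grey', 'red', 'green', 'green']
```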
It is worth noting that displaying the training flowchart while the target training model is being trained lets the operator grasp the training process more intuitively and clearly, that is, the operator can directly observe which step the training of the target training model has reached.
Step 304: Determine an extended data set according to the model obtained after training, a target expansion multiple and optimization target information.
It should be noted that the ratio between the number of samples included in the extended data set and the number of samples included in the selected sample data set is the target expansion multiple. The optimization target information refers to the matching index between the extended data set and the selected sample data set, and may be click behavior, hobbies, regional diversity, and so on. It should be understood that different optimization target information may lead to different extended data sets.
In addition, the target expansion multiple and the optimization target information may be set in the computer device in advance, or may be set on an interface displayed by the computer device before the target training model is trained.
Illustratively, if the target expansion multiple and the optimization target information are set before the target training model is trained, then, referring to Fig. 11, a slider for the expansion multiple and checkboxes for multiple pieces of optimization target information may also be displayed in the model selection interface. The target expansion multiple can be set by dragging the slider, or by clicking the "+" and "-" options. The optimization target information can be set by checking the corresponding checkboxes. After the target training model is trained, the extended data set can be determined according to the model obtained after training, the target expansion multiple and the optimization target information.
Illustratively, if the optimization target is click behavior, then after the multiple reference values corresponding to the multiple categories of data of any of the at least one feature dimension are determined according to step 303, the reference value corresponding to the data of each of the multiple feature dimensions of each sample in a sample database may also be determined. Then, for each sample in the sample database, the reference values corresponding to its multiple feature dimensions are added to obtain the reference score of that sample. All samples in the sample database are sorted in descending order of reference score, and, according to the target expansion multiple, some of the sorted samples are selected to form the extended data set. In this case, the matching index between the extended data set and the selected sample data set is click behavior. In other words, the samples in the extended data set and the samples in the selected sample data set have similar click behavior. If, in the selected sample data set, the ratio between the number of samples whose click behavior is clicked and the total number of samples is 0.8, then in the extended data set this ratio may also be 0.8. If an advertisement is delivered to all samples in the extended data set, 80% of the samples may click on the delivered advertisement and 20% may not.
For example, the optimization target information is click behavior, the target expansion multiple is 10, and the selected sample data set includes 1000 samples, so the extended data set to be determined needs to include 10000 samples. Continuing the example in step 303, the sample database includes 20000 samples, and these 20000 samples include data for the 2 feature dimensions gender and age. If the reference value for female is 0.6 and for male is 0.4, and the reference values for the ages 0~18, 19~30, 31~40 and 41~50 years old are 0.1, 0.4, 0.3 and 0.2 respectively, the reference score of each of the 20000 samples is determined. Specifically, for any one sample, the reference value corresponding to its gender data and the reference value corresponding to its age data are added to obtain the reference score of the sample. The 20000 samples are then sorted in descending order of reference score, and the samples ranked 1 to 10000 are selected to form the extended data set. Since the optimization target information is click behavior, if an advertisement is delivered to the 10000 samples in the extended data set, 8000 samples may click on the delivered advertisement and 2000 samples may not.
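The scoring and ranking procedure in this example can be sketched as follows, using the reference values given above. The function names `reference_score` and `expand` and the four-sample database are hypothetical and serve only to illustrate the steps.

```python
from typing import Dict, List

Sample = Dict[str, str]
ReferenceValues = Dict[str, Dict[str, float]]


def reference_score(sample: Sample, refs: ReferenceValues) -> float:
    """Add up the reference values of the sample's data on each feature dimension."""
    return sum(values.get(sample.get(dim), 0.0) for dim, values in refs.items())


def expand(database: List[Sample], refs: ReferenceValues,
           seed_size: int, multiple: int) -> List[Sample]:
    """Sort by reference score (descending) and keep seed_size * multiple samples."""
    ranked = sorted(database, key=lambda s: reference_score(s, refs), reverse=True)
    return ranked[: seed_size * multiple]


# Reference values from the example in the text.
refs = {
    "gender": {"female": 0.6, "male": 0.4},
    "age": {"0~18": 0.1, "19~30": 0.4, "31~40": 0.3, "41~50": 0.2},
}
# Hypothetical miniature sample database.
database = [
    {"id": "A", "gender": "female", "age": "19~30"},
    {"id": "B", "gender": "male", "age": "0~18"},
    {"id": "C", "gender": "female", "age": "31~40"},
    {"id": "D", "gender": "male", "age": "41~50"},
]
extended = expand(database, refs, seed_size=1, multiple=2)
print([sample["id"] for sample in extended])  # ['A', 'C']
```

In the example of the text, `seed_size` would be 1000 and `multiple` would be 10, so the top 10000 of the 20000 ranked samples form the extended data set.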
Optionally, before step 304, the weights of enhancement indicators may also be set. The enhancement indicators may include purchasing power, subjective interest and browsing history.
It should be noted that each of the 3 enhancement indicators (purchasing power, subjective interest and browsing history) may correspond to multiple related feature dimensions. After the weights of these 3 enhancement indicators are set, for each sample in the extended data set, the reference values corresponding to the data of the feature dimensions related to each enhancement indicator may be multiplied by the weight of that enhancement indicator, so as to determine the importance of the 3 enhancement indicators in the extended data set. That is, the larger the weight, the higher the importance, and the smaller the weight, the lower the importance.
In one possible implementation, referring to Fig. 12, edit boxes for the weights of purchasing power, subjective interest and browsing history may be displayed in the model selection interface. In one possible case, the weights of these 3 enhancement indicators are provided in advance in the edit boxes, so the weights in the edit boxes do not need to be set. Alternatively, the weights of these 3 enhancement indicators are not provided in the edit boxes in the model selection interface. In this case, when an edit operation on the weight edit boxes of these 3 enhancement indicators is detected, the 3 weights obtained after editing are used as the weights of purchasing power, subjective interest and browsing history respectively.
For example, if the weights of purchasing power, subjective interest and browsing history are 0.5, 0.3 and 0.1 respectively, then, for each sample in the sample database, the reference values corresponding to the feature dimensions related to purchasing power are multiplied by 0.5, the reference values corresponding to the feature dimensions related to subjective interest are multiplied by 0.3, and the reference values corresponding to the feature dimensions related to browsing history are multiplied by 0.1. The products are then added to obtain the reference score of each sample.
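The weighted reference score can be sketched as follows, using the weights 0.5, 0.3 and 0.1 from the example. The mapping from enhancement indicators to feature dimensions and the reference values of the sample are hypothetical, since the text does not list which feature dimensions relate to which indicator.

```python
from typing import Dict, List


def weighted_reference_score(
    sample_refs: Dict[str, float],
    indicator_dims: Dict[str, List[str]],
    weights: Dict[str, float],
) -> float:
    """Multiply each dimension's reference value by its indicator weight, then add up."""
    score = 0.0
    for indicator, dims in indicator_dims.items():
        score += weights[indicator] * sum(sample_refs[dim] for dim in dims)
    return score


# Weights from the example in the text.
weights = {"purchasing power": 0.5, "subjective interest": 0.3, "browsing history": 0.1}
# Hypothetical mapping of enhancement indicators to related feature dimensions.
indicator_dims = {
    "purchasing power": ["income"],
    "subjective interest": ["hobby"],
    "browsing history": ["click behavior"],
}
# Hypothetical reference values of one sample on those feature dimensions.
sample_refs = {"income": 0.4, "hobby": 0.2, "click behavior": 0.8}

score = weighted_reference_score(sample_refs, indicator_dims, weights)
print(round(score, 2))  # 0.34
```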
Optionally, the following steps A and B may also be included after step 304.
Step A: Display an assessment result.
It should be noted that the assessment result is used to assess the extended data set determined by the model obtained after training. The assessment result may include multiple evaluation indexes, which can be represented by a receiver operating characteristic (ROC) curve graph, a precision versus recall (P-R) curve graph, a curve graph of the optimization target against the crowd expansion multiple, a distribution map of the samples in the selected sample data set and the samples in the extended data set, and so on. In addition, the multiple evaluation indexes may also include accuracy, which is used to assess how accurate the extended data set is: the higher the accuracy, the more accurate the extended data set, and the lower the accuracy, the less accurate the extended data set.
The ROC curve graph takes the false positive rate as the horizontal axis and the true positive rate as the vertical axis. Each point on the ROC curve reflects the sensitivity of the extended data set to the same signal stimulus. The false positive rate refers to the ratio, in the extended data set, between the number of negative samples predicted as positive by the target training model and the number of actual negative samples. The true positive rate refers to the ratio, in the extended data set, between the number of positive samples predicted as positive by the target training model and the number of actual positive samples. Positive samples and negative samples are two different kinds of samples divided according to a certain classification method. The larger the area between the ROC curve and the horizontal and vertical axes (Area Under Curve, AUC), the higher the quality of the extended data set, and the smaller the area, the lower the quality of the extended data set.
The P-R curve graph takes the recall rate as the horizontal axis and the precision rate as the vertical axis. The recall rate refers to the ratio, in the extended data set, between the positive samples predicted as positive by the target training model and all positive samples. The precision rate refers to the ratio, in the extended data set, between the positive samples predicted as positive by the target training model and all samples predicted as positive. The larger the area between the P-R curve and the horizontal and vertical axes, the higher the quality of the extended data set, and the smaller the area, the lower the quality of the extended data set.
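The four ratios defined above for the ROC and P-R curve graphs can be computed from actual and predicted labels as in the following sketch. The function name `rates` and the five-sample example are hypothetical; each call yields a single point, and a full curve would be obtained by repeating the computation at different decision thresholds.

```python
from typing import Dict, Sequence


def rates(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compute the true/false positive rates, precision and recall for one labelling."""
    tp = sum(a and p for a, p in zip(actual, predicted))          # positive, predicted positive
    fp = sum((not a) and p for a, p in zip(actual, predicted))    # negative, predicted positive
    fn = sum(a and (not p) for a, p in zip(actual, predicted))    # positive, predicted negative
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 when the denominator is empty instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "true_positive_rate": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


actual = [True, True, True, False, False]
predicted = [True, True, False, True, False]
print(rates(actual, predicted))
```

In this example there are 2 true positives, 1 false negative, 1 false positive and 1 true negative, so the false positive rate is 0.5 and the true positive rate, precision and recall are each 2/3. Note that recall and the true positive rate are the same quantity.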
The curve graph of the optimization target against the crowd expansion multiple takes the expansion multiple as the horizontal axis and the click rate as the vertical axis. In general, the larger the expansion multiple, the lower the click rate. This can also be understood as: the larger the expansion multiple, the lower the quality of the extended data set.
The distribution map of the samples in the selected sample data set and the samples in the extended data set is obtained by mapping the samples of both data sets onto a two-dimensional plane using a certain technique, so that the similarity between the selected sample data set and the extended data set can be observed intuitively.
In one possible implementation, the assessment result may be displayed on the model training details interface. Specifically, when an assessment result display operation is detected, the assessment result is displayed on the model training details interface. For example, referring to Fig. 9, when a click on the "Assess" option of the model training details interface shown in Fig. 9 is detected, the assessment result may be displayed on that interface.
It is worth noting that the assessment result is data reflecting the quality of the extended data set determined by the model obtained after training; that is, the better the assessment result, the higher the quality of the determined extended data set. Displaying the assessment result helps the operator judge the quality of the determined extended data set intuitively. Moreover, because the assessment result is displayed on an interface of the computer device, the operator only needs to look at it and does not need other tools to obtain it, which is simpler and saves effort.
Step B: adjust, according to the assessment result, at least one parameter included in the target training model.
In one possible implementation, at least one parameter included in the target training model may be adjusted according to the multiple evaluation indexes included in the assessment result, so that the assessment result of the extended data set determined by the target training model better meets a preset requirement.
Step 305: deploy a strategy for each sample included in the extended data set.
In one possible implementation, a strategy may be deployed for each sample included in the extended data set according to the optimization target information. For example, suppose the optimization target information is click behavior. Each sample in the selected sample data set may be a sample to which an advertisement has already been delivered; that is, each sample in the selected sample data set either clicked or did not click a delivered advertisement. In this case, the same advertisement may be delivered to the extended data set. The ratio of the number of samples in the extended data set that click the advertisement to the total number of samples in the extended data set is then similar to the corresponding ratio in the selected sample data set. Likewise, the ratio of the number of samples in the extended data set that do not click the advertisement to the total number of samples in the extended data set is similar to the corresponding ratio in the selected sample data set. In other words, the click behavior of the samples in the extended data set toward the delivered advertisement is similar to the click behavior of the samples in the selected sample data set toward the delivered advertisement.
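The similarity of click ratios described above can be checked with a few lines of arithmetic. This is only a sketch: the source does not state how similar the ratios must be, so the `tolerance` parameter, the function names and the click data below are assumptions.

```python
def click_rate(clicked_flags):
    """Share of samples that clicked the advertisement (1 = clicked, 0 = not clicked)."""
    return sum(clicked_flags) / len(clicked_flags)


def behaves_similarly(seed_clicks, extended_clicks, tolerance=0.1):
    """True when the two click rates differ by no more than `tolerance` (hypothetical threshold)."""
    return abs(click_rate(seed_clicks) - click_rate(extended_clicks)) <= tolerance


# Hypothetical outcomes after delivering the same advertisement to both sets.
seed_clicks = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]          # 3 of 10 samples clicked
extended_clicks = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
                   0, 1, 0, 0, 0, 1, 0, 0, 1, 0]      # 5 of 20 samples clicked
similar = behaves_similarly(seed_clicks, extended_clicks)
print(click_rate(seed_clicks), click_rate(extended_clicks), similar)
```

The non-click ratio is one minus the click ratio, so checking one of the two ratios covers both statements in the paragraph above.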
In one possible implementation, after the computer device determines the extended data set, it may store the extended data set and display a deployment interface. The deployment interface includes relevant information about the extended data set and a "Deploy strategy" option for the extended data set. Specifically, after the computer device determines the extended data set, it displays the deployment interface when a selection operation on the deployment tab is detected, or when a selection operation on the "Next" option of the model training details interface is detected.
For example, referring to Fig. 13, the deployment interface shown in Fig. 13 includes relevant information about the extended data set: the target modeling type, the target training model, the data processing mode and the evaluation indexes. The data processing mode is the way the target training model processes the selected sample data set, as mentioned in step 303 above. The evaluation indexes are those mentioned in step A above, including AUC, accuracy and so on. When a click on the "Deploy strategy" option of the deployment interface is detected, a strategy may be deployed for each sample included in the extended data set. Specifically, the extended data set may be deployed in a production environment, and a corresponding strategy is deployed for each sample included in the extended data set according to the usage requirements of that production environment. Of course, a strategy may also be deployed for each sample included in the extended data set in other ways, which the embodiments of this application do not limit.
In the embodiments of this application, a sample data set corresponding to the target modeling type is first selected from multiple stored sample data sets; the selected sample data set includes multiple samples. Because the selected sample data set includes multiple feature dimensions, at least one feature dimension can be selected from them, and a target training model is selected from the multiple training models stored for the target modeling type. The target training model is then trained on the data of each sample of the selected sample data set in the at least one feature dimension. Based on the model obtained after training, the extended data set is determined according to the target expansion multiple and the optimization target information. Because a matching index exists between the extended data set and the selected sample data set, the two data sets are similar. Finally, a strategy can be deployed for each sample included in the extended data set. In the embodiments of this application, the multiple training models for the target modeling type are stored in the computer device in advance, so when a target model among them needs to be trained, it can be selected directly from the multiple training models and then trained on the data of each sample of the selected sample data set in the at least one feature dimension. In addition, the target training model may be trained multiple times, or different training models may each be used as the target training model and trained. That is, with the data processing method provided by the embodiments of this application, the operator does not need to write different code to implement the multiple training models and does not need a coding background, which simplifies the training of the target training model and makes the process of determining the extended data set simpler and more efficient.
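The idea of selecting a pre-stored training model instead of writing new code can be sketched as a lookup in a registry keyed by modeling type. Everything named below (`MODEL_REGISTRY`, `MeanCutoffModel`, the modeling type `"lookalike"`, the feature names) is hypothetical and only illustrates the flow of the method; the source does not specify how the models are stored or what they compute.

```python
class MeanCutoffModel:
    """Toy stand-in for a stored training model: learns a cut-off on one feature."""

    def fit(self, rows, labels):
        positives = [row[0] for row, y in zip(rows, labels) if y == 1]
        negatives = [row[0] for row, y in zip(rows, labels) if y == 0]
        # Cut-off halfway between the class means of the first selected feature.
        self.cutoff = (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2
        return self

    def score(self, row):
        return 1.0 if row[0] >= self.cutoff else 0.0


# Training models stored in advance, grouped by modeling type (assumed layout).
MODEL_REGISTRY = {"lookalike": {"mean_cutoff": MeanCutoffModel}}


def train_target_model(modeling_type, model_name, samples, labels, feature_dims):
    """Select a stored model and train it on the chosen feature dimensions only."""
    model_cls = MODEL_REGISTRY[modeling_type][model_name]
    rows = [[sample[dim] for dim in feature_dims] for sample in samples]
    return model_cls().fit(rows, labels)


samples = [{"age": 20, "visits": 1}, {"age": 40, "visits": 9}, {"age": 30, "visits": 5}]
labels = [0, 1, 1]
model = train_target_model("lookalike", "mean_cutoff", samples, labels, ["visits"])
print(model.cutoff, model.score([7]), model.score([2]))
```

Under this layout the operator only supplies names and feature dimensions; trying a different stored model means changing `model_name`, not writing new training code.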
Figure 14 is a block diagram of a data processing device provided by an embodiment of this application, applied to a computer device. Referring to Fig. 14, the device includes a first selection module 1401, a second selection module 1402, a training module 1403 and a determining module 1404.
First selection module 1401, configured to select, from multiple stored sample data sets, a sample data set corresponding to a target modeling type, where the selected sample data set includes multiple samples and each sample includes data in multiple feature dimensions;
Second selection module 1402, configured to select at least one feature dimension from the multiple feature dimensions and to select a target training model from the multiple training models stored for the target modeling type;
Training module 1403, configured to train the target training model according to the data of each sample of the selected sample data set in the at least one feature dimension;
Determining module 1404, configured to determine, based on the model obtained after training, an extended data set according to a target expansion multiple and optimization target information, where the ratio of the number of samples included in the extended data set to the number of samples included in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the extended data set and the selected sample data set.
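The size constraint on the extended data set (its sample count divided by the count of the selected sample data set equals the target expansion multiple) can be illustrated as follows. The source does not say how the trained model picks the samples; ranking a candidate pool by model score and keeping the top entries is an assumption made for this sketch, as are all names and data.

```python
def determine_extended_set(score, candidates, seed_size, target_multiple):
    """Keep the best-scoring candidates so that len(result) / seed_size == target_multiple."""
    wanted = int(seed_size * target_multiple)
    if wanted > len(candidates):
        raise ValueError("candidate pool is too small for the requested multiple")
    # Assumed selection rule: a higher model score means a closer match to the seed set.
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:wanted]


# Hypothetical candidate pool; "visits" stands in for the trained model's score input.
candidates = [{"id": i, "visits": v} for i, v in enumerate([3, 8, 1, 9, 5, 7, 2, 6])]
extended = determine_extended_set(
    score=lambda sample: sample["visits"],
    candidates=candidates,
    seed_size=2,
    target_multiple=3,
)
print([sample["id"] for sample in extended])
```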
Optionally, the device further includes:
First display module, configured to display a parameter setting interface, where the parameter setting interface includes at least one parameter edit box;
Obtaining module, configured to obtain, from the at least one parameter edit box, at least one parameter set for the target training model.
Optionally, the device further includes:
Second display module, configured to display an assessment result, where the assessment result is used to assess the extended data set determined by the model obtained after training;
Adjusting module, configured to adjust, according to the assessment result, at least one parameter included in the target training model.
Optionally, the device further includes:
Third display module, configured to display, during the training of the target training model, a training flow chart of the target training model, where the training flow chart includes multiple training nodes and the display mode of each training node is a first display mode, a second display mode or a third display mode. The first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
Optionally, the device further includes:
Deployment module, configured to deploy a strategy for each sample included in the extended data set.
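The three display modes of the training flow chart described for the third display module map naturally onto a small state enumeration. The sketch below is illustrative only; the node names and the rule that nodes are visited in list order are assumptions not stated in the source.

```python
from enum import Enum


class DisplayMode(Enum):
    COMPLETED = "first display mode"      # training node already completed
    IN_PROGRESS = "second display mode"   # training is currently at this node
    NOT_REACHED = "third display mode"    # training node not yet reached


def display_modes(nodes, current_index):
    """Assign a display mode to every training node, assuming nodes run in list order."""
    modes = {}
    for index, node in enumerate(nodes):
        if index < current_index:
            modes[node] = DisplayMode.COMPLETED
        elif index == current_index:
            modes[node] = DisplayMode.IN_PROGRESS
        else:
            modes[node] = DisplayMode.NOT_REACHED
    return modes


# Hypothetical training nodes; training is currently at the third node.
nodes = ["data split", "feature processing", "model training", "evaluation"]
modes = display_modes(nodes, current_index=2)
for node, mode in modes.items():
    print(node, "->", mode.value)
```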
In the embodiments of this application, a sample data set corresponding to the target modeling type is first selected from multiple stored sample data sets; the selected sample data set includes multiple samples. Because the selected sample data set includes multiple feature dimensions, at least one feature dimension can be selected from them, and a target training model is selected from the multiple training models stored for the target modeling type. The target training model is then trained on the data of each sample of the selected sample data set in the at least one feature dimension. Based on the model obtained after training, the extended data set is determined according to the target expansion multiple and the optimization target information. Because a matching index exists between the extended data set and the selected sample data set, the two data sets are similar. In the embodiments of this application, the multiple training models for the target modeling type are stored in the computer device in advance, so when a target model among them needs to be trained, it can be selected directly from the multiple training models and then trained on the data of each sample of the selected sample data set in the at least one feature dimension. In addition, the target training model may be trained multiple times, or different training models may each be used as the target training model and trained. That is, with the data processing method provided by the embodiments of this application, the operator does not need to write different code to implement the multiple training models and does not need a coding background, which simplifies the training of the target training model and makes the process of determining the extended data set simpler and more efficient.
It should be noted that when the data processing device provided by the above embodiment performs data processing, the division into the functional modules described above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the data processing device provided by the above embodiment and the data processing method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
Figure 15 is a schematic structural diagram of a data processing device provided by an embodiment of this application. The data processing device 1500 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPU) 1501 and one or more memories 1502, where the memory 1502 stores at least one instruction that is loaded and executed by the processor 1501 to implement the data processing method in the above embodiments. Of course, the data processing device 1500 may also have components such as a wired or wireless network interface, a keyboard and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions, where the instructions can be executed by the processor of the data processing device to carry out the data processing method in the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of this application and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of this application shall fall within the scope of protection of this application.
Claims (10)
1. A data processing method, characterized in that the method comprises:
selecting, from multiple stored sample data sets, a sample data set corresponding to a target modeling type, the selected sample data set including multiple samples, each sample including data in multiple feature dimensions;
selecting at least one feature dimension from the multiple feature dimensions, and selecting a target training model from multiple training models stored for the target modeling type;
training the target training model according to the data of each sample of the selected sample data set in the at least one feature dimension;
determining, based on the model obtained after training, an extended data set according to a target expansion multiple and optimization target information, wherein the ratio of the number of samples included in the extended data set to the number of samples included in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the extended data set and the selected sample data set.
2. The method according to claim 1, characterized in that, before the training of the target training model according to the data of each sample of the selected sample data set in the at least one feature dimension, the method further comprises:
displaying a parameter setting interface, the parameter setting interface including at least one parameter edit box;
obtaining, from the at least one parameter edit box, at least one parameter set for the target training model.
3. The method according to claim 1 or 2, characterized in that, after the determining, based on the model obtained after training, of the extended data set according to the target expansion multiple and the optimization target information, the method further comprises:
displaying an assessment result, the assessment result being used to assess the extended data set determined by the model obtained after training;
adjusting, according to the assessment result, at least one parameter included in the target training model.
4. The method according to claim 1, characterized in that the method further comprises:
displaying, during the training of the target training model, a training flow chart of the target training model, wherein the training flow chart includes multiple training nodes, the display mode of each training node is a first display mode, a second display mode or a third display mode, the first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
5. The method according to claim 1, characterized in that, after the determining, based on the model obtained after training, of the extended data set according to the target expansion multiple and the optimization target information, the method further comprises:
deploying a strategy for each sample included in the extended data set.
6. A data processing device, characterized in that the device comprises:
a first selection module, configured to select, from multiple stored sample data sets, a sample data set corresponding to a target modeling type, the selected sample data set including multiple samples, each sample including data in multiple feature dimensions;
a second selection module, configured to select at least one feature dimension from the multiple feature dimensions, and to select a target training model from multiple training models stored for the target modeling type;
a training module, configured to train the target training model according to the data of each sample of the selected sample data set in the at least one feature dimension;
a determining module, configured to determine, based on the model obtained after training, an extended data set according to a target expansion multiple and optimization target information, wherein the ratio of the number of samples included in the extended data set to the number of samples included in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the extended data set and the selected sample data set.
7. The device according to claim 6, characterized in that the device further comprises:
a first display module, configured to display a parameter setting interface, the parameter setting interface including at least one parameter edit box;
an obtaining module, configured to obtain, from the at least one parameter edit box, at least one parameter set for the target training model.
8. The device according to claim 6 or 7, characterized in that the device further comprises:
a second display module, configured to display an assessment result, the assessment result being used to assess the extended data set determined by the model obtained after training;
an adjusting module, configured to adjust, according to the assessment result, at least one parameter included in the target training model.
9. A data processing device, characterized in that the device comprises:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing instructions, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910361278.4A CN110222710B (en) | 2019-04-30 | 2019-04-30 | Data processing method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910361278.4A CN110222710B (en) | 2019-04-30 | 2019-04-30 | Data processing method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110222710A true CN110222710A (en) | 2019-09-10 |
CN110222710B CN110222710B (en) | 2022-03-08 |
Family
ID=67820210
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910361278.4A Active CN110222710B (en) | 2019-04-30 | 2019-04-30 | Data processing method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110222710B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717535A (en) * | 2019-09-30 | 2020-01-21 | 北京九章云极科技有限公司 | Automatic modeling method and system based on data analysis processing system |
CN112613983A (en) * | 2020-12-25 | 2021-04-06 | 北京知因智慧科技有限公司 | Feature screening method and device in machine modeling process and electronic equipment |
US11367019B1 (en) * | 2020-11-30 | 2022-06-21 | Shanghai Icekredit, Inc. | Data processing method and apparatus, and computer device |
US11651380B1 (en) * | 2022-03-30 | 2023-05-16 | Intuit Inc. | Real-time propensity prediction using an ensemble of long-term and short-term user behavior models |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8756175B1 (en) * | 2012-02-22 | 2014-06-17 | Google Inc. | Robust and fast model fitting by adaptive sampling |
CN103902968A (en) * | 2014-02-26 | 2014-07-02 | 中国人民解放军国防科学技术大学 | Pedestrian detection model training method based on AdaBoost classifier |
CN104166706A (en) * | 2014-08-08 | 2014-11-26 | 苏州大学 | Multi-label classifier constructing method based on cost-sensitive active learning |
CN107169575A (en) * | 2017-06-27 | 2017-09-15 | 北京天机数测数据科技有限公司 | A kind of modeling and method for visualizing machine learning training pattern |
US20170300783A1 (en) * | 2016-04-13 | 2017-10-19 | Xerox Corporation | Target domain characterization for data augmentation |
CN107958268A (en) * | 2017-11-22 | 2018-04-24 | 用友金融信息技术股份有限公司 | The training method and device of a kind of data model |
CN108230296A (en) * | 2017-11-30 | 2018-06-29 | 腾讯科技(深圳)有限公司 | The recognition methods of characteristics of image and device, storage medium, electronic device |
US20180247227A1 (en) * | 2017-02-24 | 2018-08-30 | Xtract Technologies Inc. | Machine learning systems and methods for data augmentation |
CN108664975A (en) * | 2018-04-24 | 2018-10-16 | 新疆大学 | A kind of hand-written Letter Identification Method of Uighur, system and electronic equipment |
US20180350347A1 (en) * | 2017-05-31 | 2018-12-06 | International Business Machines Corporation | Generation of voice data as data augmentation for acoustic model training |
CN109389143A (en) * | 2018-06-19 | 2019-02-26 | 北京九章云极科技有限公司 | A kind of Data Analysis Services system and method for automatic modeling |
2019-04-30 CN CN201910361278.4A patent/CN110222710B/en active Active
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8756175B1 (en) * | 2012-02-22 | 2014-06-17 | Google Inc. | Robust and fast model fitting by adaptive sampling |
CN103902968A (en) * | 2014-02-26 | 2014-07-02 | 中国人民解放军国防科学技术大学 | Pedestrian detection model training method based on AdaBoost classifier |
CN104166706A (en) * | 2014-08-08 | 2014-11-26 | 苏州大学 | Multi-label classifier constructing method based on cost-sensitive active learning |
US20170300783A1 (en) * | 2016-04-13 | 2017-10-19 | Xerox Corporation | Target domain characterization for data augmentation |
US20180247227A1 (en) * | 2017-02-24 | 2018-08-30 | Xtract Technologies Inc. | Machine learning systems and methods for data augmentation |
US20180350347A1 (en) * | 2017-05-31 | 2018-12-06 | International Business Machines Corporation | Generation of voice data as data augmentation for acoustic model training |
CN107169575A (en) * | 2017-06-27 | 2017-09-15 | 北京天机数测数据科技有限公司 | A kind of modeling and method for visualizing machine learning training pattern |
CN107958268A (en) * | 2017-11-22 | 2018-04-24 | 用友金融信息技术股份有限公司 | The training method and device of a kind of data model |
CN108230296A (en) * | 2017-11-30 | 2018-06-29 | 腾讯科技(深圳)有限公司 | The recognition methods of characteristics of image and device, storage medium, electronic device |
CN108664975A (en) * | 2018-04-24 | 2018-10-16 | 新疆大学 | A kind of hand-written Letter Identification Method of Uighur, system and electronic equipment |
CN109389143A (en) * | 2018-06-19 | 2019-02-26 | 北京九章云极科技有限公司 | A kind of Data Analysis Services system and method for automatic modeling |
Non-Patent Citations (2)
Title |
---|
LUKE TAYLOR等: "Improving Deep Learning with Generic Data Augmentation", 《SSCI》 * |
何成栋 等: "基于颜色属性的光谱重建训练样本正交优化", 《包装学报》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717535A (en) * | 2019-09-30 | 2020-01-21 | 北京九章云极科技有限公司 | Automatic modeling method and system based on data analysis processing system |
CN110717535B (en) * | 2019-09-30 | 2020-09-11 | 北京九章云极科技有限公司 | Automatic modeling method and system based on data analysis processing system |
US11367019B1 (en) * | 2020-11-30 | 2022-06-21 | Shanghai Icekredit, Inc. | Data processing method and apparatus, and computer device |
CN112613983A (en) * | 2020-12-25 | 2021-04-06 | 北京知因智慧科技有限公司 | Feature screening method and device in machine modeling process and electronic equipment |
CN112613983B (en) * | 2020-12-25 | 2023-11-21 | 北京知因智慧科技有限公司 | Feature screening method and device in machine modeling process and electronic equipment |
US11651380B1 (en) * | 2022-03-30 | 2023-05-16 | Intuit Inc. | Real-time propensity prediction using an ensemble of long-term and short-term user behavior models |
Also Published As
Publication number | Publication date |
---|---|
CN110222710B (en) | 2022-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110222710A (en) | Data processing method, device and storage medium | |
Davis et al. | Clearing the FOG: Fuzzy, overlapping groups for social networks | |
Yang et al. | Predicting links in multi-relational and heterogeneous networks | |
Studer | WeightedCluster library manual | |
CN102737333B (en) | For calculating user and the offer order engine to the coupling of small segmentation | |
Gilbert et al. | Communities and hierarchical structures in dynamic social networks: analysis and visualization | |
KR102412461B1 (en) | Method for predicting demand using visual schema of product and system thereof | |
CN106709037B (en) | A kind of film recommended method based on Heterogeneous Information network | |
Li et al. | A link clustering based memetic algorithm for overlapping community detection | |
Yao et al. | Predicting academic performance via semi-supervised learning with constructed campus social network | |
Altman et al. | ORA user’s guide 2020 | |
Usman et al. | Interactive spatial analytics for human-aware building design | |
Santiago et al. | A methodology for the characterization of flow conductivity through the identification of communities in samples of fractured rocks | |
Zhang et al. | An innovation service system and personalized recommendation for customer-product interaction life cycle in smart product service system | |
Zhou et al. | An overlapping community detection algorithm in complex networks based on information theory | |
Richards et al. | Assessing the functional coherence of gene sets with metrics based on the Gene Ontology graph | |
Lyu et al. | IF-City: Intelligible fair city planning to measure, explain and mitigate inequality | |
Martínez-Torres et al. | Identifying the features of reputable users in eWOM communities by using Particle Swarm Optimization | |
CN118134553B (en) | E-commerce and explosion type multi-platform collaborative pushing system, method, equipment and medium | |
Kimani et al. | VidaMine: a visual data mining environment | |
Žalik et al. | A local multiresolution algorithm for detecting communities of unbalanced structures | |
de Vries et al. | Relative neighborhood graphs uncover the dynamics of social media engagement | |
Aziz et al. | Implementing Aproiri Algorithm for Predicting Result Analysis | |
Niwa et al. | Visual data mining using a constellation graph | |
Yang et al. | Risk Factors Discovery for Cancer Survivability Analysis Using Graph‐Rule Mining |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||