CN110222710A - Data processing method, device and storage medium - Google Patents

Data processing method, device and storage medium

Info

Publication number
CN110222710A
Authority
CN
China
Prior art keywords
sample
target
training
selection
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910361278.4A
Other languages
Chinese (zh)
Other versions
CN110222710B (en)
Inventor
马纯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shenyan Intelligent Technology Co Ltd
Original Assignee
Beijing Shenyan Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenyan Intelligent Technology Co Ltd
Priority to CN201910361278.4A
Publication of CN110222710A
Application granted
Publication of CN110222710B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application discloses a data processing method, device and storage medium, belonging to the field of data processing. The method includes: first, selecting the sample data set corresponding to a target modeling type from multiple stored sample data sets. Then, selecting at least one feature dimension from multiple feature dimensions, and selecting a target training model from multiple training models stored for the target modeling type. Next, training the target training model according to the data, on the at least one feature dimension, of each sample included in the selected sample data set. Finally, determining an extended data set according to the model obtained after training, a target expansion multiple and optimization target information. When a target model needs to be trained, it can be selected directly from the multiple training models, without an operator writing different code for the multiple training models. This simplifies the training process and makes determining the extended data set more efficient.

Description

Data processing method, device and storage medium
Technical field
The present application relates to the field of data processing, and in particular to a data processing method, device and storage medium.
Background
Machine learning techniques are commonly used for data mining. That is, a model to be trained is trained on an acquired sample data set, and other data is then mined using the model obtained after training. In the marketing field, for example, a model to be trained can be trained on an acquired marketing sample data set, and other data can then be mined using the trained model, which facilitates the formulation of marketing strategies. At present, the whole machine learning process is implemented mainly through code written by an operator. In other words, implementing the whole process requires the operator to have some coding background, which makes it difficult to implement.
Summary of the invention
The embodiments of the present application provide a data processing method, device and storage medium, which can solve the problem in the related art that implementing the whole machine learning process requires the operator to have some coding background, making that process difficult to implement. The technical solution is as follows:
In a first aspect, a data processing method is provided, the method comprising:
selecting, from multiple stored sample data sets, the sample data set corresponding to a target modeling type, where the selected sample data set includes multiple samples and each sample includes data of multiple feature dimensions;
selecting at least one feature dimension from the multiple feature dimensions, and selecting a target training model from multiple training models stored for the target modeling type;
training the target training model according to the data, on the at least one feature dimension, of each sample included in the selected sample data set;
determining an extended data set according to the model obtained after training, a target expansion multiple and optimization target information, where the ratio of the number of samples in the extended data set to the number of samples in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the extended data set and the selected sample data set.
Optionally, before the target training model is trained according to the data, on the at least one feature dimension, of each sample included in the selected sample data set, the method further includes:
displaying a parameter setting interface, where the parameter setting interface includes at least one parameter edit box;
obtaining, from the at least one parameter edit box, at least one parameter set for the target training model.
Optionally, after the extended data set is determined according to the model obtained after training, the target expansion multiple and the optimization target information, the method further includes:
displaying an evaluation result, where the evaluation result is used to evaluate the extended data set determined by the model obtained after training;
adjusting, according to the evaluation result, at least one parameter included in the target training model.
Optionally, the method further includes:
displaying, during training of the target training model, a training flow chart of the target training model, where the training flow chart includes multiple training nodes and each training node is displayed in a first display mode, a second display mode or a third display mode; the first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
Optionally, after the extended data set is determined according to the model obtained after training, the target expansion multiple and the optimization target information, the method further includes:
deploying a strategy for each sample included in the extended data set.
In a second aspect, a data processing device is provided, the device comprising:
a first selection module, configured to select, from multiple stored sample data sets, the sample data set corresponding to a target modeling type, where the selected sample data set includes multiple samples and each sample includes data of multiple feature dimensions;
a second selection module, configured to select at least one feature dimension from the multiple feature dimensions, and to select a target training model from multiple training models stored for the target modeling type;
a training module, configured to train the target training model according to the data, on the at least one feature dimension, of each sample included in the selected sample data set;
a determining module, configured to determine an extended data set according to the model obtained after training, a target expansion multiple and optimization target information, where the ratio of the number of samples in the extended data set to the number of samples in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the extended data set and the selected sample data set.
Optionally, the device further includes:
a first display module, configured to display a parameter setting interface, where the parameter setting interface includes at least one parameter edit box;
an obtaining module, configured to obtain, from the at least one parameter edit box, at least one parameter set for the target training model.
Optionally, the device further includes:
a second display module, configured to display an evaluation result, where the evaluation result is used to evaluate the extended data set determined by the model obtained after training;
an adjusting module, configured to adjust, according to the evaluation result, at least one parameter included in the target training model.
Optionally, the device further includes:
a third display module, configured to display, during training of the target training model, a training flow chart of the target training model, where the training flow chart includes multiple training nodes and each training node is displayed in a first display mode, a second display mode or a third display mode; the first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
Optionally, the device further includes:
a deployment module, configured to deploy a strategy for each sample included in the extended data set.
In a third aspect, a data processing device is provided, the device comprising:
a processor;
a memory for storing instructions executable by the processor;
where the processor is configured to perform the steps of any one of the methods described in the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which instructions are stored; when executed by a processor, the instructions implement the steps of any one of the methods described in the first aspect.
In a fifth aspect, a computer program product including instructions is provided; when run on a computer, it causes the computer to perform the steps of any one of the methods described in the first aspect.
The technical solutions provided by the embodiments of the present application bring at least the following beneficial effects:
In the embodiments of the present application, the sample data set corresponding to the target modeling type is first selected from multiple stored sample data sets, and the selected sample data set includes multiple samples. Because the selected sample data set covers multiple feature dimensions, at least one feature dimension can be selected from them, and a target training model can be selected from the multiple training models stored for the target modeling type. The target training model is then trained according to the data, on the at least one feature dimension, of each sample included in the selected sample data set. An extended data set is determined according to the model obtained after training, the target expansion multiple and the optimization target information. Because a matching index exists between the extended data set and the selected sample data set, the two data sets are similar. Because the multiple training models for the target modeling type are stored in the computer device in advance, when a target model among them needs to be trained, it can be selected directly from the multiple training models and then trained according to the data, on the at least one feature dimension, of each sample included in the selected sample data set. In addition, the target training model can be trained multiple times, or different training models can be used as the target training model and trained in turn. That is, with the data processing method provided by the embodiments of the present application, the operator does not need to write different code for the multiple training models and does not need any coding background, which simplifies the training of the target training model and in turn makes determining the extended data set simpler and more efficient.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
Fig. 2 is a flow chart of a first data processing method provided by an embodiment of the present application.
Fig. 3 is a flow chart of a second data processing method provided by an embodiment of the present application.
Fig. 4 is a schematic diagram of a first interface provided by an embodiment of the present application.
Fig. 5 is a schematic diagram of a second interface provided by an embodiment of the present application.
Fig. 6 is a schematic diagram of a third interface provided by an embodiment of the present application.
Fig. 7 is a schematic diagram of a fourth interface provided by an embodiment of the present application.
Fig. 8 is a schematic diagram of a fifth interface provided by an embodiment of the present application.
Fig. 9 is a schematic diagram of a sixth interface provided by an embodiment of the present application.
Fig. 10 is a schematic diagram of a seventh interface provided by an embodiment of the present application.
Fig. 11 is a schematic diagram of an eighth interface provided by an embodiment of the present application.
Fig. 12 is a schematic diagram of a ninth interface provided by an embodiment of the present application.
Fig. 13 is a schematic diagram of a tenth interface provided by an embodiment of the present application.
Fig. 14 is a block diagram of a data processing device provided by an embodiment of the present application.
Fig. 15 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
Detailed description of the embodiments
Example embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following example embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the present application.
Before the embodiments of the present application are explained in detail, the implementation environment of the embodiments is introduced first:
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to Fig. 1, the implementation environment includes a computer device 100, which includes an output device 110, an input device 120, a business logic module 130, a life cycle management module 140 and an algorithm scheduling module 150.
The output device 110 can communicate with the business logic module 130 and can be used to display multiple interfaces. The output device 110 may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
The input device 120 can communicate with the business logic module 130 and can receive user input in a variety of ways. For example, the input device 120 may be a mouse, a keyboard, a touch screen device, a sensing device, or the like.
The business logic module 130 can receive the user's operation information, generate a task of training the target training model, and write the task to the life cycle management module 140.
The life cycle management module 140 includes multiple application programming interfaces (APIs) and can store the tasks of training the target training model that are written by the business logic module 130. Through the API of the life cycle management module 140, information such as the code of multiple training models, the parameters of those training models and evaluation results can be stored in the code library in the life cycle management module 140, or such information already stored in the code library can be deleted. The life cycle management module 140 can implement these functions through the MLflow (Machine Learning flow) application.
The algorithm scheduling module 150 can schedule and execute the tasks in the life cycle management module 140 according to a certain execution order and time, and can use the API of the life cycle management module 140 to add new tasks to the life cycle management module 140 or to delete historical tasks from it.
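The interplay between the task store and the scheduler described above can be sketched as follows. This is only an illustration under stated assumptions: `TaskStore`, `add_task`, `delete_task` and `run_due_tasks` are hypothetical names standing in for the API of the life cycle management module 140 and for the algorithm scheduling module 150; they are not taken from the application and are not MLflow calls.

```python
from typing import Callable, Dict, List, Tuple


class TaskStore:
    """Hypothetical stand-in for the task API of the life cycle management module."""

    def __init__(self) -> None:
        # task id -> (scheduled time, action to run)
        self._tasks: Dict[str, Tuple[float, Callable[[], str]]] = {}

    def add_task(self, task_id: str, run_at: float, action: Callable[[], str]) -> None:
        self._tasks[task_id] = (run_at, action)

    def delete_task(self, task_id: str) -> None:
        self._tasks.pop(task_id, None)

    def get_task(self, task_id: str) -> Tuple[float, Callable[[], str]]:
        return self._tasks[task_id]

    def task_ids(self) -> List[str]:
        return sorted(self._tasks)


def run_due_tasks(store: TaskStore, now: float) -> List[str]:
    """Run every task whose scheduled time has arrived, earliest first,
    then delete it from the store as a finished (historical) task."""
    due = sorted(
        (store.get_task(tid)[0], tid)
        for tid in store.task_ids()
        if store.get_task(tid)[0] <= now
    )
    results = []
    for _, tid in due:
        _, action = store.get_task(tid)
        results.append(action())
        store.delete_task(tid)
    return results
```

Keeping the store and the scheduler separate mirrors the split between modules 140 and 150: the scheduler only reads, adds and deletes tasks through the store's interface.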
The computer device 100 may be a general-purpose computer device or a special-purpose computer device. In a specific implementation, the computer device 100 may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device or an embedded device. The embodiments of the present application do not limit the type of the computer device.
Fig. 2 is a flow chart of a data processing method provided by an embodiment of the present application. The method is applied to a computer device, which may be the computer device 100 shown in Fig. 1. Referring to Fig. 2, the method includes:
Step 201: Select, from multiple stored sample data sets, the sample data set corresponding to a target modeling type. The selected sample data set includes multiple samples, and each sample includes data of multiple feature dimensions.
Step 202: Select at least one feature dimension from the multiple feature dimensions, and select a target training model from multiple training models stored for the target modeling type.
Step 203: Train the target training model according to the data, on the at least one feature dimension, of each sample included in the selected sample data set.
Step 204: Determine an extended data set according to the model obtained after training, a target expansion multiple and optimization target information. The ratio of the number of samples in the extended data set to the number of samples in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the extended data set and the selected sample data set.
In the embodiments of the present application, the sample data set corresponding to the target modeling type is first selected from multiple stored sample data sets, and the selected sample data set includes multiple samples. Because the selected sample data set covers multiple feature dimensions, at least one feature dimension can be selected from them, and a target training model can be selected from the multiple training models stored for the target modeling type. The target training model is then trained according to the data, on the at least one feature dimension, of each sample included in the selected sample data set. An extended data set is determined according to the model obtained after training, the target expansion multiple and the optimization target information. Because a matching index exists between the extended data set and the selected sample data set, the two data sets are similar. Because the multiple training models for the target modeling type are stored in the computer device in advance, when a target model among them needs to be trained, it can be selected directly from the multiple training models and then trained according to the data, on the at least one feature dimension, of each sample included in the selected sample data set. In addition, the target training model can be trained multiple times, or different training models can be used as the target training model and trained in turn. That is, with the data processing method provided by the embodiments of the present application, the operator does not need to write different code for the multiple training models and does not need any coding background, which simplifies the training of the target training model and in turn makes determining the extended data set simpler and more efficient.
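Steps 201 to 204 can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the application does not specify the training models, so a simple centroid model stands in for the target training model, the negative squared distance to the centroid stands in for the matching index, and `MODEL_REGISTRY`, `train_centroid`, `similarity` and `determine_extended_set` are hypothetical names.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]  # feature dimension -> numeric value


def train_centroid(samples: List[Sample], dims: List[str]) -> Dict[str, float]:
    """Hypothetical training model: the mean of each selected feature dimension."""
    return {d: sum(s.get(d, 0.0) for s in samples) / len(samples) for d in dims}


def similarity(model: Dict[str, float], candidate: Sample) -> float:
    """Assumed matching index: negative squared distance to the centroid."""
    return -sum((candidate.get(d, 0.0) - v) ** 2 for d, v in model.items())


# Training models stored in advance, keyed by modeling type (illustrative names).
MODEL_REGISTRY: Dict[str, Dict[str, Callable]] = {
    "crowd_expansion": {"centroid": train_centroid},
}


def determine_extended_set(
    sample_set: List[Sample],
    dims: List[str],
    modeling_type: str,
    model_name: str,
    candidates: List[Sample],
    target_multiple: int,
) -> List[Sample]:
    train = MODEL_REGISTRY[modeling_type][model_name]  # step 202: pick the model
    model = train(sample_set, dims)                    # step 203: train it
    wanted = target_multiple * len(sample_set)         # step 204: size = multiple x selected
    ranked = sorted(candidates, key=lambda c: similarity(model, c), reverse=True)
    return ranked[:wanted]
```

Because the models sit in a registry, switching to a different target training model is a change of key rather than new code, which is the simplification the paragraph above describes. If the candidate pool holds fewer samples than the requested size, this sketch simply returns all of them.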
Optionally, before the target training model is trained according to the data, on the at least one feature dimension, of each sample included in the selected sample data set, the method further includes:
displaying a parameter setting interface, where the parameter setting interface includes at least one parameter edit box;
obtaining, from the at least one parameter edit box, at least one parameter set for the target training model.
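Reading parameters from edit boxes amounts to turning free text into typed values. The sketch below assumes the interface hands over a mapping from parameter name to the text typed in the box; the function name `parse_edit_boxes` and the parameter names in the usage example are hypothetical and do not come from the application.

```python
from typing import Dict, Union

ParamValue = Union[int, float, str]


def parse_edit_boxes(edit_boxes: Dict[str, str]) -> Dict[str, ParamValue]:
    """Convert the text of each parameter edit box into an int, a float or a string."""
    params: Dict[str, ParamValue] = {}
    for name, text in edit_boxes.items():
        text = text.strip()
        if not text:
            continue  # an empty box sets nothing, so the model default would apply
        for cast in (int, float):
            try:
                params[name] = cast(text)
                break
            except ValueError:
                continue
        else:
            params[name] = text  # neither int nor float: keep the raw string
    return params
```

For example, `parse_edit_boxes({"max_depth": " 6 ", "learning_rate": "0.1"})` yields an integer and a float that could be passed to whichever target training model was selected.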
Optionally, after the extended data set is determined according to the model obtained after training, the target expansion multiple and the optimization target information, the method further includes:
displaying an evaluation result, where the evaluation result is used to evaluate the extended data set determined by the model obtained after training;
adjusting, according to the evaluation result, at least one parameter included in the target training model.
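One way to picture the evaluate-then-adjust loop is shown below. The application does not say which metric the evaluation result uses or which parameter is adjusted, so both are assumptions here: the metric is the share of extended samples that are known positives, and `threshold` is a hypothetical model parameter.

```python
from typing import Dict, List


def evaluate(extended_ids: List[str], known_positive_ids: List[str]) -> float:
    """Assumed evaluation metric: share of extended samples that are known positives."""
    if not extended_ids:
        return 0.0
    hits = set(extended_ids) & set(known_positive_ids)
    return len(hits) / len(extended_ids)


def adjust(params: Dict[str, float], score: float,
           target: float = 0.8, step: float = 0.05) -> Dict[str, float]:
    """Raise the hypothetical 'threshold' parameter when the score misses the target."""
    adjusted = dict(params)
    if score < target:
        adjusted["threshold"] = round(min(1.0, params["threshold"] + step), 2)
    return adjusted
```

After an adjustment the target training model would be retrained and the extended data set determined again, which matches the statement that the target training model can be trained multiple times.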
Optionally, the method further includes:
displaying, during training of the target training model, a training flow chart of the target training model, where the training flow chart includes multiple training nodes and each training node is displayed in a first display mode, a second display mode or a third display mode; the first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
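The three display modes map naturally onto a small state computation. The sketch below assumes the training nodes are visited in a fixed linear order and that the index of the current node is known; the node names in the usage note are made up for illustration.

```python
from enum import Enum
from typing import List


class DisplayMode(Enum):
    FIRST = "completed"      # the training node has been completed
    SECOND = "in_progress"   # training is currently at this node
    THIRD = "not_reached"    # the training node has not been reached yet


def display_modes(nodes: List[str], current_index: int) -> List[DisplayMode]:
    """Return the display mode of every training node, given the current node."""
    modes = []
    for i, _ in enumerate(nodes):
        if i < current_index:
            modes.append(DisplayMode.FIRST)
        elif i == current_index:
            modes.append(DisplayMode.SECOND)
        else:
            modes.append(DisplayMode.THIRD)
    return modes
```

With nodes such as "load data", "select features", "train" and "evaluate" and a current index of 2, the first two nodes are shown as completed, "train" as in progress and "evaluate" as not reached.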
Optionally, after the extended data set is determined according to the model obtained after training, the target expansion multiple and the optimization target information, the method further includes:
deploying a strategy for each sample included in the extended data set.
All of the above optional technical solutions can be combined in any manner to form optional embodiments of the present application, which are not described one by one here.
Fig. 3 is a flow chart of a data processing method provided by an embodiment of the present application. The method is applied to a computer device. Referring to Fig. 3, the method includes:
Step 301: Select, from multiple stored sample data sets, the sample data set corresponding to a target modeling type. The selected sample data set includes multiple samples, and each sample includes data of multiple feature dimensions.
It should be noted that the target modeling type may be a consumption propensity modeling type, a crowd expansion modeling type, a potential customer evaluation modeling type, a customer churn prediction modeling type, a crowd clustering modeling type, or the like. The selected sample data set usually includes a large number of samples, for example 10,000 or 20,000 samples. Each of these samples may include a sample identifier that uniquely identifies the sample. For example, the samples may be users, in which case the sample identifier of each sample may be that user's account. In addition, a feature dimension may be age, gender, education, hobby, region, purchase intention, or the like.
In one possible case, among the samples included in the selected sample data set, the feature dimensions of two different samples may be identical, partly identical or entirely different. When two different samples share a feature dimension, their data on that shared feature dimension may be the same or different.
For example, refer to Table 1, in which sample 1, sample 2, sample 3, sample 4 and sample 5 are 5 of the samples included in the selected sample data set. As Table 1 shows, the feature dimensions of sample 1 are gender and age, with the data female and 20 years; the feature dimensions of sample 2 are gender and age, with the data female and 20 years; the feature dimensions of sample 3 are gender and age, with the data male and 30 years; the feature dimensions of sample 4 are education and hobby, with the data bachelor's degree and travel; the feature dimensions of sample 5 are gender and education, with the data male and bachelor's degree.
That is, samples 1, 2 and 3 include the same 2 feature dimensions, gender and age. The data of samples 1 and 2 on these 2 feature dimensions are all the same, while the data of samples 1 and 3 on these 2 feature dimensions are all different. In addition, the 2 feature dimensions of sample 1 and those of sample 4 are entirely different. Sample 1 and sample 5 share one feature dimension, gender, but their data on it differ. Furthermore, sample 4 and sample 5 share one feature dimension, education, and their data on it are the same.
Table 1
Sample    Gender  Age       Education          Hobby
Sample 1  Female  20 years  -                  -
Sample 2  Female  20 years  -                  -
Sample 3  Male    30 years  -                  -
Sample 4  -       -         Bachelor's degree  Travel
Sample 5  Male    -         Bachelor's degree  -
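The comparisons drawn from Table 1 can be reproduced with a short sketch. It assumes each sample is held as a mapping from feature dimension to data, with missing dimensions simply absent; the helper names `shared_dimensions` and `matching_dimensions` are hypothetical.

```python
from typing import Dict, Set

# The five samples of Table 1; a dimension a sample lacks is left out of its mapping.
SAMPLES: Dict[str, Dict[str, str]] = {
    "sample 1": {"gender": "female", "age": "20"},
    "sample 2": {"gender": "female", "age": "20"},
    "sample 3": {"gender": "male", "age": "30"},
    "sample 4": {"education": "bachelor", "hobby": "travel"},
    "sample 5": {"gender": "male", "education": "bachelor"},
}


def shared_dimensions(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """Feature dimensions that both samples include."""
    return set(a) & set(b)


def matching_dimensions(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """Shared feature dimensions on which both samples carry the same data."""
    return {d for d in shared_dimensions(a, b) if a[d] == b[d]}
```

Calling `matching_dimensions(SAMPLES["sample 4"], SAMPLES["sample 5"])` returns the education dimension, in line with the comparison made in the text.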
It is worth noting that the data of the feature dimensions included in each sample can be obtained from a sample database. Specifically, when each sample includes a sample identifier, the data of the feature dimensions included in each sample can be obtained from the sample database according to that identifier. The data of these feature dimensions can be represented with binary numbers. Taking the gender feature dimension as an example, the gender data can be represented by 1 or 0: female can be represented as 0 and male as 1. For feature dimensions with more kinds of data, such as age and education, multiple feature dimension units can first be defined for the feature dimension. If the data of the feature dimension falls in a certain feature dimension unit, that unit is represented as 1 and the other units are represented as 0, and the data of the feature dimension is expressed in this way. Taking the age feature dimension as an example, age can be divided into multiple feature dimension units, such as 0 to 18 years, 19 to 30 years, 31 to 40 years and 41 to 50 years. If the age is 20 years, the 19 to 30 years unit is represented as 1 and the other units are represented as 0, which expresses the age data. Of course, the data of the feature dimensions included in each sample can also be represented in other ways, which is not limited in the embodiments of the present application.
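The binary representation described above can be sketched as follows. The age units are the four ranges named in the text; the source lists them with "etc.", so treating them as the full set is an assumption of this sketch, as are the function names.

```python
from typing import List, Tuple

# Feature dimension units for age, taken from the ranges given in the text.
AGE_UNITS: List[Tuple[int, int]] = [(0, 18), (19, 30), (31, 40), (41, 50)]


def encode_gender(gender: str) -> int:
    """Female is represented as 0 and male as 1."""
    return {"female": 0, "male": 1}[gender]


def encode_age(age: int) -> List[int]:
    """Set the unit that contains the age to 1 and every other unit to 0."""
    return [1 if low <= age <= high else 0 for low, high in AGE_UNITS]
```

An age of 20 falls in the second unit, so `encode_age(20)` marks only that position. An age outside all listed units yields all zeros in this sketch; a real system would define further units to cover it.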
In one possible implementation, computer equipment can be shown including modeling type selection interface, the modeling It include multiple modeling types in type selection interface.It, can be any by this when detecting the selection operation of any modeling type Modeling type is determined as Target Modeling type.At this point, computer equipment can be shown including collection selection interface, the data set It include multiple sample data sets in selection interface.It, can be any by this when detecting the selection operation of any sample data set Sample data set is determined as the corresponding sample data set of Target Modeling type.If multiple sample datas at collection selection interface It concentrates, when not including the sample data set for thinking selection, sample data set can also be added in computer equipment, thus when detection The sample data set of addition is shown on to data set selection interface.Then when detecting the selection to the sample data set of addition When operation, the sample data set of addition can be determined as to the corresponding sample data set of Target Modeling type.
Illustratively, a modeling type usually represents a scene, so the modeling type selection interface can also be called a scene selection interface. Referring to Fig. 4 and Fig. 5, Fig. 4 shows the modeling type selection interface, which includes multiple modeling types: a consumption propensity modeling type, a crowd extension modeling type, a potential customer assessment modeling type, a user attrition prediction modeling type and a crowd clustering modeling type. When a selection operation on the crowd extension modeling type is detected, that is, when a click operation on the corresponding "entrance" option on the modeling type selection interface is detected, the computer equipment can display the data set selection interface shown in Fig. 5. The data set selection interface includes 2 sample data sets: "the brand audience click crowd of the second quarter of 2018" and "the brand audience click crowd of the first quarter of 2018". Then, when a selection operation on "the brand audience click crowd of the second quarter of 2018" is detected, that is, when a click operation on the "selection" option corresponding to that sample data set on the interface is detected, the computer equipment can determine "the brand audience click crowd of the second quarter of 2018" as the sample data set corresponding to the crowd extension modeling type. If the 2 sample data sets on the data set selection interface do not include the sample data set that the operator wants to select, a sample data set can also be added to the computer equipment, and information such as the name and path of the added sample data set is determined. Then, when a click operation on the "add sample data set" option on the interface is detected, an add pop-up window is displayed on the interface. The add pop-up window includes an edit box for the sample data set. When an edit operation on the edit box is detected, the computer equipment can determine whether the information obtained after editing, such as the name and path of the sample data set, is consistent with the information of the added sample data set. If it is consistent, the sample data set added to the computer equipment can be displayed on the data set selection interface. At this point, when a selection operation on the added sample data set is detected, that is, when a click operation on the "selection" option corresponding to the added sample data set is detected, the computer equipment can determine the added sample data set as the sample data set corresponding to the crowd extension modeling type.
Under normal conditions, after the computer equipment determines any modeling type as the target modeling type, it can directly display the data set selection interface. For example, when a click operation on an "entrance" option on the modeling type selection interface shown in Fig. 4 is detected, the modeling type to which this "entrance" option belongs can be directly determined as the target modeling type, and the data set selection interface shown in Fig. 5 is then displayed directly. Of course, after determining any modeling type as the target modeling type, the computer equipment can also display the data set selection interface only when a selection operation on a data set selection label is detected. For example, when a selection operation on the data set selection label on the left side of the modeling type selection interface shown in Fig. 4 is detected, the data set selection interface shown in Fig. 5 is displayed.
Step 302: select at least one characteristic dimension from the multiple characteristic dimensions, and select a target training model from multiple training models stored for the target modeling type.
Under normal conditions, after the computer equipment determines the sample data set corresponding to the target modeling type, it can display a characteristic dimension selection interface. The characteristic dimension selection interface includes multiple characteristic dimensions, which are the multiple characteristic dimensions of the sample data set corresponding to the target modeling type. When a selection operation on at least one characteristic dimension is detected, the selected characteristic dimensions can be determined as the at least one characteristic dimension selected from the above multiple characteristic dimensions.
Illustratively, referring to Fig. 6, the characteristic dimension selection interface shown in Fig. 6 includes multiple characteristic dimensions, namely gender, age, educational background, marital status, and so on. When a selection operation on gender, age and educational background is detected, that is, when a check operation on the check boxes in front of gender, age and educational background is detected, gender, age and educational background can be determined as the selected at least one characteristic dimension.
Under normal conditions, after the computer equipment determines the sample data set corresponding to the target modeling type, it can directly display the characteristic dimension selection interface. For example, when a click operation on a "selection" option on the data set selection interface shown in Fig. 5 is detected, the data set to which this "selection" option belongs can be directly determined as the data set corresponding to the target modeling type, and the characteristic dimension selection interface shown in Fig. 6 is then displayed directly. Of course, after determining the sample data set corresponding to the target modeling type, the computer equipment can also display the characteristic dimension selection interface only when a selection operation on a characteristic dimension selection label is detected. For example, when a selection operation on the characteristic dimension selection label on the left side of the data set selection interface shown in Fig. 5 is detected, the characteristic dimension selection interface shown in Fig. 6 is displayed.
It should be noted that different modeling types can correspond to different sets of training models. For example, when the modeling type is the crowd extension modeling type, the multiple training models stored for it may include a differential index enhancement model, a one-class support vector machine (One Class Support Vector Machine, One Class SVM) model, a two-layer convolutional neural network (Convolutional Neural Networks (2layers), CNN (2layers)) model, and so on. When the modeling type is the consumption propensity modeling type, the multiple training models stored for it may include a potential customer trade-in mixed model, a potential customer ring mixed model, and so on. When the modeling type is the crowd clustering modeling type, the multiple training models stored for it may include a K-means clustering model and so on. Therefore, one training model can be selected as the target training model from the multiple training models stored for the target modeling type. In addition, the multiple training models can be stored in the life cycle management module 140 of the computer equipment 100 shown in Fig. 1.
In one possible implementation, the computer equipment can display a model selection interface that includes the multiple training models stored for the target modeling type. When a selection operation on any training model is detected, that training model can be determined as the target training model.
Illustratively, referring to Fig. 7, the model selection interface shown in Fig. 7 includes 3 training models: the differential index enhancement model, the One Class SVM model and the CNN (2layers) model. When a selection operation on any of these 3 training models is detected, that training model can be determined as the target training model. That is, when a check operation on the check box in front of any of these 3 training models is detected, that training model can be determined as the target training model.
Under normal conditions, after the computer equipment determines the at least one characteristic dimension, it can directly display the model selection interface. For example, when a check operation on the check box in front of at least one characteristic dimension in Fig. 6 is detected, the checked characteristic dimensions can be determined as the at least one characteristic dimension, and the model selection interface shown in Fig. 7 is then displayed directly. Of course, after the checked characteristic dimensions are determined as the at least one characteristic dimension, the model selection interface can also be displayed only when a selection operation on a model selection label is detected, or when a selection operation on the "next step" option in the characteristic dimension selection interface is detected.
It is worth noting that, in practical applications, when the multiple training models stored for the target modeling type do not include the target training model that is needed, the operator can add the target training model by storing its code in the computer equipment, so that the target training model can then be selected from the multiple training models stored for the target modeling type. Alternatively, if the multiple training models stored for the target modeling type include an unnecessary training model, the operator can delete the code of the unnecessary training model from the computer equipment, thereby deleting that training model. That is, in the embodiments of the present application, training models for the target modeling type can be added or deleted according to usage requirements.
Illustratively, taking the implementation environment shown in Fig. 1 as an example, when the multiple training models stored for the target modeling type do not include the target training model that is needed, the operator can store the code of the target training model in the code library of the life cycle management module 140 through the API of the life cycle management module 140 in the computer equipment 100, thereby adding the target training model. Alternatively, if the multiple training models stored for the target modeling type include an unnecessary training model, the operator can delete the code of the unnecessary training model from the code library of the life cycle management module 140 through the API of the life cycle management module 140 in the computer equipment 100, thereby deleting the unnecessary training model.
In some embodiments, each of the multiple training models may include at least one parameter. Therefore, before step 303 is performed, the at least one parameter set for the target training model can also be determined through the following steps (1) and (2).
(1): display a parameter setting interface, where the parameter setting interface includes at least one parameter edit box.
It should be noted that the parameters of the target training model may include a maximum number of iterations, a convergence, a regularization coefficient, a minimum convergence error, and so on.
In one possible implementation, the parameter setting interface can be a separate interface or a part of the model selection interface. Illustratively, referring to Fig. 8, the parameter setting interface in Fig. 8 is a part of the model selection interface. That is, the interface shown in Fig. 8 displays not only the multiple training models stored for the target modeling type but also at least one parameter edit box.
(2): obtain, from the at least one parameter edit box, the at least one parameter set for the target training model.
In one possible case, a value is provided in advance in each of the at least one parameter edit box in the parameter setting interface, so the at least one parameter does not need to be set in the parameter edit box. In other words, the at least one parameter can take its default value. For example, when the parameters include the maximum number of iterations, the convergence, the regularization coefficient and the minimum convergence error, the maximum number of iterations can be set to 500, the convergence to 1, the regularization coefficient to 0 and the minimum convergence error to 0.005 in advance. Of course, in this case, the default values in the at least one parameter edit box can also be modified.
In another possible case, no value is provided in the at least one parameter edit box in the parameter setting interface. In this case, when an edit operation on the at least one parameter edit box is detected, the at least one parameter obtained after editing is used as the at least one parameter set for the target training model.
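Illustratively, the two cases above (pre-filled default values that may be modified, or values entered by the operator) may be sketched as follows. This is only a sketch under stated assumptions: the dictionary keys and the function name are hypothetical, and only the default values 500, 1, 0 and 0.005 come from the example above.

```python
from typing import Dict, Optional

# Default values taken from the example above; the key names are hypothetical.
DEFAULT_PARAMS: Dict[str, float] = {
    "max_iterations": 500,
    "convergence": 1,
    "regularization_coefficient": 0,
    "min_convergence_error": 0.005,
}


def resolve_params(edited: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return the parameters set for the target training model.

    Values typed into the parameter edit boxes (``edited``) override the
    pre-filled defaults; parameters that were not edited keep their defaults.
    """
    params = dict(DEFAULT_PARAMS)  # copy so the defaults are never mutated
    for name, value in (edited or {}).items():
        if name not in params:
            raise KeyError(f"unknown parameter: {name}")
        params[name] = value
    return params


if __name__ == "__main__":
    print(resolve_params())                          # defaults only
    print(resolve_params({"max_iterations": 1000}))  # one edited value
```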
Step 303: train the target training model according to the data, on the at least one characteristic dimension, of each sample included in the selected sample data set.
In one possible implementation, training the target training model means inputting the data, on the at least one characteristic dimension, of each sample included in the selected sample data set into the target training model. In some embodiments, the target training model can be regarded as an algorithm, and inputting this data into the target training model means processing the data according to that algorithm. Different target training models process the data, on the at least one characteristic dimension, of each sample included in the selected sample data set in different ways.
Illustratively, after the data, on the at least one characteristic dimension, of each sample included in the selected sample data set is input into the target training model, the data of all samples on any one of the at least one characteristic dimension can be divided into multiple categories. Therefore, for any one of the at least one characteristic dimension, the ratio between the number of samples corresponding to each category of data on that characteristic dimension and the total number of samples can be determined. These ratios are then determined as the multiple reference values corresponding to the multiple categories of data of that characteristic dimension.
For example, the selected sample data set includes 1000 samples, and its characteristic dimensions are gender, age and click behavior. The data of gender can be divided into two categories, female and male. Age can be divided into multiple characteristic dimension units, 0~18 years old, 19~30 years old, 31~40 years old and 41~50 years old, so the data of age can be divided into four categories. The data of click behavior can be divided into two categories, click and no click. The target training model determines that, among these 1000 samples, 600 samples have the gender female, so the ratio between the number of female samples and the total number of samples is 600 divided by 1000, that is, 0.6. 400 samples have the gender male, so the ratio between the number of male samples and the total number of samples is 0.4. The numbers of samples whose age falls within 0~18, 19~30, 31~40 and 41~50 years old are 100, 400, 300 and 200 respectively, so the corresponding ratios to the total number of samples are 0.1, 0.4, 0.3 and 0.2. Similarly, 800 samples have the click behavior click and 200 samples have the click behavior no click, so the ratio between the number of click samples and the total number of samples is 0.8, and the ratio between the number of no-click samples and the total number of samples is 0.2. After determining these ratios, the target training model can determine them as the multiple reference values corresponding to the multiple categories of data of each characteristic dimension.
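Illustratively, the determination of reference values in the example above may be sketched as follows. This is only a sketch under stated assumptions: the function names and the dictionary-based sample representation are hypothetical, and the way the three characteristic dimensions are combined within a single sample is arbitrary, since the example only gives per-dimension counts.

```python
from collections import Counter
from typing import Dict, Hashable, List

Sample = Dict[str, Hashable]


def reference_values(samples: List[Sample],
                     dimensions: List[str]) -> Dict[str, Dict[Hashable, float]]:
    """For each dimension, map every category of data to (count / total samples)."""
    total = len(samples)
    result: Dict[str, Dict[Hashable, float]] = {}
    for dim in dimensions:
        counts = Counter(sample[dim] for sample in samples)
        result[dim] = {value: count / total for value, count in counts.items()}
    return result


def build_example_samples() -> List[Sample]:
    """Build 1000 samples with the per-dimension counts used in the example."""
    genders = ["female"] * 600 + ["male"] * 400
    ages = ["0~18"] * 100 + ["19~30"] * 400 + ["31~40"] * 300 + ["41~50"] * 200
    clicks = ["click"] * 800 + ["no click"] * 200
    return [{"gender": g, "age": a, "click": c}
            for g, a, c in zip(genders, ages, clicks)]


if __name__ == "__main__":
    refs = reference_values(build_example_samples(), ["gender", "age", "click"])
    for dimension, values in refs.items():
        print(dimension, values)
```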
In addition, after the computer equipment starts training the target training model, it can display a model training details interface. The model training details interface includes information about the target training model and the training progress of the target training model, so that the training progress can be observed through this interface.
Illustratively, assume that the target training model selected in the model selection interface shown in Fig. 7 is the differential index enhancement model. The computer equipment can then display the model training details interface shown in Fig. 9, which includes the information about the target training model and its training progress. The information about the target training model may include the target modeling type, the selected sample data set, the number of selected characteristic dimensions, the target training model, the number of times the target training model has been trained, the training progress of the target training model, and so on.
Under normal conditions, after the computer equipment determines the target training model, it can directly train the target training model according to the data, on the at least one characteristic dimension, of each sample included in the selected sample data set. Of course, in some embodiments, the computer equipment can first display the model training details interface without training the target training model. In this case, the target training model is trained when a selection operation on the "run" option on the model training details interface is detected. Illustratively, when a click operation on the "run" option on the model training details interface shown in Fig. 9 is detected, training of the target training model starts.
It is worth noting that historical training records of other training models can also be displayed on the model training details interface. In this way, the operator can more conveniently grasp the historical training records of the target training model or of other training models.
Optionally, during the training of the target training model, a training flow chart of the target training model is displayed. The training flow chart includes multiple training nodes, and the display mode of each training node is a first display mode, a second display mode or a third display mode.
It should be noted that, in some embodiments, the multiple training nodes may include inputting the selected sample data set, determining the reference values corresponding to the data on each characteristic dimension, and so on. In addition, the first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
In one possible implementation, the training flow chart of the target training model can be displayed on the model training details interface. For example, referring to Fig. 10, the training flow chart of the target training model is displayed on the model training details interface shown in Fig. 10.
Illustratively, the first display mode, the second display mode and the third display mode can be represented by different colors. For example, the first display mode sets the color of a completed training node to gray, the second display mode sets the color of the training node currently in progress to red, and the third display mode sets the color of a training node that has not been reached to green. Of course, the first display mode, the second display mode and the third display mode can also be represented in other forms, which is not limited in the embodiments of the present application.
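Illustratively, the assignment of display modes to training nodes may be sketched as follows. This is only a sketch under stated assumptions: the enumeration, the function name, and the idea of tracking progress with a single index of the current node are hypothetical; only the three display modes and the example colors come from the description above.

```python
from enum import Enum
from typing import List


class DisplayMode(Enum):
    FIRST = "gray"    # the corresponding training node has been completed
    SECOND = "red"    # training is currently at the corresponding node
    THIRD = "green"   # the corresponding training node has not been reached


def node_display_modes(node_count: int, current_index: int) -> List[DisplayMode]:
    """Return the display mode of each node given the index of the node in progress."""
    modes: List[DisplayMode] = []
    for index in range(node_count):
        if index < current_index:
            modes.append(DisplayMode.FIRST)
        elif index == current_index:
            modes.append(DisplayMode.SECOND)
        else:
            modes.append(DisplayMode.THIRD)
    return modes


if __name__ == "__main__":
    # Three hypothetical nodes, with training currently at the second one.
    print([mode.value for mode in node_display_modes(3, 1)])
```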
It is worth noting that displaying the training flow chart of the target training model during training allows the operator to grasp the training process more intuitively and clearly. That is, the operator can directly observe which step the training of the target training model has reached.
Step 304: determine an extended data set according to the model obtained after training, the target extension multiple and the optimization target information.
It should be noted that the ratio between the number of samples included in the extended data set and the number of samples included in the selected sample data set is the target extension multiple. The optimization target information refers to a matching index between the extended data set and the selected sample data set, and can be click behavior, hobbies, regional diversity, and so on. It should be understood that different optimization target information may lead to different extended data sets.
In addition, the target extension multiple and the optimization target information can be set in the computer equipment in advance, or can be set on an interface displayed by the computer equipment before the target training model is trained.
Illustratively, if the target extension multiple and the optimization target information are set before the target training model is trained, referring to Fig. 11, a drag bar for the extension multiple and check boxes for multiple pieces of optimization target information can also be displayed in the model selection interface. The target extension multiple can be set by dragging the drag bar, or by clicking the "+" option and the "-" option. In addition, the optimization target information can be set by checking the corresponding check boxes. After the target training model is trained, the extended data set can be determined according to the model obtained after training, the target extension multiple and the optimization target information.
Illustratively, if the optimization target is click behavior, after the multiple reference values corresponding to the multiple categories of data of each of the at least one characteristic dimension are determined according to step 303 above, the reference values corresponding to the data of the multiple characteristic dimensions included in each sample in the sample database can also be determined. Then, the multiple reference values corresponding to each sample in the sample database on the multiple characteristic dimensions are added up to obtain the reference score of that sample. All samples included in the sample database are sorted in descending order of reference score, and, according to the target extension multiple, a part of the sorted samples is selected to form the extended data set. In this case, the matching index between the extended data set and the selected sample data set is click behavior. In other words, the samples included in the extended data set and the samples included in the selected sample data set have similar click behavior. If, in the selected sample data set, the ratio between the number of samples whose click behavior is click and the total number of samples is 0.8, then in the extended data set this ratio may also be 0.8. If an advertisement is delivered to all samples included in the extended data set, 80% of the samples may click on the delivered advertisement and 20% of the samples may not.
For example, the optimization target information is click behavior, the target extension multiple is 10, and the selected sample data set includes 1000 samples, so the extended data set to be determined needs to include 10000 samples. Continuing the example in step 303 above, the sample database includes 20000 samples, each of which includes data of the 2 characteristic dimensions gender and age. If the reference value corresponding to the gender female is 0.6, the reference value corresponding to the gender male is 0.4, and the reference values corresponding to the ages 0~18, 19~30, 31~40 and 41~50 years old are 0.1, 0.4, 0.3 and 0.2 respectively, the reference score of each of these 20000 samples is determined. Specifically, for any one sample, the reference value corresponding to its data on the gender dimension and the reference value corresponding to its data on the age dimension are added to obtain the reference score of the sample. The 20000 samples are then sorted in descending order of reference score, and the samples ranked 1 to 10000 by reference score are selected to form the extended data set. Since the optimization target information is click behavior, if an advertisement is delivered to these 10000 samples in the extended data set, 8000 samples may click on the delivered advertisement and 2000 samples may not.
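Illustratively, the scoring, sorting and selection in the example above may be sketched as follows. This is only a sketch under stated assumptions: the function names and the dictionary-based sample representation are hypothetical, a category with no reference value is assumed to contribute 0, and the four-sample database in the usage part is invented purely to keep the demonstration small.

```python
from typing import Dict, List

Sample = Dict[str, str]
ReferenceValues = Dict[str, Dict[str, float]]

# Reference values taken from the example above.
REFERENCE_VALUES: ReferenceValues = {
    "gender": {"female": 0.6, "male": 0.4},
    "age": {"0~18": 0.1, "19~30": 0.4, "31~40": 0.3, "41~50": 0.2},
}


def reference_score(sample: Sample, refs: ReferenceValues) -> float:
    """Add up the reference values of the sample's data on every dimension."""
    return sum(refs[dim].get(sample[dim], 0.0) for dim in refs)


def extend_data_set(database: List[Sample], refs: ReferenceValues,
                    selected_size: int, extension_multiple: int) -> List[Sample]:
    """Sort the database by descending reference score and keep the top samples.

    The number of samples kept is selected_size * extension_multiple.
    """
    target_size = selected_size * extension_multiple
    ranked = sorted(database, key=lambda s: reference_score(s, refs), reverse=True)
    return ranked[:target_size]


if __name__ == "__main__":
    # Tiny hypothetical database: a selected set of 1 sample extended 2 times.
    database = [
        {"gender": "male", "age": "0~18"},
        {"gender": "female", "age": "19~30"},
        {"gender": "female", "age": "41~50"},
        {"gender": "male", "age": "31~40"},
    ]
    for sample in extend_data_set(database, REFERENCE_VALUES, 1, 2):
        print(sample, reference_score(sample, REFERENCE_VALUES))
```

With a selected sample data set of 1000 samples and a target extension multiple of 10, the same function would keep the 10000 highest-scoring samples, as in the example.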
Optionally, before step 304, the weights of enhancement indexes can also be set. The enhancement indexes may include purchasing power, subjective interest and browsing history.
It should be noted that each of the 3 enhancement indexes, purchasing power, subjective interest and browsing history, can correspond to multiple related characteristic dimensions. After the weights of these 3 enhancement indexes are set, the multiple reference values corresponding to each sample's data on the characteristic dimensions related to these 3 enhancement indexes can be multiplied by the weights of the corresponding enhancement indexes, so as to determine the importance of these 3 enhancement indexes in the extended data set. That is, the larger the weight, the higher the importance, and the smaller the weight, the lower the importance.
In one possible implementation, referring to Fig. 12, edit boxes for the weights of purchasing power, subjective interest and browsing history can be displayed in the model selection interface. In one possible case, the weights of these 3 enhancement indexes are provided in advance in the edit boxes in the model selection interface, so the weights do not need to be set in the edit boxes. Alternatively, the weights of these 3 enhancement indexes are not provided in the edit boxes in the model selection interface. In this case, when an edit operation on the edit boxes for the weights of these 3 enhancement indexes is detected, the 3 weights obtained after editing are used as the weights of purchasing power, subjective interest and browsing history respectively.
For example, if the weights of purchasing power, subjective interest and browsing history are 0.5, 0.3 and 0.1 respectively, then, for each sample in the sample database, the multiple reference values corresponding to the characteristic dimensions related to purchasing power are multiplied by 0.5, the multiple reference values corresponding to the characteristic dimensions related to subjective interest are multiplied by 0.3, and the multiple reference values corresponding to the characteristic dimensions related to browsing history are multiplied by 0.1. The products are then added up to obtain the reference score of each sample.
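Illustratively, the weighted reference score described above may be sketched as follows. This is only a sketch under stated assumptions: the function name, the characteristic dimensions "income", "hobby" and "page_views", their assignment to the 3 enhancement indexes, and the reference values used in the usage part are all hypothetical; only the weights 0.5, 0.3 and 0.1 come from the example above.

```python
from typing import Dict, List

# Weights taken from the example above.
WEIGHTS: Dict[str, float] = {
    "purchasing_power": 0.5,
    "subjective_interest": 0.3,
    "browsing_history": 0.1,
}

# Hypothetical mapping from each enhancement index to its related dimensions.
DIMENSIONS_BY_INDEX: Dict[str, List[str]] = {
    "purchasing_power": ["income"],
    "subjective_interest": ["hobby"],
    "browsing_history": ["page_views"],
}


def weighted_reference_score(sample_refs: Dict[str, float],
                             dims_by_index: Dict[str, List[str]],
                             weights: Dict[str, float]) -> float:
    """Multiply each reference value by the weight of its index, then add up.

    ``sample_refs`` maps a characteristic dimension to the reference value of
    the sample's data on that dimension.
    """
    score = 0.0
    for index_name, dims in dims_by_index.items():
        for dim in dims:
            score += weights[index_name] * sample_refs[dim]
    return score


if __name__ == "__main__":
    # Hypothetical reference values of one sample on the three dimensions.
    sample_refs = {"income": 0.4, "hobby": 0.5, "page_views": 0.2}
    print(weighted_reference_score(sample_refs, DIMENSIONS_BY_INDEX, WEIGHTS))
```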
Optionally, the following steps A and B can also be performed after step 304.
Step A: display assessment result.
It should be noted that the assessment result is used to assess the extended data set determined by the model obtained after training. The assessment result may include multiple evaluation indexes, which can be represented by a receiver operating characteristic (ROC) curve graph, a precision-recall (precision vs recall, P-R) curve graph, a curve graph of the optimization target against the crowd extension multiple, a distribution map of the samples included in the selected sample data set and the samples included in the extended data set, and so on. In addition, the multiple evaluation indexes may also include accuracy, which is used to assess how precise the extended data set is: the higher the accuracy, the more precise the extended data set, and the lower the accuracy, the less precise the extended data set.
The ROC curve graph takes the false positive rate as the horizontal axis and the true positive rate as the vertical axis. Each point on the ROC curve reflects the sensitivity to the same signal stimulus. The false positive rate refers to the ratio, in the extended data set, between the number of negative samples predicted as positive by the target training model and the number of actual negative samples. The true positive rate refers to the ratio, in the extended data set, between the number of positive samples predicted as positive by the target training model and the number of actual positive samples. Positive samples and negative samples are two different kinds of samples divided according to a certain classification method. The larger the area enclosed by the ROC curve and the coordinate axes (Area Under Curve, AUC), the higher the quality of the extended data set; the smaller the area, the lower the quality of the extended data set.
The P-R curve graph takes the recall rate as the horizontal axis and the precision rate as the vertical axis. In some embodiments these are referred to as the recall ratio and the precision ratio, that is, the P-R curve takes the recall ratio as the horizontal axis and the precision ratio as the vertical axis. The recall ratio refers to the ratio, in the extended data set, between the positive samples predicted as positive by the target training model and all positive samples. The precision ratio refers to the ratio, in the extended data set, between the positive samples predicted as positive and all samples predicted as positive by the target training model. The larger the area enclosed by the P-R curve and the coordinate axes, the higher the quality of the extended data set; the smaller the area, the lower the quality of the extended data set.
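Illustratively, the four rates defined above may be computed from actual and predicted labels as sketched below, with each pair of rates giving one point on the ROC curve or on the P-R curve. This is only a sketch under stated assumptions: the function names, the use of 1 and 0 for positive and negative samples, and the convention of returning 0.0 when a denominator is zero are hypothetical, and the labels in the usage part are invented for demonstration.

```python
from typing import Dict, List, Tuple


def confusion_counts(actual: List[int], predicted: List[int]) -> Tuple[int, int, int, int]:
    """Return (true positives, false positives, true negatives, false negatives)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn


def safe_ratio(numerator: int, denominator: int) -> float:
    """Divide, returning 0.0 when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def evaluation_rates(actual: List[int], predicted: List[int]) -> Dict[str, float]:
    """Compute the axes of one ROC point and one P-R point."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return {
        # negatives predicted as positive / actual negatives
        "false_positive_rate": safe_ratio(fp, fp + tn),
        # positives predicted as positive / actual positives
        "true_positive_rate": safe_ratio(tp, tp + fn),
        # positives predicted as positive / all positives (same as the true positive rate)
        "recall": safe_ratio(tp, tp + fn),
        # positives predicted as positive / all samples predicted as positive
        "precision": safe_ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 1, 0, 0, 0, 1]
    print(evaluation_rates(actual, predicted))
```

A full curve would be obtained by repeating this computation for different decision thresholds, which the description does not detail.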
The curve graph of the optimization target against the crowd extension multiple takes the extension multiple as the horizontal axis and the click rate as the vertical axis. In general, the larger the extension multiple, the lower the click rate. It can also be understood that the larger the extension multiple, the lower the quality of the extended data set.
The distribution map for multiple samples that multiple samples that the sample data set of selection includes and growth data collection include be by The sample that the sample and growth data collection that the sample data set of selection includes include uses certain technology-mapped to two dimension In plane, so as to be intuitively observed to the similar situation between the sample data set of selection and growth data collection.
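The application does not name the technique used to map samples onto the two-dimensional plane. As one possible illustration, the sketch below uses principal component analysis (PCA), computed by power iteration with deflation; both the choice of PCA and the function name `project_to_2d` are assumptions of this example rather than part of the described method.

```python
import math


def project_to_2d(rows, iterations=200):
    """Project rows (equal-length lists of floats) onto their top two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    def leading_eigenvector(matrix):
        v = [1.0 / (i + 1) for i in range(d)]  # fixed start keeps the result deterministic
        for _ in range(iterations):
            w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        return v

    axes = []
    for _ in range(2):
        v = leading_eigenvector(cov)
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflate: remove the component just found so the next pass finds the next axis.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(x[j] * axis[j] for j in range(d)) for axis in axes] for x in centered]
```

In use, the samples of the selected sample data set and of the extended data set would be concatenated, projected together, and then drawn in two different colors so that overlap between the two groups is visible.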
In one possible implementation, the evaluation result may be displayed on the model training details interface. Specifically, when an evaluation result display operation is detected, the evaluation result may be displayed on the model training details interface. For example, referring to Figure 9, when a click on the "Evaluate" option on the model training details interface shown in Figure 9 is detected, the evaluation result may be displayed on that interface.
It is worth noting that the evaluation result is data that reflects the quality of the extended data set determined by the model obtained after training; that is, the better the evaluation result, the higher the quality of the determined extended data set. Displaying the evaluation result helps the operator judge the quality of the determined extended data set intuitively. Moreover, because the evaluation result is displayed on the interface of the computer device, the operator only needs to look at it and does not need other tools to obtain it, which is simpler and less laborious.
Step B: adjust at least one parameter of the target training model according to the evaluation result.
In one possible implementation, at least one parameter of the target training model may be adjusted according to the multiple evaluation indicators included in the evaluation result, so that the evaluation result of the extended data set determined by the target training model better meets a preset requirement.
Step 305: deploy a strategy for each sample included in the extended data set.
In one possible implementation, a strategy may be deployed for each sample included in the extended data set according to the optimization target information. For example, if the optimization target information is click behavior, each sample in the selected sample data set may be a sample to which an advertisement has been delivered; that is, each sample in the selected sample data set either clicked or did not click a delivered advertisement. In this case, the same advertisement may be delivered to the extended data set. The ratio of the number of samples in the extended data set that click the advertisement to the total number of samples in the extended data set is then similar to the corresponding ratio in the selected sample data set. Likewise, the ratio of the number of samples in the extended data set that do not click the advertisement to the total number of samples in the extended data set is similar to the corresponding ratio in the selected sample data set. In other words, the click behavior of the samples in the extended data set toward the delivered advertisement is similar to that of the samples in the selected sample data set.
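The comparison of click ratios described above can be sketched as follows. The function names and the `tolerance` parameter are hypothetical: the application only states that the two ratios are "similar" and does not give a numeric threshold.

```python
def click_rate(clicked_flags):
    """Fraction of samples that clicked the delivered advertisement."""
    if not clicked_flags:
        raise ValueError("data set must contain at least one sample")
    return sum(1 for clicked in clicked_flags if clicked) / len(clicked_flags)


def click_behavior_is_similar(seed_clicks, extended_clicks, tolerance=0.05):
    """Compare click ratios of the selected and extended data sets.

    tolerance is an assumed threshold for this sketch, not a value from the application.
    Comparing click ratios also compares non-click ratios, since each is 1 minus the other.
    """
    return abs(click_rate(seed_clicks) - click_rate(extended_clicks)) <= tolerance
```

For instance, a selected set in which one of four samples clicked and an extended set in which two of eight samples clicked have the same click ratio and would be reported as similar.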
In one possible implementation, after the computer device determines the extended data set, it may store the extended data set and display a deployment interface. The deployment interface includes information about the extended data set and a "Deploy strategy" option for the extended data set. Specifically, after determining the extended data set, the computer device displays the deployment interface when it detects a selection of the deployment tab, or when it detects a selection of the "Next" option in the model training details interface.
For example, referring to Figure 13, the deployment interface shown in Figure 13 includes information about the extended data set: the target modeling type, the target training model, the data processing mode, and the evaluation indicators. The data processing mode is the way in which the target training model mentioned in step 303 above processes the selected sample data set, and the evaluation indicators are those mentioned in step A above, including AUC and accuracy. When a click on the "Deploy strategy" option on the deployment interface is detected, a strategy may be deployed for each sample included in the extended data set. Specifically, the extended data set may be deployed in a production environment, and a corresponding strategy may be deployed for each sample according to the needs of that production environment. Of course, a strategy may also be deployed for each sample in the extended data set by other means, which the embodiments of the present application do not limit.
In the embodiments of the present application, the sample data set corresponding to the target modeling type is first selected from multiple stored sample data sets; the selected sample data set includes multiple samples. Because the selected sample data set includes multiple feature dimensions, at least one feature dimension can be selected from them, and the target training model is selected from the multiple training models stored for the target modeling type. The target training model is then trained on the data, in the at least one feature dimension, of each sample in the selected sample data set. Using the model obtained after training, the extended data set is determined according to the target expansion multiple and the optimization target information. Because a matching index exists between the extended data set and the selected sample data set, the two data sets are similar. Finally, a strategy can be deployed for each sample included in the extended data set. Because the multiple training models for the target modeling type are stored on the computer device in advance, when one of them needs to be trained as the target model, it can be selected directly from the stored training models and then trained on the data, in the at least one feature dimension, of each sample in the selected sample data set. In addition, the target training model can be trained multiple times, or different training models can each be trained as the target training model. That is, the data processing method provided by the embodiments of the present application does not require the operator to write different code for the multiple training models or to have a coding background, which simplifies the training of the target training model and makes determining the extended data set simpler and more efficient.
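The idea that training models are stored in advance per modeling type, so that the operator selects rather than codes, can be sketched as below. Everything named here is hypothetical: the registry `MODEL_REGISTRY`, the modeling type key "lookalike", and the toy `CentroidModel` are stand-ins chosen for this example and are not models described in the application.

```python
import math


class CentroidModel:
    """Toy stand-in for a stored training model.

    It scores a sample by how close it lies to the centroid of the training samples.
    """

    def __init__(self):
        self.centroid = None

    def fit(self, rows):
        n = len(rows)
        self.centroid = [sum(column) / n for column in zip(*rows)]
        return self

    def score(self, row):
        # Higher (less negative) score means closer to the training samples.
        return -math.dist(row, self.centroid)


# Training models stored in advance, keyed by modeling type and then by model name.
MODEL_REGISTRY = {"lookalike": {"centroid": CentroidModel}}


def select_features(samples, dims):
    """Keep only the chosen feature dimensions of each sample (a dict of feature values)."""
    return [[sample[d] for d in dims] for sample in samples]


def train_target_model(modeling_type, model_name, samples, dims):
    """Select the target training model from the registry and train it on the chosen dimensions."""
    model_class = MODEL_REGISTRY[modeling_type][model_name]
    return model_class().fit(select_features(samples, dims))
```

Under this arrangement, switching to a different training model or retraining with other feature dimensions only changes the arguments passed to `train_target_model`, which mirrors the no-coding workflow described above.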
Figure 14 is a block diagram of a data processing apparatus provided by an embodiment of the present application, applied to a computer device. Referring to Figure 14, the apparatus includes a first selection module 1401, a second selection module 1402, a training module 1403 and a determination module 1404.
The first selection module 1401 is configured to select, from multiple stored sample data sets, the sample data set corresponding to the target modeling type, where the selected sample data set includes multiple samples and each sample includes data in multiple feature dimensions;
The second selection module 1402 is configured to select at least one feature dimension from the multiple feature dimensions, and to select the target training model from the multiple training models stored for the target modeling type;
The training module 1403 is configured to train the target training model according to the data, in the at least one feature dimension, of each sample included in the selected sample data set;
The determination module 1404 is configured to determine the extended data set from the model obtained after training, according to the target expansion multiple and the optimization target information, where the ratio of the number of samples included in the extended data set to the number of samples included in the selected sample data set is the target expansion multiple, and the optimization target information refers to the matching index between the extended data set and the selected sample data set.
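The size constraint applied by the determination module (extended set size divided by selected set size equals the target expansion multiple) can be illustrated with the sketch below. Ranking candidates by a model score and keeping the top ones is an assumption of this example, as are the function and parameter names; the application does not specify how the trained model picks the samples.

```python
def determine_extended_set(scores_by_id, seed_size, target_multiple):
    """Pick candidate sample ids so that len(result) / seed_size == target_multiple.

    scores_by_id: candidate sample id -> score from the trained model, where a higher
    score is assumed to mean a better match with the optimization target information.
    """
    wanted = int(round(seed_size * target_multiple))
    if wanted > len(scores_by_id):
        raise ValueError("not enough candidate samples for the requested multiple")
    ranked = sorted(scores_by_id, key=scores_by_id.get, reverse=True)
    return ranked[:wanted]
```

With a selected sample data set of 2 samples and a target expansion multiple of 1.5, the function keeps the 3 highest-scoring candidates.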
Optionally, the apparatus further includes:
A first display module, configured to display a parameter setting interface, where the parameter setting interface includes at least one parameter edit box;
An acquisition module, configured to obtain, from the at least one parameter edit box, at least one parameter set for the target training model.
Optionally, the apparatus further includes:
A second display module, configured to display an evaluation result, where the evaluation result is used to evaluate the extended data set determined by the model obtained after training;
An adjustment module, configured to adjust at least one parameter of the target training model according to the evaluation result.
Optionally, the apparatus further includes:
A third display module, configured to display a training flow chart of the target training model during the training of the target training model, where the training flow chart includes multiple training nodes and the display mode of each training node is a first display mode, a second display mode or a third display mode; the first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
Optionally, the apparatus further includes:
A deployment module, configured to deploy a strategy for each sample included in the extended data set.
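The three display modes handled by the third display module can be sketched as a small state mapping. The enum member names, the node names used in the usage note and the idea of tracking progress with a single index are assumptions of this example.

```python
from enum import Enum


class NodeDisplay(Enum):
    COMPLETED = "first display mode"     # the training node has been completed
    IN_PROGRESS = "second display mode"  # training is currently at this node
    PENDING = "third display mode"       # the training node has not been reached


def display_modes(nodes, current_index):
    """Map each training node name to its display mode, given the index of the active node."""
    modes = {}
    for i, name in enumerate(nodes):
        if i < current_index:
            modes[name] = NodeDisplay.COMPLETED
        elif i == current_index:
            modes[name] = NodeDisplay.IN_PROGRESS
        else:
            modes[name] = NodeDisplay.PENDING
    return modes
```

For a hypothetical flow with nodes "load", "train" and "evaluate" and `current_index=1`, the first node is shown as completed, the second as in progress and the third as pending.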
In the embodiments of the present application, the sample data set corresponding to the target modeling type is first selected from multiple stored sample data sets; the selected sample data set includes multiple samples. Because the selected sample data set includes multiple feature dimensions, at least one feature dimension can be selected from them, and the target training model is selected from the multiple training models stored for the target modeling type. The target training model is then trained on the data, in the at least one feature dimension, of each sample in the selected sample data set. Using the model obtained after training, the extended data set is determined according to the target expansion multiple and the optimization target information. Because a matching index exists between the extended data set and the selected sample data set, the two data sets are similar. Because the multiple training models for the target modeling type are stored on the computer device in advance, when one of them needs to be trained as the target model, it can be selected directly from the stored training models and then trained on the data, in the at least one feature dimension, of each sample in the selected sample data set. In addition, the target training model can be trained multiple times, or different training models can each be trained as the target training model. That is, the data processing method provided by the embodiments of the present application does not require the operator to write different code for the multiple training models or to have a coding background, which simplifies the training of the target training model and makes determining the extended data set simpler and more efficient.
It should be understood that, when the data processing apparatus provided by the above embodiment performs data processing, the division into the functional modules described above is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above. In addition, the data processing apparatus provided by the above embodiment and the data processing method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
Figure 15 is a schematic structural diagram of a data processing device provided by an embodiment of the present application. The data processing device 1500 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 1501 and one or more memories 1502, where the memory 1502 stores at least one instruction that is loaded and executed by the processor 1501 to implement the data processing method of the above embodiments. The data processing device may also have components such as a wired or wireless network interface, a keyboard and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions, where the instructions can be executed by a processor in the data processing device to perform the data processing method of the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (10)

1. A data processing method, characterized in that the method comprises:
selecting, from a plurality of stored sample data sets, the sample data set corresponding to a target modeling type, wherein the selected sample data set comprises a plurality of samples and each sample comprises data in a plurality of feature dimensions;
selecting at least one feature dimension from the plurality of feature dimensions, and selecting a target training model from a plurality of training models stored for the target modeling type;
training the target training model according to the data, in the at least one feature dimension, of each sample comprised in the selected sample data set;
determining, from the model obtained after training, an extended data set according to a target expansion multiple and optimization target information, wherein the ratio of the number of samples comprised in the extended data set to the number of samples comprised in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the extended data set and the selected sample data set.
2. The method according to claim 1, characterized in that, before the training of the target training model according to the data, in the at least one feature dimension, of each sample comprised in the selected sample data set, the method further comprises:
displaying a parameter setting interface, wherein the parameter setting interface comprises at least one parameter edit box;
obtaining, from the at least one parameter edit box, at least one parameter set for the target training model.
3. The method according to claim 1 or 2, characterized in that, after the determining, from the model obtained after training, of the extended data set according to the target expansion multiple and the optimization target information, the method further comprises:
displaying an evaluation result, wherein the evaluation result is used to evaluate the extended data set determined by the model obtained after training;
adjusting at least one parameter comprised in the target training model according to the evaluation result.
4. The method according to claim 1, characterized in that the method further comprises:
displaying, during the training of the target training model, a training flow chart of the target training model, wherein the training flow chart comprises a plurality of training nodes, the display mode of each training node is a first display mode, a second display mode or a third display mode, the first display mode indicates that the corresponding training node has been completed, the second display mode indicates that training is currently at the corresponding training node, and the third display mode indicates that the corresponding training node has not yet been reached.
5. The method according to claim 1, characterized in that, after the determining, from the model obtained after training, of the extended data set according to the target expansion multiple and the optimization target information, the method further comprises:
deploying a strategy for each sample comprised in the extended data set.
6. A data processing apparatus, characterized in that the apparatus comprises:
a first selection module, configured to select, from a plurality of stored sample data sets, the sample data set corresponding to a target modeling type, wherein the selected sample data set comprises a plurality of samples and each sample comprises data in a plurality of feature dimensions;
a second selection module, configured to select at least one feature dimension from the plurality of feature dimensions, and to select a target training model from a plurality of training models stored for the target modeling type;
a training module, configured to train the target training model according to the data, in the at least one feature dimension, of each sample comprised in the selected sample data set;
a determination module, configured to determine, from the model obtained after training, an extended data set according to a target expansion multiple and optimization target information, wherein the ratio of the number of samples comprised in the extended data set to the number of samples comprised in the selected sample data set is the target expansion multiple, and the optimization target information refers to a matching index between the extended data set and the selected sample data set.
7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a first display module, configured to display a parameter setting interface, wherein the parameter setting interface comprises at least one parameter edit box;
an acquisition module, configured to obtain, from the at least one parameter edit box, at least one parameter set for the target training model.
8. The apparatus according to claim 6 or 7, characterized in that the apparatus further comprises:
a second display module, configured to display an evaluation result, wherein the evaluation result is used to evaluate the extended data set determined by the model obtained after training;
an adjustment module, configured to adjust at least one parameter comprised in the target training model according to the evaluation result.
9. A data processing device, characterized in that the device comprises:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the steps of the method according to any one of claims 1-5.
10. A computer-readable storage medium having instructions stored thereon, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-5.
CN201910361278.4A 2019-04-30 2019-04-30 Data processing method, device and storage medium Active CN110222710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910361278.4A CN110222710B (en) 2019-04-30 2019-04-30 Data processing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910361278.4A CN110222710B (en) 2019-04-30 2019-04-30 Data processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110222710A true CN110222710A (en) 2019-09-10
CN110222710B CN110222710B (en) 2022-03-08

Family

ID=67820210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910361278.4A Active CN110222710B (en) 2019-04-30 2019-04-30 Data processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110222710B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717535A (en) * 2019-09-30 2020-01-21 北京九章云极科技有限公司 Automatic modeling method and system based on data analysis processing system
CN112613983A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Feature screening method and device in machine modeling process and electronic equipment
US11367019B1 (en) * 2020-11-30 2022-06-21 Shanghai Icekredit, Inc. Data processing method and apparatus, and computer device
US11651380B1 (en) * 2022-03-30 2023-05-16 Intuit Inc. Real-time propensity prediction using an ensemble of long-term and short-term user behavior models

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8756175B1 (en) * 2012-02-22 2014-06-17 Google Inc. Robust and fast model fitting by adaptive sampling
CN103902968A (en) * 2014-02-26 2014-07-02 中国人民解放军国防科学技术大学 Pedestrian detection model training method based on AdaBoost classifier
CN104166706A (en) * 2014-08-08 2014-11-26 苏州大学 Multi-label classifier constructing method based on cost-sensitive active learning
CN107169575A (en) * 2017-06-27 2017-09-15 北京天机数测数据科技有限公司 A kind of modeling and method for visualizing machine learning training pattern
US20170300783A1 (en) * 2016-04-13 2017-10-19 Xerox Corporation Target domain characterization for data augmentation
CN107958268A (en) * 2017-11-22 2018-04-24 用友金融信息技术股份有限公司 The training method and device of a kind of data model
CN108230296A (en) * 2017-11-30 2018-06-29 腾讯科技(深圳)有限公司 The recognition methods of characteristics of image and device, storage medium, electronic device
US20180247227A1 (en) * 2017-02-24 2018-08-30 Xtract Technologies Inc. Machine learning systems and methods for data augmentation
CN108664975A (en) * 2018-04-24 2018-10-16 新疆大学 A kind of hand-written Letter Identification Method of Uighur, system and electronic equipment
US20180350347A1 (en) * 2017-05-31 2018-12-06 International Business Machines Corporation Generation of voice data as data augmentation for acoustic model training
CN109389143A (en) * 2018-06-19 2019-02-26 北京九章云极科技有限公司 A kind of Data Analysis Services system and method for automatic modeling

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LUKE TAYLOR等: "Improving Deep Learning with Generic Data Augmentation", 《SSCI》 *
何成栋 等: "基于颜色属性的光谱重建训练样本正交优化", 《包装学报》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717535A (en) * 2019-09-30 2020-01-21 北京九章云极科技有限公司 Automatic modeling method and system based on data analysis processing system
CN110717535B (en) * 2019-09-30 2020-09-11 北京九章云极科技有限公司 Automatic modeling method and system based on data analysis processing system
US11367019B1 (en) * 2020-11-30 2022-06-21 Shanghai Icekredit, Inc. Data processing method and apparatus, and computer device
CN112613983A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Feature screening method and device in machine modeling process and electronic equipment
CN112613983B (en) * 2020-12-25 2023-11-21 北京知因智慧科技有限公司 Feature screening method and device in machine modeling process and electronic equipment
US11651380B1 (en) * 2022-03-30 2023-05-16 Intuit Inc. Real-time propensity prediction using an ensemble of long-term and short-term user behavior models

Also Published As

Publication number Publication date
CN110222710B (en) 2022-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant