CN105975992A - Unbalanced data classification method based on adaptive upsampling
Unbalanced data classification method based on adaptive upsampling
- Publication number: CN105975992A
- Application number: CN201610331709.9A
- Authority
- CN
- China
- Prior art keywords: sample, positive, positive sample, samples, new
- Prior art date: 2016-05-18
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to an unbalanced data classification method based on adaptive upsampling. The method comprises the following steps: calculating the total number of positive samples to be newly generated; calculating a probability density distribution over the positive samples, with the Euclidean distance as the metric; determining, for each positive sample, the number of new samples to be generated from it; generating the new positive samples and adding the newly generated positive sample points to the original unbalanced training set so that the positive and negative samples are equal in number, i.e. obtaining a new balanced training set comprising n_n positive samples and n_n negative samples; and training on the newly generated balanced training set with the Adaboost algorithm, obtaining the final classification model after T iterations. The invention improves classification performance on unbalanced datasets.
Description
Technical field
The present invention relates to pattern recognition technology, and in particular to a classifier for unbalanced datasets.
Background art
With the rapid development of data mining, pattern recognition and machine learning techniques, data classification has been applied to, and plays an important role in, many fields such as image retrieval, medical detection and diagnosis, lie detection, text classification and crude-oil leakage detection. However, classical classification algorithms such as the support vector machine, artificial neural networks and linear discriminant analysis all assume at design time that the classes in the training dataset contain roughly the same number of samples. In practice, in the fields mentioned above, the number of abnormal samples (positive samples) is often far smaller than the number of normal samples (negative samples). In that case, in order to obtain a higher overall accuracy, a classical classifier pays more attention to the negative class, and the classification boundary shifts towards the positive samples, so that a large number of positive samples are misclassified as negative, which ultimately degrades classification performance on the positive class. Since in most cases the abnormal samples are the more valuable ones for decision-making, classification algorithms for unbalanced datasets, aimed at improving the classification accuracy on the positive samples, have become a research hotspot.
In recent years, researchers have proposed a variety of classification methods for unbalanced datasets. According to the object they act on, these methods fall mainly into two broad classes: data-level methods and algorithm-level methods.
Data-level methods change the data distribution mainly by resampling the data so that the numbers of positive and negative samples become roughly the same, thereby balancing the data. Both down-sampling the negative samples and up-sampling the positive samples can achieve this purpose. The patent "Protein-nucleotide binding site prediction method based on supervised up-sampling learning" (CN104077499A) adopts an up-sampling method, increasing the number of positive samples to obtain a balanced dataset for training a support vector machine. However, because this kind of method simply duplicates the positive samples and adds them back to the original dataset, each positive sample is in effect trained on repeatedly, over-fitting easily occurs, and classifier performance ultimately declines. The patent "Automatic traffic incident detection method for unbalanced datasets based on under-sampling" (CN103927874A) uses a down-sampling method, randomly drawing a subset of samples from the negative class and combining it with all the positive samples to form the training set for the classifier. However, because a large number of negative samples are discarded, this method cannot guarantee that the extracted negative subset represents the original sample set well, so the training effect is still not ideal.
Algorithm-level methods solve the imbalanced classification problem mainly by improving the classification algorithm rather than by changing the data distribution. Adaboost is one of the classical algorithm-level methods. It cascades multiple classifiers and continually increases the weights of misclassified samples, raising the cost of misclassifying such samples again and thereby improving classification accuracy. However, because the traditional Adaboost algorithm itself does not pay particular attention to the positive samples, its effect is still not ideal.
As can be seen from the above analysis, although both data-level methods and algorithm-level methods can alleviate the impact of data imbalance on classification performance, both kinds of method have their limitations.
Summary of the invention
It is an object of the present invention to overcome the deficiencies of existing methods and to propose an unbalanced-dataset classification algorithm based on adaptive up-sampling, so as to improve classification performance on unbalanced datasets. The technical scheme is as follows:
An unbalanced-dataset classification method based on adaptive up-sampling, wherein the number of positive samples in the original unbalanced dataset is n_p and the number of negative samples is n_n, the method comprising the following steps:
(1) calculate the imbalance ratio IR of the unbalanced dataset from n_p and n_n, and from IR calculate the total number G of positive samples to be newly generated;
(2) using the Euclidean distance as the metric, search the unbalanced dataset, for each positive sample i, for its K nearest neighbours, and compute the proportion of negative samples among these K nearest neighbours, denoted p_i; normalize the p_i obtained for all positive samples so that their sum is 1, and denote the normalized value r_i; the r_i thus form a probability density distribution, and r_i is called the probability of positive sample i;
(3) for each positive sample i, determine the number g_i of new samples to be generated from it, according to the total number G and the probability r_i obtained in step (2);
(4) for each positive sample i, randomly select g_i of the K nearest neighbours obtained in step (2), pair each of them with sample i, and randomly pick a point on the line segment joining each pair to obtain a newly generated positive sample; once the generation process is complete, G new positive sample points have been produced; add them to the original unbalanced training set so that the numbers of positive and negative samples are equal, i.e. obtain a new balanced training set containing n_n positive samples and n_n negative samples;
(5) let the number of iterations of the Adaboost algorithm be T; train on the newly generated balanced training set with the Adaboost algorithm and obtain the final classification model after T iterations (a code sketch of these five steps is given below).
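The following is a minimal, illustrative sketch of steps (1)-(5) and not the patent's own implementation: it assumes X is a numeric feature matrix and y a label vector with +1 for positive and -1 for negative samples, and it uses the ADASYN sampler from the imbalanced-learn library (which implements essentially this kind of adaptive up-sampling) together with scikit-learn's AdaBoostClassifier over a decision-tree base learner standing in for C4.5.

```python
# A minimal sketch of steps (1)-(5); illustrative, not the patent's own code.
# X: numeric feature matrix; y: labels with +1 = positive, -1 = negative.
from imblearn.over_sampling import ADASYN
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier


def train_adaptive_upsampling_adaboost(X, y, k=5, T=10, random_state=0):
    # Steps (1)-(4): adaptively up-sample the positive class until it matches
    # the negative class; new points are interpolated near the class boundary.
    sampler = ADASYN(n_neighbors=k, random_state=random_state)
    X_bal, y_bal = sampler.fit_resample(X, y)

    # Step (5): boost a decision-tree base learner for T rounds on the
    # balanced training set (the tree stands in for the C4.5 base classifier).
    base = DecisionTreeClassifier(min_samples_leaf=2, random_state=random_state)
    model = AdaBoostClassifier(base, n_estimators=T, random_state=random_state)
    model.fit(X_bal, y_bal)
    return model
```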
Aimed at unbalanced datasets, the present invention is an algorithm that combines a data-level method with an algorithm-level method and improves and optimizes the up-sampling step: mainly the positive sample points near the positive-negative class boundary are up-sampled, while positive samples far from the boundary are left untouched, so as to obtain a better classification effect on unbalanced datasets. It combines the advantages of the adaptive up-sampling algorithm and the Adaboost algorithm, ensuring that the new positive samples generated by up-sampling are concentrated near the boundary, while boosting is performed through the ensemble classifier to improve overall classifier performance. Experimental comparison shows that the present invention has a clear advantage on several classifier evaluation indices.
Brief description of the drawings
Fig. 1 is the flow chart of the Adaboost boosting algorithm.
Fig. 2 is the flow chart of the present invention.
Detailed description of the invention
The present invention is inspired by the adaptive up-sampling algorithm and by the Adaboost algorithm shown in Fig. 1, and combines the two to form an ensemble classifier. The present invention is described in further detail below with reference to the accompanying drawings.
(1) Obtain the test and training data: the present invention uses the vehicle-type recognition database from the KEEL repository, which contains 846 samples in total. The positive samples in the database are the van data, 199 in total, i.e. n_p = 199. The negative samples comprise the data of three other vehicle types, namely bus, Opel car and Saab car, 647 in total, i.e. n_n = 647. The database contains 18 features in total, such as torque, turning radius and maximum braking distance. The imbalance ratio is calculated by formula (1),
IR = n_n / n_p    (1)
so the imbalance ratio in this experiment is 3.25.
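As a quick numerical check of formula (1) on the sample counts above (a sketch, not part of the patent):

```python
# Formula (1) on the sample counts given above.
n_p, n_n = 199, 647        # positive (van) and negative sample counts
IR = n_n / n_p
print(round(IR, 2))        # prints 3.25
```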
(2) The number of positive samples to be generated is calculated by formula (2),
G = (n_n - n_p) × β    (2)
where β is a constant between 0 and 1. When β = 1 the numbers of positive and negative samples after up-sampling are exactly equal and the dataset is fully balanced; the present invention takes β = 1, so the number of new positive samples to be generated is 448. The positive samples are then adaptively up-sampled according to this value so that the numbers of positive and negative samples become balanced. Specifically: for each positive sample, using the Euclidean distance as the metric, compute the proportion p_i of negative samples among its K nearest sample points,
p_i = k_i / K,  i = 1, …, n_p    (3)
where k_i is the number of negative samples among the K nearest neighbours of positive sample i. To judge reliably whether each positive sample lies near the positive-negative class boundary, K should take a large value, but as K increases the amount of computation also grows substantially. To keep the computational complexity low, the present invention makes a compromise between these two requirements and takes K = 5. Subsequently, all the p_i are normalized so that they can be expressed as a probability density distribution, and the number of new positive samples to be generated from each positive sample is calculated as
g_i = r_i × G, where r_i = p_i / Σ_j p_j (the sum running over all n_p positive samples), rounded to the nearest integer    (4)
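A sketch of formulas (2)-(4) follows; it is illustrative only, uses random stand-in data in place of the vehicle dataset, and relies on scikit-learn's NearestNeighbors for the K-nearest-neighbour search (variable names are assumptions, not the patent's).

```python
# Sketch of formulas (2)-(4): negative-neighbour proportions p_i, their
# normalised values r_i, and per-sample generation counts g_i.
# X_pos / X_neg are random stand-in data, not the vehicle dataset.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_pos = rng.normal(0.0, 1.0, size=(199, 18))
X_neg = rng.normal(0.5, 1.0, size=(647, 18))

K, beta = 5, 1.0
G = int((len(X_neg) - len(X_pos)) * beta)        # formula (2): 448 new positives

X_all = np.vstack([X_pos, X_neg])
is_neg = np.r_[np.zeros(len(X_pos)), np.ones(len(X_neg))]

# K nearest neighbours of each positive sample within the whole dataset
# (K + 1 queried because each positive sample is its own nearest neighbour).
nn = NearestNeighbors(n_neighbors=K + 1).fit(X_all)
_, idx = nn.kneighbors(X_pos)
neigh = idx[:, 1:]                               # drop the sample itself

p = is_neg[neigh].mean(axis=1)                   # formula (3): p_i = k_i / K
r = p / p.sum()                                  # normalise into a density
g = np.rint(r * G).astype(int)                   # formula (4); sums to about G
```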
As formula (4) shows, positive sample points with more negative samples among their neighbours, i.e. points near the boundary, are used to generate more positive samples, while positive sample points whose neighbours are all positive, i.e. points far from the boundary, are not used to generate positive samples. Subsequently, for each positive sample, g_i of its K nearest sample points are randomly selected and new positive samples are generated by formula (5):
new_i = x_i + λ(x_ni - x_i)    (5)
where new_i is the newly generated sample point, λ is a random number between 0 and 1, and x_ni is the randomly selected neighbouring sample point. For each positive sample this process is carried out g_i times. When sample generation is complete, the newly generated sample points are added to the original unbalanced training set, yielding the new balanced training set. This adaptive up-sampling method ensures that the newly generated training set no longer suffers from the imbalance problem and that the newly generated samples are located mainly in the boundary region where positive and negative samples are hardest to distinguish.
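A small illustrative sketch of the interpolation in formula (5) follows; the sample, neighbour indices and g_i below are toy values chosen for the example, not taken from the patent's data.

```python
# Sketch of formula (5): interpolate between a positive sample and one of its
# K nearest neighbours. The sample, neighbour indices and g_i are toy values.
import numpy as np

rng = np.random.default_rng(0)
X_all = rng.normal(size=(10, 18))       # toy dataset
x_i = X_all[0]                          # a positive sample
neigh_i = np.array([2, 3, 5, 7, 9])     # indices of its K = 5 nearest neighbours
g_i = 3                                 # number of new samples for this point

chosen = rng.choice(neigh_i, size=g_i, replace=True)   # repeat picks if g_i > K
lam = rng.random(g_i)[:, None]          # lambda drawn uniformly from (0, 1)
new_i = x_i + lam * (X_all[chosen] - x_i)   # g_i new positive points
```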
As can be seen from Fig. 1 and Fig. 2, if simple random up-sampling were used instead and all positive sample points were merely replicated, the newly generated sample points would coincide exactly with the original positive sample points and would be spread over the whole positive-sample space. Adaptive up-sampling, by contrast, generates positive samples that differ from the original sample points, and the newly generated positive samples all lie near the boundary.
(3) The present invention uses five-fold cross-validation to train and test on the unbalanced dataset. For both training and testing, the C4.5 decision tree is chosen as the base classifier of the Adaboost classification algorithm, with the minimum number of samples per leaf of the C4.5 tree set to 2, the confidence level set to 0.25, and pruning performed after the tree has been trained. All data are normalized before being fed to the classifier, i.e. the minimum of the data is 0 and the maximum is 1; positive samples are labelled +1 and negative samples are labelled -1. A sketch of this preprocessing is given below.
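The sketch below illustrates the preprocessing and base-learner configuration just described, under the assumption that scikit-learn is used; its CART DecisionTreeClassifier stands in for C4.5, and since C4.5's confidence-factor pruning (0.25) has no direct scikit-learn equivalent, the ccp_alpha value shown is only an illustrative proxy.

```python
# Sketch of the preprocessing and base-learner settings described above.
# DecisionTreeClassifier (CART) stands in for C4.5; min_samples_leaf=2 mirrors
# the minimum leaf size, and ccp_alpha is only an illustrative stand-in for
# C4.5's confidence-factor (0.25) pruning, which scikit-learn does not offer.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def preprocess(X, y_raw, positive_class):
    X_scaled = MinMaxScaler().fit_transform(X)       # every feature in [0, 1]
    y = np.where(y_raw == positive_class, 1, -1)     # +1 positive, -1 negative
    return X_scaled, y


base_tree = DecisionTreeClassifier(min_samples_leaf=2, ccp_alpha=1e-3)
```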
The balanced positive and negative samples are divided into training and test sets by five-fold cross-validation, so each training set contains 518 positive and 518 negative samples; the number of samples used for training is thus 2n_n = 1036, where n_n here denotes the per-fold number of negative (and, after balancing, positive) training samples, i.e. 518. The number of iterations of the Adaboost algorithm is set to T = 10, and training proceeds as follows:
1. Denote the weight of each sample by D_t(i), where t, an integer from 1 to T, is the current iteration round and i is the sample index. The weights are initialized as D_1(i) = 1/(2n_n), i = 1, …, 2n_n.
2. Train the classifier h_t on the weighted training set and, after training, compute its training error rate
ε_t = Σ_{i: h_t(x_i) ≠ y_i} D_t(i)    (6)
where t = 1, …, T is the current iteration round, ε_t is the training error rate of the t-th iteration, D_t(i) is the weight of each sample in this iteration, y_i is the class label of sample x_i with value +1 or -1, and h_t(x_i) is the label assigned to sample x_i by the trained classifier.
3. Let α_t be the weight, in the final vote, of the classifier obtained after the t-th iteration. The weight of the classifier generated in each iteration is calculated from that iteration's training error rate as
α_t = (1/2) ln((1 - ε_t)/ε_t)    (7)
At the same time, the weight of each sample in the next iteration is updated to
D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t    (8)
where Z_t is the sum of the sample weights in the current iteration round and serves to normalize the sample weights.
4. Perform steps 2 and 3 a total of T times to complete the whole iteration and weight-update process and thereby finish training the classifier. For a test sample x to be classified, the classification result is
H(x) = sign(Σ_{t=1}^{T} α_t h_t(x))    (9)
As formula (7) shows, the weight of each sub-classifier is determined by its classification error rate: a classifier with a lower error rate obtains a higher weight in the vote of formula (9). Moreover, for a single sample, formula (8) shows that if the sample's original label differs from the classification result, the exponent is greater than 0, the exponential factor exceeds 1, and the sample's weight increases in the next iteration; otherwise, the sample's weight decreases in the next iteration.
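The training loop of steps 1-4 (formulas (6)-(9)) can be sketched from scratch as follows; this is an illustrative reimplementation, assuming X and y are NumPy arrays with y in {+1, -1} and using weighted scikit-learn decision trees as the base learners h_t. Given the learners and alphas returned by adaboost_train, adaboost_predict applies formula (9) to new data.

```python
# From-scratch sketch of steps 1-4 (formulas (6)-(9)). X, y are NumPy arrays
# with y in {+1, -1}; weighted scikit-learn trees play the base learners h_t.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def adaboost_train(X, y, T=10):
    n = len(y)
    D = np.full(n, 1.0 / n)                     # step 1: uniform initial weights
    learners, alphas = [], []
    for t in range(T):
        h = DecisionTreeClassifier(min_samples_leaf=2, random_state=t)
        h.fit(X, y, sample_weight=D)            # step 2: train on weighted set
        pred = h.predict(X)
        eps = D[pred != y].sum()                # formula (6): weighted error
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * np.log((1 - eps) / eps)   # formula (7): classifier weight
        D = D * np.exp(-alpha * y * pred)       # formula (8): re-weight samples
        D = D / D.sum()                         # normalise by Z_t
        learners.append(h)
        alphas.append(alpha)
    return learners, np.array(alphas)


def adaboost_predict(learners, alphas, X):
    votes = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(votes)                       # formula (9): weighted vote
```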
The test-set samples are fed into the trained classifier to obtain the final classification results of the test samples, as shown in Fig. 2.
Table 1 gives the test results obtained by classifying the unbalanced dataset directly with a C4.5 decision tree, by classifying with C4.5 after random up-sampling of the positive samples, and by classifying with the method of the present invention. Classifier performance is evaluated with the following indices: sensitivity, specificity and their geometric mean (G-mean); a sketch of how they are computed follows Table 1.
Table 1. Classification results and comparison (the best result under each index is shown in bold)
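The three indices reported in Table 1 can be computed from the confusion matrix as in the sketch below, using the standard definitions (assumed here) with +1 as the positive class.

```python
# Indices reported in Table 1, computed from the confusion matrix with +1 as
# the positive class (standard definitions assumed here):
#   sensitivity = TP / (TP + FN), specificity = TN / (TN + FP),
#   G-mean = sqrt(sensitivity * specificity).
import numpy as np


def imbalance_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, np.sqrt(sensitivity * specificity)
```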
As the data in Table 1 show, although classifying directly with a C4.5 decision tree yields the highest specificity, its sensitivity is the lowest, demonstrating that the data-imbalance phenomenon significantly impairs classification performance here: the boundary region of the positive class is encroached upon, and a large number of positive samples are misclassified as negative. After simple random up-sampling this problem is alleviated, but the gap between sensitivity and specificity remains large. The present invention obtains good sensitivity and specificity simultaneously, and their geometric mean is the highest among the compared methods, demonstrating that the present invention achieves the best trade-off between sensitivity and specificity.
In summary, the present invention achieves a good classification effect on unbalanced datasets and effectively eliminates the negative influence of the data-imbalance problem on classification.
Claims (1)
1. An unbalanced-dataset classification method based on adaptive up-sampling, wherein the number of positive samples in the original unbalanced dataset is n_p and the number of negative samples is n_n, the method comprising the following steps:
(1) calculating the imbalance ratio IR of the unbalanced dataset from n_p and n_n, and calculating from IR the total number G of positive samples to be newly generated;
(2) using the Euclidean distance as the metric, searching the unbalanced dataset, for each positive sample i, for its K nearest neighbours, computing the proportion of negative samples among these K nearest neighbours, denoted p_i, normalizing the p_i obtained for all positive samples so that their sum is 1, and denoting the normalized value r_i, whereby the r_i form a probability density distribution and r_i is called the probability of positive sample i;
(3) for each positive sample i, determining the number g_i of new samples to be generated from it according to the total number G and the probability r_i obtained in step (2);
(4) for each positive sample i, randomly selecting g_i of the K nearest neighbours obtained in step (2), pairing each of them with sample i, and randomly picking a point on the line segment joining each pair to obtain a newly generated positive sample, G new positive sample points having been generated once the generation process is complete; adding the newly generated G positive sample points to the original unbalanced training set so that the numbers of positive and negative samples are equal, i.e. obtaining a new balanced training set containing n_n positive samples and n_n negative samples;
(5) letting the number of iterations of the Adaboost algorithm be T, training on the newly generated balanced training set with the Adaboost algorithm, and obtaining the final classification model after T iterations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610331709.9A CN105975992A (en) | 2016-05-18 | 2016-05-18 | Unbalanced data classification method based on adaptive upsampling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105975992A true CN105975992A (en) | 2016-09-28 |
Family
ID=56955297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610331709.9A (published as CN105975992A, pending) | Unbalanced data classification method based on adaptive upsampling | 2016-05-18 | 2016-05-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105975992A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103927874A (en) * | 2014-04-29 | 2014-07-16 | 东南大学 | Automatic incident detection method based on under-sampling and used for unbalanced data set |
CN104573708A (en) * | 2014-12-19 | 2015-04-29 | 天津大学 | Ensemble-of-under-sampled extreme learning machine |
CN104951809A (en) * | 2015-07-14 | 2015-09-30 | 西安电子科技大学 | Unbalanced data classification method based on unbalanced classification indexes and integrated learning |
CN105373806A (en) * | 2015-10-19 | 2016-03-02 | 河海大学 | Outlier detection method based on uncertain data set |
Non-Patent Citations (3)
Title |
---|
HAIBO HE et al.: "ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning", 2008 IEEE International Joint Conference on Neural Networks *
刘余霞 (LIU Yuxia) et al.: "A new oversampling algorithm: DB_SMOTE", Computer Engineering and Applications (《计算机工程与应用》) *
陶新民 (TAO Xinmin) et al.: "A survey of classification algorithms for imbalanced data", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) (《重庆邮电大学学报(自然科学版)》) *
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108133223B (en) * | 2016-12-01 | 2020-06-26 | 富士通株式会社 | Device and method for determining convolutional neural network CNN model |
CN108133223A (en) * | 2016-12-01 | 2018-06-08 | 富士通株式会社 | The device and method for determining convolutional neural networks CNN models |
CN108629413A (en) * | 2017-03-15 | 2018-10-09 | 阿里巴巴集团控股有限公司 | Neural network model training, trading activity Risk Identification Method and device |
CN108629413B (en) * | 2017-03-15 | 2020-06-16 | 创新先进技术有限公司 | Neural network model training and transaction behavior risk identification method and device |
CN107273916B (en) * | 2017-05-22 | 2020-10-16 | 上海大学 | Information hiding detection method for unknown steganography algorithm |
CN107273916A (en) * | 2017-05-22 | 2017-10-20 | 上海大学 | The unknown Information Hiding & Detecting method of steganographic algorithm |
CN110163226A (en) * | 2018-02-12 | 2019-08-23 | 北京京东尚科信息技术有限公司 | Equilibrating data set generation method and apparatus and classification method and device |
CN108334455A (en) * | 2018-03-05 | 2018-07-27 | 清华大学 | The Software Defects Predict Methods and system of cost-sensitive hypergraph study based on search |
CN108334455B (en) * | 2018-03-05 | 2020-06-26 | 清华大学 | Software defect prediction method and system based on search cost-sensitive hypergraph learning |
CN108776711A (en) * | 2018-03-07 | 2018-11-09 | 中国电力科学研究院有限公司 | A kind of electrical power system transient sample data extracting method and system |
CN108733633A (en) * | 2018-05-18 | 2018-11-02 | 北京科技大学 | A kind of the unbalanced data homing method and device of sample distribution adjustment |
CN109086412A (en) * | 2018-08-03 | 2018-12-25 | 北京邮电大学 | A kind of unbalanced data classification method based on adaptive weighted Bagging-GBDT |
CN110998648A (en) * | 2018-08-09 | 2020-04-10 | 北京嘀嘀无限科技发展有限公司 | System and method for distributing orders |
CN109614967B (en) * | 2018-10-10 | 2020-07-17 | 浙江大学 | License plate detection method based on negative sample data value resampling |
CN109614967A (en) * | 2018-10-10 | 2019-04-12 | 浙江大学 | A kind of detection method of license plate based on negative sample data value resampling |
WO2020082734A1 (en) * | 2018-10-24 | 2020-04-30 | 平安科技(深圳)有限公司 | Text emotion recognition method and apparatus, electronic device, and computer non-volatile readable storage medium |
CN109327464A (en) * | 2018-11-15 | 2019-02-12 | 中国人民解放军战略支援部队信息工程大学 | Class imbalance processing method and processing device in a kind of network invasion monitoring |
CN109740750A (en) * | 2018-12-17 | 2019-05-10 | 北京深极智能科技有限公司 | Method of data capture and device |
CN109756494A (en) * | 2018-12-29 | 2019-05-14 | 中国银联股份有限公司 | A kind of negative sample transform method and device |
CN109756494B (en) * | 2018-12-29 | 2021-04-16 | 中国银联股份有限公司 | Negative sample transformation method and device |
CN109862392A (en) * | 2019-03-20 | 2019-06-07 | 济南大学 | Recognition methods, system, equipment and the medium of internet gaming video flow |
CN109862392B (en) * | 2019-03-20 | 2021-04-13 | 济南大学 | Method, system, device and medium for identifying video traffic of internet game |
CN111062806A (en) * | 2019-12-13 | 2020-04-24 | 合肥工业大学 | Personal finance credit risk evaluation method, system and storage medium |
CN111062806B (en) * | 2019-12-13 | 2022-05-10 | 合肥工业大学 | Personal finance credit risk evaluation method, system and storage medium |
CN111652268A (en) * | 2020-04-22 | 2020-09-11 | 浙江盈狐云数据科技有限公司 | Unbalanced stream data classification method based on resampling mechanism |
CN111598189B (en) * | 2020-07-20 | 2020-10-30 | 北京瑞莱智慧科技有限公司 | Generative model training method, data generation method, device, medium, and apparatus |
CN111598189A (en) * | 2020-07-20 | 2020-08-28 | 北京瑞莱智慧科技有限公司 | Generative model training method, data generation method, device, medium, and apparatus |
CN113903030A (en) * | 2021-10-12 | 2022-01-07 | 杭州迪英加科技有限公司 | Liquid-based cell pathology image generation method based on weak supervised learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20160928 |