CN107895168A - The method of data processing, the device of data processing and computer-readable recording medium - Google Patents
The method of data processing, the device of data processing and computer-readable recording medium
- Publication number: CN107895168A
- Application number: CN201710951683.2A
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/285—Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/22—Social work or social welfare, e.g. community support activities or counselling services
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
- G16H40/67—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Abstract
The invention discloses a data processing method comprising the steps of: obtaining raw medical data from an insurance institution's database and from medical institutions at all levels; establishing a classification model, a natural semantic capture model, a semi-supervised learning model, and an identity information capture model; and inputting the raw data into the classification model, the natural semantic capture model, the semi-supervised learning model, and the identity information capture model, each of which analyzes the input raw data and outputs violation documents. With this method, violation documents for controlled drugs can be extracted quickly and in batches for different regions, greatly reducing labor cost and improving efficiency.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a data processing method, a data processing device, and a computer-readable storage medium.
Background art
It is well known that medical institutions such as hospitals apply many control measures to the drugs used to treat certain diseases. These controlled drugs are essential for treating the corresponding diseases, but they also raise many problems: for example, an inappropriate dosage or method of administration can cause harm, and patients whose conditions differ in severity need different amounts of a drug.
At present, medical insurance policy defines fairly clear business logic for the restricted use of insulin, but putting it into practice in each city, district, and medical institution is difficult. In the traditional method of extracting violation documents, a business expert checks the full data set for the past year item by item and finally pinpoints the violation document information. This consumes a great deal of time, and the check cannot be carried out by just anyone (it is limited to business experts). Moreover, if the extraction does not yield a sufficient number of violation documents or a sufficient violation amount, the medical insurance agency will not directly adopt the extraction result. More troublesome still, the same work has to be repeated after switching to a different city, so efficiency is very low.
Summary of the invention
In view of this, an object of the present invention is to provide a data processing method, a data processing device, and a computer-readable storage medium that extract violation documents for controlled drugs quickly and in batches for different regions, greatly reducing labor cost and improving efficiency.
To achieve the above object, the present invention provides a data processing method comprising the steps of:
Obtaining raw medical data from an insurance institution's database and from medical institutions at all levels, and establishing a classification model, a natural semantic capture model, a semi-supervised learning model, and an identity information capture model; and
Inputting the raw data into the classification model, the natural semantic capture model, the semi-supervised learning model, and the identity information capture model, each model analyzing the input raw data and outputting violation documents;
Wherein the classification model classifies the raw medical data and corrects the cases in the raw medical data that were not diagnosed correctly, so that those cases are classified correctly.
Preferably, the raw data to be classified includes medical insurance policies, charge documents, and reimbursement documents.
Preferably, the classification model includes one of a decision tree classifier and a selection tree classifier.
Preferably, establishing the classification model comprises the following steps:
Selecting samples and dividing all samples into two parts, training samples and test samples;
Running a classifier algorithm on the training samples to generate the classification model;
Running the classification model on the test samples to generate prediction results; and
Calculating the necessary evaluation metrics from the prediction results to assess the performance of the classification model.
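The four steps above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the patent's implementation: the classifier is a one-level decision tree (a decision stump), the two features (daily dose, injections per day) and all sample values are invented, and the function names are hypothetical.

```python
import random

# Hypothetical toy samples: ((daily_dose_units, injections_per_day), label),
# where label 1 marks a document treated as a violation. All values are invented.
SAMPLES = [((float(d), 1 + d % 4), 0) for d in range(10, 30)] + \
          [((float(d), 1 + d % 4), 1) for d in range(60, 80)]

def split_samples(samples, test_ratio=0.25, seed=0):
    """Step 1: shuffle the samples and split them into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def train_stump(train):
    """Step 2: fit a one-level decision tree (predict 1 when feature > threshold).

    Candidate thresholds are midpoints between consecutive observed values;
    the (feature, threshold) pair with the fewest training errors is kept.
    """
    best = None  # (errors, feature_index, threshold)
    for f in range(len(train[0][0])):
        values = sorted({x[f] for x, _ in train})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            errors = sum(int(x[f] > t) != y for x, y in train)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def predict(model, x):
    """Step 3: apply the fitted stump to one feature vector."""
    f, t = model
    return int(x[f] > t)

def evaluate(model, test):
    """Step 4: compute accuracy, precision and recall on the test samples."""
    pairs = [(predict(model, x), y) for x, y in test]
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    return {
        "accuracy": sum(1 for p, y in pairs if p == y) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

train, test = split_samples(SAMPLES)
model = train_stump(train)
metrics = evaluate(model, test)
print(model, metrics)
```

Because the invented samples are cleanly separable on the dose feature, the stump splits on that feature; real policy and billing data would need a deeper tree and more careful evaluation.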
Preferably, the natural semantic capture model is a model built on natural semantic processing for capturing specific fields, and the identity information capture model is mainly used to obtain the identity information of drug users from the raw data.
Preferably, establishing the semi-supervised learning model comprises the following steps:
For a small amount of labeled data and a large amount of unlabeled data, randomly initializing multiple semi-supervised classifiers;
For each initial semi-supervised classifier, optimizing the prediction results of the semi-supervised classifier by an optimization method;
Dividing the prediction results of the optimized semi-supervised classifiers into multiple clusters by a machine-learning clustering method;
For each cluster in the clustering result, outputting the semi-supervised classifier in that cluster with the best objective value;
Collecting the semi-supervised classifiers output for each cluster to obtain multiple semi-supervised classifiers.
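The five steps above leave the classifier, the optimization method, the clustering method, and the objective unspecified. The sketch below fills each slot with a deliberately simple stand-in, and every such choice is an assumption made for illustration: a one-dimensional threshold classifier, a 2-means style update on the unlabeled points as the optimization, greedy grouping by Hamming distance as the clustering, and accuracy on the labeled points as the objective value. The data values are invented.

```python
import random

# Invented toy data: many unlabeled 1-D points, very few labeled ones.
UNLABELED = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]
LABELED = [(1.1, 0), (5.1, 0), (8.9, 1)]

def predict(threshold, x):
    """Threshold classifier: class 1 when x lies above the threshold."""
    return int(x > threshold)

def optimise(threshold, points, rounds=20):
    """Stand-in optimization: move the threshold to the midpoint of the two
    group means until it stops changing (a 2-means style update)."""
    for _ in range(rounds):
        low = [p for p in points if p <= threshold]
        high = [p for p in points if p > threshold]
        if not low or not high:
            break
        new_threshold = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if new_threshold == threshold:
            break
        threshold = new_threshold
    return threshold

def objective(threshold):
    """Stand-in objective value: accuracy on the few labeled points."""
    return sum(predict(threshold, x) == y for x, y in LABELED) / len(LABELED)

def cluster_predictions(preds, max_dist):
    """Greedy clustering: a prediction vector joins the first cluster whose
    first member differs from it in at most max_dist positions."""
    clusters = []
    for i, p in enumerate(preds):
        for members in clusters:
            leader = preds[members[0]]
            if sum(a != b for a, b in zip(p, leader)) <= max_dist:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

rng = random.Random(0)
# Step 1: randomly initialize several classifiers.
initial = [rng.uniform(min(UNLABELED), max(UNLABELED)) for _ in range(8)]
# Step 2: optimize each one using the unlabeled data.
optimised = [optimise(t, UNLABELED) for t in initial]
# Step 3: cluster the classifiers by their predictions on the unlabeled data.
predictions = [tuple(predict(t, x) for x in UNLABELED) for t in optimised]
clusters = cluster_predictions(predictions, max_dist=1)
# Steps 4 and 5: keep the best classifier of each cluster and collect them.
selected = [max((optimised[i] for i in members), key=objective) for members in clusters]
print(selected, [objective(t) for t in selected])
```

On this toy data the optimization has two stable thresholds, near 4.0 and near 6.0, so the random initializations end up in at most two clusters and one classifier is kept from each.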
Preferably, the semi-supervised classifier is one of a generative-model-based semi-supervised classifier, a graph-based semi-supervised classifier, a disagreement-based semi-supervised classifier, and a support-vector-machine-based semi-supervised classifier.
Preferably, the method further comprises the steps of:
The classification model analyzes and categorizes the input raw data, and reclassifies the cases that were not diagnosed correctly;
The natural semantic capture model captures the keywords "insulin" and "glucose" and, through analysis, obtains the documents in which "insulin" and "glucose" are used two or more times;
The semi-supervised learning model classifies the input unlabeled data by analyzing the raw data, and outputs the violation documents in the raw data; and
The identity information capture model captures the identity information of patients and outputs violation documents according to the drug usage of each patient.
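As a rough illustration of the keyword step and the identity step, the sketch below counts the two keywords in free-text documents, keeps the documents in which each keyword appears at least twice, and groups them by patient. The record fields (`doc_id`, `patient_id`, `text`), the sample texts, and the reading of "two or more times" as applying to each keyword separately are all assumptions; the source does not state the rule that finally decides whether a document is a violation, so none is encoded here.

```python
import re
from collections import defaultdict

KEYWORDS = ("insulin", "glucose")

def keyword_counts(text):
    """Count case-insensitive whole-word occurrences of each keyword."""
    return {
        k: len(re.findall(rf"\b{re.escape(k)}\b", text, flags=re.IGNORECASE))
        for k in KEYWORDS
    }

def repeated_use_documents(documents, min_uses=2):
    """Keep documents in which every keyword occurs at least min_uses times."""
    return [
        doc for doc in documents
        if all(n >= min_uses for n in keyword_counts(doc["text"]).values())
    ]

def group_by_patient(documents):
    """Collect the ids of the given documents under each patient id."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["patient_id"]].append(doc["doc_id"])
    return dict(grouped)

# Invented sample records; the field names are hypothetical.
DOCUMENTS = [
    {"doc_id": "D1", "patient_id": "P001",
     "text": "Insulin injection 10 units; glucose test; insulin injection 12 units; glucose infusion."},
    {"doc_id": "D2", "patient_id": "P002", "text": "Glucose test only."},
    {"doc_id": "D3", "patient_id": "P001", "text": "insulin, insulin, glucose"},
    {"doc_id": "D4", "patient_id": "P003",
     "text": "Insulin and glucose; insulin and glucose again."},
]

candidates = repeated_use_documents(DOCUMENTS)
by_patient = group_by_patient(candidates)
print(by_patient)
```

Grouping by patient makes it possible to apply a per-patient usage rule afterwards; that rule would have to come from the applicable insurance policy.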
By adopting the above data processing method, the present invention can extract violation documents for controlled drugs quickly and in batches for different regions, greatly reducing labor cost and improving efficiency. Because different models are used to analyze the raw data, violation documents are also obtained more comprehensively and accurately.
To achieve the above object, the present invention also provides a data processing device comprising a memory and a processor, the memory storing a computer program that can run on the processor, wherein the computer program, when executed by the processor, implements the steps of the data processing method described above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium storing a data processing program, the data processing program being executable by at least one processor to cause the at least one processor to perform the steps of the data processing method described above.
Compared with the prior art, the data processing method, data processing device, and computer-readable storage medium of the present invention can extract violation documents for controlled drugs quickly and in batches for different regions, greatly reducing labor cost and improving efficiency. First, raw medical data is obtained from the insurance institution's database and from medical institutions at all levels; second, a classification model, a natural semantic capture model, a semi-supervised learning model, and an identity information capture model are established; finally, the raw data is input into the classification model, the natural semantic capture model, the semi-supervised learning model, and the identity information capture model, and each model analyzes the input raw data and outputs violation documents. Because different models are used to analyze the raw data, violation documents are also obtained more comprehensively and accurately.
Brief description of the drawings
Fig. 1 is a schematic diagram of an optional application environment for the embodiments of the present invention;
Fig. 2 is a schematic diagram of an optional architecture of the data processing device in Fig. 1;
Fig. 3 is a schematic diagram of the modules of an embodiment of the data processing system in Fig. 2;
Fig. 4 is a flowchart of a first embodiment of the data processing method of the present invention;
Fig. 5 is a flowchart of the method for establishing the classification model in Fig. 4;
Fig. 6 is a flowchart of the method for establishing the semi-supervised learning model in Fig. 4;
Fig. 7 is a flowchart of a second embodiment of the data processing method of the present invention;
The realization of the object, the functional features, and the advantages of the present invention will be further described in conjunction with the embodiments and with reference to the drawings.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, provided that the combination can be implemented by those of ordinary skill in the art; where a combination of technical solutions is contradictory or cannot be implemented, the combination should be deemed not to exist and falls outside the scope of protection claimed by the present invention.
Fig. 1 is a schematic diagram of an optional application environment for the embodiments of the present invention.
In this embodiment, the present invention can be applied to an application environment 1 that includes, but is not limited to, an insurance institution 10, a medical institution 11, a network 12, and a data processing device 13.
In one embodiment, the data processing device 13 may be a mobile device such as a mobile phone, smartphone, notebook computer, digital broadcast receiver, PDA (personal digital assistant), or PAD (tablet computer), or a fixed terminal such as a desktop computer, notebook, or server. The insurance institution 10 and the medical institution 11 may be servers or databases that store data. The server may be a computing device such as a rack server, blade server, tower server, or cabinet server, and may be a standalone server or a server cluster composed of multiple servers. The database implementation differs from company to company; the main database type is Oracle, but databases of other types such as PostgreSQL and MySQL may also be used. The network 12 may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or a voice call network.
In one embodiment, the insurance institution 10 and the medical institution 11 are communicatively connected through the network 12 to one or more data processing devices 13 (only one is shown in the figure), so that the data processing device 13 can exchange data and interact with the insurance institution 10 and the medical institution 11 through the network 12.
Fig. 2 is a schematic diagram of an optional architecture of the data processing device 13 of the present invention.
As shown in Fig. 2, in one embodiment the data processing device 13 includes a data processing system 3, a memory 21, and a processor 22.
In one embodiment, the memory 21 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 21 may be an internal storage unit of the data processing device 13, such as its hard disk or internal memory. In other embodiments, the memory 21 may be an external storage device of the data processing device 13, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the data processing device 13. The memory 21 may of course include both an internal storage unit of the data processing device 13 and an external storage device. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the data processing device 13, such as the program code of the data processing system 3. The memory 21 may also be used to temporarily store data that has been output or is about to be output.
In one embodiment, the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 22 is generally used to control the overall operation of the data processing device 13. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the data processing system 3.
Fig. 3 is a schematic diagram of the modules of an embodiment of the data processing system 3 in Fig. 2.
As shown in Fig. 3, in one embodiment the data processing system 3 includes an acquisition module 31, an establishing module 32, a classification module 33, a natural semantic capture module 34, an identity information capture module 35, a semi-supervised learning module 36, a storage module 37, and an output module 38.
In one embodiment, the acquisition module 31 is used to obtain raw medical data from the insurance institution's database and from medical institutions at all levels.
Specifically, the database is the company's own database; the raw medical data mainly includes insurance policies and receipts; and the medical institutions at all levels include hospitals, medical centers, and the like at the provincial, city, district, and other regional levels.
In one embodiment, the establishing module 32 is used to establish the classification model, the natural semantic capture model, the semi-supervised learning model, and the identity information capture model.
Specifically, in one embodiment, the classification model may be a decision tree classifier: given a set of attributes, a decision tree classifies data by making a series of decisions based on that attribute set. It may also be a selection tree classifier, which classifies data using a technique similar to that of a decision tree classifier; unlike a decision tree, a selection tree contains special selection nodes, each of which has multiple branches. In addition, classifiers such as artificial neural networks, case-based reasoning, the nearest-neighbor method, support vector machines, and random forests may also be used.
Specifically, in one embodiment, constructing the classification model can be divided into the following steps:
Selecting samples and dividing all samples into two parts, training samples and test samples;
Running a classifier algorithm on the training samples to generate the classification model;
Running the classification model on the test samples to generate prediction results;
Calculating the necessary evaluation metrics from the prediction results to assess the performance of the classification model.
Specifically, in one embodiment, the sample data for the classification model may be diabetes treatment plans and case data. Diabetes is prevented and treated by maintaining a balanced diet, exercising regularly, quitting smoking, and maintaining an ideal body weight. Patients with type 1 diabetes must inject insulin to control blood glucose, whereas patients with type 2 diabetes can control blood glucose with oral medication, combined with insulin injections when necessary. Some oral diabetes medications, as well as insulin, may cause hypoglycemia. For patients with type 2 diabetes who also have obesity, bariatric surgery is an effective treatment. For patients with gestational diabetes, blood glucose usually returns to normal after delivery. Characteristic parameters describing the use of insulin and other drugs, such as dose, frequency, and timing, can serve as training samples for the classification model.
Specifically, in one embodiment, the natural semantic capture model is a model built on natural semantic processing for capturing specific fields. For example, fields such as "glucose" and "insulin" can be captured from the obtained raw data by the natural semantic capture model.
Specifically, in one embodiment, the semi-supervised learning model is trained on a large amount of unlabeled data combined with some labeled data, which addresses the problem that labeled data is scarce and scattered. The semi-supervised model is built on semi-supervised learning, which lies between supervised learning and unsupervised learning. A natural question is whether, given the same amount of labeled data, a large amount of readily available unlabeled, unprocessed data can be used to build a more accurate classifier; this problem is generally categorized as semi-supervised learning. The reason for using semi-supervised learning is that, in practice, labeling data costs time, labor, and even money, whereas unlabeled data is plentiful and easy to obtain. How to make use of unlabeled data is therefore an important problem. For example, the raw data obtained in the previous step contains many original documents that would otherwise have to be processed manually to be classified correctly, which is time-consuming and laborious; in such cases it is valuable if unlabeled data can improve the performance of the model.
Further, in one embodiment, the semi-supervised learning model may be a model based on self-training, generative models, semi-supervised support vector machines (SVMs), graph-based methods, multi-view learning, and so on.
Further, in one embodiment, building the semi-supervised learning model may include the following steps:
Randomly initialize multiple semi-supervised classifiers for a small amount of labeled data and a large amount of unlabeled data;
For each initial semi-supervised classifier, optimize its prediction results with an optimization method;
Divide the prediction results of the optimized semi-supervised classifiers into multiple clusters with a machine-learning clustering method;
For each cluster in the clustering result, output the semi-supervised classifier with the best objective value in that cluster;
Collect the semi-supervised classifiers output for the clusters to obtain multiple semi-supervised classifiers.
Specifically, in one embodiment, the small amount of labeled data may be case data on diabetic complications, including acute and chronic complications. For example, acute complications include hypoglycemia and hyperglycemia, and chronic complications include eye disease, kidney disease, neuropathy, cardiovascular disease, and foot disease.
Further, in one embodiment, the semi-supervised classifiers include generative semi-supervised classifiers, graph-based semi-supervised classifiers, disagreement-based semi-supervised classifiers, and support-vector-machine-based semi-supervised classifiers.
In one embodiment, the identity information capture model is mainly used to obtain, from the raw data, identity information of restricted-insulin users, such as age, height, and weight. For example, the rationale for using age to find non-compliant documents for restricted insulin use is that diabetes has developed from a disease of the elderly into one that children, young people, and the elderly can all contract. Although the disease is the same, its management differs across age groups: the duration of disease management, the time needed to reach treatment targets, physical condition, and blood glucose management standards all differ. Because of these differences, insulin use has different characteristics at different ages, so non-compliant documents for restricted insulin use can be identified according to age.
In one embodiment, the classification module 33 is a module built on the classification model. It is mainly used to analyze and categorize the input raw data, reclassify incorrectly diagnosed cases (including the severity of the patient's condition, etc.), and match the use of restricted insulin to the different cases; the classification model finally outputs the documents in which restricted insulin is misused.
In one embodiment, the natural semantic capture module 34 is a module built on the natural semantic model. It is mainly used to capture keywords such as "insulin" and "glucose" and, through analysis, to output documents containing two or more "insulin" usage records.
In one embodiment, the identity information capture module 35 is a module built on the identity information capture model. It is mainly used to capture the identity information of restricted-insulin users and to output documents in which restricted insulin is misused, according to the drug usage that corresponds to each user's situation.
In one embodiment, the semi-supervised learning module 36 is a module built on the semi-supervised learning model. It is mainly used to analyze the raw data, classify the unlabeled data, and output the documents in the raw data in which restricted insulin is misused, according to the characteristics of diabetic complications.
In one embodiment, the memory module 38 is mainly used to store the data, documents, etc. output by each module. The memory module 38 may be a readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and so on.
In one embodiment, the output module 38 is used to output the data output by each module and the data in the memory module 38. For example, the data output by each module and the data in the memory module 38 may be sent by e-mail to devices on a network, or sent to nearby devices via WiFi, Bluetooth, and the like.
Referring to Fig. 4, Fig. 4 is a flowchart of a first embodiment of the data processing method of the present invention. In other embodiments, the order of the steps in the flowchart shown in Fig. 4 may be changed and some steps may be omitted, as required.
S110, obtain the insurance institution's database and raw medical data from medical institutions at all levels.
Specifically, the database is the company's own database; the raw medical data mainly includes insurance policies and receipts; and the medical institutions at all levels include hospitals, medical centers, and the like in provinces, cities, districts, and other regions.
S120, build a classification model, a natural semantic capture model, a semi-supervised learning model, and an identity information capture model.
Specifically, in one embodiment, the classification model may be a decision tree classifier: given a set of attributes, a decision tree classifies data by making a series of decisions based on that attribute set. It may also be a selection tree classifier, which classifies data using a technique similar to that of a decision tree classifier; unlike a decision tree, a selection tree contains special selection nodes, each of which has multiple branches. In addition, classifiers such as artificial neural networks, case-based reasoning, nearest-neighbor methods, support vector machines, and random forests may also be used.
Specifically, in one embodiment, building the classification model can be divided into the following steps:
Select samples and divide all samples into two parts, training samples and test samples;
Run the classifier algorithm on the training samples to generate the classification model;
Run the classification model on the test samples to generate prediction results;
Calculate the necessary evaluation metrics from the prediction results to assess the performance of the classification model.
Specifically, in one embodiment, the sample data for the classification model may be diabetes treatment plans and case data. Prevention and treatment of diabetes include maintaining a balanced diet, exercising regularly, quitting smoking, and maintaining an ideal body weight. Type 1 diabetes requires insulin injections to control blood glucose, whereas patients with type 2 diabetes can control blood glucose with oral medication, combined with insulin injections if necessary. Some oral diabetes medications, as well as insulin, may cause hypoglycemia. For patients with type 2 diabetes who are also obese, bariatric surgery is an effective treatment. For patients with gestational diabetes, blood glucose usually returns to normal after delivery. Characteristic parameters of the use of insulin and other drugs, such as dosage, frequency, and timing, can serve as training samples for the classification model.
Specifically, in one embodiment, the natural semantic model is a model, built on natural semantic processing, for capturing specific fields. For example, the natural semantic model can capture fields such as "glucose" and "insulin" from the acquired raw data.
Specifically, in one embodiment, the semi-supervised learning model is trained on a large amount of unlabeled data combined with some labeled data, in order to address the problem of scarce and scattered data. The semi-supervised model is built with a semi-supervised learning method, which lies between supervised and unsupervised learning. A natural question is whether, for the same amount of labeled data, a large amount of readily available unlabeled, unprocessed data can be used to build a more accurate classifier; this problem is generally classed as semi-supervised learning. The reason for using semi-supervised learning is that, in practice, labeling data costs time, labor, and even money, whereas unlabeled data is plentiful and easy to obtain. How to exploit unlabeled data is therefore an important question. For example, the raw data obtained in the previous step contains many original documents that would otherwise have to be processed manually before they could be classified correctly, which is time-consuming and laborious; in such cases it is valuable if unlabeled data can indeed improve the efficiency of the model.
Further, in one embodiment, the semi-supervised learning model may be a model based on self-training, generative models, semi-supervised support vector machines (SVMs), graph-based methods, multi-view learning, and so on.
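As an illustration of the first option listed above, self-training can be sketched as follows. This is only a minimal sketch under stated assumptions, not the method of the invention: each sample is reduced to a single numeric feature, the base classifier is a nearest-centroid rule, and the names `centroids`, `predict`, `self_train`, and `min_margin`, as well as the sample values, are hypothetical.

```python
def centroids(labeled):
    """Mean feature value per class for 1-D labeled samples [(x, label), ...]."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Return (label, margin): the nearest centroid and its lead over the runner-up."""
    ranked = sorted(model, key=lambda label: abs(x - model[label]))
    best, second = ranked[0], ranked[1]  # assumes at least two classes
    margin = abs(x - model[second]) - abs(x - model[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=10.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit the centroids."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(labeled)
        scored = [(x, predict(model, x)) for x in pool]
        accepted = [(x, label) for x, (label, margin) in scored if margin >= min_margin]
        if not accepted:
            break  # nothing left that the current model is confident about
        labeled.extend(accepted)
        accepted_xs = {x for x, _ in accepted}
        pool = [x for x in pool if x not in accepted_xs]
    return centroids(labeled), pool


# Hypothetical data: two labeled samples and five unlabeled ones.
labeled = [(20.0, "low"), (80.0, "high")]
unlabeled = [15.0, 25.0, 48.0, 75.0, 90.0]
model, remaining = self_train(labeled, unlabeled)
```

Points that stay in `remaining` are those the model never became confident about; in practice they would be candidates for manual labeling.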
Further, in one embodiment, the semi-supervised classifiers include generative semi-supervised classifiers, graph-based semi-supervised classifiers, disagreement-based semi-supervised classifiers, and support-vector-machine-based semi-supervised classifiers.
Further, in one embodiment, building the semi-supervised learning model may include the following steps:
Randomly initialize multiple semi-supervised classifiers for a small amount of labeled data and a large amount of unlabeled data;
For each initial semi-supervised classifier, optimize its prediction results with an optimization method;
Divide the prediction results of the optimized semi-supervised classifiers into multiple clusters with a machine-learning clustering method;
For each cluster in the clustering result, output the semi-supervised classifier with the best objective value in that cluster;
Collect the semi-supervised classifiers output for the clusters to obtain multiple semi-supervised classifiers.
Specifically, the small amount of labeled data may be case data on diabetic complications, including acute and chronic complications. For example, acute complications include hypoglycemia and hyperglycemia, and chronic complications include eye disease, kidney disease, neuropathy, cardiovascular disease, and foot disease.
In one embodiment, the identity information capture model is mainly used to obtain, from the raw data, identity information of restricted-insulin users, such as age, height, and weight. Diabetes has developed from a disease of the elderly into one that children, young people, and the elderly can all contract. Although the disease is the same, its management differs across age groups: the duration of disease management, the time needed to reach treatment targets, physical condition, and blood glucose management standards all differ. Because of these differences, insulin use has different characteristics at different ages, so non-compliant documents for restricted insulin use can be identified according to age.
S130, input the raw data into the classification model, the natural semantic capture model, the semi-supervised learning model, and the identity information capture model; each model analyzes the input raw data and outputs non-compliant documents.
Specifically, in one embodiment, the classification model analyzes and categorizes the input raw data, reclassifies incorrectly diagnosed cases (including the severity of the patient's condition, etc.), and matches the use of restricted insulin to the different cases; the classification model finally outputs the documents in which restricted insulin is misused.
Specifically, in one embodiment, the natural semantic capture model captures keywords such as "insulin" and "glucose" and, through analysis, outputs documents containing two or more "insulin" usage records.
Specifically, in one embodiment, the semi-supervised learning model analyzes the raw data, classifies the unlabeled data, and outputs the documents in the raw data in which restricted insulin is misused, according to the characteristics of diabetic complications.
Specifically, in one embodiment, the identity information capture model captures the identity information of restricted-insulin users and outputs documents in which restricted insulin is misused, according to the drug usage that corresponds to each user's situation.
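The keyword rule used by the natural semantic capture model (output documents with two or more "insulin" usage records) can be illustrated with a short sketch. It assumes, purely for illustration, that each document is plain English text keyed by an identifier and that one keyword occurrence corresponds to one usage record; `count_keywords`, `flag_documents`, and the sample documents are hypothetical names and data, not part of the described method.

```python
import re

KEYWORDS = ("insulin", "glucose")


def count_keywords(text, keywords=KEYWORDS):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    return {
        kw: len(re.findall(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE))
        for kw in keywords
    }


def flag_documents(documents, keyword="insulin", min_count=2):
    """Return the ids of documents that mention `keyword` at least `min_count` times."""
    flagged = []
    for doc_id, text in documents.items():
        if count_keywords(text, (keyword,))[keyword] >= min_count:
            flagged.append(doc_id)
    return flagged


# Hypothetical documents, for illustration only.
documents = {
    "doc-1": "Insulin injection recorded. Glucose checked. Insulin injection recorded.",
    "doc-2": "Glucose test only.",
    "doc-3": "Insulin prescribed once.",
}
flagged = flag_documents(documents)
```

Word-boundary matching fits space-delimited text; for text without spaces between words, a segmentation step would be needed before counting.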
Fig. 5 is a flowchart of the method of building the classification model in Fig. 4. In other embodiments, the order of the steps in the flowchart shown in Fig. 5 may be changed and some steps may be omitted, as required.
S210, select samples and divide all samples into two parts, training samples and test samples.
S220, run the classifier algorithm on the training samples to generate the classification model.
S230, run the classification model on the test samples to generate prediction results.
S240, calculate the necessary evaluation metrics from the prediction results to assess the performance of the classification model.
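Steps S210 to S240 can be sketched as below. This is a minimal sketch under stated assumptions rather than the classifier of the invention: the classifier is a one-level decision tree (a single threshold on one numeric feature), accuracy stands in for the "necessary evaluation metrics", and the function names and sample values are hypothetical.

```python
import random
from collections import Counter


def split_samples(samples, test_ratio=0.25, seed=0):
    """S210: shuffle the samples and split them into training and test parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def train_stump(train):
    """S220: fit a one-level decision tree (one threshold on a single feature)."""
    best = None
    for threshold in sorted({x for x, _ in train}):
        left = [y for x, y in train if x <= threshold]
        right = [y for x, y in train if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def evaluate(model, test):
    """S230 and S240: predict on the test samples and compute accuracy."""
    predictions = [model(x) for x, _ in test]
    correct = sum(p == y for p, (_, y) in zip(predictions, test))
    return correct / len(test)


# Hypothetical samples: (dose in arbitrary units, label).
samples = [(d, "normal") for d in range(10, 50, 2)] + [(d, "misuse") for d in range(60, 100, 2)]
train, test = split_samples(samples)
model = train_stump(train)
accuracy = evaluate(model, test)
```

Other metrics, such as precision and recall for the misuse class, could be computed from the same predictions in the evaluation step.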
Specifically, in one embodiment, the training data for the classification model is diabetes treatment plans and case data. Prevention and treatment of diabetes include maintaining a balanced diet, exercising regularly, quitting smoking, and maintaining an ideal body weight. Type 1 diabetes requires insulin injections to control blood glucose, whereas patients with type 2 diabetes can control blood glucose with oral medication, combined with insulin injections if necessary. Some oral diabetes medications, as well as insulin, may cause hypoglycemia. For patients with type 2 diabetes who are also obese, bariatric surgery is an effective treatment. For patients with gestational diabetes, blood glucose usually returns to normal after delivery. Characteristic parameters of the use of insulin and other drugs, such as dosage, frequency, and timing, can serve as training samples for the classification model.
Fig. 6 is a flowchart of the method of building the semi-supervised learning model in Fig. 4. In other embodiments, the order of the steps in the flowchart shown in Fig. 6 may be changed and some steps may be omitted, as required.
S310, randomly initialize multiple semi-supervised classifiers for a small amount of labeled data and a large amount of unlabeled data.
S320, for each initial semi-supervised classifier, optimize its prediction results with an optimization method.
S330, divide the prediction results of the optimized semi-supervised classifiers into multiple clusters with a machine-learning clustering method.
S340, for each cluster in the clustering result, output the semi-supervised classifier with the best objective value in that cluster.
S350, collect the semi-supervised classifiers output for the clusters to obtain multiple semi-supervised classifiers.
Specifically, in one embodiment, the small amount of labeled data may be case data on diabetic complications, including acute and chronic complications. For example, acute complications include hypoglycemia and hyperglycemia, and chronic complications include eye disease, kidney disease, neuropathy, cardiovascular disease, and foot disease.
Specifically, in one embodiment, the semi-supervised classifiers include generative semi-supervised classifiers, graph-based semi-supervised classifiers, disagreement-based semi-supervised classifiers, and support-vector-machine-based semi-supervised classifiers.
Fig. 7 is a flowchart of a second embodiment of the data processing method of the present invention. In other embodiments, the order of the steps in the flowchart shown in Fig. 7 may be changed and some steps may be omitted, as required.
S410, obtain the insurance institution's database and raw medical data from medical institutions at all levels.
Specifically, the database is the company's own database; the raw medical data mainly includes insurance policies and receipts; and the medical institutions at all levels include hospitals, medical centers, and the like in provinces, cities, districts, and other regions.
S420, build a classification model, a natural semantic capture model, a semi-supervised learning model, and an identity information capture model.
Specifically, in one embodiment, the classification model may be a decision tree classifier: given a set of attributes, a decision tree classifies data by making a series of decisions based on that attribute set. It may also be a selection tree classifier, which classifies data using a technique similar to that of a decision tree classifier; unlike a decision tree, a selection tree contains special selection nodes, each of which has multiple branches. In addition, classifiers such as artificial neural networks, case-based reasoning, nearest-neighbor methods, support vector machines, and random forests may also be used.
Specifically, in one embodiment, building the classification model can be divided into the following steps:
Select samples and divide all samples into two parts, training samples and test samples;
Run the classifier algorithm on the training samples to generate the classification model;
Run the classification model on the test samples to generate prediction results;
Calculate the necessary evaluation metrics from the prediction results to assess the performance of the classification model.
Specifically, the natural semantic model is a model, built on natural semantic processing, for capturing specific fields. For example, the natural semantic model can capture fields such as "glucose" and "insulin" from the acquired raw data.
Specifically, the semi-supervised learning model is trained on a large amount of unlabeled data combined with some labeled data, in order to address the problem of scarce and scattered data. The semi-supervised model is built with a semi-supervised learning method, which lies between supervised and unsupervised learning. A natural question is whether, for the same amount of labeled data, a large amount of readily available unlabeled, unprocessed data can be used to build a more accurate classifier; this problem is generally classed as semi-supervised learning. The reason for using semi-supervised learning is that, in practice, labeling data costs time, labor, and even money, whereas unlabeled data is plentiful and easy to obtain. How to exploit unlabeled data is therefore an important question. For example, the raw data obtained in the previous step contains many original documents that would otherwise have to be processed manually before they could be classified correctly, which is time-consuming and laborious; in such cases it is valuable if unlabeled data can indeed improve the efficiency of the model.
Further, in one embodiment, the semi-supervised learning model may be a model based on self-training, generative models, semi-supervised support vector machines (SVMs), graph-based methods, multi-view learning, and so on.
Further, in one embodiment, the semi-supervised classifiers include generative semi-supervised classifiers, graph-based semi-supervised classifiers, disagreement-based semi-supervised classifiers, and support-vector-machine-based semi-supervised classifiers.
Further, in one embodiment, building the semi-supervised learning model may include the following steps:
Randomly initialize multiple semi-supervised classifiers for a small amount of labeled data and a large amount of unlabeled data;
For each initial semi-supervised classifier, optimize its prediction results with an optimization method;
Divide the prediction results of the optimized semi-supervised classifiers into multiple clusters with a machine-learning clustering method;
For each cluster in the clustering result, output the semi-supervised classifier with the best objective value in that cluster;
Collect the semi-supervised classifiers output for the clusters to obtain multiple semi-supervised classifiers.
Specifically, the small amount of labeled data may be case data on diabetic complications, including acute and chronic complications. For example, acute complications include hypoglycemia and hyperglycemia, and chronic complications include eye disease, kidney disease, neuropathy, cardiovascular disease, and foot disease.
In one embodiment, the identity information capture model is mainly used to obtain, from the raw data, identity information of restricted-insulin users, such as age, height, and weight.
Further, in one embodiment, diabetes has developed from a disease of the elderly into one that children, young people, and the elderly can all contract. Although the disease is the same, its management differs across age groups: the duration of disease management, the time needed to reach treatment targets, physical condition, and blood glucose management standards all differ. Because of these differences, insulin use has different characteristics at different ages, so non-compliant documents for restricted insulin use can be identified according to age.
S430, input the raw data into the classification model, analyze and categorize the input raw data, reclassify the cases that were not diagnosed correctly, and output the non-compliant documents identified.
S440, input the raw data into the natural semantic capture model, capture the keywords "insulin" and "glucose", obtain through analysis the documents that use "insulin" and "glucose" two or more times, and output the corresponding non-compliant documents.
S450, input the raw data into the semi-supervised learning model, classify the input unlabeled data by analyzing the raw data, and output the non-compliant documents in the raw data.
S460, input the raw data into the identity information capture model, capture the patients' identity information, and output non-compliant documents according to the drug usage corresponding to each patient.
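The way steps S430 to S460 feed the same raw data to several models can be sketched as follows. The sketch rests on assumptions that are not in the source: each model is represented by a function returning a set of flagged document ids, the per-model outputs are merged into one table, and the field names (`reclassified`, `insulin_mentions`) and stand-in rules are hypothetical. Only two stand-in models are shown; the semi-supervised and identity information models would plug in the same way.

```python
from typing import Any, Callable, Dict, List, Set

Document = Dict[str, Any]
Analyzer = Callable[[List[Document]], Set[str]]


def run_models(raw_data: List[Document], analyzers: Dict[str, Analyzer]) -> Dict[str, Set[str]]:
    """Feed the same raw data to every model and keep each model's flagged ids."""
    return {name: analyzer(raw_data) for name, analyzer in analyzers.items()}


def merge_flags(per_model: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each flagged document id to the sorted list of models that flagged it."""
    merged: Dict[str, List[str]] = {}
    for name, ids in per_model.items():
        for doc_id in ids:
            merged.setdefault(doc_id, []).append(name)
    return {doc_id: sorted(names) for doc_id, names in sorted(merged.items())}


# Hypothetical raw data with precomputed fields, for illustration only.
raw_data = [
    {"id": "A", "reclassified": True, "insulin_mentions": 3},
    {"id": "B", "reclassified": False, "insulin_mentions": 2},
    {"id": "C", "reclassified": False, "insulin_mentions": 0},
]

# Stand-in rules; real models would replace these functions.
analyzers: Dict[str, Analyzer] = {
    "classification": lambda docs: {d["id"] for d in docs if d["reclassified"]},
    "natural_semantic": lambda docs: {d["id"] for d in docs if d["insulin_mentions"] >= 2},
}

per_model = run_models(raw_data, analyzers)
merged = merge_flags(per_model)
```

Recording which models flagged each document makes it possible to see where the models agree, which relates to the stated aim of obtaining non-compliant documents more comprehensively by using different models.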
Compared with the prior art, the data processing method, data processing device, and computer-readable storage medium of the present invention can quickly extract, in batches and for different regions, non-compliant documents for controlled drugs, greatly reducing labor costs and greatly improving efficiency. First, the insurance institution's database and raw medical data from medical institutions at all levels are obtained; second, a classification model, a natural semantic capture model, a semi-supervised learning model, and an identity information capture model are built; finally, the raw data is input into the classification model, the natural semantic capture model, the semi-supervised learning model, and the identity information capture model, and each model analyzes the input raw data and outputs non-compliant documents. Analyzing the raw data with different models also yields non-compliant documents more comprehensively and accurately.
The numbering of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element preceded by "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
From the description of the embodiments above, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The preferred embodiments of the present invention have been described above with reference to the drawings; they do not thereby limit the scope of the claims of the present invention. The numbering of the above embodiments is for description only and does not indicate the relative merits of the embodiments. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
Those skilled in the art can implement the present invention in many variants without departing from its scope and essence; for example, a feature of one embodiment can be used in another embodiment to obtain yet another embodiment. Any modification, equivalent replacement, or improvement made within the technical concept of the present invention shall fall within the scope of the claims of the present invention.
Claims (10)
- A kind of 1. method of data processing, it is characterised in that the method comprising the steps of:The medical initial data of insurance institution's database and medical institutions at different levels is obtained, establishes disaggregated model, naturally semantic crawl Model, semi-supervised learning model and identity information crawl model;AndThe initial data is inputted into the disaggregated model, naturally semantic crawl model, semi-supervised learning model and identity information Model is captured, each model carries out analysis output violation document to the initial data of input;Wherein, the disaggregated model is classified to the medical initial data, and to the mistaken diagnosis in the medical initial data Case correct so that the missed case is correctly classified.
- 2. the method for data processing as claimed in claim 1, it is characterised in that the classification initial data is protected including medical insurance Single, charge document, reimbursement document.
- 3. the method for data processing as claimed in claim 1, it is characterised in that the disaggregated model includes decision tree classification One of device, selection Tree Classifier.
- 4. the method for data processing as claimed in claim 3, it is characterised in that the foundation of the disaggregated model includes following step Suddenly:All samples are divided into training sample and test sample two parts by selected sample;Classifier algorithm is performed on the training sample, generates the disaggregated model;The disaggregated model is performed in the test sample, generates prediction result;AndAccording to the prediction result, necessary evaluation index is calculated, assesses the performance of the disaggregated model.
- 5. the method for data processing as claimed in claim 1, it is characterised in that the natural semantic model is to be based on nature language The model for being used to capture specific fields that justice is handled and established, the identity information crawl model are mainly used in obtaining initial data The identity information of middle medicine user.
- 6. the method for data processing as claimed in claim 1, it is characterised in that the foundation of the semi-supervised learning model includes Following steps:For having flag data and a large amount of Unlabeled datas, the multiple semi-supervised classifiers of random initializtion on a small quantity;For each initial semi-supervised classifier, the prediction result of semi-supervised classifier is optimized by optimization method;The prediction result of the semi-supervised classifier optimized is divided into multiple clusters by the clustering method of machine learning;For each cluster of cluster result, the output wherein optimal semi-supervised classifier of desired value;The semi-supervised classifier of each cluster output is collected, obtains multiple semi-supervised classifiers.
- 7. the method for data processing as claimed in claim 6, it is characterised in that the semi-supervised classifier is included based on generation The semi-supervised classifier of formula, the semi-supervised classifier based on figure, the semi-supervised classifier based on inconsistency and based on support to One of semi-supervised classifier of amount machine.
- 8. the method for data processing as claimed in claim 1, it is characterised in that methods described also includes step:The disaggregated model is analyzed the initial data of input, sorted out, and the case not diagnosed correctly is reclassified;Naturally the semantic crawl model crawl keyword " insulin " and " glucose ", by analyze obtain twice and twice with The upper document for using " insulin " and " glucose ";The semi-supervised learning model is classified to the Unlabeled data of input, incited somebody to action by the analysis to the initial data Violation document in initial data is exported;AndThe identity information of the identity information crawl model crawl patient, according to medicine service condition corresponding to different patients to disobeying Rule document is exported.
- 9. A data processing device, characterized in that the data processing device comprises a memory and a processor, the memory storing a computer program that can run on the processor, wherein the computer program, when executed by the processor, implements the steps of the data processing method as claimed in any one of claims 1 to 8.
- 10. A computer-readable recording medium, characterized in that a computer program is stored on the computer-readable recording medium, wherein the computer program, when executed by a processor, implements the steps of the data processing method as claimed in any one of claims 1 to 8.
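The model-building steps recited in claim 4 (split the samples, run a classifier algorithm on the training part, predict on the test part, compute evaluation indices) can be illustrated with a minimal sketch. The claims do not name a classifier or specific indices, so the nearest-centroid classifier, the 25% test ratio, the function names, and the choice of accuracy, precision, and recall below are all assumptions made for illustration, not the patented implementation.

```python
import random
from collections import defaultdict


def split_samples(samples, test_ratio=0.25, seed=0):
    """Shuffle (features, label) samples and split them into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def train_centroids(train):
    """Hypothetical classifier algorithm: store one mean feature vector per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Toy, made-up data: label 1 marks a suspicious document, label 0 a normal one.
    data = [([0.0, 0.1 * i], 0) for i in range(10)]
    data += [([10.0, 0.1 * i], 1) for i in range(10)]
    train_part, test_part = split_samples(data)
    model = train_centroids(train_part)
    predictions = [predict(model, features) for features, _ in test_part]
    print(evaluate([label for _, label in test_part], predictions))
```

Any classifier with separate fit and predict steps could replace the centroid model here; the point of the sketch is only the order of the four steps.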
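The keyword step of claim 8, in which documents that use "insulin" and "glucose" twice or more are picked out, can be sketched as a simple counting filter. The claim does not say whether the threshold applies to each keyword separately or to both combined; the sketch below assumes each keyword must appear at least twice. The function names, the document dictionary, and the whole-word, case-insensitive matching are likewise illustrative assumptions.

```python
import re

# Keywords named in claim 8.
KEYWORDS = ("insulin", "glucose")


def keyword_counts(text, keywords=KEYWORDS):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    lowered = text.lower()
    return {
        kw: len(re.findall(r"\b" + re.escape(kw) + r"\b", lowered))
        for kw in keywords
    }


def flag_documents(documents, keywords=KEYWORDS, min_count=2):
    """Return ids of documents in which every keyword occurs at least min_count times.

    Assumption: the "twice or more" threshold applies to each keyword separately.
    """
    flagged = []
    for doc_id, text in documents.items():
        counts = keyword_counts(text, keywords)
        if all(counts[kw] >= min_count for kw in keywords):
            flagged.append(doc_id)
    return flagged


if __name__ == "__main__":
    # Made-up example documents keyed by a hypothetical document id.
    docs = {
        "a": "Insulin 10u; glucose test. insulin refill, glucose strips",
        "b": "insulin once, glucose once",
        "c": "glucose glucose glucose",
    }
    print(flag_documents(docs))
```

A production system would likely match the source-language terms and drug synonyms rather than two English strings; the sketch only shows the thresholding logic.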
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710951683.2A CN107895168A (en) | 2017-10-13 | 2017-10-13 | The method of data processing, the device of data processing and computer-readable recording medium |
PCT/CN2018/089705 WO2019071965A1 (en) | 2017-10-13 | 2018-06-03 | Data processing method, data processing device, and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710951683.2A CN107895168A (en) | 2017-10-13 | 2017-10-13 | The method of data processing, the device of data processing and computer-readable recording medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107895168A true CN107895168A (en) | 2018-04-10 |
Family
ID=61802728
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710951683.2A Pending CN107895168A (en) | 2017-10-13 | 2017-10-13 | The method of data processing, the device of data processing and computer-readable recording medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107895168A (en) |
WO (1) | WO2019071965A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112308552B (en) * | 2020-06-29 | 2024-08-16 | 北京京东拓先科技有限公司 | Method and device for ordering medical insurance medicine |
CN113643818B (en) * | 2021-09-16 | 2023-11-24 | 上海德衡数据科技有限公司 | Method and system for integrating medical data based on regional data |
CN117637093B (en) * | 2024-01-25 | 2024-04-12 | 西南医科大学附属医院 | Patient information management method and system based on intelligent medical treatment |
CN118094105B (en) * | 2024-03-25 | 2024-10-15 | 广州探域科技有限公司 | Model increment fine tuning method, system, equipment and medium based on dynamic information |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1194045A (en) * | 1995-07-25 | 1998-09-23 | 好乐思治疗公司 | Computer assisted methods for diagnosing diseases |
US20080086327A1 (en) * | 2006-10-06 | 2008-04-10 | Qmed, Inc. | System and method for determining and verifying disease classification codes |
CN102341821A (en) * | 2009-03-02 | 2012-02-01 | 康菲丹特夏威夷有限公司 | Medical system and method for serving users with a chronic disease or health state |
CN103314386A (en) * | 2010-10-29 | 2013-09-18 | 爱克斯欧德斯支付系统有限公司 | Method and system for processing transactions using a token |
CN103390171A (en) * | 2013-07-24 | 2013-11-13 | 南京大学 | Safe semi-supervised learning method |
CN106778014A (en) * | 2016-12-29 | 2017-05-31 | 浙江大学 | A kind of risk Forecasting Methodology based on Recognition with Recurrent Neural Network |
CN106934220A (en) * | 2017-02-24 | 2017-07-07 | 黑龙江特士信息技术有限公司 | Towards the disease class entity recognition method and device of multi-data source |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104134127A (en) * | 2014-08-08 | 2014-11-05 | 平安养老保险股份有限公司 | Medical service provider performance management system and method |
CN106295531A (en) * | 2016-08-01 | 2017-01-04 | 乐视控股(北京)有限公司 | A kind of gesture identification method and device and virtual reality terminal |
CN107895168A (en) * | 2017-10-13 | 2018-04-10 | 平安科技(深圳)有限公司 | The method of data processing, the device of data processing and computer-readable recording medium |
2017
- 2017-10-13 CN CN201710951683.2A patent/CN107895168A/en active Pending
2018
- 2018-06-03 WO PCT/CN2018/089705 patent/WO2019071965A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
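Zhou Ruyi: "Research on an intelligent medical insurance audit system based on BP neural networks and association rules", China Master's Theses Full-text Database, Information Science and Technology * |
Chen Xiaolin: "Research on machine learning based on dynamic cost sensitivity", China Doctoral Dissertations Full-text Database, Information Science and Technology * |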
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019071965A1 (en) * | 2017-10-13 | 2019-04-18 | 平安科技(深圳)有限公司 | Data processing method, data processing device, and computer readable storage medium |
CN109145296A (en) * | 2018-08-09 | 2019-01-04 | 新华智云科技有限公司 | A kind of general word recognition method and device based on monitor model |
CN109492095A (en) * | 2018-10-16 | 2019-03-19 | 平安健康保险股份有限公司 | Claims Resolution data processing method, device, computer equipment and storage medium |
CN109542901A (en) * | 2018-11-12 | 2019-03-29 | 北京懿医云科技有限公司 | Data processing method, device, computer readable storage medium and electronic equipment |
CN109583510A (en) * | 2018-12-13 | 2019-04-05 | 平安医疗健康管理股份有限公司 | Disease violation medication detection method, device, equipment and computer storage medium |
CN109636632A (en) * | 2018-12-13 | 2019-04-16 | 平安医疗健康管理股份有限公司 | Settlement of insurance claim method, apparatus, equipment and storage medium based on machine learning |
CN110119991A (en) * | 2019-04-12 | 2019-08-13 | 深圳壹账通智能科技有限公司 | Checking method, device and storage medium are compensated in medical treatment based on machine learning |
CN110674373A (en) * | 2019-09-17 | 2020-01-10 | 上海森亿医疗科技有限公司 | Big data processing method, device, equipment and storage medium based on sensitive data |
CN111325247A (en) * | 2020-02-10 | 2020-06-23 | 山东浪潮通软信息科技有限公司 | Intelligent auditing realization method based on least square support vector machine |
CN111325247B (en) * | 2020-02-10 | 2022-08-02 | 浪潮通用软件有限公司 | Intelligent auditing realization method based on least square support vector machine |
CN111477321A (en) * | 2020-03-11 | 2020-07-31 | 北京大学第三医院(北京大学第三临床医学院) | Treatment effect prediction system with self-learning capability and treatment effect prediction terminal |
CN111477321B (en) * | 2020-03-11 | 2023-06-09 | 北京大学第三医院(北京大学第三临床医学院) | Treatment effect prediction system with self-learning capability and treatment effect prediction terminal |
CN111855771A (en) * | 2020-07-20 | 2020-10-30 | 燕山大学 | Electrochemical analysis method for simultaneous detection of glucose and insulin |
CN111855771B (en) * | 2020-07-20 | 2021-08-31 | 燕山大学 | Electrochemical analysis method for simultaneous detection of glucose and insulin |
Also Published As
Publication number | Publication date |
---|---|
WO2019071965A1 (en) | 2019-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107895168A (en) | The method of data processing, the device of data processing and computer-readable recording medium | |
Wang et al. | A novelty detection patent mining approach for analyzing technological opportunities | |
WO2021068601A1 (en) | Medical record detection method and apparatus, device and storage medium | |
CN109243618B (en) | Medical model construction method, disease label construction method and intelligent device | |
Akbari et al. | From tweets to wellness: Wellness event detection from twitter streams | |
Malik et al. | Design and evaluation of a hybrid technique for detecting sunflower leaf disease using deep learning approach | |
CN111785384B (en) | Abnormal data identification method based on artificial intelligence and related equipment | |
CN111145910A (en) | Abnormal case identification method and device based on artificial intelligence and computer equipment | |
CN106845147A (en) | Medical practice summarizes method for building up, device and the data assessment method of model | |
WO2021151295A1 (en) | Method, apparatus, computer device, and medium for determining patient treatment plan | |
CN109615012A (en) | Medical data exception recognition methods, equipment and storage medium based on machine learning | |
CN110210194A (en) | Electronic contract display methods, device, electronic equipment and storage medium | |
US20240005211A1 (en) | Data processing method and apparatus | |
Dong et al. | Cervical cell classification based on the CART feature selection algorithm | |
KR20200063841A (en) | Method for standardizing recognized term from document image | |
CN110060750A (en) | Medical data method for pushing, system, computer equipment and readable storage medium storing program for executing | |
CN115050442B (en) | Disease category data reporting method and device based on mining clustering algorithm and storage medium | |
CN108717862A (en) | A kind of careful square evolution model of the intelligence based on machine learning | |
CN109635113A (en) | Abnormal insured people purchases medicine data detection method, device, equipment and storage medium | |
CN117557331A (en) | Product recommendation method and device, computer equipment and storage medium | |
CN113724878B (en) | Medical risk information pushing method and device based on machine learning | |
Himel et al. | Vision intelligence for smart sheep farming: applying ensemble learning to detect sheep breeds | |
CN113657550A (en) | Patient marking method, device, equipment and storage medium based on hierarchical calculation | |
CN108022635A (en) | Violation document methods of marking, violation document scoring apparatus and computer-readable recording medium | |
Prottasha et al. | Impact learning: A learning method from feature’s impact and competition | ||
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180410 |