CN103473375A - Data cleaning method and data cleaning system - Google Patents

Data cleaning method and data cleaning system

Info

Publication number
CN103473375A
Authority
CN
China
Prior art keywords
data
field
value
record
numeric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013104563951A
Other languages
Chinese (zh)
Inventor
李登高
陈卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Medical Information Technology Co Ltd Of Beijing University
Original Assignee
Founder International Co Ltd
Founder International Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Founder International Co Ltd and Founder International Beijing Co Ltd
Priority to CN2013104563951A
Publication of CN103473375A
Legal status: Pending

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a data cleaning method and a data cleaning system. The data cleaning system comprises a processor. The data cleaning method comprises cleaning domain data from different systems according to a preset rule and adjusting the preset rule according to the cleaning result. With this technical scheme, domain data from different data sources is cleaned so that it meets the requirements of the master database of the data cleaning system, laying a foundation for the subsequent classification, comparison, and merging of data to recognize different data that represent the same object.

Description

Data cleaning system and data cleaning method
Technical field
The present invention relates to the computer field, and in particular to a data cleaning system and a data cleaning method.
Background
Medical information systems in China currently coexist in many forms and are being gradually improved, with the ultimate goal of making medical information available across society. Within the medical system, the individual systems are independent of one another, for example the outpatient and emergency system, the inpatient system, the physical examination system, and the imaging center. Some systems place low requirements on patient information, so entries are incomplete. The business systems follow inconsistent standards, use inconsistent business fields, or express the same information differently, so patient information is not associated and each system's information stands alone. Only some fields of the patient data are valid, so a patient cannot be uniquely identified and identifiers are missing. Platforms and data standards are inconsistent between systems, which impedes interaction.
Data cleaning is the first stage of data comparison and trusted data merging, is key to forming a unified master patient index, and is at the core of master patient index processing. A data cleaning scheme is therefore needed that can unify data in different formats and with different expressions into data meeting a predetermined format, standardizing the data, facilitating subsequent data processing, and making communication between systems smoother.
Summary of the invention
In view of the above problems, the present invention proposes a data cleaning scheme that can unify data in different formats and with different expressions into data meeting a predetermined format, standardizing the data, facilitating subsequent data processing, and making communication between systems smoother.
Accordingly, one aspect of the present invention provides a data cleaning system comprising: a processor that cleans domain data from different systems according to preset rules and adjusts the preset rules according to the cleaning result.
The format of domain data differs between mirror servers: the fields may differ, the way values are expressed may differ, or the field values may be wrong. The data cleaning system can identify invalid or non-conforming data and clean the domain data from the different systems, standardizing the data and facilitating subsequent association calculation.
In the above technical solution, preferably, the processor reads the field values of each record in the domain data and, for field values that do not meet a preset condition, replaces them with a preset value or a null value, or deletes them.
Because each data record contains one or more fields, cleaning may include field-level cleaning. For example, different systems represent a time field in different ways; these different representations need to be made uniform by converting them to the time-field format of the master database of the data cleaning system.
In the above technical solution, preferably, the processor is further configured to read the records in the domain data one by one, to match a field value for a field lacking one according to the associations between the fields in the record, and to fill that field value into the corresponding field.
Although a data record contains multiple fields, some fields may be missing key values. The associations between the fields in the record can be identified and used to derive the value of the field that lacks one, thereby completing the matching and filling of the record.
In the above technical solution, preferably, the processor may comprise: a computing unit configured to calculate the total weight of each record in the domain data from the weights of the fields in that record, and to delete records whose total weight is less than or equal to a threshold. For example, the ID card field is given the largest weight, 50%, the name field a weight of 20%, and the address field the smallest weight, 5%, and the threshold is set to 5%. If a record has a value only in the address field, its total weight is 5%, which is less than or equal to the threshold, so the record is deleted.
In the above technical solution, preferably, the processor is further configured to identify a plurality of domain data belonging to the same domain, to compare the plurality of domain data under that domain, and, when a differing field is found, to correct the differing field according to the intra-domain data relationship of that domain.
Sometimes, even when every field in a data record is filled in and correctly expressed, the corresponding field values differ between systems. To determine which value is correct, the fields whose values differ are corrected according to the intra-domain data relationship of the same domain. For example, if the systems under the same domain include a public security system, a market system, and a hospital system, then when data records differ, the public security system's records can be taken as authoritative and used to correct the domain data of the other systems, so as to keep the data as accurate as possible.
According to another aspect of the present invention, a data cleaning method is also provided, comprising: cleaning domain data from different systems according to preset rules, and adjusting the preset rules according to the cleaning result.
The format of domain data differs between mirror servers: the fields may differ, the way values are expressed may differ, or the field values may be wrong. The data cleaning system can identify invalid or non-conforming data and clean the domain data from the different mirror servers, standardizing the data and facilitating subsequent association calculation.
In the above technical solution, preferably, the step of cleaning the domain data comprises: reading the field values of each record in the domain data, and replacing field values that do not meet a preset condition with a preset value or a null value.
Because each data record contains one or more fields, cleaning may include field-level cleaning. For example, different systems represent a time field in different ways; these different representations need to be made uniform by converting them to the time-field format of the master database of the data cleaning system.
In the above technical solution, preferably, the step of cleaning the domain data further comprises: reading the records in the domain data one by one, matching a field value for a field lacking one according to the associations between the fields in the record, and filling that field value into the corresponding field.
Although a data record contains multiple fields, some fields may be missing key values. The associations between the fields in the record can be identified and used to derive the value of the field that lacks one, thereby completing the matching and filling of the record.
In the above technical solution, preferably, the step of cleaning the domain data may further comprise: calculating the total weight of each record in the domain data from the weights of the fields in that record, and deleting records whose total weight is less than or equal to a threshold.
In the above technical solution, preferably, the step of cleaning the domain data further comprises: identifying a plurality of domain data belonging to the same domain and comparing the plurality of domain data under that domain; and, when a differing field is found, correcting the differing field according to the intra-domain data relationship of that domain.
Sometimes, even when every field in a data record is filled in and correctly expressed, the corresponding field values differ between systems. To determine which value is correct, the fields whose values differ are corrected according to the intra-domain data relationship of the same domain. For example, if the systems under the same domain include a public security system, a market system, and a hospital system, then when data records differ, the public security system's records can be taken as authoritative and used to correct the domain data of the other systems, so as to keep the data as accurate as possible.
Because data formats differ greatly between systems, data from different systems must be cleaned to conform to a standardized format in preparation for association calculation, so that communication between systems is smoother. Data cleaning comprises field-level cleaning, record-level cleaning, and system-level cleaning. Field-level cleaning mainly replaces field values that are non-conforming or invalid and converts data formats that do not meet the master database standard. Record-level cleaning mainly matches and fills in incorrect fields and field values according to the associations between the fields in a record. System-level cleaning compares the master data under the same domain and corrects redundant or incorrect fields according to the intra-domain data relationship. This completes the cleaning process, and the cleaned data is both in the standard format and more accurate.
Brief description of the drawings
Fig. 1 shows a block diagram of a data cleaning system according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the data cleaning principle according to an embodiment of the present invention;
Fig. 3 shows a flowchart of a data cleaning method according to an embodiment of the present invention;
Fig. 4 shows a flowchart of a data cleaning method according to another embodiment of the present invention.
Detailed description
For a clearer understanding of the above objects, features, and advantages of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. Note that, where they do not conflict, the embodiments of this application and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the invention can also be implemented in ways other than those described here, and its scope of protection is therefore not limited by the specific embodiments disclosed below.
Fig. 1 shows a block diagram of a data cleaning system according to an embodiment of the present invention.
As shown in Fig. 1, a data cleaning system 100 according to an embodiment of the present invention comprises: a processor 102 that cleans domain data from different systems according to preset rules and adjusts the preset rules according to the cleaning result.
The format of domain data differs between mirror servers: the fields may differ, the way values are expressed may differ, or the field values may be wrong. The data cleaning system can identify invalid or non-conforming data and clean the domain data from the different mirror servers, standardizing the data and facilitating subsequent association calculation.
In the above technical solution, preferably, the processor 102 reads the field values of the domain data and, for field values that do not meet a preset condition, replaces them with a preset value or a null value, or deletes them.
Because each data record contains one or more fields, cleaning may include field-level cleaning. For example, different systems represent a time field in different ways; these different representations need to be made uniform by converting them to the time-field format of the master database of the data cleaning system.
In the above technical solution, preferably, the processor 102 is further configured to read the records in the domain data one by one, to match a field value for a field lacking one according to the associations between the fields in the record, and to fill that field value into the corresponding field.
Although a data record contains multiple fields, some fields may be missing key values. The associations between the fields in the record can be identified and used to derive the value of the field that lacks one, thereby completing the matching and filling of the record.
In the above technical solution, preferably, the processor 102 may comprise: a computing unit 1022 configured to calculate the total weight of each record in the domain data from the weights of the fields in that record, and to delete records whose total weight is less than or equal to a threshold. For example, the ID card field is given the largest weight, 50%, the name field a weight of 20%, and the address field the smallest weight, 5%, and the threshold is set to 5%. If a record has a value only in the address field, its total weight is 5%, which is less than or equal to the threshold, so the record is deleted.
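The weighting example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of computing unit 1022: the field names (`id_card`, `name`, `address`), the function names, and the rule that a field counts only when its value is non-empty are all hypothetical; only the three percentages and the 5% threshold come from the text.

```python
# Weights in percent, taken from the example in the text.
# The field names themselves are hypothetical.
FIELD_WEIGHTS = {"id_card": 50, "name": 20, "address": 5}
THRESHOLD = 5  # percent


def total_weight(record, weights=FIELD_WEIGHTS):
    """Sum the weights of the fields that carry a non-empty value."""
    return sum(
        weight
        for field, weight in weights.items()
        if record.get(field) not in (None, "")
    )


def drop_low_weight(records, threshold=THRESHOLD):
    """Delete records whose total weight is less than or equal to the threshold."""
    return [r for r in records if total_weight(r) > threshold]
```

With these values, a record that contains only an address scores 5, which does not exceed the threshold, so `drop_low_weight` removes it, while a record with a name and an address scores 25 and is kept.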
In the above technical solution, preferably, the processor 102 is further configured to identify a plurality of domain data belonging to the same domain, to compare the plurality of domain data under that domain, and, when a differing field is found, to correct the differing field according to the intra-domain data relationship of that domain.
Sometimes, even when every field in a data record is filled in and correctly expressed, the corresponding field values differ between systems. To determine which value is correct, the fields whose values differ are corrected according to the intra-domain data relationship of the same domain. For example, if the systems under the same domain include a public security system, a market system, and a hospital system, then when data records differ, the public security system's records can be taken as authoritative and used to correct the domain data of the other systems, so as to keep the data as accurate as possible.
Fig. 2 shows a schematic diagram of the data cleaning principle according to an embodiment of the present invention.
As shown in Fig. 2, the data cleaning system 100 receives domain data from different mirror servers 202. It should be understood that although the figure denotes the mirror servers with the same reference numeral, they are in fact different mirror servers, for example an outpatient system, a physical examination system, and an inpatient system. The data cleaning system 100 cleans the domain data according to the format of the master database of its own server to standardize the data and obtain standardized data. The cleaning can be divided into field-level cleaning, record-level cleaning, and system-level cleaning.
Fig. 3 shows a flowchart of a data cleaning method according to an embodiment of the present invention.
As shown in Fig. 3, a data cleaning method according to an embodiment of the present invention may comprise the following step: step 302, cleaning domain data from different systems according to preset rules, and adjusting the preset rules according to the cleaning result.
The format of domain data differs between mirror servers: the fields may differ, the way values are expressed may differ, or the field values may be wrong. The data cleaning system can identify invalid or non-conforming data and clean the domain data from the different mirror servers, standardizing the data and facilitating subsequent association calculation.
In the above technical solution, preferably, the step of cleaning the domain data comprises: reading the field values of each record in the domain data, and replacing field values that do not meet a preset condition with a preset value or a null value.
Because each data record contains one or more fields, cleaning may include field-level cleaning. For example, different systems represent a time field in different ways; these different representations need to be made uniform by converting them to the time-field format of the master database of the data cleaning system.
In the above technical solution, preferably, the step of cleaning the domain data further comprises: reading the records in the domain data one by one, matching a field value for a field lacking one according to the associations between the fields in the record, and filling that field value into the corresponding field.
Although a data record contains multiple fields, some fields may be missing key values. The associations between the fields in the record can be identified and used to derive the value of the field that lacks one, thereby completing the matching and filling of the record.
In the above technical solution, preferably, the step of cleaning the domain data further comprises: calculating the total weight of each record in the domain data from the weights of the fields in that record, and deleting records whose total weight is less than or equal to a threshold.
In the above technical solution, preferably, the step of cleaning the domain data further comprises: identifying a plurality of domain data belonging to the same domain and comparing the plurality of domain data under that domain; and, when a differing field is found, correcting the differing field according to the intra-domain data relationship of that domain.
Sometimes, even when every field in a data record is filled in and correctly expressed, the corresponding field values differ between systems. To determine which value is correct, the fields whose values differ are corrected according to the intra-domain data relationship of the same domain. For example, if the systems under the same domain include a public security system, a market system, and a hospital system, then when data records differ, the public security system's records can be taken as authoritative and used to correct the domain data of the other systems, so as to keep the data as accurate as possible.
Fig. 4 shows a flowchart of a data cleaning method according to another embodiment of the present invention.
As shown in Fig. 4, in step 402, domain data is received from different mirror servers. In step 404, the field values of the domain data are read, and field values that do not meet a preset condition are replaced with a preset value or a null value. The data cleaning system is provided with a rule base in which corresponding rules are set for different data situations. For example, if the master database of the data cleaning system expresses time as 2012-12-12 while other domain data expresses it as "December 12, 2012" or 2012.12.12, time data in those formats can be unified to 2012-12-12. As another example, if the master database has no spaces between the characters of a name but names in the domain data contain spaces, the spaces can be deleted so that the names match the master database format. As a further example, if the master database uses 1 for male and 2 for female while other domain databases use M for male and W for female, the gender format of the domain data must likewise be unified to the gender format of the master database. This cleaning process is called field-level cleaning.
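The three field-level rules described for step 404 might be sketched as below. This is only an illustrative sketch: the function names, the record keys (`name`, `gender`, `visit_time`), and the set of accepted source date layouts are assumptions made for the example, and the written-out date form is assumed to be the Chinese 年/月/日 notation. Values that cannot be normalized are replaced with `None`, standing in for the null value mentioned in the text.

```python
import datetime
import re

# Gender codes from the text: the master database uses 1 (male) and 2 (female),
# while other databases use M and W.
GENDER_MAP = {"M": "1", "W": "2", "1": "1", "2": "2"}

# Assumed source layouts: 2012-12-12, 2012.12.12, 2012/12/12 and 2012年12月12日.
_DATE_PATTERNS = [
    re.compile(r"^(\d{4})[-./](\d{1,2})[-./](\d{1,2})$"),
    re.compile(r"^(\d{4})年(\d{1,2})月(\d{1,2})日$"),
]


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed."""
    text = value.strip()
    for pattern in _DATE_PATTERNS:
        match = pattern.match(text)
        if match:
            year, month, day = (int(g) for g in match.groups())
            try:
                return datetime.date(year, month, day).isoformat()
            except ValueError:  # impossible date, e.g. month 13
                return None
    return None


def normalize_name(value):
    """Remove all whitespace inside a name."""
    return "".join(value.split())


def normalize_gender(value):
    """Map a source gender code to the master code, or None if unknown."""
    return GENDER_MAP.get(value.strip().upper())


def clean_fields(record):
    """Apply the field-level rules to a copy of the record."""
    cleaned = dict(record)
    if cleaned.get("name"):
        cleaned["name"] = normalize_name(cleaned["name"])
    if cleaned.get("gender"):
        cleaned["gender"] = normalize_gender(cleaned["gender"])
    if cleaned.get("visit_time"):
        cleaned["visit_time"] = normalize_date(cleaned["visit_time"])
    return cleaned
```

Keeping each rule as a small function mirrors the idea of a rule base: a rule can be added or adjusted without touching the others.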
In step 406, determine whether any field value of the domain data is missing. If so, proceed to step 408; otherwise, proceed to step 410.
In step 408, if the value of a field in the data is missing, the associations between fields are identified, the corresponding field value is derived, and the field is filled in.
For example, suppose the domain data includes a birth address field and an ID card field, and the association between them is that the first few digits of the ID card number correspond to the birth address. If the digits of the ID card number that relate to the birth address are missing, the corresponding code can be looked up from the value of the birth address field and filled into the missing digits of the ID card number. This makes the identity record more complete, more useful, and more accurate. This process is called record-level cleaning.
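A minimal sketch of this record-level fill is shown below. Several things are assumptions made for illustration and are not stated in the text: the lookup table contents, the field names, the use of an 18-character ID number whose first six characters are the region code, and the convention that an ID missing its prefix is stored as its remaining 12 characters.

```python
# Hypothetical lookup from birth address to a six-digit region code.
# The single entry is illustrative only.
REGION_CODES = {"Beijing Dongcheng": "110101"}

FULL_ID_LENGTH = 18   # assumed length of a complete ID number
PREFIX_LENGTH = 6     # assumed length of the address-related prefix


def fill_id_prefix(record, region_codes=REGION_CODES):
    """Fill in a missing ID prefix from the birth address when possible.

    Returns a new record if a prefix was filled in; otherwise returns the
    original record unchanged.
    """
    id_card = record.get("id_card") or ""
    code = region_codes.get(record.get("birth_address"))
    prefix_missing = len(id_card) == FULL_ID_LENGTH - PREFIX_LENGTH
    if prefix_missing and code is not None:
        return {**record, "id_card": code + id_card}
    return record
```

If the birth address is unknown or the ID number is already complete, the record is left as it is, so the fill never overwrites existing data.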
In step 410, determine whether any fields differ between the different domain data. If so, proceed to step 412; otherwise, end the flow.
In step 412, the differing fields are corrected according to the intra-domain data relationship of the same domain. For example, suppose the ID card number of user a in system A differs from the ID card number of user a in system B, where system A is a hospital system and system B is a public security bureau system. If the intra-domain data relationship is that data from the public security bureau system carries a greater weight than data from the hospital system, the data of system A can be corrected using the data of system B as the reference, making the data more accurate. This cleaning process is called system-level cleaning.
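Step 412 could be sketched as follows, treating the intra-domain data relationship as a simple ranking of source systems. The numeric weights, the source names, and the function name are hypothetical; the text only states that public security data outweighs hospital data, so the position of the market system in the ranking is an assumption.

```python
# Hypothetical source weights: a higher value means a more trusted source.
SOURCE_WEIGHTS = {"public_security": 3, "hospital": 2, "market": 1}


def reconcile(records_by_source, weights=SOURCE_WEIGHTS):
    """Correct differing fields using the most trusted source as the reference.

    records_by_source maps a source name to that source's record for the
    same person. Returns corrected copies; the input is not modified.
    """
    reference_source = max(records_by_source, key=lambda s: weights.get(s, 0))
    reference = records_by_source[reference_source]
    corrected = {}
    for source, record in records_by_source.items():
        fixed = dict(record)
        for field, value in reference.items():
            # Only overwrite fields both records share, and never with an
            # empty reference value.
            if field in fixed and value not in (None, "") and fixed[field] != value:
                fixed[field] = value
        corrected[source] = fixed
    return corrected
```

For the example in the text, passing the hospital record and the public security record for user a yields a hospital record whose ID card number matches the public security one.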
The technical solution of the present invention has been described above with reference to the drawings. Because data formats differ greatly between systems, data from different systems must be cleaned to conform to a standardized format in preparation for association calculation, so that communication between systems is smoother. Data cleaning comprises field-level cleaning, record-level cleaning, and system-level cleaning. Field-level cleaning mainly replaces field values that are non-conforming or invalid and converts data formats that do not meet the master database standard. Record-level cleaning mainly matches and fills in incorrect fields and field values according to the associations between the fields in a record. System-level cleaning compares the master data under the same domain and corrects redundant or incorrect fields according to the intra-domain data relationship. This completes the cleaning process, and the cleaned data is both in the standard format and more accurate.
In the present invention, the term "a plurality of" means two or more, unless expressly limited otherwise.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and variations to the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A data cleaning system, characterized by comprising:
a processor that cleans domain data from different systems according to preset rules and adjusts the preset rules according to the cleaning result.
2. The data cleaning system according to claim 1, characterized in that the processor reads the field values of each record in the domain data and, for field values that do not meet a preset condition, replaces them with a preset value or a null value, or deletes them.
3. The data cleaning system according to claim 2, characterized in that the processor is further configured to read the records in the domain data one by one, to match a field value for a field lacking one according to the associations between the fields in the record, and to fill that field value into the corresponding field.
4. The data cleaning system according to claim 2, characterized in that the processor comprises:
a computing unit configured to calculate the total weight of each record in the domain data from the weights of the fields in that record, and to delete records whose total weight is less than or equal to a threshold.
5. The data cleaning system according to any one of claims 2 to 4, characterized in that the processor is further configured to identify a plurality of domain data belonging to the same domain, to compare the plurality of domain data under that domain, and, when a differing field is found, to correct the differing field according to the intra-domain data relationship of that domain.
6. A data cleaning method, characterized by comprising:
cleaning domain data from different systems according to preset rules, and adjusting the preset rules according to the cleaning result.
7. The data cleaning method according to claim 6, characterized in that the step of cleaning the domain data comprises:
reading the field values of each record in the domain data, and replacing field values that do not meet a preset condition with a preset value or a null value.
8. The data cleaning method according to claim 7, characterized in that the step of cleaning the domain data further comprises:
reading the records in the domain data one by one, matching a field value for a field lacking one according to the associations between the fields in the record, and filling that field value into the corresponding field.
9. The data cleaning method according to claim 7, characterized in that the step of cleaning the domain data further comprises:
calculating the total weight of each record in the domain data from the weights of the fields in that record, and deleting records whose total weight is less than or equal to a threshold.
10. The data cleaning method according to any one of claims 7 to 9, characterized in that the step of cleaning the domain data further comprises:
identifying a plurality of domain data belonging to the same domain, and comparing the plurality of domain data under that domain;
when a differing field is found, correcting the differing field according to the intra-domain data relationship of that domain.
CN2013104563951A 2013-09-29 2013-09-29 Data cleaning method and data cleaning system Pending CN103473375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013104563951A CN103473375A (en) 2013-09-29 2013-09-29 Data cleaning method and data cleaning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013104563951A CN103473375A (en) 2013-09-29 2013-09-29 Data cleaning method and data cleaning system

Publications (1)

Publication Number Publication Date
CN103473375A true CN103473375A (en) 2013-12-25

Family

ID=49798223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013104563951A Pending CN103473375A (en) 2013-09-29 2013-09-29 Data cleaning method and data cleaning system

Country Status (1)

Country Link
CN (1) CN103473375A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110055252A1 (en) * 2003-03-28 2011-03-03 Dun & Bradstreet, Inc. System and method for data cleansing
CN102156893A (en) * 2011-03-24 2011-08-17 大连海事大学 Cleaning system and method thereof for data acquired by RFID device under network
CN102411569A (en) * 2010-09-20 2012-04-11 上海众融信息技术有限公司 Database conversion and cleaning information processing method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
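Bao Congjian (包从剑): "Research on Several Key Technologies of Data Cleaning" (数据清洗的若干关键技术研究), China Master's Theses Full-text Database, Information Science and Technology series *
Ye Zhenchun (叶振春): "Research on Data Cleaning Methods in an Evaluation System for Live-Force Confrontation Exercises" (实兵对抗演习评估系统中数据清理方法研究), China Master's Theses Full-text Database, Information Science and Technology series *
Yang Hongna (杨宏娜): "Research on Data Cleaning Technology Based on Data Warehouses" (基于数据仓库的数据清洗技术研究), China Master's Theses Full-text Database, Information Science and Technology series *
Chen Wei (陈伟): "Research and Application of Key Data Cleaning Technologies and Their Software Platform" (数据清理关键技术及其软件平台的研究与应用), China Doctoral and Master's Theses Full-text Database (Doctoral), Information Science and Technology series *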

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468658B (en) * 2014-09-26 2020-04-03 中国移动通信集团湖北有限公司 Data cleaning method and device
CN105468658A (en) * 2014-09-26 2016-04-06 中国移动通信集团湖北有限公司 Data cleaning method and apparatus
CN104504021A (en) * 2014-12-11 2015-04-08 北京国双科技有限公司 Data matching method and device
CN104572946A (en) * 2014-12-30 2015-04-29 小米科技有限责任公司 Method and device for processing data of yellow pages
CN104572946B (en) * 2014-12-30 2018-07-06 小米科技有限责任公司 Yellow page data processing method and processing device
CN107408268A (en) * 2015-01-28 2017-11-28 环联公司 System and method for retrieving and processing credit data for centralized review
CN104699796A (en) * 2015-03-18 2015-06-10 浪潮集团有限公司 Data cleaning method based on data warehouse
CN106294492A (en) * 2015-06-08 2017-01-04 深圳中兴网信科技有限公司 Data cleaning method and cleaning engine
CN104993958A (en) * 2015-06-29 2015-10-21 北京京东尚科信息技术有限公司 Method and system for generating user master data
CN105447126A (en) * 2015-11-17 2016-03-30 苏州蜗牛数字科技股份有限公司 Game prop personalized recommendation method
CN107229662B (en) * 2016-03-25 2022-02-25 阿里巴巴集团控股有限公司 Data cleaning method and device
CN107229662A (en) * 2016-03-25 2017-10-03 阿里巴巴集团控股有限公司 Data cleaning method and device
CN106230890A (en) * 2016-07-15 2016-12-14 中电长城网际系统应用有限公司 A kind of message normalization processing method and system
CN106446125A (en) * 2016-09-19 2017-02-22 广东中标数据科技股份有限公司 Method and device for improving data quality
CN106446125B (en) * 2016-09-19 2019-12-24 广东中标数据科技股份有限公司 Method and device for improving data quality
CN108073591A (en) * 2016-11-10 2018-05-25 北京宸信征信有限公司 The integration storage system and method for a kind of multi-source data with identity attribute
CN108073591B (en) * 2016-11-10 2021-10-12 北京宸信征信有限公司 Integrated storage system and method of multi-source data with identity attribute
CN106933992A (en) * 2017-02-24 2017-07-07 北京华安普惠高新技术有限公司 Distributed data purging system and method based on data analysis
CN106933992B (en) * 2017-02-24 2018-02-06 北京华安普惠高新技术有限公司 Distributed data purging system and method based on data analysis
CN107103048B (en) * 2017-03-31 2021-04-20 苏州艾隆信息技术有限公司 Medicine information matching method and system
CN107103048A (en) * 2017-03-31 2017-08-29 苏州艾隆信息技术有限公司 Medicine information matching process and system
WO2019080427A1 (en) * 2017-10-27 2019-05-02 平安科技(深圳)有限公司 Medical data cleaning method, electronic apparatus and storage medium
WO2019232952A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 List clearing method, system, computer device, and storage medium
CN109241363A (en) * 2018-06-04 2019-01-18 平安科技(深圳)有限公司 List cleaning method, system, computer equipment and storage medium
CN109947751A (en) * 2018-12-29 2019-06-28 医渡云(北京)技术有限公司 A kind of medical data processing method, device, readable medium and electronic equipment
CN111581182A (en) * 2020-04-21 2020-08-25 北京龙云科技有限公司 Data cleaning method and device
CN111949641A (en) * 2020-08-06 2020-11-17 武汉理工光科股份有限公司 Method and system for cleaning and synchronizing data between multi-stage platforms
CN111949641B (en) * 2020-08-06 2023-07-14 武汉理工光科股份有限公司 Method and system for cleaning and synchronizing data among multiple stages of platforms
CN113535518A (en) * 2021-07-23 2021-10-22 北京八分量信息科技有限公司 Distributed real-time dynamic monitoring method and system for user behaviors
CN113535518B (en) * 2021-07-23 2023-12-05 北京八分量信息科技有限公司 Distributed real-time dynamic monitoring method and system for user behaviors
CN113821503A (en) * 2021-09-23 2021-12-21 北京金山云网络技术有限公司 Medical data processing method and device and edge server
CN115098478A (en) * 2022-06-23 2022-09-23 中电通商数字技术(上海)有限公司 Resident main index generation method, device and medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: PKU HEALTHCARE IT CO., LTD.

Free format text: FORMER OWNER: FOUNDER INTERNATIONAL CO., LTD.

Effective date: 20150203

Free format text: FORMER OWNER: FOUNDER INTERNATIONAL (BEIJING) CO., LTD.

Effective date: 20150203

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 215123 SUZHOU, JIANGSU PROVINCE TO: 100080 HAIDIAN, BEIJING

TA01 Transfer of patent application right

Effective date of registration: 20150203

Address after: 100080, No. 19, No. 52 West Fourth Ring Road, Beijing, Haidian District

Applicant after: Medical information Technology Co., Ltd. of Beijing University

Address before: 215123 Founder International Building, Creative Industry Park, No. 328 Xinghu Street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: Founder International Co., Ltd.

Applicant before: Founder international software (Beijing) Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20131225