CN110349639B

CN110349639B - Multi-center medical term standardization system based on general medical term library

Info

Publication number: CN110349639B
Application number: CN201910629244.9A
Authority: CN
Inventors: 李劲松; 王执晓; 周天舒; 董凯奇; 田雨
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2022-01-04
Anticipated expiration: 2039-07-12
Also published as: JP7093593B2; CN110349639A; JP2022508350A; WO2020233256A1

Abstract

The invention discloses a multi-center medical term standardization system based on a general medical term library. The system includes a source database, a database connection management module, a pre-analysis module, a term mapping unit, an incremental update module, an exception handling module and a multi-center interaction module. The invention solves the problem of standardizing medical terminology across multiple medical data centers and keeps the expression of medical terms consistent among the centers. It automatically scans and analyzes the source database of each medical data center and, on that basis, automatically maps medical terms that already carry standard codes. It takes full account of the inevitable complexity of medical term mapping and implements a spiral process that progresses from automatic mapping to fuzzy-matching mapping and then to custom term mapping. The incremental update mechanism makes full use of previous mapping records, which greatly reduces the burden of follow-up work and greatly improves the standardization of medical term mapping.

Description

Multi-center medical term standardization system based on general medical term library

Technical Field

The invention belongs to the field of term standardization, and particularly relates to a multi-center medical term standardization system based on a general medical term library.

Background

With the rapid development of medical informatization, the types and volume of medical data are growing quickly, and analyzing and mining data from multiple medical data centers (hereinafter "multiple centers") to support clinical decision-making, medical management services and scientific research has become an inevitable trend. However, medical terminology standards in China are lacking and incomplete, and there are many medical information system vendors, so term names and codes are highly heterogeneous both between and within medical data centers, and the data include a large amount of semi-structured and unstructured content. Mature international term sets have limited domestic application, and language barriers make it difficult to apply the existing mappings between international standard term sets to the standardization of domestic medical terms. For these reasons, medical information systems cannot interoperate, and standardizing and sharing medical data among multiple medical data centers is difficult.

The general medical term library is a standard library of medical concepts and terms that covers the entire medical process and is built around international general medical term sets such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), the International Classification of Diseases, 10th Revision (ICD-10), the clinical drug naming standard RxNorm, and Logical Observation Identifiers Names and Codes (LOINC). Once multi-center medical data are mapped to this unified library, big-data analysis and similar operations become straightforward. Before multi-center medical data can be analyzed, however, the terms in data from different medical information systems must be standardized and cleaned, and doing so is a major problem.

Prior art such as [CN 201510922676: a method and system for automatically constructing medical term mapping relationships based on word segmentation coding], [CN 201710101827: a method and device for data standardization processing of medical big data] and [CN 201710152584: a method and device for determining medical synonyms] mostly approaches the problem from the perspective of Chinese word segmentation: medical terms are segmented by string matching or similar methods, the similarity between medical terms is computed, and the term with the highest similarity is selected to establish a mapping to a target term. These schemes only address matching between Chinese medical terms; they do not solve term standardization across entire medical information systems, and they do not standardize Chinese medical terms against foreign standard medical term sets.

The prior-art patent document [CN 201610173625: a method and system for automatically standardizing a medical data dictionary] mainly establishes a cloud-based data dictionary standardization model at the logical level, and requires the term sets of all medical data centers to be extracted to the cloud for unified mapping.

A widely used approach at present is for IT staff, together with doctors and other personnel with medical background knowledge, to determine one by one the mappings between data in a medical system and the general medical term library, and then to perform semi-automatic mapping by executing SQL scripts or similar means to obtain standardized medical terms. Another approach is to require medical staff to enter data in a standardized format at the time of entry. However, current methods have significant disadvantages:

1. The prior art focuses only on establishing mapping relationships between medical terms and does not address the standardization of medical terms across entire medical information systems.

2. Existing schemes target a specific data model, so they lack practicality and general applicability; they are also limited to mapping between Chinese medical terms and cannot establish mappings to an international general medical term library.

3. For medical data coded with international general medical term sets, the existing mappings between term sets are not fully exploited. Medical terms that do not use a standard medical term set, and terms defined locally by a medical data center, are generally handled by fuzzy matching or simply discarded; no complete mapping process or mechanism has been established.

4. Term mapping that necessarily involves personnel with medical background knowledge lacks a user-friendly interactive interface and a standardized manual review and exception handling mechanism.

5. Because the data are cluttered and conventional medical term standardization does not combine term mapping with subsequent data cleaning, the mappings between terms cannot actually be used to complete term standardization, the quality of the mapped data cannot be guaranteed, and subsequent data analysis results are seriously affected.

6. No detailed quality assessment mechanism has been established for mapped and cleaned medical data to ensure the accuracy of term mapping and data cleaning.

7. The handling of updates to the relevant international general medical term libraries, and the subsequent incremental update mechanism, are not considered.

Disclosure of Invention

In view of these shortcomings of the prior art, the invention provides a multi-center medical term standardization system based on a general medical term library. It solves the problem of standardizing medical terms across multiple centers based on several standardized medical term sets, simplifies medical term mapping, and supports the whole medical term standardization process.

The purpose of the invention is achieved by the following technical solution: a multi-center medical term standardization system based on a general medical term library, comprising a source database, a database connection management module, a pre-analysis module, a term mapping unit, an incremental update module, an exception handling module and a multi-center interaction module;

the source database is distributed on the front-end servers of the medical data centers and stores the business data of each medical data center;

the database connection management module manages the information required to access the source database and supports the term mapping tool in accessing and modifying the source database;

the pre-analysis module automatically scans the source database, counts the occurrence frequency of each medical term in the original medical data, suggests discarding terms whose occurrence frequency is below a set threshold, and sends terms whose occurrence frequency is greater than or equal to the threshold to the term mapping unit for subsequent term mapping;

the term mapping unit comprises an automatic mapping module, a fuzzy matching module and a custom term module;

the automatic mapping module supports automatic mapping of medical terms: for terms that use standard codes of the international general medical term library, it performs multidirectional mapping according to the existing mappings between standard codes in the general medical term library;

the fuzzy matching module handles medical terms that cannot be mapped directly through the existing mappings between standard codes in the medical term library: it searches the general medical term library by fuzzy matching and offers several of the most similar standard medical terms, from which the mapping target is selected;

the custom term module handles medical terms that can neither be mapped through the existing mappings between standard codes nor fuzzy-matched to a target term in the existing general medical term library: after the user creates a custom term application, it is sent to the multi-center interaction module for review and feedback;

the multi-center interaction module, after receiving the custom term applications of the medical data centers from the custom term module, reviews the custom terms, adds approved custom terms to the general medical term library as standard terms, and distributes these standard terms to every medical data center so that the general medical term libraries of all centers remain consistent;

the incremental update module handles standardization of the incremental data that a source database, already standardized by term mapping, accumulates for business reasons: it uses the historical mapping records generated by the term mapping unit to complete the standardized term mapping of the incremental data;

the exception handling module records the execution of each module, generates an error log whenever an error occurs, and allows the whole medical term mapping process to be traced back from the error log.

Further, the system also comprises a data cleaning module, which formulates cleaning rules, assigns a weight to each data element and screens out data with severe missing values, cleaning dirty data at both the schema level and the instance level.

Further, the database connection management module is implemented as a JDBC module, a set of classes and interfaces written in a programming language that provides a uniform access interface to various databases. It establishes connections with databases or other data sources, sends SQL commands to the database, and processes the results the database returns.

Further, after the database connection management module has connected to the source database, the pre-analysis module automatically scans the structure of all data in the source database and the statistics of specific fields, and generates a statistical table consisting of two parts:

first, summary statistics for all tables in the source database, including field names, value types, the maximum length of the values, the total number of rows in each table and the proportion of null values;

second, detailed information and occurrence frequencies of specific terms in specific tables, sorted by frequency in descending order so that subsequent term mapping can handle the most frequent terms first.

Further, the automatic mapping module: for terms in the source database that use standard codes of the international general medical term library, once the standard to which the codes belong has been identified, the target term set to map to is selected. If a reference mapping exists between the codes of the standard term set used in the source database and the codes of the target term set, mapping SQL statements are generated automatically, the terms in the source database are mapped automatically, and the corresponding data are loaded.

Further, the fuzzy matching module performs fuzzy matching as follows:

(1) Term segmentation: all terms in the general medical term library are segmented into words, and the frequency of each segment is counted to serve as the base word frequency; each source medical term M that needs fuzzy matching is likewise segmented before matching.

(2) Fuzzy matching: the difference in probability between medical terms is used as the similarity criterion, as follows:

(2.1) all terms containing any of the segments of M are retrieved from the general medical term library and segmented, forming the term set A;

(2.2) the matching degree is computed by first finding the average weighted probability D of the term M and of every term in the term set A, where n is the number of segments of the term and P1, P2, P3, P4 … Pn are the probabilities of its segments in the base word frequency:

$$D = \frac{P_1 + P_2 + \cdots + P_n}{n}$$

(2.3) the average weighted probability of each standard term a in the term set A is subtracted from that of the term M to be fuzzy-matched, and the negative of the absolute difference is taken as the matching degree; the larger the matching degree, the more similar the two terms. The formula is:

$$S(M,a) = -\left|D(M) - D(a)\right|, \quad a \in A$$

Further, the custom term module defines constraints in advance to prevent custom terms from conflicting with known standard terms. Added custom standard terms must be kept consistent across all medical data centers and must not be added twice, so that the multi-center medical data can be shared once they have been standardized through term mapping. Before a custom term is added, a request is submitted to the multi-center interaction module containing the custom term to be added, a detailed description of it, and its code. If the review by the responsible operators of the multi-center interaction module passes and confirms that no custom code already exists for a similar or duplicate medical term, a custom standard term code is generated and the automatic mapping module is called to complete the term mapping and load the covered data. If the review does not pass, either the existing custom term code is returned so that the medical data center can complete the subsequent mapping, or the reason the custom term could not be created is returned, an error document is generated and the user is notified.

Further, the multi-center interaction module coordinates and unifies the general medical term libraries and term codes of all medical data centers, and the personnel with the highest authority in the module review and coordinate the use of custom standard terms.

Further, the incremental update module serves subsequent medical term standardization in medical data centers that have already performed medical term mapping: incremental data are updated mainly according to the earlier term standardization mapping records generated by the term mapping unit, and the custom term module is executed again for medical terms that still cannot be mapped.

Further, the exception handling module: a log storage module stores all logs produced while the system runs and records whether each module is operating normally. Error logs are saved by category, including errors during system operation, errors when calling each module, and errors when a module maps a single term. Terms that could not be mapped, including terms skipped in the automatic analysis module and terms skipped in the custom term module, are saved by category and a failed-term document is generated. By setting timestamps on the database, the exception handling module supports database backtracking, allowing the user to roll the mapped database back to its data as of a specified date.

The beneficial effects of the invention are as follows: it systematically solves the medical term standardization problem for multiple medical data centers and keeps the expression of medical terms consistent across centers; it automatically scans and analyzes the source database of each medical data center and, on that basis, automatically maps medical terms that carry standard codes; it fully considers the complexity of medical term mapping and implements an upward spiral from automatic mapping to fuzzy-matching mapping and then to custom term mapping; and its incremental update mechanism makes full use of previous mapping records, greatly reducing the burden of subsequent work and greatly improving the standardization of medical term mapping.

Drawings

FIG. 1 is a system flow diagram;

FIG. 2 is a system data flow diagram;

FIG. 3 is a schematic diagram of JDBC implementing database connection management;

FIG. 4 is a flow chart of medical term standardized mapping;

FIG. 5 is a schematic diagram of the multi-center interaction principle.

Detailed Description

The invention is described in further detail below with reference to the figures and specific examples.

As shown in FIG. 1, the multi-center medical term standardization system based on a general medical term library provided by the invention includes a source database, a database connection management module, a pre-analysis module, a term mapping unit, an incremental update module, an exception handling module and a multi-center interaction module, and may further include a data cleaning module;

the source database is distributed on the front-end servers of the medical data centers and stores the business data of their medical information systems, such as HIS, LIS, PACS and EMR. The business data include basic patient information, visit information, cost information, diagnosis information, medication information, surgery information, laboratory test information, examination information, free-text medical record information and nursing vital-sign information;

the database connection management module manages (loads, modifies and stores) the information needed to access the source database and supports the term mapping tool in accessing and modifying source databases of different types;

the pre-analysis module automatically scans the source database, counts the occurrence frequency of each medical term in the original medical data, suggests discarding terms whose occurrence frequency is below a set threshold, and sends terms whose occurrence frequency is greater than or equal to the threshold to the term mapping unit for subsequent term mapping;

the automatic mapping module supports automatic mapping of medical terms: for terms that use standard codes of the international general medical term library, it performs multidirectional mapping according to the existing mappings between standard codes in the general medical term library, so that only quality control of the mapping results is required;

the fuzzy matching module searches the general medical term library by fuzzy matching for medical terms that cannot be mapped directly through the existing mappings between standard codes, and offers several of the most similar standard medical terms, from which the mapping target is selected;

the custom term module: for medical terms that can neither be mapped through the existing mappings between standard codes nor fuzzy-matched to a target term in the existing general medical term library, the user creates a custom term application (which may be decided jointly by technical staff and doctors), and the application is sent to the multi-center interaction module for review and feedback;

the multi-center interaction module, after receiving the custom term applications of the medical data centers from the custom term module, reviews the custom terms, adds approved custom terms to the general medical term library as standard terms, and distributes these standard terms to every medical data center so that the general medical term libraries of all centers remain consistent;

the incremental update module handles standardization of the incremental data that a source database, already standardized by term mapping, accumulates for business reasons: it uses the historical mapping records generated by the term mapping unit to complete the standardized term mapping of the incremental data;

the exception handling module records the execution of each module and, in particular, generates an error log whenever an error occurs, ensuring that the entire medical term mapping process can be traced back from the error log;

the data cleaning module formulates cleaning rules, assigns a weight to each data element, and screens out data with severe missing values to improve data quality.
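A minimal sketch of the weighted screening step, assuming that screening out data with severe missing values means dropping records whose weighted completeness falls below a user-chosen cutoff. The function names, the weights and the cutoff are hypothetical; the patent does not say how the weights are combined.

```python
def completeness_score(record: dict, weights: dict) -> float:
    """Weighted share of data elements that are present (not None or empty)."""
    total = sum(weights.values())
    present = sum(
        weight
        for field, weight in weights.items()
        if record.get(field) not in (None, "")
    )
    return present / total if total else 0.0


def screen_records(records: list, weights: dict, min_score: float):
    """Split records into (kept, rejected) by weighted completeness."""
    kept, rejected = [], []
    for record in records:
        score = completeness_score(record, weights)
        (kept if score >= min_score else rejected).append(record)
    return kept, rejected
```

Giving a key element such as a patient identifier a large weight means a record missing it is rejected even if most minor fields are filled.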

The specific implementation of each module is as follows:

I. Database connection management module

This module manages (loads, modifies and stores) the information needed to access the source database; the source database and the target database may physically be the same database system. It is implemented mainly as a JDBC module composed of classes and interfaces written in the Java programming language, which provides a uniform access interface to various databases and gives the system good cross-platform portability. Its main functions are establishing connections with databases or other data sources, sending SQL commands to the databases and processing the results they return; a schematic diagram is shown in FIG. 3.
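The connection-management role can be sketched in Python, with the standard DB-API 2.0 interface (PEP 249, here through the built-in `sqlite3` driver) standing in for JDBC. The `ConnectionInfo` and `ConnectionManager` names are hypothetical, and the sketch handles a single driver only, whereas the module described above targets many database types.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ConnectionInfo:
    """Hypothetical record of what is needed to reach one source database."""
    name: str
    dsn: str  # for sqlite3 this is a file path or ":memory:"


class ConnectionManager:
    """Loads, modifies and stores connection info; opens connections on demand."""

    def __init__(self):
        self._infos = {}
        self._conns = {}

    def register(self, info: ConnectionInfo) -> None:
        # Adding or overwriting an entry covers both "load" and "modify".
        self._infos[info.name] = info

    def connect(self, name: str) -> sqlite3.Connection:
        # Reuse an already open connection for this source if there is one.
        if name not in self._conns:
            self._conns[name] = sqlite3.connect(self._infos[name].dsn)
        return self._conns[name]

    def query(self, name: str, sql: str, params=()) -> list:
        # Send an SQL command and return the result rows.
        return self.connect(name).execute(sql, params).fetchall()

    def close_all(self) -> None:
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()
```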

II. Pre-analysis module

After the database connection management module has connected to the source database, this module automatically scans the structure of all data in the source database and the statistics of specific fields, and generates statistical table A, which has two parts:

First, summary statistics for all tables in the source database, including field names, value types, the maximum length of the values, the total number of rows and the proportion of null values, for example:

Table name	Column name	Value type	Maximum length	Row count	Null ratio
PATIENT	Patient ID	NUMBER	8	3000	0
PATIENT	Name	VARCHAR2	20	3000	0
PATIENT	Date of birth	DATE	10	3000	0
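The summary statistics of the first part could be gathered roughly as below. This sketch assumes a SQLite source database (`PRAGMA table_info` and `sqlite_master` are SQLite-specific), and the function names are hypothetical; other database systems expose the same information through their own catalog views.

```python
import sqlite3


def summarize_table(conn: sqlite3.Connection, table: str) -> list:
    """One row per column: (table, column, type, max_length, row_count, null_ratio).

    Identifiers are interpolated into SQL, so they must come from the
    database catalog (as in summarize_database), never from user input.
    """
    row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    stats = []
    for _, column, col_type, *_ in columns:
        max_len, nulls = conn.execute(
            f'SELECT MAX(LENGTH("{column}")), '
            f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END) '
            f'FROM "{table}"'
        ).fetchone()
        null_ratio = (nulls or 0) / row_count if row_count else 0.0
        stats.append((table, column, col_type, max_len or 0, row_count, null_ratio))
    return stats


def summarize_database(conn: sqlite3.Connection) -> list:
    """Summary statistics for every user table in the database."""
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    return [row for table in tables for row in summarize_table(conn, table)]
```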

Second, detailed information and occurrence frequencies of specific terms in specific tables are counted and sorted in descending order of frequency, so that subsequent term mapping can handle the most frequent terms first. The system can suggest whether terms with extremely low occurrence frequency should take part in subsequent term mapping. If no threshold is defined, all terms take part in mapping by default; the user can adjust the parameter to set the minimum occurrence frequency below which terms are excluded from subsequent mapping. This greatly simplifies the subsequent term mapping process, reduces workload and improves data quality. For example:

Code	Sex	Frequency
Z03.001	Male	200
Z03.002	Female	100

For example, suppose a non-standard term A appears N2 times in a total of N1 data records; its occurrence frequency is then P = N2/N1. Let M be the minimum occurrence frequency required to take part in mapping, a threshold set by the user according to actual conditions. If P ≥ M, A is a mapping object; if P < M, A is a non-standard term with extremely low occurrence frequency and does not take part in subsequent term mapping.
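A minimal sketch of the frequency ranking and threshold test, assuming the values of one term field have already been loaded into a Python list. `split_by_frequency` is a hypothetical name; its default threshold of 0 reproduces the rule that all terms take part when no threshold is defined.

```python
from collections import Counter


def split_by_frequency(values: list, min_freq: float = 0.0):
    """Rank terms by occurrence frequency and split them at threshold M.

    values: one term field across all N1 records.
    Returns (to_map, discard_suggestions), each a list of (term, P) pairs in
    descending order of frequency, where P = N2 / N1.
    """
    n1 = len(values)
    to_map, discard = [], []
    # most_common() with no argument returns every term, most frequent first.
    for term, n2 in Counter(values).most_common():
        p = n2 / n1
        (to_map if p >= min_freq else discard).append((term, p))
    return to_map, discard
```

With the codes from the table above plus a rare code `X99` seen twice, a threshold of 0.01 keeps `Z03.001` and `Z03.002` and suggests discarding `X99` (P = 2/302).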

The documents generated by this module can be exported in formats such as PDF, Excel and CSV.

III. Automatic mapping module

For terms in the source database that use standard codes of the international general medical term library, once the standard to which the codes belong has been identified, the target term set to map to is selected. If a reference mapping exists between the codes of the standard term set used in the source database and the codes of the target term set, mapping SQL statements are generated automatically, the terms in the source database are mapped automatically, and the corresponding data are loaded.
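The automatically generated mapping SQL might look like the statement below. This is a sketch under assumptions: the reference mapping is stored in a hypothetical crosswalk table with columns `source_code` and `target_code`, it is one-to-one (with several targets per code the subquery would silently pick one), and the dialect is SQLite's. Function and table names are illustrative only.

```python
import sqlite3


def build_mapping_sql(table: str, code_col: str, target_col: str,
                      crosswalk: str) -> str:
    """Generate an UPDATE that fills target_col from a code crosswalk.

    crosswalk is a hypothetical table of (source_code, target_code) pairs
    taken from an existing reference mapping between the source standard and
    the chosen target term set. Identifiers must come from trusted config.
    """
    return (
        f'UPDATE "{table}" SET "{target_col}" = ('
        f'SELECT target_code FROM "{crosswalk}" '
        f'WHERE source_code = "{table}"."{code_col}") '
        f'WHERE "{code_col}" IN (SELECT source_code FROM "{crosswalk}")'
    )


def apply_mapping(conn: sqlite3.Connection, sql: str) -> int:
    """Run the generated statement, commit, and return the rows mapped."""
    cursor = conn.execute(sql)
    conn.commit()
    return cursor.rowcount
```

Rows whose codes have no crosswalk entry are left untouched, which is where the fuzzy matching module would take over.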

IV. Fuzzy matching module

Medical terms that cannot be mapped automatically are fuzzy-matched one by one against the standard terms in the general medical term library, and the recommended standard terms are returned together with the codes of the standard term sets they belong to. Because fuzzy matching usually recommends several standard terms, a professional with a medical background must manually choose the single match; once the mapping is determined, the automatic mapping module is called to map the medical term and load the data it covers. The fuzzy matching method is as follows:

(1) Term segmentation

Most medical terms consist of several words and phrases; here, a medical term is divided into several words and phrases according to specific rules.

(1.1) All terms in the general medical term library are segmented in this way, and the frequency of each segment is counted to serve as the base word frequency.

(1.2) Each source medical term that needs fuzzy matching is also segmented before matching. For example, segmenting the term M yields [segment 1, segment 2, …, segment n].
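The patent does not name a segmentation algorithm. The sketch below substitutes plain forward maximum matching over whitespace-separated tokens against an assumed vocabulary of multi-word phrases, and takes the base word frequency to be each segment's share of all segments in the library. Both functions are hypothetical illustrations; for Chinese text the same idea would run character by character.

```python
from collections import Counter


def fmm_segment(term: str, vocab: set, max_words: int = 3) -> list:
    """Forward maximum matching over whitespace tokens.

    At each position take the longest run of up to max_words tokens found in
    vocab; fall back to a single token when nothing longer matches.
    """
    tokens = term.split()
    segments, i = [], 0
    while i < len(tokens):
        for size in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + size])
            if size == 1 or phrase in vocab:
                segments.append(phrase)
                i += size
                break
    return segments


def base_word_frequency(library_terms: list, vocab: set) -> dict:
    """Segment every library term and turn segment counts into probabilities."""
    counts = Counter(seg for t in library_terms for seg in fmm_segment(t, vocab))
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()}
```

For example, with the vocabulary `{"donkey-hide gelatin", "oral liquid"}`, the string `"donkey-hide gelatin longevity oral liquid"` splits into three segments.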

(2) Fuzzy matching

The invention uses the probability difference between medical terms as the similarity criterion, as follows:

(2.1) All terms containing any of the segments of M are retrieved from the general medical term library and segmented, forming the term set A {a, b, c, d, e, …};

(2.2) The matching degree is computed by first finding the average weighted probability D of the term M and of every term in the term set A, where n is the number of segments of the term and P1, P2, P3, P4 … Pn are the probabilities of its segments in the base word frequency:

$$D = \frac{P_1 + P_2 + \cdots + P_n}{n}$$

(2.3) The average weighted probability of each standard term a in the term set A is subtracted from that of the term M to be fuzzy-matched, and the negative of the absolute difference is taken as the matching degree; the larger the matching degree, the more similar the two terms. The formula is:

$$S(M,a) = -\left|D(M) - D(a)\right|, \quad a \in A$$
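The matching step can be sketched as follows, assuming the base word frequency is a dictionary of segment probabilities, that D is the plain mean of a term's segment probabilities, and that segments absent from the base frequency count as 0. Following the text, the matching degree is the negative of the absolute difference, so higher scores mean closer matches. `rank_candidates` and its `top_k` parameter are hypothetical.

```python
def average_probability(segments: list, base_freq: dict) -> float:
    """D(term): mean base-frequency probability of the term's segments.

    Segments missing from base_freq count as probability 0 (an assumption).
    """
    if not segments:
        return 0.0
    return sum(base_freq.get(s, 0.0) for s in segments) / len(segments)


def rank_candidates(m_segments: list, library: dict, base_freq: dict,
                    top_k: int = 3) -> list:
    """Rank standard terms for source term M by S(M, a) = -|D(M) - D(a)|.

    library maps each standard term to its segment list. The candidate set A
    holds every library term sharing at least one segment with M.
    """
    d_m = average_probability(m_segments, base_freq)
    wanted = set(m_segments)
    scored = [
        (term, -abs(d_m - average_probability(segs, base_freq)))
        for term, segs in library.items()
        if wanted & set(segs)
    ]
    # Larger (less negative) matching degree means higher similarity.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The returned shortlist corresponds to the several recommended standard terms from which a professional picks the final mapping target.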

Take the term "donkey-hide gelatin oral liquid for prolonging life" as an example:

a) Segment the terms of the general medical term library and obtain the probability of each segment;

b) Segment the term "donkey-hide gelatin oral liquid for prolonging life" into "donkey-hide gelatin / longevity / oral liquid". Look up the corresponding probabilities in the base word frequency, namely p1 for donkey-hide gelatin, p2 for longevity and p3 for oral liquid, and compute the average probability D(M) of the segments;

c) Query the general medical term library for all terms containing donkey-hide gelatin, longevity or oral liquid, and segment them to obtain the term set A {["donkey-hide gelatin", "calcium", "oral liquid"], ["donkey-hide gelatin", "granules"], ["donkey-hide gelatin", "blood-enriching", "oral liquid"], …}, then compute D(a), D(b), D(c), …;

d) Compute the matching degrees and sort them:

Term to be fuzzy-matched	General library term	Matching degree
Donkey-hide gelatin oral liquid for prolonging life	Donkey-hide gelatin calcium oral liquid	S(M,a)
Donkey-hide gelatin oral liquid for prolonging life	Donkey-hide gelatin oral liquid for enriching blood	S(M,c)
Donkey-hide gelatin oral liquid for prolonging life	Donkey-hide gelatin granules	S(M,b)

V. Custom term module

In complex situations, especially given that data in domestic medical data centers are cluttered and contain many terms for traditional Chinese medicines and traditional treatments, some terms cannot be matched to the international general medical term library. The custom term module can define the necessary constraints in advance to prevent custom terms from conflicting with known standard terms; for example, custom terms may be required to use codes within a reserved coding range.

Added custom standard terms must be kept consistent across all medical data centers and must not be added twice, so that the multi-center medical data can be shared once they have been standardized through term mapping. Therefore, during standardized term mapping of a medical data center's data, a request to add a custom term is submitted to the multi-center interaction module before the term is added. The request contains the custom term to be added, a detailed description of it and its code (generated automatically by the system). If the review by the responsible central operators passes and confirms that no custom code exists for a similar or duplicate medical term, a custom standard term code is generated, and the automatic mapping module can then be called to complete the term mapping and load the covered data. If the review does not pass, either the existing custom term code is returned so that the medical data center can complete the subsequent mapping, or the reason the custom term could not be created is returned, an error document is generated and the user is notified. The operation of the custom term module is illustrated in FIG. 4.
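The automatic part of the review, enforcing a reserved coding range and returning the existing code for a duplicate, can be sketched as below. The `C00001`-style code format, the range limits and the case/whitespace normalization used to spot duplicates are all assumptions; the manual judgment by central operators is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomTermRequest:
    """A center's application: the term, its description, and the requester."""
    term: str
    description: str
    center: str


class CustomTermRegistry:
    """Hypothetical central registry that issues codes from a reserved range."""

    def __init__(self, prefix: str = "C", start: int = 1, end: int = 99999):
        self.prefix = prefix
        self.next_number = start
        self.end = end
        self.codes = {}  # normalized term -> issued custom code

    @staticmethod
    def normalize(term: str) -> str:
        # Case- and whitespace-insensitive key used to detect duplicates.
        return " ".join(term.lower().split())

    def review(self, request: CustomTermRequest) -> tuple:
        """Return ("approved", code), ("existing", code) or ("rejected", reason)."""
        key = self.normalize(request.term)
        if key in self.codes:
            # Duplicate: return the code already issued so the center reuses it.
            return "existing", self.codes[key]
        if self.next_number > self.end:
            return "rejected", "reserved custom code range is exhausted"
        code = f"{self.prefix}{self.next_number:05d}"
        self.next_number += 1
        self.codes[key] = code
        return "approved", code
```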

Six, multi-center interaction module

To achieve data standardization and data sharing among the medical information systems of the various medical data centers, all centers must use a single general medical term library and a single set of term codes. The invention adds custom terms centrally, only after an application has been submitted and approved, which prevents the centers from expressing the same term differently when each defines its own standard terms. Submission, review and authorization require interaction among multiple medical data centers. The multi-center interaction module coordinates and unifies the general medical term libraries and term codes of all centers, and its highest-authority personnel review and coordinate the use of custom standard terms. The multi-center custom term interaction network is shown in Fig. 5.
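One way to keep every center's library identical is a central copy with a version number that is pushed to stale replicas on a regular schedule. The minimal Python sketch below assumes this design; `MultiCenterHub`, `CenterReplica`, `publish` and `sync` are hypothetical names, and the source does not specify the synchronization mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class CenterReplica:
    name: str
    version: int = 0
    terms: dict[str, str] = field(default_factory=dict)  # code -> term


@dataclass
class MultiCenterHub:
    version: int = 0
    terms: dict[str, str] = field(default_factory=dict)
    centers: list[CenterReplica] = field(default_factory=list)

    def publish(self, code: str, term: str) -> None:
        """Add an approved custom term to the central library only."""
        if code in self.terms:
            raise ValueError(f"code {code} is already assigned")
        self.terms[code] = term
        self.version += 1

    def sync(self) -> list[str]:
        """Run periodically: push the library to stale centers, return their names."""
        updated = []
        for center in self.centers:
            if center.version != self.version:
                center.terms = dict(self.terms)  # full snapshot, not a diff
                center.version = self.version
                updated.append(center.name)
        return updated

    def consistent(self) -> bool:
        return all(c.version == self.version and c.terms == self.terms
                   for c in self.centers)
```

Copying a full snapshot keeps the sketch simple; a production system would more likely ship only the changes since each replica's version.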

Seven, incremental updating module

For a medical data center that has already performed medical term mapping, subsequent term standardization mainly updates the incremental data using the earlier standardized mapping records produced by the term mapping unit. For medical terms that still cannot be mapped, the custom term module is executed again.
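The reuse of earlier mapping records can be sketched in Python as a lookup that splits incremental terms into those already mapped and those still pending. The function name `incremental_map` and the representation of the history as a plain dictionary are assumptions for illustration.

```python
from collections.abc import Iterable


def incremental_map(
    new_terms: Iterable[str],
    history: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Map incremental source terms using earlier mapping records.

    `history` maps a source term to the standard code chosen in a previous
    run. Terms without a record are returned separately (deduplicated, in
    first-seen order) so that the custom term step can be run on them again.
    """
    mapped: dict[str, str] = {}
    pending: list[str] = []
    for term in new_terms:
        code = history.get(term)
        if code is not None:
            mapped[term] = code
        elif term not in pending:
            pending.append(term)
    return mapped, pending
```

Only the `pending` list needs human attention, which is why the incremental workload shrinks as the mapping history grows.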

Eight, exception handling module

The exception handling module stores all logs produced while the system runs and records whether each module is running normally. It classifies and saves error logs, including errors that occur while the system runs, errors that occur when each module is called, and errors that occur while a module maps an individual term. It also classifies and saves the terms that were not mapped successfully, including terms ignored in the automatic analysis module and terms ignored in the custom term module, and generates a failed-term document. By timestamping the database, the exception handling module supports database backtracking, allowing the user to roll the matched database back to its data as of a specified date.
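The error classification and timestamp-based backtracking can be sketched in Python as below. `ExceptionLog`, `VersionedTable`, the three category names and the tab-separated failed-term document format are hypothetical; a real deployment would rely on the database's own timestamp or history features.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical labels for the three error classes named in the text.
ERROR_CATEGORIES = ("system", "module_call", "term_mapping")


@dataclass
class ExceptionLog:
    errors: dict[str, list[str]] = field(
        default_factory=lambda: {c: [] for c in ERROR_CATEGORIES})
    failed_terms: dict[str, list[str]] = field(default_factory=dict)  # module -> terms

    def error(self, category: str, message: str) -> None:
        if category not in self.errors:
            raise ValueError(f"unknown error category {category!r}")
        self.errors[category].append(message)

    def failed(self, module: str, term: str) -> None:
        self.failed_terms.setdefault(module, []).append(term)

    def failed_term_document(self) -> str:
        """One 'module<TAB>term' line per failed term, grouped by module."""
        lines = []
        for module, terms in sorted(self.failed_terms.items()):
            lines.extend(f"{module}\t{t}" for t in terms)
        return "\n".join(lines)


@dataclass
class VersionedTable:
    """Every write is kept with its timestamp so past states can be rebuilt."""
    history: dict[str, list[tuple[datetime, str]]] = field(default_factory=dict)

    def write(self, key: str, value: str, at: datetime) -> None:
        self.history.setdefault(key, []).append((at, value))

    def as_of(self, when: datetime) -> dict[str, str]:
        """Backtrack: the latest value of each key written at or before `when`."""
        snapshot: dict[str, str] = {}
        for key, versions in self.history.items():
            past = [(t, v) for t, v in versions if t <= when]
            if past:
                snapshot[key] = max(past, key=lambda tv: tv[0])[1]
        return snapshot
```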

Nine, data cleaning module

After the standardized mapping of medical terms is complete, cleaning the medical data is essential to raise its quality for subsequent data mining and analysis. A common data cleaning strategy is provided that mainly cleans dirty data at the structure level and at the instance level. Structure-level dirty data violates the data schema or integrity constraints, for example out-of-range values, broken attribute dependencies, violated uniqueness and broken referential integrity. Instance-level dirty data involves erroneous attribute values and broken dependencies between attributes, for example missing values, duplicate records, contradictory records and reference errors. Cleaning aims to satisfy the integrity, uniqueness, authority, legality and consistency of the data as far as possible, reduce data redundancy and improve data quality.

1) Structure-level cleaning rules: unified definitions of the data schema (including data types), of integrity constraints, and of functional dependency requirements.

2) Instance-level cleaning rules: analyze the dirty data, formulate cleaning rules, evaluate and verify them, and record each cleaning action in a log for traceability.
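A rule-driven cleaning pass with an action log might look like the following Python sketch. The rule factories `required` and `in_range`, the `clean` function and the choice to drop (rather than repair) failing records are assumptions for illustration; the source leaves the concrete cleaning actions to the rules each deployment formulates.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, object]


@dataclass
class Rule:
    name: str
    check: Callable[[Record], bool]  # True when the record passes


def required(field_name: str) -> Rule:
    """Instance-level rule: the field must not be missing."""
    return Rule(f"{field_name} present", lambda r: r.get(field_name) is not None)


def in_range(field_name: str, lo: float, hi: float) -> Rule:
    """Structure-level rule: the value must lie in the schema's allowed range."""
    def check(r: Record) -> bool:
        v = r.get(field_name)
        return v is None or lo <= v <= hi  # missing values are left to `required`
    return Rule(f"{field_name} in [{lo}, {hi}]", check)


def clean(records: list[Record], key: str, rules: list[Rule],
          log: list[str]) -> list[Record]:
    """Drop records that fail a rule or repeat an earlier key; log every action."""
    seen: set[object] = set()
    kept: list[Record] = []
    for i, rec in enumerate(records):
        failed = [rule.name for rule in rules if not rule.check(rec)]
        if failed:
            log.append(f"row {i}: dropped, failed {', '.join(failed)}")
            continue
        if rec.get(key) in seen:  # instance-level duplicate record
            log.append(f"row {i}: dropped, duplicate {key}={rec.get(key)}")
            continue
        seen.add(rec.get(key))
        kept.append(rec)
    return kept
```

Writing every dropped row to `log` is what makes the cleaning traceable, as rule 2) requires.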

As the demands on the quantity and quality of data for data mining and analysis keep rising, the invention provides a collaborative mode that enables data sharing among multiple medical data centers (mainly hospitals) while fully safeguarding the data security of each center. Sharing medical data can optimize medical processes, accelerate related scientific research and ultimately improve the quality of medical services for patients. Data sharing among multiple medical data centers presupposes standardized medical data, which covers two aspects: standardization of data structures and standardization of medical terms. The invention addresses the latter. The technical points of the invention are summarized as follows:

1. Through the interaction of the modules, the database of a medical information system is scanned and analyzed automatically, and statistics such as the occurrence frequency of each medical term in the database are returned, providing a practical basis for subsequent medical term mapping and performance optimization.

2. For data coded with an international general medical term set, the medical data covered by those terms are mapped automatically using the existing mapping relations between medical term set codes.

3. For data in a medical data center that is not coded with a standard term set, such as locally defined medical terms or terms unique to China (for example, traditional Chinese medicines), the system presents information such as the occurrence frequency of each term in the center, and lets the relevant personnel perform reasonable, well-founded fuzzy matching visually or add custom standard terms directly.

4. Interaction requests between the medical data centers are processed at regular intervals, so that the general medical term libraries of all centers remain uniform after each center's medical terms have been standardized, enabling data sharing.

5. Data are cleaned according to a cleaning strategy to ensure data quality.

6. All errors and exceptions are recorded in logs, which makes functions such as error checking and quality evaluation easy to implement.

7. The established mapping relations between the medical terms of a medical data center and the international general standard medical term sets are fully reused, enabling semi-automatic or even fully automatic mapping and standardization of terms that later appear in the center.
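Points 1 and 2 above (frequency statistics from pre-analysis, then automatic mapping through an existing code-to-code mapping) can be sketched with Python's built-in `sqlite3` module. The schema (`diagnosis`, `code_map`), the code systems `SYS-A`/`SYS-B` and all codes are hypothetical placeholders, not real term sets.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    """Tiny in-memory source database with hypothetical codes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE diagnosis (id INTEGER PRIMARY KEY, code_system TEXT,
                                code TEXT, target_code TEXT);
        CREATE TABLE code_map (src_system TEXT, src_code TEXT,
                               tgt_system TEXT, tgt_code TEXT);
        INSERT INTO diagnosis (code_system, code) VALUES
            ('SYS-A', 'a1'), ('SYS-A', 'a1'), ('SYS-A', 'a2'), ('LOCAL', 'x9');
        INSERT INTO code_map VALUES
            ('SYS-A', 'a1', 'SYS-B', 'b1'), ('SYS-A', 'a2', 'SYS-B', 'b2');
    """)
    return conn


def term_frequencies(conn: sqlite3.Connection, table: str,
                     column: str) -> list[tuple[str, int]]:
    """Pre-analysis: count each term's occurrences, most frequent first."""
    # table/column are trusted identifiers here; never interpolate user input.
    sql = (f"SELECT {column}, COUNT(*) AS n FROM {table} "
           f"GROUP BY {column} ORDER BY n DESC, {column}")
    return conn.execute(sql).fetchall()


def split_by_threshold(freqs: list[tuple[str, int]],
                       threshold: int = 0) -> tuple[list[str], list[str]]:
    """Terms below `threshold` are only suggested for discarding; 0 keeps all."""
    keep = [t for t, n in freqs if n >= threshold]
    drop = [t for t, n in freqs if n < threshold]
    return keep, drop


def auto_map(conn: sqlite3.Connection, target_system: str) -> int:
    """Fill target_code wherever a reference mapping exists; return rows updated."""
    cur = conn.execute(
        """
        UPDATE diagnosis
        SET target_code = (
            SELECT m.tgt_code FROM code_map AS m
            WHERE m.src_system = diagnosis.code_system
              AND m.src_code = diagnosis.code AND m.tgt_system = ?)
        WHERE EXISTS (
            SELECT 1 FROM code_map AS m
            WHERE m.src_system = diagnosis.code_system
              AND m.src_code = diagnosis.code AND m.tgt_system = ?)
        """,
        (target_system, target_system),
    )
    return cur.rowcount
```

Rows such as the locally coded `x9` are left unmapped by `auto_map` and would be passed on to fuzzy matching or the custom term module.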

The above are merely examples of the present invention and are not intended to limit its scope. Any modification, equivalent replacement, improvement or the like that does not involve inventive work and falls within the spirit and principle of the present invention is included in the scope of protection of the present invention.

Claims

1. A multi-center medical term standardization system based on a general medical term library, characterized in that the system comprises a source database, a database connection management module, a pre-analysis module, a term mapping unit, an incremental update module, an exception handling module and a multi-center interaction module;

The source database is distributed across the front-end servers of the medical data centers and stores the business data of each medical data center;

The database connection management module: manages the information required for accessing the source database, and provides support for the term mapping tool to access and modify the source database;

The pre-analysis module: automatically scans the source database, counts the occurrence frequency of each medical term in the original medical data, suggests discarding terms whose occurrence frequency is below a set threshold, and sends terms whose frequency is greater than or equal to the threshold to the term mapping unit for subsequent term mapping;

The term mapping unit includes an automatic mapping module, a fuzzy matching module and a custom terminology module;

The automatic mapping module: supports automatic mapping of medical terms; for terms coded with the standard codes of an international general medical term library, multi-directional mapping is performed according to the existing mapping relations between the standard codes of the general medical term libraries;

The fuzzy matching module: for medical terms that cannot be mapped directly through the existing mapping relations between standard codes in the medical term library, traverses the general medical term library by fuzzy matching and offers the several standard medical terms with the highest similarity, from which one is selected as the target term for this term mapping; the specific fuzzy matching method is as follows:

(1) Term segmentation: segment all terms in the general medical term library into words, and count the frequency of each word segment as its basic word frequency; before matching, segment the source medical term M that requires fuzzy matching;

(2) Fuzzy matching: the probability difference between medical terms is used as the similarity criterion; the specific operations are as follows:

(2.1) Select from the general medical term library all terms that contain any word segment of M, segment them, and combine them into term set A;

(2.2) Use the following formula to compute the average weighted probability of term M and of every term in term set A, where n is the number of word segments obtained from the term, and P1, P2, P3, P4...Pn are the probabilities of the respective word segments given by their basic word frequencies:

(2.3) Compare the average weighted probability of each standard term in term set A with that of the term M to be fuzzily matched, and take the negative of the absolute difference as the matching degree; the greater the matching degree, the more similar the two terms. The formula is as follows:

$$S(M,A) = -\lvert D(M) - D(A) \rvert$$

The custom term module: for medical terms that can neither be mapped through the existing mapping relations between standard codes in the medical term library nor fuzzily matched to a target term in the existing general medical term library, the user generates a custom term application, which is sent to the multi-center interaction module for review and feedback;

The multi-center interaction module: after receiving the custom term applications sent from the medical data centers by the custom term module, reviews the custom terms, adds approved custom terms to the general medical term library as standard terms, and distributes them to every medical data center so that the general medical term library remains consistent across all centers;

The incremental update module: when business operations generate incremental data in a source database whose medical terms have already been mapped, retrieves the historical mapping records produced by the term mapping unit to complete the standardized term mapping of the incremental data;

The exception handling module: records the execution process of each of the above modules, generates an error log in response to the occurrence of errors, and can perform backtracking of the entire process of medical term mapping according to the error log.

2. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the system further comprises a data cleaning module, which formulates cleaning rules, assigns a weight to each data element, filters out data with severe missing values, and cleans dirty data at both the structure level and the instance level.

3. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the database connection management module specifically comprises a JDBC module composed of classes and interfaces written in a programming language, which provides a unified access interface for various types of databases to establish connections with a database or other data sources, send SQL commands to the database, and process the results returned by the database.

4. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that, after the database connection management module has connected to the source database, the pre-analysis module automatically scans the structural information of all data in the source database and the statistics of its specific fields to generate a statistical table comprising two parts:

First, the general statistics of all tables in the source database, including the field name, value type, maximum length of all values, the total number of rows in the table, and the proportion of empty values in each table;

Second, detailed information on specific terms in a specific table and their occurrence frequencies, arranged in descending order of frequency so that subsequent term mapping processes the most frequent terms first, together with a suggestion on whether very low-frequency terms should take part in subsequent term mapping. If no threshold is defined, all terms take part in the mapping by default; users can also set, according to the specific situation, the minimum occurrence frequency below which terms are excluded from subsequent term mapping.

5. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that, for terms in the source database coded with the standard codes of an international general medical term library, the automatic mapping module determines the standard to which the codes belong and selects the target term set to map to; if a reference mapping relation exists between the standard term set codes of those terms and the target term set codes, these terms can be mapped automatically by SQL statements, completing the automatic mapping of terms in the source database and the loading of the corresponding data.

6. The multi-center medical term standardization system based on a general medical term library according to claim 1, wherein the custom term module defines constraints in advance to prevent custom terms from conflicting with known standard terms; when custom terms are added, the medical data centers must keep the added custom standard terms consistent to prevent duplicate additions, while ensuring that multi-center medical data can be shared after standardization by term mapping; before a custom term is added, an application to add it must be submitted to the multi-center interaction module, the application containing the custom term to be added, a specific description of the custom term, and the code of the custom term; after the relevant operators of the multi-center interaction module approve the application and determine that no custom code exists for a similar or duplicate medical term, a custom standard term code is generated, and the automatic mapping module can then be called to complete term mapping and the loading of the covered data; if the review fails, the existing custom term code is returned so that the medical data center can complete subsequent mapping, or the reason the custom term could not be generated is returned, and an error document is generated and presented to the user.

7. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the multi-center interaction module is responsible for coordinating and unifying the general medical term libraries and term codes of all medical data centers, and the use of custom standard terms is reviewed and coordinated by the highest-authority personnel of the multi-center interaction module.

8. The multi-center medical term standardization system based on a general medical term library according to claim 1, wherein, for the subsequent medical term standardization process of a medical data center that has already performed medical term mapping, the incremental update module mainly updates incremental data according to the earlier standardized term mapping records generated by the term mapping unit; for medical terms that still cannot be mapped, the custom term module is executed again.

9. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the exception handling module is used to save all logs while the system is running and to record whether each module runs normally; to classify and save error logs, including errors that occur while the system is running, errors that occur when each module is called, and errors that occur when a module maps a single term; and to classify and save terms that were not mapped successfully, including terms ignored in the automatic analysis module and in the custom term module, generating a failed-term document; the exception handling module supports database backtracking by timestamping the database, allowing the user to roll the matched database back to its data as of a specified date.
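The fuzzy matching steps of claim 1 can be sketched in Python as follows. Two assumptions are flagged loudly. First, the formula for the average weighted probability D is not reproduced in this text, so the sketch uses the plain arithmetic mean of P1...Pn as a stand-in. Second, segmentation is a whitespace split, whereas Chinese terms would need a real word segmenter. `FuzzyMatcher`, `segment` and the sample terms are hypothetical. The score is the negative absolute difference described in step (2.3).

```python
from collections import Counter


def segment(term: str) -> list[str]:
    # Stand-in segmenter (assumption): lowercase + whitespace split.
    return term.lower().split()


class FuzzyMatcher:
    def __init__(self, library: list[str]):
        self.library = library
        # Step (1): basic word frequency of every segment in the library.
        self.freq = Counter(seg for t in library for seg in segment(t))
        self.total = sum(self.freq.values())

    def prob(self, seg: str) -> float:
        # Probability of a segment from its basic word frequency (0 if unseen).
        return self.freq[seg] / self.total if self.total else 0.0

    def D(self, term: str) -> float:
        # ASSUMED form of the average weighted probability: mean of P1..Pn.
        segs = segment(term)
        return sum(self.prob(s) for s in segs) / len(segs) if segs else 0.0

    def match(self, m: str, k: int = 3) -> list[tuple[str, float]]:
        """Return up to k library terms with the highest matching degree."""
        m_segs = set(segment(m))
        # Step (2.1): term set A = library terms sharing a segment with M.
        a = [t for t in self.library if m_segs & set(segment(t))]
        d_m = self.D(m)
        # Step (2.3): negative absolute difference; larger means more similar.
        scored = [(t, -abs(d_m - self.D(t))) for t in a]
        scored.sort(key=lambda ts: ts[1], reverse=True)
        return scored[:k]
```

With the library `["aspirin enteric coated tablets", "aspirin tablets", "donkey hide gelatin granules", "ibuprofen tablets"]`, matching `"aspirin tablets"` ranks the identical term first (score 0), then `"ibuprofen tablets"`, then `"aspirin enteric coated tablets"`. This shows that a probability-difference score rewards similar frequency profiles rather than shared words, which is one reason the text has a person choose among the top candidates.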