CN110349639B - Multi-center medical term standardization system based on general medical term library - Google Patents

Info

Publication number
CN110349639B
Authority
CN
China
Prior art keywords
term
medical
module
mapping
terms
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910629244.9A
Other languages
Chinese (zh)
Other versions
CN110349639A (en
Inventor
李劲松
王执晓
周天舒
董凯奇
田雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN201910629244.9A priority Critical patent/CN110349639B/en
Publication of CN110349639A publication Critical patent/CN110349639A/en
Priority to JP2021533326A priority patent/JP7093593B2/en
Priority to PCT/CN2020/083586 priority patent/WO2020233256A1/en
Application granted granted Critical
Publication of CN110349639B publication Critical patent/CN110349639B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-center medical term standardization system based on a general medical term library. The system comprises a source database, a database connection management module, a pre-analysis module, a term mapping unit, an incremental update module, an exception handling module, and a multi-center interaction module. The invention solves the problem of standardizing medical terms across multiple medical data centers while keeping the expression of medical terms consistent among those centers. It automatically scans and analyzes the source database of each medical data center and, on that basis, automatically maps medical terms that already carry standard codes. It takes full account of the inherent complexity of medical term mapping by implementing a spiral, progressively escalating process from automatic mapping to fuzzy-matching mapping and then to custom term mapping. The incremental update mechanism makes full use of previous mapping records, which greatly reduces the burden of subsequent work and greatly improves the degree of standardization of medical term mapping.

Figure 201910629244

Description

Multi-center medical term standardization system based on general medical term library
Technical Field
The invention belongs to the field of term standardization, and particularly relates to a multi-center medical term standardization system based on a general medical term library.
Background
With the rapid development of medical informatization, the types and volume of medical data are growing quickly, and analyzing and mining the data of multiple medical data centers (referred to as "multi-center" for short) to support clinical decision-making, medical management services, and scientific research has become an inevitable trend. However, medical term standards in China are lacking and the existing system of standards is incomplete, while vendors of medical information systems are numerous. As a result, term names and codes are severely heterogeneous across medical data centers and even within a single center, and are accompanied by large amounts of semi-structured and unstructured data. Mature international term sets see only limited use domestically, and language barriers make it difficult to apply the mapping relations among existing international standard term sets to the standardization of domestic medical terms. For these reasons, medical information systems cannot interoperate, and standardizing and sharing medical data among multiple medical data centers is difficult.
The general medical term library is a standard library of medical concept terms that covers the whole medical process and is built around international general medical term sets such as the Systematized Nomenclature of Medicine, Clinical Terms (SNOMED CT), the International Classification of Diseases (ICD-10), the standardized nomenclature for clinical drugs (RxNorm), and Logical Observation Identifiers Names and Codes (LOINC). Once multi-center medical data are mapped to the unified general medical term library, operations such as big data analysis can be carried out conveniently. Before multi-center medical data can be analyzed, however, standardizing the terms of, and cleaning, the medical data held in different medical information systems is a major problem.
The prior-art schemes [CN 201510922676, a method and system for automatically constructing a mapping relationship of medical terms based on word segmentation codes], [CN 201710101827, a method and device for data standardization processing of medical big data], and [CN 201710152584, a method and device for determining medical synonyms] mostly start from Chinese word segmentation: they segment medical terms using string matching and similar segmentation methods, compute the similarity between medical terms, and select the term with the highest similarity to establish a mapping relationship with the target term. These schemes address only the matching of Chinese medical terms, not term standardization across entire medical information systems; they map only between Chinese medical terms and do not standardize Chinese medical terms against foreign standard medical term sets.
The prior-art patent document [CN 201610173625, a method and system for automatically standardizing a medical data dictionary] mainly establishes a cloud-based data dictionary standardization model at the logical level, which requires the term sets of all medical data centers to be extracted to the cloud for unified mapping.
At present, a widely used approach is for information technicians and staff with medical background knowledge, such as doctors, to determine one by one the mapping relationships between the data in a medical system and the general medical term library, and then to perform semi-automatic mapping, for example by executing SQL scripts, to obtain standardized medical terms. Another approach requires medical staff to enter data in a standardized format at the time of data entry. However, the current methods have significant disadvantages:
1. The prior art focuses only on establishing mapping relationships between medical terms and does not address the standardization of medical terms across entire medical information systems.
2. The existing schemes target one specific data model and therefore lack general applicability; they are also limited to mapping between Chinese medical terms and cannot establish mapping relationships with an international general medical term library.
3. For medical data coded with an international general medical term set, the mapping relations that already exist among the term sets are not fully exploited. Medical terms in a medical data center that do not use a standard medical term set, and terms the center has defined itself, are generally handled by fuzzy matching or simply discarded, and no complete mapping process and mechanism is established.
4. For term mapping that must involve staff with medical background knowledge, no reasonably friendly interactive interface and no standardized mechanism for manual review and exception handling are provided.
5. Because the data are cluttered, conventional medical term standardization does not combine term mapping with the subsequent data cleaning. The mapping relations among terms cannot actually be used to complete term standardization, the quality of the mapped data cannot be guaranteed, and the results of subsequent data analysis are seriously affected.
6. For medical data that have been mapped and cleaned, no detailed quality assessment mechanism is established to ensure the accuracy of the term mapping and data cleaning.
7. No mechanism is considered for handling updates to the relevant international general medical term libraries or for subsequent incremental updates.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-center medical term standardization system based on a general medical term library, which solves the multi-center medical term standardization problem based on a plurality of standardized medical term sets, simplifies the medical term mapping operation and enriches the whole process of medical term standardization.
The purpose of the invention is realized by the following technical scheme: a multi-center medical term standardization system based on a general medical term library comprises a source database, a database connection management module, a pre-analysis module, a term mapping unit, an increment updating module, an exception handling module and a multi-center interaction module;
the source database is distributed in the preposed servers of the medical data centers and stores the service data of the medical data centers;
the database connection management module: managing information required by accessing the source database, and providing support for the term mapping tool to access and modify the source database;
the pre-analysis module: automatically scanning the source database, counting the occurrence frequency of each medical term in the original medical data, suggesting that terms whose occurrence frequency is below a set threshold be discarded, and sending terms whose occurrence frequency is greater than or equal to the set threshold to the term mapping unit for subsequent term mapping;
the term mapping unit comprises an automatic mapping module, a fuzzy matching module and a self-defined term module;
the automatic mapping module: supporting automatic mapping of medical terms, and realizing multidirectional mapping for terms using international universal medical term library standard codes according to the mapping relation among the existing universal medical term library standard codes;
the fuzzy matching module: for medical terms that cannot be mapped directly through the mapping relations between standard codes in the existing medical term library, performing a traversal query of the general medical term library by fuzzy matching, and offering the several standard medical terms with the highest similarity for selection as the target term of the mapping;
the custom term module: for medical terms that can neither rely on the mapping relations between standard codes in the existing medical term library nor be fuzzily matched to a target term in the existing general medical term library, sending the custom term application generated by the user to the multi-center interaction module for review and feedback;
the multi-center interaction module: after receiving the self-defined term application of each medical data center sent by the self-defined term module, auditing the self-defined terms, adding the self-defined terms which are approved as standard terms into the general medical term library, and sending the standard terms to each medical data center to keep the general medical term libraries of each medical data center consistent;
the incremental update module: for a source database that has already undergone standardized medical term mapping and that generates incremental data for business reasons, calling the historical mapping records generated by the term mapping unit to complete the standardized term mapping of the incremental data;
the exception handling module: recording the execution process of each module, generating an error log aiming at the error occurrence condition, and backtracking the whole medical term mapping process according to the error log.
Further, the system also comprises a data cleaning module, which formulates cleaning rules, assigns a weight to each data element, and screens out severely incomplete data; this includes cleaning dirty data at both the schema level and the instance level.
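One way to read "assigning a weight to each data element and screening out severely incomplete data" is a weighted completeness score with a cut-off. The sketch below illustrates that reading only; the weights, the field names, the `completeness` and `screen` helpers, and the 0.5 cut-off are all hypothetical and do not come from the patent.

```python
from typing import Mapping, Optional

# Hypothetical integer weights per data element; the source gives no values.
WEIGHTS = {"patient_id": 5, "diagnosis_code": 3, "birth_date": 2}


def completeness(record: Mapping[str, Optional[str]],
                 weights: Mapping[str, int]) -> float:
    """Weighted share of data elements that are present (not None or blank)."""
    total = sum(weights.values())
    present = sum(w for name, w in weights.items()
                  if record.get(name) not in (None, ""))
    return present / total


def screen(records, weights, min_score):
    """Split records into (kept, rejected) by weighted completeness."""
    kept, rejected = [], []
    for record in records:
        target = kept if completeness(record, weights) >= min_score else rejected
        target.append(record)
    return kept, rejected


records = [
    {"patient_id": "1", "diagnosis_code": "I10", "birth_date": "1970-01-01"},
    {"patient_id": "2", "diagnosis_code": "", "birth_date": None},
    {"patient_id": None, "diagnosis_code": "E11", "birth_date": None},
]
# Assumed cut-off: keep records whose weighted completeness is at least 0.5.
kept, rejected = screen(records, WEIGHTS, min_score=0.5)
```

Integer weights keep the score arithmetic exact; a real rule set would also need the schema-level checks mentioned above, which this sketch does not attempt.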
Further, the database connection management module is specifically a JDBC module formed by classes and interfaces written in a programming language; it provides a uniform access interface to various databases and implements the functions of establishing connections with databases or other data sources, sending SQL commands to the databases, and processing the results the databases return.
Further, after the database connection management module has connected to the source database, the pre-analysis module automatically scans the structural information of all data in the source database and the statistical information of specific fields, and generates a statistical table consisting of two parts:
first, summary statistics for all tables in the source database, comprising field names, value types, the maximum length of all values, the total number of rows in each table, and the proportion of null values;
second, statistics on the detailed information and occurrence frequency of specific terms in a specific table, with the terms sorted by occurrence frequency in descending order so that subsequent term mapping can give priority to the most frequent terms.
Further, the automatic mapping module: for terms in the source database that use standard codes from the international general medical term library, the standard to which the codes belong is first determined and the target term set to map to is then selected. If a usable mapping relation exists between the codes of the standard term set that the source terms belong to and the codes of the target term set, the mapping SQL statements are generated automatically, the terms in the source database are mapped automatically, and the corresponding data loading is completed.
Further, in the fuzzy matching module, a specific method of fuzzy matching is as follows:
(1) Term segmentation: segment every term in the general medical term library into words, and count the frequency of each resulting word to obtain the base word frequencies; the source medical term M that needs fuzzy matching is likewise segmented before matching.
(2) Fuzzy matching: the difference in probability between medical terms is used as the measure of similarity, as follows:
(2.1) from the general medical term library, select all terms that contain any of the segmented words of M, segment them, and combine them into the term set A;
(2.2) compute the average weighted probability D of the term M and of every term in the term set A using the following formula, where n is the number of segmented words obtained from each term and P1, P2, P3, P4, …, Pn are the probabilities of those words in the base word frequencies:
[Formula image BDA0002128204250000041: average weighted probability D of a term in terms of P1 … Pn and n]
(2.3) subtract the average weighted probability of each standard term in the term set A from that of the term M to be matched, and take the negative of the absolute difference as the matching degree; the larger the matching degree, the higher the similarity between the two. The formula is:
S(M,A) = -|D(M) - D(A)|
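The fuzzy-matching steps above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: whitespace tokenization stands in for the unspecified segmentation rule, D is assumed to be the plain mean of the word probabilities (the formula image is not available in the text), and the matching degree is taken as the negated absolute difference so that larger values mean higher similarity, as the text describes. The function names and the sample library are hypothetical.

```python
from collections import Counter


def segment(term: str) -> list[str]:
    # Assumption: whitespace tokenization replaces the unspecified rule.
    return term.lower().split()


def base_frequencies(library: list[str]) -> dict[str, float]:
    """Step (1): probability of each segmented word across the whole library."""
    counts = Counter(word for term in library for word in segment(term))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def d_score(term: str, freq: dict[str, float]) -> float:
    # Assumption: D is the mean of the word probabilities (P1 + ... + Pn) / n.
    words = segment(term)
    if not words:
        return 0.0
    return sum(freq.get(word, 0.0) for word in words) / len(words)


def candidates(source: str, library: list[str]) -> list[str]:
    """Step (2.1): library terms sharing at least one word with the source term."""
    source_words = set(segment(source))
    return [term for term in library if source_words & set(segment(term))]


def rank(source: str, library: list[str], top_k: int = 3):
    """Steps (2.2)-(2.3): score candidates by -|D(M) - D(a)|, best first."""
    freq = base_frequencies(library)
    d_m = d_score(source, freq)
    scored = [(-abs(d_m - d_score(term, freq)), term)
              for term in candidates(source, library)]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical miniature term library for illustration.
LIBRARY = [
    "essential hypertension",
    "pulmonary arterial hypertension",
    "hypertension",
    "type 2 diabetes mellitus",
]
result = rank("hypertension essential", LIBRARY)
```

Because the score depends only on word probabilities, different terms can tie; this is one reason the recommended candidates are passed to a reviewer rather than applied directly.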
Further, the custom term module: constraints are defined in advance to avoid conflicts between custom terms and known standard terms. When custom terms are added, the added custom standard terms must be kept consistent across all medical data centers and repeated additions must be prevented, so that multi-center medical data can be shared once they have been standardized through term mapping. Before a custom term is added, a request to add it is submitted to the multi-center interaction module; the request contains the custom term to be added, a detailed description of the custom term, and the code of the custom term. If the operators of the multi-center interaction module approve the request, having confirmed that no custom code already exists for a duplicate or similar medical term, a custom standard term code is generated, and the automatic mapping module is then called to complete the term mapping and load the data the term covers. If the request is not approved, either the existing custom term code is returned so that the medical data center can complete the subsequent mapping, or the reason the custom term could not be generated is returned, an error document is generated, and the user is notified.
Furthermore, the multi-center interaction module is responsible for coordinating and unifying the general medical term libraries and term codes thereof of all the medical data centers, and the personnel with the highest authority of the multi-center interaction module checks and coordinates the use problem of the custom standard terms.
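The request-and-review flow for custom terms might be modeled as below. This is a minimal sketch: the review in the patent is performed by authorized staff, whereas this code automates only the duplicate and conflict checks. The class names, the `CUSTOM-0001` code format, and the sample requests are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CustomTermRequest:
    center: str          # requesting medical data center
    term: str            # custom term to be added
    description: str     # detailed description of the custom term
    proposed_code: str   # code proposed by the requesting center


@dataclass
class TermCoordinator:
    """Hypothetical stand-in for the multi-center interaction module."""
    standard_terms: dict[str, str] = field(default_factory=dict)
    custom_terms: dict[str, str] = field(default_factory=dict)

    def review(self, request: CustomTermRequest) -> tuple[str, str]:
        key = request.term.strip().lower()
        if key in self.standard_terms:
            # Conflict with a known standard term: return the failure reason.
            return "rejected", f"already a standard term: {self.standard_terms[key]}"
        if key in self.custom_terms:
            # Duplicate request: return the existing custom code for reuse.
            return "existing", self.custom_terms[key]
        code = f"CUSTOM-{len(self.custom_terms) + 1:04d}"
        self.custom_terms[key] = code
        return "approved", code


coordinator = TermCoordinator(standard_terms={"essential hypertension": "I10"})
first = coordinator.review(
    CustomTermRequest("center_a", "Ward fall risk", "local nursing score", "A-001"))
second = coordinator.review(
    CustomTermRequest("center_b", "ward fall risk ", "same concept", "B-17"))
third = coordinator.review(
    CustomTermRequest("center_b", "Essential hypertension", "duplicate", "B-18"))
```

Issuing codes from a single coordinator is what keeps the custom terms consistent across centers: the second center receives the code already issued to the first instead of creating its own.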
Further, the incremental update module serves the subsequent term standardization of a medical data center that has already run medical term mapping: incremental data are updated mainly according to the mapping records produced by the term mapping unit during earlier term standardization, and the custom term module is executed again for medical terms that still cannot be mapped.
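As a rough illustration of incremental updating, the sketch below replays a stored mapping record against newly arrived terms and returns the terms that still need handling. The `apply_incremental` function, the dictionary representation of the mapping record, and the sample local terms are assumptions made for this example.

```python
def apply_incremental(history: dict[str, str], new_terms: list[str]):
    """Map new terms with historical records; return (mapped, unresolved)."""
    mapped: dict[str, str] = {}
    unresolved: list[str] = []
    for term in new_terms:
        if term in history:
            mapped[term] = history[term]
        elif term not in unresolved:
            # Still unmapped: these go back to the custom term module.
            unresolved.append(term)
    return mapped, unresolved


# Hypothetical mapping record from an earlier standardization run.
history = {"HTN": "I10", "T2DM": "E11"}
mapped, unresolved = apply_incremental(history, ["HTN", "T2DM", "FALL-RISK", "HTN"])
```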
Further, the exception handling module: it stores all logs produced while the system runs and records whether each module operates normally. Error logs are classified and saved, comprising errors that occur while the system is running, errors that occur when a module is called, and errors that occur when a module maps a single term. Terms that are not mapped successfully, including terms discarded in the automatic analysis module and terms discarded in the custom term module, are classified and saved, and a failed-term document is generated. By setting timestamps on the database, the exception handling module supports database backtracking, allowing a user to roll the mapped database back to the data of a specified date.
The beneficial effects of the invention are as follows: the medical term standardization problem of multiple medical data centers is solved systematically, and the expression of medical terms is kept consistent across the centers; the source database of each medical data center is scanned and analyzed automatically, and on that basis medical terms with standard codes are mapped automatically; the complexity of medical term mapping is fully considered, and a spiral, progressively escalating process from automatic mapping to fuzzy-matching mapping and then to custom term mapping is implemented; the incremental update mechanism makes full use of previous mapping records, greatly reduces the burden of subsequent work, and greatly improves the standardization of medical term mapping.
Drawings
FIG. 1 is a system flow diagram;
FIG. 2 is a system data flow diagram;
FIG. 3 is a schematic diagram of JDBC implementing database connection management;
FIG. 4 is a flow chart of medical term standardized mapping;
FIG. 5 is a diagram of the multi-center interaction principle.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
As shown in fig. 1, the system for standardizing a multicenter medical term based on a universal medical term library provided by the present invention includes a source database, a database connection management module, a pre-analysis module, a term mapping unit, an increment update module, an exception handling module, a multicenter interaction module, and may further include a data cleaning module;
the source database is distributed in the prepositive servers of the medical data centers and stores service data of medical information systems such as HIS, LIS, PACS, EMR and the like of the medical data centers, wherein the service data comprises basic information of patients, information of treatment, cost information, diagnosis information, medication information, operation information, inspection information, examination information, text case history information and nursing vital sign information;
the database connection management module: managing (including loading, modifying, storing) information needed to access the source database, providing support for the term mapping tool to access and modify different types of source databases;
a pre-analysis module: automatically scanning the source database, counting the occurrence frequency of each medical term in the original medical data, suggesting that terms whose occurrence frequency is below a set threshold be discarded, and sending terms whose occurrence frequency is greater than or equal to the set threshold to the term mapping unit for subsequent term mapping;
the term mapping unit comprises an automatic mapping module, a fuzzy matching module and a self-defined term module;
an automatic mapping module: supporting automatic mapping of medical terms, realizing multidirectional mapping for terms using standard codes of the international universal medical term library according to the mapping relation among the standard codes of the existing universal medical term library, and only needing to perform quality control on mapping results;
a fuzzy matching module: for medical terms which cannot be mapped directly according to the mapping relation between the standard codes in the existing medical term library, traversal query can be performed in the general medical term library in a fuzzy matching mode, and several groups of standard medical terms with highest similarity are provided for selection as target terms of the term mapping;
a custom term module: for medical terms which cannot depend on the mapping relation between standard codes in the existing medical term library and cannot be matched with target terms in the existing general medical term library in a fuzzy manner, after a user generates a self-defined term application (which can be determined by technical personnel and doctors together), the self-defined term application is sent to a multi-center interaction module for auditing and feeding back the self-defined term application;
a multi-center interaction module: after receiving the self-defined term application of each medical data center sent by the self-defined term module, auditing the self-defined terms, adding the self-defined terms which are approved as standard terms into the general medical term library, and sending the standard terms to each medical data center to keep the general medical term libraries of each medical data center consistent;
an incremental update module: for a source database that has already undergone standardized medical term mapping and that generates incremental data for business reasons, calling the historical mapping records generated by the term mapping unit to complete the standardized term mapping of the incremental data;
an exception handling module: recording the execution of each module and, in particular, generating an error log whenever an error occurs, so that the whole medical term mapping process can be traced back from the error log;
a data cleaning module: formulating cleaning rules, assigning a weight to each data element, screening out severely incomplete data, and improving data quality.
The specific implementation of each module is as follows:
First, database connection management module
This module manages (loads, modifies, and stores) the information needed to access the source database; the source database and the target database may physically be the same database system. It is implemented mainly as a JDBC module formed by classes and interfaces written in the existing Java programming language, which provides a uniform access interface to various databases and gives the system good cross-platform behavior. Its main functions are establishing connections with databases or other data sources, sending SQL commands to the databases, and processing the results they return; a schematic diagram is shown in FIG. 3.
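The role of a connection manager (store the access information, open the connection, send SQL, and hand back the results) can be sketched in a few lines. Note the substitution: the patent describes a JDBC module in Java, while this sketch uses Python's standard `sqlite3` module with an in-memory database purely as a stand-in. The `ConnectionManager` class and the `center_a` source name are hypothetical.

```python
import sqlite3


class ConnectionManager:
    """Keeps one open connection per registered source database."""

    def __init__(self):
        self._connections = {}

    def register(self, name: str, database: str) -> None:
        # `database` is a file path or ":memory:" for a throwaway database.
        self._connections[name] = sqlite3.connect(database)

    def execute(self, name: str, sql: str, params=()):
        conn = self._connections[name]
        with conn:  # commits on success, rolls back if an exception is raised
            return conn.execute(sql, params).fetchall()

    def close_all(self) -> None:
        for conn in self._connections.values():
            conn.close()
        self._connections.clear()


manager = ConnectionManager()
manager.register("center_a", ":memory:")
manager.execute("center_a", "CREATE TABLE patient (id INTEGER, name TEXT)")
manager.execute("center_a", "INSERT INTO patient VALUES (?, ?)", (1, "Li"))
rows = manager.execute("center_a", "SELECT id, name FROM patient")
manager.close_all()
```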
Second, pre-analysis module
After the database connection management module has connected to the source database, this module automatically scans the structural information of all data in the source database and the statistical information of specific fields, and generates a statistical table consisting of two parts:
First, summary statistics for all tables in the source database, including field names, value types, the maximum length of all values, the total number of rows in each table, and the proportion of null values, as follows:
A           B                       C           D               E          F
Table name  Column name             Value type  Maximum length  Row count  Null ratio
PATIENT     Patient identification  NUMBER      8               3000       0
PATIENT     Name                    VARCHAR2    20              3000       0
PATIENT     Date of birth           DATE        10              3000       0
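A summary like the one above could be produced by a small profiling routine. The sketch below is one possible version, assuming an SQLite source (the patent does not name a database product) and trusted table and column names, since they are interpolated into the SQL text. The `profile_table` function and the sample rows are hypothetical.

```python
import sqlite3


def profile_table(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Return one summary row per column: type, max length, rows, null ratio."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    stats = []
    for _cid, column, declared_type, *_rest in columns:
        max_length, nulls = conn.execute(
            f"SELECT MAX(LENGTH({column})), SUM({column} IS NULL) FROM {table}"
        ).fetchone()
        stats.append({
            "table": table,
            "column": column,
            "type": declared_type,
            "max_length": max_length,
            "rows": total,
            "null_ratio": (nulls or 0) / total if total else 0.0,
        })
    return stats


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PATIENT (patient_id INTEGER, name TEXT, birth_date TEXT)")
conn.executemany(
    "INSERT INTO PATIENT VALUES (?, ?, ?)",
    [(1, "Li", "1970-01-01"), (2, None, "1985-06-30"), (3, "Wang", "1990-12-12")],
)
stats = profile_table(conn, "PATIENT")
```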
Second, statistics on the detailed information and occurrence frequency of specific terms in a specific table. Terms are sorted by occurrence frequency in descending order so that subsequent term mapping can give priority to the most frequent terms. The system suggests whether terms with an extremely low occurrence frequency need to take part in subsequent term mapping. When no threshold is defined, all terms take part in mapping by default; the user can adjust the parameter according to the situation to set the minimum occurrence frequency below which terms are excluded from subsequent term mapping. This greatly simplifies the subsequent mapping process, reduces the workload, and improves data quality.
A        B       C
Code     Sex     Frequency
Z03.001  Male    200
Z03.002  Female  100
For example: let A be a non-standard term that occurs N1 times, and let N2 be the total number of data records; the frequency of A is then P = N1/N2. Let M be the minimum occurrence frequency required to take part in mapping. If P ≥ M, A is a mapping object; if P < M, A is a non-standard term with an extremely low occurrence frequency and does not take part in subsequent term mapping. M is a threshold set by the user according to actual conditions.
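The threshold rule can be illustrated with a short frequency count. In this sketch the frequency of a term is taken as its number of occurrences divided by the total number of records, which is an interpretation of the example above; the `split_by_frequency` function, the `UNKNOWN` value, and the 0.05 threshold are hypothetical. A threshold of 0 reproduces the default in which every term takes part in mapping.

```python
from collections import Counter


def split_by_frequency(values, min_frequency: float = 0.0):
    """Return (to_map, to_discard) as lists of (term, count), most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    to_map, to_discard = [], []
    for term, count in counts.most_common():
        if count / total >= min_frequency:
            to_map.append((term, count))
        else:
            to_discard.append((term, count))
    return to_map, to_discard


# Counts for Z03.001 and Z03.002 follow the table above; UNKNOWN is made up.
column = ["Z03.001"] * 200 + ["Z03.002"] * 100 + ["UNKNOWN"] * 3
to_map, to_discard = split_by_frequency(column, min_frequency=0.05)
everything, nothing = split_by_frequency(column)
```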
The document information generated by the module supports the export of formats such as pdf, excel, CSV and the like.
Third, automatic mapping module
For terms in the source database that use standard codes from the international general medical term library, the standard to which the codes belong is first determined and the target term set to map to is then selected. If a usable mapping relation exists between the codes of the standard term set that the source terms belong to and the codes of the target term set, the system automatically generates the mapping SQL statements, maps the terms in the source database automatically, and completes the corresponding data loading.
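Generating a mapping SQL statement from an existing code-to-code relation might look like the following. This is a sketch only: it assumes the relation is stored in a crosswalk table with `source_code` and `target_code` columns, uses SQLite for the demonstration, and works with placeholder codes (`SRC-…`, `TGT-…`) rather than real vocabulary codes. The `build_mapping_sql` function and all table and column names are hypothetical, and the identifiers are assumed to be trusted.

```python
import sqlite3


def build_mapping_sql(table: str, code_column: str, target_column: str,
                      crosswalk: str = "crosswalk") -> str:
    """Build an UPDATE that fills target_column from the crosswalk table."""
    return (
        f"UPDATE {table} SET {target_column} = ("
        f"SELECT target_code FROM {crosswalk} "
        f"WHERE {crosswalk}.source_code = {table}.{code_column})"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crosswalk (source_code TEXT PRIMARY KEY, target_code TEXT);
CREATE TABLE diagnosis (local_code TEXT, std_code TEXT);
INSERT INTO crosswalk VALUES ('SRC-1', 'TGT-A'), ('SRC-2', 'TGT-B');
INSERT INTO diagnosis (local_code) VALUES ('SRC-1'), ('SRC-2'), ('SRC-9');
""")
sql = build_mapping_sql("diagnosis", "local_code", "std_code")
conn.execute(sql)
rows = conn.execute(
    "SELECT local_code, std_code FROM diagnosis ORDER BY local_code"
).fetchall()
```

Rows whose code has no entry in the crosswalk are left with a NULL target, which marks them as candidates for the fuzzy matching step.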
Fourth, fuzzy matching module
For medical terms that cannot be mapped automatically, fuzzy matching is performed one by one against the standard terms in the general medical term library, and the module returns the recommended standard terms together with the codes of the standard term sets they belong to. Fuzzy matching generally recommends several standard terms as candidates, so a professional with a medical background must manually select the unique match. Once the mapping relation is determined, the automatic mapping module is called to complete the mapping of the medical term and the loading of the data it covers. The specific method of fuzzy matching is as follows:
(1) Term segmentation
Most medical terms consist of several words or phrases, so each medical term is split into word segments according to specific rules.
(1.1) All terms in the general medical term library are segmented in this way, and the frequency of each word segment is counted to form the basic word frequency.
(1.2) A source medical term that needs fuzzy matching is likewise segmented before matching. For example, segmenting the term M yields [segment 1, segment 2, …, segment n].
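The basic word frequency of step (1.1) can be sketched as follows. The patent does not state the segmentation rule, so `segment` below simply splits on whitespace as a stand-in; a real system would use a proper word segmenter for Chinese medical terms. The library contents are illustrative only.

```python
from collections import Counter

def segment(term):
    """Placeholder segmentation rule (an assumption): split on whitespace."""
    return term.lower().split()

def base_word_frequency(library_terms):
    """Map each word segment to its relative frequency in the term library."""
    counts = Counter(tok for term in library_terms for tok in segment(term))
    total = sum(counts.values())
    return {tok: count / total for tok, count in counts.items()}

# Hyphens keep multi-word segments together for the whitespace rule.
library = [
    "donkey-hide-gelatin calcium oral-liquid",
    "donkey-hide-gelatin granules",
    "donkey-hide-gelatin blood-enriching oral-liquid",
]
freq = base_word_frequency(library)
```

The resulting dictionary is the lookup table from which the probabilities P1, …, Pn of a term's segments are read in the matching step.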
(2) Fuzzy matching
The invention uses the difference in probability between medical terms as the measure of similarity. The specific operations are as follows:
(2.1) All terms containing any of the word segments of M are selected from the general medical term library and segmented, forming the term set A {a, b, c, d, e, …};
(2.2) The matching degree is calculated with the following formula, which gives the average weighted probability D of the term M and of every term in the term set A, where n is the number of word segments of the term and P1, P2, P3, P4, …, Pn are the probabilities of each word segment in the basic word frequency:
[Formula image BDA0002128204250000081: average weighted probability D of a term, expressed in terms of n and P1, …, Pn]
(2.3) The difference between the average weighted probability of the term M to be matched and that of each standard term in the term set A is computed, and the negative of its absolute value is taken as the matching degree; the larger the matching degree, the higher the similarity between the two terms. The formula is:
S(M,A) = -|D(M) - D(A)|
Take the term "donkey-hide gelatin oral liquid for prolonging life" as an example:
a) The terms in the general medical term library are segmented, and the probability of each word segment is obtained;
b) The term "donkey-hide gelatin oral liquid for prolonging life" is segmented into "donkey-hide gelatin \ longevity \ oral liquid". The corresponding probabilities are looked up in the basic word frequency, giving p1 for "donkey-hide gelatin", p2 for "longevity" and p3 for "oral liquid", and the average probability D(M) of the word segments is calculated;
c) All terms in the general medical term library that contain "donkey-hide gelatin", "longevity" or "oral liquid" are retrieved and segmented, giving the term set A {["donkey-hide gelatin", "calcium", "oral liquid"], ["donkey-hide gelatin", "granule"], ["donkey-hide gelatin", "blood enriching", "oral liquid"], …}, from which D(a), D(b), D(c), … are obtained;
d) The matching degrees are calculated and sorted:
Term to be matched                                    General term library term                          Matching degree
Donkey-hide gelatin oral liquid for prolonging life   Donkey-hide gelatin calcium oral liquid            S(M,a)
                                                      Donkey-hide gelatin blood-enriching oral liquid    S(M,c)
                                                      Donkey-hide gelatin granules                       S(M,b)
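The matching and ranking steps can be sketched end to end as below. Several points are assumptions rather than statements of the patent: the formula image is not available, so D is taken to be the plain arithmetic mean of the segment probabilities; segments absent from the library are given probability 0; segmentation is a whitespace split; and the matching degree is the negated absolute difference described in step (2.3). All function names and the library contents are hypothetical.

```python
from collections import Counter

def segment(term):
    """Placeholder segmentation rule (an assumption): split on whitespace."""
    return term.lower().split()

def average_probability(tokens, freq):
    """Assumed form of D: mean segment probability, 0 for unseen segments."""
    if not tokens:
        return 0.0
    return sum(freq.get(tok, 0.0) for tok in tokens) / len(tokens)

def rank_candidates(source_term, library_terms):
    """Return (library term, matching degree) pairs, best match first."""
    counts = Counter(tok for term in library_terms for tok in segment(term))
    total = sum(counts.values())
    freq = {tok: count / total for tok, count in counts.items()}

    source_tokens = segment(source_term)
    d_m = average_probability(source_tokens, freq)
    scored = []
    for term in library_terms:
        tokens = segment(term)
        if set(tokens) & set(source_tokens):  # step (2.1): shares a segment
            degree = -abs(d_m - average_probability(tokens, freq))  # step (2.3)
            scored.append((term, degree))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

library = [
    "donkey-hide-gelatin calcium oral-liquid",
    "donkey-hide-gelatin granules",
    "donkey-hide-gelatin blood-enriching oral-liquid",
    "aspirin tablets",
]
ranked = rank_candidates("donkey-hide-gelatin longevity oral-liquid", library)
```

The ranked list is only a recommendation: as the description states, a professional with a medical background still chooses the final match from the candidates.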
Fifth, custom term module
In complex cases, and especially given that data in domestic medical data centers are redundant and contain many medical terms related to traditional Chinese medicine and traditional treatments, some terms cannot be matched against the international general medical term library. The custom term module can define the necessary constraints in advance to prevent custom terms from conflicting with known standard terms; for example, custom terms can be required to use a reserved coding range.
When custom terms are added, the custom standard terms already added must be kept consistent across all medical data centers to prevent duplicate additions and to ensure that multi-center medical data can be shared once they have been standardized through term mapping. Therefore, when term standardization mapping is performed on the data of a medical data center, a request to add the custom term is submitted to the multi-center interaction module before the term is added. The request contains the custom term to be added, a detailed description of the term, and the code of the term (generated automatically by the system). If the operators of the relevant centers approve the request, having confirmed that no similar or duplicate medical term already has a custom code, a custom standard term code is generated, and the automatic mapping module can then be called to complete the term mapping and the loading of the covered data. If the request is rejected, either the existing custom term code is returned so that the medical data center can complete the subsequent mapping, or the reason why the custom term could not be generated is returned, an error document is generated, and the user is notified. The operation of the custom term module is shown schematically in Fig. 4.
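The request and review flow can be illustrated with a small registry. In the patent the review is carried out by operators; the sketch below automates only the duplicate check and the code generation, and everything in it is hypothetical: the class `CustomTermRegistry`, the reserved prefix `CUST-`, the code `STD-1` and the returned tuple layout.

```python
from dataclasses import dataclass, field

CUSTOM_PREFIX = "CUST-"  # hypothetical reserved coding range for custom terms

@dataclass
class CustomTermRegistry:
    standard_terms: dict = field(default_factory=dict)  # term name -> code
    custom_terms: dict = field(default_factory=dict)    # term name -> code
    descriptions: dict = field(default_factory=dict)    # term name -> description
    next_id: int = 1

    def review(self, name, description):
        """Return (status, code, reason) for a request to add a custom term."""
        key = name.strip().lower()
        if key in self.standard_terms:
            return ("rejected", self.standard_terms[key], "already a standard term")
        if key in self.custom_terms:
            # Hand back the existing code so the center can finish its mapping.
            return ("rejected", self.custom_terms[key], "custom term already registered")
        code = f"{CUSTOM_PREFIX}{self.next_id:06d}"
        self.next_id += 1
        self.custom_terms[key] = code
        self.descriptions[key] = description
        return ("approved", code, "")

registry = CustomTermRegistry(standard_terms={"aspirin": "STD-1"})
first = registry.review("Donkey-hide gelatin longevity oral liquid", "TCM oral preparation")
repeat = registry.review(" donkey-hide gelatin longevity oral liquid ", "same term, second center")
clash = registry.review("Aspirin", "already standardized")
```

Keeping one registry for all centers is what prevents two centers from registering the same term under different codes.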
Sixth, multi-center interaction module
To achieve data standardization and data sharing among the medical information systems of the medical data centers, all centers must use a unified general medical term library and unified medical term set codes. The invention adds custom standard terms centrally, after they have been submitted and reviewed, which prevents the medical data centers from expressing the same term differently when defining custom standard terms. The submission, review and authorization process requires interaction among multiple medical data centers. The multi-center interaction module is responsible for coordinating and unifying the general medical term libraries and term codes of all medical data centers, and its highest-authority personnel review and coordinate the use of custom standard terms. The multi-center custom term interaction network is shown in Fig. 5.
Seventh, incremental update module
This module handles subsequent medical term standardization in a medical data center where term mapping has already been performed. It updates incremental data mainly on the basis of the earlier term standardization mapping records generated by the term mapping unit; for medical terms that still cannot be mapped to a standard term, the custom term module is executed again.
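A minimal sketch of the incremental update, assuming the earlier mapping records are available as a dictionary from source term to standard term. The function name, the record layout and the codes are hypothetical.

```python
def apply_incremental(records, mapping_history):
    """Map new records with the stored mapping records.

    Returns (mapped, unresolved); unresolved records carry terms with no
    stored mapping and would be passed to the custom term module again.
    """
    mapped, unresolved = [], []
    for record in records:
        target = mapping_history.get(record["term"])
        if target is None:
            unresolved.append(record)
        else:
            mapped.append({**record, "term": target})  # copy, do not mutate input
    return mapped, unresolved

history = {"SRC-A": "TGT-1", "SRC-B": "TGT-2"}  # earlier mapping records
increment = [
    {"patient_id": 7, "term": "SRC-A"},
    {"patient_id": 8, "term": "LOCAL-X"},
]
mapped, unresolved = apply_incremental(increment, history)
```

Because the input records are copied rather than modified, the original increment stays available if the run has to be repeated.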
Eighth, exception handling module
The exception handling module stores all logs produced while the system is running and records whether each module runs normally. It saves error logs by category, covering errors that occur while the system is running, errors that occur when a module is called, and errors that occur when a module maps an individual term. It also saves, by category, the terms that were not mapped successfully, including terms ignored in the automatic analysis module and terms ignored in the custom term module, and generates a failed-term document. By setting timestamps on the database, the exception handling module supports database backtracking, allowing the user to roll the matched database back to the data of a specified date.
Ninth, data cleaning module
After the standardized mapping of medical terms is complete, the medical data must be cleaned to improve their quality for subsequent data mining and analysis. The module provides common data cleaning strategies that mainly clean dirty data at the structure level and at the instance level. Structure-level dirty data are data that violate the data schema or integrity constraints, such as out-of-range values, broken attribute dependencies, broken uniqueness constraints and broken referential integrity. Instance-level dirty data are data with erroneous attribute values or broken dependencies between attributes, such as missing values, duplicate records, contradictory records and reference errors. Cleaning aims to satisfy the integrity, uniqueness, authority, legality and consistency of the data as far as possible, reduce data redundancy and improve data quality.
1) Structure-level cleaning rules: unified data schema definitions (including data types); unified integrity constraint definitions; unified functional dependency definitions.
2) Instance-level cleaning rules: analyze the dirty data, formulate cleaning rules, evaluate and verify them, and record every cleaning action in a log for traceability.
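The cleaning rules can be illustrated with a small filter that drops records with missing required values, out-of-range values or exact duplicates, and logs every action for traceability. This is a simplified sketch: the rule names, field names and age range are invented for the example, and contradictory records and reference errors are not covered.

```python
def clean_records(records, required, ranges):
    """Return (clean_rows, log); log entries are (row_index, rule, fields)."""
    clean_rows, log, seen = [], [], set()
    for index, row in enumerate(records):
        missing = [f for f in required if row.get(f) in (None, "")]
        if missing:
            log.append((index, "missing_value", missing))
            continue
        out_of_range = [
            f for f, (low, high) in ranges.items()
            if row.get(f) is not None and not (low <= row[f] <= high)
        ]
        if out_of_range:
            log.append((index, "out_of_range", out_of_range))
            continue
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key in seen:
            log.append((index, "duplicate_record", []))
            continue
        seen.add(key)
        clean_rows.append(row)
    return clean_rows, log

records = [
    {"patient_id": 1, "sex": "Z03.001", "age": 42},
    {"patient_id": 1, "sex": "Z03.001", "age": 42},   # duplicate record
    {"patient_id": 2, "sex": "", "age": 30},          # missing value
    {"patient_id": 3, "sex": "Z03.002", "age": 430},  # value out of range
]
clean_rows, log = clean_records(
    records, required=["patient_id", "sex"], ranges={"age": (0, 130)}
)
```

The log keeps the row index and the rule that fired, so each removal can be traced back to its source record during evaluation.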
As the requirements on data volume and quality for data mining and analysis keep rising, the invention provides a collaboration mode that enables data sharing among multiple medical data centers (mainly hospitals) while fully protecting the data security of each center, so that shared medical data can be used to optimize medical processes, accelerate related scientific research and ultimately improve the quality of medical services for patients. The premise of data sharing among multiple medical data centers is the standardization of medical data, which has two parts: standardization of the data structure and standardization of medical terms. The invention addresses the latter. The technical points of the invention are summarized as follows:
1. Through the interaction among the modules, the database in the medical information system is analyzed and scanned automatically, and statistics such as the occurrence frequency of the medical terms in the database are returned, providing a practical basis for subsequent medical term mapping and performance optimization.
2. For data that use international general medical term set codes, the medical data covered by these terms are mapped automatically according to the existing mapping relations between medical term set codes.
3. For data in a medical data center that are not coded with a standard term set, such as self-defined medical terms or terms specific to China (for example, traditional Chinese medicine), the system provides information such as the occurrence frequency of these terms in the medical data center, and supports the relevant personnel in performing reasonable fuzzy matching in a visual manner or in adding custom standard terms directly.
4. Interaction requests between the medical data centers are processed at regular intervals, so that after the medical terms of each center have been standardized the general medical term libraries of all centers remain consistent and data sharing becomes possible.
5. The data are cleaned according to the cleaning strategy to ensure data quality.
6. All errors and exceptions are recorded and written to the log, which facilitates error checking, quality evaluation and similar functions.
7. The mapping relations already established between the medical terms in a medical data center and the international general standard medical term sets are fully reused, so that subsequent term mapping and standardization in that center can be semi-automatic or even fully automatic.
The above are merely examples of the present invention and are not intended to limit its scope. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention without inventive effort falls within the scope of protection of the present invention.

Claims (9)

1. A multi-center medical term standardization system based on a general medical term library, characterized in that the system comprises a source database, a database connection management module, a pre-analysis module, a term mapping unit, an incremental update module, an exception handling module and a multi-center interaction module;
the source database is distributed across the front-end servers of the medical data centers and stores the business data of each medical data center;
the database connection management module manages the information required to access the source database and supports the term mapping tool in accessing and modifying the source database;
the pre-analysis module automatically scans the source database, counts the occurrence frequency of each medical term in the original medical data, gives a discard suggestion for terms whose occurrence frequency is less than a set threshold, and sends terms whose occurrence frequency is greater than or equal to the set threshold to the term mapping unit for subsequent term mapping;
the term mapping unit comprises an automatic mapping module, a fuzzy matching module and a custom term module;
the automatic mapping module supports automated mapping of medical terms and, for terms that use standard codes of an international general medical term library, realizes multi-directional mapping according to the existing mapping relations between the standard codes of the general medical term library;
the fuzzy matching module, for medical terms that cannot be mapped directly according to the mapping relations between standard codes within the existing medical term library, performs a traversal query of the general medical term library by fuzzy matching and provides the several groups of standard medical terms with the highest similarity, from which the target term of the mapping is selected; the specific method of fuzzy matching is as follows:
(1) term segmentation: all terms in the general medical term library are segmented, and the frequency of each word segment is counted as the basic word frequency; the source medical term M that needs fuzzy matching is segmented before matching;
(2) fuzzy matching: the difference in probability between medical terms is used as the measure of similarity, with the following specific operations:
(2.1) all terms containing the word segments are selected from the general medical term library and segmented, forming the term set A;
(2.2) the matching degree is calculated with the following formula, which gives the average weighted probability of the term M and of all terms in the term set A, where n is the number of word segments obtained for each term and P1, P2, P3, P4, …, Pn are the probabilities of each word segment in the basic word frequency:
[Formula image FDA0003242110290000011: average weighted probability D of a term, expressed in terms of n and P1, …, Pn]
(2.3) the difference between the average weighted probability of each standard term in the term set A and that of the term M to be matched is computed, and the negative of its absolute value is taken as the matching degree; the larger the matching degree, the higher the similarity between the two; the formula is: S(M,A) = -|D(M) - D(A)|;
the custom term module, for medical terms that can neither be mapped through the mapping relations between standard codes within the existing medical term library nor be fuzzy-matched to a target term in the existing general medical term library, sends the custom term request generated by the user to the multi-center interaction module for review and feedback;
the multi-center interaction module, after receiving the custom term requests of the medical data centers sent by the custom term module, reviews the custom terms, adds the approved custom terms to the general medical term library as standard terms, and sends them to every medical data center so that the general medical term libraries of all medical data centers remain consistent;
the incremental update module, for the medical term standardization of incremental data generated for business reasons in a source database on which standardized mapping of medical terms has already been performed, retrieves the historical mapping relation records generated by the term mapping unit to complete the standardized term mapping of the incremental data;
the exception handling module records the execution of each of the above modules and generates error logs when errors occur, and the whole medical term mapping process can be traced back according to the error logs.
2. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the system further comprises a data cleaning module, which formulates cleaning rules, assigns a weight to each data element and screens out data with severe missing values, including cleaning dirty data at the structure level and at the instance level.
3. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the database connection management module specifically comprises a JDBC module composed of classes and interfaces written in a programming language, which provides a unified access interface for multiple types of databases and implements the functions of establishing connections to databases or other data sources, sending SQL commands to the database and processing the results returned by the database.
4. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that, after the database connection management module has connected to the source database, the pre-analysis module automatically scans the structural information of all data in the source database and the statistical information of its specific fields and generates statistical tables in two parts:
first, summary statistics of all tables in the source database, including the field names, value types, maximum length among all values, total number of rows and proportion of null values for each table;
second, statistics on the detailed information and occurrence frequency of the specific terms within a given table, sorted by occurrence frequency in descending order so that subsequent term mapping can process the most frequent terms first; the system suggests whether terms with extremely low occurrence frequency need to take part in subsequent term mapping; when no threshold is defined, all terms take part in the mapping by default, and the user can adjust the setting according to the specific situation to determine the minimum occurrence frequency threshold below which terms do not take part in subsequent term mapping.
5. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the automatic mapping module, for terms in the source database that carry standard codes of an international general medical term library, determines the standard to which each code belongs and then selects the target term set to be mapped to; if a reference mapping relation already exists between the codes of the standard term set to which the terms in the source database belong and the codes of the target term set, mapping SQL statements are generated automatically for these terms, completing the automatic mapping of the terms in the source database and the corresponding data loading.
6. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the custom term module defines constraints in advance to prevent custom terms from conflicting with known standard terms; when custom terms are added, the custom standard terms already added must be kept consistent across the medical data centers to prevent duplicate additions and to ensure that multi-center medical data can be shared after standardization through term mapping; before a custom term is added, a request to add it is submitted to the multi-center interaction module, the request containing the custom term to be added, a specific description of the custom term and the code of the custom term; if the relevant operators of the multi-center interaction module approve the request, having confirmed that no similar or duplicate medical term already has a custom code, a custom standard term code is generated, and the automatic mapping module can then be called to complete the term mapping and the loading of the covered data; if the request is rejected, the existing custom term code is returned for the medical data center to complete the subsequent mapping, or the reason why the custom term could not be generated is returned, an error document is generated and the user is notified.
7. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the multi-center interaction module is responsible for coordinating and unifying the general medical term libraries and term codes of the medical data centers, and the highest-authority personnel of the multi-center interaction module review and coordinate the use of custom standard terms.
8. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the incremental update module is used for subsequent medical term standardization in a medical data center where medical term mapping has already been performed; it updates incremental data mainly on the basis of the earlier term standardization mapping records generated by the term mapping unit, and for medical terms that still cannot be mapped to a standard term, the custom term module is executed again.
9. The multi-center medical term standardization system based on a general medical term library according to claim 1, characterized in that the exception handling module stores all logs produced while the system is running and records whether each module runs normally; saves error logs by category, including errors that occur while the system is running, errors that occur when each module is called, and errors that occur when a module maps an individual term; saves by category the terms that were not mapped successfully, including terms ignored in the automatic analysis module and terms ignored in the custom term module, and generates a failed-term document; and, by setting timestamps on the database, supports database backtracking, allowing the user to roll the matched database back to the data of a specified date.
CN201910629244.9A 2019-07-12 2019-07-12 Multi-center medical term standardization system based on general medical term library Active CN110349639B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910629244.9A CN110349639B (en) 2019-07-12 2019-07-12 Multi-center medical term standardization system based on general medical term library
JP2021533326A JP7093593B2 (en) 2019-07-12 2020-04-07 Multi-center medical term standardization system based on general-purpose medical term library
PCT/CN2020/083586 WO2020233256A1 (en) 2019-07-12 2020-04-07 General medical termbase-based multi-center medical terminology standardization system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910629244.9A CN110349639B (en) 2019-07-12 2019-07-12 Multi-center medical term standardization system based on general medical term library

Publications (2)

Publication Number Publication Date
CN110349639A CN110349639A (en) 2019-10-18
CN110349639B true CN110349639B (en) 2022-01-04

Family

ID=68176052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910629244.9A Active CN110349639B (en) 2019-07-12 2019-07-12 Multi-center medical term standardization system based on general medical term library

Country Status (3)

Country Link
JP (1) JP7093593B2 (en)
CN (1) CN110349639B (en)
WO (1) WO2020233256A1 (en)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12125054B2 (en) 2018-09-25 2024-10-22 Valideck International Corporation System, devices, and methods for acquiring and verifying online information
CN110349639B (en) * 2019-07-12 2022-01-04 之江实验室 Multi-center medical term standardization system based on general medical term library
CN111126018B (en) * 2019-11-25 2023-08-08 泰康保险集团股份有限公司 Form generation method and device, storage medium and electronic equipment
CN110990591A (en) * 2019-12-26 2020-04-10 北京亚信数据有限公司 Method and system for auditing transcoding quality of medical data
CN111291225B (en) * 2020-05-08 2020-08-11 成都金盘电子科大多媒体技术有限公司 Method and system for quickly verifying medical health information data standard
CN112035451A (en) * 2020-08-25 2020-12-04 上海灵长软件科技有限公司 Data verification optimization processing method and device, electronic equipment and storage medium
CN112069774A (en) * 2020-09-03 2020-12-11 微医云(杭州)控股有限公司 A data mapping method, device, electronic terminal and storage medium
CN112347266A (en) * 2020-09-11 2021-02-09 湖南中医药大学 Special term standardization system for children rehabilitation
CN112052667B (en) * 2020-09-27 2024-05-03 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing medical coding mapping
CN112365939B (en) * 2020-10-14 2023-04-07 山东大学 Data management method and system based on medical health big data
CN112633005B (en) * 2020-11-11 2024-06-21 上海数创医疗科技有限公司 Electrocardiogram term semantic matching method
CN112395854B (en) * 2020-12-02 2022-11-22 中国标准化研究院 A Consistency Checking Method for Standard Elements
CN112883157B (en) * 2021-02-07 2023-04-07 武汉大学 Method and device for standardizing multi-source heterogeneous medical data
CN112951355B (en) * 2021-02-25 2023-05-02 武汉大学 Quality inspection function method and device for warehousing massive medical data
CN112817945A (en) * 2021-03-03 2021-05-18 江苏汇鑫融智软件科技有限公司 Medical heterogeneous system data warehouse construction method based on ESB
CN112988966A (en) * 2021-03-04 2021-06-18 中建海峡建设发展有限公司 Voice interaction construction log management system and implementation method
CN113284630B (en) * 2021-04-13 2024-05-14 常州市第二人民医院 Medical ontology-based medical term knowledge base construction system and method
CN113239115B (en) * 2021-05-19 2023-06-02 中国医学科学院医学生物学研究所 Quick and accurate synchronization method for vaccine adverse reaction batch data
CN113377897B (en) * 2021-05-27 2022-04-22 杭州莱迈医疗信息科技有限公司 Multi-language medical term standard standardization system and method based on deep confrontation learning
CN113342793B (en) * 2021-06-18 2023-04-07 立信(重庆)数据科技股份有限公司 Research data standardization method and system
CN113704555B (en) * 2021-07-16 2023-11-07 杭州医康慧联科技股份有限公司 Feature management method based on medical direction federal learning
CN113764086A (en) * 2021-08-17 2021-12-07 卫宁健康科技集团股份有限公司 Nursing information processing system and method based on JHNEBP model
CN113836126B (en) * 2021-09-22 2024-01-30 上海妙一生物科技有限公司 Data cleaning method, device, equipment and storage medium
CN113656604B (en) * 2021-10-19 2022-02-22 之江实验室 Medical term normalization system and method based on heterogeneous graph neural network
CN114003791B (en) * 2021-12-30 2022-04-08 之江实验室 Depth map matching-based automatic classification method and system for medical data elements
CN114417859A (en) * 2022-01-10 2022-04-29 广西壮族自治区农村信用社联合社 A data standardization method and system based on cloud blockchain technology
CN114461714B (en) * 2022-01-13 2024-03-29 湖北国际物流机场有限公司 BIM code conversion system
CN114595668A (en) * 2022-01-28 2022-06-07 北京医鸣技术有限公司 Method, platform, medium and equipment for standardizing medical diagnosis terms
CN115017323B (en) * 2022-02-17 2024-08-02 镇江市精神卫生中心(镇江市第五人民医院) Automatic medical knowledge graph labeling system and method with variable multi-element framework
CN114974490B (en) * 2022-05-27 2024-11-05 神州医疗科技股份有限公司 Method, device, electronic device and medium for building a medical terminology platform
CN115080751B (en) * 2022-08-16 2022-11-11 之江实验室 Medical standard term management system and method based on general model
CN115712839B (en) * 2022-11-14 2023-10-24 国网山东省电力公司日照供电公司 An automatic matching system and method for communication models of relay protection devices
CN116303377A (en) * 2022-11-23 2023-06-23 南京视察者智能科技有限公司 Government affair data cleaning and filtering method
CN115952770B (en) * 2023-03-15 2023-07-25 广州汇通国信科技有限公司 Data standardization processing method and device, electronic equipment and storage medium
CN116110560A (en) * 2023-04-13 2023-05-12 杭州璞睿生命科技有限公司 Method, device, equipment and medium for docking clinical diagnosis and treatment data to EDC system
CN116167354B (en) * 2023-04-19 2023-07-07 北京亚信数据有限公司 Medical term feature extraction model training and standardization method and device
CN116386799B (en) * 2023-06-05 2023-08-18 数据空间研究院 Medical data acquisition and standard conversion method and system
CN117216042A (en) * 2023-07-26 2023-12-12 中电云计算技术有限公司 Construction method and device of data standardization platform
CN116737697B (en) * 2023-08-10 2023-10-20 云筑信息科技(成都)有限公司 Method and device for managing main data of materials in construction industry and electronic equipment
CN117995332B (en) * 2024-04-07 2024-07-05 北方健康医疗大数据科技有限公司 Value range code standardized conversion system and method
CN118035504B (en) * 2024-04-15 2024-09-03 上海森亿医疗科技有限公司 Medical core word knowledge base construction method, device, medium and terminal
CN118173211B (en) * 2024-05-15 2024-07-23 万链指数(青岛)信息科技有限公司 Data standardized treatment method and system for medical big data
CN118586404B (en) * 2024-08-06 2024-11-08 杭州古珀医疗科技有限公司 Method and device for extracting and standardizing hospital discharge order information

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003077151A2 (en) 2002-03-05 2003-09-18 Siemens Medical Solutions Health Services Corporation A dynamic dictionary and term repository system
KR100538577B1 (en) 2003-07-14 2005-12-22 이지케어텍(주) Method For Standardization Of Computerization Of Medical Information
JP4955197B2 (en) 2004-09-07 2012-06-20 株式会社日本医療データセンター Receipt file generation system
JP4661415B2 (en) 2005-07-13 2011-03-30 株式会社日立製作所 Expression fluctuation processing system
US7610192B1 (en) * 2006-03-22 2009-10-27 Patrick William Jamieson Process and system for high precision coding of free text documents against a standard lexicon
CN101452503A (en) * 2008-11-28 2009-06-10 上海生物信息技术研究中心 Heterogeneous clinical medical information sharing system and method
US10204703B2 (en) 2014-11-10 2019-02-12 Accenture Global Services Limited Medical coding management system using an intelligent coding, reporting, and analytics-focused tool
JP2016200978A (en) 2015-04-10 2016-12-01 株式会社日立製作所 Training data generation device
US20160342746A1 (en) * 2015-05-21 2016-11-24 Naveen Sarabu Cloud-Based Medical-Terminology Manager and Translator
CN106383853A (en) * 2016-08-30 2017-02-08 刘勇 Realization method and system for electronic medical record post-structuring and auxiliary diagnosis
KR101878217B1 (en) 2016-11-07 2018-07-13 경희대학교 산학협력단 Method, apparatus and computer program for medical data
JP2020527804A (en) 2017-07-18 2020-09-10 コーニンクレッカ フィリップス エヌ ヴェ Koninklijke Philips N.V. Coded medical vocabulary mapping
CN107978341A (en) * 2017-12-22 2018-05-01 南京昂特医信数据技术有限公司 Context-based heterogeneous data adaptation method and system under a medical semantic framework
CN109033080B (en) * 2018-07-12 2023-03-24 上海金仕达卫宁软件科技有限公司 Medical term standardization method and system based on probability transfer matrix
CN109446340A (en) * 2018-10-17 2019-03-08 长沙瀚云信息科技有限公司 Medical standard term ontology management system and method, equipment and storage medium
CN109408820A (en) * 2018-10-17 2019-03-01 长沙瀚云信息科技有限公司 Medical term mapping system and method, equipment and storage medium
CN110349639B (en) * 2019-07-12 2022-01-04 之江实验室 Multi-center medical term standardization system based on general medical term library

Also Published As

Publication number Publication date
JP7093593B2 (en) 2022-06-30
CN110349639A (en) 2019-10-18
JP2022508350A (en) 2022-01-19
WO2020233256A1 (en) 2020-11-26

Similar Documents

Publication Publication Date Title
CN110349639B (en) Multi-center medical term standardization system based on general medical term library
CN110415831B (en) Medical big data cloud service analysis platform
US7630993B2 (en) Generating database schemas for relational and markup language data from a conceptual model
Barateiro et al. A survey of data quality tools.
CN111370127A (en) Decision support system for early diagnosis of chronic nephropathy in cross-department based on knowledge graph
US9378271B2 (en) Database system for analysis of longitudinal data sets
EP3131021A1 (en) Hybrid data storage system and method and program for storing hybrid data
Fabbri et al. Explanation-based auditing
CN111243748A (en) Acupuncture and tuina health data standardization system
CN110544528A (en) Upper-lower-level ophthalmology remote diagnosis platform and its construction method based on deep learning
CN117995419A (en) Data quality control method, system, terminal and storage medium for medical data
Moro et al. Schema advisor for hybrid relational-XML DBMS
US20150356130A1 (en) Database management system
Kirsten et al. Metadata management for data integration in medical sciences
Hu Research on monitoring system of daily statistical indexes through big data
Oliveira et al. Towards a Data Catalog for Data Analytics
Silva et al. Interoperable Electronic Health Records (EHRs) for Ecuador
Arias The benefits of graph databases for the computation of clinical quality measures
Bréant et al. Design of a Multi Dimensional Database for the Archimed DataWarehouse
CN116166698B (en) Method and system for quickly constructing cohorts based on general medical terms
Mansmann Extending the OLAP technology to handle non-conventional and complex data
Romanchikova et al. A framework for user-configurable data quality assurance of electronic patient records
Zamani Forooshani A Tool for integrating dynamic healthcare data sources
Nind et al. The Research Data Management Platform (RDMP)
CN118708579A (en) A method for automatically generating medical quality control indicators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant