CN105069124B

CN105069124B - Automated International Classification of Diseases coding method and system

Info

Publication number: CN105069124B
Application number: CN201510496513.0A
Authority: CN
Inventors: 金以东; 朱华玲; 陈志永
Original assignee: Ebaotech Internet Medical Information Technology (beijing) Co Ltd
Current assignee: Ebaotech Internet Medical Information Technology (beijing) Co Ltd
Priority date: 2015-08-13
Filing date: 2015-08-13
Publication date: 2018-06-15
Anticipated expiration: 2035-08-13
Also published as: CN105069124A

Abstract

Embodiments of the present invention provide an automated International Classification of Diseases (ICD) coding method. The method includes: inputting Chinese disease diagnosis information; performing natural language processing on the Chinese disease diagnosis information to obtain one or more names to be coded; and, based on a standard term library and an extended term library, searching for a standard term or extended term that matches the name to be coded, and determining the code of the successfully matched standard term or extended term as the code of the name to be coded. The standard terms are the disease terms contained in the ICD version to be referred to; the extended terms are colloquial names, alternative names, or abbreviations of the standard terms, subclass disease terms of the standard terms, or newly generated disease terms. With this method, ICD coding can be completed automatically without manual participation, offering advantages such as high coding speed, low cost, and high accuracy. Embodiments of the present invention also provide an automated International Classification of Diseases coding system.

Description

Automatic international disease classification coding method and system

Technical Field

The embodiment of the invention relates to the field of disease classification, in particular to an automatic international disease classification coding method and system.

Background

The International Classification of Diseases (ICD) is a system that classifies diseases by rule according to certain of their characteristics and represents them with codes; it has been used in China for more than twenty years. The ICD version currently most widely used worldwide is ICD-10, published by the World Health Organization (WHO) in 1992. Under WHO rules, the WHO provides only 4-character codes for ICD-10, and countries or regions can extend ICD-10 as needed to form localized versions (e.g., increasing the number of diseases by adding extension codes).

The ICD standardizes and formats disease terms, is the application foundation of medical informatization and medical information management, and is also an important basis for medical insurance settlement. Effective use of the ICD therefore plays a very important role in the development of the health and medical system.

In the ICD application field, there are two main modes: manual coding and computer-aided coding. In China, manual coding has long been the norm: large hospitals staff their medical record rooms with professional coders who, after professional study and training, consult a dictionary library under the coding specification and select the code that is the same as or closest to the doctor's diagnosis. With the development of networks and informatization, computer-aided coding has become a hotspot in the field with strong development potential. At present, domestic information systems are configured with disease classification paths and code libraries; the system automatically guides and recommends codes based on a manually entered diagnosis, and a person then selects and confirms the code.

Disclosure of Invention

Both the current manual coding mode and the computer-aided coding mode require manual participation, which is inefficient and costly. Moreover, different people may output different coding results, which hinders verification in medical information management, medical insurance settlement, and similar operations.

In addition, the Chinese disease diagnosis information input by a doctor is natural language: its format is complex and varied, and there is no unified standard (for example, several languages are mixed, irregular grammar is used, wrong information is entered, abbreviations or common names replace standard terms, and symbols and other clutter are mixed into the text). This further increases the coding difficulty and raises the error rate.

For this reason, an improved ICD coding scheme is highly desirable.

In this context, embodiments of the present invention are intended to provide an automated international disease classification coding method and system.

In a first aspect of embodiments of the present invention, there is provided an automated international disease classification coding method, comprising:

step 1, inputting Chinese disease diagnosis information;

step 2, natural language processing is carried out on the Chinese disease diagnosis information to obtain one or more names to be coded;

step 3, based on the standard term library and the extended term library, searching the standard term or the extended term matched with the name to be coded, and determining the successfully matched code of the standard term or the extended term as the code of the name to be coded;

the standard term library is created according to the following mode:

determining an ICD version of the international disease classification to be referred to;

determining each disease term contained in the ICD version to be referred to as a standard term;

determining the code of each standard term according to the ICD version to be referred to;

storing the standard terms and the codes thereof to obtain a standard term library;

wherein, the extended term library is created according to the following mode:

the following types of terms not included in the ICD version to be referred to are determined as extended terms: colloquial names, alternative names, or abbreviations of the standard terms; subclass disease terms of the standard terms; and disease terms newly generated after publication of the ICD version to be referred to;

when the extended term is a colloquial name, alternative name, or abbreviation of any standard term, the code of that standard term is assigned to the extended term;

when the extended term is a subclass disease term of any standard term or a newly generated disease term, the extended term is assigned the code of the standard term whose generic (category) relationship is closest to the extended term;

and storing the expansion terms and the codes thereof to obtain an expansion term library.
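The three steps and the two libraries above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `extract_names` and `encode`, the separator-based splitting used as a stand-in for the natural language processing of step 2, and the toy library entries (a few ICD-10 categories with English names rather than Chinese terms).

```python
import re

# Toy standard term library: disease term -> code (illustrative entries only).
STANDARD_TERMS = {
    "essential hypertension": "I10",
    "acute myocardial infarction": "I21",
    "asthma": "J45",
}

# Toy extended term library: colloquial names and abbreviations carry the
# code of the standard term they are synonymous with.
EXTENDED_TERMS = {
    "high blood pressure": STANDARD_TERMS["essential hypertension"],
    "heart attack": STANDARD_TERMS["acute myocardial infarction"],
    "ami": STANDARD_TERMS["acute myocardial infarction"],
}


def extract_names(diagnosis_text):
    """Stand-in for step 2: split on common separators and normalize case."""
    parts = re.split(r"[;,；，、/]+", diagnosis_text)
    return [p.strip().lower() for p in parts if p.strip()]


def encode(diagnosis_text):
    """Steps 1-3: map each name to be coded to a code, or None if unmatched."""
    results = {}
    for name in extract_names(diagnosis_text):
        code = STANDARD_TERMS.get(name)
        if code is None:
            code = EXTENDED_TERMS.get(name)
        results[name] = code
    return results


print(encode("Asthma; heart attack"))
# {'asthma': 'J45', 'heart attack': 'I21'}
```

Names that match neither library are returned with `None` here; how unmatched names are handled is left open in this sketch.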

In a second aspect of embodiments of the present invention there is provided an automated international disease classification coding system comprising:

the standard term library creating module is used for determining each disease term contained in the ICD version to be referred as a standard term according to the international disease classification version to be referred; determining the code of each standard term according to the ICD version to be referred to; storing the standard terms and the codes thereof to obtain a standard term library;

an extended term library creation module for determining as extended terms the following types of terms not included in the ICD version to be referred to: colloquial names, alternative names, or abbreviations of the standard terms; subclass disease terms of the standard terms; and disease terms newly generated after publication of the ICD version to be referred to; when the extended term is judged to be a colloquial name, alternative name, or abbreviation of any standard term, the code of that standard term is assigned to the extended term; when the extended term is judged to be a subclass disease term of any standard term or a newly generated disease term, the extended term is assigned the code of the standard term whose generic (category) relationship is closest to the extended term; and storing the extended terms and their codes to obtain an extended term library;

the import module is used for inputting Chinese disease diagnosis information;

the data processing module is used for carrying out natural language processing on the Chinese disease diagnosis information to obtain one or more names to be coded;

and the coding module is used for searching the standard term or the extended term matched with the name to be coded based on the standard term library and the extended term library, and determining the successfully matched code of the standard term or the successfully matched code of the extended term as the code of the name to be coded.

According to the international disease classification coding method and system, the fact that Chinese disease diagnosis information input by a doctor is natural language, with complex and varied formats and no unified standard, is fully taken into account. The Chinese operation information character strings are matched against various dictionaries established in advance according to ICD-9-CM-3, so that the operation names are identified and coded automatically, quickly, and accurately. ICD coding is completed automatically without manual participation throughout, which improves coding speed, reduces coding cost, and ensures coding accuracy.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.

FIG. 1 schematically illustrates an application scenario in which embodiments of the present invention may be implemented;

FIG. 2A schematically illustrates a flow chart of an ICD encoding method in an exemplary method of the present invention;

FIG. 2B schematically illustrates a flow diagram for creating a standard term base in an exemplary method of the invention;

FIG. 2C schematically illustrates a standard term library in the form of a data table in an exemplary method of the invention;

FIG. 2D schematically illustrates a flow chart for creating an augmented term base in an exemplary method of the present invention;

FIG. 2E schematically illustrates an augmented term base in the form of a data table in an exemplary method of the invention;

FIG. 3A is a schematic flow chart illustrating an ICD encoding method according to an embodiment of the present invention;

FIG. 3B is a schematic diagram illustrating a process for creating an assumed classification term library according to a first embodiment of the present invention;

FIG. 3C is a schematic diagram illustrating an assumed classification term library in the form of a data table according to a first embodiment of the present invention;

FIG. 4A schematically illustrates a flowchart of an ICD encoding method according to a second embodiment of the present invention;

FIG. 4B is a schematic diagram illustrating a process of creating a multi-coding term library according to a second embodiment of the present invention;

FIG. 4C is a schematic diagram illustrating a database of multiple coding terms in the form of a data table according to a second embodiment of the present invention;

FIG. 5A schematically illustrates a flowchart of an ICD encoding method in a third embodiment of the present invention;

FIG. 5B is a schematic diagram illustrating a flow chart of creating a merged term base according to a third embodiment of the present invention;

FIG. 5C is a schematic diagram of a merged term base in the form of a data table according to a third embodiment of the present invention;

FIG. 6A schematically shows a flowchart of an ICD encoding method in the fourth embodiment of the present invention;

FIG. 6B is a schematic diagram illustrating a database of uncoded terms in the form of a data table according to a fourth embodiment of the present invention;

FIG. 7 schematically illustrates a block diagram of an ICD encoding system in an exemplary device of the present invention;

FIG. 8 schematically illustrates a block diagram of another ICD encoding system in an exemplary device of the present invention;

FIG. 9 is a block diagram schematically illustrating the structure of yet another ICD encoding system in an exemplary device of the present invention;

FIG. 10 schematically illustrates a block diagram of yet another ICD encoding system in an exemplary device of the present invention;

FIG. 11 is a block diagram schematically illustrating the structure of yet another ICD encoding system in an exemplary device of the present invention;

FIG. 12A is a flowchart schematically illustrating natural language processing of Chinese disease diagnosis information according to a fifth embodiment of the present invention;

FIG. 12B schematically shows a part of the disease degree terms included in the disease degree term dictionary;

FIG. 12C schematically shows a part of the disease concurrent terms included in the disease concurrent term dictionary;

FIG. 12D schematically illustrates a portion of the morbidity site terms included in the morbidity site term dictionary;

FIG. 12E schematically shows a flowchart of splitting the first-type substring and the second-type substring in the fifth embodiment of the present invention;

FIG. 12F schematically illustrates a slicing rule;

FIG. 12G schematically illustrates another segmentation rule;

FIG. 12H schematically illustrates yet another segmentation rule;

FIG. 12I schematically illustrates yet another segmentation rule;

FIG. 12J schematically illustrates yet another segmentation rule;

FIG. 12K schematically illustrates yet another segmentation rule;

FIG. 13 schematically shows a flowchart of searching for a standard term or an extended term matching a name to be encoded in the sixth embodiment of the present invention.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.

According to the embodiment of the invention, an automatic international disease classification coding method and system are provided.

In this context, it is to be understood that the term "clinical" as used herein refers to a physician diagnosing and treating disease at the patient's bedside, and more generally to the medical practice of a medical institution.

Moreover, any number of elements in the drawings are by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.

The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.

Summary of The Invention

The inventors found that in the medical field, different regions, institutions, or practitioners apply different disease term standards (for example, the same disease has multiple expressions), and that those standards are incomplete (for example, they cannot cover new terms). As a result, the Chinese disease diagnosis information that is generated (for example, information recorded in a basic medical insurance statement) contains a large number of non-standard disease terms, which greatly obstructs ICD coding based on that information. These non-standard terms currently have to be resolved manually, that is, through manual coding or computer-aided coding. However, ICD coding with manual participation is inefficient and costly, and different people may output different coding results.

To this end, the present invention provides an automated ICD encoding mechanism. The ICD encoding process may be: inputting Chinese disease diagnosis information; natural language processing is carried out on the Chinese disease diagnosis information to obtain one or more names to be coded; and searching the standard terms or the extended terms matched with the name to be coded based on the standard term library and the extended term library, and determining the codes of the successfully matched standard terms or extended terms as the codes of the name to be coded.

The standard term library is created according to the ICD version to be referred to and comprises standard terms and their codes. The standard terms are the disease terms contained in the ICD version to be referred to, and their codes are consistent with those in that version. The extended term library comprises extended terms and their codes. The extended terms are of the following types not included in the ICD version to be referred to: colloquial names, alternative names, or abbreviations of the standard terms; subclass disease terms of the standard terms; or disease terms newly generated after the ICD version to be referred to was published. The code of an extended term is the code of the standard term synonymous with it, or the code of the standard term whose generic (category) relationship is closest to it.

In the present invention, the standard term library covers all disease terms and their codes recorded in the ICD version to be referred to, and the expanded term library covers some disease terms not included in the ICD version to be referred to, which include common names, alternative names or abbreviations of diseases frequently used in some regions or units, or disease terms of subclasses of disease terms recorded in the ICD version, or disease terms newly generated with the development of medical technology. The standard term library and the expanded term library cover most disease terms possibly appearing in the Chinese disease diagnosis information, and basically meet the requirement of automatically distinguishing the disease terms in the Chinese disease diagnosis information, so that the automatic ICD coding is realized. The whole ICD encoding process does not need manual participation, and has the advantages of high encoding speed, low cost, high accuracy and the like.

Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.

Application Scenario Overview

Reference is first made to fig. 1, which illustrates an application scenario in which embodiments of the present invention may be implemented.

The scenario shown in FIG. 1 includes a medical information processing terminal 100 and a medical information processing server 200. The medical information processing terminal 100 may be a desktop computer, a notebook computer, a tablet computer, a personal digital assistant, or the like used by a doctor. The medical information processing server 200 may be a server or the like that runs a hospital information management system. The medical information processing terminal 100 and the medical information processing server 200 may be communicatively connected, for example via a hospital LAN.

When ICD coding based on Chinese disease diagnosis information is required, the information may be input at the medical information processing terminal 100, for example on the interface of software installed on the terminal, or a large amount of Chinese disease diagnosis information may be imported into the terminal from a data storage device such as a USB flash drive or a portable hard disk. The medical information processing server 200 receives the Chinese disease diagnosis information and performs natural language processing on it to obtain the names to be coded. The server then queries the standard term library and the extended term library for a standard term or extended term matching each name to be coded, and finally determines the code of the matching standard term or extended term as the code of the name to be coded.

Exemplary method

An ICD encoding method according to an exemplary embodiment of the present invention is described below with reference to fig. 2A to 2E in conjunction with the application scenario of fig. 1.

It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.

For example, refer to FIGS. 2A to 2E, which show a flowchart of an ICD encoding method according to an embodiment of the present invention together with the standard term library and the extended term library.

As shown in fig. 2A, the ICD encoding method may include:

step S101, inputting Chinese disease diagnosis information.

Alternatively, the Chinese disease diagnosis information may be medical history information input by medical staff, or information recorded in a basic medical insurance statement.

Step S102, natural language processing is carried out on the Chinese disease diagnosis information to obtain one or more names to be coded.

Specifically, the step can perform mechanical word segmentation and other processing on the Chinese disease diagnosis information based on the characteristics of the Chinese disease diagnosis information, so as to analyze the disease terms from the Chinese disease diagnosis information, wherein the disease terms analyzed from the Chinese disease diagnosis information are the names to be coded.
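As one possible illustration of mechanical word segmentation, the sketch below uses dictionary-based forward maximum matching. This is an assumption for illustration only: the text here does not specify the segmentation algorithm, and the dictionaries `DISEASE_TERMS` and `CONNECTORS` and the function names are hypothetical.

```python
# Hypothetical dictionaries; a real system would load far larger ones.
DISEASE_TERMS = {"高血压", "2型糖尿病", "肺炎"}  # hypertension, type 2 diabetes, pneumonia
CONNECTORS = {"合并", "伴", "及"}                # "complicated by", "with", "and"


def forward_max_match(text, vocabulary):
    """Greedy segmentation: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character
    tokens so the scan always advances.
    """
    max_len = max(len(word) for word in vocabulary)
    tokens, pos = [], 0
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if length == 1 or candidate in vocabulary:
                tokens.append(candidate)
                pos += length
                break
    return tokens


def names_to_be_coded(text):
    """Keep only the tokens that are disease terms."""
    tokens = forward_max_match(text, DISEASE_TERMS | CONNECTORS)
    return [token for token in tokens if token in DISEASE_TERMS]


print(names_to_be_coded("高血压合并2型糖尿病"))
# ['高血压', '2型糖尿病']
```

Forward maximum matching is only one of several dictionary-based strategies; it is used here because it is short and deterministic.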

A specific example of how the exemplary method performs natural language processing on Chinese disease diagnosis information is described below in Example Five.

Step S103, based on the standard term library and the extended term library, searching the standard term or the extended term matched with the name to be coded, and determining the successfully matched code of the standard term or the extended term as the code of the name to be coded.

In this embodiment, the standard term library is created according to the steps shown in fig. 2B:

step a1, determine the international disease classification ICD version to be referred to.

Alternatively, the ICD version of the international disease classification to be referred to may be an ICD version published by the WHO (e.g., ICD-10 published by the WHO in 1992) or various localized ICD versions extended from the ICD version published by the WHO (e.g., Chinese version of ICD-10 recommended by the Ministry of health in China). In specific implementation, an appropriate ICD version can be selected as a reference according to actual needs, and the present invention is not limited to this.

Step a2, each disease term contained in the ICD version to be referred to is determined as a standard term.

Step A3, determining the code of each standard term according to the ICD version to be referred to.

Specifically, since the coding of each disease term is explicitly described in the ICD version to which reference is made, the coding of each standard term can be determined directly therefrom.

And step A4, storing the standard terms and the codes thereof to obtain a standard term library.

Alternatively, the standard term library may store the standard terms and their encodings in the form of a data table or tree structure.

The ICD records disease terms according to their category and generic (class-subclass) relationships, and these relationships help speed up the search for a specific disease term. Accordingly, when the standard term library is created, the data table or tree structure can be organized according to the category and generic relationships of the disease terms in the ICD version to be referred to, so that the stored standard terms have a clear structure and are easy to search, which helps speed up matching of the name to be coded.

Optionally, the standard term library may also be modified in real time, for example, when the ICD version being referred to has a new updated version, the standard terms are added, modified or deleted according to the updated version, so that the standard term library better conforms to the ICD coding requirement.
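A minimal sketch of such a library is given below, assuming a two-level structure (category, then term) and an update step driven by a newer ICD version. The class name `StandardTermLibrary`, its methods, the category labels, and the sample entries are all hypothetical.

```python
class StandardTermLibrary:
    """Standard terms grouped by category, plus a flat index for direct lookup."""

    def __init__(self):
        self._by_category = {}  # category label -> {term: code}
        self._code_of = {}      # flat index: term -> code

    def add(self, category, term, code):
        self._by_category.setdefault(category, {})[term] = code
        self._code_of[term] = code

    def remove(self, term):
        self._code_of.pop(term, None)
        for terms in self._by_category.values():
            terms.pop(term, None)

    def lookup(self, term, category=None):
        """Return the code of a term, optionally searching one category only."""
        if category is not None:
            return self._by_category.get(category, {}).get(term)
        return self._code_of.get(term)

    def apply_update(self, added=(), removed=()):
        """Bring the library in line with an updated ICD version."""
        for term in removed:
            self.remove(term)
        for category, term, code in added:
            self.add(category, term, code)


library = StandardTermLibrary()
library.add("circulatory system", "essential hypertension", "I10")
library.add("respiratory system", "asthma", "J45")

# Illustrative update: one term added, none removed.
library.apply_update(
    added=[("circulatory system", "acute myocardial infarction", "I21")]
)
```

Keeping a flat index alongside the category grouping is a design choice of this sketch: it allows a direct lookup when the category of a name is unknown.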

FIG. 2C shows a standard term library in a tree structure according to the present embodiment.

In this embodiment, the extended term library is created according to the steps shown in fig. 2D:

step B1, determining as extended terms the following types of terms not included in the ICD version to be referred to: colloquial names, alternative names, or abbreviations of the standard terms; subclass disease terms of the standard terms; and disease terms newly generated after the release of the ICD version to be referred to.

In the medical field, when a disease term is used by different regions, different units or different practitioners, the disease term may not be the disease term (i.e. standard term) described in the ICD version, but may be a colloquial name, alternative name or abbreviation of the standard term, or a more detailed name (i.e. subclass disease name) of the standard term, etc.; in addition, with the development of medical technology, new disease terms are continuously generated, and the phenomenon that the newly generated disease terms cannot be covered by the ICD version which is released in the past occurs. In consideration of these situations, the colloquial, alternative or abbreviation of the standard terms used in the actual work can be counted in the specific region or specific unit implementing the method, and the newly generated disease terms can be counted and stored as the expanded terms in the expanded term library to meet the requirement of ICD coding.

Step B2, when the extended term is a colloquial name, alternative name, or abbreviation of any standard term, assigning the code of that standard term to the extended term; when the extended term is a subclass disease term of any standard term or a newly generated disease term, assigning to the extended term the code of the standard term whose generic (category) relationship is closest to the extended term.

When the extension term is a colloquial name, alternative name or abbreviation of the standard term, the extension term and the standard term have a synonymous relationship, so that the code of the standard term can be directly used as the code of the extension term.

When the extended term is a subclass disease term of a standard term, the standard term whose generic relationship is closest to the subclass disease term can be determined from clinical experience, and the code of that standard term is defined as the code of the subclass disease term.

Because a previously released ICD version cannot cover newly generated disease terms, the standard term whose generic relationship is closest to a newly generated disease term can likewise be found from clinical experience, and the code of that standard term is used as the code of the newly generated disease term.

And step B3, storing the expanded terms and the codes thereof to obtain an expanded term library.

Alternatively, the augmented term library may store the augmented terms and their encodings in the form of a data table or tree structure.

Optionally, the expanded term library may be modified in real time, for example, by adding common names, alternative names or abbreviations of standard terms, and adding newly generated disease terms, so that the expanded term library covers more expanded terms to meet the ICD coding requirement.
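Steps B1 to B3 can be sketched as follows. The sketch assumes that the related standard term (the synonymous term, or the closest term in generic relationship as judged from clinical experience) is supplied by a person for each entry; the `ExtendedTerm` record, the `build_extended_library` function, and the sample entries are hypothetical.

```python
from dataclasses import dataclass

# Toy standard term library (illustrative entries only).
STANDARD_TERMS = {"essential hypertension": "I10", "asthma": "J45"}

VALID_KINDS = {"synonym", "subclass", "new"}


@dataclass
class ExtendedTerm:
    name: str
    kind: str              # "synonym", "subclass", or "new"
    related_standard: str  # synonymous or closest standard term, chosen by a clinician


def build_extended_library(entries, standard_terms):
    """Give each extended term the code of its related standard term."""
    library = {}
    for entry in entries:
        if entry.kind not in VALID_KINDS:
            raise ValueError(f"unknown extended term kind: {entry.kind}")
        library[entry.name] = standard_terms[entry.related_standard]
    return library


ENTRIES = [
    ExtendedTerm("high blood pressure", "synonym", "essential hypertension"),
    ExtendedTerm("exercise-induced asthma", "subclass", "asthma"),
    ExtendedTerm("hypothetical new asthma variant", "new", "asthma"),
]

extended_library = build_extended_library(ENTRIES, STANDARD_TERMS)
```

In all three cases the extended term ends up with the code of a standard term; the cases differ only in how the related standard term is chosen.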

FIG. 2E shows an extended term library in the form of a data table of this embodiment, and the shaded portion in FIG. 2E is for explanation and may not be present in the actual extended term library.

Optionally, when step S103 is implemented, the standard term or the extended term matching the name to be encoded may be searched by traversing the standard term library and the extended term library. Considering that the time cost for traversing the term library may be high, optionally, the possible generic relationship of the name to be encoded may be determined according to the semantic of the name to be encoded, and then the standard term or the extended term capable of being matched is searched in a specific data table or tree structure.
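The narrowing idea can be sketched as below. The keyword table `CATEGORY_HINTS` is a hypothetical stand-in for a semantic analysis of the name to be coded, and the fallback to a full traversal when the narrowed search fails is likewise an assumption of this sketch.

```python
# Toy library grouped by category (illustrative entries only).
LIBRARY_BY_CATEGORY = {
    "circulatory": {
        "essential hypertension": "I10",
        "acute myocardial infarction": "I21",
    },
    "respiratory": {"asthma": "J45"},
}

# Hypothetical keyword hints standing in for semantic analysis of the name.
CATEGORY_HINTS = {
    "hypertension": "circulatory",
    "infarction": "circulatory",
    "asthma": "respiratory",
}


def guess_category(name):
    """Return the category suggested by the first matching keyword, if any."""
    for keyword, category in CATEGORY_HINTS.items():
        if keyword in name:
            return category
    return None


def find_code(name):
    """Search the guessed category first, then fall back to a full traversal."""
    category = guess_category(name)
    if category is not None:
        code = LIBRARY_BY_CATEGORY[category].get(name)
        if code is not None:
            return code
    for terms in LIBRARY_BY_CATEGORY.values():
        if name in terms:
            return terms[name]
    return None
```

With a large library, the narrowed search touches only one category in the common case, and the traversal runs only when the guess is missing or wrong.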

A specific example of how the exemplary method finds a standard term or an extended term that matches a name to be encoded is described below in Example Six.

In this embodiment, the standard term library and the extended term library cover most of the disease terms that may appear in Chinese disease diagnosis information and largely meet the need to identify disease terms in that information automatically, thereby enabling automated ICD coding. The ICD coding method provided by this embodiment requires no manual participation and offers advantages such as high coding speed, low cost, and high accuracy.

Example one

Referring to fig. 3A to fig. 3C, an ICD encoding method according to an embodiment of the present invention is shown.

As shown in fig. 3A, the ICD encoding method may include:

step S201, Chinese disease diagnosis information is input.

Step S202, natural language processing is carried out on the Chinese disease diagnosis information to obtain one or more names to be coded.

Step S203, based on the standard term library, the extended term library and the assumed classification term library, searching the standard term, the extended term or the assumed classification term matched with the name to be coded, and determining the successfully matched code of the standard term, the extended term or the assumed classification term as the code of the name to be coded.

In this embodiment, the standard term library and the expanded term library are created by the same method as the exemplary method, and are not described herein again.

In this embodiment, the assumed classification term library is created according to the steps shown in fig. 3B:

Step C1, determining, as an assumed classification term, a disease term that is not included in the ICD version to be referred to, is related to a standard term, is clinically regarded by default as identical to that standard term, and is not a colloquial name, alias, or abbreviation of the standard term.

Step C2, assigning the code of the standard term associated with the assumed classification term to the assumed classification term.

The following situation is common in the medical field: a disease is divided into several types, one of which is clinically common while the others are clinically uncommon. In this case, when filling in or reading a medical record, medical staff usually treat the general name of the disease as the name of the clinically common type by default, and write out the name of a clinically uncommon type explicitly only when that type is diagnosed. For example, mitral stenosis is divided into rheumatic mitral stenosis, which is clinically common, and non-rheumatic mitral stenosis, which is rarely seen. When filling in or reading a medical record, medical staff usually take "mitral stenosis" to mean "rheumatic mitral stenosis" by default, and write "non-rheumatic mitral stenosis" only when non-rheumatic mitral stenosis is diagnosed, so as to distinguish the two.

However, the ICD may contain no general name for such a disease and only its specific types; for example, the ICD may contain no disease term "mitral stenosis" but contain "rheumatic mitral stenosis" and "non-rheumatic mitral stenosis". In this case, when ICD coding is performed on a general disease name appearing in Chinese disease diagnosis information, it is not known which specific type's ICD code should be used.

In the present embodiment, the general name of the disease in the above case is determined as an assumed classification term.

When such an assumed classification term is encountered during ICD coding, it may be assumed to refer to the clinically common type of the disease, and it is assigned the code of that clinically common type.

For example, the assumed classification term "mitral stenosis" has the same code as "rheumatic mitral stenosis".

Step C3, storing the assumed classification terms and their codes to obtain the assumed classification term library.

Optionally, the assumed classification term library may store the assumed classification terms and their codes in the form of a data table or a tree structure.

Optionally, the assumed classification term library may be revised in real time, for example by adding new assumed classification terms or deleting existing ones, so that it better conforms to the ICD coding requirements.

FIG. 3C shows an assumed classification term library of the present embodiment in the form of a data table; the shaded portion in FIG. 3C is explanatory and may not appear in the actual assumed classification term library.
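Steps C1 to C3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the dictionary layout, the function name `build_assumed_library`, and the placeholder codes such as `CODE-RMS` are all introduced here for illustration.

```python
# Hypothetical standard term library: term -> ICD code (placeholder codes).
standard_library = {
    "rheumatic mitral stenosis": "CODE-RMS",
    "non-rheumatic mitral stenosis": "CODE-NRMS",
}

# Step C1 (assumed input): general name -> standard term it defaults to clinically.
assumed_defaults = {"mitral stenosis": "rheumatic mitral stenosis"}


def build_assumed_library(defaults, standard):
    """Steps C2 and C3: give each assumed term the code of its associated standard term."""
    library = {}
    for assumed_term, standard_term in defaults.items():
        if assumed_term in standard:
            # A term already present in the ICD version is not an assumed classification term.
            continue
        library[assumed_term] = standard[standard_term]
    return library


assumed_library = build_assumed_library(assumed_defaults, standard_library)
```

Here `assumed_library` maps "mitral stenosis" to the same placeholder code as "rheumatic mitral stenosis", mirroring the example in the text.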

Optionally, when step S203 is implemented, the standard term, extended term, or assumed classification term matching the name to be encoded may be searched for by traversing the standard term library, the extended term library, and the assumed classification term library.

Because traversing the term libraries may be time-consuming, the category to which the name to be encoded is likely to belong may optionally be determined first from its semantics, and a matching standard term, extended term, or assumed classification term is then searched for only in the corresponding data table or tree structure.
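One simple way to avoid a full traversal is to hold each library as a hash table keyed by term. The sketch below assumes that layout (the text only specifies a data table or tree structure); the function name `find_code`, the search order, and the placeholder codes are likewise assumptions.

```python
def find_code(name, libraries):
    """Return (library_name, code) for the first library containing `name`, else None."""
    for library_name, table in libraries:
        code = table.get(name)
        if code is not None:
            return library_name, code
    return None


# Hypothetical libraries with placeholder codes; the search order is an assumption.
libraries = [
    ("standard", {"rheumatic mitral stenosis": "CODE-RMS"}),
    ("extended", {}),
    ("assumed", {"mitral stenosis": "CODE-RMS"}),
]
```

Each lookup is then a constant-time dictionary access per library instead of a scan over every stored term.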

In this embodiment, an assumed classification term library is added on top of the standard term library and the expanded term library, so that assumed classification terms appearing in Chinese disease diagnosis information are taken into account. The disease terms that may appear in Chinese disease diagnosis information are thus covered more fully, which provides a more complete basis for recognizing them automatically and facilitates automated ICD coding. The ICD coding method provided by this embodiment requires no manual participation and offers advantages such as high coding speed, low cost, and high accuracy.

Example two

Referring to fig. 4A to 4B, an ICD encoding method according to an embodiment of the invention is shown.

As shown in fig. 4A, the ICD encoding method may include:

Step S301, Chinese disease diagnosis information is input.

Step S302, natural language processing is carried out on the Chinese disease diagnosis information to obtain one or more names to be coded.

Step S303, based on the standard term library, the extended term library and the multi-coding term library, searching for the standard term, extended term or multi-coding term matching the name to be coded, and determining the code of the successfully matched standard term, extended term or multi-coding term as the code of the name to be coded.

Optionally, in this step, an assumed classification term matching the name to be coded may also be searched for based on an assumed classification term library, and the code of the successfully matched assumed classification term is determined as the code of the name to be coded. This embodiment may create the assumed classification term library by the same method as in Example one, and details are not repeated here.

In this embodiment, the multi-coding term library is created according to the steps shown in fig. 4B:

Step D1, determining, as a multi-coding term, a disease term that is not included in the ICD version to be referred to and that is composed of at least two different standard terms.

Step D2, combining the codes of all the standard terms composing the multi-coding term to form the code of the multi-coding term.

In the medical field, several diseases often occur concurrently, and the corresponding disease term may be a combination of several standard terms. To handle this situation, the present embodiment stores such disease terms as multi-coding terms in a multi-coding term library, and combines, in order, the codes of the standard terms constituting a multi-coding term to form its code.

For example, the multi-coding term "mitral stenosis with atrial fibrillation with left atrial thrombus" is composed of the standard terms "mitral stenosis", "atrial fibrillation" and "atrial thrombus". The ICD code of "mitral stenosis" is I05.000, the ICD code of "atrial fibrillation" is I487.x01, and the ICD code of "atrial thrombus" is I51.302, so the ICD code of "mitral stenosis with atrial fibrillation with left atrial thrombus" is I05.0I487.x01I51.302.

Step D3, storing the multi-coding terms and their codes to obtain the multi-coding term library.

Optionally, the multi-coding term library may store the multi-coding terms and their codes in the form of a data table or a tree structure.

Optionally, the multi-coding term library may be revised in real time, for example by adding new multi-coding terms or deleting existing ones, so that it better conforms to the ICD coding requirements.

FIG. 4C shows a multi-coding term library of this embodiment in the form of a data table; the shaded portion in FIG. 4C is explanatory and may not appear in the actual multi-coding term library.
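Steps D2 and D3 can be sketched as follows, assuming the component standard terms of each multi-coding term are already known. The function name `build_multi_code` and the space separator are assumptions (the text only says the codes are combined in order); the three codes are copied from the example above purely as illustrative values.

```python
def build_multi_code(component_terms, standard_library, separator=" "):
    """Step D2: combine the codes of the component standard terms in order."""
    return separator.join(standard_library[term] for term in component_terms)


# Codes copied from the example in the text; treat them as illustrative only.
standard_library = {
    "mitral stenosis": "I05.000",
    "atrial fibrillation": "I487.x01",
    "atrial thrombus": "I51.302",
}

# Step D3: store the multi-coding term together with its combined code.
multi_library = {
    "mitral stenosis with atrial fibrillation with left atrial thrombus": build_multi_code(
        ["mitral stenosis", "atrial fibrillation", "atrial thrombus"],
        standard_library,
    )
}
```

Passing `separator=""` yields plain concatenation, if that is the storage format actually required.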

Optionally, when step S303 is implemented, the standard term, expanded term, or multi-coding term matching the name to be encoded may be searched for by traversing the standard term library, the expanded term library, and the multi-coding term library. Because traversing the term libraries may be time-consuming, the category to which the name to be encoded is likely to belong may optionally be determined first from its semantics, and a matching standard term, expanded term, or multi-coding term is then searched for only in the corresponding data table or tree structure.

In this embodiment, a multi-coding term library is added on top of the standard term library and the expanded term library, so that multi-coding terms appearing in Chinese disease diagnosis information are taken into account. The disease terms that may appear in Chinese disease diagnosis information are thus covered more fully, which provides a more complete basis for recognizing them automatically and facilitates automated ICD coding. The ICD coding method provided by this embodiment requires no manual participation and offers advantages such as high coding speed, low cost, and high accuracy.

Example three

Referring to fig. 5A to 5B, an ICD encoding method according to an embodiment of the invention is shown.

As shown in fig. 5A, the ICD encoding method may include:

Step S401, inputting Chinese disease diagnosis information.

Step S402, natural language processing is carried out on the Chinese disease diagnosis information to obtain one or more names to be coded.

Step S403, preprocessing, based on the merged term library, the one or more names to be coded obtained in step S402: determining whether the names to be coded include all merged objects of any merged term, and if so, replacing all merged objects of that merged term with the merged term.

In this embodiment, the merged term library is created according to the steps shown in fig. 5B:

Step E1, determining, as a merged term, a single standard term that can replace at least two different standard terms occurring simultaneously, and determining each of the at least two different standard terms as a merged object of the merged term.

Step E2, determining the code of each merged term according to the ICD version to be referred to.

Step E3, storing each merged term, its code, and all of its merged objects to obtain the merged term library.

Under the ICD, when certain disease terms occur simultaneously they may be replaced by a single other disease term, and the ICD specifies that only the code of that single disease term is output during coding. In the present embodiment, such a single disease term that can replace several simultaneously occurring disease terms is determined as a merged term, and each disease term that it replaces is determined as a merged object.

For example, if "gastric ulcer" and "upper gastrointestinal bleeding" appear simultaneously, they may be replaced by "gastric ulcer with bleeding", and only the code of "gastric ulcer with bleeding" is output during ICD coding.

In view of the above, after natural language processing is performed on the Chinese disease diagnosis information to obtain one or more names to be encoded, the present embodiment adds a step of preprocessing the names to be encoded: it checks whether the names to be encoded contain merged objects that can be replaced, and if they contain all merged objects corresponding to a certain merged term, it replaces all of those merged objects with the merged term.

Optionally, the merged term library may store the merged terms and their codes in the form of a data table or a tree structure.

Optionally, the merged term library may also be revised in real time; for example, when an updated version of the referenced ICD version is released, merged terms are added, modified, or deleted according to the updated version, so that the merged term library better conforms to the ICD coding requirements.

FIG. 5C shows a merged term library of this embodiment in the form of a data table; the shaded portion in FIG. 5C is explanatory and may not appear in the actual merged term library.
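The preprocessing of step S403 can be sketched as below. This is a minimal illustration that assumes the merged term library maps each merged term to the set of its merged objects; the function name `apply_merged_terms` and the choice to place the merged term where its first merged object appeared are assumptions.

```python
def apply_merged_terms(names, merged_library):
    """Replace every complete set of merged objects with its merged term."""
    result = list(names)
    for merged_term, merged_objects in merged_library.items():
        if all(obj in result for obj in merged_objects):
            # Remember where the first merged object appeared.
            first = min(result.index(obj) for obj in merged_objects)
            kept = [name for name in result if name not in merged_objects]
            # Items before `first` are never merged objects, so the index is still valid.
            kept.insert(first, merged_term)
            result = kept
    return result


# Example taken from the text: two merged objects and their merged term.
merged_library = {
    "gastric ulcer with bleeding": {"gastric ulcer", "upper gastrointestinal bleeding"},
}
```

A merged term is substituted only when all of its merged objects are present, so a lone "gastric ulcer" is left unchanged.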

Step S404, based on the created standard term library, extended term library, assumed classification term library and multi-coding term library, searching for the standard term, extended term, assumed classification term or multi-coding term matching the name to be coded as preprocessed in step S403, and determining the code of the successfully matched standard term, extended term, assumed classification term or multi-coding term as the code of the name to be coded.

In the present embodiment, the standard term library and the expanded term library are created by the same method as the exemplary method, the assumed classification term library is created by the same method as the first embodiment, and the multi-coding term library is created by the same method as the second embodiment, which are not repeated herein.

Optionally, when step S404 is implemented, the standard term, expanded term, assumed classification term, or multi-coding term matching the name to be coded may be searched for by traversing the standard term library, the expanded term library, the assumed classification term library, and the multi-coding term library. Because traversing the term libraries may be time-consuming, the category to which the name to be encoded is likely to belong may optionally be determined first from its semantics, and a matching standard term, expanded term, assumed classification term, or multi-coding term is then searched for only in the corresponding data table or tree structure.

In this embodiment, a merged term library is added on top of the standard term library and the expanded term library, so that merged terms appearing in Chinese disease diagnosis information are taken into account. The disease terms that may appear in Chinese disease diagnosis information are thus covered more fully, which provides a more complete basis for recognizing them automatically and facilitates automated ICD coding. The ICD coding method provided by this embodiment requires no manual participation and offers advantages such as high coding speed, low cost, and high accuracy.

Example four

Referring to fig. 6A, an ICD encoding method according to an embodiment of the present invention is shown.

As shown in fig. 6A, the ICD encoding method may include:

Step S501, Chinese disease diagnosis information is input.

Step S502, natural language processing is carried out on the Chinese disease diagnosis information to obtain one or more names to be coded.

Step S503, based on the merged term library, pre-processing the one or more names to be coded obtained in step S502, determining whether all merged objects of any one or more merged terms are included in the one or more names to be coded, and if so, replacing all merged objects of any one or more merged terms with corresponding merged terms.

Step S504, based on the standard term library, the expanded term library, the assumed classification term library and the multi-coding term library, searching for the standard term, expanded term, assumed classification term or multi-coding term matching each name to be coded, and determining the code of the successfully matched term as the code of the name to be coded; and determining any name to be coded for which no matching standard term, expanded term, assumed classification term or multi-coding term is found as a name to be coded with an undetermined code.

Step S505, matching the name to be coded with an undetermined code against the non-coding terms in the non-coding term library. If the matching succeeds, a preset processing step is executed to indicate that no code is assigned to the name (for example, outputting an empty result, or displaying a message such as "no code can be assigned"); if the matching fails, the name is sent to a manual processing platform for manual processing.

In this embodiment, the non-coding term library includes a plurality of non-coding terms. These non-coding terms include: preset traditional Chinese medicine terms; preset surgical operation terms; preset drug name terms; preset medical consumable terms; and preset examination and test terms.

FIG. 6B shows a non-coding term library of the present embodiment in the form of a data table; the shaded portion in FIG. 6B is explanatory and may not appear in the actual non-coding term library.

Actual Chinese disease diagnosis information often involves various concepts in the medical field: not only disease terms but also surgical operation terms, drug name terms, medical consumable terms, examination and test terms, and so on. The present invention, however, concerns only the classification and coding of diseases, and the ICD version does not classify or code surgical operation terms, drug name terms, medical consumable terms, or examination and test terms. Therefore, if such terms appear in Chinese disease diagnosis information, they are not coded (that is, no code can be assigned). Likewise, the ICD version does not classify or code traditional Chinese medicine terms, so if traditional Chinese medicine terms appear in Chinese disease diagnosis information, they are not coded either.

For such non-coding terms, a preset result (for example, "no code can be assigned") may be output to indicate that the term has been recognized as a surgical operation term, a drug name term, a medical consumable term, an examination and test term, or a traditional Chinese medicine term, but that no ICD code can be assigned to it.

In this embodiment, for a name to be coded for which no matching standard term, expanded term, assumed classification term, or multi-coding term is found: if a matching non-coding term can be found, the name belongs to one of the surgical operation terms, drug name terms, medical consumable terms, examination and test terms, or traditional Chinese medicine terms, and it is not coded; if no matching non-coding term can be found, the name does not belong to any of these types, and this embodiment sends it to a manual processing platform for further manual processing.
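The decision flow of steps S504 and S505 can be sketched as follows. The function name `resolve`, the display text held in `NO_CODE`, and the use of a plain list as the manual processing queue are assumptions made for illustration; the libraries are passed in as ordinary dictionaries and a set.

```python
NO_CODE = "no code can be assigned"  # assumed wording of the preset result


def resolve(name, coded_libraries, non_coding_terms, manual_queue):
    """Return an ICD code, NO_CODE, or None when the name is sent for manual processing."""
    # Step S504: look the name up in every coded library.
    for table in coded_libraries:
        if name in table:
            return table[name]
    # Step S505: a recognized non-coding term gets the preset result.
    if name in non_coding_terms:
        return NO_CODE
    # Otherwise the name is handed to the manual processing platform.
    manual_queue.append(name)
    return None
```

Returning `None` only for names that were queued lets the caller distinguish "recognized but not codable" from "not recognized at all".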

Example five

As shown in fig. 12A, a specific implementation of performing natural language processing on the Chinese disease diagnosis information to obtain the names to be encoded, applicable to the exemplary method of the present invention, includes:

Step S61, preprocessing the Chinese disease diagnosis information character string to obtain a preprocessed Chinese disease diagnosis information character string.

The purpose of this step is to convert the characters in the Chinese disease diagnosis information character string into a uniform format for subsequent processing.

Optionally, this step may be implemented as follows: performing format normalization on the non-Chinese characters in the Chinese disease diagnosis information character string (for example, converting all symbols in the character string into half-width format or into full-width format, and converting all English letters in the character string into uppercase or into lowercase); and deleting the non-medical terms in the character string. The non-medical terms are provided by a pre-established dictionary of non-medical terms and are words or descriptive phrases that serve as remarks (for example, "to be investigated", "cause", "kind reminder", "advice", and "seek medical attention at any time if the condition worsens").
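The normalization in step S61 can be sketched as follows, assuming the half-width and uppercase options are chosen. The function names `to_half_width` and `preprocess` are hypothetical, and the conversion relies on the fixed offset between the Unicode full-width forms (U+FF01 to U+FF5E) and their ASCII counterparts.

```python
def to_half_width(text):
    """Convert full-width ASCII variants and the ideographic space to half-width."""
    converted = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                # ideographic space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:    # full-width '!' through '~'
            code -= 0xFEE0
        converted.append(chr(code))
    return "".join(converted)


def preprocess(text, non_medical_terms):
    """Normalize symbols and letter case, then delete remark-like non-medical terms."""
    text = to_half_width(text).upper()
    # Delete longer terms first so that a shorter term cannot split them.
    for term in sorted(non_medical_terms, key=len, reverse=True):
        text = text.replace(term, "")
    return text
```

For instance, with the hypothetical remark term "待查" (to be investigated), `preprocess("ａ型胸腺瘤；冠心病待查", ["待查"])` yields a string with a half-width "A", a half-width semicolon, and the remark removed.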

Step S62, based on a pre-established ontology dictionary, disease degree term dictionary, disease complication term dictionary, and onset part term dictionary, segmenting the preprocessed Chinese disease diagnosis information character string into first-type substrings and/or second-type substrings.

Both the first-type substring and the second-type substring have independent semantics, that is, the medical information they represent is not affected by the characters before or after them. A first-type substring can be directly matched with an ontology in the ontology dictionary, whereas a second-type substring cannot.

The ontology dictionary comprises the standard term library and the extended term library, and specifically comprises standard terms and extended terms and corresponding codes, wherein the standard terms and the extended terms are regarded as ontologies in the ontology dictionary.

It should be noted that, when the above-mentioned assumed classification term library and/or multi-coding term library is used in the automated international disease classification coding method provided by the present invention, the ontology dictionary should also include the assumed classification term library and/or multi-coding term library (in this case, the assumed classification term and/or multi-coding term is also regarded as an ontology in the ontology dictionary), so that the cut-out first type substring or second type substring can be matched with the assumed classification term or multi-coding term when being used as a name to be coded.

The disease degree term dictionary includes a number of disease degree terms, which are words describing how acute or chronic a disease is, its severity, its pathological type, its clinical stage, and the like. Fig. 12B shows some of the disease degree terms included in the disease degree term dictionary.

The disease complication term dictionary includes a number of disease complication terms, which are words describing the concurrent occurrence of at least two diseases. Fig. 12C shows some of the disease complication terms included in the disease complication term dictionary.

The onset part term dictionary includes a number of onset part terms, which are words describing the part of the body where a disease occurs. FIG. 12D shows some of the onset part terms included in the onset part term dictionary.

The purpose of this step is to segment the Chinese disease diagnosis information into substrings with independent semantics (first-type substrings or second-type substrings), which effectively avoids recognition errors caused by recognizing related characters separately.

Step S63, determining the first-type substrings and second-type substrings that have been cut out as the names to be coded.

After the first-type substrings and second-type substrings are determined as the names to be coded, the names may subsequently be preprocessed using the merged term library of Example three. Because the ontologies corresponding to the first-type and second-type substrings may be expanded terms, whereas the merged objects in the merged term library are standard terms, the expanded terms corresponding to the substrings need to be converted into the corresponding standard terms before the merged term library is used for preprocessing.

As shown in fig. 12E, step S62 specifically includes:

Step S70, judging whether the character string of the preprocessed Chinese disease diagnosis information contains a symbol; if the symbol is contained, performing step S71; if no symbol is contained, step S72 is performed.

Step S71, matching the characters between every two adjacent symbols in the character string of the preprocessed Chinese disease diagnosis information with an ontology in an ontology dictionary as a whole; if the matching is successful, go to step S711; if the matching fails, step S712 is executed.

In step S711, the characters between the two adjacent symbols are cut out as the first type substring.

In step S712, the two adjacent symbols and the characters between them are determined as a temporarily unsegmented character string, and then step S73 is performed.

The processing rule of steps S71, S711, and S712 is: all characters between adjacent symbols are matched with the ontologies as a whole; they are cut out only when the match succeeds, and are otherwise left temporarily unsegmented.

For example, FIG. 12F shows the segmentation of "severe arthritis, with hematocele; type A thymoma; coronary heart disease", in which "severe arthritis, with hematocele", "type A thymoma", and "coronary heart disease" are each the characters between symbols and each has a matching ontology, so they are cut out separately.
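The rule of steps S71, S711, and S712 can be sketched as below. The sketch assumes that the semicolon is the only delimiting symbol, which is consistent with the FIG. 12F example as worded above but is not stated by the text, and it drops the delimiters from the deferred pieces for simplicity. The name `split_on_symbols` and the ontology set are hypothetical.

```python
import re

SYMBOLS = ";"  # assumed delimiter set after half-width normalization


def split_on_symbols(text, ontology):
    """Match each symbol-delimited piece as a whole against the ontology."""
    first_type, deferred = [], []
    for piece in re.split("[" + re.escape(SYMBOLS) + "]", text):
        piece = piece.strip()
        if not piece:
            continue
        if piece in ontology:
            first_type.append(piece)   # S711: cut out as a first-type substring
        else:
            deferred.append(piece)     # S712: temporarily unsegmented, handled in S73
    return first_type, deferred


# Hypothetical ontology containing the three FIG. 12F pieces.
ontology = {
    "severe arthritis, with hematocele",
    "type A thymoma",
    "coronary heart disease",
}
```

Pieces that land in `deferred` are the ones that would go on to the special-symbol check of step S73.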

Step S72, matching the preprocessed Chinese disease diagnosis information character string with the ontologies in the ontology dictionary by a mechanical word segmentation method; if all the characters in the character string can be matched with ontologies, executing step S721; if a single character or a run of consecutive characters in the character string cannot be matched with any ontology, executing step S722.

Step S721, cutting out the characters in the preprocessed Chinese disease diagnosis information character string according to the matched ontologies, as first-type substrings.

Step S722, judging whether the single character or run of consecutive characters that cannot be matched with any ontology is a disease degree term, a disease complication term, or an onset part term; if so, executing step S7221; if not, executing step S7222.

The processing rule of steps S72, S721, and S722 is: the characters in the preprocessed Chinese disease diagnosis information character string are matched with the ontologies by a mechanical word segmentation method; they are cut out only when a matching ontology is found for every character, and are otherwise left temporarily unsegmented.

For example, fig. 12G shows the segmentation of "hypertensive coronary heart disease": ontologies matching "hypertensive" and "coronary heart disease" can be found by mechanical word segmentation, so the two are cut out separately.

The mechanical word segmentation method used in step S72 may be forward maximum matching, reverse maximum matching, or minimum segmentation. The specific segmentation process is not described in detail in this embodiment.
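Since the embodiment leaves the segmentation process unspecified, the following is only a generic sketch of forward maximum matching, one of the three options named above. The function name `forward_max_match`, the `(substring, matched)` output format, and the Chinese sample strings (illustrative renderings of the figure examples) are assumptions.

```python
def forward_max_match(text, ontology):
    """Greedy forward maximum matching; unmatched characters are grouped into runs."""
    max_len = max((len(term) for term in ontology), default=1)
    pieces = []  # list of (substring, matched) pairs in text order
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in ontology:
                pieces.append((candidate, True))
                i += size
                break
        else:
            # No ontology starts here: extend the current run of unmatched characters.
            if pieces and not pieces[-1][1]:
                pieces[-1] = (pieces[-1][0] + text[i], False)
            else:
                pieces.append((text[i], False))
            i += 1
    return pieces
```

If every pair is flagged `True`, the string falls under step S721; any `False` run is what step S722 goes on to classify.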

Step S7221, according to the position, in the preprocessed Chinese disease diagnosis information character string, of the single character or run of consecutive characters that cannot be matched with any ontology, combining it with the matched single character or run of consecutive characters before or after it and cutting the combination out as a second-type substring, and cutting out the remaining matched single characters or runs of consecutive characters as first-type substrings.

Step S7222, cutting out the preprocessed Chinese disease diagnosis information character string as a whole as a second-type substring.

The processing rule of steps S7221 and S7222 is: if the single character or run of consecutive characters that cannot be matched with any ontology is a disease degree term, a disease complication term, or an onset part term, it is combined with the characters before or after it and cut out together with them; otherwise, the character string is cut out as a whole.

For example, fig. 12H shows the segmentation of "prostatic hyperplasia with acute urinary retention diabetes". Ontologies matching "prostatic hyperplasia", "acute urinary retention", and "diabetes" can be found by mechanical word segmentation, and "with" is a disease complication term, so "prostatic hyperplasia" and "acute urinary retention" are combined with it and cut out together, while "diabetes" is cut out separately.

For example, fig. 12I shows the segmentation of "prostatic hyperplasia acute renal anemia". Ontologies matching "prostatic hyperplasia" and "renal anemia" can be found by mechanical word segmentation, and "acute" is a disease degree term, so "prostatic hyperplasia" is cut out separately, while "acute" and "renal anemia" are combined and cut out together.

For example, fig. 12J shows the segmentation of "subacute bronchitis with prostatic hyperplasia". Ontologies matching "bronchitis" and "prostatic hyperplasia" can be found by mechanical word segmentation; "subacute" is a disease degree term located at the beginning of the preprocessed Chinese disease diagnosis information character string, so "subacute" and "bronchitis" are combined and cut out together, while "prostatic hyperplasia" is cut out separately.

For example, fig. 12K shows the segmentation of "bronchitis and advanced prostate cancer". "Advanced stage" is a disease degree term located at the end of the preprocessed Chinese disease diagnosis information character string, so "bronchitis" is cut out separately, while "prostate cancer" and "advanced stage" are combined and cut out together.
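The combining rule of steps S7221 and S7222 can be sketched as follows. The text only says that an unmatched term is combined with the characters "before or after" it according to its position, so the attachment rule used here is an assumption inferred from the four figure examples: a complication term joins both neighbours, while a degree or onset part term joins the following ontology unless it is last, in which case it joins the preceding one. The function name, the input format of `(substring, matched)` pairs, and the Chinese sample strings are also assumptions.

```python
def merge_modifiers(pieces, complication_terms, other_modifiers):
    """pieces: (substring, matched) pairs in text order; term arguments are sets.

    Returns (first_type, second_type) lists of substrings.
    """
    modifiers = complication_terms | other_modifiers
    if any(not matched and text not in modifiers for text, matched in pieces):
        # S7222: an unmatched run is not a known term, so keep the string whole.
        return [], ["".join(text for text, _ in pieces)]

    groups = []          # each group is [text, is_second_type]
    attach_next = False  # True when the next ontology must join the last group
    for index, (text, matched) in enumerate(pieces):
        if matched:
            if attach_next:
                groups[-1][0] += text
                attach_next = False
            else:
                groups.append([text, False])
            continue
        is_last = index == len(pieces) - 1
        if text in complication_terms or is_last:
            # Join the preceding group; a complication term also pulls in the next ontology.
            if groups:
                groups[-1][0] += text
                groups[-1][1] = True
            else:
                groups.append([text, True])
            attach_next = not is_last
        else:
            # A degree or onset part term starts a group that absorbs the next ontology.
            groups.append([text, True])
            attach_next = True
    first = [text for text, second in groups if not second]
    second = [text for text, second in groups if second]
    return first, second
```

With "伴" (with) as the complication term, the FIG. 12H input yields one combined second-type substring and "糖尿病" (diabetes) as a first-type substring.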

Step S73, judging whether the character string which is not cut temporarily contains a preset special symbol; if the string of characters contains the special symbol, go to step S731; if the temporary non-divided character string does not contain the special symbol, step S733 is performed.

Step S731, searching the character model to which the temporarily unsingulated character string belongs, and segmenting the temporarily unsingulated character string according to the segmentation rule corresponding to the character model to which the temporarily unsingulated character string belongs; the character model is provided by a pre-established character model library, and the character model has a one-to-one correspondence segmentation rule.

Step 332, matching the cut characters with an ontology in an ontology dictionary, if the matching is successful, determining the cut characters as first-type substrings, and if the matching is failed, determining the cut characters as second-type substrings;

in step S733, the non-split character string is directly determined as the second-type substring.

The processing rules according to step S73, step S731, step 332, and step S733 are: when the temporarily unsingulated character string contains a preset special symbol, segmenting according to a character model to which the temporarily unsingulated character string belongs, otherwise, directly segmenting; and matching the character cut out based on the character model with the body again, wherein the character which can be directly matched with the body is used as a first type substring, and the character which can not be directly matched with the body is used as a second type substring.

For example, the preset special symbols may include, but are not limited to, commas, enumeration commas (pause marks), periods, colons, plus signs, semicolons, slashes, and the like.

For example, the following are part of the character models in the character model library and the segmentation rules thereof:

(1) character model: XABY type, where A is a number and B is a comma, an enumeration comma, or a period;

segmentation rule: cut out X and Y separately;

(2) character model: CDE type, where one of C and E is Chinese characters and D is a colon;

segmentation rule: cut out whichever of C and E is Chinese characters;

(3) character model: FGH type, where F and H are both Chinese characters and G is a plus sign;

segmentation rule: cut out FGH as a whole;

(4) character model: IJK type, where I and K are both Chinese characters and J is a semicolon, period, question mark, or exclamation mark;

segmentation rule: cut out I and K separately;

(5) character model: LOP type, where L and P are both Chinese characters and O is a colon;

segmentation rule: cut out LOP as a whole;

(6) character model: STU type, where S and/or U is a single Chinese character and T is a slash;

segmentation rule: cut out STU as a whole.

For example, when "abdominal pain: ?" is segmented, the character model library shows that it belongs to the CDE type, so "abdominal pain" is cut out separately.

For example, when "congenital heart disease: ventricular septal defect" is segmented, the character model library shows that it belongs to the LOP type, so "congenital heart disease: ventricular septal defect" is segmented according to the segmentation rule of that type.

For example, when "Mycoplasma/Chlamydia infection" is segmented, the character model library shows that it belongs to the STU type, so "Mycoplasma/Chlamydia infection" is cut out as a whole.

For example, when "abdominal pain; prostatitis" is segmented, the character model library shows that it belongs to the IJK type, so it is segmented into "abdominal pain" and "prostatitis".

For example, when "1, cervical spondylosis 2, lumbar disc herniation 3, pregnancy 24+3 weeks 4, uterine prolapse, II degree; 5. Mycoplasma/Chlamydia infection" is segmented, the character model library shows that the character string involves several character models, and the characters finally cut out are "cervical spondylosis", "lumbar disc herniation", "pregnancy 24+3 weeks", "uterine prolapse, II degree" and "Mycoplasma/Chlamydia infection". The cut-out characters are then matched with the ontologies: "cervical spondylosis" and "lumbar disc herniation" can be directly matched with an ontology and are taken as first-type substrings, whereas "pregnancy 24+3 weeks", "uterine prolapse, II degree" and "Mycoplasma/Chlamydia infection" cannot be directly matched with an ontology and are taken as second-type substrings.
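A few of the character models listed above can be sketched with regular expressions. The sketch below is only an illustration under stated assumptions: the function name `segment_by_model`, the order in which models are tried, and the simplified tests (for instance, STU is recognized by any slash with Chinese characters on either side, and the ASCII period is left out of IJK to avoid clashing with decimals) are introduced here and are not taken from the character model library of the embodiment. Only the CDE, IJK, LOP and STU models are covered; XABY and FGH are omitted for brevity.

```python
import re

HAN = re.compile(r"[\u4e00-\u9fff]")  # basic CJK ideograph range


def has_han(text):
    return bool(HAN.search(text))


def segment_by_model(s):
    """Return (model_name, pieces) for a temporarily unsegmented string."""
    # IJK: Chinese text on both sides of a semicolon, full-width period,
    # question mark or exclamation mark -> cut out both sides separately.
    m = re.fullmatch(r"(.+?)[;；。?？!！](.+)", s)
    if m and has_han(m.group(1)) and has_han(m.group(2)):
        return "IJK", [m.group(1).strip(), m.group(2).strip()]
    # Colon models: LOP (Chinese on both sides) or CDE (Chinese on one side).
    m = re.fullmatch(r"(.*?)[:：](.*)", s)
    if m:
        left, right = m.group(1).strip(), m.group(2).strip()
        if has_han(left) and has_han(right):
            return "LOP", [s]  # cut out as a whole
        if has_han(left) or has_han(right):
            return "CDE", [left if has_han(left) else right]
    # STU (simplified): a slash joining Chinese text -> cut out as a whole.
    m = re.fullmatch(r"(.+)[/／](.+)", s)
    if m and (has_han(m.group(1)) or has_han(m.group(2))):
        return "STU", [s]
    return "unknown", [s]
```

The example strings used to exercise this sketch are the Chinese forms of the examples above, such as "腹痛：?" (abdominal pain: ?) and "腹痛；前列腺炎" (abdominal pain; prostatitis).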

In the natural language processing of the Chinese disease diagnosis information, the method takes full account of the fact that such information is natural language with complex, varied formats and no unified standard, and it segments and matches the Chinese disease diagnosis information character string using several pre-established dictionaries, so as to identify the disease diagnosis names as the names to be coded.

EXAMPLE six

As shown in fig. 13, a specific embodiment of finding a standard term or an extended term matching a name to be encoded, which is applicable to the exemplary method of the present invention, includes:

step S80, if the name to be coded is a first type substring, determining the ontology matched with the first type substring as a standard term or an extended term matched with the name to be coded, and if the name to be coded is a second type substring, performing first-dimension analysis on the second type substring and each ontology in an ontology dictionary to obtain a plurality of first-dimension analysis results of the second type substring and a plurality of first-dimension analysis results of each ontology;

this step uses the substring of the second type and the ontology as the parsing object, optionally, the parsing of the parsing object in the first dimension may include but is not limited to:

(1) determining the letters of the beginning part in the analysis object, wherein if the beginning part is not the letters, the analysis result is null;

(2) determining the disease degree term contained in the analysis object, if the disease degree term is not contained in the analysis object, the analysis result is null;

(3) determining characters after commas in the analysis object, and if the characters do not contain commas, determining that the analysis result is null;

(4) determining characters in parentheses in the analysis object, and if the analysis object does not contain parentheses, the analysis result is null; and

(5) determining the characters in the analysis object other than the letters of the beginning part, the disease degree term, the characters after the comma, and the characters in parentheses (hereinafter referred to as the remaining characters), which are generally the core word stem of the analysis object.

When the analysis object is a second-type substring, its first-dimension analysis results may include, but are not limited to: the letters at the beginning of the second-type substring, the disease degree term contained in the second-type substring, the characters after the comma in the second-type substring, the characters in parentheses in the second-type substring, and the remaining characters.

When the analysis object is an ontology, its first-dimension analysis results may include, but are not limited to: the letters at the beginning of the ontology, the disease degree term contained in the ontology, the characters after the comma in the ontology, the characters in parentheses in the ontology, and the remaining characters.
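The five first-dimension analysis items can be sketched as a single parsing function. This is a hedged illustration, not the parser of the embodiment: the names `FirstDim`, `parse_first_dim` and `DEGREE_TERMS` are assumptions, the degree-term list is a three-entry stand-in for the disease degree term dictionary, and a null result is represented by an empty string or empty tuple.

```python
import re
from typing import NamedTuple

# Hypothetical disease degree terms: chronic, acute, advanced stage.
DEGREE_TERMS = ("慢性", "急性", "晚期")


class FirstDim(NamedTuple):
    leading_letters: str
    degree_terms: tuple
    after_comma: str
    in_parens: str
    remaining: str


def parse_first_dim(text, degree_terms=DEGREE_TERMS):
    # (4) characters inside the first pair of parentheses, if any
    m = re.search(r"[(（]([^)）]*)[)）]", text)
    in_parens = m.group(1) if m else ""
    work = text[:m.start()] + text[m.end():] if m else text
    # (3) characters after the first comma, if any
    head, _, after_comma = work.replace("，", ",").partition(",")
    # (1) letters at the very beginning
    lm = re.match(r"[A-Za-z]+", head)
    leading = lm.group(0) if lm else ""
    rest = head[len(leading):]
    # (2) disease degree terms contained in what is left
    found = tuple(t for t in degree_terms if t in rest)
    # (5) remaining characters, usually the core word stem
    for t in found:
        rest = rest.replace(t, "")
    return FirstDim(leading, found, after_comma, in_parens, rest)
```

For instance, under these assumptions "慢性肾炎" (chronic nephritis) yields the degree term "慢性" and the remaining characters "肾炎", while all other fields are empty.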

Step S81, matching the first-dimension analysis results of the second-type substring with the first-dimension analysis results of each ontology in the ontology dictionary, and searching for an ontology all of whose first-dimension analysis results match those of the second-type substring; if such an ontology exists, step S82 is performed, and if no such ontology exists, step S83 is performed.

Step S82, determining the found ontology as the ontology matched with the second-type substring.

Step S83, selecting some of the first-dimension analysis results of the second-type substring and matching them with the corresponding first-dimension analysis results of each ontology in the ontology dictionary, and searching for an ontology whose selected first-dimension analysis results match those of the second-type substring; if such an ontology exists, step S831 is performed; if no such ontology exists, step S832 is performed.

In step S831, the searched ontology is determined as the ontology matched with the second type substring.

Specifically, the letters at the beginning of the second-type substring are matched with the letters at the beginning of the ontology, the disease degree terms contained in the second-type substring are matched with the disease degree terms contained in the ontology, the characters after the comma in the second-type substring are matched with the characters after the comma in the ontology, the characters in parentheses in the second-type substring are matched with the characters in parentheses in the ontology, and the remaining characters in the second-type substring are matched with the remaining characters in the ontology.

If all the first-dimension analysis results match, the ontology is determined as the ontology matched with the second-type substring.

If some first-dimension analysis results do not match, a part of the first-dimension analysis results is selected and matched item by item.

Since the remaining characters in the second-type substring are often its core word stem, in an embodiment the selected part of the first-dimension analysis results preferably includes at least the remaining characters in the second-type substring and the remaining characters in the ontology. For example, only the remaining characters and the disease degree term of the analysis object are selected and matched, or only the remaining characters are selected and matched, or the remaining characters together with one or more of the initial letters, the disease degree term, the characters after the comma, and the characters in parentheses are selected and matched.
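The full-then-partial matching of steps S81 to S831 can be sketched as two passes over the parsed ontologies. The sketch assumes each first-dimension result is held in a dictionary keyed by the hypothetical field names in `FIELDS`; the function name `find_ontology` and the default choice of matching only on the remaining characters in the partial pass are likewise assumptions made for illustration.

```python
FIELDS = ("leading_letters", "degree_terms", "after_comma", "in_parens", "remaining")


def find_ontology(query, parsed_ontologies, partial_fields=("remaining",)):
    """Return (ontology_name, match_kind), or (None, "none") if nothing matches."""
    # Steps S81/S82: every first-dimension result must agree.
    for name, parsed in parsed_ontologies.items():
        if all(parsed[f] == query[f] for f in FIELDS):
            return name, "full"
    # Steps S83/S831: only the selected subset of results must agree.
    for name, parsed in parsed_ontologies.items():
        if all(parsed[f] == query[f] for f in partial_fields):
            return name, "partial"
    # No match: the caller would continue with step S832.
    return None, "none"
```

Keeping `partial_fields` as a parameter reflects that the selected subset may vary, provided it contains the remaining characters.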

For example, a second-type substring is "mucopolysaccharide storage disorder type 4", and the first-dimension analysis is performed on the second-type substring, and the obtained analysis results are shown in table 1, and table 2 shows the ontology matched with the second-type substring and the respective first-dimension analysis results thereof.

TABLE 1

TABLE 2

Step S832, performing second-dimension analysis on the second-type substring and each ontology in the ontology dictionary to obtain second-dimension analysis results of the second-type substring and each second-dimension analysis result of each ontology in the ontology dictionary.

In this step, the second-type substring and the ontology are each used as analysis objects; optionally, performing the second-dimension analysis on an analysis object may include, but is not limited to:

(1) determining each Chinese character in the analysis object;

(2) determining the initial consonant of each Chinese character in the analysis object;

(3) determining the final (vowel part) of each Chinese character in the analysis object;

(4) determining a first character of the analysis object;

(5) determining the pinyin of the first character of the analysis object; and

(6) determining the non-Chinese characters in the analysis object, and if the analysis object contains no non-Chinese characters, the analysis result of this item is null.

When the analysis object is a second-type substring, its second-dimension analysis results may include, but are not limited to: each Chinese character in the second-type substring, the initial consonant of each Chinese character in the second-type substring, the final of each Chinese character in the second-type substring, the first character of the second-type substring, the pinyin of the first character of the second-type substring, and the non-Chinese characters in the second-type substring.

When the analysis object is an ontology, its second-dimension analysis results may include, but are not limited to: each Chinese character in the ontology, the initial consonant of each Chinese character in the ontology, the final of each Chinese character in the ontology, the first character of the ontology, the pinyin of the first character of the ontology, and the non-Chinese characters in the ontology.
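The six second-dimension analysis items can be sketched as follows. This is a minimal illustration under several assumptions: `PINYIN` is a hypothetical four-entry lookup table standing in for a full pinyin dictionary, "y" and "w" are treated as initials for simplicity, and the names `split_pinyin`, `is_han` and `parse_second_dim` are introduced here only for illustration.

```python
# Hypothetical mini pinyin table; a real system would need a full dictionary.
PINYIN = {"高": "gao", "血": "xue", "压": "ya", "级": "ji"}
# Two-letter initials come first so that "zh" is not read as "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_pinyin(syllable):
    """Split a pinyin syllable into (initial, final)."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable  # zero-initial syllable such as "an"


def is_han(ch):
    return "\u4e00" <= ch <= "\u9fff"


def parse_second_dim(text):
    han = [ch for ch in text if is_han(ch)]
    pairs = [split_pinyin(PINYIN.get(ch, "")) for ch in han]
    first = text[0] if text else ""
    return {
        "chars": han,                              # (1) each Chinese character
        "initials": [p[0] for p in pairs],         # (2) initial of each character
        "finals": [p[1] for p in pairs],           # (3) final of each character
        "first_char": first,                       # (4) first character
        "first_pinyin": PINYIN.get(first, ""),     # (5) pinyin of first character
        "non_chinese": "".join(ch for ch in text if not is_han(ch)),  # (6)
    }
```

Storing initials and finals separately allows a substring containing a homophone or near-homophone typo to still match an ontology in some dimensions.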

For example, table 3 shows the second dimension analysis results of the second type substring "hypertension".

TABLE 3

Step S833, based on the plurality of second dimension analysis results of the second type substring and the plurality of second dimension analysis results of the ontology, calculating a matching degree between the second type substring and each ontology.

Specifically, this step may calculate either the similarity or the total confidence between the second-type substring and each ontology. Compared with the similarity, the total confidence reflects the matching degree between the second-type substring and each ontology more accurately, but its calculation is more complex. Therefore, when implementing step S833, the similarity calculation may be chosen if faster processing is required, and the total confidence calculation may be chosen if a more accurate matching result is required.

One implementation of step S833 is to calculate the similarity between the second-type substring and each ontology according to the following formula, and to determine the calculated similarity as the matching degree between the second-type substring and each ontology:

wherein M represents similarity;

t represents each second dimension analysis result of the second type substring;

q represents a second type substring;

t in q represents each second dimension of the second type substring;

d represents an ontology;

tf(t in d) represents the frequency with which a second-dimension analysis result of the second-type substring matches the second-dimension analysis result of the ontology in the same second dimension;

wherein T represents the total number of ontologies in the ontology dictionary, and T(t) represents the number of ontologies whose second-dimension analysis results match the second-dimension analysis results of the second-type substring;

getboost() represents a preset weight of each second dimension;

norm(t, d) represents the length normalization factor of the ontology.
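The similarity formula itself is not reproduced in this text. The symbol names tf(t in d), getboost() and norm(t, d) coincide with those of the classic TF-IDF scoring formula of Apache Lucene, so one plausible reading, offered purely as an assumption and leaving out any coordination or query normalization factors, is the following. Here idf(t) is a name introduced for illustration, and T(t) denotes the count of matching ontologies defined above:

```latex
M = \sum_{t \,\in\, q} \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot \mathrm{getboost}(t)\cdot \mathrm{norm}(t, d),
\qquad
\mathrm{idf}(t) = 1 + \ln\!\left(\frac{T}{T(t) + 1}\right)
```

Under this reading, a second-dimension result that matches few ontologies contributes more to M than one that matches many, and longer ontologies are penalized through norm(t, d).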

One implementation of step S833 is to calculate the total confidence of the substring of the second type and each ontology, which is specifically as follows:

calculating the total confidence of the second-type substrings and each ontology according to the following process, and determining the calculated total confidence as the matching degree of the second-type substrings and each ontology:

1) each Chinese character in the second type sub-string is determined.

2) Calculating the cosine confidence of each body matched with the second type substring according to the following formula:

wherein N represents a cosine confidence;

v represents the total number of Chinese characters contained in the second-type substring and the matched ontology;

Q represents the second-type substring;

d' represents an ontology matched with the second-type substring;

w_{Q,j} represents the frequency with which each Chinese character occurs in the second-type substring;

w_{d',j} represents the frequency with which each Chinese character occurs in the ontology matched with the second-type substring;

j represents the serial number of a Chinese character contained in the second-type substring and the matched ontology.

3) Calculating the total confidence of each ontology with which the second type substring is matched according to the following formula:

S＝M×a+N×b

wherein S represents the total confidence;

M represents the similarity;

a represents a preset weight corresponding to the similarity M;

b represents a preset weight corresponding to the cosine confidence coefficient N;

and, the similarity M is calculated according to the following formula:

wherein t represents each second dimension analysis result of the second type substring;

q represents a second type substring;

t in q represents each second dimension of the second type substring;

d represents an ontology;

getboost() represents a preset weight of each second dimension;

norm(t, d) represents the length normalization factor of the ontology.
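The cosine confidence and the total confidence can be sketched as follows. The cosine formula is not reproduced in this text, so the sketch assumes the standard cosine measure over character-frequency vectors (the dot product of the two vectors divided by the product of their Euclidean norms), which is consistent with the definitions of v, w_{Q,j} and w_{d',j}. The function names and the default weights a = b = 0.5 are hypothetical, and the similarity M is taken as an input rather than computed.

```python
import math
from collections import Counter


def is_han(ch):
    return "\u4e00" <= ch <= "\u9fff"


def cosine_confidence(substring, ontology):
    """Cosine of the Chinese-character frequency vectors of two strings."""
    wq = Counter(ch for ch in substring if is_han(ch))
    wd = Counter(ch for ch in ontology if is_han(ch))
    # Characters absent from one string contribute zero to the dot product.
    dot = sum(wq[ch] * wd[ch] for ch in wq)
    norm = math.sqrt(sum(v * v for v in wq.values()) *
                     sum(v * v for v in wd.values()))
    return dot / norm if norm else 0.0


def total_confidence(m, n, a=0.5, b=0.5):
    """S = M x a + N x b, with hypothetical default weights a and b."""
    return m * a + n * b
```

For example, "高血压" (hypertension) and "高血脂" (hyperlipidemia) share two of three characters, so their cosine confidence under this sketch is 2/3.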

Step S834, determining one or more ontologies as ontologies matched with the second-type substring according to the matching degree between the second-type substring and each ontology.

Optionally, this step may be implemented as follows: sorting all the ontologies by their matching degree with the second-type substring, and determining a preset number of top-ranked ontologies (for example, the top 2 ontologies) as ontologies matched with the second-type substring; or determining one or more ontologies whose matching degree with the second-type substring reaches a preset threshold as ontologies matched with the second-type substring.
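Both selection strategies of step S834 can be sketched in one helper. The function name `select_matches` and its parameters are assumptions for illustration; `scores` is taken to map each ontology to its matching degree, however that degree was computed.

```python
def select_matches(scores, top_k=None, threshold=None):
    """Return (ontology, matching_degree) pairs, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        # Keep only ontologies whose matching degree reaches the threshold.
        ranked = [kv for kv in ranked if kv[1] >= threshold]
    if top_k is not None:
        # Keep only the preset number of top-ranked ontologies.
        ranked = ranked[:top_k]
    return ranked
```

Returning the matching degree alongside each ontology keeps it available for output, for example when a person later picks one ontology from the candidates.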

In order to clarify the matching degree of the second-type substring with each matching ontology and utilize the matching degree, the final output result may further include the matching degree of the second-type substring with each matching ontology. For example, the matching degree of the second-type substring and each matching ontology is output, and then one ontology which is matched with the second-type substring can be selected manually according to the matching degree.

Step S84, determining the ontology matched with the second type substring or one or more ontologies that reach the preset matching condition with the second type substring as the standard term or the extended term matched with the name to be encoded.

In the natural language processing of the Chinese disease diagnosis information, the method takes full account of the fact that such information is natural language with complex, varied formats and no unified standard, and it segments and matches the Chinese disease diagnosis information character string using several pre-established dictionaries, so as to find the standard terms or extended terms matched with the names to be coded.

Exemplary device

Having described the method of an exemplary embodiment of the present invention, the ICD encoding system of an exemplary embodiment of the present invention is next described with reference to fig. 7.

Implementation of the ICD coding system can refer to implementation of the above method, and repeated details are omitted. The term "module," as used below, may be a combination of software and/or hardware that implements a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or a combination of software and hardware are also possible and contemplated.

As shown in fig. 7, the ICD encoding system may include: a standard term library creating module 61, an extended term library creating module 62, an importing module 63, a data processing module 64 and an encoding module 65.

A standard term library creating module 61, configured to determine, according to an ICD version to be referred to, each disease term included in the ICD version to be referred to as a standard term; determining the code of each standard term according to the ICD version to be referred to; and storing the standard terms and the codes thereof to obtain a standard term library.

Optionally, the ICD version to be referred to may be an ICD version published by the WHO (e.g., ICD-10 published by the WHO in 1992) or one of the various localized ICD versions extended from an ICD version published by the WHO (e.g., the Chinese version of ICD-10 recommended by the Ministry of Health of China). In a specific implementation, an appropriate ICD version can be selected as a reference according to actual needs, and the present invention is not limited in this respect.

An extended term library creating module 62, configured to determine the following types of terms not included in the ICD version to be referred to as extended terms: colloquial names, aliases, or abbreviations of the standard terms, subclass disease terms of the standard terms, and disease terms newly generated after publication of the ICD version to be referred to; when an extended term is judged to be a colloquial name, alias, or abbreviation of any standard term, assign the code of that standard term to the extended term; when an extended term is judged to be a subclass disease term of any standard term or a newly generated disease term, assign to the extended term the code of the standard term closest to it in the generic relationship; and store the extended terms and their codes to obtain an extended term library.

An importing module 63, configured to input Chinese disease diagnosis information.

A data processing module 64, configured to perform natural language processing on the Chinese disease diagnosis information to obtain one or more names to be coded.

Specifically, the data processing module 64 may perform word segmentation, word extraction, and other processing on the Chinese disease diagnosis information based on its characteristics, so as to extract the disease terms from the Chinese disease diagnosis information; the disease terms extracted from the Chinese disease diagnosis information are the names to be coded.

An encoding module 65, configured to search for the standard term or the extended term matched with the name to be coded based on the standard term library and the extended term library, and determine the code of the successfully matched standard term or extended term as the code of the name to be coded.
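The interplay of the standard term library, the extended term library and the encoding module can be sketched with plain dictionaries. This is a simplified illustration under stated assumptions: the names `STANDARD`, `EXTENDED`, `build_extended` and `encode` are hypothetical, the alias table is invented for the example, only exact lookup is shown (the matching described in the method is far richer), and the two entries use the ICD-10 categories I10 and C61.

```python
# Standard terms and their ICD-10 codes (two illustrative entries).
STANDARD = {
    "essential (primary) hypertension": "I10",
    "malignant neoplasm of prostate": "C61",
}


def build_extended(aliases, standard):
    """Give each extended term the code of the standard term it maps to."""
    return {alias: standard[std] for alias, std in aliases.items()}


# Hypothetical alias table: colloquial name -> standard term.
EXTENDED = build_extended(
    {
        "high blood pressure": "essential (primary) hypertension",
        "prostate cancer": "malignant neoplasm of prostate",
    },
    STANDARD,
)


def encode(name, standard=STANDARD, extended=EXTENDED):
    """Return the code of the matching standard or extended term, else None."""
    if name in standard:
        return standard[name]
    return extended.get(name)
```

Because an extended term stores only a code inherited from a standard term, revising a standard code requires rebuilding the extended library from the alias table.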

Optionally, as shown in fig. 8, the ICD coding system may further include, in addition to the standard term library creating module 61, the extended term library creating module 62, the importing module 63, the data processing module 64, and the encoding module 65: an assumed classification term library creating module 71.

An assumed classification term library creating module 71, configured to determine, as an assumed classification term, a disease term that is not included in the ICD version to be referred to, is related to any one of the standard terms, is by default treated as clinically equivalent to that standard term, and is not a colloquial name, alias, or abbreviation of that standard term; assign the code of the related standard term to the assumed classification term; and store the assumed classification terms and their codes to obtain an assumed classification term library.

In the ICD coding system shown in fig. 8, the encoding module 65 is further configured to search for the assumed classification term matched with the name to be coded based on the assumed classification term library, and determine the code of the successfully matched assumed classification term as the code of the name to be coded.

Optionally, as shown in fig. 9, the ICD coding system may further include, in addition to the standard term library creating module 61, the extended term library creating module 62, the importing module 63, the data processing module 64, and the encoding module 65: a multi-coding term library creating module 81.

A multi-coding term library creation module 81 for determining a disease term, which is not included in the ICD version to be referred to and is composed of at least two different standard terms, as a multi-coding term; combining codes of all standard terms constituting the multi-coded term together as a code of the multi-coded term; and storing the multi-coding terms and the codes thereof to obtain a multi-coding term library.

In the ICD encoding system shown in fig. 9, the encoding module 65 is further configured to search, based on the multi-coding term library, a multi-coding term matched with the name to be encoded; and determining the codes of the multiple coding terms which are successfully matched as the codes of the names to be coded.
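Building the multi-coding term library can be sketched as joining the codes of the component standard terms. The function name `build_multi_code`, the "+" separator and the placeholder terms and codes are all assumptions for illustration; the text above only states that the codes are combined together.

```python
def build_multi_code(components, standard, sep="+"):
    """Map each multi-coding term to the joined codes of its standard terms."""
    return {
        term: sep.join(standard[part] for part in parts)
        for term, parts in components.items()
    }
```

A lookup in the resulting dictionary then works the same way as a lookup in the standard or extended term library.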

Optionally, as shown in fig. 10, the ICD coding system may further include, in addition to the standard term library creating module 61, the extended term library creating module 62, the importing module 63, the data processing module 64, and the encoding module 65: a merged term library creating module 91 and a preprocessing module 92.

A merged term library creating module 91, configured to determine a single standard term that can replace at least two simultaneously occurring standard terms as a merged term; determine each of the at least two simultaneously occurring standard terms as a merging object of the merged term; determine the code of each merged term according to the ICD version to be referred to; and store the merged terms, their codes, and all merging objects of each merged term to obtain a merged term library.

A preprocessing module 92, configured to preprocess the one or more names to be coded obtained by the data processing module 64, determine whether all merging objects of any one or more merging terms are included in the one or more names to be coded, and if so, replace all merging objects of any one or more merging terms with corresponding merging terms; the preprocessed name to be encoded is then sent to the encoding module 65.
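The replacement performed by the preprocessing module 92 can be sketched as follows. The function name `merge_names`, the representation of the merged term library as a mapping from each merged term to the set of its merging objects, and the placeholder term names are assumptions made for this illustration.

```python
def merge_names(names, merged_library):
    """Replace complete groups of merging objects with their merged term."""
    result = list(names)
    for merged_term, objects in merged_library.items():
        # Replace only when every merging object of the term is present.
        if all(obj in result for obj in objects):
            result = [n for n in result if n not in objects]
            result.append(merged_term)
    return result
```

Names that are not merging objects pass through unchanged, so the output can be sent directly to the encoding module 65.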

Optionally, in addition to the standard term library creating module 61, the extended term library creating module 62, the importing module 63, the data processing module 64, and the encoding module 65, the ICD coding system may further include: a real-time revising module, configured to revise the standard term library, the extended term library, the assumed classification term library, the multi-coding term library and the merged term library in real time.

Optionally, as shown in fig. 11, the ICD coding system may further include, in addition to the standard term library creating module 61, the extended term library creating module 62, the importing module 63, the data processing module 64, and the encoding module 65: a no-code processing module 101.

The no-code processing module 101 is configured to match a name to be coded whose code has not been determined against the no-code terms in a no-code term library; if the matching succeeds, the name is not coded and/or a preset result is output, and if the matching fails, the name is sent to a manual processing platform for manual processing. The no-code term library comprises a number of no-code terms, including: preset Chinese medicine terms; preset surgical terms; preset drug name terms; preset medical consumable terms; and preset examination terms.
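The decision made by the no-code processing module 101 can be sketched as below. The names `NO_CODE_TERMS` and `handle_uncoded`, the three example terms, the returned labels, and the use of a plain list as the manual processing queue are hypothetical choices for illustration.

```python
# Hypothetical no-code terms (a therapy, a surgery and a drug name).
NO_CODE_TERMS = {"acupuncture", "appendectomy", "aspirin"}


def handle_uncoded(name, no_code_terms=NO_CODE_TERMS, manual_queue=None):
    """Decide what to do with a name for which no code was determined."""
    if name in no_code_terms:
        # Matched a no-code term: do not code it; report a preset result.
        return "no code"
    if manual_queue is not None:
        # Unmatched: hand the name over for manual processing.
        manual_queue.append(name)
    return "manual"
```

Filtering out known non-disease terms first keeps the manual processing queue limited to names that may genuinely need a code.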

The ICD coding system provided by the embodiment of the invention meets the requirement of automatically distinguishing the disease terms in the Chinese disease diagnosis information by establishing a plurality of term banks to cover the disease terms which may appear in most of the Chinese disease diagnosis information, so that the automatic ICD coding is realized.

It should be noted that although several modules of the ICD coding system are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the modules described above may be embodied in one module. Conversely, the features and functions of one module described above may be further divided so as to be embodied by a plurality of modules.

Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects, which is made for convenience of description only, mean that features in those aspects cannot be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. An automated international disease classification coding method comprising:

step 1, inputting Chinese disease diagnosis information;

the standard term library is created according to the following mode:

determining each disease term contained in the international disease classification ICD version to be referred to as a standard term;

determining the code of each standard term according to the ICD version of the international disease classification to be referred to;

wherein, the extended term library is created according to the following mode:

determining as expansion terms the following types of terms not included in the international disease classification ICD version to be referred to: colloquial names, aliases, or abbreviations of the standard terms, subclass disease terms of the standard terms, and disease terms newly generated after publication of the international disease classification ICD version to be referred to;

storing the expansion terms and the codes thereof to obtain an expansion term library;

wherein the step 2 comprises:

step 21, preprocessing the Chinese disease diagnosis information character string to obtain a preprocessed Chinese disease diagnosis information character string;

step 22, based on a pre-established ontology dictionary, a disease degree term dictionary, a disease concurrent term dictionary and a disease part term dictionary, segmenting the preprocessed Chinese disease diagnosis information character strings into a plurality of first type sub character strings and/or second type sub character strings;

the ontology dictionary comprises the standard term library and the extended term library, and the standard term and the extended term are ontologies;

the disease degree term dictionary includes several disease degree terms, which are words for describing the degree of acute or chronic disease or the severity of disease or the type of pathology or clinical stage;

the disease complication term dictionary includes a number of disease complication terms, which are words for describing the occurrence of at least two diseases concurrently;

the morbidity site term dictionary comprises a plurality of morbidity site terms, and the morbidity site terms are words for describing disease morbidity sites;

the first type substring is directly matchable with an ontology in the ontology dictionary, and the second type substring is not directly matchable with an ontology in the ontology dictionary;

step 23, determining the cut first type substring and the cut second type substring as names to be coded;

wherein the step 21 comprises:

carrying out format normalization processing on non-Chinese characters in the Chinese disease diagnosis information character string, and deleting non-medical terms in the Chinese disease diagnosis information character string to obtain a preprocessed Chinese disease diagnosis information character string, wherein the non-medical terms are provided by a pre-established non-medical term dictionary and are words with remarking function;

wherein the step 22 comprises:

judging whether the preprocessed Chinese disease diagnosis information character string contains a symbol or not;

if the preprocessed Chinese disease diagnosis information character string contains symbols, matching the characters between every two adjacent symbols in the preprocessed Chinese disease diagnosis information character string, as a whole, with an ontology in the ontology dictionary; if the matching is successful, cutting out the characters between the two adjacent symbols as a first type substring; if the matching fails, determining the two adjacent symbols and the characters between them as a temporarily unsegmented character string, and judging whether the temporarily unsegmented character string contains a preset special symbol;

if the temporarily unsegmented character string contains the special symbol, searching for the character model to which the temporarily unsegmented character string belongs, segmenting the temporarily unsegmented character string according to the segmentation rule corresponding to that character model, and matching the segmented characters with an ontology in the ontology dictionary; if the matching is successful, taking the segmented characters as a first type substring, and if the matching fails, taking the segmented characters as a second type substring; the character model is provided by a pre-established character model library, and each character model has a one-to-one corresponding segmentation rule;

if the temporarily unsegmented character string does not contain the special symbol, directly determining the temporarily unsegmented character string as a second type substring;

if the preprocessed Chinese disease diagnosis information character string does not contain symbols, matching a single character or a plurality of continuous characters in the preprocessed Chinese disease diagnosis information character string with an ontology in the ontology dictionary by adopting a mechanical word segmentation method;

if all characters in the preprocessed Chinese disease diagnosis information character string can be matched with an ontology, cutting out each single character or plurality of continuous characters in the preprocessed Chinese disease diagnosis information character string as a first type substring according to the matched ontology;

if the preprocessed Chinese disease diagnosis information character string has a single character or a plurality of continuous characters which cannot be matched with an ontology, judging whether the single character or the plurality of continuous characters which cannot be matched is a disease degree term, a disease complication term or a morbidity site term;

when the single character or the plurality of continuous characters which cannot be matched is a disease degree term, a disease complication term or a morbidity site term, combining it, according to its position in the preprocessed Chinese disease diagnosis information character string, with the matchable single character or plurality of continuous characters immediately before or after it, and cutting out the combination as a second type substring; and cutting out each remaining matchable single character or plurality of continuous characters in the preprocessed Chinese disease diagnosis information character string as a first type substring;

and when the single character or the plurality of continuous characters which cannot be matched is not a disease degree term, a disease complication term or a morbidity site term, cutting out the whole preprocessed Chinese disease diagnosis information character string as a second type substring.
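The symbol-free branch of step 22 can be sketched as follows. The claim does not name a specific mechanical word segmentation algorithm; forward maximum matching is assumed here as one common instance. The function name `forward_max_match`, the `max_len` parameter and the sample dictionary are hypothetical and only illustrate how matched and unmatched runs of characters might be separated before the first type and second type substrings are cut.

```python
def forward_max_match(text, ontology, max_len=8):
    """Greedy longest-prefix segmentation against a set of ontology terms.

    Returns a list of (piece, matched) pairs. Consecutive characters that
    match no ontology term are merged into a single unmatched piece.
    """
    pieces = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in ontology:
                match = candidate
                break
        if match is not None:
            pieces.append((match, True))
            i += len(match)
        else:
            if pieces and not pieces[-1][1]:
                # Extend the previous unmatched run.
                pieces[-1] = (pieces[-1][0] + text[i], False)
            else:
                pieces.append((text[i], False))
            i += 1
    return pieces


# Hypothetical ontology dictionary used only for illustration.
sample_ontology = {"高血压", "糖尿病", "肺炎"}
all_matched = forward_max_match("高血压糖尿病", sample_ontology)
with_unmatched = forward_max_match("急性肺炎", sample_ontology)
```

In the second call the unmatched run "急性" would then be looked up in the disease degree, disease complication and morbidity site dictionaries to decide whether it is combined with its matched neighbour into a second type substring.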

2. The automated international disease classification encoding method of claim 1,

the step 3 further comprises: searching for an assumed classification term matched with the name to be coded based on an assumed classification term library; and determining the code of the successfully matched assumed classification term as the code of the name to be coded;

wherein the assumed classification term library is created as follows:

determining, as assumed classification terms, disease terms that are not included in the international disease classification ICD version to be referred to, that are related to any one of the standard terms, that are clinically treated by default as identical to that standard term, and that are not colloquial names, alternative names or abbreviations of that standard term;

assigning the code of the standard term associated with the assumed classification term to the assumed classification term;

and storing the assumed classification terms and the codes thereof to obtain an assumed classification term library.

3. The automated international disease classification encoding method of claim 1,

the step 3 further comprises: searching a multi-coding term matched with the name to be coded based on a multi-coding term library; determining the codes of the multiple coding terms which are successfully matched as the codes of the names to be coded;

the multi-coding term library is created according to the following mode:

determining a disease term, which is not included in the international disease classification ICD version to which reference is made and consists of at least two different standard terms, as a multi-coding term;

combining codes of all standard terms constituting the multi-coded term together as a code of the multi-coded term;

and storing the multi-coding terms and the codes thereof to obtain a multi-coding term library.
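A minimal sketch of how such a multi-coding term library might be assembled is given below. The function name `build_multi_code_library`, the `+` separator used to combine codes and the sample entries are assumptions for illustration; the claim only states that the codes of the constituent standard terms are combined.

```python
def build_multi_code_library(multi_terms, standard_library, sep="+"):
    """Map each multi-coding term to the combined codes of its standard terms.

    multi_terms: dict of multi-coding term -> list of constituent standard terms
    standard_library: dict of standard term -> code
    sep: separator used to join codes (an assumption, not taken from the claim)
    """
    library = {}
    for term, components in multi_terms.items():
        library[term] = sep.join(standard_library[c] for c in components)
    return library


# Illustrative entries only; the codes are sample values, not coding guidance.
standard_library = {"高血压": "I10", "2型糖尿病": "E11.9"}
multi_terms = {"高血压合并2型糖尿病": ["高血压", "2型糖尿病"]}
multi_code_library = build_multi_code_library(multi_terms, standard_library)
```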

4. The automated international disease classification encoding method of claim 1,

before the step 3, the method further comprises the following steps: preprocessing the one or more names to be coded based on a merged term library;

the merged term library is created as follows:

determining a single standard term, which can replace at least two simultaneously occurring standard terms, as a merged term; and determining each of the at least two simultaneously occurring standard terms as a merging object of the merging term;

determining the code of each merged term according to the ICD version of the international disease classification to be referred to;

storing the merged term and the code thereof and all merged objects of the merged term to obtain a merged term library;

the step of preprocessing the one or more names to be coded based on the created merged term library comprises the following steps:

and judging whether all the merging objects of any one or more merging terms are contained in the one or more names to be coded, and if so, replacing all the merging objects of any one or more merging terms with corresponding merging terms.
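The preprocessing step above can be sketched as follows, assuming the merged term library is held as a mapping from each merged term to the set of its merging objects. The function name `apply_merged_terms` and the sample entry are hypothetical; placing the merged term at the position of the first merging object is a design choice of this sketch, not something the claim specifies.

```python
def apply_merged_terms(names, merged_library):
    """Replace groups of names with a merged term when all its objects occur.

    names: list of names to be coded
    merged_library: dict of merged term -> set of merging objects
    """
    result = list(names)
    for merged_term, objects in merged_library.items():
        if objects and all(obj in result for obj in objects):
            updated = []
            replaced = False
            for name in result:
                if name in objects:
                    # Emit the merged term once, at the first merging object.
                    if not replaced:
                        updated.append(merged_term)
                        replaced = True
                else:
                    updated.append(name)
            result = updated
    return result


# Illustrative entry only.
merged_library = {"高血压性心脏病": {"高血压", "心脏病"}}
merged = apply_merged_terms(["高血压", "肺炎", "心脏病"], merged_library)
untouched = apply_merged_terms(["高血压", "肺炎"], merged_library)
```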

5. The automated international disease classification coding method according to any one of claims 1 to 4, wherein the step 3 is followed by further comprising:

step 4, matching each name to be coded for which no code has been determined against the uncoded terms in an uncoded term library; if the matching succeeds, executing a preset processing step to indicate that the name is not to be coded; and if the matching fails, sending the name to a manual processing platform for manual processing;

wherein the uncoded term library comprises a plurality of uncoded terms;

the plurality of uncoded terms includes:

a preset traditional Chinese medicine term;

a preset surgical term;

a preset drug name term;

a preset medical consumable term; and

a preset examination or laboratory test term.
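Step 4 amounts to a simple routing decision, which might look like the sketch below. The function name `route_uncoded`, the two result labels and the sample uncoded terms are hypothetical placeholders for the preset processing step and the manual processing platform.

```python
def route_uncoded(names, uncoded_terms):
    """Split names without a determined code into two groups.

    Names found in the uncoded term library are marked as not to be coded;
    all others are queued for manual processing.
    """
    routed = {"no_code": [], "manual_review": []}
    for name in names:
        key = "no_code" if name in uncoded_terms else "manual_review"
        routed[key].append(name)
    return routed


# Illustrative uncoded terms only (a drug name and a surgical term).
uncoded_terms = {"阿司匹林", "阑尾切除术"}
routed = route_uncoded(["阿司匹林", "不明原因发热"], uncoded_terms)
```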

6. The automated international disease classification coding method according to claim 1, wherein the international disease classification ICD version to be referred to is an ICD version published by the World Health Organization (WHO) or any localized ICD version extended from an ICD version published by the World Health Organization (WHO).

7. The automated international disease classification coding method according to claim 1, wherein the step of searching for a standard term or an extended term matching the name to be coded in the step 3 comprises:

if the name to be coded is a first-type substring, determining an ontology matched with the first-type substring as a standard term or an extended term matched with the name to be coded;

if the name to be coded is a second type substring, then:

performing first-dimension analysis on the second type substring and each ontology in the ontology dictionary to obtain a plurality of first-dimension analysis results of the second type substring and a plurality of first-dimension analysis results of each ontology in the ontology dictionary;

matching each first-dimension analysis result of the second type substring with each first-dimension analysis result of each ontology in the ontology dictionary, and judging whether an ontology exists whose first-dimension analysis results each match those of the second type substring;

if such an ontology exists, determining the ontology as the ontology matched with the second type substring;

if no ontology exists whose first-dimension analysis results each match those of the second type substring, selecting a part of all the first-dimension analysis results of the second type substring to match against a part of all the first-dimension analysis results of each ontology in the ontology dictionary, and judging whether an ontology exists whose partial first-dimension analysis results match the partial first-dimension analysis results of the second type substring;

if an ontology of which the part of the first dimension analysis result is matched with the part of the first dimension analysis result of the second type substring exists, determining the ontology as the ontology matched with the second type substring;

if an ontology of which the partial first dimension analysis result is matched with the partial first dimension analysis result of the second type substring does not exist, performing second dimension analysis on the second type substring and each ontology in the ontology dictionary to obtain a plurality of second dimension analysis results of the second type substring and a plurality of second dimension analysis results of each ontology in the ontology dictionary;

calculating the matching degree of the second type substring and each ontology based on the plurality of second dimension analysis results of the second type substring and the plurality of second dimension analysis results of the ontology;

determining one or more ontologies as ontologies matched with the second type substring according to the matching degree of the second type substring and each ontology;

and determining the ontology matched with the second type substring as a standard term or an extended term matched with the name to be coded.

8. The automated international disease classification encoding method of claim 7, wherein the first dimension analysis results of the second type substring or of the ontology are, respectively:

the orientation terms in the second type substring or ontology;

the level terms in the second type substring or ontology;

the characters in parentheses in the second type substring or ontology;

the characters after a dash in the second type substring or ontology; and

the characters in the second type substring or ontology other than the orientation terms, the level terms, the characters in parentheses and the characters after a dash;

the partial first dimension analysis results among all the first dimension analysis results of the second type substring or ontology comprise: the characters in the second type substring or ontology other than the orientation terms, the level terms, the characters in parentheses and the characters after a dash; and one or more of:

the orientation terms and the level terms in the second type substring or ontology;

the characters in parentheses in the second type substring or ontology;

the characters after a dash in the second type substring or ontology.

9. The automated international disease classification encoding method of claim 7, wherein the second dimension analysis results of the second type substring or of the ontology are, respectively:

each Chinese character of the second type substring or ontology;

the pinyin initial of each Chinese character of the second type substring or ontology;

the pinyin final of each Chinese character of the second type substring or ontology;

the first character of the second type substring or ontology;

the pinyin of the first character of the second type substring or ontology; and

the non-Chinese characters in the second type substring or ontology.

10. The automated international disease classification encoding method of claim 7, wherein the step of calculating the matching degree of the second type substring and each ontology based on the second dimension analysis results of the second type substring and the second dimension analysis results of the ontology comprises:

calculating the similarity of the second type substring and each ontology according to the following formula:

wherein M represents similarity;

q represents a second type substring;

t in q represents each second dimension of the second type substring;

d represents an ontology;

getboost() represents a preset weight of each second dimension;

norm(t, d) represents the length normalization factor of the ontology;

and determining the similarity obtained by calculation as the matching degree of the second type substring and each ontology.

11. The automated international disease classification encoding method of claim 7, wherein the step of calculating the matching degree of the second type substring and each ontology based on the second dimension analysis results of the second type substring and the second dimension analysis results of the ontology comprises:

determining each Chinese character in the second type substring;

calculating the cosine confidence of each ontology matched with the second type substring according to the following formula:

calculating the total confidence of each ontology matched with the second type substring according to the following formula:

S＝M×a+N×b

wherein N represents a cosine confidence;

q represents a second type substring;

d' represents an ontology matching the second type substring;

w_{Q,j} represents the frequency of occurrence of each Chinese character in the second type substring;

j represents the serial number of the Chinese character contained in the second type substring and the matched body;

S represents the total confidence;

M represents the similarity;

a represents a preset weight corresponding to the similarity M;

and, the similarity M is calculated according to the following formula:

q represents a second type substring;

t in q represents each second dimension of the second type substring;

d represents an ontology;

getboost() represents a preset weight of each second dimension;

norm(t, d) represents the length normalization factor of the ontology;

and determining the total confidence obtained by calculation as the matching degree of the second type substring and each ontology.
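The combination S = M×a + N×b can be illustrated with the sketch below. The formula for the cosine confidence N is not reproduced in this text, so a standard cosine similarity over Chinese character frequency vectors is assumed here, which is consistent with the definition of w_{Q,j} but is not confirmed by the claim. The similarity M is taken as an input because its formula is likewise not reproduced. The function names and the default weights are hypothetical.

```python
import math
from collections import Counter


def _han_counts(text):
    """Count only Chinese characters (CJK Unified Ideographs basic block)."""
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def cosine_confidence(substring, ontology_term):
    """Assumed form of N: cosine of the two character-frequency vectors."""
    q = _han_counts(substring)
    d = _han_counts(ontology_term)
    dot = sum(q[ch] * d[ch] for ch in q.keys() & d.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def total_confidence(m, n, a=0.5, b=0.5):
    """S = M*a + N*b, with a and b as preset weights (defaults are arbitrary)."""
    return m * a + n * b


n_value = cosine_confidence("肺炎", "胃炎")
s_value = total_confidence(0.8, n_value, a=0.6, b=0.4)
```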

12. The automated international disease classification coding method according to claim 7, wherein the step of determining one or more ontologies as the ontology to which the second-type substring matches according to the matching degree of the second-type substring with each ontology comprises:

sorting all the ontologies according to their matching degree with the second type substring, and determining a preset number of the top-ranked ontologies as the ontologies matched with the second type substring;

or,

determining one or more ontologies whose matching degree with the second type substring reaches a preset threshold as the ontologies matched with the second type substring.
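The two alternative selection rules can be sketched as follows, assuming the matching degrees are held in a mapping from ontology term to score. The function names and sample scores are hypothetical.

```python
def select_top_n(scores, n):
    """Return the n ontology terms with the highest matching degree."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:n]]


def select_by_threshold(scores, threshold):
    """Return every ontology term whose matching degree reaches the threshold."""
    return [term for term, score in scores.items() if score >= threshold]


# Illustrative matching degrees only.
scores = {"肺炎": 0.9, "胃炎": 0.4, "支气管肺炎": 0.7}
top_two = select_top_n(scores, 2)
above = select_by_threshold(scores, 0.7)
```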

13. An automated international disease classification coding system comprising:

the standard term library creating module is used for determining each disease term contained in the international disease classification ICD version to be referred to as a standard term according to the international disease classification version to be referred to; determining the code of each standard term according to the ICD version of the international disease classification to be referred to; storing the standard terms and the codes thereof to obtain a standard term library;

an extended term library creation module for determining the following types of terms, which are not included in the international disease classification ICD version to be referred to, as extended terms: colloquial names, alternative names and abbreviations of the standard terms, subclass disease terms of the standard terms, and disease terms newly generated after publication of the international disease classification ICD version to be referred to; when an extended term is judged to be a colloquial name, alternative name or abbreviation of any one standard term, assigning the code of that standard term to the extended term; when an extended term is judged to be a subclass disease term of any one standard term or a newly generated disease term, assigning to the extended term the code of the standard term whose generic relationship to the extended term is closest; and storing the extended terms and the codes thereof to obtain an extended term library;

the import module is used for inputting Chinese disease diagnosis information;

the coding module is used for searching a standard term or an extended term matched with the name to be coded based on the standard term library and the extended term library, and determining the successfully matched code of the standard term or the successfully matched code of the extended term as the code of the name to be coded;

the data processing module is used for carrying out natural language processing on the Chinese disease diagnosis information to obtain one or more names to be coded, and the method is as follows:

wherein the step 21 comprises:

wherein the step 22 comprises:

if the preprocessed Chinese disease diagnosis information character string has a single character or a plurality of continuous characters which cannot be matched with an ontology, judging whether the single character or the plurality of continuous characters which cannot be matched is a disease degree term, a disease complication term or a morbidity site term;

14. The automated international disease classification-coding system of claim 13, wherein the system further comprises:

an assumed classification term library creation module for determining, as an assumed classification term, a disease term which is not included in the international disease classification ICD version to be referred to, is related to any one standard term, is clinically treated by default as identical to that standard term, and is not a colloquial name, alternative name or abbreviation of that standard term; assigning the code of the standard term associated with the assumed classification term to the assumed classification term; and storing the assumed classification terms and the codes thereof to obtain an assumed classification term library;

the coding module is further used for searching for an assumed classification term matched with the name to be coded based on the assumed classification term library, and determining the code of the successfully matched assumed classification term as the code of the name to be coded.

15. The automated international disease classification-coding system of claim 13, wherein the system further comprises:

a multi-coding term library creation module for determining a disease term, which is not included in the ICD version of the international disease classification to be referred to and is composed of at least two different standard terms, as a multi-coding term; combining codes of all standard terms constituting the multi-coded term together as a code of the multi-coded term; storing the multi-coding terms and the codes thereof to obtain a multi-coding term library;

the coding module is further used for searching a multi-coding term matched with the name to be coded based on the multi-coding term library; and determining the codes of the multiple coding terms which are successfully matched as the codes of the names to be coded.

16. The automated international disease classification-coding system of claim 13, wherein the system further comprises:

a merged term library creation module for determining a single standard term, which can replace at least two simultaneously occurring standard terms, as a merged term; determining each of the at least two simultaneously occurring standard terms as a merging object of the merged term; determining the code of each merged term according to the international disease classification ICD version to be referred to; and storing the merged term and the code thereof and all merging objects of the merged term to obtain a merged term library;

the preprocessing module is used for preprocessing one or more names to be coded obtained by the data processing module, judging whether all merging objects of any one or more merging terms are contained in the one or more names to be coded, and if so, replacing all merging objects of any one or more merging terms with corresponding merging terms; and then sending the preprocessed name to be coded to the coding module.

17. The automated international disease classification-coding system of any one of claims 13 to 16, further comprising:

an uncoded processing module for matching each name to be coded for which no code has been determined against the uncoded terms in an uncoded term library; if the matching succeeds, leaving the name uncoded and/or outputting a preset result; and if the matching fails, sending the name to a manual processing platform for manual processing;

wherein the uncoded term library comprises a plurality of uncoded terms;

the plurality of uncoded terms includes:

a preset traditional Chinese medicine term;

a preset surgical term;

a preset drug name term;

a preset medical consumable term; and

a preset examination or laboratory test term.

18. The automated international disease classification coding system of claim 13, wherein the international disease classification ICD version to be referred to is an ICD version published by the World Health Organization (WHO) or any localized ICD version extended from an ICD version published by the World Health Organization (WHO).