WO2023078025A1 - Auxiliary differential diagnosis system for fever under investigation based on a task decomposition strategy - Google Patents
- Publication number: WO2023078025A1 (PCT/CN2022/124226)
- Authority: WO, WIPO (PCT)
- Prior art keywords: data, fever, time, variable, differential diagnosis
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/20—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
Definitions
- The invention belongs to the technical field of medical and health information, and in particular relates to an auxiliary differential diagnosis system for fever under investigation based on a task decomposition strategy.
- Fever is not only the primary reason for approximately 30% of children's medical visits, but also occurs in up to 75% of acutely ill adult patients in ICU care.
- With current diagnosis and treatment technology, most patients with fever under investigation can be diagnosed; however, about 7%-53% of such patients worldwide still cannot be clearly diagnosed even after a comprehensive and systematic examination.
- The prognosis of patients with fever under investigation is highly related to the underlying etiology. Some patients with rapid disease progression may quickly develop life-threatening complications if they are not accurately diagnosed and treated at an early stage. Therefore, the later the diagnosis, the worse the prognosis.
- Empirical anti-infective treatment given without a clear diagnosis not only lacks an evidence-based foundation and depends heavily on clinician experience, but also easily leads to increased drug resistance of pathogenic bacteria and to waste of medical resources, such as non-targeted medication and multiple referrals.
- Prior art solution [Application Publication No.: CN112768057A, Invention Name: System for Identifying the Cause of Unexplained Fever in Children]:
- that scheme for identifying the potential cause of fever under investigation is aimed only at children, so the range of potential causes is relatively small and the identification task is comparatively easy.
- The identification system described in that scheme uses only 8 indicators (age, sodium ion, chloride ion, lactate dehydrogenase, globulin, hematocrit, C-reactive protein, and leukocyte esterase) and judges only whether the potential etiology is infectious or not. The differential diagnosis of potential etiologies of fever is therefore incomplete, the feature space that the eight indicators can represent is small, and clinical adaptability is poor.
- Prior art solution [Application Publication No.: CN107785075A, Invention Name: Text-based Medical Record-Based Deep Learning Auxiliary Diagnosis System for Pediatric Fever Diseases]:
- that deep learning auxiliary diagnosis scheme for febrile diseases is also aimed only at children, and the system directly classifies 30 common febrile illnesses in children rather than identifying the underlying etiology of fever of unknown origin.
- That scheme mainly emphasizes the use of clinical text medical record data: it applies natural language processing to extract text features as the feature space for differential diagnosis of fever in children, and does not involve time series data or structured data.
- Patients with fever often accumulate multiple outpatient visits across departments, or inpatient stays, by the time they are admitted to the hospital.
- The related clinical data are organized mainly around the visit index.
- The existing technical solutions lack an effective mechanism for regularizing the scattered clinical data of multiple visits.
- As a result, the scattered clinical data of a patient cannot be divided and integrated, which leaves a gap between the clinical business data and the data required by the auxiliary identification system.
- the present invention provides an auxiliary differential diagnosis system for fever under investigation based on a task decomposition strategy, and provides a comprehensive, systematic and hierarchical solution strategy for the differential diagnosis of potential causes of fever under investigation.
- a system for auxiliary differential diagnosis of fever pending investigation based on task decomposition strategy includes the following modules:
- (1) Data acquisition module: connects the auxiliary differential diagnosis system for fever under investigation to the heterogeneous source databases; through an interactive interface, configures the data range of the target clinical information in those databases together with the unique patient identifier and the unique visit identifier; scans the target data and computes verification statistics; and establishes a complete data path for target data collection;
- (2) Data regularization module: establishes a data regularization strategy that determines the different visit cycles by setting a diagnostic anchor point for fever under investigation and time differences before and after that anchor point;
- the business data generated at irregular intervals are re-divided and integrated to form the smallest data analysis unit, namely a single visit cycle of a single patient for fever; the earliest medical record data within the time range of that unit are extracted;
- (3) Multimodal data preprocessing module: for the specified types of medical record text, uses regular expressions in a position-oriented mode and a keyword-oriented mode, chosen according to the structural characteristics of each type of text, to extract the target information in structured form; performs time window alignment and normalization on multivariate time-series data with different sampling frequencies, different lengths and missing values; and, for structured data, completes outlier handling, missing value filling, standard coding and standardization;
- (4) Hierarchical identification module for the potential etiology of fever under investigation, in which:
- a hierarchical structure of potential etiology categories of fever under investigation is constructed, transforming a complex multi-class problem with uneven sample distribution into a hierarchical classification problem composed of multiple two-class and three-class tasks; a hierarchical classification model of potential etiologies is established, whose classification output space is defined over that hierarchical structure;
- the siblings strategy is used to divide positive and negative training samples, and multiple base classifiers are trained on the resulting training sample sets; in the application stage of the model, the Top-Down algorithm post-processes the classification results of the base classifiers across upper and lower levels, correcting the local probability of each single base classifier into a consistent probability that conforms to the hierarchical structure, which yields the hierarchical category classification result of potential etiology for the patient; hierarchical differential diagnosis opinions are given on the basis of that result.
- The system also includes a result display module, which visually displays the clinical manifestation data used by the hierarchical classification model of potential etiologies of fever under investigation in the form of a visit timeline, and visually displays the hierarchical category classification results and hierarchical differential diagnosis opinions obtained by the hierarchical identification module.
- The data acquisition module includes a database connection management unit and a target data customization unit;
- the database connection management unit writes a plurality of JDBC modules using the classes and interfaces of the Java programming language, establishes data paths to the heterogeneous databases, exchanges SQL commands with the source databases, and stores the data they return;
- the target data customization unit delimits the data range of the target clinical information required by the hierarchical classification model of potential etiologies of fever under investigation, configures that data range, the unique patient identifier and the unique visit identifier through the interactive interface, and transfers the target data to the cache database, thereby determining the complete data path.
- The electronic medical record event in which the patient is first diagnosed with fever under investigation is used as the diagnostic anchor point. Looking backward, medical records within the 7 natural days before the anchor point are included. Looking forward, all medical records of a subsequent visit whose start time is no more than 24 hours after the end time of the current visit are included in the same visit cycle; medical records of a subsequent visit whose start time is more than 24 hours after the end time of the current visit are assigned to the next visit cycle. This forms the smallest data analysis unit: a single visit cycle of a single patient for fever.
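The visit-cycle rule above can be sketched in a few lines of Python. This is an illustrative reading only, not the patented implementation: the function name `build_visit_cycles`, the representation of a visit as a `(start, end)` pair, and the choice to ignore visits that started before the 7-day lookback are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def build_visit_cycles(visits, anchor_time, lookback_days=7, max_gap_hours=24):
    """Group one patient's visits into visit cycles around a diagnostic anchor.

    visits: list of (start, end) datetime pairs (hypothetical representation).
    Returns a list of cycles, each a list of (start, end) pairs.
    """
    lookback_start = anchor_time - timedelta(days=lookback_days)
    max_gap = timedelta(hours=max_gap_hours)
    # Assumption: visits that started before the lookback window are ignored.
    ordered = sorted(v for v in visits if v[0] >= lookback_start)
    cycles = []
    cycle_end = None
    for start, end in ordered:
        in_lookback = start <= anchor_time
        # A visit joins the current cycle if it lies in the lookback window of
        # the first cycle, or starts within max_gap of the current cycle's end.
        joins_current = bool(cycles) and (
            (in_lookback and len(cycles) == 1) or start - cycle_end <= max_gap
        )
        if joins_current:
            cycles[-1].append((start, end))
            cycle_end = max(cycle_end, end)
        else:
            cycles.append([(start, end)])
            cycle_end = end
    return cycles

anchor = datetime(2023, 1, 10, 9, 0)
visits = [
    (datetime(2023, 1, 1, 8, 0), datetime(2023, 1, 1, 9, 0)),     # before lookback
    (datetime(2023, 1, 5, 8, 0), datetime(2023, 1, 5, 10, 0)),    # within 7 days
    (datetime(2023, 1, 10, 8, 0), datetime(2023, 1, 10, 12, 0)),  # anchor visit
    (datetime(2023, 1, 11, 10, 0), datetime(2023, 1, 15, 10, 0)), # gap of 22 h
    (datetime(2023, 1, 17, 10, 0), datetime(2023, 1, 17, 12, 0)), # gap of 48 h
]
cycles = build_visit_cycles(visits, anchor)
```

With these made-up visits, the first cycle collects the three visits linked to the anchor, and the visit that starts 48 hours after the previous discharge opens a new cycle.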
- The multimodal data preprocessing module includes a text data preprocessing unit, a time series data preprocessing unit and a structured data preprocessing unit;
- the text data preprocessing unit: for four types of medical record text (past history, personal history, family history, and marriage and childbearing history), writes regular expressions in the position-oriented mode to extract the target information in structured form;
- the target symptom dictionary includes a dictionary of systemic symptoms that are not sensitive to location information, a dictionary of symptoms that are sensitive to location information, and a body part dictionary; dictionary matching uses a two-way longest matching algorithm to extract symptom names, duration, frequency and body part information in structured form;
- the time-series data preprocessing unit: performs time window alignment on the multivariate time-series data, taking the data within a fixed time of each visit as the patient's early clinical manifestation data; each row of data corresponds to one time-series variable sequence of one patient; according to the sampling frequency of each time-series variable and the distribution of sampling time spans, the input time window and the time interval between columns are specified, so that the time-series variables of the same visit of the same patient are aligned in time; Min-Max normalization is then applied to the time-series data;
- the structured data preprocessing unit: performs the following preprocessing operations on the structured medical record text data, basic information data and routine laboratory test data: outlier handling, missing value filling, standard coding and standardization.
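As a rough illustration of the time window alignment and Min-Max normalization described for the time-series data preprocessing unit, the sketch below buckets irregular observations of one variable into fixed columns and rescales the observed values to [0, 1]. The 72-hour window, the 24-hour column interval, the "keep the last value in each column" rule and the per-sequence normalization are assumptions chosen for the example; the source only says that the window and interval are specified from the sampling statistics.

```python
def align_to_window(observations, window_hours=72, step_hours=24):
    """Align one variable's observations to fixed time columns.

    observations: list of (hours_since_visit_start, value) pairs.
    Returns one slot per column holding the last value observed in that
    column, or None when nothing was observed (a missing value).
    """
    n_slots = window_hours // step_hours
    slots = [None] * n_slots
    for t, value in sorted(observations):
        if 0 <= t < window_hours:
            slots[int(t // step_hours)] = value
    return slots

def min_max_normalize(slots):
    """Min-Max normalize the observed values, leaving missing slots as None."""
    observed = [v for v in slots if v is not None]
    if not observed:
        return list(slots)
    lo, hi = min(observed), max(observed)
    if hi == lo:
        return [None if v is None else 0.0 for v in slots]
    return [None if v is None else (v - lo) / (hi - lo) for v in slots]

# Hypothetical body temperature readings: (hours since visit start, value).
temperature = [(2, 38.0), (20, 39.0), (30, 37.0), (80, 40.0)]
aligned = align_to_window(temperature)
normalized = min_max_normalize(aligned)
```

The reading at hour 80 falls outside the window and is dropped, and the third column stays missing so that a later stage (for example a masking vector) can account for it.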
- The outlier handling includes: for numerical variables, outliers are detected by statistical analysis and the 3σ principle, treated as missing values, and processed with the missing value methods; for categorical variables, erroneous inputs outside the preset categories are identified as outliers, deleted, and filled with the most frequent value of that categorical variable;
- the missing value filling includes: mode filling for categorical variables; for numerical variables, mean filling if the distribution is normal and median filling if it is not;
- the standard coding includes: converting categorical variables to numbers, using integer coding for variables whose values have an order relationship and unequal importance, and one-hot encoding for variables whose values have no order relationship and no difference in importance.
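The structured-data steps above map onto a handful of small functions. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the population standard deviation is an arbitrary choice, and the normality decision is passed in as a flag because the source does not say which normality test is applied.

```python
from statistics import mean, median, mode, pstdev

def mask_outliers_3sigma(values):
    """Replace numeric values further than 3 sigma from the mean with None."""
    observed = [v for v in values if v is not None]
    mu, sigma = mean(observed), pstdev(observed)
    return [None if v is not None and abs(v - mu) > 3 * sigma else v
            for v in values]

def fill_numeric(values, is_normal):
    """Fill missing numeric values with the mean (normal) or median (otherwise)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if is_normal else median(observed)
    return [fill if v is None else v for v in values]

def fill_categorical(values, allowed):
    """Treat values outside the preset categories as missing, then mode-fill."""
    cleaned = [v if v in allowed else None for v in values]
    most_common = mode([v for v in cleaned if v is not None])
    return [most_common if v is None else v for v in cleaned]

def integer_encode(value, ordered_levels):
    """Integer coding for ordered categories (position in the ordered list)."""
    return ordered_levels.index(value)

def one_hot(value, categories):
    """One-hot encoding for unordered categories."""
    return [1 if value == c else 0 for c in categories]

masked = mask_outliers_3sigma([5.0] * 20 + [100.0])
filled = fill_numeric([1.0, None, 3.0], is_normal=True)
sexes = fill_categorical(["M", "F", "M", "X", None], allowed={"M", "F"})
```

Treating outliers as missing first, and only then filling, keeps the extreme value from distorting the mean or median used for the fill.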
- When the hierarchical identification module classifies the potential etiology of a patient with fever under investigation on the basis of the etiology hierarchy, it first distinguishes whether the potential etiology is an infectious disease or a non-infectious disease. If it is an infectious disease, it further distinguishes bacterial, viral, fungal, parasitic or other infectious disease. If it is a non-infectious disease, it further distinguishes neoplastic disease, NIID (non-infectious inflammatory disease) or other non-infectious disease. If it is a neoplastic disease, it further distinguishes hematological malignancy, solid malignant tumor or benign tumor; if it is NIID, it further distinguishes autoimmune disease or autoinflammatory disease;
- the potential etiology category hierarchy is asymmetric, anti-reflexive and transitive.
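One way to picture the hierarchy and the siblings strategy mentioned for dividing training samples is a parent map plus a split function. The sketch below is an assumption-laden illustration: the node identifiers are invented, and the split follows the definition of the siblings policy commonly used in the hierarchical classification literature (positives are samples under the node, negatives are samples under its sibling nodes), which the source does not spell out.

```python
# Parent map for the etiology hierarchy described above (identifiers are
# hypothetical; "root" stands for fever under investigation).
PARENT = {
    "infectious": "root", "non_infectious": "root",
    "bacterial": "infectious", "viral": "infectious", "fungal": "infectious",
    "parasitic": "infectious", "other_infectious": "infectious",
    "neoplastic": "non_infectious", "niid": "non_infectious",
    "other_non_infectious": "non_infectious",
    "hematological_malignancy": "neoplastic",
    "solid_malignant_tumor": "neoplastic", "benign_tumor": "neoplastic",
    "autoimmune": "niid", "autoinflammatory": "niid",
}

def ancestors(node):
    """Return the strict ancestors of a node, nearest first."""
    out = []
    while node in PARENT:
        node = PARENT[node]
        out.append(node)
    return out

def siblings_split(node, labeled_samples):
    """Split samples for the base classifier of `node` (siblings policy).

    labeled_samples: list of (sample_id, label) pairs.
    Positives: label is `node` or one of its descendants.
    Negatives: label lies under the same parent but not under `node`.
    Samples elsewhere in the hierarchy are not used for this classifier.
    """
    parent = PARENT[node]
    positives, negatives = [], []
    for sample_id, label in labeled_samples:
        label_ancestors = ancestors(label)
        if label == node or node in label_ancestors:
            positives.append(sample_id)
        elif parent in label_ancestors:
            negatives.append(sample_id)
    return positives, negatives

samples = [(1, "bacterial"), (2, "viral"),
           (3, "hematological_malignancy"), (4, "autoimmune")]
neoplastic_split = siblings_split("neoplastic", samples)
infectious_split = siblings_split("infectious", samples)
```

Because `ancestors` never returns the node itself and always walks upward, the "is an ancestor of" relation encoded here is anti-reflexive, asymmetric and transitive, matching the properties stated for the hierarchy.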
- An end-to-end multimodal fusion deep neural network is used as the base classifier of the hierarchical classification model of potential etiologies of fever under investigation. The structure of the base classifier is as follows:
- for categorical variables, entity embedding technology is used to build an embedding network layer that extracts their features; a DNN network layer performs feature extraction on the entity embedding representations of the categorical variables together with the structured numerical variables; masking vectors, time interval factors and attenuation coefficients are introduced into a GRU network layer to extract features from multivariate time-series data with different time spans, irregular sampling frequencies and missing values;
- a late fusion strategy fuses the feature representation output by the DNN network layer with the feature representation output by the GRU network layer, and the fused representation is fed into a softmax layer for computing the cross-entropy loss function and training the base classifier.
- Entity embedding technology maps each discrete value of a high-cardinality categorical variable to a one-dimensional numerical vector, and this vector is passed through a linear unit to obtain the entity embedding representation of the categorical variable;
- the entity embedding representation of the categorical variable and the structured numerical variables are concatenated and input into the DNN network layer, and the nonlinear transformation of the multi-layer fully connected neural network yields the data feature representation of the sample learned by the DNN network layer.
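The data flow through the base classifier (embedding lookup, fusion of the two branch outputs, softmax, cross-entropy) can be traced with plain Python lists. This sketch is not the patented network: it has no training loop, the weights are made up, and it assumes that late fusion means concatenating the two feature vectors, which the source does not state explicitly.

```python
import math

def embed(category, table):
    """Entity embedding as a lookup: discrete value -> dense numeric vector."""
    return table[category]

def linear(x, weights, bias):
    """One fully connected layer without activation: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def late_fusion(dnn_features, gru_features):
    """Assumed fusion rule: concatenate the two branch representations."""
    return list(dnn_features) + list(gru_features)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Cross-entropy loss for a single sample with a one-hot target."""
    return -math.log(probs[true_index])

# Hypothetical values for illustration only.
department_table = {"respiratory": [0.2, -0.1], "hematology": [0.7, 0.3]}
structured_input = embed("hematology", department_table) + [0.5]  # + one numeric
dnn_out = linear(structured_input, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.0, 0.0])
gru_out = [0.1]                      # stand-in for the GRU branch output
fused = late_fusion(dnn_out, gru_out)
probs = softmax(linear(fused, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]))
loss = cross_entropy(probs, true_index=1)
```

Keeping the two branches separate until the final layers lets each modality use an architecture suited to it, at the cost of learning cross-modal interactions only in the fused layers.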
- $T_n$ represents the number of time nodes of the nth sample;
- $x_t^{(n)}$ represents the observed values of all time series variables of the nth sample at the tth time node, $t \in \{1, 2, \ldots, T_n\}$;
- the irregular time interval of time series variable d at the tth time node is modeled by the time interval factor $\delta_{t,d}^{(n)}$;
- $m_{t,d}^{(n)}$ is the value of the masking vector of the dth time series variable of the nth sample at the tth time node; $x_{t,d}^{(n)}$ is the value of the dth time series variable; $\delta_{t,d}^{(n)}$ is the time interval factor of the dth time series variable of the nth sample at the tth time node;
- the multivariate time series input of the GRU network layer consists of the observed values together with $s_t^{(n)}$, the event observation time of the nth sample at the tth time node, and $m_t^{(n)}$, the masking vector of the nth sample at the tth time node;
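The defining equation for the time interval factor did not survive in this text, so the following sketch rests on an assumption: it uses the definition common in GRU models with decay, where the interval keeps accumulating while a variable stays unobserved and resets after each observation. The function name and the example values are hypothetical.

```python
def mask_and_intervals(values, times):
    """Compute the masking vector and time interval factor for one variable.

    values: observation per time node, with None marking a missing value.
    times:  observation time of each time node.
    Assumed rule: the interval is 0 at the first node; otherwise it is the
    time since the previous node, plus the previous interval if the variable
    was missing at the previous node.
    """
    mask = [0 if v is None else 1 for v in values]
    delta = []
    for t in range(len(values)):
        if t == 0:
            delta.append(0.0)
        elif mask[t - 1] == 1:
            delta.append(times[t] - times[t - 1])
        else:
            delta.append(times[t] - times[t - 1] + delta[t - 1])
    return mask, delta

# Hypothetical C-reactive protein sequence; times in hours since admission.
crp = [38.5, None, None, 37.2]
hours = [0.0, 6.0, 12.0, 24.0]
mask, delta = mask_and_intervals(crp, hours)
```

Here the last observation arrives after two missing nodes, so its interval factor spans the full 24 hours since the variable was last measured.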
- the attenuation coefficient is introduced into the GRU network layer, the potential patterns contained in missing values and irregular time intervals are mined, and the attenuation coefficient of each time series variable is learned during the end-to-end learning process of the model;
- $\gamma_t = \exp\{-\max(0, W_\gamma \delta_t + b_\gamma)\}$
- $W_\gamma$ and $b_\gamma$ are the model parameters of the attenuation coefficient, trained jointly with all other network parameters during training of the GRU network layer; $\delta_t$ represents the time interval factor at the tth time node, and $\gamma_t$ represents the attenuation coefficient at the tth time node;
- the output of the last layer of the GRU network layer over the whole time series is taken as the feature representation of the multivariate time series data.
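The attenuation coefficient formula is easy to evaluate directly. The sketch below computes it for a scalar and, under the assumption that each time series variable has its own weight and bias (a diagonal $W_\gamma$), for a vector of variables; the parameter values are invented, since in the model they are learned.

```python
import math

def decay(delta, w, b):
    """Attenuation coefficient: gamma = exp(-max(0, w * delta + b))."""
    return math.exp(-max(0.0, w * delta + b))

def decay_vector(deltas, weights, biases):
    """Per-variable attenuation, assuming one weight and bias per variable."""
    return [decay(d, w, b) for d, w, b in zip(deltas, weights, biases)]

gammas = decay_vector([0.0, 6.0, 24.0], [0.1, 0.1, 0.1], [0.0, 0.0, 0.0])
```

The `max(0, ...)` term keeps the coefficient in (0, 1], so with a positive weight a longer gap since the last observation gives a smaller coefficient and the stale information is weighted less.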
- a hierarchical structure of potential etiologies of fever under investigation has been comprehensively and systematically constructed, covering infectious diseases, tumor diseases, NIID and other major diseases.
- the hierarchical classification model for auxiliary differential diagnosis can simulate the reasoning logic of clinicians and give differential diagnosis opinions layer by layer. Therefore, not only the scope of identification is more comprehensive and systematic, but also it has higher identification accuracy and better clinical interpretability. In addition, its top-down layer-by-layer reasoning model is more in line with clinicians' clinical practice habits.
- the clinical data used are early clinical manifestation data that are easily obtained in the early stage of patient consultation, so in the early stage of patient consultation, differential diagnosis opinions with great clinical value and credibility can be given based on limited information.
- Data preprocessing and feature extraction are performed on multi-modal data such as multi-variable time series data, text data, and structured data, and a detailed multi-modal data fusion solution is given.
- A data regularization module is designed to re-divide and integrate the scattered clinical data of multiple visits, which helps to obtain patients' early visit data accurately and avoids the inaccurate data acquisition caused by irregular visit procedures.
- the data path between the clinical business data and the input data of the hierarchical classification model of potential causes of fever under investigation is established.
- Fig. 1 is a framework diagram of the system structure provided by an embodiment of the present invention.
- Fig. 2 is a data flow path diagram provided by an embodiment of the present invention.
- Fig. 3 is a schematic diagram of data regularization provided by an embodiment of the present invention.
- Fig. 4 is a schematic diagram of the hierarchical structure of potential etiologies of fever under investigation provided by an embodiment of the present invention.
- Fig. 5 is a framework diagram of the hierarchical classification model of potential etiologies of fever under investigation provided by an embodiment of the present invention.
- Fig. 6 is a schematic diagram of a GRU structure with an attenuation mechanism provided by an embodiment of the present invention.
- Fig. 7 is a diagram of the specific neural network structure of the base classifier provided by an embodiment of the present invention.
- the implementation of the present invention provides a system for auxiliary differential diagnosis of fever under investigation based on a task decomposition strategy, as shown in Figure 1, the system includes the following modules:
- The data acquisition module includes the database connection management unit and the target data customization unit;
- Database connection management unit: connects the auxiliary differential diagnosis system for fever under investigation to the heterogeneous source databases;
- Target data customization unit: through the interactive interface, configures the data range of the target clinical information in the heterogeneous source databases together with the unique patient identifier and the unique visit identifier, scans the target data and computes verification statistics, and establishes the complete data path for target data acquisition.
- Data regularization module including:
- Multimodal data preprocessing module, including a text data preprocessing unit, a time series data preprocessing unit and a structured data preprocessing unit;
- Text data preprocessing unit: for the specified types of medical record text, uses regular expressions in a position-oriented mode and a keyword-oriented mode, chosen according to the structural characteristics of each type of text, to extract the target information of the medical record text;
- Time series data preprocessing unit: performs time window alignment and normalization on multivariate time series data with different sampling frequencies, different lengths and missing values;
- Structured data preprocessing unit: for structured data, completes outlier handling, missing value filling, standard coding and standardization for categorical variables and numerical variables.
- Hierarchical identification module of potential etiology of fever to be investigated including:
- In the model training phase, the siblings strategy is adopted to divide the positive and negative training samples; multiple base classifiers are then trained on the resulting training sample sets.
- In the application phase, the Top-Down algorithm post-processes the classification results of the base classifiers across the upper and lower levels: the local probability of each base classifier is corrected into a consistent probability that conforms to the category hierarchy of potential causes of fever under investigation. This yields the hierarchical category classification result for the patient, from which hierarchical differential diagnosis opinions are given.
- the end-to-end multi-modal fusion deep neural network is used as the base classifier of the hierarchical classification model of potential causes of fever under investigation.
- the specific neural network structure of the base classifier is as follows:
- Result display module: visually displays, as a visit timeline, the clinical manifestation data used by the hierarchical classification model of potential causes of fever under investigation, and visually displays the hierarchical category classification results and hierarchical differential diagnosis opinions produced by the hierarchical identification module.
- HIS Hospital Information System
- LIS Laboratory Information System
- EMR Electronic Medical Record
- The database connection management unit is implemented mainly by writing multiple JDBC modules with the existing classes and interfaces of the Java programming language, establishing data paths to the heterogeneous databases. On this basis it exchanges SQL commands with the source databases and stores the data they return.
- the target data self-definition unit is mainly based on the data path established by the database connection management unit, and demarcates the source data range for the target clinical information required by the subsequent hierarchical classification model of potential causes of fever to be investigated.
- The target clinical information range includes: 4 categories of basic information (age, sex, height and weight); 6 categories of medical record text data (chief complaint, past history, personal history, family history, marriage and childbearing history, and history of present illness); 5 categories of nursing time-series data (body temperature, respiration, heart rhythm, pulse and blood pressure); and the data of 124 individual test items under the major routine laboratory panels, such as blood routine, urine routine, routine coagulation function examination, routine myocardial enzyme spectrum examination, liver/kidney/lipid/glucose/electrolyte determination, stool routine, erythrocyte sedimentation rate determination, high-sensitivity C-reactive protein, potassium/sodium/chloride determination and routine liver function examination, that is, the routine laboratory test data.
- After the data range covering the above target clinical information, the unique patient identifier and the unique visit identifier have been marked manually through the interactive interface, the target data are transferred to the cache database, which determines the complete data path.
- the data regularization module regularizes the business data at irregular intervals generated in the clinical business, so as to meet the requirements of the input and analysis of the hierarchical classification model of potential etiology of fever to be investigated.
- Visit records whose start time is more than 24 hours after the end of the current visit are assigned to the next visit cycle; each cycle forms the smallest data analysis unit for a single patient. Based on this data analysis unit, the earliest medical record data within its time range are then extracted to constitute the input feature space of the subsequent hierarchical classification model of potential causes of fever under investigation. This regularization is carried out in the operational database.
- the multimodal data preprocessing module includes a text data preprocessing unit, a time series data preprocessing unit and a structured data preprocessing unit.
- the text data preprocessing unit receives specified types of medical record text data, uses natural language processing technology to understand the input medical record text, and performs structured extraction of target information on the medical record text.
- the regular expression technology is mainly used to extract the target information from the medical record text in a position-oriented mode and a keyword-oriented mode according to the structural characteristics of different types of medical record texts.
- medical history, personal history, family history, and marriage and childbearing history all have fixed format requirements. Therefore, the regular expression language is written separately through the position-oriented mode to achieve the purpose of information extraction.
- the symptom entity extraction is carried out on the early clinical symptom information of patients.
- Dictionary C includes location-insensitive systemic symptom dictionary C1 (such as weight loss, anemia, fatigue, etc.), location-sensitive symptom dictionary C2 (such as pain, space occupying, soreness, etc.) and body part dictionary C2-pos (such as head , limbs, tonsils, etc.).
- the dictionary matching mainly uses the bidirectional longest matching algorithm to extract the symptom name, duration, frequency, and body part information in a structured manner.
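The bidirectional longest matching step can be illustrated with a minimal sketch. It is not the patented implementation: the tie-breaking rule (fewer tokens, then fewer single characters, then the backward result), the function names and the toy dictionary are all assumptions made for illustration.

```python
def forward_max_match(text, vocab, max_len):
    """Scan left to right, always taking the longest dictionary entry."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Scan right to left, always taking the longest dictionary entry."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    return tokens[::-1]


def bidirectional_max_match(text, vocab):
    """Run both scans and keep the segmentation with the better score."""
    max_len = max(len(word) for word in vocab)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)

    def score(tokens):
        # Assumed heuristic: fewer tokens first, then fewer single characters.
        return (len(tokens), sum(len(t) == 1 for t in tokens))

    return fwd if score(fwd) < score(bwd) else bwd


def extract_symptoms(text, vocab):
    """Keep only the segments that are dictionary entries."""
    return [t for t in bidirectional_max_match(text, vocab) if t in vocab]


# Hypothetical toy dictionary standing in for the symptom dictionary C.
TOY_VOCAB = {"fatigue", "headache", "head"}
symptoms = extract_symptoms("fatigue with headache", TOY_VOCAB)
```

Duration, frequency and body-part fields would need further matching rules (the rule set R), which this sketch does not attempt.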
- The final structured data storage structure is shown in Table 1.
- The time-series data preprocessing unit mainly targets five categories of nursing time-series data: body temperature, respiration, heart rhythm, pulse and blood pressure. Because the clinical business environment is relatively complex, the time-series data generated during clinical nursing have different time spans, large differences in sampling frequency between variables, widespread missing values and high sparsity, which makes them very difficult to analyze and use.
- In view of these characteristics, this technical solution first aligns the time windows of the nursing time-series data, taking the data within Φ hours of each visit as the patient's early clinical manifestation data. Each row of data corresponds to the data sequence of one time-series variable of one patient. According to the sampling frequency of each variable and the length distribution of the sampling time spans, the input data time window Φ and the time interval φ between columns are specified, which aligns the multiple time-series variables within the same visit of the same patient. Min-Max normalization is then applied to the nursing time-series data, normalizing the values while retaining the time-series waveform.
- the structured data preprocessing unit mainly performs the following preprocessing operations on the structured medical record text data, basic information data (age, gender, height and weight) and laboratory routine test data: abnormal value processing, missing value filling, standard coding and standardization.
- Outlier processing mainly deals with outlier points generated by human error.
- For outlier detection on numerical variables, this technical solution mainly adopts simple statistical analysis and the 3σ principle.
- Simple statistical analysis performs descriptive statistics on the variable values and presets a reasonable value range [min:max]; any value outside this range is identified as an outlier.
- The 3σ principle states that, for a variable τ that follows a normal distribution, the probability of lying more than 3σ from the mean is $P(|\tau-\mu|>3\sigma)\le 0.003$, an event of very small probability, so a value more than 3σ from the mean can be identified as an outlier.
- Here $f(\tau)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(\tau-\mu)^2}{2\sigma^2}\right)$ is the normal density of the variable τ, μ is the expectation (mean) and σ is the standard deviation, so data outside the interval $[\mu-3\sigma,\mu+3\sigma]$ are outliers.
- the processing method is to treat outliers as missing values, and use the missing value processing method to process them.
- outlier detection of categorical variables erroneous input outside the preset category is identified as an outlier, and the processing method is to delete the outlier and fill it with the majority value in the variable.
- Missing value filling is mainly for complete random missing; for categorical variables, the mode filling is used, for numerical variables, if the distribution conforms to the normal distribution, the mean filling method is used, and if the distribution does not conform to the normal distribution, the median filling method is used. In this way, the complexity of data preprocessing in the data preprocessing stage is reduced.
- Standard coding is mainly for numerical processing of categorical variables.
- For variables whose values have an ordinal relationship and unequal importance, this technical solution adopts integer coding: the unique values of the variable are encoded in sequence as consecutive integers.
- For variables whose values have no ordinal relationship and no difference in importance, one-hot encoding is adopted: each value is expressed as a sequence of 0s and 1s whose length equals the number of unique values of the variable; if the sort position of a value among the unique values is k, its one-hot code has a 1 at position k and 0 elsewhere.
- Standardization is to transform the data into a standard normal distribution with a mean of 0 and a standard deviation of 1 without changing the distribution of the original data, so as to eliminate the influence of different dimensions between different variables on subsequent model classification.
- To address objective problems such as the diversity of potential causes of fever under investigation and the difficulty of differential diagnosis, this technical solution draws on the existing research and summaries of potential causes in the medical literature and clinical guidelines and, based on the task decomposition strategy, forms a category hierarchy of potential causes of fever under investigation. This transforms the original complex multi-class problem with unbalanced sample distribution into a hierarchical classification problem comprising multiple two-class and three-class tasks.
- the detailed category hierarchy division is shown in Figure 4.
- Hierarchical classification can be viewed as a special type of structured classification problem where the classification output space is defined over a hierarchy of categories.
- The category hierarchy T constructed by this technical solution is a conventional tree-like concept hierarchy, which can be defined as a partially ordered set $(C, \prec)$.
- C represents all category concepts involved in the classification of potential causes of fever under investigation.
- $\prec$ represents the parent-child inheritance relationship "IS-A".
- The root node of the category hierarchy T is recorded as root(T).
- The category hierarchy T is asymmetric, anti-reflexive and transitive, expressed respectively as: $\forall c_i, c_j \in C:\ c_i \prec c_j \Rightarrow c_j \nprec c_i$; $\forall c_i \in C:\ c_i \nprec c_i$; $\forall c_i, c_j, c_k \in C:\ c_i \prec c_j \wedge c_j \prec c_k \Rightarrow c_i \prec c_k$.
- This technical solution adopts the siblings strategy. When the base classifier for category c_i is trained, the positive samples are the union of *(c_i), the sample set of category c_i itself, and the sample sets of all subcategories of c_i; the negative samples are the union of the sample sets of the sibling categories that share a parent category with c_i and the sample sets of all subcategories of those sibling categories. Here ∪ denotes set union.
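A minimal sketch of the siblings split described above, assuming the hierarchy is stored as a parent-to-children dictionary and each sample carries a single category label. The toy tree, label list and function names are hypothetical.

```python
def descendants(tree, node):
    """All categories below `node` in the hierarchy."""
    found = []
    for child in tree.get(node, []):
        found.append(child)
        found.extend(descendants(tree, child))
    return found


def siblings_split(tree, labels, target):
    """Return (positive, negative) sample indices for the classifier of `target`."""
    parent = next(p for p, kids in tree.items() if target in kids)
    # Positives: the target category and everything beneath it.
    positive_classes = {target, *descendants(tree, target)}
    # Negatives: sibling categories under the same parent and everything beneath them.
    negative_classes = set()
    for sibling in tree[parent]:
        if sibling != target:
            negative_classes |= {sibling, *descendants(tree, sibling)}
    positives = [i for i, y in enumerate(labels) if y in positive_classes]
    negatives = [i for i, y in enumerate(labels) if y in negative_classes]
    return positives, negatives


# Hypothetical toy hierarchy and sample labels.
TOY_TREE = {
    "root": ["infectious", "non_infectious"],
    "infectious": ["bacterial", "viral"],
    "non_infectious": ["neoplastic", "NIID"],
}
TOY_LABELS = ["bacterial", "viral", "neoplastic", "NIID", "infectious"]
pos, neg = siblings_split(TOY_TREE, TOY_LABELS, "infectious")
```

Samples outside the parent's subtree are simply left out of that classifier's training set, which keeps each local task small and better balanced than one flat multi-class problem.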
- the category classification result of the current input sample not only depends on the confidence level of the current base classifier on the input sample classification result, but also depends on whether the classification result of the parent class node base classifier of the current category of the input sample is correct or not.
- the model training phase multiple base classifiers will be trained based on the aforementioned category hierarchy T.
- the implementation framework of the model training phase and the actual application phase of the model is shown in Figure 5.
- each base classifier will estimate the local probability that a given sample x belongs to the category c i
- the post-processing Top-Down algorithm gives the final consistent probability p i (x ) by modifying the local probability. If there are ⁇ categories in total, the consistent probability p i (x) that sample x belongs to category c i is expressed as:
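The expression itself is missing from this text. As a sketch only, one common top-down formulation multiplies each local probability by the consistent probability of the parent category, with the root fixed at 1; that product rule is an assumption here, not a statement of the patent's formula, and all names below are hypothetical.

```python
def consistent_probabilities(tree, local_prob, root="root"):
    """Propagate probabilities from the root down the hierarchy.

    Assumed rule: p_i(x) = local_i(x) * p_parent(i)(x), with p_root(x) = 1.
    """
    consistent = {root: 1.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            consistent[child] = local_prob[child] * consistent[parent]
            stack.append(child)
    return consistent


def top_down_path(tree, consistent, root="root"):
    """Follow the most probable child at each level until a leaf is reached."""
    path, node = [], root
    while tree.get(node):
        node = max(tree[node], key=lambda c: consistent[c])
        path.append(node)
    return path


# Hypothetical hierarchy and local outputs of the base classifiers for one sample.
TOY_TREE = {
    "root": ["infectious", "non_infectious"],
    "infectious": ["bacterial", "viral"],
    "non_infectious": ["neoplastic", "NIID"],
}
LOCAL = {
    "infectious": 0.8, "non_infectious": 0.2,
    "bacterial": 0.7, "viral": 0.3,
    "neoplastic": 0.5, "NIID": 0.5,
}
probs = consistent_probabilities(TOY_TREE, LOCAL)
path = top_down_path(TOY_TREE, probs)
```

Under this rule a child can never be more probable than its parent, which is what makes the layer-by-layer opinions mutually consistent.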
- the clinical necessity of auxiliary differential diagnosis for patients with fever to be investigated is especially reflected in the early stage of patient consultation.
- the clinical symptoms and manifestations are highly complex and lack the specific clinical manifestations required for differential diagnosis.
- The hierarchical classification model takes as input only the clinical manifestation data that are easy to obtain in the early stage of a patient's visit. Let $\{(E_n, S_n, V_n, y_n)\}_{n=1}^{N}$ denote a data set containing N samples of fever under investigation, where $E_n$ represents the high-cardinality categorical variables, mainly from medical record text data, $S_n$ represents the structured numerical variables, $V_n$ represents the multivariate time-series data, and $y_n$ represents the potential-cause label of sample n.
- This technical solution constructs an end-to-end multi-modal fusion deep neural network as the base classifier of the hierarchical classification model of potential causes of fever under investigation. It includes an entity embedding network layer for feature extraction from high-cardinality categorical variables, a GRU (Gated Recurrent Unit) network layer for feature extraction from multivariate time-series data, and a DNN (deep neural network) layer for feature extraction from the entity embeddings and the structured numerical variables.
- this technical solution adopts the entity embedding technology derived from the word2vec technology of text feature extraction, and maps each discrete value of the high-cardinality categorical variable to a one-dimensional numerical vector.
- the one-hot encoding process of the categorical variable E i can be expressed as:
- The one-hot vector e_i then passes through a layer of linear units, which maps it to the embedding vector of E_i.
- Here e_i is the one-dimensional numerical vector obtained by one-hot encoding.
- The mapping weights from the one-hot layer to the embedding layer are learned and updated through error backpropagation together with the rest of the neural network.
- The output of the embedding layer is the final embedded representation of the categorical variable E_i.
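A small sketch may help show why a linear layer applied to a one-hot vector acts as a table lookup: the product simply selects one row of the weight matrix. The cardinality, embedding size and random initial weights below are illustrative assumptions; in the described system the weights would be learned by backpropagation.

```python
import random


def one_hot(index, cardinality):
    """One-hot vector with a 1.0 at `index`."""
    return [1.0 if k == index else 0.0 for k in range(cardinality)]


def linear(vec, weight):
    """Bias-free linear layer: vec (length K) times weight (K rows, d columns)."""
    dim = len(weight[0])
    return [sum(v * row[j] for v, row in zip(vec, weight)) for j in range(dim)]


def embed(index, weight):
    """Entity embedding of one categorical value."""
    return linear(one_hot(index, len(weight)), weight)


def embed_sample(indices, weights):
    """Concatenate the embeddings of all categorical variables of one sample."""
    out = []
    for index, weight in zip(indices, weights):
        out.extend(embed(index, weight))
    return out


random.seed(0)
CARDINALITY, EMBED_DIM = 6, 3  # hypothetical sizes
W = [[random.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in range(CARDINALITY)]
vector = embed(4, W)
```

Because the result equals row 4 of `W`, practical implementations usually skip the explicit one-hot product and index the weight matrix directly.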
- the entity embedding process f ⁇ ( ⁇ ) for all categorical variables in a single sample can be expressed as:
- The entity embedding representation is then combined with the standardized structured numerical variables S_n into a vector X, which is the input of the DNN network layer and is transformed by the nonlinear transformations of a multi-layer fully connected neural network, namely $X^{(l+1)} = s^{(l)}\left(W^{(l)} X^{(l)} + b^{(l)}\right)$.
- $X^{(l)}$ is the input vector of network layer l.
- $X^{(l+1)}$ is the output vector of network layer l, that is, the input vector of network layer l+1.
- $W^{(l)}$ and $b^{(l)}$ are the weight matrix and bias of network layer l, respectively.
- $s^{(l)}(\cdot)$ is the nonlinear activation function of network layer l, which can be sigmoid, tanh or ReLU. Assuming that the total number of layers of the DNN network is L, $X^{(L-1)}$ is taken as the representation of the data features learned by the DNN network layer.
- the above feature representation fusion process for a single sample can also be expressed as:
- g ⁇ ( ⁇ ) represents the feature representation fusion process of the embedded representation of structured numerical variables and categorical variables for a single sample n.
- This technical solution adopts a recurrent neural network framework and uses a GRU (Gated Recurrent Unit) network to extract features from the multivariate time-series data.
- Irregular sampling frequencies and missing values may themselves reflect the patient's clinical status: if a symptom disappears, the doctor may stop monitoring a certain nursing vital sign or reduce the monitoring frequency. Therefore, when the GRU network layer is modeled, the irregular sampling frequency information and the missing value information are included in the time-series feature space for feature mining.
- Let $V_n$ denote the multivariate time-series data of the n-th sample, containing D time-series variables, and let $T_n$ denote the number of time nodes of the n-th sample.
- To model the irregular time interval of time-series variable d at the t-th time node, a time interval factor $\delta_t$ is introduced, together with the event observation time of each time node and a masking vector $m_t \in \{0,1\}^D$ that marks whether each variable is missing at that node (the defining expression of the time interval factor is missing from this text).
- The multivariate time-series input space of the GRU network layer therefore comprises, for the n-th sample at the t-th time node, the observed values, the event observation time and the masking vector.
- A decay coefficient is introduced into the GRU network layer: $\gamma_t = \exp\{-\max(0, W_\gamma \delta_t + b_\gamma)\}$
- $W_\gamma$ and $b_\gamma$ are the model parameters related to the decay coefficient γ, trained jointly with all other network parameters during the training of the GRU network layer.
- $\delta_t$ represents the time interval factor at the t-th time node.
- $\gamma_t$ represents the decay coefficient at the t-th time node.
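The decay mechanism can be sketched numerically for a single variable. The decay formula is the one given in the text; the recursion for the time interval factor and the blend toward the empirical mean follow the widely used GRU-D style formulation and are assumptions here, since the corresponding expressions are missing from this text. The scalar parameters `w` and `b` stand in for the learned $W_\gamma$ and $b_\gamma$.

```python
import math


def time_intervals(times, mask):
    """Time since the variable was last observed, per time node (assumed recursion)."""
    delta = [0.0]
    for t in range(1, len(times)):
        gap = times[t] - times[t - 1]
        # If the previous node was missing, the gap keeps accumulating.
        delta.append(gap if mask[t - 1] == 1 else gap + delta[t - 1])
    return delta


def decay(delta_t, w, b):
    """gamma_t = exp(-max(0, w * delta_t + b)), a value in (0, 1]."""
    return math.exp(-max(0.0, w * delta_t + b))


def decay_missing_inputs(values, mask, delta, mean, w, b):
    """Replace missing values by the last observation decayed toward the mean."""
    filled, last = [], mean
    for value, observed, d in zip(values, mask, delta):
        if observed == 1:
            last = value
            filled.append(value)
        else:
            g = decay(d, w, b)
            filled.append(g * last + (1.0 - g) * mean)
    return filled


# Hypothetical body temperature series: observed at hours 0 and 4, missing at 1 and 3.
TIMES = [0.0, 1.0, 3.0, 4.0]
MASK = [1, 0, 0, 1]
VALUES = [38.0, None, None, 37.0]
DELTA = time_intervals(TIMES, MASK)
FILLED = decay_missing_inputs(VALUES, MASK, DELTA, mean=36.5, w=1.0, b=0.0)
```

The longer a variable stays unobserved, the closer its imputed value drifts to the empirical mean, which is how the duration of missingness enters the model without a separate imputation step.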
- This technical solution uses the input decay coefficient $\gamma_v$ to decay a missing variable from its last observed value toward the empirical mean of the variable, where:
- $v_{t'}^{d}$ represents the observed value of the d-th time-series variable at the last non-missing time node t′.
- $m_t^{d}$ represents the masking vector value of the d-th time-series variable at the t-th time node.
- $v_t^{d}$ represents the observed value of the d-th time-series variable at the t-th time node.
- This technical solution also introduces the hidden state decay coefficient $\gamma_h$: the hidden state $h_{t-1}$ of the previous moment is decayed before the new hidden state $h_t$ is calculated, $\hat{h}_{t-1} = \gamma_{h_t} \odot h_{t-1}$.
- $h_{t-1}$ represents the hidden state at the (t-1)-th time node.
- $\odot$ denotes the element-wise product between $\gamma_{h_t}$ and $h_{t-1}$, and $\hat{h}_{t-1}$ denotes the hidden state at the (t-1)-th time node after decay.
- The masking vector $m_t$ is fed directly into the training of the GRU network layer. In this way, whether a variable is missing and how long it has been missing are passed to the hierarchical classification model of potential causes of fever under investigation without explicitly imputing the missing values, so the irregular time intervals and missing values of the multivariate time-series data are handled end to end during model training, namely:
- the update function of the GRU network layer is as follows:
- $z_t$ is the update gate of the GRU network layer at the t-th time node.
- $h_t$ represents the hidden state at the t-th time node.
- $r_t$ represents the reset gate of the GRU network layer at the t-th time node.
- $m_t$ represents the value of the masking vector at the t-th time node.
- $\sigma(\cdot)$ is the logistic function, whose output range is (0, 1).
- $\odot$ represents the element-wise product.
- The matrices $W_z, W_r, W, U_z, U_r, U, H_z, H_r, H$ and the vectors $b_z, b_r, b$ are the parameters of the GRU network layer.
- The hidden state $h_t$ is used as the output of the GRU network layer at the t-th time node. The output of the last GRU layer over all the time-series data is taken as the feature representation of the multivariate time-series data. The feature extraction process $h(\cdot)$ of the multivariate time-series data can then be expressed as:
- V n represents the multivariate time-series data of the nth sample
- the final multimodal fusion deep neural network can be expressed as:
- H ⁇ ( ⁇ ) represents the complete mapping conversion process of performing feature fusion on structured numerical variables, categorical variables and multivariate time series data, and obtaining sample classification prediction results.
- The result display module mainly uses the front-end visual interface of the system to display, along the visit timeline, the clinical manifestation data considered by the hierarchical classification model of potential causes of fever under investigation. It also displays the differential diagnosis opinions output by the hierarchical identification module and the confidence of each base classifier in its opinion, for clinicians' reference.
- The present invention constructs a comprehensive and systematic category hierarchy of the potential causes of fever under investigation for auxiliary differential diagnosis and, based on the task decomposition strategy, transforms a complex multi-class problem with a highly heterogeneous classification space into a hierarchical classification problem comprising multiple two-class and three-class tasks, which addresses the difficulty of classification and the unbalanced distribution of labeled samples.
- The present invention takes actual clinical business fully into account, designs a data regularization strategy and implements it automatically. It divides and integrates the originally scattered clinical data caused by a patient's multiple visits or referrals, forming the minimum data analysis unit that takes a single febrile course of the patient as its basic path.
- the present invention designs and implements a hierarchical classification model of potential etiological causes of fever. and clinical applicability.
- the invention constructs a complete multi-modal fusion deep neural network, fully and effectively integrates and mines medical record text data, laboratory routine test data, and nursing time series data that are easily obtained in the early stage of patient admission, and realizes the detection of fever pending investigation.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Pathology (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
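The invention discloses an auxiliary differential diagnosis system for fever under investigation based on a task decomposition strategy. For the first time, a category hierarchy of the potential causes of fever under investigation is constructed comprehensively and systematically. Based on this hierarchy, a hierarchical classification model for auxiliary differential diagnosis of the potential causes is implemented; it simulates the reasoning logic of clinicians and gives differential diagnosis opinions layer by layer. The scope of differentiation is therefore more comprehensive and systematic, with higher accuracy and better clinical interpretability, and the top-down, layer-by-layer reasoning mode better matches clinicians' practice. The clinical data used are all early clinical manifestation data that are easily obtained early in a patient's visit, so differential diagnosis opinions of high clinical value and credibility can be given from limited information at the early visit stage. The invention provides a comprehensive, systematic and hierarchical strategy for the differential diagnosis of the potential causes of fever under investigation.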
Description
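The invention belongs to the field of medical and health information technology, and in particular relates to an auxiliary differential diagnosis system for fever under investigation based on a task decomposition strategy.
As one of the most common symptoms of most clinical problems, fever is the primary reason for about 30% of pediatric visits, and its incidence among critically ill adult patients in ICU care is as high as 75%. Although most patients with fever under investigation can be diagnosed as diagnostic and treatment techniques advance, about 7%-53% of such patients worldwide still receive no definite diagnosis even after comprehensive and systematic examination. The prognosis of these patients is highly correlated with the underlying cause: for patients whose disease progresses quickly, life-threatening complications can arise rapidly without accurate early diagnosis and appropriate treatment, so the later the diagnosis, the worse the prognosis. In addition, empirical anti-infective treatment without a tentative diagnosis lacks an evidence-based foundation and depends heavily on the clinician's experience; it also tends to increase pathogen drug resistance and to waste medical resources through non-targeted drugs and repeated referrals. Moreover, there are more than 200 potential causes of fever under investigation, with diverse and highly composite clinical manifestations. Early differential diagnosis of the potential causes therefore remains an important challenge for clinicians at home and abroad, especially in regions with relatively limited medical resources.
Because the potential causes of fever under investigation are complex, and the proportions of causes differ across regions, periods, patient ages and levels of medical resources, multi-class classification of the potential causes directly by traditional machine learning suffers from inherent defects such as sample imbalance between categories and high problem complexity, and classification accuracy is difficult to guarantee.
The prior art solution [application publication number: CN112768057A, title: system for identifying the cause of fever under investigation in children] addresses only children, so its range of potential causes is relatively small and the identification is comparatively easy. Moreover, the system described there uses only eight indicators (age, sodium ion, chloride ion, lactate dehydrogenase, globulin, hematocrit, C-reactive protein and leukocyte esterase) to judge whether the potential cause is infectious. Its differential diagnosis is therefore incomplete, the feature space that eight indicators can express is small, and its clinical adaptability is poor.
The prior art solution [application publication number: CN107785075A, title: deep learning auxiliary diagnosis system for pediatric febrile diseases based on text medical records] likewise addresses only children, and it directly classifies 30 common pediatric febrile diseases rather than the potential causes of fever under investigation. It also mainly emphasizes the use of clinical text medical records, extracting text features with natural language processing as the feature space for pediatric fever diagnosis, and does not involve other time-series or structured data.
Technical solutions for auxiliary differential diagnosis of the potential causes of fever under investigation are currently scarce, and this field is still at an exploratory stage. The prior art has the following defects:
1. The prior art solutions only perform differential diagnosis of fever-related diseases in children. The types and range of fever-related diseases in children differ greatly from those of the overall potential causes of fever under investigation, and in actual clinical settings the population with fever under investigation consists mainly of adults.
2. The prior art solutions are limited to distinguishing infectious from non-infectious diseases, or to a small set of easily distinguished diseases. Their coverage of the differential diagnosis of potential causes is incomplete, so their clinical applicability and extensibility are poor.
3. The prior art solutions only classify a small set of fever-related diseases, and the clinical data they rely on are not the non-specific data of a patient's early visit. Yet auxiliary differential diagnosis is most valuable precisely in the early stage of the visit, when the clinician cannot reach a tentative diagnosis from limited clinical manifestation data.
4. The prior art solutions involve only a few laboratory indicators or single-modality clinical data and give no detailed solution for multi-modal data fusion, so the feature associations they can mine and the information space they can express are limited, whereas early auxiliary differential diagnosis needs to make maximum use of limited data.
5. The prior art solutions rely mainly on machine learning models that perform multi-class classification directly, so they struggle with the unbalanced sample distribution caused by the complexity and diversity of potential causes. Multi-class accuracy is hard to guarantee, and a complex multi-class task lacks clinical interpretability, making it hard for doctors to accept in actual clinical settings.
6. Febrile patients often have multiple outpatient visits across departments or inpatient stays, and the related clinical data are organized by the master visit index. The prior art lacks an effective mechanism for regularizing the scattered clinical data of multiple visits and cannot divide and integrate them, which leaves a data gap between clinical business data and the data required by an auxiliary identification system.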
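Summary of the Invention
In view of the deficiencies of the prior art, the invention provides an auxiliary differential diagnosis system for fever under investigation based on a task decomposition strategy, offering a comprehensive, systematic and hierarchical strategy for the differential diagnosis of the potential causes of fever under investigation.
The object of the invention is achieved through the following technical solution: an auxiliary differential diagnosis system for fever under investigation based on a task decomposition strategy, comprising the following modules:
(1) Data acquisition module: realizes the connection between the auxiliary differential diagnosis system and the heterogeneous source databases; configures, through the interactive interface, the data range of the target clinical information in the heterogeneous source databases together with the unique patient identifier and the unique visit identifier; completes the scan of the target data and the statistics of the verification data; and establishes the complete data path for target data acquisition;
(2) Data regularization module: establishes a data regularization strategy that determines the different visit cycles by setting a diagnosis anchor for fever under investigation and the visit time differences before and after the anchor; based on this strategy, re-divides and integrates the irregularly spaced business data generated by a patient's multiple outpatient and inpatient visits, forming the minimum data analysis unit produced by a single febrile episode of a single patient; and extracts the earliest visit medical record data within the time range of the minimum data analysis unit;
(3) Multimodal data preprocessing module: for medical record text data of specified types, uses regular expressions in a position-oriented mode and a keyword-oriented mode, chosen according to the structural characteristics of each type of text, to extract the target information in structured form; performs time window alignment and normalization on multivariate time-series data with different sampling frequencies, different lengths and missing values; and, for structured data, completes outlier processing, missing value filling, standard coding and standardization of categorical and numerical variables;
(4) Hierarchical identification module of potential causes of fever under investigation, including:
Combining the medical literature and clinical guidelines, a category hierarchy of potential causes of fever under investigation is constructed based on the task decomposition strategy, transforming the complex multi-class problem with unbalanced sample distribution into a hierarchical classification problem comprising multiple two-class and three-class tasks; a hierarchical classification model of potential causes is established, and its classification output space is defined over the category hierarchy;
In the model training phase, the siblings strategy is adopted to divide positive and negative training samples, and multiple base classifiers are trained on the resulting training sample sets. In the application phase, the Top-Down algorithm post-processes the classification results of the base classifiers across the upper and lower levels, correcting the local probability of each base classifier into a consistent probability that conforms to the category hierarchy; this yields the hierarchical category classification result of the patient's potential cause, from which hierarchical differential diagnosis opinions are given.
Further, the system also includes a result display module, which visually displays, as a visit timeline, the clinical manifestation data used by the hierarchical classification model, and visually displays the hierarchical category classification results and hierarchical differential diagnosis opinions obtained by the hierarchical identification module.
Further, the data acquisition module includes a database connection management unit and a target data customization unit;
The database connection management unit: writes multiple JDBC modules with the classes and interfaces of the Java programming language, establishes data paths to the heterogeneous databases, exchanges SQL commands with the source databases and stores the data they return;
The target data customization unit: delimits the data range of the target clinical information required by the hierarchical classification model; configures the data range, the unique patient identifier and the unique visit identifier through the interactive interface; completes the transfer of the target data to the cache database; and determines the complete data path.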
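Further, in the data regularization module, the electronic medical record event in which the patient was first diagnosed with fever under investigation is taken as the diagnosis anchor. Looking backward, visit medical records within 7 natural days are included; looking forward, all visit medical records for which the start time of the next visit is no more than 24 hours after the end time of the current visit are included. Together these form one visit cycle. Visit medical records whose start time is more than 24 hours after the end of the current visit are assigned to the next visit cycle. This forms the minimum data analysis unit produced by a single febrile episode of a single patient.
Further, the multimodal data preprocessing module includes a text data preprocessing unit, a time-series data preprocessing unit and a structured data preprocessing unit;
The text data preprocessing unit: for the four types of medical record text (past history, personal history, family history, and marriage and childbearing history), regular expressions are written separately in the position-oriented mode to extract the target information in structured form; for the two types of medical record text (chief complaint and history of present illness), the keyword-oriented mode is adopted, using dictionary-based word segmentation to build a target symptom dictionary and dictionary matching rules. The target symptom dictionary includes a location-insensitive systemic symptom dictionary, a location-sensitive symptom dictionary and a body part dictionary; dictionary matching uses the bidirectional longest matching algorithm to extract symptom name, duration, frequency and body part information in structured form;
The time-series data preprocessing unit: aligns the time windows of the multivariate time-series data, taking the data within a fixed period of each visit as the patient's early clinical manifestation data. Each row of data corresponds to the data sequence of one time-series variable of one patient. According to the sampling frequency of each variable and the length distribution of the sampling time spans, the input data time window and the time interval between columns are specified, aligning the multiple time-series variables within the same visit of the same patient. Min-Max normalization is used to normalize the time-series values;
The structured data preprocessing unit: performs the following preprocessing operations on the structured medical record text data, the basic information data and the routine laboratory test data: outlier processing, missing value filling, standard coding and standardization.
Further, in the structured data preprocessing unit, outlier processing includes: for numerical variables, outliers are detected by statistical analysis and the 3σ principle, treated as missing values and handled by the missing value processing method; for categorical variables, erroneous inputs outside the preset categories are identified as outliers, deleted and filled with the mode of the variable;
Missing value filling includes: mode filling for categorical variables; for numerical variables, mean filling if the distribution is normal and median filling otherwise;
Standard coding includes: converting categorical variables to numerical form, using integer coding for variables whose values have an ordinal relationship and unequal importance, and one-hot encoding for variables whose values have no ordinal relationship and no difference in importance.
Further, in the hierarchical identification module, when the potential cause of fever is classified based on the category hierarchy, it is first determined whether the cause is an infectious or a non-infectious disease. If it is infectious, it is further distinguished as a bacterial, viral, fungal, parasitic or other infectious disease. If it is non-infectious, it is further distinguished as a neoplastic disease, NIID or other non-infectious disease. If it is neoplastic, it is further distinguished as a hematological malignancy, a solid malignant tumor or a benign tumor. If it is NIID, it is further distinguished as an autoimmune or an autoinflammatory disease. The category hierarchy has asymmetry, anti-reflexivity and transitivity.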
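The category hierarchy described in this paragraph can be written down as a small tree, and its three stated properties checked mechanically. The node identifiers and helper functions below are naming choices for illustration only; the structure follows the paragraph above.

```python
# Parent-to-children map transcribed from the description; identifiers are illustrative.
HIERARCHY = {
    "root": ["infectious", "non_infectious"],
    "infectious": ["bacterial", "viral", "fungal", "parasitic", "other_infectious"],
    "non_infectious": ["neoplastic", "NIID", "other_non_infectious"],
    "neoplastic": ["hematological_malignancy", "solid_malignant_tumor", "benign_tumor"],
    "NIID": ["autoimmune", "autoinflammatory"],
}


def parent_of(tree, node):
    """Direct parent of `node`, or None for the root."""
    for parent, children in tree.items():
        if node in children:
            return parent
    return None


def ancestors(tree, node):
    """Ancestors of `node`, nearest first, ending at the root."""
    chain = []
    parent = parent_of(tree, node)
    while parent is not None:
        chain.append(parent)
        parent = parent_of(tree, parent)
    return chain


def is_a(tree, child, ancestor):
    """True when `child` inherits from `ancestor` (the IS-A relation)."""
    return ancestor in ancestors(tree, child)


ALL_NODES = set(HIERARCHY) | {c for kids in HIERARCHY.values() for c in kids}
```

Each internal node of this tree corresponds to one local classification task, which is where the task decomposition comes from.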
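Further, in the hierarchical identification module, an end-to-end multi-modal fusion deep neural network is used as the base classifier of the hierarchical classification model. The base classifier is structured as follows:
For high-cardinality categorical variables, an embedding network layer is built with entity embedding to extract features from the categorical variables; a DNN network layer extracts features from the entity embeddings of the categorical variables and the structured numerical variables; a masking vector, a time interval factor and decay coefficients are introduced into the GRU network layer to extract features from multivariate time-series data with different time spans, irregular sampling frequencies and missing values;
A late fusion strategy is adopted: the feature representation output by the DNN network layer and that output by the GRU network layer are fused and fed into a softmax layer for the calculation of the cross-entropy loss and the training of the base classifier.
Further, in the base classifier, entity embedding maps each discrete value of a high-cardinality categorical variable to a one-dimensional numerical vector, which is transformed by linear units into the entity embedding representation of the categorical variable. The entity embeddings are merged with the structured numerical variables and fed into the DNN network layer; after the nonlinear transformations of the multi-layer fully connected network, the data feature representation learned by the DNN network layer is obtained.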
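Further, in the base classifier, let $V_n$ denote the multivariate time-series data of the n-th sample containing D time-series variables, $T_n$ the number of time nodes of the n-th sample, and $v_t$ the observed values of all time-series variables of the n-th sample at the t-th time node, $t \in \{1, 2, \ldots, T_n\}$. The event observation time of the t-th time node is recorded; a masking vector $m_t \in \{0,1\}^D$ is introduced to indicate whether the value of each time-series variable is missing at the t-th time node; and a time interval factor $\delta_t$ is introduced to model the irregular time interval of time-series variable d at the t-th time node (the defining expression is missing from this text).
Decay coefficients are introduced into the GRU network layer to mine the latent patterns contained in the missing values and irregular time intervals, and the decay coefficient of each time-series variable is learned during the end-to-end training of the model:
$\gamma_t = \exp\{-\max(0, W_\gamma \delta_t + b_\gamma)\}$
where $W_\gamma$ and $b_\gamma$ are the model parameters related to the decay coefficient, trained jointly with all other network parameters during the training of the GRU network layer; $\delta_t$ is the time interval factor at the t-th time node; and $\gamma_t$ is the decay coefficient at the t-th time node;
An input decay coefficient is used to decay missing variables toward the empirical mean of the variable; a hidden state decay coefficient is used to decay the hidden state of the previous moment before the new hidden state is calculated;
The output of the last GRU layer over all the time-series data is taken as the feature representation of the multivariate time-series data.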
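The beneficial effects of the invention are:
1. For the first time, a category hierarchy of the potential causes of fever under investigation is constructed comprehensively and systematically, covering major disease groups such as infectious diseases, neoplastic diseases and NIID. Based on this hierarchy, a hierarchical classification model for auxiliary differential diagnosis is implemented, which simulates the reasoning logic of clinicians and gives differential diagnosis opinions layer by layer. The scope of differentiation is therefore more comprehensive and systematic, with higher accuracy and better clinical interpretability. In addition, the top-down, layer-by-layer reasoning mode better matches clinicians' practice.
2. The clinical data used are all early clinical manifestation data that are easily obtained early in a patient's visit, so differential diagnosis opinions of high clinical value and credibility can be given from limited information at the early visit stage.
3. Data preprocessing and feature extraction are performed on multi-modal data, including multivariate time-series data, text data and structured data, and a detailed multi-modal data fusion solution is given.
4. For scattered, repeated outpatient and inpatient visits, a data regularization module is designed to re-divide and integrate them. This helps obtain the patient's early visit data accurately, removes the imprecise data acquisition caused by irregular visit processes, and establishes the data path between clinical business data and the input data of the hierarchical classification model.
Fig. 1 is a structural framework diagram of the system provided by an embodiment of the invention;
Fig. 2 is a data flow path diagram provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of the data regularization principle provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of the category hierarchy of potential causes of fever under investigation provided by an embodiment of the invention;
Fig. 5 is a framework diagram of the hierarchical classification model of potential causes of fever under investigation provided by an embodiment of the invention;
Fig. 6 is a schematic diagram of the GRU structure with the decay mechanism provided by an embodiment of the invention;
Fig. 7 is a diagram of the specific neural network structure of the base classifier provided by an embodiment of the invention.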
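To make the above objects, features and advantages of the invention clearer, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
An embodiment of the invention provides an auxiliary differential diagnosis system for fever under investigation based on a task decomposition strategy. As shown in Fig. 1, the system includes the following modules:
I. Data acquisition module, including a database connection management unit and a target data customization unit;
Database connection management unit: realizes the connection between the auxiliary differential diagnosis system and the heterogeneous source databases;
Target data customization unit: configures, through the interactive interface, the data range of the target clinical information in the heterogeneous source databases together with the unique patient identifier and the unique visit identifier; completes the scan of the target data and the statistics of the verification data; and establishes the complete data path for target data acquisition.
II. Data regularization module, including:
(1) Establishing the data regularization strategy: the different visit cycles are determined by setting the diagnosis anchor for fever under investigation and the visit time differences before and after the anchor;
(2) Based on the data regularization strategy, re-dividing and integrating the irregularly spaced business data generated by a patient's multiple outpatient and inpatient visits, forming the minimum data analysis unit produced by a single febrile episode of a single patient;
(3) Extracting the earliest visit medical record data within the time range of the minimum data analysis unit and passing them to the multimodal data preprocessing module.
III. Multimodal data preprocessing module, including a text data preprocessing unit, a time-series data preprocessing unit and a structured data preprocessing unit;
Text data preprocessing unit: for medical record text data of specified types, uses regular expressions in a position-oriented mode and a keyword-oriented mode, chosen according to the structural characteristics of each type of text, to extract the target information in structured form;
Time-series data preprocessing unit: performs time window alignment and normalization on multivariate time-series data with different sampling frequencies, different lengths and missing values;
Structured data preprocessing unit: for structured data, completes outlier processing, missing value filling, standard coding and standardization of categorical and numerical variables.
IV. Hierarchical identification module of potential causes of fever under investigation, including:
(1) Combining the medical literature and clinical guidelines, constructing the category hierarchy of potential causes based on the task decomposition strategy, which transforms the complex multi-class problem with unbalanced sample distribution into a hierarchical classification problem comprising multiple two-class and three-class tasks;
(2) Establishing the hierarchical classification model of potential causes, with its classification output space defined over the category hierarchy;
(3) In the model training phase, adopting the siblings strategy to divide positive and negative training samples, and training multiple base classifiers on the resulting training sample sets;
(4) In the application phase, using the Top-Down algorithm to post-process the classification results of the base classifiers across the upper and lower levels, correcting the local probability of each base classifier into a consistent probability that conforms to the category hierarchy, and obtaining the hierarchical category classification result of the patient's potential cause; hierarchical differential diagnosis opinions are given based on this result.
Further, an end-to-end multi-modal fusion deep neural network is used as the base classifier of the hierarchical classification model; its specific neural network structure is as follows:
(1) For high-cardinality categorical variables, an embedding network layer is built with entity embedding to extract features from the categorical variables;
(2) A DNN network layer extracts features from the entity embeddings of the categorical variables and the structured numerical variables;
(3) A masking vector, a time interval factor and decay coefficients are introduced into the GRU network layer to extract features from multivariate time-series data with different time spans, irregular sampling frequencies and missing values;
(4) A late fusion strategy is adopted: the feature representations output by the DNN network layer and the GRU network layer are fused and fed into a softmax layer for the calculation of the cross-entropy loss and the training of the base classifier.
V. Result display module: visually displays, as a visit timeline, the clinical manifestation data used by the hierarchical classification model, and visually displays the hierarchical category classification results and hierarchical differential diagnosis opinions obtained by the hierarchical identification module.
The data flow path is shown in Fig. 2. The implementation of each module is described in detail below.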
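I. Data acquisition module
This module is mainly responsible, at the physical level, for accessing the back-end data of target clinical information systems such as the HIS (Hospital Information System), LIS (Laboratory Information System) and EMR (Electronic Medical Record), and for acquiring the data within the target data range. It includes the database connection management unit and the target data customization unit.
The database connection management unit is implemented mainly by writing multiple JDBC modules with the existing classes and interfaces of the Java programming language, establishing data paths to the heterogeneous databases; on this basis it exchanges SQL commands with the source databases and stores the data they return.
The target data customization unit, building on the data paths established by the database connection management unit, delimits the source data range of the target clinical information required by the subsequent hierarchical classification model. The target clinical information range includes: 4 categories of basic information (age, sex, height and weight); 6 categories of medical record text data (chief complaint, past history, personal history, family history, marriage and childbearing history, and history of present illness); 5 categories of nursing time-series data (body temperature, respiration, heart rhythm, pulse and blood pressure); and the data of 124 individual test items under the major routine laboratory panels, such as blood routine, urine routine, routine coagulation function examination, routine myocardial enzyme spectrum examination, liver/kidney/lipid/glucose/electrolyte determination, stool routine, erythrocyte sedimentation rate determination, high-sensitivity C-reactive protein, potassium/sodium/chloride determination and routine liver function examination, that is, the routine laboratory test data.
After the data range covering the above target clinical information, the unique patient identifier and the unique visit identifier have been marked manually through the interactive interface, the target data are transferred to the cache database, which determines the complete data path.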
二、数据规整模块
基于数据获取模块中建立的数据通路,数据规整模块即对临床业务当中产生的不定间隔的业务数据进行规整,以符合后续发热待查潜在病因层次分类模型输入分析的要求。
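Let the electronic medical record (EMR) data of all patients be denoted as the set $\{R_i\}$, where $\Theta$ is the number of patients and $R_i$ is the EMR data of patient $p_i$. Patient $p_i$ generally has a unique set of demographic data $I_i$ and $K_i$ visit records $a_{ij}$. With $A_i$ denoting the set of visit records of patient $p_i$,

$$A_i = \{a_{ij} \mid j = 0, \ldots, K_i - 1\},$$

where each visit record $a_{ij}$ contains several time-series data sets $d_{ij}$ as well as non-time-series data sets. An FUO diagnosis anchor therefore has to be selected within $R_i$, and the multiple $a_{ij}$, together with the time-series and non-time-series data within their range, have to be segmented and recombined to obtain exactly the set of $a_{ij}$ related to the current FUO episode.

The data regularization method proposed in the present technical solution (see the example in FIG. 3) first takes the earliest EMR event at which patient $p_i$ was diagnosed with FUO as the time anchor $t_i$. Looking backward, visit records within 7 natural days are included; looking forward, every visit record whose start time is no more than 24 hours after the end of the preceding visit is included. Together these form one visit cycle. A visit record whose start time is more than 24 hours after the end of the preceding visit is assigned to the next visit cycle. This yields the minimum data analysis unit of a single patient. The earliest visit record data occurring within the time range of that data analysis unit are then extracted to form the input feature space of the subsequent FUO potential-etiology hierarchical classification model. The regularization described above is carried out inside the operational database.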
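The visit-cycle rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_visit_cycle`, the `(start, end)` tuple layout and the choice to test the 7-day window against each visit's start time are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def build_visit_cycle(visits, anchor, lookback_days=7, max_gap_hours=24):
    """Return the visits of one patient that form the cycle around `anchor`.

    visits: list of (start, end) datetime pairs (hypothetical layout).
    """
    visits = sorted(visits)
    window_start = anchor - timedelta(days=lookback_days)
    max_gap = timedelta(hours=max_gap_hours)
    # Backward: visits starting within the look-back window up to the anchor.
    cycle = [v for v in visits if window_start <= v[0] <= anchor]
    if not cycle:
        return []
    # Forward: chain later visits while the gap to the previous end is <= 24 h.
    last_end = max(end for _, end in cycle)
    for start, end in visits:
        if start <= anchor:
            continue
        if start - last_end <= max_gap:
            cycle.append((start, end))
            last_end = max(last_end, end)
        else:
            break  # this visit opens the next cycle
    return cycle

visits = [
    (datetime(2021, 2, 1, 9), datetime(2021, 2, 1, 10)),    # too early
    (datetime(2021, 3, 1, 9), datetime(2021, 3, 1, 10)),    # within 7 days before the anchor
    (datetime(2021, 3, 6, 8), datetime(2021, 3, 6, 9)),     # visit containing the anchor
    (datetime(2021, 3, 6, 20), datetime(2021, 3, 12, 10)),  # admitted 11 h after discharge
    (datetime(2021, 3, 20, 9), datetime(2021, 3, 20, 10)),  # gap > 24 h, next cycle
]
anchor = datetime(2021, 3, 6, 8, 30)
cycle = build_visit_cycle(visits, anchor)
earliest_record = cycle[0]  # the record whose data feed the model
```

In a real deployment the same grouping would more likely be expressed as SQL inside the operational database, as the text states; the Python form is only meant to make the rule concrete.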
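III. Multimodal data preprocessing module
The multimodal data preprocessing module comprises a text data preprocessing unit, a time-series data preprocessing unit and a structured data preprocessing unit.
The text data preprocessing unit receives specified types of medical record text, interprets the text with natural language processing techniques and extracts the target information in structured form. Regular expressions are the main tool: according to the structural characteristics of each type of record text, either a position-oriented mode or a keyword-oriented mode is applied. Past history, personal history, family history and marital and reproductive history all follow fixed format requirements, so a separate regular expression is written for each in position-oriented mode. Symptom entities describing the patient's early clinical presentation are extracted mainly from the chief complaint and the history of present illness, so the keyword-oriented mode is used there: with dictionary-based word segmentation, a target symptom dictionary C and a set of dictionary matching rules R are constructed. Dictionary C comprises a dictionary C1 of general symptoms that are insensitive to location (such as emaciation, anemia and fatigue), a dictionary C2 of location-sensitive symptoms (such as pain, space-occupying lesion and soreness) and a body-part dictionary C2-pos (such as head, limbs and tonsils). Dictionary matching mainly uses the bidirectional maximum matching algorithm to extract symptom name, duration, frequency and body part in structured form. The final storage structure of the structured data is shown in Table 1.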
Table 1. Example of structured storage of text data
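The keyword-oriented extraction can be illustrated with a small bidirectional maximum matching segmenter. This is a sketch under stated assumptions: the three dictionaries reuse only the example entries given in the text, the tie-breaking rule (fewer tokens, then fewer single-character tokens, then the backward result) is one common convention rather than something the source specifies, and the sample sentence is invented.

```python
C1 = {"消瘦", "贫血", "乏力"}        # location-insensitive general symptoms
C2 = {"疼痛", "占位", "酸软"}        # location-sensitive symptoms
C2_POS = {"头部", "四肢", "扁桃体"}  # body parts
VOCAB = C1 | C2 | C2_POS

def forward_max_match(text, vocab, max_len):
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

def backward_max_match(text, vocab, max_len):
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    return tokens[::-1]

def bidirectional_max_match(text, vocab):
    max_len = max(len(word) for word in vocab)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)

    def cost(tokens):
        return (len(tokens), sum(len(t) == 1 for t in tokens))

    return fwd if cost(fwd) < cost(bwd) else bwd

tokens = bidirectional_max_match("乏力伴头部疼痛3天", VOCAB)
symptoms = [t for t in tokens if t in C1 | C2]
body_parts = [t for t in tokens if t in C2_POS]
```

Duration and frequency (here "3天") would be picked up by separate regular expressions in a fuller pipeline; that step is omitted.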
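The time-series data preprocessing unit handles the 5 categories of nursing time-series data: body temperature, respiration, heart rate, pulse and blood pressure. Because the clinical environment is complex, the time-series data generated during nursing have unequal time spans, large differences in sampling frequency between variables, widespread missing values and high sparsity, all of which make them hard to analyze. To address these characteristics, the present technical solution first aligns the nursing time-series data to a time window, taking the data within the first Φ hours of each visit as the patient's early clinical manifestation data. Each row corresponds to the data sequence of one time-series variable of one patient. The input time window Φ and the time interval φ between adjacent columns are fixed according to the distribution of sampling frequencies and sampling time spans of each variable, so that the time-series variables within the same visit of the same patient are aligned in time. Min-Max normalization is then applied to normalize the values while preserving the waveform of each series. Notably, neither the inherent missing values, nor the missing values produced by resampling, nor the inconsistency of sampling frequency between visits needs to be preprocessed here. The multivariate time-series data reflect the patient's vital-sign status under clinical nursing, so the FUO potential-etiology hierarchical discrimination module later treats the missingness patterns in the nursing time-series data as model features and handles them in a unified way.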
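The alignment and normalization steps can be sketched as below. The window Φ = 24 h and step φ = 4 h are illustrative values only, as are the function names; normalizing with the minimum and maximum of a single series is a simplification (in practice the bounds would more plausibly come from the training data). Missing slots are deliberately left as `None` and exposed through a mask, matching the statement that missingness is kept for the model.

```python
def align_to_grid(observations, window_hours, step_hours):
    """Place (hours_since_visit_start, value) readings on a regular grid.

    Slots without a reading stay None; readings outside the window are dropped.
    """
    n_slots = int(window_hours // step_hours)
    grid = [None] * n_slots
    for t, value in sorted(observations):
        if 0 <= t < window_hours:
            grid[int(t // step_hours)] = value  # keep the latest reading per slot
    return grid

def min_max_normalize(grid):
    """Scale observed values to [0, 1]; missing slots remain None."""
    observed = [v for v in grid if v is not None]
    if not observed:
        return grid[:]
    lo, hi = min(observed), max(observed)
    span = hi - lo
    return [None if v is None else ((v - lo) / span if span else 0.0) for v in grid]

temperature = [(0.5, 38.0), (5.0, 39.0), (13.0, 37.0), (30.0, 36.5)]
grid = align_to_grid(temperature, window_hours=24, step_hours=4)
scaled = min_max_normalize(grid)
mask = [0 if v is None else 1 for v in grid]  # 1 = observed, 0 = missing
```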
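The structured data preprocessing unit applies the following operations to the structured medical record text, the basic information (age, sex, height and weight) and the routine laboratory test data: outlier handling, missing-value imputation, standard encoding and standardization.
Outlier handling targets outlying points caused by human error. For numerical variables, the present technical solution uses simple statistical analysis and the 3σ rule. Simple statistical analysis means computing descriptive statistics for the variable and presetting a plausible value range [min:max]; values outside this range are identified as outliers. The 3σ rule applies to variables that follow a normal distribution with density

$$f(\tau) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(\tau-\mu)^2}{2\sigma^2}\right),$$

for which the probability of lying more than 3σ from the mean satisfies P(|τ-μ|>3σ)≤0.003, an event of very small probability. A value whose distance from the variable mean exceeds 3σ can therefore be regarded as an outlier.
In the formula above, f(τ) is the normal density of variable τ, μ is the expectation (mean) and σ is the standard deviation, so data outside the interval [μ-3σ,μ+3σ] are outliers. Such outliers are treated as missing values and handled by the missing-value method. For categorical variables, any erroneous input outside the preset categories is regarded as an outlier; it is deleted and replaced with the mode of that variable.
Missing-value imputation mainly addresses values that are missing completely at random. Categorical variables are filled with the mode. Numerical variables are filled with the mean if their distribution is normal and with the median otherwise, which keeps the preprocessing stage simple.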
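A compact sketch of the numerical outlier and imputation rules follows. The helper names, the plausible range `[0, 20]` and the sample values are hypothetical; the normality decision is passed in as a flag because the source does not say which normality test is used.

```python
from statistics import mean, median, pstdev

def mark_outliers(values, low, high):
    """Turn out-of-range values, then values beyond 3 sigma, into None (missing)."""
    in_range = [v if v is not None and low <= v <= high else None for v in values]
    observed = [v for v in in_range if v is not None]
    mu, sigma = mean(observed), pstdev(observed)
    return [None if v is None or abs(v - mu) > 3 * sigma else v for v in in_range]

def impute(values, normal):
    """Fill missing entries with the mean (normal data) or the median (otherwise)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if normal else median(observed)
    return [fill if v is None else v for v in values]

# 19 ordinary readings, one 3-sigma outlier, one impossible value, one gap.
raw = [5.0] * 10 + [5.2] * 5 + [4.8] * 4 + [9.0, 99.0, None]
cleaned = mark_outliers(raw, low=0.0, high=20.0)
filled = impute(cleaned, normal=False)
```

Note that the 3σ rule needs a reasonably large sample: with the population standard deviation, a single point can only exceed 3σ when there are at least eleven observations.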
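Standard encoding converts categorical variables into numerical form. For variables whose values have an ordinal relationship or unequal importance, the present technical solution uses integer encoding: the unique values of the variable are encoded in order as consecutive integers. For variables whose values have no ordinal relationship and no difference in importance, the present technical solution uses one-hot encoding: each value is represented as a sequence of 0s and 1s whose length equals the number of unique values, and a value that ranks k-th among the unique values is encoded with a 1 at position k and 0 everywhere else.
Standardization rescales the data to a mean of 0 and a standard deviation of 1 without changing the shape of the original distribution, which removes the influence of differing units and scales between variables on the subsequent classification.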
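The three transformations can be written in a few lines. The category lists below (a severity scale and blood groups) are invented for illustration and do not come from the source; zero-based integer codes are likewise an assumption.

```python
from statistics import mean, pstdev

def integer_encode(value, ordered_levels):
    """Ordinal variable: position of the value in its ordered list of levels."""
    return ordered_levels.index(value)

def one_hot(value, levels):
    """Nominal variable: a 0/1 list with a single 1 at the value's position."""
    return [1 if level == value else 0 for level in levels]

def standardize(values):
    """Z-score: subtract the mean and divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

severity_code = integer_encode("moderate", ["mild", "moderate", "severe"])
blood_group = one_hot("B", ["A", "B", "AB", "O"])
z = standardize([1.0, 2.0, 3.0])
```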
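IV. FUO potential-etiology hierarchical discrimination module
Potential FUO etiologies are diverse and their differential diagnosis is difficult. Drawing on the research and summaries of known FUO etiologies in the medical literature and clinical guidelines, the present technical solution uses a task decomposition strategy to form a class hierarchy of potential FUO etiologies, converting the originally complex multi-class problem with an imbalanced sample distribution into a hierarchical classification problem composed of several binary and three-class tasks. The detailed class hierarchy is shown in FIG. 4. When a patient is classified on this hierarchy, the first step distinguishes whether the potential etiology is an infectious or a non-infectious disease. If infectious, the next step distinguishes bacterial, viral, fungal, parasitic or other infectious disease. If non-infectious, the next step distinguishes neoplastic disease, non-infectious inflammatory disease (NIID) or other non-infectious disease. If neoplastic, the next step distinguishes hematological malignancy, solid malignant tumor or benign tumor. If NIID, the next step distinguishes autoimmune disease from autoinflammatory disease. This addresses the imbalanced sample distribution of a high-cardinality multi-class task and at the same time models the reasoning logic of clinicians, which gives better clinical interpretability.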
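Hierarchical classification can be regarded as a special type of structured classification problem whose output space is defined on a class hierarchy. The class hierarchy T constructed in the present technical solution is a tree-shaped regular concept hierarchy. It can be defined as a partially ordered set (C,<), where C is the finite set of all class concepts involved in the FUO potential-etiology classification problem and the symbol < denotes the parent-child inheritance relationship "IS-A". The root node of T is written root(T). The class hierarchy T is asymmetric, anti-reflexive and transitive:

Asymmetry: for any classes $c_i, c_j \in C$, if $c_i < c_j$ then $c_j \nless c_i$.
Anti-reflexivity: for any class $c_i \in C$, $c_i \nless c_i$.
Transitivity: for any classes $c_i, c_j, c_k \in C$, if $c_i < c_j$ and $c_j < c_k$, then $c_i < c_k$.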
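For dividing positive and negative samples in the model training stage, the present technical solution adopts the siblings policy, which mimics the diagnostic reasoning of clinicians and gives the hierarchical classification model better clinical interpretability and applicability. When training the classifier for class $c_i$, the positive samples are

$$*(c_i) \cup \Downarrow(c_i),$$

where $*(c_i)$ is the set of samples whose class is $c_i$ and $\Downarrow(c_i)$ is the set of samples belonging to any descendant class of $c_i$. The negative samples are

$$\leftrightarrow(c_i) \cup \Downarrow(\leftrightarrow(c_i)),$$

where $\leftrightarrow(c_i)$ is the set of samples of the sibling classes that share a parent with $c_i$, $\Downarrow(\leftrightarrow(c_i))$ is the set of samples of all descendant classes of those siblings, and ∪ denotes set union.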
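The siblings policy can be made concrete on the hierarchy described in this section. The dictionary below transcribes the hierarchy from the text (the full structure is in FIG. 4); the identifier spellings, helper names and the five sample labels are hypothetical.

```python
PARENT = {
    "infectious": "root", "non_infectious": "root",
    "bacterial": "infectious", "viral": "infectious", "fungal": "infectious",
    "parasitic": "infectious", "other_infectious": "infectious",
    "neoplastic": "non_infectious", "niid": "non_infectious",
    "other_non_infectious": "non_infectious",
    "hematologic_malignancy": "neoplastic", "solid_malignancy": "neoplastic",
    "benign_tumor": "neoplastic",
    "autoimmune": "niid", "autoinflammatory": "niid",
}

def descendants(cls):
    """All classes strictly below cls in the tree."""
    children = [c for c, p in PARENT.items() if p == cls]
    found = set(children)
    for child in children:
        found |= descendants(child)
    return found

def siblings(cls):
    """Classes sharing a parent with cls, excluding cls itself."""
    return {c for c, p in PARENT.items() if p == PARENT[cls] and c != cls}

def split_samples(cls, labels):
    """Indices of positive and negative training samples for the classifier of cls."""
    positive_classes = {cls} | descendants(cls)
    negative_classes = set()
    for sib in siblings(cls):
        negative_classes |= {sib} | descendants(sib)
    pos = [i for i, y in enumerate(labels) if y in positive_classes]
    neg = [i for i, y in enumerate(labels) if y in negative_classes]
    return pos, neg

labels = ["bacterial", "autoimmune", "solid_malignancy", "viral", "benign_tumor"]
pos_neo, neg_neo = split_samples("neoplastic", labels)
pos_inf, neg_inf = split_samples("infectious", labels)
```

Samples outside the parent's subtree (here the bacterial and viral cases when training the "neoplastic" classifier) are simply left out, which is what keeps each local task small and better balanced.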
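Base classifiers trained on local information can produce results that are inconsistent between upper and lower levels when the model is applied. To avoid this, the present technical solution post-processes the multi-level classification results with the Top-Down algorithm in the model application stage, working from the decision probability with which the base classifier at node π assigns sample x to class $c_i$. Under the Top-Down algorithm, the class assigned to the current input sample depends not only on the confidence of the current base classifier in its result, but also on whether the base classifier at the parent node of the current class classified the sample correctly. In the training stage several base classifiers are trained on the class hierarchy T described above. The framework of the training stage and the application stage is shown in FIG. 5.
In the application stage each base classifier estimates the local probability that a given sample x belongs to class $c_i$. With Ψ classes in total, the Top-Down post-processing corrects these local probabilities to give the final consistent probability $p_i(x)$ that sample x belongs to class $c_i$.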
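To make the idea of a consistent probability concrete, the sketch below uses one common top-down correction: the consistent probability of a class is its local probability multiplied by the consistent probability of its parent. This product rule is an assumption chosen for illustration and is not claimed to be the exact formula of the source; the class names and local probabilities are invented.

```python
def consistent_probabilities(local, parent):
    """Propagate probabilities down the tree: p(c) = local(c) * p(parent(c))."""
    cache = {}

    def p(cls):
        if cls not in cache:
            up = parent[cls]
            cache[cls] = local[cls] * (1.0 if up == "root" else p(up))
        return cache[cls]

    return {cls: p(cls) for cls in local}

parent = {
    "infectious": "root", "non_infectious": "root",
    "neoplastic": "non_infectious", "niid": "non_infectious",
    "hematologic_malignancy": "neoplastic",
}
local = {  # hypothetical outputs of the base classifiers for one sample
    "infectious": 0.3, "non_infectious": 0.7,
    "neoplastic": 0.6, "niid": 0.3,
    "hematologic_malignancy": 0.5,
}
consistent = consistent_probabilities(local, parent)
```

Under this rule a child can never be more probable than its parent, which is the consistency property the post-processing is meant to guarantee.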
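Assisted differential diagnosis matters most in the early stage of an FUO patient's visit, when the clinical presentation is highly composite and lacks the specific manifestations needed for differential diagnosis. The FUO potential-etiology hierarchical classification model therefore uses only clinical manifestation data that are easy to obtain early in the visit. Let

$$\{(E_n, S_n, V_n, y_n)\}_{n=1}^{N}$$

denote a data set of N FUO visit samples, where $E_n$ denotes the high-cardinality categorical variables, which come mainly from the medical record text, $S_n$ the structured numerical variables, $V_n$ the multivariate time-series data and $y_n$ the potential-etiology label of visit sample n.
The model input feature space thus consists of high-cardinality categorical variables, structured numerical variables and multivariate time-series data. To discriminate the potential etiology effectively at an early stage of the visit, these multimodal data have to be fully exploited. The present technical solution therefore builds an end-to-end multimodal fusion deep neural network as the base classifier of the hierarchical classification model. It comprises an entity embedding network layer for feature extraction from high-cardinality categorical variables, a GRU (gated recurrent unit) network layer for feature extraction from multivariate time-series data and a DNN (feed-forward neural network) network layer for feature extraction from structured numerical variables. The network structure of the base classifier is shown in FIG. 7.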
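To extract the relationships between the categories of a categorical variable automatically, the present technical solution uses the entity embedding technique, which derives from the word2vec technique for text feature extraction, and maps each discrete value of a high-cardinality categorical variable to a one-dimensional numerical vector. The one-hot encoding of categorical variable $E_i$ can be written as

$$\mu_i : E_i \mapsto \delta_{E_i \alpha},$$

where $\mu_i$ is the mapping from $E_i$ to $\delta_{E_i\alpha}$ and $\delta_{E_i\alpha}$ is the Kronecker delta. The range of α is the same as the range of possible values of $E_i$. If $m_i$ is the number of possible values of $E_i$, then $\delta_{E_i\alpha}$ is a one-dimensional numerical vector of length $m_i$ whose only non-zero element, equal to 1, is the one with $\alpha = E_i$. Taking the vector $\delta_{E_i\alpha}$ as input, one layer of linear units performs the mapping

$$\sum_{\alpha} \omega_{\alpha\beta}\, \delta_{E_i\alpha} = \omega_{E_i\beta},$$

where $\omega_{\alpha\beta}$ is the weight connecting the one-hot vector $\delta_{E_i\alpha}$ to the embedding layer, learned and updated by error back-propagation through the whole network, and β is the index of the embedding layer. The result $\omega_{E_i\beta}$ is the embedded representation of categorical variable $E_i$. The entity embedding process for all categorical variables of a single sample is denoted $f_\theta(\cdot)$.

In the DNN network layer, each layer applies the transformation

$$X^{(l+1)} = s^{(l)}\left(W^{(l)} X^{(l)} + b^{(l)}\right),$$

where $X^{(l)}$ is the input vector of layer l, $X^{(l+1)}$ is the input vector of layer l+1, $W^{(l)}$ and $b^{(l)}$ are the weight matrix and bias of layer l, and $s^{(l)}(\cdot)$ is the non-linear activation function of layer l, which may be sigmoid, tanh or ReLU. If the DNN has L layers in total, $X^{(L-1)}$ is taken as the feature representation learned by the DNN network layer. The feature extraction described above for a single sample can likewise be written as one composite mapping.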
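The two steps above, embedding by a linear layer on a one-hot input and one fully connected layer, are small enough to write out in plain Python. The weight values, the embedding size of 2 and the single numerical feature are invented; a real model would learn the weights by back-propagation in a deep learning framework.

```python
def one_hot(index, size):
    """Kronecker delta vector: 1.0 at `index`, 0.0 elsewhere."""
    return [1.0 if a == index else 0.0 for a in range(size)]

def embed(index, weights):
    """Linear layer on a one-hot input: sum over a of weights[a][b] * delta[a]."""
    delta = one_hot(index, len(weights))
    dim = len(weights[0])
    return [sum(weights[a][b] * delta[a] for a in range(len(weights)))
            for b in range(dim)]

def dense(x, W, b, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

def relu(z):
    return max(0.0, z)

# A categorical variable with m_i = 3 values embedded into 2 dimensions.
weights = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
embedding = embed(1, weights)

# Concatenate the embedding with one structured numerical feature.
x = embedding + [1.0]
hidden = dense(x, W=[[1.0, 1.0, 1.0], [-1.0, 0.0, 0.0]], b=[0.0, 0.0], activation=relu)
```

The first assertion one would check is that the embedding equals row 1 of the weight matrix, which is exactly the identity stated by the Kronecker delta formula: multiplying by a one-hot vector is a table lookup.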
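The nursing time-series data involved in the present technical solution have unequal time spans, irregular sampling frequencies and widespread missing values. The present technical solution therefore adopts a recurrent neural network framework and extracts features from the multivariate time-series data with a GRU (gated recurrent unit) network. Irregular sampling and missing values may themselves reflect the patient's clinical state: if a symptom disappears, the physician may stop monitoring a vital sign or monitor it less often. The GRU network layer therefore includes the irregular sampling information and the missingness information in the time-series feature space. Let

$$V_n = (v_1, v_2, \ldots, v_{T_n})^{\mathsf T} \in \mathbb{R}^{T_n \times D}$$

denote the multivariate time-series data of the n-th sample with D time-series variables, where $T_n$ is the number of time nodes of the n-th sample. For $t \in \{1, 2, \ldots, T_n\}$, $v_t \in \mathbb{R}^{D}$ denotes the observations of all time-series variables at the t-th time node and $v_{t,d}$ is the value of $v_t$ for the d-th variable. Let $s_t \in \mathbb{R}$ denote the observation time of the t-th time node. A masking vector $m_t \in \{0,1\}^{D}$ indicates whether the value of each variable is missing at the t-th time node, and a time-interval factor $\delta_{t,d} \in \mathbb{R}$ models the irregular time interval of variable d at the t-th time node:

$$m_{t,d} = \begin{cases} 1, & \text{if } v_{t,d} \text{ is observed} \\ 0, & \text{otherwise} \end{cases}$$

$$\delta_{t,d} = \begin{cases} s_t - s_{t-1} + \delta_{t-1,d}, & t > 1,\ m_{t-1,d} = 0 \\ s_t - s_{t-1}, & t > 1,\ m_{t-1,d} = 1 \\ 0, & t = 1 \end{cases}$$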
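Decay coefficients are introduced into the GRU network layer to mine the latent patterns contained in missing values and irregular time intervals. The modified GRU structure is shown in FIG. 6. The decay coefficient γ of each time-series variable is learned during the end-to-end training of the model:

$$\gamma_t = \exp\{-\max(0, W_\gamma \delta_t + b_\gamma)\}$$

where $W_\gamma$ and $b_\gamma$ are the model parameters associated with the decay coefficient γ, trained jointly with all other network parameters of the GRU network layer, $\delta_t$ is the time-interval factor at the t-th time node and $\gamma_t$ is the decay coefficient at the t-th time node.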
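Specifically, the present technical solution applies an input decay coefficient $\gamma_v$ to a missing variable, decaying it over time toward the empirical mean of the variable:

$$\hat{v}_{t,d} = m_{t,d}\, v_{t,d} + (1 - m_{t,d})\left(\gamma_{v,t,d}\, v_{t',d} + (1 - \gamma_{v,t,d})\, \tilde{v}_d\right)$$

where $\hat{v}_{t,d}$ is the value of the d-th time-series variable at the t-th time node after input decay, $v_{t',d}$ is the observation of the d-th variable at the most recent non-missing time node t′, $\tilde{v}_d$ is the empirical mean of the d-th variable, $m_{t,d}$ is the masking value of the d-th variable at the t-th time node, $v_{t,d}$ is the observation of the d-th variable at the t-th time node, and $\gamma_{v,t,d}$ is the input decay coefficient of the d-th variable at the t-th time node.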
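To make full use of the missingness information, the present technical solution also introduces a hidden-state decay coefficient $\gamma_h$, which decays the previous hidden state $h_{t-1}$ before the new hidden state $h_t$ is computed:

$$\hat{h}_{t-1} = \gamma_{h,t} \odot h_{t-1}$$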
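In addition, the masking vector $m_t$ is fed directly into the GRU network layer. Without imputing missing values explicitly, the model thus receives both whether a variable is missing and how long it has been missing, so that the irregular time intervals and missing values of the multivariate time-series data are handled end to end during training:

$$z_t = \sigma(W_z \hat{v}_t + U_z \hat{h}_{t-1} + H_z m_t + b_z)$$
$$r_t = \sigma(W_r \hat{v}_t + U_r \hat{h}_{t-1} + H_r m_t + b_r)$$
$$\tilde{h}_t = \tanh\left(W \hat{v}_t + U (r_t \odot \hat{h}_{t-1}) + H m_t + b\right)$$
$$h_t = (1 - z_t) \odot \hat{h}_{t-1} + z_t \odot \tilde{h}_t$$

where $\hat{v}_t$ is the time-series input at the t-th time node after input decay, $z_t$ is the update gate of the GRU hidden layer, $h_t$ is the hidden state at the t-th time node, $\tilde{h}_t$ is the candidate state at the t-th time node obtained through a non-linear function, $r_t$ is the reset gate of the GRU network layer at the t-th time node, $m_t$ is the masking vector at the t-th time node, σ(·) is the logistic function with output range (0,1), ⊙ denotes element-wise multiplication, and the matrices $W_z, W_r, W, U_z, U_r, U, H_z, H_r, H$ and the vectors $b_z, b_r, b$ are parameters of the GRU network layer.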
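The time-interval factor and the input decay can be traced by hand for a single variable. The sketch below follows the formulas for δ, γ and the decayed input; the scalar decay parameters `w` and `b`, the readings and the empirical mean are made-up values, and using the empirical mean when no earlier observation exists is an assumption.

```python
import math

def time_gaps(times, mask):
    """delta[t]: time elapsed since the variable was last observed (0 at t = 0)."""
    delta = [0.0] * len(times)
    for t in range(1, len(times)):
        step = times[t] - times[t - 1]
        delta[t] = step if mask[t - 1] else step + delta[t - 1]
    return delta

def decay(delta_t, w, b):
    """gamma = exp(-max(0, w * delta + b)), a value in (0, 1]."""
    return math.exp(-max(0.0, w * delta_t + b))

def decayed_inputs(values, mask, delta, w, b, empirical_mean):
    """Observed values pass through; missing ones decay from the last
    observation toward the empirical mean as the gap grows."""
    out, last = [], empirical_mean  # assumption: no prior observation -> mean
    for v, m, d in zip(values, mask, delta):
        if m:
            out.append(v)
            last = v
        else:
            g = decay(d, w, b)
            out.append(g * last + (1.0 - g) * empirical_mean)
    return out

times = [0.0, 4.0, 8.0, 12.0]        # hours since the start of the visit
values = [38.0, None, None, 39.0]    # body temperature, two readings missing
mask = [1, 0, 0, 1]
delta = time_gaps(times, mask)
inputs = decayed_inputs(values, mask, delta, w=0.1, b=0.0, empirical_mean=37.0)
```

The longer a reading has been missing, the closer the substituted input moves to the population mean, which is how the gap length itself becomes a signal for the network.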
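The hidden state $h_t$ is therefore taken as the output of the GRU network layer at the t-th time node, and the output of the last network layer of the GRU over all the time-series data is taken as the feature representation of the multivariate time-series data. This feature extraction process for multivariate time-series data is denoted $h_\varphi(\cdot)$.
Because the present technical solution adopts a late fusion strategy in the multimodal data fusion framework, the final multimodal fusion deep neural network can be expressed as a single mapping $H_\sigma(\cdot)$, which denotes the complete transformation that fuses the features of the structured numerical variables, the categorical variables and the multivariate time-series data and produces the classification prediction for a sample.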
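Late fusion followed by a softmax layer and cross-entropy loss can be sketched as follows. Fusing by concatenation is an assumption (the source names the late fusion strategy without giving its formula), and the feature vectors and weights are invented values.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion_predict(dnn_features, gru_features, W, b):
    """Concatenate the two feature vectors, apply a linear layer, then softmax."""
    fused = dnn_features + gru_features
    logits = [sum(w * x for w, x in zip(row, fused)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)

def cross_entropy(probs, label):
    """Negative log-probability of the true class."""
    return -math.log(probs[label])

probs = late_fusion_predict(
    dnn_features=[0.9, 0.0],   # hypothetical DNN output
    gru_features=[0.2],        # hypothetical GRU output
    W=[[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]],
    b=[0.0, 0.0],
)
loss_if_class0 = cross_entropy(probs, 0)
loss_if_class1 = cross_entropy(probs, 1)
```

Each base classifier in the hierarchy would have its own output layer of this form, with two or three classes depending on the node.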
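V. Result display module
Through the visual interface of the system front end, the result display module presents the clinical manifestation data considered by the FUO potential-etiology hierarchical classification model as a visit timeline. It also displays the differential diagnosis opinions output by the FUO potential-etiology hierarchical discrimination module together with the confidence of each base classifier's opinion, for the clinician's reference.
For the problem of assisted differential diagnosis of potential FUO etiologies, the present invention constructs a comprehensive and systematic class hierarchy of potential FUO etiologies. Using a task decomposition strategy, it converts a complex multi-class problem with a highly heterogeneous classification space into a hierarchical classification problem composed of several binary and three-class tasks, which addresses the difficulty of classification and the imbalanced distribution of labeled samples.
The present invention takes actual clinical practice fully into account. It designs a data regularization strategy and implements it automatically, effectively segmenting and integrating clinical data that were scattered by multiple visits or referrals into minimum data analysis units that follow the course of a single fever episode of an FUO patient.
Based on the designed class hierarchy, the present invention designs and implements the FUO potential-etiology hierarchical classification model. Its top-down, level-by-level reasoning matches the differential diagnosis logic of clinicians more closely and improves the interpretability and clinical applicability of the model.
The present invention builds a complete multimodal fusion deep neural network that fuses and mines the medical record text, routine laboratory test data and nursing time-series data that are easy to obtain early after admission. It thereby achieves early assisted differential diagnosis of potential FUO etiologies and makes the most of the limited early clinical manifestation data.
The above is only a preferred embodiment of the present invention. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible changes and modifications to the technical solution, or modify it into equivalent embodiments. Therefore, any simple modification, equivalent change or modification made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the present invention.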
Claims (9)
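- An auxiliary differential diagnosis system for fever of unknown origin (FUO) based on a task decomposition strategy, characterized by comprising the following modules: (1) a data acquisition module, which connects the system to heterogeneous source databases; configures, through an interactive interface, the data range of the target clinical information in the heterogeneous source databases together with the unique patient identifier and the unique visit identifier; scans the target data and computes verification statistics; and establishes a complete data path for target data acquisition; (2) a data regularization module, which establishes a data regularization strategy in which the earliest electronic medical record event at which the patient was diagnosed with FUO is taken as the FUO diagnosis anchor, visit records within 7 natural days before the anchor are included, and every subsequent visit record whose start time is no more than 24 hours after the end of the preceding visit is included, these together forming one visit cycle, while a visit record whose start time is more than 24 hours after the end of the preceding visit is assigned to the next visit cycle; which, based on the data regularization strategy, re-segments and integrates the irregularly spaced business data produced in clinical practice by a patient's multiple outpatient and inpatient visits into the minimum data analysis unit generated by a single fever episode of a single patient; and which extracts the earliest visit record data within the time range of the minimum data analysis unit; (3) a multimodal data preprocessing module, which, for specified types of medical record text, uses regular expressions to extract the target information in structured form, applying a position-oriented mode or a keyword-oriented mode according to the structural characteristics of each type of record text; which performs time-window alignment and normalization on multivariate time-series data with different sampling frequencies, different lengths and missing values; and which, for structured data, performs outlier handling, missing-value imputation, standard encoding and standardization on categorical and numerical variables; (4) an FUO potential-etiology hierarchical discrimination module, comprising: constructing a class hierarchy of potential FUO etiologies based on the task decomposition strategy, thereby converting a complex multi-class problem with an imbalanced sample distribution into a hierarchical classification problem composed of several binary and three-class tasks; building an FUO potential-etiology hierarchical classification model whose classification output space is defined on the class hierarchy; in the model training stage, dividing positive and negative training samples with the siblings policy and training several base classifiers, one on each of the resulting training sample sets; and, in the model application stage, post-processing the classification results of the base classifiers across upper and lower levels with the Top-Down algorithm, correcting the local probability of each base classifier, outputting consistent probabilities that respect the class hierarchy, obtaining the hierarchical classification of the patient's potential etiology, and giving a hierarchical differential diagnosis opinion based on that classification.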
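- The auxiliary differential diagnosis system for FUO based on a task decomposition strategy according to claim 1, characterized in that the system further comprises a result display module, which visualizes, as a visit timeline, the clinical manifestation data used by the FUO potential-etiology hierarchical classification model, and visualizes the hierarchical classification results and hierarchical differential diagnosis opinions obtained by the FUO potential-etiology hierarchical discrimination module.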
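- The auxiliary differential diagnosis system for FUO based on a task decomposition strategy according to claim 1, characterized in that the data acquisition module comprises a database connection management unit and a target data customization unit; the database connection management unit writes several JDBC modules with the classes and interfaces of the Java programming language, establishes data paths to the heterogeneous databases, exchanges SQL commands with the source databases and stores the data they return; the target data customization unit delimits the data range of the target clinical information required by the FUO potential-etiology hierarchical classification model, configures the data range, the unique patient identifier and the unique visit identifier through the interactive interface, transfers the target data to a cache database and fixes the complete data path.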
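- The auxiliary differential diagnosis system for FUO based on a task decomposition strategy according to claim 1, characterized in that the multimodal data preprocessing module comprises a text data preprocessing unit, a time-series data preprocessing unit and a structured data preprocessing unit; the text data preprocessing unit comprises: for the four types of medical record text consisting of past history, personal history, family history and marital and reproductive history, writing a separate regular expression for each in position-oriented mode to extract the target information in structured form; for the two types of medical record text consisting of chief complaint and history of present illness, applying the keyword-oriented mode and constructing a target symptom dictionary and dictionary matching rules with dictionary-based word segmentation; the target symptom dictionary comprises a dictionary of general symptoms insensitive to location, a dictionary of location-sensitive symptoms and a body-part dictionary, and the dictionary matching uses the bidirectional maximum matching algorithm to extract symptom name, duration, frequency and body part in structured form; the time-series data preprocessing unit comprises: aligning the multivariate time-series data to a time window, taking the data within a fixed time of each visit as the patient's early clinical manifestation data; each row corresponding to the data sequence of one time-series variable of one patient; fixing the input time window and the time interval between adjacent columns according to the distribution of sampling frequencies and sampling time spans of each variable, so that the time-series variables within the same visit of the same patient are aligned in time; and normalizing the values of the time-series data with Min-Max normalization; the structured data preprocessing unit comprises: applying outlier handling, missing-value imputation, standard encoding and standardization to the structured medical record text, the basic information and the routine laboratory test data.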
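- The auxiliary differential diagnosis system for FUO based on a task decomposition strategy according to claim 4, characterized in that, in the structured data preprocessing unit, the outlier handling comprises: for numerical variables, detecting outliers with statistical analysis and the 3σ rule, treating outliers as missing values and handling them with the missing-value method; for categorical variables, regarding erroneous inputs outside the preset categories as outliers, deleting them and filling with the mode of the categorical variable; the missing-value imputation comprises: filling categorical variables with the mode, and filling numerical variables with the mean if their distribution is normal and with the median otherwise; the standard encoding comprises: converting categorical variables into numerical form, using integer encoding for variables whose values have an ordinal relationship or unequal importance and one-hot encoding for variables whose values have no ordinal relationship and no difference in importance.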
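- The auxiliary differential diagnosis system for FUO based on a task decomposition strategy according to claim 1, characterized in that, in the FUO potential-etiology hierarchical discrimination module, when a patient's potential etiology is classified on the class hierarchy, the first step distinguishes whether the potential etiology is an infectious or a non-infectious disease; if infectious, the next step distinguishes bacterial, viral, fungal, parasitic or other infectious disease; if non-infectious, the next step distinguishes neoplastic disease, NIID or other non-infectious disease; if neoplastic, the next step distinguishes hematological malignancy, solid malignant tumor or benign tumor; if NIID, the next step distinguishes autoimmune disease from autoinflammatory disease; and the class hierarchy of potential FUO etiologies is asymmetric, anti-reflexive and transitive.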
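- The auxiliary differential diagnosis system for FUO based on a task decomposition strategy according to claim 1, characterized in that, in the FUO potential-etiology hierarchical discrimination module, an end-to-end multimodal fusion deep neural network serves as the base classifier of the hierarchical classification model, the base classifier being structured as follows: for high-cardinality categorical variables, an embedding network layer is built with the entity embedding technique to extract features from the categorical variables; a DNN network layer extracts features from the entity embeddings of the categorical variables together with the structured numerical variables; by introducing a masking vector, a time-interval factor and decay coefficients into a GRU network layer, features are extracted from multivariate time-series data with different time spans, irregular sampling frequencies and missing values; and, with a late fusion strategy, the feature representations output by the DNN network layer and by the GRU network layer are fused and fed into a softmax layer, where the cross-entropy loss is computed and the base classifier is trained.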
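- The auxiliary differential diagnosis system for FUO based on a task decomposition strategy according to claim 7, characterized in that, in the base classifier, the entity embedding technique maps each discrete value of a high-cardinality categorical variable to a one-dimensional numerical vector, and a linear unit transforms that vector into the entity embedding of the categorical variable; the entity embeddings of the categorical variables are combined with the structured numerical variables and input into the DNN network layer, whose multi-layer fully connected network applies non-linear transformations to yield the feature representation learned by the DNN network layer for the sample.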
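- The auxiliary differential diagnosis system for FUO based on a task decomposition strategy according to claim 7, characterized in that, in the base classifier, $V_n \in \mathbb{R}^{T_n \times D}$ denotes the multivariate time-series data of the n-th sample with D time-series variables, $T_n$ denotes the number of time nodes of the n-th sample, and $v_t \in \mathbb{R}^{D}$ denotes the observations of all time-series variables of the n-th sample at the t-th time node, $t \in \{1, 2, \ldots, T_n\}$; $s_t$ denotes the observation time of the t-th time node, a masking vector $m_t \in \{0,1\}^{D}$ indicates whether the value of each time-series variable is missing at the t-th time node, and a time-interval factor $\delta_{t,d}$ models the irregular time interval of variable d at the t-th time node, with $\delta_{t,d} = s_t - s_{t-1} + \delta_{t-1,d}$ if $t > 1$ and $m_{t-1,d} = 0$, $\delta_{t,d} = s_t - s_{t-1}$ if $t > 1$ and $m_{t-1,d} = 1$, and $\delta_{t,d} = 0$ if $t = 1$; decay coefficients are introduced into the GRU network layer to mine the latent patterns contained in missing values and irregular time intervals, and the decay coefficient of each time-series variable is learned during the end-to-end training of the model: $\gamma_t = \exp\{-\max(0, W_\gamma \delta_t + b_\gamma)\}$, where $W_\gamma$ and $b_\gamma$ are the model parameters associated with the decay coefficient, trained jointly with all other network parameters of the GRU network layer, $\delta_t$ is the time-interval factor at the t-th time node and $\gamma_t$ is the decay coefficient at the t-th time node; an input decay coefficient decays a missing variable toward the empirical mean of the variable; a hidden-state decay coefficient decays the previous hidden state before the new hidden state is computed; and the output of the last network layer of the GRU over all the time-series data is taken as the feature representation of the multivariate time-series data.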
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111311947.0 | 2021-11-08 | ||
CN202111311947.0A CN113744873B (zh) | 2021-11-08 | 2021-11-08 | Auxiliary differential diagnosis system for fever of unknown origin based on a task decomposition strategy |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023078025A1 true WO2023078025A1 (zh) | 2023-05-11 |
Family
ID=78727712
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/124226 WO2023078025A1 (zh) | Auxiliary differential diagnosis system for fever of unknown origin based on a task decomposition strategy | 2021-11-08 | 2022-10-10 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113744873B (zh) |
WO (1) | WO2023078025A1 (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116342345A (zh) * | 2023-05-26 | 2023-06-27 | 湖南智慧平安科技有限公司 | 一种基于大数据的智慧社区便民综合服务方法及平台 |
CN116700094A (zh) * | 2023-06-21 | 2023-09-05 | 哈尔滨博尼智能技术有限公司 | 一种数据驱动控制系统 |
CN116860977A (zh) * | 2023-08-21 | 2023-10-10 | 之江实验室 | 一种面向矛盾纠纷调解的异常检测系统及方法 |
CN117935249A (zh) * | 2024-03-20 | 2024-04-26 | 南昌工程学院 | 基于三维激光扫描参数自动提取的围岩等级辨识系统 |
CN118645218A (zh) * | 2024-08-09 | 2024-09-13 | 四川大学华西医院 | 基于数据结构化的培训策略生成方法、系统、终端及介质 |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113744873B (zh) * | 2021-11-08 | 2022-02-11 | 浙江大学 | 一种基于任务分解策略的发热待查辅助鉴别诊断系统 |
CN115547502B (zh) * | 2022-11-23 | 2023-04-07 | 浙江大学 | 基于时序数据的血透病人风险预测装置 |
CN116153516B (zh) * | 2023-04-19 | 2023-07-07 | 山东中医药大学第二附属医院(山东省中西医结合医院) | 一种基于分布式计算的疾病大数据挖掘分析系统 |
CN116383722A (zh) * | 2023-06-05 | 2023-07-04 | 青岛理工大学 | 一种基于门控循环单元神经网络的压裂措施过程监控方法 |
CN117116497B (zh) * | 2023-10-16 | 2024-01-12 | 长春中医药大学 | 一种用于妇科疾病的临床护理管理系统 |
CN117976130A (zh) * | 2023-11-29 | 2024-05-03 | 银川童宜棠互联网医院有限公司 | 基于智能语音交互的健康管理方案生成方法 |
CN117743957B (zh) * | 2024-02-06 | 2024-05-07 | 北京大学第三医院(北京大学第三临床医学院) | 一种基于机器学习的Th2A细胞的数据分选方法及相关设备 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190057774A1 (en) * | 2017-08-15 | 2019-02-21 | Computer Technology Associates, Inc. | Disease specific ontology-guided rule engine and machine learning for enhanced critical care decision support |
CN111192680A (zh) * | 2019-12-25 | 2020-05-22 | 山东众阳健康科技集团有限公司 | 一种基于深度学习和集成分类的智能辅助诊断方法 |
CN112768057A (zh) * | 2021-01-14 | 2021-05-07 | 重庆医科大学 | 鉴别儿童发热待查病因的系统 |
CN113488183A (zh) * | 2021-06-30 | 2021-10-08 | 南京云上数融技术有限公司 | 一种发热疾病多模态特征融合认知系统、设备、存储介质 |
CN113744873A (zh) * | 2021-11-08 | 2021-12-03 | 浙江大学 | 一种基于任务分解策略的发热待查辅助鉴别诊断系统 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709252A (zh) * | 2016-12-26 | 2017-05-24 | 重庆星空云医疗科技有限公司 | 预测、诊断、治疗和控制医院感染的智能决策辅助系统 |
CN109830303A (zh) * | 2019-02-01 | 2019-05-31 | 上海众恒信息产业股份有限公司 | 基于互联网一体化医疗平台的临床数据挖掘分析与辅助决策方法 |
CN113342973A (zh) * | 2021-06-03 | 2021-09-03 | 重庆南鹏人工智能科技研究院有限公司 | 一种基于疾病二分类器的辅助诊断模型的诊断方法 |
-
2021
- 2021-11-08 CN CN202111311947.0A patent/CN113744873B/zh active Active
-
2022
- 2022-10-10 WO PCT/CN2022/124226 patent/WO2023078025A1/zh active Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116342345A (zh) * | 2023-05-26 | 2023-06-27 | 湖南智慧平安科技有限公司 | 一种基于大数据的智慧社区便民综合服务方法及平台 |
CN116342345B (zh) * | 2023-05-26 | 2023-09-19 | 贺显雅 | 一种基于大数据的智慧社区便民综合服务方法及平台 |
CN116700094A (zh) * | 2023-06-21 | 2023-09-05 | 哈尔滨博尼智能技术有限公司 | 一种数据驱动控制系统 |
CN116700094B (zh) * | 2023-06-21 | 2024-03-01 | 哈尔滨博尼智能技术有限公司 | 一种数据驱动控制系统 |
CN116860977A (zh) * | 2023-08-21 | 2023-10-10 | 之江实验室 | 一种面向矛盾纠纷调解的异常检测系统及方法 |
CN116860977B (zh) * | 2023-08-21 | 2023-12-08 | 之江实验室 | 一种面向矛盾纠纷调解的异常检测系统及方法 |
CN117935249A (zh) * | 2024-03-20 | 2024-04-26 | 南昌工程学院 | 基于三维激光扫描参数自动提取的围岩等级辨识系统 |
CN117935249B (zh) * | 2024-03-20 | 2024-06-07 | 南昌工程学院 | 基于三维激光扫描参数自动提取的围岩等级辨识系统 |
CN118645218A (zh) * | 2024-08-09 | 2024-09-13 | 四川大学华西医院 | 基于数据结构化的培训策略生成方法、系统、终端及介质 |
Also Published As
Publication number | Publication date |
---|---|
CN113744873A (zh) | 2021-12-03 |
CN113744873B (zh) | 2022-02-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22889051 Country of ref document: EP Kind code of ref document: A1 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22889051 Country of ref document: EP Kind code of ref document: A1 |