• Screening features were selected by hypothesis testing, with reference to the 2019 ESC guidelines for the diagnosis and management of acute pulmonary embolism. The study data set was built by cleaning, sorting and screening a large volume of data obtained from hospitals.

• Five machine learning models for pulmonary embolism were developed: SVM, logistic regression, random forest, XGBoost, and BP neural network. XGBoost proved the best of the five, and its sensitivity, specificity and missed diagnosis rate were all superior to those of the comparison model, reaching the standard needed to assist doctors in clinical practice.

• The features that contribute most to the XGBoost decisions were identified. The 2019 ESC guidelines indicate that these features are also clinically important, suggesting that the model has learned relevant information for pulmonary embolism screening.

• The model is used before pulmonary angiography and requires only routine laboratory and test results as input to assess a patient's risk of pulmonary embolism, providing a reference for doctors when choosing the next examination.

Abstract

Background and objectives

Pulmonary embolism (PE) is a complex disease with high mortality and morbidity, imposing a growing burden on society. However, despite its complex pathology, current diagnosis relies solely on symptoms and laboratory data, which easily leads to misdiagnosis and missed diagnosis by inexperienced doctors. In particular, CT pulmonary angiography, the gold-standard method, is not widely available. In this study, we aim to establish a rapid and accurate screening model for pulmonary embolism using machine learning. Importantly, the data required for prediction are easy to obtain: routine laboratory data and patients' medical record information.

Methods

We extracted features from patients' routine laboratory results and medical records, including routine blood tests, biochemical panels, routine coagulation tests and other test results, as well as symptoms and medical history. Samples with a feature loss rate greater than 0.8 (more than 80% of features missing) were deleted from the original database. Data from 4723 cases were retained, 231 of which were positive for pulmonary embolism. Statistical hypothesis testing between the positive and negative groups retained 50 features, which were used to build the predictive model. To prevent the class imbalance from biasing the models toward the majority class, we used the Synthetic Minority Oversampling Technique (SMOTE) to increase the amount of information on minority samples. Five typical machine learning algorithms were used to model the screening of pulmonary embolism: Support Vector Machines, Logistic Regression, Random Forest, XGBoost, and Back Propagation Neural Networks. Sensitivity, specificity and the area under the ROC curve (AUC) were the main evaluation indicators of model performance. Furthermore, a baseline model built from the features named in the pulmonary embolism guidelines served as a comparison model.
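The two data-preparation steps described above, dropping samples with too many missing features and oversampling the minority class, can be sketched as follows. This is a minimal standard-library illustration and not the authors' code: the function names `drop_sparse_samples` and `smote`, the use of `None` for a missing value, and the defaults `k=5` and `seed=0` are all assumptions. The SMOTE step expects complete numeric rows, so missing values would have to be imputed first; the abstract does not say how that was done.

```python
import math
import random


def drop_sparse_samples(rows, max_missing=0.8):
    """Keep rows whose fraction of missing (None) features is at most max_missing."""
    kept = []
    for row in rows:
        missing = sum(v is None for v in row) / len(row)
        if missing <= max_missing:
            kept.append(row)
    return kept


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy usage with made-up numbers (not data from the study).
rows = [[None, None, None, None, None], [1.0, None, 2.0, 3.0, 4.0]]
kept = drop_sparse_samples(rows)          # the all-missing row is dropped
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra_positives = smote(positives, n_new=10, k=2)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.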

Results

We found that XGBoost outperformed the other models, with the highest sensitivity and specificity (0.99 and 0.99, respectively). It also improved significantly on the baseline model (sensitivity and specificity of 0.76 and 0.76, respectively). More importantly, our model showed a low missed diagnosis rate (0.46) and a high AUC (0.992). Finally, the model takes only about 0.05 s to compute the probability of pulmonary embolism.
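For readers who want to reproduce indicators of this kind, the sketch below computes them from labels and scores. It assumes the usual definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and missed diagnosis rate = FN/(TP+FN). The abstract does not state its exact definition or units for the missed diagnosis rate, so that last formula is an assumption. The function names `screening_metrics` and `auc` are illustrative.

```python
def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity and missed-diagnosis rate from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "missed_rate": fn / (tp + fn),  # assumed definition: 1 - sensitivity
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up labels and scores (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
metrics = screening_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise AUC above is quadratic in the number of samples, which is fine for illustration; a rank-based computation is preferable for large data sets.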

Conclusions

In this study, five machine learning classification models were established to assess the likelihood that a patient has pulmonary embolism; of these, the XGBoost model gave the largest improvement in precision, sensitivity, and AUC for pulmonary embolism screening. Overall, we have established an AI-based model that accurately predicts pulmonary embolism at an early stage.

References

[1] R. Osteresch, A. Fach, R. Hambrecht, et al., ESC-leitlinien 2019 zu diagnostik und management der akuten lungenembolie, Herz 44 (2019) 696–700.
