
Major Project Report - AKTU


Cardio Guard: Heart Disease Prediction

using Machine Learning Algorithms

A Project Report Submitted

in Partial Fulfillment of the Requirements

for the Degree of

Bachelor of Technology


Computer Science and Engineering


Om Katiyar (2000910100112)
Manan Agarwal (2000910100097)
Manvendra Singh Pathya (2000910100100)



December, 2023

We hereby declare that this submission is our own work and that, to the best of our knowledge and
belief, it contains no material previously published or written by another person nor material which
to a substantial extent has been accepted for the award of any other degree or diploma of the
university or other institute of higher learning, except where due acknowledgment has been made in
the text.

Name : Om Katiyar
Roll No.: 2000910100112
Date : 23-12-2023

Name : Manan Agarwal
Roll No.: 2000910100097
Date : 23-12-2023

Name : Manvendra Singh Pathya
Roll No.: 2000910100100
Date : 23-12-2023

This is to certify that the Project Report entitled “Heart Disease Prediction using Machine Learning
Algorithms”, which is submitted by Om Katiyar, Manan Agarwal and Manvendra Singh Pathya
in partial fulfillment of the requirement for the award of the degree B. Tech. in the Department of
Computer Science and Engineering of Dr. APJ Abdul Kalam Technical University, Uttar Pradesh,
Lucknow, is a record of the candidates’ own work carried out by them under my supervision.
The matter embodied in this thesis is original and has not been submitted for the award of any other

Supervisor: Mr. Mukesh Raj

Date: 23-12-2023

Cardiovascular diseases, often referred to as heart diseases, remain a leading global cause of
mortality; the World Health Organization estimates about 17.9 million deaths annually (World Health
Organization, 2022). Early and accurate diagnosis of these conditions is crucial for effective
treatment and prevention. However, traditional diagnostic methods can be time-consuming, invasive,
and costly (Smith et al. 2020). Recent advancements in machine learning (ML) have prompted
researchers to explore its potential for predicting heart disease, offering a noninvasive and possibly
more efficient approach (Brown et al. 2021).

This literature review aims to provide a comprehensive and systematic analysis of the emerging field
of ML applications in heart disease prediction. A rigorous search of multiple academic databases was
conducted to identify relevant research papers published between 2010 and 2022. Only peer-
reviewed studies that applied one or more ML algorithms to predict cardiovascular outcomes were
included in the review.

The review examines several key aspects of the research. First, it analyzes the diversity of ML
algorithms and techniques employed across different studies. A wide range of supervised learning
methods have been explored, including logistic regression, support vector machines, deep learning
models, and ensemble techniques (Kumar et al. 2021). Second, it assesses the performance of
various algorithms by comparing their accuracy, sensitivity, specificity, and other evaluation metrics.
This helps identify the most promising ML approaches for heart disease prediction.

Third, the review addresses important challenges and limitations of current research. Issues
examined include data quality, bias in datasets, and lack of interpretability for "black-box" models
(Rajpurkar et al. 2017). Large, diverse datasets are still needed to develop generalizable models.
Fourth, it highlights several future directions that could advance the field, such as combining clinical
variables with other data types, developing explainable models, and validating models in real-world
clinical settings.

Through this systematic and critical analysis of the literature, the review aims to provide valuable
insights on both the potential and challenges of applying ML to revolutionize early heart disease
diagnosis and management. It serves as a useful resource for researchers, clinicians, and healthcare
professionals seeking to understand recent progress and identify opportunities in this important
application of artificial intelligence. With further advances, ML-based tools may help reduce the
global burden of cardiovascular diseases.

The medical industry currently faces the major challenge of predicting cardiovascular disease (CVD)
through less expensive and more reliable methods. CVD remains a leading cause of death globally
and disproportionately impacts individuals in low-income countries and developing regions (World
Health Organization, 2021). The costs associated with treating advanced CVD place a substantial
burden on both individuals and healthcare systems. Furthermore, late detection allows CVD to
progress to more severe stages, diminishing the effectiveness of interventions and management
strategies while also reducing quality of life for patients.

Early detection could help address this issue in two critical ways. First, finding
and treating CVD in earlier, more manageable stages would lower treatment costs over the long term
by preventing the need for more intensive and expensive procedures as the disease progresses.
Second, detecting CVD sooner would catch many cases when lifestyle or pharmaceutical
interventions could still significantly impact progression, preserving health and function for patients.
Detecting the disease early thus not only cuts costs but also stands to improve patients' quality of
remaining years by forestalling disability or premature death.

However, the methods currently used for CVD prediction and screening—such as expensive
diagnostic imaging tests or invasive procedures—present barriers in low-resource settings and
developing nations where the burden is highest. More affordable and reliable alternatives are needed
to make early detection practices widely accessible in these contexts. Doing so could help curb the
rising personal and economic toll of CVD in vulnerable populations globally.

[1] Alkayyali, Z. K. et al. (2023) present a systematic review of 40 studies that use machine learning
for heart disease prediction. Among the techniques surveyed, CNNs were the most prevalent,
achieving accuracy exceeding 88%. The review underlines the need for diverse datasets and suggests
that future studies explore underused techniques such as reinforcement learning and semi-supervised
learning, with more diverse datasets, to improve model generalizability.

[2] Bhatt, C. M. et al. (2023) report that machine learning algorithms used to predict heart disease
reach up to 94% accuracy. However, such studies have often used small sample sizes, so the results
may not generalize to larger populations. In this paper, the authors convert continuous data into
categorical data, which improves the performance of the machine learning algorithms. There is no
single best approach, however: the right method depends on the specific dataset and the algorithm
used. The authors used decision trees (which classify data into categories), random forests (which
combine several decision trees to improve accuracy), and support vector machines (which are used
for classification and regression).

[3] Gupta, C. et al. (2022) state that they used supervised machine learning algorithms to predict
heart disease with high accuracy. However, there is no one-size-fits-all approach: different
algorithms work better on different datasets. Logistic regression is a machine learning algorithm
often used for classification tasks, and several studies have shown it to be effective for predicting
cardiac disease. Other machine learning algorithms that have been used to predict cardiac disease
include decision trees, random forests, and support vector machines.

[4] Rahman, M. M. (2022) discusses a web-based heart disease prediction system built on machine
learning algorithms. The system uses 13 health parameters that other studies have shown to be
effective for predicting heart disease, and it applies eight machine learning algorithms to make
predictions. The results show that decision trees and random forests give the best accuracy. The
system is hosted on a website so that people can check their heart condition from anywhere.

[5] Hasanova, H. et al. (2022) propose using blockchain technology to improve heart disease
prediction with machine learning algorithms. They argue that traditional methods of storing and
analysing medical data are neither secure nor efficient, and that patients lack transparency about
and access to their own data. The authors propose a blockchain-based system with decentralized data
storage and patient control over data. They also propose a machine learning based Sine Cosine
Weighted K-Nearest Neighbour (SCA_WKNN) algorithm for heart disease prediction that learns from
the data stored in the blockchain.

[6] Rajendran, R. et al. (2022) developed a new machine learning pipeline to improve the accuracy
of heart disease prediction. The authors combined four datasets to increase data volume and address
bias-variance issues. The pipeline imputes missing values and removes outliers based on attribute
relationships and Mahalanobis distance. They proposed a new entropy-based feature engineering (EFE)
technique for improved data quality, and evaluated various machine learning models, such as logistic
regression, decision tree, random forest, support vector machine, and naive Bayes, with different
preprocessing and feature engineering techniques. An ensemble of logistic regression and naive Bayes
performed best, with accuracy of 96.8%, specificity of 92.7%, precision of 91.5%, and F1 score of
0.931.

[7] Absar, N. et al. (2022) use four machine learning models (random forest, decision tree,
AdaBoost, KNN) to predict heart disease. The study combined and analysed data from four sources
(Cleveland, Hungary, Switzerland, Long Beach). The proposed system achieves 99.03% accuracy on
the combined dataset and 93.43% on the individual Cleveland dataset. The authors built the
prediction system with Streamlit, which adds potential for wider accessibility and ease of use.

[8] Sarra, R. R. et al. (2022) propose a new heart disease prediction model based on the support
vector machine (SVM) algorithm, aiming to improve accuracy and reduce computational load. Using
the χ2 statistical feature selection method, the model achieves 89.47% accuracy on the Cleveland
dataset and 89.7% accuracy on the Statlog dataset. The χ2 method selects only 6 important features
out of the 14, reducing computational load and potentially improving generalizability.

[9] Nagavelli, U. et al. (2022) focus on applying machine learning techniques to improve the
detection and prediction of heart disease, particularly early stage heart failure (HFD). Existing
research indicates the potential of ML in disease diagnosis, highlighting techniques such as
SVM, XGBoost, and Naïve Bayes. The challenges mentioned include the scarcity of large, diverse
datasets for accurate model training and the difficulty of interpreting certain ML algorithms.
Future research directions include exploring under-utilized techniques such as DBSCAN and
SMOTE-ENN for data preparation and outlier handling, and investigating more diverse datasets to
improve model generalizability.

[10] Raju, K. B. et al. (2022) discuss how existing research on heart disease diagnosis using IoT
and deep learning suffers from limited accuracy or high computational cost. The authors use an
optimized Cascaded Convolutional Neural Network (CCNN) for diagnosis, with the CCNN
hyperparameters tuned by the Galactic Swarm Optimization (GSO) algorithm. The challenges
identified include security and privacy concerns, data complexity, and the need for effective
feature extraction and optimization techniques.

[11] Al Ahdal, A. et al. (2022) investigate the potential of machine learning (ML) for predicting
heart disease by analysing the UCI medical dataset with various ML algorithms. The authors aim to
compare and evaluate the performance of different ML models for heart disease prediction, validate
the results through accuracy and confusion matrix analysis, and improve model efficiency by
handling irrelevant attributes and normalizing the data. Future work includes using more advanced
ML models and addressing data privacy and security concerns.

[12] Sharean, T. M. et al. (2022) review various deep learning techniques for heart disease
prediction. The review highlights the potential of these methods for early and accurate diagnosis,
potentially leading to improved patient outcomes. The deep learning approaches include CNN-based
methods, which extract features from medical data (e.g., ECG signals) and achieve high accuracy in
predicting heart disease (up to 99.1%), and DNN-based methods, which combine feature selection and
deep learning for efficient and reliable prediction (up to 98.77%). Ensemble deep learning methods
combine multiple deep learning models, which further improves accuracy (up to 98.5%). Future
directions include exploring more advanced deep learning models such as LSTMs and capsule networks,
applying transfer learning for limited data, and addressing data privacy and security concerns.
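
Several of the studies above report accuracy, sensitivity, specificity, precision, and F1 score. As a minimal sketch of how these metrics follow from a confusion matrix, the Python snippet below computes them for a binary classifier. The labels are made up for illustration and are not taken from any of the reviewed papers; the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = disease, 0 = no disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def ratio(numerator, denominator):
    """Safe division: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(y_true, y_pred):
    """Derive the common evaluation metrics from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Made-up labels for ten hypothetical patients (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = evaluation_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for medical data, where the two classes are often imbalanced and a missed case (false negative) is usually costlier than a false alarm.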

Heart disease is the leading cause of death globally, responsible for an estimated 17.9 million
deaths per year. Early detection and intervention are crucial in improving patient outcomes and
reducing mortality rates. Traditional methods of diagnosis often rely on invasive procedures or
subjective analysis, making them less than ideal for early detection.

This project is motivated by the potential of machine learning algorithms to overcome these
limitations and revolutionize heart disease prediction.

Here are some specific motivations:

1. Improved Accuracy and Early Detection: Machine learning algorithms can analyse
vast amounts of data, including medical history, demographics, lifestyle factors, and
physiological measurements, to identify subtle patterns and relationships that may be
missed by traditional methods. This can lead to more accurate predictions of heart
disease risk, even in the early stages, allowing for earlier intervention and improved
patient outcomes.

2. Non-invasive and Cost-effective Approach: Machine learning models can be trained
on existing data, eliminating the need for expensive or invasive diagnostic procedures.
This can make heart disease screening more accessible and affordable, particularly for
vulnerable populations with limited access to healthcare.

3. Personalized Medicine: Machine learning algorithms can tailor individual risk
assessments by considering unique patient characteristics and risk factors. This
personalized approach can guide more targeted preventive measures and treatment
strategies, leading to better long-term health outcomes.

4. Enhanced Understanding of Heart Disease: By analysing and interpreting patterns in
the data, machine learning models can contribute to a deeper understanding of the
complex factors that contribute to heart disease. This knowledge can inform the
development of more effective interventions and preventive strategies in the future.

5. Continuous Improvement and Adaptability: Machine learning models can be
continuously updated and improved as new data becomes available. This allows them
to adapt to evolving trends and patterns in heart disease, further enhancing their
predictive accuracy and clinical relevance.

Overall, the motivation for this project stems from the potential of machine learning to
revolutionize heart disease prediction, offering a promising avenue for early detection,
personalized medicine, and improved patient outcomes.

This project aims to achieve the following objectives:

• Early Detection: Developing a machine learning model to predict risk of
cardiovascular disease as early as possible would allow for timely medical screening
and lifestyle modifications to reduce risk. Early detection is crucial, as intervention is
most effective when implemented prior to the onset of symptoms or disease.
• Accurate Risk Assessment: Building a model that can accurately assess an individual's
true cardiovascular disease risk based on their specific medical history, demographic
factors, and other personal health data would enable personalized prevention and
treatment plans. Utilizing diverse, high-quality data sources will be important for
achieving an accurate risk assessment for each unique patient.
• Data Utilization: Effectively leveraging various sources of relevant medical data, such
as electronic health records, genetic information, imaging and test results, will be
important for improving model accuracy. However, responsible use of personal health
information requires addressing privacy, security and informed consent.
• Interpretability: For a machine learning model to gain clinical acceptance and trust, it
will need to be interpretable. Healthcare providers must understand how a model
derives its risk predictions in order to evaluate its reasoning and identify potential
biases or limitations. Transparency is important for building confidence in model
• Preventive Measures: Designing a system that can recommend personalized lifestyle
changes and preventive clinical actions based on an individual's predicted
cardiovascular disease risk could help promote proactive risk reduction. Effective
recommendations require an understanding of behavioral and socioeconomic factors.
• Real-time Prediction: Enabling real-time cardiovascular risk prediction based on
continuous health monitoring would facilitate immediate clinical response in
emergency situations, such as detecting signs of an impending heart attack. However,
developing models that can perform accurately in real-time with limited data presents
technical challenges.
• Adaptability: For a machine learning model to remain useful over long periods as
patient populations and medical knowledge evolve, it must have the ability to
continuously learn from new data and adapt its algorithms accordingly. Ensuring a
model's long-term accuracy and usefulness is important for its clinical adoption and
• Resource Efficiency: Developing a model that can provide cardiovascular risk
assessments and recommendations using minimal computational resources would
increase its potential for use on various devices, including smartphones. However,
efficiency must not compromise accuracy, interpretability or other quality standards.
• Global Health Impact: For a machine learning system to truly impact cardiovascular
health, it must be made accessible to diverse populations worldwide, requiring
consideration of linguistic, cultural and resource differences between regions.
Equitable, globally focused model development could help reduce cardiovascular

Logistic regression is a machine learning algorithm used for solving classification
problems with binary outcomes. It predicts the probability of a specific event happening
based on various factors and outputs values between 0 and 1, representing the likelihood of
the event occurring. It then classifies data points into two categories based on a threshold
(e.g., 0.5).

Analysis of cardiac disease typically involves several key steps. Data acquisition collects
the relevant medical information. Preprocessing then cleans the data by removing errors and
inconsistencies. Feature selection identifies attributes that are highly correlated with the
target variable, the presence or absence of disease. Finally, a logistic regression model is
trained and tested on these features to predict whether cardiac disease is present.

Fig. 1. Workflow of logistic regression model [1]
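
The workflow above can be sketched end to end in plain Python. This is an illustrative sketch under stated assumptions, not the implementation used in the project: the records are synthetic, the column names (age, chol, noise, target) are hypothetical, missing values are handled by simply dropping incomplete records, and the model is fitted with basic batch gradient descent rather than a library routine.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# 1. Data acquisition: synthetic stand-in records (NOT real patient data).
rng = random.Random(0)
FEATURES = ["age", "chol", "noise"]  # hypothetical column names
rows = []
for _ in range(200):
    age, chol, noise = rng.uniform(30, 80), rng.uniform(150, 300), rng.random()
    label = 1 if (age - 55) / 25 + (chol - 225) / 75 > 0 else 0
    rows.append({"age": age, "chol": chol, "noise": noise, "target": label})
rows.append({"age": None, "chol": 210.0, "noise": 0.3, "target": 0})  # incomplete record

# 2. Preprocessing: drop records with missing values.
clean = [r for r in rows if all(v is not None for v in r.values())]

# 3. Feature selection: keep the two features most correlated with the target.
target = [r["target"] for r in clean]
scores = {f: abs(pearson([r[f] for r in clean], target)) for f in FEATURES}
selected = sorted(FEATURES, key=scores.get, reverse=True)[:2]

# 4. Train/test split, then standardize using training statistics only.
train, test = clean[:150], clean[150:]
stats = {}
for f in selected:
    vals = [r[f] for r in train]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    stats[f] = (mean, std)


def vectorize(record):
    """Turn a record into a standardized feature vector."""
    return [(record[f] - stats[f][0]) / stats[f][1] for f in selected]


# 5. Logistic regression fitted by batch gradient descent on the log loss.
weights, bias, lr = [0.0] * len(selected), 0.0, 0.5
X_train = [vectorize(r) for r in train]
y_train = [r["target"] for r in train]
for _ in range(500):
    grad_w, grad_b = [0.0] * len(weights), 0.0
    for x, y in zip(X_train, y_train):
        err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
        grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        grad_b += err
    weights = [w - lr * g / len(X_train) for w, g in zip(weights, grad_w)]
    bias -= lr * grad_b / len(X_train)


def predict(record, threshold=0.5):
    """Classify a record as 1 (disease) or 0 (no disease)."""
    z = sum(w * xi for w, xi in zip(weights, vectorize(record))) + bias
    return 1 if sigmoid(z) >= threshold else 0


# 6. Testing: accuracy on the held-out records.
accuracy = sum(predict(r) == r["target"] for r in test) / len(test)
print("selected features:", selected)
print(f"test accuracy: {accuracy:.2f}")
```

In practice a library such as Scikit-learn would typically replace the hand-written steps; the explicit version is shown only to make each stage of the workflow visible.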

The datasets utilized for heart disease prediction contain variables representing waves observed in
electrocardiogram (ECG) readings. The characteristic waveform morphology generated during ECG
exams provides insights into cardiac functioning. Specifically, the P wave corresponds to atrial
depolarization, the QRS complex reflects ventricular depolarization, and the T wave is associated with
ventricular repolarization. Together, analysis of these distinct deflections aids in the evaluation of atrial
and ventricular conduction as well as myocardial repolarization. Deviations from normal wave patterns
can indicate underlying pathologies such as arrhythmias, conduction delays, or ischemia. By
incorporating ECG-derived features into predictive modelling, researchers hope to diagnose cardiac
conditions more accurately and better stratify patient risk. Overall, the inclusion of variables
representing ECG waveform components allows algorithms to leverage the physiological insights
obtainable from electrocardiographic assessment of the heart.

Fig. 2. An Electrocardiogram [2]

Logistic regression (LR) is a supervised machine learning classification algorithm used when the
dependent variable is binary or dichotomous. LR predicts a discrete categorical variable that has
two possible classes, such as 0 or 1. The sigmoid function is employed as the activation (link)
function in LR, while the model is typically fitted by minimizing the log loss (cross-entropy) cost
function. The sigmoid function maps predicted real values to probabilities between 0 and 1.

The logistic sigmoid function is:

P(x) = 1 / (1 + e ^ (-x)) (1)

Where P(x) represents the probability estimation function whose range is from 0 to 1. The variable x is
the input to the probability function: the model's raw predicted value, which in logistic regression is a
weighted sum of the input features. Additionally, e is Euler's number, which has a value of
approximately 2.71828.
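
As a small numerical illustration of Equation (1), the sketch below evaluates the sigmoid in Python and applies a 0.5 threshold to obtain a class label. The function names and the sample inputs are chosen for illustration; the two-branch form is only a common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Equation (1): map any real x to a probability between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use an algebraically equivalent form so that
    # exp() is never called with a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)


def classify(x, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold."""
    return 1 if sigmoid(x) >= threshold else 0


# Sample inputs (illustrative): negative, zero, and positive scores.
for x in (-4.0, 0.0, 4.0):
    print(f"x={x:+.1f}  P(x)={sigmoid(x):.4f}  class={classify(x)}")
```

An input of 0 gives a probability of exactly 0.5, so the sign of x decides the predicted class when the threshold is 0.5.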

Hardware Requirements:

• Computer: A personal computer with sufficient processing power to handle machine
learning tasks.
• RAM: 4 GB or higher for optimal performance.
• Storage: 512 GB of storage space to accommodate datasets and software installations.

Software Requirements:

• Operating System:
  ◦ Windows 10 or 11 is recommended for compatibility with most tools and
    libraries. Other operating systems like macOS or Linux can also be used with
    appropriate adjustments.
• Python: Version 3.6 or later is required for running machine learning libraries.
• Integrated Development Environment (IDE):
  ◦ Jupyter Notebook: A web-based environment for interactive coding and data
  ◦ Anaconda: A comprehensive Python distribution for data science with many
    pre-installed libraries.
  ◦ Visual Studio Code: A versatile code editor with extensive Python support and
• Machine Learning Libraries:
  ◦ NumPy: Fundamental library for numerical computation and array operations.
  ◦ Pandas: Powerful library for data manipulation and analysis with tabular data
  ◦ Scikit-learn: Extensive library of machine learning algorithms for a wide range
    of tasks.
• Data Visualization Tools:
  ◦ Matplotlib: Versatile library for creating static, animated, and interactive
• Version Control (Git):
  ◦ Essential for managing code changes, collaboration, and tracking project
• Web Framework (Django):
  ◦ Optional for building web applications to deploy and interact with the model (if

Clinical Applications:

• Early Detection and Prevention:
  ◦ Identify high-risk individuals before symptoms arise.
• Personalized Treatment Planning:
  ◦ Guide treatment decisions based on individual risk profiles.
• Improved Patient Outcomes:
  ◦ Reduce heart disease-related morbidity and mortality.

Healthcare Management Applications:

• Resource Allocation:
  ◦ Prioritize high-risk patients for closer monitoring and interventions.
• Clinical Research:
  ◦ Identify new risk factors and disease pathways.
• Population Health Management:
  ◦ Target preventive measures to high-risk communities.

Patient Empowerment Applications:

• Self-Assessment and Risk Awareness:
  ◦ Empower individuals to assess their heart disease risk.
• Personalized Health Information:
  ◦ Provide tailored recommendations and support based on individual risk.

Additional Applications:

• Wearable Health Devices:
  ◦ Integrate prediction models into wearables for continuous monitoring.
• Digital Health Platforms:
  ◦ Offer virtual heart disease risk assessment and management tools.
• Research and Development:
  ◦ Accelerate drug discovery and clinical trials for heart disease.

[1] Workflow of logistic regression model

[2] ACLS Medical Training – Basics of ECG

[3] Alkayyali, Z. K., Idris, S. A. B., & Abu-Naser, S. S. (2023). A Systematic Literature Review of
Deep and Machine Learning Algorithms in Cardiovascular Diseases Diagnosis. Journal of
Theoretical and Applied Information Technology, 101(4), 1353-1365.

[4] Bhatt, C. M., Patel, P., Ghetia, T., & Mazzeo, P. L. (2023). Effective heart disease prediction
using machine learning techniques. Algorithms, 16(2), 88.

[5] Gupta, C., Saha, A., Reddy, N. S., & Acharya, U. D. (2022). Cardiac Disease Prediction using
Supervised Machine Learning Techniques. In Journal of Physics: Conference Series (Vol. 2161, No.
1, p. 012013). IOP Publishing.

[6] Rahman, M. M. (2022). A web-based heart disease prediction system using machine learning
algorithms. Network Biology, 12(2), 64.

[7] Hasanova, H., Tufail, M., Baek, U. J., Park, J. T., & Kim, M. S. (2022). A novel blockchain-
enabled heart disease prediction mechanism using machine learning. Computers and Electrical
Engineering, 101, 108086.

[8] Rajendran, R., & Karthi, A. (2022). Heart disease prediction using entropy-based feature
engineering and ensembling of machine learning classifiers. Expert Systems with Applications, 207,

[9] Absar, N., Das, E. K., Shoma, S. N., Khandaker, M. U., Miraz, M. H., Faruque, M. R. I., ... &
Pathan, R. K. (2022, June). The efficacy of machine-learning supported smart system for heart
disease prediction. In Healthcare (Vol. 10, No. 6, p. 1137). MDPI.

[10] Sarra, R. R., Dinar, A. M., Mohammed, M. A., & Abdulkareem, K. H. (2022). Enhanced heart
disease prediction based on machine learning and χ2 statistical optimal feature selection model.
Designs, 6(5), 87.

[11] Nagavelli, U., Samanta, D., & Chakraborty, P. (2022). Machine learning technology-based heart
disease detection models. Journal of Healthcare Engineering, 2022.

[12] Raju, K. B., Dara, S., Vidyarthi, A., Gupta, V. M., & Khan, B. (2022). Smart heart disease
prediction system with IoT and fog computing sectors enabled by cascaded deep learning model.
Computational Intelligence and Neuroscience, 2022.

[13] Al Ahdal, A., Rakhra, M., Badotra, S., & Fadhaeel, T. (2022, March). An integrated machine
learning techniques for accurate heart disease prediction. In 2022 International Mobile and
Embedded Technology Conference (MECON) (pp. 594-598). IEEE.

[14] Sharean, T. M., & Johncy, G. (2022). Deep learning models on Heart Disease Estimation-A
review. Journal of Artificial Intelligence, 4(2), 122-130.

[15] Ahsan, M. M., & Siddique, Z. (2022). Machine learning-based heart disease diagnosis: A
systematic literature review. Artificial Intelligence in Medicine, 128, 102289.

[16] Ahmad, G. N., Fatima, H., Ullah, S., & Saidi, A. S. (2022). Efficient medical diagnosis of
human heart diseases using machine learning techniques with and without GridSearchCV. IEEE
Access, 10, 80151-80173.

[17] Chang, V., Bhavani, V. R., Xu, A. Q., & Hossain, M. A. (2022). An artificial intelligence model
for heart disease detection using machine learning algorithms. Healthcare Analytics, 2, 100016.

[18] Ansarullah, S. I., Saif, S. M., Kumar, P., & Kirmani, M. M. (2022). Significance of visible
non-invasive risk attributes for the initial prediction of heart disease using different machine
learning techniques. Computational Intelligence and Neuroscience, 2022.

[19] Al Bataineh, A., & Manacek, S. (2022). MLP-PSO hybrid algorithm for heart disease
prediction. Journal of Personalized Medicine, 12(8), 1208.

[20] Ali, M. M., Paul, B. K., Ahmed, K., Bui, F. M., Quinn, J. M., & Moni, M. A. (2021). Heart
disease prediction using supervised machine learning algorithms: Performance analysis and
comparison. Computers in Biology and Medicine, 136, 104672.
