
Internship Final


Data Science

CHAPTER – 1

ABOUT THE ORGANIZATION

1.1 History

Elewayte is a young and dynamic startup founded in 2021 and headquartered in Bengaluru
(Bangalore), Karnataka, India. As a private company in the service sector, Elewayte has quickly
established itself as an emerging player in its field.

With a lean team of 11-50 employees, Elewayte exemplifies the agile and innovative nature of
modern Indian startups. The company was founded by Akash Ghosh, who likely serves in a key
leadership role.

While specific details about Elewayte's services are not provided, as a service-oriented startup,
they likely offer innovative solutions in areas such as technology, business services, or consumer-
facing applications. Their presence in Bangalore, often referred to as India's Silicon Valley, suggests
they may be operating in the tech or IT services space.

Elewayte maintains an online presence through their website (elewayte.com) and can be
contacted via email at info@elewayte.com or by phone at 6383453564. As a growing startup, they
may be actively expanding their team and services to meet market demands and scale their operations.

Mission and Vision

➢ Mission

At Elewayte, our mission is to empower individuals with the technical skills and practical knowledge
they need to thrive in the modern job market. We are committed to bridging the gap between
traditional education and industry demands, fostering a skill-based economy where employability is
driven by real-world expertise. Through our innovative and user-friendly platform, we aim to
transform the learning experience, emphasizing hands-on learning, comprehension, and independent
thinking. Our mission is to inspire ambition and equip students with the tools to embark on successful
and rewarding careers, ultimately contributing to the growth and development of society.

Dept. of CSE KLECET, Chikodi Page 1



➢ Vision

Our vision at Elewayte is to become a global leader in skill development, recognized for
revolutionizing the way individuals acquire knowledge and expertise. We envision a world where
every learner can access practical and relevant education, irrespective of their geographical location or
background. By continuously adapting to the ever-changing landscape of technology and industry, we
aim to remain at the forefront of skill-based education, offering cutting-edge courses that meet the
needs of tomorrow's job market. Our vision is to empower millions of learners, transforming them
into skilled professionals who can drive innovation, solve complex challenges, and make a positive
impact on their communities and the world at large. Through our unwavering commitment to
excellence, accessibility, and student success, we strive to shape a future where talent and ambition
know no bounds.

Contact details:

· Company Name: Elewayte
· Headquarters: Bengaluru/Bangalore, Karnataka, India
· Office Location: Bangalore / Bengaluru
· Email: info@elewayte.com
· Phone: 6383453564
· Website: elewayte.com

Commitment

In a rapidly evolving global employment market, Elewayte stands as a beacon of hope for students
and job seekers alike. As the market moves towards a skill-based economy, the focus has shifted from
superficial CVs to actionable skills and hands-on experience. Elewayte firmly believes that
empowering individuals with the right technical skill sets will lead them to successful and fulfilling
careers.


Quality

The main emphasis is on delivering the best quality in every project the company undertakes. With a
time-tested business methodology and a structured solution-building approach, the company ensures
that global business standards are maintained.

➢ Technology, Innovation & Support

Elewayte offers a state-of-the-art, user-friendly platform that allows students to access a wide range
of skill development programs easily. The platform likely incorporates interactive elements to
enhance engagement and promote active learning. Elewayte's approach encourages independence by
offering self-paced courses, allowing students to learn at their own speed and convenience.

1.2 Various fields in which the company offers services:

• Software Solutions
• Web Solutions
• Networking Solutions
• Quality Assurance & Testing
• Application Maintenance & Support

➢ Software Solutions

Elewayte has developed a number of customized products and MIS applications for its clients in
this service. Its mature software development processes, combined with excellent infrastructure,
have significantly increased the “on-time and on-budget” delivery of software in the offshore mode.
The company uses a highly effective IMPACT Methodology for offshore and distributed software
development. Applications come in all sizes, from a one-table database to a massive client-server
application. The creation of complete database applications is another field in which the company
specializes.


Company Offers:

Application Development

➢ Interactive Application Development
➢ Custom Application Development/Maintenance
➢ MIS and ERP Solution & Support

➢ Web Solutions

Elewayte provides web solutions and services to help customers reach a wider customer base. The
web is a distinct medium for communication and requires a different viewpoint and skill
set to use it effectively. Web consulting helps clients get more return on their
investment in a website. Elewayte helps clients find the most effective solution through:

➢ Website Development
➢ Web Multimedia
➢ Intranet Development
➢ Web Promotion
➢ Web Hosting
➢ E-commerce


CHAPTER – 2

OBJECTIVES & PLAN OF INTERNSHIP

2.1 Why Do an Internship

✓ It assists the student in developing employer-valued skills such as teamwork,
communication and attention to detail.
✓ It exposes the student to the environment and the performance expectations placed on
a programmer in professional practice, private/public companies or government entities.
✓ It enhances and/or expands the student's knowledge of particular areas of technical
methods.
✓ It exposes the student to professional role models or mentors who will provide the student
with support in the early stages of the internship and set an example of the behaviour
expected in the intern's workplace.

2.2 My Role as an Intern

❖ Responsible for coding.
❖ Receive instructions and guidance from the mentor regarding required tasks and expected
results.
❖ Report on progress daily or as required.

2.3 Department of interns:

I worked as an intern in the Data Science department, where I was trained to gather datasets,
analyze them, write code according to the requirements and provide feedback to the company.

Starting and Ending Dates of the Internship:

My internship ran for five weeks, from October 2023 to November 2023.


CHAPTER – 3

INTERNSHIP ACTIVITIES
3.1 Machine Learning

Machine learning refers to a type of data analysis that uses algorithms that learn from data. Machine
learning algorithms can apply complex calculations to big data, very quickly.

Machine learning is a fast-growing technical field and a highly relevant topic in both academia
and industry, which makes it a valuable skill in both academia and the private sector.
It is a field at the intersection of informatics and statistics, tightly connected with data science and
knowledge discovery. The prerequisites are a basic understanding of statistics and Python.

Figure 3.1 Machine Learning

The main software used in a typical Python machine learning pipeline can consist of almost any
combination of the following tools:

❖ NumPy, for matrix and vector manipulation.
❖ Pandas, for time series and R-like DataFrame data structures.
❖ Matplotlib, a 2D plotting library.
❖ scikit-learn, as a source for many machine learning algorithms and utilities.


>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt

3.2 Jupyter
Jupyter, previously known as IPython Notebook, is a web-based, interactive development
environment. Originally developed for Python, it has since expanded to support over 40 other
programming languages including Julia and R.
Jupyter allows for notebooks to be written that contain text, live code, images, and equations. These
notebooks can be shared, and can even be hosted on GitHub for free.

3.3 Data

In the Python, NumPy, and Pandas sections, I worked with either generated data or a toy dataset. Later,
in the project, I worked on a medical example using a heart disease prediction CSV dataset.
The medical dataset used in the project is freely available, and the project shows that analysing more
involved medical data with the same open-source tools is possible.

3.4 Introduction to Python

Python is a widely used, general-purpose, high-level programming language that is used for anything
from web development to deep learning. According to several metrics, it is ranked as one of the top three
most popular languages. It was initially designed by Guido van Rossum and first released in 1991, and its
development is now overseen by the Python Software Foundation. It was designed with an emphasis on code
readability, and its syntax allows programmers to express concepts in fewer lines of code.

Python is a programming language that lets you work quickly and integrate systems more efficiently.
There are two major Python versions, Python 2 and Python 3; Python 2 reached end of life in 2020. Today,
Python is one of the most popular programming languages for data science and machine learning, and it has
replaced many languages in the industry. One of the reasons is its vast collection of libraries.


3.5 Python libraries used in Machine Learning

3.5.1 NumPy:

NumPy is a very popular Python library for processing large multi-dimensional arrays and matrices, with
the help of a large collection of high-level mathematical functions. It is very useful for fundamental
scientific computations in Machine Learning, particularly for linear algebra, Fourier
transforms, and random number generation. Higher-level libraries such as TensorFlow interoperate with
NumPy arrays when manipulating tensors.
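
As a small illustration of the array, linear algebra, Fourier transform and random number features mentioned above, the following sketch uses made-up values (they are not data from the internship project).

```python
import numpy as np

# A 2-D array (matrix) and a 1-D array (vector); the values are illustrative only.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Linear algebra: solve the system A @ x = b for x.
x = np.linalg.solve(A, b)

# Fourier transform of a short constant signal: all energy lands in bin 0.
spectrum = np.fft.fft(np.ones(4))

# Random numbers from a seeded generator, so the run is repeatable.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

print(x)               # solution vector
print(spectrum.real)   # real part of the spectrum
print(samples.mean())  # close to 0 for a standard normal sample
```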

3.5.2 Scikit-learn:

Scikit-learn is one of the most popular ML libraries for classical ML algorithms. It is built on top of
two basic Python libraries, viz., NumPy and SciPy. Scikit-learn supports most of the supervised
and unsupervised learning algorithms. Scikit-learn can also be used for data mining and data
analysis, which makes it a great tool for anyone starting out with ML.

3.5.3 TensorFlow:

TensorFlow is a very popular open-source library for high-performance numerical computation
developed by the Google Brain team at Google. As the name suggests, TensorFlow is a framework
for defining and running computations involving tensors. It can train and run deep neural
networks that can be used to develop several AI applications. TensorFlow is widely used in the field
of deep learning research and application.

3.5.4 Pandas:

Pandas is a popular Python library for data analysis. It is not directly related to Machine Learning,
but a dataset must be prepared before training, and Pandas comes in handy here because
it was developed specifically for data extraction and preparation. It provides high-level data
structures and a wide variety of tools for data analysis, including many built-in methods for grouping,
combining and filtering data.
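
The grouping, combining and filtering methods mentioned above can be sketched as follows. The table contents and column names (patient_id, sex, age, chol, target) are invented for illustration and are not taken from the project dataset.

```python
import pandas as pd

# Hypothetical patient records; all values are made up.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "sex": ["M", "F", "M", "F", "M", "F"],
    "age": [63, 45, 58, 39, 71, 50],
    "chol": [233, 204, 284, 199, 265, 240],
})
labels = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "target": [1, 0, 1, 0, 1, 0],
})

# Filtering: keep patients older than 50.
over_50 = patients[patients["age"] > 50]

# Grouping: mean cholesterol per sex.
mean_chol = patients.groupby("sex")["chol"].mean()

# Combining: join the labels onto the patient table by patient_id.
merged = patients.merge(labels, on="patient_id")

print(over_50)
print(mean_chol)
print(merged.head())
```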


3.5.5 Matplotlib

Matplotlib is a very popular Python library for data visualization. Like Pandas, it is not directly
related to Machine Learning. It comes in particularly handy when a programmer wants to visualize
the patterns in the data. It is a 2D plotting library used for creating 2D graphs and plots. A module
named pyplot makes plotting easy for programmers, as it provides features to control line
styles, font properties, axis formatting, etc. It provides various kinds of graphs and plots for data
visualization, viz., histograms, error charts, bar charts, etc.

3.6 Types of Machine Learning Algorithms:

Figure 3.6.1 shows Types of Machine Learning Algorithms

3.6.1 Supervised Machine Learning Algorithms:


Supervised algorithms work with a target / outcome variable (or dependent variable) which is to be
predicted from a given set of predictors (independent variables). Using this set of variables, we generate a
function that maps inputs to desired outputs. The training process continues until the model achieves
a desired level of accuracy on the training data. Examples of Supervised Learning: Regression,
Decision Tree, Random Forest, KNN, Logistic Regression, etc.


3.6.2 Unsupervised Machine Learning Algorithms:


In this case, there is no target or outcome variable to predict or estimate. These algorithms are used for
clustering a population into different groups, which is widely applied to segmenting customers into
groups for specific interventions. Examples of Unsupervised Learning: K-means (clustering) and PCA
(dimensionality reduction).
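
A minimal sketch of the two unsupervised examples named above, run on synthetic points rather than real customer or patient data. The group centres, sizes and parameter values are assumptions chosen only to make the result easy to read.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(seed=0)
# Two synthetic, well-separated groups in 3 dimensions (stand-ins for customer segments).
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 3))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 3))
X = np.vstack([group_a, group_b])

# K-means assigns each point to one of k clusters; no labels are supplied.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# PCA projects the 3-D points onto the 2 directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(np.bincount(cluster_ids))        # cluster sizes
print(pca.explained_variance_ratio_)   # share of variance kept per component
```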

3.6.3 Reinforcement Machine Learning Algorithms:

Reinforcement Machine Learning Algorithms learn optimal actions through trial and error. This
means that the algorithm decides its next action based on its current
state, learning behaviours that will maximize the reward in the future.
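
Trial-and-error learning can be illustrated with a two-action "bandit" problem, a simplified case with no state. The sketch below uses an epsilon-greedy rule; the reward probabilities, the value of epsilon and the number of steps are all assumptions made for illustration.

```python
import random

random.seed(0)
true_reward_prob = [0.2, 0.8]  # hidden from the agent; invented for illustration
estimates = [0.0, 0.0]         # the agent's running estimate of each action's reward
counts = [0, 0]                # how often each action has been tried
epsilon = 0.1                  # fraction of steps spent exploring at random

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(2)              # explore: pick a random action
    else:
        action = estimates.index(max(estimates))  # exploit: pick the best estimate so far
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    # Incremental mean: move the estimate towards the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # estimated reward per action
print(counts)     # number of times each action was chosen
```

Over many steps the estimate for each action approaches its true reward probability, and the agent spends most of its time on the action with the higher estimate.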


CHAPTER – 4

PROJECT INFORMATION

4.1 Heart disease prediction:

Machine Learning is used across many spheres around the world, and the healthcare industry is no
exception. Machine Learning can play an essential role in predicting the presence or absence of heart
disease. Such information, if predicted well in advance, can provide important insights to doctors,
who can then adapt their diagnosis and treatment on a per-patient basis. In this project, I worked on
predicting heart disease in people using Machine Learning algorithms. The algorithms included
K-Neighbours Classifier, Logistic Regression, SVM, etc.

Step 1: Importing all the required Python libraries.

Figure 4.1.1: Importing the Python libraries.

Step 2: Reading the CSV dataset file. It contains the training and testing data and their attributes. Each
row represents one patient record, including the disease status, and each column represents an attribute
on which the disease prediction is based. In the class column, 1 represents ‘Patient has heart disease’
and 0 represents ‘Patient doesn’t have heart disease’.

Figure 4.1.2 Reading the CSV dataset file.
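
A minimal sketch of this step. Because the project file itself is not reproduced here, a few made-up rows are read from an in-memory string instead of a file on disk; the column names (age, sex, trestbps, chol, thalach, oldpeak, target) are assumptions, with target standing in for the class column.

```python
import io

import pandas as pd

# Stand-in for the CSV file on disk: a few invented rows with assumed column names.
csv_text = """age,sex,trestbps,chol,thalach,oldpeak,target
63,1,145,233,150,2.3,1
41,0,130,204,172,1.4,1
57,1,140,192,148,0.4,0
67,1,160,286,108,1.5,0
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())                    # first rows of the table
print(df["target"].value_counts())  # number of records per class
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the CSV file.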


Step 3: (a) Verify the data's shape and content before proceeding with further analysis or
processing.

Figure 4.1.3 Shape of the data

(b) Data Analysis: From the heatmap, it is easy to see that no single feature has a very
high correlation with the target value.

Figure 4.1.4 Heatmap for training values
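
The shape check and the correlations behind such a heatmap can be sketched as below. The data is synthetic and the column names are assumptions; the label is deliberately tied to one column only so that the correlations are non-trivial.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 200
# Synthetic stand-in data; in the project the DataFrame would come from the CSV file.
df = pd.DataFrame({
    "age": rng.integers(30, 80, size=n),
    "chol": rng.normal(240, 40, size=n),
    "thalach": rng.normal(150, 20, size=n),
})
# A made-up binary label loosely tied to thalach.
df["target"] = (df["thalach"] + rng.normal(0, 20, size=n) > 150).astype(int)

print(df.shape)           # (rows, columns)
print(df.isnull().sum())  # missing values per column

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
# Absolute correlation of each feature with the label, strongest first.
target_corr = corr["target"].drop("target").abs().sort_values(ascending=False)
print(target_corr)
```

The `corr` matrix is what a heatmap visualizes; it can be passed to a plotting function such as `seaborn.heatmap` (the plotting call used in the project is not shown in this report, so that choice is an assumption).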


(c) Plot a histogram of each feature to identify the shape of its distribution (e.g., normal, skewed,
bimodal), detect potential outliers, and understand the range of values. These observations inform
decisions about data preprocessing or feature engineering in the cardiovascular disease prediction model.

Figure 4.1.5 Histogram
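
A minimal sketch of per-feature histograms with a numeric skewness check. The two columns are synthetic stand-ins (one roughly symmetric, one right-skewed), and their names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "thalach": rng.normal(150, 20, size=500),    # roughly symmetric
    "oldpeak": rng.exponential(1.0, size=500),   # right-skewed
})

# One histogram per column; call plt.savefig(...) or plt.show() to view the figure.
df.hist(bins=20, figsize=(8, 3))
plt.tight_layout()

# Skewness near 0 suggests symmetry; a clearly positive value suggests a right tail.
skewness = df.skew()
print(skewness)

# Value range per feature.
print(df.agg(["min", "max"]))

plt.close("all")
```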

Step 4: Understand how exercise-induced ST depression relates to heart disease, explore the role
of the ST segment slope in this relationship, and identify potential diagnostic indicators for heart
disease.

Figure 4.1.6 ST Depression


Step 5: Understand the distribution of ST depression across different groups, identify potential
sex-based differences in ST depression patterns, and explore how ST depression relates to the presence
of heart disease.

Figure 4.1.7 ST Distribution among Thalach level vs Heart Disease

Step 6: Understand how maximum heart rate relates to the presence of heart disease, identify potential
sex-based differences in maximum heart rate patterns, and explore the range and central tendencies of
maximum heart rate across different groups.


Figure 4.1.8 Checking whether the data types of the classes in the train and test sets are the same.

Step 7: Prepare the data for the machine learning algorithms. Splitting the data allows for model
evaluation on unseen data, while normalization helps many algorithms perform better by
putting all features on a similar scale.

Figure 4.1.9. Train Test Split
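
A minimal sketch of the split and normalization. The data is a synthetic stand-in generated with 13 features; the 80/20 split ratio and the use of `StandardScaler` are assumptions, since the report does not state which settings or scaler were used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart disease table (13 features, binary label).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)

# Hold out 20% of the rows for testing; stratify keeps the class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both parts,
# so that no information from the test rows leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
print(X_train_scaled.mean(axis=0).round(6))  # approximately 0 for every feature
```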

Step 8: The accuracy score provides a simple metric to assess how well the Logistic Regression
model performs on this classification task.

Figure 4.1.10 Logistic Regression
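
A minimal sketch of this step on synthetic stand-in data. The variable names Y_pred_lr and score_lr are hypothetical (chosen to mirror Y_pred_svm and Y_test mentioned in Step 9), and `max_iter=1000` is an assumed setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease table.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=0)

lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, Y_train)
Y_pred_lr = lr.predict(X_test)

# Accuracy = correctly classified test samples / all test samples.
score_lr = accuracy_score(Y_test, Y_pred_lr)
manual_accuracy = np.mean(Y_pred_lr == Y_test)  # the same quantity computed by hand

print(f"Logistic Regression accuracy: {score_lr:.2%}")
```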

Step 9: Calculate the accuracy of the linear SVM model by comparing its predictions (Y_pred_svm) to
the actual test labels (Y_test), then print the accuracy score, which shows how well the model
performed on the test data.

Figure 4.1.11 Support Vector Machine

Step 10: Implement another classification algorithm (KNN) alongside previously used
methods. Evaluate its performance on the same dataset. Allow comparison of KNN's accuracy
with other algorithms. Provide insight into whether a neighborhood-based approach works
well for this data.


Figure 4.1.13 K-Nearest Neighbour
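
A minimal sketch of the KNN step on synthetic stand-in data. The candidate values of k and the scaling step inside the pipeline are assumptions; the report does not state which number of neighbours was used.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart disease table.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# KNN compares distances between samples, so features are standardized first.
scores = {}
for k in (3, 5, 7, 9):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn.fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, knn.predict(X_test))

for k, acc in scores.items():
    print(f"k={k}: accuracy={acc:.2%}")
```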

Step 11: This code continues the process of testing different algorithms, but adds an extra step
of basic optimization. It's trying to find the best possible performance for the Decision Tree
before comparing it to other algorithms.

Figure 4.1.12 Decision Tree Classifier
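
One possible form of this basic optimization is sketched below on synthetic stand-in data. The report does not show which setting was tuned, so tuning the tree depth, and choosing it by cross-validation on the training rows, are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease table.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

best_depth, best_cv = None, -1.0
for depth in range(1, 11):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Mean 5-fold cross-validated accuracy, computed on the training rows only.
    cv_acc = cross_val_score(tree, X_train, y_train, cv=5).mean()
    if cv_acc > best_cv:
        best_depth, best_cv = depth, cv_acc

# Refit with the chosen depth and score once on the held-out test rows.
dt = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
score_dt = accuracy_score(y_test, dt.predict(X_test))

print(f"best max_depth={best_depth}, test accuracy={score_dt:.2%}")
```

Selecting the setting on the training rows keeps the test accuracy an honest estimate, which matters when the tuned tree is later compared with the other algorithms.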

Step 12: This code continues the process of testing different algorithms, adding another
powerful classifier to the comparison. The Random Forest often performs well on a variety of
datasets, so this step is likely aimed at seeing if it outperforms the previously tested
algorithms.


Figure 4.1.13 Random forest Classifier

Step 13: In total, 6 algorithms were used to measure testing accuracy. The algorithm with the highest
accuracy is Random Forest. We will be using this algorithm for predicting heart
disease.

Figure 4.1.14 Testing Accuracy
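
A comparison of this kind can be sketched as a loop over models. Only the five algorithms named in this chapter are included, since the sixth is not identified in the text; the data is synthetic and all hyperparameters are assumptions, so the ranking printed here says nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease table.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": SVC(kernel="linear"),
    "K-Nearest Neighbours": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Train every model on the same split and record its test accuracy.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from highest to lowest test accuracy.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {acc:.2%}")

best_model = max(results, key=results.get)
```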

Step 14: Taking the values of blood pressure, cholesterol, family history, body mass index (BMI)
and age from the user. Prediction of the presence of heart disease is done through the Logistic
Regression algorithm.


Figure 4.1.15 Prediction of Heart Disease


CHAPTER – 5

INTERNSHIP OUTCOMES
After completing the internship, I had learned the process of development through its various
cycles and become familiar with the latest technologies used in the industry.

I was able to understand the proper flow of code and the professional coding ethics a developer
must follow for the code to be widely accepted.

I was able to understand the implementation procedures for the machine learning algorithms.

I am capable of designing Python programs for various learning algorithms, and of identifying and
applying Machine Learning algorithms to solve real-world problems.

I learned about time management and project management, and also identified which
skills and knowledge I still need in order to work in a professional environment.


Conclusion
The machine learning approach to cardiovascular disease prediction demonstrates the potential for
using various algorithms to assist in medical diagnostics. Comparing multiple models
identified which approaches work best for this particular dataset and problem.

The use of accuracy as a metric provides a straightforward way to compare models, though in
medical applications, it's often crucial to also consider metrics like sensitivity and specificity.
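
The difference between these metrics can be seen on a small made-up example of ten patients (the labels below are invented, not project results).

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for ten patients: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # share of diseased patients that are detected
specificity = tn / (tn + fp)  # share of healthy patients correctly cleared

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Here one diseased patient is missed and two healthy patients are flagged, which a single accuracy figure does not distinguish.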

The feature set used (including blood pressure, cholesterol, family history, BMI, and age) aligns
with known risk factors for cardiovascular disease, lending clinical relevance to the model.

However, the mismatch between the number of features in the training data (13) and the prediction
input (6) highlights the importance of maintaining consistency between model training and
application.
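
One way to guard against such a mismatch is to check the length of the input against the number of features the model was trained on. The helper predict_one below is hypothetical, and the model and input values are stand-ins rather than the project's own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: a model trained on 13 features.
X, y = make_classification(n_samples=200, n_features=13, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)


def predict_one(model, values):
    """Predict the class for one record after checking its length (hypothetical helper)."""
    row = np.asarray(values, dtype=float).reshape(1, -1)
    expected = model.n_features_in_  # number of features seen during fit
    if row.shape[1] != expected:
        raise ValueError(f"expected {expected} feature values, got {row.shape[1]}")
    return int(model.predict(row)[0])


# A record with all 13 values is accepted.
ok = predict_one(model, X[0])

# A record with only 6 made-up values is rejected with a clear message.
try:
    predict_one(model, [120, 240, 1, 27.5, 54, 0])
    mismatch_caught = False
except ValueError as err:
    mismatch_caught = True
    print(err)
```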


