What Is Data Science - A Beginner's Guide To Data Science - Edureka
Hemant Sharma
As the world entered the era of big data, the need for its storage also grew. It was the main challenge and concern for the
enterprise industries until 2010. The main focus was on building frameworks and solutions to store data. Now that Hadoop
and other frameworks have successfully solved the problem of storage, the focus has shifted to the processing of this data. Data
Science is the secret sauce here. All the ideas which you see in Hollywood sci-fi movies can actually turn into reality through Data
Science. Data Science is the future of Artificial Intelligence. Therefore, it is very important to understand what Data Science is and
how it can add value to your business.
By the end of this blog, you will be able to understand what Data Science is and its role in extracting meaningful insights from the
complex and large sets of data all around us. To get in-depth knowledge on Data Science, you can enroll for live Data Science
Certification Training by Edureka with 24/7 support and lifetime access.
As you can see from the above image, a Data Analyst usually explains what is going on by processing history of the data. On the
other hand, Data Scientist not only does the exploratory analysis to discover insights from it, but also uses various advanced
machine learning algorithms to identify the occurrence of a particular event in the future. A Data Scientist will look at the data …
https://www.edureka.co/blog/what-is-data-science/ 1/14
6/18/2021 What Is Data Science? A Beginner's Guide To Data Science | Edureka
So, Data Science is primarily used to make decisions and predictions making use of predictive causal analytics, prescriptive
analytics (predictive plus decision science) and machine learning.
Predictive causal analytics – If you want a model that can predict the possibilities of a particular event in the future, you
need to apply predictive causal analytics. Say, if you are providing money on credit, then the probability of customers
making future credit payments on time is a matter of concern for you. Here, you can build a model that can perform
predictive analytics on the payment history of the customer to predict if the future payments will be on time or not.
Prescriptive analytics: If you want a model that has the intelligence of taking its own decisions and the ability to modify it
with dynamic parameters, you certainly need prescriptive analytics for it. This relatively new field is all about providing
advice. In other terms, it not only predicts but suggests a range of prescribed actions and associated outcomes.
The best example for this is Google’s self-driving car which I had discussed earlier too. The data gathered by vehicles can be
used to train self-driving cars. You can run algorithms on this data to bring intelligence to it. This will enable your car to take
decisions like when to turn, which path to take, when to slow down or speed up.
Machine learning for making predictions — If you have transactional data of a finance company and need to build a
model to determine the future trend, then machine learning algorithms are the best bet. This falls under the paradigm of
supervised learning. It is called supervised because you already have the data based on which you can train your machines.
For example, a fraud detection model can be trained using a historical record of fraudulent purchases.
Machine learning for pattern discovery — If you don’t have the parameters based on which you can make predictions,
then you need to find out the hidden patterns within the dataset to be able to make meaningful predictions. This is nothing
but the unsupervised model as you don’t have any predefined labels for grouping. The most common algorithm used for
pattern discovery is Clustering.
Let’s say you are working in a telephone company and you need to establish a network by putting towers in a region. Then,
you can use the clustering technique to find those tower locations which will ensure that all the users receive optimum
signal strength.
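To make the tower example a little more concrete, here is a minimal sketch of k-means clustering in plain Python. The user coordinates, the `kmeans` helper and its parameters are assumptions made up for illustration, and a real network-planning problem would involve far more than straight-line distance.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments

# Hypothetical user coordinates (in km) forming two neighbourhoods.
users = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
towers, groups = kmeans(users, k=2)
print(sorted(towers))
```

Each centroid is a candidate tower location, and no labels were needed to find it, which is what makes this an unsupervised approach.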
Let’s see how the proportion of the above-described approaches differs for Data Analysis and Data Science. As you can see in
the image below, Data Analysis includes descriptive analytics and prediction to a certain extent. On the other hand, Data Science
is more about Predictive Causal Analytics and Machine Learning.
Now that you know what exactly Data Science is, let’s find out why it was needed in the first place.
This data is generated from different sources like financial logs, text files, multimedia forms, sensors, and instruments.
Simple BI tools are not capable of processing this huge volume and variety of data. This is why we need more complex and
advanced analytical tools and algorithms for processing, analyzing and drawing meaningful insights out of it.
This is not the only reason why Data Science has become so popular. Let’s dig deeper and see how Data Science is being used in
various domains.
What if you could understand the precise requirements of your customers from existing data like the customer’s
past browsing history, purchase history, age and income? No doubt you had all this data earlier too, but now with the vast
amount and variety of data, you can train models more effectively and recommend products to your customers with
more precision. Wouldn’t it be amazing, as it will bring more business to your organization?
Let’s take a different scenario to understand the role of Data Science in decision making. What if your car had the
intelligence to drive you home? Self-driving cars collect live data from sensors, including radars, cameras, and lasers, to
create a map of their surroundings. Based on this data, the car takes decisions like when to speed up, when to slow down, when
to overtake and where to take a turn, making use of advanced machine learning algorithms.
Let’s see how Data Science can be used in predictive analytics. Let’s take weather forecasting as an example. Data from
ships, aircraft, radars, satellites can be collected and analyzed to build models. These models will not only forecast the
weather but also help in predicting the occurrence of any natural calamities. It will help you to take appropriate measures
beforehand and save many precious lives.
Let’s have a look at the below infographic to see all the domains where Data Science is creating its impression.
To know more about a Data Scientist you can refer to this article on Who is a Data Scientist?
Moving further, let’s now discuss BI. I am sure you might have heard of Business Intelligence (BI) too. Often Data Science is
confused with BI. I will state some concise and clear contrasts between the two which will help you in getting a better
understanding. Let’s have a look.
Data Science is a more forward-looking approach, an exploratory way with the focus on analyzing the past or current data
and predicting the future outcomes with the aim of making informed decisions. It answers the open-ended questions as to
“what” and “how” events occur.
Data Sources
Business Intelligence: Structured (usually SQL, often Data Warehouse)
Data Science: Both structured and unstructured (logs, cloud data, SQL, NoSQL, text)
That was all about what Data Science is. Now let’s understand the lifecycle of Data Science.
A common mistake made in Data Science projects is rushing into data collection and analysis, without understanding the
requirements or even framing the business problem properly. Therefore, it is very important for you to follow all the phases
throughout the lifecycle of Data Science to ensure the smooth functioning of the project.
Phase 1—Discovery: Before you begin the project, it is important to understand the various specifications,
requirements, priorities and required budget. You must possess the ability to ask the right questions. Here, you
assess if you have the required resources present in terms of people, technology, time and data to support the
project. In this phase, you also need to frame the business problem and formulate initial hypotheses (IH) to
test.
Phase 2—Data preparation: In this phase, you require an analytical sandbox in which you can perform analytics
for the entire duration of the project. You need to explore, preprocess and condition data prior to modeling.
Further, you will perform ETLT (extract, transform, load and transform) to get data into the sandbox. Let’s have
a look at the Statistical Analysis flow below.
You can use R for data cleaning, transformation, and visualization. This will help you to spot the outliers and establish a
relationship between the variables. Once you have cleaned and prepared the data, it’s time to do exploratory analytics on it. Let’s
see how you can achieve that.
Phase 3—Model planning: Here, you will determine the methods and techniques to draw the relationships
between variables. These relationships will set the base for the algorithms which you will implement in the next
phase. You will apply Exploratory Data Analytics (EDA) using various statistical formulas and visualization tools.
1. R has a complete set of modeling capabilities and provides a good environment for building interpretive models.
2. SQL Analysis services can perform in-database analytics using common data mining functions and basic predictive
models.
3. SAS/ACCESS can be used to access data from Hadoop and is used for creating repeatable and reusable model flow
diagrams.
Although many tools are available in the market, R is the most commonly used.
Now that you have got insights into the nature of your data and have decided on the algorithms to be used, in the next stage you
will apply the algorithm and build a model.
Phase 4—Model building: In this phase, you will develop datasets for training and testing purposes. Here you
need to consider whether your existing tools will suffice for running the models or it will need a more robust
environment (like fast and parallel processing). You will analyze various learning techniques like classification,
association and clustering to build the model.
Phase 5—Operationalize: In this phase, you deliver final reports, briefings, code and technical documents. In
addition, sometimes a pilot project is also implemented in a real-time production environment. This will provide
you a clear picture of the performance and other related constraints on a small scale before full deployment.
Phase 6—Communicate results: Now it is important to evaluate if you have been able to achieve your goal
that you had planned in the first phase. So, in the last phase, you identify all the key findings, communicate to
the stakeholders and determine if the results of the project are a success or a failure based on the criteria
developed in Phase 1.
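Phase 4 above talks about developing datasets for training and testing. As a rough illustration, the sketch below shuffles a list of records and holds some of them back for testing. The `train_test_split` helper, the 25% test fraction and the placeholder records are assumptions chosen for this example, not something prescribed by the lifecycle.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split off a held-out test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(20))  # stand-ins for 20 hypothetical records
train, test = train_test_split(records)
print(len(train), len(test))
```

The model is built only on `train`, so `test` can later show how it behaves on data it has never seen.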
Now, I will take a case study to explain the various phases described above.
In this use case, we will predict the occurrence of diabetes making use of the entire lifecycle that we discussed earlier. Let’s go
through the various steps.
Step 1:
First, we will collect the data based on the medical history of the patient as discussed in Phase 1. You can refer to the
sample data below.
Attributes:
Step 2:
Now, once we have the data, we need to clean and prepare the data for data analysis.
This data has a lot of inconsistencies like missing values, blank columns, abrupt values and incorrect data format which
need to be cleaned.
Here, we have organized the data into a single table under different attributes – making it look more structured.
Let’s have a look at the sample data below.
1. In the column npreg, “one” is written in words, whereas it should be in the numeric form like 1.
2. In column bp one of the values is 6600, which is impossible (at least for humans) as bp cannot go up to such a huge value.
3. As you can see, the Income column is blank and also makes no sense in predicting diabetes. Therefore, it is redundant to
have it here and it should be removed from the table.
So, we will clean and preprocess this data by removing the outliers, filling up the null values and normalizing the data type.
If you remember, this is our second phase which is data preprocessing.
Finally, we get the clean data as shown below which can be used for analysis.
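The three fixes described in this step can be sketched in a few lines of Python. The rows below are hypothetical values that only mimic the issues listed above, and the plausible bp range and the choice of the median as the fill value are assumptions for illustration.

```python
from statistics import median

# Hypothetical rows that mimic the three issues listed above.
raw = [
    {"npreg": "6", "bp": "72", "bmi": "33.6", "income": ""},
    {"npreg": "one", "bp": "6600", "bmi": "26.6", "income": ""},
    {"npreg": "8", "bp": "64", "bmi": "", "income": ""},
    {"npreg": "1", "bp": "66", "bmi": "28.1", "income": ""},
]

WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3}  # extend as needed
BP_RANGE = (30, 200)  # assumed plausible range, not a clinical standard

def to_number(text):
    """Return a float for digits or number words, None for blanks."""
    text = text.strip().lower()
    if not text:
        return None
    if text in WORDS:
        return float(WORDS[text])
    return float(text)

def clean(rows):
    cleaned = []
    for row in rows:
        # Drop the blank income column and normalize every value to a number.
        rec = {k: to_number(v) for k, v in row.items() if k != "income"}
        if rec["bp"] is not None and not BP_RANGE[0] <= rec["bp"] <= BP_RANGE[1]:
            rec["bp"] = None  # treat an impossible reading as missing
        cleaned.append(rec)
    # Fill each missing value with the median of the observed values.
    for col in cleaned[0]:
        observed = [r[col] for r in cleaned if r[col] is not None]
        fill = median(observed)
        for r in cleaned:
            if r[col] is None:
                r[col] = fill
    return cleaned

for rec in clean(raw):
    print(rec)
```

Here the impossible bp reading is replaced rather than the whole row being dropped, which keeps the other values of that patient available for analysis.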
Step 3:
First, we will load the data into the analytical sandbox and apply various statistical functions on it. For example, R has
functions like describe which gives us the number of missing values and unique values. We can also use the summary
function which will give us statistical information like mean, median, range, min and max values.
Then, we use visualization techniques like histograms, line graphs, box plots to get a fair idea of the distribution of data.
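To give a feel for what such summary functions report, here is a small Python stand-in. It is not R’s describe or summary, just a sketch of the same idea; the `summarize` and `text_histogram` helpers and the glucose readings are made up for illustration.

```python
from statistics import mean, median

def summarize(values):
    """Count missing and unique values, then basic statistics on the rest."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "mean": mean(present),
        "median": median(present),
    }

def text_histogram(values, bin_width):
    """Return one 'start-end | ###' line per non-empty bin."""
    counts = {}
    for v in values:
        if v is not None:
            start = int(v // bin_width) * bin_width
            counts[start] = counts.get(start, 0) + 1
    return [f"{s}-{s + bin_width - 1} | {'#' * counts[s]}" for s in sorted(counts)]

glucose = [148, 85, 183, None, 89, 137, 116, 85]  # hypothetical readings
print(summarize(glucose))
print("\n".join(text_histogram(glucose, 50)))
```

Even a crude text histogram like this one gives a first idea of how the values are distributed before you move on to proper plots.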
Step 4:
Now, based on the insights derived from the previous step, the best fit for this kind of problem is the decision tree. Let’s see how.
Since we already have the major attributes for analysis like npreg, bmi, etc., we will use a supervised learning technique to
build a model here.
Further, we have particularly used a decision tree because it takes all attributes into consideration in one go, like the ones
which have a linear relationship as well as those which have a non-linear relationship. In our case, we have a linear
relationship between npreg and age, whereas there is a non-linear relationship between npreg and ped.
Decision tree models are also very robust as we can use the different combination of attributes to make various trees and
then finally implement the one with the maximum efficiency.
Here, the most important parameter is the level of glucose, so it is our root node. Now, the current node and its value
determine the next important parameter to be taken. It goes on until we get the result in terms of pos or neg. Pos means the
tendency of having diabetes is positive and neg means the tendency of having diabetes is negative.
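One common way to decide which parameter becomes the root node is to pick the split with the lowest Gini impurity. The sketch below builds a tiny tree that way in plain Python. The patient rows, the column names and the depth limit are assumptions for illustration, and this is not necessarily the exact method used to produce the tree shown in this blog.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, features):
    """Return (impurity, feature, threshold) of the lowest-impurity split."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[1:]:  # candidate thresholds
            left = [r["diabetes"] for r in rows if r[f] < t]
            right = [r["diabetes"] for r in rows if r[f] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(rows, features, depth=2):
    labels = [r["diabetes"] for r in rows]
    split = best_split(rows, features) if depth > 0 and len(set(labels)) > 1 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, f, t = split
    return {
        "feature": f,
        "threshold": t,
        "low": build([r for r in rows if r[f] < t], features, depth - 1),
        "high": build([r for r in rows if r[f] >= t], features, depth - 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its pos/neg label."""
    while isinstance(tree, dict):
        tree = tree["low"] if row[tree["feature"]] < tree["threshold"] else tree["high"]
    return tree

# Hypothetical cleaned records; only glucose separates the classes perfectly.
rows = [
    {"glucose": 85, "bmi": 26.6, "age": 31, "diabetes": "neg"},
    {"glucose": 89, "bmi": 28.1, "age": 21, "diabetes": "neg"},
    {"glucose": 116, "bmi": 25.6, "age": 45, "diabetes": "neg"},
    {"glucose": 110, "bmi": 37.6, "age": 30, "diabetes": "neg"},
    {"glucose": 148, "bmi": 33.6, "age": 50, "diabetes": "pos"},
    {"glucose": 183, "bmi": 23.3, "age": 32, "diabetes": "pos"},
    {"glucose": 137, "bmi": 43.1, "age": 29, "diabetes": "pos"},
    {"glucose": 168, "bmi": 38.0, "age": 34, "diabetes": "pos"},
]
tree = build(rows, ["glucose", "bmi", "age"])
print(tree)
print(predict(tree, {"glucose": 160, "bmi": 30.0, "age": 40}))
```

On this made-up data the glucose split leaves both branches pure, so glucose ends up as the root node and the tree stops after one level.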
If you want to learn more about the implementation of the decision tree, refer to this blog: How To Create A Perfect Decision Tree
Step 5:
In this phase, we will run a small pilot project to check if our results are appropriate. We will also look for performance
constraints if any. If the results are not accurate, then we need to replan and rebuild the model.
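A simple way to check whether the pilot results are appropriate is to compare the model’s output with the known outcomes. In the sketch below, the pilot outcomes, the `evaluate` helper and the 0.8 accuracy threshold are all assumptions for illustration; in a real project the threshold would come from the success criteria agreed in Phase 1.

```python
def evaluate(actual, predicted, positive="pos"):
    """Return accuracy plus counts of each kind of hit and miss."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {"accuracy": (tp + tn) / len(pairs), "tp": tp, "tn": tn, "fp": fp, "fn": fn}

# Hypothetical pilot: true outcomes vs. model output for eight patients.
actual = ["pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg"]
predicted = ["pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg"]
report = evaluate(actual, predicted)
print(report)

ACCEPTABLE_ACCURACY = 0.8  # assumed success criterion from Phase 1
print("replan and rebuild" if report["accuracy"] < ACCEPTABLE_ACCURACY else "proceed")
```

Looking at false negatives separately matters in a case like this one, because a missed diabetes case is usually a costlier mistake than a false alarm.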
Step 6:
Once we have executed the project successfully, we will share the output for full deployment.
Being a Data Scientist is easier said than done. So, let’s see what all you need to be a Data Scientist. A Data Scientist requires
skills basically from three major areas as shown below.
As you can see in the above image, you need to acquire various hard skills and soft skills. You need to be good at statistics and
mathematics to analyze and visualize data. Needless to say, Machine Learning forms the heart of Data Science and requires you
to be good at it. Also, you need to have a solid understanding of the domain you are working in to understand the business
problems clearly. Your task does not end here. You should be capable of implementing various algorithms which require good
coding skills. Finally, once you have made certain key decisions, it is important for you to deliver them to the stakeholders. So,
good communication will definitely add brownie points to your skills.
I urge you to see this Data Science video tutorial that explains what is Data Science and all that we have discussed in the blog. Go
ahead, enjoy the video and tell me what you think.
What Is Data Science? Data Science Course – Data Science Tutorial For Beginners | Edureka
This Edureka Data Science course video will take you through the need of data science, what is data science, data science use
cases for business, BI vs data science, data analytics tools, data science lifecycle along with a demo.
In the end, it won’t be wrong to say that the future belongs to Data Scientists. It is predicted that by the end of the year 2018,
there will be a need for around one million Data Scientists. More and more data will provide opportunities to drive key business
decisions. It is soon going to change the way we look at the world deluged with data around us. Therefore, a Data Scientist
should be highly skilled and motivated to solve the most complex problems.
I hope you enjoyed reading my blog and understood what Data Science is. Check out our Data Science certification
training here, which comes with instructor-led live training and real-life project experience.
Comments 24 Comments
It is really a nice and informative blog and the content is really precise. I liked your views on it. I will subscribe to it. I am looking forward for more
such kind of blogs as they are really mesmerizing. Thanks for such an interesting and wonderful blog.The list of Digital Marketing Blogs you shared
with us.
Reply
Great tips, I learned many things from your post It is very good for everyone. We want your more post because you are making people
knowledgeable Which is very important to success. And we know now days digital marketing is getting more success because it is very good work It
has more profit than other things. Thank you so much for sharing this article with us. Please keep us update.
Reply
Hi my name is anirban and I am currently working in a small finance Bank in Bangalore in risk department… So I want to know how data science
will help me to advance my career in banking risk profile.
Reply
atif says:
Great Post. Good to learn the difference between Data Science and Business Intelligence.
Reply
EdurekaSupport says:
Hey Atif, we are really glad you loved our content. Do check out our other blogs too. Cheers!
Reply
sumanta says:
What will more career growth between Data Science and Test Automation.
Reply
Romesh says:
Hi,
Suresh says:
Hi,
I have worked as Tech Lead in Microsoft Technologies (ASP.NET & SQL Server) and I am very strong in SQL.
I want to change my career path into Data Science, Let me know which course is suitable for me and how its career chances in future,
Reply
hi
i want to know the scope of Data Science in the field of Library and Information Science in India.
Reply
EdurekaSupport says:
May 7, 2018 at 4:54 am GMT
Hey Aasha, thank you for reading our blog. We hope you found it useful. Implementation and usage of Data Science is wide. With innovation
and changing techniques leading the way, it can help you know a lot more about the reading habits of your customer. This can be leveraged
in organizing and managing your books better. The scope of data science is huge; there are many other ways in which data science can leave a
lasting impact on Information Science in India. Hope this helps. Cheers :)
Reply
Hi,
In my past experience I have worked as Technical Lead for SSIS based project, it was very interesting period in my carrier. I’m very strong in SQL.
I’m currently working as Project Manager for a Digital Commerce project. Over the days i have started feeling bored about my job. I’m looking to
change my domain to Data Science . Would you advise the same and the next steps please.
Thanks,
Jay
Reply
EdurekaSupport says:
Feb 28, 2018 at 2:16 pm GMT
Hey Jayprakash,
We apologize for the delayed response. Yes, you can definitely think about taking up Data Science as a career option. However, to help you
move into Data Science at this stage in your career, you will need to clear some certifications that will help authenticate your knowledge and
expertise in this field. It is also the best way to show some credibility in front of potential employers. We provide complete live online
instructor led sessions for our Data Science Certification training. You can check it out here:
https://www.edureka.co/data-science-r-programming-certification-course
Reply
Ashima says:
Jan 30, 2017 at 3:17 am GMT
Hi ,
I am currently working as Tableau developer. I have data visualization background with javascript. I am trying to find out best career path for me in
big data or business intelligence path. I have strong SQL background as well.
What courses should I do. I am torn between choosing traditional business intelligence or datascience or Big data.
Which should be the best career choice for me, I am still more interested in Visulization.
Thanks
Reply
EdurekaSupport says:
Feb 2, 2017 at 1:43 pm GMT
Hey Ashima, thanks for checking out our blog. Looking at your work experience and knowledge, we suggest that you take up our Data
Science Course. In this course, you will get to learn R Programming in Data Science and use it for visualization.
Our Data Science course also includes the complete Data Life cycle covering Data Architecture, Statistics, Advanced Data Analytics & Machine
Learning. You will learn Machine Learning Algorithms such as K-Means Clustering, Decision Trees, Random Forest and Naive Bayes. You will
need some knowledge of Statistics & Mathematics to take up this course. When you sign up for this course, we provide you with
complimentary self-paced courses covering essentials of Hadoop, R, Statistics and Machine Learning to brush up the fundamentals
required for the course. You can check out the course curriculum and sample class recording here:
https://www.edureka.co/data-science-r-programming-certification-course . Hope this helps. Cheers!
Reply
© 2021 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved. Terms & Conditions
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of
MongoDB, Inc.