Dsbda Unit 1
Prof.K.B.Sadafale
Assistant Professor
• Teaching Scheme :-
• Examination Scheme:-
• Discrete Mathematics
• CO1: Analyze needs and challenges for Data Science and Big Data
Analytics
• CO2: Apply statistics for Big Data Analytics
• CO3: Apply the lifecycle of Big Data analytics to real world
problems
• CO4: Implement Big Data Analytics using Python
programming
• CO5: Implement data visualization using visualization tools in
Python programming
• CO6: Design and implement Big Databases using the Hadoop
ecosystem
Syllabus
• Unit I: Introduction to Data Science and Big Data
• All these decision factors act as input data, and analyzing
them gives an appropriate answer. This analysis of data is
called data analysis, which is a part of data science.
Example 2
• Companies manufactured two types of candy
4) Transport:
•Transport industries are also using data science technology to
create self-driving cars.
•Self-driving cars are expected to help reduce the number
of road accidents.
5) Healthcare:
•In the healthcare sector, data science is providing lots of benefits.
•Data science is being used for tumor detection, drug discovery,
medical image analysis, virtual medical bots, etc.
6) Recommendation systems:
•Most companies, such as Amazon, Netflix, and Google Play, use
data science technology to create a better user experience
with personalized recommendations.
•For example, when you search for something on Amazon and start
getting suggestions for similar products, this is because of data
science technology.
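The "similar products" suggestion described above can be sketched as a count of which items are bought together. This is only a minimal illustration and not how Amazon or any other company actually builds recommendations; the orders data and the recommend helper are hypothetical.

```python
from collections import Counter

# Hypothetical purchase history: each inner list is one customer's basket.
orders = [
    ["phone", "phone case", "charger"],
    ["phone", "phone case"],
    ["laptop", "mouse"],
    ["phone", "charger"],
    ["laptop", "laptop bag", "mouse"],
]

def recommend(product, orders, top_n=2):
    """Return up to top_n items most often bought together with product."""
    counts = Counter()
    for basket in orders:
        if product in basket:
            # Count every other item that shares a basket with the product.
            counts.update(item for item in basket if item != product)
    return [item for item, _ in counts.most_common(top_n)]

print(recommend("phone", orders))
print(recommend("laptop", orders))
```

Real recommendation systems typically use far more data and richer models; the co-occurrence count is just the simplest way to show the idea.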
7) Risk detection:
•Finance industries have always had issues with fraud and risk of
losses, but with the help of data science, these can be reduced.
•Most finance companies are looking for data scientists to
avoid risk and losses while increasing customer
satisfaction.
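One simple way to illustrate risk detection is to flag transactions that lie far from an account's usual amounts. The sketch below uses a z-score rule with an assumed threshold of 2; the amounts and the flag_outliers helper are hypothetical, and real fraud detection relies on much richer models.

```python
from statistics import mean, stdev

# Hypothetical transaction amounts for one account; the last is unusually large.
amounts = [42.0, 38.5, 51.0, 47.2, 40.1, 44.8, 39.9, 950.0]

def flag_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

suspicious = flag_outliers(amounts)
print(suspicious)
```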
Big Data Overview
• Data is created constantly, and at an ever-increasing rate.
• Mobile phones, social media, imaging technologies used to
determine a medical diagnosis, and many other sources all create
new data that must be stored somewhere for some purpose.
• Devices and sensors automatically generate diagnostic
information that needs to be stored and processed in real
time.
• Merely keeping up with this huge influx of data is difficult, but
substantially more challenging is analyzing vast amounts of it,
especially when it does not conform to traditional notions of
data structure, to identify meaningful patterns and extract
useful information.
• These challenges of the data deluge present the opportunity
to transform business, government, science, and everyday
life.
• Three attributes stand out as defining Big Data
characteristics:
• Huge volume of data: Rather than thousands or millions of
rows, Big Data can be billions of rows and millions of columns.
• Complexity of data types and structures: Big Data reflects
the variety of new data sources, formats, and structures,
including digital traces being left on the web and other digital
repositories for subsequent analysis.
• Speed of new data creation and growth: Big Data can describe
high velocity data, with rapid data ingestion and near real
time analysis.
What is Big Data?
Big Data Definition
• No single standard definition…
-- IBM
“Big data is the data characterized by 4 key attributes: volume,
variety, velocity and value.”
-- Oracle
• Big Data is a collection of large datasets that cannot be
processed using traditional computing techniques.
• It is not a single technique or a tool, rather it involves many
areas of business and technology.
• Data that is very large in size is called Big Data.
• Normally we work with data measured in megabytes (Word
documents, Excel sheets) or at most gigabytes (movies, code),
whereas data measured in petabytes is called Big Data.
• Big Data refers to large amounts of data that arrive from
various data sources in different formats.
• Capturing data
• Storage
• Searching
• Sharing
• Transfer
• Analysis
• Presentation
Types of Big data
• The three different formats of big data are:
• Healthcare
• Big data has started making a massive difference in the
healthcare sector, with predictive analytics supporting
medical professionals and healthcare personnel.
• It can also enable personalized healthcare for individual patients.
Telecommunication and media
• Telecommunications and the multimedia sector are the main
users of Big Data.
• Enormous volumes of data, on the order of zettabytes, are
generated, and handling data at this scale requires big data
technologies.
E-commerce
• E-commerce is also an application of Big data.
• It helps maintain relationships with customers, which is
essential for the e-commerce industry.
• E-commerce websites use Big Data to market to retail
customers, manage transactions, and implement better, more
innovative strategies to improve their business.
• Amazon: Amazon is a very large e-commerce website
dealing with lots of traffic daily. When there is a
pre-announced sale on Amazon, traffic increases so rapidly that
it may crash the website.
• So, to handle this type of traffic and data, it uses Big Data.
• Big Data helps in organizing and analyzing the data for further use.
Social Media
• Social Media is the largest data generator.
• Statistics have shown that around 500+ terabytes of fresh
data are generated from social media daily, particularly on
Facebook.
• The data mainly contains videos, photos, message exchanges,
etc. A single activity on a social media site generates a lot of
data, which is stored and processed when required.
• Because the stored data runs into terabytes (TB), it takes a lot
of time to process. Big Data is a solution to this problem.
Big Data Characteristics
data efficiently.
• Data Science is mainly used for scientific purposes; Big Data is mainly used for business purposes and
customer satisfaction.
• Data Science broadly focuses on the science of the data; Big Data is more involved with the processes of
handling voluminous data.
What is Data Explosion
• The generation and storage of data in computer systems
on a very large scale is called data explosion.
• The world is now used to saving everything,
without exception, in electronic form.
• Processing power, RAM speeds and hard-disk sizes
have expanded to a level that has changed our
viewpoint towards data and its storage.
• Can you imagine having only 256 or 512 MB of
RAM in your PC now?
• If we understand the idea of a byte, we can
see how data growth has expanded over time and how
storage systems handle it.
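To get a feel for the scale from a single byte up to Big Data sizes, the following sketch formats byte counts in larger units. It assumes decimal (SI) units, where each unit is 1000 times the previous one; the human_size helper is hypothetical.

```python
# Decimal (SI) storage units; each step is 1000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_size(num_bytes):
    """Format a byte count using the largest unit that keeps the value >= 1."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit we know about.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000

print(human_size(512 * 10**6))   # 512 million bytes, like the RAM size above
print(human_size(10**15))        # one petabyte
```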
• Step 6: Storage
• The last step of the data processing cycle is storage, where
data and metadata are stored for further use.
• This allows for quick access and retrieval of information
whenever needed, and also allows it to be used as input in
the next data processing cycle directly.
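The storage step can be sketched as writing the processed data together with its metadata to a file and reading it back later as input for the next cycle. This is only an illustration: the file name, the job name and the sales figures are hypothetical, and real systems usually store data in databases or distributed file systems.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical output of the earlier processing steps.
processed = {"total_sales": 1250.0, "orders": 5}

# Store the data together with metadata describing it.
record = {
    "data": processed,
    "metadata": {
        "stored_at": datetime.now(timezone.utc).isoformat(),
        "source": "daily_sales_job",  # hypothetical job name
    },
}

path = os.path.join(tempfile.gettempdir(), "processed_sales.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(record, f)

# Later: retrieve the stored output and use it as input to the next cycle.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

average_order = loaded["data"]["total_sales"] / loaded["data"]["orders"]
print(average_order)
```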
Types of Data Processing
• Batch Processing: Data is collected and processed in batches. Used
for large amounts of data. Eg: payroll system
• Real-time Processing: Data is processed within seconds when the
input is given. Used for small amounts of data.
Eg: withdrawing money from ATM
• Online Processing: Data is automatically fed into the CPU as soon
as it becomes available. Used for continuous processing of data.
Eg: barcode scanning
• Multiprocessing: Data is broken down into frames and processed
using two or more CPUs within a single computer system. Also known
as parallel processing. Eg: weather forecasting
• Time-sharing: Allocates computer resources and data in time slots
to several users simultaneously.
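The difference between batch and real-time processing can be sketched with a small list of records. In this simplified illustration, the batch function waits until a group of records is collected before processing it, while the real-time style function produces a result as each record arrives. The records and both function names are hypothetical.

```python
# Hypothetical payroll-style records: (employee, hours worked).
records = [("asha", 8), ("ravi", 6), ("meera", 9), ("john", 7), ("li", 8)]

def process_batch(records, batch_size):
    """Batch processing: group records, then process each group at once."""
    totals = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        totals.append(sum(hours for _, hours in batch))
    return totals

def process_realtime(records):
    """Real-time style: update the result as soon as each record arrives."""
    running_total = 0
    for _, hours in records:
        running_total += hours
        yield running_total

print(process_batch(records, batch_size=2))
print(list(process_realtime(records)))
```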
Examples of Data Processing
• Data processing occurs in our daily lives whether or not we are aware of
it. Here are some real-life examples of data processing:
• Stock trading software converts millions of
stock data points into a simple graph
• An e-commerce company uses the search history of
customers to recommend similar products
• A digital marketing company uses demographic data
of people to strategize location-specific campaigns
• A self-driving car uses real-time data from sensors to
detect if there are pedestrians and other cars on the
road
Relationship between data science and information science
• Data science is the discovery of knowledge or
actionable information in data.
• Information science is design of practices for storing
and retrieving information.
• Data science and information science are distinct but
complementary disciplines.
• Data science is heavy on computer science and
mathematics.
• Information science is more concerned with areas
such as library science, cognitive science and
communications.
• Data science is used in business functions such as strategy
formation, decision making and operational processes.
• It touches on practices such as artificial intelligence, analytics,
predictive analytics and algorithm design.
• The discovery of knowledge and actionable information in
data.
• Data science is an interdisciplinary field about scientific
methods, processes, and systems to extract knowledge or
insights from data in various forms, either structured or
unstructured.
Business intelligence versus Data science
• Data Science:
• Data science is a field in which information and
knowledge are extracted from data using various
scientific methods, algorithms, and processes.
• It can be defined as a combination of mathematical tools,
algorithms, statistics, and machine learning techniques
used to find hidden patterns and insights in data that
help in the decision-making process.
• Data science deals with both structured and
unstructured data.
• It is related to both data mining and big data.
• Data science involves studying historic trends and
using the conclusions to redefine present trends and
predict future trends and technologies.
• Business Intelligence:
• Business intelligence (BI) is a set of technologies, applications, and
processes that are used by enterprises for business data analysis.
• It is used to convert raw data into meaningful information,
which is then used for business decision-making and
profitable actions.
• It deals with the analysis of structured and sometimes unstructured
data, which paves the way for new and profitable business
opportunities.
• It supports decision-making based on facts rather than
assumptions.
• It therefore has a direct impact on the business decisions of an
enterprise.
• Business intelligence tools improve an enterprise's chances of
entering a new market and help in studying the impact of
marketing efforts.
Factor: Data Science vs. Business Intelligence
• Data: Data science deals with both structured and unstructured
data. Business intelligence mainly deals only with structured data.
• Method: Data science makes use of the scientific method. Business
intelligence makes use of the analytic method.
• Expertise: Data science relies on the data scientist. Business
intelligence relies on the business user.
• Questions: Data science deals with the questions of what will
happen and what if. Business intelligence deals with the question
of what happened.
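The contrast between the two kinds of questions can be sketched on one small dataset. Summarizing past sales answers "what happened", while fitting a trend and extrapolating it answers "what will happen". The sales figures are hypothetical, and the straight-line least-squares fit is only one simple choice of predictive method.

```python
# Hypothetical monthly sales figures (units sold) for months 1..6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Business-intelligence style question: what happened?
total_sales = sum(sales)
best_month = months[sales.index(max(sales))]

# Data-science style question: what will happen?
# Fit a straight line by ordinary least squares and extrapolate one month.
n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x
forecast_month_7 = slope * 7 + intercept

print(total_sales, best_month)
print(round(forecast_month_7, 2))
```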
Business process reengineering (BPR)
• Some of the important uses of Business Intelligence are –
• Survey Method
• Observation Method
• Experimental method
• Survey Method
❖ Interview
❖ Telephone Interview
❖ Mail survey
• Observation Method
• Structured observation
• Unstructured observation
• Live observation
• Record observation
• Direct observation
• Indirect observation
• Human observation
• Mechanical observation
• Experimental Method