
Big Data Analytics (NEP Sem 2, 2023-24)


BIG DATA APPLICATIONS:

Big data has a wide range of applications across various industries and sectors.
Here are some common applications:

1. Healthcare: Big data analytics can be used to improve patient care, optimize
hospital operations, predict disease outbreaks, and personalize treatment plans.
Analyzing large datasets of patient records, medical images, and genomic data
can lead to insights that improve diagnoses and treatments.

2. Retail: Big data helps retailers understand customer behavior, preferences,
and trends. By analyzing purchase history, social media interactions, and
demographic data, retailers can personalize marketing campaigns, optimize
pricing strategies, and forecast demand more accurately.

3. Finance: Big data analytics is used in finance for fraud detection, risk
assessment, algorithmic trading, and customer segmentation. Analyzing large
volumes of financial transactions and market data enables financial institutions
to identify suspicious activities, assess credit risk, and tailor financial products
to customer needs.

4. Manufacturing: Big data analytics can optimize production processes,
improve quality control, and minimize downtime in manufacturing plants.
Analyzing sensor data from machines and equipment helps identify
inefficiencies and anticipate maintenance needs, leading to cost savings and
increased productivity.

5. Transportation and logistics: Big data is used in transportation and logistics
to optimize route planning, fleet management, and supply chain operations.
Analyzing GPS data, traffic patterns, and weather forecasts helps companies
streamline logistics operations, reduce fuel consumption, and improve delivery
efficiency.

6. Telecommunications: Big data analytics enables telecommunications
companies to improve network performance, enhance customer experience, and
reduce churn. Analyzing call records, network traffic, and customer feedback
helps identify network issues, predict service disruptions, and offer personalized
services to customers.

7. Energy: Big data analytics is used in the energy sector for predictive
maintenance of power plants and infrastructure, energy demand forecasting, and
optimizing energy distribution. Analyzing data from smart meters, sensors, and
weather forecasts helps utilities better manage energy resources and reduce
costs.

8. Government and public services: Big data is used by governments for urban
planning, public safety, and policy-making. Analyzing data from various
sources, such as census data, crime statistics, and social media, helps
governments identify areas for improvement, allocate resources effectively, and
respond to emergencies more efficiently.
These are just a few examples of how big data is being applied across different
industries to drive innovation, improve decision-making, and create value. As
technology advances and more data becomes available, the potential
applications of big data are likely to continue expanding.

WHAT IS ANALYTICS?
Analytics is a field of computer science that uses math, statistics, and
machine learning to find meaningful patterns in data.

DIFFERENCE BETWEEN DATA ANALYSIS AND DATA ANALYTICS:


● Data analysis is the process of collecting, manipulating, and examining
data to gain deep insight. Data analytics takes the analysed data and
works on it in a meaningful and useful way to make well-informed
business decisions.

● Data analysis helps businesses design a strong business plan, using
historical data that tells them what worked, what did not, and what was
expected from a product or service. Data analytics helps businesses
utilize the potential of past data and, in turn, identify new opportunities
that help them plan future strategies. It supports business growth by
reducing risks and costs and enabling the right decisions.

Classification of analytics
Descriptive analytics
Descriptive analytics is a statistical method used to search and summarize
historical data in order to identify patterns or meaning.

Data aggregation and data mining are two techniques used in descriptive
analytics to explore historical data. Data is first gathered and sorted through
data aggregation in order to make the datasets more manageable for analysts.

Descriptive analytics answers the question, “What happened?”. This type of
analytics is by far the most commonly used, providing reporting and analysis
centered on past events. It helps companies understand things such as:
● How much did we sell as a company?
● What was our overall productivity?
● How many customers churned in the last quarter?
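
As a concrete illustration, here is a minimal descriptive-analytics sketch in
Python using pandas. The file name ("sales.csv") and the columns (date,
region, revenue) are hypothetical placeholders, not part of the source material:

```python
import pandas as pd

# Hypothetical historical sales data: columns date, region, revenue.
sales = pd.read_csv("sales.csv", parse_dates=["date"])

# "How much did we sell as a company?" -- a simple aggregation.
total_revenue = sales["revenue"].sum()

# Summarize history by month and region to surface patterns.
monthly = (
    sales.groupby([sales["date"].dt.to_period("M"), "region"])["revenue"]
         .sum()
)

print(f"Total revenue: {total_revenue:,.2f}")
print(monthly.tail())
```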

Diagnostic Analytics
Diagnostic analytics, just like descriptive analytics, uses historical data to
answer a question. But instead of focusing on “the what”, diagnostic analytics
addresses the critical question of “why” an occurrence or anomaly occurred
within your data.
This type of analytics helps companies answer questions such as:
● Why did our company sales decrease in the previous quarter?
● Why are we seeing an increase in customer churn?
● Why is a specific basket of products vastly outperforming its prior-year
sales figures?
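
A minimal sketch of such a diagnostic drill-down with pandas, assuming a
hypothetical "churn.csv" with quarter, segment, and a 0/1 churned flag:

```python
import pandas as pd

# Hypothetical data: one row per customer per quarter, with a 0/1
# "churned" flag and a "segment" label.
df = pd.read_csv("churn.csv")

# Descriptive view ("what happened?"): churn rate per quarter.
print(df.groupby("quarter")["churned"].mean())

# Diagnostic drill-down ("why?"): break the change out by segment.
# Segments whose churn rate jumped point toward the likely cause.
by_segment = df.pivot_table(index="segment", columns="quarter",
                            values="churned", aggfunc="mean")
print(by_segment)
```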

Predictive Analytics
Predictive analytics is a form of advanced analytics that determines what is
likely to happen based on historical data using machine learning. Historical data
that comprises the bulk of descriptive and diagnostic analytics is used as the
basis of building predictive analytics models. Predictive analytics helps
companies address use cases such as:
● Predicting maintenance issues and part breakdown in machines.
● Determining credit risk and identifying potential fraud.
● Predicting and avoiding customer churn by identifying signs of customer
dissatisfaction.
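
A hedged sketch of the churn use case with scikit-learn: a model is fitted on
historical outcomes and then scores customers it has never seen. The data file
and feature names are assumptions for illustration:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical historical customer data with a known churn outcome.
df = pd.read_csv("customers.csv")
features = ["tenure_months", "monthly_spend", "support_tickets"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42
)

# Fit a model on the past, then score unseen customers.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

churn_probability = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, churn_probability))
```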

Prescriptive Analytics
Prescriptive analytics is the fourth and final pillar of modern analytics.
It pertains to true guided analytics, where the analytics prescribes or
guides you toward a specific action to take. It is effectively the merging
of descriptive, diagnostic, and predictive analytics to drive decision
making.

Prescriptive analytics helps address use cases such as:

● Automatic adjustment of product pricing based on anticipated customer
demand and external factors.
● Flagging select employees for additional training based on incident reports
in the field.

The primary aim of prescriptive analytics is to take the educated guess or
assessment out of data analytics and streamline the decision-making process.
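
A conceptual sketch of the pricing use case above: a simple rule that turns a
(predictive) demand forecast into a prescribed action. The thresholds and
multipliers are hypothetical:

```python
def recommend_price(base_price: float, forecast_demand: float,
                    capacity: float) -> float:
    """Prescribe a pricing action from a (predictive) demand forecast."""
    utilization = forecast_demand / capacity
    if utilization > 0.9:    # demand outstrips supply: raise the price
        return round(base_price * 1.10, 2)
    if utilization < 0.5:    # weak demand: discount to stimulate sales
        return round(base_price * 0.90, 2)
    return base_price        # otherwise hold the current price

# The analytics layer now prescribes an action, not just a prediction.
print(recommend_price(base_price=100.0, forecast_demand=950, capacity=1000))
```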

WHAT ARE THE BIG DATA ANALYTICS CHALLENGES?

1. Need For Synchronization Across Disparate Data Sources


As data sets become bigger and more diverse, incorporating them into a single
analytical platform is a big challenge. If this is overlooked, it will create
gaps and lead to misleading insights.

2. Acute Shortage Of Professionals Who Understand Big Data Analysis


Analysis is what makes the voluminous amount of data produced every minute
useful. With the exponential rise of data, a huge demand for big data scientists
and Big Data analysts has been created in the market. It is important for
business organizations to hire data scientists with varied skills, as the job of
a data scientist is multidisciplinary. There is a sharp shortage of data
scientists in comparison to the massive amount of data being produced.

3. Getting Meaningful Insights Through The Use Of Big Data Analytics


It is imperative for business organizations to gain important insights from Big
Data analytics, while ensuring that only the relevant departments have access
to this information. A big challenge companies face in Big Data analytics is
bridging this wide gap effectively.

4. Getting Voluminous Data Into The Big Data Platform


It is hardly surprising that data grows with every passing day, which means
business organizations need to handle large amounts of data on a daily basis.
The amount and variety of data available these days can overwhelm any data
engineer, which is why it is vital to make data accessibility easy and
convenient for brand owners and managers.

5. Uncertainty Of Data Management Landscape


With the rise of Big Data, new technologies and companies are being developed
every day. However, a big challenge in Big Data analytics is finding out which
technology is best suited to a company without introducing new problems and
potential risks.

6. Data Storage And Quality


Business organizations are growing at a rapid pace, and the amount of data they
produce grows with them. Storing this massive amount of data is becoming a
real challenge for everyone. Popular storage options such as data lakes and
data warehouses are commonly used to gather and store large quantities of
unstructured and structured data in its native format. The real problem arises
when a data lake or warehouse tries to combine unstructured and inconsistent
data from diverse sources: missing data, inconsistent data, logic conflicts,
and duplicate data all result in data quality challenges.

7. Security And Privacy Of Data


Once business enterprises discover how to use Big Data, it brings them a wide
range of possibilities and opportunities. However, Big Data also involves
potential risks when it comes to the privacy and security of the data. The Big
Data tools used for analysis and storage utilize data from disparate sources,
which eventually leads to a high risk of exposure and makes the data
vulnerable. Thus, the rise of voluminous amounts of data increases privacy and
security concerns.
REQUIREMENT FOR NEW ANALYTICAL ARCHITECTURE:

NEED FOR BIG DATA FRAMEWORKS:

Implementation of Big Data infrastructure and technology can
be seen in various industries like banking, retail, insurance,
healthcare, media, etc. Big Data management functions like
storage, sorting, processing, and analysis for such colossal
volumes cannot be handled by existing database systems or
technologies. Frameworks come into the picture in such
scenarios. Frameworks are toolsets that offer innovative,
cost-effective solutions to the problems posed by Big Data
processing, help provide insights, incorporate metadata, and
aid decision-making aligned to business needs.

Apache Hadoop

Hadoop is a Java-based platform created by Mike Cafarella
and Doug Cutting. This open-source framework provides batch
data processing as well as data storage services across a
group of hardware machines arranged in clusters. Hadoop
consists of multiple layers, like HDFS and YARN, that work
together to carry out data processing.

● HDFS (Hadoop Distributed File System) is the
storage layer that coordinates data replication and
storage activities across the data clusters. In the
event of a cluster node failure, data can still be
made available for processing in real time.
● YARN (Yet Another Resource Negotiator) is the layer
responsible for resource management and job
scheduling.
● MapReduce is the software layer that functions as the
batch processing engine.
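
To make the MapReduce model concrete, here is a minimal, self-contained
illustration of the map -> shuffle -> reduce flow in plain Python (not the
actual Hadoop Java API):

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (key, value) pair for every word in the input line."""
    for word in line.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    """Reduce: aggregate all the values collected for one key."""
    return key, sum(values)

lines = ["big data needs big frameworks", "hadoop processes big data"]

# Shuffle: group the intermediate pairs by key, as Hadoop does
# between the map and reduce phases.
groups = defaultdict(list)
for line in lines:
    for key, value in map_phase(line):
        groups[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in groups.items())
print(counts)  # {'big': 3, 'data': 2, 'needs': 1, ...}
```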

Apache Spark

The Spark framework was developed at the University of
California, Berkeley. It is a batch processing framework with
enhanced support for stream processing. With full in-memory
computation and processing optimisation, it promises a
lightning-fast cluster computing system.

The Spark framework is composed of five layers:

● HDFS and HBase: They form the first layer, the data
storage systems.
● YARN and Mesos: They form the resource
management layer.
● Core engine: This forms the third layer.
● Library: This forms the fourth layer, containing Spark
SQL for SQL queries, Spark Streaming for stream
processing, GraphX for processing graph data,
SparkR utilities, and MLlib for machine learning
algorithms.
● The fifth layer contains the application programming
interface, such as those for Java or Scala.
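
A short PySpark sketch of these layers in action, assuming a local Spark
installation; "events.json" and its columns are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("demo").getOrCreate()

# Load data through the storage layer (file name is a placeholder).
df = spark.read.json("events.json")
df.cache()  # keep the dataset in memory across computations

# Use the library layer (Spark SQL) to aggregate events per user.
summary = df.groupBy("user_id").agg(F.count("*").alias("events"))
summary.show()

spark.stop()
```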

Storm
Storm, an open-source framework, was developed in the Clojure
language specifically for near real-time data streaming. It is
platform-independent for application development, can be used
with any programming language, and guarantees delivery of
data with the least latency.

In the Storm architecture, there are two nodes: the Master Node
and the Worker/Supervisor Node. The master node monitors
machine failures and is responsible for task allocation. In
case of a node failure, its tasks are reassigned to another node.
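
Storm itself is written in Clojure and exposes Java APIs; the following is
only a conceptual Python simulation of its spout-to-bolt streaming model,
not the real Storm API:

```python
import itertools
import random

def sensor_spout():
    """'Spout': emits an unbounded stream of tuples."""
    for i in itertools.count():
        yield {"id": i, "temp": random.uniform(10.0, 40.0)}

def alert_bolt(stream, threshold=35.0):
    """'Bolt': processes each tuple as it arrives, with minimal latency."""
    for tup in stream:
        if tup["temp"] > threshold:
            yield f"ALERT: reading {tup['id']} at {tup['temp']:.1f} C"

# Wire the 'topology' and take a few results from the infinite stream.
for alert in itertools.islice(alert_bolt(sensor_spout()), 3):
    print(alert)
```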

Presto
Presto is the open-source distributed SQL tool most suited for
smaller datasets up to 3Tb.

The Presto engine includes a coordinator and multiple workers.
When a client submits a query, the coordinator parses and
analyses it, plans its execution, and distributes the processing
among the workers.
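
A hedged sketch of submitting a query from Python with the
presto-python-client package; the host, credentials, catalog, schema, and
table names are placeholder assumptions:

```python
import prestodb  # pip install presto-python-client

# Connection details below are hypothetical placeholders.
conn = prestodb.dbapi.connect(
    host="presto.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="sales",
)
cur = conn.cursor()

# The coordinator parses and plans this query, then distributes the
# processing among the workers.
cur.execute("SELECT region, SUM(revenue) FROM orders GROUP BY region")
for region, revenue in cur.fetchall():
    print(region, revenue)
```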
