


Data Mining

CSE7/L Video Presentation


What is Data Mining?
Definition
Data mining is the process of discovering patterns in large
data sets and extracting useful information from them,
using methods at the intersection of machine
learning, statistics, and database systems.
Data mining is an interdisciplinary subfield of computer
science and statistics with an overall goal of extracting
information from a data set and transforming the
information into a comprehensible structure for further
use.
Why is it important?
Effective data mining aids in various aspects of
planning business strategies and managing
operations. This includes marketing, sales, customer
support, manufacturing, supply chain management
(SCM), finance, and human resources (HR).
Data mining also supports fraud detection, risk
management, cybersecurity planning and many other
critical business use cases.
Benefits
Discover hidden insights and trends - Data mining
takes raw data and finds order in the chaos. This can
result in better-informed planning across corporate
functions and industries, including advertising, finance,
government, healthcare, human resources (HR),
manufacturing, marketing, research, sales and supply
chain management (SCM).
Save budget - Analyzing performance data from
multiple sources can identify bottlenecks in business
processes, which speeds resolution and increases efficiency.
Solve multiple challenges - Data mining is a versatile
tool. Data from almost any source and any aspect of an
organization can be analyzed to discover patterns and
better ways of conducting business.
Challenges
Uncertainty - A major data mining effort might be well run
but still produce unclear results with no major benefit.
Inaccurate data can also lead to incorrect insights.
Cost - The data sets used must be stored and require heavy
computational power to analyze, and some data may be
expensive to obtain. If an organization needs to gather new
information, setting up a data pipeline might represent a
new expense.
The Data Mining Process
The data mining process breaks down into four steps:
1. Data Gathering - Identify and assemble relevant data for an
analytics application. The data might be located in different
source systems, a data warehouse or a data lake, an
increasingly common repository in big data environments that
contains a mix of structured and unstructured data.
2. Data Preparation - Perform the necessary steps to get the
data ready to be mined, such as data exploration, profiling,
preprocessing, data cleansing, and data transformation.
3. Data Mining - Once the data is prepared, a data scientist
chooses the appropriate data mining technique and then
implements one or more algorithms to do the mining. These
techniques, for example, could analyze data relationships and
detect patterns, associations and correlations.
4. Data Analysis and Interpretation - The data mining results
are used to create analytical models that can help drive
decision-making and other business actions. The findings are
then communicated to the business.
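The four steps above can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the records, the field names (`customer`, `spend`), and the helper functions are all hypothetical, and the "mining" step is deliberately trivial (it flags customers who spend more than the mean).

```python
# Hypothetical raw records standing in for data held in two source systems.
crm_rows = [
    {"customer": "a01", "spend": "120.50"},
    {"customer": "a02", "spend": ""},        # missing value
    {"customer": "a03", "spend": "80.00"},
]
web_rows = [
    {"customer": "a04", "spend": "200.00"},
    {"customer": "a01", "spend": "120.50"},  # duplicate of a CRM record
]


def gather(*sources):
    """Step 1: assemble records from every source into one list."""
    return [row for source in sources for row in source]


def prepare(rows):
    """Step 2: drop rows with missing spend, remove duplicates, convert types."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["spend"] or row["customer"] in seen:
            continue
        seen.add(row["customer"])
        cleaned.append({"customer": row["customer"], "spend": float(row["spend"])})
    return cleaned


def mine(rows):
    """Step 3: a very simple pattern, customers spending above the mean."""
    mean_spend = sum(r["spend"] for r in rows) / len(rows)
    return mean_spend, [r["customer"] for r in rows if r["spend"] > mean_spend]


def report(mean_spend, high_spenders):
    """Step 4: turn the result into a sentence a business reader can act on."""
    names = ", ".join(high_spenders)
    return f"Mean spend is {mean_spend:.2f}; above-average customers: {names}"


records = prepare(gather(crm_rows, web_rows))
mean_spend, high_spenders = mine(records)
summary = report(mean_spend, high_spenders)
print(summary)
```

In a real project each step would be far larger, but the ordering (gather, prepare, mine, communicate) stays the same.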
Types of Data Mining Techniques and Industry Examples
Types of Data Mining Techniques
Association rule mining
Classification
Clustering
Regression
Sequence and path analysis
Neural networks
Decision trees
K-Nearest Neighbor (KNN)
Association rule mining, also referred to as market basket
analysis, searches for relationships between variables. These
relationships add value to the data set by linking pieces of
data that would otherwise appear unrelated.
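As a rough illustration of association rules, the sketch below scores one candidate rule on a handful of made-up shopping baskets. The slides do not name any scoring metrics; support and confidence are used here because they are the usual measures in market basket analysis, and the transactions and function names are assumptions for the example.

```python
# Hypothetical shopping baskets; each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the share that also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Score the candidate rule "bread -> milk".
bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support={bread_milk_support:.2f} confidence={bread_to_milk_confidence:.2f}")
```

Three of the five baskets contain both bread and milk, and three of the four baskets with bread also contain milk, so the rule has support 0.60 and confidence 0.75.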
Classification assigns objects to predefined classes.
These classes describe the characteristics of items or represent
what the data points have in common with each other.
Clustering is similar to classification, but it does not rely on
predefined classes. Instead, clustering identifies similarities
between objects, then groups those items based on what makes
them different from other items.
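One common way to cluster is k-means, which the slides do not mention by name; it is swapped in here purely as an example. The sketch assumes two-dimensional points, hand-picked starting centroids, and a fixed number of iterations, all of which are choices made for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Minimal k-means: assign each point to its nearest centroid, then re-center."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            # Squared Euclidean distance from this point to each centroid.
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(clusters)
```

No class labels are supplied; the two groups emerge only from how close the points are to one another.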
Decision trees classify or predict an outcome based on a set
list of criteria or decisions. A decision tree asks a series of
cascading questions that sort the dataset based on the
responses given.
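The cascading questions can be pictured as a nested structure that is walked from the top until an answer is reached. In the sketch below the tree is written by hand; the loan scenario, field names, and thresholds are hypothetical, and in practice the questions would be learned from data rather than typed in.

```python
# Each internal node holds a yes/no question and two branches; leaves are outcomes.
tree = {
    "question": lambda record: record["income"] >= 50000,
    "yes": {
        "question": lambda record: record["debt"] < 10000,
        "yes": "approve",
        "no": "review",
    },
    "no": {
        "question": lambda record: record["has_guarantor"],
        "yes": "review",
        "no": "decline",
    },
}


def classify(record, node):
    """Follow the answers down the tree until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        node = node["yes"] if node["question"](record) else node["no"]
    return node


applicant = {"income": 62000, "debt": 4000, "has_guarantor": False}
decision = classify(applicant, tree)
print(decision)
```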
K-Nearest Neighbor (KNN) is an algorithm that classifies data
based on its proximity to other data. KNN is rooted in
the assumption that data points that are close to each other are
more similar to each other than to other data points.
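A minimal sketch of KNN follows, assuming Euclidean distance, k = 3, and a simple majority vote among the nearest neighbors. The labeled points and the function name are made up for the example.

```python
import math
from collections import Counter


def knn_classify(query, labeled_points, k=3):
    """Label query by majority vote among its k nearest labeled points."""
    by_distance = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (point, label) pairs.
training = [
    ((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
    ((8, 8), "high"), ((9, 8), "high"), ((8, 9), "high"),
]

near_low = knn_classify((2, 2), training)
near_high = knn_classify((7, 8), training)
print(near_low, near_high)
```

The query point (2, 2) sits next to the "low" group and (7, 8) next to the "high" group, so each takes the label of its closest neighbors.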
Neural networks process data through the use of nodes. Each
node comprises inputs, weights, and an output.
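A single node can be sketched as a weighted sum of its inputs passed through an activation function. The slides only mention inputs, weights, and an output; the bias term and the sigmoid activation below are assumptions added to make the example complete.

```python
import math


def node_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias, squashed into (0, 1) by a sigmoid."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-weighted_sum))


# Hypothetical inputs and weights for one node.
activation = node_output(inputs=[1.0, 2.0], weights=[0.5, 0.5])
print(round(activation, 3))
```

A network chains many such nodes together, feeding the outputs of one layer in as the inputs of the next.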
Predictive analysis strives to leverage historical information to
build graphical or mathematical models to forecast future
outcomes.
Regression analysis discovers relationships in data by
predicting outcomes based on predetermined variables.
This can include decision trees as well as multivariate and
linear regression.
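For the simplest case, linear regression with one variable, the relationship can be estimated with ordinary least squares. The sketch below assumes a single predictor and made-up data; the function name is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # forecast for an unseen x value
print(slope, intercept, prediction)
```

The fitted slope and intercept recover the underlying line, and the same two numbers are then used to predict the outcome for a new input.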
Industry Examples
Manufacturing. Data mining applications for manufacturers
include efforts to improve uptime and operational efficiency
in production plants, supply chain performance and product
safety.
Entertainment. Streaming services analyze what users are
watching or listening to and make personalized
recommendations based on their viewing and listening
habits.
Healthcare. Data mining helps doctors diagnose medical
conditions, treat patients, and analyze X-rays and other
medical imaging results. Medical research also depends
heavily on data mining, machine learning and other forms of
analytics.
Human Resources. HR departments typically work with large
amounts of data. This includes retention, promotion, salary
and benefit data.
Social media. Social media companies use data mining to
gather large amounts of data about users and their online
activities. Controversially, this data is either used for
targeted advertising or sold to third parties.
Thank You
for Listening
