What is Data Mining?

Definition
Data mining is the process of discovering patterns in large data sets and extracting useful information from them, using methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science and statistics whose overall goal is to extract information from a data set and transform it into a comprehensible structure for further use.

Why is it important?
Effective data mining aids in planning business strategies and managing operations. This includes marketing, sales, customer support, manufacturing, supply chain management (SCM), finance, and human resources (HR). Data mining also supports fraud detection, risk management, cybersecurity planning, and many other critical business use cases.

Benefits
Discover hidden insights and trends - Data mining takes raw data and finds order in the chaos. This can result in better-informed planning across corporate functions and industries, including advertising, finance, government, healthcare, HR, manufacturing, marketing, research, sales, and SCM.
Save budget - By analyzing performance data from multiple sources, bottlenecks in business processes can be identified to speed resolution and increase efficiency.
Solve multiple challenges - Data mining is a versatile tool. Data from almost any source and any aspect of an organization can be analyzed to discover patterns and better ways of conducting business.

Challenges
Uncertainty - A major data mining effort might be well run yet still produce unclear results with no major benefit, and inaccurate data can lead to incorrect insights.
Cost - The data sets used must be stored and require heavy computational power to analyze, and some data may be expensive to obtain. If an organization needs to gather new information, setting up a data pipeline might represent a new expense.
The Data Mining Process

The data mining process breaks down into four steps:
1. Data Gathering - Identify and assemble relevant data for an analytics application. The data might be located in different source systems, a data warehouse, or a data lake, an increasingly common repository in big data environments that contains a mix of structured and unstructured data.
2. Data Preparation - Perform the steps needed to get the data ready to be mined, such as data exploration, profiling, pre-processing, data cleansing, and data transformation.
3. Data Mining - Once the data is prepared, a data scientist chooses the appropriate data mining technique and then implements one or more algorithms to do the mining. These techniques could, for example, analyze data relationships and detect patterns, associations, and correlations.
4. Data Analysis and Interpretation - The data mining results are used to create analytical models that can help drive decision-making and other business actions, and the findings are communicated to the business.

Types of Data Mining Techniques and Industry Examples

Types of Data Mining Techniques
Association rule mining
Sequence and path analysis
Classification
Neural networks
Clustering
Decision trees
Regression
K-Nearest Neighbor (KNN)

Association rules, also referred to as market basket analysis, search for relationships between variables. These relationships add value to the data set by linking pieces of data.
Classification assigns objects to predefined classes. These classes describe the characteristics of items or represent what the data points have in common with each other.
Clustering is similar to classification, but the groups are not predefined. Clustering identifies similarities between objects, then groups items based on what they share and what makes them different from other items.
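The clustering idea above can be made concrete with a small example. The sketch below uses k-means, one common clustering algorithm; the slides do not name a specific algorithm, so this choice is an assumption. The function name `k_means`, the sample points, and the starting centroids are all hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, centroids, max_iterations=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its group. Repeat until the centroids stop moving."""
    for _ in range(max_iterations):
        # Assignment step: one empty group per centroid, filled by proximity.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty group's centroid

        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 8.5)]
initial = [points[0], points[3]]  # assumed starting guesses, one per group

centroids, clusters = k_means(points, initial)
print(centroids)
print(clusters)
```

No class labels are supplied anywhere in the input; the groups emerge from the distances alone, which is the difference from classification.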
Decision trees classify or predict an outcome based on a set list of criteria or decisions. A decision tree asks a series of cascading questions and sorts the data set according to the answers.
K-Nearest Neighbor (KNN) is an algorithm that classifies data based on its proximity to other data. It rests on the assumption that data points close to each other are more similar to each other than to more distant points.
Neural networks process data through nodes. Each node is made up of inputs, weights, and an output.
Predictive analysis uses historical information to build graphical or mathematical models that forecast future outcomes.
Regression analysis discovers relationships in data by predicting outcomes based on predetermined variables. This can include decision trees as well as linear and multivariate regression.

Industry Examples
Manufacturing. Data mining applications for manufacturers include efforts to improve uptime and operational efficiency in production plants, supply chain performance, and product safety.
Entertainment. Streaming services analyze what users are watching or listening to and make personalized recommendations based on their viewing and listening habits.
Healthcare. Data mining helps doctors diagnose medical conditions, treat patients, and analyze X-rays and other medical imaging results. Medical research also depends heavily on data mining, machine learning, and other forms of analytics.
Human Resources. HR departments typically work with large amounts of data, including retention, promotion, salary, and benefit data.
Social media. Social media companies use data mining to gather large amounts of data about users and their online activities. Controversially, this data is either used for targeted advertising or sold to third parties.
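As a worked illustration of the K-Nearest Neighbor technique described under Types of Data Mining Techniques, the sketch below labels a new point by majority vote among its closest labeled neighbors. The function name `knn_classify`, the choice of `k=3`, Euclidean distance as the proximity measure, and the sample data are assumptions made for this example and do not come from the slides.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_classify(training, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    # Sort labeled points by distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: (features, label) pairs in two separate regions.
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 8.5), "high"),
    ((8.5, 9.0), "high"),
]

prediction_a = knn_classify(training, (1.2, 1.4))
prediction_b = knn_classify(training, (8.8, 8.2))
print(prediction_a, prediction_b)
```

An odd `k` with two classes avoids tied votes. With more classes or an even `k`, a tie-breaking rule would be needed, which this sketch does not address.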
Thank You for Listening