Chapter 1
Introduction
[Figure: flow from Databases through Data Cleaning and Data Integration to Task-relevant Data]
Data Mining: Concepts and Techniques 12/5/2019 15
Learning the application domain:
◦ relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and transformation:
◦ Find useful features, dimensionality/variable reduction, invariant representation.
Choosing functions of data mining
◦ summarization, classification, regression, association, clustering.
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
◦ visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
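The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, the function names (select, clean, transform, mine, evaluate) and the 0.5 support threshold are all hypothetical, and co-occurrence counting stands in for whichever mining algorithm is actually chosen.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: (customer_id, items bought); None marks a missing basket.
raw = [
    (1, ["milk", "bread"]),
    (2, ["milk", "bread", "eggs"]),
    (3, None),
    (4, ["Bread", "milk "]),
    (5, ["eggs"]),
]

def select(records):
    """Creating a target data set: keep only the item lists."""
    return [items for _, items in records]

def clean(baskets):
    """Data cleaning: drop missing baskets and normalise item names."""
    return [[item.strip().lower() for item in b] for b in baskets if b is not None]

def transform(baskets):
    """Data transformation: one sorted tuple of unique items per basket."""
    return [tuple(sorted(set(b))) for b in baskets]

def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for b in baskets:
        counts.update(combinations(b, 2))
    return counts

def evaluate(counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the (assumed) threshold."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }

baskets = transform(clean(select(raw)))
patterns = evaluate(mine(baskets), len(baskets))
print(patterns)
```

Each function corresponds to one step in the list, so a step can be replaced (for example, a different mining function) without touching the others.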
[Figure: Data Exploration (Statistical Analysis, Querying and Reporting)]
[Figure: data mining system components: Pattern evaluation, Knowledge Base, Data mining engine, Database or data warehouse, Data cleaning, integration and selection]
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
◦ Object-oriented and object-relational databases
◦ Spatial databases
◦ Time-series data and temporal data
◦ Text databases and multimedia databases
◦ Heterogeneous and legacy databases
◦ WWW
[Figure: Data Mining at the center of Machine Learning, Visualization, Information Science, and Other Disciplines]
General functionality
◦ Descriptive data mining
◦ Predictive data mining
Different views, different classifications
◦ Kinds of databases to be mined
◦ Kinds of knowledge to be discovered
◦ Kinds of techniques utilized
◦ Kinds of applications adapted
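The difference between descriptive and predictive data mining can be illustrated on a toy data set. This is a minimal sketch, assuming made-up (spend, sales) pairs; the helper names describe and fit_line are hypothetical, and a least-squares line is used only as one simple example of a predictive model.

```python
from statistics import mean

# Hypothetical observations: (advertising spend, sales).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
xs = [x for x, _ in data]
ys = [y for _, y in data]

def describe(values):
    """Descriptive: summarise general properties of the existing data."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}

def fit_line(xs, ys):
    """Predictive: fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

summary = describe(ys)                   # characterises data already seen
slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept      # infers a value for an unseen input
print(summary, predicted)
```

The summary only characterises the data at hand, while the fitted line is used to infer a value for an input that was never observed.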