Seminar 5 - Week 5 - Data Mining and Data Analytics
▪ Your Lecturer:
▪ June Cao
▪ Best contact is email (as above)
▪ No unique definition
• “The field of data mining is still relatively new and in a state of
evolution. The first International Conference on Knowledge Discovery
and Data Mining (KDD) was held in 1995, and there are a variety of
definitions of data mining.” (Shmueli, Patel, and Bruce 2010)
▪ Commonly defined as:
• … the use of efficient techniques for the analysis of very large
collections of data and the extraction of useful and possibly unexpected
patterns in data.
▪ Gartner Group
▪ “Data mining is the process of discovering meaningful
new correlations, patterns and trends by sifting through
large amounts of data stored in repositories, using pattern
recognition technologies as well as statistical and
mathematical techniques”
▪ Source: https://www.gartner.com/en/information-technology/glossary/data-mining
[Figure: Data mining draws on database systems, database technology, statistics, machine learning, visualization, pattern recognition, algorithms, and other disciplines.]
[Figure panels: Computational Simulations; Sensor Networks]
Source: lecture notes of Tan, Steinbach, Karpatne, and Kumar, 2018
Curtin University is a trademark of Curtin University of Technology
CRICOS Provider Code 00301J
Why Data Mining? (Cont.)
▪ Great opportunities to improve productivity in all walks of life
• Improving health care and reducing costs
• Predicting the impact of climate change
Image source: https://databigandsmall.com/2014/01/09/network-data-new-and-old-from-informal-ties-to-formal-networks/
Correlation of stocks
▪ Recommendations:
• Users who buy this item often buy that item as well.
• Users who watched James Bond movies also watched Jason Bourne movies.
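The "users who buy this item often buy that item" idea can be illustrated by counting how often two items appear in the same purchase history. The sketch below is a minimal illustration only: the basket data and the function names `co_purchase_counts` and `also_bought` are hypothetical, and real recommender systems use far more data and more refined scoring.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives a stable pair order
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    return [other for other, _ in counts[item].most_common(top_n)]


# Hypothetical purchase histories, one basket per user (illustrative data).
baskets = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"laptop", "mouse", "keyboard"},
    {"phone", "phone case"},
]

counts = co_purchase_counts(baskets)
print(also_bought(counts, "laptop", top_n=1))  # ['mouse']
```

Raw co-occurrence counts favour items that are popular overall, so this is a starting point for discussion, not a production method.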
▪ Document Clustering:
• Goal: To find groups of documents that are similar to each other
based on the important terms appearing in them.
• Approach: To identify frequently occurring terms in each
document. Form a similarity measure based on the frequencies of
different terms. Use it to cluster.
• Gain: Information Retrieval can utilize the clusters to relate a new
document or search term to clustered documents.
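The document clustering approach above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the stop-word list, the sample documents, the 0.3 threshold, and the simple "leader" clustering rule are all chosen for illustration and are not taken from the lecture. Term frequencies form the document vectors and cosine similarity is the similarity measure.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on"}


def term_frequencies(text):
    """Count how often each non-stop-word term occurs in a document."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(docs, threshold=0.3):
    """Leader clustering: a document joins the first cluster whose first
    document (the leader) is similar enough, else it starts a new cluster."""
    vectors = [term_frequencies(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical documents (illustrative data).
docs = [
    "Stock prices rise as the market rallies",
    "The market falls and stock prices drop",
    "New vaccine trial shows strong results",
    "Vaccine results from the trial are strong",
]

print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```

A new document or search term could be related to the clusters in the same way: compute its term frequencies and compare them with each cluster's leader using `cosine_similarity`.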
▪ Application:
• Create a catalog to send out that has at least one item of interest for every
customer.
▪ Clicks to Customers
• Business problem: 50% of Dell’s clients order their computer through the web.
However, the retention rate is 0.5%, i.e., only 0.5% of visitors to Dell’s web page
become customers.
• Solution: Using the sequence of their clicks, cluster customers and design
website interventions to maximize the number of customers who eventually buy.
• Benefits: Increased revenues.
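One way to picture the "cluster customers by their click sequences" step is sketched below. Everything here is an assumption made for illustration: the page names, the session data, the use of page-to-page transitions as features, Jaccard similarity, and the 0.5 threshold. It is not Dell's actual method. The sketch groups similar sessions and then reports the share of buyers in each group, which hints at where a website intervention might help.

```python
def transitions(clicks):
    """Turn a click sequence into the set of consecutive page-to-page moves."""
    return set(zip(clicks, clicks[1:]))


def jaccard(a, b):
    """Share of transitions two sessions have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sessions(sessions, threshold=0.5):
    """Group sessions: each joins the first cluster whose first session
    is similar enough, otherwise it starts a new cluster."""
    features = [transitions(s["clicks"]) for s in sessions]
    clusters = []  # each cluster is a list of session indices
    for i, feat in enumerate(features):
        for cluster in clusters:
            if jaccard(feat, features[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def conversion_rate(sessions, cluster):
    """Fraction of sessions in a cluster that ended in a purchase."""
    bought = sum(1 for i in cluster if sessions[i]["bought"])
    return bought / len(cluster)


# Hypothetical visitor sessions (illustrative data only).
sessions = [
    {"clicks": ["home", "laptops", "configure", "cart"], "bought": True},
    {"clicks": ["home", "laptops", "configure"], "bought": False},
    {"clicks": ["home", "support", "drivers"], "bought": False},
    {"clicks": ["home", "support", "drivers", "contact"], "bought": False},
]

clusters = cluster_sessions(sessions)
for cluster in clusters:
    print(cluster, conversion_rate(sessions, cluster))
```

In this toy data the "configure" group converts at a much higher rate than the "support" group, so an intervention would target visitors whose clicks resemble the first group but who stop before the cart.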