Data Mining - Output: Knowledge Representation
Knowledge Representation
Chapter 3
Representing Structural Patterns
• There are many different ways of representing
patterns
• 2 covered in Chapter 1 – decision trees and
classification rules
• Learned pattern is a form of “knowledge
representation” (even if the knowledge does
not seem very impressive)
Decision Trees
• Make decisions by following branches down the tree
until a leaf is found
• Classification based on contents of leaf
• Non-leaf nodes usually involve testing a single
attribute
– Usually a test on the different values of a nominal attribute,
or on the range of a numeric attribute (most commonly a two-way
split: > some value vs. ≤ the same value)
– Less commonly, compare two attribute values, or some
function of multiple attributes
• Commonly, an attribute that has already been tested is not
tested again at a lower level of the same branch
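The branch-following idea above can be sketched in a few lines. This is an illustrative sketch only – the dictionary-based tree structure, the `classify` function, and the weather-style attributes are assumptions for the example, not any particular library's API:

```python
# Hypothetical sketch: classify an instance by following branches
# from the root of a decision tree down to a leaf. Each non-leaf
# node tests a single attribute; the leaf holds the class.

def classify(node, instance):
    """Walk the tree until a leaf is reached; return its class."""
    while "leaf" not in node:
        value = instance[node["attribute"]]   # test a single attribute
        node = node["branches"][value]        # follow the matching branch
    return node["leaf"]

# A tiny tree in the style of the weather data: test outlook first,
# then humidity on the sunny branch.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": {"leaf": "no"}, "normal": {"leaf": "yes"}},
        },
        "overcast": {"leaf": "yes"},
        "rainy": {"leaf": "no"},
    },
}

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Note that the sunny branch tests humidity at a lower level, but never re-tests outlook.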
Decision Trees
• Missing Values
– May be treated as another possible value of a
nominal attribute – if missing data may mean
something
– May follow most popular branch when data is
missing from test data
– More complicated approach – rather than going all-
or-nothing, can ‘split’ the test instance in proportion
to the popularity of branches in the training data –
recombination at the end uses a vote based on the weights
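The proportional-split idea can be sketched as follows. The tree structure, the per-branch `counts` (branch popularity observed during training), and the function names are illustrative assumptions, not a real library's API:

```python
# Sketch of 'splitting' a test instance on a missing value: send
# weighted fractions of the instance down every branch in proportion
# to branch popularity (training counts), then vote with the weights.

def classify_weighted(node, instance, weight=1.0, votes=None):
    if votes is None:
        votes = {}
    if "leaf" in node:
        votes[node["leaf"]] = votes.get(node["leaf"], 0.0) + weight
        return votes
    value = instance.get(node["attribute"])
    if value is None:                          # missing: split proportionally
        total = sum(node["counts"].values())
        for branch_value, child in node["branches"].items():
            frac = node["counts"][branch_value] / total
            classify_weighted(child, instance, weight * frac, votes)
    else:                                      # present: all-or-nothing
        classify_weighted(node["branches"][value], instance, weight, votes)
    return votes

tree = {
    "attribute": "outlook",
    "counts": {"sunny": 5, "overcast": 4, "rainy": 5},   # branch popularity
    "branches": {
        "sunny": {"leaf": "no"},
        "overcast": {"leaf": "yes"},
        "rainy": {"leaf": "no"},
    },
}

votes = classify_weighted(tree, {"outlook": None})
print(max(votes, key=votes.get))   # 'no' wins with 10/14 of the weight
```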
Classification Rules
• Popular alternative to decision trees
• LHS / antecedent / precondition – tests to determine if rule
is applicable
– Tests usually ANDed together
– Could be general logical condition (AND/OR/NOT) but learning
such rules is MUCH less constrained
• RHS / consequent / conclusion – the answer – usually the class
(but could be a probability distribution)
• Rules with the same conclusion essentially represent an OR
• Rules may be an ordered set, or independent
• If independent, a policy may need to be established for when
more than one rule matches (a conflict resolution strategy) or
when no rule matches
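The two policies can be sketched side by side. The rule representation (antecedent as a dict of ANDed attribute tests, consequent as a class) and both function names are illustrative assumptions:

```python
# Sketch of applying classification rules: each antecedent is a set
# of attribute tests ANDed together; the consequent is a class.

rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy"}, "no"),
    ({}, "yes"),   # default rule: empty antecedent matches everything
]

def matches(antecedent, instance):
    return all(instance.get(a) == v for a, v in antecedent.items())

def classify_ordered(rules, instance):
    """Ordered set ('decision list'): the first matching rule wins."""
    for antecedent, consequent in rules:
        if matches(antecedent, instance):
            return consequent

def classify_independent(rules, instance):
    """Independent rules: collect all matches, resolve by majority vote."""
    fired = [c for a, c in rules if matches(a, instance)]
    return max(set(fired), key=fired.count) if fired else None

print(classify_ordered(rules, {"outlook": "rainy", "humidity": "normal"}))  # no
```

With the ordered interpretation, the default rule is safe because earlier rules shadow it; with independent rules, the same rule set can behave differently, which is why the interpretation must be fixed in advance.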
Rules / Trees
• Rules can easily be created from a tree – but the result is
not the simplest possible set of rules
• Transforming rules into a tree is not straightforward
(see “replicated subtree” problem – next two slides)
• In many cases rules are more compact than trees –
particularly if default rule is possible
• Rules may appear to be independent nuggets of
knowledge (and hence less complicated than trees) –
but if rules are an ordered set, then they are much more
complicated than they appear
If a and b then x
If c and d then x
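The easy direction – reading one rule off per leaf, with the antecedent being the AND of the tests along the path from root to leaf – can be sketched as below. The tree structure and function name are illustrative assumptions; as the slide notes, the rules produced this way are correct but not the most compact set:

```python
# Sketch: extract one classification rule per leaf of a decision
# tree. Each rule's antecedent ANDs the tests on the root-to-leaf
# path; its consequent is the leaf's class.

def tree_to_rules(node, path=()):
    if "leaf" in node:
        return [(dict(path), node["leaf"])]
    rules = []
    for value, child in node["branches"].items():
        rules += tree_to_rules(child, path + ((node["attribute"], value),))
    return rules

tree = {
    "attribute": "a",
    "branches": {
        "yes": {"attribute": "b",
                "branches": {"yes": {"leaf": "x"}, "no": {"leaf": "y"}}},
        "no": {"leaf": "y"},
    },
}

for antecedent, consequent in tree_to_rules(tree):
    print(antecedent, "==>", consequent)
```

The reverse direction has no such simple procedure: turning `if a and b then x; if c and d then x` into a single tree forces one attribute to the root and the other rule's subtree to be replicated under every branch.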
Apriori
Minimum support: 0.15
Minimum metric <confidence>: 0.9
Number of cycles performed: 17
Best rules found:
1. outlook=rainy 5 ==> play=no 5 conf:(1)
2. temperature=cool 4 ==> humidity=normal 4 conf:(1)
3. temperature=hot windy=FALSE 3 ==> play=no 3 conf:(1)
4. temperature=hot play=no 3 ==> windy=FALSE 3 conf:(1)
5. outlook=rainy windy=FALSE 3 ==> play=no 3 conf:(1)
6. outlook=rainy humidity=normal 3 ==> play=no 3 conf:(1)
7. outlook=rainy temperature=mild 3 ==> play=no 3 conf:(1)
8. temperature=mild play=no 3 ==> outlook=rainy 3 conf:(1)
9. temperature=hot humidity=high windy=FALSE 2 ==> play=no 2 conf:(1)
10. temperature=hot humidity=high play=no 2 ==> windy=FALSE 2 conf:(1)
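The support counts and `conf:(…)` values in output like the above come from simple frequency counts over the instances. The helper names and the three-row toy dataset below are illustrative assumptions, not Weka's implementation:

```python
# Sketch of how support and confidence are computed for a candidate
# association rule "antecedent ==> consequent" over a dataset.

def support_count(data, conditions):
    """Number of instances satisfying all the given attribute tests."""
    return sum(all(row.get(a) == v for a, v in conditions.items())
               for row in data)

def confidence(data, antecedent, consequent):
    """Fraction of antecedent-matching instances that also match the
    consequent: support(antecedent AND consequent) / support(antecedent)."""
    a_count = support_count(data, antecedent)
    both = support_count(data, {**antecedent, **consequent})
    return both / a_count if a_count else 0.0

data = [
    {"outlook": "rainy", "play": "no"},
    {"outlook": "rainy", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
]

print(confidence(data, {"outlook": "rainy"}, {"play": "no"}))  # 1.0
```

A rule is reported only if its support count clears the minimum support threshold and its confidence clears the minimum confidence threshold (0.15 and 0.9 in the run above).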
Rules with Exceptions
• Skip
Rules involving Relations
• More than the value for attributes may be
important
• See book example on next slide
Shaded: standing
Unshaded: lying
[Figure: block diagram of shaded (standing) vs. unshaded (lying) blocks]
Additional Details
• Distance/Similarity function must deal with
binaries/nominals – usually by an all-or-nothing match –
but mild should be a better match to hot than cool is!
• Distance / Similarity function is simpler if data is
normalized in advance. E.g. $10 difference in
household income is not significant, while 1.0 distance
in GPA is big
• Distance/Similarity function should weight different
attributes differently – key task is determining those
weights
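The three points above can be combined into one sketch of a distance function: normalize numeric attributes to [0, 1] before differencing, use an all-or-nothing match for nominals, and scale each attribute by a weight. The attribute names, ranges, and weights below are illustrative assumptions:

```python
# Sketch of a weighted, normalized distance function. Numeric
# attributes are scaled by their observed range so that, e.g., a $10
# income difference does not swamp a 1.0 GPA difference; nominal
# attributes get an all-or-nothing match.

def distance(x, y, ranges, weights):
    total = 0.0
    for attr, w in weights.items():
        if attr in ranges:                       # numeric: normalize first
            lo, hi = ranges[attr]
            d = abs(x[attr] - y[attr]) / (hi - lo)
        else:                                    # nominal: all or nothing
            d = 0.0 if x[attr] == y[attr] else 1.0
        total += w * d * d
    return total ** 0.5

ranges = {"income": (0, 200_000), "gpa": (0.0, 4.0)}
weights = {"income": 1.0, "gpa": 1.0, "temperature": 1.0}

a = {"income": 50_000, "gpa": 3.0, "temperature": "mild"}
b = {"income": 50_010, "gpa": 2.0, "temperature": "hot"}
print(distance(a, b, ranges, weights))
```

Note the all-or-nothing match still scores mild-vs-hot exactly like mild-vs-cool – the weakness the first bullet points out – and choosing the per-attribute weights remains the key task.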
Further Wrinkles
• May not need to save all instances
– Very normal instances may not all need to be saved
– Some approaches actually do some generalization
But …
• Not really a structural pattern that can be
pointed to
• However, many people in many tasks/domains
will respect arguments based on “previous
cases” (diagnosis and law among them)
• Book points out that instances + distance metric
combine to form class boundaries
– With 2 attributes, these can actually be envisioned
<see next slide>
[Figure: panels (a)–(d) – class boundaries formed by stored instances plus a distance metric]
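The boundary-forming idea can be made concrete with plain 1-nearest-neighbour classification on two numeric attributes: the implied class boundary is the set of points equidistant from the nearest stored instances of different classes. The function name and the two-instance dataset are illustrative assumptions:

```python
# Sketch: stored instances + a distance metric implicitly define
# class boundaries. Here, 1-nearest-neighbour with squared Euclidean
# distance over two numeric attributes.

def nearest_neighbour(instances, query):
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    point, label = min(instances, key=lambda il: dist2(il[0], query))
    return label

# Two stored instances of different classes; the implied boundary is
# the perpendicular bisector between them.
instances = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]

print(nearest_neighbour(instances, (0.2, 0.1)))  # a
print(nearest_neighbour(instances, (0.9, 0.8)))  # b
```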