Feature Engineering
Knowledge Discovery and Data Mining 1
Roman Kern
ISDS, TU Graz
2018-10-25
Roman Kern (ISDS, TU Graz) KDDM1 - Feature Engineering 2018-10-25 1 / 68
Licensed to Erlon Abrantes de Andrade - erlonabrantes@gmail.com
Big picture: KDDM
Preprocessing
Outline
1 Information Theory
2 Introduction
3 Feature Value Processing
4 Feature Engineering for Text Mining
5 Feature Selection
Recap
Review of the preprocessing phase
Recap - Feature Extraction
Examples of features:
Images → colours, textures, contours, ...
Signals → frequency, phase, samples, spectrum, ...
Time series → ticks, trends, self-similarities, ...
Biomedical data → DNA sequences, genes, ...
Text → words, POS tags, grammatical dependencies, ...
Recap - Feature Extraction
What is Part-of-Speech?
The process of assigning word classes (parts of speech) to the words within a sentence
For example
Car → noun
Writing → noun or verb
Grow → verb
From → preposition
Recap - Feature Extraction
Hidden Markov Models
Information Theory
Review of information theory
Entropy
What is Entropy?
Let $X$ be a discrete random variable with alphabet $\mathcal{X}$ and probability
mass function $p(x)$
$p(x) = \Pr\{X = x\}, \quad x \in \mathcal{X}$
The entropy of a variable X is defined as
$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$
... entropy measures the information content of a variable (in bits, as the logarithm is to base 2)
Note 1: By convention $0 \log_2 0 = 0$
Note 2: Entropy is a lower bound on the average number of yes/no questions needed to
determine the state of a variable.
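As a minimal sketch (assuming the distribution is given as a plain list of probabilities), the definition translates directly into code; skipping zero-probability terms implements the convention from Note 1. The coin examples are illustrative, not from the slides.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H(X) in bits of a probability mass function.

    Terms with p(x) = 0 are skipped, i.e. 0 * log2(0) is treated as 0.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit, i.e. one yes/no question
print(entropy([0.9, 0.1]))   # biased coin: about 0.469 bits
print(entropy([1.0, 0.0]))   # certain outcome: 0.0 bits
```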
Entropy
What is Entropy?
Entropy is a measure of uncertainty
High entropy → (close to) uniform distribution
Histogram of the frequencies would be (nearly) flat
Values are hard to predict
Low entropy → peaks and valleys in the distribution
Histogram of the frequencies would have spikes
Values are easier to predict
Entropy is always non-negative
The entropy is always less than or equal to the logarithm of the
alphabet size: $H(X) \le \log_2 |\mathcal{X}|$
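The contrast between a flat and a spiky histogram can be checked on observed data. The sketch below estimates entropy from relative frequencies of hypothetical toy samples: the flat histogram over four symbols reaches the upper bound $\log_2 4 = 2$ bits, while the spiky one stays far below it.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def empirical_entropy(values: Iterable[Hashable]) -> float:
    """Entropy in bits of the empirical distribution (histogram) of the values."""
    counts = Counter(values)
    n = sum(counts.values())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical samples over the alphabet {a, b, c, d}
flat = ["a", "b", "c", "d"] * 25        # even histogram
spiky = ["a"] * 97 + ["b", "c", "d"]    # one dominant symbol

print(empirical_entropy(flat))    # 2.0 == log2(4), the maximum for 4 symbols
print(empirical_entropy(spiky))   # about 0.24: values are easy to predict
```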
Joint Entropy
Conditional Entropy
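Joint and conditional entropy can both be computed from a joint probability table, using the standard definitions $H(X, Y) = -\sum_{x,y} p(x,y) \log_2 p(x,y)$ and $H(Y \mid X) = -\sum_{x,y} p(x,y) \log_2 p(y \mid x)$. A minimal sketch, assuming the joint pmf is stored as a Python dict with hypothetical values; the chain rule $H(X, Y) = H(X) + H(Y \mid X)$ serves as a consistency check.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# Joint pmf p(x, y), stored as {(x, y): probability}
Joint = Dict[Tuple[Hashable, Hashable], float]


def entropy(probs: Iterable[float]) -> float:
    """H = -sum p log2 p over the given probabilities (0 log2 0 = 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def marginal_x(p_xy: Joint) -> Dict[Hashable, float]:
    """p(x) = sum over y of p(x, y)."""
    p_x: Dict[Hashable, float] = defaultdict(float)
    for (x, _y), p in p_xy.items():
        p_x[x] += p
    return dict(p_x)


def joint_entropy(p_xy: Joint) -> float:
    """H(X, Y) = -sum_{x,y} p(x, y) log2 p(x, y)."""
    return entropy(p_xy.values())


def conditional_entropy(p_xy: Joint) -> float:
    """H(Y | X) = -sum_{x,y} p(x, y) log2 p(y | x), with p(y | x) = p(x, y) / p(x)."""
    p_x = marginal_x(p_xy)
    return sum(-p * math.log2(p / p_x[x]) for (x, _y), p in p_xy.items() if p > 0)


# Hypothetical example: X = weather, Y = whether an outdoor game takes place
p_weather_game = {
    ("sun", "yes"): 0.4, ("sun", "no"): 0.1,
    ("rain", "yes"): 0.1, ("rain", "no"): 0.4,
}

h_xy = joint_entropy(p_weather_game)                 # about 1.722 bits
h_x = entropy(marginal_x(p_weather_game).values())   # 1.0 bit
h_y_given_x = conditional_entropy(p_weather_game)    # about 0.722 bits

# Chain rule: H(X, Y) = H(X) + H(Y | X)
print(h_xy, h_x + h_y_given_x)
```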
Information Gain
Mutual Information
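Mutual information can be computed from a joint distribution as $I(X; Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}$, which equals $H(Y) - H(Y \mid X)$, i.e. the information gain used later for feature selection. A minimal sketch with two hypothetical joint tables, one dependent and one independent:

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Tuple


def mutual_information(p_xy: Dict[Tuple[Hashable, Hashable], float]) -> float:
    """I(X; Y) = sum_{x,y} p(x, y) log2( p(x, y) / (p(x) p(y)) ), in bits."""
    p_x: Dict[Hashable, float] = defaultdict(float)
    p_y: Dict[Hashable, float] = defaultdict(float)
    for (x, y), p in p_xy.items():  # marginalise the joint table
        p_x[x] += p
        p_y[y] += p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)


# Hypothetical tables: a dependent pair and an independent pair
dependent = {("sun", "yes"): 0.4, ("sun", "no"): 0.1,
             ("rain", "yes"): 0.1, ("rain", "no"): 0.4}
independent = {(x, y): 0.25 for x in ("sun", "rain") for y in ("yes", "no")}

print(mutual_information(dependent))    # about 0.278 bits = H(Y) - H(Y|X)
print(mutual_information(independent))  # 0.0: knowing X tells nothing about Y
```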
Pointwise Mutual Information
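Pointwise mutual information is the single-event term inside the mutual information sum, $\mathrm{PMI}(x; y) = \log_2 \frac{p(x,y)}{p(x)\,p(y)}$, and is often estimated from co-occurrence counts in text. The sketch below assumes probabilities estimated by relative counts over a hypothetical number of text windows; the word pair in the comment is illustrative only.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log2( p(x, y) / (p(x) p(y)) ) from raw counts.

    With p(x, y) = count_xy / total, p(x) = count_x / total, p(y) = count_y / total,
    the ratio simplifies to count_xy * total / (count_x * count_y).
    Undefined (math domain error) if count_xy is 0.
    """
    return math.log2(count_xy * total / (count_x * count_y))


# Hypothetical counts over 1000 text windows
print(pmi(8, 20, 10, 1000))     # log2(40), about 5.32: e.g. "new" and "york" co-occur far more than by chance
print(pmi(1, 20, 50, 1000))     # 0.0: co-occurrence exactly at chance level
print(pmi(1, 100, 100, 1000))   # negative: co-occur less often than by chance
```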
Relative Entropy
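Relative entropy (Kullback-Leibler divergence) is $D(p \,\|\, q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$. A minimal sketch, assuming both distributions are given as lists over the same alphabet; the example distributions are hypothetical and show that the measure is not symmetric.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy D(p || q) = sum_x p(x) log2( p(x) / q(x) ), in bits.

    Returns infinity if q(x) = 0 for some x with p(x) > 0.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # 0 log2(0 / q) = 0 by convention
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total


fair = [0.5, 0.5]
biased = [0.9, 0.1]
print(kl_divergence(fair, biased))  # about 0.737 bits
print(kl_divergence(biased, fair))  # about 0.531 bits: not symmetric
print(kl_divergence(fair, fair))    # 0.0
```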
Entropy Overview
Markov chains
Random variables $X, Y, Z$ are said to form a Markov chain in that order
(denoted $X \to Y \to Z$) if the conditional distribution of $Z$ depends only
on $Y$, i.e. $Z$ is conditionally independent of $X$ given $Y$. Equivalently, the joint
probability mass function can be written as:
$p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid y)$
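The factorisation can be checked numerically. This sketch builds a hypothetical binary chain $X \to Y \to Z$ from its three factors, then recomputes $p(z \mid x, y)$ and $p(z \mid y)$ from the joint table alone to confirm that they agree, i.e. that $X$ adds no information about $Z$ once $Y$ is known.

```python
from itertools import product

# Hypothetical binary Markov chain X -> Y -> Z, given by its factors
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

# Joint pmf built from the factorisation p(x, y, z) = p(x) p(y|x) p(z|y)
joint = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}


def p_z_given_xy(z: int, x: int, y: int) -> float:
    """p(z | x, y) = p(x, y, z) / sum_z' p(x, y, z'), computed from the joint."""
    return joint[(x, y, z)] / sum(joint[(x, y, zz)] for zz in (0, 1))


def p_z_given_y_only(z: int, y: int) -> float:
    """p(z | y) = sum_x p(x, y, z) / sum_{x, z'} p(x, y, z'), computed from the joint."""
    num = sum(joint[(x, y, z)] for x in (0, 1))
    den = sum(joint[(x, y, zz)] for x in (0, 1) for zz in (0, 1))
    return num / den


# Conditional independence: once Y is known, X carries no extra information about Z
for x, y, z in product((0, 1), repeat=3):
    assert abs(p_z_given_xy(z, x, y) - p_z_given_y_only(z, y)) < 1e-12
print("Z is conditionally independent of X given Y")
```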
Introduction
What are features & feature engineering
Note: The mix of exploratory and experimental work characterises many data science scenarios
Feature Engineering Goals
Goals
The task also depends on the goal of feature engineering:
1 Getting the best prediction accuracy
2 ... or getting an explainable model
Feature Engineering Terminology
Important Terms
Feature Set Set of features used for a task
Feature Space High-dimensional space spanned by the features (range of
the feature values)
Instance Single assignment of values to the features (an example)
Introduction - Example
Figure: Features to predict which type of contact lens is most appropriate (none,
soft, hard)
Introduction - Example
Figure: Relation between the features and the contact lens type
Feature Value Processing
Feature Processing
Feature binarisation
Threshold numerical values to get boolean values
Needed because some algorithms only accept boolean features as input
Feature discretisation
Convert continuous features to discrete features
Equal-frequency partitions (same number of values per bin)? Equal-width partitions (same interval length)?
Feature value transformation
Scaling of values
Move the centre
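The three kinds of processing above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the threshold, the number of bins and the skewed `income` values are hypothetical, and standardisation (z-score) is used as one common way to move the centre and scale the values.

```python
import statistics
from typing import List, Sequence


def binarise(values: Sequence[float], threshold: float) -> List[bool]:
    """Feature binarisation: True where the value exceeds the threshold."""
    return [v > threshold for v in values]


def equal_width_bins(values: Sequence[float], k: int) -> List[int]:
    """Discretisation into k intervals of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid division by zero for constant features
    # The maximum lies on the upper edge, so clamp it into the last bin (k - 1)
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: Sequence[float], k: int) -> List[int]:
    """Discretisation into k bins holding (roughly) the same number of values.

    Note: tied values may end up in different bins in this simple version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def standardise(values: Sequence[float]) -> List[float]:
    """Move the centre to 0 and scale to unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0  # constant features stay at 0
    return [(v - mean) / std for v in values]


income = [1, 2, 3, 4, 100]  # hypothetical, heavily skewed feature
print(binarise(income, 3))              # [False, False, False, True, True]
print(equal_width_bins(income, 2))      # [0, 0, 0, 0, 1]: the outlier gets a bin of its own
print(equal_frequency_bins(income, 2))  # [0, 0, 0, 1, 1]
print(standardise(income))
```

The skewed example shows why the choice of partitioning matters: equal-width bins let a single outlier dominate the range, whereas equal-frequency bins split the values by rank.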
Feature Normalisation
Feature Weighting
Feature Engineering for Text Mining
Contextual Features
Bigram Features
When working with single words as features, the sequence
information is often lost
... but it could potentially be a source of information
→ introduce new features, each combining two adjacent words
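A minimal sketch of bigram features, assuming whitespace tokenisation. The two hypothetical sentences contain the same single words, so a bag-of-words view cannot tell them apart, while their bigrams differ.

```python
from typing import List, Sequence, Tuple


def bigrams(tokens: Sequence[str]) -> List[Tuple[str, str]]:
    """Combine each pair of adjacent tokens into a single feature."""
    return list(zip(tokens, tokens[1:]))


a = "dog bites man".split()
b = "man bites dog".split()

# As bags of single words, both sentences look identical ...
print(sorted(a) == sorted(b))   # True
# ... but their bigram features differ and keep (part of) the word order
print(bigrams(a))  # [('dog', 'bites'), ('bites', 'man')]
print(bigrams(b))  # [('man', 'bites'), ('bites', 'dog')]
```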
Contextual Features
n-grams
Bigrams can be extended to more than two words
→ n-grams
Can be further extended to allow gaps between words (skip n-grams)
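Word n-grams generalise the bigram idea directly. Skip n-grams come in several variants; the sketch below assumes one common variant, k-skip-bigrams: ordered word pairs with at most k words skipped between them (with k = 0 this reduces to ordinary bigrams). The example sentence is illustrative.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens: Sequence[str], k: int) -> List[Tuple[str, ...]]:
    """k-skip-bigrams: pairs (w_i, w_j) with i < j and at most k tokens in between."""
    pairs = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs


tokens = "the quick brown fox jumps".split()
print(ngrams(tokens, 3))
# [('the', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps')]
print(skip_bigrams(tokens, 1))
# adds pairs such as ('the', 'brown') that skip one word
```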
Contextual Features
Character n-grams
n-grams can be created not only on words, but on characters as well
e.g. The quick brown fox jumps over the lazy dog
Character tri-grams: the, qui, uic, ick, bro, row, own, ...
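The tri-gram list above can be reproduced with a short sketch. It assumes the text is lower-cased and that n-grams do not cross word boundaries, which matches the listed example; other variants pad words with boundary markers or include spaces.

```python
from typing import List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Character n-grams, taken within each (lower-cased) word."""
    grams: List[str] = []
    for word in text.lower().split():
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(char_ngrams("The quick brown fox jumps over the lazy dog")[:7])
# ['the', 'qui', 'uic', 'ick', 'bro', 'row', 'own']
```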
External Sources
Figure: WordNet entry for the word fox; the first sense contains the hypernyms
canine and canid.
Feature Selection
Less is more - sometimes...
Feature Selection
Curse of Dimensionality
The problem of having too many features
More features make the model more expressive
but not all of the features are relevant
The higher the dimensionality, the higher the chance of spurious
features (features that appear predictive purely by chance)
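The effect of spurious features can be simulated. In this sketch (a hypothetical setup with a fixed random seed) the target and all 1000 features are independent noise, yet the strongest observed correlation with the target cannot decrease, and typically grows noticeably, as more features are inspected.

```python
import random
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(42)  # fixed seed for reproducibility
n_instances = 30
target = [rng.gauss(0.0, 1.0) for _ in range(n_instances)]

# 1000 pure-noise features: by construction none of them is related to the target
features: List[List[float]] = [
    [rng.gauss(0.0, 1.0) for _ in range(n_instances)] for _ in range(1000)
]
abs_corr = [abs(pearson(f, target)) for f in features]

# Strongest spurious correlation among the first d features
for d in (5, 50, 1000):
    print(f"{d:5d} features -> max |correlation| = {max(abs_corr[:d]):.2f}")
```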
Feature Selection
Feature selection
Approach: select the subset of all features that contains no redundant or
irrelevant features
Searching the set of all subsets ($2^n$ candidates for $n$ features) → NP-hard
Need to find more practical approaches
Unsupervised, e.g. heuristics
Supervised, e.g. using a training data set
Feature Selection
Unsupervised approach
Unsupervised ranked feature selection
Scoring function to rank the features according to their importance
... then just use the top 5% (10%, 25%, ...)
e.g. for textual data use the frequency of words within a reference
corpus
Feature Count Freq. (fraction of instances)
the 3,032,573 0.879
in 2,919,623 0.846
a 2,903,352 0.841
of 2,888,379 0.837
is 2,639,282 0.765
and 2,634,096 0.763
⋮ ⋮ ⋮
with 1,703,251 0.494
Table: Top 50 words within Wikipedia; the top-ranked word (the) occurs in
88% of all instances.
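Unsupervised ranked selection by frequency in a reference corpus can be sketched as follows. The mini corpus is hypothetical, frequency is taken as the number of instances (documents) containing a word, as in the table, and ties are broken alphabetically so the result is deterministic.

```python
from collections import Counter
from typing import Iterable, List


def document_frequency(docs: Iterable[str]) -> Counter:
    """Number of documents (instances) in which each word occurs at least once."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def top_fraction(df: Counter, fraction: float) -> List[str]:
    """Rank words by document frequency (ties broken alphabetically) and keep the top fraction."""
    ranked = sorted(df, key=lambda w: (-df[w], w))
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]


# Hypothetical mini reference corpus
corpus = ["the cat sat", "the dog ran", "a cat and the dog", "the end"]
df = document_frequency(corpus)
print(df["the"] / len(corpus))   # 1.0: "the" occurs in every instance
print(top_fraction(df, 0.25))    # ['the', 'cat']
```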
Feature Selection
Supervised approaches
Filter approaches
Wrapper approaches
Feature Selection
Supervised approaches
Filter approaches
Compute a measure that estimates each feature's ability to discriminate
between classes
Typically compute a weight per feature and select the best n features →
supervised ranked feature selection
Problems:
Redundant features (correlated features will all have similar weights)
Dependent features (some features may only be important in
combination)
Feature Selection - Information Gain
Example
Features on contact lenses
Ranked attributes (information gain, attribute index, attribute name):
0.5488 4 tear-prod-rate
0.377 3 astigmatism
0.0395 2 spectacle-prescrip
0.0394 1 age
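A filter ranking like the one above can be sketched by scoring every feature independently with information gain, $IG = H(\text{class}) - \sum_v p(v)\, H(\text{class} \mid v)$, and sorting. The toy rows below are hypothetical and are not the contact lens data set.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Entropy (bits) of the class distribution in a list of labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[str], labels: Sequence[str]) -> float:
    """IG = H(class) - sum_v p(feature = v) * H(class | feature = v)."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def rank_features(rows: List[Dict[str, str]], labels: Sequence[str]) -> List[Tuple[str, float]]:
    """Score every feature independently and sort by decreasing information gain."""
    scores = {name: information_gain([r[name] for r in rows], labels) for name in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data (not the contact lens data set)
rows = [
    {"tear_rate": "reduced", "colour": "blue"},
    {"tear_rate": "reduced", "colour": "brown"},
    {"tear_rate": "normal", "colour": "blue"},
    {"tear_rate": "normal", "colour": "brown"},
]
labels = ["none", "none", "soft", "soft"]
print(rank_features(rows, labels))  # [('tear_rate', 1.0), ('colour', 0.0)]
```

Because each feature is scored on its own, an exact copy of `tear_rate` would also receive a score of 1.0, which illustrates the redundancy problem of filter approaches.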
Feature Selection
Supervised approaches
Wrapper approaches
Search through the space of all possible feature subsets
Each candidate subset is evaluated with a learning algorithm
Feature Selection - Wrapper Approach
Wrapper approach
General algorithm:
1 Initial subset selection
2 Try a subset with a learner
3 Modify the feature subset
4 Rerun the learner
5 Measure the difference
6 GOTO 2
Advantages: captures combinations of features, ignores redundant/irrelevant
features
Disadvantage: computationally intensive
Two basic strategies for i) the initial subset selection and ii) the modification of the subset:
forward selection and backward elimination
Feature Selection - Wrapper Approach
Forward Selection
Start with empty set
Tentatively add each feature not yet in the set
Pick the one with the highest increase in performance
Stop if there is no increase
Backward Elimination
Start with full feature set
Try to remove features
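Forward selection can be sketched as a greedy loop around an evaluation function. Here `evaluate` is a hypothetical stand-in for "train the learner on this subset and measure its performance" (for example cross-validated accuracy), and `toy_accuracy` with features `a`, `b` and `noise` is invented purely for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection (wrapper approach).

    `evaluate` stands in for "train the learner on this subset and measure
    its performance", e.g. cross-validated accuracy.
    """
    selected: List[str] = []
    best_score = evaluate(frozenset())  # start with the empty set
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Tentatively add each remaining feature and re-run the learner
        score, feature = max((evaluate(frozenset(selected + [f])), f) for f in candidates)
        if score <= best_score:  # stop if there is no increase
            break
        selected.append(feature)
        best_score = score
    return selected, best_score


def toy_accuracy(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for a learner's validation accuracy."""
    score = 0.5
    if "a" in subset:
        score += 0.3
    if "b" in subset:
        score += 0.1
    if "noise" in subset:
        score -= 0.05  # irrelevant feature slightly hurts (e.g. overfitting)
    return score


print(forward_selection(["a", "b", "noise"], toy_accuracy))  # (['a', 'b'], ~0.9)
```

Backward elimination follows the same pattern with the roles reversed: start from the full set and tentatively remove one feature at a time.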
Feature Interactions
Regularisation
e.g. $L_0$ ... the number of non-zero feature weights, $L_1$ ... the sum of the
absolute feature weights, ...
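The penalty terms can be computed directly on a model's weight vector. A minimal sketch with hypothetical weights; `penalised_loss` and its parameter `lam` are illustrative names for the usual "training loss plus weighted penalty" objective, where an $L_1$ penalty tends to push weights of unhelpful features to exactly zero.

```python
from typing import Sequence


def l0_norm(w: Sequence[float]) -> int:
    """L0: number of non-zero weights (i.e. features actually used)."""
    return sum(1 for x in w if x != 0)


def l1_norm(w: Sequence[float]) -> float:
    """L1: sum of absolute weights."""
    return sum(abs(x) for x in w)


def l2_squared(w: Sequence[float]) -> float:
    """Squared L2: sum of squared weights."""
    return sum(x * x for x in w)


def penalised_loss(data_loss: float, w: Sequence[float], lam: float) -> float:
    """Hypothetical objective: data loss + lam * L1 penalty (lasso-style)."""
    return data_loss + lam * l1_norm(w)


weights = [0.0, 2.5, -1.0, 0.0, 0.5]  # hypothetical model weights
print(l0_norm(weights), l1_norm(weights), l2_squared(weights))  # 3 4.0 7.5
```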
Feature Transformation
Feature transformation
Map features into a higher-dimensional space
Create more features
The more features, the higher the dimensionality
The higher the dimensionality, the higher the chances that the
problem is linearly separable
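The classic XOR problem illustrates this. In the hypothetical sketch below, no line separates the four points in the original 2-D space (checked only on a coarse grid of weights here, though this is known to hold for all weights), but after adding the product $x_1 x_2$ as a third feature a single linear rule separates them.

```python
from itertools import product
from typing import Sequence, Tuple

# XOR-like problem with +/-1 encoding: the label is +1 iff the two signs differ
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [-1, 1, 1, -1]


def separates(w: Sequence[float], b: float, xs, ys) -> bool:
    """True if sign(w . x + b) matches every label (a linear separator)."""
    return all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0 for x, y in zip(xs, ys))


# In the original 2-D space, no linear separator exists (checked here on a coarse grid)
grid = [v / 2 for v in range(-6, 7)]
found_2d = any(separates((w1, w2), b, points, labels) for w1, w2, b in product(grid, repeat=3))


def lift(x: Tuple[int, int]) -> Tuple[int, int, int]:
    """Feature transformation: add the product x1 * x2 as a third feature."""
    return (x[0], x[1], x[0] * x[1])


lifted = [lift(p) for p in points]
found_3d = separates((0, 0, -1), 0, lifted, labels)  # w = (0, 0, -1) uses only x1 * x2

print(found_2d, found_3d)  # False True
```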
Kernel trick
Some algorithms only need scalar products of the feature vectors (e.g. SVMs)
Transform into a higher dimensionality “on-the-fly”
... by introducing a (kernel) function
Original: $\langle x, y \rangle$, with kernel function: $k(x, y) = \langle \varphi(x), \varphi(y) \rangle$
There are a number of well-known kernel functions (e.g. the Gaussian
kernel)
... which often require parameters (to tune)
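For the homogeneous polynomial kernel of degree 2 on 2-D inputs, the explicit feature map $\varphi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ makes the trick visible: the kernel computed in the original space equals the scalar product in the 3-D feature space. A minimal sketch; the inputs and the Gaussian parameter name `gamma` are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Scalar product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Homogeneous polynomial kernel of degree 2: k(x, y) = <x, y>^2."""
    return dot(x, y) ** 2


def gaussian_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); gamma is a parameter to tune."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


x, y = (1.0, 2.0), (3.0, 4.0)
print(dot(phi(x), phi(y)))    # about 121.0, computed in the 3-D feature space
print(poly_kernel(x, y))      # 121.0, computed "on-the-fly" in the original 2-D space
print(gaussian_kernel(x, x))  # 1.0 for identical inputs
```

The Gaussian kernel corresponds to an infinite-dimensional feature map, so there it is only practical to compute it in the kernelised form.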
Thank You!
Next up: Data Matrices
Further information
http://www.cs.cmu.edu/~awm/tutorials
http://www.icg.isy.liu.se/courses/infotheory/lect1.pdf
http://www.cs.princeton.edu/courses/archive/spring10/cos424/slides/18-feat.pdf
http://ufal.mff.cuni.cz/~zabokrtsky/courses/npfl104/html/feature_engineering.pdf
http://www.ke.tu-darmstadt.de/lehre/archiv/ss06/web-mining/wm-features.pdf