
United International University

Department of Computer Science and Engineering


CSE 389 Machine Learning,
Final Exam, Summer 2017
Total Marks: 40, Time: 2 Hours

Answer any 4 questions (4 × 10 = 40).

1. (a) How can you detect whether your method is over-fitting or not? [2]
(b) Suppose you are learning a model from your training data using a Support Vector Machine (SVM) and find
that your method is over-fitting. What steps would you take to eliminate or reduce the over-fitting? [3]
(c) Consider the following data:

x1     2    1   -1   -2    1    2   -1   -2
x2     2    1   -1   -2   -1   -2    1    2
y     +1   +1   +1   +1   -1   -1   -1   -1

Here, x1 and x2 are the features and y is the target label. Clearly, this data is not linearly separable. Is it
possible to apply a support vector machine (SVM) classifier to this data? If so, how? [3]
(d) SVM is a maximum margin classifier. However, the goal is to minimize the following term:

\[ \min_{w,b} \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 \]

Is it justified to minimize this expression to find a maximum margin? [2]

2. (a) Why do bootstrap sampling methods work despite the presence of outliers in the dataset? [2]
(b) Design a classification algorithm that uses bagging and write its pseudo-code. (Please do not write the
Random Forest algorithm here.) How is your algorithm different from the Random Forest classifier
algorithm? [4]
(c) In an iteration of the AdaBoost classifier, the weights of the correctly classified instances were 0.2, 0.15, 0.15, 0.1, 0.05 and
the weights of the incorrectly classified instances were 0.5, 0.15, 0.1, 0.5. What is the current error rate? What will
be the updated weights for the next iteration? [2]
(d) The decision function of the AdaBoost classifier is as follows:

\[ H(\vec{x}) = \mathrm{sign}\left( \sum_{t=1}^{M} \alpha_t h_t(\vec{x}) \right) \]

What is the significance of αt in this expression? [2]

3. (a) How do decision tree algorithms like CART handle numerical attributes and numerical labels? [3]
(b) Explain the process of decision tree pruning. [3]
(c) Which feature selection technique do you think is the best, and why? [2]
(d) This is a problem related to feature selection. While training your classifier, you notice that the training error
rate you achieve, as a function of the number of features, looks like the left-most plot in the following figure:
Which of the plots (a), (b), or (c) in this figure is most likely to reflect the error rate of your classifier on a held-out
validation set (as a function of the number of features), and why? [2]

4. (a) What is the optimal space complexity of the Viterbi algorithm? [2]
(b) Consider the following Hidden Markov Model:

There are two states, denoted by 0 and 1, and two symbols, A and B. Suppose your first three observations
are O1 = A, O2 = B, O3 = A. What is the most likely sequence of states? [6]
(c) Suppose you are using an SVM to learn a model. Consider the following three graphs (left, center and right), each
showing the number of samples needed for training against the number of features. Which
graph is the most justified and reflects good generalization properties? [2]

5. (a) What are the major factors and advances that paved the way for the success of modern deep learning? Justify your answer. [3]
(b) Briefly explain the idea of auto-encoders and their applications. [2]
(c) ‘Depth is a more desired parameter compared to width in a deep neural network.’ - Justify this statement in the
context of deep learning. [3]
(d) Mr. V proposes minimizing the following expression for the SVM classifier instead of the one in Question 1d:

\[ \min_{w,b} \left( \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \zeta_i \right) \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \zeta_i \]

Here, m is the total number of instances in the training dataset. The term ζi ≥ 0 denotes the classification error
for each instance. Mr. V is considering two values of C, namely C = 1 and C = 1000, for a dataset
where 40% of the training instances are misclassified if the expression in Question 1d is used. What value of C would you
suggest to Mr. V, and why? [2]
