
ML Final MCQs


1.

Gradient descent is an optimization algorithm for finding the local minimum of a
function.
a) True
b) False

2. We can use gradient descent as the best solution when the parameters cannot be
calculated analytically.
a) False
b) True

3. Which of the following statements is false about gradient descent?


a) It updates the weights by taking a small step in the direction of the negative
gradient
b) The learning rate parameter is η where η > 0
c) In each iteration, the gradient is re-evaluated for the new weight vector
d) In each iteration, the weight is updated in the direction of the positive gradient

4. In batch gradient descent, each step requires that the entire training set be
processed in order to evaluate the error function.
a) True
b) False

5. Simple gradient descent is a better batch optimization method than conjugate
gradients and quasi-Newton methods.
a) False
b) True

7. The gradient is set to zero to find the minimum or the maximum of a function.
a) False
b) True

8. The main difference between gradient descent variants is based on the amount
of data used.
a) True
b) False

9. Which of the following statements is false about choosing learning rate in gradient
descent?
a) A small learning rate leads to slow convergence
b) A large learning rate can cause the loss function to fluctuate around the minimum
c) A large learning rate can cause divergence
d) A small learning rate causes training to progress very fast
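The questions above can be illustrated with a minimal sketch of gradient descent on a one-dimensional function. This is not from the questions themselves; the function f(x) = (x − 3)², the starting point, and the learning rates are illustrative choices to show convergence with a small step size and divergence with a too-large one.

```python
# Minimal gradient descent sketch for f(x) = (x - 3)^2, whose gradient is 2(x - 3).
# All values (starting point, learning rates, step counts) are illustrative.
def gradient_descent(x0, eta, steps):
    x = x0
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)   # gradient re-evaluated at each iteration
        x = x - eta * grad       # small step in the NEGATIVE gradient direction
    return x

x_small = gradient_descent(0.0, eta=0.1, steps=100)  # small eta: converges to ~3
x_large = gradient_descent(0.0, eta=1.1, steps=10)   # too-large eta: diverges
```

With eta = 0.1 the iterate approaches the minimizer x = 3; with eta = 1.1 each step overshoots and the error grows, matching options a)–c) above.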
1. When we make half-space learning more expressive, the computational
complexity of learning may increase.
a) False
b) True

3. Many learning algorithms for half-spaces can be carried out just on the basis of
the values of the kernel function over pairs of domain points.
a) True
b) False

1. A Support Vector Machine (SVM) is a discriminative classifier defined by a
separating hyperplane.
a) True
b) False

2. Support vector machines cannot be used for regression.
a) False
b) True

3. Which of the following statements is not true about SVM?
a) It is memory efficient
b) It can address a large number of predictor variables
c) It is versatile
d) It doesn’t require feature scaling

4. Which of the following statements is not true about SVM?
a) It has regularization capabilities
b) It handles non-linear data efficiently
c) It has much improved stability
d) Choosing an appropriate kernel function is easy

5. Minimizing a quadratic objective function (Σ wi², for i = 1 to n) subject to
certain constraints is known in SVM as the primal formulation of linear SVMs.
a) True
b) False

7. Which of the following statements is not true about dual formulation in SVM
optimisation problem?
a) No need to access data, need to access only dot products
b) Number of free parameters is bounded by the number of support vectors
c) Number of free parameters is bounded by the number of variables
d) Regularizing the sparse support vector associated with the dual hypothesis is
sometimes more intuitive than regularizing the vector of regression coefficients
8. The optimal classifier is the one with the largest margin.
a) True
b) False

1. The goal of a support vector machine is to find the optimal separating hyperplane
which minimizes the margin of the training data.
a) False
b) True

2. Which of the following statements is not true about the hyperplane in SVM?
a) If a hyperplane is very close to a data point, its margin will be small
b) If a hyperplane is far from a data point, its margin will be large
c) The optimal hyperplane will be the one with the biggest margin
d) If we select a hyperplane which is close to the data points of one class, then
it generalizes well

3. Which of the following statements is not true about the optimal separating
hyperplane?
a) It correctly classifies the training data
b) It is the one which will generalize better with unseen data
c) Finding the optimal separating hyperplane can be formulated as a convex
quadratic programming problem
d) The optimal hyperplane cannot correctly classify all the data while being
farthest away from the data points

4. Support Vector Machines are known as Large Margin Classifiers.
a) True
b) False
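The margin questions above can be made concrete with a small sketch. For a linear classifier w·x + b with labels y ∈ {−1, +1}, the (geometric) margin of a point is y(w·x + b)/‖w‖, and hard-SVM seeks the hyperplane maximizing the smallest such margin. The weight vector and point below are illustrative values, not from the source.

```python
import math

# Geometric margin of a labeled point (x, y) under a linear classifier w.x + b.
# Hard-SVM maximizes the minimum of this quantity over the training set.
def margin(w, b, x, y):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return y * score / math.sqrt(sum(wi * wi for wi in w))

w, b = (3.0, 4.0), -5.0           # ||w|| = 5 (illustrative values)
m = margin(w, b, (2.0, 1.0), +1)  # (6 + 4 - 5) / 5 = 1.0
```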

9. SVM finds out the probability value.
a) True
b) False

1. In SVM, the distances of the support vector points from the hyperplane are called
the margins.
a) True
b) False

2. If the support vector points are farther from the hyperplane, then this hyperplane
can also be called a margin-maximizing hyperplane.
a) True
b) False

3. Which of the following statements is not true about the C parameter in SVM?
a) Large values of C give solutions with fewer misclassification errors
b) Large values of C give solutions with smaller margin
c) Small values of C give solutions with bigger margin
d) Small values of C give solutions with fewer classification errors
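The role of C in the question above can be sketched through the soft-margin primal objective, 0.5‖w‖² + C·Σ hinge losses: a larger C weights misclassification more heavily, a smaller C tolerates errors in exchange for a bigger margin. The data points and weight vector below are illustrative, and this evaluates the objective only; it is not a solver.

```python
# Soft-margin SVM primal objective: 0.5*||w||^2 + C * sum of hinge losses.
# Larger C penalizes margin violations more; smaller C favors a bigger margin.
def soft_margin_objective(w, b, C, data):
    reg = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
                for x, y in data)
    return reg + C * hinge

# Illustrative data: the last point violates the margin of the classifier w = (1, 0).
data = [((2.0, 0.0), +1), ((-1.0, 0.0), -1), ((0.5, 0.0), -1)]
obj_small_C = soft_margin_objective((1.0, 0.0), 0.0, C=0.1, data=data)
obj_large_C = soft_margin_objective((1.0, 0.0), 0.0, C=10.0, data=data)
```

The same violation costs far more under the large C, which is why large C drives solutions toward fewer misclassifications and smaller margins.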

5. The maximum margin linear classifier is the linear classifier with the maximum
margin.
a) True
b) False

6. Which of the following statements is not true about maximum margin?
a) It is safe and empirically works well
b) It is not sensitive to the removal of any non-support-vector data points
c) If the location of the boundary is not perfect due to noise, this gives us the least
chance of misclassification
d) It is not immune to the removal of any non-support-vector data points

7. Hard-SVM is the learning rule that returns an ERM hyperplane separating the
training set with the largest possible margin.
a) True
b) False
8. The output of hard-SVM is the separating hyperplane with the largest margin.
a) True
b) False

1. The Soft SVM assumes that the training set is linearly separable.
a) True
b) False

2. Soft SVM is an extended version of Hard SVM.
a) True
b) False

3. A linear soft-margin SVM can only be used when the training data are linearly
separable.
a) True
b) False

9. The slack variable value of the point on the decision boundary of the Soft SVM is
equal to one.
a) True
b) False
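The slack-variable question above has a direct numeric check: under soft-SVM the slack of a point is ξ = max(0, 1 − y(w·x + b)), so a point exactly on the decision boundary (score 0) has slack 1, while a point on the margin line (score 1 for its class) has slack 0. The classifier and points below are illustrative.

```python
# Slack variable of a labeled point under soft-SVM: xi = max(0, 1 - y*(w.x + b)).
def slack(w, b, x, y):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

w, b = (1.0, 1.0), -2.0                   # illustrative linear classifier
xi_boundary = slack(w, b, (1.0, 1.0), +1)  # on the decision boundary: slack = 1.0
xi_margin = slack(w, b, (2.0, 1.0), +1)    # on the margin line: slack = 0.0
```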

1. Which of the following statements is not true about the Decision tree?
a) It can be applied to binary classification problems only
b) It is a predictor that predicts the label associated with an instance by traveling
from a root node of a tree to a leaf
c) At each node, the successor child is chosen on the basis of a splitting of the input
space
d) The splitting is based on one of the features or on a predefined set of splitting
rules

2. Decision trees use the inductive learning approach of machine learning.
a) True
b) False

3. Which of the following statements is not true about a splitting rule at internal nodes
of the tree based on thresholding the value of a single feature?
a) It moves to the right or left child of the node on the basis of 1[xi < ϑ], where i ∈ [d]
is the index of the relevant feature
b) It moves to the right or left child of the node on the basis of 1[xi < ϑ], where ϑ ∈ R
is the threshold
c) Here a decision tree splits the instance space, X = R^d, into cells, where each leaf
of the tree corresponds to one cell
d) Splits based on thresholding the value of a single feature are also known as
multivariate splits
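The threshold rule 1[xi < ϑ] in the question above can be sketched in a few lines: points with feature i below the threshold are routed to the left child, the rest to the right. The feature index, threshold, and points below are illustrative.

```python
# Univariate threshold split 1[x_i < theta]: route a point to the left child when
# its i-th feature is below the threshold, else to the right child.
def threshold_split(points, i, theta):
    left = [x for x in points if x[i] < theta]
    right = [x for x in points if x[i] >= theta]
    return left, right

pts = [(0.2, 5.0), (0.9, 1.0), (0.4, 3.0)]       # illustrative instances in R^2
left, right = threshold_split(pts, i=0, theta=0.5)  # split on feature 0
```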

2. Practical decision tree learning algorithms are based on heuristics.
a) True
b) False

3. Which of the following statements is not true about the Decision tree?
a) It starts with a tree with a single leaf and assigns this leaf a label according to a
majority vote among all labels over the training set
b) It performs a series of iterations, and on each iteration it examines the effect of
splitting a single leaf
c) It defines some gain measure that quantifies the improvement due to the split
d) Among all possible splits, it either chooses the one that minimizes the gain
and performs it, or chooses not to split the leaf at all

5. Which of the following statements is not true about the ID3 algorithm?
a) It is used to generate a decision tree from a dataset
b) It begins with the original set S as the root node
c) On each iteration of the algorithm, it iterates through every unused attribute of the
set S and calculates the entropy or the information gain of that attribute
d) Finally it selects the attribute which has the largest entropy value

6. Which of the following statements is not true about Information Gain?
a) It is a gain measure that is used in the ID3 algorithms
b) It is the difference between the entropy of the label before and after the split
c) It is based on the decrease in entropy after a data-set is split on an attribute
d) Constructing a decision tree is all about finding the attribute that returns the
lowest information gain

7. Which of the following statements is not true about Information Gain?
a) It is the increase in entropy from transforming a dataset
b) It is calculated by comparing the entropy of the dataset before and after a
transformation
c) It is often used in training decision trees
d) It is also known as Kullback-Leibler divergence

8. Which of the following statements is not true about Information Gain?
a) It is the amount of information gained about a random variable or signal from
observing another random variable
b) It tells us how important a given attribute of the feature vectors is
c) It implies how much entropy we removed
d) Higher Information Gain implies less entropy removed

10. Which of the following statements is not an objective of Information Gain?
a) It tries to determine which attribute in a given set of training feature vectors is
most useful for discriminating between the classes to be learned
b) The Decision Tree algorithm will always try to minimize Information Gain
c) It is used to decide the ordering of attributes in the nodes of a decision tree
d) The Information Gain of a certain event is the discrepancy between the amount of
information before someone observes that event and the amount after observation

11. Information Gain and Gini Index are the same.
a) True
b) False

12. Which of the following statements is not true about Information Gain?
a) It is used to determine which feature/attribute gives us the maximum information
about a class
b) It is based on the concept of entropy, which is the degree of impurity or disorder
c) It aims to reduce the level of entropy starting from the root node to the leaf
nodes
d) It often promotes the level of entropy starting from the root node to the leaf
nodes
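The Information Gain questions above reduce to a short computation: the gain of a split is the entropy of the labels before the split minus the weighted entropy of the label groups after it. The labels and split below are illustrative; a perfect split removes all entropy, giving the maximum gain.

```python
import math
from collections import Counter

# Entropy of a label multiset: -sum p * log2(p) over the class proportions.
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Information Gain: entropy before the split minus the weighted entropy after it.
def information_gain(labels, groups):
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

labels = [1, 1, 0, 0]                              # illustrative binary labels
gain = information_gain(labels, [[1, 1], [0, 0]])  # perfect split: gain = 1 bit
```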

1. In the ID3 algorithm, the returned tree will usually be very large.
a) True
b) False
