ML Final MCQs
2. We can use gradient descent as a solution when the parameters cannot be
calculated analytically.
a) False
b) True
4. In batch gradient descent, each step requires that the entire training set be
processed in order to evaluate the error function.
a) True
b) False
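As a quick illustration of the batch variant, one update for least-squares regression evaluates the gradient over the entire training set before changing the weights. A minimal sketch in Python (the toy data and function name are illustrative, not from any course material):

```python
import numpy as np

def batch_gradient_step(X, y, w, lr):
    """One batch gradient descent step for least-squares regression.

    The gradient is computed over ALL n training examples before
    the single weight update is applied.
    """
    n = len(y)
    residual = X @ w - y               # predictions minus targets, all n examples
    grad = (2.0 / n) * X.T @ residual  # gradient of the mean squared error
    return w - lr * grad

# Toy data generated by y = 2*x, so the optimal weight is w = 2.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w = np.zeros(1)
for _ in range(200):
    w = batch_gradient_step(X, y, w, lr=0.05)
```

After enough full-batch steps, `w` approaches the analytic solution 2.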
7. The gradient is set to zero to find the minimum or the maximum of a function.
a) False
b) True
8. The main difference between gradient descent variants is the amount of data
used to compute each update.
a) True
b) False
9. Which of the following statements is false about choosing learning rate in gradient
descent?
a) A small learning rate leads to slow convergence
b) A large learning rate can cause the loss function to fluctuate around the minimum
c) A large learning rate can cause divergence
d) A small learning rate causes the training to progress very fast
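The trade-offs in the options above can be seen on the toy function f(x) = x², whose gradient is 2x. A minimal sketch (the step counts and rates are arbitrary choices for illustration):

```python
def minimize(lr, steps=50, x0=5.0):
    """Run gradient descent on f(x) = x**2 (gradient 2x) starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

small = minimize(lr=0.01)  # too small: convergence is slow, x is still far from 0
good = minimize(lr=0.1)    # reasonable: x is very close to the minimum at 0
large = minimize(lr=1.1)   # too large: |1 - 2*lr| > 1, so the iterates diverge
```

With lr = 1.1 each step multiplies x by -1.2, so the iterates oscillate with growing magnitude, matching option (c).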
1. When we make half-space learning more expressive, the computational
complexity of learning may increase.
a) False
b) True
3. Many learning algorithms for half-spaces can be carried out just on the basis of
the values of the kernel function over pairs of domain points.
a) True
b) False
7. Which of the following statements is not true about dual formulation in SVM
optimisation problem?
a) No need to access data, need to access only dot products
b) Number of free parameters is bounded by the number of support vectors
c) Number of free parameters is bounded by the number of variables
d) Regularizing the sparse support vector associated with the dual hypothesis is
sometimes more intuitive than regularizing the vector of regression coefficients
8. The optimal classifier is the one with the largest margin.
a) True
b) False
1. The goal of a support vector machine is to find the optimal separating hyperplane
which minimizes the margin of the training data.
a) False
b) True
1. In SVM, the distances of the support vector points from the hyperplane are called
the margins.
a) True
b) False
2. If the support vector points are farther from the hyperplane, then this hyperplane
can also be called a margin-maximizing hyperplane.
a) True
b) False
3. Which of the following statements is not true about the C parameter in SVM?
a) Large values of C give solutions with fewer misclassification errors
b) Large values of C give solutions with a smaller margin
c) Small values of C give solutions with a bigger margin
d) Small values of C give solutions with fewer classification errors
5. The maximum margin linear classifier is the linear classifier with the maximum
margin.
a) True
b) False
7. Hard-SVM is the learning rule that returns an ERM hyperplane separating
the training set with the largest possible margin.
a) True
b) False
8. The output of hard-SVM is the separating hyperplane with the largest margin.
a) True
b) False
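The margin that hard-SVM maximizes can be computed directly for a given hyperplane (w, b): it is the smallest distance of any training point to the hyperplane. A small numpy sketch (the data and helper name are illustrative):

```python
import numpy as np

def margin(w, b, X, y):
    """Geometric margin of a separating hyperplane (w, b) on data (X, y):
    the smallest value of y * (w.x + b) / ||w|| over all training points."""
    return min(yi * (np.dot(w, xi) + b) / np.linalg.norm(w)
               for xi, yi in zip(X, y))

# Two classes separated by the vertical line x1 = 0.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(margin(np.array([1.0, 0.0]), 0.0, X, y))  # 2.0
```

Hard-SVM searches over all separating hyperplanes for the (w, b) that makes this quantity largest.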
1. The Soft SVM assumes that the training set is linearly separable.
a) True
b) False
3. Linear Soft margin SVM can only be used when the training data are linearly
separable.
a) True
b) False
9. The slack variable of a point lying on the decision boundary of the Soft SVM is
equal to one.
a) True
b) False
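In soft-SVM the slack of a point is ξ = max(0, 1 − y(w·x + b)), so a point exactly on the decision boundary (where w·x + b = 0) gets slack 1, while a point on the margin (where y(w·x + b) = 1) gets slack 0. A short sketch (the helper name is mine):

```python
import numpy as np

def slack(w, b, x, y):
    """Soft-SVM slack variable: xi = max(0, 1 - y * (w.x + b))."""
    return max(0.0, 1.0 - y * (np.dot(w, x) + b))

w, b = np.array([1.0, 0.0]), 0.0
# Point exactly on the decision boundary (w.x + b = 0) -> slack = 1
print(slack(w, b, np.array([0.0, 3.0]), y=1))  # 1.0
# Point exactly on the margin (y * (w.x + b) = 1) -> slack = 0
print(slack(w, b, np.array([1.0, 0.0]), y=1))  # 0.0
```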
1. Which of the following statements is not true about the Decision tree?
a) It can be applied on binary classification problems only
b) It is a predictor that predicts the label associated with an instance by traveling
from a root node of a tree to a leaf
c) At each node, the successor child is chosen on the basis of a splitting of the input
space
d) The splitting is based on one of the features or on a predefined set of splitting
rules
3. Which of the following statements is not true about a splitting rule at internal nodes
of the tree based on thresholding the value of a single feature?
a) It moves to the right or left child of the node on the basis of 1[xi < ϑ], where i ∈ [d]
is the index of the relevant feature
b) It moves to the right or left child of the node on the basis of 1[xi < ϑ], where ϑ ∈ R
is the threshold
c) Here a decision tree splits the instance space, X = Rd, into cells, where each leaf
of the tree corresponds to one cell
d) Splits based on thresholding the value of a single feature are also known as
multivariate splits
3. Which of the following statements is not true about the Decision tree?
a) It starts with a tree with a single leaf and assigns this leaf a label according to a
majority vote among all labels over the training set
b) It performs a series of iterations, and on each iteration it examines the effect of
splitting a single leaf
c) It defines some gain measure that quantifies the improvement due to the split
d) Among all possible splits, it either chooses the one that minimizes the gain
and performs it, or chooses not to split the leaf at all
5. Which of the following statements is not true about the ID3 algorithm?
a) It is used to generate a decision tree from a dataset
b) It begins with the original set S as the root node
c) On each iteration of the algorithm, it iterates through every unused attribute of the
set S and calculates the entropy or the information gain of that attribute
d) Finally it selects the attribute which has the largest entropy value
12. Which of the following statements is not true about Information Gain?
a) It is used to determine which feature/attribute gives us the maximum information
about a class
b) It is based on the concept of entropy, which is the degree of impurity or disorder
c) It aims to reduce the level of entropy starting from the root node down to the leaf
nodes
d) It often promotes the level of entropy starting from the root node down to the
leaf nodes
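Entropy and information gain as used by ID3 can be sketched with the standard library: the gain of a split is the parent's entropy minus the size-weighted entropy of the children (function names here are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list: the degree of impurity or disorder."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = [1, 1, 0, 0]                              # entropy 1.0: maximally impure
gain = information_gain(parent, [[1, 1], [0, 0]])  # a perfect split: pure children
print(gain)  # 1.0
```

ID3 picks the attribute whose split yields the largest such gain, i.e. the largest reduction in entropy.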
1. In the ID3 algorithm, the returned tree will usually be very large.
a) True
b) False