
Machine Learning Bits


1.

Which of the following is a widely used and effective machine learning algorithm based on the
idea of bagging?

Decision Tree

Regression

Classification

Random Forest D

2. To find the minimum or the maximum of a function, we set the gradient to zero because:

The value of the gradient at the extrema of a function is always zero

Depends on the type of problem

Both A and B

None of the above A

3. The most widely used metrics and tools to assess a classification model are:

Confusion matrix

Cost-sensitive accuracy

Area under the ROC curve

All of the above D

4. Which of the following is a good test dataset characteristic?

Large enough to yield meaningful results

Is representative of the dataset as a whole

Both A and B

None of the above C

5. Which of the following is a disadvantage of decision trees?

Factor analysis

Decision trees are robust to outliers

Decision trees are prone to be overfit

None of the above C

6. How do you handle missing or corrupted data in a dataset?


Drop missing rows or columns

Replace missing values with mean/median/mode

Assign a unique category to missing values

All of the above D

7. What is the purpose of performing cross-validation?

To assess the predictive performance of the models

To judge how the trained model performs outside the sample on test data

Both A and B

None of the above C

8. Why is second order differencing in time series needed?

To remove stationarity

To find the maxima or minima at the local point

Both A and B

None of the above C

9. When performing regression or classification, which of the following is the correct way to
preprocess the data?

Normalize the data → PCA → training

PCA → normalize PCA output → training

Normalize the data → PCA → normalize PCA output → training

None of the above A

10. Which of the following is an example of feature extraction?

Constructing bag of words vector from an email

Applying PCA to project large high-dimensional data

Removing stop words in a sentence

All of the above D

11. What is pca.components_ in Sklearn?

Set of all eigen vectors for the projection space


Matrix of principal components

Result of the multiplication matrix

None of the above options A

12. Which of the following is true about Naive Bayes ?

Assumes that all the features in a dataset are equally important

Assumes that all the features in a dataset are independent

Both A and B

None of the above options C

13. Which of the following statements about regularization is not correct?

Using too large a value of lambda can cause your hypothesis to underfit the data.

Using too large a value of lambda can cause your hypothesis to overfit the data.

Using a very large value of lambda cannot hurt the performance of your hypothesis.

None of the above D

14. How can you prevent a clustering algorithm from getting stuck in bad local optima?

Set the same seed value for each run

Use multiple random initializations

Both A and B

None of the above B

15. Which of the following techniques can be used for normalization in text mining?

Stemming

Lemmatization

Stop Word Removal

Both A and B D

16. In which of the following cases will K-means clustering fail to give good results? 1) Data points
with outliers 2) Data points with different densities 3) Data points with non-convex shapes

1 and 2

2 and 3
1, 2, and 3

1 and 3 C

17. Which of the following is a reasonable way to select the number of principal components "k"?

Choose k to be the smallest value so that at least 99% of the variance is retained.

Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).

Choose k to be the largest value so that 99% of the variance is retained.

Use the elbow method A
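
As a rough illustration of the first option in question 17, the sketch below picks the smallest k whose cumulative share of variance reaches 99%. The eigenvalues are made-up numbers chosen for illustration, and smallest_k is a hypothetical helper name, not part of any library.

```python
def smallest_k(eigenvalues, retained=0.99):
    """Return the smallest k whose top-k eigenvalues retain the given variance share."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retained:
            return k
    return len(ordered)


# Hypothetical covariance-matrix eigenvalues (assumed for illustration only).
example_eigenvalues = [50.0, 30.0, 15.0, 4.5, 0.3, 0.2]
k = smallest_k(example_eigenvalues)
print(k)
```

With these assumed values, three components retain 95% of the variance and four retain 99.5%, so four is the smallest k that meets the threshold.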

18. You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration.
You find that the value of J(θ) decreases quickly and then levels off. Based on this, which of
the following conclusions seems most plausible?

Rather than using the current value of α, use a larger value of α (say α = 1.0)

Rather than using the current value of α, use a smaller value of α (say α = 0.1)

α = 0.3 is an effective choice of learning rate

None of the above C

19. What is a sentence parser typically used for?

It is used to parse sentences to check if they are utf-8 compliant.

It is used to parse sentences to derive their most likely syntax tree structures.

It is used to parse sentences to assign POS tags to all tokens.

It is used to check if sentences can be parsed into meaningful tokens. B

20. Suppose you have trained a logistic regression classifier and, for a new example x, it outputs a
prediction hθ(x) = 0.2. This means:

Our estimate for P (y=1 | x)

Our estimate for P (y=0 | x)

Our estimate for P (y=1 | x)

Our estimate for P (y=0 | x) B

21. Which of the following is an example of a deterministic algorithm?

PCA

K-Means
None of the above

Both A and B A

22. "Which of the following statement(s) is/are true for Gradient Descent (GD) and Stochastic
Gradient Descent (SGD)? 1. In GD and SGD, you update a set of parameters in an iterative manner
to minimize the error function. 2. In SGD, you have to run through all the samples in your training
set for a single update of a parameter in each iteration. 3. In GD, you either use the entire data or a
subset of training data to update a parameter in each iteration."

Only 1

Only 2

Only 3

1 and 2 A

23. Which of the following hyperparameter(s), when increased, may cause a random forest to overfit
the data? 1. Number of trees 2. Depth of tree 3. Learning rate

Only 1

Only 2

Only 3

1 and 2 B

24. Below are the 8 actual values of target variable in the train file.[0,0,0,1,1,1,1,1]. What is the
entropy of the target variable?

-(5/8 log(5/8) + 3/8 log(3/8))

5/8 log(5/8) + 3/8 log(3/8)

3/8 log(5/8) + 5/8 log(3/8)

5/8 log(3/8) – 3/8 log(5/8) A
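
The entropy in question 24 can be checked numerically. This is a minimal sketch that assumes a base-2 logarithm (the question only writes "log"); entropy is a hypothetical helper name.

```python
import math
from collections import Counter


def entropy(labels, base=2):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


target = [0, 0, 0, 1, 1, 1, 1, 1]
h = entropy(target)  # -(5/8 * log2(5/8) + 3/8 * log2(3/8))
print(round(h, 4))
```

The leading minus sign is what makes the result positive, since each log term is negative for probabilities below 1.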

25. "Let’s say, you are working with categorical feature(s) and you have not looked at the distribution
of the categorical variable in the test data. You want to apply one hot encoding (OHE) on the
categorical feature(s). What challenges you may face if you have applied OHE on a categorical
variable of train dataset?

All categories of categorical variable are not present in the test dataset

Frequency distribution of categories is different in train as compared to the test dataset.

Train and Test always have same distribution.

Both A and B D

26. Let’s say, you are using activation function X in hidden layers of neural network. At a particular
neuron for any given input, you get the output as “-0.0001” Which of the following activation
function could X represent?
ReLU

tanh

SIGMOID

None of these B
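
One way to see why only tanh fits question 26 is to compare the output ranges of the three activations. The sketch below evaluates each on an arbitrary grid of inputs (the grid is an assumption for illustration) and records the lowest output seen.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


activations = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh}

# Inputs from -5.0 to 5.0 in steps of 0.1 (an arbitrary illustrative grid).
inputs = [i / 10 for i in range(-50, 51)]
lowest = {name: min(f(v) for v in inputs) for name, f in activations.items()}

# ReLU never goes below 0 and sigmoid stays strictly above 0,
# so a small negative output such as -0.0001 can only come from tanh.
print(lowest)
```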

27. LogLoss evaluation metric can have negative values.

TRUE FALSE B
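
A small sketch of binary log loss shows why it cannot be negative: each term is the negative log of a probability in (0, 1]. The labels and predicted probabilities below are invented for illustration, and the clipping constant eps is an assumption used to avoid log(0).

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels and predicted probabilities.
loss = log_loss([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4])
perfect = log_loss([1, 0], [1.0, 0.0])
print(loss, perfect)
```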

28. "Which of the following statements is/are true about “Type-1” and “Type-2” errors?

1. Type 1 is known as false positive and Type 2 is known as false negative.

2. Type 1 is known as false negative and Type 2 is known as false positive.

3. A Type 1 error occurs when we reject a null hypothesis that is actually true."

Only 1

Only 2

Only 3

1 and 3 D

29. "Which of the following is/are important step(s) to pre-process the text in NLP-based
projects? 1. Stemming 2. Stop word removal 3. Object standardization"

1 and 2

1 and 3

2 and 3

1,2 and 3 D

30. Suppose you want to project high dimensional data into lower dimensions. The two most
famous dimensionality reduction algorithms used here are PCA and t-SNE. Let’s say you have
applied both algorithms respectively on data “X” and you got the datasets “X_projected_PCA” and
“X_projected_tSNE”. Which of the following statements is true for “X_projected_PCA” and
“X_projected_tSNE”?

X_projected_PCA will have interpretation in the nearest neighbour space.

X_projected_tSNE will have interpretation in the nearest neighbour space.


Both will have interpretation in the nearest neighbour space.

None of them will have interpretation in the nearest neighbour space. B

31. "Adding a non-important feature to a linear regression model may result in.
1.Increase in R-square 2.Decrease in R-square"

Only 1 is correct

Only 2 is correct

Either 1 or 2

None of these A

32. "Suppose, you are given three variables X, Y and Z. The Pearson correlation coefficients for (X,
Y), (Y, Z) and (X, Z) are C1, C2 & C3 respectively. Now, you have added 2 in all values of X (i.e
new values become X+2), subtracted 2 from all values of Y (i.e. new values are Y-2) and Z
remains the same. The new coefficients for (X,Y), (Y,Z) and (X,Z) are given by D1, D2 & D3
respectively. How do the values of D1, D2 & D3 relate to C1, C2 & C3?

D1= C1, D2 < C2, D3 > C3

D1 = C1, D2 > C2, D3 < C3

D1 = C1, D2 = C2, D3 = C3

Cannot be determined C
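
The answer to question 32 follows from the fact that Pearson correlation subtracts each variable's mean, so adding or subtracting a constant changes nothing. A minimal check, using invented sample values for X, Y and Z:

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical data, assumed for illustration.
X = [1.0, 2.0, 4.0, 7.0, 11.0]
Y = [2.0, 1.0, 5.0, 6.0, 9.0]
Z = [3.0, 8.0, 1.0, 4.0, 2.0]

X_shifted = [x + 2 for x in X]
Y_shifted = [y - 2 for y in Y]

c = (pearson(X, Y), pearson(Y, Z), pearson(X, Z))
d = (pearson(X_shifted, Y_shifted), pearson(Y_shifted, Z), pearson(X_shifted, Z))
print(c, d)
```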

33. "Imagine, you are solving a classification problem with highly imbalanced class. The majority
class is observed 99% of times in the training data. Your model has 99% accuracy after taking the
predictions on test data. Which of the following is true in such a case?

1.Accuracy metric is not a good idea for imbalanced class problems.

2.Accuracy metric is a good idea for imbalanced class problems.

3.Precision and recall metrics are good for imbalance class problems.

4.Precision and recall metrics aren’t good for imbalanced class problems"

1 and 3

1 and 4

2 and 3

2 and 4 A
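
To make the point of question 33 concrete, the sketch below scores a degenerate model that always predicts the majority class. The 990/10 split is a hypothetical test set matching the 99% figure in the question, and reporting precision as 0.0 when nothing is predicted positive is a convention assumed here.

```python
# Hypothetical test set: 990 majority-class (0) and 10 minority-class (1) examples.
y_true = [0] * 990 + [1] * 10
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
recall = tp / (tp + fn) if (tp + fn) else 0.0
# Precision is undefined with no positive predictions; 0.0 is used by convention.
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(accuracy, precision, recall)
```

Accuracy looks excellent while recall on the minority class is zero, which is why precision and recall are the more informative metrics here.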
34. "In ensemble learning, you aggregate the predictions for weak learners, so that an ensemble of
these models will give a better prediction than prediction of individual models.Which of the
following statements is / are true for weak learners used in ensemble model?

1.They don’t usually overfit.

2.They have high bias, so they cannot solve complex learning problems

3.They usually overfit.

1 and 2 1 and 3 2 and 3 Only 1 A

35. Which of the following options is/are true for K-fold cross-validation? 1. An increase in K will result in
a higher time required to cross-validate the result. 2. Higher values of K will result in higher confidence in
the cross-validation result as compared to a lower value of K. 3. If K = N, then it is called leave-one-out
cross-validation, where N is the number of observations.

1 and 2

2 and 3

1 and 3

1, 2 and 3 D

36.What would you do in PCA to get the same projection as SVD?

Transform data to zero mean

Transform data to zero median

Not possible

None of these A

37."It is possible to construct a k-NN classification algorithm based on this black box alone.

Note: Where n (number of training observations) is very large compared to k.

TRUE FALSE A

38. "Instead of using 1-NN black box we want to use the j-NN (j>1) algorithm as black box.
Which of the following options is correct for finding k-NN using j-NN? 1. j must be a proper
factor of k. 2. j > k. 3. Not possible"

1 2 3 4 A
39.Which of the following value of K will have least leave-one-out cross validation accuracy?

1NN
3NN

4NN

All have same leave one out error A

40. "Suppose we have a dataset which can be trained with 100% accuracy with the help of a
decision tree of depth 6. Now consider the points below and choose the option based
on these points. Note: All other hyperparameters are the same and other factors are not affected.

1. Depth 4 will have high bias and low variance 2. Depth 4 will have low bias and low
variance"

Only 1

Only 2

Both 1 and 2

None of the above A

41 "Which of the following options can be used to get global minima in k-Means Algorithm? 1.Try to run
algorithm for different centroid initialization 2.Adjust number of iterations 3.Find out the optimal number
of clusters"

2 and 3

1 and 3

1 and 2

All of above D

42 "For which of the following hyper parameters, higher value is better for decision tree algorithm?
1.Number of samples used for split 2.Depth of tree 3.Samples for leaf"

1 and 2

2 and 3

1 and 3

Can’t say D

43 What is the dimension of output feature map when you are using the given parameters.

28 width, 28 height and 8 depth


13 width, 13 height and 8 depth

28 width, 13 height and 8 depth

13 width, 28 height and 8 depth A

44 What is the dimensions of output feature map when you are using following parameters.

28 width,28 height and 8 depth

13 width, 13 height and 8 depth

28 width, 13 height and 8 depth

13 width, 28 height and 8 depth B

45. k-NN algorithm does more computation on test time rather than train time.

TRUE FALSE A

46. Which of the following option is true about k-NN algorithm?

It can be used for classification

It can be used for regression

It can be used in both classification and regression

none of these C

47. "Which of the following statement is true about k-NN algorithm?1.k-NN performs much better if all
of the data have the same scale 2.k-NN works well with a small number of input variables (p),but
struggles when the number of inputs is very large3.k-NN makes no assumptions about the functional form
of the problem being solved"

1 and 2

1 and 3

Only 1

All of the above D

48. Which of the following machine learning algorithm can be used for imputing missing values of both
categorical and continuous variables?

K-NN

Linear Regression

Logistic Regression A
49. Which of the following is true about Manhattan distance?

It can be used for continuous variables

It can be used for categorical variables

It can be used for categorical as well as continuous

None of these A

50. "Which of the following distance measures do we use in case of categorical variables in k-NN?
1. Hamming Distance 2. Euclidean Distance 3. Manhattan Distance"

1 2 3 4 A
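
Hamming distance simply counts the positions at which two records disagree, which is why it suits categorical features. A minimal sketch with invented records (hamming_distance is a hypothetical helper name):

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical categorical records: (colour, size, shape).
row_1 = ("red", "small", "round")
row_2 = ("red", "large", "round")
distance = hamming_distance(row_1, row_2)
print(distance)
```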

51. Which of the following will be the Euclidean distance between the two data points A(1,3) and B(2,3)?

1 2 4 8 A

52. Which of the following will be the Manhattan distance between the two data points A(1,3) and B(2,3)?

1 2 4 8 A
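
Both distances for A(1,3) and B(2,3) can be computed directly. The points differ only in the first coordinate, so the two measures coincide here.

```python
import math

A = (1, 3)
B = (2, 3)

# Euclidean: square root of the sum of squared coordinate differences.
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(A, B)))
# Manhattan: sum of absolute coordinate differences.
manhattan = sum(abs(a - b) for a, b in zip(A, B))

print(euclidean, manhattan)
```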

53. Which of the following will be true about k in k-NN in terms of bias?

When you increase k, the bias will increase

When you decrease k, the bias will increase

Can’t say

None of these A

54. Which of the following will be true about k in k-NN in terms of variance?

When you increase k, the variance will increase

When you decrease k, the variance will increase

Can’t say

None of these B

55.When you find noise in data which of the following option would you consider in k-NN?

I will increase the value of k

I will decrease the value of k

Noise cannot be dependent on value of k

None of these A
56."In k-NN it is very likely to overfit due to the curse of dimensionality. Which of the following option
would you consider to handle such problem? 1.Dimensionality Reduction 2.Feature selection"

1 2 1 AND 2 NONE OF THESE C

57. "Which of the following is/are true about Random Forest and Gradient Boosting ensemble methods?
1. Both methods can be used for classification tasks 2. Random Forest is used for classification whereas
Gradient Boosting is used for regression tasks 3. Random Forest is used for regression whereas Gradient
Boosting is used for classification tasks 4. Both methods can be used for regression tasks"

1 2 4 1 AND 4 D

58. Which of the following algorithms is not an example of an ensemble learning algorithm?

Random Forest

Decision Trees

Extra Trees

Gradient Boosting B

59."Suppose you are using a bagging based algorithm say a Random Forest in model building. Which of
the following can be true? 1. Number of tree should be as large as possible 2.You will have
interpretability after using Random Forest"

1 2 1 AND 2 NONE OF THESE A

60. A _________ is a decision support tool that uses a tree-like graph or model of decisions and their
possible consequences, including chance event outcomes, resource costs, and utility.

Decision tree

Graphs

Trees

Neural Networks A

61. Which of the following are the advantage/s of Decision Trees?

Possible Scenarios can be added

Use a white box model, If given result is provided by a model

Worst, best and expected values can be determined for different scenarios

All of the mentioned D

62. Decision Trees can be used for Classification Tasks.


TRUE FALSE A

63.Which is true for neural networks?

It has set of nodes and connections

Each node computes its weighted input

Node could be in excited state or non-excited state

All of the mentioned D

64."Which of the following is true for neural networks? (i) The training time depends on the size of the
network.(ii) Neural networks can be simulated on a conventional computer.(iii) Artificial neurons are
identical in operation to biological ones."

All of the mentioned

(ii) is true

(i) and (ii) are true

None of the mentioned C

65 Which algorithm is used for solving temporal probabilistic reasoning?

Hill-climbing search

Hidden Markov model

Depth-first search

Breadth-first search B

66 How is the state of the process described in an HMM?

Literal

Single random variable

Single discrete random variable

None of the mentioned C

67 Where are the additional variables added in an HMM?

Temporal model

Reality model

Probability model
All of the mentioned A

68.Which of the following is a representation learning algorithm?

Neural network

Random Forest

k-Nearest neighbor

None of the above A

69. Increase in size of a convolutional kernel would necessarily increase the performance of a
convolutional neural network.

TRUE FALSE B

70. Which of the following categories would be suitable for this type of problem?

Fine tune only the last couple of layers and change the last layer (classification layer) to
regression layer

Freeze all the layers except the last, re-train the last layer

Re-train the model for the new dataset

None of these A

71. Suppose you have 5 convolutional kernels of size 7 x 7 with zero padding and stride 1 in the first layer
of a convolutional neural network. You pass an input of dimension 224 x 224 x 3 through this layer. What
are the dimensions of the data which the next layer will receive?

217 x 217 x 3

217 x 217 x 8

218 x 218 x 5

220 x 220 x 7 C
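
The answer to question 71 can be checked with the usual output-size formula, floor((n + 2p - k) / s) + 1. This sketch reads "zero padding" as a padding of 0 pixels, which is the interpretation consistent with the answer key; conv_output_size is a hypothetical helper name.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


height = conv_output_size(224, kernel=7, padding=0, stride=1)
width = conv_output_size(224, kernel=7, padding=0, stride=1)
depth = 5  # one output channel per kernel, regardless of the 3 input channels

print(height, width, depth)
```

The output depth equals the number of kernels, not the number of input channels, because each kernel spans all input channels and produces a single feature map.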

72. "Suppose we have a neural network with ReLU activation function. Let’s say, we replace
ReLu activations by linear activations.Would this new neural network be able to approximate an XNOR
function? Note: The neural network was able to approximate XNOR function with activation function
ReLu."

YES NO B

73. "Which of the following is a data augmentation technique used in image recognition tasks?
1.Horizontal flipping 2.Random cropping 3.Random scaling. 4.Color jittering 5.Random
translation.6.Random shearing
1, 2, 4

2, 3, 4, 5, 6

1, 3, 5, 6

All of these D

74. "Given an n-character word, we want to predict which character would be the n+1th character in the
sequence. For example, our input is “predictio”(which is a 9-character word) and we have to predict what
would be the 10th character.Which neural network architecture would be suitable to complete this task?"

Fully-Connected Neural Network

Convolutional Neural Network

Recurrent Neural Network

Restricted Boltzmann Machine C

75.What is generally the sequence followed when building a neural network architecture for semantic
segmentation for image?

Convolutional network on input and deconvolutional network on output

Deconvolutional network on input and convolutional network on output A

76. What is the technical difference between vanilla back propagation algorithm and back propagation
through time (BPTT) algorithm?

Unlike backprop, in BPTT we sum up gradients for corresponding weight for each time step

Unlike backprop, in BPTT we subtract gradients for corresponding weight for each
time step A

77. "Exploding gradient problem is an issue in training deep networks where the gradient gets so large
that the loss goes to an infinitely high value and then explodes. What is the probable approach when
dealing with “Exploding Gradient” problem in RNNs?"

Use modified architectures like LSTM and GRUs

Gradient clipping

Dropout

None of these B
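
Gradient clipping, the answer to question 77, is commonly done by rescaling the gradient whenever its norm exceeds a threshold. The sketch below shows one such variant (clipping by L2 norm); the gradient values and threshold are invented, and clip_by_norm is a hypothetical helper name.

```python
import math


def clip_by_norm(gradient, max_norm):
    """Rescale the gradient so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


# Hypothetical exploding gradient with norm 50, clipped to norm 5.
clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
print(clipped, clipped_norm)
```

Rescaling keeps the direction of the update and only limits its size, so a single huge gradient cannot throw the weights far from their current values.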

78.Which of the following is not a direct prediction technique for NLP tasks?
Recurrent Neural Network

Skip-gram model

PCA

Convolutional neural network C

79. Back propagation works by first calculating the gradient of ___ and then propagating it backwards.

Sum of squared error with respect to inputs

Sum of squared error with respect to weights

Sum of squared error with respect to outputs

None of the above C

80. A recurrent neural network can be unfolded into a full-connected neural network with infinite length

TRUE FALSE A

81. It is generally recommended to replace pooling layers in generator part of convolutional generative
adversarial nets with ________ ?

Affine layer

Strided convolutional layer

Fractional strided convolutional layer

ReLU layer C

82. In a neural network, knowing the weight and bias of each neuron is the most important step. If you
can somehow get the correct value of weight and bias for each neuron, you can approximate any function.
What would be the best way to approach this?

Assign random values and pray to God they are correct

Search every possible combination of weights and biases till you get the best value

Iteratively check, after assigning a value, how far you are from the best values,
and slightly change the assigned values to make them better

None of these C

83. "What are the steps for using a gradient descent algorithm? 1. Calculate the error between the actual value
and the predicted value 2. Reiterate until you find the best weights of the network 3. Pass an input through the
network and get values from the output layer 4. Initialize random weights and biases 5. Go to each neuron
that contributes to the error and change its respective values to reduce the error"
1, 2, 3, 4, 5

5, 4, 3, 2, 1

3, 2, 1, 5, 4

4, 3, 1, 5, 2 D
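
The ordering 4, 3, 1, 5, 2 in question 83 can be sketched on a toy problem. The example below fits a single-weight linear model to data generated from y = 2x; the data, learning rate and iteration budget are assumptions chosen for illustration, and the comments map each line to the numbered steps.

```python
import random

random.seed(0)

# Toy data generated from y = 2x (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]

# Step 4: initialise the weight and bias randomly.
w = random.uniform(-1, 1)
b = random.uniform(-1, 1)
learning_rate = 0.05

# Step 2: repeat the update (a fixed budget stands in for "until the best weights").
for _ in range(2000):
    # Step 3: pass the inputs through the model to get predictions.
    preds = [w * x + b for x in xs]
    # Step 1: compute the error between predicted and actual values.
    errors = [p - y for p, y in zip(preds, ys)]
    # Step 5: move each parameter against the gradient of the mean squared error.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)
```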

84.“Convolutional Neural Networks can perform various types of transformation (rotations or scaling) in
an input”. Is the statement correct True or False?

TRUE FALSE B

85 Which of the following techniques perform similar operations as dropout in a neural network?

Bagging

Boosting

Stacking

None of these A

86 Which of the following gives non-linearity to a neural network?

Stochastic Gradient Descent

Rectified Linear Unit

Convolution function

None of the above B

87. "What is the sequence of the following tasks in a perceptron? 1.Initialize weights of perceptron
randomly. 2.Go to the next batch of dataset. 3.If the prediction does not match the output, change the
weights 4.For a sample input, compute an output"

1, 2, 3, 4

4, 3, 2, 1

3, 1, 2, 4

1, 4, 3, 2 D

88 Can a neural network model the function (y=1/x)?

YES NO A

89 In which neural net architecture, does weight sharing occur?

Convolutional neural Network


Recurrent Neural Network

Fully Connected Neural Network

Both A and B D

90. The number of neurons in the output layer should match the number of classes (Where the number of
classes is greater than 2) in a supervised learning task. True or False?

TRUE FALSE B

91.In a neural network, which of the following techniques is used to deal with overfitting?

Dropout

Regularization

Batch Normalization

All of these D

92. "Y = ax^2 + bx + c (polynomial equation of degree 2) Can this equation be represented by a neural
network of single hidden layer with linear threshold?"

YES NO B

93. What is a dead unit in a neural network?

A unit which doesn’t update during training by any of its neighbour

A unit which does not respond completely to any of the training patterns

The unit which produces the biggest sum-squared error

None of these A

94. Which of the following statement is the best description of early stopping?

Train the network until a local minimum in the error function is reached

Simulate the network on a test dataset after every epoch of training. Stop training when the
generalization error starts to increase

Add a momentum term to the weight update in the Generalized Delta Rule, so that training
converges more quickly

A faster version of backpropagation, such as the `Quickprop’ algorithm B

95. What if we use a learning rate that’s too large?

Network will converge


Network will not converge

BOTH

Can’t Say B

96. Suppose a convolutional neural network is trained on ImageNet dataset (Object recognition dataset).
This trained model is then given a completely white image as an input . The output probabilities for this
input would be equal for all classes. True or False?

TRUE FALSE B

97. When pooling layer is added in a convolutional neural network, translation in-variance is preserved.
True or False?

TRUE FALSE A

98. Which gradient technique is more advantageous when the data is too big to handle in
RAM simultaneously?

Full Batch Gradient Descent

Stochastic Gradient Descent B

99.For a classification task, instead of random weight initializations in a neural network, we set all the
weights to zero. Which of the following statements is true?

There will not be any problem and the neural network will train properly

The neural network will train but all the neurons will end up recognizing the same
thing

The neural network will not train as there is no net gradient change

None of these B

100 For an image recognition problem (recognizing a cat in a photo), which architecture of
neural network would be better suited to solve the problem?

Multi- Layer Perceptron

Convolutional Neural Network

Recurrent Neural network

Perceptron B

101. "What are the factors to select the depth of a neural network? 1. Type of neural network (e.g. MLP,
CNN, etc.) 2. Input data 3. Computation power, i.e. hardware capabilities and software capabilities
4. Learning rate 5. The output function to map"

1, 2, 4, 5

2, 3, 4, 5

1, 3, 4, 5

All of these D

102. Consider the scenario. The problem you are trying to solve has a small amount of data.
Fortunately, you have a pre-trained neural network that was trained on a similar problem. Which of the
following methodologies would you choose to make use of this pre-trained network?

Re-train the model for the new dataset

Assess on every layer how the model performs and only select a few of them

Fine tune the last couple of layers only

Freeze all the layers except the last, re-train the last layer D

103 Increase in size of a convolutional kernel would necessarily increase the performance of a
convolutional network

TRUE FALSE B

104 Which of the following are universal approximators?

Kernel SVM

Neural Networks

Boosted Decision Trees

All of the above D

105 In which of the following applications can we use deep learning to solve the problem?

Protein structure prediction

Prediction of chemical reactions

Detection of exotic particles

All of these D

106 Which of the following statements is true when you use 1×1 convolutions in a CNN?

It can help in dimensionality reduction

It can be used for feature pooling


It suffers less overfitting due to small kernel size

All of the above D

107. "Which of the statements given above is true?

Statement 1: It is possible to train a network well by initializing all the weights as 0

Statement 2: It is possible to train a network well by initializing biases as 0"

Statement 1 is true while Statement 2 is false

Statement 2 is true while statement 1 is false

Both statements are true

Both statements are false B

108 The number of nodes in the input layer is 10 and the hidden layer is 5.

The maximum number of connections from the input layer to the hidden layer are

50

Less than 50

More than 50

It is an arbitrary value A

109 The input image has been converted into a matrix of size 28 X 28 and a kernel/filter

of size 7 X 7 with a stride of 1. What will be the size of the convoluted matrix?

22 X 22

21 X 21

28 X 28

7X7 A

110 In a simple MLP model with 8 neurons in the input layer, 5 neurons in the hidden

layer and 1 neuron in the output layer. What is the size of the weight matrices between

hidden output layer and input hidden layer?

[1 X 5] , [5 X 8]

[8 X 5] , [ 1 X 5]
[8 X 5] , [5 X 1]

[5 x 1] , [8 X 5] D

111. Which of the following functions can be used as an activation function in the output layer
if we wish to predict the probabilities of n classes (p1, p2, ..., pn) such that the sum of p over all n classes equals 1?

Softmax

ReLu

Sigmoid

Tanh A
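
Softmax is the usual choice in question 111 because its outputs are positive and sum to 1. A minimal sketch follows; the input scores are invented, and subtracting the maximum score is a common numerical-stability step that does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw output-layer scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```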

112. Assume a simple MLP model with 3 neurons and inputs = 1, 2, 3. The weights to the input
neurons are 4, 5 and 6 respectively. Assume the activation function is a linear constant
value of 3. What will be the output?

32 643 96 48 C
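
The arithmetic behind question 112 can be written out as below. The sketch assumes that "a linear constant value of 3" means the activation f(z) = 3z, which is the reading consistent with the answer key.

```python
inputs = [1, 2, 3]
weights = [4, 5, 6]

# Weighted sum: 1*4 + 2*5 + 3*6 = 32
weighted_sum = sum(x * w for x, w in zip(inputs, weights))

# Assumed linear activation f(z) = 3z
output = 3 * weighted_sum

print(weighted_sum, output)
```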

113. Which of following activation function can’t be used at output layer to classify an image

Sigmoid Tanh ReLU If(x>5,1,0) C

114 In the neural network, every parameter can have their different learning rate.

TRUE FALSE A

115 Dropout can be applied at visible layer of Neural Network model?

TRUE FALSE A

116 Which of the following neural network training challenge can be solved using batch
normalization?

Overfitting

Restrict activations to become too high or low

Training is too slow

Both B and C D

117 Which of the following would have a constant input in each epoch of training a Deep Learning
model?

Weight between input and hidden layer


Weight between hidden and output layer

Biases of all hidden layer neurons

Activation function of output layer A

118 Changing Sigmoid activation to ReLu will help to get over the vanishing gradient issue?

TRUE FALSE A

119 In CNN, having max pooling always decrease the parameters?

TRUE FALSE B

120 Backpropagation cannot be applied when using pooling layers

TRUE FALSE B

121 Suppose there is an issue while training a neural network. The training loss/validation loss

remains constant. What could be the possible reason?

Architecture is not defined correctly

Data given to the model is noisy

Both of these

NONE C

122. "Which of the following statements is true regarding dropout?

1: Dropout gives a way to approximate by combining many different architectures

2: Dropout demands high learning rate 3: Dropout can help preventing overfitting"

Both 1 and 2

Both 1 and 3

Both 2 and 3

All 1, 2 and 3 B

123 Gated Recurrent units can help prevent vanishing gradient problem in RNN.

TRUE FALSE A

124 What steps can we take to prevent overfitting in a Neural Network?

Data Augmentation
Weight Sharing

Early Stopping

All of the above D

125 What do you mean by generalization error in terms of the SVM?

How far the hyperplane is from the support vectors

How accurately the SVM can predict outcomes for unseen data

The threshold amount of error in an SVM B

126 When the C parameter is set to infinite, which of the following holds true?

The optimal hyperplane if exists, will be the one that completely separates the data

The soft-margin classifier will separate the data

None of the above A

127 What do you mean by a hard margin?

The SVM allows very low error in classification

The SVM allows high amount of error in classification

None of the above A

128 The minimum time complexity for training an SVM is O(n^2). According to this fact,
what sizes of datasets are not best suited for SVMs?

Large datasets

Small datasets

Medium sized datasets

Size does not matter A

129 The effectiveness of an SVM depends upon:

Selection of Kernel

Kernel Parameters

Soft Margin Parameter C

All of the above D


130 Support vectors are the data points that lie closest to the decision surface.

TRUE FALSE A

131 The SVM’s are less effective when:

The data is linearly separable

The data is clean and ready to use

The data is noisy and contains overlapping points C

132 Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?

The model would consider even far away points from hyperplane for modeling

The model would consider only the points close to the hyperplane for modeling

The model would not be affected by distance of points from hyperplane for modeling

None of the above B

133 Which of the following are real world applications of the SVM?

Text and Hypertext Categorization

Image Classification

Clustering of News Articles

All of the above D

134 Which algorithm is lazy algorithm

KNN

K Means

Support Vectors machines

Random Forest A

135 Which of the following is not one of the different learning methods?

Memorization

Analogy

Deduction

Introduction D
136 Which of the following is an example of a deterministic algorithm?

PCA K-Means Support Vectors machines KNN A

137 Another Name for output attribute is

Predictor variable

Independent variable

Response variable

Dependent variable A

138 Which of the following can be used to impute data sets based only on information in the training
set. ?

postProcess

preProcess

process

All of the Mentioned B

139 Which of the following is a categorical outcome?

RMSE

RSquared

Accuracy

All of the Mentioned C

140 Which of the following function provides unsupervised prediction ?

cl_forecast

cl_nowcast

cl_precast

None of the Mentioned D

141 Which algorithm is used for small and large data sets

SVM

RF

NAÏVE BAYES
DECISION TREES A

142 Which algorithm forms a Blend of trees

RF

SVM

Decision Trees

KNN A

143 Which type of algorithms is used for statistical analysis

Classification

Clustering

Regression

Association C

144 Which type of learning deals with the environment?

Supervised

Unsupervised

Reinforcement

None C

145 In which type of learning are the input and the predicted output given for the training data?

Supervised

Unsupervised

Reinforcement

None A

146 In which type of algorithm is the distance between two points used for identifying neighbours?

RF SVM Decision Trees KNN D

147 Which algorithm is referred as CART

RF SVM Decision Trees KNN C

148 Which algorithm involve post and prior probabilities


RF SVM Decision Trees Naïve Bayes D

149 Which algorithm has Kernel based features

RF SVM Decision Trees Naïve Bayes B

150 Which algorithm is subjected to over fitting problem

RF SVM Decision Trees Naïve Bayes C
