21 Machine Learning Design Patterns Interview Questions (ANSWERED)

Today, deploying machine learning models in production is considered an engineering discipline. A Machine Learning design pattern is a way to address well-known Machine Learning problems like data quality, reproducibility, scaling, bias, explainability, deploying ML models, and so on. Follow along and check the 21 most common and advanced Machine Learning Design Patterns interview questions and answers you might face, and stay prepared for your next MLOps & Data Science interview.

Q1 (Junior): Name some approaches that you can take to implement the Ensemble Design Pattern

Answer
Ensemble design patterns are meta-algorithms that combine several machine learning submodels as a technique to decrease bias and/or variance and improve overall model performance. The idea is that combining multiple models helps to improve the machine learning results. The main methods in ensemble learning are:

Bagging (short for bootstrap aggregating): If there are k submodels, then there are k separate datasets used for training each submodel of the ensemble. Each dataset is constructed by randomly sampling (with replacement) from the original training dataset. This means there is a high probability that any of the k datasets will be missing some training examples, but also that any dataset will likely have repeated training examples. The aggregation takes place on the output of the multiple ensemble model members: either an average in the case of a regression task or a majority vote in the case of classification.
Boosting: The idea is to iteratively build an ensemble of models where each successive model focuses on learning the examples the previous model got wrong. In short, boosting iteratively improves upon a sequence of weak learners, taking a weighted average to ultimately yield a strong learner.

Stacking: combines the outputs of a collection of models to make a prediction. The initial models, which are typically of different model types, are trained to completion on the full training dataset. Then a secondary meta-model is trained using the initial model outputs as features. This second meta-model learns how to best combine the outcomes of the initial models to decrease the training error, and it can be any type of machine learning model.
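Below is a minimal sketch of all three approaches, assuming scikit-learn; the synthetic dataset and model choices are purely illustrative:

```python
# Minimal sketch of bagging, boosting, and stacking with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Bagging: k submodels, each trained on a bootstrap sample of the data.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)

# Boosting: each successive weak learner focuses on previous errors.
boosting = AdaBoostClassifier(n_estimators=50)

# Stacking: a meta-model (logistic regression) combines base model outputs.
stacking = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()),
                ("tree", DecisionTreeClassifier())],
    final_estimator=LogisticRegression(),
)

for name, model in [("bagging", bagging), ("boosting", boosting),
                    ("stacking", stacking)]:
    print(name, cross_val_score(model, X, y, cv=3).mean())
```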

Source: www.oreilly.com

Q2 (Junior): Name some methods you know for rebalancing a dataset using the Rebalancing Design Pattern

Answer
Rebalancing Design Pattern provides various approaches for handling
datasets that are inherently imbalanced. By this we mean datasets
where one label makes up the majority of the dataset, leaving far fewer
examples of other labels. Some methods to address this are:

Downsampling: decreases the number of examples from the majority class used during model training. It is usually combined with the Ensemble pattern for better results.

Upsampling: we overrepresent our minority class by both replicating minority class examples and generating additional, synthetic examples.

Weighted classes: by weighting classes, we tell our model to treat minority label classes with more importance during training. Exactly how much importance your model should give to certain examples is up to you and is a parameter you can experiment with.
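A minimal sketch of the three options, assuming scikit-learn and pandas; the DataFrame and its 90/10 imbalance are illustrative:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Illustrative, imbalanced dataset with a binary "label" column.
df = pd.DataFrame({"feature": range(100),
                   "label": [0] * 90 + [1] * 10})
majority, minority = df[df["label"] == 0], df[df["label"] == 1]

# Downsampling: shrink the majority class to the minority class size.
downsampled = pd.concat([
    resample(majority, n_samples=len(minority), replace=False, random_state=42),
    minority])

# Upsampling: replicate minority examples (synthetic generation, e.g. SMOTE
# from the separate imbalanced-learn package, is a common alternative).
upsampled = pd.concat([
    majority,
    resample(minority, n_samples=len(majority), replace=True, random_state=42)])

# Weighted classes: keep the data as-is but weight the loss per class.
clf = LogisticRegression(class_weight="balanced")  # or e.g. {0: 1.0, 1: 10.0}
clf.fit(df[["feature"]], df["label"])
```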

Source: medium.datadriveninvestor.com

Q3 (Junior): What are the benefits of using the Workflow Pipeline Design Pattern?

Answer
As ML practitioners, we can often find our daily routine following some or all of the steps of a typical end-to-end ML workflow, from data preparation through model training to deployment.

Following the idea of the monolith-versus-microservices discussion in the traditional programming domain, the Workflow Pipeline design pattern aims to isolate and containerize the individual steps, which turns the ML code into pipelines. This practice has the following benefits:

It ensures the portability, scalability, and maintainability of the ML code.
When working in a team, the workflow pipeline allows different members to retrieve data from a common, immutable source to train their own models, so that the results can be compared on an equal footing.
Another benefit of abstracting and isolating individual steps is that one can insert validation between steps to monitor quality and status. So if there is data drift or model quality degradation, it is easier to identify and faster to remediate.

Source: changyaochen.github.io

Q4 (Junior): What's the difference between Multiclass Classification models and Multilabel models?

Answer
Multiclass classification problems:

A single example is assigned exactly one label from a group of many possible classes.
For example, if our model is classifying images as cats, dogs, or rabbits, the softmax output might look like this for a given image: [.89, .02, .09]. This means our model is predicting an 89% chance the image is a cat, a 2% chance it's a dog, and a 9% chance it's a rabbit. Because each image can have only one possible label in this scenario, we can take the argmax (index of the highest probability) to determine our model's predicted class.

Multilabel models:

These refer to problems where we can assign more than one label to a given training example.
For example, in text models we can imagine a few scenarios where text can be labeled with multiple tags. Suppose we have a dataset of Stack Overflow questions; we could build a model to predict the tags associated with a particular question. For instance, the question “How do I plot a pandas DataFrame?” could be tagged as “Python”, “pandas”, and “visualization”.
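A minimal sketch, assuming NumPy and the illustrative probabilities above, of how the two kinds of output are parsed differently:

```python
import numpy as np

classes = np.array(["cat", "dog", "rabbit"])

# Multiclass: softmax probabilities sum to 1; take the argmax.
softmax_out = np.array([0.89, 0.02, 0.09])
print(classes[np.argmax(softmax_out)])   # -> "cat"

# Multilabel: independent sigmoid scores; threshold each one separately.
sigmoid_out = np.array([0.92, 0.85, 0.11])
print(classes[sigmoid_out > 0.5])        # -> ["cat", "dog"]
```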

Source: www.oreilly.com

Q5 (Junior): When would you use Grid Search vs Random Search for Hyperparameter Tuning?

Answer
In Grid Search we define a search space as a grid of hyperparameter values and evaluate every position in the grid. This is great for spot-checking combinations that are known to perform well generally.
In Random Search we define a search space as a bounded domain of hyperparameter values and randomly sample points in that domain. This is great for discovery and for getting hyperparameter combinations that you would not have guessed intuitively, although it often requires more time to execute.
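A minimal sketch contrasting the two strategies, assuming scikit-learn and SciPy; the SVC parameter grid is illustrative only:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Grid search: every combination in the grid is evaluated (3 x 2 = 6 per fold).
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10],
                            "kernel": ["linear", "rbf"]}, cv=3)

# Random search: sample 10 points from a continuous, bounded domain.
rand = RandomizedSearchCV(SVC(), {"C": loguniform(1e-3, 1e3),
                                  "kernel": ["linear", "rbf"]},
                          n_iter=10, cv=3, random_state=0)

for search in (grid, rand):
    search.fit(X, y)
    print(search.best_params_, search.best_score_)
```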

Source: machinelearningmastery.com

Q6 (Junior): When would you use the Hashed Feature Design Pattern?

Answer
The Hashed Feature design pattern is used to address three possible
problems associated with categorical features:

Incomplete vocabulary,
Model size due to cardinality,
and cold start.

For example, one-hot encoding a categorical input variable requires knowing the vocabulary beforehand. This is not a problem if the input variable is something like the language a book is written in or the day of the week that traffic level is being predicted.

But what if the categorical variable in question is something like the hospital_id of where the baby is born or the physician_id of the person delivering the baby? Categorical variables like these pose a few problems:

Knowing the vocabulary requires extracting it from the training data. Due to random sampling, it is possible that the training data does not contain all the possible hospitals or physicians. The vocabulary might be incomplete.

The categorical variables have high cardinality. Instead of having feature vectors with three languages or seven days, we have feature vectors whose length is in the thousands to millions. They involve so many weights that the training data may be insufficient. Even if we can train the model, the trained model will require a lot of space to store because the entire vocabulary is needed at serving time. Thus, we may not be able to deploy the model on smaller devices.

After the model is placed into production, new hospitals might be built
and new physicians hired. The model will be unable to make
predictions for these, and so a separate serving infrastructure will be
required to handle such cold-start problems.
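The Hashed Feature pattern addresses these problems by hashing each categorical value into a fixed number of buckets. A minimal sketch, where the hospital IDs and bucket count are illustrative:

```python
import hashlib

NUM_BUCKETS = 10  # illustrative; real systems often use hundreds or thousands

def hashed_feature(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministically map a categorical value to one of num_buckets.

    A cryptographic digest is used (rather than Python's built-in hash,
    which is salted per process) so the mapping is stable across runs.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

# Works for values seen in training and for brand-new ones (cold start),
# with no vocabulary to store in the model.
print(hashed_feature("hospital_1234"))
print(hashed_feature("hospital_brand_new"))
```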

Source: www.oreilly.com

Q7 (Mid): For what problems would you use the Neutral Class Design Pattern in Machine Learning?

Answer
Neutral Class Design Pattern involves introducing a third class (a neutral
class) when trying to solve a binary classification problem.

This is: Yes, No and Maybe.

The neutral class is helpful in dealing with disagreements among human experts. For example, suppose we have human labelers to whom we show patient history and ask them what medication they would prescribe. We might have a clear signal for acetaminophen in some cases, a clear signal for ibuprofen in other cases, and a huge swath of cases for which human labelers disagree. The neutral class provides a way to deal with such cases.

The need for a neutral class also arises with models that attempt to predict customer satisfaction. If the training data consists of survey responses where customers grade their experience on a scale of 1 to 10, it might be helpful to bucket the ratings into three categories: 1 to 4 as bad, 8 to 10 as good, and 5 to 7 as neutral. If, instead, we attempt to train a binary classifier by thresholding at 6, the model will spend too much effort trying to get essentially neutral responses correct.
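A minimal sketch of the bucketing step described above, assuming pandas; the column name and ratings are illustrative:

```python
import pandas as pd

# Illustrative survey data with 1-10 satisfaction ratings.
df = pd.DataFrame({"rating": [2, 5, 9, 7, 1, 10, 6, 4]})

def to_neutral_class(rating: int) -> str:
    """Map a 1-10 rating into bad / neutral / good."""
    if rating <= 4:
        return "bad"
    if rating <= 7:
        return "neutral"
    return "good"

df["label"] = df["rating"].apply(to_neutral_class)
print(df["label"].value_counts())  # three classes instead of a forced binary
```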

Source: www.oreilly.com

Q8 (Mid): How does the Feature Cross Design Pattern work in Machine Learning?

Answer
The Feature Cross design pattern helps models learn relationships
between inputs faster by explicitly making each combination of input
values a separate feature.

The feature cross is then a synthetic feature formed by concatenating two or more categorical features in order to capture the interaction between them. By joining two features in this way, it is possible to encode nonlinearity into the model, which can allow for predictive abilities beyond what each of the features would have been able to provide individually.
For example, consider a dataset where two classes (blue and orange dots) are plotted against two features x1 and x2, and we can't draw a single straight line that neatly separates the blue and orange dots. To solve this nonlinear problem, we can create a feature cross named x3 by crossing x1 and x2:

x3 = x1 * x2

We can treat this newly minted x3 feature cross just like any other feature. The linear formula becomes:

y = b + w1*x1 + w2*x2 + w3*x3

A linear algorithm can learn a weight w3 for x3 just as it would for x1 and x2. In other words, although x3 encodes nonlinear information, you don't need to change how the linear model trains to determine the value of w3.

In this way, feature crosses provide a way to have the ML model learn relationships between the features faster. While more complex models like neural networks and trees can learn feature crosses on their own, using feature crosses explicitly can allow us to get away with training just a linear model. Consequently, feature crosses can speed up model training (less expensive) and reduce model complexity (less training data is needed).
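A minimal sketch, assuming pandas, of creating an explicit cross of two categorical features; the day-of-week and hour-bucket features are illustrative:

```python
import pandas as pd

# Illustrative data with two categorical inputs.
df = pd.DataFrame({"day": ["Mon", "Sat", "Sat"],
                   "hour_bucket": ["am_rush", "midday", "midday"]})

# The feature cross: concatenate the values so every (day, hour_bucket)
# combination becomes its own categorical value.
df["day_x_hour"] = df["day"] + "_" + df["hour_bucket"]

# Treat the cross like any other categorical feature, e.g. one-hot encode it.
features = pd.get_dummies(df[["day_x_hour"]])
print(features.columns.tolist())
```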

Source: developers.google.com

Q9 (Mid): What ML Design Patterns can you use to ensure Reproducibility of Machine Learning jobs?

Answer
Transform: works by capturing data preparation dependencies from the model training pipeline to reproduce them during serving.
Repeatable Splitting: captures the way data is split among training, validation, and test datasets to ensure that a training example that is used in training is never used for evaluation or testing, even as the dataset grows (see the sketch after this list).
Bridged Schema: looks at how to ensure reproducibility when the training dataset is a hybrid of data conforming to different schemas.
Workflow Pipeline: captures all the steps in the machine learning process to ensure that as the model is retrained, parts of the pipeline can be reused.
Feature Store: addresses reproducibility and reusability of features across different machine learning jobs.
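As a minimal sketch of Repeatable Splitting (the key column and split fractions are illustrative), hashing a stable key instead of calling a random number generator makes the split deterministic even as the dataset grows:

```python
import hashlib

def split_bucket(key: str, train_frac: float = 0.8,
                 valid_frac: float = 0.1) -> str:
    """Deterministically assign a record to train/valid/test from a stable key."""
    h = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 100
    if h < train_frac * 100:
        return "train"
    if h < (train_frac + valid_frac) * 100:
        return "valid"
    return "test"

# The same record always lands in the same split, run after run,
# so no training example ever leaks into evaluation.
print(split_bucket("2023-06-01"))  # e.g. hashing on a date column
print(split_bucket("2023-06-01"))
```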

Source: www.oreilly.com

Q10 (Mid): What ML problems are solved by the Transform Design Pattern?

Answer
The problem is that the inputs to a machine learning model are not the
features that the machine learning model uses in its computations. In a
text classification model, for example, the inputs are the raw text
documents and the features are the numerical embedding representations
of this text.

The Transform design pattern aims to make it easier to deploy and maintain Machine Learning models in production by keeping inputs, features, and transforms as separate entities.

Raw data usually needs to go through different preprocessing steps before being used as input for a Machine Learning model, and some of these transformations need to be saved so they can be reused when preprocessing data for inference.

For example, normalization/standardization techniques are commonly applied to numerical data before training an ML model to deal with outliers and make the data look more like a Gaussian distribution. These transformations should then be saved so that they can be reused in the future when new data is made available for inference. If these transformations were not saved, we would create a data skew between training and serving, with the input data provided for inference having a different distribution compared to the input data used to train the ML model.
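A minimal sketch, assuming scikit-learn and joblib (the file name is illustrative), of fitting a transform at training time and reusing the exact same transform at serving time:

```python
import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# --- Training time: fit the transform on training data and persist it. ---
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
scaler = StandardScaler().fit(X_train)
joblib.dump(scaler, "scaler.joblib")

# --- Serving time: reload the fitted transform; never re-fit on new data. ---
scaler = joblib.load("scaler.joblib")
X_new = np.array([[2.5]])
print(scaler.transform(X_new))  # same scaling as in training -> no skew
```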

Source: towardsdatascience.com

Q11 (Mid): What are some trade-offs when using Embeddings in Machine Learning?

Answer
An embedding is a relatively low-dimensional space into which you can
translate high-dimensional vectors. Embeddings make it easier to do
machine learning on large inputs like sparse vectors representing words.

The main trade-off with using an embedding is the compromised representation of the data. There is a loss of information involved in going from a high-cardinality representation to a lower-dimensional representation. In return, we gain information about the closeness and context of the items.

The lossiness of the representation is controlled by the size of the embedding layer. In practice, the exact dimensionality of the embedding space is something that we choose as practitioners:

By choosing a very small output dimension of an embedding layer, too much information is forced into a small vector space and context can be lost.
On the other hand, when the embedding dimension is too large, the embedding loses the learned contextual importance of the features.
The optimal embedding dimension is often found through experimentation, similar to choosing the number of neurons in a deep neural network layer.

However, if we're in a hurry, there are two rules of thumb that we could use:

1. Use the fourth root of the total number of unique categorical elements as the embedding dimension.
2. The embedding dimension should be approximately 1.6 times the square root of the number of unique elements in the category, and no more than 600.

For example, suppose we wanted to use an embedding layer to encode a feature that has 625 unique values:

Using the first rule of thumb, we would choose an embedding dimension of 5 (the fourth root of 625).
Using the second rule of thumb, we'd choose 40 (1.6 times 25, the square root of 625).
If we are doing hyperparameter tuning, it might be worth searching within this range.
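The two rules of thumb as a minimal, easily verified sketch:

```python
def embedding_dim_rule1(n_unique: int) -> int:
    """Fourth root of the number of unique categorical values."""
    return round(n_unique ** 0.25)

def embedding_dim_rule2(n_unique: int) -> int:
    """1.6 times the square root of the unique-value count, capped at 600."""
    return min(600, round(1.6 * n_unique ** 0.5))

print(embedding_dim_rule1(625))  # -> 5
print(embedding_dim_rule2(625))  # -> 40
```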

Source: www.oreilly.com

Q12 (Mid): What does the Bridged Schema Design Pattern do?

Answer
The Bridged Schema Design Pattern provides ways to adapt the data
used to train a model from its older, original data schema to newer, better
data.

For example, assume that we are training a regression model and one of the (categorical) inputs is called payment_type. In the older training data, this has been recorded as cash or card. However, the newer training data provides more detail on the type of card (gift_card, debit_card, credit_card) that was used.

What the bridged schema does is find a representation (schema) for the input (in this example, payment_type) that works for both the older and newer data, using the observed frequencies in the new data. In general, this can be done in two ways:

Probabilistic method:

With this approach, we "impute" the older data. When we see payment_type == card in the older data, we convert it to one of (gift_card, debit_card, credit_card) with the probability of the observed frequency in the newer data. The assumption here is that the distributions of the imputed input are the same for the older and newer data, which may not hold.

Static method:

With this approach, we one-hot encode the input using the newer data. For the running example, payment_type is one-hot encoded into a 4-dimensional vector, as the cardinality is four (for the newer data). When we encounter payment_type == card in the older data, we represent it as [0, 0.1, 0.3, 0.6], where the first 0 corresponds to the cash category and (0.1, 0.3, 0.6) is the hypothetical, observed frequency of (gift_card, debit_card, credit_card) from the newer data. Compared with the probabilistic method, we relax the "same-distribution" assumption but allow the ML model to learn the relationship.
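A minimal sketch of both bridging methods; the frequencies are the hypothetical ones from the running example:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed frequencies of card subtypes in the newer data.
CARD_TYPES = ["gift_card", "debit_card", "credit_card"]
CARD_FREQS = [0.1, 0.3, 0.6]
SCHEMA = ["cash"] + CARD_TYPES  # the bridged, 4-value schema

def bridge_probabilistic(old_value: str) -> str:
    """Impute an old 'card' record as one concrete subtype, by frequency."""
    if old_value == "card":
        return rng.choice(CARD_TYPES, p=CARD_FREQS)
    return old_value

def bridge_static(old_value: str) -> list:
    """Encode an old record as a (possibly fractional) one-hot vector."""
    if old_value == "card":
        return [0.0] + CARD_FREQS  # [0, 0.1, 0.3, 0.6]
    return [1.0 if v == old_value else 0.0 for v in SCHEMA]

print(bridge_probabilistic("card"))  # e.g. "credit_card"
print(bridge_static("card"))         # [0.0, 0.1, 0.3, 0.6]
print(bridge_static("cash"))         # [1.0, 0.0, 0.0, 0.0]
```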

Source: changyaochen.github.io

Q13 (Mid): What's the difference between the Transform and Feature Store Design Patterns in Machine Learning?

Answer
Transform Design Pattern:

The key idea of this pattern is to separate input, feature, and estimator. As a whole, they constitute what is considered a model that can be readily put into production. The process that turns an input into a feature is what we call a Transform.
Transforms preprocess the raw inputs, e.g., by converting them to the formats/values (features) that are expected by the estimators.
The Transform pattern then enforces the consistency of the features consumed by the model at training and serving time.

Feature Store Design Pattern:

The Feature Store design pattern simplifies the management and reuse of features across projects by decoupling the feature creation process from the development of models using those features.
The Feature Store pattern provides the availability of inputs/features for training and (particularly) serving.

For example, in the context of predicting the duration of an event, raw input will need to be transformed (standardization, one-hot encoding, etc.) before being passed to a model. In such cases, Transform is the better approach. In other cases, some data, like the hour at which the event occurs, may need to be passed as raw input to the model, and the Feature Store design pattern would be a better approach.

Source: changyaochen.github.io

Q14 (Mid): What's the main idea of the Reframing Design Pattern for a ML problem?

Answer
The Reframing Design Pattern changes the representation of the output of a problem. For example, a problem that is intuitively a regression problem can be reframed into a classification problem, and vice versa.

Take an example where we want to predict the amount of rainfall in a given location in a timeframe of 20 minutes. Now, this appears to be a straightforward time-series forecasting problem, and we'd be taking into account the historical climate and weather conditions to predict the amount of rainfall. Alternatively, this can also be defined as a regression problem, as the label (amount of rainfall) is a real number (e.g., 0.5 cm).

Suppose that after training your model several times, you realize that all
your predicted rainfall amounts are off from the real values. The model
says it’ll rain 0.2 cm but it actually rained 0.4 cm, surprisingly for the
same set of features.

The key issue here is that the amount of rainfall follows a probabilistic distribution. For the same weather conditions and features, it sometimes rains 0.2 cm and other times 0.4 cm. A regression model is limited to predicting only a single number, and the chances of getting it exactly right are slim.

So rather than seeing the problem as a regression, we can reframe our objective as a classification problem. This classification approach allows the model to capture the probability distribution of rainfall in different quantities instead of having to choose one single value of the distribution. The model then returns the probability of receiving rainfall within certain ranges of amounts.

Therefore, reframing a problem can help when building ML-powered applications. Instead of narrowing down our predictions to a single real number, we relax our prediction target to be a discrete probability distribution.
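A minimal sketch of the reframing step, assuming NumPy; the rainfall values and bucket edges are illustrative:

```python
import numpy as np

# Illustrative rainfall labels in cm.
rainfall_cm = np.array([0.0, 0.15, 0.25, 0.4, 0.8, 1.5])

# Reframe regression -> classification: discretize the target into buckets.
bucket_edges = [0.1, 0.3, 0.6, 1.0]  # illustrative bucket boundaries
class_labels = np.digitize(rainfall_cm, bucket_edges)

# A softmax classifier over these 5 classes can now output a probability
# distribution over rainfall ranges instead of one real number.
print(class_labels)  # -> [0 1 1 2 3 4]
```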

Source: towardsdatascience.com
Q15 (Mid): When and why would you use Checkpoints for a ML pipeline?

Answer
In Checkpoints, we store the full state of the model periodically so that
we have partially trained models available. These partially trained models
can serve as the final model (in the case of early stopping) or as the
starting points for continued training (in the cases of machine failure and
fine-tuning).

Checkpoints are useful to deal with complex models. The more complex
a model is (for example, the more layers and nodes a neural network has),
the larger the dataset that is needed to train it effectively. This is because
more complex models tend to have more tunable parameters. As model
sizes increase, the time it takes to fit one batch of examples also
increases. As the data size increases (and assuming batch sizes are
fixed), the number of batches also increases. Therefore, in terms of
computational complexity, this double whammy means that training will
take a long time. When training takes this long, the chances of machine failure are uncomfortably high. If there is a problem, we'd like to be able to resume from an intermediate point, using a checkpoint, instead of from the very beginning.

How often should we checkpoint? In the case of a neural network, the model state changes after every batch because of gradient descent. So, technically, if we don't want to lose any work, we should checkpoint after every batch. However, checkpoints are huge, and this would add considerable overhead. Instead, model frameworks typically provide the option to checkpoint at the end of every epoch.
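A minimal sketch of per-epoch checkpointing, assuming TensorFlow/Keras (the file path and tiny model are illustrative):

```python
import tensorflow as tf

# Illustrative tiny model; any compiled Keras model works the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Save the full model state at the end of every epoch.
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath="ckpt_epoch_{epoch:02d}.keras", save_freq="epoch")

x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
model.fit(x, y, epochs=3, callbacks=[checkpoint_cb])

# After a failure (or for fine-tuning), reload a checkpoint and keep training.
restored = tf.keras.models.load_model("ckpt_epoch_03.keras")
restored.fit(x, y, epochs=1)
```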

Source: towardsdatascience.com

Q16 (Mid): When would you use the Windowed Inference Design Pattern?

Answer
The Windowed Inference design pattern handles models that require an
ongoing sequence of instances in order to run inference. It works by
externalizing the model state and invoking the model from a stream
analytics pipeline. This pattern is useful when:

We want to avoid training–serving skew in the case of temporal aggregate features, for example, when a machine learning model requires features that need to be computed from aggregates over time windows. By externalizing the state to a stream pipeline, the Windowed Inference design pattern ensures that features calculated in a dynamic, time-dependent way can be correctly repeated between training and serving.

We create stateful ML models, such as recurrent neural networks, or a stateless model requires stateful input features; a sketch of such a windowed aggregate feature is shown below.
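A minimal sketch, in pure Python with an illustrative window length, of externalized window state: the rolling aggregate lives outside the model, and each arriving event updates it before inference:

```python
from collections import deque

class WindowedFeature:
    """Keeps the last `size` observations and exposes their rolling mean."""

    def __init__(self, size: int = 10):
        self.window = deque(maxlen=size)  # the externalized state

    def update(self, value: float) -> float:
        self.window.append(value)
        return sum(self.window) / len(self.window)

# Stream of, e.g., sensor readings arriving one event at a time.
state = WindowedFeature(size=3)
for reading in [5.0, 12.0, 7.0, 30.0]:
    rolling_mean = state.update(reading)
    # model.predict([rolling_mean, ...])  # hypothetical model call
    print(rolling_mean)
```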

Source: www.oreilly.com

Q17 (Senior): Explain how you would build a Multi-Label model using the Multi-Label Design Pattern

Answer
For building these kinds of models we could use the Multilabel design
pattern. For neural networks, this design requires changing the activation
function used in the final output layer of the model and choosing how our
application will parse the model output.

The solution is to use the sigmoid activation function in our final output layer. Rather than generating an array where all values sum to 1 (as in softmax), each individual value in a sigmoid array is a float between 0 and 1. That is to say, when implementing the Multilabel design pattern, our label needs to be multi-hot encoded. The length of the multi-hot array corresponds to the number of classes in our model, and each output in this label array will be a sigmoid value.

For example, suppose that we are building a classifier model and our training dataset includes images with more than one animal: cats, dogs, and rabbits. The sigmoid output for an image that contains a cat and a dog but not a rabbit might look like the following: [.92, .85, .11]. This output means the model is 92% confident the image contains a cat, 85% confident it contains a dog, and 11% confident it contains a rabbit.
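A minimal sketch of the output-layer change, assuming TensorFlow/Keras; the input shape and data are illustrative:

```python
import tensorflow as tf

NUM_CLASSES = 3  # cat, dog, rabbit

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(64, activation="relu"),
    # Sigmoid (not softmax): each class gets an independent score in [0, 1].
    tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"),
])

# Binary cross-entropy treats each label as its own yes/no decision.
model.compile(optimizer="adam", loss="binary_crossentropy")

# Multi-hot label: this example contains a cat AND a dog, but no rabbit.
x = tf.random.normal((1, 128))
y = tf.constant([[1.0, 1.0, 0.0]])
model.fit(x, y, epochs=1)
```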

Source: www.oreilly.com

Q18 (Senior): What are the two phases in the Two-Phase Predictions Design Pattern?

Answer
The Two-Phase Predictions Design Pattern provides a way to address
the problem of keeping large, complex models performant when they have
to be deployed on distributed devices by splitting the use cases into two
phases:

1. Building the offline model: We start with a smaller model that can
be deployed on-device. The idea is that the model has a simpler task,
such that it can accomplish this task on-device with relatively high
accuracy. It should be small enough that it can be loaded on a mobile
device for quick inference without relying on internet connectivity.
2. Building the cloud model: Then we build a more complex model, we
deploy it in the cloud and it's triggered only when needed if the user
asks for something more complex. Depending on the use case, this
second model could take many different forms.

Source: www.oreilly.com

Q19 (Senior): When would you need to implement the Transfer Learning Design Pattern?

Answer
With the Transfer Learning design pattern, we can take a model that has been trained on the same type of data for a similar task and apply it to a specialized task using our own custom data.

By the same type of data, we mean the same data modality: images, text, and so forth. For example, use a model that has been pre-trained on photographs if you are going to use it for photograph classification, and a model that has been pre-trained on remotely sensed imagery if you are going to use it to classify satellite images.
By a similar task, we're referring to the problem being solved. To do transfer learning for image classification, for example, it is better to start with a model that has been trained for image classification rather than object detection.

So some scenarios where we would need to use transfer learning are:

To save time and resources from having to train multiple machine learning models from scratch to complete similar tasks.
To save resources in areas of machine learning that require large amounts of data and compute, such as image classification or natural language processing.
To compensate for a lack of labelled training data held by an organization by using pre-trained models.
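A minimal sketch of the pattern, assuming TensorFlow/Keras and its bundled MobileNetV2 ImageNet weights; the 5-class head is illustrative:

```python
import tensorflow as tf

# Pre-trained base: ImageNet weights, original classification head removed.
# (The weights are downloaded on first use.)
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the learned representations

# New task-specific head, trained on our own (illustrative) 5-class data.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(custom_images, custom_labels, epochs=5)  # our custom dataset
```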

Source: machinelearningmastery.com

Q20 (Senior): When would you need to use the Two-Phase Predictions Design Pattern?

Answer
The Two-Phase Predictions Design Pattern is useful when we cannot always rely on end users having reliable internet connections. In such situations, models are deployed at the edge, meaning they are loaded on a user's device and don't require an internet connection to generate predictions. Given device constraints, models deployed on the edge typically need to be smaller than models deployed in the cloud, and consequently require balancing trade-offs between model complexity and size, update frequency, accuracy, and low latency.

There are various scenarios where we'd want our model deployed on an edge device:

One example is a fitness tracking device, where a model makes recommendations for users based on their activity, tracked through accelerometer and gyroscope movement. It's likely that a user could be exercising in a remote outdoor area without connectivity.
Another example is an environmental application that uses temperature and other environmental data to make predictions on future trends.

In these cases, we'd still want our application to work. Even if we have internet connectivity, it may be slow and expensive to continuously generate predictions from a model deployed in the cloud, so the Two-Phase Predictions design pattern provides a way to deal with such cases.

Source: www.oreilly.com

Q21 (Senior): Why would you need to use the Continued Model Evaluation Design Pattern?

Answer
The Continued Model Evaluation design pattern handles the common
problem of needing to detect and take action when a deployed model is no
longer fit-for-purpose.
The world is dynamic, but developing a machine learning model usually creates a static model from historical data. This means that once the model goes into production, it can start to degrade and its predictions can grow increasingly unreliable. Two of the main reasons models degrade over time are concept drift and data drift:

Concept drift occurs whenever the relationship between the model inputs and the target has changed. This often happens because the underlying assumptions of your model have changed.
Data drift refers to any change that has occurred to the data being fed to your model for prediction as compared to the data that was used for training. Data drift can occur for a number of reasons: the input data schema changes at the source, feature distributions change over time, etc.

Continuous model evaluation provides a framework to evaluate a deployed model's performance exclusively on new data. This allows us to detect model staleness as early as possible. This information helps determine how frequently to retrain a model or when to replace it with a new version entirely.
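A minimal sketch of the core evaluation loop, assuming scikit-learn metrics; the baseline and threshold are illustrative:

```python
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.90   # illustrative metric captured at training time
ALERT_THRESHOLD = 0.05     # tolerated drop before we retrain

def evaluate_on_fresh_data(model, X_fresh, y_fresh) -> bool:
    """Return True if the deployed model looks stale and needs retraining."""
    current = accuracy_score(y_fresh, model.predict(X_fresh))
    print(f"fresh-data accuracy: {current:.3f} (baseline {BASELINE_ACCURACY})")
    return current < BASELINE_ACCURACY - ALERT_THRESHOLD

# Called on a schedule (e.g., daily) as new ground-truth labels arrive:
# if evaluate_on_fresh_data(model, X_fresh, y_fresh):
#     trigger_retraining_pipeline()  # hypothetical retraining hook
```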

Source: www.oreilly.com
