Part B Chapter 7 (Evaluation)
REVISION NOTES
What is evaluation?
Evaluation is the process of understanding the reliability of an AI model. It is done by feeding a test dataset into the model and comparing the model's outputs with the actual answers. Different evaluation techniques can be used, depending on the type and purpose of the model.
Some terms that are very important to the evaluation process are described below.
Suppose a forest fire has broken out in the forest. The model predicts a Yes, which means there is a forest fire. The Prediction matches the Reality, so this condition is termed a True Positive.
Now suppose the reality is that there is no forest fire, but the machine incorrectly predicts that there is one. This case is termed a False Positive.
Next, a forest fire has broken out, so the Reality is Yes, but the machine incorrectly predicts a No, which means the machine believes there is no forest fire. This case becomes a False Negative.
Finally, when there is no forest fire in reality and the machine also predicts No, the Prediction again matches the Reality and the case is termed a True Negative.
Confusion matrix
The result of the comparison between prediction and reality can be recorded in what we call the confusion matrix. The confusion matrix allows us to understand the prediction results. Note that it is not an evaluation metric in itself but a record which helps in evaluation. Let us once again take a look at the four conditions that we went through in the Forest Fire example: True Positive, True Negative, False Positive and False Negative.
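As a small illustration, the sketch below shows how these four counts could be tallied in Python. The lists reality and prediction and their values are hypothetical, made up only for this example:

```python
# Hypothetical data: actual outcomes and model predictions for each case
reality    = ["Yes", "No", "Yes", "No", "Yes", "No", "No", "Yes"]
prediction = ["Yes", "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

# Tally the four confusion-matrix cells by comparing prediction with reality
TP = sum(1 for r, p in zip(reality, prediction) if r == "Yes" and p == "Yes")
TN = sum(1 for r, p in zip(reality, prediction) if r == "No" and p == "No")
FP = sum(1 for r, p in zip(reality, prediction) if r == "No" and p == "Yes")
FN = sum(1 for r, p in zip(reality, prediction) if r == "Yes" and p == "No")

print("TP:", TP, "TN:", TN, "FP:", FP, "FN:", FN)
```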
Evaluation Methods
Accuracy, precision, and recall are the three primary measures used to assess the success of a
classification algorithm.
Accuracy
Accuracy measures the proportion of correct predictions made by a model out of the total number of predictions. In other words, accuracy tells us how many of the model's predictions were correct, and it considers both True Positives and True Negatives. The accuracy calculation is as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Here, the total observations in the denominator cover all the possible cases of prediction: True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN).
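Continuing the hypothetical forest-fire sketch above, accuracy could be computed from the four counts like this:

```python
# Accuracy: the share of all predictions that were correct (TP and TN),
# out of every observation (TP + TN + FP + FN)
accuracy = (TP + TN) / (TP + TN + FP + FN)
print("Accuracy:", accuracy)
```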
Recall
Recall can be described as the fraction of positive cases that are correctly identified. It focuses on the scenarios where a fire actually existed in reality, whether or not the machine recognised it. That is, it takes into account both True Positives (there was a forest fire in reality and the model predicted a forest fire) and False Negatives (there was a forest fire but the model did not predict it).
Recall = TP / (TP + FN)
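Using the same hypothetical counts from the earlier sketch, recall follows directly from this formula:

```python
# Recall: of all the cases where a fire actually broke out (TP + FN),
# what fraction did the model correctly predict as Yes?
recall = TP / (TP + FN)
print("Recall:", recall)
```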
Which Metric is Important?
Choosing between Precision and Recall depends on the situation in which the model has been deployed. In a situation like a forest fire, a False Negative can cost us a lot and put us in danger. Imagine that no warning is raised even though a forest fire has actually broken out; the entire forest might catch fire.
A viral outbreak is another situation in which a False Negative might be harmful. Consider a scenario in which a fatal virus has begun to spread but is not being detected by the model used to forecast viral outbreaks. The virus may infect numerous people and spread widely.
Now consider a model that determines whether a mail is spam or not. If the model wrongly predicted that an important mail was spam, people would not read it, which could lead to the eventual loss of crucial information. The cost of a False Positive condition in this case (predicting that a mail is spam when it is not) would therefore be high.
F1 Score
The F1 score combines Precision and Recall into a single measure:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

Take a look at the formula and think about when we can get a perfect F1 score. An ideal situation would be when both Precision and Recall have a value of 1 (that is, 100%). In that case the F1 score would also be an ideal 1 (100%), which is known as the perfect value for the F1 score. Since the values of both Precision and Recall range from 0 to 1, the F1 score also ranges from 0 to 1.
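As a rough sketch, the F1 score could be computed from the same hypothetical counts. Note that the precision formula TP / (TP + FP) used below is the standard definition, not something derived in this section:

```python
# Precision: of all the cases the model flagged as fire (TP + FP),
# what fraction really were fires? (standard definition, assumed here)
precision = TP / (TP + FP)

# F1 score: harmonic mean of Precision and Recall;
# it reaches 1 only when both Precision and Recall are 1
f1 = 2 * precision * recall / (precision + recall)
print("Precision:", precision, "Recall:", recall, "F1 Score:", f1)
```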
Let us explore the variations we can have in the F1 Score: