Confusion Matrices: Inconsistent labels may lead to confusion
For this blog, I will explain possible reasons why precision, recall, and the false-positive rate are often mistaken for one another. Before I dive into my hypothesis, I will reclarify what each evaluation metric is.
It’s important to note that evaluation metrics (accuracy, precision, recall, etc.) are tools used to grade a model’s performance after training. For example, in binary classification, the model should return whether an entry is a positive case or a negative case. Each result can be compared against the ground truth and classified as a True Positive (TP), False Positive (FP), False Negative (FN), or True Negative (TN).
Accuracy is a common evaluation metric and is calculated by dividing the total number of true positives and true negatives by the total number of observations. However, a high accuracy score may be misleading if the target variable is highly imbalanced.
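To see how imbalance can mislead, here is a minimal sketch with made-up numbers: a test set with 95 negatives and 5 positives, and a useless model that always predicts the negative class.

```python
# Hypothetical imbalanced test set: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts the negative class.
y_pred = [0] * 100

# Accuracy = (TP + TN) / total observations
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 0.95, despite catching zero positive cases
```

A 95% accuracy sounds great, yet the model never identifies a single positive case.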
To address this problem, metrics such as recall, precision, or a combination of the two (F1-score) are used. Recall is the metric of choice when the business case is more sensitive to False Negatives than False Positives. (For example, it would be bad if a model predicted that people do not have a deadly disease when in truth they do.) If the business case is instead more sensitive to False Positives than False Negatives, then precision is used.
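The three metrics can be computed directly from the confusion-matrix counts. A small sketch (the counts 40/10/20 are arbitrary illustration values):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # drops when False Positives rise
    recall = tp / (tp + fn)     # drops when False Negatives rise
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Example counts: 40 TP, 10 FP, 20 FN
p, r, f = precision_recall_f1(tp=40, fp=10, fn=20)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.8 0.67 0.73
```

Note that true negatives appear in none of the three formulas, which is exactly why these metrics stay honest on imbalanced data where accuracy does not.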
It’s also quite common to plot the total true positives, false positives, etc. on a confusion matrix such as the one below:
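A confusion matrix like that can be built and printed with plain Python; this sketch uses one arbitrary layout (actual labels on rows, predicted labels on columns, positive class first), which is precisely the kind of convention that varies between plots:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) counts for binary labels 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Actual labels along rows, predicted labels along columns:
print("          Pred 1  Pred 0")
print(f"Actual 1  TP={tp}    FN={fn}")
print(f"Actual 0  FP={fp}    TN={tn}")
```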
I have also included the False Positive Rate, as well as the True Positive Rate (same equation as recall), since these values are needed when plotting a ROC curve. In short, a ROC curve plots the FPR against the TPR for threshold values ranging from 0 to 1. I might cover this in more detail another time.
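The two rates fall straight out of the same four counts; a small sketch, reusing arbitrary example counts:

```python
def tpr_fpr(tp, fp, fn, tn):
    """True Positive Rate (= recall) and False Positive Rate."""
    tpr = tp / (tp + fn)  # fraction of actual positives caught
    fpr = fp / (fp + tn)  # fraction of actual negatives wrongly flagged
    return tpr, fpr

# Example counts: 40 TP, 10 FP, 20 FN, 30 TN
tpr, fpr = tpr_fpr(tp=40, fp=10, fn=20, tn=30)
print(round(tpr, 2), round(fpr, 2))  # 0.67 0.25
```

Sweeping the classification threshold from 0 to 1 and recomputing this (TPR, FPR) pair at each step traces out the ROC curve.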
I believe these evaluation metrics are often confused because the x-axis and y-axis labels are not consistent across the results of a Google Image search for “confusion matrix.” I have seen four different variations of the confusion matrix, each with different labels and tick labels.
All four confusion matrices are the same, but the labels and ticks have been swapped, so the positions of the values change. This can easily lead to misunderstanding if a user is accustomed to seeing only the top-left confusion matrix and assumes, “okay, the last two rows are used for recall.”
It’s understandable to get confused by the differing axes and labels. I hope this blog has sorted out some of that confusion.