Cybercrimes have increased, and many technologies like Artificial Intelligence & Deep Learning, Blockchain Cybersecurity, Zero-Trust, Model, etc. are been used to stop these cybercrimes. In this article, we will discuss the role of the confusion matrix in IDS.
What is a Confusion Matrix?
A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.
For a binary classification problem, we would have a 2 x 2 matrix as shown below with 4 values:
Let’s decipher the matrix:
- The target variable has two values: Positive or Negative
- The columns represent the actual values of the target variable
- The rows represent the predicted values of the target variable
But wait — what’s TP, FP, FN and TN here? That’s the crucial part of a confusion matrix. Let’s understand each term below.
Understanding True Positive, True Negative, False Positive and False Negative in a Confusion Matrix
True Positive (TP)
- The predicted value matches the actual value
- The actual value was positive and the model predicted a positive value
True Negative (TN)
- The predicted value matches the actual value
- The actual value was negative and the model predicted a negative value
False Positive (FP) — Type 1 error
- The predicted value was falsely predicted
- The actual value was negative but the model predicted a positive value
- Also known as the Type 1 error
False Negative (FN) — Type 2 error
- The predicted value was falsely predicted
- The actual value was positive but the model predicted a negative value
- Also known as the Type 2 error
Let me give you an example to better understand this. Suppose we had a classification dataset with 1000 data points. We fit a classifier on it and get the below confusion matrix:
The different values of the Confusion matrix would be as follows:
- True Positive (TP) = 560; meaning 560 positive class data points were correctly classified by the model
- True Negative (TN) = 330; meaning 330 negative class data points were correctly classified by the model
- False Positive (FP) = 60; meaning 60 negative class data points were incorrectly classified as belonging to the positive class by the model
- False Negative (FN) = 50; meaning 50 positive class data points were incorrectly classified as belonging to the negative class by the model
This turned out to be a pretty decent classifier for our dataset considering the relatively larger number of true positive and true negative values.