The popularity of the last article has prompted us to publish this follow-up so soon. In the last article, we discussed a few performance metrics used for classification problems. We saw that the confusion matrix is most commonly used with class-output models, but can also be used with probability-output models by applying a threshold probability. We also looked at closely linked metrics like KS, Lift and Gain. These metrics are generally used when the objective is to target a few out of many, and they also help us estimate the approximate new response rate if the targeting were revised as per the model. In this article, we will take a look at a few more evaluation metrics for classification problems.
For this entire discussion of classification model evaluation metrics, I have used my predictions for the BCI challenge from Kaggle (link). The solution of the problem is irrelevant for the discussion; only the final predictions on the training set have been used for this article. The predictions made for this problem were probability outputs, which have been converted to class outputs assuming a threshold of 0.5.
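The thresholding step can be sketched in a few lines of Python. The probability values below are made-up illustrative numbers, not the actual BCI challenge predictions:

```python
# Convert probability outputs to class outputs using a 0.5 threshold.
# These scores are made-up illustrative values, not the BCI predictions.
probs = [0.91, 0.42, 0.63, 0.08, 0.50]

threshold = 0.5
classes = [1 if p >= threshold else 0 for p in probs]
print(classes)  # [1, 0, 1, 0, 1]
```

Whether a probability exactly at the threshold maps to class 1 or class 0 is a convention; here ties go to class 1.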
This is again one of the most popular metrics used in the industry. The biggest advantage of the ROC curve is that it is independent of changes in the proportion of responders. This statement will become clearer in the following sections.
Let’s first try to understand what the ROC (Receiver Operating Characteristic) curve is. If we look at the confusion matrix below, we observe that for a probabilistic model, we get a different value for each metric at each threshold.
Hence, for each sensitivity, we get a different specificity. The two vary as follows:
The ROC curve is the plot between sensitivity and (1 - specificity). (1 - specificity) is also known as the False Positive Rate, and sensitivity is also known as the True Positive Rate. Following is the ROC curve for the case in hand.
Let’s take an example of threshold = 0.5 (refer to the last article for details). Here is the confusion matrix:
As you can see, the sensitivity at this threshold is 99.6% and the (1 - specificity) is ~60%. This coordinate becomes one point on our ROC curve. To bring this curve down to a single number, we find the area under the curve (AUC). Note that the area of the entire square is 1*1 = 1, so the AUC is simply the ratio of the area under the curve to the total area. For the case in hand, we get an AUC-ROC of 96.4%. Following are a few thumb rules:
We see that the current model falls in the excellent band. But this might simply be over-fitting, so in such cases it becomes very important to do in-time and out-of-time validations.
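To make the curve and its area concrete, here is a minimal self-contained sketch (on made-up labels and scores, not the article's BCI data) that traces ROC points by sweeping the threshold and integrates the area with the trapezoidal rule:

```python
# Made-up labels and scores for illustration, not the BCI predictions.
y_true  = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.40, 0.35, 0.20, 0.10]

def roc_point(y, s, thr):
    # Columnar confusion-matrix calculation at one threshold.
    tp = sum(1 for yi, si in zip(y, s) if yi == 1 and si >= thr)
    fn = sum(1 for yi, si in zip(y, s) if yi == 1 and si < thr)
    fp = sum(1 for yi, si in zip(y, s) if yi == 0 and si >= thr)
    tn = sum(1 for yi, si in zip(y, s) if yi == 0 and si < thr)
    sensitivity = tp / (tp + fn)       # True Positive Rate
    one_minus_spec = fp / (fp + tn)    # False Positive Rate
    return one_minus_spec, sensitivity

# Sweep thresholds from high to low so the curve runs (0,0) -> (1,1).
thresholds = sorted(set(y_score), reverse=True) + [0.0]
points = [(0.0, 0.0)] + [roc_point(y_true, y_score, t) for t in thresholds]

# Trapezoidal area under the curve.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(auc)  # 0.84 for this toy data
```

In practice a library routine such as scikit-learn's `roc_curve` / `roc_auc_score` would be used; the sketch just makes the columnar confusion-matrix calculation explicit.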
Note: a model which gives class as output will be represented as a single point on the ROC plot. Such models cannot be compared with each other, as the judgement needs to be taken on a single metric and not using multiple metrics. For instance, a model with parameters (0.2, 0.8) and a model with parameters (0.8, 0.2) could be coming out of the same underlying model, hence these metrics should not be directly compared. In case of a probabilistic model, we were fortunate enough to get a single number, AUC-ROC. But we still need to look at the entire curve to make conclusive decisions: it is also possible that one model performs better in some region of the curve and another performs better in a different region.
Advantages of using ROC over other metrics like the Lift curve:
Lift depends on the total response rate of the population. Hence, if the response rate of the population changes, the same model will give a different lift chart. A solution to this concern could be a true lift chart (finding the ratio of the lift to the perfect-model lift at each decile), but such a ratio rarely makes sense to the business. The ROC curve, on the other hand, is almost independent of the response rate. This is because its two axes come from columnar calculations of the confusion matrix: the numerator and denominator of both the x and y axes change on a similar scale when the response rate shifts.
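This invariance can be checked on a toy example: doubling every responder doubles the response rate but leaves the (sensitivity, 1 - specificity) point untouched, since each rate is computed within a single column of the confusion matrix. The data below are made up for illustration:

```python
# Made-up labels and scores for illustration.
y_true  = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]

def tpr_fpr(y, s, thr=0.5):
    # Sensitivity and (1 - specificity) at a single threshold.
    tp = sum(yi == 1 and si >= thr for yi, si in zip(y, s))
    fn = sum(yi == 1 and si < thr for yi, si in zip(y, s))
    fp = sum(yi == 0 and si >= thr for yi, si in zip(y, s))
    tn = sum(yi == 0 and si < thr for yi, si in zip(y, s))
    return tp / (tp + fn), fp / (fp + tn)

# Duplicate every responder: response rate goes from 3/8 to 6/11.
y2 = y_true + [yi for yi in y_true if yi == 1]
s2 = y_score + [si for yi, si in zip(y_true, y_score) if yi == 1]

print(tpr_fpr(y_true, y_score))  # both lines print the same point
print(tpr_fpr(y2, s2))
```

A lift chart built on the same two samples would change, because lift mixes the two columns of the confusion matrix through the overall response rate.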
The Gini coefficient is sometimes used in classification problems. It can be derived straight away from the AUC-ROC number: Gini is nothing but the ratio of the area between the ROC curve and the diagonal line to the area of the upper triangle. Following is the formula used:
Gini = 2*AUC – 1
A Gini above 60% indicates a good model. For the case in hand, we get a Gini of 92.7%.
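As a quick sketch of the formula (using the article's rounded AUC of 96.4%, which is why the result differs marginally from the 92.7% quoted above, presumably computed from the unrounded AUC):

```python
# Gini follows directly from AUC-ROC: Gini = 2*AUC - 1
auc = 0.964            # rounded AUC-ROC reported above
gini = 2 * auc - 1
print(round(gini, 3))  # 0.928
```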
This is again one of the most important metrics for any classification prediction problem. To understand it, let’s assume we have 3 students who each have some likelihood of passing this year. Following are our predictions:
A – 0.9
B – 0.5
C – 0.3
Now, if we were to fetch pairs of two from these three students, how many pairs would we have? We would have 3 pairs: AB, BC, CA. After the year ends, we see that A and C passed while B failed. Next, we choose all the pairs where one member is a responder and the other a non-responder. How many such pairs do we have? We have two: AB and BC. For each of these 2 pairs, a concordant pair is one where the probability of the responder is higher than that of the non-responder, whereas a discordant pair is one where the vice-versa holds true. In case both probabilities are equal, we say it’s a tie. Let’s see what happens in our case:
AB – Concordant
BC – Discordant
Hence, we have 50% concordant cases in this example. A concordance ratio of more than 60% is considered a good model. This metric is generally not used when deciding how many customers to target; it is primarily used to assess the model’s predictive power. Decisions like how many to target are again taken by KS / Lift charts.
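The pair-counting above can be sketched directly on the three-student example (A and C passed, B failed):

```python
# Count concordant, discordant, and tied responder/non-responder pairs
# for the three-student example in the text.
from itertools import product

preds  = {"A": 0.9, "B": 0.5, "C": 0.3}
passed = {"A": 1, "B": 0, "C": 1}

responders     = [preds[k] for k in preds if passed[k] == 1]
non_responders = [preds[k] for k in preds if passed[k] == 0]

concordant = discordant = tied = 0
for p_resp, p_non in product(responders, non_responders):
    if p_resp > p_non:
        concordant += 1
    elif p_resp < p_non:
        discordant += 1
    else:
        tied += 1

total = concordant + discordant + tied
print(concordant / total)  # 0.5 -> 50% concordant pairs, as in the text
```

Only responder/non-responder pairs are formed (AB and BC here); the responder-responder pair CA is never counted, which is why `product` over the two groups gives exactly the pairs discussed above.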
This article completes the list of commonly used performance metrics for classification models. In the next few articles, we will also talk about performance metrics for regression models.
Did you find the article useful? Which metrics do you prefer and why? Do let us know your thoughts about this article in the box below.
Great article Tavish. I have just one question: in the case of Concordance-Discordance, if we have pairs of probabilities like 0.48-0.50, and suppose we get 80% concordance, then it won't be a very good model, because the probabilities are too close to each other.
Good question. If the probabilities are very close, the concordance ratio might just be noise, so you definitely need to do in-time and out-of-time validation. If the concordance ratio stays the same, our model can differentiate between observations well. So, if rank ordering is the only parameter you are looking at, the model is doing well. Tavish