Artificial Intelligence and Machine Learning
Problem Statement¶
Context¶
Service businesses such as banks have to worry about customer churn, i.e. customers leaving for another service provider. It is important to understand which aspects of the service influence a customer's decision to leave, so that management can focus its improvement efforts on those priorities.
Objective¶
As a data scientist with the bank, you need to build a neural-network-based classifier that predicts whether a customer will leave the bank in the next 6 months.
Data Dictionary¶
- CustomerId: Unique ID assigned to each customer
- Surname: Last name of the customer
- CreditScore: Credit score summarizing the customer's credit history
- Geography: The customer's location
- Gender: Gender of the customer
- Age: Age of the customer
- Tenure: Number of years the customer has been with the bank
- NumOfProducts: Number of products the customer has purchased through the bank
- Balance: Account balance
- HasCrCard: Categorical variable indicating whether the customer has a credit card
- EstimatedSalary: Estimated salary
- IsActiveMember: Categorical variable indicating whether the customer is an active member of the bank (active in the sense of using bank products regularly, making transactions, etc.)
- Exited: Whether the customer left the bank within six months. It takes two values:
  - 0 = No (customer did not leave the bank)
  - 1 = Yes (customer left the bank)
Importing necessary libraries¶
# Installing the libraries with the specified version.
!pip install tensorflow==2.15.0 scikit-learn==1.2.2 seaborn==0.13.1 matplotlib==3.7.1 numpy==1.25.2 pandas==2.0.3 imbalanced-learn==0.10.1 -q --user
Note: After running the above cell, please restart the notebook kernel/runtime (depending on whether you're using Jupyter Notebook or Google Colab) and then sequentially run all cells from the one below.
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Library to split data
from sklearn.model_selection import train_test_split
# library to import to standardize the data
from sklearn.preprocessing import StandardScaler, LabelEncoder
# importing different functions to build models
import tensorflow as tf
from tensorflow import keras
from keras import backend
from keras.models import Sequential
from keras.layers import Dense, Dropout
# importing SMOTE
from imblearn.over_sampling import SMOTE
# importing metrics
from sklearn.metrics import confusion_matrix,roc_curve,classification_report,recall_score
from sklearn.metrics import precision_recall_curve
import random
import time
from sklearn.metrics import roc_curve, auc
# Library to avoid the warnings
import warnings
warnings.filterwarnings("ignore")
# Set the seed using keras.utils.set_random_seed. This will set:
# 1) `numpy` seed
# 2) backend random seed
# 3) `python` random seed
tf.keras.utils.set_random_seed(812)
# If using TensorFlow, this will make GPU ops as deterministic as possible,
# but it will affect the overall performance, so be mindful of that.
tf.config.experimental.enable_op_determinism()
Loading the dataset¶
# Loading dataset from my Google Drive same way I did on my previous Projects
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Reading the dataset¶
# Inputting the file path from my Google Drive to where the bank churn data set (mod4_wk3_bank-1.csv) is located
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/mod4_wk3_bank-1.csv')
# Making a copy of the dataframe to avoid making any changes to the original dataset.
data = df.copy()
Understanding the structure of the data¶
# let's view the first 5 rows of the data
data.head()
| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 |
- Observations:
- From a quick view of the first 5 rows, I can see that we can probably remove RowNumber, CustomerId and Surname during data preparation, as they will not be useful for model building.
- I do see some zero account balances. I will leave those for now and decide once I dig a little deeper into the data.
# let's view the last 5 rows of the data
data.tail()
| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9995 | 9996 | 15606229 | Obijiaku | 771 | France | Male | 39 | 5 | 0.00 | 2 | 1 | 0 | 96270.64 | 0 |
| 9996 | 9997 | 15569892 | Johnstone | 516 | France | Male | 35 | 10 | 57369.61 | 1 | 1 | 1 | 101699.77 | 0 |
| 9997 | 9998 | 15584532 | Liu | 709 | France | Female | 36 | 7 | 0.00 | 1 | 0 | 1 | 42085.58 | 1 |
| 9998 | 9999 | 15682355 | Sabbatini | 772 | Germany | Male | 42 | 3 | 75075.31 | 2 | 1 | 0 | 92888.52 | 1 |
| 9999 | 10000 | 15628319 | Walker | 792 | France | Female | 28 | 4 | 130142.79 | 1 | 1 | 0 | 38190.78 | 0 |
- Observations:
- From a quick view of the last 5 rows of data it looks like there are about 10,000 rows and from above, there are 14 columns.
- I see more zero account balances as well.
Understand the shape of the dataset¶
# Checking the number of rows and columns in the data
data.shape
(10000, 14)
- Observations:
- As confirmed by the data.shape output above, there are 10000 rows and 14 columns.
Check the data types of the columns for the dataset¶
# data.info() lists each column's name, non-null count and data type,
# and also reports the memory usage of the dataframe.
# (data.dtypes alone would only list each column's data type.)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   RowNumber        10000 non-null  int64
 1   CustomerId       10000 non-null  int64
 2   Surname          10000 non-null  object
 3   CreditScore      10000 non-null  int64
 4   Geography        10000 non-null  object
 5   Gender           10000 non-null  object
 6   Age              10000 non-null  int64
 7   Tenure           10000 non-null  int64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64
 10  HasCrCard        10000 non-null  int64
 11  IsActiveMember   10000 non-null  int64
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB
- Observations:
- There are 14 columns and 10000 total rows ranging from index 0 to 9999.
- There is no missing data as indicated by the full 10000 non-null values across the columns.
- There are 11 numeric columns and 3 text or object type.
- The associated memory usage of this data frame is 1.1+ MB. This may come down a bit once we begin data preparation for modeling.
Checking the Statistical Summary¶
data.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| RowNumber | 10000.0 | 5.000500e+03 | 2886.895680 | 1.00 | 2500.75 | 5.000500e+03 | 7.500250e+03 | 10000.00 |
| CustomerId | 10000.0 | 1.569094e+07 | 71936.186123 | 15565701.00 | 15628528.25 | 1.569074e+07 | 1.575323e+07 | 15815690.00 |
| CreditScore | 10000.0 | 6.505288e+02 | 96.653299 | 350.00 | 584.00 | 6.520000e+02 | 7.180000e+02 | 850.00 |
| Age | 10000.0 | 3.892180e+01 | 10.487806 | 18.00 | 32.00 | 3.700000e+01 | 4.400000e+01 | 92.00 |
| Tenure | 10000.0 | 5.012800e+00 | 2.892174 | 0.00 | 3.00 | 5.000000e+00 | 7.000000e+00 | 10.00 |
| Balance | 10000.0 | 7.648589e+04 | 62397.405202 | 0.00 | 0.00 | 9.719854e+04 | 1.276442e+05 | 250898.09 |
| NumOfProducts | 10000.0 | 1.530200e+00 | 0.581654 | 1.00 | 1.00 | 1.000000e+00 | 2.000000e+00 | 4.00 |
| HasCrCard | 10000.0 | 7.055000e-01 | 0.455840 | 0.00 | 0.00 | 1.000000e+00 | 1.000000e+00 | 1.00 |
| IsActiveMember | 10000.0 | 5.151000e-01 | 0.499797 | 0.00 | 0.00 | 1.000000e+00 | 1.000000e+00 | 1.00 |
| EstimatedSalary | 10000.0 | 1.000902e+05 | 57510.492818 | 11.58 | 51002.11 | 1.001939e+05 | 1.493882e+05 | 199992.48 |
| Exited | 10000.0 | 2.037000e-01 | 0.402769 | 0.00 | 0.00 | 0.000000e+00 | 0.000000e+00 | 1.00 |
- Observations:
- Customer ages range from 18 to 92, so the oldest customer is 92.
- In reviewing the rest of the numerical data, I don't see any items that directly stand out.
Checking for duplicates and null values¶
# Let's check for duplicate values in the data
data.duplicated().sum()
0
# Let's check for missing values in the data
round(data.isnull().sum() / data.isnull().count() * 100, 2)
| | 0 |
|---|---|
| RowNumber | 0.0 |
| CustomerId | 0.0 |
| Surname | 0.0 |
| CreditScore | 0.0 |
| Geography | 0.0 |
| Gender | 0.0 |
| Age | 0.0 |
| Tenure | 0.0 |
| Balance | 0.0 |
| NumOfProducts | 0.0 |
| HasCrCard | 0.0 |
| IsActiveMember | 0.0 |
| EstimatedSalary | 0.0 |
| Exited | 0.0 |
- Observations:
- Based on the above review, there are no duplicated rows and there are no null values. All rows/columns are filled in.
Let's check the count of each unique category in each of the categorical variables¶
# Making a list of all categorical variables and assign to cat_col
cat_col = list(data.select_dtypes("object").columns)
# Printing number of count of each unique value in each column
for column in cat_col:
    print("Unique values in", column, "are :")
    print(data[column].value_counts())
    print("-" * 50)
Unique values in Surname are :
Surname
Smith 32
Martin 29
Scott 29
Walker 28
Brown 26
..
Wells 1
Calzada 1
Gresswell 1
Aguirre 1
Morales 1
Name: count, Length: 2932, dtype: int64
--------------------------------------------------
Unique values in Geography are :
Geography
France 5014
Germany 2509
Spain 2477
Name: count, dtype: int64
--------------------------------------------------
Unique values in Gender are :
Gender
Male 5457
Female 4543
Name: count, dtype: int64
--------------------------------------------------
-
Observations:
-
Based on the above, I will probably remove the surname column as it will not add value during model building.
-
Gender is fairly evenly split.
-
France has the greatest number of customers.
Let's check the number of unique values in each column¶
# Let's check the number of unique values in each column
data.nunique()
| | 0 |
|---|---|
| RowNumber | 10000 |
| CustomerId | 10000 |
| Surname | 2932 |
| CreditScore | 460 |
| Geography | 3 |
| Gender | 2 |
| Age | 70 |
| Tenure | 11 |
| Balance | 6382 |
| NumOfProducts | 4 |
| HasCrCard | 2 |
| IsActiveMember | 2 |
| EstimatedSalary | 9999 |
| Exited | 2 |
-
Observations:
-
Based on the above, I don't see any particular item that stands out in the unique values across the columns.
Checking percentages of our Target Variable¶
data["Exited"].value_counts(1)
| Exited | proportion |
|---|---|
| 0 | 0.7963 |
| 1 | 0.2037 |
-
Observations:
-
Based on the above, it looks like ~80% of customers have not left the bank for another bank and ~20% have left.
Dropping columns Surname, RowNumber, and CustomerID¶
# RowNumber and CustomerId are unique identifiers and Surname is just a name,
# so none of them carries predictive signal; dropping all three
data = data.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)
# Let's verify there are only 11 columns now in our dataset
data.shape
(10000, 11)
# Let's verify the 'RowNumber', 'CustomerId', and 'Surname' columns have in fact been dropped from our dataset.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   CreditScore      10000 non-null  int64
 1   Geography        10000 non-null  object
 2   Gender           10000 non-null  object
 3   Age              10000 non-null  int64
 4   Tenure           10000 non-null  int64
 5   Balance          10000 non-null  float64
 6   NumOfProducts    10000 non-null  int64
 7   HasCrCard        10000 non-null  int64
 8   IsActiveMember   10000 non-null  int64
 9   EstimatedSalary  10000 non-null  float64
 10  Exited           10000 non-null  int64
dtypes: float64(2), int64(7), object(2)
memory usage: 859.5+ KB
-
Observations:
-
As expected there are only 11 columns now and the 3 columns were in fact dropped from our dataset.
Exploratory Data Analysis¶
The below functions need to be defined to carry out the Exploratory Data Analysis.
def histogram_boxplot(data, feature, figsize=(15, 10), kde=False, bins=None):
    """
    Boxplot and histogram combined
    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,10))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid = 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},  # 25% / 75% split for boxplot / histogram
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram with a dashed green line
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram with a solid black line
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # x-coordinate of the center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the count or percentage above the bar
    plt.show()  # show the plot
Univariate Analysis¶
Observations on Geography¶
# Calling the above function to create a barplot for 'Geography'
labeled_barplot(data, "Geography", perc=True)
- Observations:
- Half of the customers are from France. This may indicate that there are more bank branches in France, but branch data is not provided.
Observations on Gender¶
# Calling the above function to create a barplot for 'Gender'
labeled_barplot(data, "Gender", perc=True)
- Observations:
- Gender is fairly evenly distributed: 54.6% of customers are male and 45.4% are female, a gap of about 9 percentage points.
Observations on Number of Products¶
# Calling the above function to create a barplot for 'NumOfProducts'
labeled_barplot(data, "NumOfProducts", perc=True)
- Observations:
- Slightly over half of the customers (50.8%) have only 1 bank product, closely followed by those with 2 products (45.9%).
- Those with 3 and 4 bank products are a distant third and fourth at 2.7% and 0.6% respectively.
Observations on Has Credit Card¶
# Calling the above function to create a barplot for 'HasCrCard'
labeled_barplot(data, "HasCrCard", perc=True)
-
Observations:
-
The majority of customers do hold a credit card at 70.5% which is probably expected and typical.
Observations on Is Active Member¶
# Calling the above function to create a barplot for 'IsActiveMember'
labeled_barplot(data, "IsActiveMember", perc=True)
-
Observations:
-
Interestingly, Is Active Member is fairly evenly distributed. I would expect more customers to be active. This may be something we can look further in business recommendations.
Observations on Exited¶
# Calling the above function to create a barplot for 'Exited'
labeled_barplot(data, "Exited", perc=True)
- Observations:
- As expected, the majority of customers have not exited the bank for another bank within the past 6 months.
- 79.6% of customers have remained with the bank.
- The data is imbalanced, as nearly 80% of customers have stayed with the bank vs. 20.4% who have left.
Observations on Credit Score¶
# Calling the above function to create a histogram/boxplot for 'CreditScore'
histogram_boxplot(data, "CreditScore", kde=True)
- Observations:
- Credit Score has a roughly symmetric distribution; its mean (~650.5) and median (652) are very close.
- There is a slight tick up at the maximum credit score of 850. I'll have to compare CreditScore to EstimatedSalary and check the correlation later.
Observations on Age¶
# Calling the above function to create a histogram/boxplot for 'Age'
histogram_boxplot(data, "Age", kde=True)
- Observations:
- Age is slightly right skewed with a median age of 37.
- There is a slight tick up at around 38-39 in age which does not seem abnormal.
- There do seem to be some outliers beyond the upper whisker of the boxplot, i.e. above the 75th percentile plus 1.5 times the IQR.
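The upper-whisker limit mentioned above can be worked out directly from the quartiles in the statistical summary. This is a minimal sketch, assuming the default boxplot rule of 1.5 times the IQR; the helper name `iqr_fences` is my own and not part of the notebook.

```python
def iqr_fences(q1, q3, whis=1.5):
    """Return the (lower, upper) limits beyond which points count as outliers."""
    iqr = q3 - q1  # interquartile range
    return q1 - whis * iqr, q3 + whis * iqr


# Age quartiles taken from the describe() output: 25% = 32, 75% = 44
lower, upper = iqr_fences(q1=32.0, q3=44.0)
print(lower, upper)  # 14.0 62.0
```

Under that assumption, customers older than 62 are drawn as outliers, and since the minimum age is 18 there are no outliers on the low side.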
Observations on Tenure¶
# Calling the above function to create a histogram/boxplot for 'Tenure'
histogram_boxplot(data, "Tenure", kde=True)
- Observations:
- Tenure is fairly evenly distributed for customers with around 1 to 9 years at the bank (the full range is 0 to 10 years).
Observations on Balance¶
# Calling the above function to create a histogram/boxplot for 'Balance'
histogram_boxplot(data, "Balance", kde=True)
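- Observations:
- Balance is dominated by a large number of customers with zero balances (the 25th percentile is 0). Because the mean (~76K) sits below the median (~97K), it is not right skewed in the usual sense.
- Customers with large balances taper off around 200K.
- Most customers with a non-zero balance have balances ranging from ~50K to ~175K.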
Observations on Estimated Salary¶
# Calling the above function to create a histogram/boxplot for 'EstimatedSalary'
histogram_boxplot(data, "EstimatedSalary", kde=True)
- Observations:
- Estimated Salary is fairly evenly distributed between about 0 and 200K.
Bivariate Analysis - (Target: Exited)¶
The below functions need to be defined to carry out the Exploratory Data Analysis.
### function to plot distributions wrt target
def distribution_plot_wrt_target(data, predictor, target):
    fig, axs = plt.subplots(2, 2, figsize=(12, 10))
    target_uniq = data[target].unique()
    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )
    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )
    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")
    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )
    plt.tight_layout()
    plt.show()
### function for a stacked barplot chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
    plt.legend(
        loc="lower left", frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
Correlation Matrix (For Numeric Features)¶
# Plot a Correlation Matrix to help identify relationships between numeric variables.
# Select only numerical features for correlation analysis
numeric_data = data.select_dtypes(include=['number'])
# Compute and plot the correlation matrix
plt.figure(figsize=(10, 6))
sns.heatmap(numeric_data.corr(), annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Matrix of Numerical Features")
plt.show()
- Observations:
- Based on the above correlation matrix, I only see a couple of correlations that I may want to look at in more detail below. These are Age and Exited with a .29 correlation, and NumOfProducts and Balance at .3, which would seem to make sense.
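Rather than reading the strongest correlations off the heatmap by eye, they can also be listed programmatically. The sketch below runs on a small made-up dataframe (`toy` and its columns are purely illustrative), and `top_correlations` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
import numpy as np
import pandas as pd


def top_correlations(df, k=3):
    """Return the k column pairs with the largest absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)
    # Keep each pair only once by masking everything except the strict upper triangle
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    # Sort by magnitude so strong negative correlations are not missed
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(k)


# Illustrative data only: b is exactly 2 * a, c tends to fall as a rises
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 3, 4, 1, 2]})
result = top_correlations(toy, k=1)
print(result)
```

On the real data, passing `numeric_data` in place of `toy` should surface the same pairs noted above, with their signs.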
Let's check some variables against our target.¶
# Calling the above function 'stacked_barplot' for 'Gender' and our target 'Exited'
stacked_barplot(data, "Gender", "Exited")
Exited     0     1    All
Gender
All     7963  2037  10000
Female  3404  1139   4543
Male    4559   898   5457
------------------------------------------------------------------------------------------------------------------------
# Changing to CountPlots for ease of reading. 'Gender' and our target 'Exited'
sns.countplot(x='Gender', hue='Exited', data=data)
<Axes: xlabel='Gender', ylabel='count'>
- Observations:
- Based on the counts above, about 25% of female customers (1139 of 4543) and about 16% of male customers (898 of 5457) have left the bank over the past 6 months.
- On a percentage basis, more females have left the bank than their male counterparts. We may want to include some business recommendations below to promote more programs that would be of interest to women.
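The per-gender churn percentages follow directly from the crosstab counts printed above. A minimal sketch of that arithmetic is below; the `counts` frame and the `stayed` / `left` labels are just my own re-entry of the printed numbers.

```python
import pandas as pd

# Counts copied from the Gender x Exited crosstab printed above
counts = pd.DataFrame(
    {"stayed": [3404, 4559], "left": [1139, 898]},
    index=["Female", "Male"],
)

# Churn rate per group = customers who left / all customers in that group
churn_rate = counts["left"] / counts.sum(axis=1)
print(churn_rate.round(3))
```

On the full dataframe, `pd.crosstab(data["Gender"], data["Exited"], normalize="index")` gives the same rates in one step, which is what `stacked_barplot` plots.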
# Changing to CountPlots for ease of reading. 'Geography' and our target 'Exited'
sns.countplot(x='Geography', hue='Exited', data=data)
<Axes: xlabel='Geography', ylabel='count'>
- Observations:
- Based on the above CountPlot, Germany has a higher percentage of customers who have recently left the bank.
- We may want to create some promotional programs targeting Germany.
- Most customers are in France, so France may have more bank branches. This may indicate a center of customer loyalty.
# Changing to CountPlots for ease of reading. 'HasCrCard' and our target 'Exited'
sns.countplot(x='HasCrCard', hue='Exited', data=data)
<Axes: xlabel='HasCrCard', ylabel='count'>
# Calling the above function 'stacked_barplot' for 'HasCrCard' and our target 'Exited'
stacked_barplot(data, "HasCrCard", "Exited")
Exited        0     1    All
HasCrCard
All        7963  2037  10000
1          5631  1424   7055
0          2332   613   2945
------------------------------------------------------------------------------------------------------------------------
- Observations:
- Based on the above counts, roughly 20% of customers have left the bank recently whether or not they hold a credit card (20.2% with a card vs. 20.8% without).
- The vast majority of customers have not left, regardless of whether or not they had a bank credit card.
# CountPlot for 'IsActiveMember' and our target 'Exited'
sns.countplot(x='IsActiveMember', hue='Exited', data=data)
<Axes: xlabel='IsActiveMember', ylabel='count'>
# Calling the above function 'stacked_barplot' for 'IsActiveMember' and our target 'Exited'
stacked_barplot(data, "IsActiveMember", "Exited")
Exited             0     1    All
IsActiveMember
All             7963  2037  10000
0               3547  1302   4849
1               4416   735   5151
------------------------------------------------------------------------------------------------------------------------
- Observations:
- Based on the above counts, only ~14% of customers deemed active members have left the bank (735 of 5151), versus ~27% of customers deemed not active (1302 of 4849).
# CountPlot for 'NumOfProducts' and our target 'Exited'
sns.countplot(x='NumOfProducts', hue='Exited', data=data)
<Axes: xlabel='NumOfProducts', ylabel='count'>
# Calling the above function 'stacked_barplot' for 'NumOfProducts' and our target 'Exited'
stacked_barplot(data, "NumOfProducts", "Exited")
Exited            0     1    All
NumOfProducts
All            7963  2037  10000
1              3675  1409   5084
2              4242   348   4590
3                46   220    266
4                 0    60     60
------------------------------------------------------------------------------------------------------------------------
- Observations:
- Interestingly, the sweet spot here seems to be customers who are using either 1 or 2 bank products, and the best are those using 2 products (about 8% of them left, vs. about 28% of those with 1 product).
- For customers using 3 different bank products, over 80% (220 of 266) have left the bank recently. We may want to look at this, as those customers may be confused by the bank's products.
- Lastly, 100% of customers using 4 bank products (60 of 60) left. This may in fact indicate confusion over products amongst the bank's customers. We may want to send a survey or create a reach-out program to support these customers.
# Calling the above function 'distribution_plot_wrt_target' for 'Age' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='Age', target='Exited')
-
Observations:
-
The distribution of customers who have left the bank recently is fairly evenly distributed, however for those customers that did not leave, it's right skewed showing a higher concentration of customers around 30-40 years of age.
# Calling the above function 'distribution_plot_wrt_target' for 'Balance' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='Balance', target='Exited')
-
Observations:
-
Customers who did not exit (Exited = 0) show a large spike at 0 balance and a fairly uniform spread beyond that.
-
Customers who exited (Exited = 1) also have a spike at 0, but the distribution is more concentrated in the mid-range (50k–150k).
-
On the bottom left boxplot, the median balance is slightly higher for customers who exited.
-
On the bottom right boxplot, you can see there is a wider account balance spread among customers who have exited.
# Calling the above function 'distribution_plot_wrt_target' for 'EstimatedSalary' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='EstimatedSalary', target='Exited')
-
Observations:
-
Both exited and retained customers have a fairly uniform distribution of salaries.
-
There's no strong pattern or peak — suggesting EstimatedSalary is likely not a strong predictor of customer churn by itself.
-
There are no significant differences in spread or outliers here.
# Calling the above function 'distribution_plot_wrt_target' for 'CreditScore' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='CreditScore', target='Exited')
-
Observations:
-
Both exited and retained customers have only a slight left skew to them with credit scores approximately in the 550-750 range.
-
The median credit score for both groups is very close at approximately 650, also possibly indicating that credit score alone is not a good predictor of customer churn.
# Calling the above function 'distribution_plot_wrt_target' for 'Tenure' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='Tenure', target='Exited')
-
Observations:
-
Both exited and retained customers have fairly even distribution.
-
The median tenure for both groups is very close at approximately 5 years, potentially indicating that tenure alone is also not a good predictor of customer churn.
Data Preprocessing¶
# Let's take another look at our data copy to ensure we did not make any errors above during EDA.
# Above we removed RowNumber, CustomerId, and Surname — not useful for modeling
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   CreditScore      10000 non-null  int64
 1   Geography        10000 non-null  object
 2   Gender           10000 non-null  object
 3   Age              10000 non-null  int64
 4   Tenure           10000 non-null  int64
 5   Balance          10000 non-null  float64
 6   NumOfProducts    10000 non-null  int64
 7   HasCrCard        10000 non-null  int64
 8   IsActiveMember   10000 non-null  int64
 9   EstimatedSalary  10000 non-null  float64
 10  Exited           10000 non-null  int64
dtypes: float64(2), int64(7), object(2)
memory usage: 859.5+ KB
Dummy Variable Creation¶
# Create dummy variables from all object (categorical) columns in 'data'
data = pd.get_dummies(
    data,
    columns=data.select_dtypes(include=["object"]).columns.tolist(),
    drop_first=True
)
# Convert all columns to float (useful for modeling)
data = data.astype(float)
# Preview the result
data.head()
| | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited | Geography_Germany | Geography_Spain | Gender_Male |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619.0 | 42.0 | 2.0 | 0.00 | 1.0 | 1.0 | 1.0 | 101348.88 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 608.0 | 41.0 | 1.0 | 83807.86 | 1.0 | 0.0 | 1.0 | 112542.58 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 502.0 | 42.0 | 8.0 | 159660.80 | 3.0 | 1.0 | 0.0 | 113931.57 | 1.0 | 0.0 | 0.0 | 0.0 |
| 3 | 699.0 | 39.0 | 1.0 | 0.00 | 2.0 | 0.0 | 0.0 | 93826.63 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 850.0 | 43.0 | 2.0 | 125510.82 | 1.0 | 1.0 | 1.0 | 79084.10 | 0.0 | 0.0 | 1.0 | 0.0 |
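The preview above has no Geography_France or Gender_Female column because `drop_first=True` drops the first category of each variable and treats it as the baseline. A small sketch on made-up rows (the `toy` frame is illustrative only) shows the effect:

```python
import pandas as pd

# Illustrative rows only; the categories mirror the Geography values in the dataset
toy = pd.DataFrame({"Geography": ["France", "Spain", "Germany", "France"]})

# Categories are ordered alphabetically, so France is dropped as the baseline
encoded = pd.get_dummies(toy, columns=["Geography"], drop_first=True)
print(list(encoded.columns))  # ['Geography_Germany', 'Geography_Spain']
print(encoded)
```

A row with zeros in both remaining columns is therefore a customer in France, which avoids carrying a redundant, perfectly collinear column into the model.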
# Just checking the shape of the data after encoding
data.shape
(10000, 12)
# Just checking the data columns after encoding
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   CreditScore        10000 non-null  float64
 1   Age                10000 non-null  float64
 2   Tenure             10000 non-null  float64
 3   Balance            10000 non-null  float64
 4   NumOfProducts      10000 non-null  float64
 5   HasCrCard          10000 non-null  float64
 6   IsActiveMember     10000 non-null  float64
 7   EstimatedSalary    10000 non-null  float64
 8   Exited             10000 non-null  float64
 9   Geography_Germany  10000 non-null  float64
 10  Geography_Spain    10000 non-null  float64
 11  Gender_Male        10000 non-null  float64
dtypes: float64(12)
memory usage: 937.6 KB
Train-validation-test Split¶
# Split Predictors (X) and Target (y)
X = data.drop(['Exited'],axis=1)
y = data['Exited'] # Exited
# Splitting the dataset into the Training and Testing set.
# Setting test size to 20%
X_large, X_test, y_large, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42,stratify=y,shuffle = True)
# Splitting the dataset further into the Training and Validation set.
# 64% training (80% of the above 80%)
# 16% validation (20% of the above 80%)
# 20% test (already set aside above)
X_train, X_val, y_train, y_val = train_test_split(X_large, y_large, test_size = 0.2, random_state = 42,stratify=y_large, shuffle = True)
print(X_train.shape, X_val.shape, X_test.shape)
(6400, 11) (1600, 11) (2000, 11)
print(y_train.shape, y_val.shape, y_test.shape)
(6400,) (1600,) (2000,)
-
Observations:
-
The data split as expected across training/validation/test and there are 11 features.
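The 64% / 16% / 20% breakdown comes from applying a 20% hold-out twice. The arithmetic can be sketched as below; `split_sizes` is a hypothetical helper of my own, and it uses simple rounding, which only approximates how `train_test_split` sizes the subsets when the fractions do not divide evenly.

```python
def split_sizes(n_rows, test_frac=0.2, val_frac=0.2):
    """Row counts after holding out a test set, then a validation set from the rest."""
    n_test = round(n_rows * test_frac)
    n_val = round((n_rows - n_test) * val_frac)  # taken from the remaining 80%
    n_train = n_rows - n_test - n_val
    return n_train, n_val, n_test


print(split_sizes(10000))  # (6400, 1600, 2000)
```

These match the shapes printed above.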
Data Normalization¶
Note: Since the numerical features are on different scales, we will standardize them to bring them onto the same scale. The scaler is fit on the training set only and then applied to the validation and test sets.
# Automatically identify columns with more than 2 unique values (i.e., exclude binary columns)
cols_list = [col for col in X_train.columns if X_train[col].nunique() > 2]
# Check which columns will be scaled
print("Columns to be scaled:", cols_list)
Columns to be scaled: ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']
# import StandardScaler
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train[cols_list] = sc.fit_transform(X_train[cols_list])
X_val[cols_list] = sc.transform(X_val[cols_list])
X_test[cols_list] = sc.transform(X_test[cols_list])
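To make the fit/transform distinction concrete, here is a small sketch of standardization in plain Python. It assumes the usual z-score definition with the population standard deviation, which is what `StandardScaler` uses by default. The function names and the age values are hypothetical and are not taken from the dataset.

```python
import statistics


def fit_scaler(train_values):
    """Learn the mean and population standard deviation from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)  # population std (ddof = 0)
    return mean, std


def transform(values, mean, std):
    """Apply previously learned statistics to any split."""
    return [(v - mean) / std for v in values]


# Hypothetical ages, for illustration only.
train_age = [25.0, 35.0, 45.0, 55.0]
val_age = [30.0, 60.0]

mean, std = fit_scaler(train_age)
train_scaled = transform(train_age, mean, std)
val_scaled = transform(val_age, mean, std)  # reuses the training statistics

print(train_scaled)
print(val_scaled)
```

The scaled training values have mean 0 and standard deviation 1 by construction. The validation values generally do not, and that is expected: they are measured on the training set's yardstick.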
Model Building¶
Model Evaluation Criterion¶
A model can make wrong predictions in the following ways:
- Predicting that a bank customer will leave and switch banks when they are in fact not going to (False Positive). F1 is a good choice when these errors also matter.
- Predicting that a bank customer will not leave when they are in fact going to (False Negative). Recall is a good choice for minimizing these false negatives.
Which case is more important?
- Both are actually important in our case.
- For predicting bank customer churn, it would make sense to use either Recall or potentially F1 as our model performance metric.
- F1 could be used to strike a balance: avoid spamming loyal customers with retention offers while still catching as many churners as possible.
- I have chosen Recall, however, to focus the model on reducing false negatives and identifying every customer who may be at risk of leaving, even if we sometimes "alert" falsely.
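The difference between the two metrics can be seen with a small worked example. The sketch below uses made-up confusion-matrix counts (they are hypothetical, not results from this notebook) for a cautious model and an eager model. Both reach the same F1, yet the eager one catches twice as many churners.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for 100 actual churners.
cautious = precision_recall_f1(tp=40, fp=10, fn=60)   # few alerts, mostly correct
eager = precision_recall_f1(tp=80, fp=120, fn=20)     # many alerts, many false alarms

for name, (p, r, f1) in (("cautious", cautious), ("eager", eager)):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

If a missed churner costs more than a wasted retention offer, the eager model is the better choice even though F1 cannot tell the two apart. That is the reasoning behind tracking Recall here.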
**As we are dealing with an imbalance in class distribution, the first 3 models use class weights so that the minority class receives proportionally more importance, and the last 3 models use SMOTE instead of class weights.**
# Calculate class weights for imbalanced dataset
# (np.bincount expects integer labels, so cast the float-typed target first)
cw = (y_train.shape[0]) / np.bincount(y_train.astype(int))
# Create a dictionary mapping class indices to their respective class weights
cw_dict = {}
for i in range(cw.shape[0]):
    cw_dict[i] = cw[i]
cw_dict
{0: 1.2558869701726845, 1: 4.9079754601226995}
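The weights above are simply the number of training rows divided by the number of rows in each class. A plain-Python sketch of the same calculation follows. The helper name `inverse_frequency_weights` is illustrative. The class counts 5,096 and 1,304 are the training-set supports that appear in the classification reports later in this notebook.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / class_count."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / counts[cls] for cls in sorted(counts)}


labels = [0] * 5096 + [1] * 1304
weights = inverse_frequency_weights(labels)
print(weights)

# With these weights each class contributes the same total weight (n_samples).
totals = {cls: w * labels.count(cls) for cls, w in weights.items()}
print(totals)
```

A useful consequence is that churners and non-churners carry equal total weight in the training loss, even though churners are only about a fifth of the rows.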
# defining the batch size and # epochs upfront as we'll be using the same values for all models
epochs = 25
batch_size = 64
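These two settings fix how many weight updates each model gets. As a rough sketch (the helper name is illustrative, and the 6,400-row training set comes from the split above):

```python
import math


def steps_per_epoch(n_rows, batch_size):
    """Number of mini-batches needed to see every row once."""
    return math.ceil(n_rows / batch_size)


train_steps = steps_per_epoch(6400, 64)
total_updates = train_steps * 25           # epochs = 25
predict_steps_train = steps_per_epoch(6400, 32)
predict_steps_val = steps_per_epoch(1600, 32)

print(train_steps, total_updates, predict_steps_train, predict_steps_val)
```

This is where the `100/100` in the training logs comes from. The `200/200` and `50/50` shown by `predict` are consistent with a prediction batch size of 32, which I believe is the Keras default when none is passed.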
- Creating a function for plotting the confusion matrix
def make_confusion_matrix(actual_targets, predicted_targets):
    """
    To plot the confusion_matrix with percentages
    actual_targets: actual target (dependent) variable values
    predicted_targets: predicted target (dependent) variable values
    """
    cm = confusion_matrix(actual_targets, predicted_targets)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(cm.shape[0], cm.shape[1])
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
- Creating two blank dataframes that will store the recall values for all the models we build, to track training and validation dataset performance.
train_metric_df = pd.DataFrame(columns=["recall"])
valid_metric_df = pd.DataFrame(columns=["recall"])
Model 0 - Neural Network with SGD Optimizer¶
- Let's start with a neural network consisting of
- two hidden layers with 14 and 7 neurons respectively
- activation function of ReLU.
- SGD as the optimizer
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
#Initializing the neural network
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dense(7,activation="relu"))
model.add(Dense(1,activation="sigmoid"))
optimizer = tf.keras.optimizers.SGD(0.001) # defining SGD as the optimizer to be used
metric = tf.keras.metrics.Recall()
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 14) 168
dense_1 (Dense) (None, 7) 105
dense_2 (Dense) (None, 1) 8
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
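The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output unit. A short sketch (the helper name is illustrative):

```python
def dense_param_counts(n_inputs, layer_units):
    """Parameters per Dense layer: (fan_in + 1 bias) * units."""
    counts = []
    fan_in = n_inputs
    for units in layer_units:
        counts.append((fan_in + 1) * units)
        fan_in = units  # this layer's outputs feed the next layer
    return counts


# 11 input features -> 14 -> 7 -> 1, as in the model above
counts = dense_param_counts(11, [14, 7, 1])
print(counts, sum(counts))
```

With only 281 trainable parameters against 6,400 training rows, this network is small, so underfitting is a more likely risk than overfitting.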
#Fitting the ANN
start = time.time()
history = model.fit(X_train, y_train,
validation_data=(X_val, y_val),
batch_size=batch_size,
epochs=epochs,
class_weight=cw_dict)
end = time.time()
Epoch 1/25
100/100 [==============================] - 1s 4ms/step - loss: 1.5586 - recall: 0.9931 - val_loss: 0.9099 - val_recall: 0.9908
Epoch 2/25
100/100 [==============================] - 0s 2ms/step - loss: 1.5196 - recall: 0.9808 - val_loss: 0.8709 - val_recall: 0.9663
Epoch 3/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4921 - recall: 0.9494 - val_loss: 0.8414 - val_recall: 0.9387
Epoch 4/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4718 - recall: 0.9110 - val_loss: 0.8182 - val_recall: 0.9080
Epoch 5/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4563 - recall: 0.8827 - val_loss: 0.7998 - val_recall: 0.8620
Epoch 6/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4443 - recall: 0.8374 - val_loss: 0.7849 - val_recall: 0.8190
Epoch 7/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4348 - recall: 0.7860 - val_loss: 0.7726 - val_recall: 0.7914
Epoch 8/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4271 - recall: 0.7469 - val_loss: 0.7622 - val_recall: 0.7577
Epoch 9/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4206 - recall: 0.7216 - val_loss: 0.7534 - val_recall: 0.7117
Epoch 10/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4153 - recall: 0.6771 - val_loss: 0.7459 - val_recall: 0.6779
Epoch 11/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4107 - recall: 0.6549 - val_loss: 0.7395 - val_recall: 0.6564
Epoch 12/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4068 - recall: 0.6273 - val_loss: 0.7339 - val_recall: 0.6350
Epoch 13/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4033 - recall: 0.6081 - val_loss: 0.7289 - val_recall: 0.6135
Epoch 14/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4003 - recall: 0.5867 - val_loss: 0.7245 - val_recall: 0.6043
Epoch 15/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3975 - recall: 0.5637 - val_loss: 0.7206 - val_recall: 0.5859
Epoch 16/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3950 - recall: 0.5445 - val_loss: 0.7172 - val_recall: 0.5706
Epoch 17/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3927 - recall: 0.5345 - val_loss: 0.7141 - val_recall: 0.5368
Epoch 18/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3905 - recall: 0.5169 - val_loss: 0.7114 - val_recall: 0.5153
Epoch 19/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3886 - recall: 0.5130 - val_loss: 0.7089 - val_recall: 0.5031
Epoch 20/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3867 - recall: 0.4962 - val_loss: 0.7067 - val_recall: 0.5000
Epoch 21/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3850 - recall: 0.4877 - val_loss: 0.7047 - val_recall: 0.4969
Epoch 22/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3834 - recall: 0.4923 - val_loss: 0.7029 - val_recall: 0.4877
Epoch 23/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3818 - recall: 0.4847 - val_loss: 0.7013 - val_recall: 0.4877
Epoch 24/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3802 - recall: 0.4801 - val_loss: 0.6998 - val_recall: 0.4908
Epoch 25/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3788 - recall: 0.4793 - val_loss: 0.6985 - val_recall: 0.4877
print("Time taken in seconds ",end-start)
Time taken in seconds 6.09728479385376
Loss Function
#Plotting Train Loss vs Validation Loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
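When reading this plot, note that the training loss sits at roughly twice the validation loss throughout. One likely explanation, stated here as an assumption about Keras rather than something verified in this notebook, is that `class_weight` scales only the training loss and that the weighted per-sample losses are averaged over the number of samples. The sketch below works through that arithmetic for a model that predicts 0.5 for every customer.

```python
import math


def mean_bce(y_true, p_pred, class_weight=None):
    """Mean binary cross-entropy, optionally scaling each sample by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        weight = class_weight[y] if class_weight else 1.0
        total += weight * loss
    return total / len(y_true)


# Same class counts and weights as the training set in this notebook.
y = [0] * 5096 + [1] * 1304
p = [0.5] * len(y)                       # an uninformative model
cw = {0: 6400 / 5096, 1: 6400 / 1304}

unweighted = mean_bce(y, p)              # what an unweighted validation loss would show
weighted = mean_bce(y, p, cw)            # what a class-weighted training loss would show
print(round(unweighted, 4), round(weighted, 4))
```

The two values are ln 2 and 2 ln 2, about 0.693 and 1.386. Model 0 ends at a validation loss of 0.6985 and a training loss of 1.3788, so under this assumption its final losses are close to those of an uninformative model. The two curves are therefore not directly comparable in absolute terms, and their shapes are more informative than the gap between them.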
Recall
#Plotting Train recall vs Validation recall
plt.plot(history.history['recall'])
plt.plot(history.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 1ms/step
array([[ True],
[ True],
[ True],
...,
[False],
[ True],
[ True]])
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
array([[False],
[ True],
[ True],
...,
[ True],
[False],
[False]])
model_name = "NN with SGD"
train_metric_df.loc[model_name] = recall_score(y_train, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)
Classification Report
#Classification report on training set
cr = classification_report(y_train, y_train_pred)
print(cr)
precision recall f1-score support
0.0 0.81 0.58 0.68 5096
1.0 0.23 0.48 0.31 1304
accuracy 0.56 6400
macro avg 0.52 0.53 0.49 6400
weighted avg 0.69 0.56 0.60 6400
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
precision recall f1-score support
0.0 0.81 0.56 0.66 1274
1.0 0.22 0.49 0.30 326
accuracy 0.55 1600
macro avg 0.52 0.52 0.48 1600
weighted avg 0.69 0.55 0.59 1600
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()
plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.
y_scores = model.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 4ms/step
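AUC has a convenient interpretation: it is the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it that way in plain Python on a handful of made-up scores. The function name and the data are hypothetical, and `roc_curve` / `auc` in the cell above remain the practical way to compute it.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 0, 1]
informative = [0.10, 0.30, 0.80, 0.70, 0.40, 0.35]   # churners mostly score higher
constant = [0.5] * len(y_true)                        # no ranking information at all

auc_informative = pairwise_auc(y_true, informative)
auc_constant = pairwise_auc(y_true, constant)
print(round(auc_informative, 3), auc_constant)
```

A model that gives everyone the same score lands at exactly 0.5, which is the dashed diagonal in the plot. An AUC near 0.5 therefore means the scores barely rank churners above non-churners, regardless of which threshold is used.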
Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.
That makes Recall for class 1.0 (churners) the most important metric to watch.
For the Validation Set:
- Class 0.0 (Stayed): Precision=0.81, Recall=0.56
- Class 1.0 (Exited): Precision=0.22, Recall=0.49
- An AUC of 0.52 signals that our current model is only slightly better than random guessing, which isn't strong enough for reliable churn prediction yet.
- The model correctly identifies 49% of churners, i.e. just under half. That could support targeted retention campaigns, but with a precision of 0.22 against a churn rate of about 0.20 in the validation set (326 of 1,600), the customers it flags are barely more likely to churn than average.
- While it incorrectly flags many customers who would stay, this recall level still lets proactive retention efforts, such as targeted offers or outreach campaigns, reach a subset of at-risk customers.
Confusion matrix
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train, y_train_pred)
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)
**Observations**
- Out of all customers that actually churned, we were able to identify 49% of them based on Recall. While our predictions also flagged some loyal customers incorrectly (false positives), this trade-off may be acceptable when the cost of losing a customer is high and retention efforts are relatively inexpensive.
- Recall (for churners) = TP / (TP + FN) = 159 / (159 + 167) ≈ 0.49
Model Performance Improvement¶
Model 1 - Neural Network with Adam Optimizer¶
- Now let's switch to a NN model using Adam Optimizer
- two hidden layers with 14 and 7 neurons respectively
- activation function of ReLU.
- Adam as the optimizer
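Both models use a learning rate of 0.001, but the two optimizers turn a gradient into a weight update differently. The sketch below compares the very first update for a single weight, using the textbook Adam formula with the default hyperparameters I believe Keras uses (`beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`). It is a simplified illustration, not the Keras implementation.

```python
import math


def sgd_step(grad, lr=0.001):
    """Plain SGD: the update is proportional to the gradient."""
    return -lr * grad


def adam_first_step(grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """First Adam update (t = 1) for one weight, with bias correction."""
    m = (1 - beta1) * grad          # running mean of gradients
    v = (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1)         # bias-corrected estimates at t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)


for g in (0.01, 1.0):
    print(g, sgd_step(g), adam_first_step(g))
```

For a small gradient of 0.01, SGD moves the weight by about 0.00001 while Adam moves it by about 0.001, because Adam normalizes by the gradient's recent magnitude. With only 2,500 updates available (100 steps for 25 epochs), that difference in effective step size is one plausible reason to expect Adam to make more progress than SGD here.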
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
#Initializing the neural network
model_1 = Sequential()
model_1.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model_1.add(Dense(7,activation="relu"))
model_1.add(Dense(1,activation="sigmoid"))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001) # defining Adam as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_1.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_1.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 14) 168
dense_1 (Dense) (None, 7) 105
dense_2 (Dense) (None, 1) 8
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
#Fitting the ANN
start = time.time()
history_1 = model_1.fit(X_train, y_train,
validation_data=(X_val, y_val),
batch_size=batch_size,
epochs=epochs,
class_weight=cw_dict)
end = time.time()
Epoch 1/25
100/100 [==============================] - 1s 5ms/step - loss: 1.4336 - recall: 0.6143 - val_loss: 0.6760 - val_recall: 0.6350
Epoch 2/25
100/100 [==============================] - 0s 2ms/step - loss: 1.2864 - recall: 0.6787 - val_loss: 0.6083 - val_recall: 0.6472
Epoch 3/25
100/100 [==============================] - 0s 2ms/step - loss: 1.2037 - recall: 0.6787 - val_loss: 0.6035 - val_recall: 0.7117
Epoch 4/25
100/100 [==============================] - 0s 2ms/step - loss: 1.1468 - recall: 0.7132 - val_loss: 0.5716 - val_recall: 0.6871
Epoch 5/25
100/100 [==============================] - 0s 2ms/step - loss: 1.1090 - recall: 0.7370 - val_loss: 0.5403 - val_recall: 0.6718
Epoch 6/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0787 - recall: 0.7339 - val_loss: 0.5337 - val_recall: 0.6963
Epoch 7/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0518 - recall: 0.7523 - val_loss: 0.5372 - val_recall: 0.7423
Epoch 8/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0294 - recall: 0.7592 - val_loss: 0.5281 - val_recall: 0.7454
Epoch 9/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0119 - recall: 0.7653 - val_loss: 0.5039 - val_recall: 0.7239
Epoch 10/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9950 - recall: 0.7554 - val_loss: 0.5424 - val_recall: 0.7914
Epoch 11/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9839 - recall: 0.7646 - val_loss: 0.5067 - val_recall: 0.7607
Epoch 12/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9791 - recall: 0.7699 - val_loss: 0.4945 - val_recall: 0.7515
Epoch 13/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9665 - recall: 0.7722 - val_loss: 0.4937 - val_recall: 0.7607
Epoch 14/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9583 - recall: 0.7753 - val_loss: 0.4710 - val_recall: 0.7423
Epoch 15/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9555 - recall: 0.7745 - val_loss: 0.4845 - val_recall: 0.7546
Epoch 16/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9493 - recall: 0.7715 - val_loss: 0.4823 - val_recall: 0.7607
Epoch 17/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9460 - recall: 0.7730 - val_loss: 0.4794 - val_recall: 0.7638
Epoch 18/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9416 - recall: 0.7730 - val_loss: 0.4883 - val_recall: 0.7638
Epoch 19/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9391 - recall: 0.7623 - val_loss: 0.4677 - val_recall: 0.7423
Epoch 20/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9376 - recall: 0.7707 - val_loss: 0.4922 - val_recall: 0.7791
Epoch 21/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9344 - recall: 0.7707 - val_loss: 0.4760 - val_recall: 0.7546
Epoch 22/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9327 - recall: 0.7722 - val_loss: 0.4631 - val_recall: 0.7454
Epoch 23/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9308 - recall: 0.7692 - val_loss: 0.4762 - val_recall: 0.7515
Epoch 24/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9294 - recall: 0.7730 - val_loss: 0.4682 - val_recall: 0.7393
Epoch 25/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9278 - recall: 0.7638 - val_loss: 0.4738 - val_recall: 0.7577
print("Time taken in seconds ",end-start)
Time taken in seconds 10.908873319625854
Loss Function
#Plotting Train Loss vs Validation Loss
plt.plot(history_1.history['loss'])
plt.plot(history_1.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Recall
#Plotting Train recall vs Validation recall
plt.plot(history_1.history['recall'])
plt.plot(history_1.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_1.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 2ms/step
array([[ True],
[False],
[False],
...,
[False],
[ True],
[False]])
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_1.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
array([[ True],
[False],
[False],
...,
[False],
[ True],
[ True]])
model_name = "NN with Adam"
train_metric_df.loc[model_name] = recall_score(y_train, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)
Classification Report
#Classification report on training set
cr = classification_report(y_train, y_train_pred)
print(cr)
precision recall f1-score support
0.0 0.93 0.78 0.85 5096
1.0 0.48 0.77 0.59 1304
accuracy 0.78 6400
macro avg 0.70 0.78 0.72 6400
weighted avg 0.84 0.78 0.80 6400
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
precision recall f1-score support
0.0 0.93 0.77 0.84 1274
1.0 0.46 0.76 0.57 326
accuracy 0.77 1600
macro avg 0.69 0.76 0.71 1600
weighted avg 0.83 0.77 0.79 1600
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()
plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.
y_scores = model_1.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 2ms/step
Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.
That makes Recall for class 1.0 (churners) the most important metric to watch.
This NN model using Adam Optimizer is showing greatly improved results for Recall.
For the Validation Set:
- Class 0.0 (Stayed): Precision=0.93, Recall=0.77
- Class 1.0 (Exited): Precision=0.46, Recall=0.76
- An AUC of 0.85 shows a great improvement in distinguishing between churners and non-churners.
- The model correctly identifies 76% of churners.
- This model now catches over 3/4 of the churners, which is useful if the business can follow up with targeted retention campaigns.
- Our model can currently identify ~76% of customers likely to churn. While it may incorrectly flag some customers who would stay, this recall level allows proactive retention efforts, such as targeted offers or outreach campaigns, to focus on a meaningful subset of at-risk customers.
Confusion matrix
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train, y_train_pred)
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)
**Observations**
- Out of all customers that actually churned, we were able to identify 76% of them based on Recall. While our predictions also flagged some loyal customers incorrectly (false positives), this trade-off may be acceptable when the cost of losing a customer is high and retention efforts are relatively inexpensive.
- Recall (for churners) = TP / (TP + FN) = 247 / (247 + 79) ≈ 0.76
Model 2 - Neural Network with Adam Optimizer and Dropout¶
- Now let's switch to a NN model using Adam Optimizer and Dropout
- four hidden layers with 32, 20, 14 and 7 neurons respectively (the first one receives the 11 input features)
- activation function of ReLU.
- Adam as the optimizer
- Dropout of 0.3, 0.2 and 0.1 after the first three hidden layers
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
# Initializing the neural network
model_2 = Sequential()
# First hidden layer (takes the input features) + Dropout
model_2.add(Dense(32, activation="relu", input_dim=X_train.shape[1]))
model_2.add(Dropout(0.3))
# Second hidden layer + Dropout
model_2.add(Dense(20, activation="relu"))
model_2.add(Dropout(0.2))
# Third hidden layer + Dropout
model_2.add(Dense(14, activation="relu"))
model_2.add(Dropout(0.1))
# Fourth hidden layer (no Dropout before the output layer)
model_2.add(Dense(7, activation="relu"))
# Output layer
model_2.add(Dense(1, activation="sigmoid"))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001) # defining Adam as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_2.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_2.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 384
dropout (Dropout) (None, 32) 0
dense_1 (Dense) (None, 20) 660
dropout_1 (Dropout) (None, 20) 0
dense_2 (Dense) (None, 14) 294
dropout_2 (Dropout) (None, 14) 0
dense_3 (Dense) (None, 7) 105
dense_4 (Dense) (None, 1) 8
=================================================================
Total params: 1451 (5.67 KB)
Trainable params: 1451 (5.67 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
#Fitting the ANN
start = time.time()
history_2 = model_2.fit(X_train, y_train,
validation_data=(X_val, y_val),
batch_size=batch_size,
epochs=epochs,
class_weight=cw_dict)
end = time.time()
Epoch 1/25
100/100 [==============================] - 1s 4ms/step - loss: 1.3649 - recall: 0.6035 - val_loss: 0.6372 - val_recall: 0.6135
Epoch 2/25
100/100 [==============================] - 0s 2ms/step - loss: 1.2804 - recall: 0.6411 - val_loss: 0.5708 - val_recall: 0.6779
Epoch 3/25
100/100 [==============================] - 0s 3ms/step - loss: 1.2196 - recall: 0.6580 - val_loss: 0.5737 - val_recall: 0.7546
Epoch 4/25
100/100 [==============================] - 0s 2ms/step - loss: 1.1805 - recall: 0.7094 - val_loss: 0.5532 - val_recall: 0.7485
Epoch 5/25
100/100 [==============================] - 0s 3ms/step - loss: 1.1639 - recall: 0.7170 - val_loss: 0.5247 - val_recall: 0.7393
Epoch 6/25
100/100 [==============================] - 0s 2ms/step - loss: 1.1294 - recall: 0.7224 - val_loss: 0.5184 - val_recall: 0.7669
Epoch 7/25
100/100 [==============================] - 0s 3ms/step - loss: 1.1152 - recall: 0.7423 - val_loss: 0.5332 - val_recall: 0.7945
Epoch 8/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0919 - recall: 0.7561 - val_loss: 0.5175 - val_recall: 0.7883
Epoch 9/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0805 - recall: 0.7538 - val_loss: 0.4933 - val_recall: 0.7699
Epoch 10/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0667 - recall: 0.7324 - val_loss: 0.5287 - val_recall: 0.8006
Epoch 11/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0685 - recall: 0.7477 - val_loss: 0.4924 - val_recall: 0.7669
Epoch 12/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0486 - recall: 0.7485 - val_loss: 0.5047 - val_recall: 0.7975
Epoch 13/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0420 - recall: 0.7646 - val_loss: 0.4861 - val_recall: 0.7638
Epoch 14/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0363 - recall: 0.7600 - val_loss: 0.4871 - val_recall: 0.7607
Epoch 15/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0268 - recall: 0.7607 - val_loss: 0.4928 - val_recall: 0.7761
Epoch 16/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0169 - recall: 0.7462 - val_loss: 0.4787 - val_recall: 0.7546
Epoch 17/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0088 - recall: 0.7569 - val_loss: 0.4826 - val_recall: 0.7607
Epoch 18/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0170 - recall: 0.7577 - val_loss: 0.5145 - val_recall: 0.8067
Epoch 19/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9953 - recall: 0.7653 - val_loss: 0.4841 - val_recall: 0.7730
Epoch 20/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0037 - recall: 0.7577 - val_loss: 0.4899 - val_recall: 0.7791
Epoch 21/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0067 - recall: 0.7600 - val_loss: 0.4893 - val_recall: 0.7761
Epoch 22/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9996 - recall: 0.7646 - val_loss: 0.4654 - val_recall: 0.7423
Epoch 23/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9929 - recall: 0.7569 - val_loss: 0.4830 - val_recall: 0.7515
Epoch 24/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9929 - recall: 0.7638 - val_loss: 0.4792 - val_recall: 0.7607
Epoch 25/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9862 - recall: 0.7569 - val_loss: 0.4979 - val_recall: 0.8037
print("Time taken in seconds ",end-start)
Time taken in seconds 11.215099096298218
Loss Function
#Plotting Train Loss vs Validation Loss
plt.plot(history_2.history['loss'])
plt.plot(history_2.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Recall
#Plotting Train recall vs Validation recall
plt.plot(history_2.history['recall'])
plt.plot(history_2.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
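Observations:
- Both train and validation recall rise over the first several epochs and then level off, indicating our model is learning how to detect churners.
- Validation recall is higher than training recall in most epochs. This is not necessarily a problem. Dropout is active only during training, so the training metric is measured on a deliberately handicapped network while the validation metric uses the full network. Keras also accumulates the training metric over the whole epoch while the weights are still changing, whereas the validation metric is computed once at the end of the epoch.
- The model is generalizing well and maintains high recall on unseen data, which is good for churn detection, where missing a churner is more costly than a false alarm.
- Validation recall fluctuates between roughly 0.74 and 0.81 from epoch 3 onward and ends at 0.80, which is strong.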
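The train/validation asymmetry caused by Dropout can be illustrated with a few lines of plain Python. This is a sketch of "inverted" dropout, the variant I understand Keras to use: during training, each activation is dropped with probability `rate` and the survivors are scaled up by `1 / (1 - rate)`, while at inference time the layer passes values through unchanged. The function name and the toy activations are illustrative.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: zero out units at random during training, identity otherwise."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    # Survivors are scaled by 1 / keep so the expected activation is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(2)
activations = [1.0] * 10

train_out = dropout(activations, rate=0.3, training=True, rng=rng)
infer_out = dropout(activations, rate=0.3, training=False, rng=rng)

print(train_out)   # a mix of 0.0 and about 1.43
print(infer_out)   # unchanged
```

During `fit`, the reported training recall is computed on the noisy outputs, while `val_recall` and later `predict` calls use the clean ones. That alone can make validation metrics look slightly better than training metrics.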
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_2.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 1ms/step
array([[ True],
[False],
[False],
...,
[False],
[ True],
[False]])
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_2.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
array([[ True],
[False],
[False],
...,
[False],
[ True],
[ True]])
model_name = "NN with Adam and Dropout"
train_metric_df.loc[model_name] = recall_score(y_train, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)
Classification Report
#Classification report on training set
cr = classification_report(y_train, y_train_pred)
print(cr)
precision recall f1-score support
0.0 0.94 0.75 0.83 5096
1.0 0.45 0.80 0.58 1304
accuracy 0.76 6400
macro avg 0.69 0.78 0.70 6400
weighted avg 0.84 0.76 0.78 6400
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
precision recall f1-score support
0.0 0.94 0.74 0.82 1274
1.0 0.44 0.80 0.57 326
accuracy 0.75 1600
macro avg 0.69 0.77 0.70 1600
weighted avg 0.83 0.75 0.77 1600
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()
plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.
y_scores = model_2.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 1ms/step
Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.
That makes Recall for class 1.0 (churners) the most important metric to watch.
This NN model using **Adam Optimizer and Dropout is showing improved results for Recall.**
For the Validation Set:
- Class 0.0 (Stayed): Precision=0.94, Recall=0.74
- Class 1.0 (Exited): Precision=0.44, Recall=0.80
- An AUC of 0.84 is a clear improvement over the original model (Model 0) in distinguishing between churners and non-churners, and is on par with Model 1 (0.85).
- The model correctly identifies 80% of churners.
- This model now catches well over 3/4 of the churners, which is useful if the business can follow up with targeted retention campaigns.
- Our model can currently identify ~80% of customers likely to churn. While it may incorrectly flag some customers who would stay, this recall level allows proactive retention efforts, such as targeted offers or outreach campaigns, to focus on a meaningful subset of at-risk customers.
Confusion matrix
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train, y_train_pred)
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)
**Observations**
- Out of all customers that actually churned, we were able to identify 80% of them based on Recall. While our predictions also flagged some loyal customers incorrectly (false positives), this trade-off may be acceptable when the cost of losing a customer is high and retention efforts are relatively inexpensive.
- Recall (for churners) = TP / (TP + FN) = 262 / (262 + 64) ≈ 0.80
- We're maintaining strong generalization and avoiding overfitting while maximizing churn recall.
Model 3 - Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer¶
- Now let's switch to a NN model with Balanced Data (by applying SMOTE) and SGD Optimizer
- one input layer with 32 neurons
- two hidden layers with 20 and 14 neurons respectively
- activation function of ReLU.
Let's apply SMOTE to balance this dataset and then apply hyperparameter tuning again.
sm = SMOTE(random_state=42)
# Fit SMOTE on the training data and create balanced versions
X_train_smote, y_train_smote = sm.fit_resample(X_train, y_train)
print('After UpSampling, the shape of train_X: {}'.format(X_train_smote.shape))
print('After UpSampling, the shape of train_y: {} \n'.format(y_train_smote.shape))
After UpSampling, the shape of train_X: (10192, 11)
After UpSampling, the shape of train_y: (10192,)
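To make the upsampling step less of a black box, here is a simplified sketch of the core SMOTE idea: a synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its minority-class neighbours. This is an illustration only, not imbalanced-learn's implementation (which picks the neighbour among the k nearest, 5 by default); the feature values and the helper name are hypothetical.

```python
import random


def smote_like_sample(x, neighbor, rng):
    """Create one synthetic point on the segment between x and a neighbor."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


rng = random.Random(42)
churner_a = [0.2, 1.5]  # hypothetical scaled features of one churner
churner_b = [0.6, 0.5]  # hypothetical nearby churner
synthetic = smote_like_sample(churner_a, churner_b, rng)
print(synthetic)

# With the default sampling strategy, the minority class is grown to the
# majority-class count, so the balanced set has twice the majority count.
majority_count = 5096
print(2 * majority_count)
```

The second print matches the 10192 rows reported after upsampling, i.e. 5096 rows per class.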
Let's build a model with the balanced dataset
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
# Fixing the seed for random number generators so that we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
# Initializing the neural network
model_3 = Sequential()
# Input layer
model_3.add(Dense(32, activation="relu", input_dim=X_train_smote.shape[1]))
# First hidden layer
model_3.add(Dense(20, activation="relu"))
# Second hidden layer
model_3.add(Dense(14, activation="relu"))
# Output layer
model_3.add(Dense(1, activation="sigmoid"))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001) # defining SGD as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_3.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_3.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 384
dense_1 (Dense) (None, 20) 660
dense_2 (Dense) (None, 14) 294
dense_3 (Dense) (None, 1) 15
=================================================================
Total params: 1353 (5.29 KB)
Trainable params: 1353 (5.29 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
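The parameter counts in the summary can be checked by hand: each Dense layer has one weight per (input, unit) pair plus one bias per unit. The sketch below is a small plain-Python check; the helper name is hypothetical, and the 11 input features are taken from the shape of the balanced training data.

```python
def dense_param_counts(n_inputs, layer_units):
    """Return weights + biases for each Dense layer in a stack."""
    counts = []
    fan_in = n_inputs
    for units in layer_units:
        counts.append((fan_in + 1) * units)  # +1 accounts for the bias term
        fan_in = units
    return counts


counts = dense_param_counts(11, [32, 20, 14, 1])
print(counts, sum(counts))
```

This gives 384, 660, 294, and 15 parameters per layer, for a total of 1353, which agrees with the summary.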
# Fitting the ANN without class_weight since SMOTE is already applied
start = time.time()
history_3 = model_3.fit(X_train_smote, y_train_smote,
validation_data=(X_val, y_val),
batch_size=batch_size,
epochs=epochs)
end = time.time()
Epoch 1/25 160/160 [==============================] - 1s 3ms/step - loss: 0.6907 - recall: 0.0850 - val_loss: 0.6396 - val_recall: 0.0859 Epoch 2/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6876 - recall: 0.1303 - val_loss: 0.6407 - val_recall: 0.1227 Epoch 3/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6847 - recall: 0.1890 - val_loss: 0.6410 - val_recall: 0.1748 Epoch 4/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6820 - recall: 0.2288 - val_loss: 0.6408 - val_recall: 0.2270 Epoch 5/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6795 - recall: 0.2682 - val_loss: 0.6403 - val_recall: 0.2761 Epoch 6/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6770 - recall: 0.3104 - val_loss: 0.6392 - val_recall: 0.3037 Epoch 7/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6746 - recall: 0.3411 - val_loss: 0.6380 - val_recall: 0.3282 Epoch 8/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6722 - recall: 0.3787 - val_loss: 0.6365 - val_recall: 0.3466 Epoch 9/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6699 - recall: 0.4029 - val_loss: 0.6346 - val_recall: 0.3650 Epoch 10/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6675 - recall: 0.4223 - val_loss: 0.6325 - val_recall: 0.3712 Epoch 11/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6651 - recall: 0.4368 - val_loss: 0.6304 - val_recall: 0.3804 Epoch 12/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6627 - recall: 0.4570 - val_loss: 0.6279 - val_recall: 0.3804 Epoch 13/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6603 - recall: 0.4674 - val_loss: 0.6257 - val_recall: 0.4049 Epoch 14/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6578 - recall: 0.4853 - val_loss: 0.6230 - val_recall: 0.4233 Epoch 15/25 160/160 [==============================] - 0s 
2ms/step - loss: 0.6553 - recall: 0.4957 - val_loss: 0.6202 - val_recall: 0.4387 Epoch 16/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6528 - recall: 0.5094 - val_loss: 0.6175 - val_recall: 0.4448 Epoch 17/25 160/160 [==============================] - 0s 3ms/step - loss: 0.6503 - recall: 0.5186 - val_loss: 0.6148 - val_recall: 0.4509 Epoch 18/25 160/160 [==============================] - 0s 3ms/step - loss: 0.6477 - recall: 0.5312 - val_loss: 0.6116 - val_recall: 0.4540 Epoch 19/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6451 - recall: 0.5398 - val_loss: 0.6086 - val_recall: 0.4663 Epoch 20/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6425 - recall: 0.5410 - val_loss: 0.6058 - val_recall: 0.4816 Epoch 21/25 160/160 [==============================] - 0s 3ms/step - loss: 0.6398 - recall: 0.5500 - val_loss: 0.6028 - val_recall: 0.4877 Epoch 22/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6371 - recall: 0.5595 - val_loss: 0.5997 - val_recall: 0.5031 Epoch 23/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6343 - recall: 0.5626 - val_loss: 0.5973 - val_recall: 0.5123 Epoch 24/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6316 - recall: 0.5742 - val_loss: 0.5944 - val_recall: 0.5215 Epoch 25/25 160/160 [==============================] - 0s 2ms/step - loss: 0.6288 - recall: 0.5810 - val_loss: 0.5917 - val_recall: 0.5276
print("Time taken in seconds ",end-start)
Time taken in seconds 9.132063150405884
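The `160/160` shown in each epoch of the log is the number of batches per epoch. The sketch below shows the arithmetic; it assumes `batch_size` is 64, which is set earlier in the notebook and is not shown in this section, but is consistent with 160 steps over 10192 rows. The helper name is hypothetical.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches per epoch (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


# Training on the SMOTE-balanced data with an assumed batch size of 64
print(steps_per_epoch(10192, 64))

# predict() uses batches of 32 by default, so the same rows take more steps
print(steps_per_epoch(10192, 32))
```

A smaller batch size means more weight updates per epoch, which is one of the knobs to consider when a model learns as slowly as this one.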
Loss Function
#Plotting Train Loss vs Validation Loss
plt.plot(history_3.history['loss'])
plt.plot(history_3.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Recall
#Plotting Train recall vs Validation recall
plt.plot(history_3.history['recall'])
plt.plot(history_3.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
-
Observations:
-
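Both train and validation recall rise steadily across all 25 epochs and are still climbing at the end, so the model has not yet converged.
-
Validation recall grows gradually (from about 0.09 to 0.53), which suggests the model is learning meaningful patterns, only slowly.
-
The gap between training and validation recall is small (0.58 vs. 0.53 at epoch 25), so there is no sign of overfitting so far.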
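The training loss only moved from 0.6907 to 0.6288 in 25 epochs. One plausible reason is that plain SGD takes steps proportional to the raw gradient, so with a learning rate of 0.001 it barely moves where gradients are small, whereas Adam normalizes the step by a running estimate of the gradient magnitude. The sketch below illustrates this on a deliberately flat toy loss; it is not the notebook's model, and the toy loss and function names are assumptions made for illustration.

```python
import math


def grad(w: float) -> float:
    """Gradient of the toy loss 0.005 * w**2 (a deliberately flat surface)."""
    return 0.01 * w


def run_sgd(w: float, lr: float, steps: int) -> float:
    """Plain gradient descent: step size scales with the gradient."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def run_adam(w, lr, steps, beta1=0.9, beta2=0.999, eps=1e-7):
    """Standard Adam update with bias-corrected moment estimates."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


w_sgd = run_sgd(1.0, lr=0.001, steps=100)
w_adam = run_adam(1.0, lr=0.001, steps=100)
print(f"SGD:  w = {w_sgd:.4f}")
print(f"Adam: w = {w_adam:.4f}")
```

With the same learning rate and number of steps, Adam moves the weight much further toward the minimum at 0 than SGD does on this toy problem.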
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_3.predict(X_train_smote)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 0s 1ms/step
array([[ True],
[False],
[False],
...,
[False],
[ True],
[ True]])
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_3.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
array([[ True],
[False],
[False],
...,
[False],
[ True],
[False]])
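The 0.5 cut-off used above is a choice, not a requirement. Since recall on churners is the priority, it can be worth checking how recall responds to a lower threshold. The sketch below is a plain-Python illustration; the scores and labels are hypothetical, not model outputs, and the helper name is an assumption.

```python
def recall_at_threshold(scores, labels, threshold):
    """Recall for the positive class when predicting 1 if score > threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    return tp / (tp + fn)


# Hypothetical sigmoid outputs and true labels, for illustration only
scores = [0.92, 0.61, 0.48, 0.41, 0.35, 0.30, 0.12, 0.55]
labels = [1, 1, 1, 1, 0, 1, 0, 0]

for t in (0.5, 0.4, 0.25):
    print(t, recall_at_threshold(scores, labels, t))
```

Lowering the threshold can only keep recall the same or raise it, at the price of flagging more customers who would have stayed, so precision should be tracked alongside it.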
model_name = "NN with SMOTE and SGD"
train_metric_df.loc[model_name] = recall_score(y_train_smote, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)
Classification Report
#Classification report on training set
cr = classification_report(y_train_smote, y_train_pred)
print(cr)
precision recall f1-score support
0.0 0.65 0.78 0.71 5096
1.0 0.73 0.59 0.65 5096
accuracy 0.68 10192
macro avg 0.69 0.68 0.68 10192
weighted avg 0.69 0.68 0.68 10192
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
precision recall f1-score support
0.0 0.87 0.78 0.82 1274
1.0 0.38 0.53 0.45 326
accuracy 0.73 1600
macro avg 0.63 0.66 0.63 1600
weighted avg 0.77 0.73 0.75 1600
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()
plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.
y_scores = model_3.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 1ms/step
Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.
That makes Recall for class 1.0 (churners) the most important metric to watch.
This NN model using **Balanced Data with SMOTE and SGD** is showing good results but not as good as the Adam Optimizer with Dropout.
For the Validation Set:
-
Class 0.0 (Stayed): Precision=0.87, Recall=0.78
-
Class 1.0 (Exited): Precision=0.38, Recall=0.53
-
AUC reflects the model's ability to rank churners above non-churners; the drop from 0.84 for the Adam with Dropout model to 0.73 here shows worse ranking ability.
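The ranking interpretation of AUC can be made concrete: it equals the fraction of (churner, non-churner) pairs in which the churner receives the higher score. The sketch below computes it that way in plain Python; the scores and labels are hypothetical, and the function name is an assumption.

```python
def pairwise_auc(scores, labels):
    """Fraction of (churner, non-churner) pairs where the churner scores higher.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical scores: one churner (0.4) is ranked below one non-churner (0.6)
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(pairwise_auc(scores, labels))
```

Here 8 of the 9 pairs are ordered correctly. Because AUC depends only on the ordering of scores, it is unaffected by the 0.5 threshold used for the classification report.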
Confusion matrix
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train_smote, y_train_pred)
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)
** Observations **
- With a recall of 53%, this model identifies just over half of the customers at risk of churn. Precision of 38% means that roughly 6 in 10 flagged customers would not actually have churned, so retention offers would often reach loyal customers.
- Recall (for churners) = TP / (TP + FN) = 172 / (172 + 154) ≈ 0.53
- Training and validation recall are close (0.59 vs. 0.53), so the model is not overfitting, but its churn recall is well below the 0.80 of the Adam with Dropout model.
Model 4 - Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer¶
- Now let's switch to a NN model with Balanced Data (by applying SMOTE) and Adam Optimizer
- one input layer with 32 neurons
- two hidden layers with 20 and 14 neurons respectively
- activation function of ReLU.
Let's apply SMOTE to balance this dataset and then apply hyperparameter tuning again.
sm = SMOTE(random_state=42)
# Fit SMOTE on the training data and create balanced versions
X_train_smote, y_train_smote = sm.fit_resample(X_train, y_train)
print('After UpSampling, the shape of train_X: {}'.format(X_train_smote.shape))
print('After UpSampling, the shape of train_y: {} \n'.format(y_train_smote.shape))
After UpSampling, the shape of train_X: (10192, 11)
After UpSampling, the shape of train_y: (10192,)
Let's build a model with the balanced dataset
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
# Fixing the seed for random number generators so that we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
# Initializing the neural network
model_4 = Sequential()
# Input layer
model_4.add(Dense(32, activation="relu", input_dim=X_train_smote.shape[1]))
# First hidden layer
model_4.add(Dense(20, activation="relu"))
# Second hidden layer
model_4.add(Dense(14, activation="relu"))
# Output layer
model_4.add(Dense(1, activation="sigmoid"))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001) # defining Adam as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_4.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_4.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 384
dense_1 (Dense) (None, 20) 660
dense_2 (Dense) (None, 14) 294
dense_3 (Dense) (None, 1) 15
=================================================================
Total params: 1353 (5.29 KB)
Trainable params: 1353 (5.29 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
# Fitting the ANN without class_weight since SMOTE is already applied
start = time.time()
history_4 = model_4.fit(X_train_smote, y_train_smote,
validation_data=(X_val, y_val),
batch_size=batch_size,
epochs=epochs)
end = time.time()
Epoch 1/25 160/160 [==============================] - 2s 5ms/step - loss: 0.6041 - recall: 0.6717 - val_loss: 0.5434 - val_recall: 0.7362 Epoch 2/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4916 - recall: 0.7867 - val_loss: 0.5117 - val_recall: 0.7975 Epoch 3/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4592 - recall: 0.7865 - val_loss: 0.4579 - val_recall: 0.7270 Epoch 4/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4452 - recall: 0.7920 - val_loss: 0.4291 - val_recall: 0.6810 Epoch 5/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4355 - recall: 0.7869 - val_loss: 0.4677 - val_recall: 0.7546 Epoch 6/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4285 - recall: 0.7963 - val_loss: 0.4410 - val_recall: 0.6994 Epoch 7/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4232 - recall: 0.8020 - val_loss: 0.4385 - val_recall: 0.6902 Epoch 8/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4163 - recall: 0.8020 - val_loss: 0.4397 - val_recall: 0.6963 Epoch 9/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4123 - recall: 0.8063 - val_loss: 0.4362 - val_recall: 0.7209 Epoch 10/25 160/160 [==============================] - 0s 3ms/step - loss: 0.4072 - recall: 0.8116 - val_loss: 0.4414 - val_recall: 0.7086 Epoch 11/25 160/160 [==============================] - 0s 3ms/step - loss: 0.4043 - recall: 0.8102 - val_loss: 0.4434 - val_recall: 0.7270 Epoch 12/25 160/160 [==============================] - 0s 3ms/step - loss: 0.4000 - recall: 0.8146 - val_loss: 0.4324 - val_recall: 0.6810 Epoch 13/25 160/160 [==============================] - 0s 3ms/step - loss: 0.3965 - recall: 0.8138 - val_loss: 0.4234 - val_recall: 0.6871 Epoch 14/25 160/160 [==============================] - 0s 3ms/step - loss: 0.3918 - recall: 0.8248 - val_loss: 0.4494 - val_recall: 0.7025 Epoch 15/25 160/160 [==============================] - 0s 
3ms/step - loss: 0.3870 - recall: 0.8252 - val_loss: 0.4720 - val_recall: 0.7546 Epoch 16/25 160/160 [==============================] - 0s 3ms/step - loss: 0.3827 - recall: 0.8287 - val_loss: 0.4393 - val_recall: 0.6963 Epoch 17/25 160/160 [==============================] - 0s 2ms/step - loss: 0.3807 - recall: 0.8307 - val_loss: 0.4562 - val_recall: 0.7147 Epoch 18/25 160/160 [==============================] - 0s 2ms/step - loss: 0.3770 - recall: 0.8358 - val_loss: 0.4309 - val_recall: 0.6718 Epoch 19/25 160/160 [==============================] - 0s 2ms/step - loss: 0.3746 - recall: 0.8316 - val_loss: 0.4475 - val_recall: 0.6902 Epoch 20/25 160/160 [==============================] - 0s 2ms/step - loss: 0.3708 - recall: 0.8348 - val_loss: 0.4533 - val_recall: 0.6963 Epoch 21/25 160/160 [==============================] - 0s 2ms/step - loss: 0.3692 - recall: 0.8420 - val_loss: 0.4154 - val_recall: 0.6411 Epoch 22/25 160/160 [==============================] - 0s 2ms/step - loss: 0.3683 - recall: 0.8399 - val_loss: 0.4283 - val_recall: 0.6534 Epoch 23/25 160/160 [==============================] - 0s 2ms/step - loss: 0.3630 - recall: 0.8444 - val_loss: 0.4573 - val_recall: 0.6933 Epoch 24/25 160/160 [==============================] - 0s 2ms/step - loss: 0.3626 - recall: 0.8426 - val_loss: 0.4455 - val_recall: 0.6779 Epoch 25/25 160/160 [==============================] - 0s 2ms/step - loss: 0.3589 - recall: 0.8462 - val_loss: 0.4509 - val_recall: 0.6687
print("Time taken in seconds ",end-start)
Time taken in seconds 10.486416816711426
Loss Function
#Plotting Train Loss vs Validation Loss
plt.plot(history_4.history['loss'])
plt.plot(history_4.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Recall
#Plotting Train recall vs Validation recall
plt.plot(history_4.history['recall'])
plt.plot(history_4.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
-
Observations:
-
Training recall keeps climbing, but validation recall drifts downward, widening the gap between them; the model appears to be overfitting.
-
Our model still catches ~67% of churners on validation at the final epoch, which is usable, just less stable.
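The overfitting pattern can also be read off the training history numerically. The sketch below finds the epoch with the best validation recall and the final train/validation gap; the lists are copied from the epoch log above rather than read from `history_4`, and the helper name is hypothetical.

```python
def summarize_recall(history: dict) -> dict:
    """Report the best validation epoch and the final train/validation gap."""
    val = history["val_recall"]
    train = history["recall"]
    best_idx = max(range(len(val)), key=val.__getitem__)
    return {
        "best_epoch": best_idx + 1,  # epochs are 1-based in the log
        "best_val_recall": val[best_idx],
        "final_gap": train[-1] - val[-1],
    }


# Values transcribed from the 25-epoch log above
logged_history = {
    "recall": [
        0.6717, 0.7867, 0.7865, 0.7920, 0.7869, 0.7963, 0.8020, 0.8020,
        0.8063, 0.8116, 0.8102, 0.8146, 0.8138, 0.8248, 0.8252, 0.8287,
        0.8307, 0.8358, 0.8316, 0.8348, 0.8420, 0.8399, 0.8444, 0.8426,
        0.8462,
    ],
    "val_recall": [
        0.7362, 0.7975, 0.7270, 0.6810, 0.7546, 0.6994, 0.6902, 0.6963,
        0.7209, 0.7086, 0.7270, 0.6810, 0.6871, 0.7025, 0.7546, 0.6963,
        0.7147, 0.6718, 0.6902, 0.6963, 0.6411, 0.6534, 0.6933, 0.6779,
        0.6687,
    ],
}

summary = summarize_recall(logged_history)
print(summary)
```

Validation recall peaks at epoch 2 (0.7975) and ends about 0.18 below training recall, which supports the overfitting reading and suggests that stopping earlier or adding regularization may help.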
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_4.predict(X_train_smote)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 0s 1ms/step
array([[ True],
[False],
[False],
...,
[ True],
[ True],
[ True]])
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_4.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
array([[False],
[False],
[False],
...,
[False],
[ True],
[ True]])
model_name = "NN with SMOTE and Adam"
train_metric_df.loc[model_name] = recall_score(y_train_smote, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)
Classification Report
#Classification report on training set
cr = classification_report(y_train_smote, y_train_pred)
print(cr)
precision recall f1-score support
0.0 0.85 0.84 0.85 5096
1.0 0.84 0.85 0.85 5096
accuracy 0.85 10192
macro avg 0.85 0.85 0.85 10192
weighted avg 0.85 0.85 0.85 10192
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
precision recall f1-score support
0.0 0.91 0.81 0.86 1274
1.0 0.48 0.67 0.56 326
accuracy 0.78 1600
macro avg 0.69 0.74 0.71 1600
weighted avg 0.82 0.78 0.80 1600
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()
plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.
y_scores = model_4.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 1ms/step
Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.
That makes Recall for class 1.0 (churners) the most important metric to watch.
This NN model using **Balanced Data with SMOTE and Adam** is showing improved results over SGD but not as good as the Adam Optimizer with Dropout above.
For the Validation Set:
-
Class 0.0 (Stayed): Precision=0.91, Recall=0.81
-
Class 1.0 (Exited): Precision=0.48, Recall=0.67
-
Recall = 0.67 → We're identifying 67% of actual churners.
-
Model 4 is an improvement in recall, precision, and F1 from Model 3.
-
AUC reflects the model's ability to rank churners above non-churners, and the rise from 0.73 → 0.83 shows improved ranking ability.
Confusion matrix
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train_smote, y_train_pred)
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)
** Observations **
- With a recall of 67%, this model identifies two thirds of the customers at risk of churn. Precision of 48% means that about half of the flagged customers are real churners, so roughly every second retention contact would reach a customer who was going to stay.
- Recall (for churners) = TP / (TP + FN) = 218 / (218 + 108) ≈ 0.67
- Training recall (0.85) is well above validation recall (0.67), so this model generalizes less well than the earlier ones and shows signs of overfitting.
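Since precision and recall pull in opposite directions, the F1 score (their harmonic mean) is a convenient single number for comparing Models 3 and 4. The sketch below recomputes it from the rounded validation figures reported above; because the inputs are already rounded to two decimals, the result can differ from the classification report by about 0.01. The function name is hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Validation precision and recall for churners, as reported above
reported = {
    "Model 3 (SMOTE + SGD)": (0.38, 0.53),
    "Model 4 (SMOTE + Adam)": (0.48, 0.67),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 ~ {f1_from(p, r):.2f}")
```

The harmonic mean stays close to the smaller of the two inputs, which is why the low precision keeps F1 for churners well below the recall value in both models.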
Model 5 - Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout¶
- Now let's switch to a NN model with Balanced Data (by applying SMOTE) and Adam Optimizer Plus Dropout
- one input layer with 32 neurons
- three hidden layers with 20, 14, and 7 neurons respectively
- activation function of ReLU.
- Adam as the optimizer with dropout
Let's apply SMOTE to balance this dataset and then apply hyperparameter tuning again.
sm = SMOTE(random_state=42)
# Fit SMOTE on the training data and create balanced versions
X_train_smote, y_train_smote = sm.fit_resample(X_train, y_train)
print('After UpSampling, the shape of train_X: {}'.format(X_train_smote.shape))
print('After UpSampling, the shape of train_y: {} \n'.format(y_train_smote.shape))
After UpSampling, the shape of train_X: (10192, 11)
After UpSampling, the shape of train_y: (10192,)
Let's build a model with the balanced dataset
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()
# Fixing the seed for random number generators so that we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
# Initializing the neural network
model_5 = Sequential()
# Input layer + Dropout
model_5.add(Dense(32, activation="relu", input_dim=X_train_smote.shape[1]))
model_5.add(Dropout(0.3))
# First hidden layer + Dropout
model_5.add(Dense(20, activation="relu"))
model_5.add(Dropout(0.2))
# Second hidden layer + Dropout
model_5.add(Dense(14, activation="relu"))
model_5.add(Dropout(0.1))
# Third hidden layer (no Dropout here is fine too)
model_5.add(Dense(7, activation="relu"))
# Output layer
model_5.add(Dense(1, activation="sigmoid"))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001) # defining Adam as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_5.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_5.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 384
dropout (Dropout) (None, 32) 0
dense_1 (Dense) (None, 20) 660
dropout_1 (Dropout) (None, 20) 0
dense_2 (Dense) (None, 14) 294
dropout_2 (Dropout) (None, 14) 0
dense_3 (Dense) (None, 7) 105
dense_4 (Dense) (None, 1) 8
=================================================================
Total params: 1451 (5.67 KB)
Trainable params: 1451 (5.67 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
# Fitting the ANN without class_weight since SMOTE is already applied
start = time.time()
history_5 = model_5.fit(X_train_smote, y_train_smote,
validation_data=(X_val, y_val),
batch_size=batch_size,
epochs=epochs)
end = time.time()
Epoch 1/25 160/160 [==============================] - 1s 3ms/step - loss: 0.6796 - recall: 0.6529 - val_loss: 0.6683 - val_recall: 0.7423 Epoch 2/25 160/160 [==============================] - 0s 2ms/step - loss: 0.5887 - recall: 0.6664 - val_loss: 0.5286 - val_recall: 0.7270 Epoch 3/25 160/160 [==============================] - 0s 2ms/step - loss: 0.5457 - recall: 0.7106 - val_loss: 0.4740 - val_recall: 0.7055 Epoch 4/25 160/160 [==============================] - 0s 2ms/step - loss: 0.5260 - recall: 0.7290 - val_loss: 0.4863 - val_recall: 0.7454 Epoch 5/25 160/160 [==============================] - 0s 2ms/step - loss: 0.5128 - recall: 0.7433 - val_loss: 0.4728 - val_recall: 0.7362 Epoch 6/25 160/160 [==============================] - 0s 2ms/step - loss: 0.5063 - recall: 0.7518 - val_loss: 0.4558 - val_recall: 0.7209 Epoch 7/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4998 - recall: 0.7677 - val_loss: 0.4652 - val_recall: 0.7362 Epoch 8/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4898 - recall: 0.7677 - val_loss: 0.4495 - val_recall: 0.7301 Epoch 9/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4904 - recall: 0.7639 - val_loss: 0.4785 - val_recall: 0.7577 Epoch 10/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4837 - recall: 0.7708 - val_loss: 0.4505 - val_recall: 0.7423 Epoch 11/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4787 - recall: 0.7688 - val_loss: 0.4602 - val_recall: 0.7485 Epoch 12/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4772 - recall: 0.7753 - val_loss: 0.4378 - val_recall: 0.7086 Epoch 13/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4744 - recall: 0.7749 - val_loss: 0.4435 - val_recall: 0.7178 Epoch 14/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4686 - recall: 0.7773 - val_loss: 0.4521 - val_recall: 0.7270 Epoch 15/25 160/160 [==============================] - 0s 
2ms/step - loss: 0.4724 - recall: 0.7684 - val_loss: 0.4466 - val_recall: 0.7147 Epoch 16/25 160/160 [==============================] - 0s 3ms/step - loss: 0.4660 - recall: 0.7786 - val_loss: 0.4393 - val_recall: 0.7178 Epoch 17/25 160/160 [==============================] - 1s 3ms/step - loss: 0.4668 - recall: 0.7765 - val_loss: 0.4465 - val_recall: 0.7117 Epoch 18/25 160/160 [==============================] - 0s 3ms/step - loss: 0.4629 - recall: 0.7875 - val_loss: 0.4296 - val_recall: 0.7025 Epoch 19/25 160/160 [==============================] - 1s 3ms/step - loss: 0.4597 - recall: 0.7947 - val_loss: 0.4293 - val_recall: 0.7209 Epoch 20/25 160/160 [==============================] - 1s 3ms/step - loss: 0.4587 - recall: 0.7879 - val_loss: 0.4450 - val_recall: 0.7393 Epoch 21/25 160/160 [==============================] - 0s 3ms/step - loss: 0.4554 - recall: 0.7867 - val_loss: 0.4346 - val_recall: 0.7209 Epoch 22/25 160/160 [==============================] - 1s 3ms/step - loss: 0.4569 - recall: 0.7834 - val_loss: 0.4329 - val_recall: 0.7362 Epoch 23/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4573 - recall: 0.7898 - val_loss: 0.4498 - val_recall: 0.7577 Epoch 24/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4565 - recall: 0.7947 - val_loss: 0.4267 - val_recall: 0.7055 Epoch 25/25 160/160 [==============================] - 0s 2ms/step - loss: 0.4526 - recall: 0.7955 - val_loss: 0.4472 - val_recall: 0.7454
print("Time taken in seconds ",end-start)
Time taken in seconds 11.28977108001709
Loss Function
#Plotting Train Loss vs Validation Loss
plt.plot(history_5.history['loss'])
plt.plot(history_5.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Recall
#Plotting Train recall vs Validation recall
plt.plot(history_5.history['recall'])
plt.plot(history_5.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
-
Observations:
-
Training recall keeps rising, while validation recall stays fairly stable, fluctuating between roughly 0.70 and 0.76.
-
Our model is catching ~75% of churners on validation.
-
Dropout seemed to help generalization and reduced overfitting from Model 4.
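For intuition on why Dropout reduces overfitting, the sketch below shows the usual "inverted dropout" behaviour in plain Python: during training each activation is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), while at inference the values pass through unchanged. This is a simplified illustration, not Keras's implementation; the function name is hypothetical.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)  # inference: pass values through unchanged
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(2)
layer_output = [1.0] * 10

# Training: some units are dropped, the others are scaled up
print(dropout(layer_output, rate=0.3, rng=rng))

# Inference: nothing is dropped
print(dropout(layer_output, rate=0.3, rng=rng, training=False))
```

Because a different random subset of units is silenced on every batch, the network cannot rely on any single unit, which tends to narrow the gap between training and validation recall. It also explains why recall logged during training (with dropout active) can look lower than recall computed afterwards with `predict`.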
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_5.predict(X_train_smote)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 1s 2ms/step
array([[ True],
[False],
[False],
...,
[ True],
[ True],
[ True]])
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_5.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
array([[False],
[False],
[False],
...,
[False],
[ True],
[ True]])
model_name = "NN with SMOTE and Adam Plus Dropout"
train_metric_df.loc[model_name] = recall_score(y_train_smote, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)
Classification Report
#Classification report on training set
cr = classification_report(y_train_smote, y_train_pred)
print(cr)
precision recall f1-score support
0.0 0.81 0.81 0.81 5096
1.0 0.81 0.81 0.81 5096
accuracy 0.81 10192
macro avg 0.81 0.81 0.81 10192
weighted avg 0.81 0.81 0.81 10192
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
precision recall f1-score support
0.0 0.92 0.80 0.86 1274
1.0 0.49 0.75 0.59 326
accuracy 0.79 1600
macro avg 0.71 0.77 0.72 1600
weighted avg 0.84 0.79 0.80 1600
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()
plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.
y_scores = model_5.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 1ms/step
Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.
That makes Recall for class 1.0 (churners) the most important metric to watch.
This NN model using **Balanced Data with SMOTE and Adam Plus Dropout** is showing improved results over Adam without Dropout.
For the Validation Set:
-
Class 0.0 (Stayed): Precision=0.92, Recall=0.80
-
Class 1.0 (Exited): Precision=0.49, Recall=0.75
-
Recall = 0.75 → We're identifying 75% (3 out of 4) of actual churners.
-
Model 5 improves on Model 4 in recall, precision, and F1.
-
AUC reflects the model's ability to rank churners above non-churners, and the rise from 0.83 → 0.86 shows improved ranking ability as well.
Confusion matrix
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train_smote, y_train_pred)
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)
** Observations **
- With a recall of 75%, this model identifies 3 out of 4 customers at risk of churn. Precision of 49% means that about half of the flagged customers are real churners, so roughly every second retention contact would reach a customer who was going to stay.
- Recall (for churners) = TP / (TP + FN) = 243 / (243 + 83) ≈ 0.75
- We're maintaining strong generalization and avoiding overfitting while maximizing churn recall.
- Model 5 is the most balanced. It identifies 75% of churners while keeping false alerts relatively low, with nearly 50% precision. Overall accuracy is high at 79%, and the model generalizes well to unseen data.
- Model 2 with Adam Optimizer, Class Weights, and Dropout had the highest Recall Score of 80%.
Model Performance Comparison and Final Model Selection¶
print("Training performance comparison")
train_metric_df
Training performance comparison
| | recall |
|---|---|
| NN with SGD | 0.480828 |
| NN with Adam | 0.773773 |
| NN with Adam and Dropout | 0.804448 |
| NN with SMOTE and SGD | 0.586146 |
| NN with SMOTE and Adam | 0.853218 |
| NN with SMOTE and Adam Plus Dropout | 0.806319 |
print("Validation set performance comparison")
valid_metric_df
Validation set performance comparison
| | recall |
|---|---|
| NN with SGD | 0.487730 |
| NN with Adam | 0.757669 |
| NN with Adam and Dropout | 0.803681 |
| NN with SMOTE and SGD | 0.527607 |
| NN with SMOTE and Adam | 0.668712 |
| NN with SMOTE and Adam Plus Dropout | 0.745399 |
# Create a Barplot of the above Validation Data performance.
# Define model comparison data
data = {
    "Model": [
        "Model 0: NN using SGD + Class Weights",
        "Model 1: NN using Adam + Class Weights",
        "Model 2: NN using Adam + Class Weights + Dropout",
        "Model 3: NN using Balanced Data SMOTE + SGD",
        "Model 4: NN using Balanced Data SMOTE + Adam",
        "Model 5: NN using Balanced Data SMOTE + Adam + Dropout"
    ],
    "Recall (Churn)": [0.49, 0.76, 0.80, 0.53, 0.67, 0.75]
}
# Create DataFrame
df_models = pd.DataFrame(data)
# Set seaborn style
sns.set(style="whitegrid")
# Identify the index of the best model
best_model_idx = df_models["Recall (Churn)"].idxmax()
# Create the barplot
plt.figure(figsize=(10, 6))
barplot = sns.barplot(x="Recall (Churn)", y="Model", data=df_models, palette="Blues_d")
# Highlight the best model in orange
barplot.patches[best_model_idx].set_color('orange')
# Add text labels to bars
for p in barplot.patches:
    width = p.get_width()
    plt.text(width + 0.01, p.get_y() + p.get_height() / 2,
             f'{width:.2f}', va='center')
# Final plot adjustments
plt.title("Recall (Churn) by Model", fontsize=14)
plt.xlabel("Recall (Churn)")
plt.ylabel("Model")
plt.tight_layout()
plt.show()
# Final model metrics
data = {
    "Model": [
        "M0: SGD + CW",
        "M1: Adam + CW",
        "M2: Adam + CW + Dropout",
        "M3: SMOTE + SGD",
        "M4: SMOTE + Adam",
        "M5: SMOTE + Adam + Dropout"
    ],
    "Recall": [0.49, 0.76, 0.80, 0.53, 0.67, 0.75],
    "Precision": [0.22, 0.46, 0.44, 0.38, 0.48, 0.49],
    "F1 Score": [0.30, 0.57, 0.57, 0.45, 0.56, 0.59],
    "AUC": [0.52, 0.85, 0.84, 0.73, 0.83, 0.86]
}
# Create DataFrame
df_metrics = pd.DataFrame(data)
df_metrics.set_index("Model", inplace=True)
# Plot grouped bar chart
ax = df_metrics.plot(kind="bar", figsize=(12, 6), colormap="Set2", edgecolor="black")
# Styling
plt.title("Final Model Comparison Across Metrics", fontsize=14)
plt.ylabel("Score")
plt.ylim(0, 1)
plt.grid(axis='y')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.legend(title="Metric")
plt.show()
Observations:
- Model 2: NN using Adam + Class Weights + Dropout had the best Recall value of 80%.
train_metric_df - valid_metric_df
| | recall |
|---|---|
| NN with SGD | -0.006902 |
| NN with Adam | 0.016104 |
| NN with Adam and Dropout | 0.000767 |
| NN with SMOTE and SGD | 0.058539 |
| NN with SMOTE and Adam | 0.184507 |
| NN with SMOTE and Adam Plus Dropout | 0.060920 |
Observations:
- Model 2: NN using Adam + Class Weights + Dropout had the smallest gap between training and validation recall (0.000767).
- This model showed the best generalization.
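The gap comparison can also be reproduced without the DataFrames. The sketch below copies the recall values from the two comparison tables into plain dictionaries (the variable names are assumptions, not the notebook's) and ranks the models by the absolute size of the train-minus-validation gap.

```python
# Recall values copied from the training and validation comparison tables.
train_recall = {
    "NN with SGD": 0.480828,
    "NN with Adam": 0.773773,
    "NN with Adam and Dropout": 0.804448,
    "NN with SMOTE and SGD": 0.586146,
    "NN with SMOTE and Adam": 0.853218,
    "NN with SMOTE and Adam Plus Dropout": 0.806319,
}
valid_recall = {
    "NN with SGD": 0.487730,
    "NN with Adam": 0.757669,
    "NN with Adam and Dropout": 0.803681,
    "NN with SMOTE and SGD": 0.527607,
    "NN with SMOTE and Adam": 0.668712,
    "NN with SMOTE and Adam Plus Dropout": 0.745399,
}

# Generalization gap: training recall minus validation recall.
recall_gap = {name: train_recall[name] - valid_recall[name] for name in train_recall}

# Use the absolute gap so the sign (validation slightly better than
# training, as for SGD) does not affect the ranking.
best_generalizer = min(recall_gap, key=lambda name: abs(recall_gap[name]))

for name, gap in sorted(recall_gap.items(), key=lambda item: abs(item[1])):
    print(f"{name:<38s} gap={gap:+.6f}  validation recall={valid_recall[name]:.3f}")
print("Smallest absolute gap:", best_generalizer)
```

A small gap is not sufficient on its own: NN with SGD also has a tiny gap but a validation recall below 0.49, which is why the gap is read together with the validation recall.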
y_test_pred = model_2.predict(X_test)
y_test_pred = (y_test_pred > 0.5)
print(y_test_pred)
63/63 [==============================] - 0s 2ms/step [[False] [False] [False] ... [ True] [False] [False]]
#lets print classification report
cr=classification_report(y_test,y_test_pred)
print(cr)
              precision    recall  f1-score   support
         0.0       0.93      0.73      0.82      1593
         1.0       0.43      0.80      0.56       407
    accuracy                           0.74      2000
   macro avg       0.68      0.76      0.69      2000
weighted avg       0.83      0.74      0.77      2000
Observations:
- Recall remains steady at 80% on the Test Set.
- Precision for the Churn Class also holds steady at 43%.
- The model appears to generalize well.
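As a consistency check on the classification report, the per-class recall and support are enough to rebuild the other headline numbers. The sketch below is an approximation under stated assumptions: `metrics_from_recalls` is a hypothetical helper, and because its inputs are rounded to two decimals, the reconstructed cell counts are expected values rather than the exact confusion matrix.

```python
def metrics_from_recalls(recall_pos, recall_neg, support_pos, support_neg):
    """Rebuild approximate confusion-matrix cells from per-class recall and support.

    Inputs rounded to two decimals yield non-integer, approximate cell counts.
    """
    tp = recall_pos * support_pos   # churners correctly flagged
    fn = support_pos - tp           # churners missed
    tn = recall_neg * support_neg   # stayers correctly left alone
    fp = support_neg - tn           # stayers wrongly flagged
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall_pos / (precision + recall_pos)
    accuracy = (tp + tn) / (support_pos + support_neg)
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp,
            "precision": precision, "f1": f1, "accuracy": accuracy}


# Recall and support values read from the test-set classification report.
test_summary = metrics_from_recalls(
    recall_pos=0.80, recall_neg=0.73, support_pos=407, support_neg=1593
)
for key, value in test_summary.items():
    print(f"{key:>9s}: {value:.2f}")
```

On these rounded inputs the sketch gives roughly 326 true positives against roughly 430 false positives, which reproduces the reported precision of 0.43, F1 of 0.56, and accuracy of 0.74.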
#Calculating the confusion matrix
make_confusion_matrix(y_test,y_test_pred)
# Get predicted probabilities for the positive class (churn)
y_test_pred_proba = model_2.predict(X_test).ravel()
# Compute FPR, TPR, and thresholds
fpr, tpr, thresholds = roc_curve(y_test, y_test_pred_proba)
roc_auc = auc(fpr, tpr)
# Plot the ROC Curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', linestyle='--', label='Random Guessing')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate (Recall)')
plt.title('ROC Curve - Test Set')
plt.legend(loc='lower right')
plt.grid(True)
plt.tight_layout()
plt.show()
63/63 [==============================] - 0s 1ms/step
precision, recall, thresholds = precision_recall_curve(y_test, y_test_pred_proba)
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, color="teal", lw=2)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve - Test Set")
plt.grid(True)
plt.tight_layout()
plt.show()
Observations
- With a recall of 80%, this model identifies about 4 out of 5 customers at risk of churn. Precision of 43% means that roughly 4 in 10 flagged customers are actual churners, a trade-off that favors catching churners over avoiding false positives.
- Recall (for churners) = TP / (TP + FN) ≈ 325 / 407 ≈ 0.80, where 407 is the number of actual churners in the test set.
- We're maintaining strong generalization and avoiding overfitting while maximizing churn recall.
- The ROC Curve shows an AUC of 0.85.
- An AUC of 0.85 means that a randomly chosen churner receives a higher predicted risk than a randomly chosen non-churner about 85% of the time. This supports the model’s utility in prioritizing outreach efforts based on predicted risk.
- The Precision-Recall Curve shows the expected inverse relationship.
- As recall increases, precision drops.
- We can capture more churners, but at the cost of more false positives
- We can choose a threshold that balances recall vs. precision depending on our business objective.
- For example: if we want ~70% recall, this curve shows we'll get ~60% precision.
- At ~80% recall, precision drops to ~43%, which aligns with our reported score.
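Choosing a threshold for a target recall can be sketched in plain Python. The notebook itself uses `precision_recall_curve` from scikit-learn; the function below is a simplified stand-in, and `threshold_for_recall` and the toy labels and scores are hypothetical. It returns the highest threshold whose recall reaches the target, which is the operating point that flags the fewest customers.

```python
def threshold_for_recall(y_true, scores, target_recall):
    """Return (threshold, precision, recall) for the highest threshold whose
    recall reaches target_recall, predicting churn when score >= threshold.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("no positive labels")
    # Walk thresholds from strictest to loosest; recall can only grow.
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for y, s in zip(y_true, scores) if s >= threshold]
        tp = sum(flagged)
        recall = tp / total_pos
        if recall >= target_recall:
            precision = tp / len(flagged)
            return threshold, precision, recall
    raise ValueError("target recall is not reachable")


# Hypothetical toy data: 1 = exited, 0 = stayed.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.95, 0.85, 0.80, 0.60, 0.55, 0.40, 0.35, 0.10]

chosen = threshold_for_recall(toy_labels, toy_scores, target_recall=0.75)
print("threshold=%.2f precision=%.2f recall=%.2f" % chosen)
```

With the model's real outputs, `y_test` and `y_test_pred_proba` would take the place of the toy lists, and the chosen threshold would replace the fixed 0.5 cut-off.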
Actionable Insights and Business Recommendations¶
Observations
- This model catches 80% of churners and flags them with 43% precision, making it an effective churn prediction engine. Despite some false positives, the model ensures the bank can intervene early on 8 out of 10 customers likely to leave, significantly reducing potential revenue loss.
Key Business Insights from Churn Prediction Analysis¶
-
The final model (NN with Adam + Class Weights + Dropout) achieved 80% recall on churners and a test-set AUC of 0.85, indicating strong separation between churners and loyal customers.
-
This means 8 out of 10 customers who will churn can be flagged early — a major opportunity for targeted retention efforts.
False Alarms Are Reasonable for a Recall-Focused Strategy¶
-
While precision is ~43% on the test set, it may be acceptable in scenarios like churn prevention, where missing a churner is more costly than wrongly predicting churn.
-
The model favors catching at-risk customers, even if some loyal customers are mistakenly flagged.
Recommendations for the Business¶
- Deploy the Model for Retention Campaign Targeting
  - Use the model to score customers weekly/monthly
  - Prioritize the top 20–30% highest churn risk scores for retention outreach
  - Segment high-risk churners into A/B groups to test retention offers
  - Create personalized retention offers for the high-risk churn group
- Integrate Model Into CRM or Customer Analytics
  - Alert customer success teams when a customer crosses a churn risk threshold
  - Retrain the model every 3–6 months to reflect changing customer behavior and campaign effectiveness
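The recommendation to prioritize the highest-risk customers could be implemented along these lines. This is a minimal sketch, not a deployment recipe: `top_risk_customers`, the customer ids, and the scores are all hypothetical, and in practice the scores would come from the final model's predicted churn probabilities.

```python
import math


def top_risk_customers(risk_scores, fraction):
    """Return customer ids in the top `fraction` of churn-risk scores, highest first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_selected = math.ceil(len(risk_scores) * fraction)
    ranked = sorted(risk_scores.items(), key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:n_selected]]


# Hypothetical customer ids and model scores, for illustration only.
example_scores = {
    "C001": 0.91, "C002": 0.12, "C003": 0.67, "C004": 0.45,
    "C005": 0.83, "C006": 0.08, "C007": 0.52, "C008": 0.30,
}

# 25% sits inside the 20-30% band suggested above.
outreach_list = top_risk_customers(example_scores, fraction=0.25)
print("Customers to contact first:", outreach_list)
```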
# Converting notebook to .html format for upload:
# Step 1: Copy the notebook locally
!cp '/content/drive/MyDrive/Colab_Notebooks/mod4_proj4_Full_code_Thomas_Hall.ipynb' "/content/"
# Step 2: Convert to HTML
!jupyter nbconvert --to html '/content/mod4_proj4_Full_code_Thomas_Hall.ipynb'